Cisco VOIP JTAPI magic
27 June 2005
So, for a variety of reasons, work decided about two years ago to install a Cisco VOIP telephone system in our office. One of the major selling points of Cisco's system is that it allows custom Java applications to interface with it using the JTAPI (Java Telephony API). Now, anybody who has worked with this API can tell you that it is neither simple to use nor well-documented. Cisco's implementation of the API is far from complete, and their documentation is even worse than Sun's. The problem is compounded by the fact that Cisco made all sorts of interesting custom extensions to the API, but failed to provide any useful documentation or example programs. In other words, this thing's a bitch to program for. After all sorts of drama, however, we were able to figure out what we needed to know and finish our project.

In the interests of saving anybody who might be doing Cisco JTAPI development some serious pain, I'd like to put up a brief description of how to do something so basic, so simple, so crucial, and so seemingly obvious that the uninitiated might think that I'm lying when I say that Cisco gives almost no clues how to do it: stream audio from your application to a phone. (Quick disclaimer: I haven't done any hacking on Cisco's phone system in about a year, so it's quite possible that the situation has improved since I used it last.)

What's that you say? Surely, this would be one of the most obvious features that you'd want in an application that interfaces with a phone system, right? Think of all the phone-based applications you interact with on a daily basis (voicemail, the bank's auto-teller, and so on) that rely on a computer program playing audio to your phone. Pretty much any application that has to interact with a user in some way over the phone relies on media streaming.
Luckily, the JTAPI contains a whole package of classes that provide these functions, so clearly the designers of the API knew that it was something people would want to do.

Cisco, however, did not see fit to implement those handy functions (the one thing that Cisco actually does document well is what is and is not implemented). Its documentation is, in fact, very sparse on the subject of just how somebody would go about streaming media over its phone system using Java. There are a few tantalizingly named classes in the "com.cisco" section of their JTAPI implementation's Javadocs (MediaTerminal, etc.), but there is basically no documentation on how to go about using them. No sample code, nothing. I know, I know: I'm crazy to think that they'd provide sample code demonstrating how to do one of the (presumably) most common things you'd want to do with their phone system. :-)

After banging our heads against the wall for a little while, we noticed a small paragraph buried deep within the Cisco JTAPI developer's guide explaining why. See, it turns out that Cisco thinks that, contrary to the API designers' point of view, you're not actually supposed to use the JTAPI to handle the audio transport; for that, you have to use something else. Cisco doesn't tell you what else to use, however. Here's where things get really fun.

See, Cisco VOIP systems use RTP (the Real-time Transport Protocol) to handle audio transport. RTP is a UDP-based network protocol that's designed to handle all sorts of media streaming. It's a huge, gnarly, complex mess, but works very well. It turns out that if you want to send audio across your Cisco phone system, your code has to handle all of the transport. The good news is that doing this in Java is not that big of a deal, theoretically speaking, since Sun provides a massive library of classes called the Java Media Framework (JMF). The JMF is designed to handle pretty much anything you'd ever care to do with any sort of time-based media.
Want to write a shoutcast-style server in Java? The JMF can do that. Want to write video-conferencing software? JMF's got you covered.

The bad news is that, like all powerful libraries that manage extremely difficult and complex tasks, the JMF is a little bit tricky to use. OK, it's worse than that. It's really tricky to use. And the documentation's not great. OK, OK, you got me: the documentation totally sucks. Luckily, however, our problem (how to stream audio over RTP to a particular IP address) is just about the simplest thing that it is possible to do using the JMF. From the API's standpoint, what we're trying to do is a little bit like sandblasting a soup cracker. Once you've figured out how to work with the JMF, there is really only one tricky thing needed to get it to work with the Cisco VOIP system.

Before I tell you all about what that tricky step is, though, let's go over the general process for media streaming:
So, there you have it. I know this is kind of an odd thing to start off a blog with, but it seemed appropriate. It falls squarely under the banner of "Stupid things I figured out so that you don't have to". If just one person out there is spared the weeks of pain caused by this stupid problem, this post will have done its job. Enjoy!
- Use JTAPI to somehow connect your application's code with a particular call.
- Determine the target endpoint's IP address and port number.
- Initiate an RTP session with that address/port.
- Transmit your media.
- Do at least one of the following:
  - Catch when your playback is complete, and take appropriate action
  - Catch when the endpoint is no longer active (i.e., the user hangs up) and take appropriate action
  - Catch user input events (i.e., DTMF) and take appropriate action
- Do whatever it is your application does.

On the JMF side, the basic recipe is:

- Instantiate a Processor object
- Give it a source and a sink
- Configure it by invoking its configure() method
- Start playback
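To make the middle steps of the general process above (endpoint address and port, RTP session, transmit) a bit more concrete, here is a bare-bones sketch of what ends up on the wire. This is not the JMF approach this post uses; it is a hand-rolled illustration using only `java.net`, assuming the audio is already G.711 u-law encoded, the fixed 12-byte RTP header layout from RFC 3550, and static payload type 0 (PCMU) from RFC 3551. The class name `RtpSketch`, its method names, and the SSRC value are made up for the example, and it ignores RTCP, jitter, and clock drift entirely.

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.ByteBuffer;
import java.util.Arrays;

class RtpSketch {
    static final int HEADER_BYTES = 12;   // fixed RTP header, no CSRC entries
    static final int PAYLOAD_BYTES = 160; // 20 ms of 8 kHz, 8-bit u-law
    static final int PCMU = 0;            // static payload type for G.711 u-law

    /** Builds one RTP packet: the fixed 12-byte header followed by the payload. */
    static byte[] buildPacket(int seq, long timestamp, int ssrc, byte[] payload) {
        // ByteBuffer is big-endian by default, which is network byte order.
        ByteBuffer buf = ByteBuffer.allocate(HEADER_BYTES + payload.length);
        buf.put((byte) 0x80);        // version 2, no padding, no extension, CSRC count 0
        buf.put((byte) PCMU);        // marker bit clear, payload type 0
        buf.putShort((short) seq);   // 16-bit sequence number
        buf.putInt((int) timestamp); // 32-bit timestamp, counted in samples
        buf.putInt(ssrc);            // stream identifier (hypothetical value below)
        buf.put(payload);
        return buf.array();
    }

    /** Sends the u-law bytes to host:port in 160-byte packets, roughly one every 20 ms. */
    static void stream(byte[] ulaw, InetAddress host, int port) throws Exception {
        try (DatagramSocket socket = new DatagramSocket()) {
            int ssrc = 0x1234ABCD; // arbitrary illustrative identifier
            int seq = 0;
            for (int off = 0; off + PAYLOAD_BYTES <= ulaw.length; off += PAYLOAD_BYTES) {
                byte[] payload = Arrays.copyOfRange(ulaw, off, off + PAYLOAD_BYTES);
                // One byte per sample, so the timestamp advances by 160 per packet.
                byte[] packet = buildPacket(seq, (long) seq * PAYLOAD_BYTES, ssrc, payload);
                socket.send(new DatagramPacket(packet, packet.length, host, port));
                seq++;
                Thread.sleep(20); // crude pacing; a real sender would correct for drift
            }
        }
    }

    public static void main(String[] args) {
        byte[] packet = buildPacket(0, 0L, 0x1234ABCD, new byte[PAYLOAD_BYTES]);
        System.out.println(packet.length); // 12 + 160 = 172
    }
}
```

In a real application the host and port would come from the call (the second step in the list), and any trailing audio shorter than one full packet is simply dropped by this sketch.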
```java
import javax.media.Codec;
import javax.media.Format;
import javax.media.control.TrackControl;
import javax.media.protocol.ContentDescriptor;
import javax.media.protocol.FileTypeDescriptor;
// Codec classes from Sun's JMF reference implementation.
import com.sun.media.codec.audio.rc.RCModule;
import com.sun.media.codec.audio.ulaw.JavaEncoder;
import com.sun.media.codec.audio.ulaw.Packetizer;

// mProcessor is a javax.media.Processor field that has already been configured.
private boolean setTracksAndCodec() {

    // will only work for RTP
    ContentDescriptor content
        = new FileTypeDescriptor(FileTypeDescriptor.RAW_RTP);
    mProcessor.setContentDescriptor(content);

    TrackControl[] track = mProcessor.getTrackControls();

    boolean encodingOk = false;

    // Go through the tracks and try to program one of them to
    // output ulaw data.
    for (int i = 0; i < track.length; i++) {

        if (track[i].isEnabled()) {

            // Rate conversion -> u-law encoding -> RTP packetizing.
            Packetizer packetizer = new Packetizer();
            // the magic happens here!!!
            // (160 bytes of 8 kHz, 8-bit u-law is 20 ms of audio)
            packetizer.setPacketSize(160);

            Codec[] ciscoCodecChain = new Codec[] {
                new RCModule(), new JavaEncoder(), packetizer
            };

            try {
                track[i].setCodecChain(ciscoCodecChain);
            } catch (Exception ex) {
                System.out.println("Couldn't set codec chain: " + ex);
                System.exit(-1);
            }

            Format[] supportedFormats = track[i].getSupportedFormats();

            // Pick the (last) supported format that is u-law over RTP.
            int formatToSet = -1;

            for (int j = 0; j < supportedFormats.length; j++) {
                if (supportedFormats[j].toString().indexOf("ULAW/rtp") >= 0) {
                    formatToSet = j;
                }
            }

            if (formatToSet >= 0) {
                track[i].setFormat(supportedFormats[formatToSet]);
                encodingOk = true;
            } else {
                // No usable format: keep this track out of the output.
                track[i].setEnabled(false);
            }
        }
    }

    if (encodingOk) {
        return true;
    } else {
        System.out.println("Couldn't program any tracks, quitting.");
        return false;
    }
}
```
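Why 160 for the `setPacketSize` call above? The code itself doesn't say, but the arithmetic is suggestive: G.711 u-law runs at 8000 samples per second with one byte per sample, so 160 bytes works out to 20 ms of audio, a common VoIP packet interval. That the phones expect exactly that interval is an assumption on my part rather than something documented here. The helper below (the name `payloadBytes` is made up) just does the multiplication.

```java
class PacketSizeSketch {
    /** Payload bytes needed to carry `millis` of audio at the given rate and sample width. */
    static int payloadBytes(int sampleRateHz, int bytesPerSample, int millis) {
        return sampleRateHz * bytesPerSample * millis / 1000;
    }

    public static void main(String[] args) {
        // 8 kHz, 1 byte per sample (u-law), 20 ms per packet
        System.out.println(payloadBytes(8000, 1, 20)); // prints 160
        // Same audio at 30 ms per packet, for comparison
        System.out.println(payloadBytes(8000, 1, 30)); // prints 240
    }
}
```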