Jingle is a standard framework used for peer-to-peer communications. It allows multimedia communication to be established between two Extensible Messaging and Presence Protocol (XMPP) devices. Negotiation between the two is carried out over an XMPP channel, while the actual media uses a separate, dedicated data channel which employs Real-time Transport Protocol (RTP).

The main purpose of Jingle is to facilitate communication using VoIP and video conferencing. It was designed by Google and the XMPP Standards Foundation [1]. It’s not intended to replace other protocols such as SIP (Session Initiation Protocol), which allow for more general voice communication, nor does it support a full range of telephony functions such as call forwarding and transfers. It is, however, designed to work alongside SIP so that XMPP clients can use existing VoIP networks from a specialist international VoIP wholesale provider such as IDT.

Okay, that’s a brief fly-past of what Jingle is and does. If you are still with us and want to know more then read on for some further details as to the use of the framework.

How Jingle works

As we’ve seen, Jingle allows a pair of XMPP clients to establish, maintain and terminate a multimedia session. Multimedia in this instance generally covers voice and video. Negotiation between the two happens over XMPP while the media transfer happens outside it.

Before we go further, we need to know a little about XMPP. XMPP is a set of open standards for instant messaging, presence, and voice and video chat. It was designed principally to provide an open, decentralised alternative to the closed, proprietary messaging systems around at the time of its introduction.

Key to the success of XMPP is that it has a decentralised infrastructure, in the same way as email, so that anybody can run their own XMPP server and control their own communications. XMPP can also be run securely, isolated from public networks, in order to provide private communications. A number of technologies can be run over XMPP, of which Jingle is just one.

When you want to start a multimedia session, the first client, the ‘initiator’, sends an invitation (a ‘session initiation offer’) to the second. The second client, the ‘responder’, acknowledges this and asks the user whether they want to proceed, although the client can be configured to accept requests from particular initiators automatically. Once the request is approved, either by the user or automatically, the responder accepts the session from the initiator.
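To make the invitation a little more concrete, here is a minimal Python sketch of what a session initiation offer might look like on the wire. The namespaces and the session-initiate action come from the Jingle specifications; the function name, the addresses, the stanza id and the codec list are illustrative assumptions, and the transport element a real offer would carry is left out for brevity.

```python
import xml.etree.ElementTree as ET

JINGLE_NS = "urn:xmpp:jingle:1"        # core Jingle namespace
RTP_NS = "urn:xmpp:jingle:apps:rtp:1"  # Jingle RTP application namespace


def build_session_initiate(initiator, responder, sid, codecs):
    """Build an <iq/> stanza carrying a Jingle session-initiate offer.

    `codecs` is a list of (payload_id, name, clockrate) tuples.
    Hypothetical helper: a real offer also needs a <transport/> element.
    """
    iq = ET.Element("iq", {"from": initiator, "to": responder,
                           "type": "set", "id": "offer1"})
    jingle = ET.SubElement(iq, f"{{{JINGLE_NS}}}jingle", {
        "action": "session-initiate",
        "initiator": initiator,
        "sid": sid,
    })
    content = ET.SubElement(jingle, f"{{{JINGLE_NS}}}content",
                            {"creator": "initiator", "name": "voice"})
    description = ET.SubElement(content, f"{{{RTP_NS}}}description",
                                {"media": "audio"})
    for payload_id, name, clockrate in codecs:
        ET.SubElement(description, f"{{{RTP_NS}}}payload-type", {
            "id": str(payload_id),
            "name": name,
            "clockrate": str(clockrate),
        })
    return iq


# Example addresses and session ID are made up for illustration.
offer = build_session_initiate(
    "romeo@montague.example/orchard",
    "juliet@capulet.example/balcony",
    "a73sjjvkla37jfea",
    [(96, "speex", 16000), (0, "PCMU", 8000)],
)
print(ET.tostring(offer, encoding="unicode"))
```

In practice an XMPP library would build and send this stanza for you; the sketch is only meant to show that the offer is ordinary XML travelling over the XMPP channel.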

On accepting the session, the responder will respond with a list of the codecs that it is able to accept. The initiator accepts the response and the two will then negotiate which codec is to be used for media transport and will begin a media session.
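The codec negotiation can be pictured as finding the overlap between two lists. The sketch below is a simplification, assuming each codec is identified by its name and clock rate and that the clients settle on the first common entry in the initiator's order; the function name and the sample codec lists are hypothetical.

```python
def negotiate_codecs(offered, supported):
    """Return the offered codecs the responder can handle.

    Both arguments are lists of (name, clockrate) tuples. The result keeps
    the initiator's order, which this sketch treats as a preference order.
    Codec names are compared case-insensitively.
    """
    supported_keys = {(name.lower(), rate) for name, rate in supported}
    return [(name, rate) for name, rate in offered
            if (name.lower(), rate) in supported_keys]


offered = [("opus", 48000), ("speex", 16000), ("PCMU", 8000)]
supported = [("PCMU", 8000), ("Opus", 48000)]

common = negotiate_codecs(offered, supported)
# Assumed policy: use the first codec both sides share, if there is one.
chosen = common[0] if common else None
```

If the two lists share nothing, `chosen` is `None` and the clients would have no way to exchange media, so a real client would decline the session at that point.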

At the end of the call, either party can ask to terminate the session. Once the other acknowledges this, the link between the two is dropped. Simple.
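Put together, the life of a session is a small state machine: pending while the offer is outstanding, active once it is accepted, and ended after termination. The sketch below models only those three states and the two actions described above; the class and function names are illustrative, and a real client handles many more actions.

```python
from enum import Enum


class State(Enum):
    PENDING = "pending"  # offer sent, not yet accepted
    ACTIVE = "active"    # session accepted, media can flow
    ENDED = "ended"      # session terminated


# Allowed (state, action) pairs and the state each one leads to.
TRANSITIONS = {
    (State.PENDING, "session-accept"): State.ACTIVE,
    (State.PENDING, "session-terminate"): State.ENDED,
    (State.ACTIVE, "session-terminate"): State.ENDED,
}


def next_state(state, action):
    """Return the state reached by applying `action`, or raise ValueError."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(
            f"{action!r} is not valid in state {state.value}") from None


state = State.PENDING
state = next_state(state, "session-accept")     # responder accepts
state = next_state(state, "session-terminate")  # either party hangs up
```

Rejecting anything that is not in the table means a stray message, such as an accept for a call that has already ended, is caught rather than silently acted on.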

Session management

In order for all of that to work, Jingle has to control the session flow. In setting up the session, the initiator has to find out which of the responder’s available XMPP resources is best for the application and which transport method can be used. It can also optionally specify a security condition that must be met – such as an encrypted link – before the two clients are allowed to exchange data.

There is a certain degree of flexibility in Jingle sessions, so once a session is active it doesn’t necessarily have to remain fixed in its configuration. Active sessions can be changed to modify or remove content (keeping voice going while stopping video, for example) or to change the transport protocol. Jingle can also send information messages between clients.
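As a rough illustration of changing a live session, the sketch below tracks the contents of a session in a dictionary and applies the Jingle content-add and content-remove actions to it. The function name and the dictionary layout are assumptions made for the example, and it skips the step where the other client agrees to the change.

```python
def apply_content_action(contents, action, name, definition=None):
    """Return a new contents mapping with `action` applied.

    `contents` maps a content name (e.g. "voice") to its description.
    Only content-add and content-remove are modelled in this sketch.
    """
    updated = dict(contents)  # leave the caller's mapping untouched
    if action == "content-add":
        updated[name] = definition or {}
    elif action == "content-remove":
        if name not in updated:
            raise KeyError(f"no content named {name!r} in this session")
        del updated[name]
    else:
        raise ValueError(f"unsupported action: {action!r}")
    return updated


session_contents = {
    "voice": {"media": "audio"},
    "video": {"media": "video"},
}

# Keep voice going while stopping video.
voice_only = apply_content_action(session_contents, "content-remove", "video")
```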

Two different transport types are available under Jingle. Datagram has components that exchange packets. These can be of any length and can be received in any order. When using Datagram, the transport has to specify which components are needed and how they will be used.

The alternative is streaming transport, which exchanges bi-directional streams akin to the method seen in TCP. Packets on the stream are received in order, and each must have a string identifier and a maximum packet length. Which transport should be used is established at the start of the session.

Protecting Jingle

As with any online service, security is an important consideration when using Jingle. Using some form of transport layer security is a good start and as we’ve seen, you can make starting a session conditional on this being present.

Jingle clients can also be vulnerable to denial-of-service attacks that bombard them with session requests, so it’s important to guard against this by configuring the system to accept connections only from known entities. Similarly, you can avoid the interception and redirection of calls by checking that the session ID on each incoming message matches a session you already have with that sender.
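Both of those checks are simple to express in code. The sketch below is one possible way to do it, assuming the client keeps a set of trusted bare addresses and a record of which full address each session ID belongs to; the function name, the data structures and the example addresses are all hypothetical.

```python
def should_accept(sender, sid, action, allowed_jids, sessions):
    """Decide whether an incoming Jingle message should be processed.

    sender       -- full address of the sending client ("user@host/resource")
    allowed_jids -- set of trusted bare addresses ("user@host"), lower case
    sessions     -- maps each known session ID to the peer's full address
    """
    bare = sender.split("/", 1)[0].lower()
    if bare not in allowed_jids:
        return False  # unknown entity: ignore the request entirely
    if action == "session-initiate":
        return sid not in sessions  # a new offer must use a fresh session ID
    # Any other action must refer to a session we hold with this exact peer.
    return sessions.get(sid) == sender


allowed = {"romeo@montague.example"}
sessions = {"abc123": "romeo@montague.example/orchard"}

ok = should_accept("romeo@montague.example/orchard", "abc123",
                   "session-terminate", allowed, sessions)
spoofed = should_accept("mallory@evil.example/laptop", "abc123",
                        "session-terminate", allowed, sessions)
```

Here `ok` is true because the sender is trusted and owns the session, while `spoofed` is false because the sender is not on the list of known entities.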

[1] https://xmpp.org/