Our Blog
Your questions answered

The Terrible, Terrible State of VOIP Security, Part I

>>> This is part one of a three-part article <<<

>>> Part II

>>> Part III

The beginning of this sad, sad story dates back to December 2012, and it ends somewhere in 2016, so it took us almost four years to go from the unknown to the impossible. It was a journey that involved getting to know new people and technologies. It was so long, in fact, that by its end new protocols had been introduced to the market and some old ones deprecated. It was a journey full of frustration and constant experiments with uncommon setups. Read on to see who won the battle for encryption, and what happens when you bypass standards.

This article, despite being targeted at security professionals and network engineers, provides references and explanations in an attempt to be accessible to a broader audience.

I. Introduction

The World We Live in

VOIP[1] is slowly but surely replacing traditional telephony. Nowadays, some telecoms aren't even offering conventional phone lines anymore — just VOIP channels on top of the Internet.

Businesses are also embracing the change, happily noticing that VOIP is overall a cheaper and more flexible solution. A new market is emerging: hosted virtual IP-based PBX[2] services are being offered to simplify phone system management. IP is everywhere.

With change come new challenges, security among them. IP systems are susceptible to attacks and eavesdropping and, since they allow for a more distributed topology (e.g., people working from home, distributed offices), concerns about securing telephone conversations in these new systems are pretty common.

There are two main types of attacks that we are concerned about: compromised logins and eavesdropping. There are, actually, more than just these two security concerns, but the others are beyond the scope of this rant.

If an attacker compromises a PBX login, the company may end up being abused by criminals as a free-for-all telephone exchange for long-distance calls, thus losing thousands and thousands on phone bills. One might think that eavesdropping on legitimate conversations isn't dangerous, but it has far-reaching implications, ranging from the loss of basic privacy through industrial espionage to governmental surveillance.

Having said that, we want to make sure that the technology we use is safe. And in the wake of E. Snowden's[3] revelations, we want our communications to be secure. Most Internet users can now distinguish between an unencrypted (unsafe) and an encrypted (safer) transmission as indicated by their Internet browser. When it comes to telephony, most people just assume it is safe and private, but with VOIP-based communication this is not necessarily the case.

Thus, we embarked on a trip of securing our VOIP setup so that our customers don't have to think about their privacy and security. And on the way we discovered that the world is not ready for change. While surfing the web, if we want to switch to secure transmission, we just change "http"[4] to "https"[5] in the web site address and enjoy the immediate effect of encryption safeguarding our actions while we work our way through the web (provided the page administrator has enabled this feature, of course). VOIP, too, has this notion of securing the transmission by just changing the protocol to a secure counterpart. However, contrary to our expectations, it just didn't work out that way, and it required a lot of knowledge, persistence and patience.

SIP World, the Good

VOIP is actually an umbrella term nowadays. Any program or protocol that communicates audio from one point to another over the Internet is considered VOIP. IP Telephony is a more modern term for that because it doesn't restrict the protocols to voice only. Our concern was not IP Telephony in general, but a particular incarnation of it: SIP[6]. SIP-based telephony is probably the most widely used. There are numerous SIP servers available, both free and commercial, and the communication is relatively easy to set up.

Asterisk[7], for example, is the most popular free SIP-based VOIP server, with numerous installations all over the world. It is not only free but also open source, making it a good choice for administrators familiar with open/free software. It comes in different flavours and names, with and without a GUI[8], and is a de facto standard in the world of VOIP.

There are also quite a few commercial SIP-based servers available, with better or worse implementation discipline: some of them are very proprietary, others adhere more closely to standard features. Even though we chose 3CX for our needs (since we liked its Microsoft Windows affinity and administration-friendly concepts), it really could have been any other vendor; the experience would probably have been very similar.

SIP Protocol, the Bad

When one speaks of SIP protocol, what is actually meant is a set of different protocols that don't necessarily even run on the same line. Let us briefly look at the whole suite in some detail.

SIP itself is one of them, and is the starting point of all communications. SIP's task is to provide signalling and other meta information, such as routing and voice channel assignment. Within SIP (Session Initiation Protocol), we also find SDP[9], the Session Description Protocol. SDP carries media description information, such as audio codec negotiation and other audio endpoint parameters, including a specification of the protocol that will be used to carry the audio streams. Since SDP runs on top of (or within) SIP, it is often referred to as part of SIP, or as SIP/SDP.
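To make this more concrete, here is a minimal Python sketch that pulls the media description out of an SDP body. The sample body is invented for illustration (the addresses, ports and session identifiers are assumptions, not taken from our setup), and the parser only understands the "m=" line layout from RFC 4566, not the full SDP grammar.

```python
# Hypothetical SDP body as it might be carried inside a SIP INVITE.
# All addresses, ports and identifiers are made up for illustration.
SAMPLE_SDP = "\r\n".join([
    "v=0",
    "o=alice 2890844526 2890844526 IN IP4 192.0.2.10",
    "s=-",
    "c=IN IP4 192.0.2.10",
    "t=0 0",
    "m=audio 49170 RTP/AVP 0 8",
    "a=rtpmap:0 PCMU/8000",
    "a=rtpmap:8 PCMA/8000",
]) + "\r\n"


def parse_media_lines(sdp: str) -> list:
    """Return one dict per "m=" line: media type, port, protocol, formats."""
    media = []
    for line in sdp.splitlines():
        if not line.startswith("m="):
            continue
        # Layout: m=<media> <port>[/<number of ports>] <proto> <fmt> ...
        kind, port, proto, *formats = line[2:].split()
        media.append({
            "media": kind,
            "port": int(port.split("/")[0]),
            "proto": proto,      # the protocol that will carry the audio
            "formats": formats,  # payload type numbers offered
        })
    return media


if __name__ == "__main__":
    for entry in parse_media_lines(SAMPLE_SDP):
        print(entry)
```

The "proto" field is the part of SDP that names the protocol used for the audio itself, which is why SDP matters so much once encryption enters the picture.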

As you can see from the above, SIP/SDP only specify where the audio will be running, but do not transmit the audio streams themselves. For that, yet another protocol is used. It could theoretically be any protocol capable of media transmission, but RTP[10] (Real-Time Transport Protocol) is the most widely used one. This is where it becomes complicated, since RTP (or any other audio transmission protocol) runs over a different connection than SIP and asynchronously from SIP.

Thus, we have at least 3 levels here: the first one, a signalling protocol, SIP, that provides basic signalling and metadata transmission; a second one, SDP, that provides media and encoding specification; and the third, RTP, that transmits the actual encoded media.

For the sake of simplicity, we are not looking into other media protocols like video transmission or protocol extensions such as chat.

SIP Encryption, the Ugly

What we have described so far was communication in plain text, without encryption. Thus, in order to encrypt the parts involved we need to provide the means of encryption for all three protocols mentioned above.

Since SDP runs inside SIP, if we encrypt SIP, SDP will be encrypted along the way. Similar to secure HTTP, or HTTPS, there is a secure counterpart to SIP, or SIPS, which is the same protocol, but run within a TLS[11]-encrypted channel. Now originally, SIP runs over UDP[12] — but since TLS requires a stateful connection[13], secure SIP runs over TCP[14]. There are both positive and negative implications to switching to TCP, but they go beyond this document. This type of SIP-encryption is often referred to as SIP/TLS, rather than SIPS.

RTP does not have a secure counterpart, but when RTP's data is to be encrypted, the encryption key for the stream is typically specified within the stream metadata contained in SDP. And since SDP is inherently encrypted once SIP is, the encryption key for RTP, transmitted over SDP, is also transmitted in encrypted form and is then used by RTP to encrypt the stream data itself. Encrypted RTP is generally referred to as SRTP, or Secure Real-Time Transport Protocol.
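One common way of carrying such a key in SDP is the "a=crypto" attribute defined in RFC 4568 (often called SDES). The text above does not name the exact mechanism, so treat the choice of SDES as an assumption of this sketch. The Python snippet below builds a crypto line with a random throwaway key and then splits it back into master key and master salt; the function names are hypothetical, and the parser ignores optional parts such as key lifetime and session parameters.

```python
import base64
import os

# Crypto suite name from RFC 4568: 16-byte master key, 14-byte master salt.
SUITE = "AES_CM_128_HMAC_SHA1_80"
KEY_LEN = 16
SALT_LEN = 14


def make_crypto_line(tag: int = 1) -> str:
    """Build an a=crypto line with a freshly generated throwaway key."""
    key_and_salt = os.urandom(KEY_LEN + SALT_LEN)
    inline = base64.b64encode(key_and_salt).decode("ascii")
    return f"a=crypto:{tag} {SUITE} inline:{inline}"


def parse_crypto_line(line: str) -> dict:
    """Split an a=crypto line into tag, suite, master key and master salt."""
    prefix = "a=crypto:"
    if not line.startswith(prefix):
        raise ValueError("not a crypto attribute")
    tag, suite, key_params = line[len(prefix):].split(None, 2)
    method, _, key_info = key_params.partition(":")
    if method != "inline":
        raise ValueError("only the inline key method is handled here")
    # Drop optional "|lifetime|MKI" suffixes and trailing session parameters.
    key_b64 = key_info.split()[0].split("|")[0]
    raw = base64.b64decode(key_b64)
    return {
        "tag": int(tag),
        "suite": suite,
        "master_key": raw[:KEY_LEN],
        "master_salt": raw[KEY_LEN:KEY_LEN + SALT_LEN],
    }


if __name__ == "__main__":
    line = make_crypto_line()
    print(line)
    print(parse_crypto_line(line)["suite"])
```

Anyone who can read this line can decrypt the audio, which is exactly why the SDP that carries it has to travel inside an encrypted SIP channel.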

From that setup we can derive one important conclusion: In order for SIP-based communication to be secure, we must provide encryption for both SIP and RTP. By omitting either one, we open up the voice data to prying eyes — or ears, for that matter.
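The rule can be written down as a small check. The following Python sketch is an illustration only: the function names and the two sample SDP bodies are hypothetical, and treating exactly the "RTP/SAVP" and "RTP/SAVPF" media profiles as the sign of SRTP is an assumption of the sketch, not a complete test.

```python
# Media profiles that indicate SRTP. Limiting the check to these two
# is an assumption of this sketch.
SECURE_PROFILES = {"RTP/SAVP", "RTP/SAVPF"}

# Minimal made-up SDP bodies: one offering plain RTP, one offering SRTP.
PLAIN_SDP = "v=0\r\nm=audio 49170 RTP/AVP 0\r\n"
SECURE_SDP = "v=0\r\nm=audio 49170 RTP/SAVP 0\r\n"


def media_protocols(sdp: str) -> list:
    """Collect the protocol field of every "m=" line in an SDP body."""
    protos = []
    for line in sdp.splitlines():
        fields = line[2:].split()
        if line.startswith("m=") and len(fields) >= 3:
            protos.append(fields[2])
    return protos


def call_is_encrypted(signalling_transport: str, sdp: str) -> bool:
    """True only if signalling uses TLS AND every media stream uses SRTP."""
    signalling_ok = signalling_transport.upper() == "TLS"
    protos = media_protocols(sdp)
    media_ok = bool(protos) and all(p in SECURE_PROFILES for p in protos)
    return signalling_ok and media_ok


if __name__ == "__main__":
    for transport in ("UDP", "TLS"):
        for name, sdp in (("plain", PLAIN_SDP), ("secure", SECURE_SDP)):
            print(transport, name, call_is_encrypted(transport, sdp))
```

Only one of the four combinations passes: TLS signalling together with SRTP media. SRTP negotiated over unencrypted SIP exposes the key, and encrypted SIP with plain RTP exposes the audio.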

Continue to Part II or

Jump to Part III or

Read the whole article as one document


[1] Voice-Over-IP (Internet Protocol), a set of standards and protocols that enable using Internet channels for transmitting telephone calls.

[2] Private Branch Exchange, telephone switchboard installed locally in an office.

[3] E. Snowden is an NSA whistle-blower who in 2013 revealed a large-scale governmental surveillance program.

[4] HTTP: Hyper-Text Transfer Protocol, the main protocol used for surfing the Internet.

[5] Secure HTTP, an encrypted version of the HTTP protocol

[6] Session Initiation Protocol as defined in RFC 3261: https://www.ietf.org/rfc/rfc3261.txt

[7] http://www.asterisk.org/

[8] Graphical User Interface, http://en.wikipedia.org/wiki/Graphical_user_interface

[9] Session Description Protocol as described in RFC 4566: https://tools.ietf.org/html/rfc4566

[10] RTP: A Transport Protocol for Real-Time Applications, as described in RFC 3550: https://tools.ietf.org/html/rfc3550

[11] Transport Layer Security as described in RFC 5246: https://tools.ietf.org/html/rfc5246, a protocol that provides a secure layer to TCP-based data transmission.

[12] User Datagram Protocol, a stateless and efficient protocol according to RFC 768: https://www.ietf.org/rfc/rfc768.txt

[13] It is possible to run TLS over UDP using DTLS (Datagram TLS), but DTLS is not compatible with current SIP implementations, so it is not used.

[14] Transmission Control Protocol as described in RFC 793: https://tools.ietf.org/html/rfc793, a stateful data-transmission protocol with flow-control capabilities.

Roman Kuznetsov @ 21.02.2017
