In this article I want to demonstrate how I revealed parts of the WhatsApp VoIP protocol with the help of a jailbroken iOS device and a set of forensic tools. WhatsApp has received a lot of attention due to security vulnerabilities and hacks, which makes it an interesting target for teaching security analysis.

While there is an official white paper describing the encryption of WhatsApp, there is no detailed overview of how its protocols work or how the security features are implemented. Consequently, there is no foundation for serious security-related analysis.

My research is based on three steps:

Analysis of the network traffic.

Analysis of the binary files.

Analysis of the runtime behavior.

Tools

I used the following tools for analyzing an iOS WhatsApp client:

Decryption of binaries: bfdecrypt

Disassembling binary files: Hopper Disassembler and radare2

Observing network traffic: Wireshark

Analyzing runtime behavior: Frida

How I installed a Jailbreak on my iOS device is out of scope.

Network Traffic Analysis

This part examines the network traffic of the WhatsApp client during a call, which was recorded with Wireshark. For recording the network traffic of the iOS device, I created a remote virtual network interface. The shell command is as follows (works on macOS), where <deviceUUID> has to be replaced with the UUID of the inspected iOS device:

rvictl -s <deviceUUID>

Wireshark detects the usage of Session Traversal Utilities for NAT (STUN). STUN helps clients behind a NAT discover their public address, which is a necessary step for establishing a peer-to-peer connection between clients. The recording also contains many TCP and UDP packets which Wireshark could not attribute to a higher-level protocol.

Wireshark Recording of a WhatsApp VoIP Call Session

TCP packets are exchanged between the inspected WhatsApp client and multiple WhatsApp servers. The UDP packets are exchanged between the caller and the callee. Hundreds of those UDP packets are sent within a minute. Since the WhatsApp white paper mentions the usage of the Secure Real-time Transport Protocol (SRTP), it stands to reason that these UDP packets are SRTP packets containing the call data. SRTP adds encryption, message authentication and integrity, and replay protection to Real-time Transport Protocol (RTP) packets.

The following listing shows an SRTP packet in hexadecimal representation, which was sent by the caller to the callee. It contains header fields from RTP, which forms the foundation of SRTP.

SRTP Packet

The first four bytes (red) contain seven RTP header fields. They can be inspected by looking at their binary representation:

0x8078001e = 0b10_0_0_0000_0_1111000_0000000000011110 = V=10|P=0|X=0|CC=0000|M=0|PT=1111000|SEQ=0000000000011110

RTP Packet Header Fields (RFC 3550)

The first two bits contain the RTP version (V), which is version two in this case. The third bit, the padding field (P), indicates that the packet contains no padding. The fourth bit, the extension field (X), indicates that no extension header follows the fixed RTP header. Bits five to eight, the CSRC count (CC), show that no contributing source (CSRC) identifiers follow the fixed header. CSRCs are a list of identifiers indicating which sources contributed to the payload of a packet. The marker bit (M) at position nine is also set to zero. It can be used to mark frame boundaries in the packet stream. The next seven bits contain the payload type (PT), which is equal to the decimal value 120 in this case. This value lies in the dynamic range (96 to 127), which the RTP profiles do not bind to a fixed codec, so it is presumably a value chosen by WhatsApp. The last 16 bits contain the sequence number (SEQ) of the packet, which is 30 here. The RTP standard recommends randomizing the initial value of the sequence number. WhatsApp does not follow this recommendation, since the sequence numbers count up from zero, as can be seen in the Wireshark recordings.

The next four bytes (blue) represent the timestamp of the packet. The four bytes after that (green) represent the synchronization source (SSRC), an identifier used for distinguishing the sources of streams running in parallel. The remaining bytes represent the payload, which probably contains audio data of the call.
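The field extraction described above can be sketched in a few lines of JavaScript. This is a minimal sketch for Node.js that follows the RFC 3550 layout, in which the payload type occupies seven bits and the sequence number 16 bits. Only the first four bytes come from the recorded packet. The timestamp and SSRC bytes in the sample are zero placeholders, and the function name parseRtpHeader is chosen for illustration.

```javascript
// Parse the fixed 12-byte RTP header (RFC 3550, section 5.1).
function parseRtpHeader(packet) {
  if (packet.length < 12) {
    throw new RangeError('an RTP header needs at least 12 bytes');
  }
  const b0 = packet[0];
  const b1 = packet[1];
  return {
    version: b0 >>> 6,                      // 2 bits
    padding: (b0 >>> 5) & 0x01,             // 1 bit
    extension: (b0 >>> 4) & 0x01,           // 1 bit
    csrcCount: b0 & 0x0f,                   // 4 bits
    marker: b1 >>> 7,                       // 1 bit
    payloadType: b1 & 0x7f,                 // 7 bits
    sequenceNumber: packet.readUInt16BE(2), // 16 bits
    timestamp: packet.readUInt32BE(4),      // 32 bits
    ssrc: packet.readUInt32BE(8),           // 32 bits
  };
}

// The first four bytes are taken from the packet discussed above. The
// remaining eight bytes are placeholders, not recorded values.
const sample = Buffer.from('8078001e' + '00000000' + '00000000', 'hex');
console.log(parseRtpHeader(sample));
```

Reading the header with fixed shifts and masks keeps the field widths visible, which makes it easy to compare the result against the bit string shown above.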

The white paper states that WhatsApp applies SRTP for protecting calls, and the structure of the UDP packets exchanged between WhatsApp clients is consistent with this. The Wireshark recording also shows that TCP packets are sent from the iOS client to WhatsApp servers. These packets carry messages encrypted with the Noise Pipes Protocol, as we will see later.

Binary Analysis

The iOS WhatsApp client contains two main binary files: the WhatsApp application binary and the WhatsApp core framework. This part examines these binary files with the Hopper Disassembler and radare2. The binaries of iOS applications are encrypted when downloaded from the App Store. For analyzing the iOS WhatsApp client, the security measures of Apple were circumvented. A jailbreak was installed on the inspected iOS device for accessing its files. In addition, the binary files of WhatsApp were decrypted with the tool bfdecrypt.

Here I demonstrate how I gathered information about underlying protocols, algorithms, and open source libraries WhatsApp uses. Open source libraries are especially interesting because they can easily be analyzed.

libsignal-protocol-c

WhatsApp uses the libsignal-protocol-c open source library which implements the Signal Protocol. The protocol is based on the Double Ratchet Algorithm, which handles encryption of WhatsApp messages. The library was identified by the following function names in the binaries:

r2 WhatsAppCore

[0x0082b517]> / _signal_

Searching 8 bytes in [0x0-0x654000]

hits: 33

0x00837a7b hit2_0 .il_key_data_from_signal_keydispatch_.

0x0083df33 hit2_1 ._torlice_signal_protocol_paramet.

0x008407c0 hit2_2 .d_fac_3key_signal_message_big.

0x00840d50 hit2_3 .mmetric_signal_protocol_paramet.

0x00840e70 hit2_4 .ob_signal_protocol_paramet.

0x00841492 hit2_5 .pre_key_signal_messagesigna.

0x008de24b hit2_6 .agc_reset_alice_signal_protocol_paramet.

0x008de274 hit2_7 .rs_create_alice_signal_protocol_paramet.

0x008de440 hit2_8 .bitno_MRDTX_bob_signal_protocol_paramet.

0x008de467 hit2_9 .ters_create_bob_signal_protocol_paramet.

0x008e311c hit2_10 .pre_big_pre_key_signal_message_copy_pr.

0x008e3139 hit2_11 .ge_copy_pre_key_signal_message_create_.

0x008e3158 hit2_12 ._create_pre_key_signal_message_deserial.

0x008e317c hit2_13 .rialize_pre_key_signal_message_destroy.

...

libsrtp

WhatsApp uses libsrtp for implementing the Secure Real-time Transport Protocol. The symbol names of the library’s functions are stripped from the binaries. Despite this, the application binary contains strings which reference libsrtp:

r2 WhatsApp

[0x1001ada34]> / libsrtp

0x100ee5546 hit1_0 .rc %08XUnknown libsrtp error %duns.

0x100ee57eb hit1_1 .d to initialize libsrtp: %sFailed to r.

0x100ee580a hit1_2 .led to register libsrtp deinit.Failed .

0x100ee5831 hit1_3 .to deinitialize libsrtp: %sAES_CM_128_.

0x100ee5883 hit1_4 .ck crypto Init libsrtp. create pool. .

0x100f07b80 hit1_5 . packet: %slibsrtpstat test%s: c.

In addition, the binaries contain string constants which can also be found in the source code of libsrtp, like “cloning stream (SSRC: 0x%08x)”:

r2 WhatsApp

[0x1013ddb4f]> / cloning stream

Searching 14 bytes in [0x100000000-0x100fb4000]

hits: 1

0x100f07823 hit7_0 .sent!srtp%s: cloning stream (SSRC: 0x%08x).

PJSIP

WhatsApp uses PJSIP, which implements multimedia communication, signaling and the encoding of audio and video data. PJSIP also implements STUN, which was detected in the Wireshark recording. The library was identified by string constants in the binaries which contain debug information of PJSIP:

r2 WhatsApp

[0x1013ddb4f]> / pjmedia

Searching 7 bytes in [0x100000000-0x100fb4000]

hits: 180

0x100edd55f hit9_0 .io_piggyback.ccpjmedia_audio_piggyback.

0x100edd591 hit9_1 .r %d, stream %ppjmedia_audio_piggyback.

0x100edd5d4 hit9_2 .d, tx_packet %dpjmedia_audio_piggyback.

0x100edd601 hit9_3 .ideo_enabled %dpjmedia_audio_piggyback.

0x100eddcf3 hit9_4 .ibyuv converterpjmedia_converter_creat.

0x100eddd21 hit9_5 .rter count = %dpjmedia_converter_creat.

0x100ede3e3 hit9_6 .rame, status=%dpjmedia_delay_buf_get_s.

0x100ede46e hit9_7 .%sec_delay_bufpjmedia_echo_create2: %.

0x100ede64d hit9_8 .eUnknown pjmedia-videodev error .

0x100ede90c hit9_9 .o errorUnknown pjmedia-audiodev error .

0x100edebba hit9_10 .ATENCY)Unknown pjmedia error %dUnspec.

0x100ee027e hit9_11 .queue.format.cpjmedia_format_get_vide.

0x100ee02ca hit9_12 .mat info for %dpjmedia_format_get_vide.

0x100ee1446 hit9_13 .c_buf too shortpjmedia_h26x_packetize .

...

mbed TLS

WhatsApp applies mbed TLS which implements the TLS protocol. The library was identified by the following function names in the binaries:

r2 WhatsAppCore

[0x0082b517]> / mbedtls

Searching 7 bytes in [0x814000-0x934000]

hits: 41

0x008e299b hit5_0 .TLSErrorDomain_mbedtls_aes_crypt_cbc_.

0x008e29b2 hit5_1 ._aes_crypt_cbc_mbedtls_aes_crypt_cfb12.

0x008e29cc hit5_2 .s_crypt_cfb128_mbedtls_aes_crypt_cfb8.

0x008e29e4 hit5_3 .aes_crypt_cfb8_mbedtls_aes_crypt_ctr_.

0x008e29fb hit5_4 ._aes_crypt_ctr_mbedtls_aes_crypt_ecb_.

0x008e2a12 hit5_5 ._aes_crypt_ecb_mbedtls_aes_decrypt_mb.

0x008e2a27 hit5_6 .ls_aes_decrypt_mbedtls_aes_encrypt_mb.

0x008e2a3c hit5_7 .ls_aes_encrypt_mbedtls_aes_free_mbedt.

0x008e2a4e hit5_8 .edtls_aes_free_mbedtls_aes_init_mbedt.

0x008e2a60 hit5_9 .edtls_aes_init_mbedtls_aes_setkey_dec.

0x008e2a78 hit5_10 .aes_setkey_dec_mbedtls_aes_setkey_enc.

0x008e2a90 hit5_11 .aes_setkey_enc_mbedtls_cipher_auth_dec.

0x008e2aad hit5_12 .r_auth_decrypt_mbedtls_cipher_auth_enc.

0x008e2aca hit5_13 .r_auth_encrypt_mbedtls_cipher_check_ta.

...

XMPP

WhatsApp uses Extensible Messaging and Presence Protocol (XMPP) for exchanging messages asynchronously between clients in the form of XML stanzas. This is supported by the fact that many class names in the binaries contain keywords relating to the protocol:



r2 WhatsApp

[0x1013ddb4f]> / XMPP

Searching 4 bytes in [0x1013ac000-0x1014b4000]

hits: 150

Searching 4 bytes in [0x100fb4000-0x1013ac000]

hits: 150

Searching 4 bytes in [0x100000000-0x100fb4000]

hits: 396

0x1013d05b5 hit12_0 .XMPPAckStanza@_.

0x1013d05d6 hit12_1 .XMPPBinaryCoder.

0x1013d05fa hit12_2 .XMPPCallStanza.

0x1013d0624 hit12_3 .XMPPChatStateStanza.

0x1013d064b hit12_4 .XMPPConnection.

0x1013d0679 hit12_5 .XMPPError.

0x1013d069e hit12_6 .XMPPGDPRDeleteReport.

0x1013d06cd hit12_7 .XMPPGDPRGetReportSta.

0x1013d0707 hit12_8 .XMPPGDPRRequestRepor.

0x1013d0736 hit12_9 .XMPPIQStanza.

0x1013d0762 hit12_10 .XMPPMessageStanza.

0x1013d0787 hit12_11 .XMPPMessageStatusCha.

0x1013d07b9 hit12_12 .XMPPMultiReceipt.

0x1013d07dc hit12_13 .XMPPNotificationStan.

...

Noise Protocol Framework

According to the WhatsApp white paper, the Noise Protocol Framework is used for securing the communication between clients and servers. The Noise Protocol Framework was developed for constructing easy-to-use cryptographic protocols from a set of small building blocks. To be more precise, WhatsApp applies the Noise Pipes Protocol, which is derived from the Noise Protocol Framework. The following static string constants can be found in the WhatsApp binaries:

“Noise_XX_25519_AESGCM_SHA256”,

“Noise_IK_25519_AESGCM_SHA256”,

“Noise_XXfallback_25519_AESGCM_SHA256”.

These string constants describe handshake patterns implemented by WhatsApp clients. The first string is referenced within a class called WANoiseFullHandshake. The second string is referenced within a class called WANoiseResumeHandshake. The last string is referenced within a class called WANoiseFallbackHandshake. How these protocols work in detail is out of scope.

Runtime Analysis

This part examines the runtime behavior of the iOS WhatsApp client with the help of Frida. Frida is a dynamic instrumentation toolkit which injects JavaScript hooks into functions of a running application. These hooks can be used for observing or manipulating parameters and return values of called functions.

Key Transport

This part outlines how the key transport of the WhatsApp VoIP protocol works. According to the WhatsApp white paper, for encrypting a VoIP call, the “initiator generates a random 32-byte SRTP master secret”. The caller then “transmits an encrypted message to the recipient that signals an incoming call, and contains the SRTP master secret”. This information is used for reconstructing the key transport, i.e. the transport of the master secret to the callee.

As a starting point, I traced functions containing the word “secret”:

frida-trace -U WhatsApp -m "*[* *Secret*]" -m "*[* *secret*]"

When a WhatsApp call is initiated, the method deriveSecretsFromInputKeyMaterial of the class WAHKDF is called:

+[WAHKDF

deriveSecretsFromInputKeyMaterial: 0x121e08a20

salt: 0x0

info: 0x121e07840

outputLength: 0x2e

withMessageVersion: 0x3

]

The input values 0x121e08a20 and 0x121e07840 are pointers to Objective-C objects. Frida allows creating proxy Objective-C objects from pointers in JavaScript. The function hook of deriveSecretsFromInputKeyMaterial was used for printing debug descriptions of the objects:

The output of the script can be seen in the following:

+[WAHKDF deriveSecretsFromInputKeyMaterial: <09a38e76 fe90e4f1 26ed66d0 5a6783ba d48776b6 1daaf7c9 39c005ea 2d8ccdf6>

salt : nil

info : <34393135 39303537 37313632 3040732e 77686174 73617070 2e6e6574>

bytes: 46

withMessageVersion : 3

]

The first and third parameter seem to be NSData objects which contain a static byte buffer. The first parameter has the length of 32 bytes, like the master secret described in the WhatsApp white paper. The third parameter is an ASCII string representing the JID of the caller. We will see in the following that the first parameter is indeed the master secret.
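The two observations above, the 32-byte length and the ASCII content, can be checked directly from the printed descriptions. The following is a minimal Node.js sketch. The helper name parseHexDump is chosen for illustration, and the inputs are the values from the script output above.

```javascript
// Convert a hex dump in the style of an NSData description (groups
// separated by spaces, optionally wrapped in <>) into a Buffer.
function parseHexDump(dump) {
  const hex = dump.replace(/[<>\s]/g, '');
  if (hex.length % 2 !== 0 || /[^0-9a-fA-F]/.test(hex)) {
    throw new Error('not a valid hex dump');
  }
  return Buffer.from(hex, 'hex');
}

const keyMaterial = parseHexDump(
  '<09a38e76 fe90e4f1 26ed66d0 5a6783ba d48776b6 1daaf7c9 39c005ea 2d8ccdf6>'
);
const info = parseHexDump(
  '<34393135 39303537 37313632 3040732e 77686174 73617070 2e6e6574>'
);

console.log(keyMaterial.length);     // byte length of the first parameter
console.log(info.toString('ascii')); // the info parameter read as ASCII
```

The decoded info value ends in @s.whatsapp.net, which is the domain part WhatsApp uses for user JIDs in the stanzas shown later in this article.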

Encryption of the Master Secret

According to the WhatsApp white paper, the master secret is essential for protecting a call session. This is why it has to be transported securely to the callee. For observing how the master secret is processed, I traced function calls containing key words relevant for encryption:

frida-trace -U WhatsApp -m "*[* *crypt*]" -i "*crypt*"

When a call is initiated, the function signal_encrypt of the libsignal-protocol-c library is called. The following shows the signal_encrypt function header:

The plaintext parameter was read with the Frida hook of signal_encrypt:

signal_encrypt Plaintext Value

The first four bytes are used for serializing the master secret with protocol buffers. The following bytes represent the master secret. The last 13 bytes represent the encryption padding. I discovered that the plaintext is encrypted with AES-256 in CBC mode. The encryption keys are derived by the Double Ratchet Algorithm which is part of the Signal Protocol. The inner workings of libsignal-protocol-c and the Signal Protocol are not investigated in this article. The output of signal_encrypt is represented by the following bytes:

signal_encrypt Output Value

The output carries more bytes because an authentication tag is appended to the message, which is computed with HMAC-SHA256.

This part revealed the first part of the WhatsApp VoIP protocol. The master secret is serialized, padded and encrypted with a 256-bit AES key in CBC mode. The encryption key, the IV as well as the authentication key are derived by the libsignal-protocol-c library, which implements the Signal Protocol.
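The combination of AES-256 in CBC mode with an HMAC-SHA256 tag can be illustrated with Node's crypto module. This is a generic encrypt-then-MAC sketch, not the code of libsignal-protocol-c: the keys and the IV are random placeholders (in WhatsApp they come from the Double Ratchet), the full 32-byte tag is kept, and the tag covers only the ciphertext. Which bytes the real library feeds into the MAC and how long the transmitted tag is are not reproduced here.

```javascript
const crypto = require('crypto');

const TAG_LENGTH = 32; // full HMAC-SHA256 output, an assumption of this sketch

// Encrypt with AES-256-CBC (Node applies PKCS#7 padding) and append an
// HMAC-SHA256 tag computed over the ciphertext.
function encryptThenMac(plaintext, encKey, iv, macKey) {
  const cipher = crypto.createCipheriv('aes-256-cbc', encKey, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  const tag = crypto.createHmac('sha256', macKey).update(ciphertext).digest();
  return Buffer.concat([ciphertext, tag]);
}

// Check the tag first and decrypt only if it matches.
function verifyThenDecrypt(message, encKey, iv, macKey) {
  if (message.length < TAG_LENGTH) {
    throw new Error('message too short');
  }
  const ciphertext = message.subarray(0, message.length - TAG_LENGTH);
  const tag = message.subarray(message.length - TAG_LENGTH);
  const expected = crypto.createHmac('sha256', macKey).update(ciphertext).digest();
  if (!crypto.timingSafeEqual(tag, expected)) {
    throw new Error('authentication tag mismatch');
  }
  const decipher = crypto.createDecipheriv('aes-256-cbc', encKey, iv);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]);
}

// Placeholder inputs: a random 32-byte value stands in for the master secret.
const masterSecret = crypto.randomBytes(32);
const encKey = crypto.randomBytes(32);
const macKey = crypto.randomBytes(32);
const iv = crypto.randomBytes(16);

const message = encryptThenMac(masterSecret, encKey, iv, macKey);
console.log(message.length); // ciphertext plus tag is longer than the plaintext
```

The sketch also shows why the output is longer than the input: CBC pads the plaintext to a multiple of 16 bytes, and the tag is appended on top of that.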

Preparing the Master Secret

In the following, I demonstrate how the encrypted master secret is processed. I traced functions containing the keyword “signal”:

frida-trace -U WhatsApp -i "*signal*"

The Frida command reveals that the function textsecure__signal_message__pack processes the encrypted master secret. The function creates a Signal message containing the encrypted master secret and parameters relevant for the Signal Protocol:

textsecure__signal_message__pack Output Value

The gray bytes are used for serializing the Signal message. The blue bytes represent the sender ratchet key. The red byte represents the previous message counter. Then follows the message counter (orange). Finally, the encrypted master secret is represented by the following bytes (green) of the Signal message.

When tracing XMPP related Objective-C functions, we can see that a method named writeNoiseFrameToSocketWithPayload of the class XMPPStream is called. This method sends XMPP messages, which are encrypted with the Noise Pipes Protocol, via TCP to WhatsApp servers. I revealed the content of the payload parameter:

XMPPStream.writeNoiseFrameToSocketWithPayload Payload Value

It is a binary XMPP message containing the Signal message created above. For disassembling the message, I traced a class named XMPPBinaryCoder. This class has a method called serialize which creates the binary representation of an XMPP stanza. When printing out its parameters, I can see a variety of key-value pairs which are added to the XMPP message:

-[XMPPBinaryCoder serialize:

[call from=’49**********@s.whatsapp.net’

id=’1555415586-10’

to=’49**********@s.whatsapp.net’

[offer call-id=’45D7827C624353A70084AED9B8C509D3’

call-creator=’49**********@s.whatsapp.net’

[audio rate=’8000’ enc=’opus’]

[audio rate=’16000’ enc=’opus’]

[net medium=’3’]

[capability ver=’1’ {5b}]

[encopt keygen=’2’]

[enc v=’2’ type=’pkmsg’ {201b}]

]

]

] compressed: 0x0]

I was able to fake the indication of a missed call from Alice on Bob’s device, even though the call was initiated by Mallory. This was possible by overwriting the call-creator and from parameters with Alice’s JID. However, Mallory’s name is still shown in the message (“with Mallory”). When Bob responds to the notification, he starts a call with Alice instead of Mallory. I think that further research is required for analyzing the manipulation of the initial call message.

Faked Missed Call Notification

This part revealed how the encrypted master secret is processed by WhatsApp. The encrypted master secret is packed into a Signal message, which is added to a binary XMPP stanza. The XMPP stanza also contains the call ID and the JIDs of the caller and the callee.

Transmitting the Master Secret to the Callee

According to the WhatsApp white paper, “clients use Noise Pipes with Curve25519, AESGCM, and SHA256 from the Noise Protocol Framework for long running interactive connections”. When tracing functions containing key words relating to the Noise Protocol Framework, I can see that a class named WANoiseStreamCipher is used for encrypting traffic sent to WhatsApp servers. This class has a method called encryptPlaintext. The plaintext value after initiating a call is the XMPP message from above. The message is encrypted again with a function of the mbed TLS library called mbedtls_gcm_crypt_and_tag. Moreover, mbedtls_gcm_setkey is called with a key size of 256 bits, which means that AES-256-GCM is applied. The encryption key is derived by the Noise Pipes Protocol, which is not investigated further in this article. The encrypted message is sent via TCP to a WhatsApp server, as the Wireshark recordings show. The server then forwards the message to the callee for initiating the call.
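To make the AES-256-GCM step more concrete, here is a minimal Node.js sketch of encrypting one frame. The nonce layout follows the Noise specification for AESGCM (four zero bytes followed by the 64-bit big-endian message counter). Whether WANoiseStreamCipher builds its nonce exactly this way was not examined in this article, so treat that part as an assumption. The key is a random placeholder, and the function names are chosen for illustration.

```javascript
const crypto = require('crypto');

const TAG_LENGTH = 16; // AES-GCM authentication tag in bytes

// Build the 96-bit nonce as the Noise specification describes it for
// AESGCM: four zero bytes followed by the 64-bit big-endian counter.
function noiseNonce(counter) {
  const nonce = Buffer.alloc(12);
  nonce.writeBigUInt64BE(BigInt(counter), 4);
  return nonce;
}

// Encrypt one frame and append the authentication tag.
function encryptFrame(key, counter, plaintext, associatedData = Buffer.alloc(0)) {
  const cipher = crypto.createCipheriv('aes-256-gcm', key, noiseNonce(counter));
  cipher.setAAD(associatedData);
  const ciphertext = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return Buffer.concat([ciphertext, cipher.getAuthTag()]);
}

// Verify the tag and decrypt one frame; throws if the frame was modified.
function decryptFrame(key, counter, frame, associatedData = Buffer.alloc(0)) {
  const decipher = crypto.createDecipheriv('aes-256-gcm', key, noiseNonce(counter));
  decipher.setAAD(associatedData);
  decipher.setAuthTag(frame.subarray(frame.length - TAG_LENGTH));
  const ciphertext = frame.subarray(0, frame.length - TAG_LENGTH);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]);
}

const key = crypto.randomBytes(32);            // placeholder for the Noise cipher key
const stanza = Buffer.from('placeholder for the binary XMPP stanza');
const frame = encryptFrame(key, 0, stanza);
console.log(decryptFrame(key, 0, frame).equals(stanza));
```

Because the counter is part of the nonce, a frame that is replayed or reordered fails authentication on the receiving side as long as both ends count messages in step.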

Encrypted Call Initialization Message

Key Derivation

This part explains how the key material, used for encrypting WhatsApp calls, is created by a key derivation function (KDF). The results of this part are retrieved with the help of Frida by tracing a class called WAHKDF and the library libcommonCrypto. The WAHKDF class is applied for deriving keys, salts and nonces for initializing SRTP streams. Its method deriveSecretsFromInputKeyMaterial is called ten times before a call starts:

+[WAHKDF deriveSecretsFromInputKeyMaterial: <09a38e76 fe90e4f1 26ed66d0 5a6783ba d48776b6 1daaf7c9 39c005ea 2d8ccdf6>, salt: nil, info: <34393135 39303537 37313632 3040732e 77686174 73617070 2e6e6574>, bytes: 46, withMessageVersion: 3] => result: <4633c47f 94d5ed59 93a6dba8 514d5fb8 5092ba90 4256f8d3 4d56e72e 665bcd4c 5b6c418b db811e7f 84a70c83 f401>

+[WAHKDF deriveSecretsFromInputKeyMaterial: <09a38e76 fe90e4f1 26ed66d0 5a6783ba d48776b6 1daaf7c9 39c005ea 2d8ccdf6>, salt: nil, info: <34393137 ******** ******** ******** ******** 6170702e 6e6574>, bytes: 46, withMessageVersion: 3] => result: <a174670a e25d8138 4de0ed3b f4ce7f76 c62c1d00 9ece6573 2ecb497b 1f6ed09c 18c444b9 c180fbd3 51713739 761c>

+[WAHKDF deriveSecretsFromInputKeyMaterial: <34354437 38323743 36323433 35334137 30303834 41454439 42384335 30394433>, salt: <00000000>, info: <34393135 39303537 37313632 3040732e 77686174 73617070 2e6e6574>, bytes: 4, withMessageVersion: 3] => result: <0ec654fd>

+[WAHKDF deriveSecretsFromInputKeyMaterial: <34354437 38323743 36323433 35334137 30303834 41454439 42384335 30394433>, salt: <01000000>, info: <34393135 39303537 37313632 3040732e 77686174 73617070 2e6e6574>, bytes: 4, withMessageVersion: 3] => result: <a060fa73>

+[WAHKDF deriveSecretsFromInputKeyMaterial: <34354437 38323743 36323433 35334137 30303834 41454439 42384335 30394433>, salt: <04000000>, info: <34393135 39303537 37313632 3040732e 77686174 73617070 2e6e6574>, bytes: 4, withMessageVersion: 3] => result: <b17d7f33>

+[WAHKDF deriveSecretsFromInputKeyMaterial: <34354437 38323743 36323433 35334137 30303834 41454439 42384335 30394433>, salt: <00000000>, info: <34393137 ******** ******** ******** ******** 6170702e 6e6574>, bytes: 4, withMessageVersion: 3] => result: <f51e66eb>

+[WAHKDF deriveSecretsFromInputKeyMaterial: <34354437 38323743 36323433 35334137 30303834 41454439 42384335 30394433>, salt: <01000000>, info: <34393137 ******** ******** ******** ******** 6170702e 6e6574>, bytes: 4, withMessageVersion: 3] => result: <ee328049>

+[WAHKDF deriveSecretsFromInputKeyMaterial: <34354437 38323743 36323433 35334137 30303834 41454439 42384335 30394433>, salt: <04000000>, info: <34393137 ******** ******** ******** ******** 6170702e 6e6574>, bytes: 4, withMessageVersion: 3] => result: <c75099f3>

The method creates encryption keys, salts and nonces based on the master secret and the JID of the call participants. The resulting values are used for initializing six SRTP streams, three for each call direction.
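One detail of the log can be made visible by decoding the inputs. In the calls that derive four bytes, the input key material is not the master secret but an ASCII string, and it decodes to the call-id attribute of the offer stanza shown earlier. The following Node.js sketch demonstrates this with the recorded values. Reading the four-byte salts as little-endian integers is my interpretation, not something the recordings confirm.

```javascript
// Values copied from the recorded WAHKDF calls and the offer stanza.
const callIdFromStanza = '45D7827C624353A70084AED9B8C509D3';
const recordedKeyMaterial =
  '34354437 38323743 36323433 35334137 30303834 41454439 42384335 30394433';
const recordedSalts = ['00000000', '01000000', '04000000'];

// Strip the spaces of the dump and decode the hex digits.
function fromHexDump(dump) {
  return Buffer.from(dump.replace(/\s+/g, ''), 'hex');
}

const decodedKeyMaterial = fromHexDump(recordedKeyMaterial).toString('ascii');
console.log(decodedKeyMaterial === callIdFromStanza);

// Assumption: the salts are 32-bit little-endian integers.
const saltValues = recordedSalts.map((salt) => fromHexDump(salt).readUInt32LE(0));
console.log(saltValues);
```

Under this reading, the four-byte outputs depend only on the call ID, a small integer and a participant's JID, all of which also appear in the signaling message. What these four-byte values are used for is not clear from the recordings.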

The following code snippet shows the reconstruction of the key derivation function written in JavaScript:

HKDF JavaScript Implementation

This code snippet represents the key derivation for initializing a single SRTP stream. The input parameters and the function’s output were recorded with Frida. For reconstructing the KDF algorithm, the inputs and outputs of hash functions from the libcommonCrypto library were analyzed. Three HMAC-SHA256 computations are applied for deriving the final key. I found out that the KDF is based on RFC 5869.
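For readers who want to experiment, the following is a generic HKDF-SHA256 sketch after RFC 5869 for Node.js. It is not the reconstruction referred to above, and it has not been checked here against the recorded result. It assumes that a nil salt is treated as 32 zero bytes, as the RFC specifies for a missing salt. For an output of 46 bytes it performs one extract step and two expand steps, which matches the three HMAC-SHA256 computations mentioned above.

```javascript
const crypto = require('crypto');

const HASH_LENGTH = 32; // SHA-256 output size in bytes

// HKDF-Extract (RFC 5869, section 2.2): PRK = HMAC-SHA256(salt, IKM).
// A missing salt is replaced by a string of zero bytes.
function hkdfExtract(salt, ikm) {
  const key = salt && salt.length > 0 ? salt : Buffer.alloc(HASH_LENGTH);
  return crypto.createHmac('sha256', key).update(ikm).digest();
}

// HKDF-Expand (RFC 5869, section 2.3): T(n) = HMAC(PRK, T(n-1) | info | n).
function hkdfExpand(prk, info, length) {
  if (length > 255 * HASH_LENGTH) {
    throw new RangeError('requested output is too long');
  }
  const blocks = [];
  let previous = Buffer.alloc(0);
  for (let counter = 1; blocks.length * HASH_LENGTH < length; counter++) {
    previous = crypto.createHmac('sha256', prk)
      .update(previous)
      .update(info)
      .update(Buffer.from([counter]))
      .digest();
    blocks.push(previous);
  }
  return Buffer.concat(blocks).subarray(0, length);
}

function hkdf(ikm, salt, info, length) {
  return hkdfExpand(hkdfExtract(salt, ikm), info, length);
}

// Inputs of the first recorded WAHKDF call (salt: nil, 46 output bytes).
const recordedIkm = Buffer.from(
  '09a38e76fe90e4f126ed66d05a6783bad48776b61daaf7c939c005ea2d8ccdf6', 'hex');
const recordedInfo = Buffer.from(
  '3439313539303537373136323040732e77686174736170702e6e6574'.replace('40732e', '40732e'), 'hex');
console.log(hkdf(recordedIkm, null, recordedInfo, 46).toString('hex'));
```

The printed value can be compared by hand with the result logged for the first WAHKDF call. If the two differ, the message version parameter or the salt handling of WAHKDF deviates from plain RFC 5869 in a way this sketch does not capture.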

Call Initialization

SRTP, which is implemented by libsrtp, is applied by WhatsApp for encrypting audio data exchanged between WhatsApp clients during a VoIP call. Unfortunately, the symbols of the libsrtp library are stripped from the WhatsApp binaries. This is why we cannot trace the library’s functions by their symbol name. Instead, I followed a different approach for analyzing functions of the libsrtp library.

Many functions of the libsrtp library contain debug statements, which carry information about internal library processing. These debug statements were utilized for identifying functions of the library. I searched for string constants in the data segment of the WhatsApp binaries which can also be found in libsrtp. Then I searched for function bodies in the binaries, which are referencing these string constants. When I identified a function of libsrtp in the binaries, I copied the first 12 bytes of its hexadecimal representation. Then I used Frida for searching the hexadecimal representation in memory. This way I revealed the function’s start address which can be traced by Frida.

As an example, I explain how I revealed the usage of a libsrtp library function called srtp_aes_icm_context_init. This function is used for initializing encrypted SRTP streams, based on AES-ICM. The other functions which are analyzed in this part were traced by applying the same methodology.

The implementation of srtp_aes_icm_context_init contains two debug statements:

We can see that the string constants in the debug_print calls occur as references in the application binaries of WhatsApp. When searching the reference location, it is possible to associate the string constants with a function which encloses them. The function containing the references was revealed with the Hopper Disassembler:

Lines 19 and 22 contain the references to the debug string constants. Even when the location of the target function within the WhatsApp binaries is known, we still have to find its memory location at runtime. This is because Address Space Layout Randomization (ASLR) is applied on iOS devices. Functions change their addresses every time a mobile application is launched.

The following code snippet demonstrates how srtp_aes_icm_context_init can be located at runtime:

The ApiResolver by Frida is applied for finding a known memory location (as an anchor), where I start a linear memory search. I use functions as an anchor, which are located closely to the target function in the binaries and have a symbol name. If a function has a symbol name, it can easily be traced with Frida. This is why URLWithUnicodeString was traced in line 3. When the anchor has been found, its location is used for starting a linear search in memory. The value of SCAN_SIZE should be chosen depending on the distance between the anchor and the target function. Line 12 contains the first 12 bytes of the target function as a hexadecimal value. Finally, a NativeFunction is created in line 17, which can be traced with Frida if the hexadecimal pattern is found. The function accepts two parameters: a pointer to the encryption context (cv) and a pointer to the encryption key (key). Before a call is started, srtp_aes_icm_context_init is called six times for initializing six SRTP streams. Two streams receive the master secret from above as key parameter.
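The linear search itself can be illustrated outside of Frida. The sketch below runs in plain Node.js and treats a Buffer as a stand-in for process memory. In a Frida script, Memory.scanSync performs the corresponding scan on the real address space. The 12-byte pattern, the offsets and the function name findPattern are hypothetical values for illustration, not bytes taken from the WhatsApp binary.

```javascript
// Search `memory` for `pattern`, starting at `anchorOffset` and looking at
// no more than `scanSize` bytes. Returns the offset of the first complete
// match, or -1 if the pattern does not occur inside that window.
function findPattern(memory, anchorOffset, scanSize, pattern) {
  const end = Math.min(memory.length, anchorOffset + scanSize);
  const window = memory.subarray(anchorOffset, end);
  const index = window.indexOf(pattern);
  return index === -1 ? -1 : anchorOffset + index;
}

// Hypothetical setup: 8 KiB of filler bytes with the first 12 bytes of the
// target function placed 500 bytes behind the anchor.
const SCAN_SIZE = 4096;
const functionPrologue = Buffer.from('00112233445566778899aabb', 'hex');
const fakeMemory = Buffer.alloc(8192, 0xcc);
const anchorOffset = 1000;
functionPrologue.copy(fakeMemory, anchorOffset + 500);

console.log(findPattern(fakeMemory, anchorOffset, SCAN_SIZE, functionPrologue));
```

The sketch also shows the trade-off behind SCAN_SIZE: a window that is too small misses the function, while a very large window makes it more likely that a 12-byte prefix matches an unrelated location first.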

The streams are encrypted with AES-ICM. The purpose of all streams is not clear. There is also a function called srtp_aes_icm_alloc, which was identified by the string constant “allocating cipher with key length %d”. The function accepts a key length parameter which has the value of 16 bytes for every stream. As a result, AES-128-ICM is applied for encrypting the SRTP streams. Despite the fact that 46 bytes are derived with the key derivation function, only 30 bytes are actually used for initializing the first two streams. This matches the size of a 16-byte SRTP master key plus a 14-byte master salt. When overwriting the remaining 16 bytes in memory, the call between two WhatsApp clients still works. This shows that these 16 bytes are not used at all!

Call Encryption

There is a function called srtp_aes_icm_encrypt which is part of the libsrtp library. This function encrypts SRTP streams of WhatsApp clients based on AES-128-ICM. The function was identified by a reference to the following string constant in a debug statement: “block index: %d”.

The following represents the hexadecimal output of a single SRTP packet encrypted with srtp_aes_icm_encrypt:

srtp_aes_icm_encrypt Output

The meaning of the first 12 bytes (red) was already explained above. The following bytes (blue) represent the actual SRTP payload. The last four bytes represent an authentication tag, which is investigated below. As there are six SRTP streams, there have to be different kinds of payloads. I could not identify the actual payload content transported by each stream.
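The three parts of such a packet can be separated with a small helper. This is a minimal Node.js sketch that assumes a fixed 12-byte header, which holds as long as the CSRC count and the extension bit are zero, as in the packet analyzed earlier. The sample bytes are synthetic placeholders, not the recorded packet.

```javascript
// Split an SRTP packet into fixed header, encrypted payload and tag.
// Assumes CC = 0 and X = 0, so the header is exactly 12 bytes long.
function splitSrtpPacket(packet, tagLength) {
  const HEADER_LENGTH = 12;
  if (packet.length < HEADER_LENGTH + tagLength) {
    throw new RangeError('packet is shorter than header plus tag');
  }
  return {
    header: packet.subarray(0, HEADER_LENGTH),
    payload: packet.subarray(HEADER_LENGTH, packet.length - tagLength),
    tag: packet.subarray(packet.length - tagLength),
  };
}

// Synthetic packet: 12 header bytes, 20 payload bytes, 4 tag bytes.
const syntheticPacket = Buffer.concat([
  Buffer.alloc(12, 0x01),
  Buffer.alloc(20, 0x02),
  Buffer.alloc(4, 0x03),
]);
const parts = splitSrtpPacket(syntheticPacket, 4);
console.log(parts.payload.length, parts.tag.toString('hex'));
```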

Call Integrity

This part explains how the integrity of SRTP packets is protected. The libsrtp library contains a function named srtp_hmac_compute. This function computes authentication tags for SRTP packets exchanged between WhatsApp clients. srtp_hmac_compute could be located and traced with Frida by searching for a reference to the string constant found in the function’s implementation: “intermediate state: %s”.

The function header of srtp_hmac_compute can be seen in the following:

srtp_hmac_compute applies HMAC-SHA1 for computing authentication tags. By tracing the function with Frida, I revealed the input message and the output result, as well as the value of tag_len for each sent SRTP packet. The following logs show the tag_len and the message parameters of srtp_hmac_compute during a call:

Attaching...

search srtp_hmac_compute in memory from: 0x1016380ac

found srtp_hmac_compute at: 0x10163b5f4



tag_len: 10

message: 81 ca 00 07 fe 67 2e 32 56 14 89 75 c5 c0 39 4a d3 a0 cd 48 8c 4b 61 8a 78 32 a7 89 1e b7 71 26 80 00 00 01 tag_len: 4

message: 00 00 00 00 tag_len: 10

message: 81 d0 00 02 fe 67 2e 32 b5 6f 93 8e 80 00 00 02 tag_len: 4

message: 00 00 00 00 tag_len: 4

message: 00 00 00 00 tag_len: 4

message: 00 00 00 00 tag_len: 4

message: 00 00 00 00 tag_len: 10

message: 81 ca 00 07 83 42 f3 44 81 78 9f f5 39 b1 23 50 48 19 e0 f1 61 5b b5 32 dc b3 10 08 e7 47 a8 4b 80 00 00 01 tag_len: 10

message: 81 d0 00 02 83 42 f3 44 94 60 21 fe 80 00 00 02 tag_len: 4

message: 00 00 00 00 tag_len: 4

message: 00 00 00 00 tag_len: 10

message: 81 c8 00 12 fe 67 2e 32 87 b7 69 f8 5a 27 4c 76 b4 29 f6 5d 59 26 de af bd e9 4c 8b f3 ff 48 e3 a9 7e 62 cf db 9c 8a 3d 34 50 48 f8 fc 0e 88 7a 17 eb 17 94 9f 3d 91 27 89 d5 cc bd 21 ea 01 39 27 e1 05 07 66 69 1f 68 08 53 1a 18 02 9e bc 50 ed 8e 40 3e 8a 7b d3 b6 19 e8 54 6f 6b 58 ac 4e e3 25 f5 c2 e8 1c 97 bb 46 f9 38 45 80 00 00 03 ...
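The messages logged with a tag length of ten bytes have a recognizable shape. They start like RTCP packets as defined in RFC 3550 (version 2, followed by packet types such as 200 for sender reports and 202 for source descriptions), and they end with four bytes that look like the trailer RFC 3711 defines for SRTCP: an encryption flag in the top bit followed by a 31-bit packet index that counts 1, 2, 3. This is my reading of the bytes and not something confirmed elsewhere in this analysis. A minimal Node.js sketch for checking it against the log:

```javascript
// Interpret a logged message as an RTCP packet followed by an SRTCP trailer.
// Only the first RTCP header is parsed; compound packets contain more.
function parseLoggedMessage(hexDump) {
  const bytes = Buffer.from(hexDump.replace(/\s+/g, ''), 'hex');
  if (bytes.length < 12) {
    throw new RangeError('message too short for RTCP header plus trailer');
  }
  const trailer = bytes.readUInt32BE(bytes.length - 4);
  return {
    version: bytes[0] >>> 6,
    count: bytes[0] & 0x1f,
    packetType: bytes[1],
    // The RTCP length field counts 32-bit words minus one.
    rtcpLength: (bytes.readUInt16BE(2) + 1) * 4,
    ssrc: bytes.readUInt32BE(4).toString(16),
    totalLength: bytes.length,
    encryptedFlag: trailer >>> 31,
    index: trailer & 0x7fffffff,
  };
}

// Second and first ten-byte-tag messages from the log above.
const shortMessage = parseLoggedMessage(
  '81 d0 00 02 fe 67 2e 32 b5 6f 93 8e 80 00 00 02');
const longMessage = parseLoggedMessage(
  '81 ca 00 07 fe 67 2e 32 56 14 89 75 c5 c0 39 4a d3 a0 cd 48 ' +
  '8c 4b 61 8a 78 32 a7 89 1e b7 71 26 80 00 00 01');
console.log(shortMessage, longMessage);
```

For both messages the RTCP length plus the four trailer bytes equals the total message length, which supports the reading that the streams with ten-byte tags carry control traffic rather than media.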

There are two things I noticed:

SRTP packets with a tag length of four bytes seem to be authenticated incorrectly. The message parameter does not contain the actual SRTP packet. Instead, a constant value of four zero bytes is used for computing the authentication tag. However, when the tags of these packets are manipulated, the call is terminated after a few seconds. Maybe my observation that the authentication tag is computed incorrectly is wrong, or the packet manipulation I made was invalid (because the packet encoding was destroyed).

Streams which are authenticated with a tag length of ten bytes seem to be authenticated correctly, i.e. the packets are passed to the srtp_hmac_compute function as the message parameter. Despite this, the authentication tags are not checked for integrity during a VoIP call session. The following code snippet shows how I overwrote the authentication tags of SRTP packets which have an authentication tag of ten bytes:

When executing the Frida script at runtime, the VoIP call still works. Hence, integrity protection of these SRTP packets is broken. The consequences of this finding are unknown, since I could not reveal what these streams are actually used for. This behavior has to be analyzed more precisely.
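For reference, this is how a truncated HMAC-SHA1 tag is formed according to RFC 3711: the HMAC is computed over the packet followed by the 32-bit rollover counter (ROC) and then cut to 10 or 4 bytes. The Node.js sketch below uses a placeholder key and packet. It also shows one possible explanation for the four zero bytes observed above, namely that the packet is fed to the HMAC in an earlier update call and only a ROC of zero is passed to the final compute call. That explanation is an assumption and was not verified by tracing.

```javascript
const crypto = require('crypto');

// One-shot variant: HMAC-SHA1 over the whole message, truncated.
function computeTag(authKey, message, tagLength) {
  return crypto.createHmac('sha1', authKey)
    .update(message)
    .digest()
    .subarray(0, tagLength);
}

// Two-step variant: the packet is fed first, the four ROC bytes second.
// A hook on the second step alone would only ever see the ROC.
function computeSrtpTag(authKey, packet, roc, tagLength) {
  const rocBytes = Buffer.alloc(4);
  rocBytes.writeUInt32BE(roc);
  const hmac = crypto.createHmac('sha1', authKey);
  hmac.update(packet);
  hmac.update(rocBytes);
  return hmac.digest().subarray(0, tagLength);
}

// A receiver that enforces integrity compares tags in constant time.
function verifySrtpTag(authKey, packet, roc, receivedTag) {
  if (receivedTag.length === 0) {
    return false;
  }
  const expected = computeSrtpTag(authKey, packet, roc, receivedTag.length);
  return expected.length === receivedTag.length &&
    crypto.timingSafeEqual(expected, receivedTag);
}

const placeholderKey = Buffer.alloc(20, 0x0b);           // not a recorded key
const placeholderPacket = Buffer.from('placeholder packet bytes');
const tag = computeSrtpTag(placeholderKey, placeholderPacket, 0, 4);
console.log(verifySrtpTag(placeholderKey, placeholderPacket, 0, tag));
```

A receiver built like verifySrtpTag would reject packets whose tags were overwritten, which is the behavior that was not observed for the streams with ten-byte tags.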

Conclusion

This article revealed fundamental parts of the WhatsApp VoIP protocol. I demonstrated how the analysis of network traffic, binary application files and the dynamic runtime behavior of WhatsApp clients helped to reveal protocol steps.

The results of my analysis are the following:

WhatsApp applies open source libraries like libsignal-protocol-c, libsrtp, PJSIP and mbed TLS for implementing the VoIP protocol.

A value called “master secret” is used for initializing two SRTP streams, which encrypt payloads with AES-128-ICM. The master secret is used as input for a key derivation function (HKDF), which derives keys, salts and nonces as initialization parameters for SRTP.

The Noise Pipes Protocol, the Signal Protocol and XMPP interact for transporting the master secret to the callee for setting up a call session. The master secret is encrypted with the Signal Protocol, then packed into an XMPP message, which is encrypted with the Noise Pipes Protocol, and sent to a WhatsApp server. After that, the server passes the encrypted master secret to the callee for signaling an incoming call.

Integrity protection of VoIP calls seems to have flaws. This is because some SRTP streams are not checked for integrity. Moreover, there are streams which compute invalid authentication tags with zero bytes as input, instead of the actual SRTP packet.

SRTP packets do not reveal sensitive data, except the duration of a VoIP call session.

A malicious caller is able to manipulate the initial call message. This enables an attacker to confuse WhatsApp clients, so that the callee sees unintended caller information on his device. Social engineering attacks can be realized because of this vulnerability.

For cryptographers: https://github.com/schirrmacher/files/blob/master/WhatsApp%20VoIP%20Protocol.pdf

The binaries: https://github.com/schirrmacher/files/blob/master/WhatsApp and https://github.com/schirrmacher/files/blob/master/WhatsAppCore

The conducted research faces several limitations. There are four streams which are initialized with encryption keys of unknown origin. In addition, I do not know where the keys for integrity protection of the SRTP streams come from.

To conclude, this article showed that it can be difficult for application developers to hide the implementation of mobile applications. Tools like Frida enable researchers and attackers to gather critical information about the implementation of mobile applications in a short amount of time. Application developers should bear in mind that cryptographic keys can easily be extracted with such tools. For impeding the dynamic analysis of an application, it is useful to strip symbol names from application binaries. Moreover, application developers should remove string constants, which contain critical application information or help to locate functions.