Introduction

Apple introduced a new set of features in iOS 8 and Yosemite under the name “Continuity”. These features allow iPhones to work with other iDevices such as Macs and iPads in new ways. Handoff, Instant Hotspot and AirDrop are some of the new services offered by Continuity. Among these new services is one named “Call Relay”. Essentially, it allows one to make and receive phone calls via iDevices and route them through the iPhone. This is not your typical VoIP service as it’s a P2P connection based on a proprietary protocol.

In order for it to work, both devices (iPhone and the iDevice that makes/takes the call) need to be on the same WiFi network. This is what caught my attention. Apple’s security white paper is short and vague on this particular topic. Only four paragraphs are dedicated to explaining how Call Relay works, and the only security-relevant information is as follows: “The audio will be seamlessly transmitted from your iPhone using a secure peer-to-peer connection between the two devices.”

How it works

The first step is to get a high level understanding of the protocol and how the different actors interact. Wireshark is our friend here. For easy reading, we will take the case of an incoming call from now on. That is, somebody calls the victim, his iPhone rings but he picks up the call on his MacBook.

We can differentiate several actors in different environments. The cell tower will communicate with the iPhone to process the incoming call. Next, the iPhone starts ringing but also sends a push notification to Apple alerting of an incoming call. Apple’s Push Notification Service (APNS) then sends a notification down to all the iDevices connected with the same Apple ID informing that there is indeed an incoming call waiting to be picked up. The MacBook receives this push notification which includes the internal IP and port of the iPhone and displays a popup with the caller and the option to pick-up or hang up. Finally, the MacBook sends the very first packet over LAN to that IP to look for the iPhone and waits for a response. The P2P connection is established and the voice transmission between MacBook and iPhone begins.

Environments

The protocol works in 3 different environments that are also 3 possible targets: GSM, Internet and Local Network. We could target any of these, but I focused on the Local Network for a number of reasons:

GSM would imply breaking LTE or taking advantage of existing downgrade attacks. Very illegal and not what I wanted to focus on.

The Internet part relates to push notifications. Apple uses an encrypted channel with cert pinning. Doing something here would imply breaking or bypassing TLS. If I could achieve that, I would not have to look further. That would be the actual vulnerability to be reported and I would get tons of street cred.

The local network environment is the interesting part here. There are multiple existing attack vectors we can take advantage of, like ARP spoofing, DNS spoofing, etc. The traffic is also UDP, a connectionless protocol that is fuzzing friendly. UDP does not imply encryption either. We also don’t know anything about the protocol implemented on top of UDP. It could well be a proprietary protocol (it is) waiting for someone to take a look at it. This is where voice payloads are transmitted. Gaining access to them was the main goal of this research.

Approach

Once we have a high level understanding of the protocol we need to get down to raw bytes and work our way up by identifying headers, counters, checksums, payloads, etc.

Initially, I collected samples of calls under the same circumstances (same devices, same Apple ID, same phone number, same behavior, etc.). I therefore had samples I could compare to look for bytes that changed. I also collected samples in slightly different circumstances for further comparison, with the hope of finding additional blocks of changing bytes that could identify OS versions, user IDs, etc. This is all part of reversing a network protocol: looking for changes, finding patterns, spotting basic components of the protocol (headers, checksums, etc.) and figuring out packet structures (header + counter + payload, for example).
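The comparison step described above can be sketched in a few lines of Python. This is only a minimal illustration, not the tooling used in the research: the function name `changing_offsets` and the sample bytes are made up, and real captures would first have to be aligned so that the same packet of each call is compared.

```python
from typing import List


def changing_offsets(samples: List[bytes]) -> List[int]:
    """Return the byte offsets whose value differs across the given packets.

    Only the common prefix length is compared, so packets of different
    lengths can still be lined up from offset 0.
    """
    if not samples:
        return []
    common_len = min(len(s) for s in samples)
    return [
        i for i in range(common_len)
        if len({s[i] for s in samples}) > 1
    ]


# Hypothetical captures of "the same" packet taken from three calls.
captures = [
    bytes.fromhex("0f00a1b20001"),
    bytes.fromhex("0f00c3d40001"),
    bytes.fromhex("0f00e5f60001"),
]
print(changing_offsets(captures))  # [2, 3]
```

Offsets that never change are candidates for headers or type identifiers, while offsets that change on every call are candidates for IDs, counters or payload.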

Doing all this by hand is tedious and hard. While our brain is very good at finding patterns, a protocol like this, which can produce thousands of packets in a 3 second call, generates too much data to analyze manually. Therefore, I took advantage of a fantastic tool called Netzob. In a nutshell, Netzob (among other things) helps you figure out details of the protocol by finding patterns, data formats, structures, etc. To achieve this, it uses complex mathematical models and statistical analysis. In other words, you feed samples into Netzob, it does some magic and gives you a nice representation of the data divided into sections that have some sort of relationship.

I don’t want to extend this post with all the details of the protocol that I was able to reverse. I will only talk about the specifics that led to the vulnerabilities I found. If you’d like to know more, take a look at the talk I gave at Ekoparty (in Spanish) or at Kaspersky Security Analyst Summit (in English; note, however, that it was a turbo talk so I had to skip many parts).

Apple’s Call Relay Protocol

After many weeks of observing, coding and trial and error, I had a pretty good idea of the protocol details. I used scapy to write some scripts and impersonate both the iPhone and the MacBook. This was helpful to validate my guesses and try some of the initial attack vectors I had in mind. You can find the scripts in my GitHub repo; however, note that these are not PoCs, just quickly written code to run tests. Still, they are useful if you want to check out the protocol for further vulnerabilities and get a deeper understanding of its structure.

Also, keep in mind that all conclusions in this research are based on guesses and tests. There could be inaccurate assumptions I made while reversing the protocol.

The protocol consists of 4 different phases:

Discovery

This is the very first packet sent by the MacBook to the iPhone on an incoming call. The iPhone is waiting for this packet and upon receipt, it will flip 4 bits and send it back to the MacBook. This resembles a SYN-ACK.
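A quick way to confirm an observation like “the response is the request with a few bits flipped” is to XOR the two packets. The sketch below uses made-up bytes (not a real capture), and `flipped_bits` is a hypothetical helper name.

```python
from typing import List, Tuple


def flipped_bits(request: bytes, response: bytes) -> List[Tuple[int, int]]:
    """List (byte_offset, bit_index) pairs where the two packets differ."""
    if len(request) != len(response):
        raise ValueError("packets must have the same length")
    positions = []
    for offset, (a, b) in enumerate(zip(request, response)):
        diff = a ^ b  # bits set in diff are the bits that changed
        for bit in range(8):
            if diff & (1 << bit):
                positions.append((offset, bit))
    return positions


# Made-up request/response pair that differs in the low nibble of byte 2.
request = bytes.fromhex("0f0000aa")
response = bytes.fromhex("0f000faa")
print(flipped_bits(request, response))  # [(2, 0), (2, 1), (2, 2), (2, 3)]
```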

Identification

After the previous 2 packets, another 2 packets follow that define the Identification phase. This time, it is 2 bytes that the iPhone changes when it responds. But the most interesting part here is field 9. The bytes are all in the printable range. When we format them as a string, we can clearly identify a UUID. Most importantly though, we have the first proof of unencrypted text! These bytes are a perfect candidate for tampering/fuzzing.
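Spotting a textual UUID inside a binary payload is easy to automate. The following is a minimal sketch under the assumption that the UUID is stored as ASCII in the usual 8-4-4-4-12 hexadecimal layout; the payload below is invented for illustration, and `find_uuid` is a hypothetical helper.

```python
import re
import uuid
from typing import Optional

# ASCII UUID in the canonical 8-4-4-4-12 layout, matched on raw bytes.
UUID_RE = re.compile(
    rb"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
    rb"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def find_uuid(payload: bytes) -> Optional[uuid.UUID]:
    """Return the first ASCII-encoded UUID found in payload, if any."""
    match = UUID_RE.search(payload)
    if match is None:
        return None
    return uuid.UUID(match.group().decode("ascii"))


# Invented payload: a few binary bytes, a printable UUID, more binary bytes.
payload = b"\x0f\x00\x01" + b"123E4567-E89B-12D3-A456-426614174000" + b"\x00\x02"
print(find_uuid(payload))  # 123e4567-e89b-12d3-a456-426614174000
```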

Negotiation

The first thing we notice here is that the header changed from “0f” to “2004000400”. This alone tells us that we are in a different stage of the protocol. If we look at all packets with that same header, we can spot a couple interesting things. First, we see in red that the MacBook and iPhone agree on 8 random bytes by exchanging 4 bytes each. The MacBook generates and sends the first 4 bytes and the iPhone responds the same way. In the next packet, highlighted in green, we can see that the MacBook uses the agreed bytes as a type of counter (observe that just the last byte changes increasing by one). In the following packets the bytes increase but not exactly sequentially. The MacBook increases just this part of the counter while the iPhone does the same thing for the 4 bytes it generated.

Another interesting event is highlighted in orange. This field probably represents different stages of the packets in this phase that contain all the same header. This is relevant because it will lead to the first vulnerability I found.

Sound transmission

Again, we can observe that packet headers change from “2004000400” to “e000”, which helps us identify a new phase. Just like in the identification phase, by using a different representation to display the bytes in Field 1-1, we see that it is a sequential decimal counter. Both parties pick a random starting value for every call, but keep in mind that the key space is only 2 bytes. Per my tests, the counter wraps around every 20 minutes given the huge number of packets sent during a call. If this counter is used for encryption purposes, the wrap-around opens the possibility of replay attacks, for example.
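As a sanity check on the wrap-around figure, the packet rate it implies can be worked out directly. This assumes the counter increments by exactly one per packet sent by one party, which is a guess and not something confirmed by the traces.

```python
COUNTER_BYTES = 2   # size of the counter field, from the text
WRAP_MINUTES = 20   # observed wrap-around time, from the text

# Number of distinct values a 2-byte counter can take.
counter_values = 2 ** (8 * COUNTER_BYTES)

# Packets per second needed to exhaust the counter in 20 minutes,
# assuming one increment per packet (an assumption).
packets_per_second = counter_values / (WRAP_MINUTES * 60)

print(counter_values, round(packets_per_second, 1))  # 65536 54.6
```

Under that assumption, each side would be sending roughly 55 packets per second.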

Field 1-2-1 is a per-call static value, and each device has its own. By reversing this protocol and making hundreds of calls, I observed that it is not random. It always increases, but I could not figure out based on what. This counter would take much longer to wrap around as it only changes per call, not per packet like the counter in Field 1-1. The key space is also bigger. The last field, identified in blue, looks like random bytes. This is the actual audio being transmitted, so we need to figure out how it is encoded and whether it is encrypted.
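Since each phase is recognizable by its header, captured UDP payloads can be sorted into phases with a simple prefix match. The header values below come from the text; the `classify_phase` helper and the sample payloads are hypothetical, and the sketch assumes the Discovery packets share the “0f” header with the Identification packets, which the text does not state explicitly.

```python
# Longest header first, so a short prefix never shadows a longer one.
PHASE_HEADERS = [
    (bytes.fromhex("2004000400"), "negotiation"),
    (bytes.fromhex("e000"), "sound transmission"),
    # Assumption: Discovery and Identification both start with 0f.
    (bytes.fromhex("0f"), "discovery/identification"),
]


def classify_phase(payload: bytes) -> str:
    """Map a UDP payload to a protocol phase based on its header bytes."""
    for header, phase in PHASE_HEADERS:
        if payload.startswith(header):
            return phase
    return "unknown"


# Invented payloads, only the leading header bytes are meaningful.
print(classify_phase(bytes.fromhex("0f0001")))          # discovery/identification
print(classify_phase(bytes.fromhex("200400040011")))    # negotiation
print(classify_phase(bytes.fromhex("e0001234")))        # sound transmission
print(classify_phase(bytes.fromhex("ffff")))            # unknown
```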

Finding vulnerabilities

Now that we have a better understanding of the inner workings of the protocol, we can start looking for vulnerabilities. The most important thing is to have a scope and a goal. It is easy to start going down the rabbit hole and lose focus. It happened to me and there were times I had to take a step back and re-evaluate all the info I gathered.

There are a number of things I tried to exploit and failed:

Eavesdrop ongoing calls

Decode/Decompress/Decrypt voice payloads

Replay attacks

Redirect voice payload to attacker’s device

Make calls on behalf of the victim

Inject voice payloads

But there were also a number of things I tested and succeeded:

DoS calls

Spy on victims by leaving their mic open

Impersonate callers on multiparty calls

DoS calls

While this is not a significant attack, it is still interesting as it demonstrates how useful fuzzing can be. I was able to forge a packet that will end any call for any victim. You just need to send one packet and the call ends immediately.

Remember the different phases of the protocol? I wondered what would happen if I sent packets from different phases out of sync. For example, sending a “Call negotiation phase” packet during the “Sound transmission phase”. Indeed, the protocol does not expect this type of packet but does not handle the edge case either. Instead, the iPhone starts to negotiate a new call but because the MacBook is sending other types of packets the flow is messed up and the call ends.
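To make the missing edge-case handling concrete, here is a hypothetical sketch of a receiver that tracks the current phase and drops anything that belongs neither to that phase nor to the one right after it. The `Phase` and `CallSession` names are invented for illustration, and nothing here reflects Apple’s actual implementation or fix.

```python
from enum import Enum


class Phase(Enum):
    DISCOVERY = 1
    IDENTIFICATION = 2
    NEGOTIATION = 3
    SOUND = 4


class CallSession:
    """Toy receiver that only moves forward through the phases."""

    def __init__(self) -> None:
        self.phase = Phase.DISCOVERY
        self.dropped = 0

    def handle(self, packet_phase: Phase) -> bool:
        """Accept packets for the current or next phase, drop the rest."""
        if packet_phase.value in (self.phase.value, self.phase.value + 1):
            self.phase = packet_phase
            return True
        self.dropped += 1
        return False


session = CallSession()
for p in (Phase.DISCOVERY, Phase.IDENTIFICATION, Phase.NEGOTIATION, Phase.SOUND):
    session.handle(p)

# A negotiation packet arriving mid-call is ignored instead of
# restarting the negotiation.
print(session.handle(Phase.NEGOTIATION), session.phase.name, session.dropped)
# False SOUND 1
```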

The challenge here is to forge a “joker” packet: a packet that works for any call, victim and machine without having to know anything in advance like the call ID, etc. As you can see in the picture above, I selected the first packet of the negotiation phase. It includes multiple fields related to the specific call, and that is something we want to avoid. I started nullifying bytes to find which ones I had to respect in order for the iPhone not to ignore the packet and which ones I could set to null. After a long trial-and-error exercise I got the magic DoS packet:

20040004 000000000000000000 b002 000000000000000000000000000000000000000000000000000000000000

Observe that I just have to respect the packet length, the header and the b002 field that looks like a packet type identifier. If you send this packet to the iPhone or MacBook the call ends immediately. You could use it to flood the machine and prevent the victim from picking up calls using this protocol.
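The byte-nullifying exercise can be pictured as zeroing every field except the ones under test while keeping the packet length intact. The sketch below is purely illustrative: the template bytes and the field layout are invented and do not correspond to the real negotiation packet, and `zero_fields` is a hypothetical helper that only builds bytes in memory.

```python
from typing import Dict, Iterable, Tuple


def zero_fields(
    packet: bytes,
    fields: Dict[str, Tuple[int, int]],
    keep: Iterable[str],
) -> bytes:
    """Return a copy of packet with every field not in keep set to zeros.

    fields maps a field name to its (start, end) byte range, end exclusive.
    The packet length is always preserved.
    """
    keep_set = set(keep)
    out = bytearray(packet)
    for name, (start, end) in fields.items():
        if name not in keep_set:
            out[start:end] = bytes(end - start)
    return bytes(out)


# Invented template and layout, for illustration only.
template = bytes.fromhex("aabbccddeeff0011")
layout = {"header": (0, 2), "call_id": (2, 6), "type": (6, 8)}

candidate = zero_fields(template, layout, keep=["header", "type"])
print(candidate.hex())  # aabb000000000011
```

Repeating this with different `keep` sets shows which fields the receiver actually validates.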

Spy on victims by leaving mic open

This was my main goal: to somehow be able to spy on victims. I tried everything, but the truth is:

I could not eavesdrop calls

I could not inject voice data

I could not replay voice data

I could not redirect voice data

Use of encryption

So, how could I possibly do this if I could not get access to voice payloads? This is a good time to bring up Adi Shamir’s famous quote:

“In the future, cryptography won’t be broken, it will be bypassed”

I had to find some kind of side channel attack to get access to the victim’s voice. It was at this point that I started to think about how hanging up worked. I assumed that, just as in the other phases, when I hang up on the MacBook a message or special packet is sent to the iPhone informing it that the call has ended. I collected a network trace that included hanging up and checked it.

Nothing. All I could see was that the voice payload packets stopped. This was very odd as somehow, the MacBook had to inform the iPhone that the call ended so the iPhone could *actually* terminate the call over GSM. It took me a while to understand that sometimes you need to look at the bigger picture. I looked at the network trace again but this time, I did not just include the traffic between the iPhone and the MacBook but also traffic to APNS.

Aha! You can see in pink that when I hung up, some packets are sent to APNS.

What’s going on? Basically, when you hang up on the MacBook, instead of using the P2P connection to the iPhone to send the message, it sends a Push Notification to Apple informing that the user clicked “Hang Up”. Then, Apple sends another Push Notification down to the iPhone with the same message. At that time, the iPhone closes the ports and terminates the call. The MacBook cannot reach the iPhone anymore and the call is finished.

This is a very bad design. Apple implemented the protocol in a way in which a critical message such as “hang up” is delivered over a channel that does not guarantee delivery (APNS). It is not just me saying it; Apple itself states:

“Push notifications are not guaranteed to be delivered”

“Do not rely on Push Notifications for sensitive actions”

Apple did not follow its own recommendations. The OWASP Mobile Top 10 also covers this type of issue.

With this in mind, how can we take advantage of it? What would happen if I could prevent the “hang up” message from being delivered?

By taking advantage of an ARP spoofing MiTM, I am able to block all outgoing traffic to APNS coming from the victim. This way, Apple never finds out that the victim hung up and cannot notify the iPhone. At the UI level, the victim does not notice anything, as everything appears to happen as expected. The popup fades away and the FaceTime app closes gracefully. The problem is that the coreaudiod daemon, which handles the call under the hood, keeps running and the call is never terminated. The victim sees and believes that he hung up, but the call is still ongoing with the attacker listening on the other side.

In other words, an attacker on the same network as the victim is able to call the victim, let him “hang up” (“Sorry, wrong number”) and keep listening to everything that is going on in the room. Watch a demo:

Impersonate caller on multiparty calls

Once I found out that Apple was using push notifications to deliver critical messages, I started to think what else was transmitted this way. I found that this protocol supports multi-party calls. That is, if you are on a call and someone else calls you, you have the ability to put the current call on hold and switch to the incoming call to answer. Then, you have the freedom to switch between calls as needed.

I found that the “Switch call” message is delivered over push notification as well. This means that an attacker can prevent the message from being delivered but the victim will still see the change in the UI and therefore think he is talking to the other caller. This becomes especially powerful if we combine both issues: block switching calls and hanging up.

Long story short, an attacker on the same network is able to fingerprint the victim’s traffic to detect ongoing calls. He then calls the victim, who may place the current call briefly on hold to pick up. The attacker waits for the victim to switch back to the first call or even hang up, but blocks those packets. Effectively, the victim will see in the UI that he is talking to the first caller again, but he will still be connected to the attacker. Watch another demo:

DIY Spy program

At this stage, I knew these vulnerabilities could be used to spy on a girlfriend, colleagues, roommates or anyone whose phone number you know and who is connected to the same WiFi. Still, I wanted to see what else I could do given that:

I can interrupt calls

I can gather call metadata

I can impersonate callers

I can leave microphones open

My good friends Federico Kirschbaum and Alexis Porros gave me the idea of looking into the possibility of building a cheap Government-like Spy Program while having drinks at Defcon. The idea was to leverage these vulnerabilities and see how they could be weaponized, massively distributed and exploited.

I already had the vulnerabilities and scapy scripts with working demos. I had to think how to distribute them massively and exploit them.

This part of the research is not focused on writing working malware or providing tools to make this happen. It is limited to discussing the feasibility and the requirements, and to providing a scenario that would make it possible.

Distribution

In order to exploit these vulnerabilities we need a MiTM. Our ideal targets here are routers, since compromising one effectively gives us a MiTM position. It is no secret that gaining access to routers can be trivially easy. There have been multiple cases of criminals targeting routers using default credentials exposed through services like Shodan. There are also cases in which vulnerabilities are published for specific router models that, again, are exposed and traceable by specialized search engines. Many routers stay unpatched, as updating firmware is not common among most Internet users.

Exploitation

There are a number of requirements we need to meet in order to be able to exploit these vulnerabilities massively:

We need to target Apple devices only

We need to be able to drop APNS packets

We need to know the victim’s phone number

Detecting Apple devices on the network

Once we are in control of thousands of routers as explained in the previous section, we only want to target Apple devices. In order to do so, we need a way to find them on the network and find out their internal IP.

MAC addresses are unique identifiers for network interfaces, and their first three octets (the Organizationally Unique Identifier, or OUI) identify the vendor. The IEEE Standards Association offers a public database with all OUIs and the vendors they belong to. This way, we can identify if a MAC address belongs to an Apple product.

We can find out which MAC addresses are operating on the network and their corresponding internal IPs by looking up the ARP table. With the list of MAC addresses, we can observe the first 3 octets and check the database to verify if those correspond to an Apple device.

Running a command like this one will get you the first 3 octets of every MAC address registered on the router:

arp -a | awk '{print $4}' | grep -ioE '^([0-9a-f]{2}:){2}[0-9a-f]{2}' | tr -d :
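The lookup step that follows, matching each prefix against a vendor list, can be sketched in Python. This is a minimal illustration under several assumptions: the `arp -a` output format shown in the sample, the helper names `oui_of` and `hosts_matching`, and above all the OUI set, which is a placeholder and not a real Apple prefix. The real list would have to be loaded from the IEEE registry.

```python
import re
from typing import List, Set, Tuple

# Matches lines such as "? (192.168.1.10) at aa:bb:cc:11:22:33 [ether] on br0".
# Octets may be one or two hex digits because some arp implementations
# drop leading zeros.
ARP_LINE = re.compile(
    r"\((?P<ip>\d+\.\d+\.\d+\.\d+)\) at "
    r"(?P<mac>[0-9a-fA-F]{1,2}(?::[0-9a-fA-F]{1,2}){5})"
)


def oui_of(mac: str) -> str:
    """Return the first three octets of a MAC as six uppercase hex digits."""
    octets = [part.zfill(2).upper() for part in mac.split(":")]
    return "".join(octets[:3])


def hosts_matching(arp_output: str, ouis: Set[str]) -> List[Tuple[str, str]]:
    """Return (ip, mac) pairs from arp output whose OUI is in ouis."""
    hits = []
    for line in arp_output.splitlines():
        m = ARP_LINE.search(line)
        if m and oui_of(m.group("mac")) in ouis:
            hits.append((m.group("ip"), m.group("mac")))
    return hits


# Made-up ARP table and a placeholder OUI set (NOT real vendor data).
sample = """\
? (192.168.1.10) at aa:bb:cc:11:22:33 [ether] on br0
? (192.168.1.11) at 0:1:2:44:55:66 [ether] on br0
? (192.168.1.12) at <incomplete> on br0
"""
placeholder_ouis = {"AABBCC"}
print(hosts_matching(sample, placeholder_ouis))
# [('192.168.1.10', 'aa:bb:cc:11:22:33')]
```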

Dropping APNS packets

In the demo I show that with an ARP spoofing attack we can reroute the victim’s and the router’s traffic through the attacker and drop packets as needed. When the attacker is positioned on the actual router, it works differently. The victim holds a persistent connection to APNS. Dropping packets on a persistent connection is not easy, but there are tools that simulate a connection termination (FIN-ACK-RST). After some cross-compiling, my tests showed unreliable results when terminating connections to APNS this way.

I came up with a more straightforward way of doing this, but it is also more aggressive. The goal is to reset the external connection to APNS while leaving internal traffic flowing. Routers typically have 2 network interfaces: the internal one for LAN traffic and the external one for the Internet.

As an attacker, we can briefly shut down the router’s external interface and enable it again. This will kill persistent connections that go out to the Internet while keeping the P2P traffic between the iPhone and MacBook flowing. A command like the following will do:

ifconfig eth0 down && sleep 2 && ifconfig eth0 up

Knowing the victim’s phone number

The last step is to know the victim’s phone number so we can briefly call when we want to spy on them. Because we are in control of thousands of routers, we have thousands of potential victims. We need a way to find out the number of the victim and correlate it to his router.

The first step is to obtain the public IP to get an approximation of the area the router is located in. Next, we want to obtain the BSSID. With all this information, we can use a service like wigle.net to get a pretty good idea of the physical location of the router. Now we have control over a router that we can check for Apple devices and we know its physical location.

In order to find phone numbers of victims on that router we can go oldskool and use wardialing. Wardialing requires a lot of phone calls, but we can leverage the knowledge we have about the physical location of the router to significantly reduce them. For example, I found my own router on wigle.net with a very precise location. I live in San Francisco, whose area code is 415. Phone numbers are 10 digits, which means I would have to wardial “just” the remaining 7 digits when targeting routers in San Francisco. Because we understand the protocol, we can easily fingerprint the network traffic and detect incoming calls. Therefore, I can monitor all the routers I pwned in San Francisco for incoming calls and start wardialing. This will allow me to correlate phone numbers to routers.

Conclusion

As mentioned, the method I described of building a spy program is just an exercise of discussing the feasibility and needs to be taken with a grain of salt. An attacker has to control and remotely manage thousands of routers, the technique to drop APNS traffic is very disruptive and wardialing requires that the iPhone and MacBook of the victim are on the same network at the time of the call. Still, it is a great exercise to think how to broaden the impact of vulnerabilities and it keeps the hacker mindset sharp :).

I am very interested in knowing how others would have done it. How could we improve the feasibility of building a spy program based on these vulnerabilities? Let me know as I am genuinely curious what others with more knowledge and imagination than me would have done differently.

Timeline

As always, I disclosed my findings to Apple responsibly. There were some hiccups, miscommunications and hurdles during the process, but finally everything was sorted out and fixes were deployed. Part of the problem was that when iOS 9.3.5 and the Sierra beta came out, I tested again and found regressions. Finally, all issues were fixed in iOS 10.1 and macOS 10.12.1.

Apple issued 4 CVEs: