This article demystifies how Bluetooth Low Energy (BLE) technology has been adapted to play a key role in almost all COVID-19 apps across the world.
We start with the current context of the pandemic, give a broad overview of Digital Contact Tracing Systems (DCTS) and then narrow our attention to the underlying network communication layer. We discuss how Bluetooth Low Energy (BLE) acts as the foundational technology for the majority of automated contact tracing systems, although it was not originally envisaged for such a purpose. We also outline some of the privacy and security issues that apply to device-to-device communication over BLE.
We are in the midst of a pandemic that seems unrelenting. Nation states all over the world are fighting a long and strenuous battle against the devious contagion. While mass vaccination appears to be the only long-term solution, countries are using various epidemiological tools to bring some semblance of control over the unpredictable nature of its spread. One such time-tested tool is called contact tracing.
In the human-led contact tracing process, a representative from the health authority interviews an infected individual and attempts to identify the people who might have come in close contact with that person (diagnosed positive) in the last 14 days or so (or as recommended by the epidemiologists). Subsequently, the health representative (or the contact tracer) contacts those identified individuals and informs them that they could be at risk of being infected. They are also advised to take the appropriate next steps, such as testing, quarantining or self-isolating.
Human-led contact tracing is error-prone since it depends upon the ability of the infected individual (let’s call him Bob) to recollect his close contacts from the past several days, many of whom could be absolute strangers to Bob. Moreover, when a pandemic is spreading fast, the health authority could struggle to scale up the manual contact tracing effort, as it may not be possible to recruit and train so many personnel in a short span of time.
Digital Contact Tracing Systems (DCTS) have been envisaged to bridge this gap, solve the challenges of missed contacts or limited scalability and thereby supplement the manual contact tracing process. Also, in the manual approach, the human contact tracers are expected to maintain the confidentiality of Bob’s identity from his close contacts, which can never be fool-proof. In a DCTS, such privacy and security measures can be implemented in a robust manner that can withstand the attacks from adversaries.
A brief overview of the architecture
Most of the current Digital Contact Tracing Systems are designed as smartphone apps that communicate with one or more back-end servers. So in most countries in the world, Bob and his close contacts must have smartphones to take advantage of any automated contact tracing system. People who carry feature phones or do not have personal mobile devices (including many elderly individuals or children) would still have to rely upon the manual contact tracing process. Digital Contact Tracing apps are also referred to as COVID-19 apps.
The core functionality of an automated contact tracing system is to reliably detect and notify close encounters with infected individuals without revealing anyone’s identity in the process. Since identity cannot be revealed, the apps are designed not to share Personally Identifiable Information (PII) like name, location, phone number or device identification number (e.g., IMEI) between any two devices. During close encounters, each app shares pseudo-random byte-strings (also called “proximity identifiers” or “ephemeral IDs”) with other apps and logs the byte-strings that it receives from nearby apps in its local device storage. These pseudo-random numbers act as pseudonymous identifiers (and not as anonymous identifiers, as they can be mapped back to the originating secret keys) and are rotated at regular intervals so that the movement of devices cannot be tracked through any of these identifiers. Figure 1 illustrates a sample exchange of 16-byte pseudonymous identifiers between two devices (A and B) that have the same COVID-19 app installed.
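To make the rotation concrete, here is a minimal Python sketch of how an app might derive a fresh 16-byte identifier for every time window from a secret key. It is only an illustration under stated assumptions: the HMAC-SHA256 construction, the 10-minute window, and the names proximity_id and ROTATION_SECONDS are ours and do not reproduce the key schedule of any particular DCT protocol.

```python
import hashlib
import hmac
import os

ROTATION_SECONDS = 600  # assumed rotation window of 10 minutes
ID_LENGTH = 16          # 16-byte identifiers, as in the exchange of Figure 1


def interval_number(unix_time: int) -> int:
    """Index of the rotation window that contains unix_time."""
    return unix_time // ROTATION_SECONDS


def proximity_id(secret_key: bytes, unix_time: int) -> bytes:
    """Derive the identifier for the window containing unix_time.

    Assumption: HMAC-SHA256 over the window index, truncated to 16 bytes.
    Real protocols define their own derivation.
    """
    counter = interval_number(unix_time).to_bytes(8, "big")
    return hmac.new(secret_key, counter, hashlib.sha256).digest()[:ID_LENGTH]


secret = os.urandom(32)           # stays on the device
t0 = 1_600_000_200                # a timestamp that starts a window
same_window = proximity_id(secret, t0) == proximity_id(secret, t0 + 599)
next_window = proximity_id(secret, t0) == proximity_id(secret, t0 + 600)
print(same_window, next_window)
```

Within one window the identifier stays the same, and it changes as soon as the next window begins. Anyone holding the secret key can regenerate the whole sequence, which is exactly why such identifiers are pseudonymous rather than anonymous.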
Conceptually, a COVID-19 app can be deemed to have three layers: the topmost layer (say Layer-1) takes care of the User Interface (UI), the middle layer (say Layer-2) is the API layer (most of the encoding-decoding and encrypting-decrypting functionalities reside here) and the bottom-most layer (Layer-3) is the link layer that supports the packet communication. Layer-3 is divided into two parts, each of which takes care of a particular type of communication. One type of information exchange occurs between the mobile device and the back-end server(s) through the encrypted mobile data network. The other type of packet exchange takes place between neighbouring devices that have the DCT app installed. In this article, we primarily focus on the technology and protocol used in this latter type of communication. Figure 2 illustrates a basic schematic for this architecture.
There are three crucial requirements for device-to-device communication in any DCTS. First, it should take place only when the devices are physically near each other, so that a device can realistically be considered a proxy for the person carrying it for the purpose of detecting close contact events. Second, no device should be able to discover the identity of any other device or, for that matter, any personally identifiable information of its user. Third, the communication should take place autonomously, without active intervention by the user at every step. In addition, the overhead of such information exchange should be low.
When we consider the above requirements, the usual channels of communication like mobile to mobile call, SMS or any other instant messaging service would have to be ruled out. What technology can then be used to meet the needs of DCT apps? As it turns out, some of the possible candidates are WiFi, ultrasound and Bluetooth.
A group of researchers created prototypes, deployed them on a couple of university campuses and published their results in a paper in May 2020, in which they proposed “a network-centric approach for contact tracing that relies on passive WiFi sensing”. Also, a company named Blyncsy claims to have come up with a contact tracing technology using WiFi.
Similarly, another approach is to use ultrasound, as it has been observed that smartphones emit ultrasound signals. Currently NOVID is the only team in the world that has come up with an app that utilizes ultrasound and possesses sub-meter level accuracy in contact detection. However, the NOVID app has WiFi and Bluetooth capabilities as well.
Bluetooth Low Energy (BLE) in DCTS
While there are examples of COVID-19 apps that utilize WiFi and ultrasound technologies (as discussed in the previous section), the large majority of these systems use Bluetooth as the underlying communication technology. Within Bluetooth, there is a low energy variant called Bluetooth Low Energy (BLE) that has caught the attention of most system designers and implementers. In the remaining part of this article, we discuss some of the basics of BLE and look closely at the ways in which different DCT solutions utilize it.
The Bluetooth Special Interest Group (Bluetooth SIG) originally proposed BLE as a wireless personal area network technology for “novel applications in the healthcare, fitness, beacons, security, and home entertainment industries.” BLE is natively supported by a host of mobile operating systems including iOS, Android, Windows Phone etc. It consumes much less power than classic Bluetooth, operates in the same spectrum range, and the two protocols can co-exist in a mobile device. A group of researchers showed through detailed simulations that Bluetooth communication is feasible for contact tracing during an epidemic, and published the results way back in 2014. During the current pandemic, this idea was adapted and expanded by multiple groups of experts and scientists from different countries, and most of these protocol designs considered BLE as the underlying technology for device-to-device communication.
It’s important to understand that BLE advertising packets are not encrypted; BLE offers link-layer encryption only on connections between paired devices, which is not how DCT apps talk to each other. So the packets exchanged between devices running these apps are transparent, and anyone may eavesdrop and observe the data contained in their payloads. However, in the case of COVID-19 apps this fact does not pose any direct threat to privacy or security. Let us understand why that is the case.
We have already mentioned that DCT apps share constantly rotating pseudo-random identifiers when any two devices come in proximity to each other. So when Alice and Bob meet for a brief period of 15 minutes in a park, Alice’s device (say A) emits a pseudo-random number (say RPI_1) for the first 10 minutes and then another number (say RPI_2) for the remaining 5 minutes, and Bob’s device (say B) detects the packets containing these numbers and stores them in its local storage. Similarly, A captures the numbers shared by B during this time window. Even if an eavesdropper (say Eve) intercepts some of the packets and stores the data, she would not be able to determine any Personally Identifiable Information (PII) of Alice or Bob solely from these packets. This is why the unencrypted nature of a BLE packet does not pose any serious complication. However, there could still be a possibility of a linkage attack, which we discuss a little later.
If nothing can be determined from any of these pseudo-random identifiers then one may ask how these data packets may eventually be utilized for the purpose of contact tracing. Let us take a moment and clarify this doubt.
Broadly speaking, in every DCTS there is a mechanism to store the secret keys, or some variants of those keys, either in the local storage of the devices or in a back-end server. When a user (say Bob) is diagnosed positive, the keys or the sent/received pseudo-random identifiers of his device (B) are analyzed to determine possible matches with the keys or the sent/received identifiers of another user’s (say Alice’s) device (A). The fact that these identifiers can be regenerated at any time from their originating keys makes the identifiers pseudonymous instead of anonymous. Hence, the system has a mechanism to determine close encounters through the use of secret keys, even though the pseudo-random identifiers on their own may appear to be gibberish to Eve, the eavesdropper.
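The matching step can be sketched in a few lines of Python. This is a hedged illustration of one possible variant, in which the keys of a diagnosed user are published and every device checks its own log locally; other systems perform the comparison on a back-end server. The HMAC-based derivation, the (key, first_interval, count) tuple format and the function names are assumptions made for this example only.

```python
import hashlib
import hmac

ID_LENGTH = 16  # 16-byte identifiers


def proximity_id(secret_key: bytes, interval: int) -> bytes:
    """Assumed derivation: truncated HMAC-SHA256 over the interval index."""
    counter = interval.to_bytes(8, "big")
    return hmac.new(secret_key, counter, hashlib.sha256).digest()[:ID_LENGTH]


def ids_for_key(secret_key: bytes, first_interval: int, count: int) -> set:
    """Regenerate every identifier the key produced over `count` intervals."""
    return {proximity_id(secret_key, first_interval + i) for i in range(count)}


def find_exposures(published_keys, observed_ids: set) -> set:
    """Return the logged identifiers that belong to any published key.

    published_keys: iterable of (secret_key, first_interval, count) tuples,
    a hypothetical format for the keys of diagnosed users.
    """
    matches = set()
    for key, first_interval, count in published_keys:
        matches |= ids_for_key(key, first_interval, count) & observed_ids
    return matches


# Bob (diagnosed positive) used this key for 144 ten-minute intervals.
bob_key = bytes(range(32))
# Alice's device logged two of Bob's identifiers and one from a stranger.
alice_log = {
    proximity_id(bob_key, 101),
    proximity_id(bob_key, 102),
    b"\x00" * ID_LENGTH,
}
matches = find_exposures([(bob_key, 100, 144)], alice_log)
print(len(matches), "logged identifiers came from a diagnosed user")
```

Note that Alice learns only that some of her logged identifiers belong to a diagnosed user. Without the key, the same identifiers remain meaningless byte-strings to Eve.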
The communication of BLE packets may take place either in a broadcast topology mode or in a connected topology mode.
In the broadcast mode, a device alternates between two separate roles: the broadcaster role and the observer role. In the broadcaster role, the device periodically sends advertising packets that can be received by any other device. Similarly, in the observer role the device periodically scans (at a pre-set interval) to receive advertising packets from other devices. Broadcasting works in a non-connectable fashion, and one device may send packets to multiple other devices. In the broadcast mode, the payload size is restricted to at most 31 bytes. There is an option to send an additional payload (called Scan Response) in reply to a scanning device, which extends the total size up to a maximum of 62 bytes. When a DCT protocol uses the broadcast mode, the app has no way of knowing, while transmitting proximity identifiers, whether any nearby device/app is actually receiving the packets.
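The 31-byte limit leaves very little room once the standard framing is accounted for. The following Python sketch packs an advertising payload out of length-type-value fields (known as AD structures) and rejects anything that exceeds the limit. The AD type codes are the standard Bluetooth assigned numbers; the service UUID, the flag value and the 4-byte metadata field are placeholders chosen for illustration and do not describe any specific app.

```python
MAX_ADV_PAYLOAD = 31                 # advertising payload limit in bytes

AD_TYPE_FLAGS = 0x01                 # Flags
AD_TYPE_COMPLETE_16BIT_UUIDS = 0x03  # Complete list of 16-bit service UUIDs
AD_TYPE_SERVICE_DATA_16BIT = 0x16    # Service data with a 16-bit UUID

EXAMPLE_SERVICE_UUID = 0x1234        # hypothetical UUID, for illustration only


def ad_structure(ad_type: int, data: bytes) -> bytes:
    """One AD structure: a length byte (type + data), the type, then the data."""
    return bytes([len(data) + 1, ad_type]) + data


def build_advertisement(service_uuid: int, identifier: bytes,
                        metadata: bytes = b"") -> bytes:
    """Pack flags, the service UUID and the service data into one payload."""
    uuid_le = service_uuid.to_bytes(2, "little")  # UUIDs are sent little-endian
    payload = (
        ad_structure(AD_TYPE_FLAGS, b"\x06")  # illustrative flag value
        + ad_structure(AD_TYPE_COMPLETE_16BIT_UUIDS, uuid_le)
        + ad_structure(AD_TYPE_SERVICE_DATA_16BIT, uuid_le + identifier + metadata)
    )
    if len(payload) > MAX_ADV_PAYLOAD:
        raise ValueError(f"payload is {len(payload)} bytes, limit is {MAX_ADV_PAYLOAD}")
    return payload


identifier = bytes(16)               # stand-in for a 16-byte proximity identifier
adv = build_advertisement(EXAMPLE_SERVICE_UUID, identifier, metadata=bytes(4))
print(len(adv))                      # 3 + 4 + 24 bytes of AD structures
```

With this framing, a 16-byte identifier plus 4 bytes of metadata fills the payload completely (3 + 4 + 24 = 31 bytes). A single extra byte would make build_advertisement raise an error, which is why protocol designers have to budget every field.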
In the connected topology mode, the device to device communication takes place in pairwise fashion, where one device plays the role of Central (or master) and the other plays the role of Peripheral (or slave). A device alternates between the two roles. The connection is established by the device playing the role of Central when it detects packets advertised by another device running in Peripheral mode. A Peripheral can connect with multiple Centrals at a time and a Central can also connect with multiple Peripherals. In this mode of communication, a device always knows whether any other device has actually received the packet during device-to-device communication or not.
The higher-layer DCT protocol can be designed in two different ways. In the first, the protocol depends only upon unidirectional packet communication. For example, if B receives a packet sent by A, it is not necessary for A to receive B’s packet for the DCT protocol to function properly. Almost all the implemented protocols fall in this category. In the second, the mutual exchange of packets is mandatory for the detection of “close contact” events; an example is the design called DESIRE by the PRIVATICS Team from Inria, France.
Each BLE packet includes a 48-bit address called the BLE MAC. Interestingly, it is also a pseudo-random number, and the BLE protocol layer rotates this address at a regular interval. Now consider a situation where the proximity identifier is rotated by a DCT app every 10 minutes but the BLE MAC remains unchanged across the boundary of a 10-minute interval. An observer who sees the same MAC accompanying first the old and then the new identifier can conclude that both identifiers belong to the same device. The same holds even when the BLE MAC also changes every 10 minutes but its intervals are not aligned with the intervals of the proximity identifier (see Table 1): an observer can easily link all the packets and identify that they have been emitted by the same device. This is a type of linkage attack.
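A small simulation shows why alignment matters. In the Python sketch below, an observer records one (MAC, identifier) pair per minute and links any two packets that share either value. The 30-minute observation window, the 10-minute periods and the 5-minute offset are illustrative choices, and the labels stand in for real addresses and identifiers.

```python
def observe(minutes: int, id_period: int, mac_period: int, mac_offset: int):
    """Return the (mac, identifier) labels an observer sees, one per minute."""
    packets = []
    for t in range(minutes):
        id_label = f"ID{t // id_period}"
        mac_label = f"MAC{(t + mac_offset) // mac_period}"
        packets.append((mac_label, id_label))
    return packets


def count_linked_groups(packets) -> int:
    """Count groups of packets that cannot be linked to one another.

    Two packets are linked when they share a MAC or an identifier.
    A simple union-find structure merges the labels transitively.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for mac, identifier in packets:
        parent[find(mac)] = find(identifier)
    return len({find(mac) for mac, _ in packets})


aligned = count_linked_groups(observe(30, id_period=10, mac_period=10, mac_offset=0))
misaligned = count_linked_groups(observe(30, id_period=10, mac_period=10, mac_offset=5))
print(aligned, misaligned)
```

When both values rotate at the same instant, the 30 minutes split into three groups that the observer cannot connect. With a 5-minute offset, every change of one value is bridged by the other, so all the packets collapse into a single group traceable to one device.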
This is a potential privacy risk and many currently rolled out automated contact tracing systems suffer from this issue. A typical example is BlueTrace, the protocol underneath Singapore’s TraceTogether app, which has not taken any special measure to counter this type of attack. On the other hand, the Apple Google Exposure Notification Framework has taken care of this issue by ensuring that the MAC address changes at the same time whenever the proximity identifier is rotated by the cryptographic API-layer. However, the way it has currently been implemented (by restarting the BLE layer every time the proximity identifier changes) appears suboptimal due to the associated performance and latency issues.
An important aspect that every DCT app must take care of during design and implementation is the variation of signal strength in BLE communication. The received signal strength can be used as a rough measure of the distance between two devices, so that a device logs an encounter only when the other device is close enough (say within a distance of 2 meters). At the same time, we must keep in mind that devices from different manufacturers use different levels of signal strength and there is no uniformity or standardization. This means every app must be well calibrated to interpret the signal strengths from nearby devices properly, or else there would be either too many “false positives” (where devices are too far apart in reality but the apps still log the encounters) or too many “false negatives” (where genuine close contact events would be missed due to conservative assumptions on signal strengths). TraceTogether has published extensive data on the signal strength calibration performed before the launch of the app.
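One common way to turn a signal strength reading into a distance estimate is the log-distance path loss model, sketched below in Python. The reference power of -59 dBm at 1 meter, the path loss exponent of 2.0 and the per-device calibration offset are assumed values for illustration; they are not taken from any published calibration data, and real environments (pockets, walls, bodies) make the estimate noisy.

```python
def estimate_distance_m(rssi_dbm: float,
                        tx_power_at_1m_dbm: float = -59.0,
                        path_loss_exponent: float = 2.0) -> float:
    """Log-distance path loss model: d = 10 ** ((P_1m - RSSI) / (10 * n))."""
    return 10 ** ((tx_power_at_1m_dbm - rssi_dbm) / (10 * path_loss_exponent))


def is_close_contact(rssi_dbm: float,
                     device_offset_db: float = 0.0,
                     threshold_m: float = 2.0) -> bool:
    """Apply a per-device calibration offset, then compare with the threshold.

    device_offset_db is a hypothetical correction for a phone model that
    transmits or receives weaker (positive offset) or stronger (negative).
    """
    calibrated_rssi = rssi_dbm + device_offset_db
    return estimate_distance_m(calibrated_rssi) <= threshold_m


print(is_close_contact(-65.0))                        # strong signal
print(is_close_contact(-70.0))                        # weaker signal
print(is_close_contact(-70.0, device_offset_db=6.0))  # same reading, calibrated
```

The last two calls use the same raw reading of -70 dBm, yet they disagree once a 6 dB correction for a weakly transmitting device is applied. This is precisely how a missing calibration turns into false negatives, and an overly generous one into false positives.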
Let us investigate the payload part of a BLE packet. There is no uniform agreement about its structure among different COVID-19 apps and hence it varies from one DCT protocol to the other. Undoubtedly the most important part of the payload is the pseudo-random proximity identifier. Again the length of the identifier is not the same across different systems. Apart from the identifier, most of the protocols have a provision to include certain meta-data like the transmitting signal strength, protocol version, time-stamp etc. It is advisable to encrypt the meta-data so that no active attacker (i.e., an adversary who can not only eavesdrop but also modify a packet and resend) may create a spurious packet by combining the identifier from one packet and meta-data from another. In some of the protocols, the payload also contains an additional Message Authentication Code to ensure the authenticity of the entire payload irrespective of whether the packet includes any meta-data or not.
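The mix-and-match attack described above can be illustrated with a short Python sketch in which a truncated authentication tag is computed over the identifier and the metadata together. The 4-byte tag length, the field sizes and the function names are assumptions for this example. Note also that only a party that knows the key can verify the tag, so in practice verification would happen at the back-end or after the keys become available, depending on the protocol.

```python
import hashlib
import hmac

ID_LENGTH = 16   # proximity identifier size in bytes
TAG_LENGTH = 4   # assumed truncation so the tag fits in a small payload


def seal(mac_key: bytes, identifier: bytes, metadata: bytes) -> bytes:
    """Append a truncated HMAC-SHA256 tag covering identifier and metadata."""
    body = identifier + metadata
    tag = hmac.new(mac_key, body, hashlib.sha256).digest()[:TAG_LENGTH]
    return body + tag


def verify(mac_key: bytes, payload: bytes) -> bool:
    """Recompute the tag over the body and compare in constant time."""
    body, tag = payload[:-TAG_LENGTH], payload[-TAG_LENGTH:]
    expected = hmac.new(mac_key, body, hashlib.sha256).digest()[:TAG_LENGTH]
    return hmac.compare_digest(tag, expected)


key = b"k" * 32
packet_1 = seal(key, b"\x01" * ID_LENGTH, b"\x10\x20\x30\x40")
packet_2 = seal(key, b"\x02" * ID_LENGTH, b"\xaa\xbb\xcc\xdd")

# An active attacker splices the identifier of packet 1 onto the
# metadata and tag of packet 2.
spliced = packet_1[:ID_LENGTH] + packet_2[ID_LENGTH:]

print(verify(key, packet_1), verify(key, spliced))
```

Because the tag binds the identifier and the metadata together, the spliced packet fails verification while the untouched packet passes. A short tag keeps the payload small at the cost of a weaker guarantee, which is one more trade-off imposed by the packet-size limit.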
What could be the future direction of BLE technology when it comes to the use case of automated contact tracing systems? The Bluetooth Special Interest Group (Bluetooth SIG) has already announced that it would soon come up with a specification targeted at wearable devices, so that exposure notification functionality can be extended to embedded systems. We may expect that current drawbacks such as the unpredictable rotation frequency of the BLE MAC address, the lack of encryption for advertised packets and the packet-size limitation will be addressed in future versions. There could also be better predictability of signal strength variation over distance for each BLE implementation, irrespective of the OS type (e.g., Android or iOS or any other), the exact version or the manufacturer of the device. While we wait to see these changes, who knows: we may be surprised to see an altogether new technology replacing Bluetooth Low Energy and proving to be a better alternative in the near future. Let us wish ourselves Godspeed on that hope!