7 in 10 smartphone apps share your data with third-party services
 
Where are all the data going? nmedia via shutterstock.com
Our mobile phones can reveal a lot about ourselves: where we live and work; who our family, friends and acquaintances are; how (and even what) we communicate with them; and our personal habits. With all the information stored on them, it isn’t surprising that mobile device users take steps to protect their privacy, like using PINs or passcodes to unlock their phones.
The research that we and our colleagues are doing identifies and explores a significant threat that most people miss: More than 70 percent of smartphone apps are reporting personal data to third-party tracking companies like Google Analytics, the Facebook Graph API or Crashlytics.
When people install a new Android or iOS app, it asks the user’s permission before accessing personal information. Generally speaking, this is positive. And some of the information these apps collect is necessary for them to work properly: A map app wouldn’t be nearly as useful if it couldn’t use GPS data to get a location.
But once an app has permission to collect that information, it can share your data with anyone the app’s developer wants to – letting third-party companies track where you are, how fast you’re moving and what you’re doing.
The help, and hazard, of code libraries
An app doesn’t just collect data to use on the phone itself. Mapping apps, for example, send your location to a server run by the app’s developer to calculate directions from where you are to a desired destination.
The app can send data elsewhere, too. As with websites, many mobile apps are written by combining various functions, precoded by other developers and companies, in what are called third-party libraries. These libraries help developers track user engagement, connect with social media and earn money by displaying ads and other features, without having to write them from scratch.
However, in addition to their valuable help, most libraries also collect sensitive data and send it to their online servers – or to another company altogether. Successful library authors may be able to develop detailed digital profiles of users. For example, a person might give one app permission to know their location, and another app access to their contacts. These are initially separate permissions, one to each app. But if both apps used the same third-party library and shared different pieces of information, the library’s developer could link the pieces together.
Users would never know, because apps aren’t required to tell users what software libraries they use. And only very few apps make public their policies on user privacy; if they do, it’s usually in long legal documents a regular person won’t read, much less understand.
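The linking described above can be illustrated with a minimal Python sketch. It assumes a shared library receives reports from two apps that carry the same device identifier; the app names, field names and values are all invented for illustration and do not come from any real library.

```python
from collections import defaultdict

# Hypothetical reports a shared third-party library might receive from
# different apps. All names and values here are made up.
reports = [
    {"app": "weather_app", "device_id": "abc123", "location": (37.87, -122.27)},
    {"app": "chat_app", "device_id": "abc123", "contacts": ["alice", "bob"]},
    {"app": "chat_app", "device_id": "zzz999", "contacts": ["carol"]},
]


def link_profiles(reports):
    """Merge per-app reports into one profile per device identifier."""
    profiles = defaultdict(dict)
    for report in reports:
        profile = profiles[report["device_id"]]
        # Remember which apps contributed to this profile.
        profile.setdefault("apps", []).append(report["app"])
        # Copy every data field except the bookkeeping keys.
        for key, value in report.items():
            if key not in ("app", "device_id"):
                profile[key] = value
    return dict(profiles)


profiles = link_profiles(reports)
```

In this toy data, the profile for device `abc123` ends up holding both a location and a contact list, even though the user granted each permission to a different app.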
Developing Lumen
Our research seeks to reveal how much data are potentially being collected without users’ knowledge, and to give users more control over their data. To get a picture of what data are being collected and transmitted from people’s smartphones, we developed a free Android app of our own, called the Lumen Privacy Monitor. It analyzes the traffic apps send out, to report which applications and online services actively harvest personal data.
Because Lumen is about transparency, a phone user can see the information installed apps collect in real time and with whom they share these data. We try to show the details of apps’ hidden behavior in an easy-to-understand way. It’s about research, too, so we ask users if they’ll allow us to collect some data about what Lumen observes their apps are doing – but that doesn’t include any personal or privacy-sensitive data. This unique access to data allows us to study how mobile apps collect users’ personal data and with whom they share data at an unprecedented scale.
In particular, Lumen keeps track of which apps are running on users’ devices, whether they are sending privacy-sensitive data out of the phone, what internet sites they send data to, the network protocol they use and what types of personal information each app sends to each site. Lumen analyzes apps’ traffic locally on the device, and anonymizes these data before sending them to us for study: If Google Maps registers a user’s GPS location and sends that specific address to maps.google.com, Lumen tells us, “Google Maps got a GPS location and sent it to maps.google.com” – not where that person actually is.
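The idea of reporting what kind of data was sent, but not the data itself, can be sketched in a few lines of Python. This is not Lumen's actual code: the `Observation` class, its field names and the `anonymize` function are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One hypothetical on-device observation of an app sending data."""
    app: str
    data_type: str
    value: str        # the sensitive value itself; never leaves the device
    destination: str


def anonymize(obs):
    """Keep the app, data type and destination; drop the sensitive value."""
    return {
        "app": obs.app,
        "data_type": obs.data_type,
        "destination": obs.destination,
    }


# Coordinates are invented example values.
obs = Observation("Google Maps", "GPS location", "37.8716,-122.2727", "maps.google.com")
summary = anonymize(obs)
```

The resulting `summary` says that Google Maps sent a GPS location to maps.google.com, but it contains no coordinates.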
Trackers are everywhere
            
Lumen’s user interface, showing the data leakages and their privacy risks, found for a mobile Android game called ‘Odd Socks.’
ICSI, CC BY-ND
More than 1,600 people who have used Lumen since October 2015 allowed us to analyze more than 5,000 apps. We discovered 598 internet sites likely to be tracking users for advertising purposes, including social media services like Facebook, large internet companies like Google and Yahoo, and online marketing companies under the umbrella of internet service providers like Verizon Wireless.
            
Lumen’s explanation of a leak of a device’s Android ID.
ICSI, CC BY-ND
We found that more than 70 percent of the apps we studied connected to at least one tracker, and 15 percent of them connected to five or more trackers. One in every four trackers harvested at least one unique device identifier, such as the phone number or its device-specific unique 15-digit IMEI number. Unique identifiers are crucial for online tracking services because they can connect different types of personal data provided by different apps to a single person or device. Most users, even privacy-savvy ones, are unaware of those hidden practices.
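Shares like these could be computed along the following lines. The sketch assumes each app's observed destinations are available as a set of domains and that a list of known tracker domains exists; the app names, domains and the `tracker_shares` function are hypothetical, and the toy numbers are not our results.

```python
def tracker_shares(app_domains, tracker_domains):
    """Return the fraction of apps contacting >= 1 and >= 5 known trackers."""
    # Count how many distinct known trackers each app contacted.
    counts = {
        app: len(domains & tracker_domains)
        for app, domains in app_domains.items()
    }
    total = len(counts)
    at_least_one = sum(1 for c in counts.values() if c >= 1) / total
    five_or_more = sum(1 for c in counts.values() if c >= 5) / total
    return at_least_one, five_or_more


# Invented tracker list and per-app traffic, for illustration only.
trackers = {f"tracker{i}.example" for i in range(1, 7)}
app_domains = {
    "app_a": {f"tracker{i}.example" for i in range(1, 6)} | {"api.app-a.example"},
    "app_b": {"tracker1.example", "cdn.app-b.example"},
    "app_c": {"api.app-c.example"},
    "app_d": {"tracker2.example", "tracker3.example"},
}

at_least_one, five_or_more = tracker_shares(app_domains, trackers)
```

With this toy input, three of the four apps contact at least one tracker and one of them contacts five.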
More than just a mobile problem
Tracking users on their mobile devices is just part of a larger problem. More than half of the app-trackers we identified also track users through websites. Thanks to this technique, called “cross-device” tracking, these services can build a much more complete profile of your online persona.
And individual tracking sites are not necessarily independent of others. Some of them are owned by the same corporate entity – and others could be swallowed up in future mergers. For example, Alphabet, Google’s parent company, owns several of the tracking domains that we studied, including Google Analytics, DoubleClick and AdMob, and through them collects data from more than 48 percent of the apps we studied.
            
Data transfers observed between locations of Lumen users (left) and third-party server locations (right). Traffic frequently crosses international boundaries.
ICSI, CC BY-ND
Users’ online identities are not protected by their home country’s laws. We found data being shipped across national borders, often ending up in countries with questionable privacy laws. More than 60 percent of connections to tracking sites are made to servers in the U.S., U.K., France, Singapore, China and South Korea – six countries that have deployed mass surveillance technologies. Government agencies in those places could potentially have access to these data, even if the users are in countries with stronger privacy laws such as Germany, Switzerland or Spain.
            
Connecting a device’s MAC address to a physical address (belonging to ICSI) using Wigle.
ICSI, CC BY-ND
Even more disturbingly, we have observed trackers in apps targeted at children. By testing 111 kids’ apps in our lab, we observed that 11 of them leaked a unique identifier, the MAC address, of the Wi-Fi router they were connected to. This is a problem, because it is easy to search online for physical locations associated with particular MAC addresses. Collecting private information about children, including their location, accounts and other unique identifiers, potentially violates the Federal Trade Commission’s rules protecting children’s privacy.
Just a small look
Although our data include many of the most popular Android apps, they cover a small sample of users and apps, and therefore likely only a small share of all possible trackers. Our findings may be merely scratching the surface of what is likely to be a much larger problem that spans regulatory jurisdictions, devices and platforms.
It’s hard to know what users might do about this. Blocking sensitive information from leaving the phone may impair app performance or user experience: An app may refuse to function if it cannot load ads. Blocking ads also hurts app developers by denying them a source of revenue to support their work on apps, which are usually free to users.
If people were more willing to pay developers for apps, that might help, though it’s not a complete solution. We found that while paid apps tend to contact fewer tracking sites, they still do track users and connect with third-party tracking services.
Transparency, education and strong regulatory frameworks are the key. Users need to know what information about them is being collected, by whom, and what it’s being used for. Only then can we as a society decide what privacy protections are appropriate, and put them in place. Our findings, and those of many other researchers, can help turn the tables and track the trackers themselves.

Narseo Vallina-Rodriguez receives funding from NSF and DataTransparencyLab.
Srikanth Sundaresan receives funding from the Hewlett Foundation and the Princeton CITP IoT Security and Privacy Consortium, and has received funding in the past from the National Science Foundation.