VisibleV8 on Browser Fingerprinting Detection
Alexandros Kapravelos, Junhua Su
Visiting websites with a browser is part of our daily routine. However, this simple action involves countless components hidden beneath a polished user interface. Looking deeper, visiting a website is an exchange between a client and a web server in which the browser plays the central role. On the client side, the browser accepts the user’s input and sends requests to the server the user wants to visit. On the server side, the browser receives the server’s code and executes it on the user’s machine to render and display the webpage.
Online advertising is growing rapidly, driven by billions of web users. To increase advertising profit, trackers use tracking techniques for targeted advertising. Browser fingerprinting is a stateless tracking technique that collects personal information from users. Most browser fingerprinting techniques rely on browser APIs. Browser APIs were originally created to help web developers implement advanced features; for example, they can report the browser type, language, and installed plug-ins. In 2010, Eckersley found that users tend to have distinct combinations of browser languages, plug-ins, and other configuration settings. Each of these APIs returns information that splits users into groups. By combining the information returned by many such APIs, a tracker can uniquely identify users without their consent or even their notice. The more APIs returning identifying information a tracker uses, the more users it can fingerprint and the higher its accuracy.
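The effect of combining attributes can be illustrated with a toy calculation. The population below is synthetic and the three attributes (language, plug-in count, screen width) are illustrative assumptions, not measured data; the function reports the fraction of users whose attribute combination is shared by nobody else.

```python
from collections import Counter

# Synthetic, hypothetical population: (browser language, plug-in count, screen width).
USERS = [
    ("en-US", 3, 1920),
    ("en-US", 3, 1366),
    ("en-US", 5, 1920),
    ("de-DE", 3, 1920),
    ("de-DE", 5, 1366),
    ("fr-FR", 3, 1920),
]

def fraction_unique(users, num_attributes):
    """Fraction of users whose first `num_attributes` values form a
    combination that no other user shares."""
    keys = [u[:num_attributes] for u in users]
    counts = Counter(keys)
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(users)

for n in range(1, 4):
    print(n, fraction_unique(USERS, n))
```

On this toy population the unique fraction rises from 1/6 with one attribute to 4/6 with two and to 1 with all three, mirroring the point that every additional API narrows the groups users fall into.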
We dive deeper into the browser and study how scripts use its functionality with the help of VisibleV8, an instrumented version of Chrome’s V8 JavaScript engine that logs the browser APIs accessed by scripts. We build an automated system that discovers the browser APIs involved in browser fingerprinting. We envision browser vendors incorporating our system to regularly monitor the privacy risks that their released browser features pose in the wild.
As shown in Figure 1, our system starts by feeding the Top 10K domains into a crawler built on VisibleV8. Next, a new post-processor extracts the relevant data from the VisibleV8 logs. We then use a locality algorithm to find APIs that execute close to known fingerprinting APIs, and apply data flow analysis based on JStap to verify the results of the locality algorithm. Finally, we obtain a list of APIs suspected of being used for fingerprinting.
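The post-processing step can be sketched as follows. This is a minimal sketch, not our post-processor: the real VisibleV8 log format is richer, so the code assumes a hypothetical, simplified format with one API access per line, tab-separated as script ID, access kind, and Interface.member. It groups accesses into per-script sequences in execution order, which is the input the locality step works on.

```python
from collections import defaultdict

# HYPOTHETICAL simplified log format (the real VisibleV8 log format differs):
# one API access per line, tab-separated as <script_id> <kind> <Interface.member>,
# where <kind> is one of get, set, call, new.
SAMPLE_LOG = (
    "s1\tget\tNavigator.userAgent\n"
    "s1\tget\tNavigator.language\n"
    "s2\tcall\tDocument.querySelector\n"
    "s1\tcall\tWindow.btoa\n"
    "s1\tget\tScreen.width\n"
    "this line is malformed\n"
)
VALID_KINDS = {"get", "set", "call", "new"}

def parse_log(text):
    """Group API accesses into per-script sequences, preserving execution order."""
    sequences = defaultdict(list)
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) != 3 or parts[1] not in VALID_KINDS:
            continue  # skip blank or malformed lines instead of aborting the run
        script_id, _kind, api = parts
        sequences[script_id].append(api)
    return dict(sequences)

print(parse_log(SAMPLE_LOG))
```

Skipping malformed lines rather than raising keeps one corrupt log entry from discarding an entire crawl.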
The core of our system, the locality algorithm, is motivated by the observation that programmers normally write the code in a file to achieve a single goal. In addition, browser fingerprinting has a special characteristic: collecting more information helps uniquely identify more people, and adopting more fingerprinting APIs brings higher accuracy and wider coverage at no noticeable cost. Taken together, these observations suggest that a tracker tends to use many fingerprinting APIs in one script. When that script runs, these APIs execute in close succession. In other words, fingerprinting APIs should cluster together in the sequences of executed APIs collected during crawling.
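The clustering intuition can be turned into a simple co-occurrence count. The sketch below is a hypothetical illustration, not the exact locality algorithm: the seed set of known fingerprinting APIs, the window of 3 positions, and the threshold min_hits=2 are all assumptions chosen for the example. For every occurrence of a known fingerprinting API in a script’s execution sequence, it records which other APIs appear within the window, and reports those that co-occur often enough.

```python
from collections import Counter

def locality_candidates(sequences, known_fp, window=3, min_hits=2):
    """Return non-seed APIs that execute within `window` positions of a known
    fingerprinting API at least `min_hits` times across all scripts.

    sequences: dict mapping a script ID to its ordered list of executed APIs.
    known_fp:  set of known (seed) fingerprinting APIs.
    window, min_hits: hypothetical tuning parameters.
    """
    hits = Counter()
    for seq in sequences.values():
        for i, api in enumerate(seq):
            if api not in known_fp:
                continue
            lo, hi = max(0, i - window), min(len(seq), i + window + 1)
            # Each neighbouring API is counted at most once per seed occurrence.
            neighbours = {seq[j] for j in range(lo, hi)
                          if j != i and seq[j] not in known_fp}
            hits.update(neighbours)
    return {api: n for api, n in hits.items() if n >= min_hits}

# Hypothetical execution sequences for two scripts.
KNOWN_FP = {"Navigator.userAgent", "Screen.width", "HTMLCanvasElement.toDataURL"}
SEQUENCES = {
    "fp.js": ["Navigator.userAgent", "Navigator.hardwareConcurrency", "Screen.width",
              "Window.btoa", "HTMLCanvasElement.toDataURL"],
    "ui.js": ["Document.querySelector", "Element.addEventListener",
              "Document.createElement"],
}
print(locality_candidates(SEQUENCES, KNOWN_FP))
```

On this toy input, Navigator.hardwareConcurrency and Window.btoa are each counted three times and are reported, while the UI script contributes nothing because it never touches a seed API. Candidates found this way are still unverified; in the system described here, data flow analysis checks them afterwards.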
Applying our system to the Top 10K websites yields the data summarized in Figure 2. The second row shows the number of origins in these websites, which exceeds the number of domains by around three thousand. This is expected, because visiting a single domain can involve multiple origins, both first-party and third-party. The crawl collected 474,659 scripts, which made over 0.7 billion API calls in total. Because the same API can be called many times, removing duplicates leaves over 0.2 million unique APIs. Using the WebIDL definitions, we identify over 0.2 billion of these calls as browser API calls, around one-third of the total, covering more than two thousand unique browser APIs.
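The counting behind these figures separates total calls, unique APIs, and the subset defined in WebIDL. A minimal sketch of that bookkeeping follows, assuming API names have already been normalized to Interface.member form and that the WebIDL definitions have been parsed into a set of such names (both hypothetical preprocessing steps); the inputs are a toy trace, not crawl data.

```python
def crawl_stats(calls, webidl_members):
    """Summarise API calls (one entry per call, duplicates allowed) and the
    subset whose names appear in the WebIDL-derived set of browser APIs."""
    browser_calls = [c for c in calls if c in webidl_members]
    return {
        "total_calls": len(calls),
        "unique_apis": len(set(calls)),
        "browser_calls": len(browser_calls),
        "unique_browser_apis": len(set(browser_calls)),
        "browser_share": len(browser_calls) / len(calls) if calls else 0.0,
    }

# Toy inputs: a tiny hypothetical WebIDL subset and a short call trace.
# JSON.stringify is a JavaScript built-in, not a WebIDL-defined browser API.
WEBIDL_MEMBERS = {"Navigator.userAgent", "Document.createElement"}
CALLS = ["Navigator.userAgent", "myLib.init", "Navigator.userAgent",
         "JSON.stringify", "Document.createElement", "myLib.init"]
print(crawl_stats(CALLS, WEBIDL_MEMBERS))
```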
Our system discovers 249 browser APIs that are abused for browser fingerprinting. We analyze the fingerprinting ability of these discovered browser APIs in the following paragraphs.
The Window interface represents the global object, which corresponds to the browser window containing the DOM document. Because it is global, it is easy to access from any script, including third-party ones. Its close connection to the browser window also gives trackers an interface close to users’ personalized information. The Window object itself can be used for fingerprinting by enumerating its properties and hashing the set of features it exposes. This works because different browser types and versions expose different features on the Window object.
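The enumerate-and-hash idea can be illustrated outside the browser. In a page, a script could obtain the property names with Object.getOwnPropertyNames(window); the sketch below assumes that list has already been collected and uses two hypothetical, heavily shortened property lists in place of real Window objects. Deduplicating and sorting the names before hashing makes the result independent of enumeration order, so only the set of exposed features matters.

```python
import hashlib

def feature_set_hash(property_names):
    """SHA-256 of the sorted, deduplicated property names."""
    canonical = "\n".join(sorted(set(property_names)))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Hypothetical, shortened property lists standing in for two browsers' Window objects.
browser_a = ["document", "navigator", "fetch", "visualViewport"]
browser_b = ["document", "navigator", "fetch"]

print(feature_set_hash(browser_a) == feature_set_hash(browser_b))  # prints False
```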
During enumeration, each available feature is accessed or called in turn, and this access pattern is what our system expects to catch. The same technique applies to objects with many features, such as Navigator, WebGL, and Canvas; in principle, any object with its own properties and functions can be targeted this way. The well-known feature-detection library Modernizr can help implement this technique.
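In an execution log, enumeration shows up as a burst of accesses to many different members of the same interface. The following is an illustrative heuristic for spotting such bursts in a per-script sequence, not the locality algorithm itself; the threshold min_distinct=5 is an arbitrary assumption.

```python
def enumeration_bursts(sequence, min_distinct=5):
    """Return (interface, distinct_member_count) for every run of consecutive
    accesses to one interface that touches at least `min_distinct` members."""
    bursts = []
    run_iface, run_members = None, set()
    for api in list(sequence) + [None]:  # None is a sentinel that flushes the last run
        iface = api.split(".", 1)[0] if api is not None else None
        if iface != run_iface:
            if run_iface is not None and len(run_members) >= min_distinct:
                bursts.append((run_iface, len(run_members)))
            run_iface, run_members = iface, set()
        if api is not None:
            run_members.add(api)
    return bursts

# Hypothetical trace of a script iterating over Navigator's features.
TRACE = ["Navigator.userAgent", "Navigator.language", "Navigator.platform",
         "Navigator.vendor", "Navigator.plugins", "Window.btoa", "Document.title"]
print(enumeration_bursts(TRACE))  # prints [('Navigator', 5)]
```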
We now summarize the categories of Window APIs found by our system. One category provides window-specific information, mainly the size of the browser window and its position on the screen. Representative APIs are Window.screenLeft, Window.innerHeight, and Window.outerHeight. These values are reported in pixels, a fine-grained unit: differences too small for the human eye to notice can still distinguish users.
Another category consists of Window properties that are themselves objects with fingerprinting features: the Performance object (window.performance), the Navigator object (window.navigator), and the VisualViewport object (window.visualViewport). A few more APIs deserve special mention. Our system catches Window.atob and Window.btoa, which decode and encode Base64 respectively and seem unrelated to fingerprinting. However, trackers that use browser fingerprinting can rely on obfuscation and minification to hide their purpose, and these two APIs are widely used in obfuscation. When a tracker obfuscates its fingerprinting script, these two APIs execute close to a chunk of fingerprinting APIs and are therefore captured by our system.
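The following sketch shows why atob tends to appear next to fingerprinting accesses in an obfuscated script. Python’s base64 module stands in for the browser functions: the helpers btoa and atob below mirror the names of Window.btoa and Window.atob but are our own ASCII-only analogues, and the trace is simulated rather than produced by VisibleV8.

```python
import base64

def btoa(text):
    """ASCII-only analogue of Window.btoa: encode a string as Base64."""
    return base64.b64encode(text.encode("ascii")).decode("ascii")

def atob(data):
    """ASCII-only analogue of Window.atob: decode a Base64 string."""
    return base64.b64decode(data).decode("ascii")

# The obfuscated script ships only encoded property names ...
ENCODED = [btoa(name) for name in ("userAgent", "language", "platform")]

# ... and decodes each one just before reading it, so in the (simulated)
# execution trace every atob call sits right next to a Navigator access.
trace = []
for encoded in ENCODED:
    trace.append("Window.atob")
    trace.append("Navigator." + atob(encoded))
print(trace)
```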
The Navigator object, accessed through window.navigator, contains information about the user agent and exposes many APIs that assist fingerprinting. Trackers use these APIs frequently because they usually return distinguishing information. For example, trackers can infer whether a gamepad, VR device, or keyboard is connected to the user’s machine through the corresponding Navigator APIs. The values of Navigator.onLine, Navigator.webdriver, and Navigator.userAgentData can directly distinguish online from offline browsers, automated crawlers from humans, and different operating systems. Navigator.webkitPersistentStorage and Navigator.webkitTemporaryStorage provide a way to access file-system storage or to reveal the storage limits of a given browser.
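The distinctions listed above amount to a mapping from raw Navigator values to coarse labels, each of which splits users into groups. The NavigatorSnapshot class and coarse_profile function are hypothetical helpers for illustration. Note that a false Navigator.webdriver only means the browser is not flagged as automated, not that a human is present, and that Navigator.userAgentData is currently available only in Chromium-based browsers.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NavigatorSnapshot:
    """Hypothetical values a script might read from Navigator."""
    on_line: bool     # Navigator.onLine
    webdriver: bool   # Navigator.webdriver
    ua_platform: str  # Navigator.userAgentData.platform ("" if unavailable)

def coarse_profile(nav):
    """Map raw Navigator values to coarse labels that partition visitors."""
    return (
        "online" if nav.on_line else "offline",
        "webdriver-flagged" if nav.webdriver else "not-flagged",
        nav.ua_platform or "unknown-platform",
    )

crawler = NavigatorSnapshot(on_line=True, webdriver=True, ua_platform="Linux")
visitor = NavigatorSnapshot(on_line=True, webdriver=False, ua_platform="Windows")
print(coarse_profile(crawler))
print(coarse_profile(visitor))
```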
Fingerprinting techniques based on Performance APIs are relatively more complicated. One approach is to measure the time spent on a set of computationally heavy tasks. Because machines differ in their CPUs and GPUs, computation speed varies from machine to machine. Trackers can abuse Performance.now and Performance.measure to measure this execution time.
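A timing probe of this kind can be sketched in Python, with time.perf_counter playing the role of Performance.now. The workload and the number of repetitions are arbitrary choices for illustration; taking the median of several runs damps noise from other processes. Browsers deliberately coarsen the resolution of Performance.now, so an in-browser probe likely needs longer workloads than this native one.

```python
import statistics
import time

def heavy_task(n=200_000):
    """A CPU-bound workload whose duration depends on the executing hardware."""
    total = 0
    for i in range(n):
        total += (i * i) % 7
    return total

def timing_signature(repeats=5):
    """Median duration of heavy_task in milliseconds over `repeats` runs."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()  # stands in for Performance.now()
        heavy_task()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)

print(f"median runtime: {timing_signature():.2f} ms")
```

The absolute number means little on its own; a tracker would combine it with other attributes, since runtimes on the same machine still fluctuate between visits.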
In summary, our system combines VisibleV8 and JStap to discover 249 fingerprinting browser APIs by crawling the Top 10K websites. It can be run continuously to monitor browser API abuse in the wild. We are glad to share our system and help the community better understand browser fingerprinting. If you are interested in more information, please contact the authors.