VisibleV8 on Browser Fingerprinting Detection

December 5, 2022 Staff 7-min. read

Alexandros Kapravelos, Junhua Su

Visiting websites with a browser is part of our daily routine. Yet this simple action involves countless components hidden beneath a polished user interface. Looking closer, visiting a website is a client contacting a web server, with the browser sitting between the two. On the client side, the browser accepts the user’s input and sends requests to the server the user wants to visit. In the other direction, the browser receives the server’s code and executes it on the user’s machine to render and display the webpage.

Everything is simple as long as everyone behaves appropriately. Security researchers know that, with malicious adversaries around, this is never the case. If a server can gain an advantage over its clients by executing code, clients need a tool that shows what code runs on their machines. In the web setting, JavaScript is the dominant programming language, and V8 is the JavaScript engine that executes the code the server provides. If we modify V8 to record an execution trace, we get a reliable way to monitor script behavior from the client side and even to detect some malicious behavior. Based on this intuition, Jordan Jueckstock and Alexandros Kapravelos proposed VisibleV8, a system that instruments the V8 engine to monitor the application programming interfaces (APIs) executed when visiting a website. The term API means different things in different settings. In this post, we focus on the APIs that browser vendors implement to assist web developers, and we call them browser APIs from here on. The exhaustive list of browser APIs is defined by the WebIDL standards. VisibleV8 can also capture APIs beyond browser APIs. The precise description, source code, and related papers are available on its website.

Online advertising has grown rapidly thanks to billions of web users. To earn more from advertising, companies use tracking techniques for targeted advertising. Browser fingerprinting is a stateless tracking technique that collects personal information from users. Most browser fingerprinting techniques are built on browser APIs. Browser APIs were originally created to help web developers implement advanced features. For example, they can report the browser type, language, and plugins. In 2010, Eckersley found that different users tend to have different combinations of browser languages, plug-ins, and other configuration settings. Each of these APIs returns information that separates users into groups. By combining the results of many such APIs, a tracker can uniquely identify users without their consent or even their knowledge. The more APIs returning identifying information a tracker uses, the more users it can fingerprint and the higher its accuracy.

Innovation on the web relies on browser vendors constantly shipping new features. Yet when new browser APIs are released, no system monitors whether they are abused for browser fingerprinting. A seemingly benign API such as the Battery API, which was designed to expose the percentage of battery left on the client’s device to JavaScript, was discovered to be actively abused in the wild for fingerprinting. Understanding how a single browser API reveals information about users is fundamental to recognizing trackers. Likewise, knowing how many and which browser APIs can help fingerprinting is a necessary step toward restricting fingerprinting and eventually disabling it.

We dive deeper into the browser and study how scripts use its functionality with the help of VisibleV8. We build an automated system that discovers the browser APIs involved in browser fingerprinting. We envision browser vendors incorporating our system to regularly monitor the privacy risks of released browser features in the wild.

As shown in Figure 1, our system starts by feeding the Top 10K domains into a crawler equipped with VisibleV8. We then extract useful data from the VisibleV8 logs with a new post-processor. Next, we use a locality algorithm to find APIs executed close to known fingerprinting APIs. We apply data flow analysis based on JStap to verify the results of the locality algorithm. Finally, we obtain a list of APIs suspected of fingerprinting.

The core of our system, the locality algorithm, is motivated by the observation that programmers normally put the code that achieves one goal in one file. Browser fingerprinting also has the special characteristic that collecting more information helps uniquely identify more people, and adopting more fingerprinting APIs brings higher accuracy and more coverage at no noticeable cost. Putting these together, we conclude that a tracker tends to use a considerable number of fingerprinting APIs in one script. When that script runs, those fingerprinting APIs execute in sequence. In other words, fingerprinting APIs should cluster together in the sequences of executed APIs collected during crawling.
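To make the clustering intuition concrete, here is a minimal Python sketch. It is not the system's actual implementation: the function name `locality_candidates`, the `window` and `min_hits` parameters, and the toy trace are all hypothetical choices made for illustration. It flags an API as a candidate when it repeatedly executes within a few positions of an already-known fingerprinting API.

```python
from collections import Counter


def locality_candidates(trace, known, window=2, min_hits=2):
    """Return APIs seen at least `min_hits` times within `window`
    positions of a known fingerprinting API (hypothetical heuristic)."""
    hits = Counter()
    for i, api in enumerate(trace):
        if api in known:
            continue  # only unknown APIs can become candidates
        lo = max(0, i - window)
        hi = min(len(trace), i + window + 1)
        neighbours = trace[lo:i] + trace[i + 1:hi]
        if any(n in known for n in neighbours):
            hits[api] += 1
    return {api for api, count in hits.items() if count >= min_hits}


# Toy execution trace (assumed data, not taken from a real crawl).
trace = [
    "Document.createElement", "Element.appendChild", "Document.createElement",
    "Navigator.userAgent", "Window.screenLeft", "Navigator.plugins",
    "Window.innerHeight", "Navigator.language", "Window.screenLeft",
    "Screen.width", "Element.appendChild", "Document.createElement",
]
known = {"Navigator.userAgent", "Navigator.plugins",
         "Navigator.language", "Screen.width"}

candidates = locality_candidates(trace, known, window=1, min_hits=2)
print(candidates)
```

In this toy trace only `Window.screenLeft` appears next to known fingerprinting APIs twice, so it is the only candidate. DOM calls that touch the cluster once by chance fall below the threshold.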

We use Data Flow Graphs (DFGs) to analyze whether a suspicious fingerprinting API has a data flow that leads to a sink. If trackers want to fingerprint users by collecting information, they have to either store the information or send it to their server. We therefore call these storing and transmitting actions sinks. For our data flow analysis, we use JStap to generate a Program Dependency Graph from a given JavaScript file.
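The sink check can be pictured as a reachability question on a graph. The Python sketch below assumes a simplified adjacency-list graph rather than JStap's real output format, and the `SINKS` set is an illustrative guess at storing and transmitting APIs, not the list used by the system.

```python
from collections import deque

# Illustrative sinks only: two transmitting APIs and one storing API.
SINKS = {"XMLHttpRequest.send", "Navigator.sendBeacon", "Storage.setItem"}


def reaches_sink(flows, source, sinks=SINKS):
    """Breadth-first search: does data from `source` flow into any sink?"""
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node in sinks:
            return True
        for nxt in flows.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Hypothetical data flow edges: value producer -> consumers.
flows = {
    "Window.screenLeft": ["var fp"],
    "var fp": ["JSON.stringify"],
    "JSON.stringify": ["XMLHttpRequest.send"],
    "Window.innerHeight": ["Element.style"],
}

print(reaches_sink(flows, "Window.screenLeft"))
print(reaches_sink(flows, "Window.innerHeight"))
```

Here the value read from `Window.screenLeft` ends up in a network request, so it would be kept as suspicious, while the `Window.innerHeight` value only feeds a layout computation and would be dropped.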

**Figure 2: System Results on Top 10K websites**

Applying our system to crawl the Top 10K websites yields the data in Figure 2. The second row shows the number of origins in these websites. The number of crawled origins exceeds the number of domains by around three thousand. This is reasonable because one domain can involve multiple origins, both first-party and third-party. The number of scripts is 474,659. The total number of API calls is over 0.7 billion, and a single API can be called many times. After removing duplicate calls, over 0.2 million unique APIs remain. Using the WebIDL file, we identify over 0.2 billion browser API calls among all API calls. Around one-third of all API calls are thus browser API calls, covering more than two thousand unique browser APIs.
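The relationship between these four counts can be shown with a small Python sketch. It assumes the crawl log has already been reduced to a flat list of API names and that the WebIDL-derived browser APIs are available as a set. Both are simplifications, and `summarize_calls` is a hypothetical helper, not part of the post-processor.

```python
def summarize_calls(calls, browser_apis):
    """Count total calls, unique APIs, and the browser API subset."""
    browser_calls = [c for c in calls if c in browser_apis]
    return {
        "total_calls": len(calls),
        "unique_apis": len(set(calls)),
        "browser_api_calls": len(browser_calls),
        "unique_browser_apis": len(set(browser_calls)),
    }


# Toy data: "myApp.*" stands for script-defined functions outside WebIDL.
calls = [
    "Navigator.userAgent", "myApp.init", "Navigator.userAgent",
    "Window.innerHeight", "myApp.render", "myApp.render",
]
browser_apis = {"Navigator.userAgent", "Window.innerHeight", "Window.outerHeight"}

summary = summarize_calls(calls, browser_apis)
print(summary)
```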

Our system discovers 249 browser APIs that are abused for browser fingerprinting. We analyze the fingerprinting ability of these discovered browser APIs in the following paragraphs.

The Window interface is a global object that corresponds to the browser window, including the DOM document. Because it is global, it is easy to access, which also makes it convenient for third-party code. Its strong connection to the browser window gives trackers an interface close to users’ personalized information. The Window object itself can be used for fingerprinting by iterating over it and then hashing the set of features it exposes. This works for browser fingerprinting because different browser types and versions expose different features in the Window object.

During iteration, each available feature is called or accessed in turn, and this pattern is what we expect our system to catch. The technique also applies to other objects with a large number of features, such as Navigator, WebGL, and Canvas. In theory, any object with its own properties and functions can become a target. Modernizr, a well-known compatibility tool, can help with this technique.
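The iterate-and-hash idea can be sketched in Python as an analogy. Real trackers would run JavaScript against the actual Window object. Here `SimpleNamespace` objects stand in for the Window objects of different browsers, and `feature_fingerprint` is a hypothetical name.

```python
import hashlib
from types import SimpleNamespace


def feature_fingerprint(obj):
    """Hash the sorted names of an object's public features.

    Only the presence of features matters here, not their values.
    """
    names = sorted(name for name in dir(obj) if not name.startswith("_"))
    return hashlib.sha256("\n".join(names).encode("utf-8")).hexdigest()


# Stand-ins for Window objects (assumed feature sets, for illustration).
browser_a = SimpleNamespace(innerHeight=900, screenLeft=0, visualViewport=None)
browser_b = SimpleNamespace(innerHeight=900, screenLeft=0)
browser_c = SimpleNamespace(innerHeight=700, screenLeft=10)

print(feature_fingerprint(browser_a) != feature_fingerprint(browser_b))
print(feature_fingerprint(browser_b) == feature_fingerprint(browser_c))
```

Objects with different feature sets hash differently, while objects that differ only in values hash the same. This is why the technique separates browser types and versions rather than individual window states.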

We summarize the categories of APIs under the Window interface that our system found. One category provides window-specific information, mainly the size and location of the browser window on the screen. Representatives are Window.screenLeft, Window.innerHeight, and Window.outerHeight. These values are reported in pixels, a unit precise enough that differences are imperceptible to the human eye.

Another category contains objects that are exposed as properties of Window and have fingerprinting features of their own: the Performance object from window.performance, the Navigator object from window.navigator, and the VisualViewport object from window.visualViewport. A few APIs also deserve special mention. Our system catches Window.atob and Window.btoa, which perform Base64 decoding and encoding respectively and seem irrelevant to fingerprinting. However, trackers that adopt browser fingerprinting can use obfuscation and minification to hide their purpose, and these two APIs are widely used in obfuscation. If a tracker obfuscates its fingerprinting script, these two APIs execute close to a cluster of fingerprinting APIs and are therefore captured by our system.
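A short Python analogy shows why Base64 helps obfuscation. The helpers `encode_name` and `decode_name` are hypothetical stand-ins for `btoa` and `atob`, and the `navigator` object is a mock. The point is that the property name never appears in the script text, yet the decoded access still happens at run time, where an execution trace can observe it.

```python
import base64
from types import SimpleNamespace


def encode_name(name):
    """Stand-in for btoa: Base64-encode an ASCII string."""
    return base64.b64encode(name.encode("ascii")).decode("ascii")


def decode_name(blob):
    """Stand-in for atob: decode a Base64 string back to ASCII."""
    return base64.b64decode(blob).decode("ascii")


# Mock object and value, for illustration only.
navigator = SimpleNamespace(userAgent="ExampleBrowser/1.0")

hidden = encode_name("userAgent")        # what an obfuscated script would ship
value = getattr(navigator, decode_name(hidden))  # dynamic access at run time

print("userAgent" in hidden)
print(value)
```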

The Navigator object, accessed through window.navigator, contains information about the user agent. It exposes many APIs that assist fingerprinting, and trackers call them frequently because they usually return distinguishing information. For example, trackers can infer whether a gamepad, VR device, or keyboard is connected to the user’s machine with the corresponding Navigator APIs. The return values of Navigator.onLine, Navigator.webdriver, and Navigator.userAgentData can be used directly to distinguish online from offline browsers, automated crawlers from humans, and different operating systems from one another. Navigator.webkitPersistentStorage and Navigator.webkitTemporaryStorage provide a way to access the file system or reveal the storage limits of a given browser.

Fingerprinting techniques based on Performance APIs are more complicated. One approach is to measure the time spent on a set of computationally heavy tasks. Because different machines have different CPUs and GPUs, computation speed varies. Trackers can abuse Performance.now and Performance.measure to measure the execution time.
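The timing approach can be illustrated with a Python analogy in which `time.perf_counter` plays the role of Performance.now. The workload, the function name `time_workload`, and its parameters are assumptions for illustration, not code observed in a tracker.

```python
import time


def time_workload(iterations=200_000, repeats=5):
    """Return the fastest observed run time, in seconds, of a fixed CPU task."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        total = 0
        for i in range(iterations):
            total += (i * i) % 7  # arbitrary arithmetic to keep the CPU busy
        samples.append(time.perf_counter() - start)
    # The minimum is less affected by scheduling noise than the mean.
    return min(samples)


elapsed = time_workload()
print(f"fastest run: {elapsed:.6f} s")
```

The measured time differs from machine to machine, which is exactly the signal a tracker would combine with other attributes.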

The Performance interface also exposes timing information about browser navigation through performance.timing, which retains timings for DOM loading, domain lookup, the load event, and navigation. More unexpectedly, Performance.memory reveals information about JavaScript heap memory. This information depends on the user’s machine or browser state, so trackers can take advantage of it.

In summary, our system combines VisibleV8 and JStap to discover 249 browser APIs used for fingerprinting by crawling the Top 10K websites. The system can be run continuously to monitor browser API abuse in the wild. We are glad to share our system and help the community understand browser fingerprinting better. If you are interested and want more information, please contact the authors.

