This repository contains the code for the paper titled "Unveiling the Impact of User-Agent Reduction and Client Hints: A Measurement Study" (to be presented at WPES'23).
Background: Browsers including Chrome recently reduced the user-agent string to make it less identifying. Simultaneously, Chrome introduced several highly identifying (or high-entropy) user-agent client hints (UA-CH) to allow access to browser properties that are redacted from the user-agent string. In this empirical study, we attempt to characterize the effects of these major changes through a large-scale web measurement on the top 100K websites. Using an instrumented crawler, we quantify access to high-entropy browser features through UA-CH HTTP headers and the JavaScript API (mainly the navigator.userAgentData.getHighEntropyValues method). We measure access delegation to third parties and investigate whether the new client hints are already used by tracking, advertising and browser fingerprinting scripts.
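To make the distinction between low- and high-entropy hints concrete, the following is a minimal sketch that picks out the high-entropy hints a server requests in an Accept-CH response header. It is not taken from the crawler or the analysis notebooks: the function name and the hard-coded hint list are assumptions made for illustration, and the list may not match the exact set used in the paper.

```python
# Illustrative only: this set of high-entropy hint names is an assumption based on
# the UA-CH header names, lower-cased for case-insensitive matching.
HIGH_ENTROPY_HINTS = {
    "sec-ch-ua-arch",
    "sec-ch-ua-bitness",
    "sec-ch-ua-full-version",
    "sec-ch-ua-full-version-list",
    "sec-ch-ua-model",
    "sec-ch-ua-platform-version",
    "sec-ch-ua-wow64",
}


def requested_high_entropy_hints(accept_ch: str) -> list[str]:
    """Return the high-entropy hints listed in an Accept-CH header value, sorted."""
    tokens = [token.strip().lower() for token in accept_ch.split(",")]
    return sorted(token for token in tokens if token in HIGH_ENTROPY_HINTS)


if __name__ == "__main__":
    header = "Sec-CH-UA-Model, Sec-CH-UA-Platform, Sec-CH-UA-Arch"
    print(requested_high_entropy_hints(header))
```

Here Sec-CH-UA-Platform is ignored because it is a low-entropy hint that is sent by default.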
Project website: For a more detailed overview, please visit the project's website.
We extended DuckDuckGo’s Tracker Radar Collector to record HTTP headers, JavaScript API calls and HTML elements that can be used to access, opt-in or delegate User-Agent Client Hints. Our main modifications can be found in the following files:
To start a crawl, 1) clone this repo, 2) install the required npm packages (npm i), and 3) run the following command:
npm run crawl -- -u 'https://www.example.com' -o ./data/ -v -f -d "fingerprints,requests,cookies,screenshots,ch_delegation" --reporters 'cli,file' -l ./data/
Please check the upstream Tracker Radar Collector repository for other command line options.
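To visit more than one URL with the same settings, the command above can be wrapped in a small script. The sketch below is only one possible way to do this and is not part of the repository: the helper names build_crawl_command and crawl_all are hypothetical, and only the flags shown in the command above are used.

```python
import subprocess

# Collectors copied from the crawl command shown above.
COLLECTORS = "fingerprints,requests,cookies,screenshots,ch_delegation"


def build_crawl_command(url: str, out_dir: str = "./data/") -> list[str]:
    """Build the argument list for one crawl (hypothetical helper, not in the repo)."""
    return [
        "npm", "run", "crawl", "--",
        "-u", url,
        "-o", out_dir,
        "-v", "-f",
        "-d", COLLECTORS,
        "--reporters", "cli,file",
        "-l", out_dir,
    ]


def crawl_all(urls: list[str], out_dir: str = "./data/") -> None:
    """Run the crawler once per URL, continuing even if one crawl fails."""
    for url in urls:
        subprocess.run(build_crawl_command(url, out_dir), check=False)


if __name__ == "__main__":
    # Must be run from the root of the cloned repository after `npm i`.
    crawl_all(["https://www.example.com"])
```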
The data from the crawl (performed in June 2023) will be made available soon. For each visited website, the crawler produces the following files:
- homepage screenshot
- homepage HTML source
- a JSON file that contains HTTP request and response details, cookies, JavaScript API calls, details of User-Agent Client Hint delegation or opt-in via HTML
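As an illustration of what delegation or opt-in via HTML can look like, the sketch below scans a saved HTML source for meta tags with http-equiv set to Accept-CH or Delegate-CH, and for iframes whose allow attribute mentions a ch-ua policy. This is a simplified, assumed approximation and not the crawler's actual collector logic; the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class ClientHintMarkupFinder(HTMLParser):
    """Collects HTML elements that may opt in to or delegate client hints."""

    def __init__(self) -> None:
        super().__init__()
        self.findings: list[tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lower-cases tag and attribute names.
        attr = {name: (value or "") for name, value in attrs}
        if tag == "meta":
            kind = attr.get("http-equiv", "").lower()
            if kind in ("accept-ch", "delegate-ch"):
                self.findings.append((kind, attr.get("content", "")))
        elif tag == "iframe":
            allow = attr.get("allow", "")
            if "ch-ua" in allow.lower():
                self.findings.append(("iframe-allow", allow))


def find_client_hint_markup(html: str) -> list[tuple[str, str]]:
    """Return (kind, value) pairs for client-hint related markup found in html."""
    finder = ClientHintMarkupFinder()
    finder.feed(html)
    return finder.findings


if __name__ == "__main__":
    page = (
        '<meta http-equiv="Delegate-CH" content="sec-ch-ua-model https://cdn.example">'
        '<iframe src="https://ads.example" allow="ch-ua-model"></iframe>'
    )
    for kind, value in find_client_hint_markup(page):
        print(kind, "->", value)
```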
The auxiliary data we use in the analysis includes the following:
- 100k_nyc_all_reqs.csv: Request and response details extracted from the crawl JSONs.
- 100k_nyc_delegation_df.csv: Information about websites where User-Agent Client Hints are delegated via HTML, obtained from the crawl JSON files.
- 100k_nyc_leaky_reqs_with_hashes.csv: Request and response details where high-entropy hints are exfiltrated to remote servers, created by using 100k_nyc_all_reqs.csv and the leak-detector code published in this repo. This leak detection methodology is based on the approach presented by Englehardt et al.
- site_rank.txt: The ranking details associated with each visited website.
- tracker_category.json: The categorization of domains (exfiltrating or accessing the User-Agent Client Hints), established using DuckDuckGo's Tracker Radar dataset. Within the specified folders and their corresponding subfolders, all JSON files have been processed to extract information about their categories.
- tracker_owner.json: The owners of the tracker domains (exfiltrating or accessing the User-Agent Client Hints), sourced from DuckDuckGo's Tracker Radar dataset. All JSON files have been parsed, and the displayName information has been extracted.
- 100k_nyc_api_calls.csv: JavaScript calls and property accesses related to User-Agent Client Hints, including function arguments and return values, extracted from the crawl JSON files.
- 100k_nyc_fp_attempts.csv: Detailed information about fingerprinting attempts, based on applying heuristics developed by Iqbal et al. to the crawl JSON files.
- category_domains.json: The category of the domains (exfiltrating or accessing the User-Agent Client Hints), determined by using DuckDuckGo's Tracker Radar repository.
- succeeded_hostnames.txt: The list of URLs we successfully visited during the crawl.
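The idea behind the leak detection mentioned above can be sketched roughly as follows: take a high-entropy hint value observed in the browser, derive a few encoded and hashed variants of it, and search for those variants in outgoing request URLs. This is a heavily simplified, single-layer illustration and not the leak-detector code in this repo; the function names, the choice of encodings, and the example URL are all assumptions.

```python
import base64
import hashlib
from urllib.parse import unquote


def candidate_encodings(value: str) -> dict[str, str]:
    """Return a few encoded/hashed variants of value (an illustrative subset)."""
    raw = value.encode("utf-8")
    return {
        "plain": value,
        "base64": base64.b64encode(raw).decode("ascii"),
        "md5": hashlib.md5(raw).hexdigest(),
        "sha1": hashlib.sha1(raw).hexdigest(),
        "sha256": hashlib.sha256(raw).hexdigest(),
    }


def find_leaks(url: str, value: str) -> list[str]:
    """Return the names of the encodings of value that appear in url."""
    # Check both the raw URL and its percent-decoded form.
    haystacks = (url, unquote(url))
    return [
        name
        for name, token in candidate_encodings(value).items()
        if any(token in haystack for haystack in haystacks)
    ]


if __name__ == "__main__":
    model = "Pixel 7"  # hypothetical value of the device model hint
    hashed = hashlib.sha256(model.encode("utf-8")).hexdigest()
    url = "https://tracker.example/collect?m=Pixel%207&h=" + hashed
    print(find_leaks(url, model))
```

Very short hint values (for example "64" for bitness) would match many unrelated URLs under a plain substring search, so a real detector needs additional safeguards against false positives.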
The Jupyter notebooks used for the analyses can be found at https://github.com/ua-reduction/ua-client-hints-crawler/tree/main/analysis.
@inproceedings{senol2023unveiling,
title={Unveiling the Impact of User-Agent Reduction and Client Hints: A Measurement Study},
author={Senol, Asuman and Acar, Gunes},
booktitle={Proceedings of the 22nd Workshop on Privacy in the Electronic Society},
pages={91--106},
year={2023}
}