These domains were identified by crawling the top sites on the internet, and looking for 3rd party resource requests that are on several different sites which set cookies or access browser APIs known to be used in fingerprinting.
-
Domain and entitiy data is automatically generated using the Tracker Radar Detector
-
Domains: Automatically generated domain data files.
-
Entities: Automatically generated data on entities or companies that control the domain.
-
Build data: Static files used to build the domain data files. This includes data on privacy policies, surrogate definitions, categories, and more.
-
Breakage data: Static files defining broken sites or requests that can cause breakage.
- The domain and entity data is automatically generated using Tracker Radar Detector.
- New development and bug fixes, other then broken sites, are handled internally.
The domain data is stored as individual region-specific JSON files. We crawl the most popular domains for each given region. The resulting domain data is stored in region-specific subdirectories within the domains directory:
Region Code | Region |
---|---|
US | United States |
AU | Australia |
CA | Canada |
CH | Switzerland |
DE | Germany |
FR | France |
GB | Great Britain |
NL | Netherlands |
NO | Norway |
Domains are grouped by their domain name and IDNA encoded.
Each domain is defined using the following fields:
domain | The domain of the third party request |
resources | Resources to match for a given domain |
surrogates | Replacement code for specific domain scripts to prevent website breakage |
owner | Entity (usually a company) that control this domain |
source | The organizations of processes which identified the domain |
prevalence | Percentage of top sites that request this domain |
fingerprinting | Likelihood this domain is fingerprinting users |
cookies | Percentage of the top sites that have cookies set by this domain |
performance | Performance impact of loading this domain |
categories | An array of categories describing the purpose of this domain |
types | Request types this domain has used |
cnames | An array of subdomains that include DNS CNAMES redirecting to this domain |
An array of regexes to match against the full URL of significant third-party requests made to this domain. Significant is any resource detected to be using a browser API used for fingerprinting or setting/getting cookies.
rule | A regex to match against the request URL |
cookies | Percentage of the top sites that have cookies set by this third-party domain |
fingerprinting | Likelihood this third-party domain is using browser APIs to uniquely identify users |
apis | A list of browser APIs accessed by this resource which are commonly used in fingerprinting |
prevalence | Percentage of the top sites where this resource was seen |
cnames | A list of DNS CNAMEs that redirected to this resource |
responseHashes | A list of SHA256 response body hashes seen for this resource |
exampleSites | An optional list of examples sites where a resource was found |
An optional array of resources that can cause breakage if blocked.
rule | A regex to match against the request URL |
domains | An optional list of domains that this breaking rule applies to |
types | An optional list of request types that this breaking rule applies to |
An optional array of resources and replacement function names to prevent site breakage.
rule | A regex to match against the request URL |
replaceWith | Name of replacement function to serve in place to avoid site breakage (NOTE: These are not currently included in the Tracker Radar data) |
An array of the organizations or processes which identified the domain. Currently the only value in all domains is DDG, but more may be added in the future.
Entity (a company or organization) that controls each domain. Each entity may have the following fields defined:
name | The name of the entity |
url | The primary website of the entity (if available) |
privacyPolicy | The privacy policy URL for the entity (if available) |
displayName | A shortened entity name without company suffixes |
The domain that should be matched against third-party requests.
The percentage of the top sites that request this third-party domain.
The percentage of the top sites that have cookies set by this third-party domain.
The likelihood this third-party domain is using browser APIs to uniquely identify users.
0 | No use of browser API's |
1 | Some use of browser API’s, but not obviously for tracking purposes |
2 | Use of many browser API’s, possibly for tracking purposes |
3 | Excessive use of browser API’s, almost certainly for tracking purposes |
Performance impact of a domain.
cache | How frequently do resources need to be re-downloaded? This is derived from the cache-control, expires and pragma headers. |
time | How long do requests take to load? The time from when request is made until it is fully downloaded. |
size | How big is it? This is the encoded size (actual transferred size) including headers. |
cpu | How much overhead does executing the code incur? This is derived from CPU time spent on script parsing, compiling and evaluation on the main thread. |
Each of these fields is assigned a value from 1-3, where 1 is little to no performance impact, and 3 is high impact. The delta between values representing an order of magnitude difference.
Categories assigned to each domain, attempting to describe its observed purpose or intent. See categories for the full list.
A list of companies and their properties. One file per company.
Each entity is defined using the following fields:
Owner | The name of another entity that ownes this entity if any |
Properties | A list of domains owned by this entity |
Name | Name of the entity |
Prevalence | The percent of sites this entity is found on |
Data files used to renerate the Tracker Radar
File | Use |
---|---|
categorized_trackers | CSV file with domains and which categories they belong to |
surrogates | Mapping domain resources to their surrogates. |
breaking | Broken site data used for identifying whole sites or requests that cause breakage. |
Generated files created while building Tracker Radar
File | Use |
---|---|
api_fingerprint_scores | An object mapping browser APIs to their likelyhood to be used for fingerprinting. Higher score means that an API is more likely to be used for fingerprinting |
We use three breakage files: two temporary breakage files meant for quick fixes and one long-term breakage file for requests.
Temporary breakage data is meant for quick fixes to address major site breakage. Temporary breakage entries will have a corresponding issue to find a long-term solution.
Name | Use |
---|---|
breaking_sites_temp | Used to indicate that an entire site is breaking |
breaking_requests_temp | Used to identify specific requests or resources that can cause breakage |
Name | Use |
---|---|
breaking_requests_longterm | Used to identify specific requests or resources that can cause breakage |
Name | Use |
---|---|
rule | A regex rule that will match on the request. These should be as specific as possible to fix the issue. |
domains | An optional list of domains that this breaking rule should apply to. |
requestTypes | An optional list of request types that this breaking rule should apply to. |
reason |
|
The site breaking entries are key/value mapping a site domain to a breakage category.
Name | Use |
---|---|
Key | Domain of the site identified to be broken |
Value | breakage category |
Cateories defining the site or request breakage type.
Category | Description |
---|---|
Video | Videos don't load or have issues playing |
Images | Images don't load |
Comments | Missing comment sections |
Content | Site is missing useful content |
Links | Links or site navigation is broken |
Login | Broken site login |
Paywall | Site asks to disable tracker blocking |
Other | Anything that doesn't fit in an existing category |