hard code live remote setting server location #127

Merged (1 commit) on Sep 5, 2024

@eakubilo eakubilo commented Sep 5, 2024

Fixes #122.

https://firefox-source-docs.mozilla.org/python/marionette_driver.html
I did two things:

  1. I added geckodriver (https://github.com/mozilla/geckodriver) to PATH.
  2. I hard-coded the "services.settings.server" preference (in about:config) to "https://firefox.settings.services.mozilla.com/v1".

To explain:

  1. I didn't see this step in the installation directions, but I did it anyway to suppress the warning that appears when you run the crawler.
  2. Something is passing a development flag down to the web driver, which causes the "services.settings.server" key to have the value "data:,#remote-settings-dummy/v1" in about:config. I do not know what makes the dummy URL the default value of the key; I only presume that overriding it fixes the issue.
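
As a rough illustration of the second change, the override could look like the sketch below. It assumes the crawler builds its browser options with selenium-webdriver's `firefox.Options`, which has a `setPreference(key, value)` method. The helper `applyPrefs` and the stand-in `fakeOptions` object are hypothetical names used only so the snippet runs without a browser. This is not the actual code in local-crawler.js.

```javascript
// Preference and value taken from this PR description.
const REMOTE_SETTINGS_PREFS = {
  "services.settings.server": "https://firefox.settings.services.mozilla.com/v1",
};

// Hypothetical helper: copy each preference onto an options object that
// exposes setPreference(key, value), as selenium-webdriver's firefox.Options does.
function applyPrefs(options, prefs) {
  for (const [key, value] of Object.entries(prefs)) {
    options.setPreference(key, value);
  }
  return options;
}

// Stand-in for firefox.Options so the sketch runs without a browser.
const recorded = {};
const fakeOptions = {
  setPreference(key, value) {
    recorded[key] = value;
    return this;
  },
};

applyPrefs(fakeOptions, REMOTE_SETTINGS_PREFS);
console.log(recorded);
```

In a real crawl, `fakeOptions` would be replaced by the options object that is passed to the driver builder.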

With enough investigation we can stumble upon this page which says:
Screen Shot 2024-09-05 at 5 28 13 AM

If you open the browser developer tools in a normal Nightly browser and look at that exact portion of IndexedDB, you will notice that this is where all our anti-tracking lists are stored.
Screen Shot 2024-09-05 at 5 21 14 AM
To motivate things further, this same portion is empty in the broken controlled browser.
Screen Shot 2024-09-05 at 5 34 32 AM
If we look elsewhere on the same page, we learn this:
Screen Shot 2024-09-05 at 4 53 34 AM

If you look at the console of the normal nightly, you'll see that it's very clean:
Screen Shot 2024-09-05 at 5 40 15 AM
But our controlled nightly instance has errors, conveniently for us in the RemoteSettingsClient:
Screen Shot 2024-09-05 at 5 42 06 AM
The error is especially useful in cluing us in on what has happened. The local database is empty, so the client attempts to fetch settings from a server, but fails because "data:,#remote-settings-dummy/v1" is not a real server. So, when we explicitly set "services.settings.server" to "https://firefox.settings.services.mozilla.com/v1" in our preferences, we ensure that the local database gets synced with a real target.
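
To make the reasoning concrete, here is a minimal sketch of a check that tells the dummy value apart from a reachable server by looking at the URL scheme. The function name `describeSettingsServer` is hypothetical, and the check is only an illustration of why the sync fails. It is not how RemoteSettingsClient validates its server.

```javascript
// Hypothetical helper: decide whether a "services.settings.server" value
// points at something a client could sync from over the network.
function describeSettingsServer(serverPref) {
  let url;
  try {
    url = new URL(serverPref);
  } catch {
    return { usable: false, reason: "not a valid URL" };
  }
  if (url.protocol !== "https:" && url.protocol !== "http:") {
    return { usable: false, reason: `scheme ${url.protocol} is not a network server` };
  }
  return { usable: true, reason: "looks like a network server" };
}

// The dummy default observed in the controlled browser.
const dummy = describeSettingsServer("data:,#remote-settings-dummy/v1");
// The value hard-coded in this PR.
const live = describeSettingsServer("https://firefox.settings.services.mozilla.com/v1");

console.log(dummy.usable, live.usable);
```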

With the change, our console looks nice again:
Screen Shot 2024-09-05 at 5 51 25 AM

Moreover, the local database is populated, which is what we really care about:
Screen Shot 2024-09-05 at 5 51 55 AM

Analysis file:
analysisfile.txt
screenshot of browser:
Screen Shot 2024-09-05 at 4 39 19 AM


Mattm27 commented Sep 5, 2024

Great work @eakubilo! The urlClassification object looks to be operating properly in the crawl on my end with the updated settings server location in local-crawler.js. We appreciate the hard work on this, and we would love to hear more about your thought process and how you went about diagnosing and eventually solving this problem.

@Mattm27 Mattm27 merged commit 75cac81 into privacy-tech-lab:main Sep 5, 2024
@SebastianZimmeck
Member

Indeed, well done, @eakubilo!

If you can write up a brief summary, as @Mattm27 mentioned, we would all learn something. You can do so in the issue.

Successfully merging this pull request may close these issues.

Fix empty urlClassification