Manually create a profile.tar.gz #731

djhmateer · 2024-12-09T13:55:22Z

Is it possible to manually create a profile.tar.gz, like the one produced by this command?

docker run -p 6080:6080 -p 9223:9223 -v $PWD/crawls/profiles:/crawls/profiles/ -it webrecorder/browsertrix-crawler create-login-profile --url "https://facebook.com/"

I started looking in browsertrix-crawler/src/create-login-profile.ts (line 1 in fb8ed18):

#!/usr/bin/env node

I tried tar.gz'ing my local Chrome user data directory, C:\Users\djhma\AppData\Local\Google\Chrome\User Data, but it didn't seem to work.

I've posted here too https://forum.webrecorder.net/t/manually-create-and-use-a-profile-tar-gz/702

Facebook is not happy with the Docker-based profile.tar.gz creation process.


tw4l · 2024-12-09T16:44:51Z

Hi @djhmateer, part of the issue may be that Browsertrix Crawler uses Brave Browser. Brave and Chrome are both Chromium-based, so their browser profile data structures are similar, but I would guess they diverge at some points. I'm also not sure whether user profiles differ by operating system. The current browsertrix-browser-base Dockerfile is based on Ubuntu 24.

My guess is that a manually saved and tarred/gzipped user data directory from a Brave browser installation would work, but I haven't tested this myself. I'm not sure whether it would also have to come from the same OS.

djhmateer · 2024-12-09T19:23:00Z

Hi @tw4l - thank you so much for the reply. Will test and report back.

djhmateer · 2024-12-11T10:17:50Z

This strategy has worked well, thank you @tw4l.

Essentially, I ran the release channel of Brave on my WSL2 (Ubuntu 22) instance, following the instructions at https://brave.com/linux/

Then did something like:

# start Brave and log in to whatever site you need, e.g. https://www.osr4rightstools.org
brave-browser

# archive the contents of the Brave user data directory
cd ~/.config/BraveSoftware/Brave-Browser
tar -czvf profile.tar.gz *

# move the archive into the directory that gets mounted into the container
mv profile.tar.gz ~/auto-archiver/tmp/
cd ~/auto-archiver/tmp
chmod 777 profile.tar.gz

# test
docker run --rm -v /home/dave/auto-archiver/tmp:/crawls/ webrecorder/browsertrix-crawler crawl --url https://www.osr4rightstools.org --scopeType page --generateWACZ --text --screenshot fullPage --collection 2 --id 2 --saveState never --behaviors autoscroll,autoplay,autofetch,siteSpecific --behaviorTimeout 200 --timeout 200 --profile /crawls/profile.tar.gz

# unzip the resulting WACZ (a WACZ file is a ZIP archive)
# look for the screenshot WARC under archive/

# use replayweb.page to see if the screenshot is correct (easy to see if the site is logged in)
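
The packing step above could also be wrapped in a small helper. This is only a sketch, not something from the Browsertrix Crawler project: `make_profile` is a hypothetical function name, and the check for a `Default/` folder assumes the usual Chromium-style user data layout. It uses `tar -C` so the archive can be written outside the directory being packed, and sets mode 644 rather than 777 on the assumption that read access is all the container needs.

```shell
#!/usr/bin/env bash

# make_profile SRC_DIR OUT_FILE   (hypothetical helper, not part of browsertrix-crawler)
# Packs the contents of SRC_DIR (a Chromium-style user data directory) into
# OUT_FILE so that entries such as Default/ sit at the root of the archive.
make_profile() {
  local src="$1" out="$2"

  # Sanity check (assumption): a Chromium-style user data directory
  # normally contains a Default/ profile folder.
  if [ ! -d "$src/Default" ]; then
    echo "no Default/ directory in $src; is this a browser user data directory?" >&2
    return 1
  fi

  # -C switches into SRC_DIR before adding files, so member paths are stored
  # relative to it and the archive itself can live outside that directory.
  tar -czf "$out" -C "$src" . || return 1

  # Read access for everyone, write access only for the owner.
  chmod 644 "$out"
}

# Example with the paths used in this thread:
# make_profile "$HOME/.config/BraveSoftware/Brave-Browser" "$HOME/auto-archiver/tmp/profile.tar.gz"
# tar -tzf "$HOME/auto-archiver/tmp/profile.tar.gz" | head
```

Unlike `tar -czvf profile.tar.gz *`, this also picks up dotfiles in the user data directory and stores members with a leading `./`; both forms extract to the same layout.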


djhmateer closed this as completed Dec 11, 2024

