Fix/browsify scraping #98
…ebif, sending them to the DB api
- Improved Cookie Handling.
- Added Better Error Diagnostics in fetchAndValidateJson.
Sorry for the nit-picking, but I'd really like some more concrete evidence and facts rather than "appear more like legitimate browser traffic" and "minimize the likelihood". Please ask affected users in #57 to test your fork/branch and report whether it fixes the issue for them. The issue depends on how and where the app is deployed, and we maintainers don't have the resources to test on all the different VPS providers.
Agreed, I have not tested it thoroughly yet, so more testing would be better.
@Anatol-Beck Tested on an Oracle server, IP 141.144.239.XYZ. I pulled your fork and tried it with a connection on Thu. 02.10.2025, but got the error message pointing to the issue ... so unfortunately it doesn't seem to change anything for me. The changes you made also apply to the Docker setup, right?
@Anatol-Beck Unfortunately your fix did not work for me either; I'm still getting the 403 responses. Can you check and comment on my MR below, please? @jsschmid Can you check whether my MR #131 solves this problem on your end and comment there?
Unfortunately this didn't work for me. Already commented on #131.
This PR is stale because it has been open for 30 days with no activity. It will be closed in 14 days if there is no activity.
Thank you for all the testing, I understand the issue better now. I agree that this fix would most probably not work on servers, and even for private machines it's not a general solution. I will therefore take what I've learned, try to come up with a better solution, and close this PR.
Disclaimer
This PR builds on the fork of defekkt, who added a cookie fix.
Issue Description
The application was encountering 403 Forbidden errors when attempting to access the Deutsche Bahn API endpoint https://www.bahn.de/web/api/angebote/recon. This was happening specifically in the search functionality when users enter a search link in the UI.
Root Cause
Deutsche Bahn has implemented anti-scraping protection on their APIs that detects and blocks requests that don't appear to come from a legitimate browser. Our implementation was missing proper browser-like headers and had issues with cookie handling.
Changes Made
How It Works
These changes make our application's requests appear more like legitimate browser traffic, which helps bypass Deutsche Bahn's anti-scraping measures. By properly formatting cookies and including all the headers a real browser would send, we minimize the likelihood of being detected as an automated script.
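As a rough illustration of the two ideas described above (formatting cookies and sending browser-style headers), here is a minimal TypeScript sketch. It is not the code from this PR: the helper names `toCookieHeader` and `buildBrowserLikeHeaders`, the header set, and the User-Agent string are all assumptions chosen for illustration.

```typescript
// Hypothetical helpers for illustration only; not taken from this PR's diff.

/**
 * Collapse raw Set-Cookie header values into one Cookie request header value.
 * Attributes such as Path, Secure, or HttpOnly are dropped; a later cookie
 * with the same name overwrites the earlier value.
 */
function toCookieHeader(setCookieValues: string[]): string {
  const jar = new Map<string, string>();
  for (const raw of setCookieValues) {
    // The first ";"-separated segment is the "name=value" pair.
    const pair = (raw.split(";")[0] ?? "").trim();
    const eq = pair.indexOf("=");
    if (eq <= 0) continue; // skip malformed entries without a name
    jar.set(pair.slice(0, eq), pair.slice(eq + 1));
  }
  return Array.from(jar, ([name, value]) => `${name}=${value}`).join("; ");
}

/**
 * Build a header set resembling what a desktop browser sends.
 * The concrete values are assumptions, not values required by bahn.de.
 */
function buildBrowserLikeHeaders(cookieHeader: string): Record<string, string> {
  const headers: Record<string, string> = {
    "Accept": "application/json",
    "Accept-Language": "de-DE,de;q=0.9,en;q=0.8",
    "Content-Type": "application/json; charset=utf-8",
    "Origin": "https://www.bahn.de",
    "Referer": "https://www.bahn.de/",
    "User-Agent":
      "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
  };
  // Only attach a Cookie header when there is something to send.
  if (cookieHeader) headers["Cookie"] = cookieHeader;
  return headers;
}
```

The returned object could be passed as the `headers` option of a `fetch` call to the recon endpoint. As the test reports in this thread show, headers and cookies of this kind did not prevent the 403 responses on server deployments, so this should be read as a sketch of the approach rather than a working fix.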