Conversation

@alexanderroidl
Contributor

@alexanderroidl alexanderroidl commented Sep 30, 2025

This MR aims to fix the 403 errors which were causing HTTP responses with status 500 on our end (as described in #57).

The Bahn-API was responding with 403 errors because it deemed our requests forbidden. This MR resolves that consistently, as I was able to verify over several days of testing.

Instead of making plain HTTP requests via the fetch API, this MR replaces them with the following approach:

  • Using a headless browser which has extra stealth capabilities and sets a random user agent/viewport.
  • ...which opens a page for journey information at https://www.bahn.de/buchung/start?vbid=<vbid>
  • ...which always triggers HTTP calls on the client-side (our browser) to the Bahn-API at /web/api/angebote/recon and /web/api/angebote/verbindung/<vbid>. This is not done by us but expected behavior of the page itself, as it would happen for an actual user.
  • These calls are intercepted, their responses extracted and evaluated (as before, but without using fetch).
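
The interception steps listed above can be sketched roughly as follows. This is not the code from this MR: `classifyBahnApiUrl` and `interceptJourney` are hypothetical names, and `PageLike`/`ResponseLike` are minimal stand-ins modelled on the `on("response", ...)`, `goto`, `url()` and `json()` methods of Puppeteer's page and response objects, so that the sketch does not require Puppeteer to be installed. Only the two API paths and the booking URL come from the description.

```typescript
// Minimal structural types covering only the methods this sketch relies on.
// They are modelled on Puppeteer's page/response objects (an assumption made
// so the example stays free of external dependencies).
interface ResponseLike {
  url(): string;
  json(): Promise<unknown>;
}

interface PageLike {
  on(event: "response", handler: (response: ResponseLike) => void): unknown;
  goto(url: string): Promise<unknown>;
}

type BahnRoute = "recon" | "verbindung";

// Classify a response URL by the two API paths named in the MR description.
// Returns null for every other request the page makes.
function classifyBahnApiUrl(rawUrl: string): BahnRoute | null {
  const { pathname } = new URL(rawUrl);
  if (pathname === "/web/api/angebote/recon") return "recon";
  if (pathname.startsWith("/web/api/angebote/verbindung/")) return "verbindung";
  return null;
}

// Hypothetical helper: open the booking page and resolve once both API
// responses, triggered by the page's own client-side JS, have been captured.
// A timeout is deliberately left out to keep the sketch short.
async function interceptJourney(
  page: PageLike,
  vbid: string
): Promise<Record<BahnRoute, unknown>> {
  const captured: Partial<Record<BahnRoute, unknown>> = {};

  const bothCaptured = new Promise<void>((resolve, reject) => {
    page.on("response", (response) => {
      const route = classifyBahnApiUrl(response.url());
      if (route === null) return; // unrelated asset or API call
      response.json().then((body) => {
        captured[route] = body;
        if ("recon" in captured && "verbindung" in captured) resolve();
      }, reject);
    });
  });

  // The listener is registered before navigation so no response is missed.
  await page.goto(
    `https://www.bahn.de/buchung/start?vbid=${encodeURIComponent(vbid)}`
  );
  await bothCaptured;
  return captured as Record<BahnRoute, unknown>;
}
```

In a real setup the `page` argument would come from the headless browser. The parsed bodies would then go through the same validation as before.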

Side-effects of this approach:

  • We no longer need to handle any response cookies as our browser automatically saves and re-uses them.
  • Also we no longer need to reverse-engineer the Recon logic as the actual implementation is used by our browser via the page's client-side JS.

Preview:

 GET /discount?url=https%3A%2F%2Fwww.bahn.de%2Fbuchung%2Fstart%3Fvbid%3D230ffc6f-f4e3-40ef-b079-50f08a1acce9&bahnCard=50&hasDeutschlandTicket=true&passengerAge=29&travelClass=2 200 in 501ms
🌐 Browser was setup with user agent "Mozilla/5.0 (iPhone; CPU iPhone OS 18_6 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/26.0 Mobile/15E148 Safari/604.1", platform "iPhone" and viewport 402x660.
🌐 Browser is visiting "https://www.bahn.de/buchung/start?vbid=230ffc6f-f4e3-40ef-b079-50f08a1acce9".
🌐 Browser has intercepted Verbindung response.
🌐 Browser has intercepted Recon response.
🌐 Browser was closed.
From: Berlin Hbf (8098160) | To: Passau Hbf (8000298) | Date: 2025-10-01 | Time: 03:45 | Class: Second
 POST /api/parse-url 200 in 6385ms

🔄 API Counter reset to 0
(...)
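
For orientation, the `/discount` request in the log carries the booking URL percent-encoded in its `url` parameter, and the VBID sits in that URL's own `vbid` parameter. A minimal sketch of unpacking both layers with the standard `URL` and `URLSearchParams` classes follows. `extractVbid` is a hypothetical helper, and the hostname check is an assumption rather than behavior confirmed by this MR.

```typescript
// Hypothetical helper: pull the `vbid` query parameter out of a bahn.de
// booking URL. Returns null for unparsable input or a foreign host.
function extractVbid(bookingUrl: string): string | null {
  let parsed: URL;
  try {
    parsed = new URL(bookingUrl);
  } catch {
    return null; // not a parsable absolute URL
  }
  if (parsed.hostname !== "www.bahn.de") return null; // assumed restriction
  return parsed.searchParams.get("vbid");
}

// Query string shortened from the log line above; URLSearchParams decodes
// the percent-encoded booking URL for us.
const requestQuery = new URLSearchParams(
  "url=https%3A%2F%2Fwww.bahn.de%2Fbuchung%2Fstart%3Fvbid%3D230ffc6f-f4e3-40ef-b079-50f08a1acce9&bahnCard=50"
);
const vbid = extractVbid(requestQuery.get("url") ?? "");
console.log(vbid); // 230ffc6f-f4e3-40ef-b079-50f08a1acce9
```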

@@ -1,5 +1,6 @@
/// <reference types="next" />
/// <reference types="next/image-types/global" />
/// <reference path="./.next/types/routes.d.ts" />
Contributor Author

@alexanderroidl alexanderroidl Sep 30, 2025

I'm not sure why Next.js automatically added this. 🤷

Somebody please double-check this for correctness.

}
ignoreBuildErrors: true, // temporarily, since some type errors still exist and are ambiguous
},
serverExternalPackages: ["puppeteer-extra", "puppeteer-extra-plugin-stealth"],
Contributor Author

Those must be stated as external to avoid bundling.
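
For context, a config file containing that line might look roughly like the sketch below. Only `ignoreBuildErrors` and the `serverExternalPackages` entry are taken from the diff. Placing `ignoreBuildErrors` under a `typescript` key follows the usual Next.js config layout and is an assumption about this repository's file.

```typescript
// next.config.ts (sketch; the overall shape of the real file is assumed)
const nextConfig = {
  typescript: {
    // Flag visible in the diff; its position under `typescript` is assumed.
    ignoreBuildErrors: true,
  },
  // Packages listed here are excluded from server bundling and are loaded
  // with a regular Node.js require at runtime instead.
  serverExternalPackages: ["puppeteer-extra", "puppeteer-extra-plugin-stealth"],
};

export default nextConfig;
```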

export type ValidatedVendoJourney = z.infer<typeof validatedVendoJourneySchema>;

-export const vbidSchema = z.object({
+export const VerbindungResponseSchema = z.object({
Contributor Author

@alexanderroidl alexanderroidl Sep 30, 2025

This schema has nothing to do with the Bahn's VBIDs.

It is about the HTTP response from their API at /web/api/angebote/verbindung/<vbid>.

The VBID is just the sole parameter of the route, but the schema should be named after the route itself acting as a descriptor.

-export type VbidSchema = z.infer<typeof vbidSchema>;
+export type VerbindungResponse = z.infer<typeof VerbindungResponseSchema>;

const reconLegSchema = z.object({
Contributor Author

These were previously in parseHinfahrtRecon.ts, which has been deleted.

Contributor Author

We no longer need to reverse-engineer the Recon logic as we use the actual implementation by the Bahn itself via our browser-based approach, which executes client-side JS as a normal user would.

@alexanderroidl alexanderroidl marked this pull request as ready for review September 30, 2025 22:39
@FunctionDJ
Collaborator

I don't know how I feel with going back to a headless browser / puppeteer when it was a small milestone to get rid of it, making setup and run requirements much leaner. I doubt that a headless browser will circumvent scrape protection from cloud provider IPs.

I'm expecting the same type of proof of this actually fixing #57 as detailed here #98 (comment)

I'll only consider this PR if affected users confirm that this branch/fork solves their issue.

@FunctionDJ FunctionDJ added the "proof pending" label (Effect of PR can't be verified by maintainers and requires testimonial by affected users) Sep 30, 2025
@alexanderroidl
Contributor Author

> I don't know how I feel with going back to a headless browser / puppeteer when it was a small milestone to get rid of it, making setup and run requirements much leaner. I doubt that a headless browser will circumvent scrape protection from cloud provider IPs.
>
> I'm expecting the same type of proof of this actually fixing #57 as detailed here #98 (comment)
>
> I'll only consider this PR if affected users confirm that this branch/fork solves their issue.

Generally I agree such extras should be avoided wherever possible. In our case though, where we're avoiding bot detection, we most likely will need to make our program mimic an actual user to avoid getting caught. I've even received error pages from the website itself until I added random user agents and viewports. But let's see...
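
As a rough illustration of the random user agent and viewport idea, the sketch below picks one profile from a small pool. `BrowserProfile`, `PROFILES` and `pickProfile` are hypothetical names. The first pool entry is copied from the preview log in the description, while the second is a made-up placeholder and not a value used by this PR.

```typescript
interface BrowserProfile {
  userAgent: string;
  platform: string;
  viewport: { width: number; height: number };
}

// Illustrative pool. Entry 1 comes from the preview log of this PR;
// entry 2 is an invented example for the sake of the sketch.
const PROFILES: readonly BrowserProfile[] = [
  {
    userAgent:
      "Mozilla/5.0 (iPhone; CPU iPhone OS 18_6 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/26.0 Mobile/15E148 Safari/604.1",
    platform: "iPhone",
    viewport: { width: 402, height: 660 },
  },
  {
    userAgent:
      "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/140.0.0.0 Safari/537.36",
    platform: "Win32",
    viewport: { width: 1366, height: 768 },
  },
];

// Pick one profile uniformly at random. The random source is injectable so
// the choice can be made deterministic in tests.
function pickProfile(random: () => number = Math.random): BrowserProfile {
  const rawIndex = Math.floor(random() * PROFILES.length);
  // Clamp in case the random source returns exactly 1.
  const index = Math.min(Math.max(rawIndex, 0), PROFILES.length - 1);
  const profile = PROFILES[index];
  if (profile === undefined) throw new Error("profile pool is empty");
  return profile;
}

const chosen = pickProfile();
console.log(`${chosen.platform} ${chosen.viewport.width}x${chosen.viewport.height}`);
```

With Puppeteer, the chosen values would typically be applied through `page.setUserAgent` and `page.setViewport` before navigating. How this PR wires them in is not shown here.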

@FunctionDJ
Collaborator

PS: Please ask users to leave their testimonial on this PR instead of #57 or #98.

@FunctionDJ
Collaborator

> I've even received error pages from the website itself until I added random user agents and viewports.

I understand, but we should try our best to avoid adding a headless browser (again). In the end, almost everything a browser does - from the perspective of backends - can be emulated without running one. And not having a headless browser makes the application more lightweight, easier to deploy, and faster.

@alexanderroidl alexanderroidl force-pushed the fix/bahn-api-403-errors branch from ecb1634 to ee13b1d Compare October 1, 2025 09:57
@hackmybeer

Works for me!

@jsschmid

jsschmid commented Oct 1, 2025

@alexanderroidl
I tried it by checking out your 131

after sudo docker image rm d40e5aa07805

by running the docker image. Are your changes incorporated or do I need to rebuild the image?

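Anyway... I still got the "Fehler: Server error (500). Diese Problem ist uns bekannt und wir arbeiten daran, es zu beheben. Ein Status über den Fehler finden Sie unter #57" (in English: "Error: Server error (500). This problem is known to us and we are working on fixing it. A status update on the error can be found at #57").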

@alexanderroidl
Contributor Author

alexanderroidl commented Oct 1, 2025

> @alexanderroidl I tried it by checking out your 131
>
> after sudo docker image rm d40e5aa07805
>
> by running the docker image. Are your changes incorporated or do I need to rebuild the image?
>
> Anyway...still got the "Fehler: Server error (500). Diese Problem ist uns bekannt und wir arbeiten daran, es zu beheben. Ein Status über den Fehler finden Sie unter #57"

Dang, I forgot to adjust the Docker setup. Right now it's failing because it's missing Chromium.

Can you try it without Docker until I take care of it in the next couple of days, so we can see whether it generally works?
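
One possible way to make the browser launch work inside a container is to point Puppeteer at a Chromium binary installed in the image. The sketch below is an assumption about how that could look, not the planned fix. `buildLaunchOptions` and `LaunchOptionsLike` are hypothetical names, whereas `executablePath`, `headless` and `args` are existing Puppeteer launch options and `PUPPETEER_EXECUTABLE_PATH` is an environment variable that Puppeteer also reads on its own.

```typescript
// Subset of Puppeteer's launch options used by this sketch.
interface LaunchOptionsLike {
  headless: boolean;
  executablePath?: string;
  args: string[];
}

// Hypothetical helper: derive launch options from the environment so the
// Docker image can supply its own Chromium, while local runs fall back to
// whatever browser Puppeteer resolves by default.
function buildLaunchOptions(
  env: Record<string, string | undefined>
): LaunchOptionsLike {
  const executablePath = env["PUPPETEER_EXECUTABLE_PATH"];
  return {
    headless: true,
    // Often required when Chromium runs as root inside a container.
    // Note that it weakens the browser's process isolation.
    args: ["--no-sandbox"],
    ...(executablePath ? { executablePath } : {}),
  };
}

console.log(buildLaunchOptions({ PUPPETEER_EXECUTABLE_PATH: "/usr/bin/chromium" }));
```

The result of `buildLaunchOptions(process.env)` would be passed to the browser's `launch` call. The path `/usr/bin/chromium` is only an example and depends on the base image.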

@CactiChameleon9

This PR fixed it for me. I am running on a residential IP, but I seem to have been blocked anyway... That said, I would prefer things to work without a whole Chromium instance tbh

@github-actions

This PR is stale because it has been open for 30 days with no activity. It will be closed in 14 days if there is no activity.

@github-actions github-actions bot added the Stale label Nov 13, 2025
@laura-3

laura-3 commented Nov 27, 2025

could this be merged soon?

@github-actions github-actions bot removed the Stale label Nov 28, 2025
@FunctionDJ
Collaborator

> could this be merged soon?

I won't merge it for the time being because only 2 users report that this works and 1 reports an error.

This PR massively changes the build and deployment requirements for Betterbahn and imo we need more data (i.e. more reports by affected users) to justify this change.

To give a concrete number, I'd say if we get 10 reports by different users and it fixes #57 for at least 5 of them, this could be a worthwhile change. Otherwise if the interest is too low or those affected don't want to test this PR, I won't be able to help because I'm not affected since I don't have cloud infrastructure to run Betterbahn on.

@koehdaniel

koehdaniel commented Dec 21, 2025

Tried it locally on a residential IP and it doesn't work.
I cloned alexanderroidl:fix/bahn-api-403-errors and ran the Docker build.
It throws a 500 error and refers to #57.



8 participants