Fix non-deterministic empty flight fields via aria-label fallback#98

Open
LachyGroom wants to merge 1 commit intoAWeirdDev:v2from
LachyGroom:fix/aria-label-fallback

@LachyGroom LachyGroom commented Feb 15, 2026

Summary

  • The HTML parser's CSS selectors (e.g. tPgKwe, mv1WYe, Ak5kof, BbR8Ec) intermittently fail because Google obfuscates class names differently depending on the browser/TLS fingerprint. Since primp's chrome_126 impersonation silently falls back to a random fingerprint, ~25% of requests return HTML with different class names: airline name, times, duration, and stops all come back empty while price (.YMlIz.FpEdX) still works.
  • Adds a fallback that parses the aria-label attribute on each flight <li> when any CSS selector returns empty data. The aria-label always contains structured text like "Nonstop flight with Alaska. Leaves ... at 2:25 PM on Sunday, February 15 ..." regardless of which class names Google serves.
  • Also handles U+202F (narrow no-break space) that Google uses between time digits and AM/PM in some responses (e.g. 2:25\u202fPM).
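
The fallback described above could be sketched roughly as follows. This is a minimal illustration, not the PR's actual `_parse_aria_label()`: the regex, the `parse_label_sketch` name, and the "N stop(s) flight with" wording for connecting flights are assumptions, and it covers only the label fragment quoted in this description (airline, stops, departure).

```python
import re
from typing import Optional

# Assumed pattern: built from the label fragment quoted in this PR.
# The "\d+ stops?" alternative is a guess at how connecting flights are worded.
_LABEL_RE = re.compile(
    r"(?P<stops>Nonstop|\d+ stops?) flight with (?P<name>.+?)\. "
    r"Leaves .+? at (?P<time>\d{1,2}:\d{2} [AP]M) "
    r"on (?P<date>[A-Za-z]+, [A-Za-z]+ \d{1,2})"
)


def parse_label_sketch(label: str) -> Optional[dict]:
    """Extract airline, stops, and departure from an aria-label string."""
    # Normalize U+202F (narrow no-break space) so "2:25\u202fPM" matches.
    text = label.replace("\u202f", " ")
    match = _LABEL_RE.search(text)
    if match is None:
        return None
    stops_text = match["stops"]
    stops = 0 if stops_text == "Nonstop" else int(stops_text.split()[0])
    return {
        "name": match["name"],
        "departure": f"{match['time']} on {match['date']}",
        "stops": stops,
    }


sample = (
    "From 2359 US dollars. Nonstop flight with Alaska. Leaves San Jose "
    "Mineta International Airport at 2:25\u202fPM on Sunday, February 15"
)
print(parse_label_sketch(sample))
```

Returning `None` on a non-matching label lets the caller keep whatever the CSS selectors produced instead of overwriting it with partial data.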

Reproduction

from fast_flights import FlightData, Passengers, get_flights

# Run this ~10 times — without the fix, ~25% of attempts return empty fields
result = get_flights(
    flight_data=[FlightData(date="2026-02-15", from_airport="SJC", to_airport="KOA")],
    trip="one-way", seat="business",
    passengers=Passengers(adults=2),
)
f = result.flights[0]
print(f.name, f.departure, f.arrival, f.duration, f.stops, f.price)
# Before fix: ''  ''  ''  ''  'Unknown'  '$2359'  (intermittent)
# After fix:  'Alaska'  '2:25 PM on Sun, Feb 15'  '6:13 PM on Sun, Feb 15'  '5 hr 48 min'  0  '$2359'  (always)

Related issues

Test plan

  • 15/15 live requests pass with fix applied (vs 4 failures in 15 without)
  • Verified against captured failing HTML — all 41 flights parse correctly
  • Reviewed by OpenAI Codex — all flagged issues addressed
  • Existing CSS-based parsing is unchanged; fallback only activates when fields are empty
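
The "fallback only activates when fields are empty" behavior can be illustrated with a small sketch. The helper names (`needs_fallback`, `merge_fallback`) and the dict-based field layout are hypothetical and chosen for illustration; the actual integration lives in fast_flights/core.py and may differ.

```python
# Hypothetical field names, mirroring the attributes printed in the reproduction.
FALLBACK_FIELDS = ("name", "departure", "arrival", "duration", "stops")


def _is_missing(value) -> bool:
    # An empty string, None, or the "Unknown" placeholder counts as missing.
    # Note that stops == 0 (nonstop) is a valid value, not a missing one.
    return value is None or value == "" or value == "Unknown"


def needs_fallback(css_fields: dict) -> bool:
    """True when any CSS-extracted field came back empty."""
    return any(_is_missing(css_fields.get(key)) for key in FALLBACK_FIELDS)


def merge_fallback(css_fields: dict, aria_fields: dict) -> dict:
    """Fill only the missing fields; values the CSS selectors found are kept."""
    merged = dict(css_fields)
    for key, value in aria_fields.items():
        if _is_missing(merged.get(key)):
            merged[key] = value
    return merged
```

Merging field by field, rather than replacing the whole record, keeps the CSS path authoritative whenever it succeeds, which matches the claim that existing parsing is unchanged.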

Summary by CodeRabbit

  • Bug Fixes
    • Improved flight data extraction reliability with a fallback parsing mechanism that activates when standard methods encounter missing or incomplete fields. Enhanced date formatting with abbreviated month and weekday names for better readability. These changes ensure consistent, accurate flight information retrieval across various website formats.

The HTML parser relies on specific CSS class names (e.g. tPgKwe,
mv1WYe, Ak5kof, BbR8Ec) to extract flight details. However, Google
obfuscates these class names differently depending on the browser/TLS
fingerprint. Since primp's chrome_126 impersonation silently falls back
to a random fingerprint, ~25% of requests receive HTML with different
class names, causing airline name, departure/arrival times, duration,
and stops to all come back empty while price still works.

This adds a fallback that parses the aria-label attribute on each
flight item when any CSS selector returns empty data. The aria-label
always contains structured text regardless of fingerprint, e.g.:

  "From 2359 US dollars. Nonstop flight with Alaska. Leaves San Jose
   Mineta International Airport at 2:25 PM on Sunday, February 15 ..."

Also handles U+202F (narrow no-break space) that Google uses between
time digits and AM/PM in aria-labels.

Relates to AWeirdDev#7 (same class of bug for price CSS selector) and AWeirdDev#63
(duplicate flights from multiple container elements).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@dosubot dosubot bot added the size:M This PR changes 30-99 lines, ignoring generated files. label Feb 15, 2026

coderabbitai bot commented Feb 15, 2026

Walkthrough

A new aria-label fallback mechanism is added to parse flight data when CSS selectors fail to capture essential fields. The implementation includes helper functions to extract flight details from structured aria-label attributes and format dates by abbreviating weekday and month names. The fallback activates conditionally only when primary CSS-based extraction yields missing or Unknown values.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Aria-label Fallback Parsing: fast_flights/core.py | Added _parse_aria_label() function to extract flight name, departure, arrival, duration, and stops from HTML aria-label attributes. Introduced _shorten_date() helper with date abbreviation mappings (_DAY_ABBREVS, _MONTH_ABBREVS) for human-friendly formatting. Integrated fallback logic into main parsing flow to populate missing fields when CSS selectors miss data. |
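
As a rough illustration of the date abbreviation step, the sketch below shortens full weekday and month names. It is not the PR's `_shorten_date()`; the function name, the three-letter abbreviations, and the word-by-word substitution are assumptions.

```python
import re

# Assumed mappings: each full name maps to its first three letters.
DAY_ABBREVS = {
    day: day[:3]
    for day in ("Monday", "Tuesday", "Wednesday", "Thursday",
                "Friday", "Saturday", "Sunday")
}
MONTH_ABBREVS = {
    month: month[:3]
    for month in ("January", "February", "March", "April", "May", "June",
                  "July", "August", "September", "October", "November",
                  "December")
}

_WORD_RE = re.compile(r"[A-Za-z]+")


def shorten_date_sketch(text: str) -> str:
    """Replace full weekday and month names with abbreviations."""
    def _abbreviate(match: re.Match) -> str:
        word = match.group(0)
        # Words that are neither a weekday nor a month pass through unchanged.
        return DAY_ABBREVS.get(word) or MONTH_ABBREVS.get(word) or word

    return _WORD_RE.sub(_abbreviate, text)


print(shorten_date_sketch("2:25 PM on Sunday, February 15"))
# → 2:25 PM on Sun, Feb 15
```

The output format matches the "After fix" departure value shown in the reproduction.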

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 A fallback so clever, when CSS fades away,
Aria-labels whisper their secrets to say,
With abbreviated dates, like Sun and like Feb,
Flight data persists through the parsing web!

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately and concisely describes the main change: adding an aria-label fallback to fix empty flight fields caused by CSS selector failures. |
| Merge Conflict Detection | ✅ Passed | ✅ No merge conflicts detected when merging into main |

No actionable comments were generated in the recent review. 🎉

🧹 Recent nitpick comments
fast_flights/core.py (1)

356-374: Solid fallback integration with appropriate guards.

One minor consideration: the "flight" in aria check on lines 359 and 363 is case-sensitive. If Google ever capitalizes it (e.g., "Flight with…"), the fallback would silently skip. A lowercase comparison would be more defensive:

Optional: case-insensitive check
-                if not aria or "flight" not in aria:
-                    aria_el = item.css_first("[aria-label*='flight']")
+                if not aria or "flight" not in aria.lower():
+                    aria_el = item.css_first("[aria-label*='flight'], [aria-label*='Flight']")
                     if aria_el:
                         aria = aria_el.attributes.get("aria-label", "") or ""
-                if aria and "flight" in aria:
+                if aria and "flight" in aria.lower():


@dosubot dosubot bot added the bug Something isn't working label Feb 15, 2026