
[Bug]: Status code for redirect URLs is not correct #660

Open
@Dev4011

Description

crawl4ai version

0.4.248

Expected Behavior

For URLs that are redirected, the status code should be in the 3xx (redirect) range.

Current Behavior

Hi @unclecode,
First of all, I really appreciate the amazing tool that you and the entire team have built.

While crawling, I found that status_code is reported correctly for URLs that return 200 or 404, but not for redirected URLs: instead of a 3xx redirect code, it returns 200 even when the URL was redirected.
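The reported behavior is consistent with a client that follows redirects automatically (as a browser does) and reports only the status of the last response. The sketch below illustrates that difference with the Python standard library only; it does not use crawl4ai or show its internals. The local server and its `/doLogin` -> `/login.jsp` redirect are hypothetical stand-ins for the real site, and `RedirectingHandler` and `observe_redirect` are illustrative names.

```python
import http.client
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit


class RedirectingHandler(BaseHTTPRequestHandler):
    """Hypothetical local stand-in: /doLogin answers 302 -> /login.jsp."""

    def do_GET(self):
        if self.path == "/doLogin":
            self.send_response(302)
            self.send_header("Location", "/login.jsp")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"login page"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging


def observe_redirect(path="/doLogin"):
    """Return (raw status, Location, final status, final path) for one request."""
    server = HTTPServer(("127.0.0.1", 0), RedirectingHandler)
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        # http.client never follows redirects, so it sees the 3xx response itself.
        conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        conn.request("GET", path)
        raw = conn.getresponse()
        raw_status, location = raw.status, raw.getheader("Location")
        raw.read()
        conn.close()

        # urllib follows the redirect (as a browser does) and exposes only the last hop.
        # An empty ProxyHandler keeps environment proxies away from localhost.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(f"http://127.0.0.1:{port}{path}", timeout=5) as final:
            final_status = final.status
            final_path = urlsplit(final.geturl()).path
    finally:
        server.shutdown()
        server.server_close()
    return raw_status, location, final_status, final_path


if __name__ == "__main__":
    print(observe_redirect())  # expected: (302, '/login.jsp', 200, '/login.jsp')
```

The raw request sees 302 with a Location header, while the redirect-following client ends at 200 on the new URL, which matches the pattern described in this issue.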

Is this reproducible?

Yes

Inputs Causing the Bug

URL: http://testfire.net/doLogin

Steps to Reproduce

1. Run the code below.
2. Check the printed status code and redirected URL.

Code snippets

import asyncio

import nest_asyncio  # allows nested event loops in Colab/Jupyter
from crawl4ai import AsyncWebCrawler, CacheMode

nest_asyncio.apply()


async def main():
    url = "http://testfire.net/doLogin"  # redirects when opened in the browser
    async with AsyncWebCrawler(
        headless=True,
        verbose=True,
    ) as crawler:
        # Bypass the cache so the result comes from a fresh fetch
        result = await crawler.arun(url, cache_mode=CacheMode.BYPASS)

    print(f"Original URL: {url}")
    print(f"Status code: {result.status_code}")  # observed: 200, expected: 3xx
    print(f"Redirected URL: {result.redirected_url}")


loop = asyncio.get_event_loop()
loop.run_until_complete(main())
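Until the status code reflects the redirect, one possible workaround is to infer a redirect by comparing the requested URL with the final one. This is a heuristic sketch, assuming `result.redirected_url` holds the final URL after any redirects (as the script above prints it) and may be empty or equal to the original when no redirect happened. `looks_redirected` is a hypothetical helper, not part of crawl4ai, and it can only tell that a redirect happened, not which 3xx code was returned.

```python
from typing import Optional
from urllib.parse import urlsplit


def looks_redirected(original_url: str, redirected_url: Optional[str]) -> bool:
    """Heuristic (hypothetical helper): True if the final URL differs from the requested one.

    Scheme and host are compared case-insensitively, an empty path is treated as "/",
    and the fragment is ignored because it is never sent to the server.
    """
    if not redirected_url:
        return False

    def key(url: str):
        parts = urlsplit(url)
        return (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", parts.query)

    return key(original_url) != key(redirected_url)
```

In the reproduction script, this would be called as `looks_redirected(url, result.redirected_url)` after `arun` returns.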

OS

Google Colab

Python version

3.11.11

Browser

Chrome

Browser version

No response

Error logs & Screenshots (if applicable)

Screenshot: the browser network panel showing that the URL was redirected.

Screenshot: the code output showing the incorrect status_code.


Metadata

Assignees: No one assigned
Labels: 🐞 Bug (Something isn't working), 🩺 Needs Triage (Needs attention of maintainers)
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests