Open
Description
crawl4ai version
0.4.248b3
Expected Behavior
Correct rendering of links containing inline code. For example:
<a href="https://docs.spring.io/spring-framework/docs/6.2.x/javadoc-api/org/springframework/context/annotation/Configuration.html" class="apiref"><code>@Configuration</code></a>
should be rendered as
[`@Configuration`](https://docs.spring.io/spring-framework/docs/6.2.x/javadoc-api/org/springframework/context/annotation/Configuration.html)
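To make the expected output concrete, here is a minimal sketch of a converter that buffers everything inside an open `<a>` (including nested `<code>`) and emits the link only when the anchor closes. It is a toy built on Python's standard `html.parser`, not crawl4ai's actual Markdown generator; the class name `LinkAwareMarkdown` and the helper `to_markdown` are hypothetical and exist only for illustration.

```python
from html.parser import HTMLParser


class LinkAwareMarkdown(HTMLParser):
    """Toy HTML-to-Markdown converter covering only <a> and <code> (hypothetical)."""

    def __init__(self):
        super().__init__()
        self.out = []          # finished Markdown fragments
        self.link_text = None  # buffer for text inside an open <a>, else None
        self.href = ""

    def _emit(self, text):
        # While an <a> is open, route text into the link buffer instead of the output.
        target = self.link_text if self.link_text is not None else self.out
        target.append(text)

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.href = dict(attrs).get("href") or ""
            self.link_text = []
        elif tag == "code":
            self._emit("`")

    def handle_endtag(self, tag):
        if tag == "code":
            self._emit("`")
        elif tag == "a" and self.link_text is not None:
            # Emit the link only now, so nested inline code ends up inside the brackets.
            label = "".join(self.link_text)
            self.link_text = None
            self.out.append(f"[{label}]({self.href})")

    def handle_data(self, data):
        self._emit(data)


def to_markdown(html):
    parser = LinkAwareMarkdown()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


html = (
    '<a href="https://docs.spring.io/spring-framework/docs/6.2.x/javadoc-api/'
    'org/springframework/context/annotation/Configuration.html" class="apiref">'
    "<code>@Configuration</code></a>"
)
print(to_markdown(html))
```

The point of the sketch is the ordering: the inline-code backticks are written into the link's text buffer, so they cannot be flushed ahead of the link.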
Current Behavior
Currently, a link containing inline code is rendered as the inline code first, followed by a link that has the correct URL but empty link text, for example:
`@Configuration`[](https://docs.spring.io/spring-framework/docs/6.2.x/javadoc-api/org/springframework/context/annotation/Configuration.html)
Is this reproducible?
Yes
Inputs Causing the Bug
- URL: https://docs.spring.io/spring-boot/how-to/security.html
- css_selector: "article.doc > *:not(.breadcrumbs-container):not(aside):not(nav)"
- excluded_selector: ".source-toolbox, .ulist.tablist, .tab:not(.is-selected), .tabpanel.is-hidden"
Steps to Reproduce
Code snippets
crawler_run_config = CrawlerRunConfig(
    scraping_strategy=CustomWebScrapingStrategy(),
    css_selector=css_selector,
    excluded_selector=excluded_css_selector or "",
    exclude_external_links=True,
    exclude_external_images=True,
    markdown_generator=DefaultMarkdownGenerator(
        options={
            "skip_internal_links": True,
            "single_line_break": False,
            "protect_links": False,
            "pad_tables": True
        }
    ),
    process_iframes=False,
    magic=True,
    cache_mode=CacheMode.BYPASS,
    verbose=True,
)
crawl_result = await crawler.arun(
    url=url,
    config=crawler_run_config,
)
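To confirm the symptom in the crawl output, and as a stopgap until the generator is fixed, the Markdown can be scanned for inline code immediately followed by an empty-text link. The sketch below assumes the generated Markdown is available as a plain string (how it is read from `crawl_result` is not shown in this report); the names `EMPTY_CODE_LINK`, `find_empty_code_links`, and `repair_empty_code_links` are hypothetical helpers, not part of crawl4ai.

```python
import re

# Matches the malformed pattern from this report: `code` directly followed by [](url).
EMPTY_CODE_LINK = re.compile(r"(`[^`\n]+`)\[\]\(([^()\s]+)\)")


def find_empty_code_links(markdown: str) -> list[str]:
    """Return every occurrence of inline code followed by an empty-text link."""
    return [m.group(0) for m in EMPTY_CODE_LINK.finditer(markdown)]


def repair_empty_code_links(markdown: str) -> str:
    """Move the inline code inside the link brackets: `x`[](u) becomes [`x`](u)."""
    return EMPTY_CODE_LINK.sub(r"[\1](\2)", markdown)


sample = (
    "Annotate the class with `@Configuration`[](https://docs.spring.io/"
    "spring-framework/docs/6.2.x/javadoc-api/org/springframework/context/"
    "annotation/Configuration.html) to register it."
)
print(find_empty_code_links(sample))
print(repair_empty_code_links(sample))
```

This is a post-processing workaround only. It does not handle URLs containing parentheses or whitespace, and it does not address the root cause in the Markdown generator.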
OS
macOS
Python version
3.12.7
Browser
Chrome
Browser version
Version 132.0.6834.160 (Official Build) (arm64)
Error logs & Screenshots (if applicable)
No response
Activity
dmurat commented on Jan 29, 2025
Here is the outline of the fix I'm currently using, and it works ok as far as I can see:
HTH
aravindkarnam commented on Jan 31, 2025
@dmurat Thanks for pointing this out and for your suggestion. It looks like you have already fixed it. Could you raise a PR for this?
dmurat commented on Jan 31, 2025
@aravindkarnam Sure, I can try. One question though, are there any existing tests where I can look for examples?
aravindkarnam commented on Jan 31, 2025
@dmurat There are several examples in the `/tests` folder.
aravindkarnam commented on Feb 10, 2025
@dmurat Were you able to make any progress on this?
dmurat commented on Feb 10, 2025
@aravindkarnam Sry, didn't find time. Maybe during this or next week if you can wait.