Fix Markdown Incorrect Spacing #599 #658

@tautikAg tautikAg commented Feb 11, 2025

Summary

This PR fixes spacing issues in description lists (dl/dt/dd tags) while preserving anchor link handling and list functionality. The changes improve the readability of scraped content by:

  • Adding proper paragraph breaks between term-definition pairs
  • Maintaining consistent indentation for definitions
  • Using the paragraph state counter to control spacing
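
To illustrate the behaviour described above, here is a minimal, self-contained sketch built on Python's standard `html.parser`. It is not the code merged in this PR: the class `DescriptionListToText`, the helper `render`, the `pending` counter, and the four-space indent are all hypothetical names and choices made for the example. It only shows the general idea of a "newlines owed" counter that puts a blank line before each term and keeps each definition indented on its own line.

```python
from html.parser import HTMLParser


class DescriptionListToText(HTMLParser):
    """Toy converter (hypothetical): renders <dl>/<dt>/<dd> as spaced, indented text."""

    INDENT = "    "  # assumed indent for definitions

    def __init__(self):
        super().__init__()
        self.out = []       # emitted text fragments
        self.pending = 0    # newlines owed before the next text fragment
        self.in_dd = False  # True while inside a <dd> element

    def request_break(self, count):
        # Keep the largest break requested so far; repeated requests do not stack.
        self.pending = max(self.pending, count)

    def handle_starttag(self, tag, attrs):
        if tag == "dt":
            self.request_break(2)  # blank line before each new term
        elif tag == "dd":
            self.request_break(1)  # definition starts on its own line
            self.in_dd = True

    def handle_endtag(self, tag):
        if tag == "dd":
            self.in_dd = False
        elif tag == "dl":
            self.request_break(2)  # paragraph break after the whole list

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return  # ignore whitespace-only nodes between tags
        if self.pending:
            if self.out:
                self.out.append("\n" * self.pending)
            if self.in_dd:
                self.out.append(self.INDENT)
            self.pending = 0
        elif self.out:
            # Fragments in the same block (e.g. around an <a>) are joined by a space.
            self.out.append(" ")
        self.out.append(text)


def render(html):
    """Convert an HTML snippet to text using the toy converter above."""
    parser = DescriptionListToText()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    sample = (
        "<dl>"
        "<dt>Alpha</dt><dd>First letter</dd>"
        "<dt>Beta</dt><dd>Second letter</dd>"
        "</dl>"
    )
    print(render(sample))
```

Because break requests are recorded with `max` rather than summed, a closing tag followed immediately by an opening tag does not produce extra blank lines, which is the kind of spacing control a paragraph counter is meant to provide.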

Fixes #599

List of files changed and why

  • /crawl4ai/crawl4ai/html2text/__init__.py
    • Updated description list tag handling to improve spacing and readability
    • Added clarifying comments
    • Preserved existing anchor and list functionality

How Has This Been Tested?

  • Tested with sample Blender documentation page
  • Verified proper spacing between term-definition pairs
  • Confirmed anchor links still work correctly
  • Checked list formatting remains intact

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added/updated unit tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@tautikAg tautikAg changed the title from "crawl" to "Fix Markdown Incorrect Spacing #599" Feb 11, 2025
@tautikAg

@aravindkarnam

@aravindkarnam aravindkarnam self-assigned this Feb 11, 2025
@aravindkarnam aravindkarnam changed the base branch from main to 2025-feb-alpha-1 February 11, 2025 13:31
@aravindkarnam aravindkarnam merged commit 850da3d into unclecode:2025-feb-alpha-1 Feb 12, 2025
Development

Successfully merging this pull request may close these issues.

[Bug]: Markdown output has incorect spacing.