@aud aud commented Sep 25, 2025

This adds a simple LLM processor that detects strikethrough text in PDFs.

FYI: I wrote tests, but I got errors pulling the datalab-to/pdfs dataset
when running the LLM processor suite locally.

Here's a test script with a sample PDF:
strikethrough.pdf.

Rendered PDF: https://gist.github.com/aud/2af33313f945a397b28e7ca728a85d8b

from marker.converters.pdf import PdfConverter
from marker.models import create_model_dict
from marker.config.parser import ConfigParser

# use_llm turns on the LLM-based processors
config = {
    "output_format": "markdown",
    "use_llm": True,
}

config_parser = ConfigParser(config)

# Build the converter from the parsed config
converter = PdfConverter(
    config=config_parser.generate_config_dict(),
    artifact_dict=create_model_dict(),
    processor_list=config_parser.get_processors(),
    renderer=config_parser.get_renderer(),
    llm_service=config_parser.get_llm_service()
)

# Convert the sample PDF and print the resulting Markdown
rendered = converter("strikethrough.pdf")
print(rendered.markdown)
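
To check the output of a script like the one above, one option is to scan the rendered Markdown for strikethrough spans. The following is a minimal sketch, not part of this PR: it assumes the struck text is emitted either as GFM-style `~~text~~` or as `<del>`/`<s>` HTML tags, which the description here does not confirm, and `find_strikethrough` is a hypothetical helper name rather than a marker API.

```python
import re

# Assumption: struck text appears as GFM ~~text~~ or as <del>/<s> HTML tags.
# The actual output format of the processor is not stated in this PR.
_STRIKE_RE = re.compile(
    r"~~(?P<gfm>.+?)~~|<(?P<tag>del|s)>(?P<html>.+?)</(?P=tag)>",
    re.DOTALL,
)


def find_strikethrough(markdown: str) -> list[str]:
    """Return the text of every strikethrough span found in `markdown`."""
    spans = []
    for match in _STRIKE_RE.finditer(markdown):
        # Exactly one of the two named groups is set for any given match.
        text = match.group("gfm") or match.group("html")
        spans.append(text.strip())
    return spans


# Hypothetical usage with the script above:
#   struck = find_strikethrough(rendered.markdown)
```

An empty result would mean either that the sample contains no struck text or that the output uses a different notation than the two assumed here.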


github-actions bot commented Sep 25, 2025

CLA Assistant Lite bot: All contributors have signed the CLA ✍️ ✅


aud commented Sep 25, 2025

I have read the CLA Document and I hereby sign the CLA

github-actions bot added a commit that referenced this pull request Sep 25, 2025
@tarun-menta
Contributor

This looks great, thank you for the PR! Will work on integrating it into our next release :)

@tarun-menta tarun-menta self-requested a review September 29, 2025 14:51
yilinjuang commented Nov 27, 2025

Hi! Just checking in on the status of this PR. Any updates on review or when it might make it into a release? This feature would be really helpful for revision/review-type documents. Thanks!
