[Bug]: md files apparently cannot be processed; error during conversion to PDF #119

@jumpfox3049

Do you need to file an issue?

  • I have searched the existing issues and this bug is not already filed.
  • I believe this is a legitimate bug, not just a question or feature request.

Describe the bug

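This is a great project; the guided learning in particular is excellent, and I look forward to continued improvements.

In my testing I added 3 KBs containing both PDF and md documents, and saw the following log output from the Docker container: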

INFO: Initializing LightRAG with parameters: {'working_dir': '/app/data/knowledge_bases/DeepTutor_Quickstart/rag_storage'}

DEBUG: Process 8 Shared-Data already initialized (multiprocess=False)

INFO: [] Created new empty graph file: /app/data/knowledge_bases/DeepTutor_Quickstart/rag_storage/graph_chunk_entity_relation.graphml

DEBUG: Process 8 storage namespace already initialized: [full_docs]

DEBUG: Process 8 storage namespace already initialized: [text_chunks]

DEBUG: Process 8 storage namespace already initialized: [full_entities]

DEBUG: Process 8 storage namespace already initialized: [full_relations]

DEBUG: Process 8 storage namespace already initialized: [entity_chunks]

DEBUG: Process 8 storage namespace already initialized: [relation_chunks]

DEBUG: Process 8 storage namespace already initialized: [llm_response_cache]

DEBUG: Process 8 storage namespace already initialized: [doc_status]

DEBUG: Process 8 storage namespace already initialized: [parse_cache]

INFO: Multimodal processors initialized with context support

INFO: Available processors: ['image', 'table', 'equation', 'generic']

INFO: Context configuration: ContextConfig(context_window=1, context_mode='page', max_context_tokens=2000, include_headers=True, include_captions=True, filter_content_types=['text'])

INFO: LightRAG, parse cache, and multimodal processors initialized

INFO: Starting complete document processing: /app/data/knowledge_bases/DeepTutor_Quickstart/raw/README.md

INFO: Starting document parsing: /app/data/knowledge_bases/DeepTutor_Quickstart/raw/README.md

INFO: Using mineru parser with method: auto

INFO: Using generic parser for .md file (method=auto)...

[KnowledgeInit] ○

Processing: README.md

[KnowledgeInit] → [kb_init_20260113_094802_ad6aa4d1] Processing: README.md (1/2, 50%) - File: README.md

[KnowledgeInit] ✗ ✗ Error processing README.md: Failed to convert text file README.md to PDF: paraparser: syntax error: parse ended with 1 unclosed tags

para

[KnowledgeInit] ✗ Traceback (most recent call last):

File "/usr/local/lib/python3.11/site-packages/raganything/parser.py", line 371, in convert_text_to_pdf

story.append(Paragraph(line, normal_style))

             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

File "/usr/local/lib/python3.11/site-packages/reportlab/platypus/paragraph.py", line 1859, in __init__

self._setup(text, style, bulletText or getattr(style,'bulletText',None), frags, cleanBlockQuotedText)

File "/usr/local/lib/python3.11/site-packages/reportlab/platypus/paragraph.py", line 1880, in _setup

style, frags, bulletTextFrags = _parser.parse(text,style)

                                ^^^^^^^^^^^^^^^^^^^^^^^^^

File "/usr/local/lib/python3.11/site-packages/reportlab/platypus/paraparser.py", line 3221, in parse

return self._complete_parse()

       ^^^^^^^^^^^^^^^^^^^^^^

File "/usr/local/lib/python3.11/site-packages/reportlab/platypus/paraparser.py", line 3155, in _complete_parse

self._syntax_error('parse ended with %d unclosed tags\n %s' % (len(self._stack),'\n '.join((x.__tag__ for x in reversed(self._stack)))))

File "/usr/local/lib/python3.11/site-packages/reportlab/platypus/paraparser.py", line 2780, in _syntax_error

raise ValueError('paraparser: syntax error: %s' % message)

ValueError: paraparser: syntax error: parse ended with 1 unclosed tags

para

During handling of the above exception, another exception occurred:

return self.parse_text_file(file_path, output_dir, lang, **kwargs)

       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

File "/usr/local/lib/python3.11/site-packages/raganything/parser.py", line 1128, in parse_text_file

pdf_path = self.convert_text_to_pdf(text_path, output_dir)

           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

File "/usr/local/lib/python3.11/site-packages/raganything/parser.py", line 422, in convert_text_to_pdf

raise RuntimeError(

RuntimeError: Failed to convert text file README.md to PDF: paraparser: syntax error: parse ended with 1 unclosed tags

para

[KnowledgeInit] ✗ [kb_init_20260113_094802_ad6aa4d1] Failed to process file: README.md (1/2, 50%) - File: README.md - Error: Failed to convert text file README.md to PDF: paraparser: syntax error: parse ended with 1 unclosed tags

para

[KnowledgeInit] ○

Processing: docker_update_guide.md

[KnowledgeInit] → [kb_init_20260113_094802_ad6aa4d1] Processing: docker_update_guide.md (2/2, 100%) - File: docker_update_guide.md

INFO: Starting complete document processing: /app/data/knowledge_bases/DeepTutor_Quickstart/raw/docker_update_guide.md

INFO: Starting document parsing: /app/data/knowledge_bases/DeepTutor_Quickstart/raw/docker_update_guide.md

INFO: Using mineru parser with method: auto

INFO: Using generic parser for .md file (method=auto)...

WARNING:root:WenQuanYi font not found at /usr/share/fonts/wqy-microhei/wqy-microhei.ttc. Chinese characters may not render correctly.

INFO: 172.17.0.1:58648 - "WebSocket /api/v1/knowledge/DeepTutor_Quickstart/progress/ws" [accepted]

INFO: connection open

[ProgressBroadcaster] Connected WebSocket for KB 'DeepTutor_Quickstart' (total: 1)

INFO: 172.17.0.1:58664 - "GET /api/v1/knowledge/health HTTP/1.1" 200 OK

[Knowledge] ○ Found 3 knowledge bases: ['Claude_Skills_Quickstart', 'DeepTutor_Quickstart', 'Kotlin_WebAssembly_Quickstart']

Warning: Knowledge base 'Claude_Skills_Quickstart' is not in kb_config.json, but directory exists

Warning: Knowledge base 'DeepTutor_Quickstart' is not in kb_config.json, but directory exists

Warning: Knowledge base 'Kotlin_WebAssembly_Quickstart' is not in kb_config.json, but directory exists

[Knowledge] ○ Returning 3 knowledge bases

INFO: 172.17.0.1:58664 - "GET /api/v1/knowledge/list HTTP/1.1" 200 OK

In particular, the following was raised while processing the md file:

raise RuntimeError(

RuntimeError: Failed to convert text file README.md to PDF: paraparser: syntax error: parse ended with 1 unclosed tags

What causes this error, and how can it be fixed?
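A possible explanation, judging only from the traceback: ReportLab's `Paragraph` parses its text as XML-like markup, so a raw `<` or `&` in a Markdown line (for example inline HTML in a README) can be read as a tag that never closes. The sketch below shows one way to neutralize such characters before the text reaches `Paragraph`. It uses only the Python standard library; `escape_for_paragraph` and the sample lines are hypothetical and are not taken from raganything, and whether the library should be patched this way is an assumption.

```python
from xml.sax.saxutils import escape


def escape_for_paragraph(line: str) -> str:
    """Escape &, < and > so the line is treated as literal text, not markup.

    Hypothetical helper: the traceback shows the call
    `Paragraph(line, normal_style)`; an escaped variant would pass
    `escape_for_paragraph(line)` in place of `line`.
    """
    return escape(line)


# Hypothetical sample lines of the kind often found in a README.md.
sample_lines = [
    '<div align="center">',
    '<img src="logo.png"',      # tag left open at the end of the line
    'Plain text with a & b',
]

escaped = [escape_for_paragraph(line) for line in sample_lines]

for original, safe in zip(sample_lines, escaped):
    print(f"{original!r} -> {safe!r}")
```

Until something like this is applied upstream, a workaround on the user side may be to remove or escape inline HTML in the affected md file before adding it to the KB.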

Steps to reproduce

No response

Expected Behavior

No response

Related Module

Dashboard

Configuration Used

No response

Logs and screenshots

No response

Additional Information

  • AI-Tutor Version:
  • Operating System:
  • Python Version:
  • Node.js Version:
  • Browser (if applicable):
  • Related Issues:

Metadata

  • Assignees: No one assigned
  • Labels: bug (Something isn't working)
  • Type: No type
  • Projects: No projects
  • Milestone: No milestone
  • Relationships: None yet
  • Development: No branches or pull requests