

jiyegege (Contributor) commented Aug 8, 2024

What problem does this PR solve?

Fix a "TypeError: expected string or buffer" bug when chunking .docx files with the Knowledge Graph chunk method. #1859

Traceback (most recent call last):
  File "//Users/XXX/ragflow/rag/svr/task_executor.py", line 149, in build
    cks = chunker.chunk(row["name"], binary=binary, from_page=row["from_page"],
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/XXX/ragflow/rag/app/knowledge_graph.py", line 18, in chunk
    chunks = build_knowlege_graph_chunks(tenant_id, sections, callback,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/XXX/ragflow/graphrag/index.py", line 87, in build_knowlege_graph_chunks
    tkn_cnt = num_tokens_from_string(chunks[i])
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/XXX/github/ragflow/rag/utils/__init__.py", line 79, in num_tokens_from_string
    num_tokens = len(encoder.encode(string))
                     ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/XXX/tiktoken/core.py", line 116, in encode
    if match := _special_token_regex(disallowed_special).search(text):
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: expected string or buffer

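The value passed to num_tokens_from_string here (chunks[i]) is a dict:
[Screenshot: https://github.com/user-attachments/assets/e5ba5c45-df1d-4697-98c9-14365c839f20]
The correct type is str:
[Screenshot: https://github.com/user-attachments/assets/e54d5e60-4ce4-4180-b394-24e485013534]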
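A minimal sketch of the kind of normalization that avoids this error, assuming a chunk can arrive either as a plain string or as a dict that keeps its text under a key such as content_with_weight. The key name and the helpers count_tokens, chunk_text and token_counts are illustrative only, not ragflow's code. count_tokens is a regex-based stand-in for a tiktoken-style counter: like the real encoder it only accepts str, so passing a dict raises a TypeError (the stdlib re message wording differs from tiktoken's).

```python
import re
from typing import Any, Dict, List, Union

# Regex-based stand-in for a tiktoken-style counter (NOT ragflow's
# num_tokens_from_string). It runs a regex over its input, so anything
# other than str raises TypeError, as in the traceback above.
_TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def count_tokens(text: str) -> int:
    return len(_TOKEN_RE.findall(text))


# Assumed chunk shape: a plain string, or a dict carrying its text.
Chunk = Union[str, Dict[str, Any]]


def chunk_text(chunk: Chunk, key: str = "content_with_weight") -> str:
    """Return the text of a chunk, whether it is a str or a dict."""
    if isinstance(chunk, str):
        return chunk
    if isinstance(chunk, dict):
        value = chunk.get(key, "")
        return value if isinstance(value, str) else str(value)
    raise TypeError(f"unsupported chunk type: {type(chunk).__name__}")


def token_counts(chunks: List[Chunk]) -> List[int]:
    # Normalize every chunk to str before counting, instead of handing
    # chunks[i] to the counter directly.
    return [count_tokens(chunk_text(c)) for c in chunks]
```

For example, token_counts(["hello world", {"content_with_weight": "a b c"}]) returns [2, 3], while count_tokens({"content_with_weight": "a b c"}) raises TypeError.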

Type of change

  • [x] Bug Fix (non-breaking change which fixes an issue)
  • [ ] New Feature (non-breaking change which adds functionality)
  • [ ] Documentation Update
  • [ ] Refactoring
  • [ ] Performance Improvement
  • [ ] Other (please describe):

jiyegege added 15 commits July 22, 2024 17:13
Verify that 'reference' is not None and its length is greater than 0 before
processing chunks in API4ConversationService. This prevents potential errors
when 'reference' is missing or empty.
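Sketched in Python, the check this commit describes might look like the following. The ans dict and the reference["chunks"] layout are assumptions for illustration, and reference_chunks is a hypothetical helper; API4ConversationService's actual code may differ.

```python
from typing import Any, Dict, List, Optional


def reference_chunks(ans: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the chunks listed under ans["reference"], or [] if there are none."""
    reference: Optional[Dict[str, Any]] = ans.get("reference")
    # Skip chunk processing when 'reference' is missing (None) or empty
    # (length 0), as the commit message describes.
    if reference is None or len(reference) == 0:
        return []
    return list(reference.get("chunks", []))
```

With this guard, an answer without references yields an empty list instead of failing on a missing key or a None value.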
Simplify the conditionals for checking the existence and non-empty state
of the 'reference' key in the API app's response and streaming logic.
Additionally, enhance the rerank model in the rag module to handle empty
text lists gracefully by returning an empty array and zero score.
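A sketch of the empty-input guard for a reranker. ToyReranker and its word-overlap (Jaccard) scoring are hypothetical stand-ins for a real rerank model, and the (scores, token_count) return shape is an assumption; the point is only the early return of an empty score list and 0 when texts is empty, before any model call.

```python
from typing import List, Tuple


class ToyReranker:
    """Hypothetical reranker that scores texts by word overlap with the query."""

    def similarity(self, query: str, texts: List[str]) -> Tuple[List[float], int]:
        # Empty input: return an empty score list and 0 right away
        # instead of sending an empty batch to the model.
        if not texts:
            return [], 0
        q = set(query.lower().split())
        scores: List[float] = []
        token_count = len(query.split())
        for text in texts:
            words = set(text.lower().split())
            union = q | words
            # Jaccard overlap stands in for a real relevance score.
            scores.append(len(q & words) / len(union) if union else 0.0)
            token_count += len(text.split())
        return scores, token_count
```

For example, ToyReranker().similarity("a b", []) returns ([], 0), and ToyReranker().similarity("a b", ["a b", "c"]) returns ([1.0, 0.0], 5).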
# Conflicts:
#	web/src/pages/user-setting/constants.tsx
#	web/src/pages/user-setting/setting-model/constant.ts
@jiyegege jiyegege changed the title Fix a "TypeError: expected string or buffer bug" in docx files extracted using Knowledge Graph. Fix a "TypeError: expected string or buffer bug" in docx files extracted using Knowledge Graph.#1859 Aug 8, 2024
@KevinHuSh KevinHuSh merged commit 19ded65 into infiniflow:main Aug 8, 2024
1 check passed
Halfknow pushed a commit to Halfknow/ragflow that referenced this pull request Nov 11, 2024
…ted using Knowledge Graph.infiniflow#1859 (infiniflow#1865)
