Description
Is there an existing issue for the same bug?
- I have checked the existing issues.
RAGFlow workspace code commit ID
9298acc full - nightly from Feb 18
RAGFlow image version
9298acc full
Other environment information
Running the nightly build from Feb 18 on Ubuntu.
Actual behavior
Hi,
I have several issues similar to what others have reported, but not quite the same.
Issues with RAPTOR:
1) On this one doc, which produces only 1 chunk, it always errors out:
15:10:00 Task has been received.
15:10:01 Page(1~2): OCR started
15:10:04 Page(1~2): OCR finished (3.11s)
15:10:05 Page(1~2): Layout analysis (0.86s)
15:10:05 Page(1~2): Table analysis (0.00s)
15:10:05 Page(1~2): Text merged (0.06s)
15:10:05 Page(1~2): Page 0~1: Text merging finished
15:10:05 Page(1~2): Generate 1 chunks
15:10:05 Page(1~2): Embedding chunks (0.22s)
15:10:05 Page(1~2): Done (0.04s)
15:10:08 Start RAPTOR (Recursive Abstractive Processing for Tree-Organized Retrieval).
15:10:08 Task has been received.
15:10:08 [ERROR]Fail to bind LLM used by RAPTOR: 'NoneType' object is not subscriptable
15:10:08 [ERROR][Exception]: 'NoneType' object is not subscriptable
I can find the chunk in Elasticsearch.
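For context, that error text is just Python's generic complaint when something expected to be a dict/list is actually None, i.e. whatever lookup supplies the LLM binding for RAPTOR seems to come back empty for this task. A tiny hypothetical illustration of the error class (not RAGFlow's actual code; names are made up):

# Hypothetical illustration only: if the config lookup returns None,
# subscripting it raises exactly the reported message.
def bind_raptor_llm(task_config: dict):
    raptor_cfg = task_config.get("raptor")   # None when the section is missing
    return raptor_cfg["llm_id"]              # TypeError if raptor_cfg is None

try:
    bind_raptor_llm({})                      # task config without a "raptor" section
except TypeError as e:
    print(e)                                 # 'NoneType' object is not subscriptable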
2) Another document processed fine at first; then I changed a setting on the file to enable Entity resolution, re-ran it, and got an error:
16:01:12 Reused previous task's chunks.
16:01:17 Start RAPTOR (Recursive Abstractive Processing for Tree-Organized Retrieval).
16:01:17 Task has been received.
16:01:18 [ERROR]Fail to bind LLM used by RAPTOR: 'NoneType' object is not subscriptable
16:01:18 [ERROR][Exception]: 'NoneType' object is not subscriptable
Then I turned that setting off and re-ran it, and RAPTOR worked fine again. This time it had to re-generate the chunks, since it had errored before.
Similarly, for another document I chose not to regenerate chunks and RAPTOR failed; then I regenerated them and RAPTOR worked.
I also see some errors from Elasticsearch:
ESConnection.update got exception: BadRequestError(400, 'illegal_argument_exception', 'exceeded max allowed inline script size in bytes [65535] with size [213572] for script [ctx._source.content_with_weight='
Before this I had errors about the number of scripts that can be run (script compilations), and I increased that limit to 1000/1m.
Could this be related to how many entities it found and is trying to resolve?
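For reference, I think two different Elasticsearch limits are involved here: script.max_compilations_rate (the one I already raised to 1000/1m) and script.max_size_in_bytes, which defaults to 65535 and is the cap the 400 above is hitting, because the update script inlines the whole content_with_weight. A rough sketch of bumping them with the Python client (elasticsearch-py 8.x style; the host, credentials, and the 500000 value are just placeholders for my setup, and on ES versions where max_size_in_bytes is not dynamic it has to go into elasticsearch.yml plus a restart instead):

from elasticsearch import Elasticsearch

# Placeholders: point this at the Elasticsearch instance RAGFlow uses.
es = Elasticsearch("http://localhost:9200", basic_auth=("elastic", "<password>"))

# What I already raised: allowed script compilations per window (dynamic setting).
es.cluster.put_settings(persistent={"script.max_compilations_rate": "1000/1m"})

# The limit the 400 error is hitting: max inline script size, default 65535 bytes.
# 500000 is an arbitrary example value; may need elasticsearch.yml + restart
# on versions where this setting is not dynamic.
es.cluster.put_settings(persistent={"script.max_size_in_bytes": 500000})

print(es.cluster.get_settings(include_defaults=True, flat_settings=True))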
3) Tasks seem to be stuck at the very last step after entity resolution for a really long time, an hour or more for example:
18:57:39 Entities extraction progress ... 46/47 (8962 tokens)
18:57:39 Entities extraction progress ... 47/47 (9589 tokens)
and then sometimes it fails.
Thank you.
Expected behavior
No response
Steps to reproduce
As described above: parse documents with RAPTOR and entity resolution enabled.
Additional information
No response