feat: Add support for custom Q&A in the knowledge base #10873 #10874

Conversation

dajianguo
Contributor

@dajianguo dajianguo commented Nov 20, 2024

Summary

I need to import existing Q&A text without having the LLM generate the pairs for me.
I modified the code to support uploading Q&A files in Excel and CSV format.
The processing logic is: when a CSV or Excel file has exactly two columns and Q&A mode is selected, the LLM is not called.
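
The two-column rule described above could be sketched roughly as follows. This is not the code from the PR: `read_qa_pairs` and `should_call_llm` are hypothetical names, only CSV is covered (Excel would need a spreadsheet reader), and header rows are not handled.

```python
import csv
import io
from typing import Optional, TextIO


def read_qa_pairs(stream: TextIO) -> Optional[list[tuple[str, str]]]:
    """Return (question, answer) pairs if every non-empty row has exactly two columns.

    Returns None when the input does not look like a two-column Q&A sheet,
    which a caller could treat as "fall back to LLM generation".
    """
    pairs: list[tuple[str, str]] = []
    for row in csv.reader(stream):
        if not any(cell.strip() for cell in row):
            continue  # skip blank rows
        if len(row) != 2:
            return None  # not a plain question/answer layout
        question, answer = (cell.strip() for cell in row)
        pairs.append((question, answer))
    return pairs or None


def should_call_llm(qa_mode: bool, pairs: Optional[list[tuple[str, str]]]) -> bool:
    # Hypothetical helper: the LLM is only needed in Q&A mode
    # when no ready-made pairs were found in the file.
    return qa_mode and pairs is None


if __name__ == "__main__":
    sample = io.StringIO("What is Dify?,An LLM app platform\nIs it open source?,Yes\n")
    pairs = read_qa_pairs(sample)
    print(pairs, should_call_llm(True, pairs))
```

Returning `None` rather than an empty list keeps "no usable pairs" distinct from "pairs found", so the caller makes the decision about the LLM in one place.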

Resolves #4664
Resolves #6904
Resolves #7735
Resolves #7430
Resolves #10873

Screenshots

Checklist

Important

Please review the checklist below before submitting your pull request.

  • This change requires a documentation update, included: Dify Document
  • I understand that this PR may be closed in case there was no previous discussion or issues. (This doesn't apply to typos!)
  • I've added a test for each change that was introduced, and I tried as much as possible to make a single atomic change.
  • I've updated the documentation accordingly.
  • I ran dev/reformat(backend) and cd web && npx lint-staged(frontend) to appease the lint gods

@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. 📚 feat:datasource Data sources like web, Notion, Logseq, Lark, Docs labels Nov 20, 2024
@crazywoola crazywoola requested a review from JohnJyong November 20, 2024 04:22
response = LLMGenerator.generate_qa_document(
current_user.current_tenant_id, preview_texts[0], doc_language
)
if "Q00001:" in preview_texts[0] and "A00001:" in preview_texts[0]:
Member

This one doesn't seem very generic.

Contributor Author

Do you mean the separators "Q00001:" and "A00001:"? I can change them to a more generic separator.

Contributor Author

image
image
I referred to the format_split_text method.

Contributor Author

It is only a runtime delimiter variable and is not actually stored.
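
For illustration, a numbered-marker check that is not tied to the literal "Q00001:" / "A00001:" strings might look like the sketch below. It is an assumption about how such markers could be parsed, not the actual format_split_text implementation; `QA_PATTERN`, `looks_like_qa_text`, and `split_qa_text` are hypothetical names.

```python
import re

# Matches "Q<number>: question A<number>: answer" blocks; the answer runs
# until the next "Q<number>:" marker or the end of the text.
QA_PATTERN = re.compile(r"Q\d+:\s*(.*?)\s*A\d+:\s*(.*?)\s*(?=Q\d+:|\Z)", re.DOTALL)


def looks_like_qa_text(text: str) -> bool:
    # True if at least one question/answer block is present,
    # whatever number the markers carry.
    return QA_PATTERN.search(text) is not None


def split_qa_text(text: str) -> list[dict[str, str]]:
    # Collect every block as a question/answer dict, dropping empty ones.
    return [
        {"question": q.strip(), "answer": a.strip()}
        for q, a in QA_PATTERN.findall(text)
        if q.strip() and a.strip()
    ]


if __name__ == "__main__":
    text = "Q00001: What is Dify?\nA00001: An LLM platform.\nQ00002: Is it free?\nA00002: Yes."
    print(looks_like_qa_text(text))
    print(split_qa_text(text))
```

A pattern like this still treats any "Q<number>:" inside an answer as the start of a new block, so it is only as reliable as the delimiter convention itself.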

Member

@AkisAya
Contributor

AkisAya commented Nov 22, 2024

This should not be done by implicitly switching the LLM Q&A mode to plain template-based Q&A extraction.
I think the user should explicitly choose to disable LLM Q&A mode in the UI when importing a file.

something like this
Snipaste_2024-11-22_14-21-38

And if the user chooses to generate Q&A pairs from a template, the UI shows a hint describing the expected template for the file's extension.

image
image

I think you shouldn't modify the create_documents function. Instead, create a separate splitter, QASplitter.
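
A separate splitter along these lines could look roughly like the sketch below. It is only an illustration of the suggestion: the `QADocument` type, the `split_rows` method, and the choice to keep the answer in `metadata["answer"]` are assumptions, not Dify's actual classes or storage layout.

```python
from dataclasses import dataclass, field
from typing import Iterable, Sequence


@dataclass
class QADocument:
    # Hypothetical stand-in for a document object: the question is the
    # indexed content, the answer travels along as metadata.
    page_content: str
    metadata: dict = field(default_factory=dict)


class QASplitter:
    """Turns ready-made (question, answer) rows into documents without an LLM."""

    def split_rows(self, rows: Iterable[Sequence[str]]) -> list[QADocument]:
        documents: list[QADocument] = []
        for index, row in enumerate(rows):
            if len(row) != 2:
                raise ValueError(f"row {index} has {len(row)} columns, expected 2")
            question, answer = row[0].strip(), row[1].strip()
            if not question or not answer:
                continue  # skip incomplete pairs rather than index them
            documents.append(QADocument(page_content=question, metadata={"answer": answer}))
        return documents


if __name__ == "__main__":
    docs = QASplitter().split_rows([["What is Dify?", "An LLM app platform"], ["", "orphan answer"]])
    print(docs)
```

Keeping this logic in its own class leaves create_documents unchanged and makes the template path something the caller opts into explicitly.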

@crazywoola
Member

I discussed this with @JohnJyong, and we decided not to merge this PR. If you have any other questions, please feel free to contact @JohnJyong.

@crazywoola crazywoola closed this Nov 23, 2024