Inquiry: Implementing a Multilingual RAG Workflow (Persian & English) with Custom Preprocessed Context #30127

sharpdima · 2025-12-24T18:14:57Z

Self Checks

I have searched for existing issues, including closed ones.
I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
[FOR CHINESE USERS] Please be sure to submit issues in English, otherwise they will be closed. Thank you! :)
Please do not modify this template :) and fill in all the required fields.

Content

Hi Dify Team,

I am developing an academic assistant called "Mora" and I have a specific architectural question regarding Workflow design.

The Goal: I want to create a workflow that queries two separate knowledge bases—one containing Persian documents and the other English documents—simultaneously. The final response should be a synthesized answer that extracts and translates the best insights from both languages to provide a comprehensive result to the user.

Our Preprocessing Methodology: To ensure high retrieval accuracy, we use a unique "Four-Column Preprocessing" strategy for all our documents (both languages):

Executive Summary: A concise overview of the segment.

Structural Headings: Metadata mapping the document's hierarchy (Heading 1, 2, etc.) to maintain context.

Keywords: Core concepts and terms extracted for high-precision vector search.

Original Text: The raw, high-fidelity content for final reasoning.

The Question: In Dify Workflows, what is the best approach to "merge" or "rank" findings from these two multilingual sources? Should I use a Multiple Retrieval node followed by a Template Transformer, or is there a more efficient way to let the LLM reason across these multi-layered (four-column) segments from different languages?

Looking forward to your professional guidance.

dosubot[bot] (bot) · 2025-12-24T18:20:43Z

Dify supports querying multiple knowledge bases (such as your Persian and English ones) simultaneously using the Knowledge Retrieval node in "multiWay" mode. This mode merges and reranks results from both sources automatically: select both datasets and enable reranking for best relevance. You can also use metadata filtering to restrict retrieval to specific documents or segments, which may help you target parts of your four-column structure. The output is an array of objects (each with content and metadata) that you can process further downstream.

If you need more control (for example, custom merging, translation, or formatting), use a Code or Template Transformer node after retrieval. These nodes let you script how to combine, translate, or synthesize the results before passing them to the LLM for final reasoning. Translation isn't handled natively in retrieval or merging, so add a translation step (via LLM or code) after merging and before synthesis if needed.
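As a rough illustration of the formatting step described above, here is a minimal sketch of a Python Code node that turns retrieved chunks into one labelled context string for the LLM. It assumes the Code node convention of a `main` function that returns a dict, and that each chunk is an object with `content` and `metadata` keys. The `language` metadata field, the `document_name` key, and the wording of the instructions are assumptions for illustration, not confirmed Dify behavior.

```python
# Sketch of a Code node body. Field names below are assumptions and
# should be checked against the actual retrieval output in your workflow.

def format_chunk(chunk: dict, index: int) -> str:
    """Render one retrieved chunk as a labelled block of text."""
    meta = chunk.get("metadata") or {}
    language = meta.get("language", "unknown")          # hypothetical field
    source = meta.get("document_name", "unknown source")  # assumed key
    content = str(chunk.get("content", "")).strip()
    return f"[{index}] language={language} source={source}\n{content}"


def main(chunks: list) -> dict:
    """Build a single context string from a list of retrieved chunks."""
    blocks = [format_chunk(c, i) for i, c in enumerate(chunks, start=1)]
    instructions = (
        "Sources may be in Persian or English. "
        "Translate what you use into the user's language and "
        "cite sources by their bracketed number."
    )
    return {"context": instructions + "\n\n" + "\n\n".join(blocks)}
```

The returned `context` value could then be referenced in the LLM node's prompt, so that translation and synthesis happen in a single reasoning step rather than in a separate translation node.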

The Variable Aggregator node does not merge multiple retrieval results by default; use a Code node to concatenate arrays if you want to combine outputs from separate retrieval nodes. Dify is also working on features to make multi-source context binding even easier in the future.
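The concatenation mentioned above could look roughly like the following Code node sketch, which takes the outputs of two separate Knowledge Retrieval nodes, tags each item with its language, drops exact duplicates, and sorts by score. The input names `persian_results` and `english_results`, the added `language` tag, and the presence of a `score` key inside `metadata` are assumptions for illustration.

```python
# Sketch of a Code node that merges two retrieval result arrays.
# Input variable names and metadata keys are assumptions.

def tag(results, language: str) -> list:
    """Copy each result and record which knowledge base it came from."""
    tagged = []
    for item in results or []:
        meta = dict(item.get("metadata") or {})
        meta["language"] = language  # hypothetical tag added for downstream use
        tagged.append({**item, "metadata": meta})
    return tagged


def main(persian_results: list, english_results: list) -> dict:
    merged = tag(persian_results, "fa") + tag(english_results, "en")

    # Drop exact duplicates (same language and same content).
    seen = set()
    unique = []
    for item in merged:
        key = (item["metadata"]["language"], item.get("content", ""))
        if key in seen:
            continue
        seen.add(key)
        unique.append(item)

    # Highest score first; items without a score sort last.
    unique.sort(key=lambda it: it["metadata"].get("score") or 0.0, reverse=True)
    return {"result": unique}
```

Scores produced by two independent retrievals are not necessarily comparable across languages, so sorting by raw score is only a starting point. Passing the merged list through a reranking step, or simply keeping the top few items per language, may give a more balanced context.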

For your workflow: use multiWay retrieval with reranking for simplicity, or combine retrieval nodes with a Code/Template node for custom logic and translation. Metadata filtering can help you target the right segments in your four-column structure.
