
[Question]: Clarification Needed on Top N and Top K Parameters in Chat->Prompt Engine Configuration #6456

Open
@maqy1995

Description


Self Checks

  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report (Language Policy).
  • Non-English title submissions will be closed directly (Language Policy).
  • Please do not modify this template :) and fill in all the required fields.

Describe your problem

I am confused by the Top N and Top K parameters.
According to the Chat Configuration -> Prompt Engine settings, Top N is the number of chunks returned by the retriever, while Top K, which is associated with the Rerank model, means K chunks will be sent to the rerank model.
However, from the code below, it looks like the number of retrieved chunks is actually min(RERANK_LIMIT = 64, Top K), and that all of the retrieved chunks are fed into the rerank model?

kbinfos = retriever.retrieval(
    " ".join(questions),
    embd_mdl,
    tenant_ids,
    dialog.kb_ids,
    1,
    dialog.top_n,
    dialog.similarity_threshold,
    dialog.vector_similarity_weight,
    doc_ids=attachments,
    top=dialog.top_k,
    aggs=False,
    rerank_mdl=rerank_mdl,
    rank_feature=label_question(" ".join(questions), kbs),
)
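
For clarity, this is how I read the binding between that call and the retrieval() signature shown below (my assumption, based purely on argument order):

    dialog.top_n  -> page_size  (sixth positional argument of the call)
    dialog.top_k  -> top        (keyword argument, passed on as "topk" in the search request)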

And here Top N is assigned to page_size:

ragflow/rag/nlp/search.py

Lines 342 to 367 in 200b6f5

def retrieval(self, question, embd_mdl, tenant_ids, kb_ids, page, page_size, similarity_threshold=0.2,
              vector_similarity_weight=0.3, top=1024, doc_ids=None, aggs=True,
              rerank_mdl=None, highlight=False,
              rank_feature: dict | None = {PAGERANK_FLD: 10}):
    ranks = {"total": 0, "chunks": [], "doc_aggs": {}}
    if not question:
        return ranks
    RERANK_LIMIT = 64
    req = {"kb_ids": kb_ids, "doc_ids": doc_ids, "page": page, "size": RERANK_LIMIT,
           "question": question, "vector": True, "topk": top,
           "similarity": similarity_threshold,
           "available_int": 1}
    if isinstance(tenant_ids, str):
        tenant_ids = tenant_ids.split(",")
    sres = self.search(req, [index_name(tid) for tid in tenant_ids],
                       kb_ids, embd_mdl, highlight, rank_feature=rank_feature)
    ranks["total"] = sres.total
    if rerank_mdl and sres.total > 0:
        sim, tsim, vsim = self.rerank_by_model(rerank_mdl,
                                               sres, question, 1 - vector_similarity_weight,
                                               vector_similarity_weight,
                                               rank_feature=rank_feature)

A similar question about the retrieve_testing module: #4797
