
GO: implement PUT /api/v1/datasets/:dataset_id #14844

Draft
buua436 wants to merge 1 commit into infiniflow:main from buua436:go003

Conversation

@buua436 (Contributor) commented May 12, 2026

What problem does this PR solve?

Implement PUT /api/v1/datasets/:dataset_id

Type of change

  • Refactoring

@buua436 added the ci Continue Integration label May 12, 2026
@coderabbitai (Bot) commented May 12, 2026

📝 Walkthrough

This PR introduces dataset update functionality via a new PUT /api/v1/datasets/:dataset_id endpoint. Changes span a new DAO for connector mappings, service-layer validation and persistence logic, REST handler, and route registration to support updating dataset metadata, embedded models, parser configs, and connector associations.
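
For orientation, here is a rough sketch of what the two request types mentioned below might look like in Go. The field names, JSON tags, and pointer-based optionality are assumptions inferred from this walkthrough, not copied from the diff; the authoritative definitions live in internal/service/datasets.go.

// Hypothetical sketch of the request types; names and tags are illustrative only.
type DatasetConnectorRequest struct {
	ConnectorID string `json:"connector_id"` // assumed: which connector to map
	AutoParse   bool   `json:"auto_parse"`   // assumed: mirrors the auto_parse column
}

type UpdateDatasetRequest struct {
	Name           *string                   `json:"name,omitempty"`
	Avatar         *string                   `json:"avatar,omitempty"`
	Description    *string                   `json:"description,omitempty"`
	EmbeddingModel *string                   `json:"embedding_model,omitempty"`
	Pagerank       *int                      `json:"pagerank,omitempty"`
	ParserConfig   map[string]interface{}    `json:"parser_config,omitempty"`
	Connectors     []DatasetConnectorRequest `json:"connectors,omitempty"`
}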

Changes

Dataset Update Feature

  • Connector2Kb Data Access (internal/dao/connector2kb.go): New Connector2KbDAO provides GORM-scoped CRUD: delete/list by dataset ID, update auto_parse, delete by dataset+connector pair, and bulk/single create for mappings.
  • Service Types and Dependencies (internal/service/datasets.go): Extended DatasetsService with docEngine and engineType fields; added DatasetConnectorRequest and UpdateDatasetRequest types for structured updates.
  • UpdateDataset Service Implementation (internal/service/datasets.go): Core method validates fields (name, avatar, description, embedding model, pagerank by engine type), coerces updates from request and overrides, updates doc-engine pagerank, applies DB changes in a transaction, and reconciles connector mappings.
  • HTTP Handler and Route Registration (internal/handler/datasets.go, internal/router/router.go): Added UpdateDataset handler with user auth and JSON binding; registered protected PUT /api/v1/datasets/:dataset_id route.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • infiniflow/ragflow#14834: Adds listing and join logic for connector–knowledgebase mappings that this PR manages via Connector2KbDAO.

Suggested labels

size:L

Suggested reviewers

  • JinHai-CN

Poem

🐰 A dataset's fate now rests in REST,
PUT it update, let it pass the test.
Connectors dance in mappings new,
Doc-engine ranks with pagerank true.
From DAO deep to handler high,
The update flow will never die! 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Description check (⚠️ Warning): The PR description is incomplete. It only contains a minimal heading and checkbox without addressing the template requirement to 'briefly describe what this PR aims to solve' with background context. Resolution: Expand the description to explain the problem being solved, the business value, and how this endpoint will be used, not just restate the title.
✅ Passed checks (4 passed)

  • Title check: The title accurately describes the main change: implementing a PUT endpoint for dataset updates, which aligns with the addition of the UpdateDataset handler, service logic, and router registration.
  • Docstring Coverage: Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%.
  • Linked Issues check: Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: Check skipped because no linked issues were found for this pull request.


@buua436 marked this pull request as ready for review May 12, 2026 11:13
@dosubot (Bot) added the size:XL label (This PR changes 500-999 lines, ignoring generated files.) May 12, 2026

@coderabbitai (Bot) left a comment

Actionable comments posted: 1

🧹 Nitpick comments (3)
internal/service/datasets.go (2)

484-485: ⚡ Quick win

Consider accepting context.Context as a parameter.

The UpdateDataset method uses context.Background() at line 781 for the docEngine call. Accepting a context from the caller would allow proper propagation of cancellations and timeouts from the HTTP request.

♻️ Accept context as parameter
-func (s *DatasetsService) UpdateDataset(datasetID, tenantID string, req *UpdateDatasetRequest) (map[string]interface{}, common.ErrorCode, error) {
+func (s *DatasetsService) UpdateDataset(ctx context.Context, datasetID, tenantID string, req *UpdateDatasetRequest) (map[string]interface{}, common.ErrorCode, error) {

Then update the docEngine call:

-			if err := s.docEngine.UpdateDataset(context.Background(), condition, newValue, indexName, datasetID); err != nil {
+			if err := s.docEngine.UpdateDataset(ctx, condition, newValue, indexName, datasetID); err != nil {

And update the handler call in internal/handler/datasets.go:

-	result, code, err := h.datasetsService.UpdateDataset(datasetID, user.ID, &req)
+	result, code, err := h.datasetsService.UpdateDataset(c.Request.Context(), datasetID, user.ID, &req)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/service/datasets.go` around lines 484-485: update the
UpdateDataset method to accept a context.Context parameter (change the signature
of DatasetsService.UpdateDataset to include ctx context.Context) and use that
ctx when calling docEngine instead of context.Background(); propagate the new
ctx through any internal helper calls within UpdateDataset. Then update the
corresponding handler in internal/handler/datasets.go to pass the request
context (c.Request.Context()) into DatasetsService.UpdateDataset. Ensure the
function names referenced are UpdateDataset and docEngine and update any
compile-time call sites to use the new signature.

484-890: 🏗️ Heavy lift

Extract helper methods to reduce complexity.

The UpdateDataset method is ~400 lines and handles multiple concerns: validation, field extraction, parser config processing, and persistence. Consider extracting helpers like validateUpdateFields, processParserConfig, and reconcileConnectors to improve readability and testability.
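
One possible shape for that split, sketched with stub bodies; the exact parameter and return lists are illustrative assumptions and would follow the existing types in datasets.go:

// validateUpdateFields would hold the trimming, size-limit, embedding-model,
// and pagerank/engine-type checks, returning the column updates to apply.
func (s *DatasetsService) validateUpdateFields(req *UpdateDatasetRequest, kb *entity.Knowledgebase) (map[string]interface{}, error) {
	updates := map[string]interface{}{}
	// ... name/avatar/description validation, hasChanges detection ...
	return updates, nil
}

// processParserConfig would merge the requested parser config with defaults
// and the stored kb.ParserConfig, returning the final method and config.
func (s *DatasetsService) processParserConfig(parserID, chunkMethod string, parserConfig map[string]interface{}, kb *entity.Knowledgebase) (string, map[string]interface{}, error) {
	// ... parent_child handling, defaults, size validation, merging ...
	return parserID, parserConfig, nil
}

// reconcileConnectors would diff the requested connector list against the
// stored connector2kb rows inside the caller's transaction.
func (s *DatasetsService) reconcileConnectors(tx *gorm.DB, datasetID, tenantID string, connectors []DatasetConnectorRequest) error {
	// ... create/update/delete mappings via Connector2KbDAO ...
	return nil
}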

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/service/datasets.go` around lines 484-890: UpdateDataset is too
large and mixes validation, parser config handling, and connector
reconciliation; extract smaller helper functions to reduce complexity: implement
validateUpdateFields(req *UpdateDatasetRequest, kb *entity.Knowledgebase)
(name,avatar,description,embeddingModel,permission,language,chunkMethod,parserID,*pagerank,pipelineID,parserConfig,connectors,parserConfigProvided,extChanges,err)
to contain all input trimming/validation logic (existing name checks, size
limits, embedding verification, pagerank engine check, hasChanges detection) and
return cleaned values and error; implement
processParserConfig(parserID,chunkMethod string, parserConfig
map[string]interface{}, autoMeta *AutoMetadataConfig, kb *entity.Knowledgebase)
(finalParserMethod string, mergedParserConfig map[string]interface{}, err) to
encapsulate parent_child handling, ext merging, defaults via
common.GetParserConfig, validateDatasetParserConfigSize and merging with
kb.ParserConfig; implement reconcileConnectors(tx *gorm.DB, datasetID, tenantID
string, connectors []DatasetConnectorRequest, s *DatasetsService) error to
encapsulate listing existing mappings, validation of connector ownership, dedup
checks, creating/updating/deleting connector2kb rows (including UpdateAutoParse
and DeleteByDatasetIDAndConnectorID calls) and be invoked inside the DAO
transaction instead of inlining that logic; wire these helpers into
UpdateDataset so the main function builds updates map, calls the helpers, runs a
small transaction that updates Knowledgebase and calls reconcileConnectors, and
then returns datasetToMap(updatedKB).
internal/dao/connector2kb.go (1)

33-69: ⚖️ Poor tradeoff

Add defensive validation for transaction and ID parameters.

None of the DAO methods validate that tx is non-nil or that ID parameters are non-empty. This can lead to nil pointer dereferences or unexpected database behavior.

🛡️ Example: Add validation to DeleteByDatasetID
 func (dao *Connector2KbDAO) DeleteByDatasetID(tx *gorm.DB, datasetID string) error {
+	if tx == nil {
+		return errors.New("transaction cannot be nil")
+	}
+	if strings.TrimSpace(datasetID) == "" {
+		return errors.New("datasetID cannot be empty")
+	}
 	return tx.Where("kb_id = ?", datasetID).Delete(&entity.Connector2Kb{}).Error
 }
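
Applied to the bulk path, the same guard might look like the sketch below; the existing CreateMany body is assumed to be a plain tx.Create over the slice, so only the added checks are the point here:

// Sketch: the same defensive pattern applied to CreateMany.
func (dao *Connector2KbDAO) CreateMany(tx *gorm.DB, mappings []*entity.Connector2Kb) error {
	if tx == nil {
		return errors.New("transaction cannot be nil")
	}
	if len(mappings) == 0 {
		return errors.New("mappings cannot be empty")
	}
	for _, m := range mappings {
		if m == nil {
			return errors.New("mapping cannot be nil")
		}
	}
	return tx.Create(&mappings).Error // GORM batch-inserts the slice
}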
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/dao/connector2kb.go` around lines 33-69: the DAO methods lack
defensive checks for a nil transaction and empty/invalid IDs or payloads which
can cause panics or bad SQL; update DeleteByDatasetID, ListByDatasetID,
UpdateAutoParse, DeleteByDatasetIDAndConnectorID, Create, and CreateMany to
first validate that tx != nil and return a descriptive error if nil, validate
string IDs (datasetID, connectorID, autoParse where applicable) are non-empty
and return an error for invalid params, and validate payloads for Create
(mapping != nil) and CreateMany (mappings != nil and length > 0) before calling
tx methods so callers get early, clear errors instead of nil-pointer/DB issues.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 1b7db287-362a-4ae7-911c-f69972ad91b0

📥 Commits

Reviewing files that changed from the base of the PR and between 3f41f8c and 9b87651.

📒 Files selected for processing (4)
  • internal/dao/connector2kb.go
  • internal/handler/datasets.go
  • internal/router/router.go
  • internal/service/datasets.go

Comment on lines +767 to +784
	if pagerank != nil {
		updates["pagerank"] = *pagerank
		if *pagerank != kb.Pagerank {
			indexName := fmt.Sprintf("ragflow_%s", kb.TenantID)
			if s.docEngine == nil {
				return nil, common.CodeServerError, errors.New("Document engine not initialized")
			}
			condition := map[string]interface{}{"kb_id": datasetID}
			newValue := map[string]interface{}{}
			if *pagerank > 0 {
				newValue["pagerank_fea"] = *pagerank
			} else {
				newValue["remove"] = map[string]interface{}{"pagerank_fea": true}
			}
			if err := s.docEngine.UpdateDataset(context.Background(), condition, newValue, indexName, datasetID); err != nil {
				return nil, common.CodeServerError, fmt.Errorf("failed to update pagerank in doc engine: %w", err)
			}
		}

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Move docEngine.UpdateDataset inside the database transaction to ensure consistency.

The docEngine.UpdateDataset call executes before the database transaction begins (line 798). If the transaction fails or rolls back, the document engine will contain stale data that doesn't match the database state, causing data inconsistency.

🔧 Move the docEngine update inside the transaction
 	if pagerank != nil {
 		updates["pagerank"] = *pagerank
-		if *pagerank != kb.Pagerank {
-			indexName := fmt.Sprintf("ragflow_%s", kb.TenantID)
-			if s.docEngine == nil {
-				return nil, common.CodeServerError, errors.New("Document engine not initialized")
-			}
-			condition := map[string]interface{}{"kb_id": datasetID}
-			newValue := map[string]interface{}{}
-			if *pagerank > 0 {
-				newValue["pagerank_fea"] = *pagerank
-			} else {
-				newValue["remove"] = map[string]interface{}{"pagerank_fea": true}
-			}
-			if err := s.docEngine.UpdateDataset(context.Background(), condition, newValue, indexName, datasetID); err != nil {
-				return nil, common.CodeServerError, fmt.Errorf("failed to update pagerank in doc engine: %w", err)
-			}
-		}
 	}
 	if pipelineID != nil {
 		updates["pipeline_id"] = *pipelineID
@@ -797,6 +782,24 @@

 	err = dao.DB.Transaction(func(tx *gorm.DB) error {
+		if pagerank != nil && *pagerank != kb.Pagerank {
+			indexName := fmt.Sprintf("ragflow_%s", kb.TenantID)
+			if s.docEngine == nil {
+				return errors.New("Document engine not initialized")
+			}
+			condition := map[string]interface{}{"kb_id": datasetID}
+			newValue := map[string]interface{}{}
+			if *pagerank > 0 {
+				newValue["pagerank_fea"] = *pagerank
+			} else {
+				newValue["remove"] = map[string]interface{}{"pagerank_fea": true}
+			}
+			if err := s.docEngine.UpdateDataset(context.Background(), condition, newValue, indexName, datasetID); err != nil {
+				return fmt.Errorf("failed to update pagerank in doc engine: %w", err)
+			}
+		}
+
 		if err := tx.Model(&entity.Knowledgebase{}).Where("id = ? AND tenant_id = ? AND status = ?", datasetID, tenantID, string(entity.StatusValid)).Updates(updates).Error; err != nil {
 			return err
 		}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/service/datasets.go` around lines 767-784: the
docEngine.UpdateDataset call currently runs before the DB transaction and can
create inconsistency on rollback; move the s.docEngine.UpdateDataset(...) call
(the call using indexName and datasetID and the pagerank-dependent newValue)
into the database transaction scope so it executes only after the DB update
succeeds and before committing the transaction, and if UpdateDataset returns an
error propagate it so the transaction can roll back; ensure you still set
updates["pagerank"] = *pagerank and preserve the same newValue/remove logic
around pagerank_fea.

@buua436 marked this pull request as draft May 13, 2026 04:01

Labels

ci Continue Integration · size:XL (This PR changes 500-999 lines, ignoring generated files.)
