GO: implement PUT /api/v1/datasets/:dataset_id #14844
Conversation
📝 Walkthrough: This PR introduces dataset update functionality via a new dataset update endpoint.

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: 4 passed, 1 warning.
Actionable comments posted: 1
🧹 Nitpick comments (3)
internal/service/datasets.go (2)
484-485: ⚡ Quick win: Consider accepting context.Context as a parameter.

The `UpdateDataset` method uses `context.Background()` at line 781 for the docEngine call. Accepting a context from the caller would allow proper propagation of cancellations and timeouts from the HTTP request.

♻️ Accept context as parameter:

```diff
-func (s *DatasetsService) UpdateDataset(datasetID, tenantID string, req *UpdateDatasetRequest) (map[string]interface{}, common.ErrorCode, error) {
+func (s *DatasetsService) UpdateDataset(ctx context.Context, datasetID, tenantID string, req *UpdateDatasetRequest) (map[string]interface{}, common.ErrorCode, error) {
```

Then update the docEngine call:

```diff
-	if err := s.docEngine.UpdateDataset(context.Background(), condition, newValue, indexName, datasetID); err != nil {
+	if err := s.docEngine.UpdateDataset(ctx, condition, newValue, indexName, datasetID); err != nil {
```

And update the handler call in `internal/handler/datasets.go`:

```diff
-	result, code, err := h.datasetsService.UpdateDataset(datasetID, user.ID, &req)
+	result, code, err := h.datasetsService.UpdateDataset(c.Request.Context(), datasetID, user.ID, &req)
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/service/datasets.go` around lines 484-485: update the UpdateDataset method to accept a context.Context parameter (change the signature of DatasetsService.UpdateDataset to include ctx context.Context) and use that ctx when calling docEngine instead of context.Background(); propagate the new ctx through any internal helper calls within UpdateDataset. Then update the corresponding handler in internal/handler/datasets.go to pass the request context into DatasetsService.UpdateDataset, and update any compile-time call sites to use the new signature.
484-890: 🏗️ Heavy lift: Extract helper methods to reduce complexity.

The `UpdateDataset` method is ~400 lines and handles multiple concerns: validation, field extraction, parser config processing, and persistence. Consider extracting helpers like `validateUpdateFields`, `processParserConfig`, and `reconcileConnectors` to improve readability and testability.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/service/datasets.go` around lines 484-890: UpdateDataset is too large and mixes validation, parser config handling, and connector reconciliation; extract smaller helper functions to reduce complexity. Implement validateUpdateFields(req *UpdateDatasetRequest, kb *entity.Knowledgebase) to contain all input trimming/validation logic (existing name checks, size limits, embedding verification, pagerank engine check, hasChanges detection) and return the cleaned values (name, avatar, description, embeddingModel, permission, language, chunkMethod, parserID, pagerank, pipelineID, parserConfig, connectors, parserConfigProvided, extChanges) and an error. Implement processParserConfig(parserID, chunkMethod string, parserConfig map[string]interface{}, autoMeta *AutoMetadataConfig, kb *entity.Knowledgebase) (finalParserMethod string, mergedParserConfig map[string]interface{}, err error) to encapsulate parent_child handling, ext merging, defaults via common.GetParserConfig, validateDatasetParserConfigSize, and merging with kb.ParserConfig. Implement reconcileConnectors(tx *gorm.DB, datasetID, tenantID string, connectors []DatasetConnectorRequest, s *DatasetsService) error to encapsulate listing existing mappings, validation of connector ownership, dedup checks, and creating/updating/deleting connector2kb rows (including UpdateAutoParse and DeleteByDatasetIDAndConnectorID calls), invoked inside the DAO transaction instead of inlining that logic. Wire these helpers into UpdateDataset so the main function builds the updates map, calls the helpers, runs a small transaction that updates Knowledgebase and calls reconcileConnectors, and then returns datasetToMap(updatedKB).

internal/dao/connector2kb.go (1)
33-69: ⚖️ Poor tradeoff: Add defensive validation for transaction and ID parameters.

None of the DAO methods validate that `tx` is non-nil or that ID parameters are non-empty. This can lead to nil pointer dereferences or unexpected database behavior.

🛡️ Example: Add validation to DeleteByDatasetID

```diff
 func (dao *Connector2KbDAO) DeleteByDatasetID(tx *gorm.DB, datasetID string) error {
+	if tx == nil {
+		return errors.New("transaction cannot be nil")
+	}
+	if strings.TrimSpace(datasetID) == "" {
+		return errors.New("datasetID cannot be empty")
+	}
 	return tx.Where("kb_id = ?", datasetID).Delete(&entity.Connector2Kb{}).Error
 }
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/dao/connector2kb.go` around lines 33 - 69, The DAO methods lack defensive checks for a nil transaction and empty/invalid IDs or payloads which can cause panics or bad SQL; update DeleteByDatasetID, ListByDatasetID, UpdateAutoParse, DeleteByDatasetIDAndConnectorID, Create, and CreateMany to first validate that tx != nil and return a descriptive error if nil, validate string IDs (datasetID, connectorID, autoParse where applicable) are non-empty and return an error for invalid params, and validate payloads for Create (mapping != nil) and CreateMany (mappings != nil and length > 0) before calling tx methods so callers get early, clear errors instead of nil-pointer/DB issues.
📒 Files selected for processing (4)

- internal/dao/connector2kb.go
- internal/handler/datasets.go
- internal/router/router.go
- internal/service/datasets.go
```go
if pagerank != nil {
	updates["pagerank"] = *pagerank
	if *pagerank != kb.Pagerank {
		indexName := fmt.Sprintf("ragflow_%s", kb.TenantID)
		if s.docEngine == nil {
			return nil, common.CodeServerError, errors.New("Document engine not initialized")
		}
		condition := map[string]interface{}{"kb_id": datasetID}
		newValue := map[string]interface{}{}
		if *pagerank > 0 {
			newValue["pagerank_fea"] = *pagerank
		} else {
			newValue["remove"] = map[string]interface{}{"pagerank_fea": true}
		}
		if err := s.docEngine.UpdateDataset(context.Background(), condition, newValue, indexName, datasetID); err != nil {
			return nil, common.CodeServerError, fmt.Errorf("failed to update pagerank in doc engine: %w", err)
		}
	}
```
Move docEngine.UpdateDataset inside the database transaction to ensure consistency.
The docEngine.UpdateDataset call executes before the database transaction begins (line 798). If the transaction fails or rolls back, the document engine will contain stale data that doesn't match the database state, causing data inconsistency.
🔧 Move the docEngine update inside the transaction

```diff
 if pagerank != nil {
 	updates["pagerank"] = *pagerank
-	if *pagerank != kb.Pagerank {
-		indexName := fmt.Sprintf("ragflow_%s", kb.TenantID)
-		if s.docEngine == nil {
-			return nil, common.CodeServerError, errors.New("Document engine not initialized")
-		}
-		condition := map[string]interface{}{"kb_id": datasetID}
-		newValue := map[string]interface{}{}
-		if *pagerank > 0 {
-			newValue["pagerank_fea"] = *pagerank
-		} else {
-			newValue["remove"] = map[string]interface{}{"pagerank_fea": true}
-		}
-		if err := s.docEngine.UpdateDataset(context.Background(), condition, newValue, indexName, datasetID); err != nil {
-			return nil, common.CodeServerError, fmt.Errorf("failed to update pagerank in doc engine: %w", err)
-		}
-	}
 }
 if pipelineID != nil {
 	updates["pipeline_id"] = *pipelineID
@@ -797,6 +782,24 @@
 err = dao.DB.Transaction(func(tx *gorm.DB) error {
+	if pagerank != nil && *pagerank != kb.Pagerank {
+		indexName := fmt.Sprintf("ragflow_%s", kb.TenantID)
+		if s.docEngine == nil {
+			return errors.New("Document engine not initialized")
+		}
+		condition := map[string]interface{}{"kb_id": datasetID}
+		newValue := map[string]interface{}{}
+		if *pagerank > 0 {
+			newValue["pagerank_fea"] = *pagerank
+		} else {
+			newValue["remove"] = map[string]interface{}{"pagerank_fea": true}
+		}
+		if err := s.docEngine.UpdateDataset(context.Background(), condition, newValue, indexName, datasetID); err != nil {
+			return fmt.Errorf("failed to update pagerank in doc engine: %w", err)
+		}
+	}
+
 	if err := tx.Model(&entity.Knowledgebase{}).Where("id = ? AND tenant_id = ? AND status = ?", datasetID, tenantID, string(entity.StatusValid)).Updates(updates).Error; err != nil {
 		return err
 	}
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/service/datasets.go` around lines 767 - 784, The
docEngine.UpdateDataset call currently runs before the DB transaction and can
create inconsistency on rollback; move the s.docEngine.UpdateDataset(...) call
(the call using indexName and datasetID and the pagerank-dependent newValue)
into the database transaction scope so it executes only after the DB update
succeeds and before committing the transaction, and if UpdateDataset returns an
error propagate it so the transaction can roll back; ensure you still set
updates["pagerank"] = *pagerank and preserve the same newValue/remove logic
around pagerank_fea.
What problem does this PR solve?
implement PUT /api/v1/datasets/:dataset_id
Type of change