@mborodii-prog
Contributor

Update Lookup (with train method)

Overview

This pull request introduces support for an action parameter to the write method for training lookup models, allowing more granular control with INSERT, UPDATE, and UPSERT operations. The implementation includes robust error handling and validation for these actions, and comprehensive tests have been added to ensure correct behavior and coverage of edge cases.

Feature: Action parameter for lookup model training

  • Added action parameter to the write method in wrangles/connectors/train.py, supporting INSERT, UPDATE, and UPSERT actions for lookup models. This enables explicit control over whether to create, update, or upsert models.
  • Implemented logic for each action:
    • UPSERT: Updates existing models or creates new ones, handling duplicate keys and merging data.
    • UPDATE: Updates only existing records in a model, with validation for presence of keys.
    • INSERT: Creates a new model, with checks for duplicate model names and unique keys.
  • Added error handling for invalid combinations of parameters and unsupported actions.
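The three actions above can be illustrated with a minimal sketch. It is not the code from wrangles/connectors/train.py: the function names, the "Key" column, and the in-memory list-of-dicts representation are assumptions made for illustration, and the semantics are one reading of the bullet points.

```python
# Hypothetical sketch: rows are dicts and "Key" is the unique column.
# None of these names come from wrangles itself.

def _index_by_key(rows, key="Key"):
    """Map key -> row, rejecting duplicate keys."""
    indexed = {}
    for row in rows:
        if row[key] in indexed:
            raise ValueError(f"Duplicate key: {row[key]}")
        indexed[row[key]] = row
    return indexed


def apply_action(existing, incoming, action, key="Key"):
    """Return the rows a model would hold after applying `action`.

    `existing` is None when no model exists yet.
    """
    action = action.upper()
    new = _index_by_key(incoming, key)

    if action == "INSERT":
        # A brand-new model: nothing is merged.
        if existing is not None:
            raise ValueError("INSERT requires a model that does not exist yet")
        return list(new.values())

    if action == "UPDATE":
        if existing is None:
            raise ValueError("UPDATE requires an existing model")
        current = _index_by_key(existing, key)
        missing = [k for k in new if k not in current]
        if missing:
            raise ValueError(f"Keys not found in model: {missing}")
        current.update(new)  # replaces whole rows that share a key
        return list(current.values())

    if action == "UPSERT":
        # Existing keys are replaced, unknown keys are appended.
        current = _index_by_key(existing or [], key)
        current.update(new)
        return list(current.values())

    raise ValueError(f"Unsupported action: {action}")
```

In this sketch UPDATE fails when an incoming key is absent from the model, while UPSERT appends it instead.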

Usage Examples

INSERT

  write:
    - train.lookup:
        name: My Lookup Wrangle
        action: INSERT
        variant: key

UPDATE

  write:  
    - train.lookup:  
        model_id: test-model-id  
        action: UPDATE  

UPSERT

  write:
    - train.lookup:
        name: {model_name}
        action: UPSERT
        variant: key

UPSERT (by default)

  write:
    - train.lookup:
        name: {model_name}
        variant: key

Validation and error handling

  • Comprehensive error messages and checks for duplicate model names, duplicate keys, missing models, and invalid action parameters, ensuring robust and user-friendly behavior.

Test coverage

  • Added extensive tests in tests/connectors/test_train.py for all new behaviors:
    • Successful insert and upsert operations.
    • Handling duplicate model names and duplicate keys.
    • Update operations for existing and non-existent models.
    • Validation of invalid actions and parameters.

Documentation

  • Updated schema documentation to include the new action parameter and its possible values, improving clarity for users.

@mborodii-prog linked an issue on Dec 27, 2025 that may be closed by this pull request
@mborodii-prog
Contributor Author

@thomasstvr @ebhills Please take a look at the final version of the PR:

  • Implemented logic for each action:
    • UPSERT: Updates existing models or creates new ones, handling duplicate keys and merging data.
    • UPDATE: Updates only existing records in a model, with validation for presence of keys.
    • INSERT: Creates a new model, with checks for duplicate model names and unique keys.
    • OVERWRITE: Default logic (agreed with Thomas)

  • Mocked the tests to avoid modifying real models.
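As a rough illustration of the mocking approach, the sketch below swaps a backend client for a MagicMock from the standard library so that no real model is touched. The train_lookup function and the update_model method are hypothetical stand-ins, not names from wrangles or from tests/connectors/test_train.py.

```python
from unittest.mock import MagicMock


def train_lookup(backend, model_id, rows):
    """Hypothetical stand-in for the code under test: pushes rows to a backend."""
    backend.update_model(model_id, rows)
    return len(rows)


def test_update_does_not_touch_real_model():
    backend = MagicMock()  # replaces the real service client
    rows = [{"Key": "a", "Value": 1}]

    count = train_lookup(backend, "test-model-id", rows)

    assert count == 1
    # The mock records the call, so the test can verify what would have been sent.
    backend.update_model.assert_called_once_with("test-model-id", rows)


test_update_does_not_touch_real_model()
```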

  try:
      metadata = _data.model(model_id)
  except Exception as e:
      raise e
Collaborator

When attempting to train a new model using the name parameter and action set to update, a cryptic error is raised:

RuntimeError: train.lookup - Something went wrong trying to access model None

Passing insert raises a better message, while upsert works. Users should only be able to pass overwrite when creating a new model (that is, when using the name parameter). This can be caught early, and each of the 3 cases can share the same error message. Something like:

"{action} not allowed when training a new model"
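One way the suggested early check could look is sketched below. The helper name and signature are hypothetical, and it assumes a new model is identified by a name being passed without a model_id; only the error message text comes from the suggestion above.

```python
from typing import Optional


def check_action_for_new_model(
    action: str, name: Optional[str], model_id: Optional[str]
) -> None:
    """Fail fast when a new model is trained with an action other than OVERWRITE.

    Hypothetical helper: the name and signature are illustrative only.
    """
    training_new_model = name is not None and model_id is None
    if training_new_model and action.upper() != "OVERWRITE":
        raise ValueError(f"{action} not allowed when training a new model")
```

Running this before any model lookup would give insert, update, and upsert the same message instead of the "access model None" error.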




def test_upsert_missing_key_for_key_variant(self, mock_lookup_action_backend):
Collaborator

Let's drop mocker everywhere that a new model is not being trained.

