Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Table "find and replace" function #7749

Closed
jdunkerley opened this issue Sep 6, 2023 · 3 comments · Fixed by #7979
Closed

Table "find and replace" function #7749

jdunkerley opened this issue Sep 6, 2023 · 3 comments · Fixed by #7979
Assignees
Labels
-libs Libraries: New libraries to be implemented x-new-feature Type: new feature request

Comments

@jdunkerley
Copy link
Member

Currently to replace one more columns in one table from a second table preserving the input order, we would need to:

  • Add a row number to left table.
  • Left Outer Join on a Key.
  • For each column, coalesce the value with the new column in the right (this is tricky and tedious, can be done by expressions).
  • Order by the row number.
  • Remove row number and right columns.

This also will reorder the left table to a new ordering.
So for example, joining on a user id and replacing the user name from a new table.

The goal of this ticket is to make this easier.
Alteryx has a Find Replace tool: https://help.alteryx.com/20231/designer/find-replace-tool
This is a common VLookup style operation in Excel (IFERROR(VLOOKUP(...), ...))

This should be possible in DB as well but might need to relax the ordering unchanged for performance.

## Uses a look up table of values to replace value in this table.
    Columns are matched by name.
    If a duplicate key occurs in the lookup table, a `DUPLICATE_KEY` error will be raised.
    In-Memory to order of the input table will be preserved, In-Database this is not guarenteed.

    Arguments:
    - lookup_table: Table containing data to replace.
    - key_columns: Column names to join the two inputs on.
    - allow_new_columns: If lookup_table contains extra columns, if true will be added to the output, if False raises a `SOMETHING` error otherwise.
    - allow_unmatched_rows: If a row in the input is not present in the lookup, if True operation will succeed, if False raises a `MISSING_INPUT_VALUES` error.
    - on_problems: ...
Table.lookup_and_replace lookup_table:Table key_columns:Column_Selector allow_new_columns:Boolean=False  allow_unmatched_rows:Boolean=True on_problems:Problem_Behavior=Report_Warning =
 ...
  • Keep it simple and make it a strict name match
@github-project-automation github-project-automation bot moved this to ❓New in Issues Board Sep 6, 2023
@jdunkerley jdunkerley added -libs Libraries: New libraries to be implemented x-new-feature Type: new feature request labels Sep 6, 2023
@jdunkerley jdunkerley moved this from ❓New to 📤 Backlog in Issues Board Sep 12, 2023
@radeusgd radeusgd moved this from 📤 Backlog to 🔧 Implementation in Issues Board Oct 2, 2023
@enso-bot
Copy link

enso-bot bot commented Oct 4, 2023

Radosław Waśko reports a new STANDUP for yesterday (2023-10-03):

Progress: Clarifying spec and writing docs for lookup_and_replace. Added tests reflecting the spec. Initial work on the implementation. It should be finished by 2023-10-05.

Next Day: Next day I will be working on the same task. Do the in-memory implementation. Then DB.

@enso-bot
Copy link

enso-bot bot commented Oct 5, 2023

Radosław Waśko reports a new STANDUP for yesterday (2023-10-04):

Progress: Most of the in-memory implementation done, but some bugs to find. Reworked the common parts too. It should be finished by 2023-10-05.

Next Day: Next day I will be working on the same task. Finish the in-memory implementation. Then do DB - should hopefully by simpler with the common parts done.

@radeusgd radeusgd moved this from 🔧 Implementation to 👁️ Code review in Issues Board Oct 6, 2023
@enso-bot
Copy link

enso-bot bot commented Oct 6, 2023

Radosław Waśko reports a new STANDUP for yesterday (2023-10-05):

Progress: Finished the in-memory implementation of lookup_and_replace. Analyzing problems with Database approach, discussing, brainstorming ideas for the Database implementation - created a separate ticket for it. It should be finished by 2023-10-05.

Next Day: Next day I will be working on the #7514 task. Move forward a bit on problem handling refactor. Next tasks.

@mergify mergify bot closed this as completed in #7979 Oct 10, 2023
mergify bot pushed a commit that referenced this issue Oct 10, 2023
- Closes #7749 implementing the in-memory logic.
- Additional complications have surfaced regarding the Database logic, so it has been split off into a separate ticket: #7981
@github-project-automation github-project-automation bot moved this from 👁️ Code review to 🟢 Accepted in Issues Board Oct 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
-libs Libraries: New libraries to be implemented x-new-feature Type: new feature request
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

2 participants