Refactor Revision (flow) and Metadata Changes #223

Closed
3 tasks
osopardo1 opened this issue Oct 26, 2023 · 0 comments · Fixed by #520
Labels
type: enhancement Improvement of existing feature or code

Comments

@osopardo1
Member

Right now, Revision changes are handled in three separate places:

  1. SparkRevisionFactory -> creates a Revision with user configurations (columnsToIndex, columnStats...)
  2. OTreeAnalyzer -> analyses the data and triggers a new Revision if it supersedes the existing one, or if the user input does not contain enough information (such as the columnStats that initialise the transformations).
  3. MetadataWriter -> commits the desired Revision Changes into the Table Metadata.

Each component works independently, with no visibility into how the Revision was triggered.

That's why many bugs appear under specific conditions (such as appending to an empty table) and affect only one of the processes. We need to:

  • Analyze the current status.
  • Design a better flow of information. One option could be to pass some information around through options (a Map of Strings).
  • Refactor the corresponding processes to use the new information flow.
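
To make the "options as a Map of Strings" idea concrete, here is a minimal sketch in Python (chosen only for brevity; the project itself is not written in Python). Everything in it is an assumption for illustration: the `revisionTrigger` key, its values, and the three function names are hypothetical and do not correspond to the actual SparkRevisionFactory, OTreeAnalyzer, or MetadataWriter APIs. It only shows how a single string map could carry the reason a Revision was triggered from one stage to the next.

```python
from typing import Dict

# Hypothetical option keys. "columnsToIndex" and "columnStats" are named in the
# issue; "revisionTrigger" is invented here purely for illustration.
COLUMNS_TO_INDEX = "columnsToIndex"
COLUMN_STATS = "columnStats"
REVISION_TRIGGER = "revisionTrigger"


def create_revision_options(user_options: Dict[str, str]) -> Dict[str, str]:
    """Factory stage: copy the user configuration and mark that nothing
    has triggered a new Revision yet."""
    options = dict(user_options)
    options.setdefault(REVISION_TRIGGER, "none")
    return options


def analyze(options: Dict[str, str], data_exceeds_space: bool) -> Dict[str, str]:
    """Analyzer stage: record WHY a new Revision is needed, so that later
    stages do not have to guess. Returns a new map; the input is not mutated."""
    out = dict(options)
    if COLUMN_STATS not in options:
        # The user input lacks the stats needed to initialise transformations.
        out[REVISION_TRIGGER] = "missing_column_stats"
    elif data_exceeds_space:
        # The incoming data supersedes the existing Revision.
        out[REVISION_TRIGGER] = "superseded"
    return out


def should_commit_revision(options: Dict[str, str]) -> bool:
    """Writer stage: commit a Revision change only if an earlier stage
    recorded a trigger."""
    return options.get(REVISION_TRIGGER, "none") != "none"


# Usage: the same map flows through all three stages.
opts = create_revision_options({COLUMNS_TO_INDEX: "a,b"})
analyzed = analyze(opts, data_exceeds_space=False)
commit = should_commit_revision(analyzed)
```

The point of the design is that each stage reads and writes the same map, so the writer can see what the analyzer decided and why, instead of each component inferring it independently.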