Optimisation: Do not run validators on columns that are not present #1839

Closed
@jcpitre

Description

Describe the problem

In #1749, one of the datasets was so large that validation ran out of memory and the run time increased significantly.
We could check whether a column exists in a file and, if it does not, skip the validators related to that column.
In particular, for the foreign key validator: if the column that carries the annotation does not exist, don't run the validator.
This would be especially useful for stop_times, which usually has the most records and, with the addition of Flex, now has 5 fields with the ForeignKey annotation.
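
To make the idea concrete, here is a minimal sketch of the skip logic. It assumes the set of header names for each file is known before the validators run. Everything in it is hypothetical and for illustration only: the class `ColumnAwareSkipSketch`, the `ForeignKeyCheck` descriptor, and the `selectRunnable` method are not taken from the validator's code base, and the three checks listed are just examples, not the actual 5 annotated fields.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: none of these names come from the validator's code base.
class ColumnAwareSkipSketch {

    /** One foreign-key check: childFile.childColumn must reference parentFile.parentColumn. */
    static final class ForeignKeyCheck {
        final String childFile;
        final String childColumn;
        final String parentFile;
        final String parentColumn;

        ForeignKeyCheck(String childFile, String childColumn,
                        String parentFile, String parentColumn) {
            this.childFile = childFile;
            this.childColumn = childColumn;
            this.parentFile = parentFile;
            this.parentColumn = parentColumn;
        }
    }

    /**
     * Keeps only the checks whose annotated column is present in the header
     * of the child file. A file with no known header yields no checks.
     */
    static List<ForeignKeyCheck> selectRunnable(
            List<ForeignKeyCheck> checks, Map<String, Set<String>> headersByFile) {
        List<ForeignKeyCheck> runnable = new ArrayList<>();
        for (ForeignKeyCheck check : checks) {
            Set<String> headers = headersByFile.getOrDefault(check.childFile, Set.of());
            if (headers.contains(check.childColumn)) {
                runnable.add(check);
            }
        }
        return runnable;
    }

    public static void main(String[] args) {
        // Header of a stop_times.txt that does not use any Flex columns.
        Map<String, Set<String>> headersByFile =
                Map.of("stop_times.txt", Set.of("trip_id", "stop_id", "stop_sequence"));

        // Illustrative checks only.
        List<ForeignKeyCheck> checks = List.of(
                new ForeignKeyCheck("stop_times.txt", "trip_id", "trips.txt", "trip_id"),
                new ForeignKeyCheck("stop_times.txt", "stop_id", "stops.txt", "stop_id"),
                new ForeignKeyCheck("stop_times.txt", "location_group_id",
                        "location_groups.txt", "location_group_id"));

        for (ForeignKeyCheck check : selectRunnable(checks, headersByFile)) {
            System.out.println("run: " + check.childFile + "." + check.childColumn);
        }
    }
}
```

The filter runs once per file rather than once per row, so its cost does not depend on the number of records in stop_times.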

Note: #1747 tackles the same problem, but is much broader in scope.

Proposed solution

See above.
The exact mechanism is TBD.

Alternatives you've considered

No response

Additional context

No response

Metadata

Assignees

Labels

enhancement: New feature request or improvement on an existing feature
