Add fuzz regression testing to parquet/arrow/csv readers #9358

@alamb

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
The arrow-testing repository has several data files that caused issues with the C/C++ implementation.

It would be nice to add tests in this repository that ensure the parquet/arrow/csv readers behave "nicely" when reading such files.

Here, "nicely" means: no panics, and errors returned when appropriate.
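A minimal sketch of how a test might check this definition, assuming each read attempt can be wrapped in a closure that returns a `Result`. `ReadOutcome` and `classify_read` are hypothetical names for illustration, not part of arrow-rs; the closure stands in for a call into the real parquet, arrow, or csv reader. The only real API used is `std::panic::catch_unwind`.

```rust
use std::fmt::Display;
use std::panic::{self, UnwindSafe};

/// How a reader handled one (possibly malformed) input file.
/// Hypothetical type for illustration; not part of arrow-rs.
#[derive(Debug, PartialEq)]
pub enum ReadOutcome {
    /// The reader accepted the file.
    Accepted,
    /// The reader returned an error: "nice" behavior for an invalid file.
    Rejected(String),
    /// The reader panicked: the behavior these tests should rule out.
    Panicked,
}

/// Runs `read` and classifies its result, catching a panic instead of
/// letting it abort the test. In a real test, `read` would wrap a call into
/// the parquet, arrow, or csv reader (assumption: any such call returning a
/// `Result` with a displayable error fits this signature).
pub fn classify_read<F, T, E>(read: F) -> ReadOutcome
where
    F: FnOnce() -> Result<T, E> + UnwindSafe,
    E: Display,
{
    match panic::catch_unwind(read) {
        Ok(Ok(_)) => ReadOutcome::Accepted,
        Ok(Err(e)) => ReadOutcome::Rejected(e.to_string()),
        Err(_) => ReadOutcome::Panicked,
    }
}
```

Note that `catch_unwind` only catches unwinding panics: a build with `panic = "abort"`, or a crash such as a stack overflow, still takes down the whole test process, and the default panic hook still prints the panic message to stderr.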

Describe the solution you'd like

  1. Add tests that try to read all of the above-mentioned invalid files.
  2. If any of them cause panics, temporarily skip them in the tests and file a ticket to track fixing each panic.
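The two steps above could be combined into one helper, sketched here under stated assumptions: files are passed in as `(name, bytes)` pairs (a real test would load them from an arrow-testing checkout), and `read` stands in for a real reader call. `CorpusReport` and `check_corpus` are hypothetical names, not arrow-rs APIs.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Summary of running one reader over a corpus of invalid files.
/// Hypothetical type for illustration; not part of arrow-rs.
#[derive(Debug, Default, PartialEq)]
pub struct CorpusReport {
    /// Files that panicked and are NOT on the skip list: these fail the test.
    pub panicked: Vec<String>,
    /// Files rejected with an error (the expected outcome for invalid data).
    pub rejected: usize,
    /// Files the reader accepted without error.
    pub accepted: usize,
    /// Files not read because they are known to panic (each with a ticket).
    pub skipped: usize,
}

/// Step 1: try to read every `(name, bytes)` file with `read`.
/// Step 2: files listed in `skip` are known to panic and are not read.
pub fn check_corpus<'a, I, F, T, E>(files: I, skip: &[&str], read: F) -> CorpusReport
where
    I: IntoIterator<Item = (&'a str, &'a [u8])>,
    F: Fn(&[u8]) -> Result<T, E>,
{
    let mut report = CorpusReport::default();
    for (name, bytes) in files {
        if skip.iter().any(|s| *s == name) {
            report.skipped += 1;
            continue;
        }
        // AssertUnwindSafe is acceptable here: after a panic we only record
        // the file name and never observe state the reader may have left behind.
        match panic::catch_unwind(AssertUnwindSafe(|| read(bytes))) {
            Ok(Ok(_)) => report.accepted += 1,
            Ok(Err(_)) => report.rejected += 1,
            Err(_) => report.panicked.push(name.to_string()),
        }
    }
    report
}
```

A test would then assert that `report.panicked` is empty. Skipping known-panicking files (rather than running them) keeps the suite green, but it also means a fixed panic is not noticed automatically; one alternative is to run skipped files too and flag skip-list entries that no longer panic.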

Describe alternatives you've considered

Additional context
This was inspired by reviewing this doc from @pitrou on Arrow security guidelines:

Labels: enhancement
