
Parquet output format #870

Open
mschulist wants to merge 2 commits into birdnet-team:main from mschulist:mschulist-parquet-output

Conversation

@mschulist

This PR adds parquet as an output format for analyze. It uses PyArrow to handle all of the parquet reading and writing.

Currently, it creates a separate "table" for each timestamp, which might result in suboptimal compression when there are few results per timestamp. There are a few options to improve this:

  • Have a buffer that creates a new table for every $n$ rows (slightly more complex implementation, but still not too bad); a sketch of this option follows this list.
  • Put all rows in a single table (which might use a lot of memory for large datasets).
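
Here is a minimal sketch of the buffered option, using pyarrow.parquet.ParquetWriter to write one row group each time the buffer fills. The schema, column names, and flush threshold below are illustrative assumptions, not the PR's actual code.

```python
import pyarrow as pa
import pyarrow.parquet as pq

# Assumed flat result schema; the real column set in the PR may differ.
SCHEMA = pa.schema(
    [
        ("filename", pa.string()),
        ("start_time", pa.float32()),
        ("end_time", pa.float32()),
        ("scientific_name", pa.string()),
        ("common_name", pa.string()),
        ("confidence", pa.float32()),
    ]
)


class BufferedParquetWriter:
    """Accumulate result rows and write one row group per `flush_rows` rows."""

    def __init__(self, path: str, flush_rows: int = 10_000):
        self._writer = pq.ParquetWriter(path, SCHEMA, compression="zstd")
        self._flush_rows = flush_rows
        self._rows: list[dict] = []

    def add_row(self, row: dict) -> None:
        self._rows.append(row)
        if len(self._rows) >= self._flush_rows:
            self._flush()

    def _flush(self) -> None:
        # Convert the buffered rows to a table and append it as one row group.
        if self._rows:
            table = pa.Table.from_pylist(self._rows, schema=SCHEMA)
            self._writer.write_table(table)
            self._rows = []

    def close(self) -> None:
        self._flush()
        self._writer.close()
```

Larger row groups give the encoder more repeated values to work with, which is where most of the compression gain over per-timestamp tables would come from.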

Combining the results into a single file (with --combine_results) does make the output much smaller, but it would be ideal to get good compression without this extra step.
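
As an aside, even without --combine_results, multiple per-file parquet outputs can be read back as one logical table via pyarrow.dataset. The directory path and confidence column below are assumptions for illustration only:

```python
import pyarrow.dataset as ds

# Treat all parquet files under a (hypothetical) output directory as one dataset.
dataset = ds.dataset("output/", format="parquet")

# Load only the rows of interest; the filter is pushed down to the file readers.
high_conf = dataset.to_table(filter=ds.field("confidence") >= 0.5)
print(high_conf.num_rows)
```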

Either way, I have found that parquet's columnar compression works particularly well on classifier outputs because the data is highly repetitive (e.g. filenames are repeated across many rows). For large datasets, parquet should provide a significant reduction in file size.
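
To make the compression point concrete, here is a toy illustration (not taken from the PR) of how parquet's default dictionary encoding stores each distinct filename only once per row group:

```python
import pyarrow as pa
import pyarrow.parquet as pq

# Toy data: two distinct filenames repeated over 2,000 rows.
table = pa.table(
    {
        "filename": ["recording_001.wav"] * 1_000 + ["recording_002.wav"] * 1_000,
        "confidence": [0.5] * 2_000,
    }
)

# use_dictionary=True is the default; it is spelled out here to highlight the mechanism.
pq.write_table(table, "toy.parquet", compression="zstd", use_dictionary=True)

# Inspect the file metadata to see the (small) compressed column sizes.
print(pq.ParquetFile("toy.parquet").metadata)
```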

This is somewhat related to #230 as well.

@Josef-Haupt
Member

Sounds good. The birdnet lib also has a parquet output, and we are currently replacing the core of the analyzer with the lib anyway. We can merge the PR, and I'll update the code in #867 to match it.
