Add more "basic" test samples to cover supported content types #662
Open
opened on Aug 30, 2024
The new model "standard_v2_0" supports 200+ content types: https://github.com/google/magika/tree/main/assets/models/standard_v2_0/README.md
Ideally, we would have at least one "basic sample" for each supported content type (see /tests_data/basic/*).
This issue acts as a call to action -- external help is very welcome!
Important aspects to keep in mind:
- Content types for which we have no samples yet should be prioritized. Among these, prioritize more common content types rather than niche ones.
- The "basic" test samples (in tests_data/basic/<content_type>/*) are supposed to be "easy to recognize". In other words, the goal for these samples is to check that the model does a reasonable job with clear-cut samples, rather than corner-cases.
- It's OK to group a bunch of test cases in a single PR.
- The PR should state the origin of each sample.
- The samples should NOT be taken from existing projects / online resources (in these settings, it would be very challenging to properly document the origin of these files); they should be manually written/created by the PR author.
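To find content types that still have no basic sample, a small script along these lines might help. This is only a sketch, not something that exists in the repository: the function name `missing_basic_samples` and the hard-coded `supported` list are hypothetical (the real list of content types would have to come from the standard_v2_0 README), and the only thing it relies on is the `tests_data/basic/<content_type>/` layout described above.

```python
from pathlib import Path
from typing import Iterable, List


def missing_basic_samples(basic_dir: Path, content_types: Iterable[str]) -> List[str]:
    """Return content types with no sample file under basic_dir/<content_type>/.

    Hypothetical helper: it only inspects the directory layout and does not
    call Magika itself.
    """
    missing = []
    for ct in sorted(set(content_types)):
        ct_dir = basic_dir / ct
        # A content type counts as covered only if its directory exists
        # and contains at least one regular file.
        has_sample = ct_dir.is_dir() and any(p.is_file() for p in ct_dir.iterdir())
        if not has_sample:
            missing.append(ct)
    return missing


if __name__ == "__main__":
    # Hypothetical subset; replace with the content types listed in the model README.
    supported = ["python", "markdown", "json"]
    for ct in missing_basic_samples(Path("tests_data/basic"), supported):
        print(ct)
```

Run from the repository root, this prints one content type per line for every entry in `supported` that lacks a sample, which gives a starting list to prioritize from.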