Add auto-discovery for regression test datasets with --include-local flag #2312
Conversation
2667ae5 to f692f13
Pull request overview
This PR modernizes the regression testing infrastructure by implementing auto-discovery for test datasets and adding support for local, git-ignored datasets. The changes enable developers to test with confidential or large conversation data without risking accidental commits, while simplifying the process of adding new test datasets—just drop them in real_data/.local/ and they're automatically discovered.
Key changes:
- Auto-discovery mechanism that scans directories matching the pattern `<report_id>-<name>/` in `real_data/` and `real_data/.local/`
- New `--include-local` pytest flag to opt in to testing with local datasets
- Refactored `datasets.py` with a `DatasetInfo` dataclass and discovery functions replacing hardcoded configuration
- Updated download script to default to the `.local/` directory, with a `--commit` flag for public datasets
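As an illustration of the discovery idea, here is a minimal sketch; the directory regex, the `DatasetInfo` fields, and the `discover_datasets` signature are assumptions for readability, not the actual implementation in `datasets.py`:

```python
import re
from dataclasses import dataclass
from pathlib import Path

# Assumed directory-name pattern: "<report_id>-<name>".
DIR_PATTERN = re.compile(r"^(?P<report_id>[^-]+)-(?P<name>.+)$")


@dataclass(frozen=True)
class DatasetInfo:
    report_id: str
    name: str
    path: Path
    is_local: bool  # True when the dataset came from real_data/.local/


def discover_datasets(real_data_dir: Path, include_local: bool = False) -> list[DatasetInfo]:
    """Scan real_data/ (and optionally real_data/.local/) for <report_id>-<name>/ dirs."""
    roots = [(real_data_dir, False)]
    if include_local:
        roots.append((real_data_dir / ".local", True))
    found: list[DatasetInfo] = []
    for root, is_local in roots:
        if not root.is_dir():
            continue
        for child in sorted(root.iterdir()):
            match = DIR_PATTERN.match(child.name)
            if child.is_dir() and match:
                found.append(DatasetInfo(match["report_id"], match["name"], child, is_local))
    return found
```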
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| delphi/polismath/regression/datasets.py | Core auto-discovery implementation with DatasetInfo dataclass and directory scanning logic |
| delphi/tests/conftest.py | pytest hooks for --include-local flag, dynamic test parametrization, and dataset summary reporting |
| delphi/tests/test_regression.py | Removed hardcoded parametrization in favor of dynamic discovery via conftest.py |
| delphi/tests/test_datasets.py | Unit tests for directory pattern matching, file checking, and dataset discovery logic |
| delphi/tests/download_real_data.py | New positional arguments (report_id, dataset_name) with --commit flag to control download location |
| delphi/tests/README.md | Updated documentation explaining auto-discovery, local datasets, and new download patterns |
| delphi/pyproject.toml | Added local_dataset marker for pytest |
| delphi/.gitignore | Added real_data/.local/ to git ignore list |
| delphi/polismath/regression/__init__.py | Updated exports to include new discovery functions and DatasetInfo class |
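As a rough illustration of how the conftest.py hooks in the table can fit together, here is a hypothetical sketch (names such as `discover_datasets` and `REAL_DATA_DIR` are assumptions, not the PR's exact code):

```python
# Hypothetical conftest.py sketch: wires the --include-local flag into dynamic
# parametrization and tags local datasets with the local_dataset marker
# registered in pyproject.toml.
from pathlib import Path

import pytest

# Assumed import; the discovery function name and signature are guesses.
from polismath.regression.datasets import discover_datasets

REAL_DATA_DIR = Path(__file__).parent / "real_data"  # assumed location


def pytest_addoption(parser):
    parser.addoption(
        "--include-local",
        action="store_true",
        default=False,
        help="Also run regression tests against datasets under real_data/.local/",
    )


def pytest_generate_tests(metafunc):
    if "dataset" not in metafunc.fixturenames:
        return
    include_local = metafunc.config.getoption("--include-local")
    datasets = discover_datasets(REAL_DATA_DIR, include_local=include_local)
    params = [
        pytest.param(d, id=d.name,
                     marks=[pytest.mark.local_dataset] if d.is_local else [])
        for d in datasets
    ]
    metafunc.parametrize("dataset", params)
```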
All Copilot review comments have been addressed:
In commit 30edbff:
In commit 80115f9:
ca08d46 to 80115f9
Note: rebased all commits to sign them -- hence the commits appear after the review. Same content, just added signatures.
7ceb99c to 075797b
Pull request overview
Copilot reviewed 10 out of 10 changed files in this pull request and generated no new comments.
Comments suppressed due to low confidence (9)
delphi/scripts/regression_download.py:363
- The example references `download_real_data.py` but the script file is actually named `regression_download.py`. Update to use the correct filename: `python scripts/regression_download.py`.
delphi/scripts/regression_download.py:17
- The usage examples reference `download_real_data.py` but the script file is actually named `regression_download.py`. Update the references to use the correct filename to avoid confusion.
delphi/scripts/regression_download.py:31
- The examples reference `download_real_data.py` but the script file is actually named `regression_download.py`. Update to use the correct filename: `python scripts/regression_download.py`.
delphi/scripts/regression_download.py:355
- The example references `download_real_data.py` but the script file is actually named `regression_download.py`. Update to use the correct filename: `python scripts/regression_download.py`.
delphi/scripts/regression_download.py:359
- The example references `download_real_data.py` but the script file is actually named `regression_download.py`. Update to use the correct filename: `python scripts/regression_download.py`.
delphi/scripts/regression_download.py:367
- The example references `download_real_data.py` but the script file is actually named `regression_download.py`. Update to use the correct filename: `python scripts/regression_download.py`.
delphi/scripts/regression_download.py:20
- The usage example references `download_real_data.py` but the script file is actually named `regression_download.py`. Update to use the correct filename.
delphi/scripts/regression_download.py:26
- The usage example references `download_real_data.py` but the script file is actually named `regression_download.py`. Update to use the correct filename.
delphi/scripts/regression_download.py:351
- The example references `download_real_data.py` but the script file is actually named `regression_download.py`. Update to use the correct filename: `python scripts/regression_download.py`.
- _compute_vote_stats: Replace per-row/per-column loops with numpy vectorized operations using boolean masks and axis-based sums. This eliminates O(rows + cols) Python loops.
- bench_update_votes.py: Make standalone by accepting CSV path directly instead of depending on datasets package. Add TODO for using datasets package once PR compdemocracy#2312 is merged.
Combined with pivot_table optimization, achieves ~10x speedup on bg2050 dataset (1M votes): 18-30s -> 2.5s (~400k votes/sec).
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Ready for human review and merge :-)
Suggested Updates here: jucor#3
* Optimize update_votes with vectorized pivot_table (5x speedup)
Replace the row-by-row for-loop in update_votes with a vectorized pivot_table approach. This dramatically speeds up vote loading for large datasets.
Performance on bg2050 dataset (1M+ votes, 7.8k participants, 7.7k comments):
- Before: 18.5s average, 56k votes/sec
- After: 3.5s average, 295k votes/sec
- Speedup: 5.3x overall, 16x for the batch update step
The optimization:
1. Use pivot_table to reshape long-form votes to wide-form matrix
2. Use DataFrame.where() to merge with existing matrix
3. Use float32 for intermediate matrix to halve memory usage
Also adds a benchmark script at polismath/benchmarks/bench_update_votes.py for measuring update_votes performance.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* Vectorize _compute_vote_stats and make benchmark standalone
- _compute_vote_stats: Replace per-row/per-column loops with numpy vectorized operations using boolean masks and axis-based sums. This eliminates O(rows + cols) Python loops.
- bench_update_votes.py: Make standalone by accepting CSV path directly instead of depending on datasets package. Add TODO for using datasets package once PR #2312 is merged.
Combined with pivot_table optimization, achieves ~10x speedup on bg2050 dataset (1M votes): 18-30s -> 2.5s (~400k votes/sec).
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* Fix: Remove misleading float32 conversion in update_votes
Addresses GitHub Copilot review comments on PR #2313:
- Removed float32 conversion that only provided temporary memory savings
- The comment was misleading as savings didn't persist after .where()
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* Fix: Use vectorized pandas operations in benchmark loader
Replace iterrows() with rename() + to_dict('records') for efficiency, as suggested by GitHub Copilot review.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* Add timing logging for PCA and repness
* Add benchmark script for repness
* Add profiling to benchmark for repness
* Vectorize vote count: 2x speedup on large convos
* Extract common setup code
* Rename vote_matrix to vote_matrix_df for clarity
* Keep NaNs instead of None: 2x more speedup
* Refactor conv_repness() to use long-format DataFrame
Convert wide-format vote matrix to long-format using melt() and use vectorized pandas groupby operations instead of nested loops.
Key changes:
- Add compute_group_comment_stats_df() for vectorized (group, comment) stats
- Add prop_test_vectorized() and two_prop_test_vectorized() for batch z-tests
- Add select_rep_comments_df() and select_consensus_comments_df() for DataFrame-native selection, converting to dicts only at the end
- Compute "other" stats as total - group instead of recalculating
- Use MultiIndex.from_product() to ensure all (group, comment) combinations
Test changes:
- Add test_old_format_repness.py to preserve backwards compatibility tests
- Add TestVectorizedFunctions class with 8 tests for new DataFrame interface
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* Shorten imports as per GH Copilot Review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update docstring as per GH Copilot Review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Remove unused import as per GH Copilot Review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Move profiler to within profiling function as per GH Copilot review
* Remove unused import as per GH Copilot review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Profile new functions
---------
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
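For readers skimming the commit list above, here is a rough, self-contained sketch of the two techniques it describes (pivot_table reshaping and boolean-mask vote stats); the column names and the vote encoding are assumptions, not the project's actual schema:

```python
import numpy as np
import pandas as pd


def votes_long_to_matrix(votes: pd.DataFrame) -> pd.DataFrame:
    """Reshape long-form votes into a wide participant x comment matrix."""
    return votes.pivot_table(
        index="voter-id", columns="comment-id", values="vote", aggfunc="last"
    )


def compute_vote_stats(matrix: pd.DataFrame) -> dict:
    """Per-row/per-column counts via boolean masks and axis sums, no Python loops."""
    values = matrix.to_numpy(dtype=float)
    seen = ~np.isnan(values)
    agrees = seen & (values == 1)  # assumes agree votes are encoded as 1
    return {
        "votes_per_participant": seen.sum(axis=1),
        "votes_per_comment": seen.sum(axis=0),
        "agrees_per_comment": agrees.sum(axis=0),
    }
```

Merging the pivoted frame into an existing matrix can then be a single aligned DataFrame.where() call, as the first commit above notes, rather than a per-row loop.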
* add db scaling, install datadog (#2147)
* add db scaling, install datadog
* add to example env
* dd instrumentation
* Update deploy-prod.yml
* fix dd
* more dd config (#2150)
* stop dd agent
* more dd config
* more dd instrumentation
* dd config add network
* add log tags
* delphi dd config
* dd add report RUM
* try new rum strategy
* fix obj prop name
* fix err superadmin
* another superadmin fix
* make collective statements scroll more good (#2163)
* Te adjust collective stmt prmpt (#2167)
* expand on object properties for LLM
* prompt hardening
* fix inversion (#2169)
* enable full math tracing (#2171)
* Better API server logging for Datadog (#2173)
* Implement Datadog logging middleware and enhance error handling
- Added `middleware_http_json_logger` for structured logging in production, replacing the default morgan logger.
- Updated `app.ts` to conditionally use the new logger based on the environment.
- Enhanced `globalErrorHandler` to log errors in a Datadog-friendly format, including HTTP method, URL, and error details.
- Introduced `ddEnv` configuration in `config.ts` for environment-specific logging.
- Updated logger configuration to support both development and production formats.
* small cleanup
* devMode convenience var
* Updates the topic agenda component to use conversation_id prop directly instead of accessing it through the conversation object. Fixes bug where conversation_id is not included in the POST request. (#2174)
* Update and fix e2e tests (#2176)
* minor update; lint
* include AUTH_DOMAIN and AUTH_CLIENT_SECRET in env examples; rename AUTH0 vars to generic
* replace console with logger
* formatting
* include ADMIN_UIDS in more configurations
* safely parse ADMIN_UIDS json
* generic OIDC language
* repair report-authentication test
* init dynamodb tables in test env
* env vars to determine DD usage in client-report
* restore deleted tests
* allow moderator or seed comment auto approval
* add rebuild-server to makefile
* improved comment tests
* auto-approve seed and moderator comments
* remove unused jigsaw key
* upgrade cypress and faker; fix xid test
* fix int test
* improve oidc test reliability
* fix client-report tests
---------
Co-authored-by: tevko <tim@devzero.io>
* pass include moderation arg (#2178)
* fix dynamo hardcode
* More Test fixes and small improvements (#2181)
* improve participant insertion vs race conditions; minor tweaks to logging and next comment selection
* improve e2e OIDC checks for stability in CI test suite
* Tree Invite Updates and Fixes (#2182)
* integration tests for treevite
* Invite improvements and Fixes;
Invite CSV Download
---------
Co-authored-by: Tim <timevko@gmail.com>
* remove hardcoded region values (#2184)
* Make psql shell (#1627)
* Add psql-shell task to makefile
* `make psql-shell` now uses env values, and quits if POSTGRES_DOCKER is not `true`
* ensure compose-file args for `make psql-shell`
---------
Co-authored-by: Bennie Rosas <bennie.rosas@blvd.co>
Co-authored-by: Bennie Rosas <ben@aliencyb.org>
* Improved Topic Naming (#2185)
* Use pseudo-random comment selection for topic naming;
Improved ollama topic naming prompt.
* remove prompt_prefix; formatting.
* TOPIC_NAMING document
* Te euro cdk prep (#2187)
* add euro deploy scripts and update aws action
* disable temp nginx
* multi stage building for action
* actions fix
* fix script typo
* add appspec-euro
* appspec fix
* fix typo
* another typo fix
* final path correction
* another typo update
* stop nginx so docker can take over
* update static assets deploy
* remove environment
* automated db backups (#2199)
* automated db backups
* fix handler call
* add lambda layer
* update lambda layers for pg_dump in lambda capabilities
* typo fix
* delete and rotate
* add region arg to job poller setup
* no dd trace in euro
* fix hardcoded region defaults
* fix dynamo table create conflict
* viz logic fix
* fix another default region err
* Client Admin : Responsive Design and other Improvements (#2202)
* client-admin minor pkg updates
* normalize component names; remove dead code
* remove d3-scale
* email is not an ADMIN UID
* client-admin don't run simple analytics in dev
* clean up dead reducers
* auth helpers and unified user state
* upgrade legacy components
* eslint cleanup
* ZidMetadataProvider
* Pro gating for Topic Mod
* handle conversation permission at the top level; bug fixes
* rename zid_metadata to conversation_data
* rename some more components and tests
* theme ui recommendations doc
* repair client-admin tests
* Add lots of test coverage
* fix delphi check
* better responsive and mobile design
* Update fixed widths for responsive
* VictoryTheme more responsive
* Consolidate topic-moderation styles
* enhance theme with mobile-first tokens
* update and normalize color palette
* Improve TopicMod style, but hide it for now;
Show "alpha" url when treevite is enabled
* docker-compose test fix
* git file renames
* rename tos -> TOS
* test mock fix
* Add some clarity to authUser vs contextUser
* Improved ReportsList with expandable list of URLs
* minor pkg updates
* fix tests
* improve race-condition protection in comment creation
* ReportsList: Remove Comment Report
* client-admin test reliability
* Bump torch from 2.3.1 to 2.8.0 in /delphi (#2142)
Bumps [torch](https://github.com/pytorch/pytorch) from 2.3.1 to 2.8.0.
- [Release notes](https://github.com/pytorch/pytorch/releases)
- [Changelog](https://github.com/pytorch/pytorch/blob/main/RELEASE.md)
- [Commits](https://github.com/pytorch/pytorch/compare/v2.3.1...v2.8.0)
---
updated-dependencies:
- dependency-name: torch
dependency-version: 2.8.0
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Tim <timevko@gmail.com>
* Bump axios from 1.10.0 to 1.12.2 in /server (#2200)
Bumps [axios](https://github.com/axios/axios) from 1.10.0 to 1.12.2.
- [Release notes](https://github.com/axios/axios/releases)
- [Changelog](https://github.com/axios/axios/blob/v1.x/CHANGELOG.md)
- [Commits](https://github.com/axios/axios/compare/v1.10.0...v1.12.2)
---
updated-dependencies:
- dependency-name: axios
dependency-version: 1.12.2
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Tim <timevko@gmail.com>
* Bump nodemailer from 6.10.1 to 7.0.7 in /server (#2209)
Bumps [nodemailer](https://github.com/nodemailer/nodemailer) from 6.10.1 to 7.0.7.
- [Release notes](https://github.com/nodemailer/nodemailer/releases)
- [Changelog](https://github.com/nodemailer/nodemailer/blob/master/CHANGELOG.md)
- [Commits](https://github.com/nodemailer/nodemailer/compare/v6.10.1...v7.0.7)
---
updated-dependencies:
- dependency-name: nodemailer
dependency-version: 7.0.7
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Bump vite from 6.3.5 to 6.3.6 in /client-participation-alpha (#2161)
Bumps [vite](https://github.com/vitejs/vite/tree/HEAD/packages/vite) from 6.3.5 to 6.3.6.
- [Release notes](https://github.com/vitejs/vite/releases)
- [Changelog](https://github.com/vitejs/vite/blob/v6.3.6/packages/vite/CHANGELOG.md)
- [Commits](https://github.com/vitejs/vite/commits/v6.3.6/packages/vite)
---
updated-dependencies:
- dependency-name: vite
dependency-version: 6.3.6
dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Tim <timevko@gmail.com>
* Fix dev environment startup (#2211)
* Revert "Bump torch from 2.3.1 to 2.8.0 in /delphi (#2142)"
This reverts commit a7a060b8b63372141a6b092848d4823b8d8b9c0d.
* Move clojure math env to dev instead of prod
Set `MATH_ENV=dev` instead of `=prod` in `example.env`. This avoids
an infinite reboot loop of the clojure worker due to failing to load Datadog
profiler -- which is skipped in development environment.
* Start notes to get running
@ballPointPenguin has asked me to let him know whether
`make start` works as intended. Documenting here the steps needed
to make it work :)
* Describe fix for login problem
* Fix login failure due to missing hostname in certificate
* Remove explanatory notes to make a clean commit
As discussed with @ballPointPenguin
---------
Co-authored-by: Julien Cornebise <julien@cornebise.com>
* count default votes for bulk-upload seeds (#2213)
* support markdown in cpa (#2218)
* simplify email service (#2210)
* simplify email service
* begin testing, remove maildev, add ses-local
* docker fix
* swap mail docker container
* use env var
* fix typo
* update logs, add export
* debug update
* add exports
* succinct var passing
* more config fixes
* add from email
* fix test helper
* more email helper updates
* partial revert
* debug logging
* obj prop change
* store refactoring
* debug cleanup
* add back jsdoc
* clarify test environment (#2215)
* prodclone dev workflow; db update (#2216)
* helpful db scripts in ./bin
* make start-prodclone workflow
* avoid running datadog in math for local/dev
* Remove narrative report from menu (#2217)
* remove narrative report link
* improve test reliability
* donate message (#2223)
* bang head against wall
* change verbiage
* finally get backbone right
* include importance data in comments and votes data exports (#2224)
* include importance data in comments and votes data exports
* fix importance export tests
* temporary disable topical comment routing for perf (#2222)
* consolidate comment cluster query logic; optional cache (#2229)
* consolidate comment cluster query logic; optional cache
* re-enable topical comment routing
* hotfix
* pin docker compose version
* better compose pinning
* Te delphi ux (#2230)
* begin in progress job ux
* remove nested ternarys
* cleanup
* remove unused
* BUGFIX: actual comment_ids must be used (#2233)
* BUGFIX: actual comment_ids must be used
* use distance to centroid for representative topic comments
* Te delphi ux (#2235)
* begin in progress job ux
* remove nested ternarys
* cleanup
* remove unused
* pass var correctly
* move after_install block (#2237)
* better filter pattern (#2239)
* fix query (#2241)
* Te delphi ux 5 (#2243)
* add debug logging
* fix math bug
* enable pagination (#2245)
* remove log
* Delphi package and env management (#2228)
* made Makefile faster and compatible with os x (#2232)
* update to patched version (#2249)
* update to patched version
* make generate-requirements
---------
Co-authored-by: Bennie Rosas <ben@aliencyb.org>
* Te delphi ux logs (#2247)
* remove form
* promote delphi and show users how to generate reports
* better messaging
* fix test
* change link
* add donate link
* participant-importance report (#2248)
* participant-importance report
* test fixes
* Update client-admin/src/util/auth.js
Co-authored-by: Tim <timevko@gmail.com>
---------
Co-authored-by: Tim <timevko@gmail.com>
* better messaging during batch report phase (#2252)
* fix reset_conversation bug (#2254)
* change message success text (#2256)
* use modal for delphi run confirmation (#2258)
* use modal for delphi run confirmation
* css
* formatting
* add embedded donate page and change links (#2264)
* Visualise participation (#2262)
Co-authored-by: Julien Cornebise <julien@cornebise.com>
* Pagination for comments in Admin Moderation view (#2263)
* enable pagination for get-comments
* client-admin moderation pagination
* server api comment pagination
* comment pagination tests
* Parameterize Delphi path (#2266)
Before this PR, the Delphi python codebase had hardcoded paths to `/app/` that
made it difficult to run in different environments or directory structures,
especially for local development and algorithmic/data analysis.
This PR introduces the optional environment variable DELPHI_APP_PATH,
which, if specified, overrides `/app`.
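A sketch of the kind of lookup this implies (the actual code path in the repository may differ):

```python
import os
from pathlib import Path

# Fall back to the historical hardcoded location when DELPHI_APP_PATH is unset.
APP_PATH = Path(os.environ.get("DELPHI_APP_PATH", "/app"))
```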
* Update README docs with cert and key generation steps (#2271)
* pin node version to 24 (LTS) (#2270)
* Add instance type `dev` to process all sizes (#2267)
* Add instance type dev to process all sizes
Especially useful for local dev instances where we don't want to limit resources.
* Set INSTANCE_SIZE to 'dev' for local setup
Update INSTANCE_SIZE for local development.
* Change instance type check from 'omnipotent' to 'dev'
* Refine comments on Delphi instance size configuration
Updated comments for clarity regarding Delphi instance size.
* Speed up NamedMatrix updates between 40x and 200x (#2268)
* Factorize named matrix vote normalization option
The tests are also fixed, while keeping the same behavior as before.
Weirdly, update() does not normalize the values being set, whereas batch_update() does.
And _convert_to_numeric() keeps NaN values as NaN, whereas batch_update() converts them to 0.0 by default.
This is not very consistent, but I have kept the same behavior for backward compatibility.
Since not all Delphi tests are passing, I could not verify whether other parts of the pipeline depend on this behaviour.
* Speed up named matrix computation
Keep both behaviours in this commit, for comparison and to log a speed report.
Will remove it before pull-request.
* Add deep test and remove speed up comparison
This concludes the refactoring.
* Apply copilot spelling corrections
* make commands: refresh-db, refresh-devdb, refresh-prodclone (#2272)
* make commands: refresh-db, refresh-devdb, refresh-prodclone
* Ensure make refresh-* db works as intended
* Use Python 3.12 to regenerate requirements.lock; minor updates (#2278)
* Use Python 3.12 to regenerate requirements.lock; minor updates
* configure python version 3.12.x and pip version < 25.3
* add ref to github issue
* Factorize Dynamodb deletions for readability (and log their timing) (#2275)
* Refactor dynamodb deletions
In the first step of the pipeline, where we delete any previous data, we had a *lot* of duplicated code. Factored all the common bits to make it simpler to understand.
* Add timing info to dynamoDB writes
* Fix off-by-one page count on logging
* Move import to top of file
* Minor defensive fixes
* Robustify data diagnostics (#2277)
* Add test for multiple updates to same cell in one batch
This will be handy when I change how we do the updates to the matrix.
* Log when no new votes are here
Useful to debug.
* Speed up and display memory usage
* Display duplicate statistics and make graph optional
* Replace list by generator in sum (copilot)
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Fix typo (copilot)
* Test for NaN/NA behaviour in update and batch_update
* Set up tests that match legacy behaviour
Note: they are failing right now. I will next implement that legacy behaviour.
* Implement legacy behaviour
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* remove scans (#2280)
* add interstitial and banner (#2269)
* add interstitial and banner
* test fixes
* another test fix
* comment out banner, add images
* Te delphi py tests (#2285)
* Add type hint in some poller functions
* Extract common function to utils file
That function was defined 3 times in 3 different files.
* Create script to download real data for tests
This is useful if no folder `real data` was provided. I suspect these tests were
written with a `real data` folder already in place. I do not have it, therefore
we need to download it. See the `README` file that has been updated.
* Refactor real_data loading
Remove duplication, allow for automatic finding of the files within a location,
allow for generalisation to other conversations than the two used so far.
* Print whether comment priorities are missing from test data
* Fix path...
* Clarify terms in messages and comments
* Fix buggy test that blocked pytest collection
The `test_batch_id.py` was running code at load time, and that code had an error,
thus crashed during pytest collection, preventing all tests from running.
By refactoring into a proper test function, pytest can now collect all tests and run them.
We also fix the error itself, which was a missing escape of the "scan" reserved word in DynamoDB.
* Fix direct conversation test
- Convert to proper pytest format, not standalone script
- Use fixtures for setup/teardown
- Warn it is a test to check Conversation class instantiation and method calls
- Replace prints by logging
- Parametrize the test to run over all available real_data
- Add some dimension and attributes assertions
- Rename to test_conversation_smoke.py
* Refactor direct_pca_test.py to test_pca_smoke.py with pytest structure
Converted legacy procedural test script to proper pytest:
- Class-based structure with TestPCAImplementation
- Parametrized tests for all datasets
- Fixtures for vote matrix loading
- Proper logging instead of prints
- Smoke test warning (no correctness validation)
- Tests: runs without error, projection statistics, clustering
Tests PCA functions directly (not through Conversation class).
* Clarify the naming of PCA test files and remove redundant tests
* Ignore warning from library ddtrace in pytest
* Refactor repness smoke test
Similar to how we refactored the "direct PCA" tests
* Rename test_repness.py to test_repness_unit.py for clarity
Rename to clarify that these are unit tests with synthetic data,
following the same naming convention established for PCA tests:
- test_repness.py → test_repness_unit.py (unit tests, synthetic data)
- test_repness_smoke.py (real data, smoke tests - already renamed)
- test_repness_comparison.py (Python vs Clojure - already clear)
This mirrors the PCA test structure:
- test_pca_unit.py (unit tests)
- test_pca_edge_cases.py (edge cases)
- test_pca_smoke.py (smoke tests)
All 14 tests pass:
- Statistical utility functions (z-scores, proportion tests)
- Comment statistics calculation
- Representative comment selection
- Consensus selection
- Integration tests (conv_repness, participant_stats)
* Refactor test_repness_comparison.py to proper pytest structure
Similar to pca tests, refactor test_repness_comparison.py
- Converts test_comparison() function to TestRepnessComparison class
- Uses @pytest.mark.parametrize for multiple datasets
- Proper fixtures for clojure_results, conversation, python_results
- Two test methods: test_structural_compatibility and test_comparison_visibility
- Replaces print() with logging.info/debug
- Adds warning that results are known to be very different
- Reports comparison results for visibility without asserting on match rates
- Maintains comparison functionality for manual inspection
Test results: 4 tests passed (2 datasets × 2 test methods)
* Add assert failure messages
* Exclude Conversation serialization tests
Until https://github.com/compdemocracy/polis/issues/2284 is resolved
* add action
* update action
* update action 2
* update action 3
* use env for data script
* fix all tests
* fix action
* actions update 2
* add delphi service to test
* update action again
* another actions fix
* action fix again
* another actions fix again again again
* actions - mount volume tests
* actions - change baseUrl
* remove pg check
* change healthcheck - actions
* remove pg check again - actions
* try more robust action -- actions
* use pol.is baseurl -- actions
* add real data, update action
* remove duplicate data
* ensure dynamo tables created
* update region
* add access keys
* shared, test db
* free up space
* update other action
* build dependency
* add back in removed test, commented out
* comment stuff out
---------
Co-authored-by: Julien Cornebise <julien@cornebise.com>
* add more tests
* change import paths
* revert bad path changes
* add coverage report
* fix indentation
* fix action file
* add coverage
* fix action
* fix cov location
* update sourceDir
* remove coverage
* update action
* update action correctly
* fix actions syntax
* fix actions syntax
* fix actions syntax
* add coveragerc
* add back export path
* update action, pass polismath explicitly
* change coverage detection strategy
* more config
* try removing coverage path
* fix coveragerc
* fix coveragerc
* add to pyproject
* add to pyproject
* remove mention of .coveragerc
* remove reference
* attempt path mapping
* slight config change
* copy config during build
* remove tool section
* create .coveragerc inside action
* last try
* ok one more try
* last try for real
* one more try final v2
* more config adjustment
* one more config update
* try all in one container
* fix package name
* almost there
* try better formatting
* separate script
* add another test
* import sys
* move script into container
* clarify path
* fix db connection
* clarify env
* fix path
* update env
* fix test
* fix test again
* stub data
* still fixing test
* remove nonexistent key
* schema fix
* try db commit
* pakistan approach
* another try
* use mock data instead
* path fix
* fix id
* fix field names
* fix dynamo calls in test
* switch to scan
* relax test assertions
* more relaxed tests
* Admin - Participant Management (#2279)
* remove deprecated conversation fields
* add GET all_conversations route
* superadmin all-conversations view
* Participant Management WIP
* refactor xid logic; show xid list with pids in client-admin
* new xid tests
* Enable XID Upload
* show xid vote_count
* block non-xid participants when xid is required
* update some internal naming from "whitelist" to "allow list"
* xid arg not needed in votesPost
* fix test
* participation-management e2e
* upgrade cypress
* fix e2e test
* update alpha client with xid concerns
* normalize message; fix test
* rebuild astro
* relax tests further
* ignore pakistan test
* Update pip-tools and Delphi build (#2299)
* update pip-tools; remove pip version restriction; update requirements.lock
* simplify Dockerfile; remove unused `IS_GITHUB_ACTION` conditional
* update cypress config to not use `IS_GITHUB_ACTION`
* conditionally use cpu-only torch libs in test builds
* Fix run_math_pipeline test import to use proper package path (#2308)
* Fix run_math_pipeline test import to use proper package path
The test file was importing `from run_math_pipeline import main` which
failed locally because `run_math_pipeline.py` lives inside the `polismath`
package at `delphi/polismath/run_math_pipeline.py`.
CI was working around this by copying the file to a flat location:
docker cp delphi/polismath/run_math_pipeline.py delphi:/app/run_math_pipeline.py
This created a discrepancy between local and CI environments.
The fix:
1. Update test imports to use the correct package path:
`from polismath.run_math_pipeline import main`
2. Update mock.patch paths to match:
`mock.patch('polismath.run_math_pipeline.fetch_comments', ...)`
3. Remove the CI workaround that copied the file to /app flat
4. Simplify coverage to `--cov=polismath` (run_math_pipeline is inside it)
The Docker image already has `polismath/` at `/app/polismath/` and the
package is installed via `pip install --no-deps .`, so the proper import
path works in both local and CI environments.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* Improve CI coverage reporting reliability
Changes to the CI workflow:
1. Print coverage report to workflow logs (always visible)
2. Upload coverage report as downloadable artifact
3. Make PR comment step non-fatal with continue-on-error: true
(fork PRs cannot post comments due to GitHub token restrictions)
Coverage is now accessible three ways:
- In the workflow logs (step 7)
- As a downloadable artifact (step 8)
- As a PR comment when permissions allow (step 9)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* Add graceful error handling for coverage comment on fork PRs
Instead of showing an unhandled error when posting coverage comments
fails on fork PRs, the script now catches the 403 error and displays
a helpful message explaining:
- Why the comment could not be posted (GitHub token permissions)
- Where to find the coverage report (logs and artifact)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
---------
Co-authored-by: Claude <noreply@anthropic.com>
* Replace NamedMatrix by DataFrame and add regression tests (#2282)
* Merge Squashed onto `edge`:
commit 7f14aedafed4fea97c993d7996853407cba7f7dd
Merge: 93a2d313 780f1298
Author: Julien Cornebise <julien@cornebise.com>
Date: Thu Nov 20 15:50:04 2025 +0000
Merge commit '780f1298ca7d72b9717f6aa38526301305e520e8' into replace_named_matrix
This will allow CI to run correctly.
commit 93a2d313e5cc25a4be336b1f4de33aa5d331a579
Author: Julien Cornebise <julien@cornebise.com>
Date: Tue Nov 18 21:10:15 2025 +0000
Recompile requirements.lock to include natsort
commit 0fd37344ca160c0a296e9af4aaec0d889516191f
Author: Julien Cornebise <julien@cornebise.com>
Date: Tue Nov 18 15:01:16 2025 +0000
Update golden records
Now that we have changed behaviours of matrix in terms of ordering and of types,
we need to update the golden records to reflect these changes.
commit 08d2383841687d6345d1a620646eccfd24c4c75c
Author: Julien Cornebise <julien@cornebise.com>
Date: Tue Nov 18 15:01:04 2025 +0000
Fix regression bugs from package reorganization due to hallucinations
During refactoring to polismath.regression package, introduced bugs by
hallucinating non-existent methods and changing behavior without checking
the original code (commit afb8525a).
Fixed:
- prepare_votes_data(): Restored CSV columns ('voter-id', 'comment-id')
and vote dict keys ('pid', 'tid') instead of hallucinated alternatives
- compute_all_stages(): Restored actual methods (update_votes(),
_compute_pca(), _compute_clusters()) instead of hallucinated ones
(process_votes(), compute_pca(), compute_clustering())
- compute_all_stages_with_benchmark(): Restored original implementation
- get_dataset_files(): Restored original dict keys ('votes', 'comments')
instead of changed keys ('votes_csv', 'comments_csv')
- load_golden_snapshot(): Restored golden_path computation logic
- Numpy type handling: Added custom JSON encoder to preserve numeric types
and extended comparer to treat Python/numpy numeric types as compatible
commit 334c01b2f09ab321d558d10995b3144c18ec5d8d
Author: Julien Cornebise <julien@cornebise.com>
Date: Tue Nov 18 14:11:31 2025 +0000
Reorganize regression testing into dedicated polismath.regression package
- Split monolithic regression.py (1167 lines) into focused modules:
- recorder.py: ConversationRecorder class
- comparer.py: ConversationComparer class
- datasets.py: Dataset configuration (moved from tests/)
- utils.py: Shared utility functions
- Clean architecture: No backwards dependencies from production to tests
- Updated all imports in CLI scripts and test files
- Regression testing now treated as first-class production feature
This improves code organization, maintainability, and makes the regression
tools suitable for use in production environments (monitoring, validation).
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
commit afb8525a5ce0e7ace2a7feeb0aae935d78f2333a
Author: Julien Cornebise <julien@cornebise.com>
Date: Tue Nov 18 13:47:11 2025 +0000
Improve logging throughout regression testing system
- Replace all print statements with proper logging calls in polismath/regression.py
- Use logger.info() for progress updates and results
- Use logger.warning() for comparison mismatches
- Use logger.debug() for detailed diagnostic information
- Make PCA debug output conditional on DEBUG log level
- Only save debug JSON files when logger.isEnabledFor(logging.DEBUG)
- Move debug outputs from current directory to .test_outputs/debug/
- Add --log-level CLI argument to regression scripts
- Support DEBUG, INFO, WARNING, ERROR, CRITICAL levels
- Default to INFO level
- DEBUG level enables PCA debug file generation
- Fix conversation module's logging initialization
- Check logging.root.handlers instead of logger.handlers
- Prevents duplicate handlers when logging is externally configured
- Simplifies logging setup in CLI scripts
The regression tools now provide full control over logging verbosity,
making it easier to debug issues (with DEBUG) or run quietly (with WARNING/ERROR).
🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
commit 87f8cb24803cb5a14efa3389673a24a5708fa054
Author: Julien Cornebise <julien@cornebise.com>
Date: Tue Nov 18 13:14:10 2025 +0000
Reorganize regression tests and consolidate test outputs
- Move golden snapshots to dataset folders (real_data/{dataset}/golden_snapshot.json)
- Relocate regression library from regression_tests/ to polismath/regression.py
- Move CLI tools to scripts/ with clearer names (regression_recorder.py, regression_comparer.py)
- Mark Clojure comparison tests as legacy with 'legacy_' prefix
- Consolidate ALL test outputs in hidden .test_outputs/ directory:
- Regression outputs → .test_outputs/regression/
- Python implementation outputs → .test_outputs/python_output/{dataset}/
- Keep real_data/ clean with only source data and golden snapshots
- Fix path resolution bugs and unknown dataset handling in regression system
- Update documentation and simplify .gitignore
This reorganization clearly separates:
- Source data and golden snapshots (real_data/) from temporary outputs (.test_outputs/)
- Standard Python regression tests from legacy Clojure comparisons
- Core libraries (polismath/) from CLI tools (scripts/)
commit a947c5a8ee19ce91c6b2bb55a398e334a7b5b3ec
Author: Julien Cornebise <julien@cornebise.com>
Date: Tue Nov 18 12:08:03 2025 +0000
Process appropriate RunTimeWarning in correlation tests
The fourth row of the test matrix is intentationally constant, which
causes a RuntimeWarning when computing correlations. This commit updates
the test to properly handle this warning using the warnings module, ensuring
that the test suite runs cleanly without unhandled warnings.
commit b6fbc09c7e412503272a3c3e85a49185e93e70b6
Author: Julien Cornebise <julien@cornebise.com>
Date: Tue Nov 18 11:56:16 2025 +0000
Skip failing Clojure regression tests
It's OK for now, as we want Delphi to stand on its own.
commit d8cb94262c47eddc3a05debb1a3991d85d9124df
Author: Julien Cornebise <julien@cornebise.com>
Date: Tue Nov 18 11:49:45 2025 +0000
Remove hardcoded paths fed to Claude
commit 8dca87dba5017434482a110f4c3db6fcef4f2742
Author: Julien Cornebise <julien@cornebise.com>
Date: Tue Nov 18 11:48:03 2025 +0000
Factorize the clojure comparison and pipeline tests
A lot of code was redundant and there was little separation
of purpose between the clojure comparison logic and the
pipeline tests. This change factorizes the clojure comparison
logic into its own module and simplifies the pipeline tests.
commit 4622440583b579264464f1becbda5f74cd3f2d62
Author: Julien Cornebise <julien@cornebise.com>
Date: Tue Nov 18 11:07:57 2025 +0000
Fix output of full pipeline test
commit a274b8a4717ac6bfa0e19f4b9f340bbc087ceaa2
Author: Julien Cornebise <julien@cornebise.com>
Date: Tue Nov 18 11:05:11 2025 +0000
Refactor comparison to Clojure results
commit b94a6c135768d712212232618e57aab3a934076b
Author: Julien Cornebise <julien@cornebise.com>
Date: Tue Nov 18 09:47:07 2025 +0000
Preserve original data types and use natural sorting.
Makes for a much clearer output. Will need to update the golden record.
All tests passing.
commit 7c6412b0249e27dede382e377caac7401cd032af
Author: Julien Cornebise <julien@cornebise.com>
Date: Tue Nov 18 09:39:30 2025 +0000
Add test for natural sorting order before implementing
commit e06f0ebad265ac58b5adb5d92400d5679e7c9159
Author: Julien Cornebise <julien@cornebise.com>
Date: Tue Nov 18 09:14:17 2025 +0000
Match old sorting and converting behaviour
commit e5f47cd56278eb82587517e60ae48e7c999b47c1
Author: Julien Cornebise <julien@cornebise.com>
Date: Mon Nov 17 15:31:59 2025 +0000
Comment out BG2018 report for tests
commit cdc238c27e0357cd5c92436e9fcaf8465c901959
Author: Julien Cornebise <julien@cornebise.com>
Date: Mon Nov 17 15:05:00 2025 +0000
Remove every mention of NamedMatrix
commit dacd95a42ce3f0067e92048f25aee37c5b1e6784
Author: Julien Cornebise <julien@cornebise.com>
Date: Mon Nov 17 14:49:18 2025 +0000
Restrict pytest regression test to VW dataset only for speed
commit b657c870245eb758ac5b090fae60b7e4e23e1469
Author: Julien Cornebise <julien@cornebise.com>
Date: Mon Nov 17 14:43:41 2025 +0000
Vectorize matrix clean-up
commit 4bb11b514763a5ec9eb6dca6c85cb299c0d9bf28
Author: Julien Cornebise <julien@cornebise.com>
Date: Mon Nov 17 14:39:00 2025 +0000
Fix bug in PCA that caused different results
Found the bug ! (With Claude Code's help)
The PCA code starts by "cleaning" the matrix with some replacement rules for NaN
and strings. Then it proceeds to compute the PCA on that cleaned up matrix.
Great, I've done the cleaning, and done it in-place for efficiency, since the
matrix is cleaned up first thing in the code and the unclean one therefore not
used. Right ?
...
RIGHT ??
*It turns out*, hidden way below, the projection of the participants on the
low-dimensional space is (intentionally) done *on the non-cleaned matrix*!!
(TODO: I'll have to put my math thinking cap on to understand exactly why it was
coded like that...)
Adding "copy=True" in one built-in invocation solved it.
This version here also restored the loop-in-loop cleanup code. My next commit will clean it up.
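For illustration, a minimal sketch of the in-place versus copy distinction this commit describes, using np.nan_to_num as a stand-in for whichever cleaning call the real code uses:

```python
import numpy as np

raw = np.array([[1.0, np.nan], [np.nan, -1.0]])

# copy=False mutates `raw` in place, so a later step that intentionally reads
# the *unclean* matrix silently gets the cleaned one instead.
np.nan_to_num(raw, copy=False)
assert not np.isnan(raw).any()

# copy=True leaves `raw` untouched -- the behaviour the fix restores.
raw = np.array([[1.0, np.nan], [np.nan, -1.0]])
cleaned = np.nan_to_num(raw, copy=True)
assert np.isnan(raw).any() and not np.isnan(cleaned).any()
```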
commit 8da68f3b429fa9a269fddfdfcfce2677713ebe25
Author: Julien Cornebise <julien@cornebise.com>
Date: Mon Nov 17 13:35:27 2025 +0000
Try but fail to mimic the old handling of strings and NaNs
commit 076535785edf0430a51f693b755f60ace9c12b00
Author: Julien Cornebise <julien@cornebise.com>
Date: Mon Nov 17 13:32:15 2025 +0000
Add a sanity check test for matrix cleaning functions
Compare old and new way of doing things, to spot differences.
commit b5ca83133a2ae7cf15c6d26715386251f7f2432f
Author: Julien Cornebise <julien@cornebise.com>
Date: Mon Nov 17 10:19:51 2025 +0000
Print differences in regular order
Set operations are unordered...
commit 762dcaaa0a945e5fdf84fe12b18bd31d3084c861
Author: Julien Cornebise <julien@cornebise.com>
Date: Mon Nov 17 10:07:27 2025 +0000
Order lexicographically (by str) upon moderation
commit dde8d6fb744b3f9a349e0b016ce7d0c0ab422349
Author: Julien Cornebise <julien@cornebise.com>
Date: Mon Nov 17 10:07:05 2025 +0000
Store actual computation results
commit e9a231af1952b23d3317a4eb10713535d044d9b1
Author: Julien Cornebise <julien@cornebise.com>
Date: Mon Nov 17 10:02:09 2025 +0000
Test ordering to match pre-NamedMatrixectomy ordering
commit fd592f72302a08c32a588190312dae81c691b399
Author: Julien Cornebise <julien@cornebise.com>
Date: Mon Nov 17 09:28:40 2025 +0000
Save computed JSON for outside comparison
Also create a symlink to the latest, for ease of opening without having
to read timestamps.
commit 3392f442dc32feb7e10e4386a58ff40ced6ad38c
Author: Julien Cornebise <julien@cornebise.com>
Date: Fri Nov 14 15:22:06 2025 +0000
Sort comment ids and participants
Sort the comment ids and participant ids using natsort to ensure consistent ordering.
Not sure why things need to be ordered, but it is probably less surprising this way.
As a bonus, our indices can now be any type instead of being force-converted to strings.
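A tiny illustration of that natural ordering (the labels are placeholders, not the project's actual ids):

```python
import pandas as pd
from natsort import natsorted

# Placeholder participant/comment ids.
matrix = pd.DataFrame(0, index=["p10", "p2", "p1"], columns=["c10", "c2", "c1"])

# Natural sort puts p2 before p10; plain lexicographic sorting would not.
matrix = matrix.reindex(index=natsorted(matrix.index),
                        columns=natsorted(matrix.columns))
print(list(matrix.index))  # ['p1', 'p2', 'p10']
```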
commit 27b4ccac63430e88fa78ed6b538489d72dac1f37
Author: Julien Cornebise <julien@cornebise.com>
Date: Fri Nov 14 13:52:21 2025 +0000
Remove python output that is dynamically generated during tests
commit c056ef61b0be291b494beaf7ee0f5901208ee030
Author: Julien Cornebise <julien@cornebise.com>
Date: Fri Nov 14 13:49:09 2025 +0000
Remove duplicate files
commit 81b3d9d6ad26bcf7f8b045169c94a9e735091c84
Author: Julien Cornebise <julien@cornebise.com>
Date: Fri Nov 14 13:46:33 2025 +0000
Rename folder to new name
commit 7c2cc3fda8a68e55b08b6ca2707e10df78ecb510
Author: Julien Cornebise <julien@cornebise.com>
Date: Fri Nov 14 13:42:51 2025 +0000
Fix trailing comma
commit 020ae42fd59febad1722e66fdc48436a27b44c5e
Author: Julien Cornebise <julien@cornebise.com>
Date: Fri Nov 14 13:39:53 2025 +0000
Correct spaces to avoid false positives in git diffs.
commit 3866da55a343f3fe990f5de62362a92ba705a5ec
Merge: 9bbdc49a 2081ed8b
Author: Julien Cornebise <julien@cornebise.com>
Date: Fri Nov 14 12:26:08 2025 +0000
Merge remote-tracking branch 'upstream/edge' into replace_named_matrix
A lot of merge conflicts due to this branch having merged changes earlier
that were merge-squashed into upstream/edge since then.
commit 9bbdc49abbf5de0a9bf665912d13af2b0d747f34
Author: Julien Cornebise <julien@cornebise.com>
Date: Thu Nov 13 20:06:08 2025 +0000
Pass all unit tests without NamedMatrix
commit 3fbeee693e1c73c2e93bae426352d84988ec7edd
Author: Julien Cornebise <julien@cornebise.com>
Date: Thu Nov 13 20:03:45 2025 +0000
Remove python output
This python output is overwritten each time the tests are run, and should not be committed.
commit 1392b6543103d52e67b102fece0958ad242d063e
Author: Julien Cornebise <julien@cornebise.com>
Date: Thu Nov 13 19:46:56 2025 +0000
Pass correlation tests without NamedMatrix
commit 6b47f00d3e9716669ee955d35b4de5237ba0c583
Author: Julien Cornebise <julien@cornebise.com>
Date: Thu Nov 13 19:41:39 2025 +0000
Pass Clustering tests without NamedMatrix :)
commit dbe197b1b520153b2d07f09d7830ef87139edf91
Author: Julien Cornebise <julien@cornebise.com>
Date: Thu Nov 13 19:38:12 2025 +0000
Pass all PCA unit tests
PCA now works without NamedMatrix !
commit 13b8395dbb548f81f1dfa39caf64174310e76762
Author: Julien Cornebise <julien@cornebise.com>
Date: Thu Nov 13 19:18:24 2025 +0000
Replace NamedMatrix by DF in corr. clusters, and repness
This passes test_conversation.py !
commit 2d1a6f7ab41e5b6f3838588075781b7e860c4060
Author: Julien Cornebise <julien@cornebise.com>
Date: Thu Nov 13 19:02:01 2025 +0000
Revert "Skip a warning generated by boto3 about datetime.utcnow being deprecated"
This reverts commit 80be8bc7df085f39f29b67322d975415e21bc62e.
commit 11f18b21066149d039d1ed6e912de81b5d10239c
Author: Julien Cornebise <julien@cornebise.com>
Date: Thu Nov 13 19:01:11 2025 +0000
Replace NamedMatrix by DF in conversation.recompute()
This means also applying to pca and clustering!
commit 7ac7b2508486f82292facffab10b473cf01ee51b
Author: Julien Cornebise <julien@cornebise.com>
Date: Thu Nov 13 15:47:37 2025 +0000
First replacement and first test to pass
Replace NamedMatrix by DataFrame in
- conversation.update_votes()
- conversation._get_clean_matrix()
- conversation._apply_moderation()
and modify test_conversation::test_init
commit 80be8bc7df085f39f29b67322d975415e21bc62e
Author: Julien Cornebise <julien@cornebise.com>
Date: Thu Nov 13 13:01:24 2025 +0000
Skip a warning generated by boto3 about datetime.utcnow being deprecated
commit 767a2d2349a3d8c965b95e3580801509afe01d79
Author: Julien Cornebise <julien@cornebise.com>
Date: Thu Nov 13 12:48:53 2025 +0000
Add BG2018 and rename for clarity + Replace DB connection by URL
commit b80bd078d31a282b07c0705a61dfcea37725fcca
Merge: 58bd636a 52a458ac
Author: Julien Cornebise <julien@cornebise.com>
Date: Thu Nov 13 09:35:31 2025 +0000
Merge remote-tracking branch 'upstream/te-delphi-py-tests' into replace_named_matrix
commit 52a458aca1daf51b3c8c117b9013a2462a4381db
Author: tevko <tim@devzero.io>
Date: Wed Nov 12 21:55:27 2025 -0600
build dependency
commit e26b482e2cc69d5c0df8d5cfdee9dc788be5b428
Author: tevko <tim@devzero.io>
Date: Wed Nov 12 21:41:04 2025 -0600
update other action
commit 58bd636aab3960499153a60d66ddc1ebef0cf6fb
Merge: e00d970a b4275829
Author: Julien Cornebise <julien@cornebise.com>
Date: Wed Nov 12 22:54:15 2025 +0000
Merge branch 'te-delphi-py-tests' into replace_named_matrix
commit e00d970a4f9d5b9ef5cf12659aa2b1c159b3e31b
Author: Julien Cornebise <julien@cornebise.com>
Date: Wed Nov 12 22:43:17 2025 +0000
Allow comparison to be run on multiple datasets
commit 1e48a4a53dbdb04733c50fd21717bdfc71df6228
Author: Julien Cornebise <julien@cornebise.com>
Date: Wed Nov 12 22:16:45 2025 +0000
Add BG2050 and BG2018 datasets to dataset_config.py
commit d99c79e002b10aa1ed92f5bb4bd0693b7a0eb79c
Author: Julien Cornebise <julien@cornebise.com>
Date: Wed Nov 12 22:04:55 2025 +0000
Improve downloading of real data
- Check if data already exists before downloading
- Add option to force re-download of data
- Change paths to include dataset name if known
- Download all datasets from the config, by default
- Add progress bars
commit 46eac1be5c6f0bc0af540f6c32da9e4b097f4513
Author: Julien Cornebise <julien@cornebise.com>
Date: Wed Nov 12 22:00:15 2025 +0000
Allow to run tests in parallel
Particularly useful now that we have multiple tests.
commit f43cbc051014b508bc02d07c6979764b81ad20b4
Author: tevko <tim@devzero.io>
Date: Wed Nov 12 15:52:48 2025 -0600
free up space
commit 26f023c2ddc49cdb65c5bfccaf772949bf5cf635
Author: tevko <tim@devzero.io>
Date: Wed Nov 12 15:42:21 2025 -0600
shared, test db
commit fb6beebe92c467b1530e7e6d792cd916bfd08159
Author: tevko <tim@devzero.io>
Date: Wed Nov 12 15:39:07 2025 -0600
add access keys
commit ad85a95162da7d4d8d7f6a376c852a057679dc82
Author: tevko <tim@devzero.io>
Date: Wed Nov 12 15:22:00 2025 -0600
update region
commit aa69f3f50b091bf49262e642a14451a3f1349a7c
Author: Julien Cornebise <julien@cornebise.com>
Date: Wed Nov 12 21:09:39 2025 +0000
Configure VScode to run pytest on the regression tests
commit 457125e1355d1ac2f578da4fdc987c6c86c7cea2
Author: Julien Cornebise <julien@cornebise.com>
Date: Wed Nov 12 21:06:31 2025 +0000
Run regression tests on all available datasets by default
commit 44ef75c2f9c85c3052938c47f37b5e8fc6370b47
Author: tevko <tim@devzero.io>
Date: Wed Nov 12 15:02:30 2025 -0600
ensure dynamo tables created
commit 29ee422bc0a37a2174f7b0bb4ed8969448689cb1
Author: Julien Cornebise <julien@cornebise.com>
Date: Wed Nov 12 21:01:54 2025 +0000
Fix regression tests' logic
The golden tests should not be generated by the tests, but by the developer once.
commit 695ede06c2cc7f00b36f0732950dd341c676430e
Author: tevko <tim@devzero.io>
Date: Wed Nov 12 14:47:09 2025 -0600
remove duplicate data
commit ae175168f1d30324b9843f5ce8452160ae47e298
Author: tevko <tim@devzero.io>
Date: Wed Nov 12 12:37:03 2025 -0600
add real data, update action
commit 1ef3c064a6c9f1738e49cd02a9585714429b3e47
Author: tevko <tim@devzero.io>
Date: Wed Nov 12 09:33:40 2025 -0600
use pol.is baseurl -- actions
commit f1ce4aa9b5200b7419352d3841c13817672fdc37
Author: tevko <tim@devzero.io>
Date: Wed Nov 12 09:15:10 2025 -0600
try more robust action -- actions
commit a5b4e6cf5f5c1a143d873a209672f5d8efe4c91c
Author: tevko <tim@devzero.io>
Date: Wed Nov 12 09:04:53 2025 -0600
remove pg check again - actions
commit 966a551152cedcff20d9698f53b381aa4d4d59ab
Author: tevko <tim@devzero.io>
Date: Wed Nov 12 08:57:02 2025 -0600
change healthcheck - actions
commit 585cd2b3eb6bb360acef04d29fd15b5cc163537a
Author: tevko <tim@devzero.io>
Date: Wed Nov 12 08:37:39 2025 -0600
remove pg check
commit dd1c49225c295a1e27d214b29a15871c421bc426
Author: Julien Cornebise <julien@cornebise.com>
Date: Wed Nov 12 14:33:33 2025 +0000
Improve benchmark: 3 runs, statistical test
commit 55d39a2a242b01384d1434e03570f3cc4f33aa01
Author: tevko <tim@devzero.io>
Date: Wed Nov 12 08:24:10 2025 -0600
actions - change baseUrl
commit 977813708ea9be794b2caa6384dc5908b4e9c249
Author: tevko <tim@devzero.io>
Date: Wed Nov 12 08:07:00 2025 -0600
actions - mount volume tests
commit 017d6863a7166c22589a836ceaa3c7b00e983246
Author: Julien Cornebise <julien@cornebise.com>
Date: Wed Nov 12 13:45:15 2025 +0000
Refactor regression test and add basic benchmark
commit d90dd3cc63f472171f637ab7cee42dd3514c8129
Author: Julien Cornebise <julien@cornebise.com>
Date: Wed Nov 12 09:41:30 2025 +0000
Fix comparer and recorder to properly record and compare
Saves PCA, clusters, etc
commit b42758299c396df2f716100b71d7f15a70d28cf1
Author: tevko <tim@devzero.io>
Date: Tue Nov 11 21:40:00 2025 -0600
another actions fix again again again
commit 411c75108ea5e45b5b0f211ec08cd3e9aa273106
Author: tevko <tim@devzero.io>
Date: Tue Nov 11 21:29:13 2025 -0600
action fix again
commit d9421e1bd038ca6de9e481dbd71e358dabaa2f7e
Author: tevko <tim@devzero.io>
Date: Tue Nov 11 21:19:04 2025 -0600
another actions fix
commit be7bfbbb85c8dcadce7ff0f23ff551066bc7ce9c
Author: tevko <tim@devzero.io>
Date: Tue Nov 11 21:10:20 2025 -0600
update action again
commit dac822b23d5032d7ba3c56fa3cc3078f503a29e5
Author: tevko <tim@devzero.io>
Date: Tue Nov 11 21:09:44 2025 -0600
add delphi service to test
commit 2b3aa6c0677394e5faaa813cc7526b8a0664ec82
Author: tevko <tim@devzero.io>
Date: Tue Nov 11 20:27:46 2025 -0600
actions update 2
commit 483c5b47d937c4dae1e35f62b9794d94102ab298
Author: tevko <tim@devzero.io>
Date: Tue Nov 11 20:15:37 2025 -0600
fix action
commit 008bd2202547b48f05b7a866b1ca532036225947
Author: tevko <tim@devzero.io>
Date: Tue Nov 11 20:07:15 2025 -0600
fix all tests
commit 09edb21c4b7a8c6be02c7ac6b57c2631bd694e3a
Author: tevko <tim@devzero.io>
Date: Tue Nov 11 16:30:06 2025 -0600
use env for data script
commit b9f8f60b04c2905c63822bac48fd0c84720bb05d
Author: Julien Cornebise <julien@cornebise.com>
Date: Tue Nov 11 12:54:29 2025 +0000
First draft of regression tests based on recorder
The output is not yet the kind of exhaustive result I was expecting,
so needs more work.
Done with Claude.
commit 531280b6f27dab65d409b997066923d880935e8e
Author: tevko <tim@devzero.io>
Date: Mon Nov 10 22:32:39 2025 -0600
update action 3
commit ea8f989b41f5062f9a6e18401bf1726c61b7037b
Author: tevko <tim@devzero.io>
Date: Mon Nov 10 22:24:08 2025 -0600
update action 2
commit 779f5dd42cdf854376c7e606016badfb06d60ff6
Author: tevko <tim@devzero.io>
Date: Mon Nov 10 22:17:10 2025 -0600
update action
commit 359dbe387df39112db875409d5e23ac4afa4d441
Author: tevko <tim@devzero.io>
Date: Mon Nov 10 22:05:53 2025 -0600
add action
commit cb33f2d321ecc407397d5b7cd36911105bd634ee
Author: Julien Cornebise <julien@cornebise.com>
Date: Mon Nov 10 13:16:35 2025 +0000
Exclude Conversation serialization tests
Until https://github.com/compdemocracy/polis/issues/2284 is resolved
commit 5a9d60add5ca359d5986601350143469c91c66e9
Author: Julien Cornebise <julien@cornebise.com>
Date: Mon Nov 10 10:05:42 2025 +0000
Add assert failure messages
commit 78a27df39eb50b569a1aab76ef40716052b2a9a2
Author: Julien Cornebise <julien@cornebise.com>
Date: Sun Nov 9 21:02:12 2025 +0000
Refactor test_repness_comparison.py to proper pytest structure
Similar to pca tests, refactor test_repness_comparison.py
- Converts test_comparison() function to TestRepnessComparison class
- Uses @pytest.mark.parametrize for multiple datasets
- Proper fixtures for clojure_results, conversation, python_results
- Two test methods: test_structural_compatibility and test_comparison_visibility
- Replaces print() with logging.info/debug
- Adds warning that results are known to be very different
- Reports comparison results for visibility without asserting on match rates
- Maintains comparison functionality for manual inspection
Test results: 4 tests passed (2 datasets × 2 test methods)
commit bc4f9e0bb37ee24da8b51a9dbd694804802f2631
Author: Julien Cornebise <julien@cornebise.com>
Date: Sun Nov 9 19:27:07 2025 +0000
Rename test_repness.py to test_repness_unit.py for clarity
Rename to clarify that these are unit tests with synthetic data,
following the same naming convention established for PCA tests:
- test_repness.py → test_repness_unit.py (unit tests, synthetic data)
- test_repness_smoke.py (real data, smoke tests - already renamed)
- test_repness_comparison.py (Python vs Clojure - already clear)
This mirrors the PCA test structure:
- test_pca_unit.py (unit tests)
- test_pca_edge_cases.py (edge cases)
- test_pca_smoke.py (smoke tests)
All 14 tests pass:
- Statistical utility functions (z-scores, proportion tests)
- Comment statistics calculation
- Representative comment selection
- Consensus selection
- Integration tests (conv_repness, participant_stats)
commit 41355a6161cb6cd1d4564d99e7ea580f63e66064
Author: Julien Cornebise <julien@cornebise.com>
Date: Sun Nov 9 19:19:56 2025 +0000
Refactor repness smoke test
Similar to how we refactored the "direct PCA" tests
commit c3947d1d45904e88daf4973be8189d4f74a65f10
Author: Julien Cornebise <julien@cornebise.com>
Date: Sun Nov 9 19:04:05 2025 +0000
Ignore warning from library ddtrace in pytest
commit 622adb4adc71058e77514ab2c6d20b34561627d6
Author: Julien Cornebise <julien@cornebise.com>
Date: Sun Nov 9 19:01:51 2025 +0000
Clarify the naming of PCA test files and remove redundant tests
commit 0a0b55ef5990741c23c97afa9c5557c05e65db63
Author: Julien Cornebise <julien@cornebise.com>
Date: Sun Nov 9 18:57:28 2025 +0000
Refactor direct_pca_test.py to test_pca_smoke.py with pytest structure
Converted legacy procedural test script to proper pytest:
- Class-based structure with TestPCAImplementation
- Parametrized tests for all datasets
- Fixtures for vote matrix loading
- Proper logging instead of prints
- Smoke test warning (no correctness validation)
- Tests: runs without error, projection statistics, clustering
Tests PCA functions directly (not through Conversation class).
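As a rough illustration of the structure this commit describes (class-based, parametrized over datasets, fixture-backed), here is a minimal sketch; the dataset names are illustrative and the loader is a placeholder, not the project's real code:

```python
import logging

import numpy as np
import pytest

logger = logging.getLogger(__name__)

# Illustrative dataset names; the real suite derives these from real_data/
DATASETS = ["biodiversity", "vw"]


class TestPCAImplementation:
    @pytest.fixture(params=DATASETS, ids=DATASETS)
    def vote_matrix(self, request):
        dataset = request.param
        logger.debug("loading votes for %s", dataset)
        # Placeholder loader; the real fixture reads the dataset's votes CSV
        rng = np.random.default_rng(0)
        return rng.choice([-1.0, 0.0, 1.0, np.nan], size=(50, 20))

    def test_runs_without_error(self, vote_matrix):
        # Smoke test: checks that the pipeline runs and shapes look sane,
        # not numerical correctness
        assert vote_matrix.ndim == 2
        logger.info("matrix shape: %s", vote_matrix.shape)
```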
commit 43593a03751b5caf63b41e99455199c8c53eaf10
Author: Julien Cornebise <julien@cornebise.com>
Date: Sun Nov 9 18:26:20 2025 +0000
Fix direct conversation test
- Convert to proper pytest format, not standalone script
- Use fixtures for setup/teardown
- Warn it is a test to check Conversation class instantiation and method calls
- Replace prints by logging
- Parametrize the test to run over all available real_data
- Add some dimension and attributes assertions
- Rename to test_conversation_smoke.py
commit c717c472b02758a684addfd588b57d130d390b34
Author: Julien Cornebise <julien@cornebise.com>
Date: Sun Nov 9 18:12:07 2025 +0000
Fix buggy test that blocked pytest collection
The `test_batch_id.py` was running code at load time, and that code had an error,
thus crashed during pytest collection, preventing all tests from running.
By refactoring into a proper test function, pytest can now collect all tests and run them.
We also fix the error itself, which was a missing escape of the "scan" reserved word in DynamoDB.
commit 84547f2f76b6b44c63f848319886377c8d6c7ae5
Author: Julien Cornebise <julien@cornebise.com>
Date: Sun Nov 9 17:54:50 2025 +0000
Clarify terms in messages and comments
commit 23d1833fb44f3e05969e800e871c62fefab85880
Author: Julien Cornebise <julien@cornebise.com>
Date: Sun Nov 9 17:54:30 2025 +0000
Fix path...
commit 6adbd51da65235c92c3a5c1f1a09fa404c4ed55b
Merge: d560fe66 b8df940f
Author: Julien Cornebise <julien@cornebise.com>
Date: Sun Nov 9 11:24:27 2025 +0000
Merge branch 'edge' into replace_named_matrix
commit d560fe6653a64c093589fedd5f8855349391bddb
Merge: be3d50e9 c5ec8994
Author: Julien Cornebise <julien@cornebise.com>
Date: Sat Nov 8 11:45:13 2025 +0000
Merge remote-tracking branch 'upstream/edge' into replace_named_matrix
commit be3d50e97669fdde45971917cb5b3ac58cf54288
Author: Julien Cornebise <julien@cornebise.com>
Date: Sat Nov 8 11:44:22 2025 +0000
Print whether comment priorities are missing from test data
commit d7970d8f4b0aef03dbfab30ba54b7cd4b688c17d
Author: Julien Cornebise <julien@cornebise.com>
Date: Fri Nov 7 12:38:16 2025 +0000
Refactor real_data loading
Remove duplication, allow for automatic finding of the files within a location,
allow for generalisation to other conversations than the two used so far.
commit f5ac66916db7c6b7541058abe151cb54b1caff3c
Author: Julien Cornebise <julien@cornebise.com>
Date: Fri Nov 7 10:55:55 2025 +0000
Create script to download real data for tests
This is useful if no folder `real data` was provided. I suspect these tests were
written with a `real data` folder already in place. I do not have it, therefore
we need to download it. See the `README` file that has been updated.
commit 23cced099659c9d256653f8085d2999760d45caa
Author: Julien Cornebise <julien@cornebise.com>
Date: Thu Nov 6 13:51:13 2025 +0000
Extract common function to utils file
That function was defined 3 times in 3 different files.
commit 3c6e7880f3b7e8076d3a339b3cc8b71f1f3adb1f
Author: Julien Cornebise <julien@cornebise.com>
Date: Mon Nov 3 17:57:21 2025 +0000
Add type hint in some poller functions
* Fix run_math_pipeline test import to use proper package path
The test file was importing `from run_math_pipeline import main` which
failed locally because `run_math_pipeline.py` lives inside the `polismath`
package at `delphi/polismath/run_math_pipeline.py`.
CI was working around this by copying the file to a flat location:
docker cp delphi/polismath/run_math_pipeline.py delphi:/app/run_math_pipeline.py
This created a discrepancy between local and CI environments.
The fix:
1. Update test imports to use the correct package path:
`from polismath.run_math_pipeline import main`
2. Update mock.patch paths to match:
`mock.patch('polismath.run_math_pipeline.fetch_comments', ...)`
3. Remove the CI workaround that copied the file to /app flat
4. Simplify coverage to `--cov=polismath` (run_math_pipeline is inside it)
The Docker image already has `polismath/` at `/app/polismath/` and the
package is installed via `pip install --no-deps .`, so the proper import
path works in both local and CI environments.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* Improve CI coverage reporting reliability
Changes to the CI workflow:
1. Print coverage report to workflow logs (always visible)
2. Upload coverage report as downloadable artifact
3. Make PR comment step non-fatal with continue-on-error: true
(fork PRs cannot post comments due to GitHub token restrictions)
Coverage is now accessible three ways:
- In the workflow logs (step 7)
- As a downloadable artifact (step 8)
- As a PR comment when permissions allow (step 9)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* Add graceful error handling for coverage comment on fork PRs
Instead of showing an unhandled error when posting coverage comments
fails on fork PRs, the script now catches the 403 error and displays
a helpful message explaining:
- Why the comment could not be posted (GitHub token permissions)
- Where to find the coverage report (logs and artifact)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* Fix test for malformed votes
Malformed votes should be ignored.
* Clean up unused variables and imports
Address GitHub Copilot review comments:
- Log superseded votes count in conversation.py instead of leaving unused
- Remove unused p1_idx/p2_idx index lookups in corr.py
- Remove unused all_passed variable in regression_comparer.py
- Remove unused imports (numpy, Path, List, datetime, stats, pca/cluster functions)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
---------
Co-authored-by: Claude <noreply@anthropic.com>
* Only run python-ci for delphi changes; minimize output (#2315)
* Only run python-ci for delphi changes; minimize output
* address PR feedback
* Revert "Merge branch 'stable' into edge" (#2305)
This reverts commit 51665ab3b552e406526364d7e8fc5a0be7bd8277, reversing
changes made to 3901ee5fcd134adfe498ca6a76a89ab5c1cda3a6.
* add narrative pipeline test (#2307)
* add narrative pipeline test
* change filename
* slight mocking adjustment
* mock sentence transformer
* better evoc
* try massaging mock data again
* more mocking
* diff mock strategy
* fix cov report
* test 500 gen embed
* syntax fixes
* update sytax again
* syntax fix again
* attempt mock fix
* another mock attempt
* fix action
* fix action again
* actions fix
* add another test
* add another test
* fix test
* Alpha visualization (#2302)
* add client-visualization submodule
* add pca visualization to alpha client
* show user in the data viz
* fetch and animate new pca data
* remove gitmodule
* use concaveman lib; update package.json; use gray color; only show when vis_type is set
* reset selected statement when group changes
* update astro types
* include remaining comment count
* Bump js-yaml from 4.1.0 to 4.1.1 in /e2e (#2292)
Bumps [js-yaml](https://github.com/nodeca/js-yaml) from 4.1.0 to 4.1.1.
- [Changelog](https://github.com/nodeca/js-yaml/blob/master/CHANGELOG.md)
- [Commits](https://github.com/nodeca/js-yaml/compare/4.1.0...4.1.1)
---
updated-dependencies:
- dependency-name: js-yaml
dependency-version: 4.1.1
dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Bump js-yaml from 3.14.1 to 3.14.2 in /cdk (#2298)
Bumps [js-yaml](https://github.com/nodeca/js-yaml) from 3.14.1 to 3.14.2.
- [Changelog](https://github.com/nodeca/js-yaml/blob/master/CHANGELOG.md)
- [Commits](https://github.com/nodeca/js-yaml/compare/3.14.1...3.14.2)
---
updated-dependencies:
- dependency-name: js-yaml
dependency-version: 3.14.2
dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Bump glob from 10.3.16 to 10.5.0 in /client-report (#2300)
Bumps [glob](https://github.com/isaacs/node-glob) from 10.3.16 to 10.5.0.
- [Changelog](https://github.com/isaacs/node-glob/blob/main/changelog.md)
- [Commits](https://github.com/isaacs/node-glob/compare/v10.3.16...v10.5.0)
---
updated-dependencies:
- dependency-name: glob
dependency-version: 10.5.0
dependency-type: direct:development
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Bump js-yaml in /client-admin (#2309)
Bumps and [js-yaml](https://github.com/nodeca/js-yaml). These dependencies needed to be updated together.
Updates `js-yaml` from 4.1.0 to 4.1.1
- [Changelog](https://github.com/nodeca/js-yaml/blob/master/CHANGELOG.md)
- [Commits](https://github.com/nodeca/js-yaml/compare/4.1.0...4.1.1)
Updates `js-yaml` from 3.14.1 to 3.14.2
- [Changelog](https://github.com/nodeca/js-yaml/blob/master/CHANGELOG.md)
- [Commits](https://github.com/nodeca/js-yaml/compare/4.1.0...4.1.1)
---
updated-dependencies:
- dependency-name: js-yaml
dependency-version: 4.1.1
dependency-type: indirect
- dependency-name: js-yaml
dependency-version: 3.14.2
dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Speed up repness 11x (#2316)
* Optimize update_votes with vectorized pivot_table (5x speedup)
Replace the row-by-row for-loop in update_votes with a vectorized
pivot_table approach. This dramatically speeds up vote loading for
large datasets.
Performance on bg2050 dataset (1M+ votes, 7.8k participants, 7.7k comments):
- Before: 18.5s average, 56k votes/sec
- After: 3.5s average, 295k votes/sec
- Speedup: 5.3x overall, 16x for the batch update step
The optimization:
1. Use pivot_table to reshape long-form votes to wide-form matrix
2. Use DataFrame.where() to merge with existing matrix
3. Use float32 for intermediate matrix to halve memory usage
Also adds a benchmark script at polismath/benchmarks/bench_update_votes.py
for measuring update_votes performance.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* Vectorize _compute_vote_stats and make benchmark standalone
- _compute_vote_stats: Replace per-row/per-column loops with numpy
vectorized operations using boolean masks and axis-based sums.
This eliminates O(rows + cols) Python loops.
- bench_update_votes.py: Make standalone by accepting CSV path directly
instead of depending on datasets package. Add TODO for using datasets
package once PR #2312 is merged.
Combined with pivot_table optimization, achieves ~10x speedup on bg2050
dataset (1M votes): 18-30s -> 2.5s (~400k votes/sec).
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* Fix: Remove misleading float32 conversion in update_votes
Addresses GitHub Copilot review comments on PR #2313:
- Removed float32 conversion that only provided temporary memory savings
- The comment was misleading as savings didn't persist after .where()
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* Fix: Use vectorized pandas operations in benchmark loader
Replace iterrows() with rename() + to_dict('records') for efficiency,
as suggested by GitHub Copilot review.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* Add timing logging for PCA and repness
* Add benchmark script for repness
* Add profiling to benchmark for repness
* Vectorize vote count: 2x speedup on large convos
* Extract common setup code
* Rename vote_matrix to vote_matrix_df for clarity
* Keep NaNs instead of None: 2x more speedup
* Refactor conv_repness() to use long-format DataFrame
Convert wide-format vote matrix to long-format using melt() and use
vectorized pandas groupby operations instead of nested loops.
Key changes:
- Add compute_group_comment_stats_df() for vectorized (group, comment) stats
- Add prop_test_vectorized() and two_prop_test_vectorized() for batch z-tests
- Add select_rep_comments_df() and select_consensus_comments_df() for
DataFrame-native selection, converting to dicts only at the end
- Compute "other" stats as total - group instead of recalculating
- Use MultiIndex.from_product() to ensure all (group, comment) combinations
Test changes:
- Add test_old_format_repness.py to preserve backwards compatibility tests
- Add TestVectorizedFunctions class with 8 tests for new DataFrame interface
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* Shorten imports as per GH Copilot Review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update docstring as per GH Copilot Review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Remove unused import as per GH Copilot Review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Move profiler to within profiling function as per GH Copilot review
* Remove unused import as per GH Copilot review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Profile new functions
---------
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
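To make the "Optimize update_votes with vectorized pivot_table" item above concrete, here is a rough sketch of the reshape-and-merge idea it describes; the column names (pid, tid, vote) and the function signature are assumptions, not the project's actual API:

```python
import pandas as pd


def merge_vote_batch(matrix: pd.DataFrame, votes: pd.DataFrame) -> pd.DataFrame:
    """Merge long-form votes (pid, tid, vote) into a wide participant x comment matrix."""
    # 1. Long -> wide in one vectorized step; the last vote per (pid, tid) wins
    new = votes.pivot_table(index="pid", columns="tid", values="vote", aggfunc="last")
    # 2. Align old and new matrices on the union of participants and comments
    idx = matrix.index.union(new.index)
    cols = matrix.columns.union(new.columns)
    old = matrix.reindex(index=idx, columns=cols)
    new = new.reindex(index=idx, columns=cols)
    # 3. Keep new votes where present, fall back to existing values elsewhere
    return new.where(new.notna(), old)
```

The speedup described above comes from replacing a per-row Python loop with one pivot_table call plus an aligned where() merge.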
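The "Vectorize _compute_vote_stats" item can be pictured the same way: boolean masks and axis-based sums instead of loops over rows and columns. A small sketch, assuming a NaN-padded matrix and a 1 = agree encoding (the real encoding may differ):

```python
import numpy as np


def vote_stats(votes: np.ndarray) -> dict[str, np.ndarray]:
    """Per-axis vote counts with boolean masks instead of Python loops."""
    seen = ~np.isnan(votes)      # a vote was recorded for this (participant, comment)
    agrees = votes == 1          # assumed encoding: 1 = agree
    return {
        "votes_per_participant": seen.sum(axis=1),
        "votes_per_comment": seen.sum(axis=0),
        "agrees_per_comment": agrees.sum(axis=0),
    }
```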
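Likewise, a sketch of the long-format approach in the "Refactor conv_repness() to use long-format DataFrame" item, with hypothetical names and a simplified vote encoding (1 = agree, -1 = disagree):

```python
import pandas as pd


def group_comment_stats(vote_matrix: pd.DataFrame, groups: pd.Series) -> pd.DataFrame:
    """Vectorized per-(group, comment) vote counts instead of nested loops."""
    # Wide -> long: one row per recorded (participant, comment, vote)
    long_votes = (
        vote_matrix.rename_axis("pid")
        .reset_index()
        .melt(id_vars="pid", var_name="tid", value_name="vote")
        .dropna(subset=["vote"])
    )
    long_votes["group"] = long_votes["pid"].map(groups)  # groups: pid -> cluster id
    stats = long_votes.groupby(["group", "tid"]).agg(
        n_votes=("vote", "size"),
        n_agree=("vote", lambda v: int((v == 1).sum())),
        n_disagree=("vote", lambda v: int((v == -1).sum())),
    )
    # Ensure every (group, comment) pair is present, even with zero votes
    full = pd.MultiIndex.from_product(
        [sorted(groups.unique()), vote_matrix.columns], names=["group", "tid"]
    )
    return stats.reindex(full, fill_value=0)
```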
* some lib updates (#2323)
* remove express from oidc-simulator; update other libs
* pin auth0-simulator to 0.10.2
* e2e lib updates
* client-admin lib updates
* fix delphi dockerfile -- torch versions for cpu
* Bump js-yaml in /client-report (#2317)
Bumps and [js-yaml](https://github.com/nodeca/js-yaml). These dependencies needed to be updated together.
Updates `js-yaml` from 4.1.0 to 4.1.1
- [Changelo…
Thanks @ballPointPenguin! Rebasing then merging :)
…flag

- Auto-discover datasets from real_data/ and real_data/.local/ based on directory naming pattern <report_id>-<name>/
- Add --include-local pytest flag to include git-ignored local datasets
- Add .local/ to .gitignore for confidential/large datasets
- Simplify datasets.py with DatasetInfo dataclass and discovery functions
- Add conftest.py with pytest hooks for dynamic test parametrization
- Update download_real_data.py to default to .local/ with --commit flag
- Add unit tests for dataset discovery in test_datasets.py
- Update tests/README.md with new documentation

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
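For context, a minimal sketch of the kind of discovery this commit describes, scanning real_data/ and optionally real_data/.local/ for directories named <report_id>-<name>/; the regex, helper name, and return type are assumptions, not the actual datasets.py code:

```python
import re
from pathlib import Path

# A dataset directory is named <report_id>-<name>/ (e.g. r123abc-biodiversity/)
DATASET_DIR_RE = re.compile(r"^(?P<report_id>[^-]+)-(?P<name>.+)$")


def discover_dataset_dirs(real_data: Path, include_local: bool = False) -> list[Path]:
    """Return dataset directories under real_data/, plus real_data/.local/ when requested."""
    roots = [real_data]
    if include_local:
        roots.append(real_data / ".local")
    found: list[Path] = []
    for root in roots:
        if not root.is_dir():
            continue
        for child in sorted(root.iterdir()):
            if child.is_dir() and DATASET_DIR_RE.match(child.name):
                found.append(child)
    return found
```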
- Use any() instead of bool(list()) in _check_files for efficiency
- Add multiple match validation in find_file
- Fix pytest.skip() during collection (use empty parametrize instead)
- Add directory context comment to test_regression.py usage
- Remove unused list_regression_datasets import
- Rename TestDirPattern to TestDirectoryPattern
- Improve error message in regression_download.py

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
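For example, the "multiple match validation in find_file" item could look roughly like this (the signature is assumed):

```python
from pathlib import Path


def find_file(directory: Path, pattern: str) -> Path:
    """Return the single file matching pattern in directory, or raise."""
    matches = sorted(directory.glob(pattern))
    if not matches:
        raise FileNotFoundError(f"No file matching {pattern!r} in {directory}")
    if len(matches) > 1:
        raise ValueError(f"Multiple files match {pattern!r} in {directory}: {matches}")
    return matches[0]
```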
- Warn when local dataset shadows a committed dataset with same name
- Add test for include_local=True behavior
- Add test for name collision warning

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Allow datasets to be valid for regression testing without the Clojure math_blob file. This enables testing when database access is unavailable (e.g., when DATABASE_URL is not set).

Changes:
- DatasetInfo.is_valid now only requires votes, comments, and golden_snapshot
- Added has_clojure_reference property to check if Clojure comparison is possible
- Updated documentation to clarify math_blob is optional
- Added tests for new behavior

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
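A rough sketch of the shape this change describes; the field names are guesses based on the files mentioned above, not the actual dataclass:

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class DatasetInfo:
    # Field names are guesses based on the files named in the commit message
    name: str
    report_id: str
    votes: Optional[Path] = None
    comments: Optional[Path] = None
    golden_snapshot: Optional[Path] = None
    math_blob: Optional[Path] = None  # Clojure export; now optional

    @property
    def is_valid(self) -> bool:
        # Regression testing needs votes, comments, and the golden snapshot only
        required = (self.votes, self.comments, self.golden_snapshot)
        return all(p is not None and p.exists() for p in required)

    @property
    def has_clojure_reference(self) -> bool:
        # Comparing against Clojure output additionally needs the math_blob
        return self.math_blob is not None and self.math_blob.exists()
```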
- test_legacy_clojure_regression.py: Replace hardcoded ["biodiversity", "vw"] with auto-discovery using clojure_dataset fixture. Only includes datasets with has_clojure_reference=True (i.e., have math_blob for Clojure comparison). Respects --include-local flag.
- regression_download.py: After download, check for missing golden_snapshot.json and offer to create them interactively. Shows command to create later if user declines.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
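A minimal sketch of the conftest.py hooks this implies; discover_datasets, REAL_DATA_DIR, and the import path are placeholders for whatever the module actually exposes:

```python
# conftest.py sketch
from pathlib import Path

# Placeholder import: the actual helper name in polismath.regression.datasets may differ
from polismath.regression.datasets import discover_datasets

REAL_DATA_DIR = Path(__file__).parent / "real_data"  # placeholder location


def pytest_addoption(parser):
    parser.addoption(
        "--include-local",
        action="store_true",
        default=False,
        help="Also collect git-ignored datasets from real_data/.local/",
    )


def pytest_generate_tests(metafunc):
    if "clojure_dataset" in metafunc.fixturenames:
        include_local = metafunc.config.getoption("--include-local")
        datasets = [
            d for d in discover_datasets(REAL_DATA_DIR, include_local=include_local)
            if d.has_clojure_reference
        ]
        # Parametrizing with an empty list keeps collection working when no
        # datasets are present, instead of calling pytest.skip() at collection time
        metafunc.parametrize("clojure_dataset", datasets, ids=[d.name for d in datasets])
```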
Clojure comparison tests only need votes, comments, and math_blob. They compare against the Clojure output, not the Python golden snapshot.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
…ata download failures better
d04c2ca to
41a3e62
Compare
…flag (#2312)

* Add auto-discovery for regression test datasets with --include-local flag
  - Auto-discover datasets from real_data/ and real_data/.local/ based on directory naming pattern <report_id>-<name>/
  - Add --include-local pytest flag to include git-ignored local datasets
  - Add .local/ to .gitignore for confidential/large datasets
  - Simplify datasets.py with DatasetInfo dataclass and discovery functions
  - Add conftest.py with pytest hooks for dynamic test parametrization
  - Update download_real_data.py to default to .local/ with --commit flag
  - Add unit tests for dataset discovery in test_datasets.py
  - Update tests/README.md with new documentation

* Address Copilot review feedback
  - Use any() instead of bool(list()) in _check_files for efficiency
  - Add multiple match validation in find_file
  - Fix pytest.skip() during collection (use empty parametrize instead)
  - Add directory context comment to test_regression.py usage
  - Remove unused list_regression_datasets import
  - Rename TestDirPattern to TestDirectoryPattern
  - Improve error message in regression_download.py

* Add warning for name collisions and test for include_local
  - Warn when local dataset shadows a committed dataset with same name
  - Add test for include_local=True behavior
  - Add test for name collision warning

* Make math_blob optional for regression testing
  Allow datasets to be valid for regression testing without the Clojure math_blob file. This enables testing when database access is unavailable (e.g., when DATABASE_URL is not set).
  - DatasetInfo.is_valid now only requires votes, comments, and golden_snapshot
  - Added has_clojure_reference property to check if Clojure comparison is possible
  - Updated documentation to clarify math_blob is optional
  - Added tests for new behavior

* Auto-discover datasets in tests and prompt for golden snapshots
  - test_legacy_clojure_regression.py: Replace hardcoded ["biodiversity", "vw"] with auto-discovery using clojure_dataset fixture. Only includes datasets with has_clojure_reference=True (i.e., have math_blob for Clojure comparison). Respects --include-local flag.
  - regression_download.py: After download, check for missing golden_snapshot.json and offer to create them interactively. Shows command to create later if user declines.

* Fix: Clojure tests do not require golden_snapshot
  Clojure comparison tests only need votes, comments, and math_blob. They compare against the Clojure output, not the Python golden snapshot.

* Fix name in examples as per GH Copilot review

* clarify dotenv ".env" location; unignore .gitkeep in .local; handle data download failures better

---------

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Bennie Rosas <ben@aliencyb.org>
Summary
This PR adds support for testing with local datasets that are not committed to the repository. This enables:
- Dropping a dataset in real_data/.local/ and it's auto-discovered, no config changes needed

Changes
- Auto-discover datasets from real_data/ and real_data/.local/ based on directory naming pattern <report_id>-<name>/
- Add --include-local pytest flag to include git-ignored local datasets
- Add .local/ to .gitignore for confidential/large datasets
- Simplify datasets.py with DatasetInfo dataclass and discovery functions
- Add conftest.py with pytest hooks for dynamic test parametrization
- Update download_real_data.py to default to .local/ with --commit flag
- Add unit tests for dataset discovery in test_datasets.py
- Update tests/README.md with new documentation

Usage
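For example, pytest delphi/tests/test_regression.py runs against the committed datasets only, while pytest delphi/tests/test_regression.py --include-local additionally picks up any datasets placed in real_data/.local/.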
Test plan
- Run pytest delphi/tests/test_datasets.py to verify dataset discovery
- Run pytest delphi/tests/test_regression.py with committed datasets
- Add a dataset to real_data/.local/ and verify it's discovered with --include-local
- Verify the .local/ directory is properly git-ignored

🤖 Generated with Claude Code