Add auto-discovery for regression test datasets with --include-local flag #2312
Conversation
2667ae5 to f692f13
Pull request overview
This PR modernizes the regression testing infrastructure by implementing auto-discovery for test datasets and adding support for local, git-ignored datasets. The changes enable developers to test with confidential or large conversation data without risking accidental commits, while simplifying the process of adding new test datasets—just drop them in real_data/.local/ and they're automatically discovered.
Key changes:
- Auto-discovery mechanism that scans directories matching the pattern `<report_id>-<name>/` in `real_data/` and `real_data/.local/`
- New `--include-local` pytest flag to opt in to testing with local datasets
- Refactored `datasets.py` with a `DatasetInfo` dataclass and discovery functions replacing hardcoded configuration
- Updated download script to default to the `.local/` directory, with a `--commit` flag for public datasets
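As an illustration of the discovery idea, here is a minimal sketch; the directory regex, the `DatasetInfo` fields, and the `discover_datasets` signature are assumptions for readability, not the actual implementation in `datasets.py`:

```python
import re
from dataclasses import dataclass
from pathlib import Path

# Assumed directory-name pattern: "<report_id>-<name>".
DIR_PATTERN = re.compile(r"^(?P<report_id>[^-]+)-(?P<name>.+)$")


@dataclass(frozen=True)
class DatasetInfo:
    report_id: str
    name: str
    path: Path
    is_local: bool  # True when the dataset came from real_data/.local/


def discover_datasets(real_data_dir: Path, include_local: bool = False) -> list[DatasetInfo]:
    """Scan real_data/ (and optionally real_data/.local/) for <report_id>-<name>/ dirs."""
    roots = [(real_data_dir, False)]
    if include_local:
        roots.append((real_data_dir / ".local", True))
    found: list[DatasetInfo] = []
    for root, is_local in roots:
        if not root.is_dir():
            continue
        for child in sorted(root.iterdir()):
            match = DIR_PATTERN.match(child.name)
            if child.is_dir() and match:
                found.append(DatasetInfo(match["report_id"], match["name"], child, is_local))
    return found
```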
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| delphi/polismath/regression/datasets.py | Core auto-discovery implementation with DatasetInfo dataclass and directory scanning logic |
| delphi/tests/conftest.py | pytest hooks for --include-local flag, dynamic test parametrization, and dataset summary reporting |
| delphi/tests/test_regression.py | Removed hardcoded parametrization in favor of dynamic discovery via conftest.py |
| delphi/tests/test_datasets.py | Unit tests for directory pattern matching, file checking, and dataset discovery logic |
| delphi/tests/download_real_data.py | New positional arguments (report_id, dataset_name) with --commit flag to control download location |
| delphi/tests/README.md | Updated documentation explaining auto-discovery, local datasets, and new download patterns |
| delphi/pyproject.toml | Added local_dataset marker for pytest |
| delphi/.gitignore | Added real_data/.local/ to git ignore list |
| delphi/polismath/regression/__init__.py | Updated exports to include new discovery functions and DatasetInfo class |
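As a rough illustration of how the conftest.py hooks in the table can fit together, here is a hypothetical sketch (names such as `discover_datasets` and `REAL_DATA_DIR` are assumptions, not the PR's exact code):

```python
# Hypothetical conftest.py sketch: wires the --include-local flag into dynamic
# parametrization and tags local datasets with the local_dataset marker
# registered in pyproject.toml.
from pathlib import Path

import pytest

# Assumed import; the discovery function name and signature are guesses.
from polismath.regression.datasets import discover_datasets

REAL_DATA_DIR = Path(__file__).parent / "real_data"  # assumed location


def pytest_addoption(parser):
    parser.addoption(
        "--include-local",
        action="store_true",
        default=False,
        help="Also run regression tests against datasets under real_data/.local/",
    )


def pytest_generate_tests(metafunc):
    if "dataset" not in metafunc.fixturenames:
        return
    include_local = metafunc.config.getoption("--include-local")
    datasets = discover_datasets(REAL_DATA_DIR, include_local=include_local)
    params = [
        pytest.param(d, id=d.name,
                     marks=[pytest.mark.local_dataset] if d.is_local else [])
        for d in datasets
    ]
    metafunc.parametrize("dataset", params)
```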
All Copilot review comments have been addressed:
In commit 30edbff:
In commit 80115f9:
ca08d46 to 80115f9
Note: rebased all commits to sign them -- hence the commits appear after the review. Same content, just added signatures.
7ceb99c to 075797b
Pull request overview
Copilot reviewed 10 out of 10 changed files in this pull request and generated no new comments.
Comments suppressed due to low confidence (9)
delphi/scripts/regression_download.py:363
- The example references `download_real_data.py` but the script file is actually named `regression_download.py`. Update to use the correct filename: `python scripts/regression_download.py`.
delphi/scripts/regression_download.py:17
- The usage examples reference `download_real_data.py` but the script file is actually named `regression_download.py`. Update the references to use the correct filename to avoid confusion.
delphi/scripts/regression_download.py:31
- The examples reference `download_real_data.py` but the script file is actually named `regression_download.py`. Update to use the correct filename: `python scripts/regression_download.py`.
delphi/scripts/regression_download.py:355
- The example references `download_real_data.py` but the script file is actually named `regression_download.py`. Update to use the correct filename: `python scripts/regression_download.py`.
delphi/scripts/regression_download.py:359
- The example references `download_real_data.py` but the script file is actually named `regression_download.py`. Update to use the correct filename: `python scripts/regression_download.py`.
delphi/scripts/regression_download.py:367
- The example references `download_real_data.py` but the script file is actually named `regression_download.py`. Update to use the correct filename: `python scripts/regression_download.py`.
delphi/scripts/regression_download.py:20
- The usage example references `download_real_data.py` but the script file is actually named `regression_download.py`. Update to use the correct filename.
delphi/scripts/regression_download.py:26
- The usage example references `download_real_data.py` but the script file is actually named `regression_download.py`. Update to use the correct filename.
delphi/scripts/regression_download.py:351
- The example references `download_real_data.py` but the script file is actually named `regression_download.py`. Update to use the correct filename: `python scripts/regression_download.py`.
- _compute_vote_stats: Replace per-row/per-column loops with numpy vectorized operations using boolean masks and axis-based sums. This eliminates O(rows + cols) Python loops.
- bench_update_votes.py: Make standalone by accepting CSV path directly instead of depending on datasets package. Add TODO for using datasets package once PR compdemocracy#2312 is merged.
Combined with pivot_table optimization, achieves ~10x speedup on bg2050 dataset (1M votes): 18-30s -> 2.5s (~400k votes/sec).
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Ready for human review and merge :-)
Suggested Updates here: jucor#3
* Optimize update_votes with vectorized pivot_table (5x speedup)
Replace the row-by-row for-loop in update_votes with a vectorized pivot_table approach. This dramatically speeds up vote loading for large datasets.
Performance on bg2050 dataset (1M+ votes, 7.8k participants, 7.7k comments):
- Before: 18.5s average, 56k votes/sec
- After: 3.5s average, 295k votes/sec
- Speedup: 5.3x overall, 16x for the batch update step
The optimization:
1. Use pivot_table to reshape long-form votes to wide-form matrix
2. Use DataFrame.where() to merge with existing matrix
3. Use float32 for intermediate matrix to halve memory usage
Also adds a benchmark script at polismath/benchmarks/bench_update_votes.py for measuring update_votes performance.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* Vectorize _compute_vote_stats and make benchmark standalone
- _compute_vote_stats: Replace per-row/per-column loops with numpy vectorized operations using boolean masks and axis-based sums. This eliminates O(rows + cols) Python loops.
- bench_update_votes.py: Make standalone by accepting CSV path directly instead of depending on datasets package. Add TODO for using datasets package once PR #2312 is merged.
Combined with pivot_table optimization, achieves ~10x speedup on bg2050 dataset (1M votes): 18-30s -> 2.5s (~400k votes/sec).
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* Fix: Remove misleading float32 conversion in update_votes
Addresses GitHub Copilot review comments on PR #2313:
- Removed float32 conversion that only provided temporary memory savings
- The comment was misleading as savings didn't persist after .where()
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* Fix: Use vectorized pandas operations in benchmark loader
Replace iterrows() with rename() + to_dict('records') for efficiency, as suggested by GitHub Copilot review.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* Add timing logging for PCA and repness
* Add benchmark script for repness
* Add profiling to benchmark for repness
* Vectorize vote count: 2x speedup on large convos
* Extract common setup code
* Rename vote_matrix to vote_matrix_df for clarity
* Keep NaNs instead of None: 2x more speedup
* Refactor conv_repness() to use long-format DataFrame
Convert wide-format vote matrix to long-format using melt() and use vectorized pandas groupby operations instead of nested loops.
Key changes:
- Add compute_group_comment_stats_df() for vectorized (group, comment) stats
- Add prop_test_vectorized() and two_prop_test_vectorized() for batch z-tests
- Add select_rep_comments_df() and select_consensus_comments_df() for DataFrame-native selection, converting to dicts only at the end
- Compute "other" stats as total - group instead of recalculating
- Use MultiIndex.from_product() to ensure all (group, comment) combinations
Test changes:
- Add test_old_format_repness.py to preserve backwards compatibility tests
- Add TestVectorizedFunctions class with 8 tests for new DataFrame interface
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* Shorten imports as per GH Copilot Review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update docstring as per GH Copilot Review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Remove unused import as per GH Copilot Review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Move profiler to within profiling function as per GH Copilot review
* Remove unused import as per GH Copilot review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Profile new functions
---------
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
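For readers skimming the commit list above, here is a rough, self-contained sketch of the two techniques it describes (pivot_table reshaping and boolean-mask vote stats); the column names and the vote encoding are assumptions, not the project's actual schema:

```python
import numpy as np
import pandas as pd


def votes_long_to_matrix(votes: pd.DataFrame) -> pd.DataFrame:
    """Reshape long-form votes into a wide participant x comment matrix."""
    return votes.pivot_table(
        index="voter-id", columns="comment-id", values="vote", aggfunc="last"
    )


def compute_vote_stats(matrix: pd.DataFrame) -> dict:
    """Per-row/per-column counts via boolean masks and axis sums, no Python loops."""
    values = matrix.to_numpy(dtype=float)
    seen = ~np.isnan(values)
    agrees = seen & (values == 1)  # assumes agree votes are encoded as 1
    return {
        "votes_per_participant": seen.sum(axis=1),
        "votes_per_comment": seen.sum(axis=0),
        "agrees_per_comment": agrees.sum(axis=0),
    }
```

Merging the pivoted frame into an existing matrix can then be a single aligned DataFrame.where() call, as the first commit above notes, rather than a per-row loop.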
* add db scaling, install datadog (#2147)
* add db scaling, install datadog
* add to example env
* dd instrumentation
* Update deploy-prod.yml
* fix dd
* more dd config (#2150)
* stop dd agent
* more dd config
* more dd instrumentation
* dd config add network
* add log tags
* delphi dd config
* dd add report RUM
* try new rum strategy
* fix obj prop name
* fix err superadmin
* another superadmin fix
* make collective statements scroll more good (#2163)
* Te adjust collective stmt prmpt (#2167)
* expand on object properties for LLM
* prompt hardening
* fix inversion (#2169)
* enable full math tracing (#2171)
* Better API server logging for Datadog (#2173)
* Implement Datadog logging middleware and enhance error handling
- Added `middleware_http_json_logger` for structured logging in production, replacing the default morgan logger.
- Updated `app.ts` to conditionally use the new logger based on the environment.
- Enhanced `globalErrorHandler` to log errors in a Datadog-friendly format, including HTTP method, URL, and error details.
- Introduced `ddEnv` configuration in `config.ts` for environment-specific logging.
- Updated logger configuration to support both development and production formats.
* small cleanup
* devMode convenience var
* Updates the topic agenda component to use conversation_id prop directly instead of accessing it through the conversation object. Fixes bug where conversation_id is not included in the POST request. (#2174)
* Update and fix e2e tests (#2176)
* minor update; lint
* include AUTH_DOMAIN and AUTH_CLIENT_SECRET in env examples; rename AUTH0 vars to generic
* replace console with logger
* formatting
* include ADMIN_UIDS in more configurations
* safely parse ADMIN_UIDS json
* generic OIDC language
* repair report-authentication test
* init dynamodb tables in test env
* env vars to determine DD usage in client-report
* restore deleted tests
* allow moderator or seed comment auto approval
* add rebuild-server to makefile
* improved comment tests
* auto-approve seed and moderator comments
* remove unused jigsaw key
* upgrade cypress and faker; fix xid test
* fix int test
* improve oidc test reliability
* fix client-report tests
---------
Co-authored-by: tevko <tim@devzero.io>
* pass include moderation arg (#2178)
* fix dynamo hardcode
* More Test fixes and small improvements (#2181)
* improve participant insertion vs race conditions; minor tweaks to logging and next comment selection
* improve e2e OIDC checks for stability in CI test suite
* Tree Invite Updates and Fixes (#2182)
* integration tests for treevite
* Invite improvements and Fixes;
Invite CSV Download
---------
Co-authored-by: Tim <timevko@gmail.com>
* remove hardcoded region values (#2184)
* Make psql shell (#1627)
* Add psql-shell task to makefile
* `make psql-shell` now uses env values, and quits if POSTGRES_DOCKER is not `true`
* ensure compose-file args for `make psql-shell`
---------
Co-authored-by: Bennie Rosas <bennie.rosas@blvd.co>
Co-authored-by: Bennie Rosas <ben@aliencyb.org>
* Improved Topic Naming (#2185)
* Use pseudo-random comment selection for topic naming;
Improved ollama topic naming prompt.
* remove prompt_prefix; formatting.
* TOPIC_NAMING document
* Te euro cdk prep (#2187)
* add euro deploy scripts and update aws action
* disable temp nginx
* multi stage building for action
* actions fix
* fix script typo
* add appspec-euro
* appspec fix
* fix typo
* another typo fix
* final path correction
* another typo update
* stop nginx so docker can take over
* update static assets deploy
* remove environment
* automated db backups (#2199)
* automated db backups
* fix handler call
* add lambda layer
* update lambda layers for pg_dump in lambda capabilities
* typo fix
* delete and rotate
* add region arg to job poller setup
* no dd trace in euro
* fix hardcoded region defaults
* fix dynamo table create conflict
* viz logic fix
* fix another default region err
* Client Admin : Responsive Design and other Improvements (#2202)
* client-admin minor pkg updates
* normalize component names; remove dead code
* remove d3-scale
* email is not an ADMIN UID
* client-admin don't run simple analytics in dev
* clean up dead reducers
* auth helpers and unified user state
* upgrade legacy components
* eslint cleanup
* ZidMetadataProvider
* Pro gating for Topic Mod
* handle conversation permission at the top level; bug fixes
* rename zid_metadata to conversation_data
* rename some more components and tests
* theme ui recommendations doc
* repair client-admin tests
* Add lots of test coverage
* fix delphi check
* better responsive and mobile design
* Update fixed widths for responsive
* VictoryTheme more responsive
* Consolidate topic-moderation styles
* enhance theme with mobile-first tokens
* update and normalize color palette
* Improve TopicMod style, but hide it for now;
Show "alpha" url when treevite is enabled
* docker-compose test fix
* git file renames
* rename tos -> TOS
* test mock fix
* Add some clarity to authUser vs contextUser
* Improved ReportsList with expandable list of URLs
* minor pkg updates
* fix tests
* improve race-condition protection in comment creation
* ReportsList: Remove Comment Report
* client-admin test reliability
* Bump torch from 2.3.1 to 2.8.0 in /delphi (#2142)
Bumps [torch](https://github.com/pytorch/pytorch) from 2.3.1 to 2.8.0.
- [Release notes](https://github.com/pytorch/pytorch/releases)
- [Changelog](https://github.com/pytorch/pytorch/blob/main/RELEASE.md)
- [Commits](https://github.com/pytorch/pytorch/compare/v2.3.1...v2.8.0)
---
updated-dependencies:
- dependency-name: torch
dependency-version: 2.8.0
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Tim <timevko@gmail.com>
* Bump axios from 1.10.0 to 1.12.2 in /server (#2200)
Bumps [axios](https://github.com/axios/axios) from 1.10.0 to 1.12.2.
- [Release notes](https://github.com/axios/axios/releases)
- [Changelog](https://github.com/axios/axios/blob/v1.x/CHANGELOG.md)
- [Commits](https://github.com/axios/axios/compare/v1.10.0...v1.12.2)
---
updated-dependencies:
- dependency-name: axios
dependency-version: 1.12.2
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Tim <timevko@gmail.com>
* Bump nodemailer from 6.10.1 to 7.0.7 in /server (#2209)
Bumps [nodemailer](https://github.com/nodemailer/nodemailer) from 6.10.1 to 7.0.7.
- [Release notes](https://github.com/nodemailer/nodemailer/releases)
- [Changelog](https://github.com/nodemailer/nodemailer/blob/master/CHANGELOG.md)
- [Commits](https://github.com/nodemailer/nodemailer/compare/v6.10.1...v7.0.7)
---
updated-dependencies:
- dependency-name: nodemailer
dependency-version: 7.0.7
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Bump vite from 6.3.5 to 6.3.6 in /client-participation-alpha (#2161)
Bumps [vite](https://github.com/vitejs/vite/tree/HEAD/packages/vite) from 6.3.5 to 6.3.6.
- [Release notes](https://github.com/vitejs/vite/releases)
- [Changelog](https://github.com/vitejs/vite/blob/v6.3.6/packages/vite/CHANGELOG.md)
- [Commits](https://github.com/vitejs/vite/commits/v6.3.6/packages/vite)
---
updated-dependencies:
- dependency-name: vite
dependency-version: 6.3.6
dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Tim <timevko@gmail.com>
* Fix dev environment startup (#2211)
* Revert "Bump torch from 2.3.1 to 2.8.0 in /delphi (#2142)"
This reverts commit a7a060b8b63372141a6b092848d4823b8d8b9c0d.
* Move clojure math env to dev instead of prod
Set `MATH_ENV=dev` instead of `=prod` in `example.env`. This avoids
an infinite reboot loop of the clojure worker due to failing to load Datadog
profiler -- which is skipped in development environment.
* Start notes to get running
@ballPointPenguin has asked me to let him know whether
`make start` works as intended. Documenting here the steps needed
to make it work :)
* Describe fix for login problem
* Fix login failure due to missing hostname in certificate
* Remove explanatory notes to make a clean commit
As discussed with @ballPointPenguin
---------
Co-authored-by: Julien Cornebise <julien@cornebise.com>
* count default votes for bulk-upload seeds (#2213)
* support markdown in cpa (#2218)
* simplify email service (#2210)
* simplify email service
* begin testing, remove maildev, add ses-local
* docker fix
* swap mail docker container
* use env var
* fix typo
* update logs, add export
* debug update
* add exports
* succinct var passing
* more config fixes
* add from email
* fix test helper
* more email helper updates
* partial revert
* debug logging
* obj prop change
* store refactoring
* debug cleanup
* add back jsdoc
* clarify test environment (#2215)
* prodclone dev workflow; db update (#2216)
* helpful db scripts in ./bin
* make start-prodclone workflow
* avoid running datadog in math for local/dev
* Remove narrative report from menu (#2217)
* remove narrative report link
* improve test reliability
* donate message (#2223)
* bang head against wall
* change verbiage
* finally get backbone right
* include importance data in comments and votes data exports (#2224)
* include importance data in comments and votes data exports
* fix importance export tests
* temporary disable topical comment routing for perf (#2222)
* consolidate comment cluster query logic; optional cache (#2229)
* consolidate comment cluster query logic; optional cache
* re-enable topical comment routing
* hotfix
* pin docker compose version
* better compose pinning
* Te delphi ux (#2230)
* begin in progress job ux
* remove nested ternarys
* cleanup
* remove unused
* BUGFIX: actual comment_ids must be used (#2233)
* BUGFIX: actual comment_ids must be used
* use distance to centroid for representative topic comments
* Te delphi ux (#2235)
* begin in progress job ux
* remove nested ternarys
* cleanup
* remove unused
* pass var correctly
* move after_install block (#2237)
* better filter pattern (#2239)
* fix query (#2241)
* Te delphi ux 5 (#2243)
* add debug logging
* fix math bug
* enable pagination (#2245)
* remove log
* Delphi package and env management (#2228)
* made Makefile faster and compatible with os x (#2232)
* update to patched version (#2249)
* update to patched version
* make generate-requirements
---------
Co-authored-by: Bennie Rosas <ben@aliencyb.org>
* Te delphi ux logs (#2247)
* remove form
* promote delphi and show users how to generate reports
* better messaging
* fix test
* change link
* add donate link
* participant-importance report (#2248)
* participant-importance report
* test fixes
* Update client-admin/src/util/auth.js
Co-authored-by: Tim <timevko@gmail.com>
---------
Co-authored-by: Tim <timevko@gmail.com>
* better messaging during batch report phase (#2252)
* fix reset_conversation bug (#2254)
* change message success text (#2256)
* use modal for delphi run confirmation (#2258)
* use modal for delphi run confirmation
* css
* formatting
* add embedded donate page and change links (#2264)
* Visualise participation (#2262)
Co-authored-by: Julien Cornebise <julien@cornebise.com>
* Pagination for comments in Admin Moderation view (#2263)
* enable pagination for get-comments
* client-admin moderation pagination
* server api comment pagination
* comment pagination tests
* Parameterize Delphi path (#2266)
Before this PR, the Delphi python codebase had hardcoded paths to `/app/` that
made it difficult to run in different environments or directory structures,
especially for local development and algorithmic/data analysis.
This PR introduces the optional environment variable DELPHI_APP_PATH,
which, if specified, overrides `/app`.
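A sketch of the kind of lookup this implies (the actual code path in the repository may differ):

```python
import os
from pathlib import Path

# Fall back to the historical hardcoded location when DELPHI_APP_PATH is unset.
APP_PATH = Path(os.environ.get("DELPHI_APP_PATH", "/app"))
```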
* Update README docs with cert and key generation steps (#2271)
* pin node version to 24 (LTS) (#2270)
* Add instance type `dev` to process all sizes (#2267)
* Add instance type dev to process all sizes
Especially useful for local dev instances where we don't want to limit resources.
* Set INSTANCE_SIZE to 'dev' for local setup
Update INSTANCE_SIZE for local development.
* Change instance type check from 'omnipotent' to 'dev'
* Refine comments on Delphi instance size configuration
Updated comments for clarity regarding Delphi instance size.
* Speed up NamedMatrix updates between 40x and 200x (#2268)
* Factorize named matrix vote normalization option
The tests are also fixed, while keeping the same behavior as before.
Weirdly, update() does not normalize the values being set, whereas batch_update() does.
And _convert_to_numeric() keeps NaN values as NaN, whereas batch_update() converts them to 0.0 by default.
This is not very consistent, but I have kept the same behavior for backward compatibility.
Since not all Delphi tests are passing, I could not verify whether other parts of the pipeline depend on this behaviour.
* Speed up named matrix computation
Keep both behaviours in this commit, for comparison and to log a speed report.
Will remove it before pull-request.
* Add deep test and remove speed up comparison
This concludes the refactoring.
* Apply copilot spelling corrections
* make commands: refresh-db, refresh-devdb, refresh-prodclone (#2272)
* make commands: refresh-db, refresh-devdb, refresh-prodclone
* Ensure make refresh-* db works as intended
* Use Python 3.12 to regenerate requirements.lock; minor updates (#2278)
* Use Python 3.12 to regenerate requirements.lock; minor updates
* configure python version 3.12.x and pip version < 25.3
* add ref to github issue
* Factorize Dynamodb deletions for readability (and log their timing) (#2275)
* Refactor dynamodb deletions
In the first step of the pipeline, where we delete any previous data, we had a *lot* of duplicated code. Factored all the common bits to make it simpler to understand.
* Add timing info to dynamoDB writes
* Fix off-by-one page count on logging
* Move import to top of file
* Minor defensive fixes
* Robustify data diagnostics (#2277)
* Add test for multiple updates to same cell in one batch
This will be handy when I change how we do the updates to the matrix.
* Log when no new votes are here
Useful to debug.
* Speed up and display memory usage
* Display duplicate statistics and make graph optional
* Replace list by generator in sum (copilot)
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Fix typo (copilot)
* Test for NaN/NA behaviour in update and batch_update
* Set up tests that match legacy behaviour
Note: they are failing right now. I will next implement that legacy behaviour.
* Implement legacy behaviour
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* remove scans (#2280)
* add interstitial and banner (#2269)
* add interstitial and banner
* test fixes
* another test fix
* comment out banner, add images
* Te delphi py tests (#2285)
* Add type hint in some poller functions
* Extract common function to utils file
That function was defined 3 times in 3 different files.
* Create script to download real data for tests
This is useful if no folder `real data` was provided. I suspect these tests were
written with a `real data` folder already in place. I do not have it, therefore
we need to download it. See the `README` file that has been updated.
* Refactor real_data loading
Remove duplication, allow for automatic finding of the files within a location,
allow for generalisation to other conversations than the two used so far.
* Print whether comment priorities are missing from test data
* Fix path...
* Clarify terms in messages and comments
* Fix buggy test that blocked pytest collection
The `test_batch_id.py` was running code at load time, and that code had an error,
thus crashed during pytest collection, preventing all tests from running.
By refactoring into a proper test function, pytest can now collect all tests and run them.
We also fix the error itself, which was a missing escape of the "scan" reserved word in DynamoDB.
* Fix direct conversation test
- Convert to proper pytest format, not standalone script
- Use fixtures for setup/teardown
- Warn it is a test to check Conversation class instantiation and method calls
- Replace prints by logging
- Parametrize the test to run over all available real_data
- Add some dimension and attributes assertions
- Rename to test_conversation_smoke.py
* Refactor direct_pca_test.py to test_pca_smoke.py with pytest structure
Converted legacy procedural test script to proper pytest:
- Class-based structure with TestPCAImplementation
- Parametrized tests for all datasets
- Fixtures for vote matrix loading
- Proper logging instead of prints
- Smoke test warning (no correctness validation)
- Tests: runs without error, projection statistics, clustering
Tests PCA functions directly (not through Conversation class).
* Clarify the naming of PCA test files and remove redundant tests
* Ignore warning from library ddtrace in pytest
* Refactor repness smoke test
Similar to how we refactored the "direct PCA" tests
* Rename test_repness.py to test_repness_unit.py for clarity
Rename to clarify that these are unit tests with synthetic data,
following the same naming convention established for PCA tests:
- test_repness.py → test_repness_unit.py (unit tests, synthetic data)
- test_repness_smoke.py (real data, smoke tests - already renamed)
- test_repness_comparison.py (Python vs Clojure - already clear)
This mirrors the PCA test structure:
- test_pca_unit.py (unit tests)
- test_pca_edge_cases.py (edge cases)
- test_pca_smoke.py (smoke tests)
All 14 tests pass:
- Statistical utility functions (z-scores, proportion tests)
- Comment statistics calculation
- Representative comment selection
- Consensus selection
- Integration tests (conv_repness, participant_stats)
* Refactor test_repness_comparison.py to proper pytest structure
Similar to pca tests, refactor test_repness_comparison.py
- Converts test_comparison() function to TestRepnessComparison class
- Uses @pytest.mark.parametrize for multiple datasets
- Proper fixtures for clojure_results, conversation, python_results
- Two test methods: test_structural_compatibility and test_comparison_visibility
- Replaces print() with logging.info/debug
- Adds warning that results are known to be very different
- Reports comparison results for visibility without asserting on match rates
- Maintains comparison functionality for manual inspection
Test results: 4 tests passed (2 datasets × 2 test methods)
* Add assert failure messages
* Exclude Conversation serialization tests
Until https://github.com/compdemocracy/polis/issues/2284 is resolved
* add action
* update action
* update action 2
* update action 3
* use env for data script
* fix all tests
* fix action
* actions update 2
* add delphi service to test
* update action again
* another actions fix
* action fix again
* another actions fix again again again
* actions - mount volume tests
* actions - change baseUrl
* remove pg check
* change healthcheck - actions
* remove pg check again - actions
* try more robust action -- actions
* use pol.is baseurl -- actions
* add real data, update action
* remove duplicate data
* ensure dynamo tables created
* update region
* add access keys
* shared, test db
* free up space
* update other action
* build dependency
* add back in removed test, commented out
* comment stuff out
---------
Co-authored-by: Julien Cornebise <julien@cornebise.com>
* add more tests
* change import paths
* revert bad path changes
* add coverage report
* fix indentation
* fix action file
* add coverage
* fix action
* fix cov location
* update sourceDir
* remove coverage
* update action
* update action correctly
* fix actions syntax
* fix actions syntax
* fix actions syntax
* add coveragerc
* add back export path
* update action, pass polismath explicitly
* change coverage detection strategy
* more config
* try removing coverage path
* fix coveragerc
* fix coveragerc
* add to pyproject
* add to pyproject
* remove mention of .coveragerc
* remove reference
* attempt path mapping
* slight config change
* copy config during build
* remove tool section
* create .coveragerc inside action
* last try
* ok one more try
* last try for real
* one more try final v2
* more config adjustment
* one more config update
* try all in one container
* fix package name
* almost there
* try better formatting
* separate script
* add another test
* import sys
* move script into container
* clarify path
* fix db connection
* clarify env
* fix path
* update env
* fix test
* fix test again
* stub data
* still fixing test
* remove nonexistent key
* schema fix
* try db commit
* pakistan approach
* another try
* use mock data instead
* path fix
* fix id
* fix field names
* fix dynamo calls in test
* switch to scan
* relax test assertions
* more relaxed tests
* Admin - Participant Management (#2279)
* remove deprecated conversation fields
* add GET all_conversations route
* superadmin all-conversations view
* Participant Management WIP
* refactor xid logic; show xid list with pids in client-admin
* new xid tests
* Enable XID Upload
* show xid vote_count
* block non-xid participants when xid is required
* update some internal naming from "whitelist" to "allow list"
* xid arg not needed in votesPost
* fix test
* participation-management e2e
* upgrade cypress
* fix e2e test
* update alpha client with xid concerns
* normalize message; fix test
* rebuild astro
* relax tests further
* ignore pakistan test
* Update pip-tools and Delphi build (#2299)
* update pip-tools; remove pip version restriction; update requirements.lock
* simplify Dockerfile; remove unused `IS_GITHUB_ACTION` conditional
* update cypress config to not use `IS_GITHUB_ACTION`
* conditionally use cpu-only torch libs in test builds
* Fix run_math_pipeline test import to use proper package path (#2308)
* Fix run_math_pipeline test import to use proper package path
The test file was importing `from run_math_pipeline import main` which
failed locally because `run_math_pipeline.py` lives inside the `polismath`
package at `delphi/polismath/run_math_pipeline.py`.
CI was working around this by copying the file to a flat location:
docker cp delphi/polismath/run_math_pipeline.py delphi:/app/run_math_pipeline.py
This created a discrepancy between local and CI environments.
The fix:
1. Update test imports to use the correct package path:
`from polismath.run_math_pipeline import main`
2. Update mock.patch paths to match:
`mock.patch('polismath.run_math_pipeline.fetch_comments', ...)`
3. Remove the CI workaround that copied the file to /app flat
4. Simplify coverage to `--cov=polismath` (run_math_pipeline is inside it)
The Docker image already has `polismath/` at `/app/polismath/` and the
package is installed via `pip install --no-deps .`, so the proper import
path works in both local and CI environments.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* Improve CI coverage reporting reliability
Changes to the CI workflow:
1. Print coverage report to workflow logs (always visible)
2. Upload coverage report as downloadable artifact
3. Make PR comment step non-fatal with continue-on-error: true
(fork PRs cannot post comments due to GitHub token restrictions)
Coverage is now accessible three ways:
- In the workflow logs (step 7)
- As a downloadable artifact (step 8)
- As a PR comment when permissions allow (step 9)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* Add graceful error handling for coverage comment on fork PRs
Instead of showing an unhandled error when posting coverage comments
fails on fork PRs, the script now catches the 403 error and displays
a helpful message explaining:
- Why the comment could not be posted (GitHub token permissions)
- Where to find the coverage report (logs and artifact)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
---------
Co-authored-by: Claude <noreply@anthropic.com>
* Replace NamedMatrix by DataFrame and add regression tests (#2282)
* Merge Squashed onto `edge`:
commit 7f14aedafed4fea97c993d7996853407cba7f7dd
Merge: 93a2d313 780f1298
Author: Julien Cornebise <julien@cornebise.com>
Date: Thu Nov 20 15:50:04 2025 +0000
Merge commit '780f1298ca7d72b9717f6aa38526301305e520e8' into replace_named_matrix
This will allow CI to run correctly.
commit 93a2d313e5cc25a4be336b1f4de33aa5d331a579
Author: Julien Cornebise <julien@cornebise.com>
Date: Tue Nov 18 21:10:15 2025 +0000
Recompile requirements.lock to include natsort
commit 0fd37344ca160c0a296e9af4aaec0d889516191f
Author: Julien Cornebise <julien@cornebise.com>
Date: Tue Nov 18 15:01:16 2025 +0000
Update golden records
Now that we have changed behaviours of matrix in terms of ordering and of types,
we need to update the golden records to reflect these changes.
commit 08d2383841687d6345d1a620646eccfd24c4c75c
Author: Julien Cornebise <julien@cornebise.com>
Date: Tue Nov 18 15:01:04 2025 +0000
Fix regression bugs from package reorganization due to hallucinations
During refactoring to polismath.regression package, introduced bugs by
hallucinating non-existent methods and changing behavior without checking
the original code (commit afb8525a).
Fixed:
- prepare_votes_data(): Restored CSV columns ('voter-id', 'comment-id')
and vote dict keys ('pid', 'tid') instead of hallucinated alternatives
- compute_all_stages(): Restored actual methods (update_votes(),
_compute_pca(), _compute_clusters()) instead of hallucinated ones
(process_votes(), compute_pca(), compute_clustering())
- compute_all_stages_with_benchmark(): Restored original implementation
- get_dataset_files(): Restored original dict keys ('votes', 'comments')
instead of changed keys ('votes_csv', 'comments_csv')
- load_golden_snapshot(): Restored golden_path computation logic
- Numpy type handling: Added custom JSON encoder to preserve numeric types
and extended comparer to treat Python/numpy numeric types as compatible
commit 334c01b2f09ab321d558d10995b3144c18ec5d8d
Author: Julien Cornebise <julien@cornebise.com>
Date: Tue Nov 18 14:11:31 2025 +0000
Reorganize regression testing into dedicated polismath.regression package
- Split monolithic regression.py (1167 lines) into focused modules:
- recorder.py: ConversationRecorder class
- comparer.py: ConversationComparer class
- datasets.py: Dataset configuration (moved from tests/)
- utils.py: Shared utility functions
- Clean architecture: No backwards dependencies from production to tests
- Updated all imports in CLI scripts and test files
- Regression testing now treated as first-class production feature
This improves code organization, maintainability, and makes the regression
tools suitable for use in production environments (monitoring, validation).
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
commit afb8525a5ce0e7ace2a7feeb0aae935d78f2333a
Author: Julien Cornebise <julien@cornebise.com>
Date: Tue Nov 18 13:47:11 2025 +0000
Improve logging throughout regression testing system
- Replace all print statements with proper logging calls in polismath/regression.py
- Use logger.info() for progress updates and results
- Use logger.warning() for comparison mismatches
- Use logger.debug() for detailed diagnostic information
- Make PCA debug output conditional on DEBUG log level
- Only save debug JSON files when logger.isEnabledFor(logging.DEBUG)
- Move debug outputs from current directory to .test_outputs/debug/
- Add --log-level CLI argument to regression scripts
- Support DEBUG, INFO, WARNING, ERROR, CRITICAL levels
- Default to INFO level
- DEBUG level enables PCA debug file generation
- Fix conversation module's logging initialization
- Check logging.root.handlers instead of logger.handlers
- Prevents duplicate handlers when logging is externally configured
- Simplifies logging setup in CLI scripts
The regression tools now provide full control over logging verbosity,
making it easier to debug issues (with DEBUG) or run quietly (with WARNING/ERROR).
🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
commit 87f8cb24803cb5a14efa3389673a24a5708fa054
Author: Julien Cornebise <julien@cornebise.com>
Date: Tue Nov 18 13:14:10 2025 +0000
Reorganize regression tests and consolidate test outputs
- Move golden snapshots to dataset folders (real_data/{dataset}/golden_snapshot.json)
- Relocate regression library from regression_tests/ to polismath/regression.py
- Move CLI tools to scripts/ with clearer names (regression_recorder.py, regression_comparer.py)
- Mark Clojure comparison tests as legacy with 'legacy_' prefix
- Consolidate ALL test outputs in hidden .test_outputs/ directory:
- Regression outputs → .test_outputs/regression/
- Python implementation outputs → .test_outputs/python_output/{dataset}/
- Keep real_data/ clean with only source data and golden snapshots
- Fix path resolution bugs and unknown dataset handling in regression system
- Update documentation and simplify .gitignore
This reorganization clearly separates:
- Source data and golden snapshots (real_data/) from temporary outputs (.test_outputs/)
- Standard Python regression tests from legacy Clojure comparisons
- Core libraries (polismath/) from CLI tools (scripts/)
commit a947c5a8ee19ce91c6b2bb55a398e334a7b5b3ec
Author: Julien Cornebise <julien@cornebise.com>
Date: Tue Nov 18 12:08:03 2025 +0000
Process appropriate RunTimeWarning in correlation tests
The fourth row of the test matrix is intentationally constant, which
causes a RuntimeWarning when computing correlations. This commit updates
the test to properly handle this warning using the warnings module, ensuring
that the test suite runs cleanly without unhandled warnings.
commit b6fbc09c7e412503272a3c3e85a49185e93e70b6
Author: Julien Cornebise <julien@cornebise.com>
Date: Tue Nov 18 11:56:16 2025 +0000
Skip failing Clojure regression tests
It's OK for now, as we want Delphi to stand on its own.
commit d8cb94262c47eddc3a05debb1a3991d85d9124df
Author: Julien Cornebise <julien@cornebise.com>
Date: Tue Nov 18 11:49:45 2025 +0000
Remove hardcoded paths fed to Claude
commit 8dca87dba5017434482a110f4c3db6fcef4f2742
Author: Julien Cornebise <julien@cornebise.com>
Date: Tue Nov 18 11:48:03 2025 +0000
Factorize the clojure comparison and pipeline tests
A lot of code was redundant and there was little separation
of purpose between the clojure comparison logic and the
pipeline tests. This change factorizes the clojure comparison
logic into its own module and simplifies the pipeline tests.
commit 4622440583b579264464f1becbda5f74cd3f2d62
Author: Julien Cornebise <julien@cornebise.com>
Date: Tue Nov 18 11:07:57 2025 +0000
Fix output of full pipeline test
commit a274b8a4717ac6bfa0e19f4b9f340bbc087ceaa2
Author: Julien Cornebise <julien@cornebise.com>
Date: Tue Nov 18 11:05:11 2025 +0000
Refactor comparison to Clojure results
commit b94a6c135768d712212232618e57aab3a934076b
Author: Julien Cornebise <julien@cornebise.com>
Date: Tue Nov 18 09:47:07 2025 +0000
Preserve original data types and use natural sorting.
Makes for a much clearer output. Will need to update the golden record.
All tests passing.
commit 7c6412b0249e27dede382e377caac7401cd032af
Author: Julien Cornebise <julien@cornebise.com>
Date: Tue Nov 18 09:39:30 2025 +0000
Add test for natural sorting order before implementing
commit e06f0ebad265ac58b5adb5d92400d5679e7c9159
Author: Julien Cornebise <julien@cornebise.com>
Date: Tue Nov 18 09:14:17 2025 +0000
Match old sorting and converting behaviour
commit e5f47cd56278eb82587517e60ae48e7c999b47c1
Author: Julien Cornebise <julien@cornebise.com>
Date: Mon Nov 17 15:31:59 2025 +0000
Comment out BG2018 report for tests
commit cdc238c27e0357cd5c92436e9fcaf8465c901959
Author: Julien Cornebise <julien@cornebise.com>
Date: Mon Nov 17 15:05:00 2025 +0000
Remove every mention of NamedMatrix
commit dacd95a42ce3f0067e92048f25aee37c5b1e6784
Author: Julien Cornebise <julien@cornebise.com>
Date: Mon Nov 17 14:49:18 2025 +0000
Restrict pytest regression test to VW dataset only for speed
commit b657c870245eb758ac5b090fae60b7e4e23e1469
Author: Julien Cornebise <julien@cornebise.com>
Date: Mon Nov 17 14:43:41 2025 +0000
Vectorize matrix clean-up
commit 4bb11b514763a5ec9eb6dca6c85cb299c0d9bf28
Author: Julien Cornebise <julien@cornebise.com>
Date: Mon Nov 17 14:39:00 2025 +0000
Fix bug in PCA that caused different results
Found the bug ! (With Claude Code's help)
The PCA code starts by "cleaning" the matrix with some replacement rules for NaN
and strings. Then it proceeds to compute the PCA on that cleaned up matrix.
Great, I've done the cleaning, and done it in-place for efficiency, since the
matrix is cleaned up first thing in the code and the unclean one therefore not
used. Right ?
...
RIGHT ??
*It turns out*, hidden way below, the projection of the participants on the
low-dimensional space is (intentionally) done *on the non-cleaned matrix*!!
(TODO: I'll have to put my math thinking cap on to understand exactly why it was
coded like that...)
Adding "copy=True" in one built-in invocation solved it.
This version here also restored the loop-in-loop cleanup code. My next commit will clean it up.
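For illustration, a minimal sketch of the in-place versus copy distinction this commit describes, using np.nan_to_num as a stand-in for whichever cleaning call the real code uses:

```python
import numpy as np

raw = np.array([[1.0, np.nan], [np.nan, -1.0]])

# copy=False mutates `raw` in place, so a later step that intentionally reads
# the *unclean* matrix silently gets the cleaned one instead.
np.nan_to_num(raw, copy=False)
assert not np.isnan(raw).any()

# copy=True leaves `raw` untouched -- the behaviour the fix restores.
raw = np.array([[1.0, np.nan], [np.nan, -1.0]])
cleaned = np.nan_to_num(raw, copy=True)
assert np.isnan(raw).any() and not np.isnan(cleaned).any()
```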
commit 8da68f3b429fa9a269fddfdfcfce2677713ebe25
Author: Julien Cornebise <julien@cornebise.com>
Date: Mon Nov 17 13:35:27 2025 +0000
Try but fail to mimic the old handling of strings and NaNs
commit 076535785edf0430a51f693b755f60ace9c12b00
Author: Julien Cornebise <julien@cornebise.com>
Date: Mon Nov 17 13:32:15 2025 +0000
Add a sanity check test for matrix cleaning functions
Compare old and new way of doing things, to spot differences.
commit b5ca83133a2ae7cf15c6d26715386251f7f2432f
Author: Julien Cornebise <julien@cornebise.com>
Date: Mon Nov 17 10:19:51 2025 +0000
Print differences in regular order
Set operations are unordered...
commit 762dcaaa0a945e5fdf84fe12b18bd31d3084c861
Author: Julien Cornebise <julien@cornebise.com>
Date: Mon Nov 17 10:07:27 2025 +0000
Order lexicographically (by str) upon moderation
commit dde8d6fb744b3f9a349e0b016ce7d0c0ab422349
Author: Julien Cornebise <julien@cornebise.com>
Date: Mon Nov 17 10:07:05 2025 +0000
Store actual computation results
commit e9a231af1952b23d3317a4eb10713535d044d9b1
Author: Julien Cornebise <julien@cornebise.com>
Date: Mon Nov 17 10:02:09 2025 +0000
Test ordering to match pre-NamedMatrixectomy ordering
commit fd592f72302a08c32a588190312dae81c691b399
Author: Julien Cornebise <julien@cornebise.com>
Date: Mon Nov 17 09:28:40 2025 +0000
Save computed JSON for outside comparison
Also create a symlink to the latest, for ease of opening without having
to read timestamps.
commit 3392f442dc32feb7e10e4386a58ff40ced6ad38c
Author: Julien Cornebise <julien@cornebise.com>
Date: Fri Nov 14 15:22:06 2025 +0000
Sort comment ids and participants
Sort the comment ids and participant ids using natsort to ensure consistent ordering.
Not sure why things need to be ordered, but it is probably less surprising this way.
As a bonus, our indices can now be any type instead of being force-converted to strings.
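A tiny illustration of that natural ordering (the labels are placeholders, not the project's actual ids):

```python
import pandas as pd
from natsort import natsorted

# Placeholder participant/comment ids.
matrix = pd.DataFrame(0, index=["p10", "p2", "p1"], columns=["c10", "c2", "c1"])

# Natural sort puts p2 before p10; plain lexicographic sorting would not.
matrix = matrix.reindex(index=natsorted(matrix.index),
                        columns=natsorted(matrix.columns))
print(list(matrix.index))  # ['p1', 'p2', 'p10']
```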
commit 27b4ccac63430e88fa78ed6b538489d72dac1f37
Author: Julien Cornebise <julien@cornebise.com>
Date: Fri Nov 14 13:52:21 2025 +0000
Remove python output that is dynamically generated during tests
commit c056ef61b0be291b494beaf7ee0f5901208ee030
Author: Julien Cornebise <julien@cornebise.com>
Date: Fri Nov 14 13:49:09 2025 +0000
Remove duplicate files
commit 81b3d9d6ad26bcf7f8b045169c94a9e735091c84
Author: Julien Cornebise <julien@cornebise.com>
Date: Fri Nov 14 13:46:33 2025 +0000
Rename folder to new name
commit 7c2cc3fda8a68e55b08b6ca2707e10df78ecb510
Author: Julien Cornebise <julien@cornebise.com>
Date: Fri Nov 14 13:42:51 2025 +0000
Fix trailing comma
commit 020ae42fd59febad1722e66fdc48436a27b44c5e
Author: Julien Cornebise <julien@cornebise.com>
Date: Fri Nov 14 13:39:53 2025 +0000
Correct spaces to avoid false positives in git diffs.
commit 3866da55a343f3fe990f5de62362a92ba705a5ec
Merge: 9bbdc49a 2081ed8b
Author: Julien Cornebise <julien@cornebise.com>
Date: Fri Nov 14 12:26:08 2025 +0000
Merge remote-tracking branch 'upstream/edge' into replace_named_matrix
A lot of merge conflicts due to this branch having merged changes earlier
that were merge-squashed into upstream/edge since then.
commit 9bbdc49abbf5de0a9bf665912d13af2b0d747f34
Author: Julien Cornebise <julien@cornebise.com>
Date: Thu Nov 13 20:06:08 2025 +0000
Pass all unit tests without NamedMatrix
commit 3fbeee693e1c73c2e93bae426352d84988ec7edd
Author: Julien Cornebise <julien@cornebise.com>
Date: Thu Nov 13 20:03:45 2025 +0000
Remove python output
This python output is overwritten each time the tests are run, and should not be committed.
commit 1392b6543103d52e67b102fece0958ad242d063e
Author: Julien Cornebise <julien@cornebise.com>
Date: Thu Nov 13 19:46:56 2025 +0000
Pass correlation tests without NamedMatrix
commit 6b47f00d3e9716669ee955d35b4de5237ba0c583
Author: Julien Cornebise <julien@cornebise.com>
Date: Thu Nov 13 19:41:39 2025 +0000
Pass Clustering tests without NamedMatrix :)
commit dbe197b1b520153b2d07f09d7830ef87139edf91
Author: Julien Cornebise <julien@cornebise.com>
Date: Thu Nov 13 19:38:12 2025 +0000
Pass all PCA unit tests
PCA now works without NamedMatrix !
commit 13b8395dbb548f81f1dfa39caf64174310e76762
Author: Julien Cornebise <julien@cornebise.com>
Date: Thu Nov 13 19:18:24 2025 +0000
Replace NamedMatrix by DF in corr. clusters, and repness
This passes test_conversation.py !
commit 2d1a6f7ab41e5b6f3838588075781b7e860c4060
Author: Julien Cornebise <julien@cornebise.com>
Date: Thu Nov 13 19:02:01 2025 +0000
Revert "Skip a warning generated by boto3 about datetime.utcnow being deprecated"
This reverts commit 80be8bc7df085f39f29b67322d975415e21bc62e.
commit 11f18b21066149d039d1ed6e912de81b5d10239c
Author: Julien Cornebise <julien@cornebise.com>
Date: Thu Nov 13 19:01:11 2025 +0000
Replace NamedMatrix by DF in conversation.recompute()
This means also applying to pca and clustering!
commit 7ac7b2508486f82292facffab10b473cf01ee51b
Author: Julien Cornebise <julien@cornebise.com>
Date: Thu Nov 13 15:47:37 2025 +0000
First replacement and first test to pass
Replace NamedMatrix by DataFrame in
- conversation.update_votes()
- conversation._get_clean_matrix()
- conversation._apply_moderation()
and modify test_conversation::test_init
commit 80be8bc7df085f39f29b67322d975415e21bc62e
Author: Julien Cornebise <julien@cornebise.com>
Date: Thu Nov 13 13:01:24 2025 +0000
Skip a warning generated by boto3 about datetime.utcnow being deprecated
commit 767a2d2349a3d8c965b95e3580801509afe01d79
Author: Julien Cornebise <julien@cornebise.com>
Date: Thu Nov 13 12:48:53 2025 +0000
Add BG2018 and rename for clarity + Replace DB connection by URL
commit b80bd078d31a282b07c0705a61dfcea37725fcca
Merge: 58bd636a 52a458ac
Author: Julien Cornebise <julien@cornebise.com>
Date: Thu Nov 13 09:35:31 2025 +0000
Merge remote-tracking branch 'upstream/te-delphi-py-tests' into replace_named_matrix
commit 52a458aca1daf51b3c8c117b9013a2462a4381db
Author: tevko <tim@devzero.io>
Date: Wed Nov 12 21:55:27 2025 -0600
build dependency
commit e26b482e2cc69d5c0df8d5cfdee9dc788be5b428
Author: tevko <tim@devzero.io>
Date: Wed Nov 12 21:41:04 2025 -0600
update other action
commit 58bd636aab3960499153a60d66ddc1ebef0cf6fb
Merge: e00d970a b4275829
Author: Julien Cornebise <julien@cornebise.com>
Date: Wed Nov 12 22:54:15 2025 +0000
Merge branch 'te-delphi-py-tests' into replace_named_matrix
commit e00d970a4f9d5b9ef5cf12659aa2b1c159b3e31b
Author: Julien Cornebise <julien@cornebise.com>
Date: Wed Nov 12 22:43:17 2025 +0000
Allow comparison to be run on multiple datasets
commit 1e48a4a53dbdb04733c50fd21717bdfc71df6228
Author: Julien Cornebise <julien@cornebise.com>
Date: Wed Nov 12 22:16:45 2025 +0000
Add BG2050 and BG2018 datasets to dataset_config.py
commit d99c79e002b10aa1ed92f5bb4bd0693b7a0eb79c
Author: Julien Cornebise <julien@cornebise.com>
Date: Wed Nov 12 22:04:55 2025 +0000
Improve downloading of real data
- Check if data already exists before downloading
- Add option to force re-download of data
- Change paths to include dataset name if known
- Download all datasets from the config, by default
- Add progress bars
commit 46eac1be5c6f0bc0af540f6c32da9e4b097f4513
Author: Julien Cornebise <julien@cornebise.com>
Date: Wed Nov 12 22:00:15 2025 +0000
Allow to run tests in parallel
Particularly useful now that we have multiple tests.
commit f43cbc051014b508bc02d07c6979764b81ad20b4
Author: tevko <tim@devzero.io>
Date: Wed Nov 12 15:52:48 2025 -0600
free up space
commit 26f023c2ddc49cdb65c5bfccaf772949bf5cf635
Author: tevko <tim@devzero.io>
Date: Wed Nov 12 15:42:21 2025 -0600
shared, test db
commit fb6beebe92c467b1530e7e6d792cd916bfd08159
Author: tevko <tim@devzero.io>
Date: Wed Nov 12 15:39:07 2025 -0600
add access keys
commit ad85a95162da7d4d8d7f6a376c852a057679dc82
Author: tevko <tim@devzero.io>
Date: Wed Nov 12 15:22:00 2025 -0600
update region
commit aa69f3f50b091bf49262e642a14451a3f1349a7c
Author: Julien Cornebise <julien@cornebise.com>
Date: Wed Nov 12 21:09:39 2025 +0000
Configure VScode to run pytest on the regression tests
commit 457125e1355d1ac2f578da4fdc987c6c86c7cea2
Author: Julien Cornebise <julien@cornebise.com>
Date: Wed Nov 12 21:06:31 2025 +0000
Run regression tests on all available datasets by default
commit 44ef75c2f9c85c3052938c47f37b5e8fc6370b47
Author: tevko <tim@devzero.io>
Date: Wed Nov 12 15:02:30 2025 -0600
ensure dynamo tables created
commit 29ee422bc0a37a2174f7b0bb4ed8969448689cb1
Author: Julien Cornebise <julien@cornebise.com>
Date: Wed Nov 12 21:01:54 2025 +0000
Fix regression tests' logic
The golden tests should not be generated by the tests, but by the developer once.
commit 695ede06c2cc7f00b36f0732950dd341c676430e
Author: tevko <tim@devzero.io>
Date: Wed Nov 12 14:47:09 2025 -0600
remove duplicate data
commit ae175168f1d30324b9843f5ce8452160ae47e298
Author: tevko <tim@devzero.io>
Date: Wed Nov 12 12:37:03 2025 -0600
add real data, update action
commit 1ef3c064a6c9f1738e49cd02a9585714429b3e47
Author: tevko <tim@devzero.io>
Date: Wed Nov 12 09:33:40 2025 -0600
use pol.is baseurl -- actions
commit f1ce4aa9b5200b7419352d3841c13817672fdc37
Author: tevko <tim@devzero.io>
Date: Wed Nov 12 09:15:10 2025 -0600
try more robust action -- actions
commit a5b4e6cf5f5c1a143d873a209672f5d8efe4c91c
Author: tevko <tim@devzero.io>
Date: Wed Nov 12 09:04:53 2025 -0600
remove pg check again - actions
commit 966a551152cedcff20d9698f53b381aa4d4d59ab
Author: tevko <tim@devzero.io>
Date: Wed Nov 12 08:57:02 2025 -0600
change healthcheck - actions
commit 585cd2b3eb6bb360acef04d29fd15b5cc163537a
Author: tevko <tim@devzero.io>
Date: Wed Nov 12 08:37:39 2025 -0600
remove pg check
commit dd1c49225c295a1e27d214b29a15871c421bc426
Author: Julien Cornebise <julien@cornebise.com>
Date: Wed Nov 12 14:33:33 2025 +0000
Improve benchmark: 3 runs, statistical test
commit 55d39a2a242b01384d1434e03570f3cc4f33aa01
Author: tevko <tim@devzero.io>
Date: Wed Nov 12 08:24:10 2025 -0600
actions - change baseUrl
commit 977813708ea9be794b2caa6384dc5908b4e9c249
Author: tevko <tim@devzero.io>
Date: Wed Nov 12 08:07:00 2025 -0600
actions - mount volume tests
commit 017d6863a7166c22589a836ceaa3c7b00e983246
Author: Julien Cornebise <julien@cornebise.com>
Date: Wed Nov 12 13:45:15 2025 +0000
Refactor regression test and add basic benchmark
commit d90dd3cc63f472171f637ab7cee42dd3514c8129
Author: Julien Cornebise <julien@cornebise.com>
Date: Wed Nov 12 09:41:30 2025 +0000
Fix comparer and recorder to properly record and compare
Saves PCA, clusters, etc
commit b42758299c396df2f716100b71d7f15a70d28cf1
Author: tevko <tim@devzero.io>
Date: Tue Nov 11 21:40:00 2025 -0600
another actions fix again again again
commit 411c75108ea5e45b5b0f211ec08cd3e9aa273106
Author: tevko <tim@devzero.io>
Date: Tue Nov 11 21:29:13 2025 -0600
action fix again
commit d9421e1bd038ca6de9e481dbd71e358dabaa2f7e
Author: tevko <tim@devzero.io>
Date: Tue Nov 11 21:19:04 2025 -0600
another actions fix
commit be7bfbbb85c8dcadce7ff0f23ff551066bc7ce9c
Author: tevko <tim@devzero.io>
Date: Tue Nov 11 21:10:20 2025 -0600
update action again
commit dac822b23d5032d7ba3c56fa3cc3078f503a29e5
Author: tevko <tim@devzero.io>
Date: Tue Nov 11 21:09:44 2025 -0600
add delphi service to test
commit 2b3aa6c0677394e5faaa813cc7526b8a0664ec82
Author: tevko <tim@devzero.io>
Date: Tue Nov 11 20:27:46 2025 -0600
actions update 2
commit 483c5b47d937c4dae1e35f62b9794d94102ab298
Author: tevko <tim@devzero.io>
Date: Tue Nov 11 20:15:37 2025 -0600
fix action
commit 008bd2202547b48f05b7a866b1ca532036225947
Author: tevko <tim@devzero.io>
Date: Tue Nov 11 20:07:15 2025 -0600
fix all tests
commit 09edb21c4b7a8c6be02c7ac6b57c2631bd694e3a
Author: tevko <tim@devzero.io>
Date: Tue Nov 11 16:30:06 2025 -0600
use env for data script
commit b9f8f60b04c2905c63822bac48fd0c84720bb05d
Author: Julien Cornebise <julien@cornebise.com>
Date: Tue Nov 11 12:54:29 2025 +0000
First draft of regression tests based on recorder
The output is not yet the kind of exhaustive result I was expecting,
so needs more work.
Done with Claude.
commit 531280b6f27dab65d409b997066923d880935e8e
Author: tevko <tim@devzero.io>
Date: Mon Nov 10 22:32:39 2025 -0600
update action 3
commit ea8f989b41f5062f9a6e18401bf1726c61b7037b
Author: tevko <tim@devzero.io>
Date: Mon Nov 10 22:24:08 2025 -0600
update action 2
commit 779f5dd42cdf854376c7e606016badfb06d60ff6
Author: tevko <tim@devzero.io>
Date: Mon Nov 10 22:17:10 2025 -0600
update action
commit 359dbe387df39112db875409d5e23ac4afa4d441
Author: tevko <tim@devzero.io>
Date: Mon Nov 10 22:05:53 2025 -0600
add action
commit cb33f2d321ecc407397d5b7cd36911105bd634ee
Author: Julien Cornebise <julien@cornebise.com>
Date: Mon Nov 10 13:16:35 2025 +0000
Exclude Conversation serialization tests
Until https://github.com/compdemocracy/polis/issues/2284 is resolved
commit 5a9d60add5ca359d5986601350143469c91c66e9
Author: Julien Cornebise <julien@cornebise.com>
Date: Mon Nov 10 10:05:42 2025 +0000
Add assert failure messages
commit 78a27df39eb50b569a1aab76ef40716052b2a9a2
Author: Julien Cornebise <julien@cornebise.com>
Date: Sun Nov 9 21:02:12 2025 +0000
Refactor test_repness_comparison.py to proper pytest structure
Similar to pca tests, refactor test_repness_comparison.py
- Converts test_comparison() function to TestRepnessComparison class
- Uses @pytest.mark.parametrize for multiple datasets
- Proper fixtures for clojure_results, conversation, python_results
- Two test methods: test_structural_compatibility and test_comparison_visibility
- Replaces print() with logging.info/debug
- Adds warning that results are known to be very different
- Reports comparison results for visibility without asserting on match rates
- Maintains comparison functionality for manual inspection
Test results: 4 tests passed (2 datasets × 2 test methods)
commit bc4f9e0bb37ee24da8b51a9dbd694804802f2631
Author: Julien Cornebise <julien@cornebise.com>
Date: Sun Nov 9 19:27:07 2025 +0000
Rename test_repness.py to test_repness_unit.py for clarity
Rename to clarify that these are unit tests with synthetic data,
following the same naming convention established for PCA tests:
- test_repness.py → test_repness_unit.py (unit tests, synthetic data)
- test_repness_smoke.py (real data, smoke tests - already renamed)
- test_repness_comparison.py (Python vs Clojure - already clear)
This mirrors the PCA test structure:
- test_pca_unit.py (unit tests)
- test_pca_edge_cases.py (edge cases)
- test_pca_smoke.py (smoke tests)
All 14 tests pass:
- Statistical utility functions (z-scores, proportion tests)
- Comment statistics calculation
- Representative comment selection
- Consensus selection
- Integration tests (conv_repness, participant_stats)
commit 41355a6161cb6cd1d4564d99e7ea580f63e66064
Author: Julien Cornebise <julien@cornebise.com>
Date: Sun Nov 9 19:19:56 2025 +0000
Refactor repness smoke test
Similar to how we refactored the "direct PCA" tests
commit c3947d1d45904e88daf4973be8189d4f74a65f10
Author: Julien Cornebise <julien@cornebise.com>
Date: Sun Nov 9 19:04:05 2025 +0000
Ignore warning from library ddtrace in pytest
commit 622adb4adc71058e77514ab2c6d20b34561627d6
Author: Julien Cornebise <julien@cornebise.com>
Date: Sun Nov 9 19:01:51 2025 +0000
Clarify the naming of PCA test files and remove redundant tests
commit 0a0b55ef5990741c23c97afa9c5557c05e65db63
Author: Julien Cornebise <julien@cornebise.com>
Date: Sun Nov 9 18:57:28 2025 +0000
Refactor direct_pca_test.py to test_pca_smoke.py with pytest structure
Converted legacy procedural test script to proper pytest:
- Class-based structure with TestPCAImplementation
- Parametrized tests for all datasets
- Fixtures for vote matrix loading
- Proper logging instead of prints
- Smoke test warning (no correctness validation)
- Tests: runs without error, projection statistics, clustering
Tests PCA functions directly (not through Conversation class).
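As a rough illustration of the structure this commit describes (class-based, parametrized over datasets, fixture-backed), here is a minimal sketch; the dataset names are illustrative and the loader is a placeholder, not the project's real code:

```python
import logging

import numpy as np
import pytest

logger = logging.getLogger(__name__)

# Illustrative dataset names; the real suite derives these from real_data/
DATASETS = ["biodiversity", "vw"]


class TestPCAImplementation:
    @pytest.fixture(params=DATASETS, ids=DATASETS)
    def vote_matrix(self, request):
        dataset = request.param
        logger.debug("loading votes for %s", dataset)
        # Placeholder loader; the real fixture reads the dataset's votes CSV
        rng = np.random.default_rng(0)
        return rng.choice([-1.0, 0.0, 1.0, np.nan], size=(50, 20))

    def test_runs_without_error(self, vote_matrix):
        # Smoke test: checks that the pipeline runs and shapes look sane,
        # not numerical correctness
        assert vote_matrix.ndim == 2
        logger.info("matrix shape: %s", vote_matrix.shape)
```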
commit 43593a03751b5caf63b41e99455199c8c53eaf10
Author: Julien Cornebise <julien@cornebise.com>
Date: Sun Nov 9 18:26:20 2025 +0000
Fix direct conversation test
- Convert to proper pytest format, not standalone script
- Use fixtures for setup/teardown
- Warn it is a test to check Conversation class instantiation and method calls
- Replace prints by logging
- Parametrize the test to run over all available real_data
- Add some dimension and attributes assertions
- Rename to test_conversation_smoke.py
commit c717c472b02758a684addfd588b57d130d390b34
Author: Julien Cornebise <julien@cornebise.com>
Date: Sun Nov 9 18:12:07 2025 +0000
Fix buggy test that blocked pytest collection
The `test_batch_id.py` was running code at load time, and that code had an error,
thus crashed during pytest collection, preventing all tests from running.
By refactoring into a proper test function, pytest can now collect all tests and run them.
We also fix the error itself, which was a missing escape of the "scan" reserved word in DynamoDB.
commit 84547f2f76b6b44c63f848319886377c8d6c7ae5
Author: Julien Cornebise <julien@cornebise.com>
Date: Sun Nov 9 17:54:50 2025 +0000
Clarify terms in messages and comments
commit 23d1833fb44f3e05969e800e871c62fefab85880
Author: Julien Cornebise <julien@cornebise.com>
Date: Sun Nov 9 17:54:30 2025 +0000
Fix path...
commit 6adbd51da65235c92c3a5c1f1a09fa404c4ed55b
Merge: d560fe66 b8df940f
Author: Julien Cornebise <julien@cornebise.com>
Date: Sun Nov 9 11:24:27 2025 +0000
Merge branch 'edge' into replace_named_matrix
commit d560fe6653a64c093589fedd5f8855349391bddb
Merge: be3d50e9 c5ec8994
Author: Julien Cornebise <julien@cornebise.com>
Date: Sat Nov 8 11:45:13 2025 +0000
Merge remote-tracking branch 'upstream/edge' into replace_named_matrix
commit be3d50e97669fdde45971917cb5b3ac58cf54288
Author: Julien Cornebise <julien@cornebise.com>
Date: Sat Nov 8 11:44:22 2025 +0000
Print whether comment priorities are missing from test data
commit d7970d8f4b0aef03dbfab30ba54b7cd4b688c17d
Author: Julien Cornebise <julien@cornebise.com>
Date: Fri Nov 7 12:38:16 2025 +0000
Refactor real_data loading
Remove duplication, allow for automatic finding of the files within a location,
allow for generalisation to other conversations than the two used so far.
commit f5ac66916db7c6b7541058abe151cb54b1caff3c
Author: Julien Cornebise <julien@cornebise.com>
Date: Fri Nov 7 10:55:55 2025 +0000
Create script to download real data for tests
This is useful if no folder `real data` was provided. I suspect these tests were
written with a `real data` folder already in place. I do not have it, therefore
we need to download it. See the `README` file that has been updated.
commit 23cced099659c9d256653f8085d2999760d45caa
Author: Julien Cornebise <julien@cornebise.com>
Date: Thu Nov 6 13:51:13 2025 +0000
Extract common function to utils file
That function was defined 3 times in 3 different files.
commit 3c6e7880f3b7e8076d3a339b3cc8b71f1f3adb1f
Author: Julien Cornebise <julien@cornebise.com>
Date: Mon Nov 3 17:57:21 2025 +0000
Add type hint in some poller functions
* Fix run_math_pipeline test import to use proper package path
The test file was importing `from run_math_pipeline import main` which
failed locally because `run_math_pipeline.py` lives inside the `polismath`
package at `delphi/polismath/run_math_pipeline.py`.
CI was working around this by copying the file to a flat location:
docker cp delphi/polismath/run_math_pipeline.py delphi:/app/run_math_pipeline.py
This created a discrepancy between local and CI environments.
The fix:
1. Update test imports to use the correct package path:
`from polismath.run_math_pipeline import main`
2. Update mock.patch paths to match:
`mock.patch('polismath.run_math_pipeline.fetch_comments', ...)`
3. Remove the CI workaround that copied the file to /app flat
4. Simplify coverage to `--cov=polismath` (run_math_pipeline is inside it)
The Docker image already has `polismath/` at `/app/polismath/` and the
package is installed via `pip install --no-deps .`, so the proper import
path works in both local and CI environments.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* Improve CI coverage reporting reliability
Changes to the CI workflow:
1. Print coverage report to workflow logs (always visible)
2. Upload coverage report as downloadable artifact
3. Make PR comment step non-fatal with continue-on-error: true
(fork PRs cannot post comments due to GitHub token restrictions)
Coverage is now accessible three ways:
- In the workflow logs (step 7)
- As a downloadable artifact (step 8)
- As a PR comment when permissions allow (step 9)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* Add graceful error handling for coverage comment on fork PRs
Instead of showing an unhandled error when posting coverage comments
fails on fork PRs, the script now catches the 403 error and displays
a helpful message explaining:
- Why the comment could not be posted (GitHub token permissions)
- Where to find the coverage report (logs and artifact)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* Fix test for malformed votes
Malformed votes should be ignored.
* Clean up unused variables and imports
Address GitHub Copilot review comments:
- Log superseded votes count in conversation.py instead of leaving unused
- Remove unused p1_idx/p2_idx index lookups in corr.py
- Remove unused all_passed variable in regression_comparer.py
- Remove unused imports (numpy, Path, List, datetime, stats, pca/cluster functions)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
---------
Co-authored-by: Claude <noreply@anthropic.com>
* Only run python-ci for delphi changes; minimize output (#2315)
* Only run python-ci for delphi changes; minimize output
* address PR feedback
* Revert "Merge branch 'stable' into edge" (#2305)
This reverts commit 51665ab3b552e406526364d7e8fc5a0be7bd8277, reversing
changes made to 3901ee5fcd134adfe498ca6a76a89ab5c1cda3a6.
* add narrative pipeline test (#2307)
* add narrative pipeline test
* change filename
* slight mocking adjustment
* mock sentence transformer
* better evoc
* try massaging mock data again
* more mocking
* diff mock strategy
* fix cov report
* test 500 gen embed
* syntax fixes
* update sytax again
* syntax fix again
* attempt mock fix
* another mock attempt
* fix action
* fix action again
* actions fix
* add another test
* add another test
* fix test
* Alpha visualization (#2302)
* add client-visualization submodule
* add pca visualization to alpha client
* show user in the data viz
* fetch and animate new pca data
* remove gitmodule
* use concaveman lib; update package.json; use gray color; only show when vis_type is set
* reset selected statement when group changes
* update astro types
* include remaining comment count
* Bump js-yaml from 4.1.0 to 4.1.1 in /e2e (#2292)
Bumps [js-yaml](https://github.com/nodeca/js-yaml) from 4.1.0 to 4.1.1.
- [Changelog](https://github.com/nodeca/js-yaml/blob/master/CHANGELOG.md)
- [Commits](https://github.com/nodeca/js-yaml/compare/4.1.0...4.1.1)
---
updated-dependencies:
- dependency-name: js-yaml
dependency-version: 4.1.1
dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Bump js-yaml from 3.14.1 to 3.14.2 in /cdk (#2298)
Bumps [js-yaml](https://github.com/nodeca/js-yaml) from 3.14.1 to 3.14.2.
- [Changelog](https://github.com/nodeca/js-yaml/blob/master/CHANGELOG.md)
- [Commits](https://github.com/nodeca/js-yaml/compare/3.14.1...3.14.2)
---
updated-dependencies:
- dependency-name: js-yaml
dependency-version: 3.14.2
dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Bump glob from 10.3.16 to 10.5.0 in /client-report (#2300)
Bumps [glob](https://github.com/isaacs/node-glob) from 10.3.16 to 10.5.0.
- [Changelog](https://github.com/isaacs/node-glob/blob/main/changelog.md)
- [Commits](https://github.com/isaacs/node-glob/compare/v10.3.16...v10.5.0)
---
updated-dependencies:
- dependency-name: glob
dependency-version: 10.5.0
dependency-type: direct:development
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Bump js-yaml in /client-admin (#2309)
Bumps and [js-yaml](https://github.com/nodeca/js-yaml). These dependencies needed to be updated together.
Updates `js-yaml` from 4.1.0 to 4.1.1
- [Changelog](https://github.com/nodeca/js-yaml/blob/master/CHANGELOG.md)
- [Commits](https://github.com/nodeca/js-yaml/compare/4.1.0...4.1.1)
Updates `js-yaml` from 3.14.1 to 3.14.2
- [Changelog](https://github.com/nodeca/js-yaml/blob/master/CHANGELOG.md)
- [Commits](https://github.com/nodeca/js-yaml/compare/4.1.0...4.1.1)
---
updated-dependencies:
- dependency-name: js-yaml
dependency-version: 4.1.1
dependency-type: indirect
- dependency-name: js-yaml
dependency-version: 3.14.2
dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Speed up repness 11x (#2316)
* Optimize update_votes with vectorized pivot_table (5x speedup)
Replace the row-by-row for-loop in update_votes with a vectorized
pivot_table approach. This dramatically speeds up vote loading for
large datasets.
Performance on bg2050 dataset (1M+ votes, 7.8k participants, 7.7k comments):
- Before: 18.5s average, 56k votes/sec
- After: 3.5s average, 295k votes/sec
- Speedup: 5.3x overall, 16x for the batch update step
The optimization:
1. Use pivot_table to reshape long-form votes to wide-form matrix
2. Use DataFrame.where() to merge with existing matrix
3. Use float32 for intermediate matrix to halve memory usage
Also adds a benchmark script at polismath/benchmarks/bench_update_votes.py
for measuring update_votes performance.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* Vectorize _compute_vote_stats and make benchmark standalone
- _compute_vote_stats: Replace per-row/per-column loops with numpy
vectorized operations using boolean masks and axis-based sums.
This eliminates O(rows + cols) Python loops.
- bench_update_votes.py: Make standalone by accepting CSV path directly
instead of depending on datasets package. Add TODO for using datasets
package once PR #2312 is merged.
Combined with pivot_table optimization, achieves ~10x speedup on bg2050
dataset (1M votes): 18-30s -> 2.5s (~400k votes/sec).
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* Fix: Remove misleading float32 conversion in update_votes
Addresses GitHub Copilot review comments on PR #2313:
- Removed float32 conversion that only provided temporary memory savings
- The comment was misleading as savings didn't persist after .where()
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* Fix: Use vectorized pandas operations in benchmark loader
Replace iterrows() with rename() + to_dict('records') for efficiency,
as suggested by GitHub Copilot review.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* Add timing logging for PCA and repness
* Add benchmark script for repness
* Add profiling to benchmark for repness
* Vectorize vote count: 2x speedup on large convos
* Extract common setup code
* Rename vote_matrix to vote_matrix_df for clarity
* Keep NaNs instead of None: 2x more speedup
* Refactor conv_repness() to use long-format DataFrame
Convert wide-format vote matrix to long-format using melt() and use
vectorized pandas groupby operations instead of nested loops.
Key changes:
- Add compute_group_comment_stats_df() for vectorized (group, comment) stats
- Add prop_test_vectorized() and two_prop_test_vectorized() for batch z-tests
- Add select_rep_comments_df() and select_consensus_comments_df() for
DataFrame-native selection, converting to dicts only at the end
- Compute "other" stats as total - group instead of recalculating
- Use MultiIndex.from_product() to ensure all (group, comment) combinations
Test changes:
- Add test_old_format_repness.py to preserve backwards compatibility tests
- Add TestVectorizedFunctions class with 8 tests for new DataFrame interface
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* Shorten imports as per GH Copilot Review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update docstring as per GH Copilot Review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Remove unused import as per GH Copilot Review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Move profiler to within profiling function as per GH Copilot review
* Remove unused import as per GH Copilot review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Profile new functions
---------
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
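To make the "Optimize update_votes with vectorized pivot_table" item above concrete, here is a rough sketch of the reshape-and-merge idea it describes; the column names (pid, tid, vote) and the function signature are assumptions, not the project's actual API:

```python
import pandas as pd


def merge_vote_batch(matrix: pd.DataFrame, votes: pd.DataFrame) -> pd.DataFrame:
    """Merge long-form votes (pid, tid, vote) into a wide participant x comment matrix."""
    # 1. Long -> wide in one vectorized step; the last vote per (pid, tid) wins
    new = votes.pivot_table(index="pid", columns="tid", values="vote", aggfunc="last")
    # 2. Align old and new matrices on the union of participants and comments
    idx = matrix.index.union(new.index)
    cols = matrix.columns.union(new.columns)
    old = matrix.reindex(index=idx, columns=cols)
    new = new.reindex(index=idx, columns=cols)
    # 3. Keep new votes where present, fall back to existing values elsewhere
    return new.where(new.notna(), old)
```

The speedup described above comes from replacing a per-row Python loop with one pivot_table call plus an aligned where() merge.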
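The "Vectorize _compute_vote_stats" item can be pictured the same way: boolean masks and axis-based sums instead of loops over rows and columns. A small sketch, assuming a NaN-padded matrix and a 1 = agree encoding (the real encoding may differ):

```python
import numpy as np


def vote_stats(votes: np.ndarray) -> dict[str, np.ndarray]:
    """Per-axis vote counts with boolean masks instead of Python loops."""
    seen = ~np.isnan(votes)      # a vote was recorded for this (participant, comment)
    agrees = votes == 1          # assumed encoding: 1 = agree
    return {
        "votes_per_participant": seen.sum(axis=1),
        "votes_per_comment": seen.sum(axis=0),
        "agrees_per_comment": agrees.sum(axis=0),
    }
```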
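Likewise, a sketch of the long-format approach in the "Refactor conv_repness() to use long-format DataFrame" item, with hypothetical names and a simplified vote encoding (1 = agree, -1 = disagree):

```python
import pandas as pd


def group_comment_stats(vote_matrix: pd.DataFrame, groups: pd.Series) -> pd.DataFrame:
    """Vectorized per-(group, comment) vote counts instead of nested loops."""
    # Wide -> long: one row per recorded (participant, comment, vote)
    long_votes = (
        vote_matrix.rename_axis("pid")
        .reset_index()
        .melt(id_vars="pid", var_name="tid", value_name="vote")
        .dropna(subset=["vote"])
    )
    long_votes["group"] = long_votes["pid"].map(groups)  # groups: pid -> cluster id
    stats = long_votes.groupby(["group", "tid"]).agg(
        n_votes=("vote", "size"),
        n_agree=("vote", lambda v: int((v == 1).sum())),
        n_disagree=("vote", lambda v: int((v == -1).sum())),
    )
    # Ensure every (group, comment) pair is present, even with zero votes
    full = pd.MultiIndex.from_product(
        [sorted(groups.unique()), vote_matrix.columns], names=["group", "tid"]
    )
    return stats.reindex(full, fill_value=0)
```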
* some lib updates (#2323)
* remove express from oidc-simulator; update other libs
* pin auth0-simulator to 0.10.2
* e2e lib updates
* client-admin lib updates
* fix delphi dockerfile -- torch versions for cpu
* Bump js-yaml in /client-report (#2317)
Bumps and [js-yaml](https://github.com/nodeca/js-yaml). These dependencies needed to be updated together.
Updates `js-yaml` from 4.1.0 to 4.1.1
- [Changelo…
Thanks @ballPointPenguin! Rebasing then merging :)
…flag

- Auto-discover datasets from real_data/ and real_data/.local/ based on directory naming pattern <report_id>-<name>/
- Add --include-local pytest flag to include git-ignored local datasets
- Add .local/ to .gitignore for confidential/large datasets
- Simplify datasets.py with DatasetInfo dataclass and discovery functions
- Add conftest.py with pytest hooks for dynamic test parametrization
- Update download_real_data.py to default to .local/ with --commit flag
- Add unit tests for dataset discovery in test_datasets.py
- Update tests/README.md with new documentation

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
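For context, a minimal sketch of the kind of discovery this commit describes, scanning real_data/ and optionally real_data/.local/ for directories named <report_id>-<name>/; the regex, helper name, and return type are assumptions, not the actual datasets.py code:

```python
import re
from pathlib import Path

# A dataset directory is named <report_id>-<name>/ (e.g. r123abc-biodiversity/)
DATASET_DIR_RE = re.compile(r"^(?P<report_id>[^-]+)-(?P<name>.+)$")


def discover_dataset_dirs(real_data: Path, include_local: bool = False) -> list[Path]:
    """Return dataset directories under real_data/, plus real_data/.local/ when requested."""
    roots = [real_data]
    if include_local:
        roots.append(real_data / ".local")
    found: list[Path] = []
    for root in roots:
        if not root.is_dir():
            continue
        for child in sorted(root.iterdir()):
            if child.is_dir() and DATASET_DIR_RE.match(child.name):
                found.append(child)
    return found
```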
- Use any() instead of bool(list()) in _check_files for efficiency
- Add multiple match validation in find_file
- Fix pytest.skip() during collection (use empty parametrize instead)
- Add directory context comment to test_regression.py usage
- Remove unused list_regression_datasets import
- Rename TestDirPattern to TestDirectoryPattern
- Improve error message in regression_download.py

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
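For example, the "multiple match validation in find_file" item could look roughly like this (the signature is assumed):

```python
from pathlib import Path


def find_file(directory: Path, pattern: str) -> Path:
    """Return the single file matching pattern in directory, or raise."""
    matches = sorted(directory.glob(pattern))
    if not matches:
        raise FileNotFoundError(f"No file matching {pattern!r} in {directory}")
    if len(matches) > 1:
        raise ValueError(f"Multiple files match {pattern!r} in {directory}: {matches}")
    return matches[0]
```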
- Warn when local dataset shadows a committed dataset with same name
- Add test for include_local=True behavior
- Add test for name collision warning

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Allow datasets to be valid for regression testing without the Clojure math_blob file. This enables testing when database access is unavailable (e.g., when DATABASE_URL is not set).

Changes:
- DatasetInfo.is_valid now only requires votes, comments, and golden_snapshot
- Added has_clojure_reference property to check if Clojure comparison is possible
- Updated documentation to clarify math_blob is optional
- Added tests for new behavior

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
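A rough sketch of the shape this change describes; the field names are guesses based on the files mentioned above, not the actual dataclass:

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class DatasetInfo:
    # Field names are guesses based on the files named in the commit message
    name: str
    report_id: str
    votes: Optional[Path] = None
    comments: Optional[Path] = None
    golden_snapshot: Optional[Path] = None
    math_blob: Optional[Path] = None  # Clojure export; now optional

    @property
    def is_valid(self) -> bool:
        # Regression testing needs votes, comments, and the golden snapshot only
        required = (self.votes, self.comments, self.golden_snapshot)
        return all(p is not None and p.exists() for p in required)

    @property
    def has_clojure_reference(self) -> bool:
        # Comparing against Clojure output additionally needs the math_blob
        return self.math_blob is not None and self.math_blob.exists()
```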
- test_legacy_clojure_regression.py: Replace hardcoded ["biodiversity", "vw"] with auto-discovery using clojure_dataset fixture. Only includes datasets with has_clojure_reference=True (i.e., have math_blob for Clojure comparison). Respects --include-local flag.
- regression_download.py: After download, check for missing golden_snapshot.json and offer to create them interactively. Shows command to create later if user declines.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
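A minimal sketch of the conftest.py hooks this implies; discover_datasets, REAL_DATA_DIR, and the import path are placeholders for whatever the module actually exposes:

```python
# conftest.py sketch
from pathlib import Path

# Placeholder import: the actual helper name in polismath.regression.datasets may differ
from polismath.regression.datasets import discover_datasets

REAL_DATA_DIR = Path(__file__).parent / "real_data"  # placeholder location


def pytest_addoption(parser):
    parser.addoption(
        "--include-local",
        action="store_true",
        default=False,
        help="Also collect git-ignored datasets from real_data/.local/",
    )


def pytest_generate_tests(metafunc):
    if "clojure_dataset" in metafunc.fixturenames:
        include_local = metafunc.config.getoption("--include-local")
        datasets = [
            d for d in discover_datasets(REAL_DATA_DIR, include_local=include_local)
            if d.has_clojure_reference
        ]
        # Parametrizing with an empty list keeps collection working when no
        # datasets are present, instead of calling pytest.skip() at collection time
        metafunc.parametrize("clojure_dataset", datasets, ids=[d.name for d in datasets])
```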
Clojure comparison tests only need votes, comments, and math_blob. They compare against the Clojure output, not the Python golden snapshot.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
…ata download failures better
d04c2ca to
41a3e62
Compare
…flag (#2312)

* Add auto-discovery for regression test datasets with --include-local flag
  - Auto-discover datasets from real_data/ and real_data/.local/ based on directory naming pattern <report_id>-<name>/
  - Add --include-local pytest flag to include git-ignored local datasets
  - Add .local/ to .gitignore for confidential/large datasets
  - Simplify datasets.py with DatasetInfo dataclass and discovery functions
  - Add conftest.py with pytest hooks for dynamic test parametrization
  - Update download_real_data.py to default to .local/ with --commit flag
  - Add unit tests for dataset discovery in test_datasets.py
  - Update tests/README.md with new documentation

* Address Copilot review feedback
  - Use any() instead of bool(list()) in _check_files for efficiency
  - Add multiple match validation in find_file
  - Fix pytest.skip() during collection (use empty parametrize instead)
  - Add directory context comment to test_regression.py usage
  - Remove unused list_regression_datasets import
  - Rename TestDirPattern to TestDirectoryPattern
  - Improve error message in regression_download.py

* Add warning for name collisions and test for include_local
  - Warn when local dataset shadows a committed dataset with same name
  - Add test for include_local=True behavior
  - Add test for name collision warning

* Make math_blob optional for regression testing
  Allow datasets to be valid for regression testing without the Clojure math_blob file. This enables testing when database access is unavailable (e.g., when DATABASE_URL is not set).
  - DatasetInfo.is_valid now only requires votes, comments, and golden_snapshot
  - Added has_clojure_reference property to check if Clojure comparison is possible
  - Updated documentation to clarify math_blob is optional
  - Added tests for new behavior

* Auto-discover datasets in tests and prompt for golden snapshots
  - test_legacy_clojure_regression.py: Replace hardcoded ["biodiversity", "vw"] with auto-discovery using clojure_dataset fixture. Only includes datasets with has_clojure_reference=True (i.e., have math_blob for Clojure comparison). Respects --include-local flag.
  - regression_download.py: After download, check for missing golden_snapshot.json and offer to create them interactively. Shows command to create later if user declines.

* Fix: Clojure tests do not require golden_snapshot
  Clojure comparison tests only need votes, comments, and math_blob. They compare against the Clojure output, not the Python golden snapshot.

* Fix name in examples as per GH Copilot review

* clarify dotenv ".env" location; unignore .gitkeep in .local; handle data download failures better

---------

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Bennie Rosas <ben@aliencyb.org>
Summary
This PR adds support for testing with local datasets that are not committed to the repository. This enables:
- Dropping a dataset in real_data/.local/ and it's auto-discovered, no config changes needed

Changes
- Auto-discover datasets from real_data/ and real_data/.local/ based on directory naming pattern <report_id>-<name>/
- Add --include-local pytest flag to include git-ignored local datasets
- Add .local/ to .gitignore for confidential/large datasets
- Simplify datasets.py with DatasetInfo dataclass and discovery functions
- Add conftest.py with pytest hooks for dynamic test parametrization
- Update download_real_data.py to default to .local/ with --commit flag
- Add unit tests for dataset discovery in test_datasets.py
- Update tests/README.md with new documentation

Usage
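For example, pytest delphi/tests/test_regression.py runs against the committed datasets only, while pytest delphi/tests/test_regression.py --include-local additionally picks up any datasets placed in real_data/.local/.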
Test plan
- Run pytest delphi/tests/test_datasets.py to verify dataset discovery
- Run pytest delphi/tests/test_regression.py with committed datasets
- Add a dataset to real_data/.local/ and verify it's discovered with --include-local
- Verify the .local/ directory is properly git-ignored

🤖 Generated with Claude Code