Initial data portal endpoints #324

jeffbaumes · 2023-10-11T17:33:33Z

No description provided.

* initial checkin with base class and basic tests
  - Base ChangeSheet write class
  - unit tests for base class
* add conftest and gold changesheet tests
  - move test fixtures to conftest.py
  - add get_biosample_name function and unit test to GoldBiosample generator
* update biosample name unit test add explicit expected values
* Sketch out functions for gold changesheet generator
* function and test for missing GOLD ecosystem metadata
* add function and test for missing gold_biosample_identifiers
* add get_normalized_gold_biosample_identifier
* update logic with omics processing step
* skeleton find_omics_processing_set function, and updated (correct this time) test data files
* Add Omics to Biosample map
  - add omics_to_biosample map input
  - added nmdc / gold BioSample comparison logic
  - unit tests
  - stub API dependent methods
* Add changesheets.py package for common functions and classes
  - Changesheet and ChangesheetLineItem classes
  - API @op functions
* refactor to split omics processing data file read to stand-alone function
* more refactoring and code cleanup
* add test generation job
* add resource definitions and config
* refactor and code cleanup Simplify to just ChangeSheet and ChangeSheetLineItem classes
* Cleanup this branch to focus on getting assets working
* fix defs and fetch statement
* get basic GOLD asset generation working
* Add Api resources as ConfigurableResources
* Add asset scaffolding
* update normalizer functions to all take and return strings
* update resources add empty click script
* fix gold ID normalization and add unit tests
* implement compare biosamples and write_changesheet
* add omics record comparison
* Add validate_changesheet method
* cleanup unused data files
* fix validate_changesheet method and add logging
* delete dagster asset based code and tests
  - move to a demo branch
* add changesheet_output to .gitignore
* add changesheet_output to .gitignore
* remove Dagster-related code and settings
* style: format with black
* Use TypeAlias for JSON_OBJECT
* Removed hard-coded URL from Changesheet.validate()
* remove .tsv file - should be ignored
* clarify function name and blacken formatting
* fix click options help text and blacken
* yet more blackening
* uncomment wait-for-it
* Delete get_data.ipynb
* Revert "Delete get_data.ipynb" This reverts commit fe3e68a.
* add docstring for generate_changesheet
* automatic reformatting
* bring get_data notebook back to original state
* add some logging
* update to use gold_sequencing_identifiers over alternative_identifiers
* Delete neon_cache.sqlite
* strip and de-tab the value in tsv output
* set default line_items in changesheet class correctly
* update output_dir type hint
* remove apply_changes option
* Dry up unfindable logging
* Clean up gold normalization and documentation
* fix: style

---------

Co-authored-by: Donny Winston <donny@polyneme.xyz>

dwinston · 2023-11-03T17:07:43Z

coordinated with microbiomedata/nmdc-server#1037

dwinston · 2023-11-03T17:12:44Z

note: denormalization of mongo collections for data portal, via a series of mongo aggregation pipelines (python nmdc_runtime/api/endpoints/portal_denormalize.py), takes approx 50s on my laptop, against a local mongorestore of the production db.
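For readers unfamiliar with the approach, here is a minimal sketch of what one denormalizing aggregation pipeline could look like. It is not the code in portal_denormalize.py: the helper `build_denormalize_pipeline`, the collection and field names passed to it, and the output collection name are all assumptions chosen for illustration. The pipeline is built as plain Python data, so the block runs without a database.

```python
from typing import Any

Pipeline = list[dict[str, Any]]


def build_denormalize_pipeline(
    lookup_from: str,
    local_field: str,
    foreign_field: str,
    as_field: str,
    out_collection: str,
) -> Pipeline:
    """Build a two-stage pipeline: join related documents, then write the result.

    Hypothetical helper for illustration only.
    """
    return [
        # Embed matching documents from another collection into each source document.
        {
            "$lookup": {
                "from": lookup_from,
                "localField": local_field,
                "foreignField": foreign_field,
                "as": as_field,
            }
        },
        # Write the joined documents to a separate collection for fast reads.
        {"$merge": {"into": out_collection, "whenMatched": "replace"}},
    ]


# All names below are assumed, not taken from the PR.
pipeline = build_denormalize_pipeline(
    lookup_from="omics_processing_set",
    local_field="id",
    foreign_field="has_input",
    as_field="omics_processing",
    out_collection="biosample_denormalized",
)
```

With pymongo, a pipeline like this would typically be run as `db["biosample_set"].aggregate(pipeline)`. Chaining several such pipelines trades a one-time write cost (the roughly 50s noted above) for simpler portal queries.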

jeffbaumes and others added 17 commits October 11, 2023 10:30

Initial data portal endpoints

986d06c

Denormalize script

8568391

style: black refmt

507e9c4

refactor

0e8764f

Update links in readme (#328)

f1820af

* replace dead link 'nmdc-metadata' with 'issues' repo
* update name of make command to ssh into nersc mongo dbs

---------

Co-authored-by: Jing - Peters MBP <jingcao.yale@gmail.com>

Update submission portal translator to pass validation according to current nmdc-schema

61207b3

Add op for getting CSV data from arbitrary URL

ad0db60

Update submission portal graphs to accept a CSV mapping file to produce OmicsProcessing and DataObject records

9807446

Revert docker compose test changes

45f8469

Revert docker compose test change

40b1b9e

Add additional comments and improve variable names

dd1e065

fix: pin to fastapi>=0.104.1 to pin swagger ui

ed7de9a

fixes #337

migrate to pydantic v2 (#344)

bb5f1df

* fix: run `bump-pydantic nmdc_runtime` and apply closes #339 addresses #343
* fix: @model_validator refactor closes #343

hotfix: pydantic v2 migration

df1e0a1

model-field ranges with `Query`-annotated types aren't covered by the automated bump-pydantic tool.

fix: de-duplicate metadata submission after one minute (#347)

cbead62

re-submission of "same" changes is a valid use case closes #340

fix: ensure pydantic models serialize to json-compatible dicts. (#346)

bd9df67

add regression test closes #345
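The de-duplication fix above (#347) treats an identical metadata submission as a duplicate only within a one-minute window, because re-submitting the "same" changes later is a valid use case. A rough sketch of that idea follows. It is not the implementation in this PR: the class `SubmissionDeduplicator`, its method names, and the in-memory storage are assumptions for illustration.

```python
import hashlib
import json
from typing import Any

DEDUP_WINDOW_SECONDS = 60.0  # one minute, per the commit title


class SubmissionDeduplicator:
    """Hypothetical in-memory de-duplicator keyed on a hash of the payload."""

    def __init__(self, window_seconds: float = DEDUP_WINDOW_SECONDS) -> None:
        self.window_seconds = window_seconds
        self._last_accepted: dict[str, float] = {}

    @staticmethod
    def fingerprint(payload: dict[str, Any]) -> str:
        # Sort keys so that logically equal payloads hash identically.
        canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def should_accept(self, payload: dict[str, Any], now: float) -> bool:
        """Return False if an identical payload was accepted within the window."""
        key = self.fingerprint(payload)
        last = self._last_accepted.get(key)
        if last is not None and now - last < self.window_seconds:
            return False
        self._last_accepted[key] = now
        return True


dedup = SubmissionDeduplicator()
payload = {"id": "example:1", "name": "sample"}  # made-up payload
first = dedup.should_accept(payload, now=0.0)    # accepted
repeat = dedup.should_accept(payload, now=30.0)  # rejected: inside the window
later = dedup.should_accept(payload, now=61.0)   # accepted: window has passed
```

Passing `now` explicitly keeps the sketch deterministic. A real service would more likely read the clock itself and keep the timestamps in the database.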
