Merged
296 commits
d422776
feat(v2): add shared intermediates envelopes and stats stages
caviri Feb 24, 2026
a20c4f7
feat: add v2 logfire bootstrap module
caviri Feb 24, 2026
99912e4
feat(logfire): enhance connectivity checks and credential handling fo…
caviri Feb 24, 2026
cdb5104
feat(v2): instrument request, agent, and pipeline tracing
caviri Feb 24, 2026
83a2a17
feat(v2): add run correlation metrics and structured error events
caviri Feb 24, 2026
a0b5389
feat(v2): add generated model freshness and ttl-schema CI gates
caviri Feb 24, 2026
e11b73e
feat: add v2 migration gates, provider throttling, and parity docs
caviri Feb 24, 2026
63d5edf
chore: update .gitignore to include .logfire
caviri Feb 25, 2026
10d34d1
chore: clean up .gitignore by removing unnecessary entries
caviri Feb 25, 2026
1e5c164
feat(v2): introduce cache-bypass controls and update documentation fo…
caviri Feb 25, 2026
0e6296f
feat(v2): restrict GitHub repository-mode traversal to direct entitie…
caviri Feb 25, 2026
ab38032
feat(v2): add support for optional GitHub author inclusion and enhanc…
caviri Feb 25, 2026
84fa6e5
feat(v2): add typed entity buckets and publication contracts
caviri Feb 25, 2026
84625ee
feat(v2): add deterministic class agents for article links
caviri Feb 25, 2026
6476eae
feat(v2): orchestrate six-class agent stages
caviri Feb 25, 2026
b3f65fa
feat(v2): integrate reconciliation-first strict and SHACL extract gates
caviri Feb 25, 2026
af6ad22
feat(v2): finalize extract output contracts and jsonld build stage
caviri Feb 25, 2026
9d12c63
feat(v2): integrate graph store writes and intermediates APIs
caviri Feb 25, 2026
955f50e
test(v2): tighten extract and graph regression contracts
caviri Feb 25, 2026
463386d
fix(v2): normalize repository creation timestamps for strict extract …
caviri Feb 25, 2026
b654919
v2: enforce uuid4 agent ids and disable synthetic fallbacks by default
caviri Feb 26, 2026
d4346b5
chore: update package dependencies in uv.lock and modify test script …
caviri Feb 26, 2026
4bfdc90
chore(v2): update AGENTS.md to reflect new entry task and adjust Info…
caviri Feb 26, 2026
da352b8
feat(v2): enhance person fanout orchestration to skip GitHub organiza…
caviri Feb 26, 2026
abd7c8c
feat(v2): enhance organization and membership resolution with alterna…
caviri Feb 26, 2026
ea9a199
chore(v2): update AGENTS.md with new entry task and enhance RDF coerc…
caviri Feb 26, 2026
59c9e75
feat(v2): implement JSON-LD context enhancements and organization ide…
caviri Feb 26, 2026
56ba704
Canonicalize pre-resolved v2 entity IDs by idSource
caviri Feb 26, 2026
2f81bd4
Aggregate class entities from stats and relax class empty retries
caviri Feb 26, 2026
51edb7c
Fail fast on required GitHub errors and tighten person/org linking
caviri Feb 26, 2026
e26839b
Promote JSON-LD IRI typing for license citation and org hierarchy
caviri Feb 26, 2026
d7ca8b4
Improve org alias resolution and harden org-account person filtering
caviri Feb 26, 2026
00d6775
Reduce non-actionable warning noise in person and class agents
caviri Feb 26, 2026
f3e9fae
Reduce actionable GIMIE warnings via aliasing and author filtering
caviri Feb 27, 2026
b4ae7c9
fix(v2): Enforce ontology parity by removing `schema:alternateName` f…
caviri Feb 27, 2026
b8dc349
v2 article agent: normalize year-only dates and enrich warning context
caviri Feb 27, 2026
500bc63
v2 article warnings: dedupe unresolved-id notice and add match counts
caviri Feb 27, 2026
559f698
feat(v2): Remove v2 dependency on the v1 TTL cache system, eliminatin…
caviri Feb 27, 2026
30df232
feat(v2): Introduce LLM repository agent with runtime selection and t…
caviri Mar 3, 2026
e3007d4
feat(v2): Add logging configuration to LLM repository agent for impro…
caviri Mar 3, 2026
e762af4
feat(v2): Enhance testing workflows and documentation. Add `pytest-te…
caviri Mar 3, 2026
acd44a5
feat(v2): Introduce LLMPersonAgentV2 for person metadata extraction. …
caviri Mar 4, 2026
6cbfe4d
refactor(justfile): Update linting and type-check commands to run wit…
caviri Mar 4, 2026
f47b34f
chore: Remove debug cache manager database file to clean up temporary…
caviri Mar 4, 2026
537570b
feat(v2): Implement runtime prompt context propagation and enforce pe…
caviri Mar 4, 2026
ca719c5
chore: Remove the RISKS.md file, which contained a risk register for …
caviri Mar 4, 2026
122fa0a
feat(v2): Introduce LLMOrganizationAgentV2 for organization metadata …
caviri Mar 4, 2026
acedc6c
refactor(v2): Remove `schema:alternateName` from organization handlin…
caviri Mar 4, 2026
ff83160
feat(v2): Introduce LLMArticleAgentV2, LLMContributionAgentV2, and LL…
caviri Mar 4, 2026
941ffc9
feat(v2): Introduce LLMLinkVeracityAgentV2 for link relationship veri…
caviri Mar 4, 2026
a8e8765
feat(v2): Enhance entity reconciliation and validation processes. Imp…
caviri Mar 4, 2026
152e0e9
feat(v2): Enhance API and pipeline with link veracity verification an…
caviri Mar 5, 2026
47bc3c9
fix(tests): Update expected triples count in GitHub repository extrac…
caviri Mar 5, 2026
4039e41
feat(v2): Enhance organization identity reconciliation and prompt con…
caviri Mar 5, 2026
1d1d44c
feat(v2): Introduce LLM deduplication and critic stages in extraction…
caviri Mar 5, 2026
08ea271
feat(v2): Enhance LLM extraction pipeline with new tools and link val…
caviri Mar 6, 2026
7661c2b
feat(v2): Add context summary feature to LLM extraction pipeline. Int…
caviri Mar 6, 2026
b7d97de
fix: Update base URL for OpenAI-compatible model configurations to co…
caviri Mar 6, 2026
6c7829d
feat(v2): Add prototype body-based extract endpoint and related enhan…
caviri Mar 6, 2026
641050e
refactor: Improve error handling and logging in GIMIE analysis. Enhan…
caviri Mar 25, 2026
151d2f6
feat(devcontainer): Add SSH feature and password setup script. Update…
caviri Mar 25, 2026
e5547c2
chore(env): Update .env.example to include optional devcontainer SSH …
caviri Mar 25, 2026
89989da
feat(devcontainer): Introduce docker-compose setup for development en…
caviri Mar 25, 2026
03a3e27
feat(devcontainer): Update environment configuration for caching and …
caviri Mar 25, 2026
8ac08d9
feat(devcontainer): Enhance .env.example and docker-compose.yml with …
caviri Mar 25, 2026
5357a67
chore: Update .env.example and .gitignore for improved development se…
caviri Apr 29, 2026
0116416
feat(refactoring): Introduce new agents and modules for EPFL relation…
caviri Apr 29, 2026
34294a7
refactor(v2): Remove deprecated compatibility layer and legacy import…
caviri Apr 29, 2026
629c42e
refactor(simplifying v2): Replace JSONLDExporter with load_jsonld_con…
caviri Apr 29, 2026
4a66f0d
refactor(v2): Remove Logfire integration and related observability co…
caviri Apr 29, 2026
2c9fd90
refactor(v2): Remove obsolete scripts and testing utilities. Delete c…
caviri Apr 29, 2026
997a691
feat(v2 post endpoint): Enhance environment configuration and API doc…
caviri Apr 30, 2026
ff9139e
feat(org relationships): Implement LLM-based organization hierarchy d…
caviri Apr 30, 2026
1b7b467
feat(api): Add link veracity and max concurrent agents configuration.…
caviri Apr 30, 2026
f8f7c3c
feat(ownership check): Implement guarantee_repo_author function to ha…
caviri Apr 30, 2026
6ec99ee
feat(devcontainer): Add Qdrant service to Docker Compose configuratio…
caviri May 1, 2026
cbba58e
feat(rag system and docker compose): Remove obsolete .env.dist file a…
caviri May 1, 2026
cfd7656
feat(RAGs): Enhance .env.example with detailed settings for SWISSUbas…
caviri May 3, 2026
9777306
feat(auth): Implement bearer token authentication for all `/v1/*` rou…
caviri May 4, 2026
e53c63b
feat(env and documentation): Update .env.example to remove deprecated…
caviri May 5, 2026
ce4eb7b
feat(github integration): Enhance GitHub provider with retry logic fo…
caviri May 6, 2026
b8250e2
Merge pull request #28 from Imaging-Plaza/feat/open-pulse-ontology-v2…
caviri May 6, 2026
7ec4727
Update pyproject.toml
caviri May 6, 2026
e872885
feat(disciplines): Wikidata QID backfill, RAG score threshold, v2 cac…
caviri May 17, 2026
ce21df5
feat(v2 indices): async POST /v2/indices/{zenodo,huggingface}/ingest …
caviri May 17, 2026
7a91f29
feat(v2 indices): per-item ingest for github / openalex / orcid / ren…
caviri May 18, 2026
635645a
feat(v2 indices): uniform POST /v2/indices/{name}/search across the 8…
caviri May 18, 2026
8b67af3
feat(v2 indices): Open Access Monitor CH index — module + ingest/sear…
caviri May 18, 2026
4e04b09
feat(v2 agents): wire OAM-CH RAG provider as LLM/hybrid agent tool
caviri May 18, 2026
f0a7cd7
fix(v2 org agent): ROR country bias against acronym collisions (SDSC …
caviri May 18, 2026
6700f6f
fix(v2 pipeline): drop Contribution/Membership entities with missing …
caviri May 18, 2026
7dd7698
fix(v2 + v1): unblock v2/extract behind comma-list GITHUB_TOKEN + adv…
caviri May 18, 2026
dc2d64e
perf(v2 pipeline): parallel agents + LLM iteration cap + tighter arti…
caviri May 18, 2026
da24a0e
Merge pull request #39 from Imaging-Plaza/feat/v2-pipeline-perf
caviri May 18, 2026
9c1b701
feat(v2 article agent): deterministic OAM prefetch before LLM call
caviri May 18, 2026
8b07d46
ci(justfile): use PATH-resolved `python` in v2-models-check
caviri May 18, 2026
bb0edec
ci(tests): set GITHUB_TOKEN/API_TOKEN module-level defaults in v2 con…
caviri May 18, 2026
41f7783
ci(ontology): track *.ttl source-of-truth so CI can resolve v2.1.2 shape
caviri May 18, 2026
cc3df92
ci(ontology): track a-001/ test fixtures so test_roundtrip.py can run…
caviri May 18, 2026
bedec91
fix(tests): pass target_person/target_repository to LLM contribution …
caviri May 18, 2026
d6cd0e5
fix(tests): send Bearer auth header in v2 router/golden async clients
caviri May 18, 2026
a993e6b
test(v2): retire stale tests + regenerate goldens + match current API…
caviri May 18, 2026
d0783b9
Merge pull request #37 from Imaging-Plaza/feat/oamonitor-index
caviri May 18, 2026
9116fa6
feat(v2 scout): per-agent usage_limits override + TODO for general co…
caviri May 19, 2026
f7e7f9e
fix(infra): disable gunicorn worker recycle by default (long-poll inc…
caviri May 19, 2026
cfd3701
fix(v2 repo agents): emit pulse:isForkOf from GitHub metadata.parent.…
caviri May 19, 2026
aadc465
fix(v2 membership agent): emit-or-skip rules + repo-owner country prior
caviri May 19, 2026
36fd57f
fix(v2 repo agents): drop pulse:discipline catch-all when no domain s…
caviri May 19, 2026
b58736f
fix(v2): dedup body merge + composite-id `__` separator + drop empty …
caviri May 19, 2026
9aadfa7
fix(v2): URN collision-proof fallback + owner Contribution salvage + …
caviri May 19, 2026
071f05e
fix(v2 output_assembly): link ScholarlyArticles back to root repo via…
caviri May 19, 2026
b2f5829
fix(v2 article validation): normalise schema:identifier to canonical …
caviri May 19, 2026
1fe2506
feat(v2 api): auto-ingest public repos into GitHub RAG after extract
caviri May 19, 2026
7245aca
fix(v2): runtime composite separator + dual-org inverse + LLM person …
caviri May 19, 2026
97285c1
fix(v2): LLM person self-loop (dict form) + spurious-membership code …
caviri May 19, 2026
b3101f8
fix(v2 ownership): synthesized Person stubs no longer emit schema:url…
caviri May 19, 2026
2ff9b6f
fix(v2 output_assembly): final-pass schema:url self-loop sweep
caviri May 19, 2026
4fe5ba9
Merge pull request #40 from Imaging-Plaza/feat/scout-usage-limits-bump
caviri May 19, 2026
2c6e8b1
feat(v2 runtime): V2_LLM_MODEL_OVERRIDE to swap the model across all …
caviri May 20, 2026
3c7465f
fix(v2): demote GitHub-derived props to org units + RCP_TOKEN round-r…
caviri May 20, 2026
17ff486
fix(v2 reconciliation): merge duplicate memberships instead of droppi…
caviri May 20, 2026
3ee2451
fix(v2): restore pulse:isForkOf for fork repos (#B)
caviri May 20, 2026
88d9c3a
fix(v2 ownership): synthesize github unit when ROR carries handle wit…
caviri May 20, 2026
2b8d667
fix(v2): drop evidence-free Memberships + orphan Orgs (#A determinism)
caviri May 20, 2026
5bc7733
fix(v2 person agent): require ORCID or name overlap to accept Infosci…
caviri May 20, 2026
bb737ba
fix(v2 org agent): require name-token overlap on Infoscience + ROR ma…
caviri May 20, 2026
4f351ae
fix(v2 article agent): drop unresolved schema:sourceOrganization
caviri May 20, 2026
6e51f81
feat(v2): infer schema:sourceOrganization from author memberships by …
caviri May 20, 2026
6d02b3a
feat(v2): deterministic RAG-based discipline tagger for rule_based
caviri May 20, 2026
2cd4ca7
fix(v2 disciplines): strip markdown + use GitHub description in RAG q…
caviri May 20, 2026
e990846
fix(v2 enums): align DisciplineV2.INFORMATION_ENGINEERING with ontolo…
caviri May 20, 2026
200493e
fix(v2 disciplines): collect granular + ancestor chain instead of fir…
caviri May 20, 2026
7003950
perf(v2 context): light README clean once at the source
caviri May 20, 2026
86b62c0
fix(v2 jobs): heartbeat + stale-job detection so dead workers don't w…
caviri May 20, 2026
a343bc8
feat(v2 repository): surface README description + topics as internal …
caviri May 20, 2026
154d0b8
feat(v2 hybrid): additive discovery refiner — propose Persons/Orgs/Ar…
caviri May 20, 2026
a36e728
fix(v2 reconciliation): keep evidence-thin Memberships with ORCID+ROR…
caviri May 20, 2026
cde79fe
fix(v2 reconciliation): broaden ORCID+ROR anchor to any registered or…
caviri May 20, 2026
245b62f
fix(v2 reconciliation): surface dropped affiliation evidence (person+…
caviri May 20, 2026
9b8fcd2
feat(v2 hybrid): rescue refiner — LLM-judged re-instatement of droppe…
caviri May 21, 2026
b22662a
feat(v2 hybrid): feed repo-root attribution files to rescue + discove…
caviri May 21, 2026
44adcd1
feat(v2 ingest): also fetch `.github/` aux files (CODEOWNERS, FUNDING…
caviri May 21, 2026
75b8d10
fix(v2 ingest): fetch the real README via REST + bump discovery timeout
caviri May 21, 2026
dcc0286
feat(v2 hybrid): cross-identifier dedup + lab-Org reclassification in…
caviri May 21, 2026
2066f1d
feat(communities index): new DuckDB-backed index for institutional Ze…
caviri May 22, 2026
784c789
feat(zenodo): link records to communities + ingest heartbeats + lock …
caviri May 22, 2026
be850c8
feat(infoscience): add acronym + organisation metadata columns
caviri May 22, 2026
cab6292
feat(v2 hybrid): org_resolver refiner, internal metadata fields, GitH…
caviri May 22, 2026
23442e5
feat(huggingface): expand Swiss org seed + CERN discovery tokens
caviri May 22, 2026
acaf374
feat(v2 terminal-agent): experimental terminal-agent runtime PoC
caviri May 22, 2026
48c17af
fix(v2): green the v2-ci-gates test suite
caviri May 22, 2026
462b29a
docs: communities index page + zenodo scope refresh
caviri May 22, 2026
cc7e05e
fix(v2): regenerate agent models after schema patternProperties change
caviri May 22, 2026
eeaf7b4
fix(ci): point v1 parity step at the relocated tests/v1/ paths
caviri May 22, 2026
eb5f813
fix(ci): provide GITHUB_TOKEN to the v1 parity regression step
caviri May 22, 2026
437dbc7
fix(v1 tests): make root-response parity check version-agnostic
caviri May 22, 2026
cad57ed
feat(huggingface): promote 3 verified Swiss/EPFL orgs from discovery …
caviri May 22, 2026
e4dc139
Merge pull request #41 from Imaging-Plaza/feat/scout-usage-limits-bump
caviri May 22, 2026
642d292
feat(v2 api): document include_internal_fields on POST + apply flag t…
caviri May 22, 2026
0e12be9
fix(v2): honour V2_EXPAND_OWNED_REPOS in context gather, not just the…
caviri May 22, 2026
f46b811
Merge pull request #43 from Imaging-Plaza/fix/v2-owned-repo-fanout-an…
caviri May 22, 2026
4175a81
fix(image): copy config/ into the deployment image
caviri May 22, 2026
9467985
Merge pull request #45 from Imaging-Plaza/fix/image-copy-config-dir
caviri May 22, 2026
116e8e7
fix(v2 cache): report the real deleted count from ProviderCache.clear()
caviri May 22, 2026
94e4fb0
Merge pull request #46 from Imaging-Plaza/fix/provider-cache-clear-count
caviri May 22, 2026
7d71686
fix(zenodo): stop orphaning communities referenced by record_communities
caviri May 22, 2026
415b147
Merge pull request #47 from Imaging-Plaza/fix/zenodo-orphan-communities
caviri May 22, 2026
5cfd15d
feat(zenodo): add `backfill-communities` CLI subcommand
caviri May 22, 2026
5e3272b
Merge pull request #48 from Imaging-Plaza/fix/zenodo-orphan-communities
caviri May 22, 2026
ed196eb
feat(v2): emit internal fields under the gme-internal RDF namespace
caviri May 22, 2026
b7f7023
Merge pull request #49 from Imaging-Plaza/feat/gme-internal-namespace
caviri May 22, 2026
a40f0ed
feat(v2): resolve ROR parents with an LLM selector agent
caviri May 22, 2026
44bc542
feat(v2): discover ORCID iDs for Infoscience-anchored persons
caviri May 22, 2026
e2da344
test(v2): guard pulse:owns preservation in hybrid runtime
caviri May 22, 2026
c658a36
feat(v2): feed the GitHub profile README to the org/person agents
caviri May 22, 2026
607595e
Merge pull request #50 from Imaging-Plaza/fix/users-orgs-extractor
caviri May 22, 2026
6f20122
chore: rename GITHUB_TOKEN env var to GME_GITHUB_TOKEN
caviri May 27, 2026
298388a
Merge pull request #57 from Imaging-Plaza/chore/rename-github-token-e…
caviri May 27, 2026
4cb8bb5
fix(v1 parsers): rotate across the GitHub PAT pool per request (#56)
caviri May 27, 2026
e6685b9
Merge pull request #58 from Imaging-Plaza/fix/v1-parsers-token-pool-r…
caviri May 27, 2026
7a0b8ef
fix(gimie wrapper): collapse the PAT pool to a single GITHUB_TOKEN be…
caviri May 27, 2026
305f9f9
Merge pull request #59 from Imaging-Plaza/fix/gimie-legacy-github-tok…
caviri May 27, 2026
d8c1cc8
fix(v2 auto-ingest): construct GitHubClient with keyword args (closes…
caviri May 27, 2026
ca8f917
feat(v2 indices): GET /v2/indices/{provider}/stats — read-only catalo…
caviri May 27, 2026
cd0dd0f
Merge pull request #60 from Imaging-Plaza/fix/github-auto-ingest-clie…
caviri May 27, 2026
4626f0a
Merge pull request #61 from Imaging-Plaza/feat/v2-indices-stats-endpoint
caviri May 27, 2026
650804d
test(v2 pipeline): lock in #29–#36 data-quality fixes with regression…
caviri May 27, 2026
ddd003c
fix(v1 endpoints): reject non-github URLs at the boundary (closes #11…
caviri May 27, 2026
dbdef5f
Merge pull request #62 from Imaging-Plaza/test/v2-pipeline-data-quali…
caviri May 27, 2026
273f3c8
Merge pull request #63 from Imaging-Plaza/fix/v1-reject-non-github-urls
caviri May 27, 2026
ac81855
chore(index): drop dead `chunks` DuckDB table + delete orphan Qdrant …
caviri May 27, 2026
2847cb5
fix(tests): openalex config test reads correct YAML default + isolate…
caviri May 27, 2026
c29929c
feat(v2 indices): stats + search + compact coverage for CLI-managed c…
caviri May 27, 2026
7605f29
feat(v2 indices): freshness sentinel + communities lexical search
caviri May 27, 2026
e767502
chore(communities): canonical IRI as `community_id` (drop `zenodo:<sl…
caviri May 27, 2026
3749cf7
feat(zenodo): canonical IRI for `zenodo_id` + `community_id` (+ link …
caviri May 27, 2026
410ad3d
feat(zenodo): promote stats + version + timestamps to first-class col…
caviri May 27, 2026
578b30c
feat(zenodo): canonical doi.org URL for `doi` + `concept_doi`
caviri May 27, 2026
d6246fa
fix(zenodo): strip description HTML iteratively + re-clean existing rows
caviri May 27, 2026
e18a256
feat(huggingface): canonical IRI for orgs / models / datasets / space…
caviri May 27, 2026
b85c9c6
feat(huggingface): citation surface — arXiv DOIs for models + dataset…
caviri May 27, 2026
d64a9fc
feat(catalogs): canonical doi.org URL for the four remaining catalogs
caviri May 27, 2026
f3e75c6
fix(snsf, ror): write canonical DOI at ingest time too
caviri May 27, 2026
72ad267
feat(v2 pipeline): resolve_company_to_ror stage — stamp schema:affili…
caviri May 27, 2026
2d74c0b
Merge pull request #64 from Imaging-Plaza/chore/remove-orphan-qdrant-…
caviri May 27, 2026
f4cc4b3
Merge pull request #65 from Imaging-Plaza/fix/openalex-test-config-en…
caviri May 27, 2026
2e9d9e2
Merge pull request #66 from Imaging-Plaza/feat/v2-indices-stats-searc…
caviri May 27, 2026
1b4045b
Merge pull request #68 from Imaging-Plaza/fix/communities-canonical-i…
caviri May 27, 2026
f65ea17
Merge pull request #69 from Imaging-Plaza/feat/zenodo-canonical-iri-ids
caviri May 27, 2026
2624bf7
Merge pull request #70 from Imaging-Plaza/feat/zenodo-stats-and-versi…
caviri May 27, 2026
b48a240
Merge pull request #71 from Imaging-Plaza/feat/zenodo-doi-urls
caviri May 27, 2026
e58408a
Merge pull request #72 from Imaging-Plaza/fix/zenodo-description-html…
caviri May 27, 2026
c30eda9
Merge pull request #73 from Imaging-Plaza/feat/hf-canonical-iri-ids
caviri May 27, 2026
3ced328
Merge pull request #75 from Imaging-Plaza/feat/all-catalogs-doi-urls
caviri May 27, 2026
7905a37
Merge pull request #76 from Imaging-Plaza/feat/v2-resolve-company-to-ror
caviri May 27, 2026
28fa819
Merge pull request #67 from Imaging-Plaza/feat/v2-indices-freshness-a…
caviri May 27, 2026
b919c3e
Merge pull request #74 from Imaging-Plaza/feat/hf-citations-arxiv-doi
caviri May 27, 2026
2171473
fix(resolve_company_to_ror): read `_company` (pipeline shape) too
caviri May 27, 2026
43c5237
feat(v2 pipeline): resolve_bio_to_ror stage — backstop affiliation fr…
caviri May 27, 2026
1595e75
Merge pull request #77 from Imaging-Plaza/feat/v2-resolve-bio-to-ror
caviri May 27, 2026
b07e2db
feat(resolve_bio_to_ror): also resolve via institutional email domain
caviri May 27, 2026
f73c4ba
feat(v2 pipeline): resolve_bio_to_ror_llm — LLM agent for the long tail
caviri May 27, 2026
1e2b14b
Merge pull request #78 from Imaging-Plaza/feat/v2-resolve-bio-to-ror-llm
caviri May 27, 2026
2dd0c35
feat(organization_agent): surface six unused GitHub-org REST fields
caviri May 27, 2026
a2a6b13
feat(repository): surface CITATION.cff / AUTHORS / CONTRIBUTING / pub…
caviri May 27, 2026
4289b23
feat(repository): parse publiccode.yml into typed `_publiccode` field
caviri May 27, 2026
0c646cd
Merge pull request #80 from Imaging-Plaza/feat/v2-org-internal-fields
caviri May 27, 2026
f2bbd92
Merge pull request #79 from Imaging-Plaza/feat/v2-repo-aux-file-paths
caviri May 27, 2026
4068fc4
Merge pull request #81 from Imaging-Plaza/feat/v2-publiccode-parser
caviri May 27, 2026
da80edc
docs(env.example): document the bio-resolver pipeline flags
caviri May 27, 2026
84b4de1
feat(jsonld): publiccode: namespace so _publiccode is uploadable as RDF
caviri May 27, 2026
a6b9be5
Merge pull request #82 from Imaging-Plaza/chore/env-example-resolver-…
caviri May 27, 2026
8204b7c
Merge pull request #83 from Imaging-Plaza/feat/v2-publiccode-rdf-name…
caviri May 27, 2026
d39d889
fix(index stores): wrap CTAS-swap in a transaction so DROP+RENAME is …
caviri May 27, 2026
3fa8d72
fix(resolver stages): materialise Membership + Organization (not sche…
caviri May 28, 2026
55d4c6b
Merge pull request #84 from Imaging-Plaza/fix/index-ctas-swap-atomic
caviri May 28, 2026
c9d0e15
Merge pull request #85 from Imaging-Plaza/fix/affiliation-as-membersh…
caviri May 28, 2026
a1ed55a
docs(v2 pipeline): overview doc with node-graph, assumptions, affilia…
caviri May 28, 2026
590257e
docs(readme): rewrite for v2-first + add production-use callout
caviri May 28, 2026
dc38378
chore(release): cut v2.1.0rc1
caviri May 28, 2026
830dea2
feat(v2 pipeline): resolve_placeholder_orgs_to_ror — rewrite urn:puls…
caviri May 28, 2026
d4ce9cb
Merge pull request #87 from Imaging-Plaza/docs/v2-pipeline-doc
caviri May 28, 2026
4acc978
Merge pull request #88 from Imaging-Plaza/feat/v2-resolve-placeholder…
caviri May 28, 2026
d1ef375
feat(repository): surface SECURITY.md URL pointer + LLM context excerpt
caviri May 28, 2026
54d57e2
feat(repository): parse CITATION.cff into typed `_citation_cff` field
caviri May 28, 2026
2c0ed50
refactor(canonicalization): consolidate 10 _normalize_orcid impls int…
caviri May 28, 2026
77a031d
Merge pull request #89 from Imaging-Plaza/feat/v2-orcid-url-canonical…
caviri May 28, 2026
8853574
Merge pull request #90 from Imaging-Plaza/feat/v2-citation-cff-parser
caviri May 28, 2026
cbef7ca
Merge pull request #91 from Imaging-Plaza/feat/v2-security-md-interna…
caviri May 28, 2026
aac3559
Merge pull request #92 from Imaging-Plaza/release/2.1.0rc1
caviri May 28, 2026
18 changes: 18 additions & 0 deletions .devcontainer/.env.example
@@ -0,0 +1,18 @@
# Copy to `.devcontainer/.env` for docker-compose variable substitution.
# Compose reads this file from the `.devcontainer/` directory (not repo-root `.env` for these keys).
#
# Host port mappings (optional):
# SSH_PORT=2220
# APP_PORT=1234
# DEV_PORT=8888
#
# DNS inside the container (optional; defaults are 1.1.1.1 + 8.8.8.8 in docker-compose.yml).
# Use your corporate resolvers if public DNS is blocked:
# DEVCONTAINER_DNS_1=10.0.0.1
# DEVCONTAINER_DNS_2=10.0.0.2
#
# Selenium (standalone Firefox service in docker-compose.yml). Override URL if you use an external grid:
# SELENIUM_REMOTE_URL=http://selenium-standalone-firefox:4444
# Host ports if 4444 / 7900 are already in use:
# SELENIUM_GRID_PORT=4445
# SELENIUM_VNC_PORT=7901
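The host-port overrides above are meant for when 2222, 4444, 7900 or the other defaults are already taken on the host. Below is a minimal Python sketch of how one might check that before bringing the stack up. The helper names `host_port_is_free` and `pick_port` are hypothetical and not part of this repository. A successful test bind is only a hint, because another process can still claim the port before Docker publishes it.

```python
import socket


def host_port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Hypothetical helper: True if a TCP bind to (host, port) succeeds right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False  # port already held (or binding not permitted)
    return True


def pick_port(preferred: int, fallback: int) -> int:
    """Hypothetical helper: keep the Compose default unless it is taken.

    The fallback itself is not checked; this is only a sketch.
    """
    return preferred if host_port_is_free(preferred) else fallback


if __name__ == "__main__":
    # Compose defaults from docker-compose.yml paired with the example values in .env.example.
    for var, default, alt in [
        ("SSH_PORT", 2222, 2220),
        ("SELENIUM_GRID_PORT", 4444, 4445),
        ("SELENIUM_VNC_PORT", 7900, 7901),
    ]:
        port = pick_port(default, alt)
        if port != default:
            print(f"{var}={port}  # add to .devcontainer/.env")
```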
11 changes: 2 additions & 9 deletions .devcontainer/Dockerfile
@@ -1,10 +1,8 @@
FROM ghcr.io/astral-sh/uv:python3.12-bookworm

# Set locale to avoid warnings
ENV LC_ALL=C.UTF-8
ENV LANG=C.UTF-8

# Install just and other system dependencies
RUN apt-get update && apt-get install -y \
sudo \
curl \
@@ -13,16 +11,11 @@ RUN apt-get update && apt-get install -y \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*

# Create a non-root user with the UID/GID that VS Code usually uses (1000:1000).
# TODO: Take this user out of sudoers if you want to use this in fully agentic mode.
RUN useradd -ms /bin/bash -u 1000 vscode \
&& apt-get update && apt-get install -y sudo \
&& apt-get update \
&& apt-get install -y sudo \
&& echo "vscode ALL=(ALL) NOPASSWD:ALL" >> /etc/sudoers

# Gemini CLI
# Please login outside of the container and copy your credentials to ~/.gemini/...
RUN curl -fsSL https://deb.nodesource.com/setup_24.x | sudo -E bash - && sudo apt-get install -y nodejs
RUN npm install -g @google/gemini-cli

RUN mkdir -p /app/data \
&& chown -R 1000:1000 /app/data \
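The TODO above suggests removing the `vscode` user from sudoers before running the container in a fully agentic mode. One way to confirm that a hardened image no longer grants passwordless root is a small audit over the sudoers text. This Python sketch assumes simple one-line rules like the `vscode ALL=(ALL) NOPASSWD:ALL` line this Dockerfile appends; the function name is hypothetical, and real sudoers files can use aliases, `@includedir` and line continuations that it does not parse. For an authoritative answer, `sudo -l -U vscode` run as root inside the container is the better check.

```python
import re

# Simplified shape of a one-line sudoers rule, e.g. "vscode ALL=(ALL) NOPASSWD:ALL":
# user, host list, optional "(runas)", optional NOPASSWD: tag, command list.
RULE = re.compile(
    r"^(?P<user>\S+)\s+(?P<hosts>[^\s=]+)\s*=\s*"
    r"(?:\((?P<runas>[^)]*)\)\s*)?"
    r"(?P<nopasswd>NOPASSWD:)?\s*(?P<cmds>.+)$"
)


def passwordless_full_sudo(sudoers_text: str) -> list[str]:
    """Hypothetical audit: users granted NOPASSWD on ALL commands by a simple rule."""
    users = []
    for raw in sudoers_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines, comments and #include directives are skipped
        match = RULE.match(line)
        if match and match.group("nopasswd") and match.group("cmds").strip() == "ALL":
            users.append(match.group("user"))
    return users
```

Run against the current image's `/etc/sudoers`, this would report `["vscode"]`; after acting on the TODO it should report an empty list.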
28 changes: 14 additions & 14 deletions .devcontainer/devcontainer.json
@@ -1,16 +1,18 @@
 {
 "name": "git-metadata-extractor-dev",
-"build": {
-"dockerfile": "Dockerfile"
+"dockerComposeFile": "docker-compose.yml",
+"service": "devcontainer",
+"workspaceFolder": "/workspaces/project",
+"containerEnv": {
+"UV_CACHE_DIR": "/workspaces/project/.uv-cache"
+},
+"overrideCommand": false,
+"features": {
+"ghcr.io/devcontainers/features/sshd:1": {
+"version": "latest"
+}
 },
-"runArgs": [
-"--env-file",
-"${localWorkspaceFolder}/.env",
-"--network",
-"dev"
-],
 "remoteUser": "vscode",
-"workspaceFolder": "/workspaces/${localWorkspaceFolderBasename}",
 "customizations": {
 "vscode": {
 "settings": {
@@ -26,8 +28,6 @@
 ]
 }
 },
-"forwardPorts": [
-1234
-],
-"postCreateCommand": "rm -rf .venv && uv venv && uv pip install -e .[dev] && echo '. $PWD/.venv/bin/activate' >> /home/vscode/.bashrc"
-}
+"postCreateCommand": "mkdir -p .uv-cache && rm -rf .venv && uv venv && uv pip install -e .[dev] && echo '. $PWD/.venv/bin/activate' >> /home/vscode/.bashrc",
+"postStartCommand": "bash .devcontainer/set-vscode-password.sh"
+}
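The reworked devcontainer.json switches from a Dockerfile build to Docker Compose, which is why `build`, `runArgs` and `forwardPorts` go away: the image, network and env file now live in docker-compose.yml, and Compose publishes the ports. Below is a minimal Python sketch of a lint check for that shape. The key list reflects properties the Dev Container specification documents for image/Dockerfile-based configurations, and the function names are hypothetical. It uses the standard `json` module, so it assumes the file contains no JSONC comments, which holds for the version in this diff.

```python
import json

# Properties the Dev Container spec lists for image/Dockerfile-based configs;
# with "dockerComposeFile" the container is defined in the Compose file instead.
IMAGE_ONLY_KEYS = ("image", "build", "runArgs", "appPort")


def compose_config_problems(config: dict) -> list[str]:
    """Hypothetical lint: list problems in a Compose-based devcontainer.json dict."""
    if "dockerComposeFile" not in config:
        return []  # image/Dockerfile-based config; out of scope for this check
    problems = []
    if not config.get("service"):
        problems.append('"service" is required with "dockerComposeFile"')
    for key in IMAGE_ONLY_KEYS:
        if key in config:
            problems.append(f'"{key}" belongs in docker-compose.yml, not devcontainer.json')
    return problems


def problems_in_file(path: str) -> list[str]:
    """Load a comment-free devcontainer.json from disk and lint it."""
    with open(path, encoding="utf-8") as fh:
        return compose_config_problems(json.load(fh))
```

For the new configuration, `problems_in_file(".devcontainer/devcontainer.json")` should return an empty list; the old `runArgs` block would have been flagged had it been kept alongside `dockerComposeFile`.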
61 changes: 61 additions & 0 deletions .devcontainer/docker-compose.yml
@@ -0,0 +1,61 @@
# Dev container stack. Compose publishes ports on the host (more reliable than
# devcontainer forwardPorts in some setups). Interpolation vars (SSH_PORT, etc.)
# can be set in `.devcontainer/.env` (see `.devcontainer/.env.example`).
#
# Internal SSH: devcontainers `sshd` feature listens on 2222, not 22 — map host:2222.
#
# Explicit DNS: containers on external networks (e.g. `dev`) sometimes get no working resolver
# and `uv pip` fails with "dns error" / "failed to lookup address information".
# Override in `.devcontainer/.env`: DEVCONTAINER_DNS_1 / DEVCONTAINER_DNS_2.
services:
devcontainer:
build:
context: ..
dockerfile: .devcontainer/Dockerfile
dns:
- "${DEVCONTAINER_DNS_1:-1.1.1.1}"
- "${DEVCONTAINER_DNS_2:-8.8.8.8}"
env_file:
- ../.env
environment:
# Avoid ~/.cache/uv (often root-owned after sshd/common-utils); workspace is bind-mounted as vscode.
UV_CACHE_DIR: /workspaces/project/.uv-cache
SELENIUM_REMOTE_URL: ${SELENIUM_REMOTE_URL:-http://gme-selenium-firefox:4444}
ports:
- "${SSH_PORT:-2222}:2222"
- "${APP_PORT:-1234}:1234"
- "${DEV_PORT:-8888}:8888"
volumes:
- ..:/workspaces/project:cached
command: sleep infinity
networks:
- dev
gme-qdrant:
image: qdrant/qdrant:latest
container_name: gme-qdrant
ports:
- "6333:6333"
- "6334:6334"
volumes:
- ../data/qdrant/storage:/qdrant/storage
restart: unless-stopped
networks:
- dev
# README "Option B": multi-session standalone Firefox (ORCID, Selenium-backed tools).
gme-selenium-firefox:
image: selenium/standalone-firefox
container_name: gme-selenium-firefox
ports:
- "${SELENIUM_GRID_PORT:-4444}:4444"
- "${SELENIUM_VNC_PORT:-7900}:7900"
shm_size: "2g"
environment:
SE_NODE_MAX_SESSIONS: "5"
SE_NODE_SESSION_TIMEOUT: "300"
restart: unless-stopped
networks:
- dev

networks:
dev:
external: true
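Compose fills in the `${VAR:-default}` expressions in this file from the shell environment and from the `.env` file that the header comment places in `.devcontainer/`. Below is a minimal Python sketch of those substitution rules, covering only `${VAR}`, `${VAR:-default}` (default when unset or empty) and `${VAR-default}` (default only when unset). It is an illustration, not Compose's parser: it ignores `${VAR:?error}`, `$$` escapes and nested defaults, and `interpolate` is a hypothetical name.

```python
import re

# ${NAME}, ${NAME:-default} or ${NAME-default}; the default may not contain "}".
_VAR = re.compile(
    r"\$\{(?P<name>[A-Za-z_][A-Za-z0-9_]*)(?:(?P<op>:?-)(?P<default>[^}]*))?\}"
)


def interpolate(value: str, env: dict[str, str]) -> str:
    """Substitute variables in `value` following Compose's shell-like rules."""

    def replace(match: re.Match[str]) -> str:
        name, op, default = match.group("name"), match.group("op"), match.group("default")
        current = env.get(name)
        if op == ":-" and not current:
            return default  # unset OR empty -> default
        if op == "-" and current is None:
            return default  # only unset -> default
        return current or ""  # otherwise keep the value; an unset plain ${NAME} becomes ""

    return _VAR.sub(replace, value)
```

With no overrides, `${SELENIUM_REMOTE_URL:-http://gme-selenium-firefox:4444}` resolves to the `gme-selenium-firefox` container name used by the Selenium service. Because the `dev` network is declared `external: true`, Compose will not create it; it has to exist beforehand, for example via `docker network create dev`.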
8 changes: 8 additions & 0 deletions .devcontainer/set-vscode-password.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
#!/usr/bin/env bash
# Apply VSCODE_PASSWORD to user vscode at container start (not baked into the image).
# Set VSCODE_PASSWORD in .env (this repo loads it via devcontainer runArgs --env-file).
set -euo pipefail
if [[ -z "${VSCODE_PASSWORD:-}" ]]; then
exit 0
fi
printf 'vscode:%s\n' "$VSCODE_PASSWORD" | sudo chpasswd
12 changes: 0 additions & 12 deletions .env.dist

This file was deleted.
