Automated curation of tenure-track academic job postings across U.S. R1 research universities.
JobMarketStar uses OpenAI's Codex CLI to search the web for current tenure-track Assistant Professor positions in specific research fields at 187 R1 (Very High Research Activity) universities. The tool automates what would otherwise be a tedious manual process of checking hundreds of university job portals.
- Evolutionary Biology
- Human Genetics
- Computational Biology
- Population Genetics
- Disease Genetics
- Human Evolution
JobMarketStar/
├── data_parasite_codex.py # Generic batch runner for Codex CLI tasks
├── JobMarketStar_codex.yaml # Configuration: prompt template & settings
├── R1_university.csv # List of 187 R1 universities with metadata
├── job_ads_jsonl/ # Output: one JSONL file per university
└── all_job_ads.jsonl # Concatenated job ads for analysis
# Create a virtual environment
uv venv venv
# Activate it
source venv/bin/activate
# Install dependencies
uv pip install -r requirements.txt

# Full run across all universities
python data_parasite_codex.py --config JobMarketStar_codex.yaml
# Test run with a random sample
python data_parasite_codex.py --config JobMarketStar_codex.yaml --sample 5 --seed 42
# Override the model
python data_parasite_codex.py --config JobMarketStar_codex.yaml --model gpt-4o

To combine all university job ads into a single JSONL file:
cat job_ads_jsonl/*.jsonl > all_job_ads.jsonl

This handles empty files gracefully (universities with no current job postings).
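If some per-university files might lack a trailing newline or contain lines that are not valid JSON, the same merge can be done in Python with a validity check. This is a minimal sketch, not part of the repository: the function name merge_jsonl is hypothetical, and it assumes the job_ads_jsonl/ layout described earlier.

```python
import json
from pathlib import Path


def merge_jsonl(src_dir, dest_path):
    """Concatenate every *.jsonl file in src_dir into dest_path.

    Blank lines are skipped and lines that are not valid JSON are
    dropped, so the merged file can be parsed line by line.
    Returns the number of records written.
    """
    written = 0
    with open(dest_path, "w", encoding="utf-8") as out:
        # sorted() gives a stable, reproducible file order
        for path in sorted(Path(src_dir).glob("*.jsonl")):
            for line in path.read_text(encoding="utf-8").splitlines():
                line = line.strip()
                if not line:
                    continue  # empty file or blank line
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    continue  # skip malformed output
                # Re-serialize so each record ends with exactly one newline
                out.write(json.dumps(record, ensure_ascii=False) + "\n")
                written += 1
    return written
```

Calling merge_jsonl("job_ads_jsonl", "all_job_ads.jsonl") from the project root would produce the same combined file as the cat command, minus any blank or unparseable lines.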
To also filter out any blank lines:
cat job_ads_jsonl/*.jsonl | grep -v '^$' > all_job_ads.jsonl

To load the combined file in Python:
import pandas as pd
df = pd.read_json('all_job_ads.jsonl', lines=True)

JobMarketStar is highly customizable. The core search logic is defined in the YAML configuration file (JobMarketStar_codex.yaml); you can modify it, or create new versions of it, to suit different search requirements.
You can ask any coding agent (Cursor Agent, Codex, Claude Code, GitHub Copilot, etc.) to:
- Modify the existing YAML file to search for different position types (e.g., postdoc positions, associate/full professor roles, research scientist positions)
- Change the target research fields to match your interests (e.g., switch from genetics to physics, chemistry, computer science, etc.)
- Adjust search criteria (e.g., add location filters, salary ranges, specific departments)
- Create entirely new YAML files for completely different use cases
- Postdoc Positions: Ask an agent to modify the YAML to search for "Postdoctoral Researcher" or "Postdoctoral Fellow" positions instead of tenure-track roles
- Different Fields: Change the research areas from genetics/evolution to any other field (e.g., "Machine Learning", "Quantum Computing", "Climate Science")
- Different Institutions: Use a different input CSV file with community colleges, industry labs, or international universities
- Different Job Types: Search for staff positions, lecturer roles, or industry positions
The script is fully generic and can be repurposed to search or process any kind of data, as long as you provide a matching CSV file and a suitable YAML configuration. You're not limited to searching for jobs: you can adapt the workflow to find grants, awards, conferences, datasets, or any other information that can be represented in tabular (CSV) form with a coordinated YAML config.
Just update your CSV and YAML files to match your new use case, and the runner (data_parasite_codex.py) will handle the rest.
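To make the CSV-plus-YAML pairing concrete, here is a minimal sketch of how a per-row prompt could be built. It is not the code in data_parasite_codex.py: the config key prompt_template, the CSV column names university and state, and the helper build_prompts are all assumptions for illustration. The config is shown as an already-loaded dict (as yaml.safe_load would return) to keep the example dependency-free.

```python
import csv
import io


def build_prompts(csv_text, config):
    """Return one filled-in prompt per CSV row.

    config["prompt_template"] is a hypothetical key; placeholders such
    as {university} must match the CSV column names.
    """
    template = config["prompt_template"]
    rows = csv.DictReader(io.StringIO(csv_text))
    # Each row is a dict keyed by column name, so it can fill the template
    return [template.format(**row) for row in rows]


# Hypothetical inputs: a two-row CSV and a config dict.
example_csv = (
    "university,state\n"
    "Example University,CA\n"
    "Sample State University,NY\n"
)
example_config = {
    "prompt_template": "Find open postdoc positions at {university} ({state})."
}

for prompt in build_prompts(example_csv, example_config):
    print(prompt)
```

Under these assumptions, switching use cases means changing only the template text and the CSV contents; the loop that pairs them stays the same.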
Each job ad is stored as a JSON object with the following fields:
| Field | Description |
|---|---|
| university | University name |
| city | City location |
| state | State abbreviation |
| url | Direct link to the job posting |
| field | Research area (e.g., "Computational Biology") |
| title | Official job title |
| post_date | Posting date (YYYY-MM-DD or "not_available") |
| deadline | Application deadline or "Open until filled" |
| summary | 1-3 sentence description of the position |
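
A record can be checked against these fields before analysis. The sketch below is an illustration, not part of the repository: the helper validate_job_ad and the sample record are hypothetical, and only the field names and the post_date convention come from the field list above.

```python
from datetime import datetime

# Field names as documented for each job ad record.
REQUIRED_FIELDS = (
    "university", "city", "state", "url", "field",
    "title", "post_date", "deadline", "summary",
)


def validate_job_ad(record):
    """Return a list of problems found in one job ad dict (empty if none)."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS
                if name not in record]
    post_date = record.get("post_date")
    # post_date is either a calendar date or the literal "not_available"
    if post_date is not None and post_date != "not_available":
        try:
            datetime.strptime(post_date, "%Y-%m-%d")
        except (TypeError, ValueError):
            problems.append(f"post_date not YYYY-MM-DD: {post_date!r}")
    return problems


# Hypothetical record for illustration only.
sample = {
    "university": "Example University",
    "city": "Springfield",
    "state": "IL",
    "url": "https://example.edu/jobs/123",
    "field": "Computational Biology",
    "title": "Assistant Professor",
    "post_date": "2025-13-01",  # invalid month on purpose
    "deadline": "Open until filled",
    "summary": "Tenure-track position in computational biology.",
}
print(validate_job_ad(sample))
```

The deadline field is left unchecked here because it is documented as free text (a date or "Open until filled").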
- Python 3.x
- PyYAML (pip install pyyaml)
- Codex CLI with both web search and network access enabled, and with the Playwright MCP extension installed.