Turn Codex into a job-ad curation agent (or a general-purpose curation agent, if the config file is edited accordingly).

mengysun/JobMarketStar

JobMarketStar

Automated curation of tenure-track academic job postings across U.S. R1 research universities.

Overview

JobMarketStar uses OpenAI's Codex CLI to search the web for current tenure-track Assistant Professor positions in specific research fields at 187 R1 (Very High Research Activity) universities. The tool automates what would otherwise be a tedious manual process of checking hundreds of university job portals.

Target Research Fields

  • Evolutionary Biology
  • Human Genetics
  • Computational Biology
  • Population Genetics
  • Disease Genetics
  • Human Evolution

Project Structure

JobMarketStar/
├── data_parasite_codex.py   # Generic batch runner for Codex CLI tasks
├── JobMarketStar_codex.yaml # Configuration: prompt template & settings
├── R1_university.csv        # List of 187 R1 universities with metadata
├── job_ads_jsonl/           # Output: one JSONL file per university
└── all_job_ads.jsonl        # Concatenated job ads for analysis

Setup

Create Virtual Environment with uv

# Create a virtual environment
uv venv venv

# Activate it
source venv/bin/activate

# Install dependencies
uv pip install -r requirements.txt

Usage

Running the Job Search

# Full run across all universities
python data_parasite_codex.py --config JobMarketStar_codex.yaml

# Test run with a random sample
python data_parasite_codex.py --config JobMarketStar_codex.yaml --sample 5 --seed 42

# Override the model
python data_parasite_codex.py --config JobMarketStar_codex.yaml --model gpt-4o
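
How the runner implements --sample and --seed is not shown in this README. As a rough sketch, assuming the flags select a fixed-size random subset of the CSV rows, a seeded sample could look like the following. The function name `sample_rows` and the placeholder university names are hypothetical.

```python
import random


def sample_rows(rows, k, seed=None):
    """Return up to k rows chosen without replacement.

    The same seed and the same input order give the same selection.
    """
    rng = random.Random(seed)  # private generator; global random state is untouched
    k = min(k, len(rows))      # avoid ValueError when k exceeds the row count
    return rng.sample(rows, k)


# Placeholder data standing in for rows loaded from R1_university.csv.
universities = [f"University {i}" for i in range(187)]

first = sample_rows(universities, 5, seed=42)
second = sample_rows(universities, 5, seed=42)
print(first == second)  # True: a fixed seed makes the test run repeatable
```

A fixed seed is useful for test runs because a prompt change can be compared against the same five universities each time.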

Concatenating JSONL Files

To combine all university job ads into a single JSONL file:

cat job_ads_jsonl/*.jsonl > all_job_ads.jsonl

Empty files (universities with no current job postings) contribute nothing to the output, so they need no special handling.

To also filter out any blank lines:

cat job_ads_jsonl/*.jsonl | grep -v '^$' > all_job_ads.jsonl
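
One caveat: cat joins files byte for byte, so if a file does not end with a newline, its last record is fused with the first record of the next file. A minimal Python sketch that avoids this by rewriting each non-blank line with its own newline is shown below. The function name `concat_jsonl` is hypothetical; only the directory and file names come from this README.

```python
from pathlib import Path


def concat_jsonl(src_dir, dest):
    """Copy every non-blank line from src_dir/*.jsonl into dest.

    Each record is written on its own line, even when a source file
    lacks a trailing newline. Returns the number of records written.
    """
    count = 0
    with open(dest, "w", encoding="utf-8") as out:
        # Sorted order keeps the output stable between runs.
        for path in sorted(Path(src_dir).glob("*.jsonl")):
            with open(path, encoding="utf-8") as f:
                for line in f:
                    line = line.strip()
                    if line:  # skip blank lines and empty files
                        out.write(line + "\n")
                        count += 1
    return count
```

Usage, assuming the layout from the project structure above: `concat_jsonl("job_ads_jsonl", "all_job_ads.jsonl")`.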

Reading in Python

import pandas as pd
df = pd.read_json('all_job_ads.jsonl', lines=True)
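
If pandas is not installed, the same file can be read with the standard library alone. The sketch below loads the records and counts them per research area, using the `field` key described under Output Format. The helper names `load_job_ads` and `count_by_field` are hypothetical.

```python
import json
from collections import Counter


def load_job_ads(path):
    """Read a JSONL file into a list of dicts, skipping blank lines."""
    ads = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                ads.append(json.loads(line))
    return ads


def count_by_field(ads):
    """Count job ads per research area; ads without a 'field' key count as 'unknown'."""
    return Counter(ad.get("field", "unknown") for ad in ads)
```

For example, `count_by_field(load_job_ads("all_job_ads.jsonl")).most_common()` lists the research areas from most to fewest postings.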

🔧 Customization: Adapting JobMarketStar for Your Needs

JobMarketStar is highly customizable. The core search logic is defined in the YAML configuration file (JobMarketStar_codex.yaml), which you can modify directly or copy into new versions to suit different search requirements.

Using AI Coding Agents to Customize

You can ask any coding agent (Cursor Agent, Codex, Claude Code, GitHub Copilot, etc.) to:

  • Modify the existing YAML file to search for different position types (e.g., postdoc positions, associate/full professor roles, research scientist positions)
  • Change the target research fields to match your interests (e.g., switch from genetics to physics, chemistry, computer science, etc.)
  • Adjust search criteria (e.g., add location filters, salary ranges, specific departments)
  • Create entirely new YAML files for completely different use cases

Example Customizations

  • Postdoc Positions: Ask an agent to modify the YAML to search for "Postdoctoral Researcher" or "Postdoctoral Fellow" positions instead of tenure-track roles
  • Different Fields: Change the research areas from genetics/evolution to any other field (e.g., "Machine Learning", "Quantum Computing", "Climate Science")
  • Different Institutions: Use a different input CSV file with community colleges, industry labs, or international universities
  • Different Job Types: Search for staff positions, lecturer roles, or industry positions

Flexible Use Beyond Jobs

The script is generic and can be repurposed to search for or process any kind of data, as long as you provide a matching CSV file and a suitable YAML configuration. You're not limited to job searches: you can adapt the workflow to find grants, awards, conferences, datasets, or any other information that can be represented in tabular (CSV) form with a matching YAML config.

Just update your CSV and YAML files to match your new use case, and the runner (data_parasite_codex.py) will handle the rest.
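
To illustrate the idea of pairing a CSV with a prompt template, here is a minimal sketch. It is not the actual logic of data_parasite_codex.py, and the real YAML schema is not documented in this README: the config keys (`prompt_template`, `position`) and the CSV column names (`university`, `city`, `state`) are all assumptions made for this example. A plain dict stands in for the parsed YAML so the sketch needs only the standard library.

```python
import csv
import io

# Hypothetical config, standing in for a parsed YAML file.
config = {
    "prompt_template": (
        "Find current {position} openings at {university} in {city}, {state}."
    ),
    "position": "tenure-track Assistant Professor",
}


def build_prompts(csv_text, config):
    """Build one prompt per CSV row by filling the template.

    Placeholders are filled from the row first, then from the config,
    so a CSV column can override a config-level default.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    prompts = []
    for row in reader:
        values = {**config, **row}  # row values win on name clashes
        prompts.append(config["prompt_template"].format(**values))
    return prompts


# Made-up input row; the real CSV columns may differ.
csv_text = "university,city,state\nExample University,Springfield,IL\n"
for prompt in build_prompts(csv_text, config):
    print(prompt)
```

Under this design, switching from job ads to grants or conferences would only mean changing the template text and the CSV columns it refers to.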

Output Format

Each job ad is stored as a JSON object with the following fields:

| Field | Description |
|---|---|
| university | University name |
| city | City location |
| state | State abbreviation |
| url | Direct link to the job posting |
| field | Research area (e.g., "Computational Biology") |
| title | Official job title |
| post_date | Posting date (YYYY-MM-DD or "not_available") |
| deadline | Application deadline or "Open until filled" |
| summary | 1-3 sentence description of the position |
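
Because the records are generated by an agent, it can be worth checking them before analysis. The sketch below is one possible check, assuming all nine fields listed above are required in every record, which this README does not state explicitly. The function name `validate_job_ad` is hypothetical.

```python
import re
from datetime import datetime

REQUIRED_FIELDS = (
    "university", "city", "state", "url", "field",
    "title", "post_date", "deadline", "summary",
)


def validate_job_ad(ad):
    """Return a list of problems found in one job ad dict (empty if none)."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in ad]

    post_date = ad.get("post_date")
    if post_date is not None and post_date != "not_available":
        ok = isinstance(post_date, str) and re.fullmatch(r"\d{4}-\d{2}-\d{2}", post_date)
        if ok:
            try:
                # Rejects impossible dates such as 2024-13-01.
                datetime.strptime(post_date, "%Y-%m-%d")
            except ValueError:
                ok = False
        if not ok:
            problems.append(f"bad post_date: {post_date!r}")
    return problems
```

The deadline field is left unchecked here because it may hold free text such as "Open until filled".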

Requirements

  • Python 3.x
  • PyYAML (pip install pyyaml)
  • Codex CLI with both web search and network access enabled, and with the Playwright MCP extension installed.
