GitHub - bits-bytes-nn/scholar-lens: A tool that generates comprehensive and in-depth reviews of AI/ML research papers using generative AI technology. Check out my tech blog: https://bits-bytes-nn.github.io/

👩‍🏫 SCHOLAR-LENS

AI-powered academic paper review assistant that generates comprehensive analyses of arXiv papers through intelligent content extraction, citation analysis, and code repository integration.

✨ Features

AI-Powered Analysis: Uses Amazon Bedrock (Claude models) for multi-stage paper understanding
Multi-Source Processing: Extracts content from arXiv HTML/PDF, analyzes citations, and reviews code repositories
Scalable Infrastructure: AWS Batch for containerized job execution
Intelligent Extraction: Figure analysis, table of contents generation, and citation mapping
Code Integration: GitHub repository analysis with semantic search via FAISS

🏗️ Architecture

Core Components

ArxivHandler (arxiv_handler.py): Paper metadata and content retrieval
Parser (parser.py): HTML/PDF parsing with figure extraction
ContentExtractor (content_extractor.py): Structured content and citation extraction
CodeRetriever (code_retriever.py): Repository cloning and semantic code analysis
CitationSummarizer (citation_summarizer.py): Reference paper analysis
ExplainerGraph (explainer.py): Multi-stage LangGraph workflow for paper synthesis

Infrastructure

AWS Batch: Containerized job execution with ECS
Amazon Bedrock: Claude models for analysis and generation
S3: Paper storage and asset management
SSM Parameter Store: Configuration management

🛠️ Tech Stack

Python 3.12+, AWS CDK, Docker
Amazon Bedrock, LangChain, LangGraph
PyMuPDF, BeautifulSoup4, FAISS
Pydantic validation, YAML configuration

📋 Configuration

Create scholar_lens/configs/config.yaml:

resources:
  project_name: scholar-lens
  stage: dev
  profile_name: your-profile
  default_region_name: ap-northeast-2
  bedrock_region_name: us-west-2
  s3_bucket_name: your-bucket
  email_address: your-email@example.com

paper:
  citation_extraction_model_id: anthropic.claude-sonnet-4-5-20250929-v1:0
  table_of_contents_model_id: anthropic.claude-sonnet-4-5-20250929-v1:0

explanation:
  paper_analysis_model_id: anthropic.claude-sonnet-4-5-20250929-v1:0
  paper_synthesis_model_id: anthropic.claude-sonnet-4-5-20250929-v1:0

🚀 Usage

Infrastructure Deployment

# Deploy infrastructure
python scripts/deploy_infra.py

Development

# Install dependencies
poetry install

# Set up environment
cp .env.template .env
# Edit .env with your configuration

# Run locally
python scholar_lens/main.py --arxiv-id 2401.04088v1

# Submit batch job
python scripts/run_batch.py --arxiv-id 2401.04088v1 --repo-urls https://github.com/example/repo  --parse-pdf True

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
assets		assets
scholar_lens		scholar_lens
scripts		scripts
tests		tests
.dockerignore		.dockerignore
.env.template		.env.template
.gitignore		.gitignore
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
cdk.json		cdk.json
poetry.lock		poetry.lock
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

👩‍🏫 SCHOLAR-LENS

✨ Features

🏗️ Architecture

Core Components

Infrastructure

🛠️ Tech Stack

📋 Configuration

🚀 Usage

Infrastructure Deployment

Development

About

Uh oh!

Releases

Packages

Languages

License

bits-bytes-nn/scholar-lens

Folders and files

Latest commit

History

Repository files navigation

👩‍🏫 SCHOLAR-LENS

✨ Features

🏗️ Architecture

Core Components

Infrastructure

🛠️ Tech Stack

📋 Configuration

🚀 Usage

Infrastructure Deployment

Development

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages