OpenBlog Neo

AI-powered blog generation pipeline using Gemini 3 Flash Preview with Google Search grounding.

Features

  • 5-Stage Pipeline: Context → Generation → Quality → URL Verify → Internal Links
  • Google Search Grounding: Real-time web research for accurate, sourced content
  • Parallel Processing: Generate multiple articles simultaneously
  • Multiple Export Formats: HTML, Markdown, JSON, CSV, XLSX, PDF
  • Image Generation: Optional hero, mid, and bottom images via Google Imagen

Architecture

Stage 1 (once per batch)
     ↓
┌────┴────┬─────────┐
▼         ▼         ▼
[Art 1]  [Art 2]  [Art 3]  ← parallel processing
  │         │         │
  ▼         ▼         ▼
Stage 2   Stage 2   Stage 2   ← Blog Gen + Images
  │         │         │
  ▼         ▼         ▼
Stage 3   Stage 3   Stage 3   ← Quality Check
  │         │         │
  ▼         ▼         ▼
Stage 4   Stage 4   Stage 4   ← URL Verify
  │         │         │
  ▼         ▼         ▼
Stage 5   Stage 5   Stage 5   ← Internal Links
  │         │         │
  ▼         ▼         ▼
Export    Export    Export    ← HTML/MD/JSON/CSV/XLSX/PDF
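
Conceptually, the orchestrator runs Stage 1 once, then fans each keyword out through Stages 2-5 with bounded concurrency. The sketch below is illustrative only; the function names and stage stubs are placeholders, not the actual code in run_pipeline.py:

import asyncio

async def set_context() -> dict:
    """Stage 1 stub: runs once per batch (company context, authors, sitemap)."""
    return {"company": "Example Co", "sitemap": []}

async def run_stage(name: str, article: dict) -> dict:
    """Stand-in for one AI-backed stage (2-5)."""
    await asyncio.sleep(0)
    article.setdefault("stages", []).append(name)
    return article

async def process_article(keyword: str, context: dict) -> dict:
    """One article's chain: Blog Gen + Images -> Quality -> URL Verify -> Internal Links."""
    article = {"keyword": keyword, "context": context}
    for stage in ("blog_gen_images", "quality_check", "url_verify", "internal_links"):
        article = await run_stage(stage, article)
    return article

async def run_batch(keywords: list[str], max_parallel: int = 3) -> list[dict]:
    context = await set_context()                  # Stage 1, once per batch
    sem = asyncio.Semaphore(max_parallel)          # mirrors --max-parallel

    async def bounded(kw: str) -> dict:
        async with sem:                            # cap concurrent articles
            return await process_article(kw, context)

    return await asyncio.gather(*(bounded(kw) for kw in keywords))

if __name__ == "__main__":
    print(asyncio.run(run_batch(["keyword 1", "keyword 2", "keyword 3"])))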

Pipeline Stages

Stage  Name               AI Calls  Purpose
1      Set Context        0-1       Company context + authors + sitemap (runs once per batch)
2      Blog Gen + Images  1-4       Generate article with Gemini + 3 images with Imagen
3      Quality Check      1         Surgical find/replace fixes (uses structured schema)
4      URL Verify         0-2       Validate/replace dead URLs (uses structured schema)
5      Internal Links     1         Embed internal links from sitemap (uses structured schema)
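
Stages 3-5 return structured edits instead of regenerating the whole article. As a rough illustration of how a schema of surgical find/replace fixes can be applied (the Edit shape here is hypothetical; the real response schemas live in the stage modules):

from dataclasses import dataclass

@dataclass
class Edit:
    """One surgical edit as a structured object (hypothetical shape)."""
    find: str       # exact substring expected in the article HTML
    replace: str    # text to put in its place

def apply_edits(html: str, edits: list[Edit]) -> str:
    """Apply each edit at most once; skip edits whose target text is missing."""
    for edit in edits:
        if edit.find in html:
            html = html.replace(edit.find, edit.replace, 1)
    return html

# Example: swapping a dead URL for a verified one, Stage 4 style.
fixed = apply_edits(
    "<p>See <a href='https://old.example.com/report'>the report</a>.</p>",
    [Edit(find="https://old.example.com/report", replace="https://example.com/report-2024")],
)
print(fixed)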

Quick Start

1. Install Dependencies

pip install -r requirements.txt

2. Set Environment Variables

export GEMINI_API_KEY=your-gemini-api-key

Or create a .env file:

GEMINI_API_KEY=your-gemini-api-key
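
To confirm the key is actually visible before running the pipeline, here is a quick standalone check; it assumes nothing about the project, and the python-dotenv import is optional and only used if that package is installed:

import os

# Optional: load a local .env file if python-dotenv happens to be installed.
try:
    from dotenv import load_dotenv
    load_dotenv()
except ImportError:
    pass

# Fail fast if the Gemini key is not visible to the process.
api_key = os.environ.get("GEMINI_API_KEY")
if not api_key:
    raise SystemExit("GEMINI_API_KEY is not set - export it or add it to .env")
print(f"GEMINI_API_KEY loaded ({len(api_key)} characters)")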

3. Run the Pipeline (CLI)

# Basic usage
python run_pipeline.py --url https://example.com --keywords "keyword 1" "keyword 2" --output results/

# With all export formats
python run_pipeline.py --url https://example.com --keywords "topic" \
    --output results/ --export-formats html markdown json csv xlsx pdf

# Skip images, limit parallelism
python run_pipeline.py --url https://example.com --keywords "topic" \
    --output results/ --skip-images --max-parallel 2

4. Run the API Server

# Start the FastAPI server
uvicorn api:app --reload --port 8000

# Or run directly
python api.py

API documentation: once the server is running, FastAPI serves interactive docs at http://localhost:8000/docs (Swagger UI) and http://localhost:8000/redoc by default.

REST API

The API provides async job-based processing for blog generation.

Endpoints

Method  Endpoint                                       Description
GET     /                                              Health check
GET     /health                                        Health check (alias)
POST    /api/v1/jobs                                   Start a new pipeline job (async)
GET     /api/v1/jobs                                   List all jobs
GET     /api/v1/jobs/{job_id}                          Get job status and result
DELETE  /api/v1/jobs/{job_id}                          Delete a job
GET     /api/v1/jobs/{job_id}/articles                 List articles for a job
GET     /api/v1/jobs/{job_id}/articles/{keyword}/html  Get article HTML
POST    /api/v1/generate                               Generate articles (sync, max 3)

Example: Create a Job

curl -X POST http://localhost:8000/api/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "keywords": ["AI in healthcare", "Machine learning basics"],
    "company_url": "https://example.com",
    "language": "en",
    "market": "US",
    "skip_images": false,
    "export_formats": ["html", "json"]
  }'

Response:

{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "pending",
  "message": "Job created. Processing 2 article(s).",
  "created_at": "2024-01-15T10:30:00Z"
}

Example: Check Job Status

curl http://localhost:8000/api/v1/jobs/550e8400-e29b-41d4-a716-446655440000
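
The same flow from Python, using only the endpoints in the table above. Status values other than "pending" are not documented in this README, so the termination check in the sketch is an assumption; adjust it to the values the API actually returns:

import time
import requests

BASE = "http://localhost:8000"

# Create a job (same payload as the curl example above).
job = requests.post(
    f"{BASE}/api/v1/jobs",
    json={
        "keywords": ["AI in healthcare", "Machine learning basics"],
        "company_url": "https://example.com",
        "language": "en",
        "market": "US",
        "skip_images": False,
        "export_formats": ["html", "json"],
    },
    timeout=30,
).json()
job_id = job["job_id"]
print("created job:", job_id, "-", job["status"])

# Poll the job until it reaches a terminal state. Only "pending" is shown in this
# README; the in-progress status names below are assumptions.
for _ in range(60):
    state = requests.get(f"{BASE}/api/v1/jobs/{job_id}", timeout=30).json()
    print("status:", state["status"])
    if state["status"] not in ("pending", "processing", "running"):
        break
    time.sleep(10)

# List the articles produced by the job.
articles = requests.get(f"{BASE}/api/v1/jobs/{job_id}/articles", timeout=30).json()
print(articles)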

Command Line Options

--url               Company website URL (required)
--keywords          List of keywords to generate articles for (required)
--output            Output directory for generated files
--export-formats    Formats to export (html, markdown, json, csv, xlsx, pdf)
--skip-images       Skip image generation
--max-parallel      Maximum number of articles processed in parallel (default: 3)
--language          Content language (default: en)
--market            Target market (default: US)
--word-count        Target word count per article (default: 2000)

Project Structure

openblog-neo/
├── shared/                 # Shared components
│   ├── gemini_client.py    # Unified Gemini client
│   ├── models.py           # ArticleOutput schema (40+ fields)
│   ├── html_renderer.py    # Render article to HTML
│   ├── article_exporter.py # Export to multiple formats
│   └── constants.py        # Model configuration
├── stage1/                 # Set Context (company, authors, sitemap)
├── stage2/                 # Blog Gen + Images
├── stage3/                 # Quality Check
├── stage4/                 # URL Verify
├── stage5/                 # Internal Links
├── run_pipeline.py         # Main orchestrator
└── requirements.txt

Output Schema

Each article includes 40+ fields (see the sketch after this list):

  • Headlines: Headline, Subtitle, Teaser, Meta Title, Meta Description
  • Content: Intro, 4-9 sections with HTML content
  • SEO: Direct Answer (featured snippets), Key Takeaways
  • Q&A: 4 People Also Ask, 5-6 FAQs
  • Media: 3 image slots with URLs, alt text, credits
  • Sources: Verified URLs from Google Search grounding
  • Optional: Tables, Pros/Cons, CTA, Related Keywords, Video embed
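
For orientation, here is a rough sketch of how those field groups might appear in the JSON export. Every key name below is a placeholder for this README; the authoritative field names live in the ArticleOutput schema in shared/models.py:

article_sketch = {
    # Placeholder keys - see shared/models.py (ArticleOutput) for the real schema.
    "headline": "...",
    "subtitle": "...",
    "teaser": "...",
    "meta_title": "...",
    "meta_description": "...",
    "intro": "<p>...</p>",
    "sections": [{"title": "...", "html": "<p>...</p>"}],       # 4-9 sections
    "direct_answer": "...",                                     # targets featured snippets
    "key_takeaways": ["..."],
    "people_also_ask": [{"question": "...", "answer": "..."}],  # 4 entries
    "faqs": [{"question": "...", "answer": "..."}],             # 5-6 entries
    "images": [{"url": "...", "alt": "...", "credit": "..."}],  # hero, mid, bottom slots
    "sources": ["https://..."],                                 # verified via Search grounding
    # plus optional tables, pros/cons, CTA, related keywords, and a video embed
}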

License

MIT
