A tool that scrapes web pages, stores their content, and answers questions about it, using OpenAI's embeddings and ChromaDB for vector storage.
- Web page scraping with content extraction
- Vector embeddings using OpenAI's text-embedding-3-small model
- Vector storage using ChromaDB
- Natural language querying of scraped content
- Content chunking for efficient processing
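To make the chunking feature concrete, here is a minimal sketch of splitting page text into overlapping windows before embedding. The function name `chunkText`, the character-based sizing, and the default values are assumptions for illustration; the actual logic in `src/index.ts` may differ.

```typescript
// Hypothetical helper (not taken from src/index.ts): splits text into
// fixed-size character windows that overlap, so a sentence cut at a
// chunk boundary still appears whole in a neighbouring chunk.
function chunkText(text: string, chunkSize = 1000, overlap = 200): string[] {
  if (chunkSize <= 0) {
    throw new Error("chunkSize must be positive");
  }
  if (overlap < 0 || overlap >= chunkSize) {
    throw new Error("overlap must be in the range [0, chunkSize)");
  }

  // Collapse runs of whitespace left over from HTML extraction.
  const normalized = text.replace(/\s+/g, " ").trim();
  const chunks: string[] = [];
  const step = chunkSize - overlap;

  for (let start = 0; start < normalized.length; start += step) {
    chunks.push(normalized.slice(start, start + chunkSize));
    // Stop once the current window has reached the end of the text.
    if (start + chunkSize >= normalized.length) break;
  }
  return chunks;
}
```

Each returned chunk would then be embedded and stored as a separate record. Larger overlaps reduce the chance of splitting an answer across chunks, at the cost of storing more vectors.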
To scrape and store a web page's content:
await ingest("https://example.com");
To ask questions about the scraped content:
const response = await chat("What is the main topic of the page?");
console.log(response);
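A question-answering call like the one above typically combines the question with the most relevant stored chunks before sending it to the language model. The sketch below shows one way that prompt could be assembled. `RetrievedChunk` and `buildPrompt` are hypothetical names, and the prompt wording is an assumption, not the text used by `chat`.

```typescript
// Hypothetical shape of a chunk returned from the vector store.
interface RetrievedChunk {
  text: string;
  source: string; // e.g. the URL the chunk was scraped from
}

// Hypothetical helper: builds a prompt that grounds the model's answer
// in the retrieved chunks, numbered so they are easy to tell apart.
function buildPrompt(question: string, chunks: RetrievedChunk[]): string {
  const context = chunks
    .map((c, i) => `[${i + 1}] (${c.source}) ${c.text}`)
    .join("\n");

  return [
    "Answer the question using only the context below.",
    "If the context does not contain the answer, say so.",
    "",
    "Context:",
    context,
    "",
    `Question: ${question}`,
  ].join("\n");
}
```

Keeping prompt assembly in a pure function like this makes it easy to inspect exactly what the model sees for a given question.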
To view the data stored in ChromaDB:
await viewChromaDBData();
src/index.ts: Main application file containing all core functionality:
- Web scraping
- Text chunking
- Vector embeddings generation
- ChromaDB operations
- Chat interface
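The embedding and ChromaDB steps listed above rest on comparing vectors: chunks whose embeddings lie close to the question's embedding are treated as relevant. The following is a conceptual illustration only, using cosine similarity over plain arrays. ChromaDB performs its own indexed search, and the distance metric it uses depends on how the collection is configured, so `cosineSimilarity` and `topK` here are hypothetical helpers, not part of this project or of ChromaDB.

```typescript
// Cosine similarity between two equal-length vectors.
// Returns 0 when either vector has zero length (norm of 0).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Brute-force ranking: returns the ids of the k documents whose
// embeddings are most similar to the query embedding.
function topK(
  query: number[],
  docs: { id: string; embedding: number[] }[],
  k: number
): string[] {
  return docs
    .map((d) => ({ id: d.id, score: cosineSimilarity(query, d.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((d) => d.id);
}
```

A vector database replaces this linear scan with an index so that lookups stay fast as the number of stored chunks grows.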
- axios: For making HTTP requests
- cheerio: For web scraping and HTML parsing
- openai: For OpenAI API integration
- chromadb: For vector database operations
- dotenv: For environment variable management
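Since the project relies on environment variables, it can help to fail early when a required one is missing. The sketch below is one possible check. `requireEnv` is a hypothetical helper, and `OPENAI_API_KEY` is assumed to be the variable name in use (it is the name the `openai` package reads by default); the project's actual configuration may differ.

```typescript
// A plain record type, so the helper can be exercised without touching
// the real process environment.
type Env = Record<string, string | undefined>;

// Hypothetical helper: returns the value of a required variable or
// throws a descriptive error if it is unset or blank.
function requireEnv(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Possible usage, assuming dotenv has already loaded a .env file:
//   const apiKey = requireEnv(process.env, "OPENAI_API_KEY");
```

Checking at startup surfaces a missing key as a clear error message instead of a failed API call later in the run.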