An end-to-end Retrieval-Augmented Generation (RAG) sample that lets you ingest your own PDFs and chat with them. Now supports both standard (non‑streaming) replies and Server‑Sent Events (SSE) streaming. The project is split into two apps:
- `backend/` – Node/TypeScript Express API that ingests PDFs, stores embeddings in a local LanceDB vector store, performs semantic search, and calls OpenAI for answers.
- `frontend/` – React TypeScript SPA that calls the backend to ingest data, search, and chat.
- Ingestion
  - PDFs are read and converted to per-page text using `pdf-parse` (`pagerender`).
  - Each page is chunked into overlapping segments using LangChain's `RecursiveCharacterTextSplitter` (default: 800 chars, 120 overlap).
  - Embeddings (OpenAI `text-embedding-3-small`) are generated for the chunks and stored in LanceDB together with metadata (`documentId`, `pageNumber`, `text`).
- Retrieval
  - A query is embedded and compared with chunk vectors using LanceDB ANN search.
  - Optional filter by `documentId`.
- Generation
  - Top‑k chunk texts are concatenated into a context prompt.
  - The OpenAI/GitHub Models chat model (default: `gpt-4o-mini`) produces a grounded answer.
  - The system prompt asks the model to append XML citations per fact using the context’s filename/page.
  - Optionally, tokens are streamed over SSE so the UI renders incremental output (a condensed sketch of the whole pipeline follows this list).
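A minimal, hedged sketch of these three steps in TypeScript. The package names (`@lancedb/lancedb`, `@langchain/textsplitters`), the `chunks` table name, and the helper signatures are assumptions for illustration; the project's actual service code may differ.

```ts
// Condensed RAG pipeline sketch: ingest → retrieve → generate.
// Assumes API_KEY in env, a LanceDB table named "chunks", and the @lancedb/lancedb client.
import OpenAI from "openai";
import * as lancedb from "@lancedb/lancedb";
// In older LangChain versions the splitter lives in "langchain/text_splitter".
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";

const openai = new OpenAI({ apiKey: process.env.API_KEY });
const EMBEDDING_MODEL = process.env.EMBEDDING_MODEL ?? "text-embedding-3-small";

// 1) Ingestion: split one page into overlapping chunks, embed them, store vectors + metadata.
async function ingestPage(db: lancedb.Connection, documentId: string, pageNumber: number, pageText: string) {
  const splitter = new RecursiveCharacterTextSplitter({ chunkSize: 800, chunkOverlap: 120 });
  const chunks = await splitter.splitText(pageText);
  const { data } = await openai.embeddings.create({ model: EMBEDDING_MODEL, input: chunks });
  const rows = chunks.map((text, i) => ({ vector: data[i].embedding, text, documentId, pageNumber }));
  const table = await db.openTable("chunks"); // assumes the table was created on first ingest
  await table.add(rows);
}

// 2) Retrieval: embed the query and run ANN search over the stored vectors.
async function retrieve(db: lancedb.Connection, query: string, k = 5) {
  const { data } = await openai.embeddings.create({ model: EMBEDDING_MODEL, input: query });
  const table = await db.openTable("chunks");
  // The query-builder API differs slightly across LanceDB versions (execute() vs toArray()).
  return table.search(data[0].embedding).limit(k).toArray();
}

// 3) Generation: concatenate top-k chunk texts into a context prompt and ask the chat model.
async function answer(contextChunks: { text: string }[], question: string) {
  const context = contextChunks.map((c) => c.text).join("\n---\n");
  const completion = await openai.chat.completions.create({
    model: process.env.CHAT_MODEL ?? "gpt-4o-mini",
    messages: [
      { role: "system", content: `Answer only from this context and cite each fact:\n${context}` },
      { role: "user", content: question },
    ],
  });
  return completion.choices[0].message.content;
}
```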
- Node 18+ / TypeScript
- Express
- LanceDB (local vector DB)
- pdf-parse (per-page text via `pagerender`)
- LangChain text splitter
- Data (PDFs): `backend/src/data/`
- Vector store: `backend/src/lancedb/`
- Postman collection: `backend/ChatApp_Backend.postman_collection.json`
Create `backend/.env` (or export environment variables):

```
API_KEY=sk-...
# Optional overrides
EMBEDDING_MODEL=text-embedding-3-small
BASE_URL=...
CHAT_MODEL=gpt-4o-mini
PDF_DIR=./src/data
LANCEDB_DIR=./src/lancedb
PORT=4000
```
Notes:

- `PDF_DIR` and `LANCEDB_DIR` default to in-repo paths and are resolved to absolute paths at runtime.
- Use absolute paths in env to avoid ambiguity.
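A minimal sketch of how these values might be loaded and resolved (the module shape and names are assumptions, not the project's actual `config.ts`):

```ts
// Hypothetical config module: load .env and resolve directory settings to absolute paths.
import path from "path";
import dotenv from "dotenv";

dotenv.config();

export const config = {
  apiKey: process.env.API_KEY ?? "",
  baseUrl: process.env.BASE_URL, // optional, e.g. a GitHub Models endpoint
  chatModel: process.env.CHAT_MODEL ?? "gpt-4o-mini",
  embeddingModel: process.env.EMBEDDING_MODEL ?? "text-embedding-3-small",
  port: Number(process.env.PORT ?? 4000),
  // Relative defaults are resolved against the process working directory.
  pdfDir: path.resolve(process.env.PDF_DIR ?? "./src/data"),
  lancedbDir: path.resolve(process.env.LANCEDB_DIR ?? "./src/lancedb"),
};
```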
```bash
cd backend
npm install
npm run dev
```
The server starts on port 4000 by default.
- Health
  - `GET /health` → `{ ok: true }`
- Ingest
  - `POST /api/ingest/pdf-directory`
  - Ingests PDFs from `PDF_DIR` (configured server-side). You don’t send a path from the client, for security.
- Search
  - `GET /api/search?q=...&documentId=...&max=5`
  - Returns top-k chunks with metadata and distances. If `documentId` is provided, results are filtered client-side (compatible with multiple LanceDB versions).
- Chat (RAG)
  - `POST /api/chat` (example call after this list)
  - Body: `{ question?: string, messages?: Message[], documentId?: string, maxResults?: number }`
  - Notes: you can pass a single `question` or a rolling `messages[]` history. If `question` is omitted, the latest user message in `messages[]` is used.
  - Returns: `{ answer: string }`
  - The system prompt instructs the model to append XML citations per fact, e.g. `<citation filename='Solar_Charger.pdf' page_number='6'>short quote</citation>`
- Chat (RAG) — Streaming (SSE)
  - `POST /api/chat/stream`
  - Body: `{ question?: string, messages?: Message[], documentId?: string, maxResults?: number }`
  - Response: `text/event-stream` where each event is `data: { delta?: string, done?: boolean, full?: string }`
  - The final event includes `{ done: true, full }`. The frontend typically accumulates `delta` tokens and appends `full` as one assistant turn.
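For reference, a minimal client call to the non-streaming endpoint might look like this (the base URL and error handling are assumptions):

```ts
// Example call to POST /api/chat. Assumes the backend runs on http://localhost:4000.
async function askQuestion(question: string, documentId?: string): Promise<string> {
  const res = await fetch("http://localhost:4000/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ question, documentId, maxResults: 5 }),
  });
  if (!res.ok) throw new Error(`Chat request failed: ${res.status}`);
  const { answer } = (await res.json()) as { answer: string };
  return answer;
}

// Usage: console.log(await askQuestion("How long does a full charge take?"));
```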
The streaming variant also supports a tool-choice flow where the model decides whether to invoke a search tool first:

- When enabled: the model calls `search` only if it believes the user’s question is related to the ingested documents; then it grounds the answer with citations. If unrelated, it answers briefly and emits no citations.
- When disabled: the backend always runs direct RAG before generating.
How to enable/disable:

- The non-streaming route uses direct RAG or tool-choice depending on the code path in `routes/chat.ts`.
- The streaming route can use the same logic but streams tokens after the decision. In `routes/chat.ts`, call the streaming helper that wires `AskService.askWithToolChoice(...)` when tool-choice is desired, or use `ask.ask(...)` for direct RAG.
When tool-calling triggers, the backend logs a line similar to:

`[AskService] Function calling triggered: [ 'search' ]`
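A hedged sketch of how such a decision step can be expressed with the OpenAI chat API; the actual logic lives in `AskService.askWithToolChoice`, and the tool schema below is illustrative only:

```ts
// Sketch: let the model decide whether to call a "search" tool before answering.
// The real logic lives in AskService.askWithToolChoice; names and schema here are illustrative.
import OpenAI from "openai";

const openai = new OpenAI({ apiKey: process.env.API_KEY });

async function decideAndAnswer(question: string) {
  const first = await openai.chat.completions.create({
    model: process.env.CHAT_MODEL ?? "gpt-4o-mini",
    messages: [{ role: "user", content: question }],
    tools: [
      {
        type: "function",
        function: {
          name: "search",
          description: "Semantic search over the ingested PDF chunks.",
          parameters: {
            type: "object",
            properties: { query: { type: "string" } },
            required: ["query"],
          },
        },
      },
    ],
    tool_choice: "auto", // the model decides whether a document search is needed
  });

  const toolCalls = first.choices[0].message.tool_calls;
  if (toolCalls?.length) {
    console.log("[AskService] Function calling triggered:", toolCalls.map((t) => t.function.name));
    // ...run the vector search with the tool arguments, then call the model again with context...
    return;
  }
  // Unrelated question: the model answered directly, with no citations.
  return first.choices[0].message.content;
}
```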
- Per-page extraction with `pdf-parse` `pagerender` (more reliable than splitting on form-feed).
- LangChain splitter: chunkSize 800, overlap 120 (adjust to taste).
- Re-ingest instructions:
  - Stop the backend.
  - Delete the vector store folder: `rm -rf backend/src/lancedb`.
  - Start the backend and `POST /api/ingest/pdf-directory` again.
- React + TypeScript (Create React App / Vite‑like dev experience)
- Calls the backend’s endpoints for ingest, search, and chat.
- Optional streaming mode with a UI switch.
```bash
cd frontend
npm install
npm start
```
If your backend runs on a different origin/port, configure the base URL in `frontend/src/api.ts` (or via an env var like `REACT_APP_API_BASE`).
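For example, the base URL could be read like this (a sketch; the actual contents of `api.ts` may differ):

```ts
// frontend/src/api.ts (sketch): pick the backend base URL from the environment,
// falling back to the backend's default dev port.
export const API_BASE = process.env.REACT_APP_API_BASE ?? "http://localhost:4000";

export const apiUrl = (path: string) => `${API_BASE}${path}`;
```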
- Start backend, then frontend.
- Ingest PDFs (call `POST /api/ingest/pdf-directory`).
- Try search (`GET /api/search?q=...`).
- Ask a question (`POST /api/chat`) or toggle streaming and use `POST /api/chat/stream`.
- Streaming client: `frontend/src/services/stream.ts` (`streamChat(...)`) reads the SSE stream via `fetch` + `ReadableStream` (see the sketch after this list).
- Toggle: the Ask panel includes a “Streaming” switch. When on, the app calls `streamChat`; when off, it calls the regular chat API.
- History window: only the last 6 messages are sent to the backend to keep prompts compact; you can adjust this as needed.
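A minimal sketch of such a reader, assuming the event payload shape documented above (the project's `streamChat` may differ in details):

```ts
// Sketch of an SSE reader over fetch + ReadableStream for POST /api/chat/stream.
// Assumes each event line is "data: {...}" carrying { delta?, done?, full? } as described above.
export async function streamChat(
  question: string,
  onDelta: (token: string) => void,
  onDone: (full: string) => void,
): Promise<void> {
  const res = await fetch("http://localhost:4000/api/chat/stream", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ question }),
  });
  if (!res.ok || !res.body) throw new Error(`Stream request failed: ${res.status}`);

  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";

  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });

    // SSE events are separated by a blank line.
    const events = buffer.split("\n\n");
    buffer = events.pop() ?? "";
    for (const event of events) {
      const line = event.split("\n").find((l) => l.startsWith("data: "));
      if (!line) continue;
      const payload = JSON.parse(line.slice("data: ".length));
      if (payload.delta) onDelta(payload.delta);
      if (payload.done) onDone(payload.full ?? "");
    }
  }
}
```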
If you prefer higher-level utilities for streaming and chat state, you can adopt the Vercel AI SDK.
- Install (backend and/or frontend): `npm i ai @ai-sdk/openai`
- Backend idea (Express):
  - Use `streamText` with an `openai(config.chatModel)` model to generate a stream and pipe it as SSE. You would still build your system+history messages and context (as we do), but `streamText` reduces boilerplate when chunking and sending tokens (see the sketch at the end of this section).
- Frontend idea (React):
  - Keep the current `stream.ts` (it already works), or try `ai/react` hooks like `useChat` to manage input, messages, and streaming automatically. You may need to align the endpoint response with AI SDK helpers if you choose its built-in handlers.
Note: AI SDK is optional. The included SSE implementation is production‑ready; AI SDK can just reduce code and provide useful abstractions.
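As a rough illustration of the backend idea above, a hedged sketch with `streamText` (API as of recent AI SDK versions; the SSE payload shape is kept compatible with the existing frontend):

```ts
// Sketch: Express route that streams tokens via the Vercel AI SDK instead of the raw OpenAI client.
// Assumes `npm i ai @ai-sdk/openai`; the SSE events mirror the existing /api/chat/stream shape.
import express from "express";
import { streamText } from "ai";
import { openai } from "@ai-sdk/openai";

const app = express();
app.use(express.json());

app.post("/api/chat/stream", async (req, res) => {
  res.setHeader("Content-Type", "text/event-stream");
  res.setHeader("Cache-Control", "no-cache");

  // Build system + history messages and the retrieved context as the existing code does.
  const result = streamText({
    model: openai(process.env.CHAT_MODEL ?? "gpt-4o-mini"),
    system: "Answer from the provided context and cite sources.",
    prompt: req.body.question,
  });

  let full = "";
  for await (const delta of result.textStream) {
    full += delta;
    res.write(`data: ${JSON.stringify({ delta })}\n\n`);
  }
  res.write(`data: ${JSON.stringify({ done: true, full })}\n\n`);
  res.end();
});
```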
MIT (or your preferred license)