Commit 09be7a2

Improved prompting to extract questions and answers
1 parent 9ec40f0 commit 09be7a2

4 files changed: +509 -19 lines changed


conversion2025/README.md

Lines changed: 7 additions & 17 deletions
@@ -1,32 +1,22 @@
-This is a folder where we'll generate a 'wizard' to automatically process input documents ready for in2lambda to reacte a JSON (or to directly create a JSON)
-
-We'll work on this branch 'hackathon'
-
-
 # README
 
 ## Overview
-This Jupyter Notebook (`sandbox.ipynb`) is designed for processing scientific documents, extracting mathematical expressions, and formatting them in Markdown. It leverages Azure OpenAI's LLM capabilities for text transformation.
-
-## Features
-- Loads PDFs and extracts text using `UnstructuredPDFLoader` and `PyMuPDF`.
-- Converts mathematical expressions into properly formatted Markdown.
-- Uses `langchain` and `AzureChatOpenAI` for text processing.
-- Supports structured output parsing using `pydantic`.
+This Jupyter Notebook (`file_name.ipynb`) is designed for processing scientific documents, extracting mathematical expressions, and formatting them in Markdown. It leverages Mathpix and OpenAI's LLM capabilities for text transformation.
 
 ## Requirements
 Ensure you have the following installed:
 - Python 3.8+
 - `pip install -r requirements.txt`
-- `langchain`, `langchain_openai`, `pydantic`, `dotenv`, `PyMuPDF`, `PIL`
 
 ## Setup
-1. Create a `.env` file in the root directory and add your Azure OpenAI API keys:
+1. Create a `.env` file in the root directory and add your OpenAI API keys:
 ```env
-AZURE_OPENAI_API_KEY=<your-api-key>
-AZURE_OPENAI_ENDPOINT=<your-endpoint>
+OPENAI_API_KEY=<your-openai-api-key>
+OPENAI_MODEL=<your-openai-model>
+MATHPIX_API_KEY=<your-mathpix-key>
+MATHPIX_APP_ID=<your-mathpix-id>
 ```
-4. Open `sandbox.ipynb` and execute the cells to process your documents.
+4. Open `file_name.ipynb` and execute the cells to process your documents.
 
 ## Notes
 - Ensure your API key and endpoint are correct, as they are required for LLM functionality.
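
The updated Setup step asks for four variables in `.env`. A minimal sketch of how a notebook cell might check them before calling either service is given here. Only the four variable names come from the README in this commit; the helper names (`parse_env_text`, `missing_keys`, `load_env_file`) are hypothetical, and the parser is a simplified stand-in that handles plain `KEY=VALUE` lines only, not how the notebook itself loads its configuration.

```python
import os
from pathlib import Path

# The four names below come from the updated README; the helpers are illustrative.
REQUIRED_KEYS = ("OPENAI_API_KEY", "OPENAI_MODEL", "MATHPIX_API_KEY", "MATHPIX_APP_ID")


def parse_env_text(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return required keys that are absent, empty, or still a <placeholder>."""
    missing = []
    for key in required:
        value = values.get(key, "")
        if not value or (value.startswith("<") and value.endswith(">")):
            missing.append(key)
    return missing


def load_env_file(path=".env"):
    """Read the file if present, then let real environment variables override it."""
    env_path = Path(path)
    values = parse_env_text(env_path.read_text()) if env_path.exists() else {}
    for key in REQUIRED_KEYS:
        if key in os.environ:
            values[key] = os.environ[key]
    return values


if __name__ == "__main__":
    problems = missing_keys(load_env_file())
    if problems:
        print("Missing or placeholder values:", ", ".join(problems))
    else:
        print("All required keys are set.")
```

Running this from the directory that holds `.env` lists any key that is unset or still carries its `<...>` placeholder from the README template.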
