|
1 | | -This is a folder where we'll generate a 'wizard' to automatically process input documents ready for in2lambda to reacte a JSON (or to directly create a JSON) |
2 | | - |
3 | | -We'll work on this branch 'hackathon' |
4 | | - |
5 | | - |
6 | 1 | # README |
7 | 2 |
|
8 | 3 | ## Overview |
9 | | -This Jupyter Notebook (`sandbox.ipynb`) is designed for processing scientific documents, extracting mathematical expressions, and formatting them in Markdown. It leverages Azure OpenAI's LLM capabilities for text transformation. |
10 | | - |
11 | | -## Features |
12 | | -- Loads PDFs and extracts text using `UnstructuredPDFLoader` and `PyMuPDF`. |
13 | | -- Converts mathematical expressions into properly formatted Markdown. |
14 | | -- Uses `langchain` and `AzureChatOpenAI` for text processing. |
15 | | -- Supports structured output parsing using `pydantic`. |
| 4 | +This Jupyter Notebook (`file_name.ipynb`) is designed for processing scientific documents, extracting mathematical expressions, and formatting them in Markdown. It leverages Mathpix and OpenAI's LLM capabilities for text transformation. |
16 | 5 |
|
17 | 6 | ## Requirements |
18 | 7 | Ensure you have the following installed: |
19 | 8 | - Python 3.8+ |
20 | 9 | - `pip install -r requirements.txt` |
21 | | -- `langchain`, `langchain_openai`, `pydantic`, `dotenv`, `PyMuPDF`, `PIL` |
22 | 10 |
|
23 | 11 | ## Setup |
24 | | -1. Create a `.env` file in the root directory and add your Azure OpenAI API keys: |
| 12 | +1. Create a `.env` file in the root directory and add your OpenAI and Mathpix credentials:
25 | 13 | ```env |
26 | | - AZURE_OPENAI_API_KEY=<your-api-key> |
27 | | - AZURE_OPENAI_ENDPOINT=<your-endpoint> |
| 14 | + OPENAI_API_KEY=<your-openai-api-key> |
| 15 | + OPENAI_MODEL=<your-openai-model> |
| 16 | + MATHPIX_API_KEY=<your-mathpix-key> |
| 17 | + MATHPIX_APP_ID=<your-mathpix-id> |
28 | 18 | ``` |
29 | | -4. Open `sandbox.ipynb` and execute the cells to process your documents. |
| 19 | +4. Open `file_name.ipynb` and execute the cells to process your documents. |
30 | 20 |
|
31 | 21 | ## Notes |
32 | 22 | - Ensure your API key and endpoint are correct, as they are required for LLM functionality. |
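One way to act on the note above is to check that the settings are present before running the notebook. The following is a minimal sketch, not code from the notebook itself. It assumes the `.env` values have already been loaded into the process environment (for example with `load_dotenv()` from `python-dotenv`). The helper name `missing_settings` is hypothetical, and the variable names are copied from the `.env` template in the Setup section. It only verifies that each value is set and non-blank; it cannot tell whether a key is actually valid.

```python
import os

# Variable names copied from the .env template in the Setup section.
REQUIRED_SETTINGS = (
    "OPENAI_API_KEY",
    "OPENAI_MODEL",
    "MATHPIX_API_KEY",
    "MATHPIX_APP_ID",
)


def missing_settings(env=None):
    """Return the required names that are unset or blank in `env`.

    `env` is any mapping of names to string values; it defaults to the
    process environment. This helper is a hypothetical illustration.
    """
    if env is None:
        env = os.environ
    return [name for name in REQUIRED_SETTINGS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        print("Missing settings:", ", ".join(missing))
    else:
        print("All required settings are present.")
```

Running a check like this in the first notebook cell would surface a missing or empty value early, before any call to OpenAI or Mathpix is attempted.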
|