initial commit
Tammilore committed Nov 17, 2024
0 parents commit d911217
Showing 30 changed files with 5,455 additions and 0 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -0,0 +1 @@
node_modules
3 changes: 3 additions & 0 deletions .npmignore
@@ -0,0 +1,3 @@
node_modules
.git
.vscode
668 changes: 668 additions & 0 deletions LICENSE

Large diffs are not rendered by default.

203 changes: 203 additions & 0 deletions README.md
@@ -0,0 +1,203 @@
# Documind

**`documind`** is an advanced document processing tool that leverages AI to extract structured data from PDFs. It is built to handle PDF conversions, extract relevant information, and format results as specified by customizable schemas.

## **Features**

- Converts PDFs to images for detailed AI processing.
- Uses OpenAI’s API to extract and structure information.
- Allows users to specify extraction schemas for various document formats.
- Designed for flexible deployment on local or cloud environments.

### Try the Hosted Version 🚀

A demo of the **documind** hosted version will be available soon for you to try out! The hosted version provides a seamless experience with fully managed APIs, so you can skip the setup and start extracting data right away.

For full access to the hosted service, please [request access](https://documind.xyz/signup) and we’ll get you set up.

## **Requirements**

Before using **`documind`**, ensure the following software dependencies are installed:

### **System Dependencies**

- **Ghostscript**: **`documind`** relies on Ghostscript for handling certain PDF operations.
- **GraphicsMagick**: Required for image processing within document conversions.

Install both on your system before proceeding:

```bash
# On macOS
brew install ghostscript graphicsmagick

# On Debian/Ubuntu
sudo apt-get update
sudo apt-get install -y ghostscript graphicsmagick
```

### **Node.js & NPM**

Ensure Node.js (v18+) and NPM are installed on your system.
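
If you want a script to fail early on an older runtime, a check along these lines may help. It is a minimal sketch: `isSupportedNode` and `parseMajor` are hypothetical helper names, not part of **`documind`**, and the only fact taken from this README is the v18 minimum.

```javascript
// Minimum major version, taken from the "v18+" requirement above.
const MIN_NODE_MAJOR = 18;

// Parse the major version out of a string such as "18.19.0" or "v20.11.1".
function parseMajor(version) {
  return Number.parseInt(String(version).replace(/^v/, "").split(".")[0], 10);
}

// Hypothetical helper: true when the given Node.js version meets the minimum.
// Defaults to the version of the running process.
function isSupportedNode(version = process.versions.node) {
  return parseMajor(version) >= MIN_NODE_MAJOR;
}

if (!isSupportedNode()) {
  console.error(
    `Node.js v${MIN_NODE_MAJOR}+ is required; found v${process.versions.node}.`
  );
}
```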

## **Installation**

You can install **`documind`** via npm:

```bash
npm install documind
```

### **Environment Setup**

**`documind`** requires a **`.env`** file to store sensitive information like API keys and Supabase configurations.

Create a **`.env`** file in your project directory and add the following:

```bash
OPENAI_API_KEY=your_openai_api_key
SUPABASE_URL=your_supabase_url
SUPABASE_KEY=your_supabase_key
SUPABASE_BUCKET=your_supabase_bucket_name
```
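
A missing key is easier to diagnose before any extraction starts. The sketch below checks for the four variables listed above. It assumes the values are already present in `process.env` (for example, loaded by a tool such as `dotenv`); this README does not say how **`documind`** itself reads the file, and `findMissingEnvVars` is a hypothetical helper, not part of the package.

```javascript
// Variable names copied from the .env example above.
const REQUIRED_ENV_VARS = [
  "OPENAI_API_KEY",
  "SUPABASE_URL",
  "SUPABASE_KEY",
  "SUPABASE_BUCKET",
];

// Hypothetical helper: returns the names that are unset or blank in `env`.
function findMissingEnvVars(env = process.env) {
  return REQUIRED_ENV_VARS.filter((name) => {
    const value = env[name];
    return typeof value !== "string" || value.trim() === "";
  });
}

const missing = findMissingEnvVars();
if (missing.length > 0) {
  console.warn(`Missing environment variables: ${missing.join(", ")}`);
}
```

Passing the environment in as an argument keeps the check easy to test with a plain object.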

## **Usage**

### **Basic Example**

First, import **`documind`** and define your schema. The schema outlines what information **`documind`** should look for in each document. Here’s a quick setup to get started.

### **1. Define a Schema**

The schema is an array of objects where each object defines:

- **name**: Field name to extract.
- **type**: Data type (e.g., **`"string"`**, **`"number"`**, **`"array"`**, **`"object"`**).
- **description**: Description of the field.
- **children** (optional): For arrays and objects, define nested fields.

Example schema for a bank statement:

```jsx
const schema = [
  {
    name: "accountNumber",
    type: "string",
    description: "The account number of the bank statement."
  },
  {
    name: "openingBalance",
    type: "number",
    description: "The opening balance of the account."
  },
  {
    name: "transactions",
    type: "array",
    description: "List of transactions in the account.",
    children: [
      {
        name: "date",
        type: "string",
        description: "Transaction date."
      },
      {
        name: "creditAmount",
        type: "number",
        description: "Credit Amount of the transaction."
      },
      {
        name: "debitAmount",
        type: "number",
        description: "Debit Amount of the transaction."
      },
      {
        name: "description",
        type: "string",
        description: "Transaction description."
      }
    ]
  },
  {
    name: "closingBalance",
    type: "number",
    description: "The closing balance of the account."
  }
];
```
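
The field rules listed above can be checked before a schema is sent for extraction. The following is a minimal sketch, not part of **`documind`**: `validateSchema` is a hypothetical helper, the set of allowed types is limited to the four examples named in this README (the package may accept others), and treating `description` as required is an assumption based on only `children` being marked optional.

```javascript
// Types listed in the README; whether documind accepts others is not stated.
const KNOWN_TYPES = new Set(["string", "number", "array", "object"]);

// Hypothetical helper: collects human-readable problems instead of throwing,
// so every issue in a schema can be reported at once.
function validateSchema(fields, path = "schema") {
  if (!Array.isArray(fields)) {
    return [`${path} must be an array of field definitions`];
  }
  const problems = [];
  fields.forEach((field, index) => {
    const where = `${path}[${index}]`;
    if (field === null || typeof field !== "object") {
      problems.push(`${where} must be an object`);
      return;
    }
    if (typeof field.name !== "string" || field.name === "") {
      problems.push(`${where}.name must be a non-empty string`);
    }
    if (!KNOWN_TYPES.has(field.type)) {
      problems.push(`${where}.type must be one of: ${[...KNOWN_TYPES].join(", ")}`);
    }
    if (typeof field.description !== "string") {
      problems.push(`${where}.description must be a string`);
    }
    // children is optional; when present it is validated recursively.
    if (field.children !== undefined) {
      if (field.type !== "array" && field.type !== "object") {
        problems.push(`${where}.children is only meaningful for array and object fields`);
      }
      problems.push(...validateSchema(field.children, `${where}.children`));
    }
  });
  return problems;
}

// Usage: the nested "date" field uses an unknown type ("text") on purpose.
const exampleProblems = validateSchema([
  { name: "accountNumber", type: "string", description: "The account number." },
  {
    name: "transactions",
    type: "array",
    description: "List of transactions.",
    children: [{ name: "date", type: "text", description: "Transaction date." }],
  },
]);
console.log(exampleProblems);
```

Each problem message carries the path to the offending field, which makes mistakes in nested `children` easier to locate.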

### **2. Run `documind`**

Use **`documind`** to process a PDF by passing the file URL and the schema.

```jsx
import { extract } from 'documind';

const runExtraction = async () => {
  const result = await extract({
    file: 'https://bank_statement.pdf',
    schema
  });

  console.log("Extracted Data:", result);
};

runExtraction();
```

### **Example Output**

Here’s an example of what the extracted result might look like:

```json
{
  "success": true,
  "pages": 1,
  "data": {
    "accountNumber": "100002345",
    "openingBalance": 3200,
    "transactions": [
      {
        "date": "2021-05-12",
        "creditAmount": null,
        "debitAmount": 100,
        "description": "transfer to Tom"
      },
      {
        "date": "2021-05-12",
        "creditAmount": 50,
        "debitAmount": null,
        "description": "For lunch the other day"
      },
      {
        "date": "2021-05-13",
        "creditAmount": 20,
        "debitAmount": null,
        "description": "Refund for voucher"
      },
      {
        "date": "2021-05-13",
        "creditAmount": null,
        "debitAmount": 750,
        "description": "May's rent"
      }
    ],
    "closingBalance": 2420
  },
  "fileName": "bank_statement.pdf"
}
```
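
Because the extraction is AI-driven, it can be worth sanity-checking numeric results. For a bank statement, one simple check is that the opening balance plus credits minus debits equals the closing balance. The sketch below uses the figures from the example above; `expectedClosingBalance` and `balancesReconcile` are hypothetical helpers, not part of **`documind`**, and the tolerance value is an assumption meant to absorb floating-point rounding on fractional amounts.

```javascript
// Treat null (no credit or no debit on that row) as zero.
const amount = (value) => (typeof value === "number" ? value : 0);

// Hypothetical helper: opening balance plus credits minus debits.
function expectedClosingBalance(data) {
  return data.transactions.reduce(
    (balance, tx) => balance + amount(tx.creditAmount) - amount(tx.debitAmount),
    data.openingBalance
  );
}

// Hypothetical helper: compare within a small tolerance rather than exactly.
function balancesReconcile(data, tolerance = 0.005) {
  return Math.abs(expectedClosingBalance(data) - data.closingBalance) < tolerance;
}

// Figures from the example output above.
const statement = {
  openingBalance: 3200,
  closingBalance: 2420,
  transactions: [
    { creditAmount: null, debitAmount: 100 },
    { creditAmount: 50, debitAmount: null },
    { creditAmount: 20, debitAmount: null },
    { creditAmount: null, debitAmount: 750 },
  ],
};

console.log(balancesReconcile(statement)); // 3200 - 100 + 50 + 20 - 750 = 2420
```

A failed check does not say which field is wrong, only that the extracted figures are inconsistent and the document deserves a second look.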

## **Contributing**

Contributions are welcome! Please submit a pull request with any improvements or features.

## **License**

This project is licensed under the AGPL v3.0 License.

---
19 changes: 19 additions & 0 deletions core/LICENSE
@@ -0,0 +1,19 @@
The MIT License (MIT)

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
2 changes: 2 additions & 0 deletions core/dist/index.d.ts
@@ -0,0 +1,2 @@
import { DocumindArgs, DocumindOutput } from "./types";
export declare const documind: ({ cleanup, concurrency, filePath, llmParams, maintainFormat, model, openaiAPIKey, outputDir, pagesToConvertAsImages, tempDir, }: DocumindArgs) => Promise<DocumindOutput>;