New Webapp migration (#56)
* feat(v2): loaders added

* feature: Add scroll animations

* feature: upload ui

* feature: upload multiple files

* fix: Same file name and size remove

* feat(crawler): added

* feat(parsers): v2 added more

* feat(v2): audio now working

* feat(v2): all loaders

* feat(v2): explorer

* chore: add links

* feat(api): added status in return message

* refactor(website): remove old code

* feat(upload): return type for messages

* feature: redirect to upload if ENV=local

* fix(chat): fixed some issues

* feature: respect response type

* loading state

* feature: Loading stat

* feat(v2): added explore and chat pages

* feature: modal settings

* style: Chat UI

* feature: scroll to bottom when chatting

* feature: smooth scroll in chat

* feature(anim): Slide chat in

* feature: markdown chat

* feat(explorer): list

* feat(doc): added document item

* feat(explore): added modal

* Add clarification on Project API keys and web interface for migration scripts to Readme (#58)

* fix(demo): changed link

* add support to uploading zip file (#62)

* Catch UnicodeEncodeError exception (#64)

* feature: fixed chatbar

* fix(loaders): missing argument

* fix: layout

* fix: One whole chatbox

* fix: Scroll into view

* fix(build): vercel issues

* chore(streamlit): moved to own file

* refactor(api): moved to backend folder

* feat(docker): added docker compose

* Fix a bug where langchain memories were not being cleaned (#71)

* Update README.md (#70)

* chore(streamlit): moved to own file

* refactor(api): moved to backend folder

* docs(readme): updated for new version

* docs(readme): added old readme

* docs(readme): update copy dot env file

* docs(readme): cleanup

---------

Co-authored-by: iMADi-ARCH <nandanaditya985@gmail.com>
Co-authored-by: Matt LeBel <github@lebel.io>
Co-authored-by: Evan Carlson <45178375+EvanCarlson@users.noreply.github.com>
Co-authored-by: Mustafa Hasan Khan <65130881+mustafahasankhan@users.noreply.github.com>
Co-authored-by: zhulixi <48713110+zlxxlz1026@users.noreply.github.com>
Co-authored-by: Stanisław Tuszyński <stanislaw@tuszynski.me>
7 people authored May 20, 2023
1 parent a80c210 commit f952d7a
Showing 71 changed files with 1,533 additions and 5,708 deletions.
4 changes: 4 additions & 0 deletions .backend_env.example
@@ -0,0 +1,4 @@
SUPABASE_URL="XXXXX"
SUPABASE_SERVICE_KEY="eyXXXXX"
OPENAI_API_KEY="sk-XXXXXX"
anthropic_api_key="XXXXXX"
1 change: 1 addition & 0 deletions .frontend_env.example
@@ -0,0 +1 @@
ENV=local
6 changes: 5 additions & 1 deletion .gitignore
@@ -42,4 +42,8 @@ yarn-error.log*

# typescript
*.tsbuildinfo
next-env.d.ts
next-env.d.ts
quivr/*
streamlit-demo/.streamlit/secrets.toml
.backend_env
.frontend_env
Binary file removed 2023-05-13-02-16-02.png
Binary file not shown.
62 changes: 24 additions & 38 deletions README.md
@@ -1,4 +1,4 @@
# Quivr
# Quivr - Your GenerativeAI Second Brain

<p align="center">
<img src="./logo.png" alt="Quivr-logo" width="30%">
@@ -8,42 +8,52 @@
<img src="https://img.shields.io/badge/discord-join%20chat-blue.svg" alt="Join our Discord" height="40">
</a>

Quivr is your second brain in the cloud, designed to easily store and retrieve unstructured information. It's like Obsidian but powered by generative AI.
Quivr is your GenerativeAI second brain, designed to easily store and retrieve unstructured information. It's like Obsidian but powered by generative AI.


## Features

- **Store Anything**: Quivr can handle almost any type of data you throw at it. Text, images, code snippets, you name it.
- **Generative AI**: Quivr uses advanced AI to help you generate and retrieve information.
- **Fast and Efficient**: Designed with speed and efficiency in mind. Quivr makes sure you can access your data as quickly as possible.
- **Secure**: Your data is stored securely in the cloud and is always under your control.
- **Secure**: Your data is always under your control.
- **Compatible Files**:
- **Text**
- **Markdown**
- **PDF**
- **Powerpoint**
- **Excel**
- **Word**
- **Audio**
- **Video**
- **Open Source**: Quivr is open source and free to use.
## Demo




### The Streamlit demo runs the old version
The new version uses a new UI and is not yet deployed, as it does not yet cover all the features of the old version. It should be up and live before 25/05/23.

### Demo with GPT3.5
https://github.com/StanGirard/quivr/assets/19614572/80721777-2313-468f-b75e-09379f694653


### Demo with Claude 100k context
https://github.com/StanGirard/quivr/assets/5101573/9dba918c-9032-4c8d-9eea-94336d2c8bd4

## Getting Started


## Getting Started with the new version
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
*The README for the old version is in the `streamlit-demo` folder* ([streamlit-demo/README.md](streamlit-demo/README.md)).

### Prerequisites

Make sure you have the following installed before continuing:

- Python 3.10 or higher
- Pip
- Virtualenv
- Docker
- Docker Compose

You'll also need a [Supabase](https://supabase.com/) account for:

Expand All @@ -59,41 +69,18 @@ You'll also need a [Supabase](https://supabase.com/) account for:
git clone git@github.com:StanGirard/Quivr.git && cd Quivr
```

- Create a virtual environment

```bash
virtualenv venv
```

- Activate the virtual environment

```bash
source venv/bin/activate
```

- Install the dependencies

```bash
pip install -r requirements.txt
```

- Copy the streamlit secrets.toml example file
- Copy the `.backend_env` and `.frontend_env` example files

```bash
cp .streamlit/secrets.toml.example .streamlit/secrets.toml
cp .backend_env.example .backend_env
cp .frontend_env.example .frontend_env
```

- Add your credentials to .streamlit/secrets.toml file

```toml
supabase_url = "SUPABASE_URL"
supabase_service_key = "SUPABASE_SERVICE_KEY"
openai_api_key = "OPENAI_API_KEY"
anthropic_api_key = "ANTHROPIC_API_KEY" # Optional
```
- Update the `.backend_env` file

_Note that the `supabase_service_key` is found in your Supabase dashboard under Project Settings -> API. Use the `anon` `public` key found in the `Project API keys` section._


- Run the following migration scripts on the Supabase database via the web interface (SQL Editor -> `New query`)

```sql
@@ -156,13 +143,12 @@ create table
- Run the app

```bash
streamlit run main.py
docker compose build && docker compose up
```
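
Once the containers are up, a quick sanity check is to hit the FastAPI root route. This is only a sketch: it assumes the backend ends up reachable on port 5000 (the port used in `backend/Dockerfile`); adjust the URL if your compose file maps a different host port.

```python
# Sketch: confirm the backend answers after `docker compose up`.
import requests

resp = requests.get("http://localhost:5000/")  # port 5000 per backend/Dockerfile
print(resp.json())                             # expected: {"message": "Hello World"}
```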

## Built With

* [Python](https://www.python.org/) - The programming language used.
* [Streamlit](https://streamlit.io/) - The web framework used.
* [Supabase](https://supabase.io/) - The open source Firebase alternative.

## Contributing
11 changes: 11 additions & 0 deletions backend/Dockerfile
@@ -0,0 +1,11 @@
FROM python:3.11

WORKDIR /code

COPY ./requirements.txt /code/requirements.txt

RUN pip install --no-cache-dir --upgrade -r /code/requirements.txt

COPY . /code/

CMD ["uvicorn", "api:app", "--reload", "--host", "0.0.0.0", "--port", "5000"]
171 changes: 171 additions & 0 deletions backend/api.py
@@ -0,0 +1,171 @@
from fastapi import FastAPI, UploadFile, File, HTTPException
import os
from pydantic import BaseModel
from typing import List, Tuple
from supabase import create_client, Client
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.memory import ConversationBufferMemory
from langchain.vectorstores import SupabaseVectorStore
from langchain.chains import ConversationalRetrievalChain
from langchain.llms import OpenAI
from langchain.chat_models import ChatAnthropic  # needed for the Claude branch in /chat
from fastapi.openapi.utils import get_openapi
from tempfile import SpooledTemporaryFile
import shutil


from parsers.common import file_already_exists
from parsers.txt import process_txt
from parsers.csv import process_csv
from parsers.docx import process_docx
from parsers.pdf import process_pdf
from parsers.markdown import process_markdown
from parsers.powerpoint import process_powerpoint
from parsers.html import process_html
from parsers.audio import process_audio
from crawl.crawler import CrawlWebsite


from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

origins = [
"http://localhost",
"http://localhost:3000",
"http://localhost:8080",
]

app.add_middleware(
CORSMiddleware,
allow_origins=origins,
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"],
)

supabase_url = os.environ.get("SUPABASE_URL")
supabase_key = os.environ.get("SUPABASE_SERVICE_KEY")
openai_api_key = os.environ.get("OPENAI_API_KEY")
anthropic_api_key = ""
supabase: Client = create_client(supabase_url, supabase_key)
embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)
vector_store = SupabaseVectorStore(
supabase, embeddings, table_name="documents")
memory = ConversationBufferMemory(
memory_key="chat_history", return_messages=True)
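# Note: this ConversationBufferMemory is created at module level, so it is shared
# by every request handled by this process rather than being per-user or per-session.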


class ChatMessage(BaseModel):
model: str = "gpt-3.5-turbo"
question: str
history: List[Tuple[str, str]] # A list of tuples where each tuple is (speaker, text)
temperature: float = 0.0
max_tokens: int = 256
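    # Example JSON body for POST /chat/ (a sketch; fields follow the model above):
    # {
    #   "model": "gpt-3.5-turbo",
    #   "question": "What is Quivr?",
    #   "history": [["user", "Hi"], ["assistant", "Hello!"]],
    #   "temperature": 0.0,
    #   "max_tokens": 256
    # }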






file_processors = {
".txt": process_txt,
".csv": process_csv,
".md": process_markdown,
".markdown": process_markdown,
".m4a": process_audio,
".mp3": process_audio,
".webm": process_audio,
".mp4": process_audio,
".mpga": process_audio,
".wav": process_audio,
".mpeg": process_audio,
".pdf": process_pdf,
".html": process_html,
".pptx": process_powerpoint,
".docx": process_docx
}
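# This mapping drives filter_file() below: the uploaded file's extension selects the
# parser, and unlisted extensions are rejected with a "not supported" message.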

async def filter_file(file: UploadFile, supabase, vector_store, stats_db):
if await file_already_exists(supabase, file):
return {"message": f"🤔 {file.filename} already exists.", "type": "warning"}
elif file.file._file.tell() < 1:
return {"message": f"❌ {file.filename} is empty.", "type": "error"}
else:
file_extension = os.path.splitext(file.filename)[-1]
if file_extension in file_processors:
await file_processors[file_extension](vector_store, file, stats_db=None)
return {"message": f"✅ {file.filename} has been uploaded.", "type": "success"}
else:
return {"message": f"❌ {file.filename} is not supported.", "type": "error"}

@app.post("/upload")
async def upload_file(file: UploadFile):
message = await filter_file(file, supabase, vector_store, stats_db=None)
return message

@app.post("/chat/")
async def chat_endpoint(chat_message: ChatMessage):
history = chat_message.history
    # Build a conversational retrieval chain for the requested model.
    qa = None
    if chat_message.model.startswith("gpt"):
        qa = ConversationalRetrievalChain.from_llm(
            OpenAI(
                model_name=chat_message.model, openai_api_key=openai_api_key,
                temperature=chat_message.temperature, max_tokens=chat_message.max_tokens),
            vector_store.as_retriever(), memory=memory, verbose=True)
    elif anthropic_api_key and chat_message.model.startswith("claude"):
        qa = ConversationalRetrievalChain.from_llm(
            ChatAnthropic(
                model=chat_message.model, anthropic_api_key=anthropic_api_key,
                temperature=chat_message.temperature, max_tokens_to_sample=chat_message.max_tokens),
            vector_store.as_retriever(), memory=memory, verbose=True, max_tokens_limit=102400)

history.append(("user", chat_message.question))
model_response = qa({"question": chat_message.question})
history.append(("assistant", model_response["answer"]))

return {"history": history}

@app.post("/crawl/")
async def crawl_endpoint(crawl_website: CrawlWebsite):

file_path, file_name = crawl_website.process()

# Create a SpooledTemporaryFile from the file_path
spooled_file = SpooledTemporaryFile()
with open(file_path, 'rb') as f:
shutil.copyfileobj(f, spooled_file)

# Pass the SpooledTemporaryFile to UploadFile
file = UploadFile(file=spooled_file, filename=file_name)
    message = await filter_file(file, supabase, vector_store, stats_db=None)
print(message)
return {"message": message}

@app.get("/explore")
async def explore_endpoint():
response = supabase.table("documents").select("name:metadata->>file_name, size:metadata->>file_size", count="exact").execute()
documents = response.data # Access the data from the response
# Convert each dictionary to a tuple of items, then to a set to remove duplicates, and then back to a dictionary
unique_data = [dict(t) for t in set(tuple(d.items()) for d in documents)]
# Sort the list of documents by size in decreasing order
unique_data.sort(key=lambda x: int(x['size']), reverse=True)

return {"documents": unique_data}

@app.delete("/explore/{file_name}")
async def delete_endpoint(file_name: str):
response = supabase.table("documents").delete().match({"metadata->>file_name": file_name}).execute()
return {"message": f"{file_name} has been deleted."}

@app.get("/explore/{file_name}")
async def download_endpoint(file_name: str):
response = supabase.table("documents").select("metadata->>file_name, metadata->>file_size, metadata->>file_extension, metadata->>file_url").match({"metadata->>file_name": file_name}).execute()
documents = response.data
### Returns all documents with the same file name
return {"documents": documents}



@app.get("/")
async def root():
return {"message": "Hello World"}
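
The endpoints above can be exercised with a small client script. The following is a minimal sketch, not part of the repository: it assumes the API is reachable on `http://localhost:5000` (the port used in `backend/Dockerfile`) and uses a made-up file name (`notes.txt`).

```python
# sketch_client.py — minimal sketch of calling the backend endpoints.
import requests

BASE_URL = "http://localhost:5000"  # adjust if docker-compose maps another host port

# Upload a document; the response mirrors filter_file()'s message/type dict.
with open("notes.txt", "rb") as f:
    resp = requests.post(f"{BASE_URL}/upload", files={"file": ("notes.txt", f, "text/plain")})
print(resp.json())  # e.g. {"message": "✅ notes.txt has been uploaded.", "type": "success"}

# Ask a question; history is a list of (speaker, text) pairs as in the ChatMessage model.
payload = {
    "model": "gpt-3.5-turbo",
    "question": "What do my notes say about Quivr?",
    "history": [],
    "temperature": 0.0,
    "max_tokens": 256,
}
resp = requests.post(f"{BASE_URL}/chat/", json=payload)
print(resp.json()["history"][-1])  # the assistant's latest answer

# List stored documents (name/size pairs, deduplicated and sorted by size).
print(requests.get(f"{BASE_URL}/explore").json())
```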


43 changes: 43 additions & 0 deletions backend/crawl/crawler.py
@@ -0,0 +1,43 @@
import requests
from pydantic import BaseModel
import re
import unicodedata
import tempfile
import os


class CrawlWebsite(BaseModel):
url : str
js : bool = False
depth : int = 1
max_pages : int = 100
max_time : int = 60

def _crawl(self, url):
response = requests.get(url)
if response.status_code == 200:
return response.text
else:
return None

    def process(self):
        content = self._crawl(self.url)
        if content is None:
            return None
        # Write the crawled HTML to a temporary file named after the slugified URL
        file_name = slugify(self.url) + ".html"
        temp_file_path = os.path.join(tempfile.gettempdir(), file_name)
        with open(temp_file_path, 'w') as temp_file:
            temp_file.write(content)
        return temp_file_path, file_name


def slugify(text):
text = unicodedata.normalize('NFKD', text).encode('ascii', 'ignore').decode('utf-8')
text = re.sub(r'[^\w\s-]', '', text).strip().lower()
text = re.sub(r'[-\s]+', '-', text)
return text
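
A minimal usage sketch for the crawler on its own (the URL is only an illustration; run it from the `backend` folder so the `crawl` package is importable). `process()` returns a `(temp_file_path, file_name)` pair, or `None` when the fetch fails.

```python
from crawl.crawler import CrawlWebsite

crawler = CrawlWebsite(url="https://example.com")  # js/depth/max_pages/max_time keep their defaults
result = crawler.process()
if result is not None:
    temp_file_path, file_name = result
    print(f"Saved crawled page to {temp_file_path} ({file_name})")
else:
    print("Crawl failed: the page could not be fetched")
```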
