Commit: add readme v1
liyin2015 committed Jul 3, 2024
1 parent 829a1dc commit 45e61ad
Showing 3 changed files with 132 additions and 86 deletions.
78 changes: 41 additions & 37 deletions README.md
@@ -1,7 +1,11 @@
# Introduction
![LightRAG Logo](docs/source/_static/images/LightRAG-logo-doc.jpeg)

⚡ The PyTorch Library for Large Language Model (LLM) Applications ⚡

LightRAG is the `PyTorch` library for building large language model (LLM) applications. We help developers with both building and optimizing `Retriever`-`Agent`-`Generator` (RAG) pipelines.
It is *light*, *modular*, and *robust*.

**PyTorch**

@@ -58,46 +62,46 @@ class SimpleQA(Component):
return await self.generator.acall({"input_str": query})
```
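The `call`/`acall` pair above is a standard sync/async pattern. As an illustration only — with a hypothetical `StubGenerator` standing in for LightRAG's real `Generator`, so no API key is needed — the pattern can be exercised like this:

```python
import asyncio

class StubGenerator:
    """Hypothetical stand-in for LightRAG's Generator: echoes the input
    instead of calling a model provider."""
    def __call__(self, prompt_kwargs):
        return f"answer to: {prompt_kwargs['input_str']}"

    async def acall(self, prompt_kwargs):
        return self(prompt_kwargs)

class SimpleQA:
    def __init__(self):
        self.generator = StubGenerator()

    def call(self, query):
        # Synchronous path: invoke the generator directly.
        return self.generator({"input_str": query})

    async def acall(self, query):
        # Asynchronous path: await the generator's async entry point.
        return await self.generator.acall({"input_str": query})

qa = SimpleQA()
print(qa.call("What is LightRAG?"))                # sync path
print(asyncio.run(qa.acall("What is LightRAG?")))  # async path
```

With the real `Generator`, both paths return the model's completion; the stub only demonstrates the control flow.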

## Quick Install

Install LightRAG with pip:

```bash
pip install lightrag
```

Please refer to the [full installation guide](https://lightrag.sylph.ai/get_started/installation.html) for more details.

## Simplicity

Developers who are building real-world Large Language Model (LLM) applications are the real heroes.
As a library, we provide them with the fundamental building blocks with 100% clarity and simplicity.

* Two fundamental and powerful base classes: `Component` for the pipeline and `DataClass` for data interaction with LLMs.
* We end up with fewer than two levels of subclasses ([class hierarchy visualization](https://lightrag.sylph.ai/developer_notes/class_hierarchy.html)).
* The result is a library with bare-minimum abstraction, providing developers with maximum customizability.

Similar to a PyTorch module, our `Component` provides excellent visualization of the pipeline structure:

```
SimpleQA(
(generator): Generator(
model_kwargs={'model': 'llama3-8b-8192'},
(prompt): Prompt(
template: <SYS>
You are a helpful assistant.
</SYS>
User: {{input_str}}
You:
, prompt_variables: ['input_str']
)
(model_client): GroqAPIClient()
)
)
```
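The printout above is produced by a recursive `repr` over nested components, in the spirit of PyTorch's `nn.Module`. A toy sketch of the idea — an illustration only, not LightRAG's actual `Component` implementation:

```python
class Component:
    """Toy base class: any attribute that is itself a Component is
    treated as a child and rendered indented under its parent."""
    def _child_components(self):
        return {k: v for k, v in vars(self).items() if isinstance(v, Component)}

    def __repr__(self):
        lines = [self.__class__.__name__ + "("]
        for name, child in self._child_components().items():
            # Indent the child's multi-line repr under this component.
            child_repr = repr(child).replace("\n", "\n  ")
            lines.append(f"  ({name}): {child_repr}")
        lines.append(")")
        return "\n".join(lines)

class Prompt(Component):
    pass

class Generator(Component):
    def __init__(self):
        self.prompt = Prompt()

class SimpleQA(Component):
    def __init__(self):
        self.generator = Generator()

print(repr(SimpleQA()))
```

Because the repr recurses through attributes, nesting the pipeline one level deeper automatically shows up one indent level deeper, with no extra bookkeeping.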

## Controllability

Our simplicity did not come from doing 'less'.
On the contrary, we have to do 'more' and go 'deeper' and 'wider' on every topic to offer developers maximum control and robustness.

* LLMs are sensitive to the prompt. We give developers full control over their prompts, without relying on API features such as tools and JSON format, through components like `Prompt`, `OutputParser`, `FunctionTool`, and `ToolManager`.
* Our goal is not to optimize for integration, but to provide a robust abstraction with representative examples. See this in `ModelClient` and `Retriever`.
* All integrations, such as different API SDKs, are packaged as optional extras within the same library, so you can easily switch between models from any provider we officially support.
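Handling structured output at the prompt level, rather than through a provider's JSON mode, can be sketched as follows. This is an illustration of the general technique only, not LightRAG's actual `OutputParser` API (`parse_json_output` is a hypothetical helper):

```python
import json
import re

def parse_json_output(text: str):
    """Hypothetical sketch: pull the first JSON object out of raw LLM
    text, tolerating chatter around it."""
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model output")
    return json.loads(match.group(0))

raw = 'Sure! Here is the answer:\n{"answer": "Paris", "confidence": 0.9}'
print(parse_json_output(raw))
```

Because the parsing lives in our code rather than in a provider-specific API flag, the same prompt and parser work unchanged across providers.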
## Future of LLM Applications

On top of ease of use, we particularly optimize the configurability of components so researchers can build their own solutions and benchmark existing ones.
Just as PyTorch united researchers and production teams, LightRAG enables a smooth transition from research to production:
with researchers building on LightRAG, production engineers can easily take over a method and test and iterate on it with their production data, and researchers get their code adapted into more products.

# Documentation

The full LightRAG documentation is available at [lightrag.sylph.ai](https://lightrag.sylph.ai/):

- [Introduction](https://lightrag.sylph.ai/)
- [Full installation guide](https://lightrag.sylph.ai/get_started/installation.html)
- [Design philosophy](https://lightrag.sylph.ai/developer_notes/lightrag_design_philosophy.html)
- [Class hierarchy](https://lightrag.sylph.ai/developer_notes/class_hierarchy.html)
- [Tutorials](https://lightrag.sylph.ai/developer_notes/index.html)
- [API reference](https://lightrag.sylph.ai/apis/index.html)



## Contributors

[![contributors](https://contrib.rocks/image?repo=SylphAI-Inc/LightRAG&max=2000)](https://github.com/SylphAI-Inc/LightRAG/graphs/contributors)

# Citation

```bibtex
@software{Yin-LightRAG-2024,
author = {Yin, Li},
title = {{LightRAG: The PyTorch Library for Large Language Model (LLM) Applications}},
month = {7},
year = {2024},
url = {https://github.com/SylphAI-Inc/LightRAG}
}
```
2 changes: 1 addition & 1 deletion docs/source/get_started/installation.rst
@@ -54,7 +54,7 @@ Or, you can load it yourself with ``python-dotenv``:
You can place the above code in your project's root ``__init__.py`` file.
This setup ensures that LightRAG can access all necessary configurations during runtime.
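What loading the ``.env`` file accomplishes can be sketched in plain Python — a minimal stand-in for ``python-dotenv``'s ``load_dotenv``, for illustration only (``load_env_file`` is a hypothetical helper, not part of LightRAG):

```python
import os

def load_env_file(path=".env"):
    """Hypothetical sketch of what load_dotenv does: parse KEY=VALUE
    lines from a .env file and put them into os.environ."""
    with open(path) as f:
        for raw in f:
            line = raw.strip()
            # Skip blanks, comments, and lines without an assignment.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # setdefault: never overwrite variables already in the environment.
            os.environ.setdefault(key.strip(), value.strip())
```

In practice, use ``python-dotenv`` itself; the sketch only shows why the keys become visible to LightRAG at runtime.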

1. Install Optional Packages
4. Install Optional Packages
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


138 changes: 90 additions & 48 deletions lightrag/README.md
@@ -1,66 +1,108 @@
<!-- # LightRAG

[1000 lines of code are all you need](https://github.com/Sylph-AI/LightRAG/blob/main/lightrag/light_rag.py). No lock-in to vendors and frameworks, only the best practices of **productionable RAG and Agent**.

## What is LightRAG?

LightRAG comes from the best of AI research and engineering. Fundamentally, we ask ourselves: what kind of system combines the best of research (such as LLMs) and engineering (such as `jinja`) to build the best applications?
We are not a framework. We do not want you to directly install the package. We want you to carefully decide to take modules and structures from here to build your own library and applications. This is a cookbook organized uniquely for easy understanding: you can read the 1000 lines of code to see a typical RAG end-to-end without jumping between files and going through multi-level class inheritance. If we build our systems expanding from `light_rag.py`, we as a community will share the same RAG language, and share other building blocks and use cases easily without depending on a complex framework.

**Our principles:**

- We recognize that building a RAG is a highly complex and customized process, and no framework can offer that flexibility.
- We seek to understand, not to establish.
- We seek to be flexible, not rigid.
- We seek to be organized and clear, not complex and confusing.
- We seek to be open, with open contributions from the community and research on use cases; maybe we will come up with a more established framework in the future.
- We seek to learn from code; we don't read documentation that calls very high-level functions and classes.

Note: If we can't understand your code quickly and easily, you should not add it to the library.

This is a new beginning, where all developers can come together, enjoy learning and building, have fun, and build the best RAGs in the world.

We stay neutral to all frameworks and all vendors, and we do not allow vendors to merge code without a clear, articulate community vouch for their performance and a comparison with other vendor or open-source solutions.

**Our opinions:**

We are very opinionated, but we ground our opinions in best practices. Here are some of our unique opinions:

- We code in a way that lets us switch between model providers, or between open-source and proprietary models, easily.

  How? An LLM is a "text-in-text-out" model. `Prompts` are the new model parameters--in-context learning. We want full control over them. Any model-provider API that manipulates our input prompts and output text, we don't use. Three examples are OpenAI's `role`, `function call` (`tool`), and `json output mode`. The problem with these: our prompt cannot be directly adapted to other LLM providers, and we lose transparency, adding more uncontrollable variables to our system.
- We write the `prompts` all together, like writing a document, instead of separating them into multiple strings or variables. We think `jinja2` speaks the prompt language best. [Here, show how LlamaIndex adds different prompts together while we put all of them together.] Yes, we think manual prompt engineering is just a stage, like manually labeling your data. In the future, another LLM will take your description and be optimized to write your prompt, or a `hypernetwork` will convert the lengthy prompt into parameters that can be plugged into the model.

# Structure

<p align="center">
<img src="images/lightrag_structure.png" alt="LightRAG structure" width="800">
<br>
<em>LightRAG structure</em>
</p>

## Foundation
- `lightrag/`: All core data structures, the core 1000 lines of code to cover the essence of a performant RAG.

## Building blocks
- `extend/`: All modules that can be used to extend the core RAG.
  1. Mainly functional modules: we can extend different `Embedder`, `Retriever`, `Generator`, `Agent`, `Reranker`, etc.
  2. Tracking or monitoring modules, such as `CallbackManager` and the `Phoenix` integration. When necessary, please add a `README.md` to explain why the module is needed and how to use it.
- `tests/`: All tests for the core RAG and its extensions, including `dummy modules` for testing new modules.

## End-to-end applications
- `use_cases/`: All use cases that can be solved using LightRAG. For instance, `Question Answering`, `Summarization`, `Information Extraction`, etc.

To add a new use case, add a new folder in `use_cases/` named `application_name` with the following files:
- `/prompt`: a directory containing all prompts used in the application.
- `/data`: a directory containing all data used in the application, or instructions on how to download the data.
- `/main.py`: a file containing the main code to run the application.

# What is not part of LightRAG?
- Data processing: For instance, LlamaIndex has `from llama_index.core.ingestion import IngestionPipeline`, which transforms data in either `Document` or `Chunk` form. We do not cover this in LightRAG. Similarly, `from llama_index.core.postprocessor import SimilarityPostprocessor` processes the retrieved `chunk`s, sometimes with further filtering.

# How to start?

1. Clone the repository.
2. Set up API keys by making a copy of `.env.example` as `.env` and filling in the necessary API keys.
3. Set up the Python environment using `poetry install`, and activate it using `poetry shell`.
4. (For contributors only) Install pre-commit into your git hooks using `pre-commit install`, which will automatically check the code standard on every commit.
5. Now you can run any file in the repo. -->

![LightRAG Logo](../docs/source/_static/images/LightRAG-logo-doc.jpeg)

⚡ The PyTorch Library for Large Language Model (LLM) Applications ⚡

We help developers with both building and optimizing `Retriever`-`Agent`-`Generator` (RAG) pipelines.
It is *light*, *modular*, and *robust*.

**PyTorch**

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, 3, 1)
        self.conv2 = nn.Conv2d(32, 64, 3, 1)
        self.dropout1 = nn.Dropout2d(0.25)
        self.dropout2 = nn.Dropout2d(0.5)
        self.fc1 = nn.Linear(9216, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, 2)
        x = self.dropout1(x)
        x = torch.flatten(x, 1)  # flatten to (batch, 9216) before the linear layer
        x = F.relu(self.fc1(x))
        x = self.dropout2(x)
        return self.fc2(x)
```

**LightRAG**

```python
from lightrag.core import Component, Generator
from lightrag.components.model_client import GroqAPIClient
from lightrag.utils import setup_env  # noqa

class SimpleQA(Component):
    def __init__(self):
        super().__init__()
        template = r"""<SYS>
You are a helpful assistant.
</SYS>
User: {{input_str}}
You:
"""
        self.generator = Generator(
            model_client=GroqAPIClient(),
            model_kwargs={"model": "llama3-8b-8192"},
            template=template,
        )

    def call(self, query):
        return self.generator({"input_str": query})

    async def acall(self, query):
        return await self.generator.acall({"input_str": query})
```

## Quick Install

Install LightRAG with pip:

```bash
pip install lightrag
```

Please refer to the [full installation guide](https://lightrag.sylph.ai/get_started/installation.html) for more details.
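The `{{input_str}}` placeholder in the `SimpleQA` template above is `jinja2` syntax. As a minimal illustration of what rendering such a template means — using only the standard library, not LightRAG's actual jinja2-based rendering, with a hypothetical `render` helper:

```python
import re

def render(template: str, **variables) -> str:
    """Hypothetical toy stand-in for jinja2 rendering: substitute
    {{ name }} placeholders with the given variables."""
    def sub(match):
        return str(variables[match.group(1)])
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", sub, template)

template = "User: {{input_str}}\nYou:"
print(render(template, input_str="What is LightRAG?"))
```

Real jinja2 additionally supports loops, conditionals, and filters, which is why the library writes whole prompts as single jinja2 documents rather than concatenated string fragments.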

# Documentation

The full LightRAG documentation is available at [lightrag.sylph.ai](https://lightrag.sylph.ai/):

- [Introduction](https://lightrag.sylph.ai/)
- [Full installation guide](https://lightrag.sylph.ai/get_started/installation.html)
- [Design philosophy](https://lightrag.sylph.ai/developer_notes/lightrag_design_philosophy.html)
- [Class hierarchy](https://lightrag.sylph.ai/developer_notes/class_hierarchy.html)
- [Tutorials](https://lightrag.sylph.ai/developer_notes/index.html)
- [API reference](https://lightrag.sylph.ai/apis/index.html)



## Contributors

[![contributors](https://contrib.rocks/image?repo=SylphAI-Inc/LightRAG&max=2000)](https://github.com/SylphAI-Inc/LightRAG/graphs/contributors)

# Citation

```bibtex
@software{Yin-LightRAG-2024,
author = {Yin, Li},
title = {{LightRAG: The PyTorch Library for Large Language Model (LLM) Applications}},
month = {7},
year = {2024},
url = {https://github.com/SylphAI-Inc/LightRAG}
}
```
