Commit c20e0b1: update the documentation main page, remove author names in tutorials

liyin2015 committed Jul 6, 2024
1 parent 4524fea commit c20e0b1
Showing 17 changed files with 211 additions and 72 deletions.
1 change: 1 addition & 0 deletions docs/source/apis/core/index.rst
@@ -14,6 +14,7 @@ Overview

    core.base_data_class
    core.component
+   core.container
    core.default_prompt_template
    core.embedder
    core.generator
1 change: 1 addition & 0 deletions docs/source/apis/index.rst
@@ -17,6 +17,7 @@ All base/abstract classes, core components like generator, embedder, and basic f

 .. autosummary::
    core.component
+   core.container
    core.base_data_class
    core.default_prompt_template
    core.model_client
6 changes: 3 additions & 3 deletions docs/source/developer_notes/base_data_class.rst
@@ -2,10 +2,10 @@
 DataClass
 ============

-.. admonition:: Author
-   :class: highlight
+.. .. admonition:: Author
+..    :class: highlight
-`Li Yin <https://github.com/liyin2015>`_
+.. `Li Yin <https://github.com/liyin2015>`_

 In `PyTorch`, ``Tensor`` is the data type used in ``Module`` and ``Optimizer`` across the library.
 Tensor wraps a multi-dimensional matrix to better support its operations and computations.
6 changes: 3 additions & 3 deletions docs/source/developer_notes/db.rst
@@ -1,10 +1,10 @@
 Data & RAG
 ====================

-.. admonition:: Author
-   :class: highlight
+.. .. admonition:: Author
+..    :class: highlight
-`Li Yin <https://github.com/liyin2015>`_
+.. `Li Yin <https://github.com/liyin2015>`_

 The purpose of this note is to provide an overview on data, data modeling, and data storage in LLM applications along with how LightRAG works with data.
6 changes: 3 additions & 3 deletions docs/source/developer_notes/embedder.rst
@@ -1,9 +1,9 @@
 Embedder
 ============
-.. admonition:: Author
-   :class: highlight
+.. .. admonition:: Author
+..    :class: highlight
-`Li Yin <https://github.com/liyin2015>`_
+.. `Li Yin <https://github.com/liyin2015>`_

 What you will learn?
6 changes: 3 additions & 3 deletions docs/source/developer_notes/evaluation.rst
@@ -1,10 +1,10 @@
 LLM Evaluation
 ====================================

-.. admonition:: Author
-   :class: highlight
+.. .. admonition:: Author
+..    :class: highlight
-`Meng Liu <https://github.com/mengliu1998>`_
+.. `Meng Liu <https://github.com/mengliu1998>`_

 "You cannot improve what you cannot measure". This is especially true in the context of LLMs, which have become increasingly popular due to their impressive performance on a wide range of tasks. Evaluating LLMs and their applications is crucial in both research and production to understand their capabilities and limitations.
 Overall, such evaluation is a complex and multifaceted process. Below, we provide a guideline for evaluating LLMs and their applications, incorporating aspects outlined by *Chang et al.* [1]_:
24 changes: 12 additions & 12 deletions docs/source/developer_notes/logging.rst
@@ -106,22 +106,22 @@ Config 3 can be quite neat:
 - You can enable different levels of logging for the library and your application.
 - You can easily focus on debugging your own code without being distracted by the library logs and still have the option to see the library logs if needed.

-Create a named logger
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+.. Create a named logger
+.. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-.. code-block:: python
+.. .. code-block:: python
-   from lightrag.utils.logger import get_logger
+..    from lightrag.utils.logger import get_logger
-   app_logger = get_logger(name="my_app", level="DEBUG", save_dir="./logs")  # log to ./logs/my_app.log
-   # or
-   logger = get_logger(name=__name__, level="DEBUG", save_dir="./logs", filename="my_app.log")
+..    app_logger = get_logger(name="my_app", level="DEBUG", save_dir="./logs")  # log to ./logs/my_app.log
+..    # or
+..    logger = get_logger(name=__name__, level="DEBUG", save_dir="./logs", filename="my_app.log")
-   app_logger.debug("This is a debug message")
-   app_logger.info("This is an info message")
-   app_logger.warning("This is a warning message")
-   app_logger.error("This is an error message")
-   app_logger.critical("This is a critical message")
+..    app_logger.debug("This is a debug message")
+..    app_logger.info("This is an info message")
+..    app_logger.warning("This is a warning message")
+..    app_logger.error("This is an error message")
+..    app_logger.critical("This is a critical message")
.. admonition:: References
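The "different levels for the library and your application" idea above can be sketched with Python's standard ``logging`` module alone. This is a hypothetical illustration, not LightRAG's ``get_logger``: the root (library) level stays quiet while a named application logger runs at ``DEBUG``.

```python
import logging

# Root/library default: only warnings and above are processed.
logging.basicConfig(level=logging.WARNING)

# Application logger: verbose, without touching the library's level.
app_logger = logging.getLogger("my_app")
app_logger.setLevel(logging.DEBUG)

app_logger.debug("This is a debug message")        # emitted via the root handler
logging.getLogger("lightrag").debug("library noise")  # suppressed at WARNING
```

Because child loggers inherit the root level unless explicitly overridden, only ``my_app`` becomes verbose here.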
6 changes: 3 additions & 3 deletions docs/source/developer_notes/model_client.rst
@@ -1,10 +1,10 @@
 ModelClient
 ============

-.. admonition:: Author
-   :class: highlight
+.. .. admonition:: Author
+..    :class: highlight
-`Li Yin <https://github.com/liyin2015>`_
+.. `Li Yin <https://github.com/liyin2015>`_

 What you will learn?
6 changes: 3 additions & 3 deletions docs/source/developer_notes/prompt.rst
@@ -1,9 +1,9 @@
 Prompt
 ============
-.. admonition:: Author
-   :class: highlight
+.. .. admonition:: Author
+..    :class: highlight
-`Li Yin <https://github.com/liyin2015>`_
+.. `Li Yin <https://github.com/liyin2015>`_

 Context
 ----------------
6 changes: 3 additions & 3 deletions docs/source/developer_notes/retriever.rst
@@ -1,10 +1,10 @@
 Retriever
 ============

-.. admonition:: Author
-   :class: highlight
+.. .. admonition:: Author
+..    :class: highlight
-`Li Yin <https://github.com/liyin2015>`_
+.. `Li Yin <https://github.com/liyin2015>`_

 Context
 ------------------
29 changes: 14 additions & 15 deletions docs/source/developer_notes/text_splitter.rst
@@ -1,9 +1,9 @@
 Text Splitter
 -----------------
-.. admonition:: Author
-   :class: highlight
+.. .. admonition:: Author
+..    :class: highlight
-`Xiaoyi Gu <https://github.com/Alleria1809>`_
+.. `Xiaoyi Gu <https://github.com/Alleria1809>`_

 In this tutorial, we will learn:

@@ -19,7 +19,7 @@ LLMs’s context window is limited and the performance often drops with very lon
Shorter content is more manageable and fits memory constraint.
The goal of the text splitter is to chunk large data into smaller ones, potentially improving embedding and retrieving.

-The ``TextSplitter`` is designed to efficiently process and chunk **plain text**. 
+The ``TextSplitter`` is designed to efficiently process and chunk **plain text**.
It leverages configurable separators to facilitate the splitting of :obj:`document object <core.types.Document>` into smaller manageable document chunks.

How does it work
@@ -30,11 +30,11 @@ The texts inside each window will get merged to a smaller chunk. The generated c

**Splitting Types**

-``TextSplitter`` supports 2 types of splitting. 
+``TextSplitter`` supports 2 types of splitting.

* **Type 1:** Specify the exact text splitting point such as space<" "> and periods<".">. It is intuitive, for example, split_by "word":

-:: 
+::

"Hello, world!" -> ["Hello, " ,"world!"]

@@ -48,7 +48,7 @@ This aligns with how models see text in the form of tokens (`Reference <https://
Tokenizer reflects the real token numbers the models take in and helps the developers control budgets.

**Definitions**

* **split_by** specifies the split rule, i.e. the smallest unit during splitting. We support ``"word"``, ``"sentence"``, ``"page"``, ``"passage"``, and ``"token"``. The splitter utilizes the corresponding separator from the ``SEPARATORS`` dictionary.
For Type 1 splitting, we apply Python's ``str.split()`` to break the text.

@@ -57,15 +57,15 @@
.. note::
For option ``token``, its separator is "" because we directly split by a tokenizer, instead of text point.

-* **chunk_size** is the maximum number of units in each chunk. 
+* **chunk_size** is the maximum number of units in each chunk.

* **chunk_overlap** is the number of units that each chunk should overlap. Including context at the borders prevents sudden meaning shift in text between sentences/context, especially in sentiment analysis.

Here are examples of how ``split_by`` and ``chunk_size`` work with ``chunk_overlap``.
-Document Text: 
+Document Text:

::

Hello, this is lightrag. Please implement your splitter here.


@@ -94,12 +94,12 @@ When splitting by ``word`` with ``chunk_size`` = 5 and ``chunk_overlap`` = 2,
each chunk will repeat 2 words from the previous chunk. These 2 words are set by ``chunk_overlap``.
This means each chunk has ``5-2=3`` word(split unit) difference compared with its previous.

-When splitting using tokenizer, each chunk still keeps 5 tokens. 
+When splitting using tokenizer, each chunk still keeps 5 tokens.
For example, the tokenizer transforms ``lightrag`` to ['l', 'igh', 'trag']. So the second chunk is actually ``is`` + ``l`` + ``igh`` + ``trag`` + ``.``.

.. note::
   ``chunk_overlap`` should always be smaller than ``chunk_size``, otherwise the window won't move and the splitting gets stuck.
-When ``split_by`` = ``token``, the punctuation is considered as a token. 
+When ``split_by`` = ``token``, the punctuation is considered as a token.
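The sliding-window behavior documented in this file can be sketched in a few lines. This is a hypothetical standalone splitter for illustration only, not the library's ``TextSplitter``; it reproduces the documented ``word`` example with ``chunk_size`` = 5 and ``chunk_overlap`` = 2.

```python
def split_text(text: str, chunk_size: int = 5, chunk_overlap: int = 2) -> list[str]:
    """Sliding-window splitter by word: each chunk repeats `chunk_overlap`
    words from the previous one, so the window advances by
    chunk_size - chunk_overlap words per step."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    words = text.split(" ")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks

chunks = split_text("Hello, this is lightrag. Please implement your splitter here.")
```

With 9 words, the window advances 3 words per step, giving three chunks; the second chunk repeats "lightrag. Please" from the first, as the overlap rule describes.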

How to use it
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -142,6 +142,5 @@ For **PDFs**, developers will need to extract the text before using the splitter

Customization Tips
~~~~~~~~~~~~~~~~~~~~~
-You can also customize the ``SEPARATORS``. For example, by defining ``SEPARATORS`` = {"question": "?"} and setting ``split_by`` = "question", the document will be split at each ``?``, ideal for processing text structured 
+You can also customize the ``SEPARATORS``. For example, by defining ``SEPARATORS`` = {"question": "?"} and setting ``split_by`` = "question", the document will be split at each ``?``, ideal for processing text structured
 as a series of questions. If you need to customize :class:`tokenizer <lightrag.core.tokenizer.Tokenizer>`, please check `Reference <https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb>`_.
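The question-splitting idea can be sketched as follows. The helper is hypothetical, shown only to illustrate the custom-separator behavior of splitting at each ``?`` and re-attaching the separator.

```python
def split_by_question(text: str) -> list[str]:
    """Split at each '?' and re-attach the separator to its chunk."""
    return [part.strip() + "?" for part in text.split("?") if part.strip()]

split_by_question("What is RAG? How does a splitter help?")
```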

6 changes: 3 additions & 3 deletions docs/source/developer_notes/tool_helper.rst
@@ -1,9 +1,9 @@
 Function calls
 ===========================
-.. admonition:: Author
-   :class: highlight
+.. .. admonition:: Author
+..    :class: highlight
-`Li Yin <https://github.com/liyin2015>`_
+.. `Li Yin <https://github.com/liyin2015>`_

 Tools are the means an LLM can use to interact with the world beyond its internal knowledge. Technically speaking, retrievers are tools that help the LLM get more relevant context, and memory is a tool for the LLM to carry out a conversation.
 Deciding when, which, and how to use a tool, and even creating a tool, is an agentic behavior:
24 changes: 11 additions & 13 deletions docs/source/index.rst
@@ -6,6 +6,17 @@
    :width: 100%
    :alt: LightRAG Logo

+
+.. raw:: html
+
+   <h1 style="text-align: center; font-size: 2em; margin-top: 10px;">⚡ The PyTorch Library for Large Language Model Applications ⚡</h1>
+
+*LightRAG* helps developers with both building and optimizing *Retriever-Agent-Generator (RAG)* pipelines.
+It is *light*, *modular*, and *robust*.
+
+
+
 .. |License| image:: https://img.shields.io/github/license/SylphAI-Inc/LightRAG
    :target: https://opensource.org/license/MIT

@@ -18,9 +29,6 @@
 .. |GitHub Stars| image:: https://img.shields.io/github/stars/SylphAI-Inc/LightRAG?style=flat-square
    :target: https://star-history.com/#SylphAI-Inc/LightRAG

-.. |Open Issues| image:: https://img.shields.io/github/issues-raw/SylphAI-Inc/LightRAG?style=flat-square
-   :target: https://github.com/SylphAI-Inc/LightRAG/issues
-
 .. |Discord| image:: https://dcbadge.vercel.app/api/server/zt2mTPcu?compact=true&style=flat
    :target: https://discord.gg/zt2mTPcu

@@ -33,19 +41,9 @@
    <a href="https://pypi.org/project/lightRAG/"><img src="https://img.shields.io/pypi/v/lightRAG?style=flat-square" alt="PyPI Version"></a>
    <a href="https://pypistats.org/packages/lightRAG"><img src="https://img.shields.io/pypi/dm/lightRAG?style=flat-square" alt="PyPI Downloads"></a>
    <a href="https://star-history.com/#SylphAI-Inc/LightRAG"><img src="https://img.shields.io/github/stars/SylphAI-Inc/LightRAG?style=flat-square" alt="GitHub Stars"></a>
-   <a href="https://github.com/SylphAI-Inc/LightRAG/issues"><img src="https://img.shields.io/github/issues-raw/SylphAI-Inc/LightRAG?style=flat-square" alt="Open Issues"></a>
    <a href="https://discord.gg/zt2mTPcu"><img src="https://dcbadge.vercel.app/api/server/zt2mTPcu?compact=true&style=flat" alt="Discord"></a>
 </div>


-===============================================================
-⚡ The PyTorch Library for Large Language Model Applications ⚡
-===============================================================
-
-*LightRAG* helps developers with both building and optimizing *Retriever-Agent-Generator (RAG)* pipelines.
-It is *light*, *modular*, and *robust*.
-
-
 .. grid:: 1
    :gutter: 1
9 changes: 8 additions & 1 deletion lightrag/lightrag/components/retriever/reranker_retriever.py
@@ -47,7 +47,14 @@ def __init__(
         self.top_k = top_k
         self._model_kwargs = model_kwargs or {}
         assert "model" in self._model_kwargs, "model must be specified in model_kwargs"
-
+        if not isinstance(self._model_kwargs, Dict):
+            raise TypeError(
+                f"{type(self).__name__} requires a dictionary for model_kwargs, not a string"
+            )
+        if not isinstance(model_client, ModelClient):
+            raise TypeError(
+                f"{type(self).__name__} requires a ModelClient instance for model_client"
+            )
         self.model_client = model_client

         self.reset_index()
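The pattern this diff adds (failing fast on mis-typed constructor arguments) can be sketched in isolation. The class below is a hypothetical minimal example, not the actual ``RerankerRetriever``.

```python
class StrictRetriever:
    """Illustrates fail-fast type validation in __init__."""

    def __init__(self, model_kwargs=None):
        self._model_kwargs = model_kwargs or {}
        # A bare model name string is a common mistake here; reject it early
        # with a clear message instead of failing later inside a model call.
        if not isinstance(self._model_kwargs, dict):
            raise TypeError(
                f"{type(self).__name__} requires a dictionary for model_kwargs, "
                f"got {type(self._model_kwargs).__name__}"
            )

StrictRetriever(model_kwargs={"model": "some-reranker"})  # accepted
```

Raising ``TypeError`` at construction time surfaces the mistake at the call site rather than deep inside a later model invocation.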