Commit c20e0b1: update the documentation main page, remove author names in tutorials

liyin2015 committed Jul 6, 2024
1 parent 4524fea commit c20e0b1
Showing 17 changed files with 211 additions and 72 deletions.
1 change: 1 addition & 0 deletions docs/source/apis/core/index.rst
@@ -14,6 +14,7 @@ Overview

    core.base_data_class
    core.component
+   core.container
    core.default_prompt_template
    core.embedder
    core.generator
1 change: 1 addition & 0 deletions docs/source/apis/index.rst
@@ -17,6 +17,7 @@ All base/abstract classes, core components like generator, embedder, and basic f

 .. autosummary::
    core.component
+   core.container
    core.base_data_class
    core.default_prompt_template
    core.model_client
6 changes: 3 additions & 3 deletions docs/source/developer_notes/base_data_class.rst
@@ -2,10 +2,10 @@
 DataClass
 ============

-.. admonition:: Author
-   :class: highlight
+.. .. admonition:: Author
+..    :class: highlight
-`Li Yin <https://github.com/liyin2015>`_
+.. `Li Yin <https://github.com/liyin2015>`_

 In `PyTorch`, ``Tensor`` is the data type used in ``Module`` and ``Optimizer`` across the library.
 Tensor wraps a multi-dimensional matrix to better support its operations and computations.
6 changes: 3 additions & 3 deletions docs/source/developer_notes/db.rst
@@ -1,10 +1,10 @@
 Data & RAG
 ====================

-.. admonition:: Author
-   :class: highlight
+.. .. admonition:: Author
+..    :class: highlight
-`Li Yin <https://github.com/liyin2015>`_
+.. `Li Yin <https://github.com/liyin2015>`_

 The purpose of this note is to provide an overview on data, data modeling, and data storage in LLM applications along with how LightRAG works with data.
6 changes: 3 additions & 3 deletions docs/source/developer_notes/embedder.rst
@@ -1,9 +1,9 @@
 Embedder
 ============
-.. admonition:: Author
-   :class: highlight
+.. .. admonition:: Author
+..    :class: highlight
-`Li Yin <https://github.com/liyin2015>`_
+.. `Li Yin <https://github.com/liyin2015>`_

 What you will learn?
6 changes: 3 additions & 3 deletions docs/source/developer_notes/evaluation.rst
@@ -1,10 +1,10 @@
 LLM Evaluation
 ====================================

-.. admonition:: Author
-   :class: highlight
+.. .. admonition:: Author
+..    :class: highlight
-`Meng Liu <https://github.com/mengliu1998>`_
+.. `Meng Liu <https://github.com/mengliu1998>`_

 "You cannot improve what you cannot measure". This is especially true in the context of LLMs, which have become increasingly popular due to their impressive performance on a wide range of tasks. Evaluating LLMs and their applications is crucial in both research and production to understand their capabilities and limitations.
 Overall, such evaluation is a complex and multifaceted process. Below, we provide a guideline for evaluating LLMs and their applications, incorporating aspects outlined by *Chang et al.* [1]_:
24 changes: 12 additions & 12 deletions docs/source/developer_notes/logging.rst
@@ -106,22 +106,22 @@ Config 3 can be quite neat:
 - You can enable different levels of logging for the library and your application.
 - You can easily focus on debugging your own code without being distracted by the library logs and still have the option to see the library logs if needed.

-Create a named logger
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+.. Create a named logger
+.. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-.. code-block:: python
+.. .. code-block:: python
-   from lightrag.utils.logger import get_logger
+..    from lightrag.utils.logger import get_logger
-   app_logger = get_logger(name="my_app", level="DEBUG", save_dir="./logs")  # log to ./logs/my_app.log
-   # or
-   logger = get_logger(name=__name__, level="DEBUG", save_dir="./logs", filename="my_app.log")
+..    app_logger = get_logger(name="my_app", level="DEBUG", save_dir="./logs")  # log to ./logs/my_app.log
+..    # or
+..    logger = get_logger(name=__name__, level="DEBUG", save_dir="./logs", filename="my_app.log")
-   app_logger.debug("This is a debug message")
-   app_logger.info("This is an info message")
-   app_logger.warning("This is a warning message")
-   app_logger.error("This is an error message")
-   app_logger.critical("This is a critical message")
+..    app_logger.debug("This is a debug message")
+..    app_logger.info("This is an info message")
+..    app_logger.warning("This is a warning message")
+..    app_logger.error("This is an error message")
+..    app_logger.critical("This is a critical message")
.. admonition:: References
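The "different levels for the library and your application" idea above can be sketched with Python's standard ``logging`` module alone. This is a hypothetical illustration, not LightRAG's ``get_logger``: the root (library) level stays quiet while a named application logger runs at ``DEBUG``.

```python
import logging

# Root/library default: only warnings and above are processed.
logging.basicConfig(level=logging.WARNING)

# Application logger: verbose, without touching the library's level.
app_logger = logging.getLogger("my_app")
app_logger.setLevel(logging.DEBUG)

app_logger.debug("This is a debug message")        # emitted via the root handler
logging.getLogger("lightrag").debug("library noise")  # suppressed at WARNING
```

Because child loggers inherit the root level unless explicitly overridden, only ``my_app`` becomes verbose here.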
6 changes: 3 additions & 3 deletions docs/source/developer_notes/model_client.rst
@@ -1,10 +1,10 @@
 ModelClient
 ============

-.. admonition:: Author
-   :class: highlight
+.. .. admonition:: Author
+..    :class: highlight
-`Li Yin <https://github.com/liyin2015>`_
+.. `Li Yin <https://github.com/liyin2015>`_

 What you will learn?
6 changes: 3 additions & 3 deletions docs/source/developer_notes/prompt.rst
@@ -1,9 +1,9 @@
 Prompt
 ============
-.. admonition:: Author
-   :class: highlight
+.. .. admonition:: Author
+..    :class: highlight
-`Li Yin <https://github.com/liyin2015>`_
+.. `Li Yin <https://github.com/liyin2015>`_

 Context
 ----------------
6 changes: 3 additions & 3 deletions docs/source/developer_notes/retriever.rst
@@ -1,10 +1,10 @@
 Retriever
 ============

-.. admonition:: Author
-   :class: highlight
+.. .. admonition:: Author
+..    :class: highlight
-`Li Yin <https://github.com/liyin2015>`_
+.. `Li Yin <https://github.com/liyin2015>`_

 Context
 ------------------
29 changes: 14 additions & 15 deletions docs/source/developer_notes/text_splitter.rst
@@ -1,9 +1,9 @@
 Text Splitter
 -----------------
-.. admonition:: Author
-   :class: highlight
+.. .. admonition:: Author
+..    :class: highlight
-`Xiaoyi Gu <https://github.com/Alleria1809>`_
+.. `Xiaoyi Gu <https://github.com/Alleria1809>`_

 In this tutorial, we will learn:

@@ -19,7 +19,7 @@ LLMs’s context window is limited and the performance often drops with very lon
Shorter content is more manageable and fits memory constraint.
The goal of the text splitter is to chunk large data into smaller ones, potentially improving embedding and retrieving.

-The ``TextSplitter`` is designed to efficiently process and chunk **plain text**. 
+The ``TextSplitter`` is designed to efficiently process and chunk **plain text**.
It leverages configurable separators to facilitate the splitting of :obj:`document object <core.types.Document>` into smaller manageable document chunks.

How does it work
@@ -30,11 +30,11 @@ The texts inside each window will get merged to a smaller chunk. The generated c

**Splitting Types**

-``TextSplitter`` supports 2 types of splitting. 
+``TextSplitter`` supports 2 types of splitting.

* **Type 1:** Specify the exact text splitting point such as space<" "> and periods<".">. It is intuitive, for example, split_by "word":

-:: 
+::

"Hello, world!" -> ["Hello, " ,"world!"]

@@ -48,7 +48,7 @@ This aligns with how models see text in the form of tokens (`Reference <https://
Tokenizer reflects the real token numbers the models take in and helps the developers control budgets.

**Definitions**

* **split_by** specifies the split rule, i.e. the smallest unit during splitting. We support ``"word"``, ``"sentence"``, ``"page"``, ``"passage"``, and ``"token"``. The splitter utilizes the corresponding separator from the ``SEPARATORS`` dictionary.
For Type 1 splitting, we apply Python's ``str.split()`` to break the text.

@@ -57,15 +57,15 @@
.. note::
For option ``token``, its separator is "" because we directly split by a tokenizer, instead of text point.

-* **chunk_size** is the maximum number of units in each chunk. 
+* **chunk_size** is the maximum number of units in each chunk.

* **chunk_overlap** is the number of units that each chunk should overlap. Including context at the borders prevents sudden meaning shift in text between sentences/context, especially in sentiment analysis.

Here are examples of how ``split_by`` and ``chunk_size`` work with ``chunk_overlap``.
-Document Text: 
+Document Text:

::

Hello, this is lightrag. Please implement your splitter here.


@@ -94,12 +94,12 @@ When splitting by ``word`` with ``chunk_size`` = 5 and ``chunk_overlap`` = 2,
each chunk will repeat 2 words from the previous chunk. These 2 words are set by ``chunk_overlap``.
This means each chunk has ``5-2=3`` word(split unit) difference compared with its previous.

-When splitting using tokenizer, each chunk still keeps 5 tokens. 
+When splitting using tokenizer, each chunk still keeps 5 tokens.
For example, the tokenizer transforms ``lightrag`` to ['l', 'igh', 'trag']. So the second chunk is actually ``is`` + ``l`` + ``igh`` + ``trag`` + ``.``.

.. note::
   ``chunk_overlap`` should always be smaller than ``chunk_size``, otherwise the window won't move and the splitting gets stuck.
-When ``split_by`` = ``token``, the punctuation is considered as a token. 
+When ``split_by`` = ``token``, the punctuation is considered as a token.
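The sliding-window behavior documented in this file can be sketched in a few lines. This is a hypothetical standalone splitter for illustration only, not the library's ``TextSplitter``; it reproduces the documented ``word`` example with ``chunk_size`` = 5 and ``chunk_overlap`` = 2.

```python
def split_text(text: str, chunk_size: int = 5, chunk_overlap: int = 2) -> list[str]:
    """Sliding-window splitter by word: each chunk repeats `chunk_overlap`
    words from the previous one, so the window advances by
    chunk_size - chunk_overlap words per step."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    words = text.split(" ")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks

chunks = split_text("Hello, this is lightrag. Please implement your splitter here.")
```

With 9 words, the window advances 3 words per step, giving three chunks; the second chunk repeats "lightrag. Please" from the first, as the overlap rule describes.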

How to use it
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -142,6 +142,5 @@ For **PDFs**, developers will need to extract the text before using the splitter

Customization Tips
~~~~~~~~~~~~~~~~~~~~~
-You can also customize the ``SEPARATORS``. For example, by defining ``SEPARATORS`` = {"question": "?"} and setting ``split_by`` = "question", the document will be split at each ``?``, ideal for processing text structured 
+You can also customize the ``SEPARATORS``. For example, by defining ``SEPARATORS`` = {"question": "?"} and setting ``split_by`` = "question", the document will be split at each ``?``, ideal for processing text structured
 as a series of questions. If you need to customize :class:`tokenizer <lightrag.core.tokenizer.Tokenizer>`, please check `Reference <https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb>`_.
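The question-splitting idea can be sketched as follows. The helper is hypothetical, shown only to illustrate the custom-separator behavior of splitting at each ``?`` and re-attaching the separator.

```python
def split_by_question(text: str) -> list[str]:
    """Split at each '?' and re-attach the separator to its chunk."""
    return [part.strip() + "?" for part in text.split("?") if part.strip()]

split_by_question("What is RAG? How does a splitter help?")
```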

6 changes: 3 additions & 3 deletions docs/source/developer_notes/tool_helper.rst
@@ -1,9 +1,9 @@
 Function calls
 ===========================
-.. admonition:: Author
-   :class: highlight
+.. .. admonition:: Author
+..    :class: highlight
-`Li Yin <https://github.com/liyin2015>`_
+.. `Li Yin <https://github.com/liyin2015>`_

 Tools are the means an LLM can use to interact with the world beyond its internal knowledge. Technically speaking, retrievers are tools that help the LLM get more relevant context, and memory is a tool for the LLM to carry out a conversation.
 Deciding when, which, and how to use a tool, and even creating a tool, is an agentic behavior:
24 changes: 11 additions & 13 deletions docs/source/index.rst
@@ -6,6 +6,17 @@
    :width: 100%
    :alt: LightRAG Logo

+
+.. raw:: html
+
+   <h1 style="text-align: center; font-size: 2em; margin-top: 10px;">⚡ The PyTorch Library for Large Language Model Applications ⚡</h1>
+
+*LightRAG* helps developers with both building and optimizing *Retriever-Agent-Generator (RAG)* pipelines.
+It is *light*, *modular*, and *robust*.
+
+
+
 .. |License| image:: https://img.shields.io/github/license/SylphAI-Inc/LightRAG
    :target: https://opensource.org/license/MIT

@@ -18,9 +29,6 @@
 .. |GitHub Stars| image:: https://img.shields.io/github/stars/SylphAI-Inc/LightRAG?style=flat-square
    :target: https://star-history.com/#SylphAI-Inc/LightRAG

-.. |Open Issues| image:: https://img.shields.io/github/issues-raw/SylphAI-Inc/LightRAG?style=flat-square
-   :target: https://github.com/SylphAI-Inc/LightRAG/issues
-
 .. |Discord| image:: https://dcbadge.vercel.app/api/server/zt2mTPcu?compact=true&style=flat
    :target: https://discord.gg/zt2mTPcu

@@ -33,19 +41,9 @@
    <a href="https://pypi.org/project/lightRAG/"><img src="https://img.shields.io/pypi/v/lightRAG?style=flat-square" alt="PyPI Version"></a>
    <a href="https://pypistats.org/packages/lightRAG"><img src="https://img.shields.io/pypi/dm/lightRAG?style=flat-square" alt="PyPI Downloads"></a>
    <a href="https://star-history.com/#SylphAI-Inc/LightRAG"><img src="https://img.shields.io/github/stars/SylphAI-Inc/LightRAG?style=flat-square" alt="GitHub Stars"></a>
-   <a href="https://github.com/SylphAI-Inc/LightRAG/issues"><img src="https://img.shields.io/github/issues-raw/SylphAI-Inc/LightRAG?style=flat-square" alt="Open Issues"></a>
    <a href="https://discord.gg/zt2mTPcu"><img src="https://dcbadge.vercel.app/api/server/zt2mTPcu?compact=true&style=flat" alt="Discord"></a>
 </div>


-===============================================================
-⚡ The PyTorch Library for Large Language Model Applications ⚡
-===============================================================
-
-*LightRAG* helps developers with both building and optimizing *Retriever-Agent-Generator (RAG)* pipelines.
-It is *light*, *modular*, and *robust*.
-
-
 .. grid:: 1
    :gutter: 1
9 changes: 8 additions & 1 deletion lightrag/lightrag/components/retriever/reranker_retriever.py
@@ -47,7 +47,14 @@ def __init__(
         self.top_k = top_k
         self._model_kwargs = model_kwargs or {}
         assert "model" in self._model_kwargs, "model must be specified in model_kwargs"
-
+        if not isinstance(self._model_kwargs, Dict):
+            raise TypeError(
+                f"{type(self).__name__} requires a dictionary for model_kwargs, not a string"
+            )
+        if not isinstance(model_client, ModelClient):
+            raise TypeError(
+                f"{type(self).__name__} requires a ModelClient instance for model_client"
+            )
         self.model_client = model_client

         self.reset_index()
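The pattern this diff adds (failing fast on mis-typed constructor arguments) can be sketched in isolation. The class below is a hypothetical minimal example, not the actual ``RerankerRetriever``.

```python
class StrictRetriever:
    """Illustrates fail-fast type validation in __init__."""

    def __init__(self, model_kwargs=None):
        self._model_kwargs = model_kwargs or {}
        # A bare model name string is a common mistake here; reject it early
        # with a clear message instead of failing later inside a model call.
        if not isinstance(self._model_kwargs, dict):
            raise TypeError(
                f"{type(self).__name__} requires a dictionary for model_kwargs, "
                f"got {type(self._model_kwargs).__name__}"
            )

StrictRetriever(model_kwargs={"model": "some-reranker"})  # accepted
```

Raising ``TypeError`` at construction time surfaces the mistake at the call site rather than deep inside a later model invocation.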