[Feature Request]: RAGFlow API proposal #1102

Open
30 of 44 tasks
JinHai-CN opened this issue Jun 8, 2024 · 20 comments

Comments

@JinHai-CN
Contributor

JinHai-CN commented Jun 8, 2024

Is there an existing issue for the same feature request?

  • I have checked the existing issues.

Describe the feature you'd like

RAGFlow's current API interfaces are insufficient, and the APIs are not RESTful. The goal of this issue is to propose RESTful APIs that cover most of RAGFlow's functions.

Knowledge base

Content management in knowledge base

AI assistant management

  • create an assistant
  • remove assistants
  • list assistants
  • update assistant config
  • get the description of a specific assistant

Conversation management

  • create a conversation
  • delete conversations
  • list conversations
  • chat
  • get the conversation history.

File management

  • create a directory
  • remove directories from a directory
  • move a directory
  • copy a directory
  • get the description of a specific directory
  • list file or directory from a parent directory
  • upload files into a specific directory
  • remove files from a specific directory
  • download files from a specific directory
  • move file
  • copy file
  • attach files to a knowledge base.
  • get the description of a specific file

Related issues: #345 #717

@Scoutink

Scoutink commented Jun 8, 2024

This is exactly what I am looking for...

@JinHai-CN JinHai-CN pinned this issue Jun 8, 2024
@JinHai-CN JinHai-CN mentioned this issue Jun 8, 2024
27 tasks
@yangboz
Contributor

yangboz commented Jun 18, 2024

Is there an existing issue for the same feature request?

  • I have checked the existing issues.

Describe the feature you'd like

RAGFlow's current API interfaces are insufficient, and the APIs are not RESTful. The goal of this issue is to propose RESTful APIs that cover most of RAGFlow's functions.

Knowledge base

  • create knowledge base API: create dataset #1106
  • remove knowledge base
  • update knowledge base
  • list knowledge bases
  • get the description of a specific knowledge base

Content management in knowledge base

  • upload files
  • download files
  • remove files
  • update file attributes(name, enable status, ...)
  • list files
  • get the description of a specific file
  • start parsing a file
  • abort file parsing
  • get parsing progress
  • get the chunk list of a parsed file
  • remove chunks of a parsed file
  • download/fetch a chunk of a parsed file
  • update the chunk status
  • insert a new chunk into a parsed file
  • retrieval test on a specific knowledge base

File management

  • create a directory
  • remove directories from a directory
  • move a directory
  • copy a directory
  • get the description of a specific directory
  • list file or directory from a parent directory
  • upload files into a specific directory
  • remove files from a specific directory
  • download files from a specific directory
  • move file
  • copy file
  • attach files to a knowledge base.
  • get the description of a specific file

AI assistant management

  • create an assistant
  • remove assistants
  • list assistants
  • update assistant config
  • get the description of a specific assistant

Model management

  • list models
  • get the description of a specific model

Conversation management

  • create a conversation
  • delete conversations
  • list conversations
  • chat
  • get the conversation history.

Related issues: #345 #717

I wonder whether flaskrest, flaskrestplus, or flask_restx would help a lot here.

@jeremi

jeremi commented Jun 19, 2024

  • create a knowledge base
    is it supposed to work? I could not make it work.
    There seem to be a few issues with the SDK, including configuring the wrong URL path.

for me:
self.api_url = f"{base_url}/api/{version}"
should be:
self.api_url = f"{base_url}/{version}/api"

Also even after this change I could not call the create dataset endpoint.

@cecilia-uu
Contributor

cecilia-uu commented Jun 19, 2024

Hi jeremi, thanks for your question. We have introduced a newly proposed API endpoint: http://<host_address>/api/v1/. The previous URL you mentioned is now deprecated. To create a dataset, send a POST request to http://<host_address>/api/v1/dataset.
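
As a rough illustration of the request described above, the sketch below builds (but does not send) a POST to the dataset endpoint. The host value, the `name` field in the JSON body, and the `Authorization: Bearer` header format are assumptions made for illustration; the thread only confirms the URL and the HTTP method.

```python
import json
import urllib.request


def build_create_dataset_request(host_address: str, api_key: str, name: str) -> urllib.request.Request:
    """Build a POST request for http://<host_address>/api/v1/dataset.

    The body field `name` and the Bearer header are assumptions,
    not something the thread confirms.
    """
    url = f"http://{host_address}/api/v1/dataset"
    body = json.dumps({"name": name}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )


# Placeholder host and key; nothing is sent over the network here.
req = build_create_dataset_request("localhost:9380", "<API_KEY>", "my_dataset")
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen(req)` would actually send it.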

@jeremi

jeremi commented Jun 19, 2024

I tried it by building the latest main, and it does not work; I get a 404 with some HTML as a returned value.

If I invert API and version number, I get a JSON response, but with a 404 in the body:

200
b'{"data":null,"retcode":100,"retmsg":"<NotFound \'404: Not Found\'>"}\n'
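
The response above shows why checking only the HTTP status is not enough: the status is 200 while the body carries the error. A minimal sketch of a client-side check follows. It assumes that `retcode` 0 means success, which the thread does not state, and `parse_api_response` is a hypothetical helper name.

```python
import json


def parse_api_response(status: int, body: bytes):
    """Return the `data` field, or raise if either layer reports an error."""
    if status != 200:
        raise RuntimeError(f"HTTP error {status}")
    payload = json.loads(body)
    # Assumption: retcode 0 signals success; any other value is an error.
    if payload.get("retcode") != 0:
        raise RuntimeError(f"API error {payload.get('retcode')}: {payload.get('retmsg')}")
    return payload.get("data")


# The exact body reported in this thread.
body = b'{"data":null,"retcode":100,"retmsg":"<NotFound \'404: Not Found\'>"}\n'
try:
    parse_api_response(200, body)
except RuntimeError as err:
    print(err)
```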

@cecilia-uu
Contributor

Could you share your screenshot for the input and output?

@Valdanitooooo Valdanitooooo mentioned this issue Aug 1, 2024
1 task
KevinHuSh pushed a commit that referenced this issue Aug 1, 2024
### What problem does this PR solve?

Add retrieval api on a specific knowledge base


![ragflow](https://github.com/user-attachments/assets/dc30a4c3-03c5-4d34-bb7c-60b8830f1225)

#1102

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
@yingfeng yingfeng mentioned this issue Aug 6, 2024
46 tasks
@yingfeng yingfeng unpinned this issue Aug 6, 2024
@yingfeng yingfeng pinned this issue Aug 6, 2024
@TTTnlp

TTTnlp commented Aug 16, 2024

I tried it by building the latest main, and it does not work; I get a 404 with some HTML as a returned value.

If I invert API and version number, I get a JSON response, but with a 404 in the body:

200
b'{"data":null,"retcode":100,"retmsg":"<NotFound \'404: Not Found\'>"}\n'

I have the same question.

@RELmon25

I tried it by building the latest main, and it does not work; I get a 404 with some HTML as a returned value.

If I invert API and version number, I get a JSON response, but with a 404 in the body:

200
b'{"data":null,"retcode":100,"retmsg":"<NotFound \'404: Not Found\'>"}\n'

Same issue.

If I post http://localhost/api/v1/dataset it returns:

<html>

<head>
	<title>405 Not Allowed</title>
</head>

<body>
	<center>
		<h1>405 Not Allowed</h1>
	</center>
	<hr>
	<center>nginx/1.18.0 (Ubuntu)</center>
</body>

</html>

I've checked the code, and I guess this happens because login is required to make requests. So my question is: how do I log in?

@Feiue
Contributor

Feiue commented Aug 29, 2024

First, http://localhost/api/v1/dataset is not a valid URL. Additionally, when using the API, there is no need to log in, but a token is required. You can create an assistant in the chat, and then obtain the token using the Chat Bot API's API key.

KevinHuSh added a commit that referenced this issue Aug 29, 2024
### What problem does this PR solve?

Complete implementation of dataset SDK.
#1102

### Type of change


- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: Feiue <10215101452@stu.ecun.edu.cn>
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
KevinHuSh pushed a commit that referenced this issue Aug 30, 2024
### What problem does this PR solve?

Complete DataSet SDK implementation
#1102

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: Feiue <10215101452@stu.ecun.edu.cn>
@Feiue Feiue mentioned this issue Sep 5, 2024
1 task
KevinHuSh pushed a commit that referenced this issue Sep 5, 2024
### What problem does this PR solve?

SDK for Assistant
#1102 

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

Co-authored-by: Feiue <10215101452@stu.ecun.edu.cn>
@Feiue Feiue mentioned this issue Sep 9, 2024
1 task
KevinHuSh added a commit that referenced this issue Sep 9, 2024
### What problem does this PR solve?

SDK for session
#1102 

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: Feiue <10215101452@stu.ecun.edu.cn>
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
@Feiue Feiue mentioned this issue Sep 10, 2024
1 task
KevinHuSh pushed a commit that referenced this issue Sep 11, 2024
### What problem does this PR solve?

Includes SDK for creating, updating sessions, getting sessions, listing
sessions, and dialogues
#1102 
### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: liuhua <10215101452@stu.ecun.edu.cn>
@Valdanitooooo
Contributor

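The create-knowledge-base API code in #1106 looks quite different from the create-knowledge-base method used by the application at https://github.com/infiniflow/ragflow/blob/main/api/apps/kb_app.py#L39, and the API paths are completely different as well. I understand that these come from different developers, and that knowledge bases and datasets do map one-to-one, but creating a knowledge base is creating a knowledge base; using a URL like /api/v1/dataset for it feels awkward.

I urgently needed a retrieval API, so I also submitted PR #1763, but the content of that method was mostly copied from the retrieval_test method at https://github.com/infiniflow/ragflow/blob/main/api/apps/chunk_app.py#L256.

This way of developing makes me worried about the robustness of the code. A mountain of legacy mess is piled up one shovelful at a time by everyone, so it should be dealt with early, while the pile is still small.

So could the API part be refactored? My initial thoughts:

  1. Abstract a service layer that implements the logic shared by the application side and the API side.
  2. API style: make the API paths match the application side one-to-one, vs. refactor both the API side and the application side to RESTful style, vs. change only the API side to RESTful style. Of these three, I lean toward choosing one of the latter two.

If anyone has better ideas, please bring them up so we can discuss them together. If there is a good plan, I can develop this part.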
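
A minimal sketch of point 1 might look like the following. It assumes a hypothetical `DatasetService` class with an in-memory store; none of these names come from the RAGFlow code base. Both entry points call the same service method and differ only in where the caller's identity comes from.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class DatasetService:
    """Hypothetical shared service layer that holds the business logic once."""
    _store: dict = field(default_factory=dict)
    _ids: count = field(default_factory=lambda: count(1))

    def create(self, tenant_id: str, name: str) -> dict:
        if not name.strip():
            raise ValueError("dataset name must not be empty")
        record = {"id": str(next(self._ids)), "tenant_id": tenant_id, "name": name}
        self._store[record["id"]] = record
        return record


service = DatasetService()


def web_create_kb(current_user_id: str, form: dict) -> dict:
    """Web-app entry point: identity comes from the login session."""
    return {"data": service.create(current_user_id, form["name"])}


def api_create_dataset(token_tenant_id: str, payload: dict) -> dict:
    """Server-API entry point: identity comes from the API token."""
    return {"data": service.create(token_tenant_id, payload["name"])}


print(web_create_kb("tenant-a", {"name": "docs"}))
print(api_create_dataset("tenant-a", {"name": "faq"}))
```

With this split, a fix to the validation or storage logic lands in one place instead of being copied between the two handlers.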

@yangboz
Contributor

yangboz commented Sep 12, 2024

IMHO, the API could be made RESTful around the resources that RAGFlow already implements and can expose [dataset, agent, dialog, conversation, tenant, user]; following the OpenAPI standard would be even better :-)

@JinHai-CN
Contributor Author

@Valdanitooooo @yangboz Thank you for your comments on the RAGFlow API.
We intend to create an international community, so we encourage using English for communication.

@KevinHuSh
Collaborator

Good point. We're going to spend more time on this.

@Valdanitooooo
Contributor

@yangboz @JinHai-CN @KevinHuSh In order to not disrupt existing functionality, I am attempting to refactor the API in a new directory. The most ideal scenario is for the Web APP API, Server API, and SDK API to all use the same code. I hope everything goes smoothly.

image

image

@KevinHuSh
Collaborator

Hint: APIs to Web/SDK/developers are somewhat different.

@KevinHuSh
Collaborator

What about choosing a relatively simple one to file a pull request for?

@Valdanitooooo
Contributor

What about choosing a relatively simple one to file a pull request for?

I won't do everything at once. I hope to complete only the dataset API as a starting point, and then everyone can discuss its defects and issues together. Once the approach is mature, the API for each resource can be assigned to different people to complete.
I think we must be cautious. I am still using version 0.8.0, and the bugs in the new version are causing me headaches. I don't want to create more bugs for this project.

KevinHuSh pushed a commit that referenced this issue Sep 18, 2024
### What problem does this PR solve?

discuss:#1102

#### Completed
1. Integrated API Flask to generate Swagger API documentation, available at
http://ragflow_host:ragflow_port/v1/docs
2. Refactored http_token_auth
```
from typing import Union


class AuthUser:
    def __init__(self, tenant_id, token):
        self.id = tenant_id
        self.token = token

    def get_token(self):
        return self.token


@http_token_auth.verify_token
def verify_token(token: str) -> Union[AuthUser, None]:
    try:
        objs = APIToken.query(token=token)
        if objs:
            api_token = objs[0]
            user = AuthUser(api_token.tenant_id, api_token.token)
            return user
    except Exception as e:
        server_error_response(e)
    return None

# Resource endpoint protected by the token check above
@manager.auth_required(http_token_auth)
def get_all_datasets(query_data):
    ...
```
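
The verification flow above can be illustrated without Flask. This sketch replaces `APIToken.query` with a hypothetical in-memory dictionary and assumes the token arrives in an `Authorization: Bearer <token>` header; both are assumptions for illustration, not the project's actual storage or header handling.

```python
from typing import Optional


class AuthUser:
    def __init__(self, tenant_id: str, token: str):
        self.id = tenant_id
        self.token = token


# Hypothetical stand-in for the APIToken table: token -> tenant_id.
API_TOKENS = {"ragflow-demo-token": "tenant-1"}


def extract_bearer_token(headers: dict) -> Optional[str]:
    """Pull the token out of an 'Authorization: Bearer <token>' header."""
    value = headers.get("Authorization", "")
    scheme, _, token = value.partition(" ")
    if scheme.lower() != "bearer" or not token:
        return None
    return token.strip()


def verify_token(token: Optional[str]) -> Optional[AuthUser]:
    """Return the matching AuthUser, or None when the token is unknown."""
    if token is None:
        return None
    tenant_id = API_TOKENS.get(token)
    return AuthUser(tenant_id, token) if tenant_id else None


user = verify_token(extract_bearer_token({"Authorization": "Bearer ragflow-demo-token"}))
print(user.id if user else "unauthorized")
```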
3. Refactored the Datasets (Knowledgebase) API to extract the
implementation logic into the api/apps/services directory

![image](https://github.com/user-attachments/assets/ad1f16f1-b0ce-4301-855f-6e162163f99a)
4. Python SDK: I only added get_all_datasets as an attempt, just to
verify that the SDK API and Server API can use the same method.
```
from ragflow.ragflow import RAGFLow
ragflow = RAGFLow('<ACCESS_KEY>', 'http://127.0.0.1:9380')
ragflow.get_all_datasets()
```
5. Request parameter validation, added as an attempt. It may not be necessary, as
this feature is already present at the data model layer. It mainly makes
the API easier to test in the Swagger Docs service.
```
class UpdateDatasetReq(Schema):
    kb_id = fields.String(required=True)
    name = fields.String(validate=validators.Length(min=1, max=128))
    description = fields.String(allow_none=True)
    permission = fields.String(validate=validators.OneOf(['me', 'team']))
    embd_id = fields.String(validate=validators.Length(min=1, max=128))
    language = fields.String(validate=validators.OneOf(['Chinese', 'English']))
    parser_id = fields.String(validate=validators.OneOf([parser_type.value for parser_type in ParserType]))
    parser_config = fields.Dict()
    avatar = fields.String()
```
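
To make the constraints in the schema above concrete, here is a dependency-free sketch that checks a subset of the same rules by hand. It is plain Python rather than the schema library used in the PR, and `validate_update_dataset` is a hypothetical helper, not part of RAGFlow.

```python
def validate_update_dataset(payload: dict) -> dict:
    """Return a dict of field -> error message; empty means the payload is valid."""
    errors = {}
    # kb_id is the only required field in the schema above.
    if not isinstance(payload.get("kb_id"), str):
        errors["kb_id"] = "required string"
    # name and embd_id share the same 1..128 length constraint.
    for key in ("name", "embd_id"):
        if key in payload:
            value = payload[key]
            if not isinstance(value, str) or not 1 <= len(value) <= 128:
                errors[key] = "must be a string of length 1 to 128"
    # permission and language are restricted to a fixed set of values.
    choices = {"permission": ("me", "team"), "language": ("Chinese", "English")}
    for key, allowed in choices.items():
        if key in payload and payload[key] not in allowed:
            errors[key] = f"must be one of {allowed}"
    return errors


print(validate_update_dataset({"kb_id": "kb1", "language": "English"}))
print(validate_update_dataset({"name": "", "permission": "all"}))
```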

#### TODO

1. Simultaneously supporting multiple authentication methods, so that
the Web API can use the same method as the Server API, though perhaps this
feature is not important.
I tried the approach below, but it was not successful: it allows
token authentication when not logged in, but cannot skip token
authentication when logged in 😢
```
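from functools import wraps

from flask import current_app, request as flask_request
from flask_login import current_user
from flask_login.config import EXEMPT_METHODS


def http_basic_auth_required(func):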
    @wraps(func)
    def decorated_view(*args, **kwargs):
        if 'Authorization' in flask_request.headers:
            # If the request header contains a token, skip username and password verification
            return func(*args, **kwargs)
        if flask_request.method in EXEMPT_METHODS or current_app.config.get("LOGIN_DISABLED"):
            pass
        elif not current_user.is_authenticated:
            return current_app.login_manager.unauthorized()

        if callable(getattr(current_app, "ensure_sync", None)):
            return current_app.ensure_sync(func)(*args, **kwargs)
        return func(*args, **kwargs)

    return decorated_view
```
2. Refactoring the SDK API to use the same method as the Server API is
feasible and constructive, but it still requires time.
I see some differences between the Web and SDK APIs, such as the
key_mapping handling of the returned results. Until I figure this out, I
will not modify this code, to avoid causing more problems.

```
    for kb in kbs:
        key_mapping = {
            "chunk_num": "chunk_count",
            "doc_num": "document_count",
            "parser_id": "parse_method",
            "embd_id": "embedding_model"
        }
        renamed_data = {}
        for key, value in kb.items():
            new_key = key_mapping.get(key, key)
            renamed_data[new_key] = value
        renamed_list.append(renamed_data)
    return get_json_result(data=renamed_list)
```
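
The renaming step above could be isolated into one helper, so that the Web and SDK responses differ only by this mapping. A sketch follows; `rename_keys` and the sample record are hypothetical illustrations, while the mapping itself is copied from the snippet above.

```python
KEY_MAPPING = {
    "chunk_num": "chunk_count",
    "doc_num": "document_count",
    "parser_id": "parse_method",
    "embd_id": "embedding_model",
}


def rename_keys(record: dict, mapping: dict = KEY_MAPPING) -> dict:
    """Return a copy of `record` with internal field names mapped to SDK names."""
    return {mapping.get(key, key): value for key, value in record.items()}


# Sample record for illustration only.
kbs = [{"id": "kb1", "chunk_num": 12, "doc_num": 3, "parser_id": "naive"}]
renamed_list = [rename_keys(kb) for kb in kbs]
print(renamed_list)
```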

### Type of change

- [x] Refactoring
KevinHuSh added a commit that referenced this issue Sep 18, 2024
### What problem does this PR solve?

#1102

### Type of change

- [x] Performance Improvement

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
@RELmon25

First, http://localhost/api/v1/dataset is not a valid URL. Additionally, when using the API, there is no need to log in, but a token is required. You can create an assistant in the chat, and then obtain the token using the Chat Bot API's API key.

It actually is a valid URL. As you can see in the documentation of ragflow_api, http://<host_address>/api/v1/dataset is a valid endpoint.

Not only is the token required to use this endpoint, but a login is also needed; just check the dataset_api.py file:

image

The decorator @login_required can be found on line 128, before the Get dataset list method.

If you compare this code with the implementation of a method that only requires a valid token, say Get conversation history, you can see that @login_required is commented out:

image

Or, as in the case of the Get answer method, it isn't there at all.

image

So, does anyone know how to log in?

@Feiue
Contributor

Feiue commented Sep 27, 2024

It actually is a valid URL. As you can see in the documentation of ragflow_api, http://<host_address>/api/v1/dataset is a valid endpoint.

Not only is the token required to use this endpoint, but a login is also needed; just check the dataset_api.py file:

image

The decorator `@login_required` can be found on line [128](https://github.com/infiniflow/ragflow/blob/main/api/apps/dataset_api.py#L128), before the [Get dataset list](https://github.com/infiniflow/ragflow/blob/main/docs/references/ragflow_api.md#get-dataset-list) method.

If you compare this code with the implementation of a method that only requires a valid token, say Get conversation history, you can see that @login_required is commented out:

image

Or, as in the case of the [Get answer](https://github.com/infiniflow/ragflow/blob/main/docs/references/api.md#get-answer) method, it isn't there at all.

image

So, does anyone know how to log in?

The ragflow_api may have some issues. If you look at __init__.py, you will find that http://localhost/api/v1/dataset is routed to api/apps/sdk/dataset.py, but / is not implemented there. That file uses token_required instead of login_required. To get the token, refer to ragflow_api, since its authentication part is correct.

@Valdanitooooo
Contributor

I used RAGFlow as a knowledge base management tool and refactored some APIs that need to be used in my application.
If anyone needs more APIs, they can contribute code, and I will also take some vacation time to add more APIs.

swagger docs: http://your_ragflow_address/v1/docs
image

sdk usage

import os

from dotenv import load_dotenv
from ragflow import RAGFlow

load_dotenv()

RAGFLOW_API_KEY = os.environ.get("RAGFLOW_API_KEY", "")
RAGFLOW_ADDRESS = os.environ.get("RAGFLOW_ADDRESS", "")
EMBEDDING_MODEL = os.environ.get("EMBEDDING_MODEL", "")

ragflow = RAGFlow(RAGFLOW_API_KEY, RAGFLOW_ADDRESS)


# List all knowledge bases
def get_all_datasets():
    res = ragflow.dataset.list()
    if "retmsg" in res and res["retmsg"] == "success":
        return res["data"]
    raise Exception(res)


# Find a knowledge base by name
def get_dataset_by_name(dataset_name):
    res = ragflow.dataset.find_by_name(dataset_name)
    if "retmsg" in res and res["retmsg"] == "success":
        return res["data"]
    raise Exception(res)


# Create a knowledge base
def create_dataset(dataset_name):
    res = ragflow.dataset.create(dataset_name)
    if "retmsg" in res and res["retmsg"] == "success":
        return res["data"]
    raise Exception(res)


# Update a knowledge base
def update_dataset(kb_id):
    res = ragflow.dataset.update(
        kb_id=kb_id, language="Chinese", embd_id=EMBEDDING_MODEL, parser_id="naive", parser_config={
            "raptor": {"use_raptor": False}, "chunk_token_num": 256, "layout_recognize": True
        }
    )
    if "retmsg" in res and res["retmsg"] == "success":
        return res["data"]
    raise Exception(res)


# Upload documents to a knowledge base
def upload_documents_2_dataset(kb_id: str, file_paths: list[str]):
    res = ragflow.document.upload(kb_id, file_paths)
    if "retmsg" in res and res["retmsg"] == "success":
        return res["data"]
    raise Exception(res)


# List the documents in a knowledge base
def get_all_documents(kb_id: str):
    res = ragflow.dataset.list_documents(kb_id)
    if "retmsg" in res and res["retmsg"] == "success":
        return res["data"]
    raise Exception(res)


# Change a document's parsing method
def change_document_parser(doc_id: str, parser_id: str, parser_config: dict):
    res = ragflow.document.change_parser(doc_id, parser_id, parser_config)
    if "retmsg" in res and res["retmsg"] == "success":
        return res["data"]
    raise Exception(res)


# Run document parsing
def documents_run_parsing(doc_ids):
    res = ragflow.document.run_parsing(doc_ids=doc_ids)
    if "retmsg" in res and res["retmsg"] == "success":
        return res["data"]
    raise Exception(res)


def retrieval(kb_id, question, top):
    res = ragflow.dataset.retrieval(
        kb_id=kb_id, question=question, page_size=top, top_k=top, similarity_threshold=0.2)
    if "retmsg" in res and res["retmsg"] == "success":
        try:
            chunks = res['data']['chunks']
            docs_str = ""
            if len(chunks) > 0:
                for chunk in chunks:
                    docs_str += "\n-------\n\n" + chunk["content_with_weight"].replace("\r", "\n") + "\n\n"
            print(docs_str)
            return docs_str
        except Exception as e:
            print(e)
    return "No results retrieved"
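
Every function above repeats the same `retmsg` check, which could be factored into one helper. This is only a sketch: `unwrap` is a hypothetical name, and the sample dictionary is a stand-in shaped like the responses checked above.

```python
def unwrap(res: dict):
    """Return res["data"] when the call succeeded, otherwise raise."""
    if res.get("retmsg") == "success":
        return res["data"]
    raise RuntimeError(f"RAGFlow call failed: {res}")


# Stand-in response for illustration; a real call would pass an SDK result.
print(unwrap({"retmsg": "success", "data": [{"name": "docs"}]}))
```

Each wrapper then shrinks to a single line, for example `return unwrap(ragflow.dataset.list())`.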

KevinHuSh pushed a commit that referenced this issue Sep 29, 2024
### What problem does this PR solve?

#1102

### Type of change

- [x] New Feature (non-breaking change which adds functionality)