Commit 6be2b3e

📝Translating docs to Simplified Chinese (#2705)
* 📝Translating docs to Simplified Chinese
* update files
* 📝Translating docs to Simplified Chinese
* 📝Translating docs to Simplified Chinese
* 📝Translating docs to Simplified Chinese
* update files
* 📝Translating docs to Simplified Chinese
* 📝Translating docs to Simplified Chinese
* update files
* translate 'hf_file_system.md'
* update files
1 parent ca3f674 commit 6be2b3e

3 files changed, +253 -0 lines changed

docs/source/cn/_toctree.yml

+4
@@ -16,6 +16,10 @@
 title: Collections
 - local: guides/community
 title: Community
+- local: guides/overview
+title: Overview
+- local: guides/hf_file_system
+title: Hugging Face file system
 - title: "concepts"
 sections:
 - local: concepts/git_vs_http

+119
@@ -0,0 +1,119 @@
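<!--⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be rendered properly in your Markdown viewer.
-->

# Interact with the Hub through the Filesystem API

In addition to [`HfApi`], the `huggingface_hub` library provides [`HfFileSystem`], a Python file interface to the Hugging Face Hub that follows the [fsspec](https://filesystem-spec.readthedocs.io/en/latest/) specification. [`HfFileSystem`] builds on top of [`HfApi`] and offers typical filesystem operations such as `cp`, `mv`, `ls`, `du`, `glob`, `get_file`, and `put_file`.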
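
<Tip warning={true}>

[`HfFileSystem`] provides fsspec compatibility, which is useful for libraries that require it (for example, reading Hugging Face datasets directly with `pandas`). However, this compatibility layer introduces extra overhead. For better performance and reliability, use [`HfApi`] methods whenever possible.

</Tip>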
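
The operations listed above are standard fsspec methods, so you can try them offline. The sketch below assumes `fsspec` is installed (it is a dependency of `huggingface_hub`). It runs `put_file`, `cp`, `mv`, `ls`, `glob`, `du`, and `get_file` against fsspec's built-in in-memory filesystem instead of the Hub. The `/demo-ops/...` paths are hypothetical. With [`HfFileSystem`], the paths would instead be repository paths such as `datasets/my-username/my-dataset-repo/data/train.csv`.

```python
import os
import tempfile

import fsspec

# fsspec's in-memory filesystem stands in for the Hub here: HfFileSystem
# exposes the same method names because both follow the fsspec interface.
fs = fsspec.filesystem("memory")
content = b"text,label\nGreat movie!,good\n"

with tempfile.TemporaryDirectory() as tmp:
    local_csv = os.path.join(tmp, "train.csv")
    with open(local_csv, "wb") as f:
        f.write(content)

    # put_file: upload a local file to the "remote" filesystem
    fs.put_file(local_csv, "/demo-ops/data/train.csv")

    # cp / mv: copy a file, then rename the copy, on the remote side
    fs.cp("/demo-ops/data/train.csv", "/demo-ops/data/test.csv")
    fs.mv("/demo-ops/data/test.csv", "/demo-ops/data/validation.csv")

    # ls / glob: list one directory, or match files by pattern
    listing = fs.ls("/demo-ops/data", detail=False)
    csv_files = fs.glob("/demo-ops/**/*.csv")

    # du: total size in bytes of all files under a path
    total_bytes = fs.du("/demo-ops")

    # get_file: download a remote file back to local disk
    local_copy = os.path.join(tmp, "validation.csv")
    fs.get_file("/demo-ops/data/validation.csv", local_copy)
    with open(local_copy, "rb") as f:
        roundtrip = f.read()
```

If you replace `fsspec.filesystem("memory")` with `HfFileSystem()`, the method calls stay the same. Only the paths change, and the calls then operate on a Hub repository.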

## Usage

```python
>>> from huggingface_hub import HfFileSystem
>>> fs = HfFileSystem()

>>> # List all files in a directory
>>> fs.ls("datasets/my-username/my-dataset-repo/data", detail=False)
['datasets/my-username/my-dataset-repo/data/train.csv', 'datasets/my-username/my-dataset-repo/data/test.csv']

>>> # List all ".csv" files in a repo
>>> fs.glob("datasets/my-username/my-dataset-repo/**/*.csv")
['datasets/my-username/my-dataset-repo/data/train.csv', 'datasets/my-username/my-dataset-repo/data/test.csv']

>>> # Read a remote file
>>> with fs.open("datasets/my-username/my-dataset-repo/data/train.csv", "r") as f:
...     train_data = f.readlines()

>>> # Read the content of a remote file as a string
>>> train_data = fs.read_text("datasets/my-username/my-dataset-repo/data/train.csv", revision="dev")

>>> # Write a remote file (include "\n": write() does not add line breaks)
>>> with fs.open("datasets/my-username/my-dataset-repo/data/validation.csv", "w") as f:
...     f.write("text,label\n")
...     f.write("Fantastic movie!,good\n")
```
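
Pass the optional `revision` argument to run an operation against a specific commit, such as a branch, a tag name, or a commit hash.

Unlike Python's built-in `open`, `fsspec`'s `open` defaults to binary mode, `"rb"`. You must therefore explicitly set the mode to `"r"` to read in text mode, or to `"w"` to write in text mode. Appending to a file (modes `"a"` and `"ab"`) is not supported yet.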
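
The following minimal offline sketch illustrates the mode difference. It uses fsspec's in-memory filesystem as a stand-in for the Hub, and the `/demo-modes/greeting.txt` path is hypothetical. It only shows the binary-versus-text default. It does not reproduce the Hub-specific restriction on append modes.

```python
import fsspec

# fsspec's in-memory filesystem is an offline stand-in for HfFileSystem:
# both use fsspec's `open`, whose default mode is "rb".
fs = fsspec.filesystem("memory")
path = "/demo-modes/greeting.txt"  # hypothetical path

# Writing a str requires text mode ("w"); the default mode is binary.
with fs.open(path, "w") as f:
    f.write("hello, hub")

# No mode given: the file is opened as "rb" and read() returns bytes.
with fs.open(path) as f:
    raw = f.read()

# Explicit "r": the bytes are decoded and read() returns a str.
with fs.open(path, "r") as f:
    text = f.read()

print(type(raw).__name__, type(text).__name__)  # bytes str
```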

## Integrations

[`HfFileSystem`] can be used with any library that integrates `fsspec`, provided the URL follows this scheme:

```
hf://[<repo_type_prefix>]<repo_id>[@<revision>]/<path/in/repo>
```

<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/huggingface_hub/hf_urls.png"/>
</div>

The `repo_type_prefix` is `datasets/` for datasets and `spaces/` for Spaces. Models don't need a prefix in the URL.
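
To make the scheme concrete, here is a hypothetical `parse_hf_url` helper that splits such a URL into its parts. It is a simplified reading of the scheme above, not `huggingface_hub`'s own path resolution. It makes two assumptions: every `repo_id` has the form `<namespace>/<name>`, and the revision contains no `/`.

```python
from typing import NamedTuple, Optional

# Hypothetical helper, NOT huggingface_hub's own path resolution. It assumes
# every repo_id is "<namespace>/<name>" and that revisions contain no "/".
REPO_TYPE_PREFIXES = {"datasets/": "dataset", "spaces/": "space"}


class HfUrl(NamedTuple):
    repo_type: str           # "model", "dataset" or "space"
    repo_id: str             # "<namespace>/<name>"
    revision: Optional[str]  # branch, tag or commit hash; None if absent
    path_in_repo: str        # "" when the URL points at the repo root


def parse_hf_url(url: str) -> HfUrl:
    """Split hf://[<repo_type_prefix>]<repo_id>[@<revision>]/<path/in/repo>."""
    scheme = "hf://"
    if not url.startswith(scheme):
        raise ValueError(f"not an hf:// URL: {url!r}")
    rest = url[len(scheme):]

    # Datasets and Spaces carry a prefix; models have none.
    repo_type = "model"
    for prefix, kind in REPO_TYPE_PREFIXES.items():
        if rest.startswith(prefix):
            repo_type, rest = kind, rest[len(prefix):]
            break

    # The first two segments form the repo_id; everything after is the path.
    parts = rest.split("/", 2)
    if len(parts) < 2 or not parts[0] or not parts[1]:
        raise ValueError(f"expected <namespace>/<name> in {url!r}")
    namespace, name_and_revision = parts[0], parts[1]
    path_in_repo = parts[2] if len(parts) == 3 else ""

    # An optional "@<revision>" is attached to the repo name.
    name, _, revision = name_and_revision.partition("@")
    if not name:
        raise ValueError(f"missing repo name in {url!r}")
    return HfUrl(repo_type, f"{namespace}/{name}", revision or None, path_in_repo)
```

For example, `parse_hf_url("hf://datasets/my-username/my-dataset-repo@dev/data/train.csv")` returns `repo_type="dataset"`, `repo_id="my-username/my-dataset-repo"`, `revision="dev"`, and `path_in_repo="data/train.csv"`. The fixed two-segment `repo_id` keeps the sketch offline. The real Hub also has single-segment model IDs, which cannot be told apart from a path without querying the Hub.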

Here are some integrations where [`HfFileSystem`] simplifies interacting with the Hub:

* Reading/writing a [Pandas](https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#reading-writing-remote-files) DataFrame from/to a Hub repository:

```python
>>> import pandas as pd

>>> # Read a remote CSV file into a DataFrame
>>> df = pd.read_csv("hf://datasets/my-username/my-dataset-repo/train.csv")

>>> # Write a DataFrame to a remote CSV file
>>> df.to_csv("hf://datasets/my-username/my-dataset-repo/test.csv")
```

The same workflow also works for [Dask](https://docs.dask.org/en/stable/how-to/connect-to-remote-data.html) and [Polars](https://pola-rs.github.io/polars/py-polars/html/reference/io.html) DataFrames.

* Querying (remote) Hub files with [DuckDB](https://duckdb.org/docs/guides/python/filesystems):

```python
>>> from huggingface_hub import HfFileSystem
>>> import duckdb

>>> fs = HfFileSystem()
>>> duckdb.register_filesystem(fs)
>>> # Query a remote file and get the result back as a DataFrame
>>> fs_query_file = "hf://datasets/my-username/my-dataset-repo/data_dir/data.parquet"
>>> df = duckdb.query(f"SELECT * FROM '{fs_query_file}' LIMIT 10").df()
```

* Using the Hub as an array store with [Zarr](https://zarr.readthedocs.io/en/stable/tutorial.html#io-with-fsspec):

```python
>>> import numpy as np
>>> import zarr

>>> embeddings = np.random.randn(50000, 1000).astype("float32")

>>> # Write an array to a repo
>>> with zarr.open_group("hf://my-username/my-model-repo/array-store", mode="w") as root:
...     foo = root.create_group("embeddings")
...     foobar = foo.zeros('experiment_0', shape=(50000, 1000), chunks=(10000, 1000), dtype='f4')
...     foobar[:] = embeddings

>>> # Read an array from a repo
>>> with zarr.open_group("hf://my-username/my-model-repo/array-store", mode="r") as root:
...     first_row = root["embeddings/experiment_0"][0]
```
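
## Authentication

In many cases, you must be logged in with a Hugging Face account to interact with the Hub. See the [Authentication](../quick-start#authentication) section of the documentation to learn more about authentication methods on the Hub.

You can also log in programmatically by passing your token as an argument to [`HfFileSystem`]:

```python
>>> from huggingface_hub import HfFileSystem
>>> # `token` is a string holding your Hugging Face User Access Token
>>> fs = HfFileSystem(token=token)
```

If you log in this way, be careful not to accidentally leak the token when sharing your source code!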
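
The warning above has a common remedy: read the token from an environment variable instead of writing it into the code. Below is a minimal sketch with a hypothetical `token_from_env` helper. `HF_TOKEN` is also the variable `huggingface_hub` reads when no token is passed, so in many setups you can simply call `HfFileSystem()` without `token`.

```python
import os
from typing import Optional


def token_from_env(var: str = "HF_TOKEN") -> Optional[str]:
    """Return the token stored in environment variable `var`.

    Hypothetical helper: returns None when the variable is unset or blank,
    so the token never has to be written into shared source code.
    """
    token = os.environ.get(var, "").strip()
    return token or None


# Usage (requires huggingface_hub and the environment variable to be set):
# from huggingface_hub import HfFileSystem
# fs = HfFileSystem(token=token_from_env())
```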

docs/source/cn/guides/overview.md

+130
@@ -0,0 +1,130 @@
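<!--⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be rendered properly in your Markdown viewer.
-->

# How-to guides

In this section, you will find practical guides to help you achieve a specific goal.
Take a look at these guides to learn how to use `huggingface_hub` to solve real-world problems:

<div class="mt-10">
<div class="w-full flex flex-col space-y-4 md:space-y-0 md:grid md:grid-cols-3 md:gap-y-4 md:gap-x-5">
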
<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg"
href="./repository">
<div class="w-full text-center bg-gradient-to-br from-indigo-400 to-indigo-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">
Repository
</div><p class="text-gray-700">
How to create a repository on the Hub? How to configure it? How to interact with it?
</p>
</a>

<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg"
href="./download">
<div class="w-full text-center bg-gradient-to-br from-indigo-400 to-indigo-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">
Download files
</div><p class="text-gray-700">
How do I download a file from the Hub? How do I download a repository?
</p>
</a>

<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg"
href="./upload">
<div class="w-full text-center bg-gradient-to-br from-indigo-400 to-indigo-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">
Upload files
</div><p class="text-gray-700">
How to upload a file or a folder? How to make changes to an existing repository on the Hub?
</p>
</a>

<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg"
href="./search">
<div class="w-full text-center bg-gradient-to-br from-indigo-400 to-indigo-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">
Search
</div><p class="text-gray-700">
How to efficiently search through the 200k+ public models, datasets and Spaces?
</p>
</a>

<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg"
href="./hf_file_system">
<div class="w-full text-center bg-gradient-to-br from-indigo-400 to-indigo-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">
HfFileSystem
</div><p class="text-gray-700">
How to interact with the Hub through a convenient interface that mimics Python's file interface?
</p>
</a>

<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg"
href="./inference">
<div class="w-full text-center bg-gradient-to-br from-indigo-400 to-indigo-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">
Inference
</div><p class="text-gray-700">
How to make predictions using the accelerated Inference API?
</p>
</a>

<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg"
href="./community">
<div class="w-full text-center bg-gradient-to-br from-indigo-400 to-indigo-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">
Community
</div><p class="text-gray-700">
How to interact with the Community (Discussions and Pull Requests)?
</p>
</a>

<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg"
href="./collections">
<div class="w-full text-center bg-gradient-to-br from-indigo-400 to-indigo-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">
Collections
</div><p class="text-gray-700">
How to programmatically build collections?
</p>
</a>

<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg"
href="./manage-cache">
<div class="w-full text-center bg-gradient-to-br from-indigo-400 to-indigo-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">
Cache
</div><p class="text-gray-700">
How does the cache system work? How to benefit from it?
</p>
</a>

<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg"
href="./model-cards">
<div class="w-full text-center bg-gradient-to-br from-indigo-400 to-indigo-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">
Model Cards
</div><p class="text-gray-700">
How to create and share Model Cards?
</p>
</a>

<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg"
href="./manage-spaces">
<div class="w-full text-center bg-gradient-to-br from-indigo-400 to-indigo-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">
Manage your Space
</div><p class="text-gray-700">
How to manage your Space hardware and configuration?
</p>
</a>

<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg"
href="./integrations">
<div class="w-full text-center bg-gradient-to-br from-indigo-400 to-indigo-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">
Integrate a library
</div><p class="text-gray-700">
What does it mean to integrate a library with the Hub? And how to do it?
</p>
</a>

<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg"
href="./webhooks_server">
<div class="w-full text-center bg-gradient-to-br from-indigo-400 to-indigo-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">
Webhooks server
</div><p class="text-gray-700">
How to create a server to receive Webhooks and deploy it as a Space?
</p>
</a>

</div>
</div>
