4 | 4 | "cell_type": "markdown",
5 | 5 | "metadata": {},
6 | 6 | "source": [
7 |   | - "## 从Arxiv加载论文并进行摘要\n",
  | 7 | + "## Extract Key Information from Arxiv Pages\n",
8 | 8 | "Arxiv网站上一篇《Efficient Prompt Compression with Evaluator Heads for Long-Context Transformer Inference》英文论文,其论文编号为:2501.12959。示例尝试加载这篇论文,并对其内容进行中文摘要。"
9 | 9 | ]
10 | 10 | },
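The cell that actually fetches the paper (notebook lines 17-30) falls outside this hunk. As context, a minimal sketch of how such a loader might look with LangChain's ArxivLoader; the `load_max_docs` setting is an assumption, and `docs` mirrors the variable the later cells consume:

```python
# Sketch only: the real loader cell is elided from this diff.
# Requires the `arxiv` and `pymupdf` packages alongside langchain_community.
from langchain_community.document_loaders import ArxivLoader

# The query can be a raw Arxiv ID; load_max_docs=1 keeps just this paper.
loader = ArxivLoader(query="2501.12959", load_max_docs=1)
docs = loader.load()  # list of Documents that the splitter cell consumes

print(docs[0].metadata["Title"])  # sanity check on what was fetched
```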
11 | 11 | {
12 | 12 | "cell_type": "code",
13 |    | - "execution_count": 20,
   | 13 | + "execution_count": 30,
14 | 14 | "metadata": {},
15 | 15 | "outputs": [],
16 | 16 | "source": [

31 | 31 | },
32 | 32 | {
33 | 33 | "cell_type": "code",
34 |    | - "execution_count": 21,
   | 34 | + "execution_count": 31,
35 | 35 | "metadata": {},
36 | 36 | "outputs": [
37 | 37 | {

58 | 58 | },
59 | 59 | {
60 | 60 | "cell_type": "code",
61 |    | - "execution_count": 22,
   | 61 | + "execution_count": 45,
62 | 62 | "metadata": {},
63 | 63 | "outputs": [],
64 | 64 | "source": [
65 |    | - "spliter = RecursiveCharacterTextSplitter(chunk_size=256, chunk_overlap=2)\n",
   | 65 | + "spliter = RecursiveCharacterTextSplitter(chunk_size=256, chunk_overlap=32)\n",
66 | 66 | "texts = spliter.split_documents(docs)\n",
67 | 67 | "# pprint(texts)"
68 | 68 | ]
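The only substantive change in this cell is bumping chunk_overlap from 2 to 32. With chunk_size=256, a 2-character overlap gives neighbouring chunks essentially no shared context, so sentences are severed cleanly at chunk boundaries; 32 characters of repeated tail text keeps each boundary readable. A standalone sketch of the effect (the sample string is invented; newer LangChain versions import from langchain_text_splitters, older ones from langchain.text_splitter):

```python
# Sketch: observe the overlap between neighbouring chunks.
from langchain_text_splitters import RecursiveCharacterTextSplitter

sample = " ".join(f"Sentence {i} about evaluator heads." for i in range(30))
splitter = RecursiveCharacterTextSplitter(chunk_size=256, chunk_overlap=32)
chunks = splitter.split_text(sample)

print(len(chunks))           # number of ~256-char windows
# The tail of one chunk reappears (word-aligned, up to 32 chars)
# at the head of the next:
print(repr(chunks[0][-32:]))
print(repr(chunks[1][:32]))
```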

76 | 76 | },
77 | 77 | {
78 | 78 | "cell_type": "code",
79 |    | - "execution_count": 23,
   | 79 | + "execution_count": 46,
80 | 80 | "metadata": {},
81 | 81 | "outputs": [
82 |    | - {
83 |    | - "name": "stderr",
84 |    | - "output_type": "stream",
85 |    | - "text": [
86 |    | - "/var/folders/5x/c0q41fpx6l540lsl42_bzk5h0000gq/T/ipykernel_73163/2510094347.py:5: UserWarning: Parameters {'presence_penalty', 'top_p', 'frequency_penalty'} should be specified explicitly. Instead they were passed in as part of `model_kwargs` parameter.\n",
87 |    | - " llm = get_model('openai')\n"
88 |    | - ]
89 |    | - },
90 | 82 | {
91 | 83 | "name": "stdout",
92 | 84 | "output_type": "stream",
93 | 85 | "text": [
94 |    | - "'本文提出了一种高效的、无需训练的提示压缩方法EHPC,通过评估头部在长文本输入中选择最重要的令牌,从而加速长文本推理。EHPC在两个主流基准测试中取得了最先进的结果,有效降低了商业API调用的复杂性和成本。与基于键值缓存的加速方法相比,EHPC具有竞争力,有望提高LLM在长文本任务中的效率。EHPC通过评估头部选择重要令牌,加速长文本推理,降低内存使用,并与KV缓存压缩方法竞争。EHPC在提示压缩基准测试上取得了新的最先进性能,降低了商业LLM的API成本和内存使用。'\n"
   | 86 | + "('本文提出了一种基于评估头(Evaluator '\n",
   | 87 | + " 'Heads)的高效提示压缩方法EHPC,用于加速长上下文Transformer推理。通过识别Transformer模型中特定的注意力头,EHPC能够在预填充阶段快速筛选出重要信息,仅保留关键token进行推理。该方法无需额外训练,显著降低了长上下文处理的计算成本和内存开销。实验表明,EHPC在主流基准测试中达到了最先进的性能,有效减少了商业API调用成本,并在长文本推理加速任务中表现出色。')\n"
95 | 88 | ]
96 | 89 | }
97 | 90 | ],
98 | 91 | "source": [
99 | 92 | "doc_prompt = PromptTemplate.from_template(\"{page_content}\")\n",
100 | 93 | "# concatenate the chunk texts\n",
101 |    | - "content = lambda docs: \"\\n\\n\".join(doc.page_content for doc in docs) \n",
102 |    | - "prompt = PromptTemplate.from_template(\"请使用中文总结以下内容,控制在140个字以内:\\n\\n{content}\")\n",
103 |    | - "llm = get_model('openai')\n",
    | 94 | + "prompt = PromptTemplate.from_template(\"请使用中文总结以下内容,控制在140个字以内:{content}\")\n",
    | 95 | + "# The full document exceeds openai gpt-3.5-turbo's 16,385-token context limit, so use the deepseek model here\n",
    | 96 | + "llm = get_model('deepseek')\n",
    | 97 | + "# pprint(prompt.invoke('{input}'))\n",
104 | 98 | "\n",
105 | 99  | "# chain\n",
106 | 100 | "chain = (\n",
107 |     | - " {\"content\": lambda docs: content(docs)}\n",
    | 101 | + " {\"content\": lambda docs: \"\\n\\n\".join(doc.page_content for doc in docs)}\n",
108 | 102 | " | prompt\n",
109 | 103 | " | llm\n",
110 | 104 | " | StrOutputParser()\n",
111 | 105 | ")\n",
112 | 106 | "\n",
113 |     | - "pprint(chain.invoke(texts[:50]))\n"
    | 107 | + "pprint(chain.invoke(texts))\n"
114 | 108 | ]
115 | 109 | },
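Two things happen in this hunk: the summary now runs over all chunks instead of the first 50, and `get_model('openai')` is swapped for `get_model('deepseek')`, which also removes the stderr UserWarning about `model_kwargs`. The `get_model` helper itself appears nowhere in the diff, so this is a hypothetical reconstruction of its deepseek branch, with the model name, env var, and sampling values all assumed:

```python
# Hypothetical get_model(); the repo's real helper is not in this diff.
# Passing top_p (and any penalties) as explicit kwargs rather than via
# model_kwargs is what silences the UserWarning the old cell printed.
import os
from langchain_openai import ChatOpenAI

def get_model(provider: str) -> ChatOpenAI:
    if provider == "deepseek":
        # deepseek exposes an OpenAI-compatible endpoint, and its context
        # window comfortably exceeds gpt-3.5-turbo's 16,385 tokens.
        return ChatOpenAI(
            model="deepseek-chat",                   # assumed model name
            api_key=os.environ["DEEPSEEK_API_KEY"],  # assumed env var
            base_url="https://api.deepseek.com",
            temperature=0.7,                         # assumed sampling values
            top_p=0.9,
        )
    return ChatOpenAI(model="gpt-3.5-turbo", temperature=0.7, top_p=0.9)
```

As for the chain itself, the bare dict `{"content": lambda docs: ...}` is coerced by LCEL into a RunnableParallel step, so `chain.invoke(texts)` hands the chunk list to the lambda and pipes the joined string through the prompt, the model, and StrOutputParser.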
116 | 110 | {

122 | 116 | },
123 | 117 | {
124 | 118 | "cell_type": "code",
125 |     | - "execution_count": 24,
    | 119 | + "execution_count": 34,
126 | 120 | "metadata": {},
127 | 121 | "outputs": [
128 | 122 | {