Commit 4466eee

Create 5分钟用Python生成极品家丁小说词云.ipynb
1 parent 5bf3de7 commit 4466eee

1 file changed

+364
-0
lines changed

@@ -0,0 +1,364 @@
1+
{
2+
"cells": [
3+
{
4+
"cell_type": "markdown",
5+
"metadata": {},
6+
"source": [
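7+
"## Generate a word cloud of the complete novel 《极品家丁》 with Python in 5 minutes\n",
8+
"\n",
9+
"This notebook uses the following Python libraries:\n",
10+
"1. jieba: a Chinese word segmentation library and a staple of natural language processing, https://github.com/fxsjy/jieba\n",
11+
"2. imageio: reads and writes image files, http://imageio.github.io\n",
12+
"3. wordcloud: generates word clouds and accepts a background image as a mask, https://github.com/amueller/word_cloud"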
13+
]
14+
},
15+
{
16+
"cell_type": "code",
17+
"execution_count": 1,
18+
"metadata": {},
19+
"outputs": [],
20+
"source": [
21+
"import jieba\n",
22+
"import imageio\n",
23+
"import wordcloud"
24+
]
25+
},
26+
{
27+
"cell_type": "markdown",
28+
"metadata": {},
29+
"source": [
30+
"## 1. Segment the text of the novel 《极品家丁》 into words"
31+
]
32+
},
33+
{
34+
"cell_type": "code",
35+
"execution_count": 2,
36+
"metadata": {},
37+
"outputs": [],
38+
"source": [
39+
"with open('jipinjiading.txt', encoding='utf-8') as fin:\n",
40+
"    # Read the entire text of the file; the with block closes it afterwards\n",
41+
"    txt = fin.read()"
42+
]
43+
},
44+
{
45+
"cell_type": "code",
46+
"execution_count": 3,
47+
"metadata": {},
48+
"outputs": [
49+
{
50+
"data": {
51+
"text/plain": [
52+
"'\\ufeff《极品家丁》 / 作者:禹岩\\n简介:【本书阅读方法】参考《唐伯虎点秋香》!年轻的销售经理,因为一次意外经历,来到了一个完全不同的世界,成为萧家大宅里一名光荣的——家丁!暮晓春来迟先于百花知岁岁种桃树'"
53+
]
54+
},
55+
"execution_count": 3,
56+
"metadata": {},
57+
"output_type": "execute_result"
58+
}
59+
],
60+
"source": [
61+
"txt[:100]"
62+
]
63+
},
64+
{
65+
"cell_type": "code",
66+
"execution_count": 4,
67+
"metadata": {},
68+
"outputs": [
69+
{
70+
"name": "stderr",
71+
"output_type": "stream",
72+
"text": [
73+
"Building prefix dict from the default dictionary ...\n",
74+
"Loading model from cache C:\\Users\\ADMINI~1\\AppData\\Local\\Temp\\jieba.cache\n",
75+
"Loading model cost 0.703 seconds.\n",
76+
"Prefix dict has been built succesfully.\n"
77+
]
78+
}
79+
],
80+
"source": [
81+
"# Segment with jieba in accurate mode (full mode and search-engine mode are also available)\n",
82+
"wordlist = jieba.lcut(txt)"
83+
]
84+
},
85+
{
86+
"cell_type": "code",
87+
"execution_count": 5,
88+
"metadata": {},
89+
"outputs": [
90+
{
91+
"data": {
92+
"text/plain": [
93+
"list"
94+
]
95+
},
96+
"execution_count": 5,
97+
"metadata": {},
98+
"output_type": "execute_result"
99+
}
100+
],
101+
"source": [
102+
"type(wordlist)"
103+
]
104+
},
105+
{
106+
"cell_type": "code",
107+
"execution_count": 6,
108+
"metadata": {},
109+
"outputs": [
110+
{
111+
"data": {
112+
"text/plain": [
113+
"['\\ufeff', '《', '极品', '家丁', '》', ' ', '/', ' ', '作者', ':']"
114+
]
115+
},
116+
"execution_count": 6,
117+
"metadata": {},
118+
"output_type": "execute_result"
119+
}
120+
],
121+
"source": [
122+
"wordlist[:10]"
123+
]
124+
},
125+
{
126+
"cell_type": "code",
127+
"execution_count": 7,
128+
"metadata": {},
129+
"outputs": [],
130+
"source": [
131+
"string = \" \".join(wordlist)"
132+
]
133+
},
134+
{
135+
"cell_type": "markdown",
136+
"metadata": {},
137+
"source": [
138+
"## 2. Read the image used as the word cloud background"
139+
]
140+
},
141+
{
142+
"cell_type": "code",
143+
"execution_count": 8,
144+
"metadata": {},
145+
"outputs": [],
146+
"source": [
147+
"image = imageio.imread(\"python-logo.jpg\")"
148+
]
149+
},
150+
{
151+
"cell_type": "code",
152+
"execution_count": 9,
153+
"metadata": {},
154+
"outputs": [
155+
{
156+
"data": {
157+
"text/plain": [
158+
"(600, 600, 3)"
159+
]
160+
},
161+
"execution_count": 9,
162+
"metadata": {},
163+
"output_type": "execute_result"
164+
}
165+
],
166+
"source": [
167+
"# The image has 600 rows and 600 columns, and each pixel consists of 3 numbers\n",
168+
"image.shape"
169+
]
170+
},
171+
{
172+
"cell_type": "markdown",
173+
"metadata": {},
174+
"source": [
175+
"<div style=\"text-align:center\">Python logo image</div>\n",
176+
"<img style=\"width:200px;height:200px\" src=\"python-logo.jpg\"/>"
177+
]
178+
},
179+
{
180+
"cell_type": "code",
181+
"execution_count": 10,
182+
"metadata": {},
183+
"outputs": [
184+
{
185+
"data": {
186+
"text/plain": [
187+
"Array([255, 255, 255], dtype=uint8)"
188+
]
189+
},
190+
"execution_count": 10,
191+
"metadata": {},
192+
"output_type": "execute_result"
193+
}
194+
],
195+
"source": [
196+
"# Together, these three numbers represent one color\n",
197+
"# For example, 255, 255, 255 is white\n",
198+
"image[0][0]"
199+
]
200+
},
201+
{
202+
"cell_type": "code",
203+
"execution_count": 11,
204+
"metadata": {},
205+
"outputs": [
206+
{
207+
"data": {
208+
"text/plain": [
209+
"Array([ 55, 113, 163], dtype=uint8)"
210+
]
211+
},
212+
"execution_count": 11,
213+
"metadata": {},
214+
"output_type": "execute_result"
215+
}
216+
],
217+
"source": [
218+
"# A pixel in the upper part of the image, at the horizontal center\n",
219+
"image[200][300]"
220+
]
221+
},
222+
{
223+
"cell_type": "markdown",
224+
"metadata": {},
225+
"source": [
226+
"<div style=\"text-align:center\">Color of this pixel:</div>\n",
227+
"<div style=\"text-align:center\">Converted with this site: https://www.sioe.cn/yingyong/yanse-rgb-16/</div>\n",
228+
"<img src=\"python-logo-pix01.png\"/>"
229+
]
230+
},
231+
{
232+
"cell_type": "code",
233+
"execution_count": 12,
234+
"metadata": {},
235+
"outputs": [
236+
{
237+
"data": {
238+
"text/plain": [
239+
"Array([255, 213, 69], dtype=uint8)"
240+
]
241+
},
242+
"execution_count": 12,
243+
"metadata": {},
244+
"output_type": "execute_result"
245+
}
246+
],
247+
"source": [
248+
"# A pixel in the lower part of the image, toward the middle\n",
249+
"image[500][200]"
250+
]
251+
},
252+
{
253+
"cell_type": "markdown",
254+
"metadata": {},
255+
"source": [
256+
"<div style=\"text-align:center\">Color of this pixel:</div>\n",
257+
"<div style=\"text-align:center\">Converted with this site: https://www.sioe.cn/yingyong/yanse-rgb-16/</div>\n",
258+
"<img src=\"python-logo-pix02.png\"/>"
259+
]
260+
},
261+
{
262+
"cell_type": "markdown",
263+
"metadata": {},
264+
"source": [
265+
"## 3. Generate the word cloud image"
266+
]
267+
},
268+
{
269+
"cell_type": "code",
270+
"execution_count": 13,
271+
"metadata": {},
272+
"outputs": [],
273+
"source": [
274+
"wc = wordcloud.WordCloud(width=600,\n",
275+
"                         height=600,\n",
276+
"                         background_color='white',\n",
277+
"                         # Font file path; required for Chinese text to render\n",
278+
"                         font_path='msyh.ttc',\n",
279+
"                         # mask: image that sets the cloud's shape (rectangular by default)\n",
280+
"                         mask=image,\n",
281+
"                         # Scale up the output for a sharper image\n",
282+
"                         scale=15)"
283+
]
284+
},
285+
{
286+
"cell_type": "code",
287+
"execution_count": 14,
288+
"metadata": {},
289+
"outputs": [
290+
{
291+
"data": {
292+
"text/plain": [
293+
"<wordcloud.wordcloud.WordCloud at 0x186068755c8>"
294+
]
295+
},
296+
"execution_count": 14,
297+
"metadata": {},
298+
"output_type": "execute_result"
299+
}
300+
],
301+
"source": [
302+
"# Pass the string variable to wc's generate() method\n",
303+
"# to feed the text into the word cloud\n",
304+
"wc.generate(string)"
305+
]
306+
},
307+
{
308+
"cell_type": "code",
309+
"execution_count": 15,
310+
"metadata": {},
311+
"outputs": [
312+
{
313+
"data": {
314+
"text/plain": [
315+
"<wordcloud.wordcloud.WordCloud at 0x186068755c8>"
316+
]
317+
},
318+
"execution_count": 15,
319+
"metadata": {},
320+
"output_type": "execute_result"
321+
}
322+
],
323+
"source": [
324+
"# Export the word cloud image to the current folder\n",
325+
"wc.to_file('result_wordcloud.png')"
326+
]
327+
},
328+
{
329+
"cell_type": "markdown",
330+
"metadata": {},
331+
"source": [
332+
"<img src=\"result_wordcloud.png\" style=\"width:400px;height:400px\">"
333+
]
334+
},
335+
{
336+
"cell_type": "code",
337+
"execution_count": null,
338+
"metadata": {},
339+
"outputs": [],
340+
"source": []
341+
}
342+
],
343+
"metadata": {
344+
"kernelspec": {
345+
"display_name": "Python 3",
346+
"language": "python",
347+
"name": "python3"
348+
},
349+
"language_info": {
350+
"codemirror_mode": {
351+
"name": "ipython",
352+
"version": 3
353+
},
354+
"file_extension": ".py",
355+
"mimetype": "text/x-python",
356+
"name": "python",
357+
"nbconvert_exporter": "python",
358+
"pygments_lexer": "ipython3",
359+
"version": "3.7.4"
360+
}
361+
},
362+
"nbformat": 4,
363+
"nbformat_minor": 2
364+
}
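
A note on the segmentation step: the `wordlist[:10]` output in the notebook shows that the token list still contains the BOM (`'\ufeff'`), punctuation, and spaces. The notebook joins everything and leaves the cleanup to `wordcloud`. If the tokens were to be counted directly, a filter along the following lines might help. This is only a sketch: `keep_words` is a hypothetical helper that is not part of the notebook, and the `str.isalnum()` test is a deliberately crude rule that also drops mixed tokens such as `3.14`.

```python
from collections import Counter


def keep_words(tokens):
    """Drop tokens that contain anything other than letters or digits.

    Hypothetical helper: removes the BOM, punctuation, and whitespace tokens.
    """
    return [t for t in tokens if t.isalnum()]


# The first ten tokens, copied from the notebook's wordlist[:10] output
sample = ['\ufeff', '《', '极品', '家丁', '》', ' ', '/', ' ', '作者', ':']

words = keep_words(sample)
counts = Counter(words)

print(words)
print(counts.most_common(3))
```

On the full `wordlist`, `counts.most_common(n)` would give a quick look at the most frequent tokens before any image is rendered.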

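
A note on the pixel lookups: the notebook converts the RGB triples to color codes through an external website. The same conversion can be sketched locally. `rgb_to_hex` and `is_masked_out` are hypothetical helpers, not part of the notebook; the second one reflects the `wordcloud` convention that pure white mask pixels are masked out while all other pixels can be drawn on.

```python
def rgb_to_hex(pixel):
    """Format an (R, G, B) triple of 0-255 integers as '#rrggbb'."""
    r, g, b = (int(v) for v in pixel)
    return f"#{r:02x}{g:02x}{b:02x}"


def is_masked_out(pixel):
    """Return True for pure white, which wordcloud treats as not drawable."""
    return all(int(v) == 255 for v in pixel)


# The three pixel values shown in the notebook outputs
for pixel in [(255, 255, 255), (55, 113, 163), (255, 213, 69)]:
    print(pixel, rgb_to_hex(pixel), is_masked_out(pixel))
```

Because only exact white counts as masked out, near-white pixels produced by JPEG compression may still be treated as drawable; a mask saved in a lossless format avoids that.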