
Commit 3998b84

debug
1 parent 69df883 commit 3998b84


9 files changed (+1036, -0 lines)

emot_demo/Emoji_Dict.p

88.1 KB
Binary file not shown.

emot_demo/Emoticon_Dict.p

3.4 KB
Binary file not shown.
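Further down, the notebook loads these pickles with loader functions such as `load_emoji_vocab`, which invert each dict with `{v: k for k, v in pickle.load(fp).items()}`. When two keys share a value (for example, one emoji listed under two names), that comprehension silently keeps only the last key. The sketch below performs the same inversion but also reports any collisions. It assumes the pickles hold plain `dict` objects mapping names to emoji; their real contents are not shown in this commit. `load_inverted` and the demo dict are hypothetical.

```python
import pickle
import tempfile
from pathlib import Path


def load_inverted(path):
    """Load a pickled dict and swap keys and values, reporting collisions.

    Returns (inverted, collisions), where each collision is a tuple
    (shared_value, key_that_was_dropped, key_that_was_kept).
    """
    with open(path, "rb") as fp:
        forward = pickle.load(fp)
    inverted = {}
    collisions = []
    for key, value in forward.items():
        if value in inverted:
            collisions.append((value, inverted[value], key))
        # Last key wins, exactly like {v: k for k, v in forward.items()}.
        inverted[value] = key
    return inverted, collisions


# Hypothetical stand-in for Emoji_Dict.p: name -> emoji, with one emoji
# listed under two names.
demo = {":thumbs_up:": "👍", ":thumbsup:": "👍", ":pizza:": "🍕"}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo_dict.p"
    with open(path, "wb") as fp:
        pickle.dump(demo, fp)
    inverted, collisions = load_inverted(path)

print(inverted)    # {'👍': ':thumbsup:', '🍕': ':pizza:'}
print(collisions)  # [('👍', ':thumbs_up:', ':thumbsup:')]
```

If the collision list for the real files is non-empty, some names can never be produced by the inverted mapping.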

emot_demo/emoji_exploration.ipynb

Lines changed: 383 additions & 0 deletions
@@ -0,0 +1,383 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "d7877b2d",
   "metadata": {},
   "source": [
    "### Text processing: handling emoji and emoticons\n",
    "1. Processing options:\n",
    "   - Remove emoji and emoticons\n",
    "   - Convert emoji that depict concrete things (e.g. 🍕) into the corresponding words\n",
    "   - Convert emotion-bearing emoji into corresponding tokens\n",
    "2. Full emoji list: http://www.unicode.org/emoji/charts/emoji-list.html\n",
    "3. Applications:\n",
    "   - Improving the accuracy of sentiment analysis"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 111,
   "id": "fe8be4b1",
   "metadata": {},
   "outputs": [],
   "source": [
    "import re\n",
    "import csv\n",
    "import json\n",
    "import pickle\n",
    "import numpy as np\n",
    "from emoji import demojize\n",
    "from emot.emo_unicode import UNICODE_EMOJI, EMOJI_UNICODE, EMOTICONS_EMO\n",
    "# from deepmoji.sentence_tokenizer import SentenceTokenizer\n",
    "# from deepmoji.model_def import deepmoji_feature_encoding\n",
    "# from deepmoji.global_variables import PRETRAINED_PATH, VOCAB_PATH"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7ae2bd55",
   "metadata": {},
   "source": [
    "### Using emot\n",
    "- GitHub: https://github.com/NeelShah18/emot\n",
    "- Tutorial: https://medium.com/geekculture/text-preprocessing-how-to-handle-emoji-emoticon-641bbfa6e9e7"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 103,
   "id": "7c7b3db7",
   "metadata": {},
   "outputs": [],
   "source": [
    "def load_emoji_vocab():\n",
    "    # Invert the pickled dict so that emoji characters become the keys.\n",
    "    with open('Emoji_Dict.p', 'rb') as fp:\n",
    "        return {v: k for k, v in pickle.load(fp).items()}\n",
    "\n",
    "\n",
    "def load_emoticon_vocab():\n",
    "    with open('Emoticon_Dict.p', 'rb') as fp:\n",
    "        return {v: k for k, v in pickle.load(fp).items()}\n",
    "\n",
    "\n",
    "def convert_emojis_to_word(text, emoji_mapping):\n",
    "    \"\"\"\n",
    "    Replace each emoji with its description, words joined by underscores.\n",
    "    This is rather inefficient: it runs one regex substitution per mapping entry.\n",
    "    \"\"\"\n",
    "    for emot in emoji_mapping:\n",
    "        # re.escape guards against emoji that contain regex metacharacters (e.g. keycaps).\n",
    "        text = re.sub(re.escape(emot), \"_\".join(emoji_mapping[emot].replace(\",\", \"\").replace(\":\", \"\").split()), text)\n",
    "    return text\n",
    "\n",
    "\n",
    "def convert_emojis_to_description(text):\n",
    "    # UNICODE_EMOJI maps each emoji to a name such as ':face_with_tears_of_joy:'.\n",
    "    for emj in UNICODE_EMOJI:\n",
    "        text = text.replace(emj, UNICODE_EMOJI[emj])\n",
    "    return text\n",
    "\n",
    "\n",
    "def convert_emoticons_to_description(text):\n",
    "    \"\"\"\n",
    "    Convert text emoticons such as :-) into descriptions such as :happy_face_smiley:.\n",
    "    \"\"\"\n",
    "    for emi in EMOTICONS_EMO:\n",
    "        label = EMOTICONS_EMO[emi].replace(\" \", \"_\").replace(\",\", \"_\").lower()\n",
    "        text = text.replace(emi, f\":{label}:\")\n",
    "    return text\n",
    "\n",
    "\n",
    "emoji_mapping = load_emoji_vocab()\n",
    "emoticon_mapping = load_emoticon_vocab()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 104,
   "id": "2ccdaad7",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'I won 1st_place_medal in cricket'"
      ]
     },
     "execution_count": 104,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "convert_emojis_to_word(\"I won 🥇 in 🏏\", emoji_mapping)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 105,
   "id": "d7d5e47a",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'I like to eat pizza'"
      ]
     },
     "execution_count": 105,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "convert_emojis_to_word(\"I like to eat 🍕\", emoji_mapping)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 106,
   "id": "e01efadc",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Hilarious :face_with_tears_of_joy:. The feeling of making a sale :smiling_face_with_sunglasses:, The feeling of actually fulfilling orders :unamused_face:'"
      ]
     },
     "execution_count": 106,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "convert_emojis_to_description(\"Hilarious 😂. The feeling of making a sale 😎, The feeling of actually fulfilling orders 😒\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 108,
   "id": "fab11f92",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Hello :happy_face_smiley: :happy_face_smiley:'"
      ]
     },
     "execution_count": 108,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "convert_emoticons_to_description(\"Hello :-) :-)\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 102,
   "id": "93f7cc0c",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 102,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "EMOJI_UNICODE[':face_with_tears_of_joy:'] == '😂'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "id": "8831eabc",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Nice app:‑:face_with_hand_over_mouth: I like it'"
      ]
     },
     "execution_count": 51,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "convert_emojis_to_description(\"Nice app:‑🤭 I like it\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a6458398",
   "metadata": {},
   "source": [
    "### Using emoji\n",
    "- GitHub: https://github.com/carpedm20/emoji"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "id": "bc33d4f9",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'I won :1st_place_medal: in :cricket_bat_and_ball:'"
      ]
     },
     "execution_count": 53,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "demojize(\"I won 🥇 in 🏏\", emoji_mapping)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "id": "01a530d6",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'I like to eat :pizza:'"
      ]
     },
     "execution_count": 54,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "demojize(\"I like to eat 🍕\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "id": "254c5d61",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Hilarious :face_with_tears_of_joy:. The feeling of making a sale :smiling_face_with_sunglasses:, The feeling of actually fulfilling orders :unamused_face:'"
      ]
     },
     "execution_count": 55,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "demojize(\"Hilarious 😂. The feeling of making a sale 😎, The feeling of actually fulfilling orders 😒\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "id": "5eac81f2",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Nice app:‑:face_with_hand_over_mouth: I like it'"
      ]
     },
     "execution_count": 56,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "demojize(\"Nice app:‑🤭 I like it\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 109,
   "id": "8fb8a455",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Hello :-) :-)'"
      ]
     },
     "execution_count": 109,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "demojize(\"Hello :-) :-)\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d5e815cf",
   "metadata": {},
   "source": [
    "### Using DeepMoji\n",
    "- Official site: https://deepmoji.mit.edu/\n",
    "- GitHub: https://github.com/bfelbo/DeepMoji/tree/master/examples\n",
    "- Tutorial: https://medium.com/@bjarkefelbo/what-can-we-learn-from-emojis-6beb165a5ea0\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "774a6d95",
   "metadata": {},
   "source": [
    "Use DeepMoji to encode texts into emotional feature vectors."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8dfebac2",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
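The docstring of `convert_emojis_to_word` in the notebook above flags that function as inefficient. It calls `re.sub` once per mapping entry, so its cost grows with the size of the emoji vocabulary. A common alternative is to compile one alternation regex over all keys and look up each match in a callback. The keys are sorted longest-first because Python's `re` tries alternatives left to right, and this ordering lets multi-codepoint sequences (for example, an emoji plus a skin-tone modifier) match before their one-character prefixes. The sketch below follows that approach under the assumption that the inverted `Emoji_Dict.p` maps emoji to colon-wrapped names. `build_emoji_pattern`, `convert_emojis_to_word_fast` and the demo mapping are hypothetical.

```python
import re


def build_emoji_pattern(mapping):
    """Compile a single alternation regex over every key in `mapping`.

    Keys are escaped (the asterisk keycap emoji begins with '*', a regex
    metacharacter) and sorted longest-first so multi-codepoint sequences
    match before their prefixes.
    """
    if not mapping:
        raise ValueError("mapping must not be empty")
    keys = sorted(mapping, key=len, reverse=True)
    return re.compile("|".join(re.escape(k) for k in keys))


def to_token(description):
    # Same normalisation as convert_emojis_to_word: drop commas and colons,
    # then join the remaining words with underscores.
    return "_".join(description.replace(",", "").replace(":", "").split())


def convert_emojis_to_word_fast(text, pattern, mapping):
    # One left-to-right scan of `text`; each match is looked up in `mapping`.
    return pattern.sub(lambda m: to_token(mapping[m.group(0)]), text)


# Hypothetical four-entry mapping standing in for the inverted Emoji_Dict.p.
demo_mapping = {
    "🥇": ":1st_place_medal:",
    "🍕": ":pizza:",
    "👍": ":thumbs_up:",
    "👍🏽": ":thumbs_up_medium_skin_tone:",
}
pattern = build_emoji_pattern(demo_mapping)

print(convert_emojis_to_word_fast("I won 🥇 and ate 🍕 👍🏽", pattern, demo_mapping))
# I won 1st_place_medal and ate pizza thumbs_up_medium_skin_tone
```

Build the pattern once per mapping and reuse it. Compiling a few thousand alternatives is a one-time cost, and after that each text needs only a single scan.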

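`convert_emoticons_to_description` calls `str.replace` once for each entry of `EMOTICONS_EMO`, in dictionary order. This has two side effects. First, a shorter emoticon can rewrite part of a longer one before the longer one is tried (for instance `:)` inside `:))`, if both are keys). Second, emoticon-like substrings inside ordinary tokens are rewritten too, such as the `:3` in `12:30`. The sketch below sorts keys longest-first and matches an emoticon only when whitespace or a string boundary sits on both sides. The label normalisation is copied from the notebook. `demo_emoticons` is hypothetical: only the `:-)` label is inferred from the notebook's output, and the other entries are invented for the demo.

```python
import re


def build_emoticon_pattern(emoticons):
    """Match any key of `emoticons` that stands alone as a token.

    Keys are escaped and sorted longest-first; the lookarounds require
    whitespace (or the start/end of the string) on both sides.
    """
    keys = sorted(emoticons, key=len, reverse=True)
    alternation = "|".join(re.escape(k) for k in keys)
    return re.compile(r"(?<!\S)(?:" + alternation + r")(?!\S)")


def convert_emoticons_to_description_fast(text, pattern, emoticons):
    def replace(match):
        # Same label normalisation as convert_emoticons_to_description.
        label = emoticons[match.group(0)].replace(" ", "_").replace(",", "_").lower()
        return f":{label}:"

    return pattern.sub(replace, text)


# Hypothetical stand-in for EMOTICONS_EMO.
demo_emoticons = {
    ":)": "Happy face or smiley",
    ":))": "Very happy",
    ":-)": "Happy face smiley",
    ":3": "Cat face",
}
pattern = build_emoticon_pattern(demo_emoticons)

print(convert_emoticons_to_description_fast("Hello :-) :))", pattern, demo_emoticons))
# Hello :happy_face_smiley: :very_happy:
print(convert_emoticons_to_description_fast("Meet at 12:30 :3", pattern, demo_emoticons))
# Meet at 12:30 :cat_face:
```

The boundary check has a trade-off: emoticons glued to a word, as in `app:-)`, are left untouched. Whether that is preferable depends on how the input text is written.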