Commit 5fb48ba

Task 1
1 parent dc55864 commit 5fb48ba

2 files changed: +170 -0 lines changed

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
+In comparison to dogs, cats have not undergone major changes during the domestication process.
+As cat simply catenates streams of bytes, it can be also used to concatenate binary files, where it will just concatenate sequence of bytes.
+A common interactive use of cat for a single file is to output the content of a file to standard output.
+Cats can hear sounds too faint or too high in frequency for human ears, such as those made by mice and other small animals.
+In one, people deliberately tamed cats in a process of artificial selection, as they were useful predators of vermin.
+The domesticated cat and its closest wild ancestor are both diploid organisms that possess 38 chromosomes and roughly 20,000 genes.
+Domestic cats are similar in size to the other members of the genus Felis, typically weighing between 4 and 5 kg (8.8 and 11.0 lb).
+However, if the output is piped or redirected, cat is unnecessary.
+cat with one named file is safer where human error is a concern - one wrong use of the default redirection symbol ">" instead of "<" (often adjacent on keyboards) may permanently delete the file you were just needing to read.
+In terms of legibility, a sequence of commands starting with cat and connected by pipes has a clear left-to-right flow of information.
+Cat command is one of the basic commands that you learned when you started in the Unix / Linux world.
+Using cat command, the lines received from stdin can be redirected to a new file using redirection symbols.
+When you type simply cat command without any arguments, it just receives the stdin content and displays it in the stdout.
+Leopard was released on October 26, 2007 as the successor of Tiger (version 10.4), and is available in two editions.
+According to Apple, Leopard contains over 300 changes and enhancements over its predecessor, Mac OS X Tiger.
+As of Mid 2010, some Apple computers have firmware factory installed which will no longer allow installation of Mac OS X Leopard.
+Since Apple moved to using Intel processors in their computers, the OSx86 community has developed and now also allows Mac OS X Tiger and later releases to be installed on non-Apple x86-based computers.
+OS X Mountain Lion was released on July 25, 2012 for purchase and download through Apple's Mac App Store, as part of a switch to releasing OS X versions online and every year.
+Apple has released a small patch for the three most recent versions of Safari running on OS X Yosemite, Mavericks, and Mountain Lion.
+The Mountain Lion release marks the second time Apple has offered an incremental upgrade, rather than releasing a new cat entirely.
+Mac OS X Mountain Lion installs in place, so you won't need to create a separate disk or run the installation off an external drive.
+The fifth major update to Mac OS X, Leopard, contains such a mountain of features - more than 300 by Apple's count.
Lines changed: 148 additions & 0 deletions
@@ -0,0 +1,148 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "2087a1a9",
+   "metadata": {},
+   "source": [
+    "## Task 1: Comparing sentences\n",
+    "Given is a set of sentences copied from Wikipedia. Each has a \"cat theme\" in one of three senses:\n",
+    "* cats (the animals)\n",
+    "* the UNIX utility cat, which prints the contents of files\n",
+    "* versions of the OS X operating system named after members of the cat family\n",
+    "\n",
+    "The task is to find the two sentences closest in meaning to the one on the very first line, using cosine distance as the measure of closeness.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 191,
+   "id": "d60fac0d",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "(22, 254)\n"
+     ]
+    }
+   ],
+   "source": [
+    "import scipy as sp\n",
+    "import numpy as np\n",
+    "import re\n",
+    "# Open the file and read all sentences, converting them to lowercase\n",
+    "with open('sentences.txt', 'r') as f:\n",
+    "    t = f.read().lower()\n",
+    "# Build a list of all sentences\n",
+    "sentences = t.split(\"\\n\")\n",
+    "# n is the number of sentences (minus 1 because the file ends with a newline)\n",
+    "n = len(sentences) - 1\n",
+    "# Split the whole text t into words and drop empty strings; store the result in b\n",
+    "a = re.split('[^a-z]', t)\n",
+    "b = []\n",
+    "for word in a:\n",
+    "    if word == '':\n",
+    "        continue\n",
+    "    b.append(word)\n",
+    "# Build a dictionary of all distinct words; each word's index is a number from 0 to (d - 1), where d is the number of distinct words in the text\n",
+    "dic = {}\n",
+    "d = 0\n",
+    "for word in b:\n",
+    "    if word not in dic:\n",
+    "        dic[word] = d\n",
+    "        d += 1\n",
+    "# Create an n-by-d matrix\n",
+    "matrix = np.zeros((n, d))\n",
+    "print(matrix.shape)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 192,
+   "id": "1659d59d",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Count how many times the word `word` occurs in the sentence `sen`\n",
+    "def cnt_of_word_in_sentence(sen, word):\n",
+    "    words_in = re.split('[^a-z]', sen)\n",
+    "    ans = 0\n",
+    "    for i in words_in:\n",
+    "        if i == word:\n",
+    "            ans += 1\n",
+    "    return ans\n",
+    "# Fill the matrix: element (i, j) is the number of occurrences of the j-th word in the i-th sentence\n",
+    "for i in range(n):\n",
+    "    for k, j in dic.items():\n",
+    "        matrix[i, j] = cnt_of_word_in_sentence(sentences[i], k)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 193,
+   "id": "ca122f11",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "1 : 0.9527544408738466\n",
+      "2 : 0.8644738145642124\n",
+      "3 : 0.8951715163278082\n",
+      "4 : 0.7770887149698589\n",
+      "5 : 0.9402385695332803\n",
+      "6 : 0.7327387580875756\n",
+      "7 : 0.9258750683338899\n",
+      "8 : 0.8842724875284311\n",
+      "9 : 0.9055088817476932\n",
+      "10 : 0.8328165362273942\n",
+      "11 : 0.8804771390665607\n",
+      "12 : 0.8396432548525454\n",
+      "13 : 0.8703592552895671\n",
+      "14 : 0.8740118423302576\n",
+      "15 : 0.9442721787424647\n",
+      "16 : 0.8406361854220809\n",
+      "17 : 0.956644501523794\n",
+      "18 : 0.9442721787424647\n",
+      "19 : 0.8885443574849294\n",
+      "20 : 0.8427572744917122\n",
+      "21 : 0.8250364469440588\n",
+      "Answer is 4 and 6\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Compute the cosine distances from row 0 to all other rows\n",
+    "from scipy.spatial import distance\n",
+    "for i in range(1, n):\n",
+    "    print(i, \":\", distance.cosine(matrix[0], matrix[i]))\n",
+    "# Indices of the rows closest to row 0 by cosine distance: 4 and 6\n",
+    "print(\"Answer is 4 and 6\")"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.9.7"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
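
The notebook hardcodes its final answer after printing every distance. As a rough cross-check, the same bag-of-words and cosine-distance procedure could be condensed into one function that selects the nearest rows itself. This is only a sketch, not the commit's code: the function name `closest_sentences` and the parameter `k` are hypothetical, and it assumes every sentence contains at least one a-z word (an all-zero count vector has no defined cosine distance).

```python
import re

import numpy as np
from scipy.spatial import distance


def closest_sentences(sentences, k=2):
    """Return indices of the k sentences closest to sentences[0].

    Hypothetical helper: closeness is the cosine distance between
    word-count vectors, using the same '[^a-z]' tokenization as the notebook.
    """
    # Lowercase each sentence, split on non-letters, and drop empty tokens.
    tokens = [[w for w in re.split('[^a-z]', s.lower()) if w] for s in sentences]

    # Assign every distinct word a column index in order of first appearance.
    vocab = {}
    for sent in tokens:
        for w in sent:
            vocab.setdefault(w, len(vocab))

    # counts[i, j] is the number of occurrences of word j in sentence i.
    counts = np.zeros((len(tokens), len(vocab)))
    for i, sent in enumerate(tokens):
        for w in sent:
            counts[i, vocab[w]] += 1

    # Cosine distance from sentence 0 to every other sentence.
    dists = [distance.cosine(counts[0], counts[i]) for i in range(1, len(tokens))]

    # argsort positions are offset by one because sentence 0 is excluded.
    return [int(i) + 1 for i in np.argsort(dists)[:k]]
```

Called on the non-empty lines of `sentences.txt`, the function returns the indices of the rows with the smallest distances, which can be compared against the values printed in the notebook output.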
