From 5c019f0327bc1870222bfd553887cfbfe8adeb7e Mon Sep 17 00:00:00 2001
From: Miniyahil Kebede
Date: Mon, 12 Aug 2024 01:04:41 +0300
Subject: [PATCH] Project description added

---
 README.md | 31 +++++++++++++++++++++++++++++--
 1 file changed, 29 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 88760e4..d2d3d7b 100644
--- a/README.md
+++ b/README.md
@@ -1,12 +1,39 @@
 # [Tikvah](https://t.me/s/tikvahethiopia) Telegram channel analysis repo !
 
 this repo analysis is done for **learning purpose**
+## one of the analysis dashboards
+
 
 ![Image](./image.png)
+### libraries used
+- bs4
+
+### install with
+```bash
+pip install -r requirements.txt
+```
+
+### Preprocessing Steps Completed:
+
+- Fetched HTML data.
+- Extracted data into JSON format.
+- Filtered Amharic keywords, removing entries with:
+  - Emojis
+  - English characters
+  - Special characters
+  - Numbers
+- Filtered out stop words.
+
+### change `top_n` (default: 500) in `top-words.py` and run
+
+```bash
+python top-words.py
+```
+
 
 ## final data be like
 ```bash
-let data = {
+ data = {
     "ሰዎች": 14109,
     "ከተማ": 10457,
     "ክልል": 9968,
@@ -39,7 +66,7 @@ let data = {
     ...
 ```
 
-## you can also do extra processing and analysis, create pull request.
+## you can also do extra processing and analysis and create a pull request.
 
 ---
 ## if you like this repo, please give it the star.
\ No newline at end of file
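The preprocessing steps and the `top_n` word count that this README change describes can be sketched in Python. This is not the repo's `top-words.py`. It is a minimal sketch under several assumptions. First, the extracted JSON has already been loaded as a plain list of message strings. Second, "Amharic characters" means the Ethiopic syllables U+1200 to U+135A, so any token containing an emoji, Latin letter, digit, or other symbol is dropped whole. Third, the stop-word set and the helper names `amharic_tokens` and `top_words` are hypothetical and exist only for illustration.

```python
import re
from collections import Counter

# Assumption: Amharic letters are the Ethiopic syllables U+1200-U+135A.
# Ethiopic punctuation (U+1360-U+1368) and Ethiopic numerals (U+1369-U+137C)
# fall outside this range, so tokens containing them are rejected.
AMHARIC_WORD = re.compile(r"[\u1200-\u135A]+")

# Split on whitespace and Ethiopic punctuation such as the word space and full stop.
SEPARATORS = re.compile(r"[\s\u1360-\u1368]+")

# Hypothetical, deliberately tiny stop-word list (and, is, on, to, inside).
STOP_WORDS = {"እና", "ነው", "ላይ", "ወደ", "ውስጥ"}


def amharic_tokens(text: str) -> list[str]:
    """Return tokens made only of Ethiopic letters; any token containing an
    emoji, English letter, digit, or special character is discarded."""
    return [tok for tok in SEPARATORS.split(text)
            if tok and AMHARIC_WORD.fullmatch(tok)]


def top_words(messages: list[str], top_n: int = 500) -> dict[str, int]:
    """Count non-stop-word Amharic tokens and keep the top_n most frequent."""
    counts = Counter(
        tok
        for msg in messages
        for tok in amharic_tokens(msg)
        if tok not in STOP_WORDS
    )
    return dict(counts.most_common(top_n))


if __name__ == "__main__":
    sample = [
        "ሰዎች በከተማ ውስጥ ናቸው።",
        "ሰዎች እና ከተማ 2024 😀",
        "ክልል ሰዎች news",
    ]
    print(top_words(sample, top_n=3))
```

Tokens are compared verbatim, so inflected forms such as `በከተማ` and `ከተማ` are counted separately. Merging them would need an extra stemming step that this sketch does not attempt.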