Chat Log Analyzer with TF-IDF & NLTK: a Python script that analyzes chat logs between a User and an AI. It extracts messages, preprocesses them with NLTK (tokenization, POS tagging, lemmatization), and computes TF-IDF scores to summarize the most relevant topics discussed.

asayem172153/chat_parse_from_txt_and_summarize

Chat Log Parser and Summarizer

A Python tool that analyzes and summarizes chat logs from .txt files using NLP techniques. It extracts key topics, computes message statistics, and generates summaries.

Features

  • Parse single or multiple chat log files
  • Extract speaker-specific messages (User/AI)
  • Identify main topics using TF-IDF and lemmatization
  • Generate summary statistics (message counts, keywords)

Prerequisites

  • Python 3.12.4
  • pip 24.0+

Setup

1. Create and activate virtual environment

python -m venv venv

# Windows:
venv\Scripts\activate

# Mac/Linux:
source venv/bin/activate

2. Install dependencies

pip install -r requirements.txt
python -m nltk.downloader stopwords wordnet punkt

Usage

1. For Single Chat File

python ai_chat_summarize_for_single_txt_file.py

Output Example:

Total messages: 4
User messages: 2
AI messages: 2

 Summary
 - The conversation had 15 exchanges
 - The user asked mainly about python and use
 - Most common keywords: python, use, hi, tell, sure

[Screenshot: output for a single chat file]
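
As a rough illustration of how a "Most common keywords" line can be derived, the sketch below ranks terms by summed TF-IDF score using only the standard library. It is not the repository's code: the project relies on NLTK for stop words and lemmatization, while the names `STOP_WORDS`, `tokenize`, and `top_keywords`, the tiny stop-word list, and the sample messages are all assumptions made for this example.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption); the project uses NLTK's stopwords corpus.
STOP_WORDS = {"a", "an", "the", "is", "to", "of", "and", "i", "me", "you",
              "it", "about", "can", "for", "in", "what"}


def tokenize(text):
    """Lowercase, keep alphabetic runs, and drop stop words (no lemmatization here)."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def top_keywords(messages, k=5):
    """Rank terms by summed TF-IDF score, treating each message as one document."""
    docs = [tokenize(m) for m in messages]
    n_docs = len(docs)
    # Document frequency: the number of messages in which each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    scores = Counter()
    for doc in docs:
        for term, count in Counter(doc).items():
            tf = count / len(doc)
            # Smoothed IDF, so a term present in every message still scores above zero.
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1
            scores[term] += tf * idf
    # Highest score first; ties are broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical sample messages, not taken from the repository's chat logs.
messages = [
    "Hi, can you tell me about Python?",
    "Sure! Python is a programming language.",
    "What can I use Python for?",
    "You can use Python for data analysis.",
]
print(top_keywords(messages, k=3))  # prints ['python', 'use', 'hi']
```

Because each message counts as a separate document, a term that appears in many short messages accumulates score across all of them, which is why "python" ranks first here.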

2. For Multiple Chat Files

python ai_chat_summarize_to_parse_all_txt_and_analysis.py

Output Example:

Total messages: 8
User messages: 4
AI messages: 4

Summary
The conversation had 26 exchanges
The user asked mainly about python and ai
Most common keywords: python, ai, data, hi, learn

[Screenshot: output for multiple chat files]
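
A minimal sketch of the multi-file step, assuming the script gathers every .txt file in the chat_log/ folder and analyzes the combined text. The function name `read_chat_logs` and the choice to join files with a newline are assumptions for illustration, not the repository's implementation.

```python
from pathlib import Path


def read_chat_logs(folder="chat_log"):
    """Return (combined_text, file_count) for every .txt file in `folder`, in name order."""
    paths = sorted(Path(folder).glob("*.txt"))
    texts = [p.read_text(encoding="utf-8") for p in paths]
    # Join with a newline so the last message of one file does not run into the next.
    return "\n".join(texts), len(paths)


if __name__ == "__main__":
    combined, n_files = read_chat_logs()
    print(f"Read {n_files} file(s), {len(combined)} characters")
```

Sorting the paths keeps the message order reproducible between runs, which matters if the summary reports conversation-level counts.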

3. Jupyter Notebook Option

jupyter notebook AI_Chat_Log_Summarizer_multiple_txt_parse.ipynb

[Screenshot: Jupyter notebook run]

Adding Screenshots

  1. Create an assets/ folder:
    mkdir assets
  2. Save the screenshot (e.g., sample_output.png) in this folder

Project Structure

.
├── chat_log/                  # Folder for input chat logs (.txt files)
├── venv/                      # Virtual environment (ignored)
├── assets/                    # For screenshots and images
├── .gitignore
├── requirements.txt
├── README.md
├── ai_chat_summarize_for_single_txt_file.py
├── ai_chat_summarize_to_parse_all_txt_and_analysis.py
└── AI_Chat_Log_Summarizer_multiple_txt_parse.ipynb

Technical Details

  • Uses NLTK for tokenization and lemmatization
  • TF-IDF vectorization for keyword extraction
  • Regular expression pattern matching for message parsing:
    PATTERN = r'(User|AI):\s*(.*?)(?=\n*User:|\n*AI:|$)'
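
The sketch below shows one way a pattern of this shape can split a log into (speaker, message) pairs and produce the message counts. Two details are assumptions rather than confirmed behavior of the scripts: the lookahead ends in an unescaped `$` (end of input) so the final message is captured, and `re.DOTALL` is passed so a message may span several lines. The helper name `parse_chat` and the sample text are hypothetical.

```python
import re
from collections import Counter

# Lookahead stops a message at the next "User:" / "AI:" label or at end of input.
PATTERN = r'(User|AI):\s*(.*?)(?=\n*User:|\n*AI:|$)'


def parse_chat(text):
    """Return a list of (speaker, message) tuples in order of appearance."""
    # re.DOTALL lets "." match newlines, so multi-line messages stay in one piece.
    matches = re.findall(PATTERN, text, flags=re.DOTALL)
    return [(speaker, message.strip()) for speaker, message in matches]


# Hypothetical sample log, not taken from the repository.
sample = (
    "User: Hi, can you tell me about Python?\n"
    "AI: Sure! Python is a programming language.\n"
    "User: Thanks.\n"
    "AI: You're welcome."
)

pairs = parse_chat(sample)
counts = Counter(speaker for speaker, _ in pairs)
print(f"Total messages: {len(pairs)}")
print(f"User messages: {counts['User']}")
print(f"AI messages: {counts['AI']}")
```

With the literal-dollar form `\$`, the last message of a log would only match if it were followed by a `$` character, so the unescaped form is the safer reading.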

Troubleshooting

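  • If you get NLTK errors (typically a LookupError for a missing resource), download the resources again from a Python session:
    python
    >>> import nltk
    >>> nltk.download('stopwords')
    >>> nltk.download('wordnet')
    >>> nltk.download('punkt')
    Repeat for any other resource named in the error message.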
  • For virtual environment issues, deactivate the environment, then activate it again as described in Setup:
    deactivate
