🧠 Insightify API: Dual-Lingual Sentiment & N-Gram Engine

📌 Overview

Insightify Sentiment API is an NLP microservice that classifies sentiment in both English and Indonesian text.

Unlike basic sentiment tools, this engine goes beyond simple "Positive/Negative" labeling. It features a Batch Processing Pipeline that ingests entire Excel/CSV datasets, extracts key phrases using N-Gram Analysis, and calculates text-length and word-length statistics per sentiment. Inference runs on RoBERTa-based Transformer models, one per language.

✨ Key Features

🌐 Dual-Core Intelligence (EN & ID)

Each language has its own endpoint, which routes requests to a specialized model:

English Core: Powered by cardiffnlp/twitter-roberta-base for nuance detection in global text.
Indonesian Core: Powered by w11wo/indonesian-roberta-base to accurately process local slang and formal Bahasa Indonesia.

📊 Batch Insight Extraction (The "Deep Dive")

Batch mode does more than label rows; it summarizes the whole dataset. When you upload a file, it performs:

Sentiment Classification: Labels every row with confidence scores.
Keyword Extraction: Uses N-Gram logic to find the most frequent phrases (e.g., "bad service", "pengiriman lambat") specific to a sentiment group.
Statistical Analysis: Calculates word count and sentence complexity averages per sentiment.
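
The keyword-extraction step can be approximated with a plain word N-gram counter. This is a minimal sketch, not the project's implementation: the function name `top_ngrams` and the regex tokenizer are assumptions, and it does no stop-word filtering. Its parameters mirror the `num`, `ngram_min`, and `ngram_max` options of the batch endpoint.

```python
import re
from collections import Counter


def top_ngrams(texts, ngram_min=1, ngram_max=1, num=5):
    """Return the `num` most frequent word n-grams across `texts`.

    Hypothetical helper: n-gram sizes run from ngram_min to ngram_max inclusive.
    """
    counts = Counter()
    for text in texts:
        # Lowercase and split on word characters (assumed tokenization).
        tokens = re.findall(r"\w+", text.lower())
        for n in range(ngram_min, ngram_max + 1):
            for i in range(len(tokens) - n + 1):
                counts[" ".join(tokens[i:i + n])] += 1
    return counts.most_common(num)


# Rows that were already labeled "negative" (illustrative data).
negative_rows = [
    "bad service and slow delivery",
    "bad service again",
    "pengiriman lambat sekali",
]
top_bigrams = top_ngrams(negative_rows, ngram_min=2, ngram_max=2, num=3)
print(top_bigrams[0])  # the most frequent bigram with its count
```

In the service, the input rows would be those belonging to the sentiment group selected by the caller.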

🛡️ Defensive Architecture

Lazy Loading: Models are loaded into memory only upon the first request to optimize resource usage.
Strict Validation: Prevents crashes by validating file types (.csv, .xlsx), encoding (UTF-8/Latin1), and empty inputs before processing.
Smart Error Handling: Distinguishes client errors (HTTP 400) from server errors (HTTP 500), with a detailed message, so developers can tell whether the input or the service was at fault.
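
A minimal sketch of how these three safeguards could fit together. The helper names (`get_model`, `validate_upload`, `decode_csv_bytes`, `status_for`) are hypothetical, and a placeholder dictionary stands in for the real Hugging Face model so the example runs without downloading anything.

```python
from functools import lru_cache
from pathlib import Path

# Model names as listed in this README.
MODEL_NAMES = {
    "en": "cardiffnlp/twitter-roberta-base",
    "id": "w11wo/indonesian-roberta-base",
}
ALLOWED_EXTENSIONS = {".csv", ".xlsx"}
LOAD_CALLS = []  # records how often the loader actually runs


class ValidationError(ValueError):
    """Bad client input; mapped to HTTP 400 in this sketch."""


@lru_cache(maxsize=None)
def get_model(lang):
    """Lazy loading: the body runs only on the first call per language."""
    if lang not in MODEL_NAMES:
        raise ValidationError(f"Unsupported language: {lang}")
    LOAD_CALLS.append(lang)
    # Placeholder: a real service would load the tokenizer and model here.
    return {"name": MODEL_NAMES[lang]}


def validate_upload(filename, raw):
    """Reject unsupported file types and empty uploads before processing."""
    if Path(filename).suffix.lower() not in ALLOWED_EXTENSIONS:
        raise ValidationError("Only .csv or .xlsx files are accepted")
    if not raw:
        raise ValidationError("Uploaded file is empty")


def decode_csv_bytes(raw):
    """Try UTF-8 first, then fall back to Latin-1."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("latin-1")


def status_for(exc):
    """400 for input problems, 500 for anything unexpected."""
    return 400 if isinstance(exc, ValidationError) else 500


model_a = get_model("en")
model_b = get_model("en")  # served from the cache; the loader is not re-run
```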

🛠️ Tech Stack

Framework: FastAPI (Asynchronous)
NLP Core: Hugging Face Transformers (PyTorch)
Data Processing: Pandas & NumPy
Deployment: Hugging Face Spaces (Dockerized)

🚀 The Processing Pipeline

Routing: Request hits specific endpoint (EN or ID).
Ingestion:
- Single Mode: Validates JSON string.
- Batch Mode: Reads binary stream -> Converts to DataFrame -> Checks for "komentar" column.
Inference: Text is passed through the tokenizer and RoBERTa model.
Extraction (Batch Only):
- Aggregates data by sentiment.
- Runs N-Gram vectorization to find top keywords.
Response: Returns a structured JSON containing predictions, statistics, and preview data.
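
The per-sentiment aggregation in the extraction step can be illustrated as follows. This sketch assumes that "Text Length" counts sentences and "Word Length" counts whitespace-separated words, which is consistent with the sample response in this README but is not confirmed by the source. The function names are hypothetical.

```python
import re
from collections import defaultdict


def text_stats(text):
    """Assumed definitions: sentences split on . ! ?, words split on whitespace."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {"Text Length": len(sentences), "Word Length": len(text.split())}


def average_by_sentiment(rows, field):
    """Group rows by their "Sentiment" label and average one numeric field."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[row["Sentiment"]].append(row[field])
    return {label: sum(vals) / len(vals) for label, vals in buckets.items()}


# Predictions as they might look after the inference step (illustrative).
labeled = [
    ("This app is really great!", "positive"),
    ("Slow and often crashes, so annoying", "negative"),
    ("The design is cool but sometimes it errors.", "negative"),
]
rows = [{"komentar": t, "Sentiment": s, **text_stats(t)} for t, s in labeled]

avg_words = average_by_sentiment(rows, "Word Length")
avg_sentences = average_by_sentiment(rows, "Text Length")
print(avg_words)
```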

🔌 Integration Guide (API Contract)

Live Base URL

https://silvio0-simple-sentiment-analyst.hf.space

1. Single Text Analysis (Real-time)

Analyze a single sentence instantly.

Endpoints:
- 🇬🇧 English: /predict-sentiment/en
- 🇮🇩 Indonesian: /predict-sentiment/id
Method: POST

Body (JSON):

{
  "text_input": "Pelayanan sangat memuaskan dan cepat!"
}

Response (JSON):

{
  "prediction": "positive",
  "confidence": 0.98
}
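
A client for this endpoint might look like the sketch below, which uses only the Python standard library. The function names `build_single_request` and `predict` are illustrative, and the timeout value is an assumption. Request construction is kept separate from the network call so it can be inspected without contacting the server.

```python
import json
import urllib.request

BASE_URL = "https://silvio0-simple-sentiment-analyst.hf.space"


def build_single_request(text, lang="id"):
    """Build the POST request for /predict-sentiment/{lang} without sending it."""
    if lang not in ("en", "id"):
        raise ValueError("lang must be 'en' or 'id'")
    body = json.dumps({"text_input": text}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/predict-sentiment/{lang}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(text, lang="id", timeout=30):
    """Send the request and return the decoded JSON response."""
    request = build_single_request(text, lang)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `predict("Pelayanan sangat memuaskan dan cepat!", lang="id")` should return a dictionary with the `prediction` and `confidence` keys shown above.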

2. Batch File Analysis (Deep Insight)

Upload a file to analyze thousands of rows at once.

Endpoints:
- 🇬🇧 English: /predict-table-sentiment/en
- 🇮🇩 Indonesian: /predict-table-sentiment/id
Method: POST
Body (Form-Data):
- file: (Binary) Required. Only accepts .csv or .xlsx. Must contain a column named 'komentar' (Indonesian for "comment").
- num: (Int) Number of keywords to extract. Default: 5 (Min: 1, Max: 10).
- ngram_min: (Int) Minimum phrase length in words. Default: 1 (Min: 1, Max: 3).
- ngram_max: (Int) Maximum phrase length in words. Default: 1 (Min: 1, Max: 3).
- sentiment: (String) Context filter for extraction. Options: "positive", "negative", "neutral".
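
The sketch below checks these parameters on the client side and encodes them as a multipart form body using only the standard library. The function names are hypothetical, the rule that `ngram_min` must not exceed `ngram_max` is an assumption, and sending all parameters as form fields follows the description above rather than a verified server signature.

```python
import uuid

SENTIMENTS = {"positive", "negative", "neutral"}


def check_batch_params(sentiment, num=5, ngram_min=1, ngram_max=1):
    """Validate the documented ranges before any bytes are uploaded."""
    if sentiment not in SENTIMENTS:
        raise ValueError("sentiment must be positive, negative, or neutral")
    if not 1 <= num <= 10:
        raise ValueError("num must be between 1 and 10")
    if not (1 <= ngram_min <= 3 and 1 <= ngram_max <= 3):
        raise ValueError("ngram_min and ngram_max must be between 1 and 3")
    if ngram_min > ngram_max:  # assumed constraint, not stated in the README
        raise ValueError("ngram_min must not exceed ngram_max")
    return {"num": num, "ngram_min": ngram_min,
            "ngram_max": ngram_max, "sentiment": sentiment}


def encode_multipart(fields, filename, file_bytes):
    """Return (body, content_type) for a multipart/form-data upload."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode("utf-8")
        )
    # The file part carries the raw bytes of the .csv or .xlsx upload.
    parts.append(
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n".encode("utf-8")
        + file_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


fields = check_batch_params("negative", num=3, ngram_min=1, ngram_max=2)
body, content_type = encode_multipart(fields, "data_review.csv", b"komentar\nbad service\n")
```

The resulting `body` and `content_type` can be sent as a POST to `/predict-table-sentiment/en` or `/predict-table-sentiment/id` with any HTTP client.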

Response (JSON):

{
  "status": "Success",
  "filename": "data_review.csv",
  "rows": 10,
  "data_preview": [
    {
      "komentar": "This app is really great!"
    },
    {
      "komentar": "Slow and often crashes, so annoying"
    }
  ],
  "predict_result": [
    {
      "komentar": "This app is really great!",
      "Sentiment": "positive",
      "Confidence": "98.8%",
      "Text Length": 1,
      "Word Length": 5
    },
    {
      "komentar": "Slow and often crashes, so annoying",
      "Sentiment": "negative",
      "Confidence": "93.6%",
      "Text Length": 1,
      "Word Length": 6
    },
    {
      "komentar": "The design is cool but sometimes it errors.",
      "Sentiment": "negative",
      "Confidence": "53.2%",
      "Text Length": 1,
      "Word Length": 8
    }
  ],
  "sentiment_count": [
    {
      "Sentiment": "negative",
      "count": 6
    },
    {
      "Sentiment": "positive",
      "count": 4
    }
  ],
  "top_keywords": [
    {
      "Word": "app",
      "Jumlah": 1
    },
    {
      "Word": "really",
      "Jumlah": 1
    },
    {
      "Word": "great",
      "Jumlah": 1
    },
    {
      "Word": "love",
      "Jumlah": 1
    },
    {
      "Word": "new",
      "Jumlah": 1
    }
  ],
  "text_length": [
    {
      "Sentiment": "negative",
      "Text Length": 1
    },
    {
      "Sentiment": "positive",
      "Text Length": 1
    }
  ],
  "word_length": [
    {
      "Sentiment": "positive",
      "Word Length": 6
    },
    {
      "Sentiment": "negative",
      "Word Length": 8
    }
  ]
}
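
Because `Confidence` is returned as a percentage string, a consumer will usually want to convert it before filtering. The following sketch works on a trimmed copy of the response above. The helper names and the 0.6 threshold are illustrative choices, not part of the API.

```python
def confidence_to_float(value):
    """Convert a percentage string such as "98.8%" to a fraction."""
    return float(value.strip().rstrip("%")) / 100


def low_confidence_rows(response, threshold=0.6):
    """Return predictions whose confidence falls below `threshold`."""
    return [
        row for row in response["predict_result"]
        if confidence_to_float(row["Confidence"]) < threshold
    ]


def sentiment_counts(response):
    """Flatten the sentiment_count list into a {label: count} mapping."""
    return {item["Sentiment"]: item["count"] for item in response["sentiment_count"]}


# Trimmed copy of the sample response shown above.
response = {
    "predict_result": [
        {"komentar": "This app is really great!",
         "Sentiment": "positive", "Confidence": "98.8%"},
        {"komentar": "Slow and often crashes, so annoying",
         "Sentiment": "negative", "Confidence": "93.6%"},
        {"komentar": "The design is cool but sometimes it errors.",
         "Sentiment": "negative", "Confidence": "53.2%"},
    ],
    "sentiment_count": [
        {"Sentiment": "negative", "count": 6},
        {"Sentiment": "positive", "count": 4},
    ],
}

uncertain = low_confidence_rows(response)
counts = sentiment_counts(response)
print(len(uncertain), counts)
```

Rows flagged this way, such as the mixed review about the design, are good candidates for manual review.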

📚 Interactive Documentation (Swagger UI)

You don't need to write code to try the API. Use the built-in Swagger UI:

Access Docs: https://silvio0-simple-sentiment-analyst.hf.space/docs
Select Endpoint: Choose between Single Text or Table Sentiment.
Upload/Type: Input your data directly in the browser.
Execute: See the full analysis JSON response immediately.

📦 Local Installation

Clone the Repository

```
git clone https://github.com/viochris/Insightify-Sentiment-API.git
cd Insightify-Sentiment-API
```

Install Dependencies
```
pip install -r requirements.txt
```
Run the Server
```
uvicorn main:app --reload
```
Output: Uvicorn running on http://127.0.0.1:8000

Author: Silvio Christian, Joe
"Turning raw text into actionable insights."
