A robust NLP microservice powering Insightify. Leverages FastAPI and Transformers (RoBERTa) to provide real-time English & Indonesian sentiment analysis, batch CSV/Excel processing, and N-Gram keyword extraction.

viochris/Insightify-Sentiment-API

🧠 Insightify API: Dual-Lingual Sentiment & N-Gram Engine


📌 Overview

Insightify Sentiment API is an NLP microservice for sentiment analysis of English and Indonesian text.

Beyond simple positive/negative labeling, it provides a batch processing pipeline that ingests entire Excel/CSV datasets, extracts key phrases using N-gram analysis, and computes text-length and word-count statistics per sentiment. Inference runs on RoBERTa-based Transformer models.

✨ Key Features

🌏 Dual-Core Intelligence (EN & ID)

Each request is routed to a language-specific model:

  • English Core: Powered by cardiffnlp/twitter-roberta-base for nuance detection in global text.
  • Indonesian Core: Powered by w11wo/indonesian-roberta-base to accurately process local slang and formal Bahasa Indonesia.
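The routing described above can be pictured as a lookup from language code to model ID. This is a minimal sketch, not the service's actual code: the names `MODEL_IDS` and `resolve_model_id` are hypothetical, and the model IDs are copied verbatim from the list above.

```python
# Hypothetical routing table: language code -> Hugging Face model ID.
# The IDs are copied from the README; the table and function names are illustrative.
MODEL_IDS = {
    "en": "cardiffnlp/twitter-roberta-base",
    "id": "w11wo/indonesian-roberta-base",
}


def resolve_model_id(lang: str) -> str:
    """Return the model ID for a language code, or raise ValueError."""
    key = lang.strip().lower()
    try:
        return MODEL_IDS[key]
    except KeyError:
        supported = ", ".join(sorted(MODEL_IDS))
        raise ValueError(
            f"Unsupported language {lang!r}; expected one of: {supported}"
        ) from None


print(resolve_model_id("ID"))
```

Normalizing the code before the lookup keeps `"ID"`, `"id"`, and `" id "` equivalent, and an unknown language fails early with a message listing the supported codes.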

📊 Batch Insight Extraction (The "Deep Dive")

Uploading a dataset produces row-level labels plus dataset-level insights:

  • Sentiment Classification: Labels every row with confidence scores.
  • Keyword Extraction: Uses N-Gram logic to find the most frequent phrases (e.g., "bad service", "pengiriman lambat") specific to a sentiment group.
  • Statistical Analysis: Calculates word count and sentence complexity averages per sentiment.
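To illustrate the keyword extraction step, here is a rough sketch of N-gram counting restricted to one sentiment group. It uses plain counting from the standard library rather than whatever vectorizer the service actually runs, and it does not remove stop words; `top_ngrams` and the sample rows are assumptions made for illustration.

```python
import re
from collections import Counter


def top_ngrams(texts, n_min=1, n_max=1, top_k=5):
    """Count word n-grams of length n_min..n_max and return the top_k most frequent."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"\w+", text.lower())
        for n in range(n_min, n_max + 1):
            # Slide a window of n tokens across the text.
            for i in range(len(tokens) - n + 1):
                counts[" ".join(tokens[i:i + n])] += 1
    return counts.most_common(top_k)


# Illustrative rows that already carry a sentiment label.
rows = [
    {"komentar": "Bad service and slow delivery", "Sentiment": "negative"},
    {"komentar": "Bad service again", "Sentiment": "negative"},
    {"komentar": "Great app", "Sentiment": "positive"},
]
negative = [r["komentar"] for r in rows if r["Sentiment"] == "negative"]
print(top_ngrams(negative, n_min=2, n_max=2, top_k=1))  # [('bad service', 2)]
```

Filtering by sentiment before counting is what makes the resulting phrases specific to that group.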

🛡️ Defensive Architecture

  • Lazy Loading: Models are loaded into memory only upon the first request to optimize resource usage.
  • Strict Validation: Prevents crashes by validating file types (.csv, .xlsx), encoding (UTF-8/Latin1), and empty inputs before processing.
  • Smart Error Handling: Distinguishes client errors (HTTP 400) from server errors (HTTP 500) so developers can tell whether the request or the service was at fault.
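The three safeguards above could look roughly like the sketch below. It is an assumption-laden illustration, not the project's code: `ClientError`, `validate_upload`, `decode_text`, and `get_model` are hypothetical names, and the model loader is a placeholder dictionary standing in for an expensive Transformers load.

```python
from functools import lru_cache
from pathlib import Path

ALLOWED_EXTENSIONS = {".csv", ".xlsx"}


class ClientError(Exception):
    """Hypothetical error type that an API layer would map to HTTP 400."""


def validate_upload(filename: str, payload: bytes) -> None:
    """Reject unsupported file types and empty uploads before any processing."""
    if Path(filename).suffix.lower() not in ALLOWED_EXTENSIONS:
        raise ClientError("Only .csv or .xlsx files are accepted")
    if not payload:
        raise ClientError("Uploaded file is empty")


def decode_text(payload: bytes) -> str:
    """Try UTF-8 first, then fall back to Latin-1 (which accepts any byte)."""
    try:
        return payload.decode("utf-8")
    except UnicodeDecodeError:
        return payload.decode("latin-1")


@lru_cache(maxsize=None)
def get_model(lang: str):
    """Build the model once per language on first use; later calls hit the cache."""
    # Placeholder for an expensive load such as a Transformers pipeline.
    return {"lang": lang, "loaded": True}
```

With `lru_cache`, nothing is loaded at import time, and each language's model is built at most once per process.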

🛠️ Tech Stack

  • Framework: FastAPI (Asynchronous)
  • NLP Core: Hugging Face Transformers (PyTorch)
  • Data Processing: Pandas & NumPy
  • Deployment: Hugging Face Spaces (Dockerized)

🚀 The Processing Pipeline

  1. Routing: The request hits the language-specific endpoint (EN or ID).
  2. Ingestion:
    • Single Mode: Validates JSON string.
    • Batch Mode: Reads binary stream -> Converts to DataFrame -> Checks for "komentar" column.
  3. Inference: Text is passed through the tokenizer and RoBERTa model.
  4. Extraction (Batch Only):
    • Aggregates data by sentiment.
    • Runs N-Gram vectorization to find top keywords.
  5. Response: Returns a structured JSON containing predictions, statistics, and preview data.
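The batch path of this pipeline can be sketched end to end with the standard library. This is only an illustration under stated assumptions: the real service converts uploads to a Pandas DataFrame and runs a RoBERTa model, whereas this sketch uses `csv` and a keyword-based stand-in named `fake_classifier`. All function names here are hypothetical.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

REQUIRED_COLUMN = "komentar"


def read_rows(payload: bytes) -> list[dict]:
    """Parse CSV bytes into row dicts and check for the required column."""
    stream = io.TextIOWrapper(io.BytesIO(payload), encoding="utf-8", newline="")
    reader = csv.DictReader(stream)
    if REQUIRED_COLUMN not in (reader.fieldnames or []):
        raise ValueError(f"Missing required column: {REQUIRED_COLUMN!r}")
    return list(reader)


def fake_classifier(text: str) -> str:
    """Stand-in for the tokenizer + RoBERTa inference step; NOT a real model."""
    return "negative" if "slow" in text.lower() else "positive"


def summarize(rows: list[dict]) -> dict:
    """Label each row, then aggregate counts and mean word counts by sentiment."""
    word_counts = defaultdict(list)
    for row in rows:
        text = row[REQUIRED_COLUMN]
        word_counts[fake_classifier(text)].append(len(text.split()))
    return {
        label: {"count": len(lengths), "avg_words": mean(lengths)}
        for label, lengths in word_counts.items()
    }


payload = b"komentar\nThis app is really great!\nSlow and often crashes\n"
print(summarize(read_rows(payload)))
```

Checking for the column before inference means a malformed file is rejected without spending any model time on it.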

🔌 Integration Guide (API Contract)

Live Base URL

https://silvio0-simple-sentiment-analyst.hf.space

1. Single Text Analysis (Real-time)

Analyze a single sentence instantly.

  • Endpoints:
    • 🇬🇧 English: /predict-sentiment/en
    • 🇮🇩 Indonesian: /predict-sentiment/id
  • Method: POST
  • Body (JSON):
    {
      "text_input": "Pelayanan sangat memuaskan dan cepat!"
    }
  • Response (JSON):
    {
      "prediction": "positive",
      "confidence": 0.98
    }
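A client call for this endpoint might look like the following sketch, which uses only the standard library. The base URL, path, and `text_input` field come from the contract above; the helper names `build_request` and `predict` and the 30-second timeout are assumptions.

```python
import json
import urllib.request

BASE_URL = "https://silvio0-simple-sentiment-analyst.hf.space"


def build_request(text: str, lang: str = "id") -> urllib.request.Request:
    """Build a POST request for /predict-sentiment/{lang} with a JSON body."""
    body = json.dumps({"text_input": text}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/predict-sentiment/{lang}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(text: str, lang: str = "id", timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response (performs a network call)."""
    with urllib.request.urlopen(build_request(text, lang), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example (requires network access and a running service):
# result = predict("Pelayanan sangat memuaskan dan cepat!")
# print(result["prediction"], result["confidence"])
```

Separating request construction from sending makes the payload easy to inspect without touching the network.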

2. Batch File Analysis (Deep Insight)

Upload a file to analyze thousands of rows at once.

  • Endpoints:
    • 🇬🇧 English: /predict-table-sentiment/en
    • 🇮🇩 Indonesian: /predict-table-sentiment/id
  • Method: POST
  • Body (Form-Data):
    • file: (Binary) Required. Only accepts .csv or .xlsx. Must contain a column named 'komentar'.
    • num: (Int) Number of keywords to extract. Default: 5 (Min: 1, Max: 10).
    • ngram_min: (Int) Min word phrase length. Default: 1 (Min: 1, Max: 3).
    • ngram_max: (Int) Max word phrase length. Default: 1 (Min: 1, Max: 3).
    • sentiment: (String) Context filter for extraction. Options: "positive", "negative", "neutral".
  • Response (JSON):
    {
      "status": "Success",
      "filename": "data_review.csv",
      "rows": 10,
      "data_preview": [
        {
          "komentar": "This app is really great!"
        },
        {
          "komentar": "Slow and often crashes, so annoying"
        }
      ],
      "predict_result": [
        {
          "komentar": "This app is really great!",
          "Sentiment": "positive",
          "Confidence": "98.8%",
          "Text Length": 1,
          "Word Length": 5
        },
        {
          "komentar": "Slow and often crashes, so annoying",
          "Sentiment": "negative",
          "Confidence": "93.6%",
          "Text Length": 1,
          "Word Length": 6
        },
        {
          "komentar": "The design is cool but sometimes it errors.",
          "Sentiment": "negative",
          "Confidence": "53.2%",
          "Text Length": 1,
          "Word Length": 8
        }
      ],
      "sentiment_count": [
        {
          "Sentiment": "negative",
          "count": 6
        },
        {
          "Sentiment": "positive",
          "count": 4
        }
      ],
      "top_keywords": [
        {
          "Word": "app",
          "Jumlah": 1
        },
        {
          "Word": "really",
          "Jumlah": 1
        },
        {
          "Word": "great",
          "Jumlah": 1
        },
        {
          "Word": "love",
          "Jumlah": 1
        },
        {
          "Word": "new",
          "Jumlah": 1
        }
      ],
      "text_length": [
        {
          "Sentiment": "negative",
          "Text Length": 1
        },
        {
          "Sentiment": "positive",
          "Text Length": 1
        }
      ],
      "word_length": [
        {
          "Sentiment": "positive",
          "Word Length": 6
        },
        {
          "Sentiment": "negative",
          "Word Length": 8
        }
      ]
    }
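Because `Confidence` arrives as a percentage string in the sample above, a client will probably want to convert it before comparing values. The sketch below flags low-confidence rows for manual review; the helper names and the 0.6 threshold are assumptions chosen for illustration, not part of the API.

```python
def parse_confidence(value: str) -> float:
    """Convert a percentage string such as '98.8%' to a fraction (0.988)."""
    return float(value.rstrip("%")) / 100.0


def low_confidence_rows(response: dict, threshold: float = 0.6) -> list[dict]:
    """Return rows from 'predict_result' whose confidence is below the threshold."""
    return [
        row
        for row in response.get("predict_result", [])
        if parse_confidence(row["Confidence"]) < threshold
    ]


# Trimmed copy of the sample response shown above.
sample = {
    "predict_result": [
        {
            "komentar": "This app is really great!",
            "Sentiment": "positive",
            "Confidence": "98.8%",
        },
        {
            "komentar": "The design is cool but sometimes it errors.",
            "Sentiment": "negative",
            "Confidence": "53.2%",
        },
    ]
}
for row in low_confidence_rows(sample):
    print(row["komentar"], row["Confidence"])
```

In the sample, the mixed review ("cool but sometimes it errors") is the one with the weakest confidence, which is the kind of row this filter surfaces.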

📚 Interactive Documentation (Swagger UI)

You can test the API without writing any code by using the built-in Swagger UI:

  1. Access Docs: https://silvio0-simple-sentiment-analyst.hf.space/docs
  2. Select Endpoint: Choose between Single Text or Table Sentiment.
  3. Upload/Type: Input your data directly in the browser.
  4. Execute: See the full analysis JSON response immediately.

📦 Local Installation

  1. Clone the Repository

    git clone https://github.com/viochris/Insightify-Sentiment-API.git
    cd Insightify-Sentiment-API
  2. Install Dependencies

    pip install -r requirements.txt
  3. Run the Server

    uvicorn main:app --reload

    Output: Uvicorn running on http://127.0.0.1:8000


Author: Silvio Christian, Joe

"Turning raw text into actionable insights."
