News-Articles-Sorting-using-MLOps - Ongoing

Overview

The News-Articles-Sorting project is a machine learning-based system that aims to automatically categorize and sort news articles into predefined categories. The project leverages natural language processing (NLP) techniques and supervised learning algorithms to achieve accurate classification of news articles.

This repository contains all the necessary code and resources to train and deploy the News-Articles-Sorting model. The project utilizes a labeled dataset of news articles, where each article is assigned to a specific category. By training a machine learning model on this dataset, the system can then classify unseen news articles into the appropriate categories.

Problem Statement

In today’s world, data is power. News companies hold terabytes of data on their servers, and every organization is looking for insights that add value. Among the many cases where analytics is used to drive action, news article classification stands out.

Many online sources publish large volumes of news every day, and user demand for information keeps growing. Classifying the news is therefore essential for letting users reach the information they care about quickly and effectively. A machine learning model for automated news classification could be used to identify the topics of untracked news and to make individual suggestions based on a user’s prior interests.

Approach

Techniques such as clustering and association rule-based algorithms can be applied to group similar texts together. Supervised ML algorithms instead learn the mapping between the text and its tags from already categorized data. Algorithms such as SVM, neural networks, and Random Forest are commonly used for text classification.
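
As a rough illustration of learning a mapping from text to tags, here is a minimal sketch of a bag-of-words multinomial Naive Bayes classifier written with the Python standard library only. Naive Bayes is a stand-in chosen to keep the example dependency-free; it is not one of the algorithms listed above and is not necessarily what this project trains. The class name `NaiveBayesTextClassifier`, the `tokenize` helper, and the toy sentences are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTextClassifier:
    """Hypothetical multinomial Naive Bayes with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per category
        self.word_counts = defaultdict(Counter)  # word counts per category
        self.vocabulary = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, text):
        # Words never seen during training carry no information, so skip them.
        tokens = [t for t in tokenize(text) if t in self.vocabulary]
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(doc_count / total_docs)
            for token in tokens:
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, made-up training sentences (not taken from the BBC dataset).
train_texts = [
    "the team won the match after a late goal",
    "shares fell as the bank reported lower profits",
    "the new phone has a faster processor and camera",
    "the striker scored twice in the cup final",
    "the company announced a merger and higher profits",
    "software update improves the phone battery life",
]
train_labels = ["sport", "business", "tech", "sport", "business", "tech"]

model = NaiveBayesTextClassifier().fit(train_texts, train_labels)
print(model.predict("the bank announced record profits"))  # business
```

A real pipeline would typically replace the raw counts with TF-IDF features and swap in one of the classifiers named above.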

Website

File structure

.
├── app_exception           # Custom exception
├── application_logging     # logging
├── data_given              # Given Data
├── data                    # raw / processed / transformed data
├── saved_models            # classification model
├── report                  # model parameter and pipeline reports.
├── src                     # Source files for project implementation
├── webapp                  # ml web application
├── dvc.yaml                # data version control pipeline.
├── app.py                  # Flask backend
├── param.yaml              # parameters
├── requirements.txt
└── README.md

Dataset

BBC News

Dataset Description

  • BBC News Train.csv - the training set of 1490 records
  • BBC News Test.csv - the test set of 736 records

Data fields

  • ArticleId - unique ID number assigned to the record
  • Article - text of the header and article
  • Category - category of the article (tech, business, sport, entertainment, politics)
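
To make the data fields concrete, the sketch below reads rows with the three columns listed above and counts articles per category, using only the standard library. The helper names `load_articles` and `category_distribution` are hypothetical, and the sample rows are made up so the example runs without the real files. It assumes the CSV header uses exactly the field names given above and that a Category column is present.

```python
import csv
import io
from collections import Counter


def load_articles(csv_file):
    """Read rows with ArticleId, Article and Category columns into dicts."""
    reader = csv.DictReader(csv_file)
    return [
        {
            "ArticleId": int(row["ArticleId"]),
            "Article": row["Article"],
            "Category": row["Category"],
        }
        for row in reader
    ]


def category_distribution(articles):
    """Count how many articles fall into each category."""
    return Counter(article["Category"] for article in articles)


# Made-up sample rows standing in for the real training file.
sample_csv = io.StringIO(
    "ArticleId,Article,Category\n"
    '1,"Shares rise, profits up",business\n'
    "2,Team wins the cup final,sport\n"
    "3,New phone released today,tech\n"
    "4,Bank reports quarterly loss,business\n"
)

articles = load_articles(sample_csv)
print(category_distribution(articles))
```

To run this against the real training set, pass an open file handle instead of `sample_csv`, for example `open("BBC News Train.csv", newline="", encoding="utf-8")`. The file's location inside the repository is an assumption here.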

Features

  • Automatic classification of news articles into predefined categories.
  • Training and evaluation of machine learning models for article categorization.
  • Web-based interface for interacting with the system and classifying news articles.
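
For the evaluation feature, a minimal sketch of two common metrics, overall accuracy and per-category recall, could look like the following. The function names and the label lists are hypothetical, and the predictions are invented for illustration rather than results from this project.

```python
from collections import Counter


def accuracy(true_labels, predicted_labels):
    """Fraction of articles whose predicted category matches the true one."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def per_category_recall(true_labels, predicted_labels):
    """For each true category, the share of its articles predicted correctly."""
    totals = Counter(true_labels)
    hits = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {category: hits[category] / totals[category] for category in totals}


# Invented labels for illustration only.
true_labels = ["sport", "tech", "business", "sport", "politics"]
predicted = ["sport", "tech", "sport", "sport", "politics"]

print(accuracy(true_labels, predicted))             # 0.8
print(per_category_recall(true_labels, predicted))
```

Per-category recall is worth reporting alongside accuracy because a model can score well overall while consistently missing one category, as with "business" in this toy example.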

Experiments

Contributing

Contributions to the News-Articles-Sorting project are welcome! If you would like to contribute, please follow these steps:

  1. Fork the repository.
  2. Create a new branch for your feature or bug fix.
  3. Make your changes and commit them with descriptive commit messages.
  4. Push your changes to your forked repository.
  5. Submit a pull request explaining your changes.

License

This project is licensed under the MIT License.

Contact

If you have any questions or suggestions regarding the project, please feel free to contact the project maintainer at Gmail