data-cleaning

Implements the DMI imputation algorithm for imputing missing values in a dataset, as described in Rahman, M. G., and Islam, M. Z. (2013): "Missing Value Imputation Using Decision Trees and Decision Forests by Splitting and Merging Records: Two Novel Techniques".

java data data-mining analysis mining weka imputation data-analysis preprocessing data-cleaning datamining data-cleansing missing-values missing-value-imputation

Updated Aug 22, 2020
Java

grahman20 / CAIRAD

The CAIRAD class implements the CAIRAD technique for detecting noisy values in a dataset, for use with Weka.

java data-science machine-learning data-mining algorithm analytics artificial-intelligence data-analysis preprocessing data-cleaning noise-cancellation corrupt-data noisy-value-detection corrupt-data-detection

Updated Mar 24, 2023
Java

nadment / pdi-cleanse-plugin

Extension step for Pentaho Data Integration

etl pentaho pdi kettle data-integration data-cleaning

Updated Aug 8, 2019
Java

P0bbn / Philly-Phelons

A program that analyzes crime and sports data in Philadelphia to determine whether a Philly team's win or loss against a given opponent correlates with the city's violent crime rate. Conclusion: if the Flyers lose to the Red Wings, stay out of Center City.

java gui libraries data-analysis data-cleaning csv-parsing performance-optimization narrative-design large-datasets

Updated May 19, 2022
Java

P0bbn / Flight-Data-Analysis

A simple Java program that performs quantitative data analysis on a very large dataset of air traffic information.

java libraries data-analysis data-cleaning large-dataset

Updated Mar 2, 2023
Java

grahman20 / LFD

LFD is a data-driven discretization technique that does not require any user input. LFD uses low frequency values as cut points and thus reduces the information loss due to discretization. It uses all other categorical attributes and any numerical attribute that has already been categorized.

java data-science machine-learning data-mining analytics attributes classification data-analysis preprocessing features variables discretization data-cleaning numerical categorization discretization-algorithm

Updated Mar 25, 2023
Java

mwilchek / Java-Programs-2017

This repo is for new Java programs I make in 2017.

java machine-learning data-cleaning

Updated May 24, 2017
Java

grahman20 / SiMI

SiMI imputes numerical and categorical missing values by making an educated guess based on records that are similar to the record with the missing value. The missing values are then imputed using these similarities and correlations. To achieve higher-quality imputation, some segments are merged together using a novel approach.

data-science linear-regression dataset missing-data preprocessing data-cleaning decision-tree decision-tree-classifier missing-values decision-forest decision-forest-algorithm missing-value-handling missing-data-imputation missing-value-imputation numerical-missing-value categorical-missing-value

Updated Mar 24, 2023
Java

JM-Lab / jm-metric

Preparing Data for Analytics

analytics reactive-streams data-cleaning

Updated Jul 30, 2022
Java

brandonjblank / JavaCSVFinancialParser

A program that parses CSV bank statement files for data, removing non-vendor details from transaction description fields.

java finance data csv javafx data-analysis data-cleaning csv-parser transaction-descriptions

Updated Nov 1, 2019
Java

grahman20 / DMI

The DMI class implements the DMI imputation algorithm for imputing missing values in a dataset, as described in Rahman, M. G., and Islam, M. Z. (2013): "Missing Value Imputation Using Decision Trees and Decision Forests by Splitting and Merging Records: Two Novel Techniques".

java data-science data data-mining analysis linear-regression weka imputation missing-data preprocessing missing expectation-maximization-algorithm data-cleaning decision-tree imputation-algorithm missing-value-treatment missing-value-handling missing-value-imputation

Updated Mar 24, 2023
Java

olusegunajibola / open-data-web-services

Open Data and Web Services

java data data-mining open-data data-cleaning elipse

Updated Jun 27, 2022
Java

columbustech / value_normalizer

Value Normalizer is a microservice that normalizes the values in a column of a CSV file. It lets you upload a CSV file to the server, select a column, and then normalize it based on user feedback. This repository contains both the backend code (normalizer) and the UI code (normalizer-ui), which can be hosted together or separatel…

normalizer data-wrangling data-normalization data-cleaning

Updated Jan 5, 2023
Java

Here are 17 public repositories matching this topic...

bakdata / dedupe

ah89 / ApproxDC

malipramod / datapreprocessingsystem

manasiladdha / MachineLearningOnYelpDataset

zislam / DMI

grahman20 / CAIRAD

nadment / pdi-cleanse-plugin

P0bbn / Philly-Phelons

P0bbn / Flight-Data-Analysis

grahman20 / LFD

mwilchek / Java-Programs-2017

grahman20 / SiMI

JM-Lab / jm-metric

brandonjblank / JavaCSVFinancialParser

grahman20 / DMI

olusegunajibola / open-data-web-services

columbustech / value_normalizer
