Java DSL for (online) deduplication
Updated Feb 27, 2024 - Java
Data preprocessing (e.g. filling in missing values, normalization, etc.) in the field of data mining (knowledge discovery).
Predict the degree of likeness for new Yelp businesses using machine learning
Implements the DMI imputation algorithm for imputing missing values in a dataset from Rahman, M. G., and Islam, M. Z. (2013): Missing Value Imputation Using Decision Trees and Decision Forests by Splitting and Merging Records: Two Novel Techniques
CAIRAD class implements the CAIRAD technique for detecting noisy values in a dataset for Weka
Extension step for Pentaho Data Integration
A program to analyze crime and sports data in Philadelphia to determine whether a Philly team's win or loss against a given team correlates with the city's violent crime rate. Conclusion: if the Flyers lose to the Red Wings, stay out of Center City.
A simple Java program to perform quantitative data analysis on a very large dataset of air traffic information
LFD is a data-driven discretization technique that does not require any user input. LFD uses low frequency values as cut points and thus reduces the information loss due to discretization. It uses all other categorical attributes and any numerical attribute that has already been categorized.
This repo is for new Java programs I make in 2017.
SiMI imputes numerical and categorical missing values by making an educated guess based on records that are similar to the record with the missing value. The missing values are then imputed using these similarities and correlations. To achieve a higher quality of imputation, some segments are merged together using a novel approach.
A program that parses CSV bank statement files for data, removing non-vendor details from transaction description fields.
DMI Class implements the DMI imputation algorithm for imputing missing values in a dataset from Rahman, M. G., and Islam, M. Z. (2013): Missing Value Imputation Using Decision Trees and Decision Forests by Splitting and Merging Records: Two Novel Techniques
Open Data and Web Services
Value Normalizer is a microservice that normalizes the values in a column of a CSV file. The tool lets you upload a CSV file to the server, select a column, and then normalize it based on user feedback. This repository contains both the backend code (normalizer) and the UI code (normalizer-ui), which can be hosted together or separatel…