Data_Cleaning__financial_systems

Data de-duplication and record linkage in financial systems

Record matching is an important process for data integration, reconciliation, and data cleaning. De-duplication is the task of identifying records, within one or multiple databases, that refer to the same entity. Duplicate records often do not share a common key and contain erroneous data, which makes record matching a demanding task.

The objective of this project is:

• Develop a record matching and data de-duplication technique for financial record systems using a cocktail approach.

Today, large collections of financial records are stored in databases that may contain multiple records referring to the same subject. Full information about an entity can be built by combining all the records that refer to it. Simple string matching is not a feasible way to detect duplicate records because of inconsistencies such as data entry errors, typographical errors, data in different formats, and missing data.

Record linkage algorithms fall into two broad categories: rule-based (heuristic) approaches and probabilistic approaches. In this project we use a cocktail algorithm, that is, we combine rule-based and probabilistic algorithms to get the best F-score and recall. Rule-based approaches depend critically on domain knowledge, and manually created rules often lead to issues, so the EM (Expectation Maximization) algorithm is used to generate rules from the data itself. This combined model gives the best results.
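To make the idea of learning matching rules from the data more concrete, here is a minimal sketch of one common way EM is applied to record linkage: estimating per-field agreement probabilities for matches and non-matches under a Fellegi-Sunter style model with conditionally independent fields. That model choice is an assumption; the description above only says that EM is used. The function names (`agreement_vector`, `fellegi_sunter_em`, `match_weight`), the similarity threshold, and the starting values are hypothetical and are not taken from the repository's scripts.

```python
import math
from difflib import SequenceMatcher


def agreement_vector(a, b, fields, threshold=0.85):
    """Return a binary agreement pattern for two records (dicts).

    A field agrees (1) when both values are present and their string
    similarity reaches `threshold`. Treating missing values as
    disagreement is an assumption of this sketch.
    """
    vec = []
    for f in fields:
        x = str(a.get(f, "")).strip().lower()
        y = str(b.get(f, "")).strip().lower()
        if not x or not y:
            vec.append(0)
        else:
            vec.append(1 if SequenceMatcher(None, x, y).ratio() >= threshold else 0)
    return tuple(vec)


def _clamp(x, eps=1e-6):
    """Keep probabilities strictly inside (0, 1) to avoid log(0) and 0/0."""
    return min(max(x, eps), 1.0 - eps)


def fellegi_sunter_em(vectors, n_iter=50, p=0.1):
    """Estimate m (agreement prob. among matches), u (among non-matches)
    and p (share of matching pairs) from unlabeled agreement vectors."""
    k = len(vectors[0])
    n = len(vectors)
    m = [0.9] * k  # hypothetical starting values
    u = [0.1] * k
    for _ in range(n_iter):
        # E-step: posterior probability that each pair is a match.
        g = []
        for v in vectors:
            pm, pu = p, 1.0 - p
            for i, agree in enumerate(v):
                pm *= m[i] if agree else 1.0 - m[i]
                pu *= u[i] if agree else 1.0 - u[i]
            g.append(pm / (pm + pu))
        # M-step: re-estimate parameters from the soft assignments.
        total = max(sum(g), 1e-12)
        rest = max(n - sum(g), 1e-12)
        p = _clamp(total / n)
        for i in range(k):
            m[i] = _clamp(sum(gj * v[i] for gj, v in zip(g, vectors)) / total)
            u[i] = _clamp(sum((1.0 - gj) * v[i] for gj, v in zip(g, vectors)) / rest)
    return m, u, p


def match_weight(v, m, u):
    """Sum of log2 likelihood ratios; higher means more likely a match."""
    w = 0.0
    for i, agree in enumerate(v):
        w += math.log2(m[i] / u[i]) if agree else math.log2((1.0 - m[i]) / (1.0 - u[i]))
    return w
```

In a pipeline of this kind, the learned weights would play the role of data-driven rules: candidate pairs whose `match_weight` exceeds a chosen cutoff are treated as duplicates, and the cutoff can be tuned against labeled pairs to trade off recall and F-score.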

P1_mod1.py
README.md
SN_slidingWindow.py
computemi.py
createBlocks.py
matching_rule_vector.py
record-pair_matching.py

CodeHuman96/Data_Cleaning__financial_systems
