Deduplication Library

A free Python library for accurate and scalable deduplication and entity resolution.

Based on Mikhail Yuryevich Bilenko’s Ph.D. dissertation: Learnable Similarity Functions and their Application to Record Linkage and Clustering

Current solutions break easily, don’t scale, and require significant developer time. Our solution is robust, can handle a large volume of data, and can be trained by anyone.

Python Dependencies

numpy (numpy.scipy.org/)

Team

Forest Gregg
Derek Eder derek.eder@opencityapps.org

Usage

> python setup.py build_ext --inplace
> python dedupe.py
(use 'y', 'n' and 'u' keys to flag duplicates for active learning)
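
The prompt above collects one response per candidate pair. As a rough illustration of how such responses could become training labels, here is a minimal sketch. It assumes 'y' marks a duplicate, 'n' marks a distinct pair, and 'u' means unsure and is skipped. The `collect_labels` helper and its data layout are hypothetical and are not taken from dedupe.py.

```python
def collect_labels(pairs, responses):
    """Pair each candidate record pair with a user response.

    Assumed meaning of the keys (not confirmed by the library):
    'y' -> duplicate, 'n' -> distinct, 'u' -> unsure (skipped).
    Hypothetical helper for illustration; not part of dedupe.py.
    """
    labeled = {"duplicate": [], "distinct": []}
    for pair, answer in zip(pairs, responses):
        answer = answer.strip().lower()
        if answer == "y":
            labeled["duplicate"].append(pair)
        elif answer == "n":
            labeled["distinct"].append(pair)
        elif answer == "u":
            continue  # unsure: contributes no training example
        else:
            raise ValueError("expected 'y', 'n' or 'u', got %r" % answer)
    return labeled
```

Taking the responses as an iterable rather than reading the keyboard directly keeps the sketch testable without user input.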

Other Executable Modules

blocking.py - loads in test data and finds optimum blocking predicates
canonical_example.py - loads the canonical restaurant test data, trains on the provided known duplicates, and outputs precision and recall values
predicates.py - tests the functionality of defined predicates
training_sample.py - tests active learning with user input
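
For readers unfamiliar with the precision and recall values mentioned for canonical_example.py, the sketch below shows the standard definitions applied to duplicate pairs. The `precision_recall` function and its arguments are hypothetical names chosen for illustration. The script's own implementation may differ.

```python
def precision_recall(found_pairs, true_pairs):
    """Return (precision, recall) for predicted duplicate pairs.

    Pairs are treated as unordered, so (1, 2) and (2, 1) match.
    Hypothetical helper; not the code used by canonical_example.py.
    """
    found = {frozenset(p) for p in found_pairs}
    true = {frozenset(p) for p in true_pairs}
    true_positives = len(found & true)
    # precision: share of predicted pairs that are real duplicates
    precision = true_positives / len(found) if found else 0.0
    # recall: share of real duplicates that were predicted
    recall = true_positives / len(true) if true else 0.0
    return precision, recall
```

Precision falls when distinct records are merged, and recall falls when true duplicates are missed, so the two values are best read together.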

Errors / Bugs

If something is not behaving intuitively, it is a bug, and should be reported. Report it here: github.com/open-city/dedupe/issues

Note on Patches/Pull Requests

Fork the project.
Make your feature addition or bug fix.
Send us a pull request. Bonus points for topic branches.

Copyright

See LICENSE for details: github.com/open-city/dedupe/wiki/License
