Merge pull request #147 from vmarkovtsev/master
Add the missing information about the duplicates dataset
vmarkovtsev authored Jul 31, 2019
2 parents b200f76 + 416a059 commit c7e9111
Showing 1 changed file with 13 additions and 0 deletions.
13 changes: 13 additions & 0 deletions Duplicates/README.md
@@ -42,3 +42,16 @@ print(len(ds.assignments))
print(len(ds.pairs))
```

### Origin

The file selection procedure was developed in the included [notebooks](notebooks).

### Limitations

The labeling was done by ~4 active human reviewers who worked at the same
company and talked to each other, so the labels may be biased.
Code duplication is subjective in any case.

### License

Code: MIT. Labels: Open Data Commons Open Database License (ODbL). Actual file contents © their authors.
