We are thrilled to present a new approach to analyzing Big Data by touching the smallest possible amount of data: Sommelier Sampling.
With the rise of Big Data and analytics, the demand for optimizing how data is handled grows every day. Executing complex analytical queries on ever-growing datasets costs time and money; data access keeps getting more expensive and runs into memory walls, which makes sampling techniques an attractive solution.
Sampling is a powerful but often feared technique for approximating query answers. The main difficulty on large data platforms is obtaining sizable savings while keeping the effect on answer quality small, and estimating the resulting error remains challenging.
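As a minimal, hypothetical sketch of what error estimation looks like (the dataset, sample size, and variable names below are illustrative assumptions, not part of this project's API), here is how a `SUM` over a uniform random sample can be scaled up and accompanied by a confidence interval:

```python
import numpy as np

# Illustrative only: estimate SUM(x) from a 1% uniform sample instead of
# scanning the full column, and attach a rough 95% confidence interval.
rng = np.random.default_rng(42)
population = rng.normal(loc=100.0, scale=15.0, size=1_000_000)  # assumed data

n = 10_000                                    # sample size (1% of the rows)
sample = rng.choice(population, size=n, replace=False)

# Scale the sample sum up by the inverse sampling fraction.
scale = population.size / n
sum_estimate = sample.sum() * scale

# Standard error of the scaled-up sum under simple random sampling
# (finite-population correction omitted for brevity).
std_error = population.size * sample.std(ddof=1) / np.sqrt(n)

print(f"true sum      : {population.sum():,.0f}")
print(f"estimated sum : {sum_estimate:,.0f} ± {1.96 * std_error:,.0f} (95% CI)")
```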
Several studies have demonstrated its benefits and proposed rules and formulas for estimating the error. Building on that work, we propose to study those techniques and implement an open-source version of them.
In the spec.ipynb notebook we present a few examples to illustrate the different types of samples that exist and when each should be used. We create synthetic datasets, both normally distributed and skewed, and show how their shape affects query results when sampling techniques are used, and why the cardinality of the data is key. A small sketch of this comparison is shown below.
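The following is a hedged, self-contained sketch of that comparison, not the notebook's actual code; the distributions, sizes, and names are assumptions chosen only to show how the same uniform sample behaves very differently on normal versus skewed data:

```python
import numpy as np

# Illustrative comparison: the same 1% uniform sample yields a much less
# reliable SUM estimate on heavily skewed data than on normal data.
rng = np.random.default_rng(7)
N, n = 1_000_000, 10_000

datasets = {
    "normal": rng.normal(loc=100.0, scale=15.0, size=N),
    "skewed": rng.lognormal(mean=3.0, sigma=2.0, size=N),  # heavy right tail
}

for name, column in datasets.items():
    sample = rng.choice(column, size=n, replace=False)
    estimate = sample.sum() * (N / n)           # scale up by sampling fraction
    rel_error = abs(estimate - column.sum()) / column.sum()
    print(f"{name:>6}: relative error of sampled SUM = {rel_error:.2%}")
```

With skewed data, a handful of extreme values dominate the true sum, so whether they land in the sample or not swings the estimate widely; this is the kind of effect the notebook explores.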
We look forward to your feedback! If you have any questions or suggestions about the project structure, let us know: the issues are open.
And if you want to contribute, don't hesitate to contact us!