Stars
VADER Sentiment Analysis. VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social …
FeatHub - A stream-batch unified feature store for real-time machine learning
The Open Source Feature Store for Machine Learning
Open source platform for the machine learning lifecycle
by ex-googlers, for ex-googlers - a lookup table of similar tech & services
Flink CDC is a streaming data integration tool
Interesting readings and talks on computer science
The Cloud Operational Data Store: use SQL to transform, deliver, and act on fast-changing data.
Code Samples for my Ververica Webinar "99 Ways to Enrich Streaming Data with Apache Flink"
Upserts, deletes, and incremental processing on big data.
The official home of the Presto distributed SQL query engine for big data
Docker containers for testing in Scala
Apache Pinot - A realtime distributed OLAP datastore
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
Notes on the book Clean Code - A Handbook of Agile Software Craftsmanship by Robert C. Martin
A framework for rapid reporting API development, with out-of-the-box support for high-cardinality dimension lookups with Druid.
DataStax Connector for Apache Spark to Apache Cassandra
Apache Druid: a high performance real-time analytics database.
Dr. Elephant is a job and flow-level performance monitoring and tuning tool for Apache Hadoop and Apache Spark
A very simple sample Akka HTTP RESTful service