Releases: bullet-db/bullet-core
Rate limiting for BufferingSubscriber, New functions: EXPLODE, LATERAL VIEW, NOT RLIKE, NOT RLIKE ANY, TRIM, ABS, BETWEEN, NOT BETWEEN, SUBSTRING, UNIXTIMESTAMP
This release adds several new functions supported through BQL:
- EXPLODE - for exploding maps and lists into their constituents, either in the SELECT or with a LATERAL VIEW
- LATERAL VIEW - to be used with EXPLODE to generate, for every row in a result, the cross product of that row with its exploded field
- NOT RLIKE, NOT RLIKE ANY - the NOT versions of the supported RLIKE and RLIKE ANY
- BETWEEN and NOT BETWEEN - two functions to check if a numeric or string typed value is between two other values or fields
- SUBSTRING - to get parts of a String, with support for negative indexing
- UNIXTIMESTAMP - to get the current UTC unix timestamp, or to convert a given date argument or field into a unix timestamp, with an optional pattern for parsing the date
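To illustrate what EXPLODE with a LATERAL VIEW produces, here is a minimal Java sketch over plain maps. It is not Bullet's implementation: the class and method names are hypothetical, and dropping rows that have no list to explode is an assumption, not something these notes confirm.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of bullet-core: shows LATERAL VIEW EXPLODE semantics on plain maps.
class LateralViewSketch {
    // For every row, emit one output row per element of the list stored under listField.
    // Each output row is a copy of the input row with the element added under alias.
    static List<Map<String, Object>> lateralViewExplode(List<Map<String, Object>> rows,
                                                        String listField, String alias) {
        List<Map<String, Object>> result = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            Object value = row.get(listField);
            if (!(value instanceof List)) {
                continue; // Assumption: rows with nothing to explode produce no output
            }
            for (Object element : (List<?>) value) {
                Map<String, Object> joined = new HashMap<>(row);
                joined.put(alias, element);
                result.add(joined);
            }
        }
        return result;
    }
}
```

A row with a two-element list therefore becomes two rows, each carrying all of the original fields plus one exploded element.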
It also adds rate limiting capabilities to com.yahoo.bullet.pubsub.BufferingSubscriber so that the various PubSub implementations that use it can support rate limiting if needed. A new constructor that enables rate limiting has been added to this class; it takes arguments for the number of messages allowed per given time interval.
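The idea of allowing a number of messages per time interval can be sketched as a fixed-window limiter. This is an illustration only and not the BufferingSubscriber code: the class name, the constructor arguments, and the choice of a fixed window are all assumptions. The clock is passed in explicitly so the behavior is easy to follow.

```java
// Hypothetical sketch of "N messages per interval" rate limiting; not the BufferingSubscriber implementation.
class FixedWindowRateLimiter {
    private final int maxMessages;
    private final long intervalMillis;
    private long windowStart;
    private int count;

    FixedWindowRateLimiter(int maxMessages, long intervalMillis, long nowMillis) {
        this.maxMessages = maxMessages;
        this.intervalMillis = intervalMillis;
        this.windowStart = nowMillis;
        this.count = 0;
    }

    // Returns true if one more message may be handed out at time nowMillis.
    synchronized boolean tryAcquire(long nowMillis) {
        if (nowMillis - windowStart >= intervalMillis) {
            // The interval has elapsed, so start a new window
            windowStart = nowMillis;
            count = 0;
        }
        if (count < maxMessages) {
            count++;
            return true;
        }
        return false; // Limit reached for the current window
    }
}
```

A subscriber using such a limiter would simply return no message when tryAcquire is false and try again on the next receive.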
First release using Screwdriver
No new changes
First release on Maven Central - Bintray EOL
No changes
Yaml loading from jvyaml to snakeyaml
This release changes how YAML is loaded in BulletConfig by switching the library from jvyaml to snakeyaml. This is largely to allow parsing empty strings in values and to support a more up-to-date YAML specification.
Storage layer updates
This release revamps the Storage layer to make it more adaptable to different storages. It breaks the interface, but since the storage layer was in pre-release and not used anywhere, this is just a minor version update for bullet-core. The various classes can be found under com.yahoo.bullet.core.storage.
- The StorageManager has different interfaces to work with byte[], String and now takes a type bound to get implementation specific objects.
- StorageManager now supports working with namespaces (to encapsulate multiple logical units of storage - think tables) and partitions per namespace. It supports a new Criteria interface to query and modify arbitrary storages that need interfaces beyond what the basic StorageManager provides.
- Three reference implementations:
3.1. NullStorageManager - use if no storage is needed.
3.2. MemoryStorageManager - unpartitioned, namespace-less implementation for storing everything in memory
3.3. MultiMemoryStorageManager - supports partitions and namespaces. An example criteria for counting, MultiMemoryCountingCriteria, is provided
- A new StorageConfig class for storages similar to PubSubConfig for PubSubs
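The namespace and partition model can be pictured with a small in-memory sketch. It does not reproduce the StorageManager or MultiMemoryStorageManager API: the class, its methods, and the hash-based partitioning are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of namespaced, partitioned in-memory storage; not the bullet-core API.
class NamespacedMemoryStore {
    private final int partitionCount;
    // namespace -> partition index -> key -> value
    private final Map<String, Map<Integer, Map<String, byte[]>>> data = new HashMap<>();

    NamespacedMemoryStore(int partitionCount) {
        this.partitionCount = partitionCount;
    }

    // Assumption: keys are assigned to partitions by hash
    int partitionOf(String key) {
        return Math.floorMod(key.hashCode(), partitionCount);
    }

    void put(String namespace, String key, byte[] value) {
        data.computeIfAbsent(namespace, n -> new HashMap<>())
            .computeIfAbsent(partitionOf(key), p -> new HashMap<>())
            .put(key, value);
    }

    byte[] get(String namespace, String key) {
        Map<Integer, Map<String, byte[]>> partitions = data.get(namespace);
        if (partitions == null) {
            return null;
        }
        Map<String, byte[]> partition = partitions.get(partitionOf(key));
        return partition == null ? null : partition.get(key);
    }

    // In the spirit of a counting criteria: the number of entries in a namespace across all its partitions
    int count(String namespace) {
        Map<Integer, Map<String, byte[]>> partitions = data.get(namespace);
        if (partitions == null) {
            return 0;
        }
        int total = 0;
        for (Map<String, byte[]> partition : partitions.values()) {
            total += partition.size();
        }
        return total;
    }
}
```

The same key can live independently in two namespaces, which is what makes a namespace behave like a table.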
Ternary Logic
This release migrates to Bullet Record 1.1.0, which supports ternary logic. All boolean operations now work with a ternary (3-value) logic system with NULL as the third value. This follows the standard SQL semantics when comparing or operating on NULLs. There should be no interface changes, but behavior will differ when operating on NULLs.
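For readers unfamiliar with SQL's three-valued logic, the following sketch models NULL as a Java null Boolean. The class is hypothetical and is not how Bullet Record implements it; it only encodes the standard SQL truth tables.

```java
// Hypothetical sketch of SQL-style ternary logic, with Java null standing in for NULL.
class TernaryLogic {
    // FALSE dominates AND: FALSE AND NULL is FALSE, TRUE AND NULL is NULL
    static Boolean and(Boolean a, Boolean b) {
        if (Boolean.FALSE.equals(a) || Boolean.FALSE.equals(b)) {
            return Boolean.FALSE;
        }
        if (a == null || b == null) {
            return null;
        }
        return Boolean.TRUE;
    }

    // TRUE dominates OR: TRUE OR NULL is TRUE, FALSE OR NULL is NULL
    static Boolean or(Boolean a, Boolean b) {
        if (Boolean.TRUE.equals(a) || Boolean.TRUE.equals(b)) {
            return Boolean.TRUE;
        }
        if (a == null || b == null) {
            return null;
        }
        return Boolean.FALSE;
    }

    // NOT NULL is NULL
    static Boolean not(Boolean a) {
        if (a == null) {
            return null;
        }
        return a ? Boolean.FALSE : Boolean.TRUE;
    }

    // Comparing anything with NULL yields NULL rather than TRUE or FALSE
    static Boolean equal(Object a, Object b) {
        if (a == null || b == null) {
            return null;
        }
        return a.equals(b);
    }
}
```

The practical consequence is that a filter such as a comparison against a missing field evaluates to NULL instead of FALSE, and a NOT applied on top of it stays NULL.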
First Major Release - Expressions, Storage, Async queries, no more JSON!
Bullet 1.0 is here! This release marks the first major version of Bullet.
The two high level changes that are happening with this release are:
- No more JSON queries. Until now, queries were mainly parsed from JSON and were sent from the web service to the backend as JSON strings, where they would be de-serialized, configured, and initialized (and validated). In this release, we move from using JSON serialization to constructing and sending query objects directly. (BQL will be the main method of query construction going forward.)
- Expressions. Expressions enable first-order logic in virtually all parts of the Bullet query (besides Aggregations, which still use field names). This allows us to filter on and project - and consequently aggregate on - more than just individual fields.
PubSub
- PubSubMessage
1.1. Content changed to byte[] from String; added getContentAsString() which replaces the old getContent()
1.2. Sequence number removed since it was not used
- Metadata
2.1. Creation timestamp added. This is used in Querier as the query start time.
2.2. Metadata copy() added. This is expected to be overridden by other PubSubs, e.g. in the subclass RESTMetadata, copy() returns a RESTMetadata.
- Publisher
3.1. PubSubMessage send(PubSubMessage) and PubSubMessage send(String, String) now return the sent PubSubMessage for any configured StorageManagers since the message may be modified. (Previously returned void)
New Interfaces
- PubSubResponder abstract class added and extended by any class that responds to a PubSubMessage.
1.1. PubSubResponder is used in Bullet Service 1.0.0 both synchronously and asynchronously.
1.2. BulletPubSubResponder implementation provided which publishes results to a configured PubSub
- Added StorageManager abstract class which is used to store and retrieve PubSubMessages. Primarily used in Bullet Service for persistent queries (reference implementations landing soon!) sending queries and receiving results.
2.1. MemoryStorageManager implementation that stores objects in memory
2.2. NullStorageManager implementation that stores nothing (in the event you do not want to use a StorageManager)
Metrics (New)
- Added MetricEvent which is a simple wrapper that represents a metric event.
- Added MetricCollector which is a utility class for storing frequency and average metrics for string keys.
- Added MetricPublisher abstract class
3.1. MetricEventPublisher abstract class that extends MetricPublisher to publish MetricEvents.
3.2. HTTPMetricEventPublisher implementation of MetricEventPublisher which publishes MetricEvents to a given URL and can retry multiple times.
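As a rough picture of what storing frequency and average metrics for string keys can look like, here is a small sketch. The class and method names are hypothetical and do not mirror the real MetricCollector API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of collecting frequency and average metrics per string key; not the MetricCollector API.
class MetricCollectorSketch {
    private final Map<String, Long> frequencies = new HashMap<>();
    // key -> {running sum, number of samples}
    private final Map<String, double[]> averages = new HashMap<>();

    // Count one occurrence of the key
    synchronized void increment(String key) {
        frequencies.merge(key, 1L, Long::sum);
    }

    // Record one sample toward the average for the key
    synchronized void average(String key, double value) {
        double[] state = averages.computeIfAbsent(key, k -> new double[2]);
        state[0] += value;
        state[1] += 1;
    }

    synchronized long frequency(String key) {
        return frequencies.getOrDefault(key, 0L);
    }

    // Returns 0.0 for keys with no samples
    synchronized double averageOf(String key) {
        double[] state = averages.get(key);
        return state == null ? 0.0 : state[0] / state[1];
    }
}
```

A publisher would periodically read these values, wrap them in an event, and send them on.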
Expressions (New)
- Added Expressions and Evaluators. Expressions enable first-order logic in queries, and evaluators are constructed from expressions and evaluated on Bullet records.
1.1. Supported expressions: Field, Value, List, Unary, Binary, NAry, and Cast
1.2. Supported operations:
- Arithmetic: +, -, *, /
- Comparators: =, !=, >, <, >=, <= (with ANY/ALL for value-to-list comparisons)
- Boolean logic: AND, OR, XOR
- Unary: NOT, SIZEOF, IS [NOT] NULL
- Binary: CONTAINSKEY, CONTAINSVALUE, SIZEIS
- If-then-else: IF
- Regex LIKE: RLIKE
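The split between expressions and evaluators can be sketched as follows: an expression tree is turned into evaluators, and each evaluator is applied to a record. This is a simplified illustration over plain maps. The interface, the factory methods, and returning null when operands are not numeric are assumptions, not the bullet-core classes.

```java
import java.util.Map;

// Hypothetical evaluator: computes a value from a record (here a plain map standing in for a Bullet record).
interface Evaluator {
    Object evaluate(Map<String, Object> record);
}

// Hypothetical factories that play the role of Field, Value and Binary expressions.
class ExpressionSketch {
    // Field expression: looks up a field in the record
    static Evaluator field(String name) {
        return record -> record.get(name);
    }

    // Value expression: always yields the same constant
    static Evaluator value(Object constant) {
        return record -> constant;
    }

    // Binary arithmetic expression: left + right
    static Evaluator add(Evaluator left, Evaluator right) {
        return record -> {
            Object l = left.evaluate(record);
            Object r = right.evaluate(record);
            if (!(l instanceof Number) || !(r instanceof Number)) {
                return null; // Assumption: non-numeric or missing operands yield null
            }
            return ((Number) l).doubleValue() + ((Number) r).doubleValue();
        };
    }

    // Binary comparison expression: left > right
    static Evaluator greaterThan(Evaluator left, Evaluator right) {
        return record -> {
            Object l = left.evaluate(record);
            Object r = right.evaluate(record);
            if (!(l instanceof Number) || !(r instanceof Number)) {
                return null; // Assumption: non-numeric or missing operands yield null
            }
            return ((Number) l).doubleValue() > ((Number) r).doubleValue();
        };
    }
}
```

Because evaluators compose, a filter such as "a + b > 10" is just greaterThan(add(field("a"), field("b")), value(10)), which is what lets filters and projections work on more than individual fields.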
Query and Querier
- Projection and Filter now use expressions.
- Projection now explicitly supports three projection types: COPY, NO_COPY, and PASS_THROUGH denoting how fields should be projected.
2.1. COPY - the record is copied before new fields are projected, e.g. in the computation post-aggregation.
2.2. NO_COPY - fields are projected onto an empty record
2.3. PASS_THROUGH - the original record is passed on with no projection
- Computation and OrderBy post-aggregations now use expressions.
- Added Having post-aggregation which is a filter applied after aggregation.
- Added Culling post-aggregation which removes specified fields. This takes the place of the implicit transient fields previously generated in Querier.
- Querier now takes a Query object. Consequently, there is no initialization or error-checking done in the Querier anymore.
- The query start time is now taken from metadata rather than set at Querier initialization; therefore, the query start time is set when the query is initially sent.
- Querier result metadata now includes the original query string in addition to the query object’s JSON.
- Queries are now error-checked in the constructors whereas previously, queries were initialized and validated after construction.
- Added Aggregation subclasses for the different types of aggregations.
- Renamed previous “parsing” package to “query” package since we no longer parse queries from JSON.
- Aggregation operations and strategies moved to “querying” package.
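The difference between the three projection types listed above (COPY, NO_COPY, and PASS_THROUGH) can be sketched on plain maps. The class below is hypothetical and only mirrors the descriptions in these notes; it is not the Projection class itself.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of the three projection types on a map standing in for a record.
class ProjectionSketch {
    enum Type { COPY, NO_COPY, PASS_THROUGH }

    // fields maps each new field name to a function that computes its value from the original record
    static Map<String, Object> project(Map<String, Object> record,
                                       Map<String, Function<Map<String, Object>, Object>> fields,
                                       Type type) {
        if (type == Type.PASS_THROUGH) {
            return record; // The original record is passed on with no projection
        }
        Map<String, Object> target;
        if (type == Type.COPY) {
            target = new HashMap<String, Object>(record); // Copy first, then add the new fields
        } else {
            target = new HashMap<String, Object>();       // NO_COPY: start from an empty record
        }
        for (Map.Entry<String, Function<Map<String, Object>, Object>> entry : fields.entrySet()) {
            target.put(entry.getKey(), entry.getValue().apply(record));
        }
        return target;
    }
}
```

COPY keeps the existing fields alongside the computed ones, which is why it suits computations after an aggregation, while NO_COPY yields only what was explicitly selected.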
Miscellaneous
- Added a constructor BulletError(String, String)
- BulletException is now a RuntimeException and also takes a single BulletError rather than a list of errors.
- Removed Initializable interface
- Some strategies renamed
4.1. Raw to RawStrategy
4.2. TopK to FrequentItemsSketchingStrategy
4.3. CountDistinct to ThetaSketchingStrategy
4.4. GroupBy to TupleSketchingStrategy
- ThetaSketch now rounds the resulting count
- Updated SimpleEqualityPartitioner to work with Expressions and TypedObjects
- Fixed a bug in SimpleEqualityPartitioner where the partitioner did not differentiate properly between missing fields and null fields.
- Typesystem moved to Bullet Record 1.0.0
- The default BulletRecordProvider class is now “com.yahoo.bullet.record.avro.TypedAvroBulletRecordProvider” from “com.yahoo.bullet.record.AvroBulletRecordProvider”
- Can now get a configured Bullet Record Schema from BulletConfig.
QueryManager partition cleanup
This release fixes a bug in the QueryManager where a partition, once empty (that is, once all queries in it had expired), would not be cleaned up. This was a leak and could eventually lead to unbounded growth of the partition key space. It could be seen in the stats, where the PARTITION_COUNT stat would increase monotonically.
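The shape of the fix can be sketched as removing a partition entry as soon as its last query is gone. The class below is hypothetical and greatly simplified compared to the QueryManager; it only shows the cleanup step.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of partition bookkeeping with cleanup; not the QueryManager implementation.
class PartitionedQueries {
    // partition key -> ids of the queries in that partition
    private final Map<String, Set<String>> partitions = new HashMap<>();

    void add(String partitionKey, String queryId) {
        partitions.computeIfAbsent(partitionKey, k -> new HashSet<>()).add(queryId);
    }

    void remove(String partitionKey, String queryId) {
        Set<String> queries = partitions.get(partitionKey);
        if (queries == null) {
            return;
        }
        queries.remove(queryId);
        if (queries.isEmpty()) {
            // Without this step, empty partitions accumulate and the count only ever grows
            partitions.remove(partitionKey);
        }
    }

    int partitionCount() {
        return partitions.size();
    }
}
```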
QueryManager Logging Fixes
The logging in QueryManager was broken and did not print values for variables. This is now fixed.
Extended nested notation support for Projections
This updates Bullet Record to 0.3.0 and uses the forceSet to support #47 for Projections as well. Since forceSet can potentially be problematic, the Querier now handles the exception instead of crashing.
All queries can now use the full dot-separated notation to access any field in a BulletRecord (all types supported). However, operations that only work on primitives (no equals on lists, for example) remain limited to primitives.
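Resolving a dot-separated path against nested data can be sketched as below. This is an illustration on plain maps and lists, not the BulletRecord code. The class name is hypothetical, and treating a numeric path segment as a list index is an assumption.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of dot-separated field access over nested maps and lists.
class NestedFieldSketch {
    // Walks the path one segment at a time; returns null if any segment cannot be resolved.
    static Object get(Map<String, Object> record, String path) {
        Object current = record;
        for (String part : path.split("\\.")) {
            if (current instanceof Map) {
                current = ((Map<?, ?>) current).get(part);
            } else if (current instanceof List) {
                // Assumption: a numeric segment indexes into a list
                List<?> list = (List<?>) current;
                int index;
                try {
                    index = Integer.parseInt(part);
                } catch (NumberFormatException e) {
                    return null;
                }
                if (index < 0 || index >= list.size()) {
                    return null;
                }
                current = list.get(index);
            } else {
                return null; // Reached a primitive or a missing value before the path ended
            }
        }
        return current;
    }
}
```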