feat: BigTable online store (#3140)
* Initial implementation of BigTable online store.

Signed-off-by: Abhin Chhabra <abhin.chhabra@shopify.com>

* Attempt to run bigtable integration tests.

Currently focusing on just getting the tests running locally. I've only
built python3.8 requirements.

Signed-off-by: Abhin Chhabra <abhin.chhabra@shopify.com>

* Got the BigTable tests running in local containers

Signed-off-by: Abhin Chhabra <chhabra.abhin@gmail.com>

* Set serialization version when computing entity ID

Signed-off-by: Abhin Chhabra <chhabra.abhin@gmail.com>

* Switch to the recommended layout in bigtable.

This was recommended by the BigTable dev team. Details of this layout
will be added to the documentation in a future commit.

Signed-off-by: Abhin Chhabra <chhabra.abhin@gmail.com>

* Minor bugfixes.

- If a row is empty when fetching data, don't process it further.
- If a task in the threadpool fails, bubble up that failure.
- If a `created_ts` is not available, use an empty string. `None` does
  not automatically serialize to bytes.

Signed-off-by: Abhin Chhabra <chhabra.abhin@gmail.com>

* Move BigTable online store out of contrib

As per feedback on the PR.

Signed-off-by: Abhin Chhabra <abhin.chhabra@shopify.com>

* Attempt to run integration tests in CI.

Provide the GCP project and the bigtable instance ID for the tests to
connect to.

Signed-off-by: Abhin Chhabra <abhin.chhabra@shopify.com>

* Delete tables for entity-less feature views.

Signed-off-by: Abhin Chhabra <abhin.chhabra@shopify.com>

* Table names should be shorter than 50 characters

This is BigTable's limit on table name length, and exceeding it is causing test failures.

Signed-off-by: Abhin Chhabra <abhin.chhabra@shopify.com>

* Optimize bigtable reads.

- Fetch all the rows in one bigtable fetch.
- Get only the columns that are necessary (using a column regex filter).

Signed-off-by: Abhin Chhabra <abhin.chhabra@shopify.com>

* dynamodb: switch to `mock_dynamodb`

The latest rebuilding of requirements has upgraded the `moto` library
past the `4.0.0` release, which has a couple of breaking changes.
Specifically, the `mock_dynamodb2` decorator has been deprecated. See
https://github.com/spulec/moto/blob/master/CHANGELOG.md#400 for more
details.

The actual PR (getmoto/moto#4919) mentions that
it's because the `mock_dynamodb` decorator is now equivalent to the
`mock_dynamodb2` decorator.

Signed-off-by: Abhin Chhabra <abhin.chhabra@shopify.com>

* minor: rename `BigTable` to `Bigtable`

This matches the GCP docs.

Signed-off-by: Abhin Chhabra <abhin.chhabra@shopify.com>

* Wrote some Bigtable documentation.

Closely mirrors the docs for the other online stores.

Signed-off-by: Abhin Chhabra <abhin.chhabra@shopify.com>

* Bugfix: Deal with missing row keys.

It looks like the bigtable client will just skip over non-existent row
keys.

Signed-off-by: Abhin Chhabra <abhin.chhabra@shopify.com>

* Fix linting issues.

Signed-off-by: Abhin Chhabra <abhin.chhabra@shopify.com>

* Generate requirements files.

- As of version `1.49`, the various python packages in the [grpc
  repo](https://github.com/grpc/grpc/tree/master/src/python) require
  `protobuf>=4.21.3`. Unfortunately, this is incompatible with all
  versions of `tensorflow-metadata` (see [this
  issue](tensorflow/metadata#37)). And since
  `piptools` doesn't backtrack during dependency resolution, the
  requirement files cannot be regenerated without adding an upper limit
  on these grpc libraries directly in `setup.py`.
- The previous attempt to upgrade usages of the `mock_dynamodb2`
  decorator to the newest version failed. Since I'm not an expert in
  dynamodb, it made sense to just cap the test tool to the version
  already being used in CI.

Signed-off-by: Abhin Chhabra <abhin.chhabra@shopify.com>

* Don't bother materializing created timestamp.

Had a discussion with Danny about whether it's useful to copy this
column. He agreed that there's no value to storing this in the online
store.

Signed-off-by: Abhin Chhabra <abhin.chhabra@shopify.com>

* Remove `tensorflow-metadata`.

Turns out that this dependency is not required. We removed all
references to it in [this
PR](#2063), but did not remove it
from `setup.py`. Removing it makes many of the restrictions imposed
in previous commits unnecessary.

Signed-off-by: Abhin Chhabra <abhin.chhabra@shopify.com>

* Minor fix to Bigtable documentation.

Feedback from Danny mentioned that Bigtable should be able to store
multiple versions of the same key and fetch the latest at read time.
This makes sense and means that concurrent writes should work just fine.

Signed-off-by: Abhin Chhabra <abhin.chhabra@shopify.com>

* update roadmap docs

Signed-off-by: Danny Chiao <danny@tecton.ai>

* Fix roadmap doc

Signed-off-by: Danny Chiao <danny@tecton.ai>

* Change link to point to roadmap page

Signed-off-by: Danny Chiao <danny@tecton.ai>

* change order in roadmap

Signed-off-by: Danny Chiao <danny@tecton.ai>

Signed-off-by: Abhin Chhabra <abhin.chhabra@shopify.com>
Signed-off-by: Abhin Chhabra <chhabra.abhin@gmail.com>
Signed-off-by: Danny Chiao <danny@tecton.ai>
Co-authored-by: Danny Chiao <danny@tecton.ai>
chhabrakadabra and adchia authored Oct 5, 2022
1 parent b9b9c54 commit 6bc91c2
Showing 19 changed files with 900 additions and 407 deletions.
4 changes: 2 additions & 2 deletions README.md
@@ -173,12 +173,12 @@ The list below contains the functionality that contributors are planning to deve
* [x] [DynamoDB](https://docs.feast.dev/reference/online-stores/dynamodb)
* [x] [Redis](https://docs.feast.dev/reference/online-stores/redis)
* [x] [Datastore](https://docs.feast.dev/reference/online-stores/datastore)
* [x] [Bigtable](https://docs.feast.dev/reference/online-stores/bigtable)
* [x] [SQLite](https://docs.feast.dev/reference/online-stores/sqlite)
* [x] [Azure Cache for Redis (community plugin)](https://github.com/Azure/feast-azure)
* [x] [Postgres (contrib plugin)](https://docs.feast.dev/reference/online-stores/postgres)
* [x] [Cassandra / AstraDB (contrib plugin)](https://docs.feast.dev/reference/online-stores/cassandra)
* [x] [Custom online store support](https://docs.feast.dev/how-to-guides/adding-support-for-a-new-online-store)
* [x] [Cassandra / AstraDB](https://docs.feast.dev/reference/online-stores/cassandra)
* [ ] Bigtable (in progress)
* **Feature Engineering**
* [x] On-demand Transformations (Alpha release. See [RFC](https://docs.google.com/document/d/1lgfIw0Drc65LpaxbUu49RCeJgMew547meSJttnUqz7c/edit#))
* [x] Streaming Transformations (Alpha release. See [RFC](https://docs.google.com/document/d/1UzEyETHUaGpn0ap4G82DHluiCj7zEbrQLkJJkKSv4e8/edit))
1 change: 1 addition & 0 deletions docs/SUMMARY.md
@@ -92,6 +92,7 @@
* [Redis](reference/online-stores/redis.md)
* [Datastore](reference/online-stores/datastore.md)
* [DynamoDB](reference/online-stores/dynamodb.md)
* [Bigtable](reference/online-stores/bigtable.md)
* [PostgreSQL (contrib)](reference/online-stores/postgres.md)
* [Cassandra + Astra DB (contrib)](reference/online-stores/cassandra.md)
* [MySQL (contrib)](reference/online-stores/mysql.md)
2 changes: 1 addition & 1 deletion docs/getting-started/third-party-integrations.md
@@ -11,7 +11,7 @@ Don't see your offline store or online store of choice here? Check out our guide

## Integrations

See [Functionality and Roadmap](../../#-functionality-and-roadmap)
See [Functionality and Roadmap](../roadmap.md)

## Standards

4 changes: 4 additions & 0 deletions docs/reference/online-stores/README.md
@@ -26,6 +26,10 @@ Please see [Online Store](../../getting-started/architecture-and-components/onli
[dynamodb.md](dynamodb.md)
{% endcontent-ref %}

{% content-ref url="bigtable.md" %}
[bigtable.md](bigtable.md)
{% endcontent-ref %}

{% content-ref url="postgres.md" %}
[postgres.md](postgres.md)
{% endcontent-ref %}
56 changes: 56 additions & 0 deletions docs/reference/online-stores/bigtable.md
@@ -0,0 +1,56 @@
# Bigtable online store

## Description

The [Bigtable](https://cloud.google.com/bigtable) online store provides support for
materializing feature values into Cloud Bigtable. The data model used to store feature
values in Bigtable is described in more detail
[here](../../specs/online_store_format.md#google-bigtable-online-store-format).

## Getting started

In order to use this online store, you'll need to run `pip install 'feast[gcp]'`. You
can then get started with the command `feast init REPO_NAME -t gcp`.

## Example

{% code title="feature_store.yaml" %}
```yaml
project: my_feature_repo
registry: data/registry.db
provider: gcp
online_store:
    type: bigtable
    project_id: my_gcp_project
    instance: my_bigtable_instance
```
{% endcode %}

The full set of configuration options is available in
[BigtableOnlineStoreConfig](https://rtd.feast.dev/en/latest/#feast.infra.online_stores.bigtable.BigtableOnlineStoreConfig).

## Functionality Matrix

The set of functionality supported by online stores is described in detail [here](overview.md#functionality).
Below is a matrix indicating which functionality is supported by the Bigtable online store.

| | Bigtable |
|-----------------------------------------------------------|----------|
| write feature values to the online store | yes |
| read feature values from the online store | yes |
| update infrastructure (e.g. tables) in the online store | yes |
| teardown infrastructure (e.g. tables) in the online store | yes |
| generate a plan of infrastructure changes | no |
| support for on-demand transforms | yes |
| readable by Python SDK | yes |
| readable by Java | no |
| readable by Go | no |
| support for entityless feature views | yes |
| support for concurrent writing to the same key | yes |
| support for ttl (time to live) at retrieval | no |
| support for deleting expired data | no |
| collocated by feature view | yes |
| collocated by feature service | no |
| collocated by entity key                                  | yes      |

To compare this set of functionality against other online stores, please see the full [functionality matrix](overview.md#functionality-matrix).
4 changes: 2 additions & 2 deletions docs/roadmap.md
@@ -31,12 +31,12 @@ The list below contains the functionality that contributors are planning to deve
* [x] [DynamoDB](https://docs.feast.dev/reference/online-stores/dynamodb)
* [x] [Redis](https://docs.feast.dev/reference/online-stores/redis)
* [x] [Datastore](https://docs.feast.dev/reference/online-stores/datastore)
* [x] [Bigtable](https://docs.feast.dev/reference/online-stores/bigtable)
* [x] [SQLite](https://docs.feast.dev/reference/online-stores/sqlite)
* [x] [Azure Cache for Redis (community plugin)](https://github.com/Azure/feast-azure)
* [x] [Postgres (contrib plugin)](https://docs.feast.dev/reference/online-stores/postgres)
* [x] [Cassandra / AstraDB (contrib plugin)](https://docs.feast.dev/reference/online-stores/cassandra)
* [x] [Custom online store support](https://docs.feast.dev/how-to-guides/adding-support-for-a-new-online-store)
* [x] [Cassandra / AstraDB](https://docs.feast.dev/reference/online-stores/cassandra)
* [ ] Bigtable (in progress)
* **Feature Engineering**
* [x] On-demand Transformations (Alpha release. See [RFC](https://docs.google.com/document/d/1lgfIw0Drc65LpaxbUu49RCeJgMew547meSJttnUqz7c/edit#))
* [x] Streaming Transformations (Alpha release. See [RFC](https://docs.google.com/document/d/1UzEyETHUaGpn0ap4G82DHluiCj7zEbrQLkJJkKSv4e8/edit))
25 changes: 24 additions & 1 deletion docs/specs/online_store_format.md
@@ -92,6 +92,29 @@ Other types of entity keys are not supported in this version of the specificatio

![Datastore Online Example](datastore_online_example.png)

## Google Bigtable Online Store Format

[Bigtable storage model](https://cloud.google.com/bigtable/docs/overview#storage-model)
consists of massively scalable tables, with each row keyed by a "row key". The rows in a
table are stored lexicographically sorted by this row key.
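To make the ordering concrete, here is a minimal toy model in plain Python of a table whose rows stay sorted by row key. It is not the `google-cloud-bigtable` client: the class `SortedRowStore` and its methods are hypothetical illustrations. The point it shows is that lexicographic ordering makes every group of keys sharing a prefix contiguous, so related rows can be found with one bounded scan.

```python
import bisect
from typing import Dict, List


class SortedRowStore:
    """Toy in-memory table whose row keys are kept in lexicographic order.

    Illustration only; this is not the google-cloud-bigtable client API.
    """

    def __init__(self) -> None:
        self._keys: List[bytes] = []  # always kept sorted
        self._rows: Dict[bytes, Dict[str, bytes]] = {}

    def write(self, row_key: bytes, cells: Dict[str, bytes]) -> None:
        if row_key not in self._rows:
            # Insert each new key at its sorted position.
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key].update(cells)

    def keys(self) -> List[bytes]:
        return list(self._keys)

    def scan_prefix(self, prefix: bytes) -> List[bytes]:
        # Keys sharing a prefix form one contiguous run in sorted order,
        # so the scan starts at the first candidate and stops at the first miss.
        start = bisect.bisect_left(self._keys, prefix)
        matches: List[bytes] = []
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break
            matches.append(key)
        return matches


store = SortedRowStore()
store.write(b"user#2", {"age": b"31"})
store.write(b"item#9", {"price": b"4"})
store.write(b"user#1", {"age": b"25"})
print(store.keys())  # [b'item#9', b'user#1', b'user#2']
print(store.scan_prefix(b"user#"))  # [b'user#1', b'user#2']
```

Insertion order does not matter: the keys come back sorted, and both `user#` rows sit next to each other.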

We use the following structure to store feature data in Bigtable:

* All feature data for an entity or a specific group of entities is stored in the same
table. The table name is derived by concatenating the lexicographically sorted names
of entities.
* This implementation only uses one column family per table, named `features`.
* Each row key is created by concatenating a hash derived from the specific entity keys
and the name of the feature view. Each row only stores feature values for a specific
feature view. This arrangement also means that feature values for a given group of
entities are colocated.
* The columns used in each row are named after the features in the feature view.
  Bigtable tables are sparse, so a column that is not set in a given row takes up no space.
* By default, only one version of each feature value is kept. This can be configured
  using the `max_versions` setting in `BigtableOnlineStoreConfig`. The online store itself
  cannot revert a value to an earlier version; reading older versions requires custom code.
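The layout above can be sketched in plain Python. Everything here is hypothetical: `table_name`, `row_key` and `VersionedCell` are illustrative helpers, not Feast's implementation. In particular, the `-` separator in table names, the `#` separator in row keys, and the MD5 hex digest over a simple `name=value` serialization are assumptions; Feast hashes its own serialized entity key format. The 50-character check reflects Bigtable's limit on table name length.

```python
import hashlib
from typing import Dict, List, Optional, Tuple, Union

# Bigtable table IDs may be at most 50 characters.
MAX_TABLE_NAME_LEN = 50


def table_name(entity_names: List[str]) -> str:
    """Hypothetical: join the lexicographically sorted entity names."""
    name = "-".join(sorted(entity_names))  # "-" separator is an assumption
    if len(name) > MAX_TABLE_NAME_LEN:
        raise ValueError(f"table name {name!r} exceeds {MAX_TABLE_NAME_LEN} characters")
    return name


def row_key(entity_key: Dict[str, Union[int, str]], feature_view: str) -> bytes:
    """Hypothetical: hash of the entity key, then the feature view name."""
    # Assumed serialization; Feast serializes its own EntityKey format instead.
    serialized = "|".join(f"{k}={v}" for k, v in sorted(entity_key.items()))
    digest = hashlib.md5(serialized.encode("utf-8")).hexdigest()
    return f"{digest}#{feature_view}".encode("utf-8")


class VersionedCell:
    """Toy model of one cell that retains at most `max_versions` values."""

    def __init__(self, max_versions: int = 1) -> None:
        self.max_versions = max_versions
        self._versions: List[Tuple[int, bytes]] = []  # (timestamp, value), newest first

    def write(self, timestamp: int, value: bytes) -> None:
        self._versions.append((timestamp, value))
        # Order newest first, then drop anything beyond max_versions.
        self._versions.sort(key=lambda tv: tv[0], reverse=True)
        del self._versions[self.max_versions:]

    def read_latest(self) -> Optional[bytes]:
        # Reads return the value with the highest timestamp.
        return self._versions[0][1] if self._versions else None


hourly = row_key({"driver_id": 1001}, "driver_hourly_stats")
daily = row_key({"driver_id": 1001}, "driver_daily_stats")
print(table_name(["driver", "customer"]))  # customer-driver
print(hourly.split(b"#")[0] == daily.split(b"#")[0])  # True: same entity prefix

cell = VersionedCell(max_versions=1)
cell.write(10, b"new")
cell.write(5, b"stale")  # an out-of-order write with an older timestamp
print(cell.read_latest())  # b'new'
```

Because both row keys for `driver_id=1001` start with the same hash, they sort next to each other, which is the colocation property described above. The `VersionedCell` toy trims old versions immediately; real Bigtable garbage collection is asynchronous, so older cells can remain readable for a while.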

## Cassandra/Astra DB Online Store Format

### Overview
@@ -250,4 +273,4 @@ message BoolList {
repeated bool val = 1;
}
```
```