This repository has been archived by the owner on Jan 10, 2023. It is now read-only.

Design documents refresh.
-------------
Created by MOE: http://code.google.com/p/moe-java
MOE_MIGRATED_REVID=102353961
arthurhsu authored and freshp86 committed Sep 4, 2015
1 parent f433689 commit ed5eb6a
Showing 6 changed files with 240 additions and 97 deletions.
29 changes: 15 additions & 14 deletions docs/dd/00_intro.md
Expand Up @@ -2,34 +2,36 @@

## 0. Introduction

Lovefield is a relational query engine and is built in a way very similar to
Lovefield is a relational database and is built in a way very similar to
traditional RDBMS in many aspects. In this chapter, the design philosophy of
Lovefield and basic driving factors will be discussed.

### 0.1 Motivation

The best thing of software engineering is that most problems can be solved in
different ways. Object databases and relational database are invented to solve
different ways. Object databases and relational databases are invented to solve
data access problems from different points of view and requirements.
Unfortunately the support of relational databases in browsers is in an
unsatisfactory state after WebSQL being deprecated, and thus Lovefield is
created to offer the choice that developers shall be honored to have.

### 0.2 Data Store

Originally Lovefield was bundled with IndexedDB only. Per popular requests, it
is now engineered to wrap different storage technologies into a separate layer,
and is able to couple different storage technologies with the query engine.
Lovefield abstracts data persistence into classes implementing `lf.BackStore`.
This makes Lovefield adaptive to different storage media and technologies.
This also helps to decouple storage from the query engine.

The supported data store types are:
The supported data stores are:

* IndexedDB - All data persisted on IndexedDB
* Memory - All data are transient and stored in-memory
* IndexedDB - All data persisted on IndexedDB provided by browser.
* Memory - All data are transient and stored in-memory.
* Firebase - Data is persisted in Firebase, a cloud database that is
synchronized among all its clients.

There are several experimental data stores:

* WebSQL - Provided to fill in the gap of Safari lacking IndexedDB support
* Firebase - Provided to test server-to-client end-to-end solution
* WebSQL - Provided to fill in the gap of Safari lacking IndexedDB support.
* LocalStorage - Provided for proof of concept for handling external changes.

Each storage technology has different limitations and constraints. Lovefield
contributors are required to have a good understanding of these boundary
Expand Down Expand Up @@ -58,14 +60,13 @@ Lovefield's codebase is checked by Closure compiler with very strict options.

Lovefield consists of the following components:

* Schema Parser and Code-generator (SPAC, `spac/`)
* Schema (`lib/schema/`)
* Caching support (`lib/cache/`)
* Schema Management (`spac/` and `lib/schema`)
* Caching and Memory Management (`lib/cache/`)
* Query engine
* Query builder (`lib/query/`)
* Relation, query plan generator/optimizer/runner (`lib/proc/`)
* Predicates (`lib/pred/`)
* Data store management (`lib/backstore/`)
* Data stores (`lib/backstore/`)
* Indices (`lib/index/`)

These components will be detailed in following chapters.
Expand Down
25 changes: 13 additions & 12 deletions docs/dd/01_schema.md
Expand Up @@ -18,7 +18,9 @@ A row consists of columns, and the columns are defined in the schema.
A schema is the way users define the layout of their database. Lovefield
offers two different ways of defining a schema: dynamic and static. The default
is dynamic schema creation, which is carried out using provided API. The
static schema creation is considered advanced usage and is done via SPAC.
static schema creation is considered advanced usage and is done via SPAC
(Schema Parser and Code generator).

The dynamic schema creation uses the builder pattern. There are two levels of
builder:

Expand All @@ -29,9 +31,9 @@ builder:

#### 1.2.1 `lf.schema.Builder`

This builder will create a concrete object `lf.schema.DatabaseSchema_` which
implements `lf.schema.Database`. The differences between the object and
SPAC-generated code are minimum.
This builder creates a concrete object `lf.schema.DatabaseSchema_` which
implements `lf.schema.Database`. The differences between this object and the
object from the SPAC-generated class are minimal.

#### 1.2.2 `lf.schema.TableBuilder`

Expand All @@ -47,10 +49,13 @@ keys).
Once the table class object is generated, the `TableBuilder` instantiates
an object of that class and returns it to the database builder.

There are significant differences between SPAC-generated code and the table
class from `TableBuilder`. The main difference is that the implementation from
`TableBuilder` is more complicated because it needs to consider all different
type combinations and conversions.
There are significant differences between the SPAC-generated class and the
table class from `TableBuilder`. The main differences are:

* The implementation from `TableBuilder` is more complicated because it needs
to consider all different type combinations and conversions.
* The SPAC-generated class has better type annotations, and thus better
compiler coverage.

In order to improve performance, the implementation uses a hash table to
store functions for different data types, so that function selection according
Expand Down Expand Up @@ -118,10 +123,6 @@ expansion. All macros start with the `#` sign, which is an invalid character
for JavaScript so that errors can be spotted easily if the macros inside
template were not fully expanded.

Code generator has grown to a point that is quite hard to maintain. The plan
is to divide the code generator into several different classes and make it
easy to work with again.

##### 1.3.3.1 Simple Macros

The following table lists the simple macros supported by current code generator:
Expand Down
163 changes: 115 additions & 48 deletions docs/dd/02_data_store.md
Expand Up @@ -2,31 +2,30 @@

## 2. Data Store

In the beginning of Lovefield design, it was identified that two different
data stores need to exist: a persistent backstore for applications that need
persistence, and a volatile in-memory backstore for remaining applications and
for testing purposes. By defining the interface of `lf.raw.BackStore`, the
need to use persistent backstore in tests is vastly reduced, and data stores
can be swapped for better test-ability.

> #### Why not persistent storage in testing?
> One cruel fact of testing is that tests can fail, which is the reason why
> tests exist. When tests fail, it needs to clean up after itself. Using a
> persistent storage will make the clean-up way more complicated, especially
> when a test is failed by a JavaScript exception. Moreover, most tests shall
> not involve persistent storage anyway.
As a result, there are two different data store implementations created:
IndexedDB as the persistent store, and Memory as the temporary/volatile store.
Early in Lovefield's design, it was identified that two types of data stores
need to exist: a persistent data store for applications that need persistence,
and a volatile data store for the remaining applications and for testing.
Over time, the need to integrate with other persistent storage technologies
has grown. For example, the integration of Firebase allows developers to have
the best of both worlds: a relational query engine on the browser side, and a
fully synchronized database among all clients on the Firebase server.

As a result, Lovefield uses a plug-in architecture for data stores. All data
stores implement the `lf.BackStore` interface so that the query engine can be
decoupled from the actual storage technology.

For databases that feature an "upgrade mode", an `lf.raw.BackStore` interface
is provided to handle that special case.

### 2.1 Common Denominator

A data store has three methods:
All data stores have three methods in common:

* `init`: indicate the initialization of a data store, which typically means
establishing connection to persisted instance. The database upgrade process
is also initiated here. The initialization also identifies maximum row id
to-date and continue row id generation from there.
establishing a connection to a persisted instance. The database upgrade
process is also initiated here when needed. The initialization also identifies
the maximum row id to date and continues row id generation from there.

* `createTx`: return a transaction object from data store, and hint the data
store that an atomic writing operation is going to happen. The transaction
Expand All @@ -36,7 +35,8 @@ A data store has three methods:
of a transaction into one single flush.

* `close`: closes the connection to persisted instance. This call is
best-effort due to IndexedDB limitations.
best-effort due to technical limitations (e.g. both IndexedDB and WebSQL have
no reliable `close` call).
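
The three methods can be illustrated with a small in-memory sketch. This is not Lovefield's actual `lf.BackStore` implementation: the class name `SketchMemoryStore`, the `put`/`commit` transaction methods, and the synchronous style are all assumptions made to keep the example short (the real interface is asynchronous).

```javascript
// Hypothetical sketch of the init/createTx/close contract; not Lovefield code.
class SketchMemoryStore {
  constructor(tableNames) {
    // One Map of rowId -> row per table.
    this.tables = new Map(tableNames.map((name) => [name, new Map()]));
    this.nextRowId = 0;
    this.open = false;
  }

  // init: "connect", then continue row id generation after the largest id.
  init() {
    let maxId = -1;
    for (const rows of this.tables.values()) {
      for (const id of rows.keys()) {
        maxId = Math.max(maxId, id);
      }
    }
    this.nextRowId = maxId + 1;
    this.open = true;
  }

  // createTx: buffer writes and apply them in one single flush on commit.
  createTx() {
    if (!this.open) {
      throw new Error('store is not initialized');
    }
    const pending = [];
    return {
      put: (table, row) => {
        pending.push({ table, row });
      },
      commit: () => {
        for (const { table, row } of pending) {
          this.tables.get(table).set(row.id, row);
        }
        pending.length = 0;
      },
    };
  }

  // close: best-effort; here it only stops handing out transactions.
  close() {
    this.open = false;
  }
}
```

Buffering writes until `commit` mirrors the "one single flush" behavior described above; a persistent store would perform the actual write inside `commit`.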

### 2.2 Memory Store

Expand Down Expand Up @@ -70,33 +70,18 @@ the logical transaction is committed. As a result, the IndexedDB transaction
object is named `Tx` throughout the code to distinguish from the logical
transaction object.

### 2.4 Firebase Store
#### 2.3.1 Storage Format

Lovefield can sit on top of [Firebase](https://www.firebase.com). Lovefield uses
Firebase as a cloud backstore. As a result, one must follow these three rules
when use Lovefield on top of Firebase:
The data will be stored in IndexedDB using the following hierarchy:

1. All clients accessing the database must use Lovefield.
2. Only one client can be used to create the database.
3. Database upgrade needs to be carried out in a different manner: clients other
than the upgrading one must not be using the database shall there be a
database upgrade.
* The whole database will be stored in an IndexedDB using database's name
* Each table corresponds to an object store
* Each row corresponds to an object in object store. A row contains two fields:
  * id: unique row id across database
  * value: actual payload of the row, each field in this object corresponds to
    a column in schema
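
As a rough illustration of this hierarchy, the sketch below mirrors it with plain objects and makes no IndexedDB calls. The helper name `layoutForIndexedDB` and the sample tables are hypothetical; only the `id`/`value` row shape comes from the description above.

```javascript
// Hypothetical helper: mirrors the storage hierarchy as plain objects.
function layoutForIndexedDB(dbName, tables) {
  let nextRowId = 0; // row ids are unique across the whole database
  const objectStores = {};
  for (const [tableName, payloads] of Object.entries(tables)) {
    // Each table becomes one object store; each row is { id, value }.
    objectStores[tableName] = payloads.map((payload) => ({
      id: nextRowId++,
      value: payload,
    }));
  }
  return { name: dbName, objectStores };
}

const layout = layoutForIndexedDB('crm', {
  Customer: [{ name: 'Ann' }, { name: 'Bob' }],
  Order: [{ customer: 'Ann', total: 12 }],
});
console.log(JSON.stringify(layout, null, 2));
```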

### 2.5 Database Upgrade

The upgrade process is triggered by bumping up the schema version. Lovefield
will open the database using updated number, and in turn IndexedDB will fire the
`onupgradeneeded` event. In that event, Lovefield will

1. create tables (i.e. `objectstores`) which exist in schema but not in
IndexedDB
2. wrap the database connection into `lf.backstore.IndexedDBRawBackstore`
3. call user-provided `onUpgrade` function with the wrapped instance

This design still exposes the user under the risk of accidentally auto-commit.
However, there existed no known better alternative that works cross-browser.

### 2.6 Bundled Mode Experiment
#### 2.3.2 Bundled Mode Experiment

In bundled mode, Lovefield will store rows differently. Internally Lovefield
assigns a unique row id to each logical row. In bundled mode, Lovefield will
Expand Down Expand Up @@ -134,14 +119,55 @@ Users who enabled bundled mode needs to keep the following facts in mind:
* Bundled mode is designed mainly for data tables with 10K+ rows. Smaller
database may experience slower performance by enabling bundle mode. User is
supposed to benchmark and determine if bundled mode is feasible.
* There is no support for converting non-bundled to bundled database, and vice
versa. Manual conversion is possible but will not be easy.
* To convert a non-bundled database to a bundled one, or the other way around:
  * Use `db.export()` to get the JavaScript representation of the whole database
  * Completely delete the original database using
    `window.indexedDB.deleteDatabase()`
  * Recreate the database using `connect()`
  * Use `db.import()` to import the previously exported data
* A bundled database is harder to examine via developer tools. The pages
serialize the payload as a string before storing it. This is done because it
yields much better performance in Chrome (tested on v39.0.2171.36) for large
JSON objects.
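
The conversion steps listed above can be sketched as follows. This is an illustration under assumptions, not a supported utility: `convertDatabase`, `deleteIndexedDb`, and the caller-supplied `reconnect` function are hypothetical names, and closing the old connection before deletion is an added precaution that the list above does not mention.

```javascript
// Wraps the callback-style IndexedDB delete request in a Promise.
function deleteIndexedDb(idbFactory, name) {
  return new Promise((resolve, reject) => {
    const request = idbFactory.deleteDatabase(name);
    request.onsuccess = () => resolve();
    request.onerror = () => reject(request.error);
  });
}

// `reconnect` is a hypothetical caller-supplied function that calls connect()
// on a schema configured with the desired bundled-mode setting.
async function convertDatabase(db, name, idbFactory, reconnect) {
  const snapshot = await db.export(); // 1. export the whole database
  db.close(); // assumption: release the connection before deleting
  await deleteIndexedDb(idbFactory, name); // 2. delete the original database
  const newDb = await reconnect(); // 3. recreate the database via connect()
  await newDb.import(snapshot); // 4. import the previously exported data
  return newDb;
}
```

In a browser, `idbFactory` would be `window.indexedDB`; passing it in as a parameter keeps the sketch testable outside a browser.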

### 2.7 WebSQL Experiment
### 2.4 Firebase Store

Lovefield can sit on top of [Firebase](https://www.firebase.com). Lovefield uses
Firebase as a cloud backstore. As a result, one must follow these three rules
when using Lovefield on top of Firebase:

1. All clients accessing the database must use Lovefield.
2. Only one client can be used to create the database.
3. Database upgrade needs to be carried out in a different manner: clients other
than the upgrading one must not be using the database should there be a
database upgrade (this is typically done using Firebase security control
instead).

#### 2.4.1 Storage Format

For performance reasons, Lovefield stores data in Firebase very differently.
The store is structured like this:

```js
<schema_name>: {
  "@rev": {
    R: <N>,
  },
  "@db": {
    version: <schema version>
  },
  "@table": {
    <table_name>: <table_id>
  },
  <row id 1>: { R: <N1>, T: <T1>, P: <object1> },
  <row id 2>: { R: <N2>, T: <T2>, P: <object2> },
  ...
}
```

R stands for revision, T stands for table id, and P stands for payload. The
names are abbreviated to keep over-the-wire transmission small.
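
To make the layout concrete, the following sketch builds such a structure from plain rows and reads one table back out. It does not use the Firebase SDK. The function names are hypothetical, and stamping every row with the current revision is an assumption about how `R` is maintained.

```javascript
// Hypothetical encoder for the structure stored under <schema_name>.
function encodeFirebaseStore(version, revision, tableIds, rows) {
  const store = {
    '@rev': { R: revision },
    '@db': { version },
    '@table': { ...tableIds },
  };
  for (const row of rows) {
    // Assumption: each row records the revision at which it was written.
    store[row.id] = { R: revision, T: tableIds[row.table], P: row.payload };
  }
  return store;
}

// Collects the rows of one table by filtering on the T (table id) field.
function rowsOfTable(store, tableId) {
  return Object.keys(store)
    .filter((key) => !key.startsWith('@') && store[key].T === tableId)
    .map((key) => ({ id: Number(key), value: store[key].P }));
}

const firebaseStore = encodeFirebaseStore(1, 7, { Customer: 0, Order: 1 }, [
  { id: 0, table: 'Customer', payload: { name: 'Ann' } },
  { id: 1, table: 'Order', payload: { total: 12 } },
  { id: 2, table: 'Customer', payload: { name: 'Bob' } },
]);
console.log(rowsOfTable(firebaseStore, 0));
```

Because all rows live side by side under the schema node, the table id in `T` is what lets a reader separate them again.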

### 2.5 WebSQL Store

As of August, 2015, tests show that Lovefield's IndexedDB backstore could
not be run on Safari 8 or Safari 9 beta. Safari simply throws mysterious DOM
Expand All @@ -151,3 +177,44 @@ to WebKit/Apple but there's no word about ETA of fix.
A WebSQL-based back store is created to fill this gap. The WebSQL backstore
shall be considered a gap-stopping patch and will be removed as soon as Apple
fixes IndexedDB bugs in Safari.

#### 2.5.1 Storage Format

The WebSQL store lays out data in a structure similar to the IndexedDB store:

* The whole database will be stored in a WebSQL instance using database's name
* A special table named `__lf_ver` stores metadata for the whole database
* Each table corresponds to a WebSQL table
* Each row corresponds to a row in WebSQL table. A row contains two fields:
  * id: unique row id across database
  * value: serialized JSON string of the row's payload
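
The only difference from the IndexedDB row shape is that `value` holds a JSON string rather than an object. A minimal sketch of that round trip, using hypothetical helper names and no WebSQL calls:

```javascript
// Hypothetical helpers showing the { id, value } row shape for WebSQL.
function toWebSqlRow(id, payload) {
  // The payload is serialized, so the value column can be a plain string.
  return { id, value: JSON.stringify(payload) };
}

function fromWebSqlRow(record) {
  return { id: record.id, value: JSON.parse(record.value) };
}

const stored = toWebSqlRow(42, { name: 'Ann', tags: ['a', 'b'] });
const restored = fromWebSqlRow(stored);
```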

### 2.6 Database Upgrade

The upgrade process is triggered by bumping up the schema version. Lovefield
will open the database using updated number, and in turn IndexedDB will fire the
`onupgradeneeded` event. In that event, Lovefield will

1. create tables (i.e. `objectstores`) which exist in schema but not in
IndexedDB
2. wrap the database connection into `lf.backstore.IndexedDBRawBackstore`
3. call user-provided `onUpgrade` function with the wrapped instance

This design still exposes the user to the risk of an accidental auto-commit.
However, no better alternative that works cross-browser is known.
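
The three steps of the upgrade handler can be sketched as below. This is a simplified illustration, not Lovefield's code: `handleUpgradeNeeded` and `schemaTables` are hypothetical names, the `keyPath` option is an assumption, and the wrapper is reduced to a plain object instead of `lf.backstore.IndexedDBRawBackstore`.

```javascript
// Intended to be called from an `onupgradeneeded` handler with the
// IDBDatabase taken from `event.target.result`.
function handleUpgradeNeeded(db, schemaTables, onUpgrade) {
  // 1. Create object stores that exist in the schema but not in IndexedDB.
  for (const name of schemaTables) {
    if (!db.objectStoreNames.contains(name)) {
      db.createObjectStore(name, { keyPath: 'id' }); // keyPath is an assumption
    }
  }
  // 2. Wrap the database connection (simplified stand-in for the raw store).
  const rawDb = { getRawDBInstance: () => db };
  // 3. Call the user-provided onUpgrade function with the wrapped instance.
  return onUpgrade(rawDb);
}
```

All of this has to finish while the version-change transaction is still active, which is where the auto-commit risk mentioned above comes from.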

### 2.7 External Change

Contents inside a data store can be changed by other sessions or even other
clients. Most data stores lack external change notifications. As a result,
for cross-tab implementations, one needs to use a Web Worker or Service Worker
to host the database. The Lovefield team is aware of this problem and is working
closely with the IndexedDB team to pursue a change notification standard for
IndexedDB.

Local Storage and Firebase provide change notifications that inform Lovefield
when the contents have been changed by external sources. By default, Lovefield
listens to these changes, and will update

* Database cache
* Indices
* Observed queries
18 changes: 18 additions & 0 deletions docs/dd/03_life_of_db.md
Expand Up @@ -60,6 +60,16 @@ Then, it will gather all the information and determine the next row id to
use for this connection. All the ids are indexed by IndexedDB and theoretically
the scan shall be done in O(N) time where N is the number of tables in schema.


#### 3.2.3 Firebase Initialization

For Firebase initialization, Lovefield first attempts to read `@db/version` and
`@rev/R` to obtain the database version and the change revision. When there is
a version mismatch, Lovefield will call your `onUpgrade` handler, although the
name is somewhat misleading here. In the case of Firebase, a mismatch typically
means that the user is running cached JavaScript in the browser, and what you
really want is to have them refresh the session and reload the updated code.
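
A minimal sketch of that check, operating on a plain object shaped like the stored data rather than on the Firebase SDK. The function name `readFirebaseState` and the `needsUpgrade` flag are hypothetical.

```javascript
// Reads @db/version and @rev/R from a plain snapshot of the stored data.
function readFirebaseState(store, expectedVersion) {
  const version = store['@db'].version;
  const revision = store['@rev'].R;
  // A mismatch is the condition under which onUpgrade would be called.
  return { version, revision, needsUpgrade: version !== expectedVersion };
}

const state = readFirebaseState({ '@db': { version: 2 }, '@rev': { R: 15 } }, 1);
if (state.needsUpgrade) {
  console.log('Schema version mismatch: ask the user to reload the page.');
}
```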

#### 3.2.3 Service Initialization

Object instances of the cache (`lf.cache.DefaultCache`), query engine
Expand Down Expand Up @@ -99,6 +109,14 @@ during database initialization, which is not optimal especially for large data
sets. In the future, Lovefield plans to implement an MRU-based lazy-load cache
that loads data in the background on demand.

#### 3.2.5 Special Handling for Firebase

For Firebase, prefetching data will actually trigger Firebase to load data
over the wire during the initialization of the database. This is generally not
a problem since Firebase.js may already have that data. If you have a large
amount of data, you will need to fine-tune your code and the Firebase
server-side settings to overcome this issue.

### 3.3 Life of Query

Once the database is fully initialized, it can start accepting queries. The life
Expand Down