chainweb-data stores and serves data from the Kadena Public Blockchain in a
form optimized for lookups and analysis by humans. With this reindexed data we
can easily determine mining statistics and confirm transaction contents.
chainweb-data requires Postgres configured with UTF8 encoding. If you plan to host a chainweb-data instance
on a cloud machine (e.g. Amazon EC2), we recommend that you run the Postgres
instance on an instance-attached storage unit.
The chainweb-node instance should have good performance to avoid timeouts during fill operations; use SSD storage for the database.
chainweb-data can be built with cabal, stack, or
Nix. Building with Nix is the most
predictable because it is a single build command once you’ve installed Nix.
This process will go significantly faster if you set up the Kadena nix cache
as described
here.
Building with cabal or stack will probably require a little more knowledge
of the Haskell ecosystem.

    git clone https://github.com/kadena-io/chainweb-data
    cd chainweb-data
    nix-build
By default, chainweb-data will attempt to connect to Postgres via the
following values:
| Field | Value |
|---|---|
| Host | localhost |
| Port | 5432 |
| User | postgres |
| Pass | Empty |
| DB Name | postgres |
You can alter these defaults via command line flags, or via a Postgres Connection String.
Assuming you have set up Postgres, run `createdb chainweb-data`, and
configured permissions for a user named `joe`, either of the following connects
to a local Postgres database on port 5432:

    chainweb-data <command> --service-host=<node> --dbuser=joe --dbname=chainweb-data
    chainweb-data <command> --service-host=<node> --dbstring="host=localhost port=5432..."
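To make the relationship between the individual flags and a connection string more concrete, here is a minimal Python sketch that assembles a libpq-style keyword/value string from the defaults in the table above. The helper names `quote_value` and `build_dbstring` are hypothetical and not part of chainweb-data. The keywords (`host`, `port`, `user`, `password`, `dbname`) are the standard libpq ones, and it is an assumption that the resulting string is passed unchanged via `--dbstring`.

```python
# Defaults from the table above, using libpq keyword names.
DEFAULTS = {
    "host": "localhost",
    "port": 5432,
    "user": "postgres",
    "password": "",
    "dbname": "postgres",
}


def quote_value(value):
    """Quote a value the way libpq expects in keyword/value strings."""
    text = str(value)
    needs_quotes = text == "" or any(ch.isspace() or ch in "'\\" for ch in text)
    if not needs_quotes:
        return text
    # Backslashes and single quotes are escaped with a backslash.
    escaped = text.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def build_dbstring(**overrides):
    """Merge overrides into the defaults and render a connection string."""
    settings = {**DEFAULTS, **overrides}
    # Empty settings (such as the default empty password) are left out.
    return " ".join(
        f"{key}={quote_value(value)}"
        for key, value in settings.items()
        if value != ""
    )
```

For instance, `build_dbstring(user="joe", dbname="chainweb-data")` yields `host=localhost port=5432 user=joe dbname=chainweb-data`.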
chainweb-data syncs its data from a running chainweb-node. The node’s
service address is specified with the `--service-host` option.
If a custom service endpoint port is used, you can specify it with `--service-port`.

    chainweb-data <command> --service-host=foo.chainweb.com ...
chainweb-data also needs some special node configuration. The `server`
command needs `headerStream`, and some of the other stats and information it makes
available require `allowReadsInLocal`. Increasing the throttling settings on
the node also makes the `backfill` and `gaps` operations dramatically faster.

    chainweb:
      allowReadsInLocal: true
      headerStream: true
      throttling:
        global: 1000
You can find an example node config in this repository in [node-config-for-chainweb-data.yaml](node-config-for-chainweb-data.yaml).
When running chainweb-data for the first time, you should run
`chainweb-data server -m` (with the necessary DB and node options, of course). This will create
the database and start filling it with blocks. Wait a couple of minutes,
then run `chainweb-data fill` (again with the necessary options). After the
`fill` operation finishes, you can run `server` again with the `-f` option and it
will automatically run `fill` once a day to populate the DB with missing blocks.
`listen` fetches live data from a chainweb-node whose `headerStream`
configuration value is `true`.

    > chainweb-data listen --service-host=foo.chainweb.com --dbuser=joe --dbname=chainweb-data
    DB Tables Initialized
    28911337084492566901513774
As a new block comes in, its chain number is printed as a single digit.
listen will continue until you stop it.
server is just like listen, but also runs an HTTP server that serves a
few endpoints for doing common queries.
Additionally, it can serve an OpenAPI v3 spec of the API when the hidden
--serve-swagger-ui option is enabled, offering a basic interface for interacting
with the API. This feature, however, is kept unofficial for now due to
its rudimentary documentation.
By specifying the optional --no-listen argument, the server can be made read-only,
allowing multiple servers to serve from the same database.
- `/txs/recent` gets a list of recent transactions.
- `/txs/search?search=foo&limit=20&offset=40&minheight=100&maxheight=200` searches for transactions containing the string `foo` or the provided transaction pact id, with the additional option to filter results based on block height.
- `/txs/tx?requestkey=<request-key>` gets the details of a transaction with the given request key.
- `/txs/txs?requestkey=<request-key>` is the same as `/txs/tx`, but returns a list of transactions, which allows the client to handle multiple appearances due to orphans.
- `/txs/events?search=foo&limit=20&offset=40&minheight=100&maxheight=200` searches for transaction events containing the string `foo`, and allows for results to be filtered by block height. It also offers pagination with the `limit` and `offset` parameters.
- `/stats` returns a few stats such as transaction count and coins in circulation.
- `/coins` returns just the coins in circulation.
- `/txs/account/<account-identifier>?token=coin&chainid=12&minheight=100&maxheight=200&limit=20&offset=40` provides transactions related to the specified account identifier. It includes additional options to filter results based on the token name, chain ID, and block height, as well as pagination controls via the `limit` and `offset` parameters.
For more detailed information, see the API definition here.
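As an illustration of how the query parameters listed above combine, the following Python sketch builds request URLs for two of the endpoints. The base URL (including port 8080) and the function names are assumptions made for the example; substitute the address of your own chainweb-data server.

```python
from urllib.parse import quote, urlencode

# Hypothetical address of a chainweb-data server; adjust to your deployment.
BASE_URL = "http://localhost:8080"


def _query(params):
    """Encode only the parameters that were actually provided."""
    return urlencode({k: v for k, v in params.items() if v is not None})


def search_url(base, search, limit=None, offset=None,
               minheight=None, maxheight=None):
    """Build a /txs/search URL with optional paging and height filters."""
    query = _query({
        "search": search,
        "limit": limit,
        "offset": offset,
        "minheight": minheight,
        "maxheight": maxheight,
    })
    return f"{base}/txs/search?{query}"


def account_url(base, account, token=None, chainid=None, limit=None):
    """Build a /txs/account/<account-identifier> URL with optional filters."""
    # The account identifier is a path segment, so it is percent-encoded.
    path = f"{base}/txs/account/{quote(account, safe='')}"
    query = _query({"token": token, "chainid": chainid, "limit": limit})
    return f"{path}?{query}" if query else path
```

Building URLs through `urlencode` rather than string concatenation keeps search terms with spaces or special characters from corrupting the query string.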
All of chainweb-data’s search endpoints (`/txs/{events,search,account}`) support a common workflow
for efficiently retrieving the results of a given search in non-overlapping batches.
A request to any of these endpoints that matches more rows than the number requested with the `limit`
query parameter will receive a response with a `Chainweb-Next` header containing a token. That token
can be used to call the same endpoint with the same query parameters, plus the token passed in via
the `next` query parameter, in order to retrieve the next batch of results.
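The batching workflow can be sketched in Python as follows. This is only an illustration: `iter_batches` and `http_fetcher` are hypothetical helpers, and it is an assumption that the response body is JSON and that a missing `Chainweb-Next` header means there are no further batches.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def iter_batches(fetch_page, params):
    """Yield successive batches, following Chainweb-Next tokens.

    fetch_page(query) is supplied by the caller and returns a pair
    (rows, next_token), where next_token is the value of the
    Chainweb-Next response header, or None when it is absent.
    """
    query = dict(params)
    while True:
        rows, token = fetch_page(query)
        yield rows
        if not token:
            return
        # Same query parameters as before, plus the token as `next`.
        query = {**params, "next": token}


def http_fetcher(endpoint_url):
    """Return a fetch_page function that calls a real search endpoint."""
    def fetch_page(query):
        with urlopen(f"{endpoint_url}?{urlencode(query)}") as resp:
            # Assumes the endpoint answers with a JSON body.
            return json.load(resp), resp.headers.get("Chainweb-Next")
    return fetch_page
```

A caller could then write `for batch in iter_batches(http_fetcher(url), {"search": "foo", "limit": 20}): ...`, where `url` points at one of the search endpoints. Keeping the HTTP call behind `fetch_page` makes the loop easy to exercise without a running server.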
chainweb-data supports a `Chainweb-Execution-Strategy` request header that can be used (probably by
chainweb-data operators, by setting it in the API gateway) to enable
an upper bound on the amount of time the server will spend searching for results. Normally, the
search endpoints will produce `limit`-many results if the search matches at least that many
entries. However, if `Chainweb-Execution-Strategy: Bounded` is passed in, the response can contain
fewer than `limit` rows even though there are potentially more matches, if those matches aren’t found
quickly enough. In such a case, the returned `Chainweb-Next` token will act as a cursor for the search,
so it’s possible to keep searching by making successive calls with subsequent `Chainweb-Next` tokens.
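A client that talks to a server using the bounded strategy has to tolerate short, possibly empty, batches. The Python sketch below shows one way to accumulate results under that assumption. The function `collect_rows`, its `max_requests` safety cap, and the shape of `fetch_page` are all hypothetical; the sketch also sets the header on the client side purely for illustration, whereas in practice it may be set by an API gateway.

```python
BOUNDED = {"Chainweb-Execution-Strategy": "Bounded"}


def collect_rows(fetch_page, params, wanted, max_requests=50):
    """Gather up to `wanted` rows across bounded search responses.

    fetch_page(query, headers) is supplied by the caller and returns
    (rows, next_token), where next_token is the Chainweb-Next header
    value or None when the header is absent.
    """
    rows = []
    query = dict(params)
    for _ in range(max_requests):
        batch, token = fetch_page(query, BOUNDED)
        rows.extend(batch)
        # Stop once enough rows were found or the cursor is exhausted.
        if len(rows) >= wanted or not token:
            break
        # A short batch with a token means "keep searching from here".
        query = {**params, "next": token}
    return rows[:wanted]
```

The `max_requests` cap is a design choice for the example: without it, a search with very sparse matches could keep a client looping for a long time.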
`fill` fills in missing blocks. This command used to be called `gaps`, but it has
been improved to encompass all block filling operations.

    > chainweb-data fill --service-host=foo.chainweb.com --dbuser=joe --dbname=chainweb-data
Deprecated: The backfill command is deprecated and will be removed in future
releases. Use the fill command instead.
backfill rapidly fills the database downward from the lowest block height it
can find for each chain.
Note: If your database is empty, you must fetch at least one block for each
chain first via listen before doing backfill! If backfill detects any
empty chains, it won’t proceed.

    > chainweb-data backfill --service-host=foo.chainweb.com --dbuser=joe --dbname=chainweb-data
    DB Tables Initialized
    Backfilling...
    [INFO] Processed blocks: 1000. Progress sample: Chain 9, Height 361720
    [INFO] Processed blocks: 2000. Progress sample: Chain 4, Height 361670
backfill will stop when it reaches height 0.
backfill-transfers fills entries in the transfers table from the highest block
height it can find for each chain up until the height that events for coinbase
transfers began to exist.
Note: If the transfers table is empty, you must fetch at least one row for each
chain first via listen before doing backfill-transfers! If backfill-transfers detects any
empty chains, it won’t proceed.
Deprecated: The gaps command is deprecated and will be removed in future
releases. Use the fill command instead.
`gaps` fills in blocks that may have been missed during `listen` or
`backfill`. Such gaps will naturally occur if you turn `listen` off or use
`single`.

    > chainweb-data gaps --service-host=foo.chainweb.com --dbuser=joe --dbname=chainweb-data
    DB Tables Initialized
    [INFO] Processed blocks: 1000. Progress sample: Chain 9, Height 361624
    [INFO] Processed blocks: 2000. Progress sample: Chain 9, Height 362938
    [INFO] Filled in 2113 missing blocks.
`single` allows you to sync a block at any location in the blockchain.

    > chainweb-data single --chain=0 --height=200 --service-host=foo.chainweb.com --dbuser=joe --dbname=chainweb-data
    DB Tables Initialized
    [INFO] Filled in 1 blocks.
Note: Even though you specified a single chain/height pair, you might see it report that it filled in more than one block. This is expected, and will occur when orphans/forks are present at that height.
migrate allows you to migrate the database schema to the latest version and exit.
This can be useful for separating the migration step from running the ETL and/or HTTP service.

    > chainweb-data migrate --dbuser=joe --dbname=chainweb-data
`check-schema` is used to perform a check of the ORM definitions against the DB schema.

    > chainweb-data check-schema --service-host=foo.chainweb.com --dbuser=joe --dbname=chainweb-data
A common use case for chainweb-data is to run it primarily as a worker process that
populates a Postgres database with blockchain data. In this case, chainweb-data
operators often want to run their own schema migrations and modify the schema according
to their needs. Because such schema changes can be arbitrary, we cannot guarantee unimpeded operation of chainweb-data.
Any node operator who wishes to modify the
database takes on the responsibility of ensuring that their changes
do not interfere with the current operation of chainweb-data.
A node operator is also responsible for considering how their changes will interact with future releases of chainweb-data.
chainweb-data provides a way to help with this process.
Any version of chainweb-data comes with a set of schema migrations included in the
binary that are applied by default to the database at migration time. These migrations
are defined in the haskell-src/db-schema/migrations directory. It is possible to override
these migrations by calling chainweb-data with the optional --migrations-folder argument.
However, in order to add migrations to the
default set, an --extra-migrations-folder argument is provided.
The default migrations that come with chainweb-data have the following file
name format: X.Y.Z.N_NAME.sql. Here X.Y.Z is the version of chainweb-data after which the
migration was introduced. N is the migration number. These migrations are executed
in “alphabetical order” considering X,Y,Z and N to be the elements by which they are sorted.
The migration procedure will fetch the already executed migrations from the database and check them against the migrations provided through the --extra-migrations-folder and the --migrations-folder arguments.
If the already executed migrations are a prefix (i.e. they were run in the correct order and have no gaps or extras)
of the expected migrations, then the rest of the migrations will be executed.
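The prefix rule can be expressed compactly. The Python sketch below is an approximation of the check described above, not chainweb-data's actual implementation: the function name is hypothetical, the file names are made up, and raising an error on a mismatch is an assumption about how a failure would be reported.

```python
def pending_migrations(executed, expected):
    """Return the migrations that still need to run.

    `executed` lists the migrations already recorded in the database and
    `expected` lists all migrations that should exist, both in execution
    order. The check passes only if `executed` is a prefix of `expected`.
    """
    executed = list(executed)
    expected = list(expected)
    if executed != expected[:len(executed)]:
        raise ValueError(
            "executed migrations are not a prefix of the expected migrations"
        )
    # Everything after the common prefix has not been applied yet.
    return expected[len(executed):]


# Made-up file names following the X.Y.Z.N_NAME.sql format.
EXPECTED = ["2.2.0.1_first.sql", "2.2.0.2_second.sql", "2.3.0.0.1_custom.sql"]
```

With these names, `pending_migrations(EXPECTED[:1], EXPECTED)` returns the last two entries, while a list that skips `2.2.0.1_first.sql` or contains an unknown extra entry is rejected.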
By taking advantage of this alphabetical sorting, chainweb-data operators can insert custom migrations to be executed at the moment they desire.
For example, the migrations that ship with version 2.3.0 of chainweb-data are applied on top of version 2.2.0 and are therefore named `2.2.0.N_...` (N>=1). A custom migration named `2.3.0.0.N_...` is guaranteed to run after the new migrations that come with version 2.3.0 and before the new migrations of future versions, whose names are guaranteed to be `2.3.0.1_...` or greater.
Likewise, chainweb-data operators that run the latest commit
from the master branch can also inject their migrations. For example, if the latest commit
has the last migration named 2.2.0.1_..., then their migrations can be named 2.2.0.1.N_....
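The ordering rule can be illustrated with a short Python sketch. It assumes that the part of the file name before the first underscore is compared component by component as integers, with a shorter sequence sorting before a longer one that extends it. The text above describes the order only informally, so treat this as an approximation rather than chainweb-data's actual comparison. All file names are made up.

```python
def migration_key(filename):
    """Turn 'X.Y.Z.N_NAME.sql' into a tuple of integers for sorting."""
    version, _, _name = filename.partition("_")
    return tuple(int(part) for part in version.split("."))


def sort_migrations(filenames):
    """Sort migration file names by their numeric version components."""
    return sorted(filenames, key=migration_key)


# Made-up names: shipped migrations plus two custom ones.
EXAMPLE = [
    "2.3.0.1_future.sql",        # a migration of some later release
    "2.3.0.0.1_custom.sql",      # custom, meant to run after release 2.3.0
    "2.2.0.2_shipped_b.sql",
    "2.2.0.1.1_custom_head.sql", # custom, injected right after 2.2.0.1
    "2.2.0.1_shipped_a.sql",
]
```

Under these assumptions, `sort_migrations(EXAMPLE)` places `2.2.0.1.1_custom_head.sql` directly after `2.2.0.1_shipped_a.sql`, and `2.3.0.0.1_custom.sql` after every `2.2.0.N` migration but before `2.3.0.1_future.sql`.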
It’s important to note that running chainweb-data from an unreleased commit of the
master branch is **not officially supported**. Even though we aim to avoid it, we may change
new migrations on the master branch without notice, so you may have to fix your database
manually by undoing migrations and removing `schema_migrations` entries.
chainweb-data operators that specialize their database schema are strongly advised to review
the incoming migrations **before** they upgrade their chainweb-data versions. This will allow
them to detect any potential conflicts and insert new schema migrations to be executed at the right moment, to
accommodate the incoming changes.