Chainsformer is an Apache Arrow Flight service built on top of ChainStorage as a stateless adaptor service. It currently supports batch data processing and micro-batch data streaming from the ChainStorage service to the Spark data processing platform.
It aims to provide a set of easy-to-use interfaces that let Spark consumers read and process ChainStorage data on the Spark platform:
- It defines a set of standardized block and transaction data schemas for each asset class (e.g., EVM assets or Bitcoin).
- It provides data transformation capability from protobuf to Arrow format.
- It can be easily scaled up to support higher data throughput.
- It can be easily integrated via the Chainsformer Spark Connector (https://github.com/coinbase/chainsformer-spark-source) for structured data streaming.
Make sure your local Go version is 1.18 and your protobuf version is 3.21.12 by running the following commands:
brew install go@1.18
brew unlink go
brew link go@1.18
brew install protobuf@3.21.12
brew unlink protobuf
brew link protobuf
To set up for the first time (only done once):
make bootstrap
Rebuild everything:
make build
Chainsformer depends on the following environment variables to resolve the path of the configuration.
The directory structure is as follows: config/chainsformer/{blockchain}/{network}/{environment}.yml.
- CHAINSFORMER_CONFIG: This env var, in the format of {blockchain}-{network}, determines the blockchain and network managed by the service. The naming is defined in chainstorage/protos/coinbase/c3/common/common.proto.
- CHAINSFORMER_ENVIRONMENT: This env var controls the {environment} in which the service is deployed. Possible values include production, development, and local (which is also the default value).
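As a rough illustration of how the two variables map onto the directory structure, the sketch below assembles the path in shell. The function name chainsformer_config_path is hypothetical, splitting {blockchain}-{network} on the first hyphen is an assumption, and the ethereum-mainnet fallback is borrowed from the make server step; the service's actual resolution logic may differ.

```shell
#!/usr/bin/env bash
# Sketch only: chainsformer_config_path is a hypothetical helper, not part of the repo.
# Usage: chainsformer_config_path [config] [environment]
# Arguments fall back to the environment variables, then to assumed defaults.
chainsformer_config_path() {
  local config="${1:-${CHAINSFORMER_CONFIG:-ethereum-mainnet}}"
  local environment="${2:-${CHAINSFORMER_ENVIRONMENT:-local}}"
  # Assumption: the value is split on the first hyphen.
  local blockchain="${config%%-*}"   # text before the first hyphen
  local network="${config#*-}"       # text after the first hyphen
  printf 'config/chainsformer/%s/%s/%s.yml\n' "$blockchain" "$network" "$environment"
}

# Resolve the path for the current shell environment.
chainsformer_config_path
```

With both variables unset, the call prints the path for ethereum-mainnet in the local environment.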
Asset-specific configurations are stored in the config directory under the Chainsformer service repo. The config folder structure takes the form ./config/chainsformer/{blockchain}/{network}/base.yml
- Follow the config folder structure to add new configurations for new blockchains or for new networks of existing blockchains.
- Add new tests in config_test.go.
- Add new test configs in testapp.go.
Clone the Chainsformer service repo:
git clone https://github.com/coinbase/chainsformer.git
Change directory to the Chainsformer service repo:
cd chainsformer
Set up the ChainStorage SDK credentials:
export CHAINSTORAGE_SDK_AUTH_HEADER=cb-nft-api-token
export CHAINSTORAGE_SDK_AUTH_TOKEN=****
To set up Chainsformer for the first time (only done once):
make bootstrap
Rebuild Chainsformer:
make build
Start the Chainsformer service with the default CHAINSFORMER_CONFIG=ethereum-mainnet:
make server
Query Chainsformer for a range of blocks
go run ./cmd/client --env local --blockchain ethereum --network mainnet --start 0 --end 10 --table blocks
Query Chainsformer for a range of block events
go run ./cmd/client --env local --blockchain ethereum --network mainnet --start 0 --end 10 --table streamed_blocks
Calling the GetSchema API
cmd=$(echo -n '{"table": "blocks"}' | base64)
grpcurl --plaintext -d '{"cmd":'"\"$cmd\""',"type":2}' localhost:9090 arrow.flight.protocol.FlightService.GetSchema
Calling the GetFlightInfo API to partition the data
cmd=$(echo -n '{"batch_query": {"start_height": 0, "end_height": 10, "table": "blocks"}}' | base64)
grpcurl --plaintext -d '{"cmd":'"\"$cmd\""',"type":2}' localhost:9090 arrow.flight.protocol.FlightService.GetFlightInfo
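The request body used above is a Flight descriptor whose cmd field holds the base64-encoded JSON query and whose type is 2 (the CMD descriptor type in Arrow Flight). The sketch below wraps that encoding in a small helper; flight_descriptor is a hypothetical name, not something the repo provides. It also strips newlines from the base64 output, because some base64 implementations (GNU coreutils, for example) wrap long output at 76 characters, which would break the JSON string.

```shell
#!/usr/bin/env bash
# Sketch only: flight_descriptor is a hypothetical helper, not part of the repo.
# It prints the JSON body that grpcurl sends for GetSchema or GetFlightInfo.
flight_descriptor() {
  local query_json="$1"
  local cmd
  # Encode the query and remove any line wrapping added by base64.
  cmd=$(printf '%s' "$query_json" | base64 | tr -d '\n')
  printf '{"cmd":"%s","type":2}\n' "$cmd"
}

# Print the descriptor for the batch query shown above.
flight_descriptor '{"batch_query": {"start_height": 0, "end_height": 10, "table": "blocks"}}'

# With a running local server, the output can be passed straight to grpcurl:
# grpcurl --plaintext -d "$(flight_descriptor '{"table": "blocks"}')" \
#   localhost:9090 arrow.flight.protocol.FlightService.GetSchema
```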
Take one of the tickets returned by the above command
...
"endpoint": [
  {
    "ticket": {
      "ticket": "eyJiYXRjaF9xdWVyeSI6eyJlbmRfaGVpZ2h0IjoiMTAiLCJ0YWJsZSI6ImJsb2NrcyJ9fQ=="
    }
  }
]
...
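A ticket in this output is itself base64-encoded JSON, so it can be decoded to see which partition it describes. The sketch below is one way to do that; decode_ticket is a hypothetical helper name, and the --decode flag is assumed to be available in the local base64 tool (it is in GNU coreutils and macOS).

```shell
#!/usr/bin/env bash
# Sketch only: decode_ticket is a hypothetical helper, not part of the repo.
# It prints the JSON query embedded in a ticket string.
decode_ticket() {
  printf '%s' "$1" | base64 --decode
  printf '\n'
}

# Decode the sample ticket from the output above.
decode_ticket 'eyJiYXRjaF9xdWVyeSI6eyJlbmRfaGVpZ2h0IjoiMTAiLCJ0YWJsZSI6ImJsb2NrcyJ9fQ=='
```

For the sample ticket this prints {"batch_query":{"end_height":"10","table":"blocks"}}. The start height does not appear in the decoded JSON, presumably because a value of 0 is omitted during serialization.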
Calling the DoGet API to get data for one of the partitions
grpcurl --plaintext -d '{"ticket": "eyJiYXRjaF9xdWVyeSI6eyJlbmRfaGVpZ2h0IjoiMTAiLCJ0YWJsZSI6ImJsb2NrcyJ9fQ=="}' localhost:9090 arrow.flight.protocol.FlightService.DoGet
Calling the DoGet API to get data of a specific partition
cmd=$(echo -n '{"batch_query":{"start_height":"1", "end_height":"2", "table":"blocks"}}' | base64)
grpcurl --plaintext -d '{"ticket": '"\"$cmd\""'}' localhost:9090 arrow.flight.protocol.FlightService.DoGet
Calling the DoAction API to get the tip in ChainStorage via Chainsformer
grpcurl --plaintext -d '{"type": "TIP"}' localhost:9090 arrow.flight.protocol.FlightService.DoAction | jq '.body | @base64d'
Calling the GetSchema API
cmd=$(echo -n '{"table": "streamed_blocks"}' | base64)
grpcurl --plaintext -d '{"cmd":'"\"$cmd\""',"type":2}' localhost:9090 arrow.flight.protocol.FlightService.GetSchema
Calling the GetFlightInfo API to partition the data
cmd=$(echo -n '{"stream_query": {"start_sequence": 0, "end_sequence": 10, "table": "streamed_blocks"}}' | base64)
grpcurl --plaintext -d '{"cmd":'"\"$cmd\""',"type":2}' localhost:9090 arrow.flight.protocol.FlightService.GetFlightInfo
Take one of the tickets returned by the above command
...
"endpoint": [
  {
    "ticket": {
      "ticket": "eyJzdHJlYW1fcXVlcnkiOnsic3RhcnRfc2VxdWVuY2UiOiIxIiwiZW5kX3NlcXVlbmNlIjoiMTAiLCJ0YWJsZSI6InN0cmVhbWVkX2Jsb2NrcyJ9fQ=="
    }
  }
]
...
Calling the DoGet API to get data for one of the partitions
grpcurl --plaintext -d '{"ticket": "eyJzdHJlYW1fcXVlcnkiOnsic3RhcnRfc2VxdWVuY2UiOiIxIiwiZW5kX3NlcXVlbmNlIjoiMTAiLCJ0YWJsZSI6InN0cmVhbWVkX2Jsb2NrcyJ9fQ=="}' localhost:9090 arrow.flight.protocol.FlightService.DoGet
Calling the DoGet API to get data of a specific partition
cmd=$(echo -n '{"stream_query":{"start_sequence":"1", "end_sequence":"2", "table":"streamed_blocks"}}' | base64)
grpcurl --plaintext -d '{"ticket": '"\"$cmd\""'}' localhost:9090 arrow.flight.protocol.FlightService.DoGet
Calling the DoAction API to get the stream tip in ChainStorage via Chainsformer
grpcurl --plaintext -d '{"type": "STREAM_TIP"}' localhost:9090 arrow.flight.protocol.FlightService.DoAction | jq '.body | @base64d'
# Run everything
make test
Under development
Under development