This POC aims to demonstrate how to secure HBase coprocessors for industrial use (especially in multitenant environments).
It is based on HBase version 1.2.3.
Coprocessors are a very powerful mechanism: they allow us to run "Map-Reduce"-like tasks for "real-time" applications.
But coprocessors are dangerous for several reasons. These reasons are summarized well in the HBase Book:
"Coprocessors are an advanced feature of HBase and are intended to be used by system developers only."
The term "system developer" is a consensus within the HBase team between people who want to open coprocessors to HBase users and others who want to keep coprocessors for HBase developers.
Thanks to @nkeywal for this explanation.
Here we consider that coprocessors are used for business logic on top of HBase, that they don't need to know HBase's low-level inner workings, and that they often run in a multitenant environment.
The identified coprocessor issues are:
- Can crash region servers: an exception (other than IOException) brings down the region server.
- Can break down the cluster in case of a bad request: the client retry mechanism (which is good in itself) can propagate a failure and consume resources for nothing.
- Can hog a lot of memory/CPU: long-running, memory-heavy work in the HBase JVM can slow down other HBase features.
- Comes without metrics: there are no metrics for custom coprocessors in the HBase API.
- Can break the security configuration by bypassing other coprocessors.
- Can break down the cluster in case of load failures.
- Comes without process isolation: coprocessors are executed in the RegionServer JVM.
- API may change between minor HBase versions: interfaces are still in the @InterfaceStability.Evolving state.
One common solution is to write defensive code, but that is a heavy process to set up in industry (reviews, tests, etc.).
The purpose of this project is to apply custom policies to HBase coprocessors in order to fix the common issues identified above.
The table below summarizes, for each issue, the state of the proposed solution:
Problem | Solution |
---|---|
Can crash region servers | FOUND / IMPLEMENTED |
Can break down the cluster in case of a bad request | FOUND / IMPLEMENTED |
Can hog a lot of memory/CPU | FOUND / PARTIALLY |
Comes without metrics | FOUND / IMPLEMENTED |
Can break coprocessor chains (bypass/complete) | FOUND / IMPLEMENTED |
Can break down the cluster in case of load failures | FOUND / N/A |
Comes without process isolation | FOUND / TO IMPLEMENT |
API may change between minor HBase versions | N/A |
These solutions are certainly not perfect, but they try to give a pragmatic answer to those issues.
Use your favorite design pattern, the proxy, to make sure that every method call is wrapped by a policy verifier.
Pros: easy, low overhead. Cons: intrusive, not possible on existing coprocessors, needs to be applied at compile time.
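A minimal sketch of the proxy approach using the JDK's `java.lang.reflect.Proxy`. `RegionHook` is a hypothetical stand-in for a coprocessor interface (a real `RegionObserver` has many more hooks), and `ThrowablePolicy` is a hypothetical policy verifier implementing the "catch Throwable and rethrow it as an IOException" rule. Note that `java.lang.reflect.Proxy` can only proxy interfaces.

```java
import java.io.IOException;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

public class PolicyProxyDemo {

    /** Hypothetical stand-in for a coprocessor interface (a real RegionObserver has many more hooks). */
    public interface RegionHook {
        String preGet(String row) throws IOException;
    }

    /** Policy verifier: forwards every call and turns any non-IOException failure into an IOException. */
    public static final class ThrowablePolicy implements InvocationHandler {
        private final Object target;

        public ThrowablePolicy(Object target) {
            this.target = target;
        }

        @Override
        public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                Throwable cause = e.getCause();
                if (cause instanceof IOException) {
                    throw cause; // already a failure the region server reports to the client
                }
                // RuntimeException, Error, ...: wrap it so it does not abort the region server
                throw new IOException("Coprocessor failure in " + method.getName(), cause);
            }
        }
    }

    /** Returns a proxy of iface whose calls all go through ThrowablePolicy. */
    @SuppressWarnings("unchecked")
    public static <T> T wrap(Class<T> iface, T target) {
        return (T) Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, new ThrowablePolicy(target));
    }

    public static void main(String[] args) {
        RegionHook buggy = row -> {
            throw new NullPointerException("bug in business logic");
        };
        RegionHook safe = wrap(RegionHook.class, buggy);
        try {
            safe.preGet("row-1");
        } catch (IOException e) {
            System.out.println("Translated to IOException, cause: " + e.getCause());
        }
    }
}
```

The business code stays unaware of the policy; the price is that someone must call `wrap` explicitly, which is what makes this approach intrusive.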
Use a Java agent to enhance all HBase coprocessor hosts: for each instantiated coprocessor, create a dynamic proxy which applies the policies.
Pros: also covers existing coprocessors, without recompiling them. Cons: hard to implement, larger load-time overhead, HBase bytecode modification.
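A skeleton of the agent approach, assuming the JVM is started with `-javaagent` (as in the installation steps) and that the agent jar's manifest declares a `Premain-Class` entry. The hypothetical `HostTransformer` only detects HBase classes whose name ends in `CoprocessorHost` (for example `org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost`); the actual bytecode rewriting, which would typically rely on a library such as Javassist or ASM, is left as a placeholder.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class CoprocessorPolicyAgent {

    /** Spots HBase coprocessor hosts; a real agent would rewrite their bytecode at this point. */
    public static final class HostTransformer implements ClassFileTransformer {
        public final List<String> seenHosts = new CopyOnWriteArrayList<>();

        @Override
        public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain, byte[] classfileBuffer) {
            // className uses the JVM internal form, e.g. org/apache/hadoop/hbase/regionserver/RegionCoprocessorHost
            if (className != null
                    && className.startsWith("org/apache/hadoop/hbase/")
                    && className.endsWith("CoprocessorHost")) {
                seenHosts.add(className);
                // Placeholder: inject code so that each loaded coprocessor is wrapped in a policy proxy.
            }
            return null; // null tells the JVM to keep the original bytecode
        }
    }

    /** Called by the JVM before main() when started with -javaagent:<this jar>. */
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new HostTransformer());
    }
}
```

Because the hook sits in the hosts rather than in each coprocessor, the policies apply to every coprocessor the host instantiates.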
Solutions for each identified issue:
- Can crash region servers: create a policy that catches Throwable and rethrows it as an IOException (or a subclass of it).
- Can break down the cluster in case of a bad request: create a policy that implements a retry limit (region server side) based on input queries. TODO: implement an HBase cluster-wide failure cache (maybe based on an HBase table?).
- Can hog a lot of memory/CPU: create a policy that implements a timeout logic. TODO: create a policy that profiles the memory used by executions at runtime.
- Comes without metrics: create a logger policy and a metrics policy based on Hadoop metrics2.
- Can break the security configuration by bypassing other coprocessors: create a policy that wraps ObserverContext and throws an exception when the bypass and/or complete methods are called.
- Can break down the cluster in case of load failures: set hbase.coprocessor.abortonerror = false. This avoids bringing down the entire RegionServer: only the incriminated coprocessors are unloaded from their tables.
- Comes without process isolation: TODO: create, in another PolicyVerifier, a separate process that communicates through a pipe with a daemonized instance of the coprocessor.
  This solution comes with a high overhead (+Security, -Complexity, -Performance).
  It should only be compatible with the CoprocessorService interface, since the other interfaces contain non-serializable fields.
  It should also fix another HBase coprocessor issue in multitenant environments: a coprocessor can access/modify the in-memory data of other coprocessors.
- API may change between minor HBase versions: I'm not sure there is a real solution for that; just take care about it before implementing a coprocessor. Note that endpoint coprocessors are not really impacted by this issue, since their interface is based on Protobuf.
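A minimal sketch of the retry-limit policy, assuming each incoming query can be reduced to a string key (how that key is derived, for instance from a hash of the request, is an assumption). Once a key has failed `maxFailures` times on this region server, further attempts are rejected immediately, so client retries stop consuming resources. `RetryLimitPolicy` and `Call` are hypothetical names; the failure map is unbounded and purely local, so a real implementation would need eviction, and the cluster-wide cache from the TODO is not covered.

```java
import java.io.IOException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical policy: refuses a query once it has failed maxFailures times on this server. */
public class RetryLimitPolicy {

    /** Hypothetical functional interface for the wrapped coprocessor call. */
    public interface Call<T> {
        T run() throws IOException;
    }

    private final int maxFailures;
    private final ConcurrentHashMap<String, AtomicInteger> failures = new ConcurrentHashMap<>();

    public RetryLimitPolicy(int maxFailures) {
        this.maxFailures = maxFailures;
    }

    public <T> T execute(String queryKey, Call<T> call) throws IOException {
        AtomicInteger count = failures.get(queryKey);
        if (count != null && count.get() >= maxFailures) {
            // Fail fast: the client may keep retrying, but the call itself no longer runs.
            throw new IOException("Query " + queryKey + " rejected after " + count.get() + " failures");
        }
        try {
            T result = call.run();
            failures.remove(queryKey); // a success clears the failure history
            return result;
        } catch (IOException | RuntimeException e) {
            failures.computeIfAbsent(queryKey, k -> new AtomicInteger()).incrementAndGet();
            throw e;
        }
    }
}
```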
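The timeout policy can be sketched with a standard `ExecutorService`: the coprocessor call runs on a worker thread and the caller waits at most `timeoutMillis`. `TimeoutPolicy` is a hypothetical name. Cancellation only interrupts the worker, so a timeout alone cannot reclaim CPU from code that ignores interruption, and it does nothing about memory.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical policy: gives up on a coprocessor call that exceeds a fixed time budget. */
public class TimeoutPolicy implements AutoCloseable {
    private final long timeoutMillis;
    private final ExecutorService executor = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "coprocessor-policy-worker");
        t.setDaemon(true); // a stuck call must not keep the JVM alive
        return t;
    });

    public TimeoutPolicy(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    public <T> T execute(Callable<T> call) throws IOException {
        Future<T> future = executor.submit(call);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // only interrupts the worker; stopping is up to the called code
            throw new IOException("Coprocessor call exceeded " + timeoutMillis + " ms", e);
        } catch (ExecutionException e) {
            throw new IOException("Coprocessor call failed", e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            throw new IOException("Interrupted while waiting for the coprocessor call", e);
        }
    }

    @Override
    public void close() {
        executor.shutdownNow();
    }
}
```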
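For the metrics policy, the sketch below counts calls, failures and cumulative latency per method name. It deliberately swaps Hadoop metrics2 for plain `java.util.concurrent.atomic.LongAdder` counters to stay self-contained; publishing these values through a metrics2 source would be a separate step. `MetricsPolicy` and `Stats` are hypothetical names.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical metrics policy; plain JDK counters stand in for Hadoop metrics2 here. */
public class MetricsPolicy {

    /** Per-method counters, safe to update from many handler threads. */
    public static final class Stats {
        public final LongAdder calls = new LongAdder();
        public final LongAdder failures = new LongAdder();
        public final LongAdder totalNanos = new LongAdder();
    }

    private final Map<String, Stats> stats = new ConcurrentHashMap<>();

    public <T> T execute(String methodName, Callable<T> call) throws Exception {
        Stats s = stats.computeIfAbsent(methodName, k -> new Stats());
        long start = System.nanoTime();
        s.calls.increment();
        try {
            return call.call();
        } catch (Exception e) {
            s.failures.increment();
            throw e;
        } finally {
            s.totalNanos.add(System.nanoTime() - start); // recorded for successes and failures
        }
    }

    /** Returns the counters for a method, or null if it was never called. */
    public Stats get(String methodName) {
        return stats.get(methodName);
    }
}
```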
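The bypass/complete guard can be sketched as a decorator. `ChainControl` is a hypothetical interface exposing only the two chain-control calls; in HBase 1.2 `ObserverContext` is a concrete class, so wrapping the real one takes more than a plain interface decorator. The guard throws a hypothetical unchecked `PolicyViolationException`, which the Throwable policy would then translate into an IOException.

```java
/** Hypothetical guard against coprocessors that short-circuit the observer chain. */
public class ChainGuard {

    /** Hypothetical stand-in for the chain-control part of HBase's ObserverContext. */
    public interface ChainControl {
        void bypass();
        void complete();
    }

    /** Unchecked on purpose: the Throwable policy is expected to turn it into an IOException. */
    public static final class PolicyViolationException extends RuntimeException {
        public PolicyViolationException(String message) {
            super(message);
        }
    }

    /** Delegates to the real context but refuses the chain operations the policy forbids. */
    public static final class GuardedControl implements ChainControl {
        private final ChainControl delegate;
        private final boolean allowBypass;
        private final boolean allowComplete;

        public GuardedControl(ChainControl delegate, boolean allowBypass, boolean allowComplete) {
            this.delegate = delegate;
            this.allowBypass = allowBypass;
            this.allowComplete = allowComplete;
        }

        @Override
        public void bypass() {
            if (!allowBypass) {
                throw new PolicyViolationException("bypass() is forbidden by policy");
            }
            delegate.bypass();
        }

        @Override
        public void complete() {
            if (!allowComplete) {
                throw new PolicyViolationException("complete() is forbidden by policy");
            }
            delegate.complete();
        }
    }
}
```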
Remaining work (TODO):
- Improve test assertions
- Instantiate policies from configuration
- Dynamic policy configuration (through SIGHUP, see HBASE-14529?)
- Configuration for a set of coprocessors
- Advanced benchmarks
- Add proxies for BulkLoadObserver and EndpointObserver
- Check/improve the adaptation of multiple coprocessor types (Master, Region, etc.) at compile time
- Test all adapted coprocessor methods
If you want to run integration tests outside the Gradle environment, you need to update the PATH environment variable to add workspace/developer/bin:
    $ export PATH=$PATH:<workspace>/developer/bin
    $ ./gradlew test
To deploy the POC into the sandbox:
- Run:
    $ ./gradlew
- Copy build/libs/poc-hbase-coprocessor-1.0.0-SNAPSHOT.jar into the sandbox.
- Copy the jar file into /usr/hdp/current/hbase-master/lib/ and /usr/hdp/current/hbase-regionserver/lib/:
    $ cp poc-hbase-coprocessor-1.0.0-SNAPSHOT.jar /usr/hdp/current/hbase-master/lib/
    $ cp poc-hbase-coprocessor-1.0.0-SNAPSHOT.jar /usr/hdp/current/hbase-regionserver/lib/
- Go to Ambari > HBase > Configs > Advanced > Advanced hbase-env > hbase-env template. At the tail of the config, add:
    # Add Coprocessor policies agent
    export HBASE_REGIONSERVER_OPTS=" -javaagent:/usr/hdp/current/hbase-regionserver/lib/poc-hbase-coprocessor-1.0.0-SNAPSHOT.jar $HBASE_REGIONSERVER_OPTS"
    export HBASE_MASTER_OPTS=" -javaagent:/usr/hdp/current/hbase-master/lib/poc-hbase-coprocessor-1.0.0-SNAPSHOT.jar $HBASE_MASTER_OPTS"
- Add extra configuration (Custom hbase-site > Add property):
    Key: hbase.coprocessors.policy.white-list
    Value: org.apache.hadoop.hbase.security.access.SecureBulkLoadEndpoint,org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint,org.apache.ranger.authorization.hbase.RangerAuthorizationCoprocessor,org.apache.hadoop.hbase.backup.master.BackupController
- Restart HBase.
- In the HBase shell:
    hbase> disable 'table'
    hbase> alter 'table', 'coprocessor' => '|org.apache.hadoop.hbase.coprocessor.AggregateImplementation||'
    hbase> enable 'table'
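How the `hbase.coprocessors.policy.white-list` property is consumed is not spelled out here. Since it lists system coprocessors such as `SecureBulkLoadEndpoint` and `RangerAuthorizationCoprocessor`, one plausible reading is that listed classes are loaded without policy wrapping. The sketch below assumes exactly that; `PolicyWhiteList` and `shouldWrap` are hypothetical names, and parsing uses plain JDK string handling rather than Hadoop's `Configuration`.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical helper: decides whether a coprocessor class escapes policy wrapping. */
public final class PolicyWhiteList {
    /** Property name from the Custom hbase-site step. */
    public static final String KEY = "hbase.coprocessors.policy.white-list";

    private final Set<String> classNames;

    /** rawValue is the comma-separated property value (may be null or empty). */
    public PolicyWhiteList(String rawValue) {
        if (rawValue == null || rawValue.trim().isEmpty()) {
            classNames = Collections.emptySet();
        } else {
            classNames = Arrays.stream(rawValue.split(","))
                    .map(String::trim) // tolerate spaces after commas
                    .filter(s -> !s.isEmpty())
                    .collect(Collectors.toSet());
        }
    }

    /** Assumption: white-listed (trusted, system) coprocessors are loaded as-is; all others get policies. */
    public boolean shouldWrap(String coprocessorClassName) {
        return !classNames.contains(coprocessorClassName);
    }
}
```

With the value above, `AggregateImplementation` (loaded in the shell step) is not white-listed, so under this assumption it would run behind the policies.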