Usage with Authentication
This page explains how to configure the MongoDB Hadoop Connector to authenticate to a MongoDB cluster and provides a few examples.
The Hadoop connector may need to do the following, depending on your configuration:
- Run the `splitVector` command (only when the input is not sharded).
- Read the `config.shards` collection (only when the input is sharded).
- Run the `collStats` command (only when the input is not sharded but still behind a mongos).
- Read from input collections, including reading directly from shards in the case of `MongoShardSplitter` (whenever the input is from MongoDB).
- Write to output collections (whenever the output is to MongoDB).
The first two items above require special privileges on the "admin" database. The remaining operations only need permissions on the specific databases where they apply, so you can split these privileges between two users: one in the admin database, and another in the input/output database. Consult the MongoDB Manual for details on which roles to grant each user.
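The list above can be restated as a small decision table, which may help when deciding which user needs which privileges. This is a minimal POSIX shell sketch, assuming the input comes from MongoDB; the function name `connector_operations` and the topology labels (`unsharded`, `mongos-unsharded`, `sharded`) are hypothetical, and "(admin)" marks the two operations that, per the paragraph above, need privileges on the admin database.

```shell
# Hypothetical helper (not part of mongo-hadoop): print the operations the
# connector may perform for a given input topology, following the list above.
# Assumes the input is read from MongoDB.
# Usage: connector_operations <unsharded|mongos-unsharded|sharded> [mongo]
connector_operations() {
  case "$1" in
    unsharded)        echo "splitVector (admin)" ;;
    mongos-unsharded) echo "splitVector (admin)"; echo "collStats" ;;
    sharded)          echo "read config.shards (admin)" ;;
    *) echo "unknown input topology: $1" >&2; return 1 ;;
  esac
  # Always needed when the input is MongoDB (possibly directly from shards
  # when MongoShardSplitter is used).
  echo "read input collections"
  # Only needed when the job writes its output back to MongoDB.
  if [ "${2-}" = mongo ]; then
    echo "write output collections"
  fi
}
```

For example, `connector_operations sharded mongo` lists the `config.shards` read (admin user) followed by the input and output collection access (input/output user).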
Credentials are passed to the Hadoop connector through MongoDB connection strings. If the only privileges your job requires are reading from its input collection and writing to its output collection, it is sufficient to define a user on the input/output databases and supply the credentials in the connection strings passed to the `mongo.input.uri` and `mongo.output.uri` options. This very basic configuration only works if the collection is read as a single split, so that the connector never needs to call `splitVector` or read from `config.chunks`.
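A password containing characters such as `@`, `:` or `/` would break the connection string unless it is percent-encoded, as the MongoDB connection string format requires. Below is a minimal POSIX shell sketch for building the strings used on this page; `urlencode` and `mongo_uri` are hypothetical helper names, and the encoder assumes ASCII credentials.

```shell
# Hypothetical helpers (not part of mongo-hadoop or the MongoDB tools).
# urlencode: percent-encode every character except A-Z a-z 0-9 . ~ _ -
# (intended for ASCII credentials only).
urlencode() {
  local s="$1" out="" rest c
  while [ -n "$s" ]; do
    rest="${s#?}"        # everything after the first character
    c="${s%"$rest"}"     # the first character
    s="$rest"
    case "$c" in
      [A-Za-z0-9.~_-]) out="$out$c" ;;
      *) out="$out$(printf '%%%02X' "'$c")" ;;   # e.g. @ -> %40
    esac
  done
  printf '%s' "$out"
}

# mongo_uri USER PASSWORD HOST:PORT DATABASE.COLLECTION [QUERY]
# Builds a connection string; credentials are omitted when USER is empty.
mongo_uri() {
  local user="$1" pass="$2" hostport="$3" ns="$4" query="${5-}"
  local uri="mongodb://"
  if [ -n "$user" ]; then
    uri="$uri$(urlencode "$user"):$(urlencode "$pass")@"
  fi
  uri="$uri$hostport/$ns"
  if [ -n "$query" ]; then
    uri="$uri?$query"
  fi
  printf '%s\n' "$uri"
}

# Example:
#   mongo_uri user 'p@ss:w/rd' mongodb-server:27017 database.collection
#   prints mongodb://user:p%40ss%3Aw%2Frd@mongodb-server:27017/database.collection
```

The resulting string can be passed unchanged as the value of `-Dmongo.input.uri` or `-Dmongo.output.uri`.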
When your job needs extra privileges (for example, to call `splitVector`), the Hadoop connector must be able to authenticate against the "admin" database. Pass these credentials in the `mongo.auth.uri` option. Because the connection string is a URI, it can name a different host from `mongo.input.uri`, which is handy when the `admin` database is located on a different host in your MongoDB cluster. Note that when writing output back to MongoDB, the Hadoop connector authenticates with the credentials in `mongo.auth.uri` in preference to those in `mongo.input.uri`.
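When the input credentials and the admin credentials belong to different users, a script can keep the two URIs apart explicitly. This is a minimal POSIX shell sketch; `connector_opts` is a hypothetical helper, and the usage comment assumes `JOB_JAR` and `MAIN_CLASS` are placeholders for your own job and that its main class runs through Hadoop's `ToolRunner`, so that `-libjars` and `-D` are parsed as generic options.

```shell
# Hypothetical helper (not part of mongo-hadoop): print one Hadoop generic
# option per line for a job that reads with the credentials in INPUT_URI and
# authenticates against the admin database with the credentials in AUTH_URI.
# Usage: connector_opts INPUT_URI AUTH_URI [OUTPUT_URI]
connector_opts() {
  printf '%s\n' -libjars mongo-hadoop.jar,mongo-java-driver.jar
  printf '%s\n' "-Dmongo.input.uri=$1"
  # AUTH_URI may name a different host than INPUT_URI.
  printf '%s\n' "-Dmongo.auth.uri=$2"
  if [ -n "${3-}" ]; then
    printf '%s\n' "-Dmongo.output.uri=$3"
  fi
}

# Example (bash; JOB_JAR and MAIN_CLASS are placeholders):
#   mapfile -t opts < <(connector_opts \
#     'mongodb://user:password@mongos-host:27017/database.collection' \
#     'mongodb://administrator:password@admin-host:27017/admin')
#   hadoop jar "$JOB_JAR" "$MAIN_CLASS" "${opts[@]}"
```

Printing one option per line keeps each URI a single argument as long as the URIs contain no whitespace, which holds when credentials are percent-encoded.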
The Hadoop connector supports SSL through the MongoDB Java Driver, which in turn supports SSL through the JDK itself. To enable SSL, set the `ssl=true` option in the connection strings passed to `mongo.input.uri` and `mongo.output.uri`.
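For example, if the job only reads its input collection and writes its output collection, put each user's credentials directly in the corresponding connection string:

```shell
hadoop jar \
-libjars mongo-hadoop.jar,mongo-java-driver.jar \
-Dmongo.input.uri=mongodb://user:password@mongodb-server:27017/database.collection \
-Dmongo.output.uri=mongodb://user2:password2@mongodb-server:27017/database2.collection2
```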
If "administrator" also has read/write privileges on your input collection, the input URI needs no credentials of its own:

```shell
hadoop jar \
-libjars mongo-hadoop.jar,mongo-java-driver.jar \
-Dmongo.input.uri=mongodb://mongodb-server:27017/database.collection \
-Dmongo.auth.uri=mongodb://administrator:password@mongodb-server:27017/admin
```
If "administrator" does not have read/write privileges on your input collection but "user" does, keep the credentials for "user" in `mongo.input.uri`:

```shell
hadoop jar \
-libjars mongo-hadoop.jar,mongo-java-driver.jar \
-Dmongo.input.uri=mongodb://user:password@mongodb-server:27017/database.collection \
-Dmongo.auth.uri=mongodb://administrator:password@mongodb-server:27017/admin
```
To use SSL, add `ssl=true` to each connection string and pass the location and password of your trust store:

```shell
hadoop jar \
-libjars mongo-hadoop.jar,mongo-java-driver.jar \
-Dmongo.input.uri=mongodb://mongodb-server:27017/database.collection?ssl=true \
-Dmongo.auth.uri=mongodb://user:password@mongodb-server:27017/admin?ssl=true \
-Djavax.net.ssl.trustStore=/path/to/trust/store \
-Djavax.net.ssl.trustStorePassword=trustStorePassword
```
See also the SSL Reference for using SSL with the MongoDB Java Driver.
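To turn on the `ssl` option in a script, note that a connection string which already carries options needs `&ssl=true` rather than `?ssl=true`. This is a minimal POSIX shell sketch; the helper name `with_ssl` is hypothetical, and it leaves URIs that already contain `ssl=true` unchanged.

```shell
# Hypothetical helper (not part of mongo-hadoop): add the ssl=true option to a
# MongoDB connection string.
with_ssl() {
  case "$1" in
    *'?ssl=true'*|*'&ssl=true'*) printf '%s\n' "$1" ;;  # already enabled
    *'?'*) printf '%s\n' "$1&ssl=true" ;;                # append to existing options
    *)     printf '%s\n' "$1?ssl=true" ;;                # start the query string
  esac
}

# Example:
#   with_ssl 'mongodb://mongodb-server:27017/database.collection?readPreference=secondary'
#   prints mongodb://mongodb-server:27017/database.collection?readPreference=secondary&ssl=true
```

The same helper can wrap every URI the job uses, for instance `-Dmongo.input.uri="$(with_ssl "$INPUT_URI")"`, where `INPUT_URI` is a placeholder variable.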