HDDS-2002. Update documentation for 0.4.1 release. #1331

Closed
67 changes: 42 additions & 25 deletions hadoop-hdds/docs/content/beyond/Containers.md
@@ -25,8 +25,9 @@ Docker heavily is used at the ozone development with three principal use-cases:
* __dev__:
* We use docker to start local pseudo-clusters (docker provides unified environment, but no image creation is required)
* __test__:
-* We create docker images from the dev branches to test ozone in kubernetes and other container orchestator system
-* We provide _apache/ozone_ images for each release to make it easier the evaluation of Ozone. These images are __not__ created __for production__ usage.
+* We create docker images from the dev branches to test ozone in kubernetes and other container orchestrator systems
+* We provide _apache/ozone_ images for each release to make it easier to evaluate Ozone.
+These images are __not__ created __for production__ usage.

<div class="alert alert-warning" role="alert">
We <b>strongly</b> recommend that you create your own custom images when you
@@ -36,7 +37,7 @@ shipped container images and k8s resources as examples and guides to help you
</div>

* __production__:
-* We document how can you create your own docker image for your production cluster.
+* We have documentation on how you can create your own docker image for your production cluster.

Let's check out each of the use-cases in more detail:

@@ -46,38 +47,41 @@ Ozone artifact contains example docker-compose directories to make it easier to

From the distribution:

-```
+```bash
cd compose/ozone
docker-compose up -d
```

-After a local build
+After a local build:

-```
+```bash
cd hadoop-ozone/dist/target/ozone-*/compose
docker-compose up -d
```
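
Once the cluster is up, a quick way to verify it (a hedged example: `ps` and `logs` are standard docker-compose subcommands, and the OM web UI port below is an assumption based on the default configuration):

```bash
# List the containers of the pseudo-cluster and their state.
docker-compose ps

# Follow the StorageContainerManager logs to confirm a clean startup.
docker-compose logs -f scm

# The OzoneManager web UI is expected on port 9874 of the host.
curl -s http://localhost:9874/ | head
```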

These environments are very important tools to start different types of Ozone clusters at any time.

-To be sure that the compose files are up-to-date, we also provide acceptance test suites which start the cluster and check the basic behaviour.
+To be sure that the compose files are up-to-date, we also provide acceptance test suites which start
+the cluster and check the basic behavior.

-The acceptance tests are part of the distribution, and you can find the test definitions in `./smoketest` directory.
+The acceptance tests are part of the distribution, and you can find the test definitions in the `smoketest` directory.

You can start the tests from any compose directory:

For example:

-```
+```bash
cd compose/ozone
./test.sh
```

### Implementation details

-`./compose` tests are based on the apache/hadoop-runner docker image. The image itself doesn't contain any Ozone jar file or binary just the helper scripts to start ozone.
+`compose` tests are based on the apache/hadoop-runner docker image. The image itself does not contain
+any Ozone jar file or binary, just the helper scripts to start ozone.

-hadoop-runner provdes a fixed environment to run Ozone everywhere, but the ozone distribution itself is mounted from the including directory:
+hadoop-runner provides a fixed environment to run Ozone everywhere, but the ozone distribution itself
+is mounted from the including directory:

(Example docker-compose fragment)

@@ -91,7 +95,9 @@ hadoop-runner provdes a fixed environment to run Ozone everywhere, but the ozone

```

-The containers are conigured based on environment variables, but because the same environment variables should be set for each containers we maintain the list of the environment variables in a separated file:
+The containers are configured based on environment variables, but because the same environment
+variables should be set for each container we maintain the list of the environment variables
+in a separate file:

```
scm:
@@ -111,23 +117,32 @@ OZONE-SITE.XML_ozone.enabled=True
#...
```

-As you can see we use naming convention. Based on the name of the environment variable, the appropariate hadoop config XML (`ozone-site.xml` in our case) will be generated by a [script](https://github.com/apache/hadoop/tree/docker-hadoop-runner-latest/scripts) which is included in the `hadoop-runner` base image.
+As you can see we use a naming convention. Based on the name of the environment variable, the
+appropriate hadoop config XML (`ozone-site.xml` in our case) will be generated by a
+[script](https://github.com/apache/hadoop/tree/docker-hadoop-runner-latest/scripts) which is
+included in the `hadoop-runner` base image.
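
For example, the `OZONE-SITE.XML_ozone.enabled=True` entry shown above would end up in the generated `ozone-site.xml` roughly as follows (a sketch of the convention, not the script's literal output):

```xml
<!-- Generated from OZONE-SITE.XML_ozone.enabled=True: the prefix selects
     the target file, the rest is split at the first '=' into a property
     name and value. -->
<property>
  <name>ozone.enabled</name>
  <value>True</value>
</property>
```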

-The [entrypoint](https://github.com/apache/hadoop/blob/docker-hadoop-runner-latest/scripts/starter.sh) of the `hadoop-runner` image contains a helper shell script which triggers this transformation and cab do additional actions (eg. initialize scm/om storage, download required keytabs, etc.) based on environment variables.
+The [entrypoint](https://github.com/apache/hadoop/blob/docker-hadoop-runner-latest/scripts/starter.sh)
+of the `hadoop-runner` image contains a helper shell script which triggers this transformation and
+can do additional actions (e.g. initialize scm/om storage, download required keytabs, etc.)
+based on environment variables.

## Test/Staging

-The `docker-compose` based approach is recommended only for local test not for multi node cluster. To use containers on a multi-node cluster we need a Container Orchestrator like Kubernetes.
+The `docker-compose` based approach is recommended only for local tests, not for multi-node clusters.
+To use containers on a multi-node cluster we need a Container Orchestrator like Kubernetes.

Kubernetes example files are included in the `kubernetes` folder.

-*Please note*: all the provided images are based the `hadoop-runner` image which contains all the required tool for testing in staging environments. For production we recommend to create your own, hardened image with your own base image.
+*Please note*: all the provided images are based on the `hadoop-runner` image which contains all the
+required tools for testing in staging environments. For production we recommend creating your own,
+hardened image with your own base image.

### Test the release

The release can be tested by deploying any of the example clusters:

-```
+```bash
cd kubernetes/examples/ozone
kubectl apply -f .
```
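
If the apply succeeds, the pods should reach the `Running` state once the images are pulled (a hedged check; the label selector below is an assumption, check the example resource files for the real one):

```bash
# Watch the Ozone pods come up in the current namespace.
kubectl get pods -w

# Inspect the SCM logs if a pod does not become ready.
kubectl logs -l app=scm --tail=50
```
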
@@ -139,13 +154,13 @@ Plese note that in this case the latest released container will be downloaded fr
To test a development build you can create your own image and upload it to your own docker registry:


-```
+```bash
mvn clean install -f pom.ozone.xml -DskipTests -Pdocker-build,docker-push -Ddocker.image=myregistry:9000/name/ozone
```

The configured image will be used in all the generated kubernetes resource files (`image:` keys are adjusted during the build).

-```
+```bash
cd kubernetes/examples/ozone
kubectl apply -f .
```
@@ -160,10 +175,12 @@ adjust base image, umask, security settings, user settings according to your own

You can use the source of our development images as an example:

-* Base image: https://github.com/apache/hadoop/blob/docker-hadoop-runner-jdk11/Dockerfile
-* Docker image: https://github.com/apache/hadoop/blob/trunk/hadoop-ozone/dist/src/main/Dockerfile
+* [Base image](https://github.com/apache/hadoop/blob/docker-hadoop-runner-jdk11/Dockerfile)
+* [Docker image](https://github.com/apache/hadoop/blob/trunk/hadoop-ozone/dist/src/main/docker/Dockerfile)

-Most of the elements are optional and just helper function but to use the provided example kubernetes resources you may need the scripts from [here](https://github.com/apache/hadoop/tree/docker-hadoop-runner-jdk11/scripts)
+Most of the elements are optional helper functions, but to use the provided example
+kubernetes resources you may need the scripts from
+[here](https://github.com/apache/hadoop/tree/docker-hadoop-runner-jdk11/scripts).

* The two python scripts convert environment variables to real hadoop XML config files
* The start.sh executes the python scripts (and other initialization) based on environment variables.
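
As a starting point, a minimal custom image could look like the sketch below. Everything in it is an assumption to adapt (base image, user name, paths); it is not the shipped Dockerfile:

```dockerfile
# Sketch of a hardened production image; adjust the base image, user
# and paths to your own policies. The helper scripts referenced above
# are only needed if you reuse the example kubernetes resources.
FROM eclipse-temurin:11-jre

# Run Ozone as a dedicated, unprivileged user.
RUN useradd --system --create-home --home-dir /opt/ozone ozone

# Unpack your own Ozone build into the image (the tarball name is an
# assumption; ADD extracts it automatically).
ADD ozone-0.4.1.tar.gz /opt/
RUN mv /opt/ozone-0.4.1/* /opt/ozone/ && chown -R ozone:ozone /opt/ozone

USER ozone
WORKDIR /opt/ozone
ENV PATH=/opt/ozone/bin:$PATH
```
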
@@ -205,14 +222,14 @@ Ozone related container images and source locations:
<td>This is the base image used for testing Hadoop Ozone.
This is a set of utilities that make it easy for us to run ozone.</td>
</tr>
-<tr>
+<!---tr>
<th scope="row">3</th>
<td>apache/ozone:build (WIP)</td>
<td>https://github.com/apache/hadoop-docker-ozone</td>
<td>ozone-build </td>
<td> </td>
<td> </td>
<td>TODO: Add more documentation here.</td>
-</tr>
+</tr-->
</tbody>
</table>
7 changes: 4 additions & 3 deletions hadoop-hdds/docs/content/beyond/DockerCheatSheet.md
@@ -22,7 +22,9 @@ weight: 4
limitations under the License.
-->

-In the `compose` directory of the ozone distribution there are multiple pseudo-cluster setup which can be used to run Ozone in different way (for example with secure cluster, with tracing enabled, with prometheus etc.).
+In the `compose` directory of the ozone distribution there are multiple pseudo-cluster setups which
+can be used to run Ozone in different ways (for example: secure cluster, with tracing enabled,
+with prometheus etc.).

If the usage is not documented in a specific directory, the default usage is the following:

@@ -31,8 +33,7 @@ cd compose/ozone
docker-compose up -d
```

-The data of the container is ephemeral and deleted together with the docker volumes. To force the deletion of existing data you can always delete all the temporary data:
+The data of the container is ephemeral and deleted together with the docker volumes.
```bash
docker-compose down
```
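
A few more day-to-day commands that work in any of the compose directories (hedged: these are standard docker-compose subcommands, with service names taken from the example compose files):

```bash
# Check which components are running.
docker-compose ps

# Tail the logs of one component, for example the Ozone Manager.
docker-compose logs -f om

# Open a shell inside a running container.
docker-compose exec datanode bash

# Scale the pseudo-cluster to three datanodes.
docker-compose scale datanode=3
```
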
2 changes: 1 addition & 1 deletion hadoop-hdds/docs/content/beyond/RunningWithHDFS.md
@@ -56,7 +56,7 @@ To start ozone with HDFS you should start the following components:
2. HDFS Datanode (from the Hadoop distribution with the plugin on the
classpath from the Ozone distribution)
3. Ozone Manager (from the Ozone distribution)
-4. Storage Container manager (from the Ozone distribution)
+4. Storage Container Manager (from the Ozone distribution)
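
A hedged sketch of starting these components (the daemon syntax follows the standard Hadoop/Ozone launcher scripts; the plugin jar path is an assumption that varies by release):

```bash
# From the Hadoop distribution: start HDFS with the Ozone plugin jar
# on the datanode classpath.
export HADOOP_CLASSPATH=/opt/ozone/share/hadoop/ozoneplugin/*.jar
hdfs --daemon start namenode
hdfs --daemon start datanode

# From the Ozone distribution: start the Ozone Manager and SCM.
ozone --daemon start om
ozone --daemon start scm
```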

Please check the log of the datanode to see whether the HDDS/Ozone plugin is started or
not. The log of the datanode should contain something like this:
6 changes: 3 additions & 3 deletions hadoop-hdds/docs/content/concept/Datanodes.md
@@ -36,7 +36,7 @@ actual data streams. This is the default Storage container format. From
Ozone's perspective, a container is a protocol spec; the actual storage layout
does not matter. In other words, it is trivial to extend or bring new
container layouts. Hence this should be treated as a reference implementation
of containers under Ozone.

## Understanding Ozone Blocks and Containers

@@ -51,13 +51,13 @@ shows the logical layout out of Ozone block.

The container ID lets the clients discover the location of the container. The
authoritative information about where a container is located is with the
-Storage Container Manager or SCM. In most cases, the container location will
+Storage Container Manager (SCM). In most cases, the container location will be
cached by Ozone Manager and will be returned along with the Ozone blocks.


Once the client is able to locate the container, that is, understand which
data nodes contain this container, the client will connect to the datanode
-read the data the data stream specified by container ID:Local ID. In other
+and read the data stream specified by _Container ID:Local ID_. In other
words, the local ID serves as index into the container which describes what
data stream we want to read from.
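
To make the two-level address concrete, here is a purely illustrative sketch (the names are invented for this example and are not the client classes from the Ozone source tree):

```java
// An Ozone block address as described above: the container ID locates
// the container via SCM/OM, the local ID selects the stream inside it.
public class BlockAddressDemo {

    record BlockAddress(long containerId, long localId) {}

    public static void main(String[] args) {
        BlockAddress block = new BlockAddress(42L, 7L);
        // Step 1: resolve container 42 to a set of datanodes (this
        // mapping is usually cached by Ozone Manager, as noted above).
        // Step 2: connect to one of those datanodes and read the data
        // stream indexed by local ID 7 inside that container.
        System.out.println("container=" + block.containerId()
                + " localId=" + block.localId());
    }
}
```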

2 changes: 1 addition & 1 deletion hadoop-hdds/docs/content/concept/Hdds.md
@@ -23,7 +23,7 @@ summary: Storage Container Manager or SCM is the core metadata service of Ozone

Storage container manager provides multiple critical functions for the Ozone
cluster. SCM acts as the cluster manager, Certificate authority, Block
-manager and the replica manager.
+manager and the Replica Manager.

{{<card title="Cluster Management" icon="tasks">}}
SCM is in charge of creating an Ozone cluster. When an SCM is booted up via <kbd>init</kbd> command, SCM creates the cluster identity and root certificates needed for the SCM certificate authority. SCM manages the life cycle of a data node in the cluster.
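
The <kbd>init</kbd> step mentioned above maps to the SCM subcommand, roughly like this (a hedged example; exact flags can differ between Ozone releases):

```bash
# One-time initialization of SCM storage, cluster identity and CA material.
ozone scm --init

# Then start the SCM daemon.
ozone --daemon start scm
```
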
6 changes: 3 additions & 3 deletions hadoop-hdds/docs/content/concept/Overview.md
@@ -56,7 +56,7 @@ Ozone.

![FunctionalOzone](FunctionalOzone.png)

-Any distributed system can viewed from different perspectives. One way to
+Any distributed system can be viewed from different perspectives. One way to
look at Ozone is to imagine it as a namespace service, Ozone Manager, built on
top of HDDS, a distributed block store.

@@ -67,8 +67,8 @@ Another way to visualize Ozone is to look at the functional layers; we have a
We have a data storage layer, which is basically the data nodes and they are
managed by SCM.

-The replication layer, provided by Ratis is used to replicate metadata (Ozone
-Manager and SCM) and also used for consistency when data is modified at the
+The replication layer, provided by Ratis, is used to replicate metadata (OM and SCM)
+and is also used for consistency when data is modified at the
data nodes.

We have a management server called Recon that talks to all other components
20 changes: 10 additions & 10 deletions hadoop-hdds/docs/content/concept/OzoneManager.md
@@ -21,14 +21,14 @@ summary: Ozone Manager is the principal name space service of Ozone. OM manages
limitations under the License.
-->

-Ozone Manager or OM is the namespace manager for Ozone.
+Ozone Manager (OM) is the namespace manager for Ozone.

This means that when you want to write some data, you ask Ozone
-manager for a block and Ozone Manager gives you a block and remembers that
-information. When you want to read the that file back, you need to find the
-address of the block and Ozone manager returns it you.
+Manager for a block and Ozone Manager gives you a block and remembers that
+information. When you want to read that file back, you need to find the
+address of the block and Ozone Manager returns it to you.

-Ozone manager also allows users to organize keys under a volume and bucket.
+Ozone Manager also allows users to organize keys under a volume and bucket.
Volumes and buckets are part of the namespace and managed by Ozone Manager.

Each ozone volume is the root of an independent namespace under OM.
@@ -57,17 +57,17 @@ understood if we trace what happens during a key write and key read.

* To write a key to Ozone, a client tells Ozone manager that it would like to
write a key into a bucket that lives inside a specific volume. Once Ozone
-manager determines that you are allowed to write a key to specified bucket,
+Manager determines that you are allowed to write a key to the specified bucket,
OM needs to allocate a block for the client to write data.

-* To allocate a block, Ozone manager sends a request to Storage Container
-Manager or SCM; SCM is the manager of data nodes. SCM picks three data nodes
+* To allocate a block, Ozone Manager sends a request to Storage Container
+Manager (SCM); SCM is the manager of data nodes. SCM picks three data nodes
into which the client can write data. SCM allocates the block and returns the
block ID to Ozone Manager.

* Ozone manager records this block information in its metadata and returns the
block and a block token (a security permission to write data to the block)
-the client.
+to the client.

* The client uses the block token to prove that it is allowed to write data to
the block and writes data to the data node.
@@ -82,6 +82,6 @@ Ozone manager.
* Key reads are simpler: the client requests the block list from the Ozone
Manager.
* Ozone manager will return the block list and block tokens which
-allows the client to read the data from nodes.
+allows the client to read the data from data nodes.
* Client connects to the data node and presents the block token and reads
the data from the data node.
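
In Java API terms the read path looks roughly like this (a hedged sketch; imports and error handling are omitted, matching the style of the Java API examples elsewhere in these docs):

```java
// Obtain an RPC client and navigate the namespace to the bucket.
OzoneClient client = OzoneClientFactory.getRpcClient(new OzoneConfiguration());
OzoneVolume volume = client.getObjectStore().getVolume("assets");
OzoneBucket bucket = volume.getBucket("videos");

// OM returns the block list and block tokens behind this call; the
// stream then reads the actual bytes from the datanodes.
try (OzoneInputStream in = bucket.readKey("intro.mp4")) {
    byte[] data = in.readAllBytes();
}
```
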
8 changes: 4 additions & 4 deletions hadoop-hdds/docs/content/interface/JavaApi.md
@@ -74,21 +74,21 @@ It is possible to pass an array of arguments to the createVolume by creating vol

Once you have a volume, you can create buckets inside the volume.

-{{< highlight bash >}}
+{{< highlight java >}}
// Let us create a bucket called videos.
assets.createBucket("videos");
OzoneBucket video = assets.getBucket("videos");
{{< /highlight >}}

-At this point we have a usable volume and a bucket. Our volume is called assets and bucket is called videos.
+At this point we have a usable volume and a bucket. Our volume is called _assets_ and our bucket is called _videos_.

Now we can create a Key.

### Reading and Writing a Key

-With a bucket object the users can now read and write keys. The following code reads a video called intro.mp4 from the local disk and stores in the video bucket that we just created.
+With a bucket object the users can now read and write keys. The following code reads a video called intro.mp4 from the local disk and stores it in the _videos_ bucket that we just created.

-{{< highlight bash >}}
+{{< highlight java >}}
// read data from the file, this is a user provided function.
byte [] videoData = readFile("intro.mp4");

8 changes: 6 additions & 2 deletions hadoop-hdds/docs/content/interface/OzoneFS.md
@@ -21,7 +21,7 @@ summary: Hadoop Compatible file system allows any application that expects an HD
limitations under the License.
-->

-The Hadoop compatible file system interface allpws storage backends like Ozone
+The Hadoop compatible file system interface allows storage backends like Ozone
to be easily integrated into the Hadoop eco-system. Ozone file system is a
Hadoop compatible file system.

Expand All @@ -36,7 +36,7 @@ ozone sh volume create /volume
ozone sh bucket create /volume/bucket
{{< /highlight >}}

-Once this is created, please make sure that bucket exists via the listVolume or listBucket commands.
+Once this is created, please make sure that the bucket exists via the _volume list_ or _bucket list_ commands.

Please add the following entries to core-site.xml.

@@ -45,6 +45,10 @@ Please add the following entry to the core-site.xml.
<name>fs.o3fs.impl</name>
<value>org.apache.hadoop.fs.ozone.OzoneFileSystem</value>
</property>
+<property>
+<name>fs.AbstractFileSystem.o3fs.impl</name>
+<value>org.apache.hadoop.fs.ozone.OzFs</value>
+</property>
<property>
<name>fs.defaultFS</name>
<value>o3fs://bucket.volume</value>
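</property>

With these entries in place, the bucket can be used through the Hadoop compatible file system shell (a hedged example; it assumes the Ozone client scripts are on the PATH and `fs.defaultFS` is set as above):

```bash
# List the bucket root and copy a local file in via o3fs.
ozone fs -ls /
ozone fs -put README.md o3fs://bucket.volume/README.md
```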