
Commit

[fix](doc)Document correction, delete some old document content, add some explanatory information (apache#11504)

Document correction, delete some old document content, add some explanatory information
hf200012 authored Aug 4, 2022
1 parent 9ed36aa commit 277e5e7
Showing 26 changed files with 78 additions and 96 deletions.
2 changes: 1 addition & 1 deletion docs/en/docs/admin-manual/privilege-ldap/user-privilege.md
@@ -34,7 +34,7 @@ Doris's new privilege management system refers to Mysql's privilege management m

In the permission system, a user is identified as a User Identity. A user identity consists of two parts: username and userhost. The username is the user name, composed of upper- and lower-case English letters. The userhost is the IP address from which the user's connection comes. user_identity is presented as username@'userhost', representing the username from userhost.

Another expression of user_identity is username@['domain'], where domain is the domain name, which can be resolved into a set of IPS by DNS BNS (Baidu Name Service). The final expression is a set of username@'userhost', so we use username@'userhost'to represent it.
Another expression of user_identity is username@['domain'], where domain is a domain name that can be resolved into a set of IPs by DNS. The final expression is a set of username@'userhost', so we use username@'userhost' to represent it.

2. Privilege

11 changes: 2 additions & 9 deletions docs/en/docs/advanced/broker.md
@@ -63,14 +63,8 @@ Different types of brokers support different storage systems.
* Support simple authentication access
* Support kerberos authentication access
* Support HDFS HA mode access

2. Baidu HDFS / AFS (not supported by open source version)

* Support UGI simple authentication access

3. Baidu Object Storage BOS (not supported by open source version)

* Support AK / SK authentication access
2. Object storage
- All object stores that support the S3 protocol

## Functions provided by Broker

@@ -200,4 +194,3 @@ Authentication information is usually provided as a Key-Value in the Property Ma
)
```
The configuration for accessing the HDFS cluster can be written to the hdfs-site.xml file. When users use the Broker process to read data from the HDFS cluster, they only need to fill in the cluster file path and authentication information.
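As a rough illustration of how this authentication information is passed, the sketch below shows a Broker load reading from HDFS with simple authentication. The broker name, HDFS path, table names, and credentials are placeholders, and the exact property keys may vary with the broker and authentication mode in use.

```sql
-- A minimal Broker load sketch using simple authentication against HDFS.
-- broker_name, paths, table names, and credentials are placeholders.
LOAD LABEL example_db.label_hdfs_simple_auth
(
    DATA INFILE("hdfs://namenode_host:8020/user/doris/data/file.csv")
    INTO TABLE example_tbl
    COLUMNS TERMINATED BY ","
)
WITH BROKER "broker_name"
(
    -- simple authentication: username/password of the HDFS user
    "username" = "hdfs_user",
    "password" = "hdfs_passwd"
)
PROPERTIES
(
    "timeout" = "3600"
);
```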
2 changes: 1 addition & 1 deletion docs/en/docs/data-operate/export/export-manual.md
@@ -191,7 +191,7 @@ Usually, a query plan for an Export job has only two parts `scan`- `export`, and
* If the amount of table data is too large, it is recommended to export it by partition.
* During the operation of the Export job, if the FE restarts or a master switchover occurs, the Export job will fail, and the user needs to resubmit it.
* If the Export job fails, the `__doris_export_tmp_xxx` temporary directory generated in the remote storage and the generated files will not be deleted, requiring the user to delete them manually.
* If the Export job runs successfully, the `__doris_export_tmp_xxx` directory generated in the remote storage may be retained or cleared according to the file system semantics of the remote storage. For example, in Baidu Object Storage (BOS), after removing the last file in a directory through rename operation, the directory will also be deleted. If the directory is not cleared, the user can clear it manually.
* If the Export job runs successfully, the `__doris_export_tmp_xxx` directory generated in the remote storage may be retained or cleared according to the file system semantics of the remote storage. For example, in object storage (supporting the S3 protocol), after removing the last file in a directory through rename operation, the directory will also be deleted. If the directory is not cleared, the user can clear it manually.
* Whether the Export job succeeds or fails, if the FE restarts or a master switchover occurs, some information of the jobs displayed by `SHOW EXPORT` will be lost and cannot be viewed.
* Export jobs only export data from Base tables, not Rollup Index.
* Export jobs scan data and occupy IO resources, which may affect the query latency of the system.
@@ -26,7 +26,7 @@ under the License.

# External storage data import

The following mainly introduces how to import data stored in an external system. For example (HDFS, AWS S3, BOS of Baidu Cloud, OSS of Alibaba Cloud, COS of Tencent Cloud)
The following mainly introduces how to import data stored in an external system, for example HDFS or any object store that supports the S3 protocol.
## HDFS LOAD

### Ready to work
Expand Down Expand Up @@ -111,7 +111,7 @@ Hdfs load creates an import statement. The import method is basically the same a

Starting from version 0.14, Doris supports the direct import of data from online storage systems that support the S3 protocol through the S3 protocol.

This document mainly introduces how to import data stored in AWS S3. It also supports the import of other object storage systems that support the S3 protocol, such as Baidu Cloud’s BOS, Alibaba Cloud’s OSS and Tencent Cloud’s COS, etc.
This document mainly introduces how to import data stored in AWS S3. It also supports the import of other object storage systems that support the S3 protocol.
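For orientation, the sketch below shows roughly what an S3 load looks like. The bucket, endpoint, region, access keys, and table names are placeholders and must be replaced with values for your own object storage.

```sql
-- A minimal S3 load sketch; endpoint, region, bucket, keys, and table names
-- are placeholders.
LOAD LABEL example_db.label_s3_load
(
    DATA INFILE("s3://your_bucket/path/to/file.csv")
    INTO TABLE example_tbl
    COLUMNS TERMINATED BY ","
)
WITH S3
(
    "AWS_ENDPOINT" = "http://s3.us-east-1.amazonaws.com",
    "AWS_ACCESS_KEY" = "your_access_key",
    "AWS_SECRET_KEY" = "your_secret_key",
    "AWS_REGION" = "us-east-1"
)
PROPERTIES
(
    "timeout" = "3600"
);
```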
### Applicable scenarios

* Source data in S3 protocol accessible storage systems, such as S3, BOS.
@@ -482,9 +482,6 @@ Users can control the status of jobs through `stop/pause/resume` commands.

You can use the [STOP SYNC JOB](../../../sql-manual/sql-reference/Data-Manipulation-Statements/Load/STOP-SYNC-JOB.md), [PAUSE SYNC JOB](../../../sql-manual/sql-reference/Data-Manipulation-Statements/Load/PAUSE-SYNC-JOB.md), and [RESUME SYNC JOB](../../../sql-manual/sql-reference/Data-Manipulation-Statements/Load/RESUME-SYNC-JOB.md) commands to view help and examples.
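As a quick sketch of what these control statements look like, assuming an existing sync job whose name `db_name.job_name` is a placeholder:

```sql
-- Pause a running sync job (job name is a placeholder).
PAUSE SYNC JOB db_name.job_name;

-- Resume a paused sync job.
RESUME SYNC JOB db_name.job_name;

-- Stop the sync job.
STOP SYNC JOB db_name.job_name;
```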

## Case Combat

[How to use Apache Doris Binlog Load and examples](https://doris.apache.org/blogs/PracticalCases/doris-binlog-load.html)

## Related Parameters

22 changes: 3 additions & 19 deletions docs/en/docs/data-operate/import/import-way/broker-load-manual.md
@@ -28,6 +28,8 @@ under the License.

Broker load is an asynchronous import method, and the supported data sources depend on the data sources supported by the [Broker](../../../advanced/broker.md) process.

Because the data in a Doris table is ordered, Broker load uses Doris cluster resources to sort the data when importing it. Compared with Spark load, migrating massive historical data with Broker load puts a relatively heavy load on the Doris cluster, so this method is intended for users who do not have Spark computing resources. If Spark computing resources are available, it is recommended to use [Spark load](./SPARK-LOAD.md).

Users need to create a [Broker load](../../../sql-manual/sql-reference/Data-Manipulation-Statements/Load/BROKER-LOAD.md) import job through the MySQL protocol and check the import result with the show load command.

## Applicable scene
@@ -245,7 +247,7 @@ LOAD LABEL demo.label_20220402
)
with HDFS (
"fs.defaultFS"="hdfs://10.220.147.151:8020",
"hadoop.username"="root"
"hdfs_user"="root"
)
PROPERTIES
(
@@ -407,19 +409,6 @@ Currently the Profile can only be viewed after the job has been successfully exe
Please refer to the Best Practices section in the document to modify the FE configuration items `max_bytes_per_broker_scanner` and `max_broker_concurrency`
- `org.apache.thrift.transport.TTransportException: java.net.SocketException: Broken pipe` during import
The reason for this problem may be that when importing data from external storage (such as HDFS), there are too many files in the directory and listing the file directory takes too long. The Broker RPC timeout defaults to 10 seconds, so the timeout needs to be adjusted appropriately here.
Modify the `fe.conf` configuration file to add the following parameters:
````
broker_timeout_ms = 10000
##The default here is 10 seconds, you need to increase this parameter appropriately
````
Adding parameters here requires restarting the FE service.
- Import error: `failed to send batch` or `TabletWriter add batch with unknown id`
Modify `query_timeout` and `streaming_load_rpc_max_alive_time_sec` appropriately.
@@ -445,11 +434,6 @@ Currently the Profile can only be viewed after the job has been successfully exe
Note: If you use orc files directly generated by some Hive versions, the header in the orc file is not Hive metadata but (_col0, _col1, _col2, ...), which may cause an Invalid Column Name error; in that case you need to use set to map the columns.
- Import error: `Login failure for xxx from keytab xxx.keytab`
The reason for this problem is that the broker fails Kerberos authentication when accessing the Kerberos-authenticated cluster during import. First, make sure that `kerberos_principal` and `kerberos_keytab` are configured correctly. If they are correct, you need to add -Djava.security.krb5.conf=/xxx/krb5.conf to the JAVA_OPTS="" and JAVA_OPTS_FOR_JDK_9="" parameters in fe.conf. You also need to copy hdfs-site.xml from Hadoop to broker/conf.
## More help
For more detailed syntax and best practices of Broker Load, see the [Broker Load](../../../sql-manual/sql-reference/Data-Manipulation-Statements/Load/BROKER-LOAD.md) command manual. You can also enter `HELP BROKER LOAD` on the MySQL client command line for more help information.
12 changes: 6 additions & 6 deletions docs/en/docs/data-operate/import/import-way/load-json-format.md
@@ -443,12 +443,12 @@ code INT NULL
```bash
curl --location-trusted -u user:passwd -H "format: json" -H "read_json_by_line: true" -T data.json http://localhost:8030/api/db1/tbl1/_stream_load
```
Import result:
100 beijing 1
101 shanghai NULL
102 tianjin 3
103 chongqing 4
5. Transform the imported data
@@ -83,7 +83,7 @@ Currently we only support routine load from the Kafka system. This section detai

1. Support unauthenticated Kafka access and Kafka clusters certified by SSL.
2. The supported message format is csv text or json format. Each message is a line in csv format, and the end of the line does not contain a line break.
3. Kafka 0.10.0.0 (inclusive) or above is supported by default. If you want to use Kafka versions below 0.10.0.0 (0.9.0.x, 0.8.x.y), you need to modify the configuration of be, set the value of kafka_broker_version_fallback to be the older version and set the value of kafka_api_version_request to be false, or directly set the value of property.broker.version.fallback to the old version and set the value of property.api.version.request to be false when creating routine load. The cost of the old version is that some new features of routine load may not be available, such as setting the offset of the kafka partition by time.
3. Kafka 0.10.0.0 (inclusive) or above is supported by default. If you want to use Kafka versions below 0.10.0.0 (0.9.0, 0.8.2, 0.8.1, 0.8.0), you need to modify the configuration of be, set the value of kafka_broker_version_fallback to be the older version, or directly set the value of property.broker.version.fallback to the old version when creating routine load. The cost of the old version is that some of the new features of routine load may not be available, such as setting the offset of the kafka partition by time.
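As a sketch of the second approach, the job below sets the fallback version directly when creating the routine load; the broker list, topic, table names, and concurrency value are placeholders.

```sql
-- A sketch of creating a routine load job against an old Kafka cluster
-- (e.g. 0.9.0); broker list, topic, and table names are placeholders.
CREATE ROUTINE LOAD example_db.job_old_kafka ON example_tbl
COLUMNS TERMINATED BY ","
PROPERTIES
(
    "desired_concurrent_number" = "3"
)
FROM KAFKA
(
    "kafka_broker_list" = "broker1:9092,broker2:9092",
    "kafka_topic" = "my_topic",
    -- fall back to the protocol of the old Kafka version
    "property.broker.version.fallback" = "0.9.0"
);
```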

### Create a routine load task

Expand Down Expand Up @@ -115,7 +115,7 @@ The detailed syntax for creating a routine load task can be connected to Doris a

`desired_concurrent_number` is used to specify the degree of concurrency expected for a routine job. That is, a job, at most how many tasks are executing at the same time. For Kafka load, the current actual concurrency is calculated as follows:

```
Min(partition num, desired_concurrent_number, Config.max_routine_load_task_concurrent_num)
```
@@ -28,6 +28,10 @@ under the License.

Spark load preprocesses the data to be loaded with Spark, which improves the performance of loading large amounts of data into Doris and saves the computing resources of the Doris cluster. It is mainly used for initial migration and for importing large amounts of data into Doris.

Spark load uses the resources of the Spark cluster to sort the data to be imported, and Doris BE writes the files directly, which can greatly reduce the resource usage of the Doris cluster. It works very well for migrating massive historical data while keeping the resource usage and load of the Doris cluster low.

If users do not have Spark cluster resources and want to migrate historical data from external storage conveniently and quickly, they can use [Broker load](./BROKER-LOAD.md). Compared with Spark load, Broker load consumes more resources on the Doris cluster.

Spark load is an asynchronous load method. Users need to create a Spark-type load job through the MySQL protocol and view the load result with `show load`.
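A minimal sketch of such a job is shown below. It assumes a Spark resource named `spark0` has already been created in the cluster; the HDFS path, table names, and the executor memory override are placeholders.

```sql
-- A Spark load sketch; assumes an existing Spark resource named "spark0".
-- Paths and table names are placeholders.
LOAD LABEL example_db.label_spark_load
(
    DATA INFILE("hdfs://namenode_host:8020/user/doris/history_data/*")
    INTO TABLE example_tbl
    COLUMNS TERMINATED BY ","
)
WITH RESOURCE 'spark0'
(
    -- optional override of the Spark resource configuration
    "spark.executor.memory" = "2g"
)
PROPERTIES
(
    "timeout" = "3600"
);
```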

## Applicable scenarios
@@ -141,9 +141,9 @@ The number of rows in the original file = `dpp.abnorm.ALL + dpp.norm.ALL`

The filter conditions specified for the import task. Stream load supports filtering the raw data with a specified where statement. The filtered data will not be imported and will not participate in the calculation of the filter ratio, but it will be counted in `num_rows_unselected`.

+ partition
+ partitions

Partition information for tables to be imported will not be imported if the data to be imported does not belong to the specified Partition. These data will be included in `dpp.abnorm.ALL`.
Partition information of the table to be imported. Data that does not belong to the specified partitions will not be imported and will be counted in `dpp.abnorm.ALL`.

+ columns

@@ -175,7 +175,7 @@ The number of rows in the original file = `dpp.abnorm.ALL + dpp.norm.ALL`
The default two-phase bulk transaction commit is off.
> **Open method:** Configure `disable_stream_load_2pc=false` in be.conf (restart takes effect) and declare `two_phase_commit=true` in HEADER.
> **Open method:** Configure `disable_stream_load_2pc=false` in be.conf and declare `two_phase_commit=true` in HEADER.
Example:
@@ -216,7 +216,7 @@ The number of rows in the original file = `dpp.abnorm.ALL + dpp.norm.ALL`
"status": "Success",
"msg": "transaction [18037] abort successfully."
}
```

### Return results

@@ -304,7 +304,6 @@ Users can view completed stream load tasks through `show stream load`.

By default, BE does not record Stream Load records. If you want to view them, you need to enable the BE configuration parameter `enable_stream_load_record=true`. For details, please refer to [BE Configuration Items](https://doris.apache.org/zh-CN/docs/admin-manual/config/be-config)


## Relevant System Configuration

### FE configuration
14 changes: 13 additions & 1 deletion docs/en/docs/data-table/basic-usage.md
@@ -33,7 +33,19 @@ Doris uses MySQL protocol to communicate. Users can connect to Doris cluster thr

### Root User Logon and Password Modification

Doris has built-in root and admin users, and the password is empty by default. After starting the Doris program, you can connect to the Doris cluster through root or admin users.
Doris has built-in root and admin users, and the password is empty by default.

>Remarks:
>
>The default root and admin users provided by Doris are admin users
>
>The root user has all the privileges of the cluster by default. Users who have both Grant_priv and Node_priv can grant these privileges to other users and have node change permissions, including operations such as adding, deleting, and taking offline FE, BE, and BROKER nodes.
>
>The admin user has ADMIN_PRIV and GRANT_PRIV privileges.
>
>For specific instructions on permissions, please refer to [Permission Management](/docs/admin-manual/privilege-ldap/user-privilege)
After starting the Doris program, you can connect to the Doris cluster through root or admin users.
Use the following command to log in to Doris:

```sql
4 changes: 2 additions & 2 deletions docs/en/docs/faq/install-faq.md
@@ -57,7 +57,7 @@ A metadata log needs to be successfully written in most Follower nodes to be con

The role of the Observer is just what the word suggests: it only acts as an observer that synchronizes metadata logs that have already been successfully written and provides metadata reading services. It is not involved in the majority-write logic.

Typically, 1 Follower + 2 Observer or 3 Follower + N Observer can be deployed. The former is simple to operate and maintain, and there is almost no consistency agreement between followers to cause such complex error situations (most of Baidu's internal clusters use this method). The latter can ensure the high availability of metadata writing. If it is a high concurrent query scenario, Observer can be added appropriately.
Typically, 1 Follower + 2 Observer or 3 Follower + N Observer can be deployed. The former is simple to operate and maintain, and the consistency protocol among Followers almost never causes complex error situations (most companies use this method). The latter ensures high availability of metadata writing. If it is a high-concurrency query scenario, Observers can be added appropriately.
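For reference, FE nodes of either role are added through statements roughly like the sketch below; the hosts and the edit_log_port value are placeholders.

```sql
-- Add a Follower FE (host and edit_log_port are placeholders).
ALTER SYSTEM ADD FOLLOWER "fe_host2:9010";

-- Add an Observer FE to scale metadata reads.
ALTER SYSTEM ADD OBSERVER "fe_host3:9010";
```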

### Q4. A new disk is added to the node, why is the data not balanced to the new disk?

@@ -291,8 +291,8 @@ ERROR 1105 (HY000): errCode = 2, detailMessage = driver connect Error: HY000 [My
```
The solution is to use the `Connector/ODBC 8.0.28` version of the ODBC Connector and select `Linux - Generic` as the operating system; this version of the ODBC driver uses openssl 1.1. Alternatively, use a lower version of the ODBC connector, e.g. [Connector/ODBC 5.3.14](https://dev.mysql.com/downloads/connector/odbc/5.3.html). For details, see the [ODBC external table documentation](../ecosystem/external-table/odbc-of-doris.md).


You can verify the version of openssl used by MySQL ODBC Driver by

```
ldd /path/to/libmyodbc8w.so |grep libssl.so
```
4 changes: 2 additions & 2 deletions docs/en/docs/install/install-deploy.md
@@ -246,7 +246,7 @@ See the section on `lower_case_table_names` variables in [Variables](../advanced

#### (Optional) FS_Broker deployment

Broker is deployed as a plug-in, independent of Doris. If you need to import data from a third-party storage system, you need to deploy the corresponding Broker. By default, it provides fs_broker to read HDFS ,Baidu cloud BOS and Amazon S3. Fs_broker is stateless and it is recommended that each FE and BE node deploy a Broker.
Broker is deployed as a plug-in, independent of Doris. If you need to import data from a third-party storage system, you need to deploy the corresponding Broker. By default, fs_broker is provided to read HDFS and object storage (supporting the S3 protocol). fs_broker is stateless, and it is recommended that each FE and BE node deploy a Broker.
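Once a Broker process has been deployed and started according to the steps below, it is typically registered with the cluster and checked with statements roughly like this sketch; the broker name, host, and IPC port are placeholders.

```sql
-- Register a started Broker process with the cluster
-- (broker name, host, and broker_ipc_port are placeholders).
ALTER SYSTEM ADD BROKER broker_name "broker_host:8000";

-- Check the registered Brokers and their status.
SHOW PROC "/brokers";
```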

* Copy the corresponding Broker directory in the output directory of the source fs_broker to all the nodes that need to be deployed. It is recommended to maintain the same level as the BE or FE directories.

@@ -491,6 +491,6 @@ Broker is a stateless process that can be started or stopped at will. Of course,

```shell
vim /etc/supervisord.conf

minfds=65535 ; (min. avail startup file descriptors;default 1024)
```
@@ -43,7 +43,7 @@ CREATE USER user_identity [IDENTIFIED BY 'password'] [DEFAULT ROLE 'role_name']

In Doris, a user_identity uniquely identifies a user. user_identity consists of two parts, user_name and host, where user_name is the username and host identifies the host address from which the client connects. The host part can use % for fuzzy matching. If no host is specified, it defaults to '%', which means the user can connect to Doris from any host.

The host part can also be specified as a domain, the syntax is: 'user_name'@['domain'], even if it is surrounded by square brackets, Doris will think this is a domain and try to resolve its ip address. Currently, only Baidu's internal BNS resolution is supported.
The host part can also be specified as a domain, with the syntax 'user_name'@['domain']. Even though it is surrounded by square brackets, Doris will treat this as a domain and try to resolve its IP addresses.

If a role (ROLE) is specified, the newly created user will be automatically granted the permissions of the role. If not specified, the user has no permissions by default. The specified ROLE must already exist.
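A few sketches of these forms are shown below; the user names, password, host values, domain, and role name are all placeholders.

```sql
-- Allow connections from any host (host defaults to '%').
CREATE USER 'jack' IDENTIFIED BY '12345';

-- Restrict connections to a single IP.
CREATE USER 'jack'@'192.168.1.10' IDENTIFIED BY '12345';

-- Use a domain that resolves to a set of IPs.
CREATE USER 'jack'@['example.domain.name'] IDENTIFIED BY '12345';

-- Create a role first, then grant it as the default role of a new user.
CREATE ROLE analyst;
CREATE USER 'tom'@'%' IDENTIFIED BY '12345' DEFAULT ROLE 'analyst';
```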

