diff --git a/docs/en/docs/admin-manual/privilege-ldap/user-privilege.md b/docs/en/docs/admin-manual/privilege-ldap/user-privilege.md index 329b717ef5d6f1..b5db18346c0b28 100644 --- a/docs/en/docs/admin-manual/privilege-ldap/user-privilege.md +++ b/docs/en/docs/admin-manual/privilege-ldap/user-privilege.md @@ -34,7 +34,7 @@ Doris's new privilege management system refers to Mysql's privilege management m In a permission system, a user is identified as a User Identity. User ID consists of two parts: username and userhost. Username is a user name, which is composed of English upper and lower case. Userhost represents the IP from which the user link comes. User_identity is presented as username@'userhost', representing the username from userhost. - Another expression of user_identity is username@['domain'], where domain is the domain name, which can be resolved into a set of IPS by DNS BNS (Baidu Name Service). The final expression is a set of username@'userhost', so we use username@'userhost'to represent it. + Another expression of user_identity is username@['domain'], where domain is the domain name, which can be resolved into a set of IPs by DNS. The final expression is a set of username@'userhost', so we use username@'userhost' to represent it. 2. Privilege diff --git a/docs/en/docs/advanced/broker.md b/docs/en/docs/advanced/broker.md index 2212839a3ddc70..7b463aaf020d0d 100644 --- a/docs/en/docs/advanced/broker.md +++ b/docs/en/docs/advanced/broker.md @@ -63,14 +63,8 @@ Different types of brokers support different storage systems. * Support simple authentication access * Support kerberos authentication access * Support HDFS HA mode access - -2. Baidu HDFS / AFS (not supported by open source version) - - * Support UGI simple authentication access - -3. Baidu Object Storage BOS (not supported by open source version) - - * Support AK / SK authentication access +2. Object storage + * All object storage systems that support the S3 protocol ## Function provided by Broker @@ -200,4 +194,3 @@ Authentication information is usually provided as a Key-Value in the Property Ma ) ``` The configuration for accessing the HDFS cluster can be written to the hdfs-site.xml file. When users use the Broker process to read data from the HDFS cluster, they only need to fill in the cluster file path and authentication information. - diff --git a/docs/en/docs/data-operate/export/export-manual.md b/docs/en/docs/data-operate/export/export-manual.md index dbe05e7e9f86f5..221f2e1eebf887 100644 --- a/docs/en/docs/data-operate/export/export-manual.md +++ b/docs/en/docs/data-operate/export/export-manual.md @@ -191,7 +191,7 @@ Usually, a query plan for an Export job has only two parts `scan`- `export`, and * If the amount of table data is too large, it is recommended to export it by partition. * During the operation of the Export job, if FE restarts or cuts the master, the Export job will fail, requiring the user to resubmit. * If the Export job fails, the `__doris_export_tmp_xxx` temporary directory generated in the remote storage and the generated files will not be deleted, requiring the user to delete them manually. -* If the Export job runs successfully, the `__doris_export_tmp_xxx` directory generated in the remote storage may be retained or cleared according to the file system semantics of the remote storage. For example, in Baidu Object Storage (BOS), after removing the last file in a directory through rename operation, the directory will also be deleted. If the directory is not cleared, the user can clear it manually. 
+* If the Export job runs successfully, the `__doris_export_tmp_xxx` directory generated in the remote storage may be retained or cleared according to the file system semantics of the remote storage. For example, in object storage (supporting the S3 protocol), after removing the last file in a directory through a rename operation, the directory will also be deleted. If the directory is not cleared, the user can clear it manually. * When the Export runs successfully or fails, the FE reboots or cuts, then some information of the jobs displayed by `SHOW EXPORT` will be lost and cannot be viewed. * Export jobs only export data from Base tables, not Rollup Index. * Export jobs scan data and occupy IO resources, which may affect the query latency of the system. diff --git a/docs/en/docs/data-operate/import/import-scenes/external-storage-load.md b/docs/en/docs/data-operate/import/import-scenes/external-storage-load.md index 27eb0fb3cbef8d..f1e0146941617b 100644 --- a/docs/en/docs/data-operate/import/import-scenes/external-storage-load.md +++ b/docs/en/docs/data-operate/import/import-scenes/external-storage-load.md @@ -26,7 +26,7 @@ under the License. # External storage data import -The following mainly introduces how to import data stored in an external system. For example (HDFS, AWS S3, BOS of Baidu Cloud, OSS of Alibaba Cloud, COS of Tencent Cloud) +The following mainly introduces how to import data stored in an external system, for example HDFS or any object storage that supports the S3 protocol. ## HDFS LOAD ### Ready to work @@ -111,7 +111,7 @@ Hdfs load creates an import statement. The import method is basically the same a Starting from version 0.14, Doris supports the direct import of data from online storage systems that support the S3 protocol through the S3 protocol. -This document mainly introduces how to import data stored in AWS S3. It also supports the import of other object storage systems that support the S3 protocol, such as Baidu Cloud’s BOS, Alibaba Cloud’s OSS and Tencent Cloud’s COS, etc. +This document mainly introduces how to import data stored in AWS S3. It also supports importing from other object storage systems that support the S3 protocol. ### Applicable scenarios * Source data in S3 protocol accessible storage systems, such as S3, BOS. diff --git a/docs/en/docs/data-operate/import/import-way/binlog-load-manual.md b/docs/en/docs/data-operate/import/import-way/binlog-load-manual.md index e7e90aa2eba3d2..061bfff950cfc1 100644 --- a/docs/en/docs/data-operate/import/import-way/binlog-load-manual.md +++ b/docs/en/docs/data-operate/import/import-way/binlog-load-manual.md @@ -482,9 +482,6 @@ Users can control the status of jobs through `stop/pause/resume` commands. You can use [STOP SYNC JOB](../../../sql-manual/sql-reference/Data-Manipulation-Statements/Load/STOP-SYNC-JOB.md) ; [PAUSE SYNC JOB](../../../sql-manual/sql-reference/Data-Manipulation-Statements/Load/PAUSE-SYNC-JOB.md); And [RESUME SYNC JOB](../../../sql-manual/sql-reference/Data-Manipulation-Statements/Load/RESUME-SYNC-JOB.md); commands to view help and examples. 
-## Case Combat - -[How to use Apache Doris Binlog Load and examples](https://doris.apache.org/blogs/PracticalCases/doris-binlog-load.html) ## Related Parameters diff --git a/docs/en/docs/data-operate/import/import-way/broker-load-manual.md b/docs/en/docs/data-operate/import/import-way/broker-load-manual.md index 4601d0179ccbb1..946e70bf979948 100644 --- a/docs/en/docs/data-operate/import/import-way/broker-load-manual.md +++ b/docs/en/docs/data-operate/import/import-way/broker-load-manual.md @@ -28,6 +28,8 @@ under the License. Broker load is an asynchronous import method, and the supported data sources depend on the data sources supported by the [Broker](../../../advanced/broker.md) process. +Because data in a Doris table is stored in order, Broker load needs to use Doris cluster resources to sort the data while importing it. Compared with Spark load, migrating massive historical data this way consumes considerably more Doris cluster resources, so this method is intended for users who do not have Spark computing resources. If Spark computing resources are available, it is recommended to use [Spark load](./SPARK-LOAD.md). + Users need to create [Broker load](../../../sql-manual/sql-reference/Data-Manipulation-Statements/Load/BROKER-LOAD.md) import through MySQL protocol and import by viewing command to check the import result. ## Applicable scene @@ -245,7 +247,7 @@ LOAD LABEL demo.label_20220402 ) with HDFS ( "fs.defaultFS"="hdfs://10.220.147.151:8020", - "hadoop.username"="root" + "hdfs_user"="root" ) PROPERTIES ( @@ -407,19 +409,6 @@ Currently the Profile can only be viewed after the job has been successfully exe Please refer to the Best Practices section in the document to modify the FE configuration items `max_bytes_per_broker_scanner` and `max_broker_concurrency` -- `org.apache.thrift.transport.TTransportException: java.net.SocketException: Broken pipe` during import - - The reason for this problem may be that when importing data from external storage (such as HDFS), because there are too many files in the directory, it takes too long to list the file directory. Here, the Broker RPC Timeout defaults to 10 seconds, and the timeout needs to be adjusted appropriately here. time. - - Modify the `fe.conf` configuration file to add the following parameters: - - ```` - broker_timeout_ms = 10000 - ##The default here is 10 seconds, you need to increase this parameter appropriately - ```` - - Adding parameters here requires restarting the FE service. - - Import error: `failed to send batch` or `TabletWriter add batch with unknown id` Modify `query_timeout` and `streaming_load_rpc_max_alive_time_sec` appropriately. @@ -445,11 +434,6 @@ Currently the Profile can only be viewed after the job has been successfully exe Note: If you use the orc file directly generated by some hive versions, the header in the orc file is not hive meta data, but (_col0, _col1, _col2, ...), which may cause Invalid Column Name error, then you need to use set to map -- Import error: `Login failure for xxx from keytab xxx.keytab` - - The reason for this problem is that when the broker accesses the kerberos authenticated cluster during import, the authentication fails. First, make sure that `kerberos_principal` and `kerberos_keytab` are configured correctly. 
If there is no problem, you need to set JAVA_OPTS="" in fe.conf - Add -Djava.security.krb5.conf=/xxx/krb5.conf to the JAVA_OPTS_FOR_JDK_9="" parameter,You also need to copy hdfs-site.xml in hadoop to broker/conf - ## more help For more detailed syntax and best practices used by Broker Load, see [Broker Load](../../../sql-manual/sql-reference/Data-Manipulation-Statements/Load/BROKER-LOAD.md) command manual, you can also enter `HELP BROKER LOAD` in the MySql client command line for more help information. diff --git a/docs/en/docs/data-operate/import/import-way/load-json-format.md b/docs/en/docs/data-operate/import/import-way/load-json-format.md index 4c12419a175c07..eafdc543e44e0f 100644 --- a/docs/en/docs/data-operate/import/import-way/load-json-format.md +++ b/docs/en/docs/data-operate/import/import-way/load-json-format.md @@ -443,12 +443,12 @@ code INT NULL ```bash curl --location-trusted -u user:passwd -H "format: json" -H "read_json_by_line: true" -T data.json http://localhost:8030/api/db1/tbl1/_stream_load ``` -​ Import result: - - 100 beijing 1 - 101 shanghai NULL - 102 tianjin 3 - 103 chongqing 4 + Import result: + + 100 beijing 1 + 101 shanghai NULL + 102 tianjin 3 + 103 chongqing 4 5. Transform the imported data diff --git a/docs/en/docs/data-operate/import/import-way/routine-load-manual.md b/docs/en/docs/data-operate/import/import-way/routine-load-manual.md index 334f9c5f7ee5a1..3d7ce77fd5539b 100644 --- a/docs/en/docs/data-operate/import/import-way/routine-load-manual.md +++ b/docs/en/docs/data-operate/import/import-way/routine-load-manual.md @@ -83,7 +83,7 @@ Currently we only support routine load from the Kafka system. This section detai 1. Support unauthenticated Kafka access and Kafka clusters certified by SSL. 2. The supported message format is csv text or json format. Each message is a line in csv format, and the end of the line does not contain a ** line break. -3. Kafka 0.10.0.0 (inclusive) or above is supported by default. If you want to use Kafka versions below 0.10.0.0 (0.9.0.x, 0.8.x.y), you need to modify the configuration of be, set the value of kafka_broker_version_fallback to be the older version and set the value of kafka_api_version_request to be false, or directly set the value of property.broker.version.fallback to the old version and set the value of property.api.version.request to be false when creating routine load. The cost of the old version is that some new features of routine load may not be available, such as setting the offset of the kafka partition by time. +3. Kafka 0.10.0.0 (inclusive) or above is supported by default. If you want to use Kafka versions below 0.10.0.0 (0.9.0, 0.8.2, 0.8.1, 0.8.0), you need to modify the configuration of be, set the value of kafka_broker_version_fallback to be the older version, or directly set the value of property.broker.version.fallback to the old version when creating routine load. The cost of the old version is that some of the new features of routine load may not be available, such as setting the offset of the kafka partition by time. ### Create a routine load task @@ -115,7 +115,7 @@ The detailed syntax for creating a routine load task can be connected to Doris a `desired_concurrent_number` is used to specify the degree of concurrency expected for a routine job. That is, a job, at most how many tasks are executing at the same time. 
For Kafka load, the current actual concurrency is calculated as follows: - ``` + ``` Min(partition num, desired_concurrent_number, Config.max_routine_load_task_concurrent_num) ``` diff --git a/docs/en/docs/data-operate/import/import-way/spark-load-manual.md b/docs/en/docs/data-operate/import/import-way/spark-load-manual.md index 76625addbde590..d801d3af546258 100644 --- a/docs/en/docs/data-operate/import/import-way/spark-load-manual.md +++ b/docs/en/docs/data-operate/import/import-way/spark-load-manual.md @@ -28,6 +28,10 @@ under the License. Spark load realizes the preprocessing of load data by spark, improves the performance of loading large amount of Doris data and saves the computing resources of Doris cluster. It is mainly used for the scene of initial migration and large amount of data imported into Doris. +Spark load uses the resources of a Spark cluster to sort the data to be imported, and the Doris BE writes the resulting files directly. This greatly reduces the resource usage of the Doris cluster and works very well for lowering the resource usage and load of the Doris cluster when migrating massive amounts of historical data. + +If users do not have Spark cluster resources but still want to migrate historical data from external storage conveniently and quickly, they can use [Broker load](./BROKER-LOAD.md). Compared with Spark load, a Broker load import consumes more resources on the Doris cluster. + Spark load is an asynchronous load method. Users need to create spark type load job by MySQL protocol and view the load results by `show load`. ## Applicable scenarios diff --git a/docs/en/docs/data-operate/import/import-way/stream-load-manual.md b/docs/en/docs/data-operate/import/import-way/stream-load-manual.md index 54ba913b110e26..7e92f1b1f9e2fb 100644 --- a/docs/en/docs/data-operate/import/import-way/stream-load-manual.md +++ b/docs/en/docs/data-operate/import/import-way/stream-load-manual.md @@ -141,9 +141,9 @@ The number of rows in the original file = `dpp.abnorm.ALL + dpp.norm.ALL` Import the filter conditions specified by the task. Stream load supports filtering of where statements specified for raw data. The filtered data will not be imported or participated in the calculation of filter ratio, but will be counted as `num_rows_unselected`. -+ partition ++ partitions - Partition information for tables to be imported will not be imported if the data to be imported does not belong to the specified Partition. These data will be included in `dpp.abnorm.ALL`. + The partition information of the table to be imported. Data that does not belong to the specified partitions will not be imported and will be counted in `dpp.abnorm.ALL`. + columns @@ -175,7 +175,7 @@ The number of rows in the original file = `dpp.abnorm.ALL + dpp.norm.ALL` The default two-phase bulk transaction commit is off. - > **Open method:** Configure `disable_stream_load_2pc=false` in be.conf (restart takes effect) and declare `two_phase_commit=true` in HEADER. + > **Open method:** Configure `disable_stream_load_2pc=false` in be.conf and declare `two_phase_commit=true` in HEADER. Example: @@ -216,7 +216,7 @@ The number of rows in the original file = `dpp.abnorm.ALL + dpp.norm.ALL` "status": "Success", "msg": "transaction [18037] abort successfully." } - ``` + ``` ### Return results @@ -304,7 +304,6 @@ Users can view completed stream load tasks through `show stream load`. By default, BE does not record Stream Load records. 
If you want to view records that need to be enabled on BE, the configuration parameter is: `enable_stream_load_record=true`. For details, please refer to [BE Configuration Items](https://doris.apache.org/zh-CN/docs/admin-manual/config/be-config) - ## Relevant System Configuration ### FE configuration diff --git a/docs/en/docs/data-table/basic-usage.md b/docs/en/docs/data-table/basic-usage.md index 1212b6213a199c..e35ff93730c6ec 100644 --- a/docs/en/docs/data-table/basic-usage.md +++ b/docs/en/docs/data-table/basic-usage.md @@ -33,7 +33,19 @@ Doris uses MySQL protocol to communicate. Users can connect to Doris cluster thr ### Root User Logon and Password Modification -Doris has built-in root and admin users, and the password is empty by default. After starting the Doris program, you can connect to the Doris cluster through root or admin users. +Doris has built-in root and admin users, and the password is empty by default. + +>Remarks: +> +>The default root and admin users provided by Doris are administrator accounts +> +>The root user has all cluster privileges by default. Users who have both Grant_priv and Node_priv can grant these privileges to other users and have node change permissions, including adding, deleting, and decommissioning FE, BE, and BROKER nodes. +> +>The admin user has the ADMIN_PRIV and GRANT_PRIV privileges +> +>For specific instructions on permissions, please refer to [Permission Management](/docs/admin-manual/privilege-ldap/user-privilege) + +After starting the Doris program, you can connect to the Doris cluster through root or admin users. Use the following command to log in to Doris: ```sql diff --git a/docs/en/docs/faq/install-faq.md b/docs/en/docs/faq/install-faq.md index 311b0cf8d11b8d..dd3fb581ef6eaf 100644 --- a/docs/en/docs/faq/install-faq.md +++ b/docs/en/docs/faq/install-faq.md @@ -57,7 +57,7 @@ A metadata log needs to be successfully written in most Follower nodes to be con The role of Observer is the same as the meaning of this word. It only acts as an observer to synchronize the metadata logs that have been successfully written, and provides metadata reading services. He will not be involved in the logic of the majority writing. -Typically, 1 Follower + 2 Observer or 3 Follower + N Observer can be deployed. The former is simple to operate and maintain, and there is almost no consistency agreement between followers to cause such complex error situations (most of Baidu's internal clusters use this method). The latter can ensure the high availability of metadata writing. If it is a high concurrent query scenario, Observer can be added appropriately. +Typically, 1 Follower + 2 Observer or 3 Follower + N Observer can be deployed. The former is simple to operate and maintain, and complex error situations caused by the consistency protocol between Followers almost never occur (most deployments use this method). The latter can ensure the high availability of metadata writing. If it is a high concurrent query scenario, Observer can be added appropriately. ### Q4. A new disk is added to the node, why is the data not balanced to the new disk? @@ -291,8 +291,8 @@ ERROR 1105 (HY000): errCode = 2, detailMessage = driver connect Error: HY000 [My ``` The solution is to use the `Connector/ODBC 8.0.28` version of ODBC Connector and select `Linux - Generic` in the operating system, this version of ODBC Driver uses openssl version 1.1. Or use a lower version of ODBC connector, e.g. [Connector/ODBC 5.3.14](https://dev.mysql.com/downloads/connector/odbc/5.3.html). 
For details, see the [ODBC exterior documentation](../ecosystem/external-table/odbc-of-doris.md). - You can verify the version of openssl used by MySQL ODBC Driver by + ``` ldd /path/to/libmyodbc8w.so |grep libssl.so ``` diff --git a/docs/en/docs/install/install-deploy.md b/docs/en/docs/install/install-deploy.md index e0d1a9517b4243..7eec424d1127b4 100644 --- a/docs/en/docs/install/install-deploy.md +++ b/docs/en/docs/install/install-deploy.md @@ -246,7 +246,7 @@ See the section on `lower_case_table_names` variables in [Variables](../advanced #### (Optional) FS_Broker deployment -Broker is deployed as a plug-in, independent of Doris. If you need to import data from a third-party storage system, you need to deploy the corresponding Broker. By default, it provides fs_broker to read HDFS ,Baidu cloud BOS and Amazon S3. Fs_broker is stateless and it is recommended that each FE and BE node deploy a Broker. +Broker is deployed as a plug-in, independent of Doris. If you need to import data from a third-party storage system, you need to deploy the corresponding Broker. By default, it provides fs_broker to read HDFS and object storage (supporting the S3 protocol). fs_broker is stateless, and it is recommended that each FE and BE node deploy a Broker. * Copy the corresponding Broker directory in the output directory of the source fs_broker to all the nodes that need to be deployed. It is recommended to maintain the same level as the BE or FE directories. @@ -491,6 +491,6 @@ Broker is a stateless process that can be started or stopped at will. Of course, ```shell vim /etc/supervisord.conf - + minfds=65535 ; (min. avail startup file descriptors;default 1024) ``` \ No newline at end of file diff --git a/docs/en/docs/sql-manual/sql-reference/Account-Management-Statements/CREATE-USER.md b/docs/en/docs/sql-manual/sql-reference/Account-Management-Statements/CREATE-USER.md index acac27cf5583ac..62db8a2f6bee10 100644 --- a/docs/en/docs/sql-manual/sql-reference/Account-Management-Statements/CREATE-USER.md +++ b/docs/en/docs/sql-manual/sql-reference/Account-Management-Statements/CREATE-USER.md @@ -43,7 +43,7 @@ CREATE USER user_identity [IDENTIFIED BY 'password'] [DEFAULT ROLE 'role_name'] In Doris, a user_identity uniquely identifies a user. user_identity consists of two parts, user_name and host, where username is the username. host Identifies the host address where the client connects. The host part can use % for fuzzy matching. If no host is specified, it defaults to '%', which means the user can connect to Doris from any host. -The host part can also be specified as a domain, the syntax is: 'user_name'@['domain'], even if it is surrounded by square brackets, Doris will think this is a domain and try to resolve its ip address. Currently, only Baidu's internal BNS resolution is supported. +The host part can also be specified as a domain, with the syntax 'user_name'@['domain']; that is, when the host is surrounded by square brackets, Doris treats it as a domain and tries to resolve its IP addresses. If a role (ROLE) is specified, the newly created user will be automatically granted the permissions of the role. If not specified, the user has no permissions by default. The specified ROLE must already exist. diff --git a/docs/zh-CN/docs/advanced/broker.md b/docs/zh-CN/docs/advanced/broker.md index b63432c5f4d90c..b7bd374efbcd0b 100644 --- a/docs/zh-CN/docs/advanced/broker.md +++ b/docs/zh-CN/docs/advanced/broker.md @@ -60,12 +60,8 @@ Broker 在 Doris 系统架构中的位置如下: - 支持简单认证访问 - 支持通过 kerberos 认证访问 - 支持 HDFS HA 模式访问 -2. 
百度 HDFS/AFS(开源版本不支持) - - 支持通过 ugi 简单认证访问 -3. 百度对象存储 BOS(开源版本不支持) - - 支持通过 AK/SK 认证访问 - -## 需要 Broker 的操作 +2. 对象存储 + - 所有支持 S3 协议的对象存储 1. [Broker Load](../data-operate/import/import-way/broker-load-manual.md) 2. [数据导出(Export)](../data-operate/export/export-manual.md) diff --git a/docs/zh-CN/docs/data-operate/export/export-manual.md b/docs/zh-CN/docs/data-operate/export/export-manual.md index 94d7de39b8b414..267061abd8e566 100644 --- a/docs/zh-CN/docs/data-operate/export/export-manual.md +++ b/docs/zh-CN/docs/data-operate/export/export-manual.md @@ -184,7 +184,7 @@ FinishTime: 2019-06-25 17:08:34 * 如果表数据量过大,建议按照分区导出。 * 在 Export 作业运行过程中,如果 FE 发生重启或切主,则 Export 作业会失败,需要用户重新提交。 * 如果 Export 作业运行失败,在远端存储中产生的 `__doris_export_tmp_xxx` 临时目录,以及已经生成的文件不会被删除,需要用户手动删除。 -* 如果 Export 作业运行成功,在远端存储中产生的 `__doris_export_tmp_xxx` 目录,根据远端存储的文件系统语义,可能会保留,也可能会被清除。比如在百度对象存储(BOS)中,通过 rename 操作将一个目录中的最后一个文件移走后,该目录也会被删除。如果该目录没有被清除,用户可以手动清除。 +* 如果 Export 作业运行成功,在远端存储中产生的 `__doris_export_tmp_xxx` 目录,根据远端存储的文件系统语义,可能会保留,也可能会被清除。比如在对象存储(支持 S3 协议)中,通过 rename 操作将一个目录中的最后一个文件移走后,该目录也会被删除。如果该目录没有被清除,用户可以手动清除。 * 当 Export 运行完成后(成功或失败),FE 发生重启或切主,则 [SHOW EXPORT](../../sql-manual/sql-reference/Show-Statements/SHOW-EXPORT.md) 展示的作业的部分信息会丢失,无法查看。 * Export 作业只会导出 Base 表的数据,不会导出 Rollup Index 的数据。 * Export 作业会扫描数据,占用 IO 资源,可能会影响系统的查询延迟。 diff --git a/docs/zh-CN/docs/data-operate/import/import-scenes/external-storage-load.md b/docs/zh-CN/docs/data-operate/import/import-scenes/external-storage-load.md index 9887459ff9ca7c..8496f1d56e8c87 100644 --- a/docs/zh-CN/docs/data-operate/import/import-scenes/external-storage-load.md +++ b/docs/zh-CN/docs/data-operate/import/import-scenes/external-storage-load.md @@ -26,7 +26,7 @@ under the License. # 外部存储数据导入 -本文档主要介绍如何导入外部系统中存储的数据。例如(HDFS,AWS S3,百度云的BOS,阿里云的OSS和腾讯云的COS) +本文档主要介绍如何导入外部系统中存储的数据,例如 HDFS、所有支持 S3 协议的对象存储。 ## HDFS LOAD @@ -116,7 +116,7 @@ Hdfs load 创建导入语句,导入方式和[Broker Load](../../../data-operat 从0.14 版本开始,Doris 支持通过S3协议直接从支持S3协议的在线存储系统导入数据。 -下面主要介绍如何导入 AWS S3 中存储的数据。也支持导入其他支持S3协议的对象存储系统导入,如果百度云的BOS,阿里云的OSS和腾讯云的COS等、 +下面主要介绍如何导入 AWS S3 中存储的数据。也支持导入其他支持 S3 协议的对象存储系统。 ### 适用场景 diff --git a/docs/zh-CN/docs/data-operate/import/import-way/binlog-load-manual.md b/docs/zh-CN/docs/data-operate/import/import-way/binlog-load-manual.md index d33ce037dc43c5..93886113439e1a 100644 --- a/docs/zh-CN/docs/data-operate/import/import-way/binlog-load-manual.md +++ b/docs/zh-CN/docs/data-operate/import/import-way/binlog-load-manual.md @@ -464,9 +464,6 @@ binlog_desc 用户可以通过 STOP/PAUSE/RESUME 三个命令来控制作业的停止,暂停和恢复。可以通过 [STOP SYNC JOB](../../../sql-manual/sql-reference/Data-Manipulation-Statements/Load/STOP-SYNC-JOB.md) ; [PAUSE SYNC JOB](../../../sql-manual/sql-reference/Data-Manipulation-Statements/Load/PAUSE-SYNC-JOB.md); 以及 [RESUME SYNC JOB](../../../sql-manual/sql-reference/Data-Manipulation-Statements/Load/RESUME-SYNC-JOB.md); -## 案例实战 - -[Apache Doris Binlog Load使用方法及示例](https://doris.apache.org/zh-CN/blogs/PracticalCases/doris-binlog-load.html) ## 相关参数 diff --git a/docs/zh-CN/docs/data-operate/import/import-way/broker-load-manual.md b/docs/zh-CN/docs/data-operate/import/import-way/broker-load-manual.md index 420b4d3aa98f32..3aade30ce1183c 100644 --- a/docs/zh-CN/docs/data-operate/import/import-way/broker-load-manual.md +++ b/docs/zh-CN/docs/data-operate/import/import-way/broker-load-manual.md @@ -28,6 +28,8 @@ under the License. 
Broker load 是一个异步的导入方式,支持的数据源取决于 [Broker](../../../advanced/broker.md) 进程支持的数据源。 +因为 Doris 表里的数据是有序的,所以 Broker load 在导入数据时需要利用 Doris 集群资源对数据进行排序。相对于使用 Spark load 完成海量历史数据迁移,这种方式对 Doris 集群资源的占用比较大,适用于用户没有 Spark 计算资源的情况;如果有 Spark 计算资源,建议使用 [Spark load](../../../sql-manual/sql-reference/Data-Manipulation-Statements/Load/SPARK-LOAD.md)。 + 用户需要通过 MySQL 协议 创建 [Broker load](../../../sql-manual/sql-reference/Data-Manipulation-Statements/Load/BROKER-LOAD.md) 导入,并通过查看导入命令检查导入结果。 ## 适用场景 @@ -244,7 +246,7 @@ LOAD LABEL demo.label_20220402 ) with HDFS ( "fs.defaultFS"="hdfs://10.220.147.151:8020", - "hadoop.username"="root" + "hdfs_user"="root" ) PROPERTIES ( @@ -407,19 +409,6 @@ FE 的配置参数 `async_loading_load_task_pool_size` 用于限制同时运行 请参照文档中最佳实践部分,修改 FE 配置项 `max_bytes_per_broker_scanner` 和 `max_broker_concurrency` -- 导入过程中出现 `org.apache.thrift.transport.TTransportException: java.net.SocketException: Broken pipe` - - 出现这个问题的原因可能是到从外部存储(例如HDFS)导入数据的时候,因为目录下文件太多,列出文件目录的时间太长,这里Broker RPC Timeout 默认是10秒,这里需要适当调整超时时间。 - - 修改 `fe.conf` 配置文件,添加下面的参数: - - ``` - broker_timeout_ms = 10000 - ##这里默认是10秒,需要适当加大这个参数 - ``` - - 这里添加参数,需要重启 FE 服务。 - - 导入报错:`failed to send batch` 或 `TabletWriter add batch with unknown id` 适当修改 `query_timeout` 和 `streaming_load_rpc_max_alive_time_sec`。 @@ -445,12 +434,6 @@ FE 的配置参数 `async_loading_load_task_pool_size` 用于限制同时运行 注:如果使用某些 hive 版本直接生成的 orc 文件,orc 文件中的表头并非 hive meta 数据,而是(_col0, _col1, _col2, ...), 可能导致 Invalid Column Name 错误,那么则需要使用 set 进行映射 -- 导入出错:`Login failure for xxx from keytab xxx.keytab` - - 出现这个问题的原因是导入的时候broker访问kerberos认证的集群时候,认证没有通过,首先确定`kerberos_principal`和`kerberos_keytab`配置是否正确,如果没问题,则需要在fe.conf中JAVA_OPTS="" - JAVA_OPTS_FOR_JDK_9="" 参数里面添加-Djava.security.krb5.conf=/xxx/krb5.conf,还需要将hadoop中的hdfs-site.xml复制到broker/conf下 - - ## 更多帮助 关于 Broker Load 使用的更多详细语法及最佳实践,请参阅 [Broker Load](../../../sql-manual/sql-reference/Data-Manipulation-Statements/Load/BROKER-LOAD.md) 命令手册,你也可以在 MySql 客户端命令行下输入 `HELP BROKER LOAD` 获取更多帮助信息。 diff --git a/docs/zh-CN/docs/data-operate/import/import-way/routine-load-manual.md b/docs/zh-CN/docs/data-operate/import/import-way/routine-load-manual.md index 9e950f5bde514c..91d2be3579892f 100644 --- a/docs/zh-CN/docs/data-operate/import/import-way/routine-load-manual.md +++ b/docs/zh-CN/docs/data-operate/import/import-way/routine-load-manual.md @@ -76,7 +76,7 @@ under the License. 1. 支持无认证的 Kafka 访问,以及通过 SSL 方式认证的 Kafka 集群。 2. 支持的消息格式为 csv, json 文本格式。csv 每一个 message 为一行,且行尾**不包含**换行符。 -3. 默认支持 Kafka 0.10.0.0(含) 以上版本。如果要使用 Kafka 0.10.0.0 以下版本 (0.9.0.x, 0.8.x.y),需要修改 be 的配置,将 kafka_broker_version_fallback 的值设置为要兼容的旧版本并将 kafka_api_version_request 的值设置为 false,或者在创建routine load的时候直接设置 property.broker.version.fallback 的值为要兼容的旧版本 并将 property.api.version.request 的值设置为 false,使用旧版本的代价是routine load 的部分新特性可能无法使用,如根据时间设置 kafka 分区的 offset。 +3. 默认支持 Kafka 0.10.0.0(含) 以上版本。如果要使用 Kafka 0.10.0.0 以下版本 (0.9.0, 0.8.2, 0.8.1, 0.8.0),需要修改 be 的配置,将 kafka_broker_version_fallback 的值设置为要兼容的旧版本,或者在创建routine load的时候直接设置 property.broker.version.fallback的值为要兼容的旧版本,使用旧版本的代价是routine load 的部分新特性可能无法使用,如根据时间设置 kafka 分区的 offset。 ### 创建任务 diff --git a/docs/zh-CN/docs/data-operate/import/import-way/spark-load-manual.md b/docs/zh-CN/docs/data-operate/import/import-way/spark-load-manual.md index 1afc0970f8f1be..b6269ea176d8bc 100644 --- a/docs/zh-CN/docs/data-operate/import/import-way/spark-load-manual.md +++ b/docs/zh-CN/docs/data-operate/import/import-way/spark-load-manual.md @@ -28,6 +28,10 @@ under the License. 
Spark load 通过外部的 Spark 资源实现对导入数据的预处理,提高 Doris 大数据量的导入性能并且节省 Doris 集群的计算资源。主要用于初次迁移,大数据量导入 Doris 的场景。 + Spark load 利用 Spark 集群的资源对要导入的数据进行排序,Doris BE 直接写文件,这样能大大降低 Doris 集群的资源使用,对于历史海量数据迁移,在降低 Doris 集群资源使用及负载方面有很好的效果。 + +如果用户没有 Spark 集群资源,又想方便、快速地完成外部存储历史数据的迁移,可以使用 [Broker load](./BROKER-LOAD.md)。相对于 Spark load,Broker load 导入对 Doris 集群的资源占用会更高。 + Spark load 是一种异步导入方式,用户需要通过 MySQL 协议创建 Spark 类型导入任务,并通过 `SHOW LOAD` 查看导入结果。 ## 适用场景 diff --git a/docs/zh-CN/docs/data-operate/import/import-way/stream-load-manual.md b/docs/zh-CN/docs/data-operate/import/import-way/stream-load-manual.md index 714cc0a98901e6..25f8607ea5dcf4 100644 --- a/docs/zh-CN/docs/data-operate/import/import-way/stream-load-manual.md +++ b/docs/zh-CN/docs/data-operate/import/import-way/stream-load-manual.md @@ -142,7 +142,7 @@ Stream Load 由于使用的是 HTTP 协议,所以所有导入任务有关的 导入任务指定的过滤条件。Stream load 支持对原始数据指定 where 语句进行过滤。被过滤的数据将不会被导入,也不会参与 filter ratio 的计算,但会被计入`num_rows_unselected`。 -- partition +- partitions 待导入表的 Partition 信息,如果待导入数据不属于指定的 Partition 则不会被导入。这些数据将计入 `dpp.abnorm.ALL` @@ -186,7 +186,7 @@ Stream Load 由于使用的是 HTTP 协议,所以所有导入任务有关的 默认的两阶段批量事务提交为关闭。 - > **开启方式:** 在be.conf中配置`disable_stream_load_2pc=false`(重启生效) 并且 在 HEADER 中声明 `two_phase_commit=true` 。 + > **开启方式:** 在be.conf中配置`disable_stream_load_2pc=false` 并且 在 HEADER 中声明 `two_phase_commit=true` 。 示例: @@ -227,7 +227,7 @@ Stream Load 由于使用的是 HTTP 协议,所以所有导入任务有关的 "status": "Success", "msg": "transaction [18037] abort successfully." } - ``` + ``` ### 返回结果 diff --git a/docs/zh-CN/docs/data-table/basic-usage.md b/docs/zh-CN/docs/data-table/basic-usage.md index fe9ba6cd824302..36a1632a3f396f 100644 --- a/docs/zh-CN/docs/data-table/basic-usage.md +++ b/docs/zh-CN/docs/data-table/basic-usage.md @@ -32,7 +32,19 @@ Doris 采用 MySQL 协议进行通信,用户可通过 MySQL client 或者 MySQ ### Root用户登录与密码修改 -Doris 内置 root 和 admin 用户,密码默认都为空。启动完 Doris 程序之后,可以通过 root 或 admin 用户连接到 Doris 集群。 使用下面命令即可登录 Doris,登录后进入到Doris对应的Mysql命令行操作界面: +Doris 内置 root 和 admin 用户,密码默认都为空。 + +>备注: +> +>Doris 提供的默认 root 和 admin 用户是管理员用户 +> +>root 用户默认拥有集群所有权限。同时拥有 Grant_priv 和 Node_priv 的用户,可以将该权限赋予其他用户,拥有节点变更权限,包括 FE、BE、BROKER 节点的添加、删除、下线等操作。 +> +>admin 用户拥有 ADMIN_PRIV 和 GRANT_PRIV 权限 +> +>关于权限的具体说明可以参照[权限管理](/docs/admin-manual/privilege-ldap/user-privilege) + +启动完 Doris 程序之后,可以通过 root 或 admin 用户连接到 Doris 集群。 使用下面命令即可登录 Doris,登录后进入到Doris对应的Mysql命令行操作界面: ```bash [root@doris ~]# mysql -h FE_HOST -P9030 -uroot diff --git a/docs/zh-CN/docs/faq/install-faq.md b/docs/zh-CN/docs/faq/install-faq.md index f3adee6030e13a..bdace4f478b236 100644 --- a/docs/zh-CN/docs/faq/install-faq.md +++ b/docs/zh-CN/docs/faq/install-faq.md @@ -57,7 +57,7 @@ priorty_network 的值是 CIDR 格式表示的。分为两部分,第一部分 Observer 角色和这个单词的含义一样,仅仅作为观察者来同步已经成功写入的元数据日志,并且提供元数据读服务。他不会参与多数写的逻辑。 -通常情况下,可以部署 1 Follower + 2 Observer 或者 3 Follower + N Observer。前者运维简单,几乎不会出现 Follower 之间的一致性协议导致这种复杂错误情况(百度内部集群大多使用这种方式)。后者可以保证元数据写的高可用,如果是高并发查询场景,可以适当增加 Observer。 +通常情况下,可以部署 1 Follower + 2 Observer 或者 3 Follower + N Observer。前者运维简单,几乎不会出现 Follower 之间的一致性协议导致的复杂错误情况(企业大多使用这种方式)。后者可以保证元数据写的高可用,如果是高并发查询场景,可以适当增加 Observer。 ### Q4. 节点新增加了新的磁盘,为什么数据没有均衡到新的磁盘上? 
@@ -288,7 +288,8 @@ ERROR 1105 (HY000): errCode = 2, detailMessage = driver connect Error: HY000 [My 解决方式是使用`Connector/ODBC 8.0.28` 版本的 ODBC Connector, 并且在操作系统处选择 `Linux - Generic`, 这个版本的ODBC Driver 使用 openssl 1.1 版本。或者使用低版本的ODBC Connector,比如[Connector/ODBC 5.3.14](https://dev.mysql.com/downloads/connector/odbc/5.3.html)。具体使用方式见 [ODBC外表使用文档](../ecosystem/external-table/odbc-of-doris.md)。 可以通过如下方式验证 MySQL ODBC Driver 使用的openssl 版本 + ``` ldd /path/to/libmyodbc8w.so |grep libssl.so ``` -如果输出包含 `libssl.so.10` 则使用过程中可能出现问题, 如果包含`libssl.so.1.1` 则与doris 1.0 兼容。 +如果输出包含 `libssl.so.10` 则使用过程中可能出现问题,如果包含 `libssl.so.1.1` 则与 Doris 1.0 兼容。 diff --git a/docs/zh-CN/docs/install/install-deploy.md b/docs/zh-CN/docs/install/install-deploy.md index 93cb298687a0e7..6c9f7d24d2e44a 100644 --- a/docs/zh-CN/docs/install/install-deploy.md +++ b/docs/zh-CN/docs/install/install-deploy.md @@ -247,7 +247,7 @@ doris默认为表名大小写敏感,如有表名大小写不敏感的需求需 #### (可选)FS_Broker 部署 -Broker 以插件的形式,独立于 Doris 部署。如果需要从第三方存储系统导入数据,需要部署相应的 Broker,默认提供了读取 HDFS 、百度云 BOS 及 Amazon S3 的 fs_broker。fs_broker 是无状态的,建议每一个 FE 和 BE 节点都部署一个 Broker。 +Broker 以插件的形式,独立于 Doris 部署。如果需要从第三方存储系统导入数据,需要部署相应的 Broker,默认提供了读取 HDFS、对象存储(支持 S3 协议)的 fs_broker。fs_broker 是无状态的,建议每一个 FE 和 BE 节点都部署一个 Broker。 * 拷贝源码 fs_broker 的 output 目录下的相应 Broker 目录到需要部署的所有节点上。建议和 BE 或者 FE 目录保持同级。 @@ -364,6 +364,6 @@ Broker 以插件的形式,独立于 Doris 部署。如果需要从第三方存 ```shell vim /etc/supervisord.conf - + minfds=65535 ; (min. avail startup file descriptors;default 1024) ``` diff --git a/docs/zh-CN/docs/sql-manual/sql-reference/Account-Management-Statements/CREATE-USER.md b/docs/zh-CN/docs/sql-manual/sql-reference/Account-Management-Statements/CREATE-USER.md index 983b566c6f4de0..fb771aa260fbcf 100644 --- a/docs/zh-CN/docs/sql-manual/sql-reference/Account-Management-Statements/CREATE-USER.md +++ b/docs/zh-CN/docs/sql-manual/sql-reference/Account-Management-Statements/CREATE-USER.md @@ -43,7 +43,7 @@ CREATE USER user_identity [IDENTIFIED BY 'password'] [DEFAULT ROLE 'role_name'] 在 Doris 中,一个 user_identity 唯一标识一个用户。user_identity 由两部分组成,user_name 和 host,其中 username 为用户名。host 标识用户端连接所在的主机地址。host 部分可以使用 % 进行模糊匹配。如果不指定 host,默认为 '%',即表示该用户可以从任意 host 连接到 Doris。 -host 部分也可指定为 domain,语法为:'user_name'@['domain'],即使用中括号包围,则 Doris 会认为这个是一个 domain,并尝试解析其 ip 地址。目前仅支持百度内部的 BNS 解析。 +host 部分也可指定为 domain,语法为:'user_name'@['domain'],即使用中括号包围,则 Doris 会认为这个是一个 domain,并尝试解析其 ip 地址。
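As a quick illustration of the user_identity forms discussed in the CREATE USER sections above, here is a minimal sketch; the user names, passwords, hosts, domain, and role below are illustrative placeholders rather than values from the docs:

```sql
-- host omitted: defaults to '%', so the user may connect from any host
CREATE USER 'jack' IDENTIFIED BY 'jack_passwd';

-- host restricted with % fuzzy matching to a subnet
CREATE USER 'jack'@'172.10.1.%' IDENTIFIED BY 'jack_passwd';

-- host written as ['domain']: Doris treats it as a domain and resolves it to a set of IPs;
-- the specified role must already exist
CREATE USER 'jack'@['example.domain.name'] IDENTIFIED BY 'jack_passwd' DEFAULT ROLE 'my_role';
```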