[Improve](metadata)Start the script to set metadata_failure_recovery (a…
CalvinKirs authored Sep 14, 2023
1 parent 1a4929b commit 64337a8
Showing 11 changed files with 50 additions and 58 deletions.
12 changes: 9 additions & 3 deletions bin/start_fe.sh
@@ -32,6 +32,7 @@ OPTS="$(getopt \
-l 'helper:' \
-l 'image:' \
-l 'version' \
-l 'metadata_failure_recovery' \
-- "$@")"

eval set -- "${OPTS}"
@@ -41,6 +42,7 @@ HELPER=''
IMAGE_PATH=''
IMAGE_TOOL=''
OPT_VERSION=''
METADATA_FAILURE_RECOVERY=''
while true; do
case "$1" in
--daemon)
@@ -51,6 +53,10 @@ while true; do
OPT_VERSION="--version"
shift
;;
--metadata_failure_recovery)
METADATA_FAILURE_RECOVERY="-r"
shift
;;
--helper)
HELPER="$2"
shift 2
@@ -215,15 +221,15 @@ fi

if [[ "${IMAGE_TOOL}" -eq 1 ]]; then
if [[ -n "${IMAGE_PATH}" ]]; then
${LIMIT:+${LIMIT}} "${JAVA}" ${final_java_opt:+${final_java_opt}} org.apache.doris.DorisFE -i "${IMAGE_PATH}"
${LIMIT:+${LIMIT}} "${JAVA}" ${final_java_opt:+${final_java_opt}} org.apache.doris.DorisFE -i "${IMAGE_PATH}" ${METADATA_FAILURE_RECOVERY:+${METADATA_FAILURE_RECOVERY}}
else
echo "Internal Error. USE IMAGE_TOOL like : ./start_fe.sh --image image_path"
fi
elif [[ "${RUN_DAEMON}" -eq 1 ]]; then
nohup ${LIMIT:+${LIMIT}} "${JAVA}" ${final_java_opt:+${final_java_opt}} -XX:-OmitStackTraceInFastThrow -XX:OnOutOfMemoryError="kill -9 %p" org.apache.doris.DorisFE ${HELPER:+${HELPER}} "$@" >>"${LOG_DIR}/fe.out" 2>&1 </dev/null &
nohup ${LIMIT:+${LIMIT}} "${JAVA}" ${final_java_opt:+${final_java_opt}} -XX:-OmitStackTraceInFastThrow -XX:OnOutOfMemoryError="kill -9 %p" org.apache.doris.DorisFE ${HELPER:+${HELPER}} ${METADATA_FAILURE_RECOVERY:+${METADATA_FAILURE_RECOVERY}} "$@" >>"${LOG_DIR}/fe.out" 2>&1 </dev/null &
else
export DORIS_LOG_TO_STDERR=1
${LIMIT:+${LIMIT}} "${JAVA}" ${final_java_opt:+${final_java_opt}} -XX:-OmitStackTraceInFastThrow -XX:OnOutOfMemoryError="kill -9 %p" org.apache.doris.DorisFE ${HELPER:+${HELPER}} ${OPT_VERSION:+${OPT_VERSION}} "$@" </dev/null
${LIMIT:+${LIMIT}} "${JAVA}" ${final_java_opt:+${final_java_opt}} -XX:-OmitStackTraceInFastThrow -XX:OnOutOfMemoryError="kill -9 %p" org.apache.doris.DorisFE ${HELPER:+${HELPER}} ${OPT_VERSION:+${OPT_VERSION}} ${METADATA_FAILURE_RECOVERY:+${METADATA_FAILURE_RECOVERY}} "$@" </dev/null
fi

echo $! >"${pidfile}"
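The launcher passes `-r` to `DorisFE` only when `--metadata_failure_recovery` was given. The Java sketch below illustrates the rule that the `${VAR:+${VAR}}` expansions follow: an unset option is left out of the argument list entirely, whereas a quoted empty expansion such as `"${METADATA_FAILURE_RECOVERY}"` would still add an empty-string argument. `FeArgsSketch`, `addIfSet` and `buildFeArgs` are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class FeArgsSketch {
    // Mirrors ${VAR:+${VAR}}: the value is appended only when it is non-empty.
    static void addIfSet(List<String> argv, String value) {
        if (value != null && !value.isEmpty()) {
            argv.add(value);
        }
    }

    // Builds the DorisFE argument list; recoveryFlag is "-r" or "" (unset),
    // extraArgs plays the role of "$@".
    static List<String> buildFeArgs(String recoveryFlag, List<String> extraArgs) {
        List<String> argv = new ArrayList<>();
        argv.add("org.apache.doris.DorisFE");
        addIfSet(argv, recoveryFlag);
        argv.addAll(extraArgs);
        return argv;
    }

    public static void main(String[] args) {
        // Without the flag, no empty-string argument is produced.
        System.out.println(buildFeArgs("", List.of()));
        // With the flag, "-r" is forwarded to DorisFE.
        System.out.println(buildFeArgs("-r", List.of()));
    }
}
```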
18 changes: 6 additions & 12 deletions docs/en/docs/admin-manual/cluster-management/upgrade.md
@@ -156,38 +156,32 @@ admin set frontend config("disable_tablet_scheduler" = "true");
echo "cluster_id=123456" >> ${DORIS_NEW_HOME}/conf/fe.conf
```

4. Add the metadata failure recovery configuration to fe.conf

```shell
echo "metadata_failure_recovery=true" >> ${DORIS_NEW_HOME}/conf/fe.conf
```

5. Copy the metadata directory doris-meta of the Master FE in the production environment to the test environment
4. Copy the metadata directory doris-meta of the Master FE in the production environment to the test environment

```shell
cp -r ${DORIS_OLD_HOME}/fe/doris-meta/* ${DORIS_NEW_HOME}/fe/doris-meta
```

6. Change the cluster_id in the VERSION file copied to the test environment to 123456 (the same value as in step 3)
5. Change the cluster_id in the VERSION file copied to the test environment to 123456 (the same value as in step 3)

```shell
vi ${DORIS_NEW_HOME}/fe/doris-meta/image/VERSION
clusterId=123456
```

7. Start the FE in the test environment
6. Start the FE in the test environment

```shell
sh ${DORIS_NEW_HOME}/bin/start_fe.sh --daemon
sh ${DORIS_NEW_HOME}/bin/start_fe.sh --daemon --metadata_failure_recovery
```

8. Check the FE log fe.log to see whether the FE started successfully
7. Check the FE log fe.log to see whether the FE started successfully

```shell
tail -f ${DORIS_NEW_HOME}/log/fe.log
```

9. If the FE starts successfully, there is no compatibility problem. Stop the FE process in the test environment and prepare for the upgrade
8. If the FE starts successfully, there is no compatibility problem. Stop the FE process in the test environment and prepare for the upgrade

```shell
sh ${DORIS_NEW_HOME}/bin/stop_fe.sh
```
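Checking `fe.log` by eye works, but the check can also be scripted. As a rough sketch under stated assumptions, the Java program below scans a log file for a line matching `transfer from <state> to MASTER`, the message the metadata-operation document says appears when an FE started in metadata failure recovery mode becomes MASTER. The exact log wording may differ between Doris versions, and `LogCheckSketch` is a hypothetical name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.regex.Pattern;

class LogCheckSketch {
    // The previous state varies, so any non-space token is accepted before "to MASTER".
    private static final Pattern TO_MASTER = Pattern.compile("transfer from \\S+ to MASTER");

    // Returns true if any log line reports a transition to MASTER.
    static boolean becameMaster(List<String> lines) {
        for (String line : lines) {
            if (TO_MASTER.matcher(line).find()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java LogCheckSketch.java <path to fe.log>");
            return;
        }
        List<String> lines = Files.readAllLines(Path.of(args[0]));
        System.out.println(becameMaster(lines) ? "FE became MASTER" : "no transition to MASTER found yet");
    }
}
```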
14 changes: 6 additions & 8 deletions docs/en/docs/admin-manual/maint-monitor/metadata-operation.md
@@ -190,12 +190,10 @@ FE may fail to start bdbje and synchronize between FEs for some reasons. Phenome
3. The following operations are performed on the FE node selected in step 2.

1. If the node is an OBSERVER, first change `role=OBSERVER` in the `meta_dir/image/ROLE` file to `role=FOLLOWER`. (Recovering from an OBSERVER node is more cumbersome; follow the steps here first, and see the separate description that follows.)
2. Add the configuration `metadata_failure_recovery=true` to fe.conf.
3. Run `sh bin/start_fe.sh` to start the FE.
4. If everything works, the FE starts as MASTER, as described in the previous section `Start a single node FE`. You should see words like `transfer from XXXX to MASTER` in fe.log.
5. After startup completes, connect to this FE and run some queries and loads to check that it can be accessed normally. If it cannot, a step may have been performed incorrectly: read the steps above carefully and try again with the previously backed-up metadata. If it still fails, the problem may be more serious.
6. If successful, the `show frontends;` command should list all the FEs you added before, with the current FE as master.
7. Delete the `metadata_failure_recovery=true` configuration item from fe.conf, or set it to `false`, and restart the FE (**Important**).
2. Run `sh bin/start_fe.sh --metadata_failure_recovery` to start the FE.
3. If everything works, the FE starts as MASTER, as described in the previous section `Start a single node FE`. You should see words like `transfer from XXXX to MASTER` in fe.log.
4. After startup completes, connect to this FE and run some queries and loads to check that it can be accessed normally. If it cannot, a step may have been performed incorrectly: read the steps above carefully and try again with the previously backed-up metadata. If it still fails, the problem may be more serious.
5. If successful, the `show frontends;` command should list all the FEs you added before, with the current FE as master.
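Step 1 only applies when recovering from an OBSERVER. A minimal Java sketch of that edit follows, assuming `meta_dir/image/ROLE` is a plain text file containing a `role=OBSERVER` line; it rewrites only that line, keeps every other line as-is, and saves a `ROLE.bak` copy first. `RoleFileSketch` and the backup file name are hypothetical, not part of Doris.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class RoleFileSketch {
    // Rewrites a "role=OBSERVER" line to "role=FOLLOWER"; other lines are kept unchanged.
    static List<String> promoteObserver(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            out.add(line.trim().equals("role=OBSERVER") ? "role=FOLLOWER" : line);
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java RoleFileSketch.java <meta_dir>/image/ROLE");
            return;
        }
        Path role = Path.of(args[0]);
        // Keep the original next to it; fails if ROLE.bak already exists, which avoids
        // overwriting an earlier backup.
        Files.copy(role, role.resolveSibling("ROLE.bak"));
        Files.write(role, promoteObserver(Files.readAllLines(role)));
    }
}
```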


> If you are recovering metadata from an OBSERVER node, after completing the above steps you will find that the current FE's role is OBSERVER but `IsMaster` is `true`. This is because the "OBSERVER" role shown here is recorded in Doris's metadata, whereas whether the node is master is recorded in bdbje's metadata. Because we recovered from an OBSERVER node, the two are inconsistent. Take the following steps to fix this problem (we will fix it in a later version):
@@ -207,7 +205,7 @@ FE may fail to start bdbje and synchronize between FEs for some reasons. Phenome
> 5. After confirming that the new FOLLOWER is working properly, use the new FOLLOWER's metadata to perform the failure recovery again.
> 6. The purpose of the above steps is to artificially create the metadata of a FOLLOWER node and then restart the failure recovery from it. This avoids the inconsistency caused by recovering metadata from an OBSERVER.

> `metadata_failure_recovery = true` empties the metadata of `bdbje`. bdbje then no longer contacts the previous FEs and starts as a standalone FE. This parameter needs to be set to true only for the recovery startup. After recovery, it must be set to false; otherwise, the bdbje metadata is emptied again on the next restart, which prevents the other FEs from working properly.
> `--metadata_failure_recovery` empties the metadata of `bdbje`. bdbje then no longer contacts the previous FEs and starts as a standalone FE. Pass this option only for the recovery startup and omit it on every later restart; otherwise, the bdbje metadata is emptied again, which prevents the other FEs from working properly.

4. After step 3 succeeds, remove the previous FEs from the metadata with the `ALTER SYSTEM DROP FOLLOWER/OBSERVER` command, then add them back the same way you would add new FEs.
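For step 4, the statements can be generated from a list of the old FEs. The sketch below assumes each FE is identified as `host:edit_log_port`, the form these commands take, and keeps the DROP and ADD phases separate on the assumption that, as when adding any new FE, the old nodes are stopped and have their metadata directories cleared before being added back and started with `--helper`. The addresses and `ReAddFeSketch` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class ReAddFeSketch {
    // role is "FOLLOWER" or "OBSERVER"; each entry of hostPorts is "host:edit_log_port".
    static List<String> dropStatements(String role, List<String> hostPorts) {
        return statements("DROP", role, hostPorts);
    }

    static List<String> addStatements(String role, List<String> hostPorts) {
        return statements("ADD", role, hostPorts);
    }

    private static List<String> statements(String action, String role, List<String> hostPorts) {
        List<String> sql = new ArrayList<>();
        for (String hostPort : hostPorts) {
            sql.add("ALTER SYSTEM " + action + " " + role + " \"" + hostPort + "\";");
        }
        return sql;
    }

    public static void main(String[] args) {
        List<String> oldFollowers = List.of("192.168.0.2:9010", "192.168.0.3:9010");
        // Phase 1: run on the recovered master to remove the stale FEs.
        dropStatements("FOLLOWER", oldFollowers).forEach(System.out::println);
        // Between phases: stop the old FEs and clear their metadata directories.
        // Phase 2: add them back, then start each one with --helper.
        addStatements("FOLLOWER", oldFollowers).forEach(System.out::println);
    }
}
```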

@@ -244,7 +242,7 @@ FE currently has the following ports

1. edit_log_port

If this port needs to be changed, follow the `Failure Recovery` section to restore the FE. Because the port has been persisted into bdbje's own metadata (and is also recorded in Doris's own metadata), bdbje's metadata must be cleared by setting `metadata_failure_recovery=true`.
If this port needs to be changed, follow the `Failure Recovery` section to restore the FE. Because the port has been persisted into bdbje's own metadata (and is also recorded in Doris's own metadata), bdbje's metadata must be cleared by passing `--metadata_failure_recovery` when starting the FE.

2. http_port

18 changes: 6 additions & 12 deletions docs/zh-CN/docs/admin-manual/cluster-management/upgrade.md
@@ -156,38 +156,32 @@ admin set frontend config("disable_tablet_scheduler" = "true");
echo "cluster_id=123456" >> ${DORIS_NEW_HOME}/conf/fe.conf
```

4. Add the metadata failure recovery configuration to fe.conf

```shell
echo "metadata_failure_recovery=true" >> ${DORIS_NEW_HOME}/conf/fe.conf
```

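5. Copy the metadata directory doris-meta of the Master FE in the production environment to the test environment
4. Copy the metadata directory doris-meta of the Master FE in the production environment to the test environment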

```shell
cp -r ${DORIS_OLD_HOME}/fe/doris-meta/* ${DORIS_NEW_HOME}/fe/doris-meta
```

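6. Change the cluster_id in the VERSION file copied to the test environment to 123456 (the same value as in step 3)
5. Change the cluster_id in the VERSION file copied to the test environment to 123456 (the same value as in step 3)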

```shell
vi ${DORIS_NEW_HOME}/fe/doris-meta/image/VERSION
clusterId=123456
```

7. Start the FE in the test environment
6. Start the FE in the test environment

```shell
sh ${DORIS_NEW_HOME}/bin/start_fe.sh --daemon
sh ${DORIS_NEW_HOME}/bin/start_fe.sh --daemon --metadata_failure_recovery
```

8. Check the FE log fe.log to see whether the FE started successfully
7. Check the FE log fe.log to see whether the FE started successfully

```shell
tail -f ${DORIS_NEW_HOME}/log/fe.log
```

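9. If the FE starts successfully, there is no compatibility problem. Stop the FE process in the test environment and prepare for the upgrade
8. If the FE starts successfully, there is no compatibility problem. Stop the FE process in the test environment and prepare for the upgrade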

```shell
sh ${DORIS_NEW_HOME}/bin/stop_fe.sh
```
17 changes: 8 additions & 9 deletions docs/zh-CN/docs/admin-manual/maint-monitor/metadata-operation.md
@@ -189,13 +189,12 @@ FE may, for some reasons, be unable to start bdbje, or FEs may be unable to synchronize with each other

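3. The following operations are all performed on the FE node selected in step 2.

1. If the node is an OBSERVER, first change `role=OBSERVER` in the `meta_dir/image/ROLE` file to `role=FOLLOWER`. (Recovering from an OBSERVER node is more cumbersome; follow the steps here first, and a separate description follows later.)
2. Add the configuration `metadata_failure_recovery=true` to fe.conf.
3. Run `sh bin/start_fe.sh` to start this FE.
4. If everything works, this FE starts as MASTER, similar to the description in the earlier section `Start a single node FE`. You should see words like `transfer from XXXX to MASTER` in fe.log.
5. After startup completes, first connect to this FE and run some queries and loads to check whether it can be accessed normally. If not, a step may have been performed incorrectly; read the steps above carefully and try again with the previously backed-up metadata. If it still fails, the problem may be more serious.
6. If successful, the `show frontends;` command should list all the FEs added before, with the current FE as master.
7. Delete the `metadata_failure_recovery=true` item from fe.conf, or set it to `false`, then restart this FE (**Important**).
1. If the node is an OBSERVER, first change `role=OBSERVER` in the `meta_dir/image/ROLE` file to `role=FOLLOWER`. (Recovering from an OBSERVER node is more cumbersome; follow the steps here first, and a separate description follows later.)
2. Run `sh bin/start_fe.sh --metadata_failure_recovery` to start this FE.
3. If everything works, this FE starts as MASTER, similar to the description in the earlier section `Start a single node FE`. You should see words like `transfer from XXXX to MASTER` in fe.log.
4. After startup completes, first connect to this FE and run some queries and loads to check whether it can be accessed normally. If not, a step may have been performed incorrectly; read the steps above carefully and try again with the previously backed-up metadata. If it still fails, the problem may be more serious.
5. If successful, the `show frontends;` command should list all the FEs added before, with the current FE as master.
6. Then restart this FE normally, without `--metadata_failure_recovery` (**Important**).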


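> If you recovered from the metadata of an OBSERVER node, after completing the above steps the `show frontends;` statement shows that the current FE's role is OBSERVER but `IsMaster` is `true`. This is because the "OBSERVER" shown here is recorded in Doris's metadata, whereas whether the node is master is recorded in bdbje's metadata. Because we recovered from an OBSERVER node, the two are inconsistent. Fix this problem with the following steps (we will fix this issue in a later version):
@@ -207,7 +206,7 @@ FE may, for some reasons, be unable to start bdbje, or FEs may be unable to synchronize with each other
> 5. After confirming that the new FOLLOWER works properly, use the new FOLLOWER's metadata to perform the failure recovery again.
> 6. The purpose of these steps is to artificially create the metadata of a FOLLOWER node and then restart the failure recovery from it. This avoids the inconsistency encountered when recovering metadata from an OBSERVER.
> `metadata_failure_recovery=true` means clearing the "bdbje" metadata. bdbje then no longer contacts the other previous FEs and starts as a standalone FE. This parameter needs to be set to true only for the recovery startup. After recovery completes, it must be set to false; otherwise, the bdbje metadata is cleared again on the next restart, which prevents the other FEs from working properly.
> `--metadata_failure_recovery` means clearing the "bdbje" metadata. bdbje then no longer contacts the other previous FEs and starts as a standalone FE. Specify this option only for the recovery startup and omit it on every later restart; otherwise, the bdbje metadata is cleared again, which prevents the other FEs from working properly.
4. After step 3 succeeds, use the `ALTER SYSTEM DROP FOLLOWER/OBSERVER` command to remove the other previous FEs from the metadata, then add them back the same way you would add new FEs.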

@@ -244,7 +243,7 @@ FE currently has the following ports

1. edit_log_port

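If this port needs to be changed, follow the `Failure Recovery` section to restore the FE. Because the port has been persisted into bdbje's own metadata (and is also recorded in Doris's own metadata), bdbje's metadata must be cleared by setting `metadata_failure_recovery=true`.
If this port needs to be changed, follow the `Failure Recovery` section to restore the FE. Because the port has been persisted into bdbje's own metadata (and is also recorded in Doris's own metadata), bdbje's metadata must be cleared by specifying `--metadata_failure_recovery` when starting the FE.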

2. http_port

10 changes: 0 additions & 10 deletions fe/fe-common/src/main/java/org/apache/doris/common/Config.java
@@ -271,16 +271,6 @@ public class Config extends ConfigBase {
+ "each element is a CIDR representation of the network address"})
public static String priority_networks = "";

@ConfField(description = {"是否重置 BDBJE 的复制组,如果所有的可选节点都无法启动,"
+ "可以将元数据拷贝到另一个节点,并将这个配置设置为 true,尝试重启 FE。更多信息请参阅官网的元数据故障恢复文档。",
"If true, FE will reset bdbje replication group(that is, to remove all electable nodes info) "
+ "and is supposed to start as Master. "
+ "If all the electable nodes can not start, we can copy the meta data "
+ "to another node and set this config to true to try to restart the FE. "
+ "For more information, please refer to the metadata failure recovery document "
+ "on the official website."})
public static String metadata_failure_recovery = "false";

@ConfField(mutable = true, description = {"是否忽略元数据延迟,如果 FE 的元数据延迟超过这个阈值,"
+ "则非 Master FE 仍然提供读服务。这个配置可以用于当 Master FE 因为某些原因停止了较长时间,"
+ "但是仍然希望非 Master FE 可以提供读服务。",
6 changes: 6 additions & 0 deletions fe/fe-core/src/main/java/org/apache/doris/DorisFE.java
@@ -20,6 +20,7 @@
import org.apache.doris.catalog.Env;
import org.apache.doris.common.CommandLineOptions;
import org.apache.doris.common.Config;
import org.apache.doris.common.FeConstants;
import org.apache.doris.common.LdapConfig;
import org.apache.doris.common.Log4jConfig;
import org.apache.doris.common.ThreadPoolManager;
@@ -255,6 +256,8 @@ private static CommandLineOptions parseArgs(String[] args) {
options.addOption("f", "from", true, "Specify the start scan key");
options.addOption("t", "to", true, "Specify the end scan key");
options.addOption("m", "metaversion", true, "Specify the meta version to decode log value");
options.addOption("r", FeConstants.METADATA_FAILURE_RECOVERY_KEY, false,
"Start FE in metadata failure recovery mode (reset the bdbje replication group)");

CommandLine cmd = null;
try {
@@ -288,6 +291,9 @@ private static CommandLineOptions parseArgs(String[] args) {
}
return new CommandLineOptions(false, "", null, imagePath);
}
if (cmd.hasOption('r') || cmd.hasOption(FeConstants.METADATA_FAILURE_RECOVERY_KEY)) {
System.setProperty(FeConstants.METADATA_FAILURE_RECOVERY_KEY, "true");
}
if (cmd.hasOption('b') || cmd.hasOption("bdb")) {
if (cmd.hasOption('l') || cmd.hasOption("listdb")) {
// list bdb je databases
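With this change the recovery switch is no longer a `fe.conf` item but a process-wide JVM system property: `DorisFE` sets it from the command line, and `Env` and `BDBEnvironment` read it back. The sketch below reproduces that flow with a hand-rolled argument loop standing in for the Commons CLI-style parser used by `DorisFE`, and reads the property with `Boolean.getBoolean`, which returns `true` only when the property exists and equals `true` (ignoring case). `RecoveryFlagSketch` and its methods are hypothetical names.

```java
class RecoveryFlagSketch {
    static final String METADATA_FAILURE_RECOVERY_KEY = "metadata_failure_recovery";

    // Like DorisFE.parseArgs: "-r" or "--metadata_failure_recovery" sets a JVM-wide property.
    static void applyRecoveryFlag(String[] args) {
        for (String arg : args) {
            if (arg.equals("-r") || arg.equals("--" + METADATA_FAILURE_RECOVERY_KEY)) {
                System.setProperty(METADATA_FAILURE_RECOVERY_KEY, "true");
            }
        }
    }

    // Like the checks in Env and BDBEnvironment: any code in the process can read the flag.
    static boolean isRecoveryMode() {
        return Boolean.getBoolean(METADATA_FAILURE_RECOVERY_KEY);
    }

    public static void main(String[] args) {
        applyRecoveryFlag(args);
        System.out.println("metadata failure recovery: " + isRecoveryMode());
    }
}
```

Because the property lives only in the running JVM, a later restart without the option starts normally, which is why the documentation no longer asks you to edit `fe.conf` back after recovery.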
3 changes: 2 additions & 1 deletion fe/fe-core/src/main/java/org/apache/doris/catalog/Env.java
@@ -1658,7 +1658,8 @@ private void checkLowerCaseTableNames() {
* frontend log is deleted because of checkpoint.
*/
private void checkCurrentNodeExist() {
if (Config.metadata_failure_recovery.equals("true")) {
boolean metadataFailureRecovery = Boolean.getBoolean(FeConstants.METADATA_FAILURE_RECOVERY_KEY);
if (metadataFailureRecovery) {
return;
}

@@ -92,4 +92,6 @@ public class FeConstants {
public static String FS_PREFIX_FILE = "file";
public static final String INTERNAL_DB_NAME = "__internal_schema";
public static String TEMP_MATERIZLIZE_DVIEW_PREFIX = "internal_tmp_materialized_view_";

public static final String METADATA_FAILURE_RECOVERY_KEY = "metadata_failure_recovery";
}
@@ -19,6 +19,7 @@

import org.apache.doris.catalog.Env;
import org.apache.doris.common.Config;
import org.apache.doris.common.FeConstants;
import org.apache.doris.ha.BDBHA;
import org.apache.doris.ha.BDBStateChangeListener;
import org.apache.doris.ha.FrontendNodeType;
@@ -90,13 +91,14 @@ public BDBEnvironment() {
// The setup() method opens the environment and database
public void setup(File envHome, String selfNodeName, String selfNodeHostPort,
String helperHostPort, boolean isElectable) {

boolean metadataFailureRecovery = Boolean.getBoolean(FeConstants.METADATA_FAILURE_RECOVERY_KEY);
// Almost never used, just in case the master can not restart
if (Config.metadata_failure_recovery.equals("true")) {
if (metadataFailureRecovery) {
if (!isElectable) {
LOG.error("Current node is not in the electable_nodes list. will exit");
System.exit(-1);
}
LOG.info("start group reset");
DbResetRepGroup resetUtility = new DbResetRepGroup(
envHome, PALO_JOURNAL_GROUP, selfNodeName, selfNodeHostPort);
resetUtility.reset();
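The decision `BDBEnvironment.setup()` makes in recovery mode can be summarized as a small pure function. The sketch below is illustrative only: `GroupResetter` is a hypothetical stand-in for BDB JE's `DbResetRepGroup`, and instead of logging and calling `System.exit(-1)` for a non-electable node it returns an `Outcome` value so each branch can be exercised directly.

```java
class RecoverySetupSketch {
    // Hypothetical stand-in for DbResetRepGroup.reset().
    interface GroupResetter {
        void reset();
    }

    enum Outcome { NORMAL_START, EXIT_NOT_ELECTABLE, GROUP_RESET }

    // Follows the branch order of BDBEnvironment.setup() in recovery mode.
    static Outcome prepare(boolean recoveryMode, boolean isElectable, GroupResetter resetter) {
        if (!recoveryMode) {
            return Outcome.NORMAL_START;
        }
        if (!isElectable) {
            // The real code logs an error and calls System.exit(-1) here.
            return Outcome.EXIT_NOT_ELECTABLE;
        }
        // Only an electable node in recovery mode resets the replication group.
        resetter.reset();
        return Outcome.GROUP_RESET;
    }

    public static void main(String[] args) {
        int[] resets = {0};
        GroupResetter counting = () -> resets[0]++;
        System.out.println(prepare(true, true, counting) + " resets=" + resets[0]);
        System.out.println(prepare(true, false, counting) + " resets=" + resets[0]);
    }
}
```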