[chore](recover) add a config to recover remaining data in emergency (a…
yangzhg authored Apr 28, 2023
1 parent 365ac54 commit 43e70ab
Showing 4 changed files with 67 additions and 1 deletion.
14 changes: 14 additions & 0 deletions docs/en/docs/admin-manual/config/fe-config.md
@@ -1785,6 +1785,20 @@ In some very special circumstances, such as code bugs, or human misoperation, et

 Set to true so that Doris will automatically use blank replicas to fill tablets whose replicas are all damaged or missing

+#### `recover_with_skip_missing_version`
+
+Default:disable
+
+IsMutable:true
+
+MasterOnly:true
+
+In some scenarios the cluster has an unrecoverable metadata problem, and the visibleVersion recorded in FE no longer matches the replica versions on BE. In this case it is still necessary to recover the remaining data (which may compromise data correctness). Like `recover_with_empty_tablet`, this configuration should only be used in emergency situations.
+This configuration has three values:
+* disable: If an exception occurs, an error is reported normally.
+* ignore_version: Ignore the visibleVersion recorded in the FE partition and use the replica version instead.
+* ignore_all: In addition to the behavior of ignore_version, skip any tablet that has no queryable replica instead of throwing an exception.
+
 #### `min_clone_task_timeout_sec` and `max_clone_task_timeout_sec`

 Default:Minimum 3 minutes, maximum two hours
18 changes: 18 additions & 0 deletions docs/zh-CN/docs/admin-manual/config/fe-config.md
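@@ -1785,6 +1785,24 @@ show data (other usage: HELP SHOW DATA)

 In this case, you can set this configuration to true. The system will replace damaged tablets with empty tablets so that queries can still be executed. (However, the data has already been lost at this point, so query results may be inaccurate.)

+#### `recover_with_skip_missing_version`
+
+Default: disable
+
+Dynamically configurable: true
+
+Master FE only: true
+
+In some scenarios the cluster has an unrecoverable metadata problem, and the visibleVersion of the data no longer matches BE.
+
+In this case it is still necessary to recover the remaining data (which may compromise data correctness). Like `recover_with_empty_tablet`, this configuration should only be used in emergency situations.
+
+This configuration has three values:
+
+* disable: If an exception occurs, an error is reported normally.
+* ignore_version: Ignore the visibleVersion recorded in the FE partition and use the replica version instead.
+* ignore_all: In addition to the behavior of ignore_version, skip any tablet for which no queryable replica can be found instead of throwing an exception.
+
 #### `min_clone_task_timeout_sec` `max_clone_task_timeout_sec`

 Default: minimum 3 minutes, maximum two hours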
14 changes: 14 additions & 0 deletions fe/fe-common/src/main/java/org/apache/doris/common/Config.java
@@ -1439,6 +1439,20 @@ public class Config extends ConfigBase {
     @ConfField(mutable = true, masterOnly = true)
     public static boolean recover_with_empty_tablet = false;

+    /**
+     * In some scenarios the cluster has an unrecoverable metadata problem, and the
+     * visibleVersion recorded in FE no longer matches the replica versions on BE. In this case it is
+     * still necessary to recover the remaining data (which may compromise data correctness).
+     * Like `recover_with_empty_tablet`, this configuration should only be used in emergency situations.
+     * This configuration has three values:
+     * disable: if an exception occurs, an error is reported normally.
+     * ignore_version: ignore the visibleVersion recorded in the FE partition and use the replica version.
+     * ignore_all: in addition to ignore_version, when a tablet has no queryable replica,
+     * skip it instead of throwing an exception.
+     */
+    @ConfField(mutable = true, masterOnly = true)
+    public static String recover_with_skip_missing_version = "disable";
+
     /**
      * Whether to add a delete sign column when create unique table
      */
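
Because the new option is a plain `String`, the scan-side checks in this commit compare it case-insensitively against `"disable"` and `"ignore_all"`. The following is a minimal sketch of how the three documented values map onto those two checks. `RecoverModeSketch` and its method names are hypothetical and not part of the commit; only the two `equalsIgnoreCase` comparisons are taken from the diff.

```java
/** Hypothetical helper for illustration only; it does not exist in Doris. */
final class RecoverModeSketch {
    private RecoverModeSketch() {}

    /**
     * Mirrors the first check in the scan code: every value other than
     * "disable" (compared case-insensitively) switches to replica versions.
     */
    static boolean usesReplicaVersion(String configValue) {
        return !configValue.equalsIgnoreCase("disable");
    }

    /**
     * Mirrors the second check in the scan code: only "ignore_all"
     * skips a tablet that has no queryable replica.
     */
    static boolean skipsUnqueryableTablet(String configValue) {
        return configValue.equalsIgnoreCase("ignore_all");
    }
}
```

One consequence of comparing against `"disable"` rather than against the two enabling values is that, under this logic, an unrecognized or misspelled value behaves like `ignore_version`.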
@@ -53,6 +53,7 @@
 import org.apache.doris.catalog.Replica;
 import org.apache.doris.catalog.Tablet;
 import org.apache.doris.common.AnalysisException;
+import org.apache.doris.common.Config;
 import org.apache.doris.common.ErrorCode;
 import org.apache.doris.common.ErrorReport;
 import org.apache.doris.common.UserException;
@@ -708,6 +709,20 @@ private void addScanRangeLocations(Partition partition,
         }
         for (Tablet tablet : tablets) {
             long tabletId = tablet.getId();
+            if (!Config.recover_with_skip_missing_version.equalsIgnoreCase("disable")) {
+                long tabletVersion = -1L;
+                for (Replica replica : tablet.getReplicas()) {
+                    if (replica.getVersion() > tabletVersion) {
+                        tabletVersion = replica.getVersion();
+                    }
+                }
+                if (tabletVersion != visibleVersion) {
+                    LOG.warn("tablet {} version {} is not equal to partition {} version {}",
+                            tabletId, tabletVersion, partition.getId(), visibleVersion);
+                    visibleVersion = tabletVersion;
+                    visibleVersionStr = String.valueOf(visibleVersion);
+                }
+            }
             TScanRangeLocations scanRangeLocations = new TScanRangeLocations();
             TPaloScanRange paloRange = new TPaloScanRange();
             paloRange.setDbName("");
@@ -783,7 +798,12 @@ private void addScanRangeLocations(Partition partition,
                 scanBackendIds.add(backend.getId());
             }
             if (tabletIsNull) {
-                throw new UserException(tabletId + " have no queryable replicas. err: " + Joiner.on(", ").join(errs));
+                if (Config.recover_with_skip_missing_version.equalsIgnoreCase("ignore_all")) {
+                    continue;
+                } else {
+                    throw new UserException(tabletId + " have no queryable replicas. err: "
+                            + Joiner.on(", ").join(errs));
+                }
             }
             TScanRange scanRange = new TScanRange();
             scanRange.setPaloScanRange(paloRange);
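
The version override added to `addScanRangeLocations` can be read in isolation as "take the highest replica version of the tablet and scan at that version unless the option is `disable`". Below is a minimal standalone sketch of that rule, assuming replica versions are passed in as a plain list. `TabletVersionSketch` and its methods are hypothetical names introduced for illustration; the real code works on `Tablet` and `Replica` objects and also logs a warning when the versions differ.

```java
import java.util.List;

/** Hypothetical illustration of the version selection rule; not Doris code. */
final class TabletVersionSketch {
    private TabletVersionSketch() {}

    /** Highest replica version, or -1 if there are no replicas (same start value as the patch). */
    static long maxReplicaVersion(List<Long> replicaVersions) {
        long tabletVersion = -1L;
        for (long version : replicaVersions) {
            if (version > tabletVersion) {
                tabletVersion = version;
            }
        }
        return tabletVersion;
    }

    /**
     * Version a tablet would be scanned at: the partition's visibleVersion when the
     * option is "disable", otherwise the highest version found among the replicas.
     */
    static long versionToScan(String configValue, long partitionVisibleVersion,
            List<Long> replicaVersions) {
        if (configValue.equalsIgnoreCase("disable")) {
            return partitionVisibleVersion;
        }
        return maxReplicaVersion(replicaVersions);
    }
}
```

For example, with a partition visibleVersion of 5 and replica versions 3 and 4, `versionToScan("ignore_version", 5L, List.of(3L, 4L))` yields 4, while `"disable"` keeps 5. Note that a tablet with no replicas would resolve to -1 under this rule, which is one reason the option is documented as emergency-only.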