[feature](colocate) support cross database colocate join (apache#18152)
morningman authored Apr 3, 2023
1 parent e260dca commit ecd3fd0
Showing 18 changed files with 476 additions and 127 deletions.
19 changes: 19 additions & 0 deletions docs/en/docs/advanced/join-optimization/colocation-join.md
Expand Up @@ -93,6 +93,25 @@ PROPERTIES(
If the specified Group does not exist, Doris automatically creates a Group that contains only the current table. If the Group already exists, Doris checks whether the current table satisfies the Colocation Group Schema. If it does, the table is created and added to the Group. At the same time, the table creates its shards and replicas according to the data distribution rules of the existing Group.
A Group belongs to a Database, and its name is unique within that Database. Internally, the Group is stored under its full name `dbId_groupName`, but users only see groupName.

<version since="dev">

In version 2.0, Doris supports cross-Database Groups. When creating a table, use the keyword `__global__` as the prefix of the Group name. For example:

```
CREATE TABLE tbl (k1 int, v1 int sum)
DISTRIBUTED BY HASH(k1)
BUCKETS 8
PROPERTIES(
"colocate_with" = "__global__group1"
);
```

The Group prefixed with `__global__` no longer belongs to a Database, and its name is also globally unique.

Cross-Database Colocate Join can be realized by creating a Global Group.

</version>

### Delete table

When the last table in a Group is deleted completely, the Group is also deleted automatically. (Deleting completely means deleting from the recycle bin. Usually, after a table is dropped with the `DROP TABLE` command, it stays in the recycle bin for one day by default before it is deleted.)
21 changes: 20 additions & 1 deletion docs/zh-CN/docs/advanced/join-optimization/colocation-join.md
Expand Up @@ -90,6 +90,25 @@ PROPERTIES(

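If the specified Group does not exist, Doris automatically creates a Group that contains only the current table. If the Group already exists, Doris checks whether the current table satisfies the Colocation Group Schema. If it does, the table is created and added to the Group. At the same time, the table creates its shards and replicas according to the data distribution rules of the existing Group. A Group belongs to a Database, and its name is unique within that Database. Internally, the Group is stored under its full name `dbId_groupName`, but users only see groupName.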

<version since="dev">

In version 2.0, Doris supports cross-Database Groups. When creating a table, use the keyword `__global__` as the prefix of the Group name. For example:

```
CREATE TABLE tbl (k1 int, v1 int sum)
DISTRIBUTED BY HASH(k1)
BUCKETS 8
PROPERTIES(
"colocate_with" = "__global__group1"
);
```

A Group with the `__global__` prefix no longer belongs to a Database, and its name is globally unique.

Creating a Global Group makes cross-Database Colocate Join possible.

</version>

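### Delete table

When the last table in a Group is deleted completely, the Group is also deleted automatically. (Deleting completely means deleting from the recycle bin. Usually, after a table is dropped with the `DROP TABLE` command, it stays in the recycle bin for one day by default before it is deleted.)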
Expand Down Expand Up @@ -408,4 +427,4 @@ Doris provides several HTTP Restful APIs related to Colocation Join, for

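The Body is the BucketsSequence, represented as a nested array, together with the ids of the BEs on which the shards of each Bucket are located.

Note: to use this command, you may need to set the FE configurations `disable_colocate_relocate` and `disable_colocate_balance` to true, which turns off the system's automatic Colocation replica repair and balancing. Otherwise, the modification may be automatically reset by the system.
Note: to use this command, you may need to set the FE configurations `disable_colocate_relocate` and `disable_colocate_balance` to true, which turns off the system's automatic Colocation replica repair and balancing. Otherwise, the modification may be automatically reset by the system.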
Expand Up @@ -56,9 +56,11 @@ public final class FeMetaVersion {
public static final int VERSION_117 = 117;
// change frontend meta to json, add hostname to MasterInfo
public static final int VERSION_118 = 118;
// TablePropertyInfo add db id
public static final int VERSION_119 = 119;

// note: when incrementing the meta version, assign the latest version to VERSION_CURRENT
public static final int VERSION_CURRENT = VERSION_118;
public static final int VERSION_CURRENT = VERSION_119;

// all logs meta version should >= the minimum version, so that we could remove many if clause, for example
// if (FE_METAVERSION < VERSION_94) ...
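
The new `VERSION_119` is commented as "TablePropertyInfo add db id", but the hunk that reads the field is not shown here. The following is a minimal sketch of how a version-gated read could look, assuming the `dbId` is written before the `tableId`. The class `VersionGatedReadSketch`, its `encode`/`decode` helpers, and the field order are hypothetical and are not taken from the Doris source.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch: not the Doris implementation of TablePropertyInfo.
final class VersionGatedReadSketch {
    static final int VERSION_118 = 118;
    static final int VERSION_119 = 119;

    // Encodes a record in the old layout (tableId only) or the assumed new layout (dbId, then tableId).
    static byte[] encode(boolean withDbId, long dbId, long tableId) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            if (withDbId) {
                out.writeLong(dbId);
            }
            out.writeLong(tableId);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns {dbId, tableId}. For journals older than VERSION_119 the dbId was never written, so it stays 0.
    static long[] decode(byte[] data, int journalVersion) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            long dbId = 0L;
            if (journalVersion >= VERSION_119) {
                dbId = in.readLong();
            }
            long tableId = in.readLong();
            return new long[] {dbId, tableId};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long[] oldRecord = decode(encode(false, 0L, 42L), VERSION_118);
        long[] newRecord = decode(encode(true, 10001L, 42L), VERSION_119);
        System.out.println(oldRecord[0] + " " + oldRecord[1]);
        System.out.println(newRecord[0] + " " + newRecord[1]);
    }
}
```

Gating on the journal version keeps old edit logs readable: a record written before the field existed simply yields the default `dbId` of 0.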
Expand Up @@ -90,6 +90,11 @@ public void checkDistribution(DistributionInfo distributionInfo) throws DdlExcep
// distribution col type
for (int i = 0; i < distributionColTypes.size(); i++) {
Type targetColType = distributionColTypes.get(i);
// varchar and string have the same distribution hash value if their data is the same
if (targetColType.isVarcharOrStringType() && info.getDistributionColumns().get(i).getType()
.isVarcharOrStringType()) {
continue;
}
if (!targetColType.equals(info.getDistributionColumns().get(i).getType())) {
ErrorReport.reportDdlException(ErrorCode.ERR_COLOCATE_TABLE_MUST_HAS_SAME_DISTRIBUTION_COLUMN_TYPE,
info.getDistributionColumns().get(i).getName(), targetColType);
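
The relaxed check above can be sketched in isolation as follows. This is a simplified model, assuming a small hypothetical `ColType` enum in place of Doris's `Type` class. It only illustrates the rule that VARCHAR and STRING are treated as interchangeable distribution column types, while all other types must match exactly.

```java
// Hypothetical sketch: ColType stands in for org.apache.doris.catalog.Type.
final class DistributionTypeCheckSketch {
    enum ColType { INT, BIGINT, VARCHAR, STRING }

    static boolean isVarcharOrString(ColType type) {
        return type == ColType.VARCHAR || type == ColType.STRING;
    }

    // True if a table column of tableType may join a group whose distribution column is groupType.
    static boolean isCompatible(ColType groupType, ColType tableType) {
        if (isVarcharOrString(groupType) && isVarcharOrString(tableType)) {
            // Both are character types, so the check is skipped, as in the diff's `continue`.
            return true;
        }
        return groupType == tableType;
    }

    public static void main(String[] args) {
        System.out.println(isCompatible(ColType.VARCHAR, ColType.STRING)); // true
        System.out.println(isCompatible(ColType.INT, ColType.BIGINT));     // false
    }
}
```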
Expand All @@ -98,7 +103,7 @@ public void checkDistribution(DistributionInfo distributionInfo) throws DdlExcep
}
}

public void checkReplicaAllocation(PartitionInfo partitionInfo) throws DdlException {
private void checkReplicaAllocation(PartitionInfo partitionInfo) throws DdlException {
for (ReplicaAllocation replicaAlloc : partitionInfo.idToReplicaAllocation.values()) {
if (!replicaAlloc.equals(this.replicaAlloc)) {
ErrorReport.reportDdlException(ErrorCode.ERR_COLOCATE_TABLE_MUST_HAS_SAME_REPLICATION_ALLOCATION,
Expand Up @@ -22,6 +22,7 @@
import org.apache.doris.common.io.Text;
import org.apache.doris.common.io.Writable;
import org.apache.doris.persist.ColocatePersistInfo;
import org.apache.doris.persist.gson.GsonPostProcessable;
import org.apache.doris.persist.gson.GsonUtils;
import org.apache.doris.resource.Tag;

Expand Down Expand Up @@ -52,16 +53,23 @@
import java.util.stream.Collectors;

/**
* maintain the colocate table related indexes and meta
* maintain the colocation table related indexes and meta
*/
public class ColocateTableIndex implements Writable {
private static final Logger LOG = LogManager.getLogger(ColocateTableIndex.class);

public static class GroupId implements Writable {
public static class GroupId implements Writable, GsonPostProcessable {
public static final String GLOBAL_COLOCATE_PREFIX = "__global__";

@SerializedName(value = "dbId")
public Long dbId;
@SerializedName(value = "grpId")
public Long grpId;
// Only used when dbId == 0.
// A global colocate group has dbId 0, so the group id alone does not tell which db a table belongs to;
// tblId2DbId records the dbId of each table in the group.
@SerializedName(value = "tblId2DbId")
private Map<Long, Long> tblId2DbId = Maps.newHashMap();

private GroupId() {
}
Expand All @@ -71,6 +79,23 @@ public GroupId(long dbId, long grpId) {
this.grpId = grpId;
}

public void addTblId2DbId(long tblId, long dbId) {
Preconditions.checkState(this.dbId == 0);
tblId2DbId.put(tblId, dbId);
}

public void removeTblId2DbId(long tblId) {
tblId2DbId.remove(tblId);
}

public long getDbIdByTblId(long tblId) {
return tblId2DbId.get(tblId);
}

public int getTblId2DbIdSize() {
return tblId2DbId.size();
}

public static GroupId read(DataInput in) throws IOException {
if (Env.getCurrentEnvJournalVersion() < FeMetaVersion.VERSION_105) {
GroupId groupId = new GroupId();
Expand Down Expand Up @@ -102,6 +127,13 @@ public boolean equals(Object obj) {
return dbId.equals(other.dbId) && grpId.equals(other.grpId);
}

@Override
public void gsonPostProcess() throws IOException {
if (tblId2DbId == null) {
tblId2DbId = Maps.newHashMap();
}
}

@Override
public int hashCode() {
int result = 17;
Expand All @@ -114,6 +146,18 @@ public int hashCode() {
public String toString() {
return dbId + "." + grpId;
}

public static String getFullGroupName(long dbId, String colocateGroup) {
if (colocateGroup.startsWith(GLOBAL_COLOCATE_PREFIX)) {
return colocateGroup;
} else {
return dbId + "_" + colocateGroup;
}
}

public static boolean isGlobalGroupName(String groupName) {
return groupName.startsWith(GLOBAL_COLOCATE_PREFIX);
}
}

// group_name -> group_id
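
The naming rule introduced by `getFullGroupName` and `isGlobalGroupName` can be tried outside of Doris with the standalone sketch below. The wrapper class `GroupNameSketch` and the database ids are made up for illustration; the two methods mirror the logic shown in the diff.

```java
// Standalone sketch of the naming rule; GroupNameSketch and the ids used in main are illustrative only.
final class GroupNameSketch {
    static final String GLOBAL_COLOCATE_PREFIX = "__global__";

    static boolean isGlobalGroupName(String groupName) {
        return groupName.startsWith(GLOBAL_COLOCATE_PREFIX);
    }

    // Global names are used as-is; all other names are prefixed with the owning database id.
    static String getFullGroupName(long dbId, String colocateGroup) {
        return isGlobalGroupName(colocateGroup) ? colocateGroup : dbId + "_" + colocateGroup;
    }

    public static void main(String[] args) {
        System.out.println(getFullGroupName(10001L, "group1"));           // 10001_group1
        System.out.println(getFullGroupName(10002L, "group1"));           // 10002_group1
        System.out.println(getFullGroupName(10001L, "__global__group1")); // __global__group1
        System.out.println(getFullGroupName(10002L, "__global__group1")); // __global__group1
    }
}
```

Two databases that use the plain name `group1` end up with different full names, while the `__global__` name resolves to the same key from any database, which is what lets tables in different databases share one group.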
Expand Down Expand Up @@ -155,11 +199,10 @@ private void writeUnlock() {

// NOTICE: call 'addTableToGroup()' will not modify 'group2BackendsPerBucketSeq'
// 'group2BackendsPerBucketSeq' need to be set manually before or after, if necessary.
public GroupId addTableToGroup(long dbId, OlapTable tbl, String groupName, GroupId assignedGroupId) {
public GroupId addTableToGroup(long dbId, OlapTable tbl, String fullGroupName, GroupId assignedGroupId) {
writeLock();
try {
GroupId groupId = null;
String fullGroupName = dbId + "_" + groupName;
if (groupName2Id.containsKey(fullGroupName)) {
groupId = groupName2Id.get(fullGroupName);
} else {
Expand All @@ -168,7 +211,11 @@ public GroupId addTableToGroup(long dbId, OlapTable tbl, String groupName, Group
groupId = assignedGroupId;
} else {
// generate a new one
groupId = new GroupId(dbId, Env.getCurrentEnv().getNextId());
if (GroupId.isGlobalGroupName(fullGroupName)) {
groupId = new GroupId(0, Env.getCurrentEnv().getNextId());
} else {
groupId = new GroupId(dbId, Env.getCurrentEnv().getNextId());
}
}
HashDistributionInfo distributionInfo = (HashDistributionInfo) tbl.getDefaultDistributionInfo();
ColocateGroupSchema groupSchema = new ColocateGroupSchema(groupId,
Expand All @@ -178,6 +225,10 @@ public GroupId addTableToGroup(long dbId, OlapTable tbl, String groupName, Group
group2Schema.put(groupId, groupSchema);
group2ErrMsgs.put(groupId, "");
}
// for global colocate table, dbId is 0, and we need to save the real dbId of the table
if (groupId.dbId == 0) {
groupId.addTblId2DbId(tbl.getId(), dbId);
}
group2Tables.put(groupId, tbl.getId());
table2Group.put(tbl.getId(), groupId);
return groupId;
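
As a rough model of the branch above, the sketch below shows how a global group gets `dbId` 0 and then records the real database of each table. It assumes a heavily simplified index: the class `GlobalGroupIdSketch`, its counter-based id generator, and the use of bare table ids instead of `OlapTable` are all hypothetical, and locking, schemas, and persistence are left out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the group index; not the Doris ColocateTableIndex.
final class GlobalGroupIdSketch {
    static final String GLOBAL_COLOCATE_PREFIX = "__global__";

    static final class GroupId {
        final long dbId;
        final long grpId;
        // Only populated when dbId == 0, i.e. for global groups.
        final Map<Long, Long> tblId2DbId = new HashMap<>();

        GroupId(long dbId, long grpId) {
            this.dbId = dbId;
            this.grpId = grpId;
        }
    }

    private final Map<String, GroupId> groupName2Id = new HashMap<>();
    private long nextId = 1L; // stand-in for Env.getCurrentEnv().getNextId()

    GroupId addTableToGroup(long dbId, long tblId, String fullGroupName) {
        GroupId groupId = groupName2Id.get(fullGroupName);
        if (groupId == null) {
            // A global group is not owned by any database, so it is created with dbId 0.
            long ownerDbId = fullGroupName.startsWith(GLOBAL_COLOCATE_PREFIX) ? 0L : dbId;
            groupId = new GroupId(ownerDbId, nextId++);
            groupName2Id.put(fullGroupName, groupId);
        }
        if (groupId.dbId == 0L) {
            // Remember which database the table really lives in.
            groupId.tblId2DbId.put(tblId, dbId);
        }
        return groupId;
    }

    public static void main(String[] args) {
        GlobalGroupIdSketch index = new GlobalGroupIdSketch();
        GroupId a = index.addTableToGroup(10001L, 1L, "__global__group1");
        GroupId b = index.addTableToGroup(10002L, 2L, "__global__group1");
        System.out.println(a == b);              // true
        System.out.println(a.tblId2DbId.size()); // 2
    }
}
```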
Expand Down Expand Up @@ -252,6 +303,7 @@ public boolean removeTable(long tableId) {
}

GroupId groupId = table2Group.remove(tableId);
groupId.removeTblId2DbId(tableId);
group2Tables.remove(groupId, tableId);
if (!group2Tables.containsKey(groupId)) {
// all tables of this group are removed, remove the group
Expand Down Expand Up @@ -514,14 +566,19 @@ public GroupId changeGroup(long dbId, OlapTable tbl, String oldGroup, String new
// remove from old group
removeTable(tbl.getId());
}
return addTableToGroup(dbId, tbl, newGroup, assignedGroupId);
String fullNewGroupName = GroupId.getFullGroupName(dbId, newGroup);
return addTableToGroup(dbId, tbl, fullNewGroupName, assignedGroupId);
} finally {
writeUnlock();
}
}

public void replayAddTableToGroup(ColocatePersistInfo info) throws MetaNotFoundException {
Database db = Env.getCurrentInternalCatalog().getDbOrMetaException(info.getGroupId().dbId);
long dbId = info.getGroupId().dbId;
if (dbId == 0) {
dbId = info.getGroupId().getDbIdByTblId(info.getTableId());
}
Database db = Env.getCurrentInternalCatalog().getDbOrMetaException(dbId);
OlapTable tbl = (OlapTable) db.getTableOrMetaException(info.getTableId(),
org.apache.doris.catalog.Table.TableType.OLAP);
writeLock();
Expand All @@ -530,7 +587,8 @@ public void replayAddTableToGroup(ColocatePersistInfo info) throws MetaNotFoundE
for (Map.Entry<Tag, List<List<Long>>> entry : map.entrySet()) {
group2BackendsPerBucketSeq.put(info.getGroupId(), entry.getKey(), entry.getValue());
}
addTableToGroup(info.getGroupId().dbId, tbl, tbl.getColocateGroup(), info.getGroupId());
String fullGroupName = GroupId.getFullGroupName(dbId, tbl.getColocateGroup());
addTableToGroup(dbId, tbl, fullGroupName, info.getGroupId());
} finally {
writeUnlock();
}
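
The dbId resolution used during replay can be sketched as below. Note that `getDbIdByTblId` in the diff returns a primitive `long` from `Map<Long, Long>.get`, so a missing entry would throw a `NullPointerException` when unboxed. The sketch uses `OptionalLong` to make that case explicit; this is one possible hedge and not what the commit does. The class `ReplayDbIdSketch` and its method are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalLong;

// Hypothetical sketch of the dbId fallback; not part of the Doris source.
final class ReplayDbIdSketch {
    // Returns the database that owns the table: the group's dbId for a normal group,
    // or the per-table record for a global group (dbId == 0). Empty if no record exists.
    static OptionalLong resolveDbId(long groupDbId, long tableId, Map<Long, Long> tblId2DbId) {
        if (groupDbId != 0L) {
            return OptionalLong.of(groupDbId);
        }
        Long realDbId = tblId2DbId.get(tableId);
        return realDbId == null ? OptionalLong.empty() : OptionalLong.of(realDbId);
    }

    public static void main(String[] args) {
        Map<Long, Long> tblId2DbId = new HashMap<>();
        tblId2DbId.put(1L, 10001L);
        System.out.println(resolveDbId(10002L, 7L, tblId2DbId));         // OptionalLong[10002]
        System.out.println(resolveDbId(0L, 1L, tblId2DbId));             // OptionalLong[10001]
        System.out.println(resolveDbId(0L, 99L, tblId2DbId).isPresent()); // false
    }
}
```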
27 changes: 15 additions & 12 deletions fe/fe-core/src/main/java/org/apache/doris/catalog/Env.java
Original file line number Diff line number Diff line change
Expand Up @@ -3885,23 +3885,23 @@ public void replayRenameTable(TableInfo tableInfo) throws MetaNotFoundException
}

// the invoker should keep table's write lock
public void modifyTableColocate(Database db, OlapTable table, String colocateGroup, boolean isReplay,
public void modifyTableColocate(Database db, OlapTable table, String assignedGroup, boolean isReplay,
GroupId assignedGroupId)
throws DdlException {

String oldGroup = table.getColocateGroup();
GroupId groupId = null;
if (!Strings.isNullOrEmpty(colocateGroup)) {
String fullGroupName = db.getId() + "_" + colocateGroup;
if (!Strings.isNullOrEmpty(assignedGroup)) {
String fullAssignedGroupName = GroupId.getFullGroupName(db.getId(), assignedGroup);
// When the new name is the same as the old name, return early to prevent an NPE
if (!Strings.isNullOrEmpty(oldGroup)) {
String oldFullGroupName = db.getId() + "_" + oldGroup;
if (oldFullGroupName.equals(fullGroupName)) {
String oldFullGroupName = GroupId.getFullGroupName(db.getId(), oldGroup);
if (oldFullGroupName.equals(fullAssignedGroupName)) {
LOG.warn("modify table[{}] group name same as old group name,skip.", table.getName());
return;
}
}
ColocateGroupSchema groupSchema = colocateTableIndex.getGroupSchema(fullGroupName);
ColocateGroupSchema groupSchema = colocateTableIndex.getGroupSchema(fullAssignedGroupName);
if (groupSchema == null) {
// the user set a new colocate group,
// check that all partitions of this table have the same bucket number and the same replication number
Expand Down Expand Up @@ -3938,7 +3938,7 @@ public void modifyTableColocate(Database db, OlapTable table, String colocateGro
backendsPerBucketSeq = table.getArbitraryTabletBucketsSeq();
}
// change group after getting backends sequence(if has), in case 'getArbitraryTabletBucketsSeq' failed
groupId = colocateTableIndex.changeGroup(db.getId(), table, oldGroup, colocateGroup, assignedGroupId);
groupId = colocateTableIndex.changeGroup(db.getId(), table, oldGroup, assignedGroup, assignedGroupId);

if (groupSchema == null) {
Preconditions.checkNotNull(backendsPerBucketSeq);
Expand All @@ -3948,7 +3948,7 @@ public void modifyTableColocate(Database db, OlapTable table, String colocateGro
// set this group as unstable
colocateTableIndex.markGroupUnstable(groupId, "Colocation group modified by user",
false /* edit log is along with modify table log */);
table.setColocateGroup(colocateGroup);
table.setColocateGroup(assignedGroup);
} else {
// unset colocation group
if (Strings.isNullOrEmpty(oldGroup)) {
Expand All @@ -3957,24 +3957,27 @@ public void modifyTableColocate(Database db, OlapTable table, String colocateGro
}

// when replayModifyTableColocate, we need the groupId info
String fullGroupName = db.getId() + "_" + oldGroup;
String fullGroupName = GroupId.getFullGroupName(db.getId(), oldGroup);
groupId = colocateTableIndex.getGroupSchema(fullGroupName).getGroupId();

colocateTableIndex.removeTable(table.getId());
table.setColocateGroup(null);
}

if (!isReplay) {
Map<String, String> properties = Maps.newHashMapWithExpectedSize(1);
properties.put(PropertyAnalyzer.PROPERTIES_COLOCATE_WITH, colocateGroup);
TablePropertyInfo info = new TablePropertyInfo(table.getId(), groupId, properties);
properties.put(PropertyAnalyzer.PROPERTIES_COLOCATE_WITH, assignedGroup);
TablePropertyInfo info = new TablePropertyInfo(db.getId(), table.getId(), groupId, properties);
editLog.logModifyTableColocate(info);
}
LOG.info("finished modify table's colocation property. table: {}, is replay: {}", table.getName(), isReplay);
}

public void replayModifyTableColocate(TablePropertyInfo info) throws MetaNotFoundException {
long dbId = info.getGroupId().dbId;
if (dbId == 0) {
dbId = info.getDbId();
}
Preconditions.checkState(dbId != 0, "replay modify table colocate failed, table id: " + info.getTableId());
long tableId = info.getTableId();
Map<String, String> properties = info.getPropertyMap();
