
OceanBase compatibility issue #5087

Closed · yangbnh opened this issue Feb 2, 2024 · 11 comments

yangbnh commented Feb 2, 2024

We are running on the OceanBase database.

The auto-increment strategy for the table's primary key id is as follows:

On a tenant leader switch, the next value inserted is the maximum value previously cached for the auto-increment column. auto_increment_cache_size is the number of auto-increment values cached, and it defaults to 1000000. After creating a table and inserting data, the table definition shows AUTO_INCREMENT = 1000001 (this is related to how the distributed auto-increment column is implemented). When inserting rows without specifying an id, auto_increment_increment (the step) defaults to 1 and the id column is filled in automatically. After a leader switch, the auto-increment cache jumps to 2000001.

Currently, using the project's original DDL, after the business has been running for a while the generated ids look like the following (sorry, part of the business-specific content is blurred):
[screenshot]

Then in the database monitoring (OCP) we found a long-running SQL exhausting the CPU; the SQL is:

select releasemes0_.Id as id1_15_, releasemes0_.DataChange_LastTime as datachan2_15_, releasemes0_.Message as message3_15_ from ReleaseMessage releasemes0_ where releasemes0_.Id in (2097184 , 2097185 , 2097186 , 2097187 , 2097188 , 2097189 , 2097190 , 2097191 , 2097192 , 2097193 , 2097194 , 2097195 , 2097196 , 2097197 , 2097198 , 2097199 , 2097200 , 2097201 , 2097202 , 2097203 , 2097204 , 2097205 , 2097206 , 2097207 , 2097208 , 2097209 , 2097210 , 2097211 , 2097212 , 2097213 , 2097214 , 2097215 , 2097152 , 2097153 , 2097154 , 2097155 , 2097156 , 2097157 , 2097158 , 2097159 , 2097160 , 2097161 , 2097162 , 2097163 , 2097164 , 2097165 , 2097166 , 2097167 , 2097168 , 2097169 , 2097170 , 2097171 , 2097172 , 2097173 , 2097174 , 2097175 , 2097176 , 2097177 , 2097178 , 2097179 , 2097180 , 2097181 , 2097182 , 2097183 , 2097248 , 2097249 , 2097250 , 2097251 , 2097252 , 2097253 , 2097254 , 2097255 , 2097256 , 2097257 , 2097258 , 2097259 , 2097260 , 2097261 , 2097262 , 2097263 , 2097264 , 2097265 , 2097266 , 2097267 , 2097268 , 2097269 , 2097270 , 2097271 , 2097272 , 2097273 , 2097274 , 2097275 , 2097276 , 2097277 , 2097278 , 2097279 , 2097216 , 2097217 , 2097218 , 2097219 , 2097220 , 2097221 , 2097222 , 2097223 , 2097224 , 2097225 , 2097226 , 2097227 , 2097228 , 2097229 , 2097230 , 2097231 , 2097232 , 2097233 , 2097234 , 2097235 , 2097236 , 2097237 , 2097238 , 2097239 , 2097240 , 2097241 , 2097242 , 2097243 , 2097244 , 2097245 , 2097246 , 2097247 , 2097312 , 2097313 , 2097314 , 2097315 , 2097316 , 2097317 , 2097318 , 2097319 , 2097320 , 2097321 , 2097322 , 2097323 , 2097324 , 2097325 , 2097326 , 2097327 , 2097328 , 2097329 , 2097330 , 2097331 , 2097332 , 2097333 , 2097334 , 2097335 , 2097336 , 2097337 , 2097338 , 2097339 , 2097340 , 2097341 , 2097342 , 2097343 , 2097280 , 2097281 , 2097282 , 2097283 , 2097284 , 2097285 , 2097286 , 2097287 , 2097288 , 2097289 , 2097290 , 2097291 , 2097292 , 2097293 , 2097294 , 2097295 , 2097296 , 2097297 , 2097298 , 2097299 , 2097300 , 2097301 
, 2097302 , 2097303 , 2097304 , 2097305 , 2097306 , 2097307 , 2097308 , 2097309 , 2097310 , 2097311 , 2097376 , 2097377 , 2097378 , 2097379 , 2097380 , 2097381 , 2097382 , 2097383 , 2097384 , 2097385 , 2097386 , 2097387 , 2097388 , 2097389 , 2097390 , 2097391 , 2097392 , 2097393 , 2097394 , 2097395 , 2097396 , 2097397 , 2097398 , 2097399 , 2097400 , 2097401 , 2097402 , 2097403 , 2097404 , 2097405 , 2097406 , 2097407 , 2097344 , 2097345 , 2097346 , 2097347 , 2097348 , 2097349 , 2097350 , 2097351 , 2097352 , 2097353 , 2097354 , 2097355 , 2097356 , 2097357 , 2097358 , 2097359 , 2097360 , 2097361 , 2097362 , 2097363 , 2097364 , 2097365 , 2097366 , 2097367 , 2097368 , 2097369 , 2097370 , 2097371 , 2097372 , 2097373 , 2097374 , 2097375 , 2097440 , 2097441 , 2097442 , 2097443 , 2097444 , 2097445 , 2097446 , 2097447 , 2097448 , 2097449 , 2097450 , 2097451 , 2097452 , 2097453 , 2097454 , 2097455 , 2097456 , 2097457 , 2097458 , 2097459 , 2097460 , 2097461 , 2097462 , 2097463 , 2097464 , 2097465 , 2097466 , 2097467 , 2097468 , 2097469 , 2097470 , 2097471 , 2097408 , 2097409 , 2097410 , 2097411 , 2097412 , 2097413 , 2097414 , 2097415 , 2097416 , 2097417 , 2097418 , 2097419 , 2097420 , 2097421 , 2097422 , 2097423 , 2097424 , 2097425 , 2097426 , 2097427 , 2097428 , 2097429 , 2097430 , 2097431 , 2097432 , 2097433 , 2097434 , 2097435 , 2097436 , 2097437 , 2097438 , 2097439 , 2097504 , 2097505 , 2097506 , 2097507 , 2097508 , 2097509 , 2097510 , 2097511 , 2097512 , 2097513 , 2097514 , 2097515 , 2097516 , 2097517 , 2097518 , 2097519 , 2097520 , 2097521 , 2097522 , 2097523 , 2097524 , 2097525 , 2097526 , 2097527 , 2097528 , 2097529 , 2097530 , 2097531 , 2097532 , 2097533 , 2097534 , 2097535 , 2097472 , 2097473 , 2097474 , 2097475 , 2097476 , 2097477 , 2097478 , 2097479 , 2097480 , 2097481 , 2097482 , 2097483 , 2097484 , 2097485 , 2097486 , 2097487 , 2097488 , 2097489 , 2097490 , 2097491 , 2097492 , 2097493 , 2097494 , 2097495 , 2097496 , 2097497 , 2097498 , 2097499 , 2097500 , 2097501 
, 2097502 , 2097503 , 2097568 , 2097569 , 2097570 , 2097571 , 2097572 , 2097573 , 2097574 , 2097575 , 2097576 , 2097577 , 2097578 , 2097579 , 2097580 , 2097581 , 2097582 , 2097583 , 2097584 , 2097585 , 2097586 , 2097587 , 2097588 , 2097589 , 2097590 , 2097591 , 2097592 , 2097593 , 2097594 , 2097595 , 2097596 , 2097597 , 2097598 , 2097599 , 2097536 , 2097537 , 2097538 , 2097539 , 2097540 , 2097541 , 2097542 , 2097543 , 2097544 , 2097545 , 2097546 , 2097547 , 2097548 , 2097549 , 2097550 , 2097551 , 2097552 , 2097553 , 2097554 , 2097555 , 2097556 , 2097557 , 2097558 , 2097559 , 2097560 , 2097561 , 2097562 , 2097563 , 2097564 , 2097565 , 2097566 , 2097567 , ...

There are too many id values in the IN clause to list them all.

This problem is reproducible, in both Docker and cluster deployments. We would greatly appreciate a solution.

dosubot bot commented Feb 2, 2024

🤖

Hello @yangbnh! I'm Dosu, a bot that can help you solve problems, answer questions, and become a contributor. I'm here to help while you wait for a human maintainer; rest assured, I'll look into your issue right away and give you a complete answer as soon as possible.

yangbnh (Author) commented Feb 2, 2024

We are considering limiting OceanBase's auto_increment_cache_size to constrain the range of ReleaseMessage ids, and thereby reduce the range of id values in this SQL.
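As a reference, a minimal sketch of that workaround, assuming a MySQL-mode OceanBase tenant where auto_increment_cache_size is a settable system variable (please verify the variable name, scope, and side effects against your OB version's documentation; the value 100 below is only illustrative):

```sql
-- Inspect the current cache size (the default is 1000000)
SHOW VARIABLES LIKE 'auto_increment_cache_size';

-- Shrink the cache so that a leader switch skips far fewer ids.
-- Trade-off: a smaller cache means the auto-increment service is
-- consulted more often, which can add insert latency.
SET GLOBAL auto_increment_cache_size = 100;
```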

I'd like to ask whether there are other ways to solve this?

dosubot bot commented Feb 2, 2024

🤖

Your question centers on how OceanBase's auto-increment id strategy interacts with Apollo's ReleaseMessage table. In Apollo, the Id column of the ReleaseMessage table is defined as AUTO_INCREMENT, meaning the database automatically generates a unique value for each new record as an increment of the last value used. This is defined in Apollo's SQL scripts.

However, you mention that OceanBase's auto-increment strategy can make id values grow in large jumps, which may hurt the performance of certain SQL queries, especially ones involving a large number of id values. Your proposed solution is to limit OceanBase's auto_increment_cache_size to reduce the ReleaseMessage id range, and thereby the range of id values in this SQL.

That solution looks feasible, since it addresses the root cause, namely the large jumps in id values. However, it may require configuration changes directly on the OceanBase database, which could affect other applications using that database.

Another possible solution is to optimize the Apollo code that handles the ReleaseMessage table, particularly the queries involving many id values. For example, consider paginated queries instead of querying a large number of ids at once, or other data structures or algorithms to improve query performance.

These solutions may require changes to Apollo's code, which takes time and resources. Before making such changes, it is recommended to run performance tests to determine the best option.

Hope this helps. If you have any other questions or need more information, feel free to ask.


nobodyiam (Member) commented

  1. I don't quite understand why this SQL would exhaust OB's CPU; it queries by primary key.

  2. Also, how many id values are in the IN clause? I couldn't find a direct query like this in the code; the only related one might be findFirst500ByIdGreaterThanOrderByIdAsc below, which loads 500 records at a time, so the load is not heavy either.

public interface ReleaseMessageRepository extends PagingAndSortingRepository<ReleaseMessage, Long> {
  List<ReleaseMessage> findFirst500ByIdGreaterThanOrderByIdAsc(Long id);
}

yangbnh (Author) commented Feb 5, 2024

  1. I don't quite understand why this SQL would exhaust OB's CPU; it queries by primary key.

  2. Also, how many id values are in the IN clause? I couldn't find a direct query like this in the code; the only related one might be findFirst500ByIdGreaterThanOrderByIdAsc below, which loads 500 records at a time, so the load is not heavy either.

public interface ReleaseMessageRepository extends PagingAndSortingRepository<ReleaseMessage, Long> {
  List<ReleaseMessage> findFirst500ByIdGreaterThanOrderByIdAsc(Long id);
}

Thanks for the reply.

Actually, this problem has occurred several times already. I work on banking solutions, and at two city commercial banks the bank DBAs caught it in monitoring and pulled together task groups to resolve it. I have looked at the method you mention, but that is not it. The query touches more than 6537 rows; in the SQL in the screenshot, the IN clause has 6535+ separating commas.
[screenshot]
This SQL was copied straight out of OB's monitoring, and even the copy is incomplete.
Also, these ids follow a clear pattern.

yangbnh (Author) commented Feb 5, 2024

  1. I don't quite understand why this SQL would exhaust OB's CPU; it queries by primary key.

  2. Also, how many id values are in the IN clause? I couldn't find a direct query like this in the code; the only related one might be findFirst500ByIdGreaterThanOrderByIdAsc below, which loads 500 records at a time, so the load is not heavy either.

public interface ReleaseMessageRepository extends PagingAndSortingRepository<ReleaseMessage, Long> {
  List<ReleaseMessage> findFirst500ByIdGreaterThanOrderByIdAsc(Long id);
}

In the com.ctrip.framework.apollo.biz.message.ReleaseMessageScanner.recordMissingReleaseMessageIds() method,
[screenshot]
ids appearing with this pattern are put into the missingReleaseMessages map.

In the com.ctrip.framework.apollo.biz.message.ReleaseMessageScanner.scanMissingMessages() method, the keyset of missingReleaseMessages is taken and used to look up rows across the table.
[screenshot]
As shown in the screenshot.

yangbnh (Author) commented Feb 5, 2024

private void scanMissingMessages() {
  Set<Long> missingReleaseMessageIds = missingReleaseMessages.keySet();
  Iterable<ReleaseMessage> releaseMessages = releaseMessageRepository
      .findAllById(missingReleaseMessageIds);
  fireMessageScanned(releaseMessages);
  releaseMessages.forEach(releaseMessage -> {
    missingReleaseMessageIds.remove(releaseMessage.getId());
  });
  growAndCleanMissingMessages();
}

private void recordMissingReleaseMessageIds(List<ReleaseMessage> messages, long startId) {
  for (ReleaseMessage message : messages) {
    long currentId = message.getId();
    if (currentId - startId > 1) {
      for (long i = startId + 1; i < currentId; i++) {
        missingReleaseMessages.putIfAbsent(i, 1);
      }
    }
    startId = currentId;
  }
}

The problem should be here:

if (currentId - startId > 1) {
  for (long i = startId + 1; i < currentId; i++) {
    missingReleaseMessages.putIfAbsent(i, 1);
  }
}

With MySQL or MariaDB, under auto_increment with a step of 1, this is fine. But on OB, a leader switch can make the id jump by a step of tens of thousands or more, and this logic falls apart.
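To make the failure mode concrete, here is a small self-contained simulation of the gap-filling loop above (the class and method names here are hypothetical, not Apollo's actual code): a MySQL-style gap of one skipped id records a single entry, while an OB leader switch that jumps the id by one cache window (1,000,000) floods the map with roughly a million entries:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-alone reproduction of the gap-recording loop in
// ReleaseMessageScanner.recordMissingReleaseMessageIds().
public class MissingIdFlood {
    // Records every id strictly between startId and currentId as "missing"
    // and returns how many entries the map now holds.
    static int recordGap(Map<Long, Integer> missing, long startId, long currentId) {
        if (currentId - startId > 1) {
            for (long i = startId + 1; i < currentId; i++) {
                missing.putIfAbsent(i, 1);
            }
        }
        return missing.size();
    }

    public static void main(String[] args) {
        // Normal MySQL-style gap: one skipped id, one map entry.
        Map<Long, Integer> m1 = new ConcurrentHashMap<>();
        System.out.println(recordGap(m1, 1000L, 1002L));       // prints 1

        // OB leader switch: next id jumps by a whole cache window.
        Map<Long, Integer> m2 = new ConcurrentHashMap<>();
        System.out.println(recordGap(m2, 1000001L, 2000001L)); // prints 999999
    }
}
```

Every one of those entries is later fed into findAllById, which is consistent with the multi-thousand-value IN clause captured by the OB monitoring above.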

So should this logic only query the ids between startId and currentId that actually exist in the ReleaseMessage table, rather than putting every value into the map via startId++?
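One possible shape for that fix, sketched here under assumptions: the repository would need a range query (for example a hypothetical Spring Data derived method such as findByIdGreaterThanAndIdLessThan), simulated below with an in-memory sorted set standing in for the table. The gap is kept as a single (startId, currentId) range, and the database is asked which ids in that range actually exist, so a million-wide hole contributes nothing to the map:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical sketch: resolve a gap against the ids that really exist,
// rather than enumerating every candidate id in the gap.
public class RangeBasedMissingScan {
    // Simulates a hypothetical repository query like
    // findByIdGreaterThanAndIdLessThan(startId, currentId);
    // tableIds stands in for the ids present in the ReleaseMessage table.
    static List<Long> existingIdsInGap(NavigableSet<Long> tableIds, long startId, long currentId) {
        // Both bounds exclusive, matching (startId, currentId).
        return new ArrayList<>(tableIds.subSet(startId, false, currentId, false));
    }

    public static void main(String[] args) {
        NavigableSet<Long> tableIds = new TreeSet<>(List.of(1000001L, 1000002L, 2000001L));
        // An OB leader switch left a 1,000,000-wide hole; only ids that
        // truly exist in the table come back, so nothing floods the map.
        System.out.println(existingIdsInGap(tableIds, 1000002L, 2000001L)); // prints []
    }
}
```

A design note: this keeps the scanner's intent (find messages written out of order) while bounding the work by the number of real rows in the gap, not by the gap's width.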

nobodyiam (Member) commented

  So should this logic only query the ids between startId and currentId that actually exist in the ReleaseMessage table

That is the intent of the code; the implementation relies on the usual database auto-increment behavior. It turns out this does not hold for OB, so we can look into whether there is a better way to implement it.


stale bot commented Mar 9, 2024

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in 7 days unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Mar 9, 2024

stale bot commented Mar 16, 2024

This issue has been automatically closed because it has not had activity in the last 7 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted". Thank you for your contributions.


SirCFC commented Nov 11, 2024

@yangbnh Did you find a good solution in the end?
