[Bug]: The cluster cannot release the loaded collection or query #15623
Comments
querycoord---------->:
querynode---------->:
proxy---------->:
After a release or query operation that triggers cluster exceptions, the proxy becomes abnormal.
During this period, the querynode restarts. In Attu, the number of nodes shown is far greater than the actual number of nodes: nodes that no longer exist are still in the registry. The querycoord log is as follows:
[2022/02/18 02:44:12.165 +00:00] [WARN] [metrics_info.go:72] ["invalid metrics of query node was found"] [error="getMetrics: queryNode 250 is offline"]
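To see which node IDs querycoord reports as offline, the warning above can be scanned with a small script. This is only a sketch: it assumes the error text always has the form `queryNode <id> is offline`, as in the single log line quoted here, and the function name `offline_node_ids` is made up for illustration.

```python
import re

# Pattern taken from the one querycoord warning quoted in this thread;
# other Milvus versions may word the error differently (assumption).
OFFLINE_RE = re.compile(r"queryNode (\d+) is offline")


def offline_node_ids(log_lines):
    """Return the sorted, de-duplicated query node IDs reported as offline."""
    ids = set()
    for line in log_lines:
        for match in OFFLINE_RE.finditer(line):
            ids.add(int(match.group(1)))
    return sorted(ids)


sample = [
    '[2022/02/18 02:44:12.165 +00:00] [WARN] [metrics_info.go:72] '
    '["invalid metrics of query node was found"] '
    '[error="getMetrics: queryNode 250 is offline"]',
]
print(offline_node_ids(sample))
```

Running it over the full querycoord log gives a list of IDs that can be compared with the nodes that are actually deployed.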
Does the raw data size or index size exceed the memory limit? @qi49125
Fewer than 50 million records in total are loaded into memory. I can load the same data into memory on a single 64 GB machine.
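As a rough way to check the memory question, the raw size of float32 vectors is rows × dimension × 4 bytes. The sketch below applies that formula; the vector dimension is not stated anywhere in this thread, so `dim=128` is purely an assumed value, and the estimate ignores index structures and scalar fields.

```python
def raw_vector_gib(num_rows, dim, bytes_per_value=4):
    """Raw size of the vectors in GiB, excluding index and scalar fields."""
    return num_rows * dim * bytes_per_value / 2**30


# dim=128 is an assumption for illustration; substitute the real dimension.
for rows in (50_000_000, 200_000_000):
    print(f"{rows:>11,} rows x 128 dims ~ {raw_vector_gib(rows, 128):.1f} GiB")
```

Comparing the result with the total memory available to the querynodes indicates whether a load can fit at all before index overhead is considered.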
The metadata cache of the cluster is faulty, so no data is actually loaded to the querynode in a timely manner. Consequently, the original index cannot be found during the release, and running a query or release directly affects the health of the cluster, especially the proxy, which is the service entry point. The proxy does not respond when a client connects to it.
The memory usage of Pulsar is always high. I suspect that Pulsar is still processing the previous operations.
You can restart Pulsar, but do not delete the original files.
According to Attu, there are 57 querynodes, but only 17 actually exist. Is this a problem? Are registered nodes not cleared after a node becomes unhealthy?
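One way to pin down the mismatch is to diff the registered node IDs against the IDs of the nodes that are really running. The sketch below assumes both lists have been collected by hand (for example, from Attu and from the deployment); the IDs shown are hypothetical and `stale_nodes` is an illustrative helper, not a Milvus API.

```python
def stale_nodes(registered_ids, live_ids):
    """Return IDs that are still registered but have no live node behind them."""
    return sorted(set(registered_ids) - set(live_ids))


# Hypothetical IDs for illustration only; the real IDs are not in this thread.
registered = [201, 216, 250, 251]
live = [216, 251]
print(stale_nodes(registered, live))
```

Any ID in the result is a registration that was not cleaned up when its node went away.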
I restarted it, and soon after the restart the memory usage percentage returned to its previous level.
The build-index entries in the proxy log look problematic; please check them. @xiaocai2333
From my analysis, all of these operations depend on Pulsar. If operations are sent continuously, they cannot be cancelled, so Pulsar may process them slowly. As a result, the cluster responds slowly and eventually hangs.
Did you execute CreateIndex after the query operation? Can you paste the IndexCoord and IndexNode logs here? @qi49125
At noon, I found many index-related log entries, so I suspected that the index data was lost. One hour ago, I deleted and re-created the index on the Attu page, and the log is now normal. The log is as follows:
Can you give me the indexnode and indexcoord logs from before that?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
@qi49125 Did you solve this issue?
Is there an existing issue for this?
Environment
Current Behavior
200 million records in two collections were imported into the 2.0 cluster. After the first collection was loaded, the cluster was normal. While the second collection was loading, the querynode restarted because of insufficient memory, and the cluster became abnormal. After the other services were restarted, the collection names and data volumes are displayed correctly in Insight. When a new collection or partition is created, data can be imported into it without problems, and the previously loaded collection still shows as loaded. However, if you run a query or click to release the memory, the cluster becomes abnormal. What should we do in this case? We are already in production.
The service logs are as follows:
pulsar---------->:
01:48:35.232 [bookkeeper-ml-workers-OrderedExecutor-1-0] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [public/default/persistent/by-dev-rootcoord-dml_1] Opened new cursor: ManagedCursorImpl{ledger=public/default/persistent/by-dev-rootcoord-dml_1, name=by-dev-dataNode-216-431265917210984449, ackPos=80301:34944, readPos=80301:34945}
01:48:35.232 [bookkeeper-ml-workers-OrderedExecutor-0-0] INFO org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - [public/default/persistent/by-dev-rootcoord-dml_0-by-dev-dataNode-216-431265917210984449] Rewind from 7:0 to 7:0
01:48:35.232 [bookkeeper-ml-workers-OrderedExecutor-1-0] INFO org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - [public/default/persistent/by-dev-rootcoord-dml_1-by-dev-dataNode-216-431265917210984449] Rewind from 8:0 to 8:0
01:48:35.232 [bookkeeper-ml-workers-OrderedExecutor-0-0] INFO org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://public/default/by-dev-rootcoord-dml_0] There are no replicated subscriptions on the topic
01:48:35.232 [bookkeeper-ml-workers-OrderedExecutor-0-0] INFO org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://public/default/by-dev-rootcoord-dml_0][by-dev-dataNode-216-431265917210984449] Created new subscription for 77
01:48:35.232 [bookkeeper-ml-workers-OrderedExecutor-0-0] INFO org.apache.pulsar.broker.service.ServerCnx - [/12.11.0.97:15768] Created subscription on topic persistent://public/default/by-dev-rootcoord-dml_0 / by-dev-dataNode-216-431265917210984449
01:48:35.233 [bookkeeper-ml-workers-OrderedExecutor-1-0] INFO org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://public/default/by-dev-rootcoord-dml_1] There are no replicated subscriptions on the topic
01:48:35.233 [bookkeeper-ml-workers-OrderedExecutor-1-0] INFO org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://public/default/by-dev-rootcoord-dml_1][by-dev-dataNode-216-431265917210984449] Created new subscription for 76
01:48:35.233 [bookkeeper-ml-workers-OrderedExecutor-1-0] INFO org.apache.pulsar.broker.service.ServerCnx - [/12.11.0.97:15768] Created subscription on topic persistent://public/default/by-dev-rootcoord-dml_1 / by-dev-dataNode-216-431265917210984449
01:48:35.235 [pulsar-io-36-4] INFO org.apache.pulsar.broker.service.Consumer - Disconnecting consumer: Consumer{subscription=PersistentSubscription{topic=persistent://public/default/by-dev-rootcoord-dml_1, name=by-dev-dataNode-216-431265917210984449}, consumerId=76, consumerName=kxhxo, address=/12.11.0.97:15768}
01:48:35.235 [pulsar-io-36-4] INFO org.apache.pulsar.broker.service.AbstractDispatcherSingleActiveConsumer - Removing consumer Consumer{subscription=PersistentSubscription{topic=persistent://public/default/by-dev-rootcoord-dml_1, name=by-dev-dataNode-216-431265917210984449}, consumerId=76, consumerName=kxhxo, address=/12.11.0.97:15768}
01:48:35.235 [pulsar-io-36-4] INFO org.apache.pulsar.broker.service.persistent.PersistentSubscription - [persistent://public/default/by-dev-rootcoord-dml_1][by-dev-dataNode-216-431265917210984449] Successfully disconnected consumers from subscription, proceeding with cursor reset
01:48:35.236 [pulsar-io-36-4] INFO org.apache.pulsar.broker.service.Consumer - Disconnecting consumer: Consumer{subscription=PersistentSubscription{topic=persistent://public/default/by-dev-rootcoord-dml_0, name=by-dev-dataNode-216-431265917210984449}, consumerId=77, consumerName=fzatn, address=/12.11.0.97:15768}
01:48:35.236 [pulsar-io-36-4] INFO org.apache.pulsar.broker.service.AbstractDispatcherSingleActiveConsumer - Removing consumer Consumer{subscription=PersistentSubscription{topic=persistent://public/default/by-dev-rootcoord-dml_0, name=by-dev-dataNode-216-431265917210984449}, consumerId=77, consumerName=fzatn, address=/12.11.0.97:15768}
01:48:35.236 [pulsar-io-36-4] INFO org.apache.pulsar.broker.service.persistent.PersistentSubscription - [persistent://public/default/by-dev-rootcoord-dml_0][by-dev-dataNode-216-431265917210984449] Successfully disconnected consumers from subscription, proceeding with cursor reset
01:48:35.236 [bookkeeper-ml-workers-OrderedExecutor-1-0] INFO org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - [public/default/persistent/by-dev-rootcoord-dml_1] Initiate reset position to 80301:34944 on cursor by-dev-dataNode-216-431265917210984449
01:48:35.236 [bookkeeper-ml-workers-OrderedExecutor-0-0] INFO org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - [public/default/persistent/by-dev-rootcoord-dml_0] Initiate reset position to 80331:34943 on cursor by-dev-dataNode-216-431265917210984449
01:48:35.240 [broker-topic-workers-OrderedScheduler-6-0] INFO org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - [public/default/persistent/by-dev-rootcoord-dml_1-by-dev-dataNode-216-431265917210984449] Rewind from 8:100 to 8:0
01:48:35.241 [broker-topic-workers-OrderedScheduler-5-0] INFO org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - [public/default/persistent/by-dev-rootcoord-dml_0-by-dev-dataNode-216-431265917210984449] Rewind from 7:100 to 7:0
01:48:35.242 [BookKeeperClientWorker-OrderedExecutor-12-0] INFO org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - [public/default/persistent/by-dev-rootcoord-dml_0] reset position to 80331:34943 skipping from current read position 7:0 on cursor by-dev-dataNode-216-431265917210984449
01:48:35.242 [BookKeeperClientWorker-OrderedExecutor-13-0] INFO org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - [public/default/persistent/by-dev-rootcoord-dml_1] reset position to 80301:34944 skipping from current read position 8:0 on cursor by-dev-dataNode-216-431265917210984449
01:48:35.242 [BookKeeperClientWorker-OrderedExecutor-12-0] INFO org.apache.pulsar.broker.service.ServerCnx - [/12.11.0.97:15768] [persistent://public/default/by-dev-rootcoord-dml_0][by-dev-dataNode-216-431265917210984449] Reset subscription to message id 80331:34943
01:48:35.242 [BookKeeperClientWorker-OrderedExecutor-13-0] INFO org.apache.pulsar.broker.service.ServerCnx - [/12.11.0.97:15768] [persistent://public/default/by-dev-rootcoord-dml_1][by-dev-dataNode-216-431265917210984449] Reset subscription to message id 80301:34944
01:48:35.339 [pulsar-io-36-4] INFO org.apache.pulsar.broker.service.ServerCnx - [/12.11.0.97:15768] Subscribing on topic persistent://public/default/by-dev-rootcoord-dml_1 / by-dev-dataNode-216-431265917210984449
01:48:35.339 [pulsar-io-36-4] INFO org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - [public/default/persistent/by-dev-rootcoord-dml_1-by-dev-dataNode-216-431265917210984449] Rewind from 80301:34944 to 80301:34944
01:48:35.340 [pulsar-io-36-4] INFO org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://public/default/by-dev-rootcoord-dml_1] There are no replicated subscriptions on the topic
01:48:35.340 [pulsar-io-36-4] INFO org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://public/default/by-dev-rootcoord-dml_1][by-dev-dataNode-216-431265917210984449] Created new subscription for 76
01:48:35.340 [pulsar-io-36-4] INFO org.apache.pulsar.broker.service.ServerCnx - [/12.11.0.97:15768] Created subscription on topic persistent://public/default/by-dev-rootcoord-dml_1 / by-dev-dataNode-216-431265917210984449
01:48:35.344 [pulsar-io-36-4] INFO org.apache.pulsar.broker.service.ServerCnx - [/12.11.0.97:15768][persistent://public/default/by-dev-rootcoord-delta_1] Creating producer. producerId=151
01:48:35.345 [ForkJoinPool.commonPool-worker-2] INFO org.apache.pulsar.broker.service.ServerCnx - [/12.11.0.97:15768] persistent://public/default/by-dev-rootcoord-delta_1 configured with schema false
01:48:35.345 [ForkJoinPool.commonPool-worker-2] INFO org.apache.pulsar.broker.service.ServerCnx - [/12.11.0.97:15768] Created new producer: Producer{topic=PersistentTopic{topic=persistent://public/default/by-dev-rootcoord-delta_1}, client=/12.11.0.97:15768, producerName=standalone-10-1, producerId=151}
01:48:35.347 [pulsar-io-36-4] INFO org.apache.pulsar.broker.service.ServerCnx - [/12.11.0.97:15768][persistent://public/default/by-dev-datacoord-timetick-channel] Creating producer. producerId=152
01:48:35.347 [ForkJoinPool.commonPool-worker-2] INFO org.apache.pulsar.broker.service.ServerCnx - [/12.11.0.97:15768] persistent://public/default/by-dev-datacoord-timetick-channel configured with schema false
01:48:35.348 [ForkJoinPool.commonPool-worker-2] INFO org.apache.pulsar.broker.service.ServerCnx - [/12.11.0.97:15768] Created new producer: Producer{topic=PersistentTopic{topic=persistent://public/default/by-dev-datacoord-timetick-channel}, client=/12.11.0.97:15768, producerName=standalone-10-2, producerId=152}
01:48:35.353 [pulsar-io-36-4] INFO org.apache.pulsar.broker.service.ServerCnx - [/12.11.0.97:15768] Subscribing on topic persistent://public/default/by-dev-rootcoord-dml_0 / by-dev-dataNode-216-431265917210984449
01:48:35.353 [pulsar-io-36-4] INFO org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - [public/default/persistent/by-dev-rootcoord-dml_0-by-dev-dataNode-216-431265917210984449] Rewind from 80331:34943 to 80331:34943
01:48:35.354 [pulsar-io-36-4] INFO org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://public/default/by-dev-rootcoord-dml_0] There are no replicated subscriptions on the topic
01:48:35.354 [pulsar-io-36-4] INFO org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://public/default/by-dev-rootcoord-dml_0][by-dev-dataNode-216-431265917210984449] Created new subscription for 77
01:48:35.354 [pulsar-io-36-4] INFO org.apache.pulsar.broker.service.ServerCnx - [/12.11.0.97:15768] Created subscription on topic persistent://public/default/by-dev-rootcoord-dml_0 / by-dev-dataNode-216-431265917210984449
01:48:35.357 [pulsar-io-36-4] INFO org.apache.pulsar.broker.service.ServerCnx - [/12.11.0.97:15768][persistent://public/default/by-dev-rootcoord-delta_0] Creating producer. producerId=153
01:48:35.358 [ForkJoinPool.commonPool-worker-2] INFO org.apache.pulsar.broker.service.ServerCnx - [/12.11.0.97:15768] persistent://public/default/by-dev-rootcoord-delta_0 configured with schema false
01:48:35.358 [ForkJoinPool.commonPool-worker-2] INFO org.apache.pulsar.broker.service.ServerCnx - [/12.11.0.97:15768] Created new producer: Producer{topic=PersistentTopic{topic=persistent://public/default/by-dev-rootcoord-delta_0}, client=/12.11.0.97:15768, producerName=standalone-10-3, producerId=153}
01:48:35.360 [pulsar-io-36-4] INFO org.apache.pulsar.broker.service.ServerCnx - [/12.11.0.97:15768][persistent://public/default/by-dev-datacoord-timetick-channel] Creating producer. producerId=154
01:48:35.360 [ForkJoinPool.commonPool-worker-2] INFO org.apache.pulsar.broker.service.ServerCnx - [/12.11.0.97:15768] persistent://public/default/by-dev-datacoord-timetick-channel configured with schema false
01:48:35.360 [ForkJoinPool.commonPool-worker-2] INFO org.apache.pulsar.broker.service.ServerCnx - [/12.11.0.97:15768] Created new producer: Producer{topic=PersistentTopic{topic=persistent://public/default/by-dev-datacoord-timetick-channel}, client=/12.11.0.97:15768, producerName=standalone-10-4, producerId=154}
01:49:17.500 [pulsar-io-36-12] INFO org.apache.pulsar.broker.service.ServerCnx - [/12.11.0.81:13987][persistent://public/default/by-dev-rootcoord-dml_0] Creating producer. producerId=1
01:49:17.501 [ForkJoinPool.commonPool-worker-2] INFO org.apache.pulsar.broker.service.ServerCnx - [/12.11.0.81:13987] persistent://public/default/by-dev-rootcoord-dml_0 configured with schema false
01:49:17.501 [ForkJoinPool.commonPool-worker-2] INFO org.apache.pulsar.broker.service.ServerCnx - [/12.11.0.81:13987] Created new producer: Producer{topic=PersistentTopic{topic=persistent://public/default/by-dev-rootcoord-dml_0}, client=/12.11.0.81:13987, producerName=standalone-10-5, producerId=1}
01:49:17.503 [pulsar-io-36-12] INFO org.apache.pulsar.broker.service.ServerCnx - [/12.11.0.81:13987][persistent://public/default/by-dev-rootcoord-dml_1] Creating producer. producerId=2
01:49:17.504 [pulsar-io-36-12] INFO org.apache.pulsar.broker.service.ServerCnx - [/12.11.0.81:13987][persistent://public/default/by-dev-rootcoord-dml_0] Creating producer. producerId=3
01:49:17.504 [ForkJoinPool.commonPool-worker-5] INFO org.apache.pulsar.broker.service.ServerCnx - [/12.11.0.81:13987] persistent://public/default/by-dev-rootcoord-dml_1 configured with schema false
01:49:17.504 [ForkJoinPool.commonPool-worker-15] INFO org.apache.pulsar.broker.service.ServerCnx - [/12.11.0.81:13987] persistent://public/default/by-dev-rootcoord-dml_0 configured with schema false
01:49:17.504 [ForkJoinPool.commonPool-worker-5] INFO org.apache.pulsar.broker.service.ServerCnx - [/12.11.0.81:13987] Created new producer: Producer{topic=PersistentTopic{topic=persistent://public/default/by-dev-rootcoord-dml_1}, client=/12.11.0.81:13987, producerName=standalone-10-6, producerId=2}
01:49:17.504 [ForkJoinPool.commonPool-worker-15] INFO org.apache.pulsar.broker.service.ServerCnx - [/12.11.0.81:13987] Created new producer: Producer{topic=PersistentTopic{topic=persistent://public/default/by-dev-rootcoord-dml_0}, client=/12.11.0.81:13987, producerName=standalone-10-7, producerId=3}
01:49:17.507 [pulsar-io-36-12] INFO org.apache.pulsar.broker.service.ServerCnx - [/12.11.0.81:13987][persistent://public/default/by-dev-rootcoord-dml_1] Creating producer. producerId=4
01:49:17.507 [ForkJoinPool.commonPool-worker-15] INFO org.apache.pulsar.broker.service.ServerCnx - [/12.11.0.81:13987] persistent://public/default/by-dev-rootcoord-dml_1 configured with schema false
01:49:17.507 [ForkJoinPool.commonPool-worker-15] INFO org.apache.pulsar.broker.service.ServerCnx - [/12.11.0.81:13987] Created new producer: Producer{topic=PersistentTopic{topic=persistent://public/default/by-dev-rootcoord-dml_1}, client=/12.11.0.81:13987, producerName=standalone-10-8, producerId=4}
01:51:10.711 [bookkeeper-ml-workers-OrderedExecutor-0-0] INFO org.apache.bookkeeper.mledger.impl.OpAddEntry - [public/default/persistent/by-dev-datacoord-timetick-channel] Closing ledger 80491 for being full
01:51:10.713 [main-EventThread] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [public/default/persistent/by-dev-datacoord-timetick-channel] Creating a new ledger
01:51:10.713 [main-EventThread] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [public/default/persistent/by-dev-datacoord-timetick-channel] Creating ledger, metadata: {component=[109, 97, 110, 97, 103, 101, 100, 45, 108, 101, 100, 103, 101, 114], pulsar/managed-ledger=[112, 117, 98, 108, 105, 99, 47, 100, 101, 102, 97, 117, 108, 116, 47, 112, 101, 114, 115, 105, 115, 116, 101, 110, 116, 47, 98, 121, 45, 100, 101, 118, 45, 100, 97, 116, 97, 99, 111, 111, 114, 100, 45, 116, 105, 109, 101, 116, 105, 99, 107, 45, 99, 104, 97, 110, 110, 101, 108], application=[112, 117, 108, 115, 97, 114]} - metadata ops timeout : 60 seconds
01:51:10.715 [main-EventThread] INFO org.apache.bookkeeper.client.LedgerCreateOp - Ensemble: [127.0.0.1:3181] for ledger: 80494
01:51:10.715 [main-EventThread] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [public/default/persistent/by-dev-datacoord-timetick-channel] Created new ledger 80494
01:51:10.718 [pulsar-ordered-OrderedExecutor-2-0-EventThread] INFO org.apache.pulsar.zookeeper.ZooKeeperCache - [State:CONNECTED Timeout:30000 sessionid:0x10036446df20002 local:/127.0.0.1:58190 remoteserver:localhost/127.0.0.1:2181 lastZxid:540278 xid:44484 sent:44484 recv:44992 queuedpkts:0 pendingresp:0 queuedevents:1] Received ZooKeeper watch event: WatchedEvent state:SyncConnected type:NodeDataChanged path:/managed-ledgers/public/default/persistent/by-dev-datacoord-timetick-channel
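A long broker log like the one above is easier to read once it is summarized. The sketch below counts lines per log level and logger class, assuming every line follows the `time [thread] LEVEL logger - message` layout seen in this excerpt; the regular expression and the `summarize` helper are illustrative and not part of Pulsar.

```python
import re
from collections import Counter

# Layout inferred from the excerpt above: time [thread] LEVEL logger - message
LINE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d{3}) \[(?P<thread>[^\]]+)\] "
    r"(?P<level>[A-Z]+)\s+(?P<logger>\S+) - (?P<msg>.*)$"
)


def summarize(lines):
    """Count log lines per (level, logger class name); skip unparsable lines."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            class_name = m["logger"].rsplit(".", 1)[-1]
            counts[(m["level"], class_name)] += 1
    return counts


# Three lines copied from the log above, plus one line that does not parse.
sample = [
    "01:48:35.232 [bookkeeper-ml-workers-OrderedExecutor-0-0] INFO "
    "org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - "
    "[public/default/persistent/by-dev-rootcoord-dml_0-by-dev-dataNode-216-431265917210984449] "
    "Rewind from 7:0 to 7:0",
    "01:51:10.715 [main-EventThread] INFO "
    "org.apache.bookkeeper.client.LedgerCreateOp - "
    "Ensemble: [127.0.0.1:3181] for ledger: 80494",
    "01:51:10.715 [main-EventThread] INFO "
    "org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - "
    "[public/default/persistent/by-dev-datacoord-timetick-channel] Created new ledger 80494",
    "not a log line",
]
for key, count in summarize(sample).most_common():
    print(key, count)
```

A sudden rise in WARN or ERROR counts, or in repeated subscription and cursor-reset lines from the same client, would be a reasonable place to start looking.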
Expected Behavior
No response
Steps To Reproduce
No response
Anything else?
No response