Retention is not working due to a non-closing ledger #6935

Open
alexku7 opened this issue May 10, 2020 · 10 comments
Labels
area/broker help wanted lifecycle/stale type/bug

Comments

@alexku7

alexku7 commented May 10, 2020

Describe the bug

I have been researching the retention problem and high disk usage.
Currently, retention is set to 2 hours and the TTL to 600 seconds. The relevant broker settings are:
"managedLedgerMaxLedgerRolloverTimeMinutes" : "240",
"managedLedgerCursorRolloverTimeInSeconds" : "14400",
"managedLedgerCursorMaxEntriesPerLedger" : "50000",
"managedLedgerMaxEntriesPerLedger" : "50000"

Pulsar version is 2.5.0.

If I run `bin/pulsar-admin topics stats-internal persistent://internal/eyecloud/test591-InputEvents`, I get the following output:

{
  "entriesAddedCounter" : 75562,
  "numberOfEntries" : 75562,
  "totalSize" : 8430787149,
  "currentLedgerEntries" : 75562,
  "currentLedgerSize" : 8430787149,
  "lastLedgerCreatedTimestamp" : "2020-05-09T15:19:16.289Z",
  "waitingCursorsCount" : 3,
  "pendingAddEntriesCount" : 0,
  "lastConfirmedEntry" : "1073:75561",
  "state" : "LedgerOpened",
  "ledgers" : [ {
    "ledgerId" : 1073,
    "entries" : 0,
    "size" : 0,
    "offloaded" : false
  } ],
  "cursors" : {
    "InputEvents-SourceReader" : {
      "markDeletePosition" : "1073:75561",
      "readPosition" : "1073:75562",
      "waitingReadOp" : false,
      "pendingReadOps" : 0,
      "messagesConsumedCounter" : 75562,
      "cursorLedger" : 1075,
      "cursorLedgerLastEntry" : 428,
      "individuallyDeletedMessages" : "[]",
      "lastLedgerSwitchTimestamp" : "2020-05-09T15:19:16.292Z",
      "state" : "Open",
      "numberOfEntriesSinceFirstNotAckedMessage" : 1,
      "totalNonContiguousDeletedMessagesRange" : 0,
      "properties" : { }
    }
  }
}

If I understand the output correctly, the ledger was opened almost a day ago and has more than 50,000 entries, so according to the policies it should have been closed.

This is a relatively big topic and it takes up a lot of disk space. I suspect it's not deleted because the ledger is still open. I see the same thing on other, similar topics.
Is my assumption correct?
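
As a side note, the same internal stats can be read programmatically with the Pulsar Java admin client instead of the CLI. A minimal sketch, assuming a broker admin endpoint at `http://localhost:8080` (the URL and topic are placeholders):

```java
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.PersistentTopicInternalStats;

public class CheckLedgerState {
    public static void main(String[] args) throws Exception {
        // Placeholder admin endpoint; point this at your broker.
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceUrl("http://localhost:8080")
                .build();

        PersistentTopicInternalStats stats = admin.topics()
                .getInternalStats("persistent://internal/eyecloud/test591-InputEvents");

        // A stuck ledger shows up as state "LedgerOpened" with an old
        // lastLedgerCreatedTimestamp and a large currentLedgerSize.
        System.out.println("state: " + stats.state);
        System.out.println("currentLedgerEntries: " + stats.currentLedgerEntries);
        System.out.println("lastLedgerCreatedTimestamp: " + stats.lastLedgerCreatedTimestamp);

        admin.close();
    }
}
```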

@sijie
Member

sijie commented May 11, 2020

Which version of Pulsar are you using? Before 2.5.1, if there is no incoming traffic, the ledger rollover will not be triggered.
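
To illustrate what that means, here is a minimal, hypothetical sketch of the pre-2.5.1 behavior (simplified names, not the actual `ManagedLedgerImpl` code): the ledger-full check runs only on the write path, so with zero traffic it is never evaluated and the ledger never rolls over.

```java
// Simplified illustration only; names are hypothetical stand-ins.
class LedgerRolloverSketch {
    long currentLedgerEntries = 0;
    long currentLedgerSize = 0;
    long createdAtMillis = System.currentTimeMillis();

    // Stand-ins for managedLedgerMaxEntriesPerLedger and
    // managedLedgerMaxLedgerRolloverTimeMinutes from the report above.
    static final long MAX_ENTRIES = 50_000;
    static final long MAX_AGE_MILLIS = 240 * 60_000L;

    void addEntry(byte[] payload) {
        currentLedgerEntries++;
        currentLedgerSize += payload.length;
        // The rollover check happens only here, on the write path.
        // With no incoming traffic, addEntry() is never called, so a
        // "full" or "old" ledger is never closed and retention never runs.
        if (currentLedgerIsFull()) {
            rollCurrentLedger();
        }
    }

    boolean currentLedgerIsFull() {
        long age = System.currentTimeMillis() - createdAtMillis;
        return currentLedgerEntries >= MAX_ENTRIES || age >= MAX_AGE_MILLIS;
    }

    void rollCurrentLedger() {
        // Close the current ledger and start a new one; once closed,
        // the old ledger becomes eligible for retention/TTL cleanup.
        currentLedgerEntries = 0;
        currentLedgerSize = 0;
        createdAtMillis = System.currentTimeMillis();
    }
}
```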

@alexku7
Author

alexku7 commented May 11, 2020

Hi,
We are using 2.5.0.

What do you mean by no incoming traffic?
I think the limit of 50k entries and/or the limit of 240 minutes should cause a rollover. Am I right?

Anyway, what is the solution in this case? The topic has 8 GB of data but doesn't have any incoming traffic, so the ledger stays in the open state. So retention will never occur?

@alexku7
Author

alexku7 commented May 15, 2020

Hi @sijie,
We recently upgraded to 2.5.1, but unfortunately it seems the bug is not resolved.

The ledgers are still in the open state and not deleted :(

We have a scenario where a producer sends a huge amount of messages to a topic in a short period, then "closes" it and opens a new one.

Such a topic contains a lot of messages, and because the ledger remains open forever, the topic is stored forever. After a while (a few days) we found ourselves with full disks and a failed Pulsar.

I actually managed to reproduce this bug: if I take one of the problematic topics and simply simulate a little activity by sending messages for a few minutes, the ledger is closed and recycled.

We need help :) This is a very critical bug for us, and I think for other customers too.
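
For anyone needing a stopgap, a minimal sketch of the workaround just described, using the Pulsar Java client (the service URL and topic are placeholders): a few writes exercise the write path, where the rollover check lives, so the stuck ledger gets closed and recycled.

```java
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;

public class NudgeStuckLedger {
    public static void main(String[] args) throws Exception {
        // Placeholder broker URL and topic; substitute your own.
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        Producer<byte[]> producer = client.newProducer()
                .topic("persistent://internal/eyecloud/test591-InputEvents")
                .create();

        // A handful of writes is enough to trigger the rollover check
        // on the addEntry() path for the stuck ledger.
        for (int i = 0; i < 10; i++) {
            producer.send(("nudge-" + i).getBytes());
        }

        producer.close();
        client.close();
    }
}
```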

@sijie
Member

sijie commented May 18, 2020

@alexku7 I see. It seems we need to make sure a ledger is closed when there is no more incoming traffic to that topic.

@sijie
Member

sijie commented May 18, 2020

@alexku7 we will pick this item up and fix it for the 2.6.0 release.

@alexku7
Author

alexku7 commented May 19, 2020

Thank you @sijie.
Do you know when version 2.6.0 is expected to be released?

@sijie
Member

sijie commented May 19, 2020

@alexku7 it is coming this month. @codelipenghui is the release manager and has already started wrapping up the 2.6.0 release.

@trexinc
Contributor

trexinc commented May 24, 2020

Simple instructions to reproduce the issue (it also reproduces in 2.5.2):
Create the namespace internal/test with the following retention and TTL:

bin/pulsar-admin namespaces get-retention internal/test
{
  "retentionTimeInMinutes" : 60,
  "retentionSizeInMB" : 5120
}

bin/pulsar-admin namespaces get-message-ttl internal/test
600

Then run the following in parallel for about five minutes (you can add -m 60000 to the producer):

bin/pulsar-perf produce -r 40000 internal/test/test1
bin/pulsar-client consume -s test internal/test/test1

The topic will then remain like this forever:

bin/pulsar-admin topics stats-internal internal/test/test1
{
  "entriesAddedCounter" : 64431,
  "numberOfEntries" : 64431,
  "totalSize" : 6548489946,
  "currentLedgerEntries" : 64431,
  "currentLedgerSize" : 6548489946,
  "lastLedgerCreatedTimestamp" : "2020-05-23T04:42:56.047Z",
  "waitingCursorsCount" : 0,
  "pendingAddEntriesCount" : 0,
  "lastConfirmedEntry" : "182338:64430",
  "state" : "LedgerOpened",
  "ledgers" : [ {
    "ledgerId" : 182338,
    "entries" : 0,
    "size" : 0,
    "offloaded" : false
  } ],
  "cursors" : {
    "sub" : {
      "markDeletePosition" : "182338:64430",
      "readPosition" : "182338:64431",
      "waitingReadOp" : false,
      "pendingReadOps" : 0,
      "messagesConsumedCounter" : 64431,
      "cursorLedger" : 182339,
      "cursorLedgerLastEntry" : 108,
      "individuallyDeletedMessages" : "[]",
      "lastLedgerSwitchTimestamp" : "2020-05-23T04:42:56.049Z",
      "state" : "Open",
      "numberOfEntriesSinceFirstNotAckedMessage" : 1,
      "totalNonContiguousDeletedMessagesRange" : 0,
      "properties" : { }
    }
  }
}

codelipenghui pushed a commit that referenced this issue Jun 8, 2020
This pull request resolves #7184 

### Motivation
This pull request implements a monitor thread that checks whether the current topic ledger meets the `managedLedgerMaxLedgerRolloverTimeMinutes` constraint and triggers a rollover so the configuration takes effect. Another important idea here is that when we trigger a rollover we can close the current ledger, which also releases its storage. For less commonly used topics, the current ledger's data is likely to have already expired, but the existing rollover logic is only triggered when a new entry is added. Obviously, this results in wasted disk space.

### Expected behaviors
The monitor thread is scheduled at a fixed time interval, set to `managedLedgerMaxLedgerRolloverTimeMinutes`. Each inspection makes two judgments at the same time: `currentLedgerEntries > 0` and `currentLedgerIsFull()`. When the number of current entries is 0, no rollover is triggered; we use this to reduce ledger creation.

### Modifications
- The main modification took place in `ManagedLedgerImpl`
- In addition, a check thread was added to the `BrokerService`

maybe related to #6935
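
For illustration, a minimal sketch of the scheduled check the PR describes. The interface and names below are hypothetical stand-ins, not the actual `BrokerService`/`ManagedLedgerImpl` code; only the two judgments match the description above.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class LedgerRolloverMonitorSketch {
    // Hypothetical minimal view of a managed ledger, for illustration.
    interface ManagedLedgerView {
        long currentLedgerEntries();
        boolean currentLedgerIsFull();
        void rollCurrentLedger();
    }

    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    void start(long rolloverTimeMinutes, ManagedLedgerView ledger) {
        // Scheduled at a fixed interval equal to
        // managedLedgerMaxLedgerRolloverTimeMinutes.
        scheduler.scheduleAtFixedRate(() -> {
            // Two judgments, per the PR description: skip empty ledgers
            // (to avoid needless ledger creation), and roll over only when
            // the current ledger is full by entries, size, or age.
            if (ledger.currentLedgerEntries() > 0 && ledger.currentLedgerIsFull()) {
                ledger.rollCurrentLedger();
            }
        }, rolloverTimeMinutes, rolloverTimeMinutes, TimeUnit.MINUTES);
    }
}
```

The key difference from the pre-2.5.1 behavior sketched earlier: the check now runs on a timer rather than only inside the write path, so a topic with no traffic still gets its ledger closed.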
@govardhan1194

Hey,

I am working on a project with Pulsar and I am facing the same issue when offloading data to S3. The ledger doesn't seem to close.
Is there any update on when the fixed version will be released?

Thanks in advance!

@alexku7
Author

alexku7 commented Jul 11, 2020

Hello @sijie,
Recently we upgraded to Pulsar 2.6.0, but unfortunately the issue still exists :(
It seems that the topic is still not deleted because of the open ledger.

I am just wondering whether the fix was merged into 2.6.0?

cdbartholomew pushed a commit to kafkaesque-io/pulsar that referenced this issue Jul 24, 2020

huangdx0726 pushed a commit to huangdx0726/pulsar that referenced this issue Aug 24, 2020