Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix garnet persistence #5087

Merged
merged 40 commits into from
Sep 19, 2024
Merged

Conversation

Alirexaa
Copy link
Contributor

@Alirexaa Alirexaa commented Jul 26, 2024

Fixes: #4870

Microsoft Reviewers: Open in CodeFlow

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-integrations Issues pertaining to Aspire Integrations packages label Jul 26, 2024
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Jul 26, 2024
@Alirexaa
Copy link
Contributor Author

WithDataVolume failed to start, related issue in the Garnet repo: microsoft/garnet#536.
also interval is not working, related issue in the Garnet repo: microsoft/garnet#537.
for now only WithDataBindMount works

@Alirexaa Alirexaa marked this pull request as ready for review July 30, 2024 15:49
@Alirexaa
Copy link
Contributor Author

WithPersistence now works, so WithDataVolume and WithDataBindMount
will work.

keysChangedThreshold is unused now. we could mark this method as Obsolete and introduce a new public API.

/// <returns>The <see cref="IResourceBuilder{T}"/>.</returns>
public static IResourceBuilder<GarnetResource> WithPersistence(this IResourceBuilder<GarnetResource> builder,
TimeSpan? interval = null, long keysChangedThreshold = 1)

also despite Aspire.Garnet.Hosting shipped in 8.1, PublicAPI.Shipped.txt is untouched.

I want to write functional tests for WithDataVolume and WithDataBindMount but we should decide about public API first.

cc @eerhardt @mitchdenny @radical

Copy link
Member

@eerhardt eerhardt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Marking the existing API as Obsolete and introducing a new API sounds like the right path to me.

playground/Garnet/Garnet.AppHost/Directory.Build.props Outdated Show resolved Hide resolved
@Alirexaa
Copy link
Contributor Author

@eerhardt, do you agree with the last change?
I want to fix tests but I need your confirmation about the public API signature first.

src/Aspire.Hosting.Garnet/GarnetBuilderExtensions.cs Outdated Show resolved Hide resolved
@@ -83,7 +83,7 @@ public static class GarnetBuilderExtensions
isReadOnly);
if (!isReadOnly)
{
builder.WithPersistence();
builder.WithPersistence(interval: null);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unfortunate that the overloads are conflicting. Can we solve this somehow? Potentially by making the Obsolete overload not have default parameters?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eerhardt, do you mean something like this?

[Obsolete("This method is obsolete and will be removed in a future version. Use the overload without the keysChangedThreshold parameter.")]
  public static IResourceBuilder<GarnetResource> WithPersistence(this IResourceBuilder<GarnetResource> builder,
      TimeSpan? interval, long keysChangedThreshold)
      => WithPersistence(builder, interval);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think removing the optional parameters on the Obsolete overload is a good compromise here.

@mitchdenny - thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with the strategy.

@eerhardt
Copy link
Member

eerhardt commented Aug 1, 2024

Looks like the new test is currently failing.

@Alirexaa Alirexaa changed the title WIP: Fix garnet persistence Fix garnet persistence Aug 1, 2024
@mitchdenny
Copy link
Member

So, if I am following all of this correctly the following happened:

  1. Bind mounts were not working.
  2. Garnet fixed the issue, but they also broke their args.
  3. we updated the args we use and started using their 1.0
  4. Bind mounts now work?

I think I'm OK with the breaking change here - it seems necessary.

@Alirexaa
Copy link
Contributor Author

Alirexaa commented Aug 8, 2024

So, if I am following all of this correctly the following happened:

  1. Bind mounts were not working.
  2. Garnet fixed the issue, but they also broke their args.
  3. we updated the args we use and started using their 1.0
  4. Bind mounts now work?

I think I'm OK with the breaking change here - it seems necessary.

Actually, we used the wrong args plus they also had issues in persistance.

@Alirexaa
Copy link
Contributor Author

Alirexaa commented Sep 3, 2024

Looks like the test is still failing.

see microsoft/garnet#536 (comment)

@eerhardt
Copy link
Member

eerhardt commented Sep 4, 2024

Looks like the non-volume test is broken:

[xUnit.net 00:03:32.05]     Aspire.Hosting.Garnet.Tests.GarnetFunctionalTests.WithDataShouldPersistStateBetweenUsages(useVolume: False) [FAIL]
[xUnit.net 00:03:32.05]       StackExchange.Redis.RedisConnectionException : The message timed out in the backlog attempting to send because no connection became available (5000ms) - Last Connection Exception: UnableToConnect (None, 0-read, last-recv: 0) on localhost:33993/Interactive, Flushed/ReadAsync, last: ECHO, origin: ResetNonConnected, outstanding: 12, last-read: 5s ago, last-write: 5s ago, unanswered-write: 5s ago, keep-alive: 60s, state: ConnectedEstablishing, mgr: 10 of 10 available, last-heartbeat: never, global: 17s ago, v: 2.8.0.27420, command=SET, timeout: 5000, inst: 0, qu: 0, qs: 12, aw: False, bw: CheckingForTimeout, rs: ReadAsync, ws: Idle, in: 0, in-pipe: 0, out-pipe: 0, last-in: 0, cur-in: 0, sync-ops: 0, async-ops: 11, serverEndpoint: localhost:33993, conn-sec: 40, aoc: 0, mc: 1/1/0, mgr: 10 of 10 available, clientName: a006N32(SE.Redis-v2.8.0.27420), IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=2,Free=32765,Min=2,Max=32767), POOL: (Threads=5,QueuedItems=0,CompletedItems=4179,Timers=8), v: 2.8.0.27420 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)
[xUnit.net 00:03:32.05]       ---- StackExchange.Redis.RedisConnectionException : UnableToConnect (None, 0-read, last-recv: 0) on localhost:33993/Interactive, Flushed/ReadAsync, last: ECHO, origin: ResetNonConnected, outstanding: 12, last-read: 5s ago, last-write: 5s ago, unanswered-write: 5s ago, keep-alive: 60s, state: ConnectedEstablishing, mgr: 10 of 10 available, last-heartbeat: never, global: 17s ago, v: 2.8.0.27420
[xUnit.net 00:03:32.05]       Stack Trace:
[xUnit.net 00:03:32.05]         /_/tests/Aspire.Hosting.Garnet.Tests/GarnetFunctionalTests.cs(125,0): at Aspire.Hosting.Garnet.Tests.GarnetFunctionalTests.<>c__DisplayClass3_0.<<WithDataShouldPersistStateBetweenUsages>b__0>d.MoveNext()
[xUnit.net 00:03:32.05]         --- End of stack trace from previous location ---
[xUnit.net 00:03:32.05]            at Polly.ResiliencePipeline.<>c.<<ExecuteAsync>b__3_0>d.MoveNext()
[xUnit.net 00:03:32.05]         --- End of stack trace from previous location ---
[xUnit.net 00:03:32.05]            at Polly.Outcome`1.GetResultOrRethrow()
[xUnit.net 00:03:32.05]            at Polly.ResiliencePipeline.ExecuteAsync(Func`2 callback, CancellationToken cancellationToken)
[xUnit.net 00:03:32.05]         /_/tests/Aspire.Hosting.Garnet.Tests/GarnetFunctionalTests.cs(119,0): at Aspire.Hosting.Garnet.Tests.GarnetFunctionalTests.WithDataShouldPersistStateBetweenUsages(Boolean useVolume)
[xUnit.net 00:03:32.05]         /_/tests/Aspire.Hosting.Garnet.Tests/GarnetFunctionalTests.cs(139,0): at Aspire.Hosting.Garnet.Tests.GarnetFunctionalTests.WithDataShouldPersistStateBetweenUsages(Boolean useVolume)
[xUnit.net 00:03:32.05]         --- End of stack trace from previous location ---
[xUnit.net 00:03:32.05]         ----- Inner Stack Trace -----

@Alirexaa
Copy link
Contributor Author

Alirexaa commented Sep 4, 2024

Looks like the non-volume test is broken:

[xUnit.net 00:03:32.05]     Aspire.Hosting.Garnet.Tests.GarnetFunctionalTests.WithDataShouldPersistStateBetweenUsages(useVolume: False) [FAIL]
[xUnit.net 00:03:32.05]       StackExchange.Redis.RedisConnectionException : The message timed out in the backlog attempting to send because no connection became available (5000ms) - Last Connection Exception: UnableToConnect (None, 0-read, last-recv: 0) on localhost:33993/Interactive, Flushed/ReadAsync, last: ECHO, origin: ResetNonConnected, outstanding: 12, last-read: 5s ago, last-write: 5s ago, unanswered-write: 5s ago, keep-alive: 60s, state: ConnectedEstablishing, mgr: 10 of 10 available, last-heartbeat: never, global: 17s ago, v: 2.8.0.27420, command=SET, timeout: 5000, inst: 0, qu: 0, qs: 12, aw: False, bw: CheckingForTimeout, rs: ReadAsync, ws: Idle, in: 0, in-pipe: 0, out-pipe: 0, last-in: 0, cur-in: 0, sync-ops: 0, async-ops: 11, serverEndpoint: localhost:33993, conn-sec: 40, aoc: 0, mc: 1/1/0, mgr: 10 of 10 available, clientName: a006N32(SE.Redis-v2.8.0.27420), IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=2,Free=32765,Min=2,Max=32767), POOL: (Threads=5,QueuedItems=0,CompletedItems=4179,Timers=8), v: 2.8.0.27420 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)
[xUnit.net 00:03:32.05]       ---- StackExchange.Redis.RedisConnectionException : UnableToConnect (None, 0-read, last-recv: 0) on localhost:33993/Interactive, Flushed/ReadAsync, last: ECHO, origin: ResetNonConnected, outstanding: 12, last-read: 5s ago, last-write: 5s ago, unanswered-write: 5s ago, keep-alive: 60s, state: ConnectedEstablishing, mgr: 10 of 10 available, last-heartbeat: never, global: 17s ago, v: 2.8.0.27420
[xUnit.net 00:03:32.05]       Stack Trace:
[xUnit.net 00:03:32.05]         /_/tests/Aspire.Hosting.Garnet.Tests/GarnetFunctionalTests.cs(125,0): at Aspire.Hosting.Garnet.Tests.GarnetFunctionalTests.<>c__DisplayClass3_0.<<WithDataShouldPersistStateBetweenUsages>b__0>d.MoveNext()
[xUnit.net 00:03:32.05]         --- End of stack trace from previous location ---
[xUnit.net 00:03:32.05]            at Polly.ResiliencePipeline.<>c.<<ExecuteAsync>b__3_0>d.MoveNext()
[xUnit.net 00:03:32.05]         --- End of stack trace from previous location ---
[xUnit.net 00:03:32.05]            at Polly.Outcome`1.GetResultOrRethrow()
[xUnit.net 00:03:32.05]            at Polly.ResiliencePipeline.ExecuteAsync(Func`2 callback, CancellationToken cancellationToken)
[xUnit.net 00:03:32.05]         /_/tests/Aspire.Hosting.Garnet.Tests/GarnetFunctionalTests.cs(119,0): at Aspire.Hosting.Garnet.Tests.GarnetFunctionalTests.WithDataShouldPersistStateBetweenUsages(Boolean useVolume)
[xUnit.net 00:03:32.05]         /_/tests/Aspire.Hosting.Garnet.Tests/GarnetFunctionalTests.cs(139,0): at Aspire.Hosting.Garnet.Tests.GarnetFunctionalTests.WithDataShouldPersistStateBetweenUsages(Boolean useVolume)
[xUnit.net 00:03:32.05]         --- End of stack trace from previous location ---
[xUnit.net 00:03:32.05]         ----- Inner Stack Trace -----

There is issue in the Linux host, this test works in my local (Windows host)

  Standard Output Messages:
 | [2024-09-03T20:24:37] Aspire.Hosting.DistributedApplication Information: Aspire version: 9.0.0-ci
 | [2024-09-03T20:24:37] Aspire.Hosting.DistributedApplication Information: Distributed application starting.
 | [2024-09-03T20:24:37] Aspire.Hosting.DistributedApplication Information: Application host directory is: /mnt/vss/_work/1/s/tests/Aspire.Hosting.Tests
 | [2024-09-03T20:24:37] Aspire.Hosting.Dcp.DcpHostService Information: Starting DCP with arguments: start-apiserver --monitor 22066 --detach --kubeconfig "/datadisks/disk1/work/AEF2095F/t/aspire.3grUsK/kubeconfig"
 | [2024-09-03T20:24:38] Aspire.Hosting.Dcp.start-apiserver.api-server Information: Starting API server...
 | [2024-09-03T20:24:38] Aspire.Hosting.Dcp.start-apiserver.api-server Information: API server started	{"Address": "127.0.0.1", "Port": 41847}
 | [2024-09-03T20:24:38] Aspire.Hosting.Dcp.start-apiserver.dcp-host Information: Starting DCP controller host
 | [2024-09-03T20:24:38] Aspire.Hosting.Dcp.start-apiserver.dcp-host Information: Started all services	{"count": 1}
 | [2024-09-03T20:24:38] Aspire.Hosting.Dcp.dcpctrl.IdeExecutableRunner Information: Executables cannot be started via IDE: missing required environment variable 'DEBUG_SESSION_PORT'
 | [2024-09-03T20:24:38] Aspire.Hosting.Dcp.dcpctrl Information: starting controller manager
 | [2024-09-03T20:24:38] Aspire.Hosting.DistributedApplication Information: Distributed application started. Press Ctrl+C to shut down.
 | [2024-09-03T20:24:40] Aspire.Hosting.Dcp.dcpctrl.ContainerReconciler Error: could not determine host address and port for container port	{"Container": {"name":"garnet-kvsehqje-2ba90115"}, "Reconciliation": 5, "error": "container '/garnet-kvsehqje-2ba90115' is not running: exited"}
 | [2024-09-03T20:24:40] Aspire.Hosting.Dcp.dcpctrl.ContainerReconciler Error: could not create Endpoint object	{"Container": {"name":"garnet-kvsehqje-2ba90115"}, "Reconciliation": 5, "ServiceName": "garnet-2ba90115", "Workload": "/garnet-kvsehqje-2ba90115", "error": "container '/garnet-kvsehqje-2ba90115' is not running: exited"}
 | [2024-09-03T20:24:40] Aspire.Hosting.Tests.Resources.garnet Information: 1: 2024-09-03T20:24:39.7686067Z \x1b[31m    _________
 | [2024-09-03T20:24:40] Aspire.Hosting.Tests.Resources.garnet Information: 2: 2024-09-03T20:24:39.7686762Z    /_||___||_\      \x1b[0mGarnet 1.0.19 64 bit; standalone mode\x1b[31m
 | [2024-09-03T20:24:40] Aspire.Hosting.Tests.Resources.garnet Information: 3: 2024-09-03T20:24:39.7686832Z    '. \   / .'      \x1b[0mPort: 6379\x1b[31m
 | [2024-09-03T20:24:40] Aspire.Hosting.Tests.Resources.garnet Information: 4: 2024-09-03T20:24:39.7686883Z      '.\ /.'        \x1b[35mhttps://aka.ms/GetGarnet\x1b[31m
 | [2024-09-03T20:24:40] Aspire.Hosting.Tests.Resources.garnet Information: 5: 2024-09-03T20:24:39.7686935Z        '.'
 | [2024-09-03T20:24:40] Aspire.Hosting.Tests.Resources.garnet Information: 6: 2024-09-03T20:24:39.7686983Z \x1b[0m
 | [2024-09-03T20:24:40] Aspire.Hosting.Tests.Resources.garnet Information: 7: 2024-09-03T20:24:39.8603021Z Unable to initialize server due to exception: Access to the path '/data/checkpoints/AOF' is denied.
 | [2024-09-03T20:27:31] Aspire.Hosting.Dcp.dcpctrl.ContainerReconciler Information: no more Container resources are being watched, cancelling container watch	{"Container": {"name":"garnet-kvsehqje-2ba90115"}, "Reconciliation": 8}

see microsoft/garnet#536 (comment)

@Alirexaa
Copy link
Contributor Author

Alirexaa commented Sep 5, 2024

@eerhardt the test still failing

@eerhardt
Copy link
Member

eerhardt commented Sep 5, 2024

@eerhardt the test still failing

I fixed it locally on my linux machine. It should be passing now.

@davidfowl
Copy link
Member

I don't think we need a garnet sample until we have a garnet client. Can we add garnet and valkey to the same sample?

@Alirexaa
Copy link
Contributor Author

I don't think we need a garnet sample until we have a garnet client. Can we add garnet and valkey to the same sample?

I could add garnet to the redis sample in this PR and valkey in another PR.

@Alirexaa
Copy link
Contributor Author

@davidfowl, This PR is ready to merge. If you merge it, I can add the Valkey sample to the playground app.

@mitchdenny mitchdenny merged commit dc43825 into dotnet:main Sep 19, 2024
11 checks passed
@mitchdenny
Copy link
Member

Thanks @Alirexaa

@Alirexaa Alirexaa deleted the fix-garnet-persistence branch September 19, 2024 07:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-integrations Issues pertaining to Aspire Integrations packages community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Garnet WithPersistence Failed To Start
5 participants