This repository was archived by the owner on Nov 27, 2024. It is now read-only.

Add integration tests #25

Merged
merged 7 commits into from
Nov 14, 2023

Conversation

james-s-tayler

Hi,

I've been working on a PR to add some integration tests into OnnxStackCore.sln, so that I have a repeatable way to test and run the functionality that isn't coupled to a particular UI implementation, to guard against regressions as development continues, and to have a way to begin contributing new things.

However, there are some problems I have encountered.

First, for whatever reason, the tests run fine inside the Docker container but not on my local Ubuntu installation. I figured out it was due to a wacky bug in the OrtRuntime.Extensions library: when registering the custom operations it tries to resolve the names of the DLLs it needs to call, but those files were renamed at some point, so it claims it can't find ortextensions when in fact the file is now named libortextensions.so and that's what it needs. Yet for some magical reason it just works when I run it inside the Docker container. Anyway, don't worry about that; since I can run it inside Docker I'm not fussed about solving it for now, I just thought I'd mention it.
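For anyone hitting the same thing on bare Linux, here's a minimal sketch of one possible workaround. The directory layout and the assumption that a symlink under the expected name satisfies the loader are mine, not something the library documents:

```shell
# Sketch of a possible workaround, assuming the loader probes for
# "ortextensions" while the file shipped on Linux is "libortextensions.so".
# NATIVE_DIR is a stand-in; the real path would be something like
# bin/Debug/net7.0/runtimes/linux-x64/native in the build output.
NATIVE_DIR="${NATIVE_DIR:-/tmp/onnxstack-native-demo}"
mkdir -p "$NATIVE_DIR"
touch "$NATIVE_DIR/libortextensions.so"   # stand-in for the real library file
ln -sf libortextensions.so "$NATIVE_DIR/ortextensions.so"
ls -l "$NATIVE_DIR/ortextensions.so"
```

Whether a symlink (versus renaming the file, or pointing the custom-op registration at the full libortextensions.so path) is the right fix depends on exactly how the extensions library resolves the name, so treat this as a probe rather than a cure.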

Second, and more importantly... I had both tests in this PR running perfectly and passing before merging the current master branch, which contained 15 new commits. I think those changes might have actually broken something?

The error that I now get when I run the tests is as follows:

yolo@pop-os:~/source/OnnxStack$ docker-compose up --build
Building app
DEPRECATED: The legacy builder is deprecated and will be removed in a future release.
            Install the buildx component to build images with BuildKit:
            https://docs.docker.com/go/buildx/

Sending build context to Docker daemon  79.35MB
Step 1/8 : FROM mcr.microsoft.com/dotnet/sdk:7.0 AS build
 ---> 889872ffeee7
Step 2/8 : WORKDIR /app
 ---> Using cache
 ---> d640a6580dcb
Step 3/8 : RUN apt-get update && apt-get install -y curl
 ---> Using cache
 ---> 11e68392f874
Step 4/8 : RUN curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | bash && apt-get install -y git-lfs
 ---> Using cache
 ---> 0d9beb7a3096
Step 5/8 : RUN git clone https://huggingface.co/runwayml/stable-diffusion-v1-5 -b onnx
 ---> Using cache
 ---> eea1b547a38c
Step 6/8 : COPY . .
 ---> Using cache
 ---> da9d2216040c
Step 7/8 : RUN dotnet build OnnxStackCore.sln
 ---> Using cache
 ---> 504b6c483a04
Step 8/8 : ENTRYPOINT ["dotnet", "test", "OnnxStackCore.sln"]
 ---> Using cache
 ---> 8ceca5b73ff3
Successfully built 8ceca5b73ff3
Successfully tagged onnxstack_app:latest
Starting onnxstack_app_1 ... done
Attaching to onnxstack_app_1
app_1  |   Determining projects to restore...
app_1  |   All projects are up-to-date for restore.
app_1  |   OnnxStack.Core -> /app/OnnxStack.Core/bin/Debug/net7.0/OnnxStack.Core.dll
app_1  |   OnnxStack.StableDiffusion -> /app/OnnxStack.StableDiffusion/bin/Debug/net7.0/OnnxStack.StableDiffusion.dll
app_1  |   OnnxStack.IntegrationTests -> /app/OnnxStack.IntegrationTests/bin/Debug/net7.0/OnnxStack.IntegrationTests.dll
app_1  | Test run for /app/OnnxStack.IntegrationTests/bin/Debug/net7.0/OnnxStack.IntegrationTests.dll (.NETCoreApp,Version=v7.0)
app_1  | Microsoft (R) Test Execution Command Line Tool Version 17.7.2 (x64)
app_1  | Copyright (c) Microsoft Corporation.  All rights reserved.
app_1  | 
app_1  | Starting test execution, please wait...
app_1  | A total of 1 test files matched the specified pattern.
app_1  | info: OnnxStack.IntegrationTests.StableDiffusionTests[0]
app_1  |       Attempting to load model StableDiffusion 1.5
app_1  | info: OnnxStack.IntegrationTests.StableDiffusionTests[0]
app_1  |       Attempting to load model StableDiffusion 1.5
app_1  | info: OnnxStack.StableDiffusion.Diffusers.StableDiffusion.StableDiffusionDiffuser[0]
app_1  |       [DiffuseAsync] - Begin...
app_1  | info: OnnxStack.StableDiffusion.Diffusers.StableDiffusion.StableDiffusionDiffuser[0]
app_1  |       [DiffuseAsync] - Model: StableDiffusion 1.5, Pipeline: StableDiffusion, Diffuser: TextToImage, Scheduler: EulerAncestral
app_1  | [xUnit.net 00:00:51.53]     OnnxStack.IntegrationTests.StableDiffusionTests.GivenTextToImage_WhenInference_ThenImageGenerated [FAIL]
app_1  |   Failed OnnxStack.IntegrationTests.StableDiffusionTests.GivenTextToImage_WhenInference_ThenImageGenerated [28 s]
app_1  |   Error Message:
app_1  |    System.AggregateException : One or more errors occurred. ([ErrorCode:RuntimeException] Non-zero status code returned while running ReorderOutput node. Name:'ReorderOutput_token_942' Status Message: /onnxruntime_src/onnxruntime/core/framework/execution_frame.cc:171 onnxruntime::common::Status onnxruntime::IExecutionFrame::GetOrCreateNodeOutputMLValue(int, int, const onnxruntime::TensorShape*, OrtValue*&, const onnxruntime::Node&) shape && tensor.Shape() == *shape was false. OrtValue shape verification failed. Current shape:{1,4,64,64} Requested shape:{2,4,64,64}
app_1  | )
app_1  | ---- Microsoft.ML.OnnxRuntime.OnnxRuntimeException : [ErrorCode:RuntimeException] Non-zero status code returned while running ReorderOutput node. Name:'ReorderOutput_token_942' Status Message: /onnxruntime_src/onnxruntime/core/framework/execution_frame.cc:171 onnxruntime::common::Status onnxruntime::IExecutionFrame::GetOrCreateNodeOutputMLValue(int, int, const onnxruntime::TensorShape*, OrtValue*&, const onnxruntime::Node&) shape && tensor.Shape() == *shape was false. OrtValue shape verification failed. Current shape:{1,4,64,64} Requested shape:{2,4,64,64}
app_1  | 
app_1  |   Stack Trace:
app_1  |      at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
app_1  |    at System.Threading.Tasks.Task`1.GetResultCore(Boolean waitCompletionNotification)
app_1  |    at OnnxStack.StableDiffusion.Services.StableDiffusionService.<>c.<GenerateAsImageAsync>b__10_0(Task`1 t) in /app/OnnxStack.StableDiffusion/Services/StableDiffusionService.cs:line 107
app_1  |    at System.Threading.Tasks.ContinuationResultTaskFromResultTask`2.InnerInvoke()
app_1  |    at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)
app_1  | --- End of stack trace from previous location ---
app_1  |    at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)
app_1  |    at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)
app_1  | --- End of stack trace from previous location ---
app_1  |    at OnnxStack.StableDiffusion.Services.StableDiffusionService.GenerateAsImageAsync(IModelOptions model, PromptOptions prompt, SchedulerOptions options, Action`2 progressCallback, CancellationToken cancellationToken) in /app/OnnxStack.StableDiffusion/Services/StableDiffusionService.cs:line 106
app_1  |    at OnnxStack.IntegrationTests.StableDiffusionTests.GivenTextToImage_WhenInference_ThenImageGenerated() in /app/OnnxStack.IntegrationTests/StableDiffusionTests.cs:line 83
app_1  | --- End of stack trace from previous location ---
app_1  | ----- Inner Stack Trace -----
app_1  |    at Microsoft.ML.OnnxRuntime.InferenceSession.<>c__DisplayClass75_0.<RunAsync>b__0(IReadOnlyCollection`1 outputs, IntPtr status)
app_1  | --- End of stack trace from previous location ---
app_1  |    at Microsoft.ML.OnnxRuntime.InferenceSession.RunAsync(RunOptions options, IReadOnlyCollection`1 inputNames, IReadOnlyCollection`1 inputValues, IReadOnlyCollection`1 outputNames, IReadOnlyCollection`1 outputValues)
app_1  |    at OnnxStack.StableDiffusion.Diffusers.StableDiffusion.StableDiffusionDiffuser.SchedulerStep(IModelOptions modelOptions, PromptOptions promptOptions, SchedulerOptions schedulerOptions, DenseTensor`1 promptEmbeddings, Boolean performGuidance, Action`2 progressCallback, CancellationToken cancellationToken) in /app/OnnxStack.StableDiffusion/Diffusers/StableDiffusion/StableDiffusionDiffuser.cs:line 91
app_1  |    at OnnxStack.StableDiffusion.Diffusers.DiffuserBase.DiffuseAsync(IModelOptions modelOptions, PromptOptions promptOptions, SchedulerOptions schedulerOptions, Action`2 progressCallback, CancellationToken cancellationToken) in /app/OnnxStack.StableDiffusion/Diffusers/DiffuserBase.cs:line 116
app_1  |    at OnnxStack.StableDiffusion.Services.StableDiffusionService.DiffuseAsync(IModelOptions modelOptions, PromptOptions promptOptions, SchedulerOptions schedulerOptions, Action`2 progress, CancellationToken cancellationToken) in /app/OnnxStack.StableDiffusion/Services/StableDiffusionService.cs:line 220
app_1  |    at OnnxStack.StableDiffusion.Services.StableDiffusionService.GenerateAsync(IModelOptions model, PromptOptions prompt, SchedulerOptions options, Action`2 progressCallback, CancellationToken cancellationToken) in /app/OnnxStack.StableDiffusion/Services/StableDiffusionService.cs:line 92
app_1  |   Standard Output Messages:
app_1  |  2023-11-13T12:11:15.4959035+00:00 - Information - 0 - OnnxStack.IntegrationTests.StableDiffusionTests - Attempting to load model StableDiffusion 1.5
app_1  |  2023-11-13T12:11:28.0413562+00:00 - Information - 0 - OnnxStack.StableDiffusion.Diffusers.StableDiffusion.StableDiffusionDiffuser - [DiffuseAsync] - Begin...
app_1  |  2023-11-13T12:11:28.0425114+00:00 - Information - 0 - OnnxStack.StableDiffusion.Diffusers.StableDiffusion.StableDiffusionDiffuser - [DiffuseAsync] - Model: StableDiffusion 1.5, Pipeline: StableDiffusion, Diffuser: TextToImage, Scheduler: EulerAncestral
app_1  | 
app_1  | 
app_1  | 
app_1  | Failed!  - Failed:     1, Passed:     1, Skipped:     0, Total:     2, Duration: 28 s - OnnxStack.IntegrationTests.dll (net7.0)
onnxstack_app_1 exited with code 1

Did y'all break something, or do I need to update how I'm calling OnnxStack in order to fix my test? Also, that likely means anyone who was calling it this way and updated to a newer version would be hitting the same exception, right?

I can see it's complaining about the tensor shapes being different...

@saddam213
Member

saddam213 commented Nov 13, 2023

Oops, my bad, seems I broke it for SD models, have committed a fix

I moved the codebase over to the new OnnxRuntime OrtValue API. It was a large change, and I missed this in my testing because I used LCM, which does not do guidance, so the error didn't show until I used the model you tried :/

Regarding OrtExtensions, I have noticed a few issues online about this; it does not seem to be a Windows problem, but a Mac and Linux one. One thing I do know is the app MUST be x64 or it won't work at all, as Microsoft.ML is x64 only

This repo is new and changes rapidly, so sorry if I break your tests. I'm still trying to figure out the best way to structure this application as I find new cool things to add, so bear with me :p

@saddam213
Member

If you publish the Linux build as self-contained it "should" run without issue

@saddam213
Member

Tests look great, I have not even had a chance to add one yet, so this is awesome

one small thing I noticed

services.AddOnnxStack();
services.AddOnnxStackStableDiffusion();

AddOnnxStackStableDiffusion calls AddOnnxStack internally, so there's no need to call both

@james-s-tayler
Author

Awesome! Thanks, yeah now that I've pulled the latest master the tests are indeed passing :)

I'll try to add an LCM test in as well, so all bases are covered.

@saddam213
Member

> Awesome! Thanks, yeah now that I've pulled the latest master the tests are indeed passing :)
>
> I'll try to add an LCM test in as well, so all bases are covered.

It might be easier to do a test with GuidanceScale set to 1f or below, as this would simulate a model with that issue

@james-s-tayler
Author

Given these tests run on the CPU execution provider, they are super slow, so my current goal is just to get the most minimal set of happy-path test cases merged (do the models load? can we generate an image consistently?). Then I'll look at reworking the Docker containers so they can leverage the NVIDIA one that allows GPU pass-through, get the test suite running much faster, and then work through adding more comprehensive coverage.
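For the GPU pass-through rework, one hedged sketch of what the Compose side might look like (the service name and build context are assumptions based on the docker-compose output above; the host would also need the NVIDIA Container Toolkit, and the image would need GPU-enabled ONNX Runtime packages):

```yaml
# docker-compose.yml sketch -- not the project's actual file.
services:
  app:
    build: .
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all          # or a specific number, e.g. 1
              capabilities: [gpu]
```

Recent Docker Engine with Compose v2 honors the `deploy.resources.reservations.devices` block; for a one-off container, `docker run --gpus all` is the equivalent.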

@saddam213
Member

That would be awesome, appreciate any tests added.

I could set up a local server here in CHCH with some GPUs if that's easier? You're in NZ, right?

@james-s-tayler
Author

Oh snap, you're in NZ too! Nice! Didn't see that. Yeah, I'm up in Auckland.

I've got a 4090, so I can run them plenty fast locally once the devops side of things supports it; I just need to work through that piece by piece. Ideally, the trajectory is getting the test suite running with hardware acceleration inside a CI/CD pipeline to ensure the integrity of the project as new functionality is developed. I'm also keen to make it as accessible/friendly as possible in terms of local developer experience, to maximize ease of contribution.

@saddam213
Member

4090 would be nice. I have a 3090 in my dev box, but only a P100, a T4, and two M40s in my servers; even combined they are not close to your compute power

@saddam213
Member

Would you like me to merge this one in now, or would you like to keep adding to it?

@james-s-tayler
Author

I'm still adding to this one. I'm just about to push up the last commit, since it looks like I've got the LCM tests working now too :) Once that is pushed I'll let you know, and it'll be ready to merge.

@saddam213
Member

sweet as

@james-s-tayler
Author

Done! Should be ready to merge now.

@saddam213 saddam213 merged commit a15ee18 into TensorStack-AI:master Nov 14, 2023
@saddam213
Member

thanks man!!
