Commit

Some cursory perf analysis.
liuliu committed Sep 14, 2022
1 parent b39f29d commit 4f476f4
Showing 3 changed files with 5 additions and 3 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -22,4 +22,4 @@ Converting the model to FP16 would save memory footprint instantly, but this wil

## Is It Comparable?

-Right now, I didn't run any specific optimizations. Further, the model loading as of today for s4nnc requires executing the model once, and we have some optimization runs (find the most efficient kernels etc.) that are not saved. That has been said, we can compare the execution time of txt2img from Swift v.s. the one from CompVis (there are more optimized forks available, but going through them to find the best would take time) of the diffusion process + decoding process. The Swift txt2img on GPU took about 20s while the CompVis took about 11s (both with one 2080 Ti). I haven't done full analysis on where the slowness is from, but likely on the GroupNorm operator.
+Right now, I didn't run any specific optimizations. Further, the model loading as of today for s4nnc requires executing the model once, and we have some optimization runs (find the most efficient kernels etc.) that are not saved. That has been said, we can compare the execution time of txt2img from Swift v.s. the one from CompVis (there are more optimized forks available, but going through them to find the best would take time) of the diffusion process + decoding process. The Swift txt2img on GPU took about 17s while the CompVis took about 11s (both with one 2080 Ti). Cursory look at `nvprof` output shows that transpose and not using cublasLt the leading cause for the extra 6s spent.
4 changes: 2 additions & 2 deletions WORKSPACE
@@ -3,9 +3,9 @@ load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

git_repository(
name = "s4nnc",
-commit = "82c18635de0b46ecca7439099f19ba657cb0db77",
+commit = "95b7b5f94f1b4385c67c065457710a674de77f06",
remote = "https://github.com/liuliu/s4nnc.git",
-shallow_since = "1663118440 -0400",
+shallow_since = "1663197056 -0400",
)

load("@s4nnc//:deps.bzl", "s4nnc_deps")
2 changes: 2 additions & 0 deletions examples/txt2img/main.swift
@@ -615,6 +615,8 @@ func xPrevAndPredX0(
return (xPrev, predX0)
}

+graph.workspaceSize = 1_024 * 1_024 * 1_024
+
graph.withNoGrad {
let tokensTensorGPU = tokensTensor.toGPU(0)
let positionTensorGPU = positionTensor.toGPU(0)
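
The README change above quotes wall-clock times for the diffusion and decoding steps (about 17s for Swift versus about 11s for CompVis). The commit does not show how those numbers were taken. As a minimal sketch of one way to collect such a timing in Swift, the helper below wraps a closure with a monotonic clock. The name `timed` and its labels are hypothetical and are not part of s4nnc or this repository. GPU work is often launched asynchronously, so the closure being timed would presumably need to end with a step that waits for the result (for example, copying the final tensor back to the host); that is an assumption about usage, not something this commit confirms.

```swift
import Dispatch

/// Hypothetical helper (not part of s4nnc): runs `body` once and returns
/// its result together with the elapsed wall-clock time in seconds.
func timed<T>(_ label: String, _ body: () throws -> T) rethrows -> (result: T, seconds: Double) {
  // uptimeNanoseconds is monotonic, so it is not affected by clock adjustments.
  let start = DispatchTime.now().uptimeNanoseconds
  let result = try body()
  let end = DispatchTime.now().uptimeNanoseconds
  let seconds = Double(end - start) / 1_000_000_000
  print("\(label): \(seconds)s")
  return (result, seconds)
}

// Usage with a stand-in CPU workload; in txt2img the closure would wrap
// the diffusion loop and the decoder call instead.
let (total, seconds) = timed("sum") { (1...1_000).reduce(0, +) }
print("total = \(total), measured in \(seconds)s")
```

Because the README notes that the first execution includes kernel-selection runs that are not saved, timing a second run separately from the first would likely give a fairer comparison.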