This repository was archived by the owner on Jan 22, 2026. It is now read-only.

Commit 8ca2850

Merge pull request #7 from andrew/benchmarks
Add benchmark scripts
2 parents 6d5ad28 + 5a57efc commit 8ca2850

10 files changed: +227 -2 lines changed

CONTRIBUTING.md

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@ The database schema stores:
 - Dependency changes (added/modified/removed) with before/after versions
 - Periodic snapshots of full dependency state for efficient point-in-time queries
 
-See [docs/internals.md](docs/internals.md) for a detailed architecture overview and [docs/schema.md](docs/schema.md) for the database schema.
+See the [docs](docs/) folder for architecture details, database schema, and benchmarking tools.
 
 Since the database is just SQLite, you can query it directly for ad-hoc analysis:

File renamed without changes.

benchmark/commands.rb

Lines changed: 91 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,91 @@
1+
#!/usr/bin/env ruby
2+
# frozen_string_literal: true
3+
4+
require "benchmark"
5+
require "optparse"
6+
7+
options = {
8+
iterations: 3,
9+
repo: nil
10+
}
11+
12+
OptionParser.new do |opts|
13+
opts.banner = "Usage: bin/benchmark commands [options]"
14+
15+
opts.on("-r", "--repo=PATH", "Path to repository to benchmark against") do |v|
16+
options[:repo] = v
17+
end
18+
19+
opts.on("-n", "--iterations=N", Integer, "Number of iterations per command (default: 3)") do |v|
20+
options[:iterations] = v
21+
end
22+
23+
opts.on("-h", "--help", "Show this help") do
24+
puts opts
25+
exit
26+
end
27+
end.parse!
28+
29+
unless options[:repo]
30+
puts "Error: --repo is required"
31+
puts "Usage: bin/benchmark commands --repo /path/to/repo"
32+
exit 1
33+
end
34+
35+
repo_path = File.expand_path(options[:repo])
36+
unless File.directory?(repo_path)
37+
puts "Error: #{repo_path} is not a directory"
38+
exit 1
39+
end
40+
41+
unless File.exist?(File.join(repo_path, ".git", "pkgs.sqlite3"))
42+
puts "Error: #{repo_path} does not have a git-pkgs database"
43+
puts "Run 'git pkgs init' in that repository first"
44+
exit 1
45+
end
46+
47+
iterations = options[:iterations]
48+
gem_root = File.expand_path("../..", __FILE__)
49+
50+
# Use bundle exec to ensure we run the local development version
51+
commands = {
52+
"blame" => "bundle exec --gemfile=#{gem_root}/Gemfile ruby -I#{gem_root}/lib #{gem_root}/exe/git-pkgs blame --no-pager",
53+
"stale" => "bundle exec --gemfile=#{gem_root}/Gemfile ruby -I#{gem_root}/lib #{gem_root}/exe/git-pkgs stale --no-pager",
54+
"stats" => "bundle exec --gemfile=#{gem_root}/Gemfile ruby -I#{gem_root}/lib #{gem_root}/exe/git-pkgs stats --no-pager",
55+
"log" => "bundle exec --gemfile=#{gem_root}/Gemfile ruby -I#{gem_root}/lib #{gem_root}/exe/git-pkgs log --no-pager",
56+
"list" => "bundle exec --gemfile=#{gem_root}/Gemfile ruby -I#{gem_root}/lib #{gem_root}/exe/git-pkgs list --no-pager"
57+
}
58+
59+
puts "Command benchmarks"
60+
puts "=" * 60
61+
puts "Repository: #{repo_path}"
62+
puts "Iterations: #{iterations}"
63+
puts
64+
65+
results = {}
66+
67+
Dir.chdir(repo_path) do
68+
commands.each do |name, cmd|
69+
times = []
70+
71+
# Warmup run
72+
system(cmd, out: File::NULL, err: File::NULL)
73+
74+
iterations.times do
75+
time = Benchmark.realtime do
76+
system(cmd, out: File::NULL, err: File::NULL)
77+
end
78+
times << time
79+
end
80+
81+
avg = times.sum / times.size
82+
min = times.min
83+
max = times.max
84+
results[name] = { avg: avg, min: min, max: max }
85+
86+
puts format("%-10s avg: %6.3fs min: %6.3fs max: %6.3fs", name, avg, min, max)
87+
end
88+
end
89+
90+
puts
91+
puts "Total average: #{format("%.3fs", results.values.sum { |r| r[:avg] })}"
File renamed without changes.
File renamed without changes.

bin/benchmark

Lines changed: 30 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,30 @@
1+
#!/usr/bin/env ruby
2+
# frozen_string_literal: true
3+
4+
BENCHMARKS = {
5+
"full" => "Full pipeline benchmark",
6+
"detailed" => "Detailed phase breakdown",
7+
"bulk" => "Bulk insert benchmark",
8+
"db" => "DB operation breakdown",
9+
"commands" => "CLI command benchmarks"
10+
}.freeze
11+
12+
def usage
13+
puts "Usage: bin/benchmark <type> [repo_path] [sample_size]"
14+
puts " bin/benchmark commands --repo /path/to/repo [-n iterations]"
15+
puts
16+
puts "Types:"
17+
BENCHMARKS.each do |name, desc|
18+
puts " #{name.ljust(10)} #{desc}"
19+
end
20+
puts
21+
puts "Example: bin/benchmark full /path/to/repo 500"
22+
exit 1
23+
end
24+
25+
type = ARGV.shift
26+
usage if type.nil? || type == "-h" || type == "--help"
27+
usage unless BENCHMARKS.key?(type)
28+
29+
script = File.expand_path("../benchmark/#{type}.rb", __dir__)
30+
exec("ruby", script, *ARGV)

docs/README.md

Lines changed: 9 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,9 @@
1+
# Documentation
2+
3+
Technical documentation for git-pkgs maintainers and contributors.
4+
5+
- [internals.md](internals.md) - Architecture overview, how commands work, key algorithms
6+
- [schema.md](schema.md) - Database tables and relationships
7+
- [benchmarking.md](benchmarking.md) - Performance profiling tools
8+
9+
For user-facing documentation, see the main [README](../README.md).

docs/benchmarking.md

Lines changed: 94 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,94 @@
1+
# Benchmarking
2+
3+
git-pkgs includes benchmark scripts for profiling performance. Run them with:
4+
5+
```bash
6+
bin/benchmark <type> [repo_path] [sample_size]
7+
```
8+
9+
The default repo is `/Users/andrew/code/octobox` and sample size is 500 commits.
10+
11+
## Benchmark Types
12+
13+
### full
14+
15+
Full pipeline benchmark with phase breakdown:
16+
17+
```bash
18+
bin/benchmark full /path/to/repo 500
19+
```
20+
21+
Measures time spent in each phase: git diff extraction, manifest filtering, parsing, and database writes. Reports overall throughput in commits/sec.
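
As a rough illustration of per-phase timing, the sketch below accumulates wall-clock time per phase with `Benchmark.realtime` and derives a commits/sec figure. It is not taken from the benchmark scripts in this commit: the `PhaseTimer` class, the phase symbols, and the placeholder work inside each block are all assumptions made for the example.

```ruby
require "benchmark"

# Hypothetical helper: accumulates wall-clock seconds per named phase.
class PhaseTimer
  attr_reader :totals

  def initialize
    @totals = Hash.new(0.0)
  end

  # Times the block, adds the elapsed seconds to the phase,
  # and returns the block's own value.
  def measure(phase)
    result = nil
    @totals[phase] += Benchmark.realtime { result = yield }
    result
  end

  def total
    @totals.values.sum
  end

  # Commits processed per second of measured time.
  def throughput(commit_count)
    total.zero? ? 0.0 : commit_count / total
  end
end

timer = PhaseTimer.new
commits = Array.new(50) { |i| "commit-#{i}" } # stand-in for real commits

commits.each do |commit|
  diff = timer.measure(:git_diff) { commit.upcase } # placeholder work
  keep = timer.measure(:filtering) { diff.end_with?("0") }
  next unless keep

  parsed = timer.measure(:parsing) { diff.chars.sort.join }
  timer.measure(:db_writes) { parsed.length }
end

timer.totals.each do |phase, seconds|
  puts format("%-12s %8.5fs", phase, seconds)
end
puts format("Throughput: %.1f commits/sec", timer.throughput(commits.size))
```

Keeping the timer separate from the work being timed means the same accumulator can wrap any phase without changing what the phase returns.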

### detailed

Granular breakdown of each processing step:

```bash
bin/benchmark detailed /path/to/repo 500
```

Shows timing for blob path extraction, regex pre-filtering, bibliothecary identification, and manifest parsing. Also breaks down parsing time by platform (rubygems, npm, etc.) and reports how many commits pass each filter stage.

### bulk

Compares data collection vs bulk insert performance:

```bash
bin/benchmark bulk /path/to/repo 500
```

Separates the time spent analyzing commits from the time spent writing to the database. Uses `insert_all` for bulk operations. Helps identify whether bottlenecks are in git/parsing or database writes.

### db

Individual database operation timing:

```bash
bin/benchmark db /path/to/repo 200
```

Measures each ActiveRecord operation separately: commit creation, branch_commit creation, manifest lookups, change inserts, and snapshot inserts. Shows per-operation averages in milliseconds.
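
A per-operation average can be collected along the following lines. This is only a sketch: `OperationTimings`, the operation names, and the in-memory `rows` array are invented for the example and stand in for the real ActiveRecord calls, which are not shown on this page.

```ruby
require "benchmark"

# Hypothetical helper: stores one timing sample per call, grouped by operation.
class OperationTimings
  def initialize
    @samples = Hash.new { |hash, key| hash[key] = [] }
  end

  # Times the block and records the elapsed seconds under the given name.
  def record(name)
    result = nil
    @samples[name] << Benchmark.realtime { result = yield }
    result
  end

  # Average duration per operation, converted from seconds to milliseconds.
  def averages_ms
    @samples.transform_values { |times| times.sum / times.size * 1000.0 }
  end

  def counts
    @samples.transform_values(&:size)
  end
end

timings = OperationTimings.new
rows = [] # in-memory stand-in for database tables

200.times do |i|
  timings.record(:commit_create) { rows << { sha: format("%040x", i) } }
  timings.record(:change_insert) { rows.last[:changes] = i % 3 }
end

timings.averages_ms.each do |name, ms|
  puts format("%-16s %8.4f ms avg over %d calls", name, ms, timings.counts[name])
end
```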

### commands

End-to-end CLI command benchmarks:

```bash
bin/benchmark commands --repo /path/to/repo -n 3
```

Runs actual git-pkgs commands (`blame`, `stale`, `stats`, `log`, `list`) against a repo with an existing database. Measures wall-clock time over multiple iterations. Useful for regression testing command performance.

The repo must already have a database from `git pkgs init`.

## Interpreting Results

The main bottlenecks are typically:

1. **Git blob reads** - extracting file contents from commits
2. **Bibliothecary parsing** - parsing manifest file contents
3. **Database writes** - inserting records (mitigated by bulk inserts)

The regex pre-filter (`might_have_manifests?`) skips most commits cheaply. On a typical codebase, only 10-20% of commits touch files that could be manifests.
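
The idea behind such a pre-filter can be sketched as a single regular expression run over the paths a commit touches. The method name matches the one mentioned above, but the body, the `MANIFEST_HINT` pattern, and the sample commits are assumptions for illustration. The real implementation may cover many more file names.

```ruby
# Hypothetical pattern: a handful of well-known manifest and lockfile names.
# The real pre-filter in git-pkgs may use a different, longer list.
MANIFEST_HINT = %r{(?:\A|/)(?:Gemfile(?:\.lock)?|package(?:-lock)?\.json|yarn\.lock|requirements\.txt|Cargo\.(?:toml|lock)|go\.(?:mod|sum)|[^/]+\.gemspec)\z}

# Cheap check: does any changed path look like it could be a manifest?
def might_have_manifests?(paths)
  paths.any? { |path| MANIFEST_HINT.match?(path) }
end

# Made-up commits mapped to the paths they change.
commits = {
  "a1" => ["lib/foo.rb", "README.md"],
  "b2" => ["Gemfile.lock"],
  "c3" => ["app/models/user.rb"],
  "d4" => ["frontend/package.json", "frontend/src/index.js"],
  "e5" => ["docs/Gemfile.md"]
}

candidates = commits.select { |_sha, paths| might_have_manifests?(paths) }
puts "#{candidates.size} of #{commits.size} commits need full parsing: #{candidates.keys.join(', ')}"
```

Only the commits that pass this check would go on to the more expensive identification and parsing steps.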

Blob OID caching helps when the same manifest content appears across multiple commits. The cache stats show hit rates.
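
A minimal sketch of caching by blob OID follows, assuming the parsed result depends only on the blob's content. The `BlobCache` class, the made-up OIDs, and the reading of the two stats keys (blobs stored, and blobs served from the cache at least once) are assumptions rather than the project's actual code.

```ruby
# Hypothetical cache: stores parse results by blob OID so that identical
# manifest content is parsed only once.
class BlobCache
  def initialize
    @entries = {}       # oid => parsed result
    @hits = Hash.new(0) # oid => number of cache hits
  end

  # Returns the cached result for oid, or runs the block once and stores it.
  def fetch(oid)
    if @entries.key?(oid)
      @hits[oid] += 1
      return @entries[oid]
    end
    @entries[oid] = yield
  end

  def stats
    { cached_blobs: @entries.size, blobs_with_hits: @hits.size }
  end
end

cache = BlobCache.new
parse_calls = 0

# (commit, blob OID) pairs; the OIDs are made up for illustration.
history = [["c1", "aaa111"], ["c2", "aaa111"], ["c3", "bbb222"], ["c4", "aaa111"]]

history.each do |_commit, oid|
  cache.fetch(oid) do
    parse_calls += 1
    "parsed manifest for #{oid}" # stand-in for the expensive parse
  end
end

puts "parse calls: #{parse_calls}"
puts "cached blobs: #{cache.stats[:cached_blobs]}, blobs with hits: #{cache.stats[:blobs_with_hits]}"
```

Here four commits reference two distinct blobs, so the expensive block runs twice and the other two lookups are served from the cache.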

## Example Output

```
Full pipeline benchmark: 500 commits
============================================================

Full pipeline breakdown:
------------------------------------------------------------
git_diff 0.892s (12.3%)
filtering 0.234s (3.2%)
parsing 4.521s (62.4%)
db_writes 1.602s (22.1%)
------------------------------------------------------------
Total 7.249s

Throughput: 69.0 commits/sec
Cache stats: {:cached_blobs=>142, :blobs_with_hits=>89}
```
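
The percentages and throughput in the example output follow directly from the phase timings. The short sketch below recomputes them from the figures shown above. The variable names and output layout are illustrative and not taken from the benchmark script.

```ruby
# Phase timings copied from the example output above (seconds).
phases = {
  "git_diff" => 0.892,
  "filtering" => 0.234,
  "parsing" => 4.521,
  "db_writes" => 1.602
}
commit_count = 500

total = phases.values.sum

# Each phase's share is its time divided by the total.
phases.each do |name, seconds|
  puts format("%-12s %7.3fs (%.1f%%)", name, seconds, seconds / total * 100)
end
puts format("%-12s %7.3fs", "Total", total)

# Throughput is commits processed divided by total seconds.
puts format("Throughput: %.1f commits/sec", commit_count / total)
```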

git-pkgs.gemspec

Lines changed: 2 additions & 1 deletion
@@ -23,7 +23,8 @@ Gem::Specification.new do |spec|
   spec.files = IO.popen(%w[git ls-files -z], chdir: __dir__, err: IO::NULL) do |ls|
     ls.readlines("\x0", chomp: true).reject do |f|
       (f == gemspec) ||
-        f.start_with?(*%w[bin/ Gemfile .gitignore test/ .github/])
+        f.start_with?(*%w[bin/ Gemfile .gitignore test/ .github/ docs/ benchmark/]) ||
+        f.end_with?(*%w[Rakefile CODE_OF_CONDUCT.md CONTRIBUTING.md SECURITY.md])
     end
   end
   spec.bindir = "exe"
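
To see the effect of the widened exclusion lists, the same `start_with?` and `end_with?` checks can be run against a sample file list. This is a sketch: the prefix and suffix lists are copied from the diff above, while the `packaged?` helper and the file list (including `lib/git/pkgs.rb`) are hypothetical stand-ins for real `git ls-files` output.

```ruby
# Same prefix and suffix lists as the updated gemspec.
EXCLUDED_PREFIXES = %w[bin/ Gemfile .gitignore test/ .github/ docs/ benchmark/].freeze
EXCLUDED_SUFFIXES = %w[Rakefile CODE_OF_CONDUCT.md CONTRIBUTING.md SECURITY.md].freeze

# Hypothetical helper: true when a path would be kept in spec.files.
def packaged?(path, gemspec: "git-pkgs.gemspec")
  !(path == gemspec ||
    path.start_with?(*EXCLUDED_PREFIXES) ||
    path.end_with?(*EXCLUDED_SUFFIXES))
end

# Hypothetical file list standing in for `git ls-files` output.
files = %w[
  git-pkgs.gemspec
  Gemfile
  Rakefile
  CONTRIBUTING.md
  README.md
  bin/benchmark
  benchmark/commands.rb
  docs/benchmarking.md
  exe/git-pkgs
  lib/git/pkgs.rb
]

shipped = files.select { |path| packaged?(path) }
puts shipped
```

With this sample list, only `README.md`, `exe/git-pkgs`, and `lib/git/pkgs.rb` remain, so the new `docs/` and `benchmark/` directories stay out of the built gem.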
