Description
While updating various gix dependencies to their latest versions, I noticed that the pack size has increased by ~50% with the latest release of gix-pack, at least in some cases.
I noticed it when creating a relatively wide tree (1500 files with randomly generated names). Doing just that is sufficient to make the issue noticeable.
(The full use case is actually meant to trigger a massive conflict, but that is unnecessary and irrelevant here.)
Steps to reproduce 🕹
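# Cargo.toml
[package]
name = "packtest"
version = "0.1.0"
edition = "2024"

[dependencies]
# "Latest" packages: uncomment these (and comment out the pinned versions below)
# to reproduce the larger pack.
# gix-actor = "*"
# gix-date = "*"
# gix-hash = "*"
# gix-hashtable = "*"
# gix-object = "*"
# gix-pack = "*"
# "Old" packages: these produce the smaller pack.
gix-actor = "0.34.0"
gix-date = "0.9.4"
gix-hash = "0.17.0"
gix-hashtable = "=0.8.0"
gix-object = "0.48.0"
gix-pack = "0.58.0"
hex = "0.4.3"
rand = "0.9.1"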
// main.rs
use std::collections::BTreeMap;
use std::io::Write as _;

use rand::Rng;
use gix_hash::ObjectId;
use gix_object::WriteTo as _;

type Store = BTreeMap<ObjectId, gix_object::Object>;

// Hash the object (loose header + serialized body) to get its id, then keep it in the store.
fn store(store: &mut Store, object: gix_object::Object) -> ObjectId {
    let mut ser = gix_hash::io::Write::new(Vec::new(), gix_hash::Kind::Sha1);
    // write_all rather than write, so a short write cannot silently truncate the header
    ser.write_all(&object.loose_header()).unwrap();
    object.write_to(&mut ser).unwrap();
    let oid = ser.hash.try_finalize().unwrap();
    store.insert(oid, object);
    oid
}
fn main() {
    let mut objects = Store::new();

    // A single blob shared by every tree entry.
    let oid = store(
        &mut objects,
        gix_object::Object::Blob(gix_object::BlobRef { data: b"xoxo" }.into()),
    );

    // 1500 random file names (10 random bytes, hex-encoded), sorted for use as tree entries.
    let mut files = rand::rng()
        .random_iter::<[u8; 10]>()
        .map(hex::encode)
        .take(1500)
        .collect::<Vec<String>>();
    files.sort();

    let mode = gix_object::tree::EntryKind::Blob.into();
    let t0 = store(&mut objects, gix_object::Tree {
        entries: files.iter().map(|fname| gix_object::tree::Entry {
            mode,
            oid,
            filename: fname.as_str().into(),
        }).collect(),
    }.into());

    store(&mut objects, gix_object::Commit {
        tree: t0,
        parents: Default::default(),
        author: gix_actor::Signature {
            name: "foo".into(),
            email: "bar@example.com".into(),
            time: gix_date::Time {
                seconds: 1747920830,
                offset: 0,
                sign: gix_date::time::Sign::Plus, // comment out with the "latest" packages
            },
        },
        committer: gix_actor::Signature {
            name: "foo".into(),
            email: "bar@example.com".into(),
            time: gix_date::Time {
                seconds: 1747920830,
                offset: 0,
                sign: gix_date::time::Sign::Plus, // comment out with the "latest" packages
            },
        },
        encoding: None,
        message: "xxx".into(),
        extra_headers: Vec::new(),
    }.into());

    // Optional: uncomment to add a second tree that differs from the first by one entry.
    // let mut fs1: Vec<_> = files.iter().map(|fname| gix_object::tree::Entry {
    //     mode,
    //     oid,
    //     filename: fname.as_str().into(),
    // }).collect();
    // fs1.push(gix_object::tree::Entry { mode, oid, filename: "a".into()});
    // fs1.sort();
    // let _t2 = store(&mut objects, gix_object::Tree { entries: fs1 }.into());

    // Write each object into an in-memory pack, one entry per object.
    let count = objects.len();
    let mut pack = Vec::new();
    let mut entries_writer = gix_pack::data::output::bytes::FromEntriesIter::new(
        objects.into_iter().map(
            |(oid, obj)| -> Result<Vec<_>, gix_pack::data::output::entry::Error> {
                use gix_pack::data::output::{Entry, Count};
                let kind = obj.kind();
                let mut buf = Vec::new();
                obj.write_to(&mut buf).unwrap();
                Ok(vec![Entry::from_data(
                    &Count::from_data(oid, None),
                    &gix_object::Data::new(kind, &buf),
                )?])
            }
        ),
        &mut pack,
        count.try_into().unwrap(),
        gix_pack::data::Version::V2,
        gix_hash::Kind::Sha1,
    );

    // Drive the iterator to completion so that every entry gets written.
    for e in entries_writer.by_ref() {
        e.unwrap();
    }
    entries_writer.digest().unwrap();

    // Total pack size in bytes.
    println!("{}", pack.len());
}
Running this locally, I get a pack size of 19375 bytes. After uncommenting the "latest" packages, commenting out the "old" packages and the two sign lines, updating, and re-running, I get a pack size of 29737 bytes.
Uncommenting the block that adds a second tree yields 38548 and 59395 bytes respectively, which is why I originally thought this was a delta-ification issue. However, each of these is about twice the size of the corresponding single-tree pack, so it looks like delta-ification has never done much in this case, with either version.
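As a quick sanity check on the numbers above, here is a small sketch that computes the growth between versions and the two-tree to one-tree ratio. The helper names (growth_percent, size_ratio) are mine and purely illustrative; the only inputs are the four pack sizes I measured.

```rust
/// Percentage growth from `old` to `new` (both in bytes).
fn growth_percent(old: u64, new: u64) -> f64 {
    (new as f64 - old as f64) / old as f64 * 100.0
}

/// How many times larger `a` is than `b`.
fn size_ratio(a: u64, b: u64) -> f64 {
    a as f64 / b as f64
}

fn main() {
    // Pack sizes reported above, in bytes.
    let (old_one, new_one) = (19375u64, 29737u64);
    let (old_two, new_two) = (38548u64, 59395u64);

    println!("one tree, old -> new: +{:.1}%", growth_percent(old_one, new_one));
    println!("two trees, old -> new: +{:.1}%", growth_percent(old_two, new_two));
    println!("old, two trees / one tree: {:.2}x", size_ratio(old_two, old_one));
    println!("new, two trees / one tree: {:.2}x", size_ratio(new_two, new_one));
}
```

A ratio close to 2x with both versions is what I'd expect if the second tree were stored whole rather than as a delta against the first, which is why I no longer think the regression is about deltas.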
Git behavior
Creating the same repository in git (with commits and branches to keep the objects alive), including the second tree, yields a pack that's about 20k. Here's a bundle you can check: