@@ -349,6 +349,111 @@ error conditions:
should not use bundle URIs for fetch unless the server has explicitly
recommended it through a `bundle.heuristic` value.

+ Example Bundle Provider organization
+ ------------------------------------
+
+ The bundle URI feature is intentionally designed to be flexible in how a
+ bundle provider organizes the object data. However, it can be helpful to
+ have a complete organization model described here so providers can start
+ from that base.
+
+ This example organization is a simplified model of what is used by the
+ GVFS Cache Servers (see section near the end of this document), which have
+ been beneficial in speeding up clones and fetches for very large
+ repositories, although they use extra software outside of Git.
+
+ The bundle provider deploys servers across multiple geographies. Each
+ server manages its own bundle set. The server can track a number of Git
+ repositories, but provides a bundle list for each based on a pattern. For
+ example, when mirroring a repository at `https://<domain>/<org>/<repo>`
+ the bundle server could have its bundle list available at
+ `https://<server-url>/<domain>/<org>/<repo>`. The origin Git server can
+ list all of these servers under the "any" mode:
+
+ 	[bundle]
+ 		version = 1
+ 		mode = any
+
+ 	[bundle "eastus"]
+ 		uri = https://eastus.example.com/<domain>/<org>/<repo>
+
+ 	[bundle "europe"]
+ 		uri = https://europe.example.com/<domain>/<org>/<repo>
+
+ 	[bundle "apac"]
+ 		uri = https://apac.example.com/<domain>/<org>/<repo>
+
+ This "list of lists" is static and only changes if a bundle server is
+ added or removed.
+
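The URL pattern and the "any" mode list described above can be sketched in
a few lines of Python. This is only an illustration: the region names and
hosts are copied from the example, while `BUNDLE_SERVERS` and
`list_of_lists` are hypothetical names and not part of Git.

```python
from urllib.parse import urlsplit

# Assumed region names and hosts, copied from the example list above.
BUNDLE_SERVERS = {
    "eastus": "https://eastus.example.com",
    "europe": "https://europe.example.com",
    "apac": "https://apac.example.com",
}


def list_of_lists(repo_url, servers=BUNDLE_SERVERS):
    """Render an "any" mode bundle list that points at each bundle server."""
    parts = urlsplit(repo_url)
    # Build the <domain>/<org>/<repo> suffix used by the URL pattern.
    suffix = parts.netloc + parts.path.rstrip("/")
    lines = ["[bundle]", "\tversion = 1", "\tmode = any"]
    for name, base in servers.items():
        lines += ["", f'[bundle "{name}"]', f"\turi = {base}/{suffix}"]
    return "\n".join(lines) + "\n"
```

Because the output depends only on the server table and the repository URL,
it needs to be regenerated only when a server is added or removed.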
+ Each bundle server manages its own set of bundles. The initial bundle list
+ contains only a single bundle, containing all of the objects received from
+ cloning the repository from the origin server. The list uses the
+ `creationToken` heuristic and a `creationToken` is made for the bundle
+ based on the server's timestamp.
+
+ The bundle server runs regularly scheduled updates for the bundle list,
+ such as once a day. During this task, the server fetches the latest
+ contents from the origin server and generates a bundle containing the
+ objects reachable from the latest origin refs, but not contained in a
+ previously computed bundle. This bundle is added to the list, with care
+ that the `creationToken` is strictly greater than the previous maximum
+ `creationToken`.
+
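One way to keep the `creationToken` both timestamp-based and strictly
increasing is to fall back to "previous maximum plus one" whenever the clock
has not advanced. The helper below is a minimal sketch under that
assumption; `next_creation_token` is a hypothetical name, not a Git API.

```python
import time


def next_creation_token(existing_tokens, now=None):
    """Return a timestamp-based token strictly greater than all existing ones."""
    if now is None:
        now = int(time.time())  # server timestamp, in whole seconds
    if not existing_tokens:
        return now  # first bundle in the list: the timestamp is enough
    # Guard against clock skew or two updates within the same second.
    return max(now, max(existing_tokens) + 1)
```

The fallback matters only in unusual cases such as a clock adjustment, but
without it a new bundle could sort before an older one.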
+ When the bundle list grows too large, say more than 30 bundles, then the
+ oldest "_N_ minus 30" bundles are combined into a single bundle. This
+ bundle's `creationToken` is equal to the maximum `creationToken` among the
+ merged bundles.
+
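The merge rule can be expressed as a small planning step. The sketch below
assumes bundles are tracked as `(id, creationToken)` pairs and treats a
merge of a single bundle as a no-op; `plan_merge` and `max_recent` are
hypothetical names chosen for illustration.

```python
def plan_merge(bundles, max_recent=30):
    """Plan one compaction pass over a bundle list.

    bundles: list of (bundle_id, creation_token) tuples.
    Returns (to_merge, merged_token, kept): the oldest bundles to combine,
    the creationToken for the combined bundle, and the bundles left as-is.
    """
    ordered = sorted(bundles, key=lambda b: b[1])  # oldest first
    excess = len(ordered) - max_recent             # the "N minus 30" count
    if excess < 2:
        # Zero or one old bundle: combining a single bundle changes nothing.
        return [], None, ordered
    to_merge = ordered[:excess]
    # The merged bundle takes the maximum token among its inputs.
    merged_token = max(token for _, token in to_merge)
    return to_merge, merged_token, ordered[excess:]
```

Under this reading, the list settles at one merged bundle plus the 30 most
recent ones.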
+ An example bundle list is provided here, although it only has two daily
+ bundles and not a full list of 30:
+
+ 	[bundle]
+ 		version = 1
+ 		mode = all
+ 		heuristic = creationToken
+
+ 	[bundle "2022-02-13-1644770820-daily"]
+ 		uri = https://eastus.example.com/<domain>/<org>/<repo>/2022-02-13-1644770820-daily.bundle
+ 		creationToken = 1644770820
+
+ 	[bundle "2022-02-09-1644442601-daily"]
+ 		uri = https://eastus.example.com/<domain>/<org>/<repo>/2022-02-09-1644442601-daily.bundle
+ 		creationToken = 1644442601
+
+ 	[bundle "2022-02-02-1643842562"]
+ 		uri = https://eastus.example.com/<domain>/<org>/<repo>/2022-02-02-1643842562.bundle
+ 		creationToken = 1643842562
+
+ To avoid storing and serving object data in perpetuity after it becomes
+ unreachable in the origin server, this bundle merge can be more careful.
+ Instead of taking an absolute union of the old bundles, the merged bundle
+ can be created by looking at the newer bundles and ensuring that their
+ necessary commits are all available in this merged bundle (or in another
+ one of the newer bundles). This allows "expiring" object data that is not
+ being used by new commits in this window of time. That data could be
+ reintroduced by a later push.
+
+ This data organization has two main goals. First, initial clones of the
+ repository become faster by downloading precomputed object data from a
+ closer source. Second, `git fetch` commands can be faster, especially if
+ the client has not fetched for a few days. However, if a client does not
+ fetch for 30 days, then the bundle list organization would cause
+ redownloading a large amount of object data.
+
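The effect on fetches can be modeled by comparing the largest
`creationToken` a client has already applied against the tokens in the list.
The function below is a simplified model only: it assumes the client wants
every bundle with a larger token, and it ignores the download order and
stopping rules of the real client. `bundles_to_download` is a hypothetical
name.

```python
def bundles_to_download(bundle_list, last_token):
    """Return ids of bundles newer than the client's last applied token.

    bundle_list: list of (bundle_id, creation_token) tuples.
    last_token: largest creationToken the client has already applied,
    or 0 for a fresh clone.
    """
    newer = [b for b in bundle_list if b[1] > last_token]
    newer.sort(key=lambda b: b[1])  # oldest of the missing bundles first
    return [bundle_id for bundle_id, _ in newer]
```

With the three-bundle example list, a client whose last token is
`1644442601` needs only the newest daily bundle. A client whose last token
is smaller than the merged bundle's token has to fetch that large bundle
again, which is the 30-day case described above.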
+ One way to make this organization more useful to users who fetch frequently
+ is to have more frequent bundle creation. For example, bundles could be
+ created every hour, and then once a day those "hourly" bundles could be
+ merged into a "daily" bundle. The daily bundles are merged into the
+ oldest bundle after 30 days.
+
+ It is recommended that this bundle strategy be repeated with the `blob:none`
+ filter if clients of this repository are expecting to use blobless partial
+ clones. This list of blobless bundles stays in the same list as the full
+ bundles, but uses the `bundle.<id>.filter` key to separate the two groups.
+ For very large repositories, the bundle provider may want to _only_ provide
+ blobless bundles.
+
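A combined list of full and blobless bundles could be rendered as sketched
below. The `filter` line corresponds to the `bundle.<id>.filter` key named
above; the dictionary layout and the name `render_bundle_list` are
assumptions made for this illustration.

```python
def render_bundle_list(bundles):
    """Render an "all" mode list; entries may carry an optional filter.

    bundles: iterable of dicts with "id", "uri", "creationToken" and,
    for filtered bundles, "filter" (for example "blob:none").
    """
    lines = [
        "[bundle]",
        "\tversion = 1",
        "\tmode = all",
        "\theuristic = creationToken",
    ]
    for b in bundles:
        lines += [
            "",
            f'[bundle "{b["id"]}"]',
            f'\turi = {b["uri"]}',
            f'\tcreationToken = {b["creationToken"]}',
        ]
        if b.get("filter"):
            # Only filtered bundles get a bundle.<id>.filter entry.
            lines.append(f'\tfilter = {b["filter"]}')
    return "\n".join(lines) + "\n"
```

Keeping both groups in one renderer makes it straightforward to drop the
full bundles later if only blobless bundles are to be served.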
Implementation Plan
-------------------