
WACZ Aggregation / Multi WACZ Specification #112

Open
@edsu


Details about how to aggregate multiple WACZ files into a single WACZ need to be added to the specification. This hinges on the resources in datapackage.json referencing each WACZ by URL rather than by path. See the Resource Information section of the Data Package specification for details:

{
   "resources": [
      {"hash": "...", "url": "https://example.com/filename_1.wacz", "bytes": "..."},
      {"hash": "...", "url": "https://example.com/filename_2.wacz", "bytes": "..."}
   ]
   ...
}
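As a rough illustration of the structure above, the sketch below builds such a datapackage.json from a set of remote WACZ files. It assumes the "sha256:<hex>" hash format and an integer byte count; the issue leaves both values elided ("..."), and the function names wacz_resource and build_aggregation are hypothetical, not part of any spec or library.

```python
import hashlib
import json


def wacz_resource(url: str, data: bytes) -> dict:
    """Describe one remote WACZ file as a Data Package resource.

    Assumption: the hash is written as "sha256:<hex digest>" and
    "bytes" is the integer size of the file.
    """
    return {
        "hash": "sha256:" + hashlib.sha256(data).hexdigest(),
        "url": url,  # a URL to the WACZ, not a local path
        "bytes": len(data),
    }


def build_aggregation(wacz_files: dict) -> dict:
    """Build a datapackage.json-style dict that lists each WACZ by URL.

    wacz_files maps each WACZ URL to that file's raw bytes.
    """
    return {
        "resources": [wacz_resource(url, data) for url, data in wacz_files.items()]
    }


if __name__ == "__main__":
    package = build_aggregation({
        "https://example.com/filename_1.wacz": b"first wacz bytes",
        "https://example.com/filename_2.wacz": b"second wacz bytes",
    })
    print(json.dumps(package, indent=2))
```

In practice the bytes would come from downloading or reading each WACZ; the in-memory dict is only there to keep the sketch self-contained.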

There should also be a Data Package profile so that clients can easily distinguish aggregated collections from regular WACZ files. Perhaps WACZ-Aggregation?
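If such a profile were adopted, a client could branch on the top-level "profile" property of datapackage.json. The sketch below assumes the profile value is the string WACZ-Aggregation suggested above and compares it case-insensitively; both the value and the helper name is_aggregation are hypothetical until the specification settles them.

```python
# Hypothetical profile name, taken from the suggestion in this issue.
AGGREGATION_PROFILE = "wacz-aggregation"


def is_aggregation(datapackage: dict) -> bool:
    """Return True if a parsed datapackage.json declares the aggregation profile."""
    profile = datapackage.get("profile")
    # Only string profiles are considered; anything else is a regular WACZ.
    return isinstance(profile, str) and profile.lower() == AGGREGATION_PROFILE


if __name__ == "__main__":
    print(is_aggregation({"profile": "WACZ-Aggregation", "resources": []}))
    print(is_aggregation({"profile": "data-package", "resources": []}))
```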

The specification should document that WACZ users MAY use datapackage.json to record additional metadata about crawls. See the browsertrix-cloud API for examples.

Labels: documentation (Improvements or additions to documentation)

Status: Triage