Deprecate remaining package metadata and add bulk data format

Packages have known problems, discussed in open-contracting/infrastructure#89, #605, and CRM-4282 (all relevant comments are reflected here), and the current packaging formats confer very few benefits.

## Benefits

The benefits of the current packaging format are:

1. A standardized way to publish multiple releases/records as a single file
1. Easy access to metadata:
    * `publisher`
    * `version` and `extensions`
    * `license` and `publicationPolicy`

A package also sets `uri` and `publishedDate`, but this is metadata about the package itself, not about the releases/records it contains.

## Discussion

### Metadata

* `publisher` should be moved to the release-level (see #325)
* `version` and `extensions` can be handled using `describedby` at the release-level (see #426)

Regarding `license` and `publicationPolicy`, paraphrasing open-contracting/infrastructure#89:

* License and publication policy metadata are important, but it isn't critical that they be distributed as data. They can instead be expressed in the machine-readable description of the OCDS dataset in a data registry, using [DCAT](https://www.w3.org/TR/vocab-dcat-2/) for example. DCAT has a property for license, and a property for publication policy can be added as an extension, as [DCAT-US](https://resources.data.gov/resources/dcat-us/) does for other properties.
* Most open data (CSVs, etc.) have no means of declaring their license or publication policy, but this poses no major problem to reuse – these are instead declared on the HTML pages that serve or link to the data. Users generally only need to refer to these once, so it's not a challenge to data workflows.

See similar comments in https://github.com/open-contracting/standard/issues/325#issuecomment-445892448

As such, all metadata provided by the package can be omitted or moved to the release-level, without major issue.

### Format

We still want a standardized way to publish multiple releases/records as a single file. A minimal package in the current format with all metadata removed would be:

```json
{
  "releases": [
    // big list of releases
  ]
}
```

The problem with this format is that naive applications will load the entire file into memory. Because bulk download OCDS files can be very large (GBs), doing so can exhaust memory on typical consumer hardware. Iterative JSON parsers like `ijson` can be used to seek to the `releases` array and yield one release at a time (as is done in OCDS Kit, for example). However, relatively few users are aware of such libraries, and many common data analysis tools don't use them (Pandas, for example). Indeed, no OCDS software written by ODS uses iterative parsing, which leads to memory exhaustion in critical tools like the Data Review Tool on medium-to-large datasets, and retrofitting these tools to parse iteratively is not trivial.

Any JSON format that puts releases/records in JSON arrays will suffer the same issue. The only reasonable options are:

1. [Line-delimited JSON](https://en.wikipedia.org/wiki/JSON_streaming#Line-delimited_JSON)
1. ZIP files containing individual releases/records

There are other [JSON streaming](https://en.wikipedia.org/wiki/JSON_streaming) options besides line-delimited JSON, but:

1. Line-delimited JSON has the widest support and is easy to publish and use, using common JSON libraries
1. Record separator-delimited JSON is an eccentric format that relies on the rarely used ASCII record separator character (`0x1E`)
1. Concatenated JSON requires specialized JSON libraries

An advantage of a ZIP file is that it can contain additional information, e.g. a `LICENSE.txt` or `publicationPolicy.pdf`. However, OCDS datasets can contain millions of releases/records. Unless the publisher organizes them into directories somehow, the ZIP file will expand into millions of files, which is a barrier to use for many users.
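To make the ZIP option concrete, here is a minimal sketch using only Python's standard library. The archive layout (a `releases/` directory with one file per release, named by release `id`, next to a `LICENSE.txt`) and the function names are assumptions for illustration, not something this proposal specifies.

```python
import io
import json
import zipfile


def write_release_zip(target, releases, license_text):
    """Write each release to its own JSON member, plus a LICENSE.txt."""
    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("LICENSE.txt", license_text)
        for release in releases:
            # Hypothetical layout: one member per release, named by release ID.
            archive.writestr(f"releases/{release['id']}.json", json.dumps(release))


def iter_release_zip(source):
    """Yield one release at a time without extracting the archive to disk."""
    with zipfile.ZipFile(source) as archive:
        for name in archive.namelist():
            if name.startswith("releases/") and name.endswith(".json"):
                with archive.open(name) as member:
                    yield json.load(member)


# Round trip through an in-memory archive.
buffer = io.BytesIO()
write_release_zip(buffer, [{"id": "r-1"}, {"id": "r-2"}], "CC0-1.0")
release_ids = [release["id"] for release in iter_release_zip(buffer)]
```

Reading members directly from the archive avoids expanding millions of files onto disk, but it requires users to write code like this, which most will not do.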

A single (large) line-delimited JSON file is comparatively easier to work with.
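As a rough illustration of why, the sketch below reads line-delimited JSON one release at a time with the standard `json` module, so memory use is bounded by the largest single line rather than by the file size. The function name and the sample data are made up for the example.

```python
import io
import json


def iter_json_lines(lines):
    """Yield one parsed release/record per non-blank line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# In practice `lines` would be an open file object; a StringIO stands in here.
sample = io.StringIO('{"ocid": "example-1", "id": "1"}\n{"ocid": "example-2", "id": "2"}\n')
ocids = [release["ocid"] for release in iter_json_lines(sample)]
```

Because a file object yields its lines lazily, the same function works unchanged on a multi-gigabyte file opened with `open(path, encoding="utf-8")`.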

## Proposal

Deprecate packages, and recommend publication of OCDS releases/records as line-delimited JSON.
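For publishers with existing packages, the conversion could look something like the sketch below. It assumes the package is small enough to have been loaded into memory already; a large package would need an iterative parser such as `ijson` to feed the same loop. The function name and its `key` parameter are hypothetical.

```python
import io
import json


def package_to_json_lines(package, out, key="releases"):
    """Write each item of package[key] as one compact JSON line; return the count."""
    count = 0
    for item in package.get(key, []):
        # Without `indent`, json.dumps escapes newlines inside strings,
        # so each item occupies exactly one line.
        out.write(json.dumps(item) + "\n")
        count += 1
    return count


package = {"releases": [{"id": "1", "note": "line\nbreak"}, {"id": "2"}]}
out = io.StringIO()
written = package_to_json_lines(package, out)
lines = out.getvalue().splitlines()
```

Passing `key="records"` would handle a record package in the same way.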
