Bucket file layout #189

Closed
jlebon opened this issue May 29, 2019 · 22 comments

@jlebon
Member

jlebon commented May 29, 2019

How should our various files (metadata and build artifacts) be laid out in the bucket? Note that this isn't something we expect users to care about. Some pieces of published metadata will hold links to files in the bucket, but we should be able to change the bucket layout itself without impacting users. As such, we should also strongly discourage browsing the bucket (which would be cumbersome without an index anyway) or predicting locations for new builds.

Here's a strawman based on some initial thoughts in coreos/coreos-assembler#159:

/
    prod/
        streams/
            stable/
                stream.json                   # holds stream metadata
                builds/
                    builds.json               # raw cosa builds listing metadata
                    30.1234-5/
                        release.json          # holds release metadata
                        x86_64/
                            meta.json         # raw cosa build metadata
                            commitmeta.json
                            fedora-coreos-30.8-qemu.qcow2.gz
                            ostree-commit-object
                            ostree-commit.tar
                            ...
                        ppc64le/
                            ...
                        ...
            testing/
            next/
            ...
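
As a rough illustration of the strawman above, the object keys could be derived as follows. This is a minimal sketch, not code from cosa or plume; the function names and the `PREFIX` constant are assumptions made for the example.

```python
from posixpath import join

PREFIX = "prod"  # top-level prefix taken from the strawman layout


def stream_dir(stream: str) -> str:
    """Key prefix holding everything for one stream."""
    return join(PREFIX, "streams", stream)


def build_dir(stream: str, build_id: str) -> str:
    """Key prefix holding one build, across all architectures."""
    return join(stream_dir(stream), "builds", build_id)


def release_key(stream: str, build_id: str) -> str:
    """release.json sits at the build level, one per build."""
    return join(build_dir(stream, build_id), "release.json")


def artifact_key(stream: str, build_id: str, arch: str, filename: str) -> str:
    """Per-arch files (meta.json, images, ...) sit one level deeper."""
    return join(build_dir(stream, build_id), arch, filename)
```

Since the layout is meant to stay changeable, helpers like these would belong in the tooling that writes the bucket, not in anything a user consumes.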
@dustymabe
Member

dustymabe commented May 29, 2019

Looks mostly good. A few comments:

  • I'd prefer to have filename uniqueness, so maybe let's not have two meta.json filenames
  • could we add architecture awareness to the example in the description?

@jlebon
Member Author

jlebon commented May 29, 2019

I'd prefer to have filename uniqueness, so maybe let's not have two meta.json filenames

Yeah fair. Maybe just stream.json in that case?

could we add architecture awareness to the example in the description?

This is mostly dependent on coreos/coreos-assembler#463, though I guess how the various archful builds are laid out is related to, but somewhat independent of, how they get there, so we could strawman that part now too.

@dustymabe
Member

so we could strawman that part now too.

Yeah - a concrete example (like the one in your description) is the best way I can think to actually hash it out. My proposal is we simply add arch directories under the buildid:

        builds/
          30.1234-5/      # holds COSA build artifacts
                x86_64/
                aarch64/
                ppc64le/ 

@dustymabe
Member

Yeah fair. Maybe just stream.json in that case?

forgot to respond to this: 👍

@jlebon
Member Author

jlebon commented May 30, 2019

OK, updated initial comment with those modifications!

@arithx
Contributor

arithx commented May 30, 2019

Couple of questions I need answered for the plume work:

  1. Is it intended that there is a different bucket location that these artifacts are initially written to and then copied to this structure via plume in the release process?
  2. Where are we storing the release metadata index in this structure (right under prod/)?
  3. Are all files/artifacts inside of the individual build-level buckets meant to be publicly readable (e.g., should meta.json be fetchable by end users)?

@bgilbert
Contributor

  1. Yes.

Note that if the release.json is staged somewhere by COSA and then copied over by plume, incremental updates to it (e.g. to backfill an AMI ID) will need to be copied over again.

@sinnykumari
Contributor

A few things I need to understand to work on the Stream Metadata Generator:

  1. Is the stream.json mentioned in the design where the Stream Metadata Generator output will be uploaded and made available?
  2. Release metadata (release.json) is available for each build in a stream. Which one will the Stream Metadata Generator use as input for building the stream metadata? Is there something like latest/ under builds/ directory pointing to the latest one to be utilized?

@jlebon
Member Author

jlebon commented Jun 3, 2019

Is the stream.json mentioned in the design where the Stream Metadata Generator output will be uploaded and made available?

Yes, I think that's correct.

Is there something like latest/ under builds/ directory pointing to the latest one to be utilized?

Whoops, I had missed the builds.json here from COSA. I updated the first comment with that. So e.g. you can determine what the latest build is by looking at builds.json.

@sinnykumari
Contributor

Makes sense, thanks @jlebon

@bgilbert
Contributor

bgilbert commented Jun 3, 2019

The current plan is not to use builds.json for that, but instead to use a separate JSON document (#98 (comment)) pointing to the release metadata of released builds.

@jlebon
Member Author

jlebon commented Jun 3, 2019

Ahh yeah, right. The builds.json is essentially a raw listing of all the cosa builds stored in that directory. Whereas the "release index" (do we have another name for this yet?) actually represents releases we want users to consume.

  1. Where are we storing the release metadata index in this structure (right under prod/)?

Hmm, seems like this should be a per-stream thing, right? So maybe at /prod/streams/$stream/releases.json ? (Essentially sitting side by side with stream.json).

@bgilbert
Contributor

bgilbert commented Jun 3, 2019

We don't have another name. "Release index" works for me. And yes, per-stream.

@arithx
Contributor

arithx commented Jun 3, 2019

@jlebon For the individual build release metadata, the diagram currently seems to indicate one per architecture. Is this correct? Or should there be one per build version (e.g., prod/<channel>/builds/<version>/release.json)?

@sinnykumari
Contributor

The current plan is not to use builds.json for that, but instead to use a separate JSON document (#98 (comment)) pointing to the release metadata of released builds.

Looking at the JSON layout for the release index at #98 (comment), how does one know which release metadata is the latest one? I'm not sure that having the OSTree version is enough to find that information.

@jlebon
Member Author

jlebon commented Jun 4, 2019

For the individual build release metadata, the diagram currently seems to indicate one per architecture. Is this correct? Or should there be one per build version (e.g., prod/<channel>/builds/<version>/release.json)?

Yeah good point, I think that's indeed what we want so that it's a single file representing the whole build. Amended first comment!

So thinking more on this, in #98 (comment) regarding release.json:

This seems largely redundant with meta.json, so in principle we might skip it. But it could also be a good opportunity to clean up that format a bit.

I think I would agree with this. Let meta.json be the raw cosa output, and release.json only have a filtered (and cross-arch collected) version of it so we don't end up exposing more information than we need to (e.g. offhand, the pkgdiff is not really useful; diffs should be done at a higher level where we actually know what we're releasing).
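
To make the "filtered and cross-arch collected" idea concrete, here is a minimal sketch. It is not the actual release.json format: the `PUBLIC_KEYS` allowlist, the `architectures` key, and the function name are all hypothetical, chosen only to show an allowlist approach that drops fields such as `pkgdiff`.

```python
from typing import Any, Dict

# Hypothetical allowlist of meta.json keys considered worth publishing.
PUBLIC_KEYS = ("images", "ostree-commit")


def build_release(build_id: str, metas: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Collect per-arch meta.json dicts into one filtered release document.

    `metas` maps an architecture name to the parsed meta.json for that arch.
    """
    release: Dict[str, Any] = {"release": build_id, "architectures": {}}
    for arch, meta in sorted(metas.items()):
        # Copy only allowlisted keys; anything else (e.g. pkgdiff) is dropped.
        release["architectures"][arch] = {
            key: meta[key] for key in PUBLIC_KEYS if key in meta
        }
    return release
```

An allowlist (rather than a denylist) means new fields added to meta.json stay private until someone decides to expose them.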

@arithx
Contributor

arithx commented Jun 5, 2019

Is it intended that there is a different bucket location that these artifacts are initially written to and then copied to this structure via plume in the release process?

From discussions w/ @jlebon & @bgilbert we're going to move forward with a single bucket and just modify the object permissions to be publicly readable as part of the release process.
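
A minimal sketch of what that release step could look like, assuming `s3` is a boto3-style S3 client (its `put_object_acl` call takes `Bucket`, `Key`, and `ACL`). The function name and the way keys are passed in are assumptions for illustration, not plume's actual implementation.

```python
from typing import Any, Iterable


def make_public(s3: Any, bucket: str, keys: Iterable[str]) -> int:
    """Apply the public-read canned ACL to each object; return how many were updated."""
    count = 0
    for key in keys:
        # Each object is flipped individually; the bucket policy is untouched.
        s3.put_object_acl(Bucket=bucket, Key=key, ACL="public-read")
        count += 1
    return count
```

Passing the client in as a parameter keeps the sketch testable with a stand-in object and avoids assuming how credentials are configured.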

@bgilbert
Contributor

bgilbert commented Jun 8, 2019

Looking at the JSON layout for the release index at #98 (comment), how does one know which release metadata is the latest one?

Everything except the stream metadata builder should be getting that info from the stream metadata. The metadata builder could look at:

  1. A timestamp for each record, if we added one;
  2. The highest version number; or
  3. The last release in the list, if we always appended releases to the end.

3 seems easiest, and I don't think it has substantial downsides?
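
For comparison, options 2 and 3 could be sketched as below. The release index format lives in #98 and is not reproduced here, so the `releases` list and the `version` field are assumptions, as are the function names.

```python
from typing import Any, Dict, List


def latest_by_position(index: Dict[str, Any]) -> Dict[str, Any]:
    """Option 3: trust append order, so the last entry is the newest release."""
    return index["releases"][-1]


def version_key(version: str) -> List[int]:
    """Split a version such as '30.1234-5' into integers for numeric comparison."""
    return [int(part) for part in version.replace("-", ".").split(".")]


def latest_by_version(index: Dict[str, Any]) -> Dict[str, Any]:
    """Option 2: pick the entry with the highest version number."""
    return max(index["releases"], key=lambda rel: version_key(rel["version"]))
```

Option 3 needs no parsing at all, whereas option 2 has to compare components numerically (a plain string comparison would rank `30.9-1` above `30.10-0`).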

@sinnykumari
Contributor

I like option 3 as well. I don't see any downside as long as we keep the convention that the latest release metadata is always appended to the release index file.

jlebon added a commit to jlebon/fedora-coreos-tracker that referenced this issue Jun 24, 2019
jlebon added a commit to jlebon/fedora-coreos-tracker that referenced this issue Jun 24, 2019
@jlebon
Member Author

jlebon commented Jun 24, 2019

Officially proposed in #208. (Not that we can't make tweaks later on of course).

jlebon added a commit to jlebon/fedora-coreos-tracker that referenced this issue Jun 26, 2019
jlebon added a commit to jlebon/fedora-coreos-tracker that referenced this issue Jul 2, 2019
jlebon added a commit to jlebon/coreos-assembler that referenced this issue Jul 2, 2019
Add a new mode which allows cosa to manipulate multi-arch build layouts:

```
$ find builds
builds
builds/builds.json
builds/30.1
builds/30.1/x86_64
builds/30.1/x86_64/coreos-assembler-config.tar.gz
builds/30.1/x86_64/coreos-assembler-config-git.json
builds/30.1/x86_64/fedora-coreos-30.1-qemu.qcow2
...
```

A pipeline could e.g. dispatch builds for each architecture on different
nodes, then group them back into a single workdir and have it
manipulated by e.g. `buildupload` seamlessly.

This new layout also matches the bucket layout for FCOS (see
coreos/fedora-coreos-tracker#189).

The basic idea is to add a `schema-version` to `builds.json` and denote
the legacy behaviour as "pre-1.0.0", while `1.0.0` contains a different
schema: each element in the `builds` array is now an object, which has
an `id`, and a list of `archs` for which that build has been built:

```
$ cat builds/builds.json
{
    "schema-version": "1.0.0",
    "builds": [
        {
            "id": "30.1",
            "archs": [
                "x86_64"
            ]
        }
    ],
    "timestamp": "2019-06-28T20:50:54Z"
}
```

We retain backwards-compatibility by simply checking the schema version.
Right now, only new workdirs will have this layout. Pipelines which use
`buildprep` will fetch `builds.json` as is and key off of its contents
to determine the bucket layout as well. We can write new code in the
future to convert previously single-arch buckets into the new layout to
then enable multi-arch.
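
The schema check described in this commit message could look roughly like the sketch below. It is not the cosa implementation: the function name is made up, and the legacy layout is assumed to be a list of bare build IDs with no `schema-version` key.

```python
import json
from typing import Dict, List


def parse_builds(text: str) -> Dict[str, List[str]]:
    """Map build ID -> list of arches for either layout of builds.json."""
    data = json.loads(text)
    if "schema-version" not in data:
        # Legacy ("pre-1.0.0") layout. Assumption: entries are bare build IDs
        # and no arch information is recorded, hence the empty lists.
        return {build_id: [] for build_id in data["builds"]}
    if data["schema-version"] != "1.0.0":
        raise ValueError("unsupported schema-version: " + data["schema-version"])
    # 1.0.0 layout: each entry is an object with an `id` and a list of `archs`.
    return {build["id"]: build["archs"] for build in data["builds"]}
```

Keying off the file contents like this lets one tool read both old and new workdirs without a separate flag.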
jlebon added a commit to jlebon/coreos-assembler that referenced this issue Jul 3, 2019
jlebon added a commit to jlebon/coreos-assembler that referenced this issue Jul 3, 2019
jlebon added a commit to jlebon/coreos-assembler that referenced this issue Jul 10, 2019
jlebon added a commit to jlebon/coreos-assembler that referenced this issue Jul 11, 2019
jlebon added a commit to coreos/coreos-assembler that referenced this issue Jul 11, 2019
@bgilbert
Contributor

Discussed OOB: we should move the stream metadata to a separate directory (e.g. /streams/stable.json) so we can document its URL as the canonical location for stream metadata.

jlebon added a commit to jlebon/fedora-coreos-tracker that referenced this issue Jul 12, 2019
jlebon added a commit that referenced this issue Jul 12, 2019
@jlebon
Member Author

jlebon commented Jul 12, 2019

Resolved in #208.

@jlebon jlebon closed this as completed Jul 12, 2019
jlebon added a commit to jlebon/coreos-assembler that referenced this issue Nov 24, 2022
Early in FCOS development, we were still debating whether the S3 objects
should be public or not. There was some initial agreement on uploading
first as private and then make specific builds public during the release
process, but in the end we've just always published in public from the
start.

It's not likely we'll change this anytime soon unless a good reason
comes up. So let's just delete the code here that makes the build
objects public since they already are. It'll live on in git at least if
we ever want to restore it.

Related: coreos/fedora-coreos-tracker#189
jlebon added a commit to jlebon/coreos-assembler that referenced this issue Nov 24, 2022
jlebon added a commit to jlebon/coreos-assembler that referenced this issue Nov 24, 2022
jlebon added a commit to coreos/coreos-assembler that referenced this issue Nov 30, 2022
dustymabe pushed a commit to dustymabe/coreos-assembler that referenced this issue Dec 1, 2022
(cherry picked from commit ef8faa0)
dustymabe pushed a commit to dustymabe/coreos-assembler that referenced this issue Dec 1, 2022
(cherry picked from commit ef8faa0)
dustymabe pushed a commit to dustymabe/coreos-assembler that referenced this issue Dec 1, 2022
(cherry picked from commit ef8faa0)
dustymabe pushed a commit to coreos/coreos-assembler that referenced this issue Dec 2, 2022
(cherry picked from commit ef8faa0)
dustymabe pushed a commit to coreos/coreos-assembler that referenced this issue Dec 2, 2022
(cherry picked from commit ef8faa0)
dustymabe pushed a commit to dustymabe/coreos-assembler that referenced this issue Dec 2, 2022
(cherry picked from commit ef8faa0)
dustymabe pushed a commit to dustymabe/coreos-assembler that referenced this issue Dec 2, 2022
(cherry picked from commit ef8faa0)
jlebon added a commit to coreos/coreos-assembler that referenced this issue Dec 2, 2022
(cherry picked from commit ef8faa0)
jlebon added a commit to coreos/coreos-assembler that referenced this issue Dec 2, 2022
(cherry picked from commit ef8faa0)