
metadata-service[orchestrator]: generate connector registry with release candidates #44588

Merged · 7 commits into master · Sep 4, 2024

Conversation

alafanechere (Contributor) commented Aug 23, 2024

What

Closes https://github.com/airbytehq/airbyte-internal-issues/issues/9253

When a metadata.yaml file is uploaded to the GCS bucket under the path <connector-name>/release_candidate/metadata.yaml, we want to:

  • Regenerate connector registries
  • If a connector has a release candidate: on the registry entry of the latest version of the connector, we add a nested releases.releaseCandidates field listing all active release candidates with their corresponding registry entry and rollout configuration.

How

  • For each registry entry generation we check if there's a corresponding release candidate
  • If the current connector version has a release candidate, we create a registry entry with nested releases.releaseCandidates.<rc-version>.rolloutConfiguration and releases.releaseCandidates.<rc-version>.registryEntry fields (see the sketch below)
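
For illustration, here is a sketch of the nested shape such a registry entry would take, written as a plain Python dict (field names come from this PR's description; the authoritative schema lives in the models change):

```python
# Sketch of a registry entry carrying an active release candidate.
# Values are illustrative; the real entries are generated by the orchestrator.
registry_entry = {
    "dockerRepository": "airbyte/source-airtable",
    "dockerImageTag": "4.2.0",  # the current main version
    "releases": {
        "releaseCandidates": {
            "4.3.0": {
                "rolloutConfiguration": {
                    "initialPercentage": 10,
                    "maxPercentage": 100,
                    "advanceDelayMinutes": 60,
                },
                "registryEntry": {
                    # The full registry entry generated for the RC version.
                    "dockerImageTag": "4.3.0",
                },
            }
        }
    },
}
```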

Review guide

  • I suggest first reviewing the models changes to grasp what the registry will look like with RCs
  • Then review all the changes under assets


User Impact

There should be no impact until we release the first RCs.

Registry generation update demo

📺 LOOM DEMO

Case 1: Normal release

Connector state

A new connector version is merged. Its metadata does not have a releases.isReleaseCandidate field.

# metadata.yaml
dockerImageTag: 4.2.0
...
releases:
  isReleaseCandidate: false # default value

airbyte-ci connector publish behavior

  • The connector is published to DockerHub with the 4.2.0 and latest tags.
  • The metadata file is uploaded to the airbyte-cloud-connector-metadata-service bucket at the following paths:
    • Per version: /metadata/airbyte/source-airtable/4.2.0/metadata.yaml
    • Latest: /metadata/airbyte/source-airtable/latest/metadata.yaml

Expected registry state

The registry entry for the connector does not have a releases.releaseCandidates field.
The registry entry points to the 4.2.0 version.

Case 2: Release candidate

Connector state

A new connector version is merged. Its metadata has a releases.isReleaseCandidate field set to true.

# metadata.yaml
dockerImageTag: 4.3.0
...
releases:
  isReleaseCandidate: true
  rolloutConfiguration:
    initialPercentage: 10
    maxPercentage: 100
    advanceDelayMinutes: 60

airbyte-ci connector publish behavior

  • The connector is published to DockerHub with the 4.3.0 tag only. No latest tag is pushed.
  • The metadata file is uploaded to the airbyte-cloud-connector-metadata-service bucket at the following paths:
    • Per version: /metadata/airbyte/source-airtable/4.3.0/metadata.yaml
    • Release candidate: /metadata/airbyte/source-airtable/release_candidate/metadata.yaml

Expected registry state

The registry entry points to the 4.2.0 version.
The registry entry has a releases.releaseCandidates.4.3.0 nested field with the release candidate's rollout configuration and registry entry.

Case 3: Release candidate promotion to main release

Connector state

The connector 4.3.0 is a release candidate.
The rollout orchestrator determined that the release candidate is stable and should be promoted to the main release.
The rollout orchestrator triggers a workflow to promote the release candidate to the main release.

airbyte-ci connector publish behavior

Not yet implemented
We could expose a --promote flag to the airbyte-ci connector publish command to promote a release candidate to the main release.
Example: airbyte-ci connector --name=source-airtable publish --promote
If the connector is a release candidate, it would (see the sketch after this list):

  • Push the latest tag to DockerHub
  • Move the /metadata/airbyte/source-airtable/release_candidate/metadata.yaml to /metadata/airbyte/source-airtable/latest/metadata.yaml
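
A minimal sketch of what this hypothetical promotion could look like, assuming a google-cloud-storage client and the paths above (none of this exists yet; the function and bucket names are illustrative):

```python
from google.cloud import storage

def promote_release_candidate(bucket_name: str, connector: str) -> None:
    """Hypothetical --promote flow: move the RC metadata file to latest/."""
    bucket = storage.Client().bucket(bucket_name)
    rc_path = f"metadata/airbyte/{connector}/release_candidate/metadata.yaml"
    latest_path = f"metadata/airbyte/{connector}/latest/metadata.yaml"
    rc_blob = bucket.blob(rc_path)
    # GCS has no atomic move: copy to latest/, then delete the RC file.
    bucket.copy_blob(rc_blob, bucket, latest_path)
    rc_blob.delete()
    # Pushing the `latest` tag to DockerHub would be a separate airbyte-ci step.
```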

Expected registry state

The registry entry points to the 4.3.0 version.
The registry entry has no releases.releaseCandidates field.

Case 4: Rolling back a release candidate

Connector state

The connector 4.3.0 is a release candidate.
The rollout orchestrator determined that the release candidate is not stable and should be rolled back.

airbyte-ci connector publish behavior

Not yet implemented
We could expose a --rollback flag to the airbyte-ci connector publish command to roll back a release candidate.
Example: airbyte-ci connector --name=source-airtable publish --rollback
If the connector is a release candidate, it would (see the sketch after this list):

  • Delete the 4.3.0 tag from DockerHub
  • Delete the /metadata/airbyte/source-airtable/release_candidate/metadata.yaml and /metadata/airbyte/source-airtable/4.3.0/metadata.yaml from the metadata service
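
Symmetrically, a sketch of the hypothetical rollback (same assumptions as the promotion sketch above):

```python
from google.cloud import storage

def rollback_release_candidate(bucket_name: str, connector: str, rc_version: str) -> None:
    """Hypothetical --rollback flow: delete the RC metadata files."""
    bucket = storage.Client().bucket(bucket_name)
    for path in (
        f"metadata/airbyte/{connector}/release_candidate/metadata.yaml",
        f"metadata/airbyte/{connector}/{rc_version}/metadata.yaml",
    ):
        bucket.blob(path).delete()
    # Deleting the RC tag from DockerHub would be a separate airbyte-ci step.
```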

Expected registry state

The registry entry points to the 4.2.0 version.
The registry entry has no releases.releaseCandidates field.

[TO BE IMPLEMENTED] Case 5: Connector publish attempt when a release candidate is already published

Connector state

The connector 4.3.0 is a release candidate.
A developer attempts to publish a new version 4.4.0.

airbyte-ci connector publish behavior

The command fails with an error message:

Error: A release candidate is already published for the connector. Please rollback the release candidate before publishing a new version.

This is managed via metadata validation: the connector registry is fetched, and if a release candidate is already published, the command fails.

Expected registry state

No registry change, as the publish command failed.
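
A sketch of the validation described above (the registry URL and helper name are assumptions, not actual airbyte-ci code):

```python
import requests

# Assumed location of the published OSS registry.
REGISTRY_URL = "https://connectors.airbyte.com/files/registries/v0/oss_registry.json"

def check_no_active_release_candidate(docker_repository: str) -> None:
    """Fail fast if the connector already has a published release candidate."""
    registry = requests.get(REGISTRY_URL, timeout=30).json()
    for entry in registry.get("sources", []) + registry.get("destinations", []):
        if entry.get("dockerRepository") != docker_repository:
            continue
        if entry.get("releases", {}).get("releaseCandidates"):
            raise RuntimeError(
                "A release candidate is already published for the connector. "
                "Please rollback the release candidate before publishing a new version."
            )
```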



@@ -59,7 +59,7 @@ def _convert_json_to_metrics_dict(jsonl_string: str) -> dict:

@asset(required_resource_keys={"latest_metrics_gcs_blob"}, group_name=GROUP_NAME)
@sentry.instrument_asset_op
def latest_connnector_metrics(context: OpExecutionContext) -> dict:
def latest_connector_metrics(context: OpExecutionContext) -> dict:
alafanechere (author): fixing typo

A reviewer: Oh man, who would have been so careless....

(GIF: a close-up of a stuffed monkey wearing a green shirt and blue jeans)

@@ -102,27 +102,36 @@ def github_metadata_definitions(context):
return Output(metadata_definitions, metadata={"preview": [md.json() for md in metadata_definitions]})


def entry_should_be_on_gcs(metadata_entry: LatestMetadataEntry) -> bool:
alafanechere (author): Readability helper.
We might eventually want to change the stale metadata detection logic to get alerts when an RC does not end up in the registry.

A reviewer: @alafanechere I agree, we definitely want to know if a release didn't start rolling out.

A reviewer: Is this function specifically referring to whether the entry should be in the "latest" bucket?

alafanechere (author): Yes. I'll file a different issue to update the stale metadata detection to consider release candidates.

@alafanechere alafanechere force-pushed the connector-registry-with-release-candidates branch 2 times, most recently from a3cd10d to 948833a on August 23, 2024 13:12
@@ -330,29 +331,49 @@ def persist_registry_entry_to_json(
return file_handle


def generate_registry_entry(
alafanechere (author): I took this out of generate_and_persist_registry to create nested registry entries for release candidates.

Comment on lines 350 to 378
raw_entry_dict = metadata_to_registry_entry(metadata_entry, registry_name)
registry_entry_with_spec = apply_spec_to_registry_entry(raw_entry_dict, spec_cache, registry_name)

_, ConnectorModel = get_connector_type_from_registry_entry(registry_entry_with_spec)

registry_model = ConnectorModel.parse_obj(registry_entry_with_spec)
alafanechere (author): Moved to def generate_registry_entry

@alafanechere alafanechere marked this pull request as ready for review August 23, 2024 13:18
@alafanechere alafanechere requested a review from a team as a code owner August 23, 2024 13:18
Comment on lines 626 to 677
metadata_entry: Optional[LatestMetadataEntry],
release_candidate_metadata_entries: Optional[List[LatestMetadataEntry]],
alafanechere (author):

Here:

  • metadata_entry corresponds to a metadata entry partition
  • release_candidate_metadata_entries to an asset with all RC metadata

Matching a metadata entry to its release candidate metadata file is done in find_release_candidate_for_metadata_entry.

@bnchrch I'm not 💯 sure this is the optimal approach if release_candidate_metadata_entries keeps growing.
Would there be a mechanism, leveraging partitions, which could make this registry_entry function take a metadata_entry and a release_candidate_metadata_entry partition? In other words, the matching would happen elsewhere, on partition generation.
Put differently: I'm not sure what happens if an asset takes inputs from two different partition definitions.

bnchrch:

Hmm yeah, so this is awkward because (I think) if:

  1. An RC metadata file gets removed or added,
  2. then all registry entries are reprocessed.

(Can you confirm I'm right about that?)

If I am, then we have three options to fix it:

  1. We create multiple partition keys, so that the registry entry partition key is a composite of the latest/version etag and the RC etag. This will be messy and complex; I don't recommend it.
  2. On new RC metadata, after processing, we delete the partition key for its corresponding latest entry to force a reprocess. This also feels wrong.
  3. We hoist the release candidate processing to the registry level, since we want that regenerated on each new RC anyway.

I think #3 makes the most sense.

alafanechere (author):

Oh yes, you are right. The AutoMaterialization leads to re-generation of all entries when release_candidate_metadata_entries is re-materialized... This is definitely not what we want.

> We hoist the release candidate processing to the registry level

I'm not sure I get what you mean there. Aren't we already at the registry level, as this asset generates registry entries? You might mean to do it when we aggregate per-connector registry entries into the global registry. That indeed sounds possible. Will do it.

@alafanechere alafanechere force-pushed the connector-registry-with-release-candidates branch 4 times, most recently from cfb9947 to 964e915 on August 26, 2024 14:53
alafanechere (author):

I'm breaking the idempotency guarantee Dagster provides here by creating unique run keys.
Dagster's default behavior is that a RunRequest with an already-seen run key is skipped.
I had to work around that because I want refresh_release_candidate_metadata_entries to be rematerialized on metadata file deletion. Metadata deletion will lead to a different cursor value, but it will most likely be a cursor value that has been seen in the past: the run key will match an existing one and the RunRequest will not happen.
@bnchrch let me know if you have a different approach to suggest. This one feels slightly hacky.

clnoll:

> but will always likely be cursor value which has been seen in the past

Can you elaborate on this? I'm sure you're right but I'm not seeing why this would be.

bnchrch:

I believe you've got it right.

It does feel hacky, but it seems to be exactly what Dagster does for scheduled runs (e.g. append a timestamp to ensure it always runs) and I don't see a way around this with sensors.

And I feel protected since we skip when they are the same.

But because it feels hacky, let's:

  1. use a better name for our flag to explain the intent (e.g. allow_duplicate_runs, allow_previous_runs, etc.)
  2. add a comment explaining what / why we do this.

alafanechere (author) commented Aug 28, 2024:

> Can you elaborate on this? I'm sure you're right but I'm not seeing why this would be.

@clnoll:

  • Our sensor for checking whether a GCS path has changed compares a cursor value, which is a hash of the list of blob etags stored in the Dagster context, to a hash of the list of blobs currently on GCS.
  • If the hash value changes, the cursor changes and the sensor returns a RunRequest.
  • A run_key is passed to the RunRequest. This run_key is set to the cursor value.
  • Dagster does not propagate a RunRequest if it has seen the same run_key value previously.
  • Let's say the sensor is computed for a list of blobs before an upload and the cursor value is A → run_key="A".
  • Then we upload a new blob: the cursor value is B → run_key="B". The sensor returns the RunRequest as the cursor changed, and Dagster schedules the run request as it's a new run_key value.
  • If we then delete the previously uploaded blob, the cursor value will be A again (as the hash of the list of blobs is the same as before the upload): cursor value A → run_key="A". The sensor returns the RunRequest as the cursor changed from B to A, but as Dagster has already scheduled a run with run_key "A", it skips it. (See the sketch below.)
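
A compact sketch of the mechanism described in this thread (illustrative names, not the actual sensor code):

```python
import hashlib
import time
from typing import List, Optional

def compute_run_key(blob_etags: List[str], previous_cursor: Optional[str]) -> Optional[str]:
    """Return a run key when the GCS blob list changed, None otherwise."""
    cursor = hashlib.sha256("".join(sorted(blob_etags)).encode()).hexdigest()
    if cursor == previous_cursor:
        return None  # nothing changed: no RunRequest
    # With run_key=cursor, deleting a blob can restore a previously seen hash,
    # and Dagster skips RunRequests whose run_key it has already processed.
    # Appending a timestamp makes every run_key unique, so the deletion still
    # triggers a rematerialization (the workaround discussed above).
    return f"{cursor}:{time.time()}"
```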

clnoll left a comment:

A few comments & questions @alafanechere. I think I could use a synchronous conversation about this one, too, to make sure I get the gist.

patternProperties:
"^\\d+\\.\\d+\\.\\d+$":
$ref: "#/definitions/VersionBreakingChange"
VersionBreakingChange:
clnoll: This corresponds to an existing breaking changes object, right? If so I think we need to be able to share the code instead of duplicating it. It feels too critical to risk letting them get out of sync.

alafanechere (author): @clnoll Oh yes that's true. Thanks for spotting it. Will create a separate single definition.

alafanechere (author): @clnoll fixed in 2393b69

resources_def=METADATA_RESOURCE_TREE,
gcs_blobs_resource_key="release_candidate_metadata_file_blobs",
interval=60,
unique_run_key=True,
A reviewer: Just curious - why do we want a unique run key here when we don't need it for the other sensors?

alafanechere (author): Because we want asset rematerialization to happen on deletion of a blob.
Cf. #44588 (comment)

]

SCHEDULES = [
ScheduleDefinition(job=add_new_metadata_partitions, cron_schedule="*/2 * * * *", tags={"dagster/priority": HIGH_QUEUE_PRIORITY}),
ScheduleDefinition(
cron_schedule="0 1 * * *", # Daily at 1am US/Pacific
cron_schedule="*/2 * * * *", # Every 2 minutes
A reviewer: What's the reason for this change?


if metadata_entry.metadata_definition.data.supportLevel == "archived":
return False
if getattr(metadata_entry.metadata_definition.releases, "isReleaseCandidate", False):
return False
clnoll: Is this accurate? I think the point of truth about whether an entry is still being rolled out (and therefore shouldn't be in the latest bucket) is whether there is an RC in the RC registry.

alafanechere (author): @clnoll the stale metadata detection compares metadata on master to metadata in the latest directory.
If we have a release candidate on master it will not be in latest until rollout finalization.
This is why I ignored them.
It's not great, but it should not lead to false positives until we rework this stale metadata detection logic.

clnoll: Okay cool, would you mind adding a comment here?
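
For instance, a commented sketch of the function reconstructed from the snippets in this diff (the grace-period check present in the real code is elided; LatestMetadataEntry is the orchestrator's model):

```python
def entry_should_be_on_gcs(metadata_entry: "LatestMetadataEntry") -> bool:
    """Whether this metadata entry is expected in the `latest` GCS directory.

    Stale metadata detection compares metadata on master to metadata in the
    latest directory, so entries that legitimately never reach latest/ must
    be excluded here.
    """
    # Archived connectors are not published to latest/.
    if metadata_entry.metadata_definition.data.supportLevel == "archived":
        return False
    # Release candidates live under release_candidate/, not latest/, until
    # rollout finalization, so they must not be flagged as stale.
    if getattr(metadata_entry.metadata_definition.releases, "isReleaseCandidate", False):
        return False
    return True
```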


@@ -23,6 +23,13 @@
partitions_def=registry_entry.metadata_partitions_def,
)

release_candidate_metadata_entries_inclusive = AssetSelection.keys("release_candidate_metadata_entries").upstream()
clnoll: What is the "inclusive" referring to here?

alafanechere (author): @clnoll I think it's "inclusive" in the sense that .upstream means "collect all assets upstream of the listed keys". @bnchrch am I correct?

bnchrch commented Aug 27, 2024:

@alafanechere I'm worried about case 5:

> Case 5: Connector publish attempt when a release candidate is already published

If I'm reading it right, the DX is that after merge, my connector publish will fail if a release candidate already exists.

Perhaps it's better if we just overwrite and have the platform reset the rollout process?

raw_entry_dict = metadata_to_registry_entry(metadata_entry, registry_name)
registry_entry_with_spec = apply_spec_to_registry_entry(raw_entry_dict, spec_cache, registry_name)

if release_candidate_metadata_entries:
bnchrch: 💅 Since we're in the spirit of helper functions for readability, I would recommend we do the same here,

e.g. def apply_release_candidates

bnchrch: Whoops, old comment.


@alafanechere alafanechere force-pushed the connector-registry-with-release-candidates branch 2 times, most recently from a5abe34 to 36376db on August 28, 2024 18:33
alafanechere (author):

> @alafanechere Im worried about case 5
>
> Case 5: Connector publish attempt when a release candidate is already published
>
> If im reading it right then the DX is after merge my connector publish will fail if a release candidate already exists.
>
> Perhaps its better if we just overwrite and have the platform reset the rollout process?

@bnchrch I'll discuss this with @clnoll. It's out of the scope of this PR, as this logic lives in airbyte-ci. Overwriting could indeed be safe, but not until the temporal worker has correct logic to stop a rollout and start a new one. I feel that it's 🆗 to think about this later.

group_name=GROUP_NAME,
)
@sentry.instrument_asset_op
def persisted_oss_registry(context: OpExecutionContext, latest_connnector_metrics: dict) -> Output[ConnectorRegistryV0]:
def persisted_oss_registry(
alafanechere (author):

@bnchrch I reworked this function signature to declare input assets separately instead of fetching per-registry-entry blobs inside the persisted_oss_registry asset.

generate_and_persist_registry matches latest registry entries with RC entries + metadata.
So the releaseCandidates key is added only on the global registry, not on per-connector registry entries.

This implements this suggestion:

> We hoist the release candidate processing to the registry level, since we want that regenerated on each new RC anyway.

So now the logic is:

  • The sensor on per-connector registry entry blob change triggers the generate_cloud_registry asset job.
  • If I'm not mistaken, this job's asset selection means that all the assets upstream of persisted_oss_registry will get rematerialized by the job.
  • As persisted_oss_registry depends on the new assets I declared, it should get freshly rematerialized assets listing the existing release candidates (a sketch of the wiring follows).
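
Roughly, the reworked asset could be wired like this (a sketch: parameter and group names are taken from the diff and discussion and may not match the final code):

```python
from typing import List

from dagster import OpExecutionContext, Output, asset

@asset(group_name="registry")  # group name assumed for this sketch
def persisted_oss_registry(
    context: OpExecutionContext,
    latest_connector_metrics: dict,
    latest_oss_registry_entries: List[dict],
    release_candidate_oss_registry_entries: List[dict],
) -> Output[dict]:
    # Declaring the upstream assets as inputs means the registry job
    # rematerializes them first, so the registry always sees fresh RC entries.
    registry = {"sources": [], "destinations": []}  # placeholder aggregation
    return Output(registry)
```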




def persisted_oss_registry(context: OpExecutionContext, latest_connnector_metrics: dict) -> Output[ConnectorRegistryV0]:
def persisted_oss_registry(
context: OpExecutionContext,
latest_connector_metrics: dict,
A reviewer: What do we expect to be in latest_connector_metrics?

alafanechere (author): It's some metrics fetched from the data warehouse which end up feeding fields related to usage metrics and sync success.

@@ -578,14 +600,55 @@ def metadata_entry(context: OpExecutionContext) -> Output[Optional[LatestMetadat
return Output(value=metadata_entry, metadata=dagster_metadata)


def find_release_candidates_for_metadata_entry(
A reviewer: Looks like this isn't used?

alafanechere (author): Thanks for catching this. It's not used since my latest refactoring.

@alafanechere alafanechere force-pushed the connector-registry-with-release-candidates branch from 36376db to 020bca9 on August 29, 2024 16:43
@alafanechere alafanechere force-pushed the connector-registry-with-release-candidates branch from 020bca9 to 69bc834 on August 29, 2024 17:40
bnchrch left a comment:

Just a few questions and small stylistic changes! I imagine this will be ready on the next review!

Though I still have a few minor-to-medium concerns.

The DX of releasing subsequent release candidates:

I think the system should plan to allow automatic rollouts of minor and patch RC versions without manual intervention by a dev.

e.g. if I'm trying to release 4.1.1-rc over 4.1.0-rc, our system should resolve that without the dev having to run any special command.

We are no longer a DAG

I'm still moderately concerned about our release DAG no longer being directed or acyclic, as the platform now has to call back to GitHub/Dagster to remove a metadata file.

I think this structure will hurt more than we realize, and I'm not certain gaining a more stable OSS connector release is worth it.

if getattr(metadata_entry.metadata_definition.releases, "isReleaseCandidate", False):
return False
if (
datetime.datetime.strptime(metadata_entry.last_modified, "%a, %d %b %Y %H:%M:%S %Z").replace(tzinfo=datetime.timezone.utc)
bnchrch: 💅 Our grace period calculation should live in its own function with some docs.
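
One possible extraction, as a sketch (the function name and grace-period value are assumptions):

```python
import datetime

PUBLISH_GRACE_PERIOD = datetime.timedelta(days=1)  # assumed duration

def is_within_publish_grace_period(last_modified: str) -> bool:
    """Whether a metadata entry was modified within the publish grace period.

    `last_modified` is an HTTP-date string such as
    'Fri, 23 Aug 2024 13:12:00 GMT', matching the snippet above.
    """
    modified_at = datetime.datetime.strptime(
        last_modified, "%a, %d %b %Y %H:%M:%S %Z"
    ).replace(tzinfo=datetime.timezone.utc)
    return modified_at > datetime.datetime.now(datetime.timezone.utc) - PUBLISH_GRACE_PERIOD
```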

@@ -102,27 +102,41 @@ def github_metadata_definitions(context):
return Output(metadata_definitions, metadata={"preview": [md.json() for md in metadata_definitions]})


def entry_should_be_on_gcs(metadata_entry: LatestMetadataEntry) -> bool:
bnchrch: 📚 A docstring explaining the why would be useful here!

Comment on lines 117 to 147
release_candidate_registry_entries: List,
release_candidate_metadata_entries: List,
bnchrch: ❓ Why do we have both RC registry entries and metadata?

alafanechere (author): The RC metadata entry contains the rolloutConfiguration, which is not defined on the main version's registry entry.
RC registry entries do not have the rolloutConfiguration, so we read it from the metadata entry.

bnchrch: Ah ok. Why don't we instead add rolloutConfiguration to the Registry Entry?

enriched_registry_entry_dict = apply_metrics_to_registry_entry(registry_entry_dict, connector_type, latest_connector_metrics)
if (
latest_registry_entry.dockerRepository in docker_repository_to_rc_metadata_entry
and latest_registry_entry.dockerRepository in docker_repository_to_rc_registry_entry
bnchrch: 💅 📚 I think we'd benefit from an apply_release_candidate_entries function to hold this logic

registry_entry_dict = to_json_sanitized_dict(latest_registry_entry)
enriched_registry_entry_dict = apply_metrics_to_registry_entry(registry_entry_dict, connector_type, latest_connector_metrics)
if (
latest_registry_entry.dockerRepository in docker_repository_to_rc_metadata_entry
bnchrch: ❓ If I'm reading this right, we're trying to catch whether the metadata has been deleted? Because the registry entry persists?

I think we should instead make sure that when metadata files are deleted, so are registry entry files.

alafanechere (author): No, this logic is not meant to handle deletion.

  1. We create a mapping between connectors and their existing RC registry and metadata entries.
  2. We iterate on the latest registry entries and find connectors which have an RC.
  3. When an RC is found, we modify the latest registry entry to add a releases.releaseCandidates.<rc-version> field. The value of this field is a VersionReleaseCandidate object with two properties: rolloutConfiguration (coming from the RC metadata entry) and registryEntry (coming from the RC registry entry). The idea is to nest the RC entry under the latest registry entry, as sketched below.
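
A sketch of that nesting step (the helper name comes from the review thread; the exact signature is assumed):

```python
from typing import Dict

def apply_release_candidates(
    latest_registry_entry: Dict,
    rc_registry_entry: Dict,
    rc_rollout_configuration: Dict,
) -> Dict:
    """Nest an RC's registry entry and rollout config under the latest entry."""
    rc_version = rc_registry_entry["dockerImageTag"]
    releases = latest_registry_entry.setdefault("releases", {})
    # Each RC version maps to a VersionReleaseCandidate-like object combining
    # the rolloutConfiguration (from the RC metadata entry) and the full RC
    # registry entry.
    releases.setdefault("releaseCandidates", {})[rc_version] = {
        "rolloutConfiguration": rc_rollout_configuration,
        "registryEntry": rc_registry_entry,
    }
    return latest_registry_entry
```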

bnchrch: Ah ok! Sorry, definitely misread.


@asset(required_resource_keys={"latest_cloud_registry_entries_file_blobs"}, group_name=GROUP_NAME)
@sentry.instrument_asset_op
def latest_cloud_registry_entries(context: OpExecutionContext) -> Output[List]:
bnchrch: 💎 Well done

alafanechere (author) commented Sep 2, 2024:

@bnchrch

> The DX of releasing subsequent release candidates:
> I think the system should plan to allow automatic rollouts of minor and patch RC versions without manual intervention by a dev. e.g. if im trying to release 4.1.1-rc over 4.1.0-rc our system should resolve that without the dev having to run any special command.

The current manual intervention a dev would take is to set the isReleaseCandidate field to true in the connector metadata to make it use gradual rollout. Our plan is to make this field default to true once we consider the gradual rollout system stable.
So no manual intervention should be required by a dev, eventually.
We might want to set this field to false to bypass gradual rollout if needed.

> We are no longer a DAG. Im still moderately concerned about our release DAG, no longer being a Directed or acyclic as the platform now has to call back to github/dagster to remove a metadata file. I think this structure will hurt more than we realize and Im not certain gaining a more stable oss connector release is worth it.

Can you elaborate on this intuition ("I think this structure will hurt more than we realize")?

Here are some principles we decided to follow:

  • The registry is the source of truth for which version is the main one and which ones are RCs. We think it's important for observability. We could "just" make the cloud platform consider any new connector version an RC and put it under gradual rollout, but it'd mean relying on the platform DB to know which is the main version and which are RCs. Our registry has been the source of truth so far; I think it makes sense to keep it that way.
  • So if the registry lists RCs, it has to get a signal when an RC version becomes a main version or is rolled back: what we've called finalization.
  • Finalization for an RC promotion to main version means:
    • Moving the GCS-hosted RC metadata file to the latest folder. It'll automatically retrigger registry materialization according to this PR's change. This "move RC to latest" operation will not be handled by Dagster; it'll be performed by airbyte-ci.
    • Publishing the :latest tag to DockerHub (performed by airbyte-ci).
  • Finalization for an RC rollback means:
    • Deleting the RC metadata file in GCS.

Finalization is implemented in this PR stack #44876.
So the platform does not directly interact with airbyte-ci, GitHub or Dagster: it'll call a webhook which will trigger a GHA workflow which will call airbyte-ci.
I'm not sure if this explanation addresses your concerns :)

@alafanechere alafanechere force-pushed the connector-registry-with-release-candidates branch from c188bd2 to a95f522 on September 2, 2024 13:41
@alafanechere alafanechere force-pushed the connector-registry-with-release-candidates branch from 34c97f1 to a0250f2 on September 4, 2024 08:07
alafanechere (author): @bnchrch I reworked the models and registry generation logic to remove the dependency of the global registry generation on RC metadata.

  • rolloutConfiguration is now set in a registry entry under releases.rolloutConfiguration

bnchrch left a comment:

Let's release this candidate!

Approved!

(GIF: a bunch of hounds in the back of a car)

@@ -1,6 +1,6 @@
[tool.poetry]
name = "orchestrator"
version = "0.4.1"
bnchrch: 💅 We should run poetry update metadata-service to update the version in the lock file.

IIRC there's an issue with dagster deploy and caching if we don't do this...

breaking_change, documentation_url, version
)
final_registry_releases["breakingChanges"] = breaking_changes
if metadata.get("releases", {}).get("rolloutConfiguration"):
bnchrch: 💅 white space good!

Suggested change
if metadata.get("releases", {}).get("rolloutConfiguration"):
if metadata.get("releases", {}).get("rolloutConfiguration"):

@alafanechere alafanechere force-pushed the connector-registry-with-release-candidates branch from a0250f2 to 99635d5 on September 4, 2024 14:51
@alafanechere alafanechere merged commit ee698d6 into master Sep 4, 2024
32 checks passed
@alafanechere alafanechere deleted the connector-registry-with-release-candidates branch September 4, 2024 15:03