Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unify the image spec hash function #2593

Merged
merged 30 commits into from
Aug 28, 2024
Merged

unify the image spec hash function #2593

merged 30 commits into from
Aug 28, 2024

Conversation

pingsutw
Copy link
Member

@pingsutw pingsutw commented Jul 21, 2024

Tracking issue

NA

Why are the changes needed?

Problem 1

There are three hash functions in the image spec

  1. __hash__ - used by the lru cache
  2. _calculate_deduped_hash_from_image_spec - used as an identifier in the image config in the serialization context
  3. calculate_hash_from_image_spec - used as an image tag

we should be able to unify 1 and 2

Probelm 2

Local and remote environments are generating different hash values from the same ImageSpec, which causes dynamic tasks to create tasks with the wrong image. Additionally, this discrepancy indirectly causes is_container always to return false, because the hash value is different from _F_IMG_ID

For example:
In flytekit 1.3.14, we added entrypoint to the ImageSpec. If a user registers a workflow with Flytekit 1.3.14 but uses Flytekit 1.12.0 in the ImageSpec, the _F_IMG_ID will always be different from the hash of the ImageSpec. The reason is that we calculate the hash from asdict(image_spec)

The solution is to calculate the hash only from the non-None values in the ImageSpec to ensure the hash is consistent across different flytekit versions.

asdict(spec, dict_factory=lambda x: {k: v for (k, v) in x if v is not None})

What changes were proposed in this pull request?

Add a few methods to ImageSpec

  • id -> It returns a unique hash as identifier of the ImageSpec. it will be used to identify the imageSpec in the ImageConfig or check if the current container image in the pod is built from this image spec in is_container().
  • __hash__ -> it will return the hash of ImageSpec ID
  • tag -> the tag of the image.

Remove

  • _calculate_deduped_hash_from_image_spec
  • calculate_hash_from_image_spec

How was this patch tested?

pyflyte run --remote flyte-example/image_spec_dynamic.py wf

Setup process

from flytekit import ImageSpec, task, dynamic, workflow


new_flytekit = "git+https://github.com/flyteorg/flytekit.git@c81fb9226d76de006bb167a2ed6e1cc8b4c48143"

repro_img = ImageSpec(
    builder="envd",
    packages=["mypy", new_flytekit],
    apt_packages=["git"],
    registry="pingsutw",
)

image_foo = repro_img.with_apt_packages(["curl"])
image_d1 = repro_img.with_apt_packages(["wget"])
image_bar = repro_img.with_apt_packages(["vim"])


if image_foo.is_container():
    print("foo in container.")


# hello
@task(container_image=image_foo)
def foo():
    print("foo")


@task(container_image=image_bar)
def bar():
    print("bar")


@dynamic(container_image=image_d1)
def d1():
    bar()


@workflow
def wf():
    foo()
    d1()

Screenshots

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Related PRs

NA

Docs link

NA

Signed-off-by: Kevin Su <pingsutw@apache.org>
Signed-off-by: Kevin Su <pingsutw@apache.org>
Signed-off-by: Kevin Su <pingsutw@apache.org>
Copy link

codecov bot commented Jul 21, 2024

Codecov Report

Attention: Patch coverage is 87.30159% with 8 lines in your changes missing coverage. Please review.

Project coverage is 81.43%. Comparing base (a8f68d7) to head (888519c).
Report is 13 commits behind head on master.

Files Patch % Lines
flytekit/image_spec/image_spec.py 86.66% 7 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2593      +/-   ##
==========================================
+ Coverage   78.91%   81.43%   +2.51%     
==========================================
  Files         316      281      -35     
  Lines       24965    23424    -1541     
  Branches     4012     4012              
==========================================
- Hits        19702    19076     -626     
+ Misses       4548     3653     -895     
+ Partials      715      695      -20     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Signed-off-by: Kevin Su <pingsutw@apache.org>
Signed-off-by: Kevin Su <pingsutw@apache.org>
Signed-off-by: Kevin Su <pingsutw@apache.org>
Signed-off-by: Kevin Su <pingsutw@apache.org>
Signed-off-by: Kevin Su <pingsutw@apache.org>
Signed-off-by: Kevin Su <pingsutw@apache.org>
Signed-off-by: Kevin Su <pingsutw@apache.org>
Signed-off-by: Kevin Su <pingsutw@apache.org>
Signed-off-by: Kevin Su <pingsutw@apache.org>
flytekit/image_spec/image_spec.py Outdated Show resolved Hide resolved
flytekit/image_spec/image_spec.py Outdated Show resolved Hide resolved
Signed-off-by: Kevin Su <pingsutw@apache.org>
Signed-off-by: Kevin Su <pingsutw@apache.org>
@pingsutw pingsutw marked this pull request as draft July 23, 2024 10:46
Signed-off-by: Kevin Su <pingsutw@apache.org>
Signed-off-by: Kevin Su <pingsutw@apache.org>
Signed-off-by: Kevin Su <pingsutw@apache.org>
Signed-off-by: Kevin Su <pingsutw@apache.org>
Signed-off-by: Kevin Su <pingsutw@apache.org>
Signed-off-by: Kevin Su <pingsutw@apache.org>
Signed-off-by: Kevin Su <pingsutw@apache.org>
Signed-off-by: Kevin Su <pingsutw@apache.org>
Signed-off-by: Kevin Su <pingsutw@apache.org>
Signed-off-by: Kevin Su <pingsutw@apache.org>
Signed-off-by: Kevin Su <pingsutw@apache.org>
Signed-off-by: Kevin Su <pingsutw@apache.org>
Signed-off-by: Kevin Su <pingsutw@apache.org>
@pingsutw pingsutw marked this pull request as ready for review August 24, 2024 09:40
Signed-off-by: Kevin Su <pingsutw@apache.org>
Signed-off-by: Kevin Su <pingsutw@apache.org>
@pingsutw
Copy link
Member Author

@thomasjpfan mind taking a look again, thanks!

Comment on lines 147 to 154
image_spec_dict = asdict(spec, dict_factory=lambda x: {k: v for (k, v) in x if v is not None})
image_spec_bytes = image_spec_dict.__str__().encode("utf-8")
tag = (
base64.urlsafe_b64encode(hashlib.md5(image_spec_bytes).digest())
.decode("ascii")
.rstrip("=")
.replace("-", "_")
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this works:

Suggested change
image_spec_dict = asdict(spec, dict_factory=lambda x: {k: v for (k, v) in x if v is not None})
image_spec_bytes = image_spec_dict.__str__().encode("utf-8")
tag = (
base64.urlsafe_b64encode(hashlib.md5(image_spec_bytes).digest())
.decode("ascii")
.rstrip("=")
.replace("-", "_")
)
tag = spec.id.replace("-", "_")

At this point, can we have .replace("-", "_") in .id ?

Signed-off-by: Kevin Su <pingsutw@apache.org>
@pingsutw pingsutw merged commit eb20eca into master Aug 28, 2024
100 of 101 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants