Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add image embedding component #148

Merged
merged 6 commits into from
May 22, 2023
Merged

Conversation

PhilippeMoussalli
Copy link
Contributor

PR that adds the image embedding component. Largely inspired by Niel's PR #111 (inference and batching with dask).

@PhilippeMoussalli PhilippeMoussalli added the Components Implementation of components label May 17, 2023
@PhilippeMoussalli PhilippeMoussalli self-assigned this May 17, 2023
@PhilippeMoussalli PhilippeMoussalli linked an issue May 17, 2023 that may be closed by this pull request
@PhilippeMoussalli PhilippeMoussalli added this to the Alpha milestone May 17, 2023
Base automatically changed from update-schema to main May 22, 2023 06:32
@RobbeSneyders
Copy link
Member

Can you rebase this one @PhilippeMoussalli?

@RobbeSneyders RobbeSneyders changed the title Component/image embedding Add image embedding component May 22, 2023
@PhilippeMoussalli
Copy link
Contributor Author

Can you rebase this one @PhilippeMoussalli?

Done

Copy link
Contributor

@ChristiaensBert ChristiaensBert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Small comments, mainly about docstrings and other small stuff



@dask.delayed
def load(example):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doc string?



@dask.delayed
def transform(image, processor, device):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doc string?

embeddings_df = delayed_series.to_frame(name="embeddings_data")

# add index columns
embeddings_df["id"] = dataframe["id"].reset_index(drop=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it needed to reset_index() separately on all the columns individually?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point I don't think it's needed


args:
model_id:
description: Model id on the Hugging Face hub
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we put a default/example value here too? Some people might not know where to look to find the correct CLIP Model ID

@@ -51,7 +51,7 @@ class Type(enum.Enum):

int8_list = pa.list_(pa.int8())

float16_list = pa.list_(pa.float16())
float32_list = pa.list_(pa.float32())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we only supporting float32_list then or do we also need float16 in some cases?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for now we only need that one, this is a temporary fix we're planning on handling nested data types differently at a later stage

@@ -31,7 +39,8 @@
},
)

write_to_hub_op = FondantComponentOp(

write_to_hub_op = ComponentOp(
component_spec_path="components/write_to_hub/fondant_component.yaml",
arguments={
"username": "philippemo",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this username stay your name? Or something like 'demo-pipeline'?

@PhilippeMoussalli PhilippeMoussalli merged commit 59d9f1c into main May 22, 2023
@PhilippeMoussalli PhilippeMoussalli deleted the component/image-embedding branch May 22, 2023 09:48
j-branigan pushed a commit that referenced this pull request May 22, 2023
PR that adds the image embedding component. Largely inspired by Niel's
PR #111 (inference and batching with dask).
Hakimovich99 pushed a commit that referenced this pull request Oct 16, 2023
PR that adds the image embedding component. Largely inspired by Niel's
PR #111 (inference and batching with dask).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Components Implementation of components
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Image embedding component
3 participants