Remove cloud specific prefix #37

NielsRogge · 2023-04-25T08:59:25Z

This PR removes the need for a cloud-specific prefix, since the prefix can simply be included in the base path of the configuration.

NielsRogge · 2023-04-25T09:49:17Z

fondant/pipeline_utils.py

+    # existing_pipelines = client.list_pipelines(page_size=100).pipelines
+    # for existing_pipeline in existing_pipelines:
+    #     if existing_pipeline.name == pipeline_name:
+    #         # Delete existing pipeline before uploading
+    #         logger.warning(
+    #             f"Pipeline {pipeline_name} already exists. Deleting old pipeline..."
+    #         )
+    #         client.delete_pipeline_version(existing_pipeline.default_version.id)
+    #         client.delete_pipeline(existing_pipeline.id)


Commenting out this block is a temporary fix so that the pipeline compiles.

RobbeSneyders

Thanks @NielsRogge. LGTM.

GeorgesLorre · 2023-04-25T11:42:36Z

fondant/dataset.py

+        # add subset prefix to columns
+        df = df.rename(
+            columns={
+                col: name + "_" + col for col in df.columns if col not in index_fields


To join all the data into one dataframe, we will need to join on the index, which gets filtered out here.

The index is not filtered out; this operation only renames the non-index columns.
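
To make the point above concrete, here is a minimal pure-Python sketch of the mapping built in the diff. The column names, the subset name `images`, and the helper functions `prefix_mapping` and `apply_rename` are hypothetical and not part of the PR. `apply_rename` only mimics the behaviour of `rename(columns=...)`, which leaves any column without an entry in the mapping unchanged.

```python
def prefix_mapping(columns, name, index_fields):
    """Build the rename mapping from the diff: only non-index columns get an entry."""
    return {col: name + "_" + col for col in columns if col not in index_fields}


def apply_rename(columns, mapping):
    """Mimic rename(columns=mapping): columns without an entry keep their name."""
    return [mapping.get(col, col) for col in columns]


# Hypothetical subset with two index fields and two data columns.
columns = ["id", "source", "width", "height"]
index_fields = ["id", "source"]

mapping = prefix_mapping(columns, "images", index_fields)
renamed = apply_rename(columns, mapping)

print(mapping)  # {'width': 'images_width', 'height': 'images_height'}
print(renamed)  # ['id', 'source', 'images_width', 'images_height']
```

The `if col not in index_fields` clause filters the mapping, not the dataframe, so `id` and `source` remain available for a later join.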

PhilippeMoussalli · 2023-04-25T13:02:44Z

fondant/dataset.py

+        if index is None:
+            index_df = self._load_index()
+            ids = index_df["id"].compute()
+            sources = index_df["source"].compute()


So there is no workaround for having to bring those in memory before filtering?

In order to filter a dataframe based on an index, I assume that you need to have the entire index present.

Correct me if I'm wrong @GeorgesLorre @RobbeSneyders

I would have to look into Dask in more detail before I can give input on this.

PhilippeMoussalli

LGTM. I left a few open questions that we might want to follow up on.

NielsRogge requested review from PhilippeMoussalli and RobbeSneyders April 25, 2023 09:48

NielsRogge commented Apr 25, 2023

RobbeSneyders approved these changes Apr 25, 2023

GeorgesLorre approved these changes Apr 25, 2023

Niels Rogge added 8 commits April 25, 2023 14:35

First draft

90703eb

Rename pipeline

b13acca

Rename pipeline

9d561fe

Add logging

b3000b6

More improvements

889c797

More improvements

af861aa

Fix bug

6a81410

More improvements

be0a2ce

NielsRogge force-pushed the fix_prefix branch from c5ac201 to be0a2ce Compare April 25, 2023 12:38

PhilippeMoussalli reviewed Apr 25, 2023

PhilippeMoussalli approved these changes Apr 25, 2023

Add comment

9e7c721

NielsRogge merged commit 474e1ce into main Apr 25, 2023

RobbeSneyders deleted the fix_prefix branch May 4, 2023 07:34

Hakimovich99 pushed a commit that referenced this pull request Oct 16, 2023

Merge pull request #37 from ml6team/fix_prefix

062bac3

Remove cloud specific prefix
