
Usage with Fast.ai

Severin Ibarluzea edited this page Apr 10, 2020 · 6 revisions

Universal Data Tool files can be loaded into fast.ai (the v1 API, which provides ImageDataBunch) with a few lines of pandas. First, export your project to a *.udt.csv file.

Download Image URLs and Load Them into an ImageDataBunch

This example is for an image_classification project.

from fastai import *
from fastai.vision import *
from pathlib import Path
import pandas as pd

df = pd.read_csv("./myfile.udt.csv")

# Keep only the sample rows and renumber them 0, 1, 2, ... so that row i
# corresponds to the i-th URL written to images.txt
samples = df[df["path"].str.contains("samples", na=False)].reset_index(drop=True)

url_list = [str(a) for a in samples["imageUrl"]]
with open("./images.txt", "w") as f:
    f.write("\n".join(url_list))

# Download all the sample images into an "images" directory
download_images("./images.txt", "./images", max_pics=10000)

# Name each label's image after its position in images.txt: 00000000.jpg, 00000001.jpg, ...
labels = pd.DataFrame(data={
    "image": samples.index.astype("str").str.zfill(8) + ".jpg",
    "output": samples["output"]
})

# OPTIONAL: Delete bad images, then drop labels whose image is missing
verify_images("./images", delete=True)
labels = labels[[(Path("./images") / img).exists() for img in labels["image"]]]

# Create ImageDataBunch
data = ImageDataBunch.from_df(".", labels, folder="images", seed=42,
                              label_col="output", bs=4, size=224,
                              ds_tfms=get_transforms()).normalize(imagenet_stats)
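The snippet above relies on download_images naming each file after its position in images.txt, so each label has to be matched by its position among the sample rows, not by its row number in the whole CSV. The plain-Python sketch below shows that pairing and the missing-file filter in isolation. It assumes, as the snippet above does, that every image is saved as an 8-digit, zero-padded index plus ".jpg"; `pair_labels` and `keep_downloaded` are hypothetical helpers, not fast.ai functions.

```python
from pathlib import Path
from typing import Iterable, List, Tuple


def pair_labels(labels: Iterable[str]) -> List[Tuple[str, str]]:
    """Pair each sample label with the filename assumed for its download.

    Assumption (mirrors the snippet above): the i-th URL in images.txt,
    counting from 0, is saved as f"{i:08d}.jpg". The position among the
    sample rows, not the row number in the full CSV, decides the name.
    """
    return [(f"{i:08d}.jpg", label) for i, label in enumerate(labels)]


def keep_downloaded(pairs: Iterable[Tuple[str, str]], folder: Path) -> List[Tuple[str, str]]:
    """Drop pairs whose image file does not exist in `folder`
    (e.g. failed downloads or files removed by verify_images)."""
    return [(name, label) for name, label in pairs if (folder / name).exists()]
```

Counting positions within the sample rows keeps the pairing correct even if the CSV contains non-sample rows before or between the samples.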

ImageDataBunch from Already-Downloaded Files

If your images are already downloaded, copy them all into a single "images" directory on your GPU machine, then run code similar to the code below. The example is for image classification, but the same steps can be adapted to other image tasks.
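The code below matches rows to images by filename alone, so the "images" directory needs to be flat and every filename in it unique. If your images are spread across several folders, a minimal sketch like this one can gather them. `gather_images` and the extension set are assumptions for illustration, not part of fast.ai or the Universal Data Tool; it raises an error instead of silently overwriting when two files share a name.

```python
import shutil
from pathlib import Path
from typing import Iterable

# Assumed set of image extensions (compared case-insensitively); adjust for your data.
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def gather_images(sources: Iterable[Path], dest: Path) -> int:
    """Copy every image file found under `sources` into the flat directory `dest`.

    Raises FileExistsError if a file with the same name is already in `dest`,
    because rows matched by filename would then be ambiguous.
    Returns the number of files copied.
    """
    dest.mkdir(parents=True, exist_ok=True)
    copied = 0
    for src in sources:
        # Materialize the listing first so files copied during the loop are not revisited
        for f in sorted(Path(src).rglob("*")):
            if not f.is_file() or f.suffix.lower() not in IMAGE_EXTS:
                continue
            target = dest / f.name
            if target.exists():
                raise FileExistsError(f"duplicate filename: {f.name}")
            shutil.copy2(f, target)
            copied += 1
    return copied
```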

from fastai import *
from fastai.vision import *
import pandas as pd

df = pd.read_csv("./myfile.udt.csv")

# Create a filename column holding the last part of each image URL (rows
# without an imageUrl, such as non-sample rows, get NaN), then keep only
# the filename and label columns
df["filename"] = [a.split("/")[-1] if isinstance(a, str) else a for a in df["imageUrl"]]
df = df[["filename", "output.classification"]]

# Remove invalid entries: rows with no image file or no label
df = df.dropna(subset=["filename", "output.classification"]).reset_index(drop=True)

# Filenames are read from the first column (fn_col=0 by default)
data = ImageDataBunch.from_df(".", df, folder="images", seed=42,
                              label_col="output.classification", bs=4, size=224,
                              ds_tfms=get_transforms()).normalize(imagenet_stats)
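`a.split("/")[-1]` keeps everything after the last slash, including any query string or fragment (for example `cat.jpg?token=abc` from a signed URL), so such rows would not match the files on disk. Assuming your files were saved under the plain path name, a stricter helper using only the standard library might look like this (`url_to_filename` is hypothetical):

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse


def url_to_filename(url: object) -> Optional[str]:
    """Return the last segment of the URL's path, ignoring any query string
    or fragment. Non-string values (e.g. NaN for non-sample rows) and URLs
    with an empty path return None."""
    if not isinstance(url, str) or not url:
        return None
    name = PurePosixPath(urlparse(url).path).name
    return name or None
```

It can stand in for the list comprehension as `df["filename"] = df["imageUrl"].map(url_to_filename)`; rows without a URL get None, which pandas treats as a missing value.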

Now your databunch is ready for some learning!