Usage with fast.ai

CedricJean edited this page May 22, 2020 · 1 revision

Universal Data Tool files integrate easily with the fast.ai libraries. First, export your project to a *.udt.csv file.

Download the URLs and load them into an ImageDataBunch

For image_classification:

from fastai import *
from fastai.vision import *

df = pd.read_csv("./myfile.udt.csv")
samples = df[df.path.str.contains("samples")]

url_list = [str(a) for a in list(samples["imageUrl"])]
# Write one URL per line; the "with" block closes the file before it is read back
with open("./images.txt", "w") as f:
    f.write("\n".join(url_list))

# Download all the files into the "images" directory
download_images("./images.txt", "./images", max_pics = 10000)

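# Build the labels from the same rows the URLs came from ("samples"), so that
# the n-th label lines up with the n-th downloaded file (assumed to be a .jpg)
labels = pd.DataFrame(data={
    "image": [f"{i:08d}.jpg" for i in range(len(samples))],
    "output": samples["output"].values
})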

# OPTIONAL: Delete corrupt images and drop labels whose image is missing
verify_images("./images", delete=True)
missing_images = [img for img in labels["image"] if not (Path("./images") / img).exists()]
labels = labels[~labels["image"].isin(missing_images)]

# Create the ImageDataBunch
data = ImageDataBunch.from_df(".", labels, folder="images", seed=42,
                              label_col="output", bs=4, size=224,
                              ds_tfms=get_transforms()).normalize(imagenet_stats)
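
The labels above only match the files on disk if the generated names agree with what download_images wrote. The following is a minimal sketch of that naming scheme using only the standard library. It assumes each file is named after its zero-based position in the URL list, zero-padded to eight digits, as the labels code implies. That the extension follows the URL (falling back to .jpg) is an assumption to check against your installed fastai version. expected_filename and the example URLs are hypothetical.

```python
import os
from urllib.parse import urlparse


def expected_filename(position: int, url: str, default_ext: str = ".jpg") -> str:
    """Hypothetical helper: guess the on-disk name of the image at `position`."""
    # Take the extension from the URL path, ignoring any query string.
    ext = os.path.splitext(urlparse(url).path)[1]
    # Zero-pad the position to eight digits, e.g. 12 -> "00000012".
    return f"{position:08d}{ext or default_ext}"


# Placeholder URLs, for illustration only.
urls = [
    "https://example.com/cats/1.jpg",
    "https://example.com/dogs/2.png?size=large",
    "https://example.com/image-without-extension",
]
names = [expected_filename(i, u) for i, u in enumerate(urls)]
print(names)
```

If your URLs point at non-JPEG files, comparing these names with the contents of the "images" directory is a quick way to see whether the hard-coded ".jpg" suffix in the labels needs adjusting.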

ImageDataBunch from uploaded files

Move all your files into an "images" directory on the GPU machine, then run code similar to the example below. The example is for image classification, but the steps should adapt to other image tasks.

from fastai import *
from fastai.vision import *

df = pd.read_csv("./myfile.udt.csv")

# Create a filename column with just the filenames, remove the other columns
df["filename"] = [a.split("/")[-1] if isinstance(a,str) else a for a in list(df["imageUrl"])]
df = df[["filename", "output.classification"]]

# Remove any entries without a filename (rows whose imageUrl was missing)
df = df.dropna(subset=["filename"]).reset_index(drop=True)

data = ImageDataBunch.from_df(".", df, folder="./images", seed=42,
                              label_col="output.classification", bs=4, size=224,
                              ds_tfms=get_transforms()).normalize(imagenet_stats)
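
The filename step above splits each URL on "/", which keeps any query string (for example "?token=...") as part of the name. The following is a small standard-library sketch of a stricter variant, assuming the uploaded files are named after the last path segment of their URL. filename_from_url and the sample rows are hypothetical and not part of Universal Data Tool or fast.ai.

```python
import posixpath
from urllib.parse import urlparse


def filename_from_url(url):
    """Hypothetical helper: return the last path segment of a URL, or None."""
    if not isinstance(url, str) or not url:
        return None  # missing values (e.g. NaN) are not strings
    # urlparse separates the path from the query string and fragment.
    name = posixpath.basename(urlparse(url).path)
    return name or None


# Placeholder (url, label) rows, for illustration only.
rows = [
    ("https://example.com/images/cat.jpg?token=abc", "cat"),
    (float("nan"), "dog"),
    ("https://example.com/images/dog.png", "dog"),
]
cleaned = [(filename_from_url(url), label) for url, label in rows]
# Keep only the rows for which a filename could be derived.
cleaned = [(name, label) for name, label in cleaned if name is not None]
print(cleaned)
```

With pandas, the same function could be applied to the "imageUrl" column before dropping the rows that come back empty.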

Now your databunch is ready for some learning!