Description
Describe the bug
Per the documentation, the api/ml/{id}/train endpoint should trigger the training process on the ML backend, which in turn should call the fit method of the LabelStudioMLBase model. However, when I call this endpoint, even via curl, there is no response from the label-studio server, nor does the fit method get triggered.
Here is the current version of the docker-compose.yml file for my project:
version: "3.8"
services:
  redis:
    image: redis:alpine
    container_name: redis
    hostname: redis
    volumes:
      - "./data/redis:/data"
    expose:
      - 6379
  labeling:
    container_name: labeling_container
    image: heartexlabs/label-studio:v1.5.0
    ports:
      - 8080:8080
    depends_on:
      - modeling
    volumes:
      - ./data:/label-studio/data
    environment:
      - LABEL_STUDIO_LOCAL_FILES_SERVING_ENABLED=true
      - LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT=/label-studio/data/media
    command: >
      bash -c "
      label-studio start
      --log-level DEBUG
      --sampling prediction-score-min
      --ml-backends http://modeling_container:9090"
    restart: always
  modeling:
    container_name: modeling_container
    build:
      context: ./modeling
    command: >
      bash -c "
      label-studio-ml init modeling_backend
      --script tools/${MODEL:-model.py}
      --force true
      &&
      label-studio-ml start ./modeling_backend
      --port 9090
      --debug "
    restart: always
    volumes:
      - ./data/media:/data/
    environment:
      - MODEL_DIR=/data/models
      - RQ_QUEUE_NAME=default
      - REDIS_HOST=redis
      - REDIS_PORT=6379
      - USE_REDIS=true
    ports:
      - 9090:9090
    depends_on:
      - redis
    links:
      - redis
Here is my model.py file for the ML backend:
from importlib.resources import path
import torch
import torch.nn as nn
import torch.optim as optim
import time
import os
import numpy as np
import requests
import io
import hashlib
import urllib
import cv2
import pathlib
import urllib.parse as urlparse
from skimage import io, color
from PIL import Image
from torch.utils.data import Dataset, DataLoader
from torchvision import models, transforms
from label_studio_ml.model import LabelStudioMLBase
from label_studio_ml.utils import get_single_tag_keys, get_choice, is_skipped
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
import layoutparser as lp
image_cache_dir = os.path.join(os.path.dirname(__file__), 'image-cache')
os.makedirs(image_cache_dir, exist_ok=True)
def load_image_from_url(url):
    # is_local_file = url.startswith('http://localhost:') and '/data/' in url
    # purl = pathlib.Path(url)
    pres = urlparse.urlparse(url)
    if pres.scheme == '':
        purl = pathlib.Path(url)
        url = purl.as_uri()
    im = io.imread(url)
    if len(im.shape) < 3:
        # needs to be converted to rgb
        im = color.gray2rgb(im)
    return im


def convert_block_to_value(block, image_height, image_width):
    return {
        "height": block.height / image_height*100,
        "choices": [str(block.type)],
        "rotation": 0,
        "width": block.width / image_width*100,
        "x": block.coordinates[0] / image_width*100,
        "y": block.coordinates[1] / image_height*100,
        "score": block.score
    }
class ObjectDetectionAPI(LabelStudioMLBase):
    def __init__(self, freeze_extractor=False, **kwargs):
        super(ObjectDetectionAPI, self).__init__(**kwargs)
        # label_map_list = os.environ['LABEL_MAP'].split()
        # {int(label_map_list[i]): str(label_map_list[i+1]) for i in range(0, len(label_map_list), 2)}
        print('parsed label config:\n ')
        print(self.parsed_label_config)
        self.from_name, self.to_name, self.value, self.classes =\
            get_single_tag_keys(self.parsed_label_config, 'RectangleLabels', 'Image')
        self.freeze_extractor = freeze_extractor
        self.model = lp.Detectron2LayoutModel(
            config_path='lp://detectron2/PrimaLayout/mask_rcnn_R_50_FPN_3x/config',
            # model_path = 'https://www.dropbox.com/s/bitxe8occzb865u/model_final.pth?dl=1',
            ### PLEASE REMEMBER TO CHANGE `dl=0` INTO `dl=1` IN THE END
            ### OF DROPBOX LINKS
            extra_config=["MODEL.ROI_HEADS.NMS_THRESH_TEST", 0.2,
                          "MODEL.ROI_HEADS.SCORE_THRESH_TEST", 0.8],
            label_map={0: "text"}
        )

    def reset_model(self):
        # self.model = ImageClassifier(len(self.classes), self.freeze_extractor)
        pass

    def predict(self, tasks, **kwargs):
        # print('tasks: ', tasks)
        print(kwargs)
        print('self.value: ', self.value)
        image_urls = [task['data'][self.value] for task in tasks]
        print('image urls: ', image_urls)
        images = [load_image_from_url(url) for url in image_urls]
        print('im sizes: ', [im.shape for im in images])
        layouts = [self.model.detect(image) for image in images]
        print('label config: ', self.parsed_label_config)
        print('layouts: ', layouts)
        predictions = []
        for image, layout in zip(images, layouts):
            height, width = image.shape[:2]
            result = [
                {
                    'from_name': self.from_name,
                    'to_name': self.to_name,
                    "original_height": height,
                    "original_width": width,
                    "source": "$image",
                    'type': 'rectanglelabels',
                    "value": convert_block_to_value(block, height, width),
                } for block in layout
            ]
            predictions.append({'result': result})
        return predictions

    def fit(self, tasks, workdir=None,
            batch_size=32, num_epochs=10, **kwargs):
        print("now running the fit function....")
        image_urls, image_classes = [], []
        # print('Collecting completions...')
        # for completion in completions:
        #     if is_skipped(completion):
        #         continue
        #     image_urls.append(completion['data'][self.value])
        #     image_classes.append(get_choice(completion))
        print('tasks: ', tasks)
        print('image urls: ', image_urls)
        print('image classes: ', image_classes)
        # print('Creating dataset...')
        # dataset = ImageClassifierDataset(image_urls, image_classes)
        # dataloader = DataLoader(dataset, shuffle=True, batch_size=batch_size)
        # print('Train model...')
        # # self.reset_model()
        # self.model.train(dataloader, num_epochs=num_epochs)
        # print('Save model...')
        # model_path = os.path.join(workdir, 'model.pt')
        # self.model.save(model_path)
        return {'model_path': None, 'classes': None}
Right now there isn't much in the fit function; I just wanted to make sure it was being called. However, nothing gets printed to the logs of modeling_container.
To Reproduce
Steps to reproduce the behavior:
- Log in to http://localhost:8080
- Create a new project (test)
- Add data and configuration. In my case I'm using rectangular bounding boxes.
- Add the ML backend in settings. The URL needs to be http://modeling_container:9090 since all containers are on the same docker-compose network.
- Add data/annotations
- The auto-predictions in this case do work, triggering the predict function specified in model.py
- Go to Settings -> Machine Learning and click Start Training on the connected ML backend
- Running curl -X POST http://localhost:8080/api/ml/{id}/train -H 'Authorization: Token <token>' also does nothing.
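For anyone reproducing this, the curl call in the last step can also be issued from Python so that the status code and response body are captured explicitly rather than relying on curl's default output. This is a minimal sketch using only the standard library; the function names build_train_request and trigger_training are hypothetical helpers for this report, and the backend ID and token are placeholders, as in the curl command.

```python
import urllib.error
import urllib.request


def build_train_request(base_url, backend_id, token):
    """Build the same POST request that the curl command in the steps sends."""
    url = f"{base_url.rstrip('/')}/api/ml/{backend_id}/train"
    return urllib.request.Request(
        url,
        data=b"",  # empty body; the curl call does not send a payload either
        method="POST",
        headers={"Authorization": f"Token {token}"},
    )


def trigger_training(base_url, backend_id, token, timeout=30.0):
    """Send the request and return (status_code, body_text)."""
    request = build_train_request(base_url, backend_id, token)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses still carry a status code and body worth reporting
        return err.code, err.read().decode("utf-8", errors="replace")


# Example call (placeholders: use the real backend ID and API token):
#     status, body = trigger_training("http://localhost:8080", 1, "<token>")
#     print(status, body)
```

A timeout is set so that a server that never answers shows up as an explicit timeout error instead of a hanging call.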
Expected behavior
Code in the fit function should run when the curl command is launched or the "Start Training" button is clicked.
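One way to check this independently of the container logs is to have fit leave a marker file behind as its first action. The sketch below is only an illustration: record_fit_call and the file name fit_called.json are hypothetical, and it assumes fit receives a writable workdir (or that MODEL_DIR points to one).

```python
import json
import os
import time


def record_fit_call(workdir, tasks, marker_name="fit_called.json"):
    """Write a small marker file so a fit() invocation is visible without logs.

    `record_fit_call` and `marker_name` are hypothetical names for this sketch.
    """
    os.makedirs(workdir, exist_ok=True)
    marker_path = os.path.join(workdir, marker_name)
    # tasks may not be a sized collection, so only count it when that is possible
    num_tasks = len(tasks) if hasattr(tasks, "__len__") else None
    payload = {"called_at": time.time(), "num_tasks": num_tasks}
    with open(marker_path, "w", encoding="utf-8") as handle:
        json.dump(payload, handle)
    return marker_path


# Inside ObjectDetectionAPI.fit this would be the first statement, for example:
#     record_fit_call(workdir or os.environ.get("MODEL_DIR", "."), tasks)
```

If the file never appears after clicking "Start Training", the method was not reached at all, which separates "fit is not called" from "fit is called but its output is lost".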
Screenshots
Can provide if needed.
Environment (please complete the following information):
- OS: Ubuntu 18.04 running docker 20.10.17, build 100c701 and docker-compose v 1.29.1, build c34c88b
- Label Studio Version 1.5.0
Additional context
It's entirely possible that I'm not configuring the project correctly, so please let me know.