Back | Next | Contents
WebApp Frameworks
The Recognizer is a Flask-based video tagging/classification webapp with interactive data collection and training. As video is tagged and recorded, an updated model is incrementally re-trained in the background with PyTorch and then used for inference with TensorRT. Both inference and training can run simultaneously, and the re-trained models are dynamically loaded at runtime for inference.
It also supports multi-label tagging, and in addition to recording client video over WebRTC, existing images can be uploaded from the client. The main source files for this example (found under `python/www/recognizer`) are as follows:

* `app.py` (webserver)
* `stream.py` (WebRTC streaming thread)
* `model.py` (DNN inferencing + training)
* `dataset.py` (data tagging + recording)
* `index.html` (frontend presentation)
Launching app.py will start a Flask webserver, a streaming thread that runs WebRTC and inferencing, and a training thread for PyTorch:
```bash
$ cd jetson-inference/python/www/recognizer
$ pip3 install -r requirements.txt
$ python3 app.py --data=data/my_dataset
```
note: receiving webcam video from the browser requires HTTPS/SSL to be enabled
The `--data` argument sets the path under which your dataset and models will be stored. If you built jetson-inference from source, you should have elected to install PyTorch during the build (or run the `install-pytorch.sh` script again). If you're using the Docker container, PyTorch is already installed for you.
After running app.py, you should be able to navigate your browser to `https://<JETSON-IP>:8050` and start the stream. The default port is 8050, but you can change it with the `--port=N` command-line argument. The app is configured by default for WebRTC input and output, but if you want to use a different video input device, you can set that with the `--input` argument (for example, `--input=/dev/video0` for a V4L2 camera).
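For instance, a launch command combining these options might look like the following (the device path and port below are illustrative - substitute your own):

```bash
# illustrative example: V4L2 camera input on a custom port
$ python3 app.py --data=data/my_dataset --input=/dev/video0 --port=8080
```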
If needed, first select a client camera from the stream source dropdown on the webpage and press the Send button. When ready, enter the class tag(s) of what the camera is looking at in the Tags selection box. Once a tag has been entered, you'll be able to either Record or Upload images into the dataset. You can hold down the Record button to capture a video sequence. Below is a high-level diagram of the data flow:
```mermaid
graph LR
    camera([fa:fa-video-camera Camera])
    player([fa:fa-television Browser])
    subgraph server ["Jetson (Edge Server)"]
        decoder[Decoder]
        dataset[("Dataset")]
        training["Training"]
        inference["Inference"]
        decoder-.->|Record|dataset
        decoder-->inference
        dataset-->training
        training-->training
        training-- Models -->inference
    end
    camera-->decoder
    inference-- WebRTC -->player
```
It's recommended to keep the distribution of tags relatively balanced across the classes, otherwise the model is more likely to become biased towards certain classes. You can view the label distribution and the number of images in the dataset by expanding the Training dropdown.

As you add and tag new data, training can be enabled under the Training dropdown. The training progress and accuracy will be updated on the page. At the end of each epoch, if the model has the highest accuracy so far, it will be exported to ONNX and loaded into TensorRT for inference.
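For reference, a re-trained ONNX model can also be loaded with TensorRT outside of the app through jetson-inference's `imageNet` class. Below is a minimal sketch, assuming the model and labels file were exported under your `--data` directory and use the `input_0`/`output_0` layer names from the other PyTorch re-training examples - the exact file paths are illustrative and depend on your setup:

```python
# minimal sketch (not part of the app): loading a re-trained ONNX classifier with
# TensorRT via jetson-inference's imageNet class - the file paths and the
# input_0/output_0 layer names below are assumptions, not the app's exact output
from jetson_inference import imageNet
from jetson_utils import loadImage

net = imageNet(model='data/my_dataset/models/resnet18.onnx',   # illustrative path
               labels='data/my_dataset/models/labels.txt',     # illustrative path
               input_blob='input_0', output_blob='output_0')

img = loadImage('test.jpg')                 # any image containing one of your classes
class_id, confidence = net.Classify(img)
print(f"classified as '{net.GetClassLabel(class_id)}' ({confidence * 100:.1f}% confidence)")
```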
There are various training-related command-line options that you can set when starting app.py:
| CLI Argument | Description | Default |
|--------------|-------------|---------|
| `--data` | Path to where the data and models will be stored | `data/` |
| `--net` | The DNN architecture (see here for options) | `resnet18` |
| `--net-width` | The width of the model (increase for higher accuracy) | `224` |
| `--net-height` | The height of the model (increase for higher accuracy) | `224` |
| `--batch-size` | Training batch size | `1` |
| `--workers` | Number of dataloader threads | `1` |
| `--optimizer` | The solver (`adam` or `sgd`) | `adam` |
| `--learning-rate` | Initial optimizer learning rate | `0.001` |
| `--no-augmentation` | Disable color jitter / random flips on the training data | Enabled |
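For example, to train with a larger batch size and the SGD optimizer, the launch might look like this (the values below are illustrative starting points, not recommendations):

```bash
# illustrative example: overriding the default training settings
$ python3 app.py --data=data/my_dataset --net=resnet18 \
    --batch-size=4 --workers=2 --optimizer=sgd --learning-rate=0.01
```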
Inference can be enabled under the Classification dropdown. When multi-label classification is used (i.e. the dataset contains images with multiple tags), all classification results with confidence scores above the threshold will be shown, and the threshold can be adjusted from the page.
The app can be extended to trigger actions when certain objects are detected by adding your own code to the `Model.Classify()` function:
```python
def Classify(self, img):
    """
    Run classification inference and return the results.
    """
    if not self.inference_enabled:
        return

    # returns a list of (classID, confidence) tuples
    self.results = self.model_infer.Classify(img, topK=0 if self.dataset.multi_label else 1)

    # to trigger custom actions/processing, add them here:
    for classID, confidence in self.results:
        if self.model_infer.GetClassLabel(classID) == 'person':  # update for your classes
            print(f"detected a person with {confidence * 100}% confidence")  # do something in response

    return self.results
```
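As one hypothetical extension of the loop above, you could act only on high-confidence results and keep a running count of how often each class has been seen (the `0.8` threshold and the `detection_counts` attribute are illustrative additions, not part of the existing app):

```python
# hypothetical extension of the loop in Model.Classify() - the 0.8 threshold and
# the detection_counts attribute are illustrative additions, not existing app code
if not hasattr(self, 'detection_counts'):
    self.detection_counts = {}

for classID, confidence in self.results:
    if confidence >= 0.8:  # only act on confident results
        label = self.model_infer.GetClassLabel(classID)
        self.detection_counts[label] = self.detection_counts.get(label, 0) + 1
        print(f"saw '{label}' {self.detection_counts[label]} times ({confidence * 100:.1f}% confidence)")
```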
When modifying the server-side Python code, remember to restart app.py for the changes to take effect. As with the previous Flask example, various REST queries are used for communicating dynamic settings and state changes between the client and server, which you can also add to.
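For instance, a new endpoint could be added to app.py along the lines of the sketch below (the route name and payload are illustrative, and it assumes app.py exposes its Flask application object as `app`):

```python
# hypothetical REST endpoint added to app.py - the route name and payload are
# illustrative, and the Flask application object being named `app` is an assumption
@app.route('/my-status', methods=['GET'])
def my_status():
    return {'status': 'ok'}   # Flask serializes the dict to a JSON response
```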
Next | Camera Streaming and Multimedia
Back | Plotly Dashboard
© 2016-2023 NVIDIA | Table of Contents