Back | Next | Contents
WebApp Frameworks

Recognizer (Interactive Training)

The Recognizer is a Flask-based video tagging/classification webapp with interactive data collection and training. As video is tagged and recorded, an updated model is incrementally re-trained in the background with PyTorch and then used for inference with TensorRT. Both inference and training can run simultaneously, and the re-trained models are dynamically loaded at runtime for inference.

It also supports multi-label tagging, and in addition to recording client video over WebRTC, existing images can be uploaded from the client. The main source files for this example can be found under python/www/recognizer.

Running the Example

Launching app.py will start a Flask webserver, a streaming thread that runs WebRTC and inferencing, and a training thread for PyTorch:

$ cd jetson-inference/python/www/recognizer
$ pip3 install -r requirements.txt
$ python3 app.py --data=data/my_dataset

note: receiving browser webcams requires HTTPS/SSL to be enabled
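If you need a certificate, one option is to generate a self-signed key pair with openssl and pass it to the app when launching it. This is a hedged sketch: the --ssl-cert and --ssl-key flag names are assumed from the other jetson-inference webapp examples, so confirm them with python3 app.py --help on your version:

$ openssl req -x509 -newkey rsa:4096 -keyout key.pem -out cert.pem -sha256 -days 365 -nodes
$ python3 app.py --data=data/my_dataset --ssl-cert=cert.pem --ssl-key=key.pem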

The --data argument sets the path under which your dataset and models are stored. If you built jetson-inference from source, you should elect to Install PyTorch (or run the install-pytorch.sh script again). If you're using the Docker container, PyTorch is already installed for you.

After running app.py, you should be able to navigate your browser to https://<JETSON-IP>:8050 and start the stream. The default port is 8050, but you can change it with the --port=N command-line argument. The app is configured by default for WebRTC input and output, but if you want to use a different video input device, you can set that with the --input argument (for example, --input=/dev/video0 for a V4L2 camera).

Collecting Data

If needed, first select a client camera from the stream source dropdown on the webpage and press the Send button. When ready, enter class tag(s) describing what the camera is looking at into the Tags selection box. Once a tag is entered, you'll be able to either Record or Upload images into the dataset. You can hold down the Record button to capture a video sequence. Below is a high-level diagram of the data flow:

graph LR
    camera([fa:fa-video-camera Camera])
    player([fa:fa-television Browser])
    subgraph server ["Jetson (Edge Server)"]
        decoder[Decoder]
        dataset[("Dataset")]
        training["Training"]
        inference["Inference"]
        decoder-.->|Record|dataset
        decoder-->inference
        dataset-->training
        training-->training
        training-- Models --> inference
    end
    camera-->decoder
    inference-- WebRTC -->player

It's recommended to keep the distribution of tags across the classes relatively balanced; otherwise, the model is more likely to become biased towards certain classes. You can view the label distribution and number of images in the dataset by expanding the Training dropdown.

Training

As you add and tag new data, training can be enabled under the Training dropdown. The training progress and accuracy will be updated on the page. At the end of each epoch, if the model has the highest accuracy so far, it will be exported to ONNX and loaded into TensorRT for inference.
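For reference, the export step corresponds conceptually to a standard torch.onnx.export call like the sketch below. This is illustrative only and not the app's exact code; the model architecture, class count, input resolution, and output path are placeholder assumptions:

import torch
import torchvision.models as models

# illustrative sketch of an ONNX export (the actual export in the app may differ)
model = models.resnet18(num_classes=3).eval()      # placeholder architecture / number of tags
dummy_input = torch.randn(1, 3, 224, 224)          # matches the default --net-width/--net-height
torch.onnx.export(model, dummy_input, 'data/my_dataset/model.onnx',
                  input_names=['input_0'], output_names=['output_0'])

The exported .onnx file is what TensorRT parses and optimizes when the re-trained model is dynamically loaded for inference.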

There are various command-line options for the training that you can set when starting app.py:

| CLI Argument | Description | Default |
|--------------|-------------|---------|
| `--data` | Path to where the data and models will be stored | `data/` |
| `--net` | The DNN architecture (see here for options) | `resnet18` |
| `--net-width` | The width of the model (increase for higher accuracy) | `224` |
| `--net-height` | The height of the model (increase for higher accuracy) | `224` |
| `--batch-size` | Training batch size | `1` |
| `--workers` | Number of dataloader threads | `1` |
| `--optimizer` | The solver (`adam` or `sgd`) | `adam` |
| `--learning-rate` | Initial optimizer learning rate | `0.001` |
| `--no-augmentation` | Disable color jitter / random flips on the training data | Enabled |
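For example, to train at a higher input resolution with larger batches, you might launch the app with a combination of the options above (the values here are illustrative only):

$ python3 app.py --data=data/my_dataset --net-width=320 --net-height=320 --batch-size=4 --workers=2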

Inference

Inference can be enabled under the Classification dropdown. When multi-label classification is used (i.e. the dataset contains images with multiple tags), all classification results with confidence scores above the threshold will be shown; the threshold can be controlled from the page.
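Conceptually, that thresholding amounts to something like the sketch below. This is an illustration only (not the webapp's actual code), and threshold and results are placeholder names:

# illustrative sketch: keep only results above the page-controlled confidence threshold
threshold = 0.5   # value set from the Classification dropdown in the UI
visible = [(classID, conf) for (classID, conf) in results if conf >= threshold]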

The app can be extended to trigger actions when certain objects are detected by adding your own code to the Model.Classify() function:

def Classify(self, img):
    """
    Run classification inference and return the results.
    """
    if not self.inference_enabled:
        return

    # returns a list of (classID, confidence) tuples
    self.results = self.model_infer.Classify(img, topK=0 if self.dataset.multi_label else 1)

    # to trigger custom actions/processing, add them here:
    for classID, confidence in self.results:
        if self.model_infer.GetClassLabel(classID) == 'person':             # update for your classes
            print(f"detected a person with {confidence * 100}% confidence")  # do something in response

    return self.results

When modifying backend server-side Python code, remember to restart app.py for the changes to take effect. As with the previous Flask example, various REST queries are used to communicate dynamic settings and state changes between the client and server, and you can add your own endpoints as well.
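For example, a new server-side setting could be exposed with a standard Flask route like the sketch below. The /api/my-setting endpoint and its handler are hypothetical and do not exist in the app; in practice you would register the route on the Flask instance already created in app.py:

from flask import Flask, request, jsonify

app = Flask(__name__)   # in app.py, reuse the existing Flask instance instead

current_value = 0.5     # placeholder server-side state

@app.route('/api/my-setting', methods=['GET', 'POST'])   # hypothetical route for illustration
def my_setting():
    """Example REST endpoint that gets/sets a custom value from the client."""
    global current_value
    if request.method == 'POST':
        current_value = request.get_json().get('value')   # update state from the client's JSON payload
        return jsonify(success=True)                       # acknowledge the change
    return jsonify(value=current_value)                    # return the current value on GET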

Next | Camera Streaming and Multimedia
Back | Plotly Dashboard

© 2016-2023 NVIDIA | Table of Contents