This project demonstrates real-time object detection using the SSD MobileNet V3 architecture, with weights pre-trained on the COCO dataset. The model runs through OpenCV and detects objects in a live webcam feed.
MobileNet is a family of neural network architectures optimized for mobile and embedded vision applications. SSD (Single Shot MultiBox Detector) combined with MobileNet V3 is lightweight and designed for efficient real-time object detection on devices with limited computational resources.
- ssd_mobilenet_v3_large_coco_2020_01_14.pbtxt: Configuration file specifying the model's architecture and layers.
- frozen_inference_graph.pb: Pre-trained model weights, frozen for inference.
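Both files, together with the `coco.names` label file that the script reads, need to be present before the model can load. A minimal sketch of a pre-flight check, assuming the files sit in the directory the script is run from (the helper name `missing_model_files` is hypothetical and not part of the project):

```python
from pathlib import Path

# File names taken from this README; the directory layout is an assumption.
REQUIRED_FILES = (
    "ssd_mobilenet_v3_large_coco_2020_01_14.pbtxt",
    "frozen_inference_graph.pb",
    "coco.names",
)


def missing_model_files(directory="."):
    """Return the required file names that are not present in `directory`."""
    base = Path(directory)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All model files found.")
```

Running this before the detection script gives a clearer message than the error OpenCV raises when a path cannot be opened.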
- OpenCV (`opencv-python`)
- Python 3.x
- Webcam or video input device
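One way to confirm the software requirements before running the script is a small check like the one below. This is a sketch rather than part of the project: `module_available` is a hypothetical helper, and `cv2` is the module name that the `opencv-python` package installs.

```python
import importlib.util
import sys


def module_available(name):
    """Return True if a top-level module called `name` can be imported."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    print("Python 3:", sys.version_info.major == 3)
    print("OpenCV installed:", module_available("cv2"))
```

The check only looks the module up; it does not open the camera or load the model.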
- Clone the repository or download the project files.
- Install the required libraries: `pip install opencv-python`
- Download the necessary model files listed above and place them, together with `coco.names`, in the project directory.
- Ensure that your webcam is connected.
- Run the `object_detection.py` script with `python object_detection.py`. The webcam feed will open, and you will see real-time detection of objects with bounding boxes and labels.
import cv2

def Camera():
    cam = cv2.VideoCapture(1)  # Capture from webcam (device index 1)
    cam.set(cv2.CAP_PROP_FRAME_WIDTH, 740)   # Set width
    cam.set(cv2.CAP_PROP_FRAME_HEIGHT, 580)  # Set height
## Loading Class Names and Model

classFile = 'coco.names'
with open(classFile, 'rt') as f:
    classNames = f.read().rstrip('\n').split('\n')  # One class name per line

configPath = 'ssd_mobilenet_v3_large_coco_2020_01_14.pbtxt'
weightpath = 'frozen_inference_graph.pb'
net = cv2.dnn_DetectionModel(weightpath, configPath)
net.setInputSize(320, 320)  # This SSD MobileNet V3 model takes 320x320 input
net.setInputScale(1.0 / 127.5)  # Together with the mean, maps pixels to about [-1, 1]
net.setInputMean((127.5, 127.5, 127.5))
net.setInputSwapRB(True)  # OpenCV frames are BGR; swap to RGB for the model
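The scale and mean settings normalize each pixel before it reaches the network: OpenCV subtracts the mean and then multiplies by the scale, so values in [0, 255] end up in roughly [-1, 1]. The arithmetic can be sketched in plain Python (the function `normalize_pixel` is illustrative only; OpenCV performs this step internally on the whole image):

```python
MEAN = 127.5
SCALE = 1.0 / 127.5


def normalize_pixel(value, mean=MEAN, scale=SCALE):
    """Apply the same (value - mean) * scale mapping configured on the model."""
    return (value - mean) * scale


# Darkest, mid-gray, and brightest 8-bit values.
for raw in (0, 127.5, 255):
    print(raw, "->", round(normalize_pixel(raw), 3))
```

If a different mean or scale were used, the network would see inputs outside the range it was trained on, which typically degrades detections.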
## Object Detection and Display

while True:
    success, img = cam.read()
    if not success:
        break  # Stop if no frame could be read from the camera
    classIds, confs, bbox = net.detect(img, confThreshold=0.5)
    if len(classIds) != 0:
        for classId, confidence, box in zip(classIds.flatten(), confs.flatten(), bbox):
            cv2.rectangle(img, box, color=(0, 255, 0), thickness=2)
            cv2.putText(img, classNames[classId-1], (box[0] + 10, box[1] + 20),
                        cv2.FONT_HERSHEY_COMPLEX, 1, (0, 255, 0), thickness=2)
    cv2.imshow('Output', img)  # Show the annotated frame
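The loop indexes `classNames` with `classId-1` because the detector reports 1-based class IDs while Python lists are 0-based. A defensive sketch of that lookup, assuming a label file with one name per line (the helpers `load_class_names` and `label_for` are hypothetical, and the fallback label is an illustrative choice):

```python
def load_class_names(path):
    """Read one class name per line, ignoring trailing newlines."""
    with open(path, "rt") as f:
        return f.read().rstrip("\n").split("\n")


def label_for(class_id, class_names):
    """Map a 1-based class ID to its name, falling back to the raw ID."""
    index = int(class_id) - 1
    if 0 <= index < len(class_names):
        return class_names[index]
    return f"id {int(class_id)}"


names = ["person", "bicycle", "car"]
print(label_for(1, names))
print(label_for(99, names))
```

A guard like this avoids an `IndexError` if the model ever reports an ID that the label file does not cover.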
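## Saving Video Output

# Assumption: XVID codec at 20 FPS; adjust both to match your camera.
size = (int(cam.get(cv2.CAP_PROP_FRAME_WIDTH)), int(cam.get(cv2.CAP_PROP_FRAME_HEIGHT)))
out = cv2.VideoWriter('output.avi', cv2.VideoWriter_fourcc(*'XVID'), 20.0, size)
while True:
    success, img = cam.read()
    if not success:
        break
    classIds, confs, bbox = net.detect(img, confThreshold=0.5)
    if len(classIds) != 0:
        for classId, confidence, box in zip(classIds.flatten(), confs.flatten(), bbox):
            cv2.rectangle(img, box, color=(0, 255, 0), thickness=2)
            cv2.putText(img, classNames[classId-1], (box[0] + 10, box[1] + 20),
                        cv2.FONT_HERSHEY_COMPLEX, 1, (0, 255, 0), thickness=2)
    out.write(img)  # Append the annotated frame to output.avi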
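## Exiting the Application

    # Inside the loop: stop when the 'q' key is pressed
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# After the loop: release the camera, the video writer, and all windows
cam.release()
out.release()
cv2.destroyAllWindows()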
## Object Classes

This model is trained on the COCO dataset and can detect the following object categories:
person, bicycle, car, motorcycle, airplane, bus, train, truck, boat, traffic light, fire hydrant, stop sign, parking meter, bench, bird, cat, dog, horse, sheep, cow, elephant, bear, zebra, giraffe, backpack, umbrella, handbag, tie, suitcase, frisbee, skis, snowboard, sports ball, kite, baseball bat, baseball glove, skateboard, surfboard, tennis racket, bottle, wine glass, cup, fork, knife, spoon, bowl, banana, apple, sandwich, orange, broccoli, carrot, hot dog, pizza, donut, cake, chair, couch, potted plant, bed, dining table, toilet, TV, laptop, mouse, remote, keyboard, cell phone, microwave, oven, toaster, sink, refrigerator, book, clock, vase, scissors, teddy bear, hair drier, toothbrush
- Connect your Webcam: Ensure that the webcam is properly connected and functioning.
- Run the Script: Run the object detection script `object_detection.py`. The video feed from your webcam will be displayed with bounding boxes around detected objects.
- Press 'q' to Exit: Press the `q` key at any time to exit the application and stop the video feed.
- Save Video Output: The detected output is saved to the file `output.avi` by default.
- This project is designed for lightweight applications, suitable for real-time object detection on resource-constrained devices.
- The SSD MobileNet V3 model used here is trained on the COCO dataset and can recognize the 80 object categories listed above in real time.
- Adjust the webcam source (e.g., `cv2.VideoCapture(1)`) to match your webcam setup. You may need to use `cv2.VideoCapture(0)` or another index if the default one doesn't work.
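Finding the right camera index by trial and error can be automated. The sketch below is one possible approach, not part of the project: `first_working_index` and `opencv_can_open` are hypothetical helpers, and the candidate indices `(0, 1, 2)` are an assumption about how many devices are worth probing.

```python
def first_working_index(try_open, candidates=(0, 1, 2)):
    """Return the first index for which try_open(index) is truthy, else None."""
    for index in candidates:
        if try_open(index):
            return index
    return None


def opencv_can_open(index):
    """Probe one camera index with OpenCV.

    cv2 is imported lazily so first_working_index can be used without it.
    """
    import cv2

    cam = cv2.VideoCapture(index)
    ok = cam.isOpened()
    cam.release()
    return ok
```

Calling `first_working_index(opencv_can_open)` returns the first index that OpenCV reports as opened, or `None` if no candidate works; that value can then be passed to `cv2.VideoCapture`.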
This project is licensed under the MIT License. You are free to use, modify, and distribute this project as long as proper attribution is provided.
Contributions are welcome! Feel free to open issues or submit pull requests if you have suggestions or improvements.