This repository contains the source code for the Indoor Scene Detector application. Indoor Scene Detector is a full-stack computer vision application built with PyTorch, Captum, Flask, React, Docker, Heroku, and GitHub Pages. You can access the application at www.indoorscenedetector.com.
Indoor Scene Detector was created and is maintained by Nico Van den Hooff and Melissa Liow.
Nico maintains the backend, MLOps, and DevOps components of the application. Melissa maintains the frontend component of the application.
Indoor Scene Detector classifies images of indoor scenes, such as a bedroom or a kitchen. It currently supports ten scene categories: airport, bakery, bar, bedroom, kitchen, living room, pantry, restaurant, subway, and warehouse. Support for additional classes is under development.
Four convolutional neural networks are available to classify a scene: tuned versions of AlexNet, ResNet, and DenseNet, along with a simple "vanilla" CNN that has no transfer learning applied to it. Using AlexNet, ResNet, or DenseNet demonstrates the power of transfer learning in computer vision, as the tuned versions of these networks should achieve much higher prediction accuracy than the simple CNN without transfer learning.
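For illustration, a tuned network of this kind can be created by loading ImageNet-pretrained weights from torchvision and swapping in a new ten-class classification head. This is a minimal sketch of the transfer learning setup, not the repository's exact training code:

```python
import torch.nn as nn
from torchvision import models

# Start from a ResNet pretrained on ImageNet (the transfer learning step).
model = models.resnet18(pretrained=True)

# Freeze the pretrained feature extractor so only the new head is tuned.
for param in model.parameters():
    param.requires_grad = False

# Replace the final fully connected layer with a fresh 10-class head,
# one output per indoor scene category.
model.fc = nn.Linear(model.fc.in_features, 10)
```

The simple "vanilla" CNN, by contrast, starts from randomly initialized weights, which is why its accuracy is expected to lag behind the pretrained networks.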
The CNNs that power Indoor Scene Detector are trained on the Indoor Scene Recognition data set collected by MIT. The current beta version is trained on a subset of this data set covering 10 of the 67 total classes; these are the 10 classes with the most images in the full data set, totalling 5,661 images (approximately 500 images per class). Each model was trained for 25 epochs, unless early stopping with a patience of 5 epochs occurred.
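The early-stopping rule described above can be sketched as follows. This assumes a typical PyTorch training loop with `model`, `optimizer`, `criterion`, and train/validation loaders already defined; it illustrates the logic, not the repository's actual training script:

```python
import torch

best_loss = float("inf")
patience = 5      # epochs of no improvement tolerated before stopping
stale_epochs = 0

for epoch in range(25):
    # One pass over the training data.
    model.train()
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()

    # Average validation loss drives the early-stopping check.
    model.eval()
    with torch.no_grad():
        val_loss = sum(
            criterion(model(images), labels).item()
            for images, labels in val_loader
        ) / len(val_loader)

    if val_loss < best_loss:
        best_loss, stale_epochs = val_loss, 0
    else:
        stale_epochs += 1
        if stale_epochs >= patience:
            break  # no improvement for `patience` consecutive epochs
```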
Each CNN outputs the top three predictions for an image, ranked by probability in descending order. In addition, a heatmap of the image's saliency attributions is plotted. Saliency is an algorithm that attempts to explain a CNN's predictions by calculating the gradient of the output with respect to the input; the absolute value of the saliency attributions can be taken to represent feature importance. To learn more, please see the original paper or the Captum documentation.
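This corresponds to a softmax over the model's outputs followed by a top-k lookup, with Captum's `Saliency` attribution producing the heatmap. The following is a minimal sketch assuming a trained `model` and a preprocessed `image` tensor of shape (1, 3, H, W), not the application's exact serving code:

```python
import torch
from captum.attr import Saliency

model.eval()

# Top three predictions, ranked by probability in descending order.
with torch.no_grad():
    probs = torch.softmax(model(image), dim=1)
top3_probs, top3_classes = probs.topk(3, dim=1)

# Saliency: gradient of the top class score with respect to the input.
# Captum returns absolute gradient values by default, which is what
# the heatmap visualizes as feature importance.
saliency = Saliency(model)
attributions = saliency.attribute(image, target=top3_classes[0, 0].item())
```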
- Open the application at: www.indoorscenedetector.com
- Select one of the preloaded images or upload your own to classify.
- Select the convolutional neural network you would like to use to classify the image.
- Press submit and your image will be classified.
To learn more about making a contribution to Indoor Scene Detector, please see the contributing file.
- Install Docker for your operating system.
- Open up a terminal and run the following commands:

  a. Clone our repository

     ```shell
     git clone https://github.com/nicovandenhooff/indoor-scene-detector.git
     ```

  b. Change working directories

     ```shell
     cd indoor-scene-detector
     ```

  c. Run the application

     ```shell
     docker-compose up
     ```

- In a web browser, navigate to http://localhost:3000/ to view the application.
- Once you are finished with the application, run the following command in the same terminal:

  d. Shut down the Docker containers, images, etc.

     ```shell
     docker-compose down
     ```
These steps will require two terminals, referred to as terminal A and terminal B.
- Open up terminal A and run the following commands to start the frontend:

  ```shell
  git clone https://github.com/nicovandenhooff/indoor-scene-detector.git
  cd indoor-scene-detector/client
  yarn install
  yarn start
  ```

- Open up terminal B and run the following commands to start the backend:

  ```shell
  cd indoor-scene-detector/api
  python3 -m venv venv
  source venv/bin/activate
  pip install -r requirements.txt
  cd ../client
  yarn start-api
  ```

- If it hasn't opened already, navigate to http://localhost:3000/ in a web browser to view the application.