Image Captioning is a deep-learning-based project deployed on the web. The details of the project are given below along with the relevant images.
When you see an image, your brain can easily tell what it is about, but can a computer tell what the image represents? Caption generation is a challenging artificial intelligence problem in which a textual description must be generated for a given photograph. It requires both methods from computer vision to understand the content of the image and a language model from the field of natural language processing to turn that understanding of the image into words in the right order.
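One common way to combine these two components is the "merge" style of captioning model: image features from a CNN are merged with an LSTM language model that predicts the caption one word at a time. The sketch below is only an illustration of that idea, not necessarily this project's exact architecture; the vocabulary size, caption length, feature dimension, and layer sizes are assumed values.

```python
# Minimal sketch of a CNN + LSTM "merge" captioning model.
# All sizes below are illustrative assumptions, not the project's values.
from tensorflow.keras import layers, models

vocab_size, max_caption_len, feature_dim = 5000, 34, 4096  # assumed values

# Image branch: a dense projection of pre-extracted CNN features.
image_input = layers.Input(shape=(feature_dim,))
image_embedding = layers.Dense(256, activation="relu")(layers.Dropout(0.5)(image_input))

# Language branch: embed the partial caption and summarize it with an LSTM.
caption_input = layers.Input(shape=(max_caption_len,))
caption_embedding = layers.Embedding(vocab_size, 256, mask_zero=True)(caption_input)
caption_state = layers.LSTM(256)(layers.Dropout(0.5)(caption_embedding))

# Merge both branches and predict the next word of the caption.
merged = layers.add([image_embedding, caption_state])
hidden = layers.Dense(256, activation="relu")(merged)
next_word = layers.Dense(vocab_size, activation="softmax")(hidden)

model = models.Model(inputs=[image_input, caption_input], outputs=next_word)
model.compile(loss="categorical_crossentropy", optimizer="adam")
model.summary()
```

In this style of model the image features condition the language model at the merge stage rather than being fed into the LSTM directly, which keeps the vision and language parts cleanly separated.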
An image caption generator combines computer vision and natural language processing to recognize the context of an image and describe it in a natural language such as English. The main objectives of this project are:
● Encode images and their respective captions (a minimal sketch follows this list).
● Train a model to generate captions.
● Deploy the model as a web app so it can be used globally.
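As a rough illustration of the first objective, the sketch below encodes images with a pre-trained CNN and turns captions into integer sequences with the Keras tokenizer. The choice of VGG16 as the encoder, the file path, the start/end markers, and the toy captions are assumptions for illustration, not the project's exact pipeline.

```python
# Minimal sketch of the "encode images and captions" step.
# The encoder choice (VGG16), paths, and caption markers are illustrative.
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.preprocessing.image import load_img, img_to_array
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.models import Model

# Image encoder: VGG16 with the 1000-way classifier removed, so each image
# is represented by the 4096-dimensional output of the second FC layer.
vgg = VGG16(weights="imagenet")
encoder = Model(inputs=vgg.input, outputs=vgg.layers[-2].output)

def encode_image(path):
    """Load one image, resize it to 224x224, and extract a feature vector."""
    img = img_to_array(load_img(path, target_size=(224, 224)))
    img = preprocess_input(np.expand_dims(img, axis=0))
    return encoder.predict(img, verbose=0)[0]               # shape: (4096,)

# Caption encoder: map words to integer indices, with start/end markers so
# the decoder knows where a caption begins and ends.
captions = ["startseq a dog runs on the beach endseq",
            "startseq two children play football endseq"]   # toy examples
tokenizer = Tokenizer()
tokenizer.fit_on_texts(captions)
encoded_captions = tokenizer.texts_to_sequences(captions)   # lists of word ids
```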
Many popular computer vision applications involve trying to recognize things in photographs; for example:
• Object Classification: What broad category of object is in this photograph?
• Object Identification: Which type of a given object is in this photograph?
• Object Verification: Is the object in the photograph?
• Object Detection: Where are the objects in the photograph?
• Object Landmark Detection: What are the key points for the object in the photograph?
• Object Segmentation: What pixels belong to the object in the image?
• Object Recognition: What objects are in this photograph and where are they?
Outside of just recognition, other methods of analysis include:
• Video Motion Analysis uses computer vision to estimate the velocity of objects in a video, or of the camera itself.
• In Image Segmentation, algorithms partition an image into multiple regions or segments.
• Scene Reconstruction creates a 3D model of a scene from input images or video.
• In Image Restoration, noise such as blur is removed from photos using machine-learning-based filters.
The input to the first convolutional layer (conv1) is a fixed-size 224 × 224 RGB image. The image is passed through a stack of convolutional (conv.) layers that use filters with a very small receptive field: 3×3 (the smallest size that captures the notion of left/right, up/down, and centre). One of the configurations also utilizes 1×1 convolution filters, which can be seen
as a linear transformation of the input channels (followed by non-linearity). The convolution
stride is fixed to 1 pixel; the spatial padding of conv. layer input is such that the spatial
resolution is preserved after convolution, i.e. the padding is 1-pixel for 3×3 conv. layers.
Spatial pooling is carried out by five max-pooling layers, which follow some of the conv.
layers (not all the conv. layers are followed by max-pooling). Max-pooling is performed over
a 2×2 pixel window, with stride 2.
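This configuration matches the VGG family of networks. As a rough illustration, the sketch below builds blocks of 3×3, stride-1, same-padded convolutions followed by 2×2 max-pooling with stride 2 in Keras; the filter counts and block depths are VGG16-style values, used here purely for illustration.

```python
# Sketch of a VGG-style convolutional stack as described above:
# 3x3 filters, stride 1, "same" padding (so spatial resolution is preserved),
# ReLU activations, and 2x2 max-pooling with stride 2 after each block.
from tensorflow.keras import layers, models

def conv_block(filters, num_convs):
    """A run of 3x3, stride-1, same-padded conv layers (size preserved),
    followed by a 2x2 max-pool with stride 2 (size halved)."""
    block = [layers.Conv2D(filters, kernel_size=3, strides=1,
                           padding="same", activation="relu")
             for _ in range(num_convs)]
    block.append(layers.MaxPooling2D(pool_size=2, strides=2))
    return block

# VGG16-style stacking of blocks; spatial resolution goes 224 -> 7.
conv_stack = models.Sequential(
    conv_block(64, 2)       # 224 -> 112
    + conv_block(128, 2)    # 112 -> 56
    + conv_block(256, 3)    # 56  -> 28
    + conv_block(512, 3)    # 28  -> 14
    + conv_block(512, 3)    # 14  -> 7
)
conv_stack.build(input_shape=(None, 224, 224, 3))
conv_stack.summary()
```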
Three Fully-Connected (FC) layers follow a stack of convolutional layers (which has a
different depth in different architectures): the first two have 4096 channels each, the third
performs 1000-way ILSVRC classification and thus contains 1000 channels (one for each
class). The final layer is the soft-max layer. The configuration of the fully connected layers is
the same in all networks.
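Continuing the illustration above, the classifier head described here (two 4096-unit FC layers followed by a 1000-way soft-max) could be sketched as follows; this is a reconstruction of the described configuration, not the project's exact code, and it assumes the 7×7×512 feature map produced by the conv stack sketched earlier.

```python
# Sketch of the fully-connected head described above: flatten the final
# 7x7x512 feature map, apply two 4096-unit FC layers, then a 1000-way
# soft-max output (one unit per ILSVRC class).
from tensorflow.keras import layers, models

head = models.Sequential([
    layers.Flatten(),                          # 7*7*512 -> 25088
    layers.Dense(4096, activation="relu"),     # FC-4096
    layers.Dense(4096, activation="relu"),     # FC-4096
    layers.Dense(1000, activation="softmax"),  # 1000-way ILSVRC soft-max
])
head.build(input_shape=(None, 7, 7, 512))
head.summary()
```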
All hidden layers are equipped with the rectification (ReLU) non-linearity. It is also
noted that none of the networks (except for one) contain Local Response Normalisation
(LRN); such normalisation does not improve performance on the ILSVRC dataset but leads to increased memory consumption and computation time.