Environment perception using deep learning techniques
```
|_ data
|  |_ images
|  |  |_ training
|  |  |  |_ image_2   // KITTI training images are here
|  |  |_ testing
|  |     |_ image_2   // KITTI test images are here
|  |_ labels
|     |_ labels_2     // KITTI training labels are here
|_ lib                // External code is here
```
```
#Values  Name        Description
----------------------------------------------------------------------------
   1     type        Describes the type of object: 'Car', 'Van', 'Truck',
                     'Pedestrian', 'Person_sitting', 'Cyclist', 'Tram',
                     'Misc' or 'DontCare'
   1     truncated   Float from 0 (non-truncated) to 1 (truncated), where
                     truncated refers to the object leaving image boundaries
   1     occluded    Integer (0,1,2,3) indicating occlusion state:
                     0 = fully visible, 1 = partly occluded,
                     2 = largely occluded, 3 = unknown
   1     alpha       Observation angle of object, ranging [-pi..pi]
   4     bbox        2D bounding box of object in the image (0-based index):
                     contains left, top, right, bottom pixel coordinates
   3     dimensions  3D object dimensions: height, width, length (in meters)
   3     location    3D object location x, y, z in camera coordinates (in meters)
   1     rotation_y  Rotation ry around Y-axis in camera coordinates [-pi..pi]
   1     score       Only for results: Float, indicating confidence in
                     detection, needed for precision/recall curves; higher is better
```
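Each line of a KITTI label file is one whitespace-separated record with the fields above (15 values for ground truth, 16 when a score is appended). A minimal parsing sketch in Python; `KittiObject` and `parse_kitti_label_file` are hypothetical names for this repo, not part of the KITTI devkit:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class KittiObject:
    type: str                      # 'Car', 'Pedestrian', ..., 'DontCare'
    truncated: float               # 0 (non-truncated) .. 1 (truncated)
    occluded: int                  # 0=visible, 1=partly, 2=largely, 3=unknown
    alpha: float                   # observation angle [-pi..pi]
    bbox: List[float]              # left, top, right, bottom (pixels)
    dimensions: List[float]        # height, width, length (meters)
    location: List[float]          # x, y, z in camera coordinates (meters)
    rotation_y: float              # rotation around Y-axis [-pi..pi]
    score: Optional[float] = None  # only present in result files

def parse_kitti_label_file(path: str) -> List[KittiObject]:
    objects = []
    with open(path) as f:
        for line in f:
            v = line.split()
            objects.append(KittiObject(
                type=v[0],
                truncated=float(v[1]),
                occluded=int(float(v[2])),
                alpha=float(v[3]),
                bbox=[float(x) for x in v[4:8]],
                dimensions=[float(x) for x in v[8:11]],
                location=[float(x) for x in v[11:14]],
                rotation_y=float(v[14]),
                score=float(v[15]) if len(v) > 15 else None,
            ))
    return objects
```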
- [ ] Download the dataset
- [ ] Load the data into our current model
- [ ] Listen to YouTube lectures and read papers to get a more profound understanding of the layers of a convolutional network
- [ ] Define a first model for the dataset
- [ ] Sliding window should work with cascading classifiers
- [ ] Sliding window should work with different receptive sizes and stages (a multi-scale cascade sketch follows this list)
- [ ] Add a whitening option to the preprocessing (see the whitening sketch below)
- [ ] Threshold the NLL output of the classifiers (see the thresholding sketch below)
- Downscale images before training (see the downscaling sketch below)
- How do conv nets correctly handle input images of different sizes? What is the best way to deal with downsampled images? Can we feed conv nets inputs of different sizes?
- More resources on what the individual layers of a conv net are actually doing.
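A minimal multi-scale sliding-window sketch in Python/NumPy. The window sizes, the stride, and the `stages` list of cascade classifiers are placeholder assumptions, not existing code in this repo:

```python
import numpy as np

def sliding_windows(image, window_sizes=((32, 32), (64, 64), (128, 128)), stride=16):
    """Yield (x, y, patch) for every window position at every window size."""
    h, w = image.shape[:2]
    for win_h, win_w in window_sizes:
        for y in range(0, h - win_h + 1, stride):
            for x in range(0, w - win_w + 1, stride):
                yield x, y, image[y:y + win_h, x:x + win_w]

def detect(image, stages):
    """Run a cascade: each stage is a callable patch -> bool, ordered from
    cheap to expensive, so most windows are rejected early."""
    detections = []
    for x, y, patch in sliding_windows(image):
        if all(stage(patch) for stage in stages):
            detections.append((x, y, patch.shape[1], patch.shape[0]))
    return detections
```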
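For the whitening option, two common interpretations are sketched below: per-image standardization and ZCA whitening of a batch of flattened patches. Both function names are hypothetical, and which variant we actually want is still open:

```python
import numpy as np

def standardize(image, eps=1e-8):
    """Per-image whitening: zero mean, unit variance."""
    image = image.astype(np.float32)
    return (image - image.mean()) / (image.std() + eps)

def zca_whiten(patches, eps=1e-2):
    """ZCA whitening of a batch of flattened patches (N x D): decorrelates
    pixels while staying close to the original pixel space."""
    patches = patches - patches.mean(axis=0)        # zero-center each pixel
    cov = np.cov(patches, rowvar=False)             # D x D covariance
    U, S, _ = np.linalg.svd(cov)                    # eigendecomposition
    W = U @ np.diag(1.0 / np.sqrt(S + eps)) @ U.T   # ZCA transform
    return patches @ W
```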
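For thresholding the classifiers' NLL output, a sketch assuming each classifier returns one negative log-likelihood per class (lower means more likely); the threshold value is an arbitrary placeholder to be tuned on validation data:

```python
import numpy as np

def threshold_nll(nll, threshold=0.5, background=-1):
    """Return the argmin class if its NLL is below the threshold,
    else the background label."""
    best = int(np.argmin(nll))
    return best if nll[best] < threshold else background

# Example: an NLL of 0.1 corresponds to probability exp(-0.1) ~ 0.90.
print(threshold_nll(np.array([0.1, 2.5, 3.0])))   # -> 0 (confident)
print(threshold_nll(np.array([1.2, 1.3, 1.4])))   # -> -1 (rejected)
```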
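For downscaling before training, a sketch using Pillow; the integer scale factor is a placeholder:

```python
from PIL import Image

def downscale(path, factor=2):
    """Load an image and shrink it by an integer factor (placeholder value)."""
    img = Image.open(path)
    return img.resize((img.width // factor, img.height // factor), Image.BILINEAR)
```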