Environment perception using deep learning techniques
```
|_ data
|  |_ images
|  |  |_ training
|  |  |  |_ image_2   // KITTI training images are here
|  |  |_ testing
|  |     |_ image_2   // KITTI test images are here
|  |_ labels
|     |_ labels_2     // KITTI training labels are here
|_ lib                // External code is here
```
```
#Values  Name        Description
----------------------------------------------------------------------------
   1     type        Describes the type of object: 'Car', 'Van', 'Truck',
                     'Pedestrian', 'Person_sitting', 'Cyclist', 'Tram',
                     'Misc' or 'DontCare'
   1     truncated   Float from 0 (non-truncated) to 1 (truncated), where
                     truncated refers to the object leaving image boundaries
   1     occluded    Integer (0,1,2,3) indicating occlusion state:
                     0 = fully visible, 1 = partly occluded,
                     2 = largely occluded, 3 = unknown
   1     alpha       Observation angle of object, ranging [-pi..pi]
   4     bbox        2D bounding box of object in the image (0-based index):
                     contains left, top, right, bottom pixel coordinates
   3     dimensions  3D object dimensions: height, width, length (in meters)
   3     location    3D object location x, y, z in camera coordinates (in meters)
   1     rotation_y  Rotation ry around Y-axis in camera coordinates [-pi..pi]
   1     score       Only for results: Float, indicating confidence in
                     detection, needed for precision/recall curves; higher is better
```
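Each line of a KITTI label file is one whitespace-separated record with the fields above (15 values for ground truth, 16 when a score is appended). A minimal parsing sketch in Python; `KittiObject` and `parse_kitti_label_file` are hypothetical names for this repo, not part of the KITTI devkit:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class KittiObject:
    type: str                      # 'Car', 'Pedestrian', ..., 'DontCare'
    truncated: float               # 0 (non-truncated) .. 1 (truncated)
    occluded: int                  # 0=visible, 1=partly, 2=largely, 3=unknown
    alpha: float                   # observation angle [-pi..pi]
    bbox: List[float]              # left, top, right, bottom (pixels)
    dimensions: List[float]        # height, width, length (meters)
    location: List[float]          # x, y, z in camera coordinates (meters)
    rotation_y: float              # rotation around Y-axis [-pi..pi]
    score: Optional[float] = None  # only present in result files

def parse_kitti_label_file(path: str) -> List[KittiObject]:
    objects = []
    with open(path) as f:
        for line in f:
            v = line.split()
            objects.append(KittiObject(
                type=v[0],
                truncated=float(v[1]),
                occluded=int(float(v[2])),
                alpha=float(v[3]),
                bbox=[float(x) for x in v[4:8]],
                dimensions=[float(x) for x in v[8:11]],
                location=[float(x) for x in v[11:14]],
                rotation_y=float(v[14]),
                score=float(v[15]) if len(v) > 15 else None,
            ))
    return objects
```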
- [ ] Download the dataset
- [ ] Load the data into our current model
- [ ] Listen to YouTube lectures and read papers to get a more profound understanding of the layers of a convolutional network
- [ ] Define a first model for the dataset
- [ ] Sliding window should work with cascading classifiers
- [ ] Sliding window should work with different receptive sizes and stages (a multi-scale cascade sketch follows this list)
- [ ] Add a whitening option to the preprocessing (see the whitening sketch below)
- [ ] Threshold the NLL output of the classifiers (see the thresholding sketch below)
- Downscale images before training (see the downscaling sketch below)
- How do conv nets correctly handle input images of different sizes? What is the best way to deal with downsampled images? Can we feed conv nets inputs of different sizes?
- More resources on what the individual layers of a conv net are actually doing.
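A minimal multi-scale sliding-window sketch in Python/NumPy. The window sizes, the stride, and the `stages` list of cascade classifiers are placeholder assumptions, not existing code in this repo:

```python
import numpy as np

def sliding_windows(image, window_sizes=((32, 32), (64, 64), (128, 128)), stride=16):
    """Yield (x, y, patch) for every window position at every window size."""
    h, w = image.shape[:2]
    for win_h, win_w in window_sizes:
        for y in range(0, h - win_h + 1, stride):
            for x in range(0, w - win_w + 1, stride):
                yield x, y, image[y:y + win_h, x:x + win_w]

def detect(image, stages):
    """Run a cascade: each stage is a callable patch -> bool, ordered from
    cheap to expensive, so most windows are rejected early."""
    detections = []
    for x, y, patch in sliding_windows(image):
        if all(stage(patch) for stage in stages):
            detections.append((x, y, patch.shape[1], patch.shape[0]))
    return detections
```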
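For the whitening option, two common interpretations are sketched below: per-image standardization and ZCA whitening of a batch of flattened patches. Both function names are hypothetical, and which variant we actually want is still open:

```python
import numpy as np

def standardize(image, eps=1e-8):
    """Per-image whitening: zero mean, unit variance."""
    image = image.astype(np.float32)
    return (image - image.mean()) / (image.std() + eps)

def zca_whiten(patches, eps=1e-2):
    """ZCA whitening of a batch of flattened patches (N x D): decorrelates
    pixels while staying close to the original pixel space."""
    patches = patches - patches.mean(axis=0)        # zero-center each pixel
    cov = np.cov(patches, rowvar=False)             # D x D covariance
    U, S, _ = np.linalg.svd(cov)                    # eigendecomposition
    W = U @ np.diag(1.0 / np.sqrt(S + eps)) @ U.T   # ZCA transform
    return patches @ W
```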
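For thresholding the classifiers' NLL output, a sketch assuming each classifier returns one negative log-likelihood per class (lower means more likely); the threshold value is an arbitrary placeholder to be tuned on validation data:

```python
import numpy as np

def threshold_nll(nll, threshold=0.5, background=-1):
    """Return the argmin class if its NLL is below the threshold,
    else the background label."""
    best = int(np.argmin(nll))
    return best if nll[best] < threshold else background

# Example: an NLL of 0.1 corresponds to probability exp(-0.1) ~ 0.90.
print(threshold_nll(np.array([0.1, 2.5, 3.0])))   # -> 0 (confident)
print(threshold_nll(np.array([1.2, 1.3, 1.4])))   # -> -1 (rejected)
```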
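For downscaling before training, a sketch using Pillow; the integer scale factor is a placeholder:

```python
from PIL import Image

def downscale(path, factor=2):
    """Load an image and shrink it by an integer factor (placeholder value)."""
    img = Image.open(path)
    return img.resize((img.width // factor, img.height // factor), Image.BILINEAR)
```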