A dataset of 3d cubes for learning pose and rotation.

yvan/cLPR

cLPR: Generating a 3D Dataset for Learning Pose and Rotation

Today I'm releasing the cLPR dataset, a baseline dataset for doing work in 3D machine learning. The name stands for Cubes for Learning Pose and Rotation. The goal is to provide a simple baseline dataset of colored 3D cubes in many different positions for testing machine learning algorithms whose goal is to learn about pose and rotation.

The data:

The dataset contains 32x32, 64x64, 128x128, and 500x500 jpg images of 3 cubes (2D projections of the 3D object). Two of the cubes are essentially the same cube with the faces moved around, so when a red face is facing you on each of them you should expect to see completely different position/rotation coordinates. The third cube has different colors.

The labels:

The labels are provided in a binary numpy file called cubeN-x.xx.x.npy, where x.xx.x is the numpy version used to dump the file and N is the cube the file describes. For every frame it contains the position of each node and the rotations applied to them for that cube. So the structure for one cube would look like:

np.array([
.
.
.
[[x,y,z,xrot,yrot,zrot],[x,y,z,xrot,yrot,zrot],[...],[...],[...],[...],[...],[...]], <-- single frame/rotation
[[x,y,z,xrot,yrot,zrot],[x,y,z,xrot,yrot,zrot],[...],[...],[...],[...],[...],[...]],
.
.
.
])

It's a FRAMES x NODES x 6 numpy array, where the first 3 values of the last dimension are the x,y,z position and the last 3 values are the rotation. You can also just create whatever labels you want using the script.
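Here is a minimal sketch of reading the labels, assuming the FRAMES x NODES x 6 layout described above. The file name and the all-zero array are placeholders I made up so the snippet runs without the dataset; point `np.load` at the real cubeN-x.xx.x.npy file instead.

```python
import os
import tempfile

import numpy as np

# Placeholder label file: 5 frames, 8 nodes, 6 values per node.
# (Made-up data so the example runs without the real dataset.)
fake_labels = np.zeros((5, 8, 6))
path = os.path.join(tempfile.mkdtemp(), "cube_labels_example.npy")
np.save(path, fake_labels)

labels = np.load(path)         # shape: (FRAMES, NODES, 6)
positions = labels[:, :, :3]   # x, y, z of every node in every frame
rotations = labels[:, :, 3:]   # xrot, yrot, zrot stored alongside each node

print(labels.shape, positions.shape, rotations.shape)
```

Slicing the last dimension like this keeps positions and rotations aligned frame by frame, which is handy if you only want one of them as a training target.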

Making the dataset:

The dataset is produced using pygame, a package originally designed for making Python games. The package is nice because it lets you easily draw and render objects in 2D or 3D. OpenGL is another option, but I didn't use it for a few reasons: it's hard to use, full of jargon, and the code is difficult to read. In the future there may be an OpenGL version of the data generator for GPU-optimized data generation. I wanted the code to be accessible to people without GPU access and easy enough to understand for someone with limited experience and a basic knowledge of matrix multiplication.

The code used to generate the dataset is on GitHub.

Let's go over how the code works.

Step 1 Create a wireframe

First we create a wireframe of the cube. This wireframe consists of nodes, edges, and faces. The nodes are points in space, each represented with x,y,z coordinates. We store all the nodes in an Nx3 numpy array. Edges are two nodes (the indices of the nodes in the numpy array). Faces are 4 nodes connected together (again the indices of the nodes in the numpy array).

You can create a wireframe like this:

import numpy as np
from wireframe import Wireframe  # assumes the class is importable from wireframe.py

# create the nodes: the 8 corners of a unit cube
cube_nodes = [[x,y,z] for x in (0,1) for y in (0,1) for z in (0,1)]
cube_nodes = np.array(cube_nodes)
# create the faces: each face lists the indices of its 4 nodes
cube_faces = [[0,1,3,2], [7,5,4,6], [4,5,1,0], [2,3,7,6], [0,2,6,4], [5,7,3,1]]
cube_faces = np.array(cube_faces)
# add one RGB color per face
cube_colors = [[255, 255, 255], [154,205,50], [128,0,0], [70,130,180], [75,0,130], [199,21,133]]
cube_colors = np.array(cube_colors)
# create a cube from the nodes, faces, and colors
cube = Wireframe(cube_nodes, cube_faces, cube_colors)

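I use this to create 8 nodes, 6 faces, and 24 edges (4 per face; a cube has 12 distinct edges, so presumably each one is counted once for each of the two faces it borders). You don't actually need to specify edges, just nodes and faces. This creates a cube. You can run in bash: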

python wireframe.py

The only dependencies are Python's copy module and numpy. It will automatically create a cube and print the nodes, edges, and faces.
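Since only nodes and faces are specified, the edges have to come from the faces. Below is a rough sketch of one way to derive them. `edges_from_faces` is a hypothetical helper written for this example, not a function from wireframe.py, and the repo may do this differently.

```python
import numpy as np

cube_faces = np.array([[0, 1, 3, 2], [7, 5, 4, 6], [4, 5, 1, 0],
                       [2, 3, 7, 6], [0, 2, 6, 4], [5, 7, 3, 1]])


def edges_from_faces(faces):
    """Hypothetical helper: connect each node of a face to the next one, wrapping around."""
    edges = []
    for face in faces:
        for i in range(len(face)):
            start = int(face[i])
            end = int(face[(i + 1) % len(face)])
            edges.append((start, end))
    return edges


edges = edges_from_faces(cube_faces)
# Treat (a, b) and (b, a) as the same edge to count the distinct ones.
unique_edges = {tuple(sorted(edge)) for edge in edges}
print(len(edges), len(unique_edges))
```

With these six faces the helper yields 24 face edges, of which 12 are distinct, because every edge of a cube is shared by two faces.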

Step 2 Generate rotations

Then I generate a set of rotations I'd like to apply in 3D to the nodes of the cube (the corners of the cube). To generate the rotations I just loop through all combinations of rotations about the x, y, and z axes, from 0 to roughly 6.3 radians in steps of 0.3.

rotations = []
for rot_x in np.arange(0, 6.3, 0.3):
    for rot_y in np.arange(0, 6.3, 0.3):
        for rot_z in np.arange(0, 6.3, 0.3):
            rotations.append((rot_x,rot_y,rot_z))

The above snippet covers a full 360 degrees of rotation (2π ≈ 6.28 radians) about each axis. Then on every frame/iteration of pygame we grab one of these rotations and display it.
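The same grid can be written more compactly with `itertools.product`. This is just an alternative sketch, not what the repo does: here I use 2π as the upper bound instead of 6.3 and pull the step out into a variable (both are my choices for illustration).

```python
import itertools

import numpy as np

step = 0.3  # radians between successive angles on each axis
angles = np.arange(0, 2 * np.pi, step)  # stops before a full turn

# Every (rot_x, rot_y, rot_z) combination; the last element varies fastest,
# which matches the order of the nested loops with rot_z innermost.
rotations = list(itertools.product(angles, repeat=3))

print(len(angles), len(rotations))
```

The number of frames grows with the cube of the number of angles per axis, so halving the step gives roughly eight times as many images.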

Step 3 Render the wireframe

I render the wireframe using pygame's function for drawing polygons. The function takes a sequence of 2D points (here the x,y coordinates of the nodes of one face) and draws a polygon on screen with the given color. Below is a quick and dirty version of my code.

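# draw each face as a filled polygon, using only the x,y of its nodes
# (colors is assumed to hold one RGB color per face, in the same order as faces)
for face, color in zip(faces, colors):
    pygame.draw.polygon(self.screen,
                        color,
                        [(nodes[node,0], nodes[node,1]) for node in face],
                        0)  # width 0 fills the polygon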
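The list comprehension in the drawing call keeps only the x and y of each node, which amounts to an orthographic projection. Here is a pygame-free sketch of building those point lists. The `scale` and `offset` values are assumptions I picked to map a unit cube into a window, and sorting faces back to front by mean z is one common way to get filled faces to overlap correctly, not necessarily what the repo does.

```python
import numpy as np

nodes = np.array([[x, y, z] for x in (0, 1) for y in (0, 1) for z in (0, 1)],
                 dtype=float)
faces = [[0, 1, 3, 2], [7, 5, 4, 6], [4, 5, 1, 0],
         [2, 3, 7, 6], [0, 2, 6, 4], [5, 7, 3, 1]]

scale, offset = 100.0, 50.0  # assumed values: pixels per unit and window margin


def face_polygons(nodes, faces, scale, offset):
    """Return (mean_z, [(x, y), ...]) for each face, sorted back to front."""
    polygons = []
    for face in faces:
        # Drop z and map the remaining x,y into screen coordinates.
        points = [(float(nodes[n, 0] * scale + offset),
                   float(nodes[n, 1] * scale + offset)) for n in face]
        mean_z = float(np.mean([nodes[n, 2] for n in face]))
        polygons.append((mean_z, points))
    # Assumption: larger z is closer to the viewer, so small z is drawn first.
    return sorted(polygons, key=lambda item: item[0])


polygons = face_polygons(nodes, faces, scale, offset)
for mean_z, points in polygons:
    print(mean_z, points)
```

Each `points` list has the shape `pygame.draw.polygon` expects, so the loop at the end is where the drawing call would go.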

Step 4 Apply one rotation and re-center the cube

To apply a rotation to this cube I use a rotation matrix for the appropriate axis. So to rotate around the x axis I would use create_rot_x, which returns the matrix for that axis. These functions are located in the wireframe.py file but are not part of the wireframe class. Then I just do a dot product in the transform function between the nodes of the wireframe and the rotation matrix we created. All this does is multiply the x,y,z positions of our nodes by the right numbers in the matrix such that the new positions are rotated by the requested number of radians.
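To make the matrix step concrete, here is a sketch using the standard rotation matrices for each axis. The function names are my own stand-ins (I'm not reproducing the signatures of create_rot_x and friends), and rotating about the centroid is my guess at what "re-center" means here.

```python
import numpy as np


def rotation_x(theta):
    """3x3 matrix rotating column vectors by theta radians about the x axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1, 0, 0],
                     [0, c, -s],
                     [0, s, c]])


def rotation_y(theta):
    """3x3 matrix rotating column vectors by theta radians about the y axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0, s],
                     [0, 1, 0],
                     [-s, 0, c]])


def rotation_z(theta):
    """3x3 matrix rotating column vectors by theta radians about the z axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0],
                     [s, c, 0],
                     [0, 0, 1]])


def rotate_about_center(nodes, matrix):
    """Rotate an N x 3 array of nodes (one per row) about their centroid."""
    center = nodes.mean(axis=0)
    # Nodes are stored as rows, so multiply by the transpose of the matrix.
    return (nodes - center) @ matrix.T + center


cube_nodes = np.array([[x, y, z] for x in (0, 1) for y in (0, 1) for z in (0, 1)],
                      dtype=float)
rotated = rotate_about_center(cube_nodes, rotation_x(np.pi / 2))
print(rotated.round(3))
```

Subtracting the centroid before the dot product and adding it back afterwards keeps the cube in place on screen; without it the cube would swing around the origin instead of spinning on the spot.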

Step 5 Reset the cube nodes to their initial position

Just store the initial node positions and restore them after each rotation. There's a method for this:

cube.reset_nodes()
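For illustration, a reset like this can be as small as the sketch below. `MiniWireframe` is a hypothetical stand-in, not the class from wireframe.py; it only shows the idea of keeping an untouched copy of the starting nodes.

```python
import copy

import numpy as np


class MiniWireframe:
    """Hypothetical stand-in for the repo's wireframe class."""

    def __init__(self, nodes):
        self.nodes = np.array(nodes, dtype=float)
        # Keep a separate copy so later changes to self.nodes can't touch it.
        self._initial_nodes = copy.deepcopy(self.nodes)

    def reset_nodes(self):
        """Restore the nodes to the positions they had at construction."""
        self.nodes = copy.deepcopy(self._initial_nodes)


cube = MiniWireframe([[0, 0, 0], [1, 1, 1]])
cube.nodes = cube.nodes + 5.0  # pretend a transformation moved the nodes
cube.reset_nodes()
print(cube.nodes)
```

Resetting before each new rotation means every frame is the initial cube with exactly one rotation applied, so the stored rotation values describe the pose directly instead of accumulating.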

Step 6 Repeat from step 3

While I do this I screenshot the pygame screen after every rotation and store the position information in a numpy array. During the process I write the images to disk, and at the end I store the numpy array of labels as well.
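The label side of that loop might look roughly like the sketch below. It leaves out pygame and the screenshots entirely. The combined x-then-y-then-z rotation order, the rotation about the origin, and the three sample rotations are all assumptions made for the example.

```python
import numpy as np


def rotation_matrix(rx, ry, rz):
    """Combined rotation about x, then y, then z (an assumed order)."""
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    rot_x = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    rot_y = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    rot_z = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return rot_z @ rot_y @ rot_x


initial_nodes = np.array([[x, y, z] for x in (0, 1) for y in (0, 1) for z in (0, 1)],
                         dtype=float)
rotations = [(0.0, 0.0, 0.0), (0.3, 0.0, 0.0), (0.3, 0.6, 0.9)]  # tiny subset

frames = []
for rot in rotations:
    # Always start from the initial nodes, i.e. the cube after a reset.
    nodes = initial_nodes @ rotation_matrix(*rot).T
    # (this is where the pygame window would be drawn and saved to disk)
    rot_columns = np.tile(rot, (len(nodes), 1))     # same rotation for every node
    frames.append(np.hstack([nodes, rot_columns]))  # NODES x 6

labels = np.stack(frames)  # FRAMES x NODES x 6
print(labels.shape)
```

From here a single `np.save` call on `labels` would write the array out in the .npy format described in the labels section.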

Final step

Visual inspection and programmatic testing of labels. Basically I play back the numpy file and also run it through a function which compares it against known correct values for the positions at every section.
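I don't reproduce the known-correct values here, so the sketch below checks invariants instead: the array shape, that every node in a frame carries the same rotation, and that node-to-node distances never change (a rotation is rigid). `check_labels` is a hypothetical helper, not the test function from the repo.

```python
import numpy as np


def check_labels(labels, n_nodes=8, atol=1e-6):
    """Return a list of problems found in a FRAMES x NODES x 6 label array."""
    if labels.ndim != 3 or labels.shape[1:] != (n_nodes, 6):
        return [f"unexpected shape {labels.shape}"]

    problems = []
    positions, rotations = labels[:, :, :3], labels[:, :, 3:]

    # Every node in a frame should carry the same rotation triple.
    if not np.allclose(rotations, rotations[:, :1, :], atol=atol):
        problems.append("rotation differs between nodes of one frame")

    def pairwise(points):
        """Distances between every pair of nodes in one frame."""
        return np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

    # Rotations are rigid, so distances must match those of frame 0.
    reference = pairwise(positions[0])
    for i, frame in enumerate(positions):
        if not np.allclose(pairwise(frame), reference, atol=atol):
            problems.append(f"frame {i} is not a rigid motion of frame 0")
    return problems


nodes = np.array([[x, y, z] for x in (0, 1) for y in (0, 1) for z in (0, 1)],
                 dtype=float)
good = np.zeros((2, 8, 6))
good[:, :, :3] = nodes
bad = good.copy()
bad[1, 0, :3] += 0.5  # move a single node in frame 1

print(check_labels(good))
print(check_labels(bad))
```

Checks like these won't catch a label that is rigid but simply wrong, so they complement a comparison against known values and don't replace it.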

Some things you can do with cLPR:

1 - Use unsupervised learning, such as variational autoencoders, to learn a representation that memorizes the content of the cube and the pose.

2 - Predict the position of the nodes of the cube from the visual 2D projection.

3 - Learn a representation for each cube.

Thanks

I want to express my thanks to Peter Collingridge, who has some good pygame tutorials. While I have done other blogs with pygame, his overview was very helpful, and the structure of my code borrows heavily from his Python 2 implementation.
