RGB-D Scenes Dataset

This dataset contains 8 scenes annotated with a subset of the objects in the RGB-D Object Dataset (bowls, caps, cereal boxes, coffee mugs, and soda cans). Each scene is a point cloud created by aligning a set of video frames using RGB-D Mapping*. These 3D reconstructions and ground truth object annotations are exactly those used in our ICRA 2012 paper (see below).

Windows users: The dataset was compressed into a tarball using Linux, and some Windows extractors have problems reading the files. One program that can extract the data on Windows is 7zip ( http://www.7-zip.org/ ). Open the single file extracted from the tarball again with 7zip to unpack it into a directory.

The files are named as follows:

.pcd - A 3D point cloud stored in PCD format, readable with the ROS Point Cloud Library (PCL). Each point is stored with 4 fields: the 3D coordinate (x, y, z) and the color packed into 24 bits with 8 bits per channel (rgb). Software for reading PCD files is available at http://www.cs.washington.edu/rgbd-dataset/software.html.

.pose - The camera poses of each constituent frame estimated by RGB-D Mapping, where each line is one video frame. The two numbers before the colon form the timestamp. The 7 numbers after the colon are the camera pose (a,b,c,d,x,y,z), where a,b,c,d is the camera orientation expressed as a quaternion (scalar component a first, as used by quatrotate in the example below) and x,y,z is the camera position in the global coordinate frame. The camera pose of the first frame of each video always coincides with the origin of the global coordinate frame.

.label - The object labels of each 3D point in the scene. The first number is the number of points in the scene. This is followed by the labels of every point, in the same order as they appear in the .pcd file, where 0 is background, 1 is bowl, 2 is cap, 3 is cereal box, 4 is coffee mug, and 5 is soda can.

MATLAB sketches for unpacking the packed rgb field and for parsing the .pose and .label files are given at the end of this file, after the coordinate-transform example below.

MATLAB code example for transforming 3D points of a video frame into the global coordinate frame:

Supposing "poses" is an Nx7 matrix storing the camera poses of the N video frames, the following code snippet transforms a set of 3D points from the depth image "depth" of the i-th video frame into the global coordinate frame.

pcloud = depthToCloud(depth);                             % 3D points from the depth image
pts = reshape(pcloud, size(pcloud,1)*size(pcloud,2), 3);  % one point per row
pts = bsxfun(@plus, poses(i,5:7), quatrotate([poses(i,1) -poses(i,2:4)], pts));  % rotate into the global frame and add the camera position

*Peter Henry, Michael Krainin, Evan Herbst, Xiaofeng Ren, and Dieter Fox. RGB-D Mapping: Using Kinect-Style Depth Cameras for Dense 3D Modeling of Indoor Environments. International Journal of Robotics Research, 2012.

Please cite the following papers if you use this dataset:

Detection-based Object Labeling in 3D Scenes
Kevin Lai, Liefeng Bo, Xiaofeng Ren, and Dieter Fox
IEEE International Conference on Robotics and Automation (ICRA), May 2012.

A Large-Scale Hierarchical Multi-View RGB-D Object Dataset
Kevin Lai, Liefeng Bo, Xiaofeng Ren, and Dieter Fox
IEEE International Conference on Robotics and Automation (ICRA), May 2011.
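
MATLAB code sketch for unpacking the packed 24-bit rgb field of a point:

This is a minimal sketch, not part of the official loaders linked above. It assumes the rgb value of one point has been read as a single-precision float whose bit pattern packs the three 8-bit channels (the usual PCL packing, with red in the highest 8 bits); the variable name "rgbfloat" is only illustrative.

packed = typecast(single(rgbfloat), 'uint32');    % reinterpret the float bits as an integer
r = bitand(bitshift(packed, -16), uint32(255));   % bits 16-23
g = bitand(bitshift(packed,  -8), uint32(255));   % bits 8-15
b = bitand(packed, uint32(255));                  % bits 0-7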
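
MATLAB code sketch for reading a .pose file into the Nx7 "poses" matrix used above:

This is a minimal sketch assuming each line has the form "timestamp timestamp: a b c d x y z" as described above; the file name "scene.pose" is only an example.

fid = fopen('scene.pose', 'r');
lines = textscan(fid, '%s', 'Delimiter', '\n');   % one string per video frame
fclose(fid);
lines = lines{1};
poses = zeros(numel(lines), 7);
for i = 1:numel(lines)
    parts = strsplit(lines{i}, ':');              % split off the timestamp
    poses(i,:) = sscanf(parts{2}, '%f', 7)';      % orientation quaternion a,b,c,d and position x,y,z
end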
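
MATLAB code sketch for reading a .label file and mapping the numeric labels to object names:

This is a minimal sketch following the .label format described above; the file name "scene.label" and the variable names are only illustrative.

names = {'background', 'bowl', 'cap', 'cereal box', 'coffee mug', 'soda can'};
fid = fopen('scene.label', 'r');
npts = fscanf(fid, '%d', 1);          % first number: number of points in the scene
labels = fscanf(fid, '%d', npts);     % one label per point, in the same order as the .pcd file
fclose(fid);
pointNames = names(labels + 1);       % e.g. label 3 -> 'cereal box'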