We introduce Room Envelopes, a synthetic dataset that provides dual pointmap representations for indoor scene reconstruction. Each image comes with three complementary views: RGB image, visible surface (depth and normals), and layout surface (depth and normals), with examples below. The visible surface captures all directly visible geometry including furniture and objects, while the layout surface shows structural elements as they would appear without occlusion. This dual representation enables direct supervision for layout reconstruction in occluded regions.
Figure: two example scenes, each shown as an RGB image, the visible surface (depth and normals), and the layout surface (depth and normals).
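Concretely, a pointmap is the per-pixel unprojection of a depth map into camera-space 3D points, so each dataset view stores two aligned HxWx3 arrays. A minimal sketch of the unprojection under a pinhole camera model (the intrinsics `fx`, `fy`, `cx`, `cy` below are illustrative values, not the dataset's actual calibration):

```python
import numpy as np

def depth_to_pointmap(depth, fx, fy, cx, cy):
    """Unproject an HxW depth map into an HxWx3 pointmap of
    camera-space XYZ coordinates, assuming a pinhole camera."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel grid
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return np.stack([x, y, depth], axis=-1)

# Toy example: a flat wall 2 m in front of the camera.
depth = np.full((4, 4), 2.0)
pm = depth_to_pointmap(depth, fx=2.0, fy=2.0, cx=1.5, cy=1.5)
```

The same routine applies to both the visible-surface and layout-surface depth maps, since they share the camera pose and intrinsics.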
Room Envelopes Dataset: An indoor synthetic dataset providing an image, a visible surface pointmap, and a layout surface pointmap for each camera pose
Feed-forward Scene Reconstruction: A model demonstrating effective room layout estimation using this dataset
Novel Representation: To our knowledge, the only dataset that pairs first-visible-surface depth with layout depth for every view, enabling supervision of both visible geometry and occluded structure
Current models for indoor scene reconstruction are typically trained on depth images or layered depth representations. These formats capture only what the camera directly observes, so structural elements hidden behind furniture and clutter receive no supervision.
Room Envelopes addresses this limitation by pairing every image with two aligned pointmaps, one for the visible surface and one for the layout surface, providing direct supervision for structural geometry even where it is occluded.
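One useful consequence of the dual representation: comparing the two depth maps per pixel reveals exactly where the layout is hidden behind objects. A hedged sketch (the comparison rule and the tolerance `eps` are illustrative assumptions, not the dataset's official tooling):

```python
import numpy as np

def occluded_layout_mask(visible_depth, layout_depth, eps=1e-3):
    """Pixels where the layout surface lies behind the visible
    surface are occluded by furniture or objects; elsewhere the
    two surfaces coincide (bare walls, floor, ceiling)."""
    return layout_depth > visible_depth + eps

visible = np.array([[2.0, 1.2],   # a chair at 1.2 m occludes
                    [2.0, 2.0]])  # the wall at one pixel
layout = np.array([[2.0, 2.0],
                   [2.0, 2.0]])   # wall behind at 2 m everywhere
mask = occluded_layout_mask(visible, layout)
```

Such a mask makes it possible to weight or analyze losses specifically in occluded regions, where depth-only training data offers no signal.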
We trained a layout estimation model by fine-tuning a feed-forward depth estimator and ran it on real-world indoor scenes. Although trained exclusively on our synthetic dataset, the model generalizes well to real environments.
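The exact fine-tuning objective is not spelled out here, so the following is only a plausible sketch of a supervision term for such a model: a masked L1 loss on the predicted layout pointmap plus a cosine loss on predicted unit normals, averaged over valid pixels. All names and the loss combination are hypothetical.

```python
import numpy as np

def layout_losses(pred_pts, gt_pts, pred_nrm, gt_nrm, valid):
    """Masked per-pixel L1 loss on the layout pointmap plus a
    cosine loss on unit normals, averaged over valid pixels."""
    l1 = np.abs(pred_pts - gt_pts).sum(-1)[valid].mean()
    cos = 1.0 - (pred_nrm * gt_nrm).sum(-1)[valid].mean()
    return l1, cos

# Toy check: predictions offset by 0.5 m on every axis,
# normals predicted exactly.
gt_pts = np.zeros((2, 2, 3))
pred_pts = gt_pts + 0.5
gt_nrm = np.zeros((2, 2, 3))
gt_nrm[..., 2] = 1.0  # all normals face the camera
valid = np.ones((2, 2), dtype=bool)
l1, cos = layout_losses(pred_pts, gt_pts, gt_nrm, gt_nrm, valid)
```

Here `valid` would exclude pixels without ground-truth layout geometry; with perfectly predicted normals the cosine term vanishes.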
Figure: qualitative results on real-world scenes, showing the input RGB image, the predicted layout depth, and the predicted layout normals.
@article{bahrami2025roomenvelopes,
  title={Room Envelopes: A Synthetic Dataset for Indoor Layout Reconstruction from Images},
  author={Sam Bahrami and Dylan Campbell},
  year={2025},
  eprint={2511.03970},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2511.03970},
}