Leveraging Multi-View Information for Scene Understanding
ELKHOULY, MOHAMED DAHY ABDELZAHER
2021-03-05
Abstract
Humans can effortlessly recognize and perceive information about 3D scenes as they pass through dozens of them every day, e.g. offices, bedrooms, elevators, and kitchens. Although this process is effortless for humans, it has remained an open challenge in computer vision since the field was established 60 years ago. 3D scene understanding from multiple views is key to the success of applications such as robot navigation and autonomous driving. In this thesis, we seek to exploit multi-view scene information for indoor scene understanding, overcoming challenges that depend on visual effects such as shadows, specularities, and occlusions. Towards this goal, we propose techniques based on multi-view scenes with corresponding 3D geometry to estimate semantic color names, detect multi-view specularities, estimate multi-view 3D mesh colors, and estimate light source positions. We build on available large-scale datasets (e.g. Matterport3D) to annotate real-world image datasets, namely the Matterport3D Color Naming Dataset (M3DCN) and the Matterport3D Light Sources dataset (M3DLS), and we generate and render a new synthetic 3D dataset, the LIGHT Specularity Dataset (LIGHTS). These serve as evaluation and analysis datasets for the semantic color-naming, light source position estimation, and highlight specularity detection problems, respectively. We demonstrate the effectiveness of our proposed techniques in comparison with the state of the art and show significant improvements over the baselines in quantitative and qualitative evaluations.
File: phdunige_4459589.pdf (open access)
Type: Doctoral thesis
Size: 17.01 MB
Format: Adobe PDF
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.