Abstract: Monocular three-dimensional (3D) scene understanding tasks, e.g., object size angle and 3D position, estimation are challenging to perform. More successful current methods usually require ...