From Monocular to Learned vSLAM

Authors

Abstract

Size, Weight and Power (SWaP) constraints in robotics lead vSLAM strategies to prefer monocular cameras for their high information-to-weight ratio and miniature size. Conventional monoSLAM methods compete with stereo and RGB-D SLAM in localization accuracy; however, their 3D reconstruction of the environment is limited to sparse point clouds. Degraded performance under pure camera rotation, inherent scale ambiguity, and difficult map initialization are a few of the many impediments to monoSLAM, all of which arise because the depth of the scene is lost when it is captured by a single camera. Mitigating these issues has traditionally demanded increasingly complex algorithms. In this regard, deep learning architectures have given vSLAM a new tool: networks that predict depth maps from learned monocular cues, or even regress the full sensor state by learning optical flow. The amalgamation of these CNNs with conventional vSLAM strategies has given birth to a new class of vision systems: Learned vSLAM. Motivated by the success of these intelligent vSLAM architectures and their potential role in realizing truly miniature robots, we provide a comprehensive review of Learned vSLAM strategies, their advantages over conventional monoSLAM, and their remaining limitations.
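
To make the scale-ambiguity remedy concrete, the following minimal Python/NumPy sketch shows one common way a learned depth map can recover the metric scale that a monocular camera loses: the up-to-scale depths of triangulated map points are aligned to CNN-predicted depths through a robust median ratio, in the spirit of systems such as CNN-SVO. The sketch is illustrative only; predict_depth and the surrounding names are hypothetical placeholders for an actual depth network and SLAM pipeline.

    import numpy as np

    def estimate_scale(slam_depths, cnn_depths):
        """Estimate the global metric scale of a monocular SLAM map.

        slam_depths: up-to-scale depths of tracked map points (triangulation)
        cnn_depths:  CNN-predicted depths sampled at the same pixel locations
        """
        slam_depths = np.asarray(slam_depths, dtype=float)
        cnn_depths = np.asarray(cnn_depths, dtype=float)
        valid = (slam_depths > 0) & (cnn_depths > 0)
        # Median of per-point ratios: robust to outliers in either the
        # sparse map or the learned depth map.
        return np.median(cnn_depths[valid] / slam_depths[valid])

    # Hypothetical usage inside a tracking loop:
    # depth_map = predict_depth(frame)               # learned monocular cue
    # s = estimate_scale(map_point_depths, depth_map[us, vs])
    # map_points *= s                                # rescale map to metric units

A median ratio is preferred over a mean so that gross errors in either the sparse map or the learned depth map do not skew the estimated scale.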

Published

2022-03-24