YOLO Architecture-based Object Detection for Optimizing Performance in Video Streams


© 2022 by IJETT Journal
Volume-70 Issue-11
Year of Publication : 2022
Authors : M. Maheswari, M. S. Josephine, V. Jeyabalaraja
DOI : 10.14445/22315381/IJETT-V70I11P220

How to Cite?

M. Maheswari, M. S. Josephine, V. Jeyabalaraja, "YOLO Architecture-based Object Detection for Optimizing Performance in Video Streams," International Journal of Engineering Trends and Technology, vol. 70, no. 11, pp. 187-196, 2022. Crossref, https://doi.org/10.14445/22315381/IJETT-V70I11P220

Capturing high-quality images has become simple and inexpensive thanks to rapid improvements in capture devices. A video is a sequence of images taken at regular time intervals, and it provides additional information about an object as conditions change over time. Handling objects in videos manually is very difficult, so the process must be automated. In recent years, many techniques based on training deep neural networks have been developed to improve object detection accuracy, but they are computationally intensive. In certain situations, most of a video frame is background, and the salient objects occupy only a small part of the frame. There is also a strong temporal correlation between consecutive frames in a video. Based on these observations, this work proposes a Convolutional Neural Network (CNN) that reduces the computational requirements of video object detection. The new CNN architecture builds on an enhanced YOLO framework to classify and detect objects. The proposed model achieves an accuracy of 96.7% in classifying objects.
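The two observations in the abstract (mostly static background, strong correlation between consecutive frames) can be illustrated with a small sketch. This is not the authors' architecture, which the abstract does not specify; it is a generic frame-differencing gate, assuming grayscale frames stored as 2-D NumPy arrays. The function name `select_frames` and the thresholds `pixel_thresh` and `min_changed_fraction` are hypothetical, chosen only for illustration. The idea is that a detector such as YOLO would be run only on the frames, and ideally only on the regions, that the gate returns.

```python
import numpy as np


def select_frames(frames, pixel_thresh=25, min_changed_fraction=0.01):
    """Return (index, roi) pairs for frames worth sending to a detector.

    Assumption: each frame is a 2-D uint8 grayscale array of the same shape.
    roi is (x0, y0, x1, y1), the box bounding the pixels that changed
    relative to the last selected frame.
    """
    selected = []
    reference = None
    for index, frame in enumerate(frames):
        if reference is None:
            # Always process the first frame in full.
            height, width = frame.shape
            selected.append((index, (0, 0, width, height)))
            reference = frame
            continue
        # Signed arithmetic avoids uint8 wrap-around in the difference.
        diff = np.abs(frame.astype(np.int16) - reference.astype(np.int16))
        mask = diff > pixel_thresh
        # Skip frames that are nearly identical to the reference frame.
        if mask.mean() < min_changed_fraction:
            continue
        ys, xs = np.nonzero(mask)
        roi = (int(xs.min()), int(ys.min()), int(xs.max()) + 1, int(ys.max()) + 1)
        selected.append((index, roi))
        reference = frame
    return selected


if __name__ == "__main__":
    background = np.zeros((100, 100), dtype=np.uint8)
    with_object = background.copy()
    with_object[30:50, 40:60] = 200  # a bright 20x20 "object" appears
    print(select_frames([background, background, with_object, with_object]))
```

In this toy sequence only the first frame and the frame where the object appears are selected; the unchanged frames are skipped, which is where the computational saving would come from under these assumptions.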

Object Detection, Convolutional Neural Networks, Deep Learning, Videos, YOLO, Video Objects, Moving Car Detection
