Real-Time Webcam-Based Hand Gesture Recognition with Face Authentication for 3D Drone Simulation in Godot Engine

Authors

  • Muhammad Rizqi Sholahuddin, Politeknik Negeri Bandung
  • Siti Dwi Setiarini, Politeknik Negeri Bandung
  • Ardhian Ekawijana, Politeknik Negeri Bandung
  • Muhammad Samudera, Politeknik Negeri Bandung
  • Firas Atqiya, Universitas Padjadjaran

DOI:

https://doi.org/10.35194/mji.v18i1.6483

Keywords:

hand gesture recognition, Godot Engine, drone simulation, face authentication, MediaPipe

Abstract

Most hand gesture control systems for drones depend on specialized hardware such as Leap Motion or Kinect, which raises the cost barrier for educational institutions in developing countries. Integrating face authentication within the same low-cost pipeline remains under-explored. This study develops a real-time, webcam-based system that combines Google MediaPipe hand tracking with face authentication and a 3D drone simulation built in Godot Engine 4.3, providing authenticated, responsive gesture control. A finger-counting algorithm classifies eight gestures across two hands: the left hand drives horizontal motion (forward, backward, left, right) and the right hand drives altitude and yaw (up, down, rotate left, rotate right). Commands travel over UDP to Godot, where a receiver node translates each packet into a native input action. Face authentication uses dlib and the face_recognition library with a 60-frame login counter. All metrics were collected under one fixed condition (normal lighting at 300–500 lux, a distance of 0.8 m, one subject). The system achieved 100% gesture accuracy across 160 trials, 35.6 FPS pipeline throughput, 0.33 ms one-way UDP latency with 0% packet loss, and 23.9 ms end-to-end gesture-to-drone latency. Face authentication achieved 100% recognition of the registered user with 0% false rejection rate (FRR) and 19.0% false acceptance rate (FAR) against an unregistered face at the default tolerance of 0.6. These results indicate that a standard-webcam pipeline built entirely from open-source components can deliver responsive, authenticated gesture control for interactive drone simulation. However, the 100% accuracy should be read as an upper bound, because the evaluation was limited to a single subject under controlled lighting and a fixed distance; validation across diverse users and environments is still required.
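
The finger-counting and UDP steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the standard MediaPipe Hands layout of 21 landmarks per hand (fingertips at indices 8, 12, 16, 20 and their PIP joints at 6, 10, 14, 18), ignores the thumb for simplicity, and uses a hypothetical mapping from finger count to command, hypothetical command strings, and a placeholder address and port (127.0.0.1:4242). None of these details are stated in the abstract.

```python
import socket

# MediaPipe Hands landmark indices: fingertips and PIP joints of the
# index, middle, ring, and little fingers (the thumb is ignored here).
FINGER_TIPS = (8, 12, 16, 20)
FINGER_PIPS = (6, 10, 14, 18)

# Hypothetical mapping from (hand label, raised-finger count) to a command.
# The abstract names the eight commands but not which count triggers which.
COMMANDS = {
    ("Left", 1): "forward",
    ("Left", 2): "backward",
    ("Left", 3): "left",
    ("Left", 4): "right",
    ("Right", 1): "up",
    ("Right", 2): "down",
    ("Right", 3): "rotate_left",
    ("Right", 4): "rotate_right",
}


def count_raised_fingers(landmarks):
    """Count non-thumb fingers whose tip lies above its PIP joint.

    `landmarks` is a sequence of 21 (x, y) pairs in image coordinates,
    where y grows downward, so "above" means a smaller y value.
    """
    if len(landmarks) != 21:
        raise ValueError("expected 21 hand landmarks")
    return sum(
        1
        for tip, pip in zip(FINGER_TIPS, FINGER_PIPS)
        if landmarks[tip][1] < landmarks[pip][1]
    )


def classify(hand_label, landmarks):
    """Return the command for this hand, or None if no gesture matches."""
    return COMMANDS.get((hand_label, count_raised_fingers(landmarks)))


def send_command(sock, command, address=("127.0.0.1", 4242)):
    """Send one command as a UTF-8 datagram; the address is a placeholder."""
    sock.sendto(command.encode("utf-8"), address)
```

In use, each frame's landmarks would be passed to `classify`, and any non-None result sent with `send_command(socket.socket(socket.AF_INET, socket.SOCK_DGRAM), command)`. A closed fist (zero raised fingers) maps to no command in this sketch, which gives the operator a neutral pose.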

Published

2026-06-29

How to Cite

Sholahuddin, M. R., Setiarini, S. D., Ekawijana, A., Samudera, M., & Atqiya, F. (2026). Real-Time Webcam-Based Hand Gesture Recognition with Face Authentication for 3D Drone Simulation in Godot Engine. Media Jurnal Informatika, 18(1), 139–151. https://doi.org/10.35194/mji.v18i1.6483