
Hand Tracking and Augmented Reality Interaction

Real-time hand tracking and gesture-based interaction using OpenCV, MediaPipe, and ModernGL

Introduction

This project builds an Augmented Reality (AR) system for real-time hand tracking and gesture-based interaction. Leveraging OpenCV, MediaPipe, and ModernGL, the application captures live video, detects hands in 3D space, and enables intuitive manipulation of virtual 3D objects. The system lets users grab and move a virtual cube with natural gestures such as pinching, creating an immersive AR experience with nothing more than a standard webcam.

System Overview

The AR pipeline is built in Python, integrating several key technologies:

  • MediaPipe for real-time 2D and 3D hand landmark detection.
  • OpenCV for video capture and visualization.
  • ModernGL for rendering textured 3D content, such as cubes and markers.

The application aligns the 3D landmarks with the live video feed using solvePnP and renders interaction-aware 3D graphics with OpenGL.
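
At a glance, each frame flows through detection, pose estimation, and rendering. The loop below is a minimal sketch of that flow; detect_hand, solve_pose, and render_overlay are hypothetical placeholders for the project's actual functions:

```python
import cv2

# Hypothetical helpers standing in for the real pipeline stages:
#   detect_hand    -> MediaPipe HandLandmarker inference
#   solve_pose     -> cv2.solvePnP alignment of 3D landmarks to the image
#   render_overlay -> ModernGL cube and marker rendering
cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    landmarks = detect_hand(frame)
    if landmarks:
        pose = solve_pose(landmarks)
        frame = render_overlay(frame, pose)
    cv2.imshow("AR", frame)
    if cv2.waitKey(1) & 0xFF == 27:  # Esc quits
        break
cap.release()
cv2.destroyAllWindows()
```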

Implementation Details

The system starts by capturing webcam frames and passing them through MediaPipe's HandLandmarker to extract both 2D image-space and relative 3D model-space landmarks. OpenCV visualizes initial detections, and then the program calculates world-space coordinates by solving the Perspective-n-Point (PnP) problem using OpenCV's solvePnP. This transformation enables proper alignment between physical hands and virtual overlays.
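
The sketch below shows what this step might look like with the MediaPipe Tasks API. The model path and camera intrinsics are assumptions: a hand_landmarker.task model file and a rough pinhole approximation (focal length equal to the frame width, principal point at the image center), not calibrated values:

```python
import cv2
import numpy as np
import mediapipe as mp
from mediapipe.tasks import python as mp_tasks
from mediapipe.tasks.python import vision

# Assumed model file; MediaPipe distributes hand_landmarker.task separately.
options = vision.HandLandmarkerOptions(
    base_options=mp_tasks.BaseOptions(model_asset_path="hand_landmarker.task"),
    num_hands=1,
)
landmarker = vision.HandLandmarker.create_from_options(options)

cap = cv2.VideoCapture(0)
ok, frame = cap.read()
h, w = frame.shape[:2]

# Rough, uncalibrated pinhole intrinsics.
K = np.array([[w, 0, w / 2], [0, w, h / 2], [0, 0, 1]], dtype=np.float64)

mp_image = mp.Image(image_format=mp.ImageFormat.SRGB,
                    data=cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
result = landmarker.detect(mp_image)

if result.hand_landmarks:
    # 2D image-space landmarks in pixels...
    pts_2d = np.array([[lm.x * w, lm.y * h]
                       for lm in result.hand_landmarks[0]], dtype=np.float64)
    # ...and relative 3D model-space landmarks in meters.
    pts_3d = np.array([[lm.x, lm.y, lm.z]
                       for lm in result.hand_world_landmarks[0]], dtype=np.float64)
    ok, rvec, tvec = cv2.solvePnP(pts_3d, pts_2d, K, None)
    # rvec/tvec now place the hand's landmarks in camera space,
    # aligning the physical hand with the virtual overlay.
```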

3D Rendering and Interactions

ModernGL renders a virtual cube, textured via shaders and overlaid on the camera feed. Users interact with the cube through pinch gestures: pinching while the index fingertip is close to the cube triggers a “grab” state, letting the user reposition the cube within the 3D space. Fingertips and key joints are highlighted with marker meshes for visual feedback, and gesture state updates in real time with smooth animation.
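
A stripped-down version of the rendering step might look like the following. This is a sketch, not the project's shaders: the cube is flat-colored rather than textured, the MVP matrix is an identity placeholder (in the real pipeline it would be built from the camera intrinsics and the solvePnP pose), and the overlay is rendered offscreen for compositing onto the camera frame:

```python
import moderngl
import numpy as np

ctx = moderngl.create_standalone_context()
prog = ctx.program(
    vertex_shader="""
        #version 330
        uniform mat4 mvp;  // model-view-projection (from intrinsics + PnP pose)
        in vec3 in_position;
        void main() { gl_Position = mvp * vec4(in_position, 1.0); }
    """,
    fragment_shader="""
        #version 330
        uniform vec3 color;
        out vec4 frag_color;
        void main() { frag_color = vec4(color, 1.0); }
    """,
)

# Unit cube: 8 corners, 12 triangles (a textured version would add UVs).
corners = np.array([
    [-0.5, -0.5, -0.5], [0.5, -0.5, -0.5], [0.5, 0.5, -0.5], [-0.5, 0.5, -0.5],
    [-0.5, -0.5,  0.5], [0.5, -0.5,  0.5], [0.5, 0.5,  0.5], [-0.5, 0.5,  0.5],
], dtype="f4")
faces = np.array([
    0, 1, 2, 0, 2, 3,   4, 5, 6, 4, 6, 7,
    0, 1, 5, 0, 5, 4,   2, 3, 7, 2, 7, 6,
    0, 3, 7, 0, 7, 4,   1, 2, 6, 1, 6, 5,
], dtype="i4")
vao = ctx.vertex_array(prog,
                       [(ctx.buffer(corners.tobytes()), "3f", "in_position")],
                       index_buffer=ctx.buffer(faces.tobytes()))

fbo = ctx.simple_framebuffer((640, 480))
fbo.use()
fbo.clear(0.0, 0.0, 0.0, 0.0)  # transparent background for compositing
prog["mvp"].write(np.eye(4, dtype="f4").tobytes())  # placeholder pose
prog["color"].value = (1.0, 0.6, 0.1)
vao.render()

# Read the overlay back and alpha-composite it onto the OpenCV frame.
rgba = np.frombuffer(fbo.read(components=4), dtype=np.uint8).reshape(480, 640, 4)
```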

Gesture Detection

The application detects a pinch gesture when the distance between the thumb and index fingertips falls below a small threshold. Combined with a proximity check against the virtual cube, this triggers interactive actions such as dragging, allowing intuitive and physically consistent manipulation in the AR environment.
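
A minimal version of this check, assuming MediaPipe's landmark list (indices 4 and 8 are the thumb and index fingertips); the thresholds are illustrative, not the project's tuned values:

```python
import numpy as np

THUMB_TIP, INDEX_TIP = 4, 8  # MediaPipe hand landmark indices

def _point(lm):
    return np.array([lm.x, lm.y, lm.z])

def is_pinching(landmarks, threshold=0.05):
    """True when the thumb and index fingertips nearly touch."""
    return np.linalg.norm(_point(landmarks[THUMB_TIP]) -
                          _point(landmarks[INDEX_TIP])) < threshold

def near_cube(fingertip, cube_center, grab_radius=0.08):
    """True when the fingertip is within the cube's grab radius."""
    return np.linalg.norm(np.asarray(fingertip) -
                          np.asarray(cube_center)) < grab_radius

# Grab when both conditions hold; while grabbed, the cube follows the
# fingertip, and releasing the pinch drops it in place.
```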

Performance and Results

The AR system runs in real time, consistently achieving over 10 FPS on a typical laptop. Re-projecting the 3D landmarks onto the frame keeps the overlay accurately aligned with the user's real hand, color and saturation adjustments in the shaders enhance visual clarity, and fallback logic keeps the system usable under varying lighting and movement speeds.

Project Demo

Watch a demonstration of the hand tracking and AR interaction capabilities below:

Developed by Aryan Singh. Explore the full implementation on GitHub.