By Dev Bhavsar
KinoVision is a wearable, gesture-controlled, vision-based smart home automation system that reimagines how users interact with their living spaces. Unlike traditional smart home systems that rely on mobile apps, voice commands, or physical remotes, KinoVision enables appliance control using natural hand gestures combined with visual context awareness. By simply looking at a device, performing a wrist shake to identify it, and tilting the hand left or right, users can turn appliances ON or OFF in an intuitive and touchless manner.
The motivation behind this project was to address the friction and limitations commonly found in existing smart home interfaces - such as app dependency, voice recognition errors, false triggers, and a lack of natural interaction. KinoVision explores a more human-centric approach, where what the user sees and how the user moves become the primary modes of interaction.
The system is built around three cooperating modules: a gesture wristband that detects hand movements, a vision pendant that identifies appliances using a camera and AI-based image understanding, and a home hub that executes control actions through relays. Together, these components form a seamless interaction loop that allows users to select and control appliances using vision-based selection followed by gesture-based commands.

KinoVision is implemented as a fully working prototype and is designed as a stepping stone toward future AR-based smart home interaction systems, where vision and motion can merge into an even more immersive and hands-free control experience. This project was developed as part of the CircuitDigest Smart Home & Wearables Contest 2025, with hardware support from DigiKey.
| Component Name | Quantity | Datasheet/Link |
| --- | --- | --- |
| Adafruit Memento Programmable Camera | 1 | View Datasheet |
| XIAO ESP32-S3 | 1 | View Datasheet |
| MPU6050 IMU Sensor | 1 | View Datasheet |
| ESP32 Development Board | 1 | View Datasheet |
| Relay Module | 1 | - |
| Light Bulb | 1 | - |
| 3.7V LiPo Battery | 2 | - |
| USB Cables | 2 | - |
| Breadboard | 1 | - |
| Jumper Wires | 10 | - |
| Wrist Band | 1 | - |
Circuit Diagram
KinoVision Wristband - XIAO ESP32-S3 to MPU6050 Connection
The wristband uses I2C communication between the XIAO ESP32-S3 and the MPU6050 accelerometer/gyroscope module.
The MPU6050 continuously measures acceleration across three axes (X,Y,Z). The XIAO ESP32-S3 reads these values via I2C at the default address 0x68 and classifies them into gestures. The entire wristband assembly is powered by a small LiPo battery connected to the ESP's battery input.
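To sanity-check this connection before loading the full gesture firmware, a minimal test sketch can print the raw readings over serial. This is only a wiring check, not the project firmware, but it relies on the same Adafruit MPU6050 library calls that appear in the wristband code later:
```
#include <Adafruit_MPU6050.h>
#include <Adafruit_Sensor.h>
#include <Wire.h>

Adafruit_MPU6050 mpu;

void setup() {
  Serial.begin(115200);
  if (!mpu.begin()) {                 // defaults to I2C address 0x68
    Serial.println("MPU6050 not found - check SDA/SCL wiring");
    while (true) delay(10);
  }
  mpu.setAccelerometerRange(MPU6050_RANGE_8_G);
}

void loop() {
  sensors_event_t a, g, t;
  mpu.getEvent(&a, &g, &t);           // acceleration in m/s^2
  Serial.printf("X=%.2f Y=%.2f Z=%.2f\n",
                a.acceleration.x, a.acceleration.y, a.acceleration.z);
  delay(200);
}
```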

HUB Circuit Diagram:
The hub uses an ESP32 connected to a 5V single-channel relay module. GPIO 25 acts as the control signal; when it goes HIGH, the relay coil energises and closes the circuit between the LIVE wire and the light bulb. The NEUTRAL wire connects directly to the bulb without interruption. Because the relay is wired through its Normally Open (NO) contact, the appliance is OFF by default, ensuring safety on power-up.
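Before connecting the mains side, the control path can be verified with a bare relay-toggle sketch. This is just a bench test assuming the active-HIGH relay input described above, not the hub firmware:
```
const int RELAY_PIN = 25;             // relay IN pin, per the wiring above

void setup() {
  pinMode(RELAY_PIN, OUTPUT);
  digitalWrite(RELAY_PIN, LOW);       // Normally Open contact stays open -> bulb OFF at boot
}

void loop() {
  digitalWrite(RELAY_PIN, HIGH);      // energise coil -> COM-NO closes -> bulb ON
  delay(2000);
  digitalWrite(RELAY_PIN, LOW);       // de-energise -> bulb OFF
  delay(2000);
}
```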

Vision Pendant Integration & System Communication:
The Adafruit Memento (the Vision Pendant) is integrated into the system through MQTT communication rather than direct wiring to the relay circuit. While the wristband communicates with the hub using ESP-NOW, the hub and the Memento exchange messages through the Adafruit IO MQTT broker.
Hardware Assembly
The KinoVision system consists of three hardware modules assembled separately and later integrated: the Gesture Wristband, the ESP32 Home Hub, and the Vision Pendant (Memento). The following steps describe the complete assembly process.

1. Gesture Wristband Assembly
The gesture wristband is built around a custom PCB based on the XIAO ESP32-S3 and the MPU6050 IMU sensor.

Step 1: Solder the XIAO ESP32-S3
- Mount the XIAO ESP32-S3 onto the custom PCB.
- Ensure proper alignment of all pins.
- Solder each pad carefully to avoid cold joints.

Step 2: Solder the MPU6050 IMU Sensor
- Connect the MPU6050 using I2C (SDA, SCL).
- Ensure VCC and GND are properly connected.
- Double-check continuity before powering.

Step 3: Connect Power Supply
- Connect a 3.7V LiPo battery to the board.
- Verify voltage output before powering the system.
- Ensure polarity is correct.
Step 4: Initial Testing
- Upload firmware to the XIAO ESP32-S3.
- Open serial monitor to verify gesture classification output.
- Confirm that SHAKE, LEFT, and RIGHT are detected reliably.
Step 5: Enclosure Assembly
- Place the PCB inside the 3D-printed wristband enclosure.
- Secure using screws or adhesive.
- Attach the wearable strap.

2. ESP32 Home Hub Assembly
The demonstration in this project uses the ESP32-based Home Automation V3 module developed at Techiesms Studio. This board integrates the ESP32, relay driver circuitry, power regulation, and enclosure in a compact form factor.

For clarity and reproducibility, the essential wiring logic of the hub is shown below using a simplified ESP32 + relay configuration.
Step 1: Mount ESP32 on Breadboard / PCB
- Insert ESP32 development board.
- Connect 5V and GND rails.
Step 2: Connect Relay Module
- GPIO25 → Relay IN
- 5V → Relay VCC
- GND → Relay GND
Step 3: Connect Appliance
- Connect LIVE wire through relay COM and NO terminals.
- Connect NEUTRAL wire directly to bulb.
Step 4: Upload Hub Firmware
- Flash ESP32 hub firmware.
- Verify MQTT connection.
- Confirm relay toggles when gesture command is received.
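If the MQTT connection fails at this step, the standard reconnect helper from the Adafruit_MQTT library examples makes the error visible. The sketch below assumes the `mqtt` client object shown later in the hub code explanation; it is a debugging aid, not part of the documented firmware:
```
// Call at the top of loop() before publishing or reading subscriptions.
void MQTT_connect() {
  if (mqtt.connected()) return;              // already connected, nothing to do
  Serial.print("Connecting to Adafruit IO... ");
  int8_t ret;
  while ((ret = mqtt.connect()) != 0) {      // connect() returns 0 on success
    Serial.println(mqtt.connectErrorString(ret));
    mqtt.disconnect();
    delay(5000);                             // wait 5 s, then retry
  }
  Serial.println("connected!");
}
```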

3. Vision Pendant Assembly (Adafruit Memento)
The Memento is used as a wearable vision module housed in a 3D-printed pendant enclosure.

Step 1: Flash Firmware
- Upload vision detection firmware to the Memento.
- Configure Wi-Fi and OpenAI API key.
- Verify image capture functionality.
Step 2: Test MQTT Subscription
- Ensure Memento subscribes to the gesture feed.
- Verify image capture is triggered when SHAKE is published.
Step 3: Mount in 3D Enclosure
- Insert the Memento into the 3D-printed pendant case.
- Ensure camera lens remains unobstructed.
- Attach wearable chain or strap.
Code Explanation
KinoVision consists of three distinct programs working together to create the complete gesture control system. Each program runs on different hardware and handles specific responsibilities in the workflow.
System Overview
The system architecture consists of:
- Wristband Code (Arduino C++) - Gesture detection and ESP-NOW transmission
- Hub Code (Arduino C++) - Central coordinator with ESP-NOW receiver and MQTT bridge
- Memento Code (CircuitPython) - AI vision and device identification
1. WRISTBAND CODE (Arduino C++)
Hardware: XIAO ESP32-S3 + MPU6050 Accelerometer
File: wristband.ino
Purpose:
Detects hand gestures (SHAKE, LEFT, RIGHT) and wirelessly transmits them to the hub using ESP-NOW.
Code Sections:
A. Gesture Detection Configuration
```
// ---------- EASY GESTURE CONFIG ----------
#define SHAKE_THRESHOLD 18.0    // Rapid movement threshold
#define TILT_THRESHOLD 6.0      // Left/right tilt sensitivity
#define MIN_MOVEMENT 10.0       // Minimum movement to register
#define GESTURE_COOLDOWN 1200   // Milliseconds between gestures
```
Why these values?
- SHAKE_THRESHOLD: To make shake gestures easier to trigger
- TILT_THRESHOLD: Set to 6.0 for comfortable left/right wrist tilts
- MIN_MOVEMENT: Prevents false triggers from minor hand tremors
- GESTURE_COOLDOWN: 1.2 seconds prevents accidental rapid-fire gestures
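As an illustration, once a gesture string has been classified inside loop(), the cooldown could gate it roughly like this (a sketch only; `lastGestureTime` and `sendGesture()` are hypothetical names, not taken from the firmware):
```
static unsigned long lastGestureTime = 0;

if (gesture != "IDLE" && millis() - lastGestureTime > GESTURE_COOLDOWN) {
  lastGestureTime = millis();   // opens the 1.2 s cooldown window
  sendGesture(gesture);         // hypothetical helper that transmits over ESP-NOW (see section D)
}
```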
B. Sensor Calibration
```
void calibrate() {
  baseX = 0;
  baseY = 0;
  baseZ = 0;
  for (int i = 0; i < 100; i++) {
    sensors_event_t a, g, t;
    mpu.getEvent(&a, &g, &t);
    baseX += a.acceleration.x;
    baseY += a.acceleration.y;
    baseZ += a.acceleration.z;
    delay(20);
  }
  baseX /= 100;
  baseY /= 100;
  baseZ /= 100;
}
```
What this does:
- Takes 100 samples over 2 seconds while wrist is at rest
- Calculates average baseline acceleration in X, Y, Z axes
- This baseline is subtracted from all future readings to remove gravity's influence
- Critical: Allows the system to detect relative movement, not absolute position
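A sketch of how the baseline might be applied in the read path (variable names follow the calibration code above; `classifyGesture()` is covered in the next section):
```
sensors_event_t a, g, t;
mpu.getEvent(&a, &g, &t);

// Subtract the calibration baseline so only relative movement remains
float x = a.acceleration.x - baseX;
float y = a.acceleration.y - baseY;
float z = a.acceleration.z - baseZ;

// Overall movement intensity fed into the classifier
float total = sqrt(x * x + y * y + z * z);

String gesture = classifyGesture(x, y, z, total);
```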
C. Gesture Classification Algorithm
```
String classifyGesture(float x, float y, float z, float total) {
  // SHAKE: Any rapid movement
  if (total > SHAKE_THRESHOLD) {
    return "SHAKE";
  }
  // LEFT: Tilt wrist to the left (negative Y)
  if (y < -TILT_THRESHOLD && total > MIN_MOVEMENT) {
    return "LEFT";
  }
  // RIGHT: Tilt wrist to the right (positive Y)
  if (y > TILT_THRESHOLD && total > MIN_MOVEMENT) {
    return "RIGHT";
  }
  return "IDLE";
}
```
How it works:
- Total magnitude = √(x² + y² + z²) - measures overall movement intensity
- Priority order: SHAKE is checked first (highest threshold), then LEFT/RIGHT
- Y-axis dominance: Left/right tilts primarily change the Y-axis value
- Dual conditions: Tilt gestures require both Y threshold AND minimum movement to prevent false triggers
Example values:
- Shake: total = 22.5 → "SHAKE"
- Tilt right: y = 8.2, total = 12.1 → "RIGHT"
- Tilt left: y = -7.1, total = 11.5 → "LEFT"
- Slight movement: total = 8.0 → "IDLE" (below threshold)
D. ESP-NOW Wireless Transmission
```
// Set WiFi channel to match hub
esp_wifi_set_channel(HUB_CHANNEL, WIFI_SECOND_CHAN_NONE);
// Send gesture packet
esp_now_send(HUB_MAC, (uint8_t*)&packet, sizeof(packet));
```
Why ESP-NOW?
- Ultra-low latency: ~5-10ms transmission time (vs. 100ms+ for WiFi)
- No router needed: Direct device-to-device communication
- Low power: Minimal battery drain compared to WiFi/Bluetooth
- Channel locking: Both devices must be on the same WiFi channel for communication
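The snippet above only shows the channel lock and the send call. A minimal sender-side setup could look like the sketch below; the packet struct layout, the `HUB_MAC` value, and `HUB_CHANNEL` are placeholders for illustration, not values from the project:
```
#include <WiFi.h>
#include <esp_now.h>
#include <esp_wifi.h>

typedef struct {
  char gesture[8];                    // "SHAKE", "LEFT" or "RIGHT"
} GesturePacket;

uint8_t HUB_MAC[] = {0x24, 0x6F, 0x28, 0x00, 0x00, 0x00};   // replace with the hub's MAC
#define HUB_CHANNEL 1

void setupEspNow() {
  WiFi.mode(WIFI_STA);
  esp_wifi_set_channel(HUB_CHANNEL, WIFI_SECOND_CHAN_NONE);  // must match the hub
  esp_now_init();

  esp_now_peer_info_t peer = {};
  memcpy(peer.peer_addr, HUB_MAC, 6);
  peer.channel = HUB_CHANNEL;
  esp_now_add_peer(&peer);
}

void sendGesture(const char *name) {
  GesturePacket packet;
  strncpy(packet.gesture, name, sizeof(packet.gesture));
  esp_now_send(HUB_MAC, (uint8_t *)&packet, sizeof(packet));
}
```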
2. HUB CODE (Arduino C++)
Hardware: ESP32 Dev Kit V1 + Relay Module
File: esp32_hub.ino
Purpose:
Acts as the central coordinator - receives gestures via ESP-NOW, communicates with Memento via MQTT, and controls the relay to switch appliances.
Key Code Sections:
A. Dual Communication Setup
```
// ESP-NOW for wristband (fast, local)
esp_now_register_recv_cb(onEspNowRecv);
// MQTT for Memento (WiFi, cloud)
Adafruit_MQTT_Client mqtt(&client, AIO_SERVER, AIO_SERVERPORT,
                          AIO_USERNAME, AIO_KEY);
```
Why two protocols?
- ESP-NOW (wristband → hub): Millisecond-level latency for instant gesture response
- MQTT (hub ↔ Memento): Pub/sub architecture perfect for AI processing pipeline
- Trade-off: ESP-NOW doesn't work over internet; MQTT does but has higher latency
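The `onEspNowRecv` callback registered above is not listed in full. A sketch of what it could contain is shown below; the signature matches Arduino-ESP32 core 2.x (core 3.x passes an `esp_now_recv_info_t` pointer instead), and `handleGesture()` is a hypothetical dispatcher feeding the handlers in the next sections:
```
void onEspNowRecv(const uint8_t *mac, const uint8_t *data, int len) {
  GesturePacket packet;               // same layout as the wristband packet sketch
  memcpy(&packet, data, sizeof(packet));

  String gesture = String(packet.gesture);
  gesture.toLowerCase();              // handlers below compare "shake" / "left" / "right"
  handleGesture(gesture);
}
```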
B. SHAKE Gesture Handler
```
if (gesture == "shake") {
  // Publish SHAKE to MQTT gesture feed
  if (gesturePublish.publish("SHAKE")) {
    Serial.println("Published!");
    waitingForDevice = true;
    deviceRequestTime = millis();
    Serial.println("Waiting for device identification...");
  }
}
```
Workflow:
1. Receives "SHAKE" from wristband via ESP-NOW
2. Publishes "SHAKE" to Adafruit IO MQTT feed `dev4522/feeds/gesture`
3. Sets `waitingForDevice = true` to track state
4. Records timestamp for 45-second timeout
State machine:
```
IDLE → SHAKE received → WAITING_FOR_DEVICE → Device identified → READY_TO_CONTROL
  ↓ (45s timeout)
IDLE (reset)
```
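The 45-second timeout can be implemented as a simple check in loop(). The sketch below reuses the variable names from the handler above; the exact firmware may differ:
```
const unsigned long DEVICE_TIMEOUT_MS = 45000;

if (waitingForDevice && millis() - deviceRequestTime > DEVICE_TIMEOUT_MS) {
  waitingForDevice = false;           // give up waiting, fall back to IDLE
  currentDevice = "";                 // nothing selected, LEFT/RIGHT will be ignored
  Serial.println("Device identification timed out");
}
```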
C. Device Identification Response Handler
```
while ((subscription = mqtt.readSubscription(100))) {
  if (subscription == &deviceSub) {
    String device = String((char*)deviceSub.lastread);
    device.trim();
    device.toLowerCase();
    if (device == "light" || device == "fan" || device == "ac") {
      currentDevice = device;
      Serial.println("Device identified: " + currentDevice);
      waitingForDevice = false;
    }
  }
}
```
What happens:
- Continuously polls MQTT feed dev4522/feeds/device every 100ms
- When Memento publishes device type (e.g., "LIGHT"), hub receives it
- Stores device type in currentDevice variable
- Exits waiting state - now ready to accept LEFT/RIGHT control gestures
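For context, the `deviceSub` and `gesturePublish` objects used here have to be declared against the MQTT client and registered before connecting. A sketch of that setup, assuming the feed paths mentioned in the workflow above:
```
Adafruit_MQTT_Subscribe deviceSub =
    Adafruit_MQTT_Subscribe(&mqtt, AIO_USERNAME "/feeds/device");
Adafruit_MQTT_Publish gesturePublish =
    Adafruit_MQTT_Publish(&mqtt, AIO_USERNAME "/feeds/gesture");

void setupFeeds() {
  mqtt.subscribe(&deviceSub);         // registered here, sent when the client connects
}
```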
D. LEFT/RIGHT Control
```
else if (gesture == "left" || gesture == "right") {
  if (currentDevice != "") {
    if (gesture == "left") {
      setRelay(false);   // Turn OFF
    } else {
      setRelay(true);    // Turn ON
    }
  } else {
    Serial.println("No device detected yet");
  }
}
```
Logic:
- Guards against: Controlling relay before device identification
- LEFT gesture: Sets GPIO 25 LOW → Relay opens → Appliance OFF
- RIGHT gesture: Sets GPIO 25 HIGH → Relay closes → Appliance ON
E. Relay Control
```
void setRelay(bool state) {
  digitalWrite(RELAY_PIN, state ? HIGH : LOW);
  relayState = state;
  Serial.print("Relay: ");
  Serial.println(state ? "ON" : "OFF");
}
```
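For completeness, the relay pin also needs to be initialised at boot so the appliance starts OFF, as described in the circuit diagram section. A sketch of that initialisation (the pin number is taken from that section and may differ in the actual firmware):
```
#define RELAY_PIN 25

void setupRelay() {
  pinMode(RELAY_PIN, OUTPUT);
  setRelay(false);                    // appliance stays OFF immediately after power-up
}
```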
3. MEMENTO CODE (CircuitPython)
Hardware: Adafruit Memento Camera
File: code.py
Purpose:
Receives SHAKE commands via MQTT, captures images, uses OpenAI GPT-4 Vision to identify devices, and publishes results back to the hub.
Key Code Sections:
A. Environment Configuration
```
WIFI_SSID = os.getenv("CIRCUITPY_WIFI_SSID")
WIFI_PASSWORD = os.getenv("CIRCUITPY_WIFI_PASSWORD")
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
```
Why environment variables?
- Security: API keys and WiFi passwords not hardcoded in source
- Portability: Same code works across devices with different credentials
- Best practice: Follows CircuitPython convention using settings.toml file
B. AI Prompt Engineering
```
PROMPT = """Look at this image and identify if there is any LIGHT SOURCE visible.
This includes: LED bulbs, LED strips, ceiling lights, table lamps, floor lamps, tube lights, or any illuminated light fixture.
Reply with ONLY ONE WORD:
- "light" if you see any light source
- "none" if no light source is visible
Reply with just one word, nothing else."""
```
Why this specific prompt?
- Explicit constraints: "ONLY ONE WORD" forces structured output
- Comprehensive examples: Lists many light types to improve accuracy
- Binary response: "light" or "none" makes parsing simple and reliable
- Tested extensively: Achieved ~95% accuracy in detection
Prompt engineering lessons:
- Initial prompt: "Is there a light?" → GPT responded with full sentences
- Second attempt: "Reply yes or no" → Sometimes added explanations
- Final version: Forcing one-word response eliminated parsing errors
C. Image Capture and Encoding
```
def capture_and_identify():
    # Capture image
    pycam.capture_jpeg()
    time.sleep(0.5)
    # Find most recent image
    files = [f for f in os.listdir("/sd") if f.endswith(".jpg")]
    files.sort()
    image_path = "/sd/" + files[-1]
    # Encode to base64
    base64_image = encode_image(image_path)
```
Why base64 encoding?
- OpenAI API requires images as base64-encoded strings in JSON payload
- Alternative (upload to cloud storage) adds latency and complexity
- Trade-off: Base64 increases payload size by ~33%, but simplifies architecture
D. OpenAI API Integration
```
payload = {
    "model": "gpt-4o-mini",
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": PROMPT},
            {
                "type": "image_url",
                "image_url": {
                    "url": f"data:image/jpeg;base64,{base64_image}",
                    "detail": "low"
                }
            }
        ]
    }],
    "max_tokens": 20,
    "temperature": 0.2
}
r = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers=headers,
    json=payload,
    timeout=30
)
```
Key parameters:
- gpt-4o-mini: Fastest, cheapest vision model (~$0.00015 per image)
- detail: "low": 512px resolution sufficient for light detection, reduces cost
- max_tokens: 20: Limits response length (we only need 1 word)
- temperature: 0.2: Low randomness for consistent, deterministic answers
- timeout: 30: Prevents indefinite hanging on network issues
E. MQTT Communication
```
mqtt_client = MQTT.MQTT(
    broker="io.adafruit.com",
    port=1883,
    username=AIO_USERNAME,
    password=AIO_KEY,
    socket_pool=pool,
    socket_timeout=1,
    keep_alive=60,
)
mqtt_client.on_connect = connected
mqtt_client.on_disconnect = disconnected
mqtt_client.on_message = message_received
```
Callback pattern:
- on_connect: Automatically subscribes to gesture feed when connected
- on_message: Triggers capture_and_identify() when "SHAKE" received
- Async design: Main loop continues running camera preview while waiting for messages
Total latency breakdown:
- Gesture detection: <50ms
- ESP-NOW transmission: ~10ms
- MQTT publish: ~100ms
- Image capture: ~500ms
- OpenAI API: ~3000ms
- MQTT receive: ~100ms
- Relay activation: <10ms
Total: ~4-5 seconds (AI processing dominates)
Complete Workflow:
The flowchart illustrates the complete execution sequence of KinoVision, from gesture detection on the wristband to AI-based device identification and final appliance control via the ESP32 hub. It highlights how ESP-NOW and MQTT are used together to coordinate intent detection, vision confirmation, and relay actuation in a structured pipeline.