AI on the Edge LESSON 16: Control Pan/Tilt Camera Position Using Voice Commands

In AI on the Edge Lesson 16, we take a big step forward by combining voice recognition with physical motion. In this project, you will build a voice-controlled pan/tilt camera system. Using simple spoken commands such as “right,” “left,” “up,” “down,” and “quit,” you can move the Raspberry Pi camera in real time. This lesson brings together the Fusion HAT+ servo control, the Speech-to-Text (STT) capabilities we explored earlier, live video streaming with picamera2 and OpenCV, and multithreading to keep everything running smoothly.

The hardware setup is straightforward. We connect two servos to the Fusion HAT+ — one for pan (horizontal movement) on pin 2 and one for tilt (vertical movement) on pin 3. The Raspberry Pi Camera is mounted on a pan/tilt mechanism so it can physically follow your voice commands. We start the camera at a neutral position (pan = 0°, tilt = -20°) and define step sizes so the movement feels responsive but controlled.
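Because every command adds or subtracts a fixed step, repeating "right" often enough would push the angle past what the servo can reach. Below is a minimal sketch of one way to keep an angle in range. It assumes the servos accept angles from -90 to 90 degrees, which you should confirm for your own hardware; `MIN_ANGLE`, `MAX_ANGLE`, and `step_angle` are hypothetical names that do not appear in the lesson code.

```python
# Assumed servo limits in degrees; verify these for your servos and mount.
MIN_ANGLE = -90
MAX_ANGLE = 90


def step_angle(current, delta, low=MIN_ANGLE, high=MAX_ANGLE):
    """Return current + delta, limited to the range [low, high]."""
    return max(low, min(high, current + delta))


# Simulate saying "right" twelve times from the neutral pan position.
x = 0
xDelta = 10
for _ in range(12):
    x = step_angle(x, xDelta)

print(x)  # 90: the angle stops at the assumed upper limit instead of reaching 120
```

With a helper like this, the value stored in `x` always matches an angle the servo can actually reach, so a single "left" command moves the camera back right away.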

The Python code runs two threads: a background thread for continuous voice listening and the main thread for displaying the live video feed. In the listening thread, we create an STT object and continuously wait for voice input. When a command is recognized, we adjust the pan or tilt angle accordingly and immediately send the new position to the appropriate servo. The main loop captures frames from the Pi Camera, flips them for correct orientation, displays them in an OpenCV window, and checks for the ‘q’ key to exit gracefully.
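The two-loop structure described above can be tried without any hardware. The sketch below is only an illustration: `fake_listen` stands in for the listening thread by stepping through a fixed list of commands, and `main_loop` stands in for the video loop by counting iterations. Both names are hypothetical, and the `time.sleep` calls are placeholders for the time spent waiting on speech and on frames.

```python
import threading
import time

running = True   # shared flag, as in the lesson code
heard = []       # commands the stand-in listener has "recognized"


def fake_listen(commands):
    """Stand-in for the listening thread: handles each command in turn."""
    global running
    for command in commands:
        if not running:
            break
        heard.append(command)
        if command == "quit":
            running = False   # tells the main loop to stop
            break
        time.sleep(0.01)      # placeholder for waiting on the next phrase


def main_loop():
    """Stand-in for the video loop: runs until the flag is cleared."""
    frames = 0
    while running:
        frames += 1
        time.sleep(0.001)     # placeholder for capturing and showing a frame
    return frames


worker = threading.Thread(
    target=fake_listen, args=(["right", "up", "quit"],), daemon=True
)
worker.start()
frames_shown = main_loop()
worker.join()
print(heard, frames_shown)
```

The main loop keeps counting "frames" while the listener is busy, which is the same reason the real video feed stays live while the program waits for speech.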

This project demonstrates several important concepts working together: real-time voice command processing, servo motor control, camera streaming with picamera2 configured for 1280×720 resolution at 60 fps, and proper use of threading so that listening and video display do not block each other. You will also notice how we use global variables carefully to share the running flag and the current pan and tilt positions between the threads.
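Global variables work well here because only the listening thread changes the angles. If a later version of the project lets more than one thread move the camera, one option is to guard the shared values with a lock. The sketch below is an assumption about how that could look, not part of the lesson code: `CameraPosition` is a hypothetical class, and the loop counts are just a stress test, not real servo angles.

```python
import threading


class CameraPosition:
    """Hypothetical holder for the shared pan and tilt values."""

    def __init__(self, pan=0, tilt=-20):
        self._lock = threading.Lock()
        self._pan = pan
        self._tilt = tilt

    def move(self, d_pan=0, d_tilt=0):
        """Change both values under the lock and return the new pair."""
        with self._lock:
            self._pan += d_pan
            self._tilt += d_tilt
            return self._pan, self._tilt

    def read(self):
        """Return the current (pan, tilt) pair under the lock."""
        with self._lock:
            return self._pan, self._tilt


position = CameraPosition()


def nudge_right(times):
    # Each call to move() is one protected update.
    for _ in range(times):
        position.move(d_pan=1)


# Four threads update the same object at once; the lock keeps every update.
threads = [threading.Thread(target=nudge_right, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(position.read())  # (4000, -20)
```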

By the end of this lesson, you will have a working voice-controlled camera that you can point anywhere you want just by talking to it. This is an excellent foundation for more advanced projects such as voice-controlled object tracking, security cameras, or interactive AI assistants that can both see and move.

The complete code is provided below, along with explanations of the key sections. Feel free to experiment with different step sizes (xDelta and yDelta), starting angles, or even add new voice commands once you are comfortable with the basic version.
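If you do add commands, one approach is to keep them in a dictionary rather than a growing chain of `if` statements. This is a sketch under stated assumptions, not the code from the video: `COMMANDS` and `apply_command` are hypothetical names, "far right" is a made-up extra command, and the signs follow the lesson code, where "up" decreases the tilt angle.

```python
xDelta = 10
yDelta = 5

# Each spoken command maps to (change in pan, change in tilt).
COMMANDS = {
    "right": (xDelta, 0),
    "left": (-xDelta, 0),
    "up": (0, -yDelta),
    "down": (0, yDelta),
    "far right": (3 * xDelta, 0),  # hypothetical extra command
}


def apply_command(command, x, y):
    """Return the new (x, y) after a command; unknown commands change nothing."""
    dx, dy = COMMANDS.get(command.strip().lower(), (0, 0))
    return x + dx, y + dy


x, y = 0, -20
x, y = apply_command("right", x, y)
x, y = apply_command(" Up ", x, y)     # stray spaces and capitals are tolerated
x, y = apply_command("banana", x, y)   # not a command, so nothing moves
print(x, y)  # 10 -25
```

Adding a new command then means adding one line to the dictionary, and the new angles can be passed to `panServo.angle(x)` and `tiltServo.angle(y)` as in the lesson code.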

This is the code developed in the video lesson:

from fusion_hat.servo import Servo
from fusion_hat.stt import STT
import threading
import cv2
from picamera2 import Picamera2
piCam = Picamera2()

panPin = 2
tiltPin = 3

panServo = Servo(panPin)
tiltServo = Servo(tiltPin)

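# Starting angles in degrees (pan centered, tilt at -20)
x = 0
y = -20

# Degrees moved per voice command
xDelta = 10
yDelta = 5

# Move both servos to the starting position
panServo.angle(x)
tiltServo.angle(y)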

W=1280
H=720
RES = (W,H)
piCam.preview_configuration.main.size = RES
piCam.preview_configuration.main.format = "RGB888"
piCam.preview_configuration.controls.FrameRate=60
piCam.preview_configuration.align()
piCam.configure("preview")
piCam.start()

# Shared flag: both loops stop when this becomes False
running = True

def listenThread():
    global running, x, y
    stt = STT(language = "en-us")
    while running:
        print("Listening . . .")
        result = stt.listen(stream = False)
        command = result.strip()
        print("Command was: ", command)
        if command == "right":
            x = x + xDelta
            panServo.angle(x)
        if command == "left":
            x= x - xDelta
            panServo.angle(x)
        if command == "up":
            y = y - yDelta
            tiltServo.angle(y)
        if command == "down":
            y = y + yDelta
            tiltServo.angle(y)
        if command == "quit":
            running = False
            break
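    print("Thread Terminated")

# Listen in a background daemon thread so the video loop is not blocked
myThread = threading.Thread(target = listenThread, daemon = True)
myThread.start()

# Main loop: grab a frame, flip it, and show it until 'q' is pressed or "quit" is heard
while running: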
    frame= piCam.capture_array()
    frame=cv2.flip(frame,-1)
    cv2.imshow("Camera", frame)
    cv2.moveWindow("Camera",0,60)
    if cv2.waitKey(1)==ord('q'):
        running = False
        break
cv2.destroyAllWindows()
piCam.stop()
print('Program Terminated')
