Exploring Physical AI: On-Device Vision for ROS-Based Embodied Robots

Over the past few years, we have witnessed some remarkable advances in AI. What was once the stuff of science fiction, such as machines that compose symphonies, debug complex code, or converse like a real person, has become just another part of our daily routine. But as impressive as these tools are, the best may be yet to come. Many people in the field believe that progress in embodied AI will transform how we live and work by moving intelligence beyond our screens and into our physical spaces.

Operating in the real world, however, is far more complex. Unlike a chatbot that only has to process text or images, a physical robot must contend with the laws of physics in real time. It has to navigate a crowded room without bumping into people, judge how much pressure to apply when picking up a fragile egg versus a heavy box, and stay balanced on uneven ground.

All of these considerations are enough to make a trillion-parameter large language model seem simple by comparison. But there is no reason to be discouraged if you are ready to dip your toes into the world of embodied AI. Naveen Kumar has recently completed a project that demonstrates how accessible this technology can be with the right choice of tools. By following along at home, you can learn how to work smarter, not harder.

A Smarter Solution

Kumar’s project uses a computer vision model to control a humanoid robot. He trained the model to recognize a ball so that the robot can chase after it and kick it, as if it were playing soccer. This may sound complicated, but by combining powerful, energy-efficient edge AI hardware with the Edge Impulse platform, it becomes a weekend project that any technically inclined individual could build.

The Particle Tachyon

At the heart of the system is a Particle Tachyon, a compact single-board computer that shares the familiar Raspberry Pi form factor while housing far more sophisticated internals. The Tachyon is powered by the Qualcomm Dragonwing QCM6490 System-on-Chip, which includes an octa-core Kryo CPU, an Adreno 643 GPU, and, most importantly for this project, a Hexagon 770 DSP with a dedicated AI accelerator capable of up to 12 TOPS. That level of on-device AI performance allows complex vision models to run locally, without sending images to the cloud and waiting for a response.

The robot itself is a HiWonder AiNex biped, equipped with 24 serial bus servos and a USB camera mounted in its head. It ships with a Raspberry Pi 4, but one of the advantages of the Particle Tachyon is how straightforward it is to swap in as a replacement. After removing the robot’s back plate and controller stack, the Tachyon can be mounted using standard hex standoffs and connected directly to the existing robot controller HAT via the 40-pin header. An 11.1 V, 3,500 mAh LiPo battery powers both the board and the servos.

A HiWonder AiNex biped robot

Just Add Intelligence

Before the computer vision model could be trained, data needed to be collected. For that, Kumar turned to Edge Impulse. Using the camera mounted on the robot’s head, he collected 232 images of a ball under a variety of conditions. These samples included different distances, angles, lighting, and even motion blur caused by the robot moving its head. Capturing these less-than-ideal scenarios was a deliberate choice, designed to make the model more robust when deployed in the real world. Each image was then carefully labeled using Edge Impulse Studio’s built-in labeling tools.

A sample of the training data

With data in hand, the next step was to design the processing pipeline. In Edge Impulse, this is done by creating an “impulse,” which defines how raw sensor data is processed and fed into a machine learning block. Kumar selected an image preprocessing stage to normalize and resize the camera input to 320 × 320 pixels, followed by an object detection block. For the model architecture, he chose YOLO-Pro, a lightweight object detection model that is well suited to identifying objects at varying scales. That matters here, because a ball can appear large up close or tiny at a distance.
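Edge Impulse performs this preprocessing automatically, and the write-up does not say which resize mode was selected. As an illustration only, here is a minimal pure-Python sketch of what such a stage does to each camera frame and its labels, assuming a simple scheme: stretch the frame to 320 × 320 with nearest-neighbor sampling, scale pixel values to [0, 1], and rescale each bounding box to match. The function names are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]             # (R, G, B), each 0..255
Image = List[List[Pixel]]                # rows of pixels, top to bottom
Box = Tuple[float, float, float, float]  # (x, y, w, h) in pixels

TARGET = 320  # the 320 x 320 input size used by the impulse described above


def resize_nearest(img: Image, width: int = TARGET, height: int = TARGET) -> Image:
    """Stretch an image to width x height with nearest-neighbor sampling."""
    src_h, src_w = len(img), len(img[0])
    return [
        [img[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


def normalize(img: Image) -> List[List[Tuple[float, ...]]]:
    """Map each 0..255 channel value to a float in [0, 1]."""
    return [[tuple(c / 255.0 for c in px) for px in row] for row in img]


def scale_box(box: Box, src_w: int, src_h: int,
              width: int = TARGET, height: int = TARGET) -> Box:
    """Rescale a labelled box so it frames the same object after resizing."""
    x, y, w, h = box
    return (x * width / src_w, y * height / src_h,
            w * width / src_w, h * height / src_h)


if __name__ == "__main__":
    # Hypothetical 640 x 480 camera frame filled with mid-grey pixels.
    frame = [[(128, 128, 128)] * 640 for _ in range(480)]
    model_input = normalize(resize_nearest(frame))
    print(len(model_input), len(model_input[0]), model_input[0][0])
    print(scale_box((320.0, 240.0, 64.0, 48.0), 640, 480))
```

Stretching a 4:3 frame into a square distorts the ball’s shape slightly; cropping or letterboxing are common alternatives that preserve the aspect ratio at the cost of discarding or padding part of the image.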

Training the model involved applying both spatial and color augmentations to help it generalize beyond the limited dataset. Edge Impulse handled the heavy lifting, optimizing the model for accuracy while keeping its computational requirements low enough for edge deployment. The trained model achieved a precision score of 98.7 percent on the training data, suggesting that it had learned to pick out the ball reliably in images captured by the robot’s camera.
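The write-up does not list which spatial and color transforms were enabled. As an illustration only, the sketch below implements one transform of each kind: a horizontal flip, which must also move the bounding-box labels, and a brightness change, which leaves them untouched. The function names and the 0.7 to 1.3 brightness range are assumptions, not settings taken from the project.

```python
import random
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]
Box = Tuple[float, float, float, float]  # (x, y, w, h) in pixels


def hflip(img: Image, boxes: List[Box]) -> Tuple[Image, List[Box]]:
    """Spatial augmentation: mirror left-right and move each box to match."""
    width = len(img[0])
    flipped = [row[::-1] for row in img]
    return flipped, [(width - x - w, y, w, h) for (x, y, w, h) in boxes]


def adjust_brightness(img: Image, factor: float) -> Image:
    """Color augmentation: scale every channel, clamped to 0..255."""
    return [
        [tuple(min(255, max(0, round(c * factor))) for c in px) for px in row]
        for row in img
    ]


def augment(img: Image, boxes: List[Box], rng: random.Random) -> Tuple[Image, List[Box]]:
    """Apply a random flip and a random brightness change to one labelled sample."""
    if rng.random() < 0.5:
        img, boxes = hflip(img, boxes)
    # Color changes never move the object, so the boxes pass through unchanged.
    return adjust_brightness(img, rng.uniform(0.7, 1.3)), boxes


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the run is reproducible
    sample = [[(100, 150, 200)] * 8 for _ in range(4)]
    for _ in range(3):
        _, new_boxes = augment(sample, [(1.0, 1.0, 2.0, 2.0)], rng)
        print(new_boxes)
```

Because each epoch sees a differently transformed copy of every image, a dataset of a few hundred photos behaves somewhat like a larger one, which is the point of augmenting a small dataset.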

This impulse detects sports balls

To validate those results, the model was then tested against a separate set of unseen images. In this phase, it performed even better, achieving 100 percent accuracy on the test dataset. Such results should always be interpreted with caution, especially with relatively small datasets, but they were a strong sign that the model was ready to move out of the lab and onto the robot.
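Edge Impulse computes its own metrics, and the write-up does not describe exactly how. For readers who want to check a detector on their own images, a common approach is sketched below: a prediction counts as a true positive when its intersection-over-union (IoU) with a not-yet-matched labelled box reaches a threshold (0.5 here, an assumption), and precision and recall follow from the counts. The function names are hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def match_counts(preds: Sequence[Box], truths: Sequence[Box],
                 thresh: float = 0.5) -> Tuple[int, int, int]:
    """Greedily match predictions to ground truth; return (TP, FP, FN).

    preds should be sorted by descending confidence so that the most
    confident detection gets first claim on each labelled ball.
    """
    unmatched: List[Box] = list(truths)
    tp = fp = 0
    for p in preds:
        best = max(unmatched, key=lambda t: iou(p, t), default=None)
        if best is not None and iou(p, best) >= thresh:
            unmatched.remove(best)
            tp += 1
        else:
            fp += 1
    return tp, fp, len(unmatched)


def precision_recall(samples: Iterable[Tuple[Sequence[Box], Sequence[Box]]],
                     thresh: float = 0.5) -> Tuple[float, float]:
    """samples yields (predicted_boxes, ground_truth_boxes) for each test image."""
    tp = fp = fn = 0
    for preds, truths in samples:
        t, f, n = match_counts(preds, truths, thresh)
        tp, fp, fn = tp + t, fp + f, fn + n
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

With only a handful of counts per image, a small test set means a single miss or false alarm can move these figures noticeably, which is why a perfect score on unseen data is encouraging but not conclusive.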

Getting Physical

Because the model would be running on the Particle Tachyon’s Qualcomm AI accelerator, Kumar selected Edge Impulse’s Linux AARCH64 deployment option with Qualcomm QNN support. The model was quantized to INT8, a requirement for compatibility with the accelerator and a common technique for reducing model size and improving inference speed. Edge Impulse then compiled the model into a single binary that could be dropped directly onto the device.
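The write-up does not describe how Edge Impulse and the QNN toolchain quantize the network internally (for example, whether scales are per-tensor or per-channel, or how calibration data is chosen). The snippet below is a generic sketch of per-tensor affine INT8 quantization, the basic idea behind the technique: each float is mapped to an 8-bit integer through a scale and a zero point, which cuts storage to a quarter of what 32-bit floats need at the cost of a small rounding error. The function names are hypothetical.

```python
from typing import List, Tuple


def int8_params(lo: float, hi: float) -> Tuple[float, int]:
    """Pick a scale and zero point that map [lo, hi] onto the int8 range [-128, 127]."""
    lo, hi = min(lo, 0.0), max(hi, 0.0)   # keep 0.0 exactly representable
    scale = (hi - lo) / 255.0 or 1.0      # guard against an all-zero tensor
    zero_point = round(-128 - lo / scale)
    return scale, zero_point


def quantize(values: List[float], scale: float, zero_point: int) -> List[int]:
    """Float -> int8: q = round(x / scale) + zero_point, clamped to [-128, 127]."""
    return [max(-128, min(127, round(v / scale) + zero_point)) for v in values]


def dequantize(q: List[int], scale: float, zero_point: int) -> List[float]:
    """int8 -> float: x is approximately (q - zero_point) * scale."""
    return [(x - zero_point) * scale for x in q]


if __name__ == "__main__":
    weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
    scale, zp = int8_params(min(weights), max(weights))
    q = quantize(weights, scale, zp)
    print(q)
    print(dequantize(q, scale, zp))
```

Each value loses at most about half a quantization step in the round trip, which is why INT8 models often stay close to their float accuracy, though that should always be re-checked on real data after conversion.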

The chosen deployment method

Once deployed, the object detection model runs inside a dedicated ROS 2 node. This node subscribes to the robot’s camera feed, performs inference locally on each frame, and publishes standardized detection messages containing bounding boxes and confidence scores. Other nodes in the system consume this information to decide how the robot should move: tracking the ball with its head, walking toward it using a PID-controlled gait, and finally triggering a pre-programmed kicking motion when the ball is in position.
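The write-up describes this control loop only at a high level. Below is a minimal, framework-free sketch of the decision step, assuming the ball’s bounding box is used both to steer (a PID loop on its horizontal offset from the image center) and to decide when to kick (its bottom edge nearing the bottom of the frame). The Detection class, the "ball" label, and every threshold are hypothetical, and head tracking and walking are folded into one step for brevity. A real ROS 2 node would run this logic inside a subscription callback and publish commands rather than return them; the project’s actual message types and gains are not given.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Detection:
    """Simplified stand-in for one entry in a detection message (hypothetical)."""
    label: str
    score: float
    x: float  # top-left corner of the box, pixels
    y: float
    w: float
    h: float


class PID:
    """Textbook PID controller: kp * e + ki * integral(e) + kd * de/dt."""

    def __init__(self, kp: float, ki: float, kd: float) -> None:
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error: Optional[float] = None

    def update(self, error: float, dt: float) -> float:
        self.integral += error * dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


# All thresholds below are hypothetical tuning values, not taken from the project.
FRAME_W = FRAME_H = 320
BALL_LABEL = "ball"
MIN_SCORE = 0.5
KICK_Y = 0.85      # box bottom this far down the frame ~ ball at the robot's feet
CENTER_TOL = 0.1   # max horizontal offset (fraction of frame width) for a kick


def best_ball(dets: List[Detection]) -> Optional[Detection]:
    balls = [d for d in dets if d.label == BALL_LABEL and d.score >= MIN_SCORE]
    return max(balls, key=lambda d: d.score, default=None)


def decide(dets: List[Detection], pid: PID, dt: float) -> Tuple[str, float]:
    """Map one frame of detections to (action, turn_rate).

    turn_rate follows the ROS convention that positive means turn left.
    """
    ball = best_ball(dets)
    if ball is None:
        return "search", 0.0
    offset = (ball.x + ball.w / 2) / FRAME_W - 0.5  # -0.5 (left) .. 0.5 (right)
    bottom = (ball.y + ball.h) / FRAME_H
    if bottom >= KICK_Y and abs(offset) <= CENTER_TOL:
        return "kick", 0.0
    # A ball to the right gives a positive offset, so steer with -offset (turn right).
    return "walk", pid.update(-offset, dt)
```

A fuller version would typically reset the PID state when the ball is lost, so that stale integral and derivative terms do not cause a jerk when it reappears.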

The end result is a humanoid robot that can perceive its environment, make decisions, and act on them autonomously, all without relying on cloud services or external compute. It is a compact but powerful demonstration of what embodied AI looks like in practice. More importantly, it shows that this kind of system is no longer confined to research labs or billion-dollar companies. With the right combination of edge hardware, open-source software, and accessible machine learning tools, projects that once seemed impossibly complex are now well within reach of motivated makers and engineers.

For further details about this robot, be sure to take a look at Kumar’s project write-up.
