Edge AI on microcontrollers: running TinyML in production
Run AI on the device, not in the cloud, when you need low latency, offline operation, privacy, or long battery life. Modern microcontrollers like the ESP32, STM32, and nRF52 can run small, quantized neural networks for keyword spotting, anomaly detection, and simple vision within a few hundred kilobytes of RAM. The hard part is rarely training the model; it is fitting the model to the device's memory and power constraints and proving that it holds up in the field.
"Edge AI" sounds exotic, but in embedded products it usually means something modest and useful: a device that makes a small decision locally instead of streaming raw data somewhere to be decided for it. A sensor that flags an anomaly. A wearable that recognises a gesture. A microphone that wakes on a keyword. These do not need a data centre — they need a few kilobytes of well-chosen model running where the data is born.
Decide where the intelligence lives
The first question is not "which model" but "where should inference happen". On-device wins when:
- Latency matters — a control loop cannot wait for a round trip to the cloud.
- Connectivity is unreliable — the product must keep working in a basement, a field, or a moving vehicle.
- Privacy matters — raw audio, video, or health data never has to leave the device.
- Power and bandwidth are scarce — sending only events, not raw streams, saves both.
The cloud still wins when the model is large, must be retrained often, or needs to reason across many devices at once. Most real products are a split: a small model on the edge that filters and flags, and heavier analysis in the cloud on the few events that matter.
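To make the bandwidth argument concrete, here is a rough back-of-the-envelope sketch comparing a raw audio stream with event-only reporting. The sample rate, sample width, event count, and event size are all assumed for illustration, not measurements from any product.

```python
# Illustrative arithmetic only: every constant below is an assumption.
SAMPLE_RATE_HZ = 16_000      # assumed microphone sample rate
BYTES_PER_SAMPLE = 2         # assumed 16-bit PCM audio
SECONDS_PER_DAY = 24 * 60 * 60


def raw_stream_bytes_per_day(sample_rate_hz, bytes_per_sample):
    """Bytes uploaded per day if the device streams raw audio continuously."""
    return sample_rate_hz * bytes_per_sample * SECONDS_PER_DAY


def event_bytes_per_day(events_per_day, bytes_per_event):
    """Bytes uploaded per day if the device only reports detected events."""
    return events_per_day * bytes_per_event


raw = raw_stream_bytes_per_day(SAMPLE_RATE_HZ, BYTES_PER_SAMPLE)
events = event_bytes_per_day(events_per_day=50, bytes_per_event=64)

print(f"raw stream : {raw:,} bytes/day")
print(f"events only: {events:,} bytes/day")
print(f"reduction  : {raw // events:,}x")
```

Under these assumptions the raw stream is several hundred thousand times larger than the event traffic, which is why the split design tends to win on both power and bandwidth.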
What actually fits on a microcontroller
The constraint is memory and power, not cleverness. A microcontroller has kilobytes to a few megabytes of RAM, not gigabytes. That rules out large models — but a surprising amount of useful work fits:
- Audio keyword spotting — "wake word" and simple command recognition.
- Vibration and anomaly detection — predictive maintenance on motors and pumps.
- Activity and gesture classification — from accelerometer and IMU data.
- Simple image classification — small, low-resolution vision on the more capable parts.
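A quick way to judge whether a candidate model belongs on a given part is to estimate its footprint before training anything. The sketch below does this for a hypothetical three-layer fully connected classifier. The layer sizes and the flash and RAM budgets are assumptions, and the estimate is deliberately simplified: it treats every parameter as one byte after quantization and ignores runtime overhead, so real numbers will be somewhat higher.

```python
# Hypothetical fully connected classifier: (inputs, outputs) per layer.
# The layer sizes and both budgets are assumptions for illustration.
LAYERS = [(490, 64), (64, 32), (32, 4)]
FLASH_BUDGET_BYTES = 256 * 1024
RAM_BUDGET_BYTES = 64 * 1024


def dense_params(n_in, n_out):
    """Weights plus one bias per output unit."""
    return n_in * n_out + n_out


def model_bytes(layers, bytes_per_param):
    """Storage for all parameters at a given width (simplified)."""
    return sum(dense_params(i, o) for i, o in layers) * bytes_per_param


def peak_activation_bytes(layers, bytes_per_value):
    """Rough lower bound: the largest input plus output buffer alive at once."""
    return max(i + o for i, o in layers) * bytes_per_value


total_params = sum(dense_params(i, o) for i, o in LAYERS)
flash_float32 = model_bytes(LAYERS, 4)
flash_int8 = model_bytes(LAYERS, 1)
ram_int8 = peak_activation_bytes(LAYERS, 1)
fits = flash_int8 <= FLASH_BUDGET_BYTES and ram_int8 <= RAM_BUDGET_BYTES

print(f"parameters    : {total_params}")
print(f"float32 flash : {flash_float32} bytes")
print(f"int8 flash    : {flash_int8} bytes")
print(f"int8 peak RAM : {ram_int8} bytes (activations only)")
print(f"fits budget   : {fits}")
```

An estimate like this is only a first filter. The inference runtime, input buffers, and the rest of the firmware also need flash and RAM, so the figure to trust is the one measured on the target.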
The techniques that make it fit
- Quantization — converting weights from 32-bit floats to 8-bit integers shrinks the model roughly four-fold and runs faster on integer hardware, usually with little accuracy loss.
- Pruning — removing weights that contribute little, trading a small accuracy hit for size.
- Choosing small architectures — designing for the device from the start beats squeezing a large model down later.
- Good features — strong signal processing before the model often does more than a bigger network.
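The quantization step in the list above can be sketched in a few lines. This is a minimal illustration of symmetric per-tensor int8 quantization in plain Python, not the scheme of any particular toolchain: production converters typically also quantize activations, may use per-channel scales and zero points, and calibrate on representative data. The weight values are made up.

```python
from array import array


def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats onto integers in [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0   # avoid dividing by zero
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the integers."""
    return [v * scale for v in q]


# Made-up weights for illustration.
weights = [0.42, -1.27, 0.003, 0.9, -0.31, 0.0, 1.1, -0.75]

q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

# Storage per value: 4 bytes as float32 versus 1 byte as int8.
float_bytes = len(weights) * array("f").itemsize
int8_bytes = len(weights) * array("b").itemsize

print("quantized :", q)
print("scale     :", scale)
print("max error :", max_error)
print("size      :", float_bytes, "->", int8_bytes, "bytes")
```

The rounding error per weight is bounded by half the scale, which is why accuracy usually survives. Small weights next to one large outlier lose the most precision, since the outlier sets the scale for the whole tensor.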
Where TinyML projects go wrong
- Testing on clean lab data. Field data is noisier; collect data in the real environment and train on it.
- Ignoring memory until the end. Decide the RAM and flash budget first, then design the model to it.
- Forgetting updates. Plan how you will ship a better model later, over the air, once you learn from the field.
- Measuring accuracy, not consequences. A false negative on a safety alert costs far more than a false positive; tune the threshold to the use case.
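The last point, tuning the threshold to the consequences, can be sketched as a small search over candidate thresholds that minimises total error cost rather than error count. The scores, labels, and cost weights below are invented for illustration; in practice they would come from a held-out set of field data and from the product's real cost of a miss versus a false alarm.

```python
def count_errors(scores, labels, threshold):
    """Return (false_positives, false_negatives) when alerting at score >= threshold."""
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return fp, fn


def best_threshold(scores, labels, cost_fn, cost_fp):
    """Pick the candidate threshold with the lowest total error cost."""
    best = None
    for t in sorted(set(scores)):
        fp, fn = count_errors(scores, labels, t)
        cost = cost_fn * fn + cost_fp * fp
        if best is None or cost < best[1]:
            best = (t, cost)
    return best


# Made-up model scores and ground-truth labels (1 = real event).
scores = [0.95, 0.80, 0.65, 0.60, 0.40, 0.35, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

# Safety alert: a missed event is assumed to cost 10x a false alarm.
safety = best_threshold(scores, labels, cost_fn=10, cost_fp=1)
# Nuisance-sensitive product: a false alarm is assumed to cost 5x a miss.
nuisance = best_threshold(scores, labels, cost_fn=1, cost_fp=5)

print("safety threshold  :", safety)
print("nuisance threshold:", nuisance)
```

The same model and the same data give two very different operating points: a low threshold when misses are expensive, a high one when false alarms are.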
The pragmatic path
Start by collecting real data from the actual device and environment, build the smallest model that clears your accuracy bar, quantize it, and prove the memory and power budget on hardware before committing the rest of the design. Edge AI rewards restraint: the goal is the smallest reliable decision made in the right place, not the largest model you can cram in.
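Before measuring on hardware, a duty-cycle estimate gives a first sanity check on the power budget. The sketch below assumes a device that wakes once per second, runs inference, and sleeps the rest of the time. The current draws, timings, and battery capacity are placeholder assumptions, and the estimate ignores radio traffic, regulator losses, and battery self-discharge.

```python
def average_current_ma(active_ms, active_ma, period_ms, sleep_ma):
    """Time-weighted average current over one wake/sleep period."""
    sleep_ms = period_ms - active_ms
    return (active_ms * active_ma + sleep_ms * sleep_ma) / period_ms


def battery_life_hours(capacity_mah, avg_ma):
    """Idealised battery life: capacity divided by average draw."""
    return capacity_mah / avg_ma


# All values below are assumptions for illustration, not datasheet figures.
avg_ma = average_current_ma(active_ms=20, active_ma=40.0,
                            period_ms=1000, sleep_ma=0.01)
hours = battery_life_hours(capacity_mah=1000, avg_ma=avg_ma)

print(f"average current: {avg_ma:.4f} mA")
print(f"battery life   : {hours:.0f} h (about {hours / 24:.0f} days)")
```

In this example the active window dominates the average even at a 2% duty cycle, so shortening inference time or running it less often matters far more than shaving sleep current.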
Frequently asked questions
What is TinyML?
TinyML is machine learning that runs directly on microcontrollers and low-power embedded devices, typically within a few hundred kilobytes of RAM. It enables on-device inference — keyword spotting, anomaly detection, gesture classification — without sending raw data to the cloud.
When should AI run on the device instead of the cloud?
Run inference on the device when you need low latency, offline operation, privacy, or lower power and bandwidth use. Use the cloud when the model is large, changes often, or must reason across many devices. Most products combine both.
Can a microcontroller like the ESP32 run machine learning?
Yes — the ESP32, STM32, and nRF52 can run small quantized neural networks for audio keyword spotting, vibration and anomaly detection, and simple image classification, using int8 quantization to fit limited memory.