Aditya Chilka
Field Notes · Edge AI

Edge AI on microcontrollers: running TinyML in production

Run AI on the device — not the cloud — when you need low latency, offline operation, privacy, or long battery life. Modern microcontrollers like the ESP32, STM32, and nRF52 can run small, quantized neural networks for keyword spotting, anomaly detection, and simple vision within a few hundred kilobytes of RAM. The hard part is rarely the model; it is fitting it to the constraints and proving it holds up in the field.

"Edge AI" sounds exotic, but in embedded products it usually means something modest and useful: a device that makes a small decision locally instead of streaming raw data somewhere to be decided for it. A sensor that flags an anomaly. A wearable that recognises a gesture. A microphone that wakes on a keyword. These do not need a data centre — they need a few kilobytes of well-chosen model running where the data is born.

Decide where the intelligence lives

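The first question is not "which model" but "where should inference happen". On-device wins when the decision must be made with low latency, when the device has to keep working offline, when raw data should stay on the device for privacy, or when streaming everything would cost too much power and bandwidth.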

The cloud still wins when the model is large, must be retrained often, or needs to reason across many devices at once. Most real products are a split: a small model on the edge that filters and flags, and heavier analysis in the cloud on the few events that matter.
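The split described above can be sketched as an edge-side filter: the device scores every window of sensor data locally and forwards only a compact event record when something crosses a threshold. This is a minimal illustration under stated assumptions, not a reference design. `score_window` is a hypothetical stand-in (RMS energy) for whatever small model the device actually runs, and the threshold and 16-byte event size are assumed values.

```python
import math
from dataclasses import dataclass


@dataclass
class UplinkStats:
    windows_seen: int = 0
    events_sent: int = 0
    bytes_sent: int = 0


def score_window(samples: list[float]) -> float:
    """Hypothetical stand-in for the on-device model: RMS energy of a window.

    A real product would run its small quantized classifier here instead.
    """
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def edge_filter(windows, threshold: float, bytes_per_event: int = 16):
    """Decide locally on every window; forward only compact event records.

    bytes_per_event is an assumed size for one uploaded event record.
    """
    stats = UplinkStats()
    events = []
    for index, window in enumerate(windows):
        stats.windows_seen += 1
        score = score_window(window)
        if score > threshold:
            # Send a small summary, not the raw samples, for cloud-side analysis.
            events.append({"window": index, "score": round(score, 3)})
            stats.events_sent += 1
            stats.bytes_sent += bytes_per_event
    return events, stats


# Nine quiet windows and one loud one, 256 samples each.
windows = [[0.01] * 256 for _ in range(9)] + [[0.9] * 256]
events, stats = edge_filter(windows, threshold=0.5)
raw_bytes = len(windows) * 256 * 2  # cost of streaming every 16-bit sample
print(events, stats.bytes_sent, "bytes sent instead of", raw_bytes)
```

With ten 256-sample windows of 16-bit data, streaming everything would mean 5,120 bytes of raw samples; the filter forwards a single 16-byte event, and the cloud only sees the window that mattered.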

What actually fits on a microcontroller

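The constraint is memory and power, not cleverness. A microcontroller has kilobytes to a few megabytes of RAM, not gigabytes. That rules out large models, but a surprising amount of useful work fits: audio keyword spotting, vibration and anomaly detection on sensor streams, gesture classification from motion data, and simple image classification.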
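To make "kilobytes, not gigabytes" concrete, here is back-of-envelope arithmetic for a small fully connected classifier. The layer sizes and the flash and RAM budgets are hypothetical, the int8 case assumes int8 weights and activations with int32 biases (a common full-integer scheme), and the estimate ignores framework overhead such as operator code, tensor metadata, and alignment.

```python
from dataclasses import dataclass


@dataclass
class Footprint:
    weight_bytes: int      # parameters, normally stored in flash
    activation_bytes: int  # peak activation buffers, held in RAM


def dense_footprint(layer_sizes, weight_bytes, bias_bytes, activation_bytes):
    """Rough memory estimate for a stack of fully connected layers.

    Ignores framework overhead (operator code, tensor metadata, alignment).
    """
    total_weights = 0
    peak_activations = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total_weights += n_in * n_out * weight_bytes + n_out * bias_bytes
        # Each layer needs its input and output buffers alive at the same time.
        peak_activations = max(peak_activations, n_in + n_out)
    return Footprint(total_weights, peak_activations * activation_bytes)


# Hypothetical classifier: 490 input features -> 128 -> 64 -> 12 classes.
layers = [490, 128, 64, 12]
fp32 = dense_footprint(layers, weight_bytes=4, bias_bytes=4, activation_bytes=4)
# Assumed quantization scheme: int8 weights and activations, int32 biases.
q8 = dense_footprint(layers, weight_bytes=1, bias_bytes=4, activation_bytes=1)

FLASH_BUDGET = 256 * 1024  # hypothetical flash left over for the model
RAM_BUDGET = 32 * 1024     # hypothetical RAM left over for inference

for label, fp in (("float32", fp32), ("int8", q8)):
    fits = fp.weight_bytes <= FLASH_BUDGET and fp.activation_bytes <= RAM_BUDGET
    print(f"{label}: {fp.weight_bytes} B weights, {fp.activation_bytes} B activations, fits={fits}")
```

The float32 version needs about 281 KiB of weights and misses the hypothetical 256 KiB budget; the int8 version needs about 71 KiB, roughly a quarter, because each weight shrinks from four bytes to one.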

The techniques that make it fit
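The technique this article leans on most is int8 quantization: each float32 value is stored as one signed byte plus a shared scale and zero point, using the affine mapping real = (q - zero_point) * scale that TensorFlow Lite's int8 scheme also uses. The sketch below is written from scratch for a single tensor and is not any toolchain's implementation. Real converters calibrate ranges from representative data and often quantize weights per channel with symmetric ranges; the [-1.0, 3.0] range here is an assumed example.

```python
QMIN, QMAX = -128, 127  # signed 8-bit range


def int8_params(min_val: float, max_val: float) -> tuple[float, int]:
    """Per-tensor affine parameters mapping [min_val, max_val] onto int8."""
    # Stretch the range to include 0.0 so zero is exactly representable.
    min_val, max_val = min(min_val, 0.0), max(max_val, 0.0)
    scale = (max_val - min_val) / (QMAX - QMIN)
    if scale == 0.0:
        scale = 1.0  # all-zero tensor: any positive scale works
    zero_point = round(QMIN - min_val / scale)
    return scale, max(QMIN, min(QMAX, zero_point))


def quantize(x: float, scale: float, zero_point: int) -> int:
    q = round(x / scale) + zero_point
    return max(QMIN, min(QMAX, q))  # saturate instead of wrapping around


def dequantize(q: int, scale: float, zero_point: int) -> float:
    return (q - zero_point) * scale


# Example: a tensor observed in the range [-1.0, 3.0] during calibration.
scale, zero_point = int8_params(-1.0, 3.0)
for x in (-1.0, 0.0, 0.7, 3.0):
    q = quantize(x, scale, zero_point)
    print(x, "->", q, "->", round(dequantize(q, scale, zero_point), 4))
```

Inside the calibrated range the round-trip error is at most half a quantization step (scale / 2), and zero maps back to exactly 0.0, which matters for padding and ReLU outputs. Values outside the range saturate, which is why calibration data should look like what the device will really see.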

Where TinyML projects go wrong

The pragmatic path

Start by collecting real data from the actual device and environment, build the smallest model that clears your accuracy bar, quantize it, and prove the memory and power budget on hardware before committing the rest of the design. Edge AI rewards restraint: the goal is the smallest reliable decision made in the right place, not the largest model you can cram in.
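The "smallest model that clears your accuracy bar" rule can be written down directly. In this sketch the `Candidate` fields, the budgets, and every number are hypothetical; in practice accuracy should come from data recorded on the real device, and the RAM figure from a measurement on the target board rather than an estimate.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    name: str
    accuracy: float   # on held-out data recorded by the real device
    flash_bytes: int  # size of the quantized model
    ram_bytes: int    # peak working memory, measured on hardware


def pick_smallest(candidates, accuracy_bar, flash_budget, ram_budget) -> Optional[Candidate]:
    """Return the smallest candidate that clears the bar and fits, else None."""
    viable = [
        c for c in candidates
        if c.accuracy >= accuracy_bar
        and c.flash_bytes <= flash_budget
        and c.ram_bytes <= ram_budget
    ]
    # Smallest flash footprint wins; RAM breaks ties.
    return min(viable, key=lambda c: (c.flash_bytes, c.ram_bytes), default=None)


# Made-up numbers for illustration only.
candidates = [
    Candidate("tiny", accuracy=0.86, flash_bytes=18_000, ram_bytes=6_000),
    Candidate("small", accuracy=0.93, flash_bytes=52_000, ram_bytes=14_000),
    Candidate("medium", accuracy=0.95, flash_bytes=210_000, ram_bytes=48_000),
]
choice = pick_smallest(candidates, accuracy_bar=0.90,
                       flash_budget=256 * 1024, ram_budget=32 * 1024)
print(choice.name if choice else "nothing fits: shrink the model or relax the bar")
```

Here `medium` is the most accurate but misses the RAM budget, and `tiny` fits easily but misses the bar, so the rule picks `small`. Returning `None` instead of the closest miss forces an explicit decision rather than silently shipping a model that does not fit.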

Frequently asked questions

What is TinyML?

TinyML is machine learning that runs directly on microcontrollers and low-power embedded devices, typically within a few hundred kilobytes of RAM. It enables on-device inference — keyword spotting, anomaly detection, gesture classification — without sending raw data to the cloud.

When should AI run on the device instead of the cloud?

Run inference on the device when you need low latency, offline operation, or privacy, or when you want to cut power and bandwidth. Use the cloud when the model is large, changes often, or must reason across many devices. Most products combine both.

Can a microcontroller like the ESP32 run machine learning?

Yes — the ESP32, STM32, and nRF52 can run small quantized neural networks for audio keyword spotting, vibration and anomaly detection, and simple image classification, using int8 quantization to fit limited memory.