The Embedded Machine Learning Revolution: The Basics You Need To Know

This article was originally published on Wevolver.

In recent years, data has frequently been characterized as the oil and gold of the 21st century. As consumers, enterprises, and other organizations generate data at an unprecedented pace, businesses are seeking effective ways to analyze it and derive insights that help them improve their processes and optimize their decision making. In this context, they are increasingly developing and deploying Artificial Intelligence (AI) systems and technologies, which help them automate business processes and make better-informed decisions. Machine learning is probably the most prominent and widely used AI technology: it leverages large amounts of historical data to build systems that learn and improve from experience, much as humans improve through learning and observation. During the last decade, most of the interest in Machine Learning (ML) has revolved around Deep Learning (DL) [1], a segment of ML that focuses on Deep Neural Networks (DNNs). DNNs are neural networks with multiple layers between their input and output, loosely inspired by the structure of the human brain. The momentum of DNNs is largely due to their ability to improve their performance in line with the amount of data used for their training, i.e., they perform very well when large amounts of training data are available.

AI and DL applications are extremely data and compute intensive, which is why domain experts and industry practitioners associate them with significant computing resources such as Graphics Processing Units (GPUs) and Tensor Processing Units (TPUs). The reason is that non-trivial deep learning workloads consist largely of dense linear algebra, such as matrix and vector operations. GPUs and TPUs perform such operations very efficiently and quickly, and are hence well suited to executing deep learning algorithms.
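
To make this concrete, here is a minimal NumPy sketch (the layer sizes and values are purely illustrative, not taken from any real model) showing that a fully connected neural-network layer boils down to a matrix-vector product followed by a nonlinearity, exactly the kind of operation that GPUs and TPUs parallelize so well:

```python
import numpy as np

# Hypothetical fully connected layer: 256 inputs -> 128 outputs.
# The sizes and random values are illustrative only.
rng = np.random.default_rng(0)
W = rng.standard_normal((128, 256)).astype(np.float32)  # weight matrix
b = np.zeros(128, dtype=np.float32)                     # bias vector
x = rng.standard_normal(256).astype(np.float32)         # one input sample

# Forward pass of the layer: a matrix-vector product plus a nonlinearity.
# Deep networks chain thousands of such operations, which is why hardware
# that accelerates dense linear algebra speeds them up so dramatically.
y = np.maximum(W @ x + b, 0.0)  # ReLU activation
print(y.shape)  # (128,)
```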

Modern cloud computing infrastructures provide on-demand access to large amounts of computing resources and are therefore the primary choice for running sophisticated machine learning and deep learning applications, such as autonomous driving and large-scale fraud detection. In several cases, AI/ML applications are also hosted on private clouds such as on-premises data centres. Nevertheless, executing AI in the cloud is not always the best option, especially for applications that require low latency and must run close to end users. Specifically, cloud AI requires transferring large amounts of data from where they are produced to cloud data centres, which consumes significant network bandwidth and incurs high latency. Such data transfers can also be susceptible to privacy leaks, especially in applications that manage sensitive data. Moreover, the use of GPUs and TPUs in the cloud carries a significant carbon footprint, which raises environmental concerns.

The benefits of running AI closer to the user are perceived every day by millions of consumers who run deep learning applications on their smartphones, including popular applications like Siri, OK Google, and Apple Face ID. These applications rely on ML/DL models that are (pre)trained in the cloud, yet they clearly demonstrate the merit of executing them near the user. They also illustrate that models trained in the cloud can be executed on devices with far less powerful computing capabilities.

Rise and Benefits of Embedded Machine Learning

As with smartphones, it is also possible to execute machine learning models on all sorts of embedded devices, ranging from networked and mobile embedded systems to small-scale microcontrollers. Executing machine learning models on embedded devices is commonly known as Embedded Machine Learning [2]. It operates on the following general principle: ML models such as neural networks are trained on computing clusters or in the cloud, while inference, i.e., the execution of the trained models, takes place on the embedded devices. Contrary to popular belief, once a model is trained, its matrix operations can be executed effectively on CPU (Central Processing Unit) constrained devices or even tiny (e.g., 16- or 32-bit) microcontrollers. The flavour of embedded machine learning that executes ML models on very small pieces of hardware, such as ultra-low-power microcontrollers, is called TinyML [3].
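
As a rough sketch of that train-in-the-cloud, infer-on-the-device split, the snippet below trains a tiny Keras model and converts it with the TensorFlow Lite converter, applying full-integer quantization so the result is small enough for a microcontroller. The converter calls follow the standard TensorFlow Lite Python API, but the model architecture and the training data are placeholders invented for the example:

```python
import numpy as np
import tensorflow as tf

# Placeholder training data; in practice this would be real sensor data.
x_train = np.random.rand(1000, 32).astype(np.float32)
y_train = (x_train.sum(axis=1) > 16).astype(np.int32)

# 1. Train a small model on a workstation or in the cloud.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(32,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(2, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.fit(x_train, y_train, epochs=5, verbose=0)

# 2. Convert it for on-device inference, quantizing weights and activations
#    to 8-bit integers so it fits comfortably on a constrained device.
def representative_data():
    for sample in x_train[:100]:
        yield [sample.reshape(1, -1)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
tflite_model = converter.convert()

# 3. The resulting byte buffer is what gets bundled with the embedded
#    firmware (e.g., as a C array for TensorFlow Lite for Microcontrollers).
with open("model.tflite", "wb") as f:
    f.write(tflite_model)
print(f"Quantized model size: {len(tflite_model)} bytes")
```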

Embedded machine learning unlocks the potential to process data within the hundreds of billions of ubiquitous microprocessors and embedded controllers deployed in settings such as industrial plants, manufacturing shopfloors, smart buildings, and residential environments. In this way, it also facilitates the processing of data produced by embedded devices (e.g., Internet of Things devices), most of which currently goes unexploited.

The execution of machine learning models on embedded devices comes with several benefits over conventional cloud-based AI, including lower latency and real-time responsiveness, reduced network bandwidth and cloud storage requirements, improved privacy for sensitive data that never has to leave the device, and a lower energy consumption and carbon footprint.

Based on these benefits, embedded machine learning can sometimes be a much better choice than conventional cloud-based AI, especially in use cases that require real-time, low-latency, and low-overhead interactions [4]. For instance, equipment maintenance and intelligent asset management in various industrial sectors can greatly benefit from embedded machine learning, which can extract real-time insights about potential machine failures and production defects by processing data inside the asset (e.g., within the machine), as illustrated in the sketch below. As another example, precision agriculture applications can benefit from the instant discovery of crop problems, based on the direct processing of data from field sensors (e.g., images and temperature readings), which is usually much faster than round-tripping the data to the cloud.
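
As an illustration of this kind of on-asset processing, the hypothetical sketch below computes a few vibration features from a window of accelerometer samples directly on the device and flags readings that exceed a threshold assumed to have been learned offline from healthy-machine data; the function names, threshold, and signal are all made up for the example:

```python
import numpy as np

# Hypothetical alert threshold learned offline from healthy-machine data.
RMS_ALERT_THRESHOLD = 1.8  # made-up value, in g

def vibration_features(window: np.ndarray) -> dict:
    """Compute simple condition-monitoring features from one window
    of accelerometer samples (e.g., two seconds at 100 Hz)."""
    rms = float(np.sqrt(np.mean(window ** 2)))
    peak = float(np.max(np.abs(window)))
    return {"rms": rms, "peak": peak, "crest_factor": peak / (rms + 1e-9)}

def check_window(window: np.ndarray) -> bool:
    """Return True if the window looks anomalous. Running this on the
    asset itself avoids streaming raw samples to the cloud."""
    return vibration_features(window)["rms"] > RMS_ALERT_THRESHOLD

# Example usage with synthetic data standing in for a real sensor read.
window = np.random.normal(0.0, 0.5, size=200)
print(check_window(window))  # False for this quiet synthetic signal
```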

Nevertheless, embedded machine learning should not be seen as a replacement for cloud-based AI. In most cases, it complements cloud-based applications with value-added functionalities such as real-time control and privacy-preserving handling of sensitive datasets.

The Embedded Machine Learning Ecosystem

Embedded machine learning applications run on different types of embedded devices, based on tools and techniques that enable the development and deployment of ML models on resource-constrained nodes. Hence, the embedded machine learning ecosystem comprises device vendors, notably the OEMs (Original Equipment Manufacturers) whose hardware hosts the deployed models. It also extends the global machine learning ecosystem with tools and techniques for developing, deploying, and operating ML applications on embedded devices, including Internet-of-Things (IoT) devices [5]. In the latter case, the applications are commonly referred to as AIoT (Artificial Intelligence of Things) applications.

There is already a rich set of embedded devices that can run machine learning and deep learning applications. Many are low cost and can be flexibly deployed in different IoT applications, and some are also excellent for learning and educational purposes. As a prominent example, IoT developers familiar with the Arduino ecosystem can nowadays take advantage of the Arduino Nano 33 BLE Sense board, based on a Nordic Semiconductor nRF52840 SoC, for the development of TinyML applications. The board comes with several embedded sensors, including humidity, temperature, and barometric pressure sensors, a microphone, as well as gesture, proximity, light colour, and light intensity sensors, making it versatile and suitable for a wide range of applications. As another example, SparkFun's Edge Development Board supports deep learning applications such as voice and gesture recognition. The board is based on Ambiq Micro's Apollo3 Blue microcontroller, which runs TensorFlow Lite for Microcontrollers, one of the most popular environments for deep learning on embedded devices. Recently, the Thunderboard Sense 2, a compact development platform for IoT product development, has also been enhanced with embedded machine learning capabilities. Thanks to a partnership between Silicon Labs (SiLabs) and Edge Impulse, machine learning applications can be developed for a range of SiLabs microcontrollers (MCUs), notably the EFR32 and EFM32 connectivity families. This makes it possible to build enterprise functionalities for various application areas, such as machine monitoring and the analysis of audible events.

The development of machine learning models is based on popular data science libraries and tools, such as Python machine learning libraries like Scikit-Learn and Keras over TensorFlow, as well as tools like Jupyter Notebooks designed for data scientists and ML researchers. Nevertheless, the embedded machine learning ecosystem also comprises libraries specially designed to support inference on devices with limited computing capacity. This is, for example, the case with TensorFlow Lite, which supports on-device inference. Furthermore, there are TinyML-oriented libraries that enable the deployment of models on devices with only a few kilobytes of memory, such as microcontrollers. For instance, the core runtime of TensorFlow Lite for Microcontrollers fits in 16 KB on an Arm Cortex-M3 and can run a variety of machine learning and deep learning models.
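
As a rough illustration of on-device inference with TensorFlow Lite, the sketch below loads a converted model file (the model.tflite path is an assumption, e.g., the file produced in the earlier conversion sketch) and runs a single prediction through the interpreter; on a Linux-class embedded board the lightweight tflite_runtime package can be used in place of full TensorFlow:

```python
import numpy as np
import tensorflow as tf  # on small Linux boards, tflite_runtime works as well

# Load a previously converted model file (hypothetical path).
interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()[0]
output_details = interpreter.get_output_details()[0]

# Prepare one input sample with the shape and dtype the model expects.
sample = np.random.rand(*input_details["shape"]).astype(np.float32)
if input_details["dtype"] == np.int8:
    # Fully quantized models expect int8 inputs; apply the stored
    # scale and zero point recorded during conversion.
    scale, zero_point = input_details["quantization"]
    sample = (sample / scale + zero_point).astype(np.int8)

# Run inference entirely on the device.
interpreter.set_tensor(input_details["index"], sample)
interpreter.invoke()
prediction = interpreter.get_tensor(output_details["index"])
print(prediction)
```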

Overall, there is already an ecosystem of hardware and software assets, which enable the development, deployment and operation of embedded machine learning applications. This ecosystem is growing, as more developers and integrators are riding the wave of embedded machine learning applications.

Seizing the Embedded Machine Learning Opportunity

Embedded machine learning provides a wealth of innovation opportunities for enterprises that wish to make the best possible use of their data. It empowers enterprises to exploit previously unused datasets, while helping them optimize the bandwidth, storage, and latency requirements of their machine learning applications. In this context, it also boosts the development of value-added applications, such as real-time condition monitoring for industrial maintenance.

However, there are barriers and challenges to seizing these opportunities. Enterprises need to cope with the widely reported talent gap in machine learning and embedded systems: embedded machine learning requires developers and engineers with knowledge and skills from both worlds, and such developers are scarce. Moreover, embedded machine learning has peculiarities that differentiate it from traditional machine learning, not only because of the specialized tools involved, but also because collecting and pre-processing data from embedded devices is typically more cumbersome than collecting and using data from other IT systems and databases [6]. Furthermore, to apply machine learning across a variety of embedded platforms, developers and deployers must be familiar with many different systems and tools, which is yet another challenge. Other important challenges include energy efficiency and resource usage issues, which stem from the use of resource-constrained devices [7].

To alleviate these challenges, Edge Impulse is leading the effort of creating a vibrant embedded machine learning ecosystem for developers and deployers around the globe. This ecosystem is built around the company’s leading development platform, which is already used by many thousands of skilled developers all around the world. It is also increasingly trusted by enterprises in different sectors, which use the platform for developing embedded machine learning use cases with a tangible Return-on-Investment (ROI). The next article of our series presents some of the most prominent ROI generating use cases of embedded machine learning, including best practice use cases that are already deployed by enterprises of the Edge Impulse ecosystem.

References

1. A. Shrestha and A. Mahmood, "Review of Deep Learning Algorithms and Architectures," in IEEE Access, vol. 7, pp. 53040-53065, 2019, doi: 10.1109/ACCESS.2019.2912200.

2. L. Cerina, M. D. Santambrogio, G. Franco, C. Gallicchio and A. Micheli, "Efficient Embedded Machine Learning applications using Echo State Networks," 2020 Design, Automation & Test in Europe Conference & Exhibition (DATE), Grenoble, France, 2020, pp. 1299-1302, doi: 10.23919/DATE48585.2020.9116334.

3. R. Saracco, "TinyML: A Glimpse into a Future of Massive Distributed AI," IEEE Future Directions, January 25, 2021, available at: https://cmte.ieee.org/futuredirections/2021/01/25/tinyml-a-glimpse-into-a-future-of-massive-distributed-ai/

4. A. Ibrahim and M. Valle, "Real-Time Embedded Machine Learning for Tensorial Tactile Data Processing," in IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 65, no. 11, pp. 3897-3906, Nov. 2018, doi: 10.1109/TCSI.2018.2852260.

5. V. M. Suresh, R. Sidhu, P. Karkare, A. Patil, Z. Lei and A. Basu, "Powering the IoT through embedded machine learning and LoRa," 2018 IEEE 4th World Forum on Internet of Things (WF-IoT), Singapore, 2018, pp. 349-354, doi: 10.1109/WF-IoT.2018.8355177.

6. J. Lee, M. Stanley, A. Spanias and C. Tepedelenlioglu, "Integrating machine learning in embedded sensor systems for Internet-of-Things applications," 2016 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT), Limassol, 2016, pp. 290-294, doi: 10.1109/ISSPIT.2016.7886051.

7. L. Andrade, A. Prost-Boucle and F. Pétrot, "Overview of the state of the art in embedded machine learning," 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE), Dresden, 2018, pp. 1033-1038, doi: 10.23919/DATE.2018.8342164.
