Introducing Synthetic Data Generation in Edge Impulse

By Edge Impulse Team

We are proud to announce that synthetic data generation is now available inside the Edge Impulse platform, enabling a new and efficient way to work with LLM-generated images, audio, and voice to enhance your edge AI models.

Quickly access and generate synthetic data in Edge Impulse

Training effective machine learning models requires a lot of accurate data. However, collecting and curating this data to reflect real-world scenarios is often challenging and expensive, and the time, effort, and cost involved can be significant barriers. One option is synthetic data, which has advanced to the point that generated images, audio, and speech are now viable as part of the datasets used for AI training.

As of today, we have integrated with three GenAI solutions: Dall-E to generate images, Whisper to create human speech elements, and ElevenLabs to generate audio sound effects.

Integrated data generation from multiple applications

New foundation models and services are released every week! To keep these integrations scalable, every user who belongs to an Edge Impulse organization can also create their own synthetic data generation integrations. See our documentation for more details.

The direct integration of LLM-based data generation in Edge Impulse is available now for Enterprise Plan and Professional Plan users. You can access this new feature directly in your Edge Impulse projects under the “Data acquisition” section, alongside the Dataset, Data explorer, and Data sources options.

The Synthetic Data menu lists the GenAI transformation blocks, both public and private, that have synthetic data capabilities.  

Within the new Synthetic Data section, users can add and refine their prompts quickly and efficiently. The output, including images, is then displayed, allowing users to evaluate the results, refine their prompts until they get the desired dataset, and easily delete unwanted or incorrect data samples.

This iterative workflow makes it much easier to determine the right prompts for generating data. Additionally, any data that is not deleted is automatically added to the project, ensuring seamless data management as you continue to build and refine your edge AI model.
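The generate, review, and keep loop described above can be sketched in a few lines of Python. This is only an illustration of the workflow, not the Edge Impulse API: `Sample`, `Project`, `fake_generator`, and `generate_and_review` are hypothetical names, and the generator is a stub that returns placeholder strings instead of calling a real GenAI service.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Sample:
    prompt: str
    label: str
    payload: str  # stand-in for real image or audio data


@dataclass
class Project:
    dataset: List[Sample] = field(default_factory=list)


def fake_generator(prompt: str, label: str, count: int) -> List[Sample]:
    """Hypothetical stand-in for a GenAI block; returns placeholder samples."""
    return [Sample(prompt, label, f"{label}-{i}") for i in range(count)]


def generate_and_review(
    project: Project,
    prompt: str,
    label: str,
    count: int,
    keep: Callable[[Sample], bool],
    generator: Callable[[str, str, int], List[Sample]] = fake_generator,
) -> List[Sample]:
    """Generate a batch, drop rejected samples, add the rest to the project."""
    batch = generator(prompt, label, count)
    kept = [s for s in batch if keep(s)]
    project.dataset.extend(kept)
    return kept


project = Project()
# The reviewer rejects one sample (for example, an image that does not match the label).
kept = generate_and_review(
    project,
    prompt="a photo of a red apple on a kitchen table",
    label="apple",
    count=3,
    keep=lambda s: s.payload != "apple-1",
)
print([s.payload for s in kept])
```

In this sketch the `keep` callback plays the role of the manual review step: anything it rejects is discarded, and everything else ends up in the project's dataset.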

Sample Workflows

To generate synthetic data, navigate to your project in Edge Impulse. Click “Data acquisition” in the left-hand menu and select the “Synthetic Data” tab in the top navigation.

Here are the steps for the currently supported GenAI functions:

Synthetic Images

New samples are automatically saved to your dataset

The Dall-E image generator functionality

Generate Human Speech

Generate Audio

See the new Synthetic Data integration in use in the ElevenLabs integration video.

Read the docs page for more information on all three options.

This new capability helps users streamline the process of writing and refining prompts to create the desired dataset. It provides an efficient workflow for building models with synthetic data and makes it easier for developers to push custom GenAI transformation blocks for their own model development.

Stay tuned for more updates on our work with generative AI for edge AI applications.


