Voice-Controlled Multimodal AI Solution With Advanced Vision Technology

New offering enables low-power voice-controlled operation of embedded vision AI systems in IoT and edge applications

Renesas Electronics Corporation, in collaboration with Syntiant Corp., has developed a voice-controlled multimodal AI solution that enables low-power, contactless operation for image processing in vision AI-based IoT and edge systems. Target applications include self-checkout machines, security cameras, video conference systems and smart appliances such as robotic cleaning devices.

The new solution combines the Renesas RZ/V Series vision AI microprocessor unit (MPU) with the Syntiant NDP120 Neural Decision Processor to deliver advanced voice and image processing capabilities. The RZ/V Series MPU incorporates a proprietary DRP-AI (Dynamically Reconfigurable Processor-AI) accelerator that combines high-precision AI inference with excellent power efficiency. This eliminates the need for heat dissipation measures such as heat sinks or cooling fans, which reduces the bill of materials (BOM) cost and makes it possible to integrate vision AI into a wide range of embedded applications.

The Syntiant NDP120 chip incorporates sophisticated AI capabilities that can implement many high-precision, hands-free voice functions, including speaker recognition, keyword detection, multiple wake words and local command recognition. Packaged with the Syntiant Core 2 neural network inference engine, the NDP120 can also run multiple applications simultaneously while limiting power consumption to 1 mW when running on battery power.

The joint solution features always-on functionality, with quick voice-triggered activation from standby mode to perform vision-based tasks such as object recognition and facial recognition, which are required in security cameras and other systems. User-defined voice cues drive activation and system operation, while vision AI recognition tracks operator behaviour and either controls operation or issues a warning when suspicious actions are detected.
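The activation flow described above can be pictured as a small state machine. The following Python sketch is illustrative only: the class, method and keyword names are hypothetical and do not correspond to any Renesas or Syntiant API. It assumes the voice processor reports detected keywords and the vision processor reports one label per frame.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Mode(Enum):
    STANDBY = auto()  # only the low-power voice processor is listening
    ACTIVE = auto()   # the vision AI processor is awake and running inference


@dataclass
class MultimodalController:
    """Hypothetical controller; not a Renesas or Syntiant API."""
    wake_words: frozenset = frozenset({"start"})   # assumed example keywords
    sleep_words: frozenset = frozenset({"stop"})
    mode: Mode = Mode.STANDBY
    log: list = field(default_factory=list)

    def on_voice(self, keyword: str) -> None:
        """Handle a keyword reported by the always-on voice processor."""
        if self.mode is Mode.STANDBY and keyword in self.wake_words:
            self.mode = Mode.ACTIVE
            self.log.append("vision: wake")
        elif self.mode is Mode.ACTIVE and keyword in self.sleep_words:
            self.mode = Mode.STANDBY
            self.log.append("vision: sleep")

    def on_vision(self, label: str) -> None:
        """Handle a label from the vision model; ignored while in standby."""
        if self.mode is not Mode.ACTIVE:
            return
        if label == "suspicious":
            self.log.append("warning: suspicious action")
        else:
            self.log.append(f"seen: {label}")


if __name__ == "__main__":
    ctrl = MultimodalController()
    ctrl.on_vision("person")      # dropped: vision is not running in standby
    ctrl.on_voice("start")        # wake word moves the system to ACTIVE
    ctrl.on_vision("person")
    ctrl.on_vision("suspicious")  # triggers a warning entry
    ctrl.on_voice("stop")         # returns to low-power standby
    for entry in ctrl.log:
        print(entry)
```

In a real design the two handlers would be driven by interrupts or messages from the respective chips; keeping them as separate entry points mirrors the point that voice and vision software can be developed independently.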

The multimodal architecture eases the creation of contactless user experiences for vision AI-based systems. Handling voice recognition on a dedicated, power-efficient chip reduces standby power consumption. It also speeds up system development, because the voice software can be developed independently of the vision AI functionality.

“We anticipate that demand for multimodal systems that use multiple streams of input information – both image and voice – will increase moving forward as a way to improve both ease of use and safety,” said Hiroto Nitta, Senior Vice President and Head of SoC Business in the IoT and Infrastructure Business Unit at Renesas. “Through the collaboration between Renesas and Syntiant, we will accelerate the adoption of low-power, ultra-small smart voice AI technology in embedded systems and deliver new combined solutions to customers globally.”

“Voice-based user interfaces will make it possible for customers to deliver new user experiences that bring the next generation of innovative ideas from concept to reality,” said Syntiant CEO Kurt Busch.

The reference design for the new multimodal AI solution is available now, including circuit diagrams and BOM lists.
