On-Device AI Just Got Faster: How LiteRT & NPU Deliver

In today’s fast-paced digital world, users expect instant, intelligent features right from their mobile apps. Think real-time video effects, seamless speech recognition, or captivating motion capture. While these AI capabilities offer incredible user experiences, they present unique challenges for developers trying to run sophisticated models directly on a device.

Developers constantly grapple with managing device thermals, preserving precious battery life, and preventing frustrating frame drops. To overcome these hurdles and deliver truly fast, responsive AI without compromising performance, a specialized solution is needed. Enter LiteRT, a powerful framework designed to unlock the full potential of Neural Processing Units (NPUs) – the hardware specifically built for these demanding AI workloads.

LiteRT: Unlocking On-Device AI Performance

LiteRT is a robust, cross-platform framework for production-ready AI acceleration. It intelligently taps CPU, GPU, and, most importantly, NPU compute across a wide array of devices, from mobile phones and desktops to industrial IoT platforms. Designed with performance and scalability in mind, LiteRT simplifies the deployment of high-speed AI features through a unified API.

This innovative approach abstracts away the complexities of integrating with multiple NPU SDKs. Developers can now target diverse silicon architectures without the need to write vendor-specific code, dramatically streamlining their workflow and accelerating development. LiteRT is already proven in the toughest environments, having been hardened across various Google products, popular apps, and even other SDKs.
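
As a minimal sketch of what that unified API can look like in practice (assuming the LiteRT Next Kotlin API in the com.google.ai.edge.litert package; the file name "model.tflite" and the inputData parameter are placeholders, not part of the original article), switching accelerators is a one-line change:

```kotlin
import android.content.Context
import com.google.ai.edge.litert.Accelerator
import com.google.ai.edge.litert.CompiledModel

// A minimal sketch, assuming the LiteRT Next Kotlin API;
// "model.tflite" and inputData are placeholders.
fun runOnNpu(context: Context, inputData: FloatArray): FloatArray {
    // Compile the bundled model for the NPU; targeting the GPU or CPU
    // instead is just a different Accelerator value.
    val model = CompiledModel.create(
        context.assets,
        "model.tflite",
        CompiledModel.Options(Accelerator.NPU),
    )

    // Allocate input/output buffers, run inference, and read the result.
    val inputBuffers = model.createInputBuffers()
    val outputBuffers = model.createOutputBuffers()
    inputBuffers[0].writeFloat(inputData)
    model.run(inputBuffers, outputBuffers)
    return outputBuffers[0].readFloat()
}
```

The point of the sketch is the absence of vendor-specific code: the same call compiles the same model for whichever accelerator the device offers.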

Real-World Impact: LiteRT in Action

LiteRT is trusted by industry leaders like Google Meet, Epic Games, and Argmax Inc., showcasing the transformative power of NPU acceleration in real-world production applications. Their successes demonstrate how LiteRT is pushing the boundaries of what’s possible with on-device AI.

  • Google Meet: By harnessing mobile NPUs, Google Meet successfully deployed an Ultra-HD segmentation model that was 25x larger than previous versions. Crucially, this was achieved without sacrificing inference speed or increasing power consumption, ensuring consistent, high-quality background replacement even during typical 20-30 minute sessions.
  • Epic Games, Inc. – Live Link Face: High-fidelity, real-time animation experiences demand exceptional efficiency. Epic’s Live Link Face (Beta) app for Android allows creators to capture performances from a single camera, then generate and stream real-time MetaHuman facial animation directly from their devices into Unreal Engine. Real-time facial solving is incredibly computationally intensive, but by using LiteRT on the NPU, Epic achieved up to 30 FPS performance for this demanding task on supported Android devices.
  • Argmax Inc. – Argmax Pro SDK: Argmax recently launched its Pro SDK for Android, bringing on-device speech recognition built on LiteRT. By pairing LiteRT with AI Pack feature delivery via Google Play, Argmax delivers top-tier accuracy and real-time speed while respecting app size constraints. They utilized LiteRT’s Ahead-Of-Time (AOT) compilation to eliminate costly on-device compilation steps, enabling frontier speech models like NVIDIA Parakeet TDT 0.6B v2 to run with industry-leading latency.

Performance testing on Google Tensor, MediaTek, and Qualcomm Technologies SoCs revealed that upgrading from GPU to NPU delivered over 2x speedup for the Argmax Pro SDK. Beyond speed, the power efficiency of NPUs enabled Argmax SDK Enterprise customers, such as Heidi Health, to conduct reliable on-device live transcription for extended sessions without draining battery life. Furthermore, by offloading runtime libraries and models to on-demand downloads via Play’s AI Packs, devices dynamically obtain the most optimized model for their specific NPU.
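
For context, the Play side of that flow could look roughly like the sketch below (assuming the Play for On-device AI delivery library; the pack name "speech_model_pack" and file name "model.tflite" are hypothetical): the app checks for an installed AI Pack and requests an on-demand download when it is missing.

```kotlin
import android.content.Context
import com.google.android.play.core.aipacks.AiPackManagerFactory

// A sketch, assuming the Play for On-device AI delivery API;
// "speech_model_pack" and "model.tflite" are hypothetical names.
fun resolveModelPath(context: Context): String? {
    val aiPackManager = AiPackManagerFactory.create(context)

    // If the pack is already on the device, use the model it contains.
    val location = aiPackManager.getPackLocation("speech_model_pack")
    if (location != null) {
        return "${location.assetsPath()}/model.tflite"
    }

    // Otherwise request an on-demand download; Play serves the pack variant
    // matched to this device (e.g. the model optimized for its NPU).
    aiPackManager.fetch(listOf("speech_model_pack"))
    return null // Model not available yet; retry after the download completes.
}
```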

Empowering Developers and Expanding Horizons

To help developers test and validate the true potential of NPU acceleration, the Google AI Edge Gallery App now features NPU support for select Gemma models and includes built-in benchmarking tools. Available on Android, the AI Edge Gallery allows you to quickly see on-device AI performance in action, and developers can access its GitHub repository to build their own unique experiences.

While the performance gains in speech, animation, and video are undeniable, the path to NPU utilization has historically been complex due to various vendor-specific SDKs. LiteRT addresses this by providing a streamlined workflow and comprehensive cross-platform support. This empowers developers to deploy advanced AI models across mobile phones, industrial IoT, and even the new wave of AI PCs, all without sacrificing performance or portability.

As highlighted in the recent Google AI Edge Gemma 4 blog post, LiteRT extends NPU acceleration beyond mobile, allowing deployment across a vast range of hardware using a single framework. For the industrial edge, LiteRT supports robust platforms like the Qualcomm Dragonwing™ IQ8 Series, which powers the Arduino VENTUNO Q, enabling high-reliability use cases such as robotics and smart manufacturing with models like Gemma 4. For desktop environments, LiteRT is preparing for the era of AI PCs through OpenVINO™ integration with Intel® Core™ Ultra series 2 and 3 processors, promising significant power savings and responsiveness for local GenAI workloads.

The Google AI Edge Portal provides an invaluable benchmark service, offering insights on ML workloads across more than 100 popular mobile phones, accelerators, and configurations. This allows developers to make data-driven deployment decisions, such as whether to use AOT or Just-In-Time (JIT) compilation, best suited for their specific use cases and target devices. To access the latest Portal NPU features, sign up for the private preview.
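
Where benchmark data suggests AOT packaging is not worth the overhead for a given device class, the JIT path can stay simple. Here is a minimal sketch, assuming the same LiteRT Next Kotlin API as above, of compiling at runtime with a graceful fallback chain:

```kotlin
import android.content.Context
import com.google.ai.edge.litert.Accelerator
import com.google.ai.edge.litert.CompiledModel

// A sketch of the JIT path, under the same API assumptions as above:
// try to compile for the NPU at runtime, falling back to GPU and then CPU
// on devices where an NPU (or its driver) is unavailable.
fun compileWithFallback(context: Context): CompiledModel {
    for (accelerator in listOf(Accelerator.NPU, Accelerator.GPU, Accelerator.CPU)) {
        try {
            return CompiledModel.create(
                context.assets,
                "model.tflite", // placeholder model name
                CompiledModel.Options(accelerator),
            )
        } catch (e: Exception) {
            // Compilation for this accelerator failed; try the next one.
        }
    }
    error("No accelerator available for model.tflite")
}
```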

With its production-ready NPU integrations, LiteRT delivers a unified workflow that abstracts away low-level complexities across both JIT and AOT deployment. We encourage you to dive into our comprehensive documentation and begin your journey with NPU acceleration today. Let us know your feedback and feature requests by opening an issue on our GitHub channel; we can’t wait to see what incredible innovations you’ll build!

Source: Google Developers Blog

Kristine Vior

With a deep passion for the intersection of technology and digital media, Kristine leads the editorial vision of HubNextera News. Her expertise lies in deciphering technical roadmaps and translating them into comprehensive news reports for a global audience. Every article is reviewed by Kristine to ensure it meets our standards for original perspective and technical depth.
