Embedded Hardware for Processing AI at the Edge: GPU, VPU, FPGA, and ASIC Explained

IT systems are evolving rapidly in businesses and enterprises across the board, and a growing trend is moving computing power to the edge. Gartner predicts that by 2025, edge computing will process 75% of the data generated by all use cases, including those in factories, healthcare, and transportation. Edge computing adoption correlates with the rise of artificial intelligence (AI), which is making factories smarter, improving patient outcomes, and increasing autonomous vehicle safety, while also generating exponentially more data than ever before. In a single smart factory, data from manufacturing equipment, sensors, machine vision systems, and warehouse management systems could easily total 1 petabyte per day.

When enterprises first deployed embedded systems, their system architects could not imagine the data volumes that AI, the Internet of Things (IoT), and other advanced technologies would generate. Now that the landscape has changed, the embedded systems of ten years ago must adapt to support today’s edge computing.

Many AI workloads today are processed in the cloud. However, as more high-bandwidth data puts growing demands on these systems, processing the data at the edge makes sense in terms of latency, reliability, mobility, security, energy efficiency and data transmission costs.  

To meet the new demands of today, hardware must evolve from control and rules-based systems to data-centric environments to accommodate edge AI. 

A Guide to Processor Types for AI Workloads at the Edge 

Hardware requirements for processing AI workloads vary depending on the use case. AI can leverage a wide range of inputs, including videos, images, audio, sensors, and PLC data. The challenge that system architects face is choosing the best computing cores for their AI applications.  

This guide will help you understand the different types of processing cores that can be used in edge systems, and their strengths.  

  1. CPU 

The central processing unit (CPU) is a general-purpose processing unit, typically with 4 to 16 cores. CPUs run complex tasks and facilitate system management.

They work well with mixed data inputs, such as systems that use both audio and text, and extract, transform, and load (ETL) processes. 
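
As an illustration of the kind of ETL work a CPU handles well, here is a minimal Python sketch of an extract-transform-load step. The file names and column names are hypothetical placeholders, and pandas is used purely for illustration.

    # Minimal CPU-side ETL sketch (illustrative only; file and column names are placeholders).
    import pandas as pd

    # Extract: read raw sensor readings from a local log file.
    raw = pd.read_csv("sensor_log.csv")

    # Transform: drop incomplete rows and convert millivolts to volts.
    clean = raw.dropna()
    clean["voltage_v"] = clean["voltage_mv"] / 1000.0

    # Load: write the prepared data where a downstream AI model can pick it up.
    clean.to_parquet("prepared_readings.parquet")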

  2. GPU 

Graphics processing units (GPUs) contain hundreds or thousands of small cores originally built for high-speed graphics rendering. They deliver high-performance parallel processing, but typically have a larger footprint and higher power consumption than CPUs. 

Because of their high number of small cores, GPUs are well suited for AI workloads, facilitating both neural network training and AI inferencing. 
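
A minimal PyTorch sketch of that pattern is shown below: the same code targets a GPU when one is present and falls back to the CPU otherwise. The model and input sizes are placeholders, not a recommendation.

    # Minimal PyTorch sketch: run inference on a GPU if available, else on the CPU.
    # The model and batch below are placeholders, not a real edge workload.
    import torch
    import torch.nn as nn

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
    model = model.to(device).eval()

    batch = torch.randn(32, 128, device=device)   # stand-in for real sensor or image features
    with torch.no_grad():
        scores = model(batch)                     # the GPU's many small cores work in parallel
    print(scores.shape)                           # torch.Size([32, 10])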

  3. FPGA 

Field-programmable gate arrays (FPGAs) are built from configurable logic blocks and consume less power than CPUs and GPUs. Engineers with the right expertise can reprogram them in the field. 

They can be the best choice when a high degree of flexibility is required. 

  4. ASIC 

Application-specific integrated circuits (ASICs) are custom logic designed using a manufacturer’s circuit libraries, and they offer the advantages of low power consumption, speed, and a small footprint. They are, however, time-consuming to design and more expensive than other options, so ASICs are recommended for products that will ship in very high volumes. 

Types of ASICs include: 

  • Vision processing units (VPUs), image and vision processors, and co-processors 
  • Tensor processing units (TPUs), such as the first TPU developed by Google for its machine learning framework, TensorFlow 
  • Neural compute units (NCUs), including those from ARM 
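
As a concrete illustration of dispatching inference to a VPU-class accelerator, the sketch below uses Intel's OpenVINO runtime. It assumes an OpenVINO installation and a model already converted to IR format; the model path is a placeholder, and the "MYRIAD" device name targets Intel Movidius VPUs only in OpenVINO releases that still ship that plugin.

    # Sketch: dispatch inference to a VPU-class accelerator via OpenVINO.
    # "model.xml" is a placeholder path; "MYRIAD" targets Intel Movidius VPUs
    # in OpenVINO releases that include that plugin (use "CPU" or "GPU" otherwise).
    import numpy as np
    from openvino.runtime import Core

    core = Core()
    model = core.read_model("model.xml")
    compiled = core.compile_model(model, device_name="MYRIAD")

    input_tensor = np.zeros((1, 3, 224, 224), dtype=np.float32)   # placeholder input
    result = compiled([input_tensor])                             # run one inference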

Each core type is suited to different types of calculations, and using them together in heterogeneous computing applications provides all of the functionality that complex use cases require. Combined, they can also balance workloads, boost inferencing performance across different AI tasks, and yield the most cost-effective and efficient configuration. 
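
One common heterogeneous split is sketched below, with the CPU handling frame pre-processing and a GPU (when present) running the neural network; the model and the synthetic frame are placeholders used only to show the division of work.

    # Sketch of a heterogeneous CPU + GPU split: the CPU prepares each frame,
    # the GPU (if present) runs the neural network. Model and frame are placeholders.
    import numpy as np
    import torch

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = torch.nn.Sequential(torch.nn.Flatten(),
                                torch.nn.Linear(3 * 224 * 224, 2)).to(device).eval()

    def preprocess_on_cpu(frame):
        # CPU-side work: normalize an HWC uint8 frame and reorder it to NCHW float32.
        x = frame.astype(np.float32) / 255.0
        return torch.from_numpy(x).permute(2, 0, 1).unsqueeze(0)

    frame = np.random.randint(0, 255, (224, 224, 3), dtype=np.uint8)   # stand-in camera frame
    with torch.no_grad():
        logits = model(preprocess_on_cpu(frame).to(device))            # GPU-side inference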

Figure 1. Heterogeneous edge computing architecture options for AI applications

Steps to Selecting Embedded Hardware for Edge AI 

Selecting an embedded edge hardware system for processing AI at the edge typically requires evaluating three primary factors: 

Performance  

The core hardware system must be able to deliver the speed that complex, data-intensive edge AI applications demand while functioning consistently and reliably, even in harsh environments. 

SWaP  

SWaP is an acronym for size, weight and power. In addition to delivering the functionality the application requires, the edge hardware must also meet specs for size and weight to comply with the application’s physical constraints and make the most sense from a power-consumption standpoint. 

Cost 

The cost of edge hardware varies across core types and manufacturers. It’s necessary to determine which option delivers the functionality and specs your project needs at the best price point. 

Figure 2. Comparing core types for edge AI hardware

Edge Hardware at Work  

The following examples are just a few of the many use cases in which edge hardware innovation enables AI to deliver value today. 

Deep Learning Acceleration Platforms 

Deep learning acceleration platforms (DLAPs) enable functionality, including data acquisition, image pre-processing, image analysis, and AI acceleration. They also give machines the ability to improve their own performance and make decisions. By replacing legacy edge devices that send data to the cloud for processing with DLAPs, operations can see faster responses as well as greater security and control. 

DLAPs can leverage heterogeneous design, for example, using CPUs to manage data acquisition and image pre-processing and GPUs to speed up parallel task processing, while keeping the edge system small and power-efficient. 

ADLINK’s DLAP Series, for example, is designed for performance in harsh industrial and embedded environments, operating in extreme temperatures, high humidity, and use cases where shock and vibration are common; we call this SWaP-optimized for edge AI. These compact units contain an NVIDIA® Quadro® embedded GPU or NVIDIA® Jetson™ supercomputer on a module and can be used for AI-based inferencing, machine vision, and autonomous machine control applications. 

Examples  

AI Self-Checkout: With leisurely lunches a thing of the past, people are looking for quick ways to purchase meals. Self-service adoption is growing among consumers, and AI can modernize the customer experience. At the checkout of a restaurant or store, machine vision can identify the items in a single scan and display the total due in less than 1.5 seconds with the ADLINK DLAP-211-JT2. Customers can complete their purchases quickly and conveniently.

Mobile X-Ray C-Arm: Medical equipment must be reliable, durable, and designed for high performance. Mobile equipment must also pack all necessary computing power into a compact design with low power consumption. A combination of a CPU and an embedded MXM GPU module, packaged as the world’s smallest industrial GPU-enabled system, meets those specifications. 

AI on Modules 

Mobile PCI Express Module (MXM) GPUs allow you to run AI on palm-sized graphics cards. They offer major SWaP advantages, at only a fraction of the size of full-length PEG cards. They provide high performance per watt and are designed for extreme conditions, such as limited or no ventilation, tight spaces, high or low temperatures, and dusty or even corrosive environments. 

Examples 

AI Facial Recognition: Access control is vital to prevent unauthorized entry to manufacturing plants, research facilities, data centers, and other highly controlled areas. An AI facial recognition gate eliminates the need for ID cards (that can be counterfeited) and replaces those legacy systems with technology that identifies authorized users, even if they’re wearing glasses or a cap, within a fraction of a second. It can even distinguish between a real human face and a photo. 

Airport Safety: Airport panoramic surveillance systems provide a 360-degree view of runways, air and ground traffic, and a higher degree of situational awareness than legacy monitoring systems. These AI technologies are supported by a rugged, fanless edge device designed for outdoor use, integrated with a high-performance GPU that supports improved surveillance both day and night. 

Edge Robotics 

Hardware innovation is also making robotics at the edge possible. Did you hear that Eclipse Cyclone DDS has been selected as the default ROS 2 middleware, starting with the upcoming May 2021 release? This is exciting because it’s the same Data Distribution Service (DDS) technology within our ROScube embedded robotics controllers, which help robots communicate with each other and the world around them. 
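
For context, below is a minimal Python (rclpy) sketch of a ROS 2 node publishing a status message; whichever DDS middleware is configured, such as Cyclone DDS selected via the RMW_IMPLEMENTATION=rmw_cyclonedds_cpp environment variable, handles the transport. The node and topic names are placeholders.

    # Minimal rclpy sketch: a ROS 2 node publishes one status message.
    # The configured DDS middleware (e.g. Cyclone DDS, selected with
    # RMW_IMPLEMENTATION=rmw_cyclonedds_cpp) carries it to subscribers.
    # Node and topic names are placeholders.
    import rclpy
    from std_msgs.msg import String

    def main():
        rclpy.init()
        node = rclpy.create_node("edge_robot_talker")
        publisher = node.create_publisher(String, "robot_status", 10)

        msg = String()
        msg.data = "AMR online"
        publisher.publish(msg)

        node.destroy_node()
        rclpy.shutdown()

    if __name__ == "__main__":
        main()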

Example: 

Autonomous Mobile Robots in Manufacturing: AMRs powered by edge AI can navigate their environments and understand context using machine vision, 5G, and DDS technologies. Fair Friend Group, one of the largest manufacturers in Taiwan, is using 5G over DDS infrastructure to enable remote maintenance with augmented reality, machine vision for automated inspection, and AMR swarms for material handling in its factory today.

AI Machine Vision  

AI-based machine vision systems leverage integrated industrial cameras and edge computing systems with embedded high-performance GPUs, VPUs, and CPUs to perform their tasks. Because all computing power is at the edge, latency and bandwidth concerns are eliminated. With the right edge hardware, smart cameras can also meet the footprint, weight, power, performance, and cost requirements of AI machine vision. 
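
A minimal sketch of such an edge vision loop is shown below, assuming OpenCV and an ONNX model; the model path is a placeholder, and the CUDA backend lines apply only when OpenCV was built with CUDA support.

    # Sketch of an edge machine-vision loop: grab a frame from an attached camera
    # and run a neural network on-device. "defect_classifier.onnx" is a placeholder;
    # the CUDA backend/target lines assume an OpenCV build with CUDA support.
    import cv2

    net = cv2.dnn.readNetFromONNX("defect_classifier.onnx")
    net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
    net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)

    cap = cv2.VideoCapture(0)                        # first attached camera
    ok, frame = cap.read()
    if ok:
        blob = cv2.dnn.blobFromImage(frame, scalefactor=1 / 255.0, size=(224, 224))
        net.setInput(blob)
        scores = net.forward()                       # all inference stays at the edge
    cap.release()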

Example: 

Warehouse Fulfillment: Evans Distribution Systems, a third-party logistics and supply chain company, is using an AI machine vision system to quickly and accurately fulfill orders of Girl Scout cookies for Girl Scouts of America, one of the most popular fundraisers for the well-known non-profit. According to a recent interview, the system is over 99.8% accurate, improving productivity and the accuracy of customer shipments.

Explore Innovative Embedded Hardware for Edge AI Yourself 

ADLINK will showcase a range of hardware for edge AI at embedded world 2021 DIGITAL. This conference is your chance to connect with ADLINK engineers, partners and edge system experts who are ready to help you choose the optimal embedded hardware for your use case. Register for #ew21 for free here using promo code: ew21456845.

Author: Zane Tsai

Director of Platform Product Center, Embedded Platforms & Modules, ADLINK Technology