Artificial intelligence (AI) and machine learning (ML) require real-time parallel computation on huge amounts of data. These workloads exacerbate the memory bottleneck of classic general-purpose CPUs from both a latency and a power perspective.
To overcome these challenges, many players in the industry are turning to novel technologies for the future of AI/ML computing. Lightelligence recently made waves when it announced an AI/ML accelerator that leverages an optical network-on-chip (NoC).
Lightelligence says its new Hummingbird oNoC processor is the first of its kind designed for domain-specific AI workloads. Image courtesy of Lightelligence
In this piece, we’ll look at the challenges with traditional AI/ML multicore processors, the new processing architecture developed by Lightelligence, and the company’s newest ASIC: Hummingbird.
NoC and multicore challenges
AI/ML computation involves specific math operations, such as multiply-accumulate (MAC) and convolution, applied to large amounts of data in parallel. For this reason, standard AI/ML processing hardware tends to consist of multicore and heterogeneous systems.
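To make the workload concrete, here is a minimal pure-Python sketch of the two operations mentioned above (illustrative only; real accelerators implement thousands of these units in parallel hardware):

```python
def mac(weights, activations):
    # Multiply-accumulate: the core primitive of neural-network inference.
    acc = 0
    for w, a in zip(weights, activations):
        acc += w * a  # one MAC per weight/activation pair
    return acc

def conv1d(signal, kernel):
    # Valid-mode 1-D "convolution" as used in ML (cross-correlation,
    # i.e. no kernel flip), built from MACs over a sliding window.
    k = len(kernel)
    return [mac(kernel, signal[i:i + k]) for i in range(len(signal) - k + 1)]
```

An 8x8 image convolved with a 3x3 kernel already requires hundreds of MACs, which is why parallel hardware matters.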
An example of a heterogeneous computing architecture. Image courtesy of Routledge Handbooks Online
In a multicore system, a single piece of hardware consists of many cores that process data in parallel (like a GPU). In a heterogeneous system, such as an SoC, a single chip contains many different computational blocks, including generic CPU cores, GPUs, and dedicated MAC accelerators. Here, different blocks on the SoC handle different tasks to reduce power consumption and speed up the overall computation of an ML model.
Regardless of the architecture used, the one constant between multicore and heterogeneous systems is the need for data movement. Whether data moves between multiple processing cores or in and out of memory, high-speed computing applications tend to implement on-chip networking to accelerate data transfer between endpoints.
Different NoC architectures and configurations. Image courtesy of ResearchGate
However, due to the physical limitations of electrical signaling, these NoCs have limited bandwidth. As a result, they are also restricted in the topologies they can implement, preventing ASICs from reaching peak performance.
Lightelligence oNoC architecture
For Lightelligence, the key to better-performing AI/ML accelerators is enabling new NoC topologies that maximize speed and minimize power consumption. Since conventional electrical NoCs are not up to the task, the company instead turned to optical NoCs (oNoCs) as a solution.
Lightelligence’s computing architecture consists of three main components: an electronic integrated circuit (EIC), an interposer, and a photonic integrated circuit (PIC).
A cross-sectional view of the Lightelligence stacked architecture. Image courtesy of Lightelligence
The EIC implements the digital domain of the system, including the ALU, memory, and analog interface. The interposer connects the EIC and the PIC and delivers power to both domains. The PIC hosts the oNoC, which uses the optical network to interconnect the processing cores in an all-to-all transmission scheme. This scheme is said to allow all cores to access data simultaneously.
Lightelligence’s oNoC connects EICs with the optical network. Image courtesy of Lightelligence
At a lower level, the interposer contains photonic routing waveguides that act as highways for data communication between the EICs. Each EIC is stacked on top of a micro-bump-connected PIC to form a 2D array. Data is encoded onto light from a laser source by modulating the light’s intensity as it passes through the waveguides. To do this, the analog interface on each EIC couples with the photonic interposer and alters the refractive index of the silicon waveguide, physically modulating the light intensity. To convert the signal back into a bitstream, the EIC houses photodiodes that translate the light pulses into electric current for use in the digital domain.
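Conceptually, this transmit/receive path resembles on-off keying: bits modulate the light intensity, and a photodiode plus a threshold recovers them. A simplified numerical model (illustrative only; the intensity levels and threshold here are assumptions, not Lightelligence’s analog design):

```python
def transmit(bits, high=1.0, low=0.1):
    # Modulator: map each bit to an optical intensity level (on-off keying).
    # "high"/"low" are arbitrary illustrative intensities.
    return [high if b else low for b in bits]

def receive(intensities, threshold=0.5):
    # Photodiode + comparator: photocurrent is proportional to received
    # intensity, so thresholding it recovers the original bitstream.
    return [1 if i > threshold else 0 for i in intensities]
```

A round trip through `receive(transmit(bits))` returns the original bits, mirroring the modulate-then-detect path described above.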
The main advantage of optical interconnects is that they operate at significantly higher speeds and lower power consumption than electrical NoCs. With near-zero latency, the oNoC enables new NoC topologies, such as toroidal ones, that would otherwise not be practical.
Hummingbird oNoC processor
Recently, Lightelligence announced its new Hummingbird processor, the first product to feature its oNoC architecture.
Hummingbird is an AI/ML accelerator made up of 64 cores, all interconnected via the oNoC. With 64 transmitters and 512 receivers, Hummingbird is a single instruction, multiple data (SIMD) machine with its own proprietary ISA.
The Hummingbird processor stack-up. Image courtesy of Lightelligence
While performance numbers aren’t available, the company says the solution offers lower latency and power consumption than competing solutions. Specifically, the oNoC is said to achieve an energy efficiency of less than 1 pJ/bit.
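To put that figure in perspective, power is simply energy-per-bit times bit rate, so even a very high aggregate bandwidth stays within a modest power budget (the 1 Tb/s figure below is a hypothetical example, not a published Hummingbird spec):

```python
def noc_power_watts(pj_per_bit, bits_per_second):
    # Power (W) = energy per bit (J) x bit rate (bit/s); 1 pJ = 1e-12 J.
    return pj_per_bit * 1e-12 * bits_per_second

# At a hypothetical 1 Tb/s of aggregate oNoC traffic, an efficiency of
# 0.8 pJ/bit would dissipate only about 0.8 W in the optical network.
```

By comparison, long electrical on-chip links are commonly cited in the single-digit pJ/bit range, so sub-1 pJ/bit is a meaningful claim if it holds at scale.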
As it stands, Hummingbird will be implemented in a standard server PCIe form factor. The accelerator will be programmable via the Lightelligence SDK, which supports TensorFlow. The first demonstrations of the chip will take place at this year’s Hot Chips conference in late August.
Image source: www.allaboutcircuits.com