Physical symmetry and lattice symmetry in the lattice Boltzmann method

N Cao, S Chen, S Jin, D Martinez - Physical review E, 1997 - APS
The lattice Boltzmann method (LBM) is regarded as a specific finite difference discretization
for the kinetic equation of the discrete velocity distribution function. We argue that for finite …

A scalable multi-TeraOPS deep learning processor core for AI training and inference

…, A Agrawal, T Babinsky, N Cao… - … IEEE symposium on …, 2018 - ieeexplore.ieee.org
A multi-TOPS AI core is presented for acceleration of deep learning training and inference in
systems from edge devices to data centers. With a programmable architecture and custom …

RaPiD: AI accelerator for ultra-low precision training and inference

…, CY Chen, A Allain, J Bonanno, N Cao… - 2021 ACM/IEEE 48th …, 2021 - ieeexplore.ieee.org
The growing prevalence and computational demands of Artificial Intelligence (AI) workloads
have led to widespread use of hardware accelerators in their execution. Scaling the …

Refined similarity hypothesis for transverse structure functions in fluid turbulence

S Chen, KR Sreenivasan, M Nelkin, N Cao - Physical review letters, 1997 - APS
We argue on the basis of empirical data that Kolmogorov's refined similarity hypothesis (RSH)
needs to be modified for transverse velocity increments, and propose an alternative. In this …

9.1 A 7nm 4-core AI chip with 25.6 TFLOPS hybrid FP8 training, 102.4 TOPS INT4 inference and workload-aware throttling

…, M Kang, S Venkataramani, N Cao… - … Solid-State Circuits …, 2021 - ieeexplore.ieee.org
Low-precision computation is the key enabling factor to achieve high compute densities (TOPS/W
and TOPS/mm²) in AI hardware accelerators across cloud and edge platforms. …

Statistics and structures of pressure in isotropic turbulence

N Cao, S Chen, GD Doolen - Physics of Fluids, 1999 - pubs.aip.org
Statistics and structures of pressure in three-dimensional incompressible isotropic turbulence
are studied using high-resolution direct numerical simulation for Taylor microscale …

Efficient AI system design with cross-layer approximate computing

…, J Oh, S Jain, T Babinsky, N Cao… - Proceedings of the …, 2020 - ieeexplore.ieee.org
Advances in deep neural networks (DNNs) and the availability of massive real-world data
have enabled superhuman levels of accuracy on many AI tasks and ushered the explosive …

A 45 nm SOI embedded DRAM macro for the POWER™ processor 32 MByte on-chip L3 cache

…, T Kirihata, WR Reohr, K Nair, N Cao - IEEE Journal of Solid …, 2010 - ieeexplore.ieee.org
A 1.35 ns random access and 1.7 ns-random-cycle SOI embedded-DRAM macro has been
developed for the POWER7™ high-performance microprocessor. The macro employs a 6 …

A 3.0 TFLOPS 0.62 V scalable processor core for high compute utilization AI training and inference

…, S Ben-Yehuda, J Bonanno, N Cao… - … IEEE Symposium on …, 2020 - ieeexplore.ieee.org
A processor core is presented for AI training and inference products. Leading-edge compute
efficiency is achieved for robust fp16 training via efficient heterogeneous 2-D systolic array-…

A scalable multi-TeraOPS core for AI training and inference

…, A Agrawal, T Babinsky, N Cao… - IEEE Solid-State …, 2019 - ieeexplore.ieee.org
This letter presents a multi-TOPS AI accelerator core for deep learning training and inference.
With a programmable architecture and custom ISA, this engine achieves >90% sustained …