LATTE ’22

Workshop on Languages, Tools, and Techniques for Accelerator Design


LATTE '22 is a workshop co-located with ASPLOS '22 and will take place on March 1, 2022. For examples of previously presented papers, take a look at LATTE '21.

LATTE will offer a hybrid attendance model: participants can attend either remotely or in person, and both the table and plenary discussions will run in hybrid form. As in the previous year, we invite participants to use GitHub discussion threads to ask the authors questions. Speakers of accepted papers will receive details about their talks after acceptance.

Registration

To attend LATTE, please register using this Whova link.

Program

Time (CET)       Event
9:00–9:15am      Opening & Introductions
9:15–10:15am     Keynote: Chris Leary (Google)
10:15–11:00am    Session 1
11:00–11:30am    Coffee Break
11:30am–12:00pm  Session 2
12:00–1:30pm     Lunch (on your own)
1:30–2:30pm      Discussion
2:30–3:00pm      Session 3
3:00–4:00pm      Keynote: Andrew Ling (Groq)
4:00–4:30pm      Coffee Break
4:30–5:00pm      Session 4

Keynotes

Andrew Ling (Groq): A Kernel-less Approach to Deep Learning Compilers

Recording

With the recent decline of Moore’s Law, architectural variants have proliferated in order to keep up with today’s compute demands, and compiler flows and techniques have been at the forefront of supporting these new architectures. The vast majority of compilers lean on customized flows and kernels to achieve high performance: vendors invest billions of dollars in robust kernel libraries whose developers hand-schedule and vectorize “inner loops” that exploit the hardware primitives effectively. Nvidia and Intel lead in this space and have invested thousands of engineering-years in kernel libraries that will be difficult to match. By forcing the industry to stick with a kernel-first approach, we inadvertently limit the customization and breadth of functionality that could otherwise be achieved, and we hold back innovation.

In contrast, the Groq architecture was designed with a software-first mindset: it avoids the need for manual kernel development and provides a fully automated compilation flow that maps HPC and neural-network models down to the architectural level. This is a strong divergence from the conventional approach and opens up opportunities for further innovation in machine learning and scientific computing.

We achieve this level of automation by simplifying the primitive set of the Groq architecture and avoiding traditional approaches to vectorization, which often require “rediscovering” parallelization opportunities in the program and aligning them with the hardware structure. Instead, we decompose programs directly onto specific hardware blocks, which yields very high efficiency because of the regularity of the architecture. That efficiency is a direct result of Groq’s design approach of using the compiler to guide the architecture design. This software-first approach to architecture design will be critical if the industry is to keep pace as compute demands continue to grow exponentially for the foreseeable future.
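
Groq’s toolchain itself is proprietary, but the “kernel-less” idea can be illustrated with the open-source JAX/XLA stack: the model is written as plain array math, and the compiler lowers the entire program with no hand-written or vendor-supplied kernels appearing in the source. A minimal sketch, using JAX purely as a stand-in for Groq’s compiler:

    import jax
    import jax.numpy as jnp

    # A small two-layer network written as plain array math: no hand-written
    # or vendor-supplied kernels anywhere in the source.
    def mlp(params, x):
        (w1, b1), (w2, b2) = params
        h = jnp.maximum(x @ w1 + b1, 0.0)  # ReLU layer
        return h @ w2 + b2

    k1, k2 = jax.random.split(jax.random.PRNGKey(0))
    params = [
        (jax.random.normal(k1, (64, 128)), jnp.zeros(128)),
        (jax.random.normal(k2, (128, 10)), jnp.zeros(10)),
    ]
    x = jnp.ones((8, 64))

    # jit traces the function and hands the whole program to the compiler;
    # the lowered IR printed below is generated entirely automatically.
    print(jax.jit(mlp).lower(params, x).as_text()[:400])

Whether such a fully automated flow can match hand-tuned kernel libraries in performance is precisely the trade-off the talk addresses.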

Chris Leary (Google): X-istentialism: Supercomputers, Silicon Atoms, and the Science Between!

Recording

In this talk we discuss the increasing development and adoption of accelerators from the vantage point of a wide-eyed hyperscaler accelerator developer. This talk humbly proposes that an era of "Radical Co-design" or "Co-design 2.0" is upon us, as indicated by current observations and trends. To this end, we recap a decade of accelerator development work across XLA (cluster-capable ML compiler), JAX (staging/metaprogramming), and XLS (high level synthesis toolchain), with acceleration targets now identified at datacenter scale (i.e. Google Wide Profiling / "datacenter tax" workloads), and programs collaborating on open RTL-level tooling and open ASIC process nodes (Category X).

Many confluent factors make this a special time in computing: we've exited the "treadmill" where innovations failed to overtake CPU improvements -- accelerators with definitive specializations are now deployed at scale. Philosophy of science offers guidance on how we might expect "crises" (term of art) around expected improvement rate, scale, and growth to lead to advances in our thinking on computation and how this might dovetail with dominant programming models. As we "rebase" our thinking, we'll observe numerous challenges and open questions on how best to proceed and what avenues of investigation might be most fruitful. However, the undercurrent of progress will be secured through development of open tools and active investigation/discussion in the open and in the scientific community, underscoring the importance of LATTE and its focus.
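
The staging/metaprogramming role the talk assigns to JAX can be made concrete: JAX traces ordinary Python into a small functional IR (a “jaxpr”), and transformations such as differentiation are metaprograms over that IR, which XLA then compiles. A minimal sketch using the standard JAX API (not code from the talk):

    import jax
    import jax.numpy as jnp

    def loss(w, x, y):
        pred = jnp.tanh(x @ w)            # a tiny model
        return jnp.mean((pred - y) ** 2)

    w = jnp.zeros((4, 2))
    x = jnp.ones((8, 4))
    y = jnp.ones((8, 2))

    # Staging: the Python function is traced into a jaxpr IR...
    print(jax.make_jaxpr(loss)(w, x, y))
    # ...and grad is a transformation over that IR, itself stageable.
    print(jax.make_jaxpr(jax.grad(loss))(w, x, y))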

Papers

Session 1

Session 2

Session 3

Session 4

Program Committee

  • Rachit Nigam, Cornell University
  • Adrian Sampson, Cornell University
  • Ang Li, Princeton
  • Fabian Schuiki, SiFive
  • Gus Smith, University of Washington
  • Hanchen Ye, UIUC
  • Jianyi Cheng, Imperial College London
  • Jie Wang, Amazon
  • Jin Yang, Intel
  • John Demme, Microsoft
  • Jonathan Balkind, UC Santa Barbara
  • Jose Renau, UC Santa Cruz
  • Katherine Lim, University of Washington
  • Kevin Laeufer, UC Berkeley
  • Kiran Kintali, MathWorks
  • Leonard Truong, Stanford
  • Licheng Guo, UCLA
  • Ross Daly, Stanford
  • Shail Dave, Arizona State University
  • Stephen Neuendorffer, Xilinx
  • Tobias Grosser, University of Edinburgh
  • Zhiru Zhang, Cornell