Demonstration of transformer-based ALBERT model on a 14nm analog AI inference chip

An Chen, Stefano Ambrogio, Pritish Narayanan, Atsuya Okazaki, Charles Mackin, Andrea Fasoli, Malte J. Rasch, Alexander Friz, Jose Luquin, Takeo Yasuda, Masatoshi Ishii, Takuto Kanamori, Kohji Hosokawa, Timothy Philicelli, Seiji Munetoh, Vijay Narayanan, Hsinyu Tsai and Geoffrey W. Burr
Additional contact information
An Chen: IBM Research – Almaden
Stefano Ambrogio: IBM Research – Almaden
Pritish Narayanan: IBM Research – Almaden
Atsuya Okazaki: IBM Research – Tokyo
Charles Mackin: IBM Research – Almaden
Andrea Fasoli: IBM Research – Almaden
Malte J. Rasch: IBM T. J. Watson Research Center – Yorktown Heights
Alexander Friz: IBM Research – Almaden
Jose Luquin: IBM Research – Almaden
Takeo Yasuda: IBM Research – Tokyo
Masatoshi Ishii: IBM Research – Tokyo
Takuto Kanamori: IBM Research – Tokyo
Kohji Hosokawa: IBM Research – Tokyo
Timothy Philicelli: IBM Albany NanoTech – Albany
Seiji Munetoh: IBM Research – Tokyo
Vijay Narayanan: IBM T. J. Watson Research Center – Yorktown Heights
Hsinyu Tsai: IBM Research – Almaden
Geoffrey W. Burr: IBM Research – Almaden

Nature Communications, 2025, vol. 16, issue 1, 1-11

Abstract: A Lite Bidirectional Encoder Representations from Transformers (ALBERT) model is demonstrated on an analog inference chip fabricated at the 14 nm node with phase-change memory. The 7.1 million unique analog weights, shared across 12 layers, are mapped to a single chip and accurately programmed into the conductances of 28.3 million devices, in the first analog hardware demonstration of a meaningfully large Transformer model. The implemented model achieved near iso-accuracy on the seven-task General Language Understanding Evaluation (GLUE) benchmark, despite weight-programming errors, hardware imperfections, readout noise, and error propagation. The average hardware accuracy was only 1.8% below that of the floating-point reference, with several tasks at full iso-accuracy. Careful fine-tuning of model weights using hardware-aware techniques contributed an average hardware accuracy improvement of 4.4%. Accuracy loss due to conductance drift, measured to be roughly 5% over 30 days, was reduced to less than 1% with a recalibration-based "drift compensation" technique.
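The abstract names a recalibration-based drift compensation but gives no implementation detail. As a rough illustration of the idea, the Python sketch below models phase-change memory conductance drift with the commonly used empirical power law G(t) = G(t0) * (t/t0)^(-nu) and corrects it with a single digital scale factor per layer, estimated from a known calibration probe. The drift-exponent values, the probe scheme, and the least-squares scalar fit are all assumptions made for illustration, not the paper's method.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy "analog layer": a signed weight matrix standing in for programmed
    # PCM conductances (the real chip uses several devices per weight; this
    # simplification keeps the sketch short).
    W = rng.standard_normal((64, 64))

    # Empirical PCM drift model: G(t) = G(t0) * (t / t0)**(-nu), with the
    # drift exponent nu varying slightly from device to device
    # (illustrative values, not measured ones).
    nu = rng.normal(loc=0.05, scale=0.005, size=W.shape)
    t, t0 = 30.0, 1.0                          # e.g. 30 days after programming
    W_drifted = W * (t / t0) ** (-nu)

    # Recalibration: probe the layer with a known input, compare against the
    # response recorded at programming time, and fold the mismatch into one
    # digital scale factor applied to the layer output.
    x_cal = rng.standard_normal(64)
    ref = W @ x_cal                            # stored reference response
    cur = W_drifted @ x_cal                    # response measured after drift
    gamma = float(cur @ ref) / float(cur @ cur)  # least-squares scalar fit

    # Compare output error on a fresh input, before and after compensation.
    x = rng.standard_normal(64)
    target = W @ x
    err_before = np.linalg.norm(W_drifted @ x - target) / np.linalg.norm(target)
    err_after = np.linalg.norm(gamma * (W_drifted @ x) - target) / np.linalg.norm(target)
    print(f"relative error: {err_before:.3%} before, {err_after:.3%} after compensation")

In this toy model most of the drift is a shared multiplicative factor, so one scalar recovers most of the lost fidelity; the residual error comes from per-device variation in the drift exponent. That is loosely consistent with the abstract's report of drift-induced accuracy loss being reduced below 1% rather than eliminated.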

Date: 2025

Downloads: https://www.nature.com/articles/s41467-025-63794-4 (abstract, text/html)



Persistent link: https://EconPapers.repec.org/RePEc:nat:natcom:v:16:y:2025:i:1:d:10.1038_s41467-025-63794-4

Ordering information: This journal article can be ordered from
https://www.nature.com/ncomms/

DOI: 10.1038/s41467-025-63794-4


Nature Communications is currently edited by Nathalie Le Bot, Enda Bergin and Fiona Gillespie

More articles in Nature Communications from Nature
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

Handle: RePEc:nat:natcom:v:16:y:2025:i:1:d:10.1038_s41467-025-63794-4