Scalable watermarking for identifying large language model outputs

Dathathri, Sumanth; See, Abigail; Ghaisas, Sumedh; Huang, Po-Sen; McAdam, Rob; Welbl, Johannes; Bachani, Vandana; Kaskasoli, Alex; Stanforth, Robert; Matejovicova, Tatiana; Hayes, Jamie; Vyas, Nidhi; Merey, Majd Al; Brown-Cohen, Jonah; Bunel, Rudy; Balle, Borja; Cemgil, Taylan; Ahmed, Zahra; Stacpoole, Kitty; Shumailov, Ilia; Baetu, Ciprian; Gowal, Sven; Hassabis, Demis; Kohli, Pushmeet

Additional contact information
Sumanth Dathathri: Google DeepMind
Abigail See: Google DeepMind
Sumedh Ghaisas: Google DeepMind
Po-Sen Huang: Google DeepMind
Rob McAdam: Google
Johannes Welbl: Google DeepMind
Vandana Bachani: Google DeepMind
Alex Kaskasoli: Google DeepMind
Robert Stanforth: Google DeepMind
Tatiana Matejovicova: Google DeepMind
Jamie Hayes: Google DeepMind
Nidhi Vyas: Google
Majd Al Merey: Google
Jonah Brown-Cohen: Google DeepMind
Rudy Bunel: Google DeepMind
Borja Balle: Google DeepMind
Taylan Cemgil: Google DeepMind
Zahra Ahmed: Google DeepMind
Kitty Stacpoole: Google DeepMind
Ilia Shumailov: Google DeepMind
Ciprian Baetu: Google
Sven Gowal: Google DeepMind
Demis Hassabis: Google DeepMind
Pushmeet Kohli: Google DeepMind

Nature, 2024, vol. 634, issue 8035, 818-823

Abstract: Large language models (LLMs) have enabled the generation of high-quality synthetic text, often indistinguishable from human-written content, at a scale that can markedly affect the nature of the information ecosystem [1–3]. Watermarking can help identify synthetic text and limit accidental or deliberate misuse [4], but has not been adopted in production systems owing to stringent quality, detectability and computational efficiency requirements. Here we describe SynthID-Text, a production-ready text watermarking scheme that preserves text quality and enables high detection accuracy, with minimal latency overhead. SynthID-Text does not affect LLM training and modifies only the sampling procedure; watermark detection is computationally efficient, without using the underlying LLM. To enable watermarking at scale, we develop an algorithm integrating watermarking with speculative sampling, an efficiency technique frequently used in production systems [5]. Evaluations across multiple LLMs empirically show that SynthID-Text provides improved detectability over comparable methods, and standard benchmarks and human side-by-side ratings indicate no change in LLM capabilities. To demonstrate the feasibility of watermarking in large-scale-production systems, we conducted a live experiment that assessed feedback from nearly 20 million Gemini [6] responses, again confirming the preservation of text quality. We hope that the availability of SynthID-Text [7] will facilitate further development of watermarking and responsible use of LLM systems.

Date: 2024

Downloads:
https://www.nature.com/articles/s41586-024-08025-4 Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Persistent link: https://EconPapers.repec.org/RePEc:nat:nature:v:634:y:2024:i:8035:d:10.1038_s41586-024-08025-4

Ordering information: This journal article can be ordered from
https://www.nature.com/

DOI: 10.1038/s41586-024-08025-4

Nature is currently edited by Magdalena Skipper

Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.