Jaehoon Lee
Member of Technical Staff @ Anthropic
San Francisco Bay Area
E-Mail: eejaehoon at gmail dot com
Curriculum Vitae: CV
I am a Member of Technical Staff at Anthropic conducting research in the fields of machine learning and artificial intelligence. I also have a background in theoretical high-energy physics.
I spent an incredible 7 years at Google focusing on machine learning research. Most recently, as a Staff Research Scientist at Google DeepMind, I worked on the Path to AGI (PAGI) and Gemini teams, concentrating on the science of pretraining. The majority of my time at Google was spent with the Google Brain Team, exploring the scientific understanding of deep neural networks. My journey at Google began when I joined the second cohort of the Google Brain / AI Residency program in 2017.
Before joining Google, my main research focus was theoretical high-energy physics. I was a postdoctoral researcher in the String Theory Group of the Department of Physics & Astronomy at the University of British Columbia (UBC). Before that, I completed my PhD at the Center for Theoretical Physics (CTP) at MIT, working on theoretical physics. I studied Physics and Mathematical Sciences at Seoul National University.
News
- [NEW!] June 2024: I joined Anthropic!
- [NEW!] June 2024: Our paper Explaining Neural Scaling Laws is (finally) published in PNAS!
- [NEW!] May 2024: Our paper Scaling Exponents Across Parameterizations and Optimizers is accepted at ICML 2024!
- [NEW!] Jan 2024: Our paper Small-scale proxies for large-scale Transformer training instabilities is accepted at ICLR 2024 as an oral (1.2% of submitted papers)! See you in Vienna, Austria!
- Dec 2023: Our new paper Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models is on arXiv!
- Nov 2023: Our new paper Frontier Language Models are not Robust to Adversarial Arithmetic, or "What do I need to say so you agree 2 + 2 = 5?" is on arXiv!
- Sep 2023: Our new paper Small-scale proxies for large-scale Transformer training instabilities is on arXiv!
- Sep 2023: Our new paper Replacing softmax with relu in vision transformers is on arXiv!
Selected Publications
For full publication list see: [Google Scholar] [Semantic Scholar] [arXiv]
- Small-scale proxies for large-scale Transformer training instabilities
  Mitchell Wortsman et al., Jaehoon Lee*, Justin Gilmer*, Simon Kornblith*
  International Conference on Learning Representations (ICLR), 2024.
  [arXiv: 2309.14322]
- Training LLMs over Neurally Compressed Text
  Brian Lester, Jaehoon Lee, Alex Alemi, Jeffrey Pennington, Adam Roberts, Jascha Sohl-Dickstein, Noah Constant
  [arXiv: 2404.03626]
- Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
  Gemini Team, Google
  [arXiv: 2403.05530]
- Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
  BIG-bench collaboration, member of Core Contributors
  [Transactions on Machine Learning Research (TMLR), 2023] [https://github.com/google/BIG-bench] [arXiv: 2206.04615]
- Dataset Distillation with Infinitely Wide Convolutional Networks
  Timothy Nguyen, Roman Novak, Lechao Xiao, Jaehoon Lee
  Neural Information Processing Systems (NeurIPS), 2021.
  [arXiv: 2107.13034] [code / dataset] [Google AI Blog]
- Explaining Neural Scaling Laws
  Yasaman Bahri*, Ethan Dyer*, Jared Kaplan*, Jaehoon Lee*, Utkarsh Sharma*
  [Proceedings of the National Academy of Sciences (PNAS), 2024] [arXiv: 2102.06701]
- Finite Versus Infinite Neural Networks: an Empirical Study
  Jaehoon Lee, Samuel S. Schoenholz, Jeffrey Pennington, Ben Adlam, Lechao Xiao, Roman Novak, Jascha Sohl-Dickstein
  Neural Information Processing Systems (NeurIPS), 2020. [spotlight]
  [arXiv: 2007.15801]
- Neural Tangents: Fast and Easy Infinite Neural Networks in Python
  Roman Novak, Lechao Xiao, Jiri Hron, Jaehoon Lee, Alexander A. Alemi, Jascha Sohl-Dickstein, Samuel S. Schoenholz
  International Conference on Learning Representations (ICLR), 2020. [spotlight]
  [arXiv: 1912.02803] [code]
- Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent
  Jaehoon Lee*, Lechao Xiao*, Samuel S. Schoenholz, Yasaman Bahri, Jascha Sohl-Dickstein, Jeffrey Pennington
  Neural Information Processing Systems (NeurIPS), 2019.
  Special Issue, Journal of Statistical Mechanics: Theory and Experiment, 2020.
  [arXiv: 1902.06720] [code1] [code2] [Wikipedia (Neural tangent kernel)]
- Measuring the Effects of Data Parallelism on Neural Network Training
  Christopher J. Shallue*, Jaehoon Lee*, Joseph Antognini, Jascha Sohl-Dickstein, Roy Frostig, George E. Dahl
  Journal of Machine Learning Research, 2019.
  [arXiv: 1811.03600]
- Deep Neural Networks as Gaussian Processes
  Jaehoon Lee*, Yasaman Bahri*, Roman Novak, Samuel S. Schoenholz, Jeffrey Pennington, Jascha Sohl-Dickstein
  International Conference on Learning Representations (ICLR), 2018.
  [arXiv: 1711.00165] [code] [Wikipedia (Neural network Gaussian process)]
Research
- Recent research interests include:
- Theoretical aspects of deep neural networks
- Scientific and principled study of deep neural networks and their learning algorithms
- Principled study of large-scale neural networks (e.g., neural scaling laws, infinite-width limit)
- Theoretical physics, with a focus on high-energy physics
- Interplay between physics and machine learning
- Services:
- Action Editor for TMLR
- Area Chair for NeurIPS, ICLR, ICML
- Reviewer for ICLR / ICML / NeurIPS / JMLR / Neural Computation / Pattern Recognition Letters / Nature Communications / TPAMI / AISTATS
- Organizer for Aspen Winter Conference on Physics for Machine Learning
- Organizer for ICML Workshop on Theoretical Physics for Deep Learning
- Organizer for the Vancouver deep learning study group