In a recent Q&A about his unique path to Artificial General Intelligence, John Carmack mentioned a list of roughly 40 research papers that Ilya Sutskever suggested he read to better grasp the current state of deep learning.

“So I asked Ilya Sutskever, OpenAI’s chief scientist, for a reading list. He gave me a list of like 40 research papers and said, ‘If you really learn all of these, you’ll know 90% of what matters today.’ And I did. I plowed through all those things and it all started sorting out in my head.”

The topic trended for a while on Hacker News and Twitter, and since I was already digging through prior work in deep learning to catch up with the field myself, it felt like a great idea to pursue.

Therefore, approaching the problem with what Richard Feynman calls the “Martian”1 view, I will try to compile a list of papers that, to someone who knew nothing about neural networks and deep learning today but wanted to build AGI, would stand out as the seminal works2 that have influenced the direction of research toward superintelligent systems.

The List

  1. Computing Machinery and Intelligence, by Alan Turing (1950)
  2. The Perceptron: A Perceiving and Recognizing Automaton, by Frank Rosenblatt (1957)
  3. LSTM: Long Short-Term Memory (1997)
  4. LeNet: Gradient-Based Learning Applied to Document Recognition (1998)
  5. AlexNet: ImageNet Classification with Deep Convolutional Neural Networks (2012)
  6. Playing Atari with Deep Reinforcement Learning (2013)
  7. Dropout: A Simple Way to Prevent Neural Networks from Overfitting (2014)
  8. VGG: Very Deep Convolutional Networks for Large-Scale Image Recognition (2014)
  9. Sequence to Sequence Learning with Neural Networks (2014)
  10. Generative Adversarial Networks (2014)
  11. Adam: A Method for Stochastic Optimization (2014)
  12. ResNet: Deep Residual Learning for Image Recognition (2015)
  13. U-Net: Convolutional Networks for Biomedical Image Segmentation (2015)
  14. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift (2015)
  15. GoogLeNet: Going Deeper with Convolutions (2015)
  16. Distilling the Knowledge in a Neural Network (2015)
  17. YOLO: You Only Look Once: Unified, Real-Time Object Detection (2015)
  18. SSD: Single Shot MultiBox Detector (2015)
  19. WaveNet: A Generative Model for Raw Audio (2016)
  20. Mastering the game of Go with deep neural networks and tree search (2016)
  21. Mask R-CNN (2017)
  22. PPO: Proximal Policy Optimization Algorithms (2017)
  23. Learning to Generate Reviews and Discovering Sentiment (2017)
  24. Attention Is All You Need (2017)
  25. Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm (2017)
  26. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (2018)
  27. GPT-2: Language Models are Unsupervised Multitask Learners (2019)
  28. Dota 2 with Large-Scale Deep Reinforcement Learning (2019)
  29. ViT: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (2020)
  30. GPT-3: Language Models are Few-Shot Learners (2020)
  31. DALL·E: Zero-Shot Text-to-Image Generation (2021)
  32. CLIP: Learning Transferable Visual Models From Natural Language Supervision (2021)

Other papers come to mind, but the line is blurry between what counts as seminal and what is only recognized as such in hindsight.

Notes

  1. “Let’s look at it like a Martian would look at it.” (Richard Feynman)

  2. I want to acknowledge the work of everyone else; it’s foolish to say that only certain papers are important, as most research is built, as they say, “on the shoulders of giants.” Even in Carmack’s case, I think that was still the goal of the list: how to catch up with the field, and which seminal ideas point toward AGI.