Top papers in Deep Learning in 5 min
And my thorough reviews of these papers
1. ImageNet Classification with Deep Convolutional Neural Networks
The exact paper can be found on arXiv at this link. Among the three authors of this celebrated paper are Ilya Sutskever and Geoffrey Hinton, from the University of Toronto, where they met and worked together for a long time.
The network they introduced turned out to be an outstanding leap, outperforming all previous state-of-the-art approaches in the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) 2012.
The ILSVRC dataset has 1,000 class labels (excluding the multi-label scenario) and images of varying sizes; the pictures are first pre-processed to 256 x 256 x 3, then random 224 x 224 x 3 patches are extracted and fed to the convolutional network (a small sketch of this step is shown below).
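To make that cropping step concrete, here is a minimal NumPy sketch of random 224 x 224 patch extraction; the array, function name and shapes are my own illustration, not code from the paper (the real AlexNet pipeline also adds horizontal flips and mean subtraction).

```python
import numpy as np

def random_patch(image, patch_size=224):
    """Return a random patch_size x patch_size crop of an H x W x 3 image."""
    h, w, _ = image.shape
    top = np.random.randint(0, h - patch_size + 1)
    left = np.random.randint(0, w - patch_size + 1)
    return image[top:top + patch_size, left:left + patch_size, :]

# Pretend this array is one pre-processed 256 x 256 x 3 ImageNet picture.
img = np.random.randint(0, 256, size=(256, 256, 3), dtype=np.uint8)
patch = random_patch(img)
print(patch.shape)  # (224, 224, 3)
```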
2. A logical calculus of the ideas immanent in nervous activity
This masterpiece is a milestone for artificial intelligence and computer science in general, especially given that it came out in 1943.
It lays down the basic mathematical and biological principles of what we nowadays call deep learning, translating neural connections into mathematical functions at a time when modern digital computers barely existed! It is 19 pages long and focuses on root subjects that lie at the foundation of deep learning, such as (a tiny sketch of such a threshold neuron follows the list):
- Mathematics
- Biology
- Data Structures
- Logical Circuits
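To make the "neurons as logic" idea concrete, here is a minimal Python sketch (my own illustration, not notation from the 1943 paper) of a McCulloch-Pitts-style threshold unit computing AND and OR:

```python
def mp_neuron(inputs, weights, threshold):
    """Fire (output 1) when the weighted sum of binary inputs reaches the threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0

# Logical AND: both inputs must be active to reach a threshold of 2.
print([mp_neuron([a, b], [1, 1], 2) for a in (0, 1) for b in (0, 1)])  # [0, 0, 0, 1]

# Logical OR: a single active input already reaches a threshold of 1.
print([mp_neuron([a, b], [1, 1], 1) for a in (0, 1) for b in (0, 1)])  # [0, 1, 1, 1]
```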
For its time it is a masterpiece, which I wholeheartedly recommend. You can read it freely on the web here.
3. Attention is all you need
This game changer has become more popular than any other architecture because of its capability to outperform RNNs, LSTMs and all other deep architectures in a relatively short time. It was released by 8 researchers, most of them from Google, with Ashish Vaswani as the first author.
Transformers use encoders in which each token is represented together with its position in the context (positional encoding), and multi-head attention keeps the meaning of a large window from vanishing. Thus, you can train them on huge documents and extract meaning, summaries, ideas or features from a very large context (a small sketch of both ingredients is shown below).
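Here is a minimal NumPy sketch of the two ingredients mentioned above, sinusoidal positional encoding and (single-head) scaled dot-product self-attention; the shapes and variable names are my own toy illustration, not the authors' code.

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encodings, as in 'Attention Is All You Need'."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model)[None, :]
    angles = pos / np.power(10000, (2 * (i // 2)) / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])  # even dimensions get sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])  # odd dimensions get cosine
    return pe

def scaled_dot_product_attention(q, k, v):
    """softmax(Q K^T / sqrt(d_k)) V for one attention head."""
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

seq_len, d_model = 6, 16
x = np.random.randn(seq_len, d_model) + positional_encoding(seq_len, d_model)
out = scaled_dot_product_attention(x, x, x)  # toy self-attention over 6 tokens
print(out.shape)  # (6, 16)
```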
Also, you can find the article freely on arxiv here.
4. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Unlike traditional models that read text in a left-to-right or right-to-left manner, BERT reads text bidirectionally, capturing context from both sides of each word simultaneously. This approach enables BERT to understand nuanced language, making it highly effective for tasks like question answering, sentiment analysis, and language translation.
This was one of the first considerable boosts in NLP for difficult tasks such as reasoning, text classification and knowledge-based question answering.
One of their root ideas was adding segment and position embeddings on top of the token embeddings to construct a richer, more consistent input representation (a small sketch is shown below).
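A minimal NumPy sketch of that summation, assuming BERT-base-like sizes; the lookup tables here are random (in the real model they are learned) and the token ids are purely hypothetical.

```python
import numpy as np

vocab_size, max_len, d_model = 30522, 512, 768  # BERT-base-like sizes

# Randomly initialised lookup tables; in the real model these are learned.
token_emb = np.random.randn(vocab_size, d_model) * 0.02
segment_emb = np.random.randn(2, d_model) * 0.02       # sentence A / sentence B
position_emb = np.random.randn(max_len, d_model) * 0.02

def bert_input_embedding(token_ids, segment_ids):
    """Sum token, segment and position embeddings, as described above."""
    positions = np.arange(len(token_ids))
    return token_emb[token_ids] + segment_emb[segment_ids] + position_emb[positions]

# Hypothetical token ids for "[CLS] hello world [SEP]" in segment A.
x = bert_input_embedding(token_ids=[101, 7592, 2088, 102], segment_ids=[0, 0, 0, 0])
print(x.shape)  # (4, 768)
```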
The NLP experiments done on this architecture are:
- GLUE — General Language Understanding Evaluation, an open-source benchmark available on GitHub.
- SQuAD — Stanford Question Answering Dataset, another open-source dataset, also available on GitHub. The official BERT paper reports results on SQuAD v1.1 and SQuAD v2.0.
- SWAG — Situations With Adversarial Generations — a dataset of 113K sentence-pair completion examples. You can find it over here.
5. Generative Adversarial Networks
This paper has 8 authors; among the most prominent, who truly revolutionized the deep learning field, are Ian Goodfellow, Aaron Courville and Yoshua Bengio from the Department of Computer Science and Operations Research at the Université de Montréal.
The datasets used in this paper are:
- MNIST — the ‘hello-world’ dataset of deep learning, consisting of hand-written digits from 0 to 9. It is publicly available here, and Python modules such as tensorflow or sklearn even ship loaders for it (a small sketch follows the list).
- TFD — Toronto Face Database. It has a lot of faces :)
- CIFAR-10 — an object recognition dataset with the following labels: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck. It contains 60,000 pictures of 32(h) x 32(w) pixels with 3 colour channels.
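As mentioned for MNIST above, here is one minimal way to fetch it in Python, assuming tensorflow is installed; sklearn's fetch_openml("mnist_784") is an alternative route.

```python
# Load MNIST through the Keras datasets API (requires tensorflow).
from tensorflow.keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
print(x_train.shape, y_train.shape)  # (60000, 28, 28) (60000,)
```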
The authors compare the performance of GANs with deep directed graphical models, deep undirected graphical models and generative autoencoders.
Its convergence guarantees were demonstrated only theoretically, based on the Kullback-Leibler (KL) divergence and the Jensen-Shannon divergence (the key quantities are sketched below).
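For reference, the theoretical argument revolves around the minimax value function from the paper, written here in my own notation:

```latex
% GAN minimax game (Goodfellow et al., 2014)
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}}[\log D(x)]
  + \mathbb{E}_{z \sim p_z}[\log (1 - D(G(z)))]

% For a fixed generator G, the optimal discriminator is
D^*_G(x) = \frac{p_{\mathrm{data}}(x)}{p_{\mathrm{data}}(x) + p_g(x)}

% Plugging it back in gives, up to a constant, the Jensen-Shannon divergence,
% which is minimized exactly when p_g = p_data:
C(G) = -\log 4 + 2 \cdot \mathrm{JSD}(p_{\mathrm{data}} \,\|\, p_g)
```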
Also, you can find the paper here.
Other notable papers that lie at the foundations of Deep Learning, which I omitted here but truly recommend, are:
- Convolutional Neural Networks by Yann LeCun et al.
- On the difficulty of training Recurrent Neural Networks by Razvan Pascanu et al.
- You Only Look Once: Unified, Real-Time Object Detection by Joseph Redmon et al.
If you found it interesting enough, or if you just wanted to read more, do reply and comment. Also leave a clap if you liked what I do, thanks!