2019-11 | Data Science at Home

Episodes

Wednesday Nov 27, 2019

More powerful deep learning with transformers (Ep. 84) (Rebroadcast)

Wednesday Nov 27, 2019

Some of the most powerful NLP models like BERT and GPT-2 have one thing in common: they all use the transformer architecture. Such architecture is built on top of another important concept already known to the community: self-attention.In this episode I explain what these mechanisms are, how they work and why they are so powerful.
Don't forget to subscribe to our Newsletter or join the discussion on our Discord server

References
Attention is all you need https://arxiv.org/abs/1706.03762
The illustrated transformer https://jalammar.github.io/illustrated-transformer
Self-attention for generative models http://web.stanford.edu/class/cs224n/slides/cs224n-2019-lecture14-transformers.pdf

Monday Nov 18, 2019

How to improve the stability of training a GAN (Ep. 88)

Monday Nov 18, 2019

Generative Adversarial Networks or GANs are very powerful tools to generate data. However, training a GAN is not easy. More specifically, GANs suffer of three major issues such as instability of the training procedure, mode collapse and vanishing gradients.

In this episode I not only explain the most challenging issues one would encounter while designing and training Generative Adversarial Networks. But also some methods and architectures to mitigate them. In addition I elucidate the three specific strategies that researchers are considering to improve the accuracy and the reliability of GANs.

The most tedious issues of GANs

Convergence to equilibrium

A typical GAN is formed by at least two networks: a generator G and a discriminator D. The generator's task is to generate samples from random noise. In turn, the discriminator has to learn to distinguish fake samples from real ones. While it is theoretically possible that generators and discriminators converge to a Nash Equilibrium (at which both networks are in their optimal state), reaching such equilibrium is not easy.

Vanishing gradients

Moreover, a very accurate discriminator would push the loss function towards lower and lower values. This in turn, might cause the gradient to vanish and the entire network to stop learning completely.

Mode collapse

Another phenomenon that is easy to observe when dealing with GANs is mode collapse. That is the incapability of the model to generate diverse samples. This in turn, leads to generated data that are more and more similar to the previous ones. Hence, the entire generated dataset would be just concentrated around a particular statistical value.

The solution

Researchers have taken into consideration several approaches to overcome such issues. They have been playing with architectural changes, different loss functions and game theory.

Listen to the full episode to know more about the most effective strategies to build GANs that are reliable and robust. Don't forget to join the conversation on our new Discord channel. See you there!

Tuesday Nov 12, 2019

What if I train a neural network with random data? (with Stanisław Jastrzębski) (Ep. 87)

Tuesday Nov 12, 2019

What happens to a neural network trained with random data?
Are massive neural networks just lookup tables or do they truly learn something?
Today’s episode will be about memorisation and generalisation in deep learning, with Stanislaw Jastrzębski from New York University.
Stan spent two summers as a visiting student with Prof. Yoshua Bengio and has been working on
Understanding and improving how deep network generalise
Representation Learning
Natural Language Processing
Computer Aided Drug Design

What makes deep learning unique?
I have asked him a few questions for which I was looking for an answer for a long time. For instance, what is deep learning bringing to the table that other methods don’t or are not capable of? Stan believe that the one thing that makes deep learning special is representation learning. All the other competing methods, be it kernel machines, or random forests, do not have this capability. Moreover, optimisation (SGD) lies at the heart of representation learning in the sense that it allows finding good representations.

What really improves the training quality of a neural network?
We discussed about the accuracy of neural networks depending pretty much on how good the Stochastic Gradient Descent method is at finding minima of the loss function. What would influence such minima?Stan's answer has revealed that training set accuracy or loss value is not that interesting actually. It is relatively easy to overfit data (i.e. achieve the lowest loss possible), provided a large enough network, and a large enough computational budget. However, shape of the minima, or performance on validation sets are in a quite fascinating way influenced by optimisation. Optimisation in the beginning of the trajectory, steers such trajectory towards minima of certain properties that go much further than just training accuracy.
As always we spoke about the future of AI and the role deep learning will play.
I hope you enjoy the show!
Don't forget to join the conversation on our new Discord channel. See you there!

References

Homepage of Stanisław Jastrzębski https://kudkudak.github.io/
A Closer Look at Memorization in Deep Networks https://arxiv.org/abs/1706.05394
Three Factors Influencing Minima in SGD https://arxiv.org/abs/1711.04623
Don't Decay the Learning Rate, Increase the Batch Size https://arxiv.org/abs/1711.00489
Stiffness: A New Perspective on Generalization in Neural Networks https://arxiv.org/abs/1901.09491

Tuesday Nov 05, 2019

Deeplearning is easier when it is illustrated (with Jon Krohn) (Ep. 86)

Tuesday Nov 05, 2019

In this episode I speak with Jon Krohn, author of Deeplearning Illustrated a book that makes deep learning easier to grasp.
We also talk about some important guidelines to take into account whenever you implement a deep learning model, how to deal with bias in machine learning used to match jobs to candidates and the future of AI.

You can purchase the book from informit.com/dsathome with code DSATHOME and get 40% off books/eBooks and 60% off video training

Monday Nov 04, 2019

[RB] How to generate very large images with GANs (Ep. 85)

Monday Nov 04, 2019

Join the discussion on our Discord server
In this episode I explain how a research group from the University of Lubeck dominated the curse of dimensionality for the generation of large medical images with GANs. The problem is not as trivial as it seems. Many researchers have failed in generating large images with GANs before. One interesting application of such approach is in medicine for the generation of CT and X-ray images.Enjoy the show!

References
Multi-scale GANs for Memory-efficient Generation of High Resolution Medical Images https://arxiv.org/abs/1907.01376