Archive for the 'Deep Learning' Category

Some of the most powerful NLP models, like BERT and GPT-2, have one thing in common: they all use the transformer architecture.
This architecture is built on top of another important concept already known to the community: self-attention.
In this episode I explain what these mechanisms are, how they work, and why they are so powerful.
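As a rough illustration of the core idea, here is a minimal sketch of single-head scaled dot-product self-attention in Python with NumPy. The random projection matrices stand in for weights that a real model would learn, and the sizes are arbitrary assumptions. Actual transformers add multiple heads, masking, residual connections and normalisation on top of this.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row-wise max for numerical stability before exponentiating.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention.

    X: (seq_len, d_model) token representations.
    W_q, W_k, W_v: (d_model, d_k) projections (random here, learned in practice).
    Returns the attended outputs (seq_len, d_k) and the weights (seq_len, seq_len).
    """
    Q = X @ W_q  # queries
    K = X @ W_k  # keys
    V = X @ W_v  # values
    d_k = K.shape[-1]
    # Every position scores every other position; dividing by sqrt(d_k) keeps logits moderate.
    scores = Q @ K.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8  # hypothetical toy sizes
X = rng.normal(size=(seq_len, d_model))
W_q, W_k, W_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out, attn = self_attention(X, W_q, W_k, W_v)
print(out.shape, attn.shape)  # (4, 8) (4, 4)
```

Each row of `attn` describes how much one token attends to every token in the same sequence. This is what lets the model relate distant words directly.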

Don't forget to subscribe to our newsletter or join the discussion on our Discord server.

 


Read Full Post »

Generative Adversarial Networks, or GANs, are very powerful tools for generating data. However, training a GAN is not easy. More specifically, GANs suffer from three major issues: instability of the training procedure, mode collapse, and vanishing gradients.

 

In this episode I explain not only the most challenging issues one encounters while designing and training Generative Adversarial Networks, but also some methods and architectures to mitigate them. In addition, I elucidate the three specific strategies that researchers are considering to improve the accuracy and the reliability of GANs.

 

The most tedious issues of GANs

 

Convergence to equilibrium

 

A typical GAN consists of at least two networks: a generator G and a discriminator D. The generator's task is to generate samples from random noise, while the discriminator has to learn to distinguish fake samples from real ones. While it is theoretically possible for the generator and the discriminator to converge to a Nash equilibrium (at which neither network can improve by changing only its own parameters), reaching such an equilibrium is not easy.
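A classic toy example shows why the equilibrium is hard to reach. It is not taken from the episode. Consider the two-player game f(x, y) = x·y, where one player minimises f and the other maximises it. Its Nash equilibrium is (0, 0). However, plain simultaneous gradient updates, the naive way of training G and D together, circle around that point and drift away from it. The learning rate and starting point below are arbitrary choices.

```python
import math

def simultaneous_gda(x, y, lr=0.1, steps=100):
    """Simultaneous gradient descent (player x) and ascent (player y) on f(x, y) = x * y.

    The unique Nash equilibrium is (0, 0), but each update rotates the point around it
    and scales its distance by sqrt(1 + lr**2), so the iterates spiral outward.
    """
    history = [(x, y)]
    for _ in range(steps):
        grad_x = y  # df/dx
        grad_y = x  # df/dy
        # Both players update at the same time from the same (old) point.
        x, y = x - lr * grad_x, y + lr * grad_y
        history.append((x, y))
    return history

traj = simultaneous_gda(1.0, 1.0)
start = math.hypot(*traj[0])
end = math.hypot(*traj[-1])
print(f"distance from equilibrium: start={start:.3f}, end={end:.3f}")
```

Even in this two-parameter game, each player's "improvement" undoes the other's. Many of the stabilisation tricks for GANs aim to damp this kind of rotation.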

 

Vanishing gradients

 

Moreover, a discriminator that becomes very accurate drives its loss towards lower and lower values. In turn, the gradients that reach the generator might vanish, and the generator may stop learning altogether.
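The effect can be seen directly in the generator's loss. The sketch below assumes the discriminator outputs D = sigmoid(a) for a fake sample, where `a` is its logit. It compares the gradient of the original minimax generator loss log(1 - D) with that of the "non-saturating" variant -log(D). The non-saturating variant is a common remedy already suggested in the original GAN paper. Only the gradient with respect to the logit is computed. In a real network it would be backpropagated further into the generator's weights.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def minimax_generator_grad(a):
    """d/da of log(1 - D), with D = sigmoid(a) the discriminator's output on a fake sample.

    The generator minimises this term; the derivative simplifies to -sigmoid(a).
    """
    return -sigmoid(a)

def non_saturating_generator_grad(a):
    """d/da of -log(D), the non-saturating alternative; simplifies to -(1 - sigmoid(a))."""
    return -(1.0 - sigmoid(a))

# a is the discriminator's logit on a fake sample: very negative means D is sure it is fake.
for a in (0.0, -5.0, -10.0):
    print(f"logit={a:6.1f}  D={sigmoid(a):.5f}  "
          f"minimax grad={minimax_generator_grad(a):.5f}  "
          f"non-saturating grad={non_saturating_generator_grad(a):.5f}")
```

Consider the case where the discriminator confidently rejects fakes (logit -10). The minimax gradient is then essentially zero, while the non-saturating gradient stays close to -1.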

 

Mode collapse

 

Another phenomenon that is easy to observe when dealing with GANs is mode collapse, that is, the inability of the generator to produce diverse samples. The generated data become more and more similar to one another, so the whole generated dataset ends up concentrated around a few particular values instead of covering the variety of the real data.
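One simple way to spot mode collapse on a toy problem is to count how many modes of the real distribution the generated samples actually cover. The sketch below uses a ring of eight 2-D Gaussians, a common toy benchmark for GANs. The two "generators" are simulated by hand rather than trained, and the thresholds `max_dist` and `min_count` are arbitrary assumptions.

```python
import math
import random

def ring_modes(k=8, radius=2.0):
    """Centres of k Gaussian modes placed evenly on a circle."""
    return [(radius * math.cos(2 * math.pi * i / k), radius * math.sin(2 * math.pi * i / k))
            for i in range(k)]

def modes_covered(samples, modes, max_dist=0.5, min_count=5):
    """Count modes that receive at least min_count samples within max_dist of their centre."""
    counts = [0] * len(modes)
    for x, y in samples:
        # Assign each sample to its nearest mode, but only if it is close enough.
        dists = [math.hypot(x - mx, y - my) for mx, my in modes]
        j = min(range(len(modes)), key=dists.__getitem__)
        if dists[j] <= max_dist:
            counts[j] += 1
    return sum(c >= min_count for c in counts)

rng = random.Random(0)
modes = ring_modes()
# Simulated healthy generator: samples spread over all modes.
diverse = [(mx + rng.gauss(0, 0.1), my + rng.gauss(0, 0.1))
           for mx, my in (rng.choice(modes) for _ in range(1000))]
# Simulated collapsed generator: every sample lands near the same mode.
collapsed = [(modes[0][0] + rng.gauss(0, 0.1), modes[0][1] + rng.gauss(0, 0.1))
             for _ in range(1000)]
print(modes_covered(diverse, modes), modes_covered(collapsed, modes))
```

Both sample sets may look equally realistic one sample at a time. Only the diverse one covers the whole distribution.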

 

The solution

 

Researchers have considered several approaches to overcome these issues, experimenting with architectural changes, different loss functions, and game theory.

 

Listen to the full episode to learn more about the most effective strategies for building GANs that are reliable and robust.
Don't forget to join the conversation on our new Discord channel. See you there!

 

Read Full Post »

What happens to a neural network trained with random data?

Are massive neural networks just lookup tables or do they truly learn something? 

Today’s episode will be about memorisation and generalisation in deep learning, with Stanisław Jastrzębski from New York University.

Stan spent two summers as a visiting student with Prof. Yoshua Bengio and has been working on:

  • Understanding and improving how deep networks generalise
  • Representation Learning
  • Natural Language Processing
  • Computer Aided Drug Design

 

What makes deep learning unique?

I asked him a few questions I had been looking for answers to for a long time. For instance, what does deep learning bring to the table that other methods don’t, or can’t?
Stan believes that the one thing that makes deep learning special is representation learning. Competing methods, be it kernel machines or random forests, do not have this capability. Moreover, optimisation (SGD) lies at the heart of representation learning, in the sense that it is what allows the network to find good representations.

 

What really improves the training quality of a neural network?

We discussed how the accuracy of a neural network depends largely on how good Stochastic Gradient Descent (SGD) is at finding minima of the loss function. What influences such minima?
Stan's answer revealed that training set accuracy or loss value is actually not that interesting. It is relatively easy to overfit the data (i.e. to achieve the lowest possible loss), given a large enough network and a large enough computational budget. However, the shape of the minima and the performance on validation sets are influenced by optimisation in quite a fascinating way.
Optimisation at the beginning of the trajectory steers it towards minima with certain properties that go well beyond training accuracy.
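A linear model is not a deep network, but it shows the "overfit anything" point in miniature. With more parameters than data points, a model can fit completely random labels perfectly while learning nothing that transfers to new data. This is an illustrative sketch of the episode's opening question (training on random data), not code from the papers referenced below. All sizes are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, n_features = 100, 1000, 500  # more parameters than training points

# Inputs and labels are both pure noise: there is nothing to generalise from.
X_train = rng.normal(size=(n_train, n_features))
y_train = rng.choice([-1.0, 1.0], size=n_train)
X_test = rng.normal(size=(n_test, n_features))
y_test = rng.choice([-1.0, 1.0], size=n_test)

# Minimum-norm least-squares fit: with n_features > n_train it interpolates the labels exactly.
w, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

train_acc = np.mean(np.sign(X_train @ w) == y_train)
test_acc = np.mean(np.sign(X_test @ w) == y_test)
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

Training accuracy is perfect, while test accuracy stays around chance. This is why a low training loss on its own says little about what a model has learned.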

As always we spoke about the future of AI and the role deep learning will play.

I hope you enjoy the show!

Don't forget to join the conversation on our new Discord channel. See you there!

 

References

 

Homepage of Stanisław Jastrzębski https://kudkudak.github.io/

A Closer Look at Memorization in Deep Networks https://arxiv.org/abs/1706.05394

Three Factors Influencing Minima in SGD https://arxiv.org/abs/1711.04623

Don't Decay the Learning Rate, Increase the Batch Size https://arxiv.org/abs/1711.00489

Stiffness: A New Perspective on Generalization in Neural Networks https://arxiv.org/abs/1901.09491

Read Full Post »

In this episode I speak with Jon Krohn, author of Deep Learning Illustrated, a book that makes deep learning easier to grasp.
We also talk about some important guidelines to take into account whenever you implement a deep learning model, how to deal with bias in machine learning models used to match jobs to candidates, and the future of AI.

You can purchase the book from informit.com/dsathome with code DSATHOME and get 40% off books/eBooks and 60% off video training.

Read Full Post »

Join the discussion on our Discord server

In this episode I explain how a research group from the University of Lübeck overcame the curse of dimensionality when generating large medical images with GANs.
The problem is not as trivial as it seems: many researchers have failed at generating large images with GANs before. One interesting application of this approach is in medicine, for the generation of CT and X-ray images.
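A quick back-of-the-envelope calculation shows why size is the problem. For 3-D volumes such as CT scans, the number of voxels grows with the cube of the side length. So does the memory needed for every activation map a network keeps. Working on fixed-size patches keeps that per-step cost constant regardless of the final image size. The sizes, the 64³ patch and the float32 assumption below are illustrative, not taken from the paper.

```python
def voxel_count(side, dims=3):
    """Number of values in a cubic volume (dims=3) or square image (dims=2) of the given side."""
    return side ** dims

def memory_mib(n_values, bytes_per_value=4):
    """Memory in MiB to hold n_values float32 numbers, e.g. one activation channel."""
    return n_values * bytes_per_value / 2**20

patch_side = 64  # hypothetical fixed patch size
patch_mib = memory_mib(voxel_count(patch_side))
for side in (64, 128, 256, 512):
    full_mib = memory_mib(voxel_count(side))
    print(f"{side}^3 volume: {full_mib:7.1f} MiB per channel vs "
          f"{patch_side}^3 patch: {patch_mib:.1f} MiB")
```

These figures are for a single channel of a single layer. A real network holds many such maps, plus their gradients, at the same time.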
Enjoy the show!

 

References

Multi-scale GANs for Memory-efficient Generation of High Resolution Medical Images https://arxiv.org/abs/1907.01376

Read Full Post »


Join the discussion on our Discord server

 

In this episode, I am with Aaron Gokaslan, computer vision researcher and AI Resident at Facebook AI Research. Aaron is the author of OpenGPT-2, a replication of the much-discussed GPT-2 model that OpenAI decided not to release because it was deemed too accurate to be published.

We discuss image-to-image translation, the dangers of the GPT-2 model, and the future of AI.
Moreover, Aaron provides some very interesting links and demos that will blow your mind!

Enjoy the show! 

References

Multimodal image-to-image translation (not all mentioned in the podcast, but recommended by Aaron)

Pix2Pix: 
 
CycleGAN:
 

GANimorph

 

Read Full Post »

Join the discussion on our Discord server

 

Reinforcement learning agents have done great at playing Atari video games and Go (AlphaGo), at financial trading, and at language modeling. Now let me tell you the real story.
In this episode I want to shed some light on reinforcement learning (RL) and the limitations that every practitioner should consider before taking certain directions. RL seems to work so well! What is wrong with it?

 

Are you a listener of the Data Science at Home podcast?
A reader of the Amethix Blog?
Or did you subscribe to the Artificial Intelligence at your fingertips newsletter?
In any case, let’s stay in touch!
https://amethix.com/survey/

 

 


Read Full Post »


Join the discussion on our Discord server

Training neural networks faster usually involves the use of powerful GPUs. In this episode I explain an interesting method from a group of researchers at Google Brain. They train neural networks faster by squeezing more out of the available hardware. When the upstream input pipeline is slower than the accelerator, they reuse its outputs so that the training pipeline stays dense and the accelerator is not left idle.
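The core idea of data echoing, from the Google Brain paper listed in the references, can be sketched in a few lines of Python. Sometimes the upstream stages of the input pipeline (reading, decoding, augmenting) are the bottleneck. In that case, each batch they produce is reused several times instead of leaving the accelerator idle. The function names and `echo_factor` below are hypothetical, and this toy version simply repeats whole batches in order.

```python
from itertools import islice

def upstream_batches():
    """Stand-in for an expensive input pipeline (reading, decoding, augmenting data)."""
    batch_id = 0
    while True:
        yield f"batch-{batch_id}"
        batch_id += 1

def echo(batches, echo_factor):
    """Data echoing: yield each upstream batch echo_factor times before fetching the next one.

    If the accelerator would otherwise wait for the upstream pipeline,
    repeating batches lets it keep taking optimisation steps in the meantime.
    """
    for batch in batches:
        for _ in range(echo_factor):
            yield batch

# Each expensive upstream batch now feeds two training steps.
steps = list(islice(echo(upstream_batches(), echo_factor=2), 6))
print(steps)  # ['batch-0', 'batch-0', 'batch-1', 'batch-1', 'batch-2', 'batch-2']
```

The paper also considers echoing at other points of the pipeline, for example before data augmentation, or at the level of individual examples rather than whole batches. This sketch omits those variants.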

Enjoy the show!

 

References

Faster Neural Network Training with Data Echoing
https://arxiv.org/abs/1907.05550

Read Full Post »

In this episode I am with Jadiel de Armas, senior software engineer at Disney and author of Videoflow, a Python framework that facilitates the quick development of complex video analysis applications and other series-processing applications in a multiprocessing environment.

I have looked through the Videoflow repository on GitHub and some of the capabilities of this framework, and I must say that it’s really interesting. Jadiel is going to tell us a lot more than what you can read on GitHub.

 

References

Videoflow official GitHub repository https://github.com/videoflow/videoflow

 

Read Full Post »


In this episode, I am with Dr. Charles Martin from Calculation Consulting, a machine learning and data science consulting company based in San Francisco. We speak about the nuts and bolts of deep neural networks and some impressive findings about the way they work.

The questions that Charles answers in the show are essentially two:

  1. Why is regularisation in deep learning seemingly quite different from regularisation in other areas of ML?

  2. How can we master DNNs in a theoretically principled way?

 


 

 

Read Full Post »


Today I am with David Kopec, author of Classic Computer Science Problems in Python, published by Manning Publications.

His book deepens your knowledge of problem-solving techniques from the realm of computer science by challenging you with interesting and realistic scenarios, exercises and, of course, algorithms.
It includes examples from the major topics any data scientist should be familiar with, such as search, clustering, graphs, and much more.
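To give a flavour of the kind of search problem such a book covers, here is a small breadth-first search. It finds a shortest path, measured in number of edges, in an unweighted graph. This is a generic sketch, not code from the book, and the graph is a made-up example.

```python
from collections import deque

def bfs_shortest_path(graph, start, goal):
    """Breadth-first search: return a shortest path (fewest edges) from start to goal, or None."""
    frontier = deque([start])
    came_from = {start: None}  # parent links; also serves as the visited set
    while frontier:
        node = frontier.popleft()
        if node == goal:
            # Walk the parent links back to the start to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = came_from[node]
            return path[::-1]
        for neighbour in graph.get(node, []):
            if neighbour not in came_from:
                came_from[neighbour] = node
                frontier.append(neighbour)
    return None

# Hypothetical undirected graph stored as an adjacency list.
graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D", "E"],
    "D": ["B", "C", "F"],
    "E": ["C", "F"],
    "F": ["D", "E"],
}
print(bfs_shortest_path(graph, "A", "F"))  # ['A', 'B', 'D', 'F']
```

Because BFS explores nodes in order of distance from the start, the first time it reaches the goal it has found a path with the fewest edges.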

Get the book from https://www.manning.com/books/classic-computer-science-problems-in-python and use coupon code poddatascienceathome19 to get a 40% discount.

 

References

Twitter https://twitter.com/davekopec

GitHub https://github.com/davecom

classicproblems.com

Read Full Post »