Archive for the 'optimisation' Category

Generative Adversarial Networks, or GANs, are very powerful tools to generate data. However, training a GAN is not easy. More specifically, GANs suffer from three major issues: instability of the training procedure, mode collapse and vanishing gradients.

 

In this episode I explain not only the most challenging issues one would encounter while designing and training Generative Adversarial Networks, but also some methods and architectures to mitigate them. In addition, I elucidate the three specific strategies that researchers are considering to improve the accuracy and the reliability of GANs.

 

The most tedious issues of GANs

 

Convergence to equilibrium

 

A typical GAN is formed by at least two networks: a generator G and a discriminator D. The generator's task is to generate samples from random noise. In turn, the discriminator has to learn to distinguish fake samples from real ones. While it is theoretically possible that generators and discriminators converge to a Nash Equilibrium (at which both networks are in their optimal state), reaching such equilibrium is not easy. 

 

Vanishing gradients

 

Moreover, a very accurate discriminator would push the loss function towards lower and lower values. This, in turn, might cause the gradient reaching the generator to vanish and the entire network to stop learning completely.
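To see why an accurate discriminator starves the generator of gradient, consider a minimal sketch in plain Python. It assumes the original minimax generator loss log(1 - D(G(z))) with a sigmoid discriminator output, and compares it with the non-saturating variant -log D(G(z)). The function names are illustrative and not taken from any library.

```python
import math


def sigmoid(a: float) -> float:
    """Discriminator output D = sigmoid(logit), read as P(sample is real)."""
    return 1.0 / (1.0 + math.exp(-a))


def saturating_grad(logit: float) -> float:
    """Derivative of log(1 - sigmoid(logit)) with respect to the logit."""
    return -sigmoid(logit)


def non_saturating_grad(logit: float) -> float:
    """Derivative of -log(sigmoid(logit)) with respect to the logit."""
    return -(1.0 - sigmoid(logit))


# A more negative logit means the discriminator is more certain the sample is fake.
for logit in (0.0, -2.0, -5.0, -10.0):
    print(
        f"logit={logit:6.1f}  "
        f"saturating={saturating_grad(logit):.6f}  "
        f"non-saturating={non_saturating_grad(logit):.6f}"
    )
```

As the discriminator grows more confident that a generated sample is fake, the first gradient shrinks towards zero while the second stays close to -1. This is one reason why changing the loss function is among the remedies people try.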

 

Mode collapse

 

Another phenomenon that is easy to observe when dealing with GANs is mode collapse, that is, the inability of the model to generate diverse samples. This, in turn, leads to generated data that are more and more similar to the previous ones. Hence, the entire generated dataset ends up concentrated around a single mode of the data distribution.

 

The solution

 

Researchers have taken into consideration several approaches to overcome such issues. They have been playing with architectural changes, different loss functions and game theory.

 

Listen to the full episode to learn more about the most effective strategies to build GANs that are reliable and robust.
Don't forget to join the conversation on our new Discord channel. See you there!

 


What happens to a neural network trained with random data?

Are massive neural networks just lookup tables or do they truly learn something? 

Today’s episode will be about memorisation and generalisation in deep learning, with Stanisław Jastrzębski from New York University.

Stan spent two summers as a visiting student with Prof. Yoshua Bengio and has been working on:

  • Understanding and improving how deep networks generalise
  • Representation Learning
  • Natural Language Processing
  • Computer Aided Drug Design

 

What makes deep learning unique?

I asked him a few questions to which I had been seeking an answer for a long time. For instance, what is deep learning bringing to the table that other methods don’t or are not capable of?
Stan believes that the one thing that makes deep learning special is representation learning. All the other competing methods, be it kernel machines or random forests, do not have this capability. Moreover, optimisation (SGD) lies at the heart of representation learning, in the sense that it allows finding good representations.

 

What really improves the training quality of a neural network?

We discussed how the accuracy of neural networks depends largely on how good the Stochastic Gradient Descent method is at finding minima of the loss function. What would influence such minima?
Stan's answer revealed that training set accuracy or loss value is actually not that interesting. It is relatively easy to overfit data (i.e. achieve the lowest loss possible), provided a large enough network and a large enough computational budget. However, the shape of the minima and the performance on validation sets are influenced by optimisation in a quite fascinating way.
Optimisation at the beginning of the trajectory steers it towards minima with certain properties that go much further than just training accuracy.

As always we spoke about the future of AI and the role deep learning will play.

I hope you enjoy the show!


 

References

 

Homepage of Stanisław Jastrzębski https://kudkudak.github.io/

A Closer Look at Memorization in Deep Networks https://arxiv.org/abs/1706.05394

Three Factors Influencing Minima in SGD https://arxiv.org/abs/1711.04623

Don't Decay the Learning Rate, Increase the Batch Size https://arxiv.org/abs/1711.00489

Stiffness: A New Perspective on Generalization in Neural Networks https://arxiv.org/abs/1901.09491


In this episode I am with Jadiel de Armas, senior software engineer at Disney and author of Videoflow, a Python framework that facilitates the quick development of complex video analysis applications and other series-processing based applications in a multiprocessing environment.

I have inspected the videoflow repo on GitHub and some of the capabilities of this framework, and I must say that it’s really interesting. Jadiel is going to tell us a lot more than what you can read on GitHub.

 

References

Videoflow official GitHub repository
https://github.com/videoflow/videoflow

 


In this episode, I am with Dr. Charles Martin from Calculation Consulting, a machine learning and data science consulting company based in San Francisco. We speak about the nuts and bolts of deep neural networks and some impressive findings about the way they work.

The questions that Charles answers in the show are essentially two:

  1. Why is regularisation in deep learning seemingly quite different from regularisation in other areas of ML?

  2. How can we master DNNs in a theoretically principled way?

 


Training neural networks faster usually means using powerful GPUs. In this episode I explain an interesting method from a group of researchers at Google Brain, who train neural networks faster by squeezing more out of the hardware and making the training pipeline denser.
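The idea of data echoing is to reuse the output of a slow upstream pipeline stage several times, so that the accelerator does not sit idle while waiting for fresh data. A rough sketch follows, assuming echoing at the level of whole batches. The `echo` helper and its `echo_factor` parameter are hypothetical names for this illustration, not the authors' code.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def echo(upstream: Iterable[T], echo_factor: int) -> Iterator[T]:
    """Yield every item produced by the upstream stage `echo_factor` times."""
    if echo_factor < 1:
        raise ValueError("echo_factor must be at least 1")
    for item in upstream:
        for _ in range(echo_factor):
            yield item


# Stand-in for an expensive stage (disk reads, decoding, augmentation).
slow_batches = ["batch-0", "batch-1", "batch-2"]

for batch in echo(slow_batches, echo_factor=2):
    print("training step on", batch)
```

With `echo_factor=2`, the training loop performs twice as many steps per batch read from the upstream stage. Whether the repeated steps are as useful as fresh ones depends on where in the pipeline the echoing happens.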

Enjoy the show!

 

References

Faster Neural Network Training with Data Echoing
https://arxiv.org/abs/1907.05550



Today I am with David Kopec, author of Classic Computer Science Problems in Python, published by Manning Publications.

His book deepens your knowledge of problem solving techniques from the realm of computer science by challenging you with interesting and realistic scenarios, exercises, and of course algorithms.
There are examples in the major topics any data scientist should be familiar with, for example search, clustering, graphs, and much more.

Get the book from https://www.manning.com/books/classic-computer-science-problems-in-python and use coupon code poddatascienceathome19 to get a 40% discount.

 

References

Twitter https://twitter.com/davekopec

GitHub https://github.com/davecom

classicproblems.com


It all starts from physics. The entropy of an isolated system never decreases… Everyone learned this at some point in a physics class at school. What does this have to do with machine learning?
To find out, listen to the show.
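As a small taste, one common form in which entropy appears in machine learning is the Shannon entropy of a label distribution, used for instance as an impurity measure when growing decision trees. The sketch below illustrates that single use and is not a summary of the episode. `shannon_entropy` is a hypothetical helper name.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def shannon_entropy(labels: Sequence[Hashable]) -> float:
    """Entropy in bits of the empirical distribution of `labels`."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


print(shannon_entropy(["cat", "cat", "cat", "cat"]))  # a pure set of labels
print(shannon_entropy(["cat", "dog", "cat", "dog"]))  # an evenly mixed set
```

A set with a single label has zero entropy, while an even split between two labels reaches the maximum of one bit.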

 

References

Entropy in machine learning 
https://amethix.com/entropy-in-machine-learning/


In this episode I met three crazy researchers from KULeuven (Belgium) who found a method to fool surveillance cameras and stay hidden just by holding a special t-shirt.
We discussed the technique they used and some consequences of their findings.

They published their paper on arXiv and made their source code available at https://gitlab.com/EAVISE/adversarial-yolo

Enjoy the show!

 

References

Fooling automated surveillance cameras: adversarial patches to attack person detection 
Simen Thys, Wiebe Van Ranst, Toon Goedemé

 

Eavise Research Group KULeuven (Belgium)
https://iiw.kuleuven.be/onderzoek/eavise


There is a connection between gradient-descent-based optimizers and the dynamics of damped harmonic oscillators. What does that mean? We now have a better theory for optimization algorithms.
In this episode I explain how all this works.

All the formulas I mention in the episode can be found in the post “The physics of optimization algorithms”.
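The analogy can be tried out numerically. The sketch below assumes the standard heavy-ball (momentum) update on a one-dimensional quadratic loss, which behaves like a mass on a spring with friction. It does not reproduce the formulas from the post, and the names `momentum_descent`, `lr` and `momentum` are chosen for this example.

```python
from typing import Callable, List


def momentum_descent(
    grad: Callable[[float], float],
    x0: float,
    lr: float,
    momentum: float,
    steps: int,
) -> List[float]:
    """Heavy-ball gradient descent; returns the whole trajectory of x."""
    x, velocity = x0, 0.0
    path = [x]
    for _ in range(steps):
        # Friction keeps a fraction of the old velocity; the gradient acts as a force.
        velocity = momentum * velocity - lr * grad(x)
        x = x + velocity
        path.append(x)
    return path


def spring_force(x: float) -> float:
    """Gradient of the quadratic loss 0.5 * x**2 (a spring with unit stiffness)."""
    return x


with_momentum = momentum_descent(spring_force, x0=1.0, lr=0.1, momentum=0.9, steps=200)
plain_descent = momentum_descent(spring_force, x0=1.0, lr=0.1, momentum=0.0, steps=200)

print("lowest point reached with momentum:", min(with_momentum))
print("lowest point reached without momentum:", min(plain_descent))
```

With momentum the iterate overshoots the minimum at zero and swings back with shrinking amplitude, like an underdamped oscillator. Without momentum it creeps towards zero from one side, like an overdamped one.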

Enjoy the show.

 


How are differential equations related to neural networks? What are the benefits of re-thinking a neural network as a differential equation engine? In this episode we explain all this and provide some material that is worth studying. Enjoy the show!

 

Figure: Residual block
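The link between the two can be sketched in a few lines. A residual block computes x + f(x), which has the same form as one forward Euler step of the differential equation dx/dt = f(x). In the toy example below, f is a fixed function standing in for a learned layer, and `residual_stack` and `step` are illustrative names chosen here.

```python
import math
from typing import Callable


def residual_stack(
    x0: float,
    f: Callable[[float], float],
    num_blocks: int,
    step: float,
) -> float:
    """Apply `num_blocks` residual updates x <- x + step * f(x)."""
    x = x0
    for _ in range(num_blocks):
        x = x + step * f(x)  # identity shortcut plus a scaled update
    return x


def decay(x: float) -> float:
    """Right-hand side of dx/dt = -x, whose exact solution is x0 * exp(-t)."""
    return -x


exact = math.exp(-1.0)  # true state at time t = 1 when x0 = 1
coarse = residual_stack(1.0, decay, num_blocks=10, step=0.1)
fine = residual_stack(1.0, decay, num_blocks=100, step=0.01)

print("exact solution:", exact)
print("10 blocks:", coarse, "error:", abs(coarse - exact))
print("100 blocks:", fine, "error:", abs(fine - exact))
```

More blocks with a smaller step track the continuous solution more closely, which is the intuition behind treating depth as a time variable.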

 

 

References

[1] K. He, et al., “Deep Residual Learning for Image Recognition”, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770-778, 2016.

[2] S. Hochreiter, et al., “Long short-term memory”, Neural Computation 9(8), pages 1735-1780, 1997.

[3] Q. Liao, et al., “Bridging the gaps between residual learning, recurrent neural networks and visual cortex”, arXiv preprint, arXiv:1604.03640, 2016.

[4] Y. Lu, et al., “Beyond Finite Layer Neural Networks: Bridging Deep Architectures and Numerical Differential Equation”, Proceedings of the 35th International Conference on Machine Learning (ICML), Stockholm, Sweden, 2018.

[5] T. Q. Chen, et al., “Neural Ordinary Differential Equations”, Advances in Neural Information Processing Systems 31, pages 6571-6583, 2018.


In this episode I continue the conversation from the previous one, about failing machine learning models.

When data scientists have access to the distributions of training and testing datasets it becomes relatively easy to assess if a model will perform equally on both datasets. What happens with private datasets, where no access to the data can be granted?

At fitchain we might have an answer to this fundamental problem.
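When both datasets are accessible, the comparison described above can be as simple as a two-sample statistic computed on each feature. As a minimal sketch of that easy case only (it is not fitchain's method for private data), the following computes the Kolmogorov-Smirnov statistic, i.e. the largest gap between the two empirical distribution functions. `ks_statistic` is a name chosen for this example.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(train: Sequence[float], test: Sequence[float]) -> float:
    """Largest absolute difference between the two empirical CDFs."""
    train_sorted, test_sorted = sorted(train), sorted(test)
    largest_gap = 0.0
    # The gap can only change at observed values, so checking those is enough.
    for value in train_sorted + test_sorted:
        cdf_train = bisect_right(train_sorted, value) / len(train_sorted)
        cdf_test = bisect_right(test_sorted, value) / len(test_sorted)
        largest_gap = max(largest_gap, abs(cdf_train - cdf_test))
    return largest_gap


train_feature = [0.1, 0.4, 0.5, 0.7, 0.9]
similar_test = [0.2, 0.3, 0.6, 0.8, 1.0]
shifted_test = [2.1, 2.4, 2.5, 2.7, 2.9]

print("similar:", ks_statistic(train_feature, similar_test))
print("shifted:", ks_statistic(train_feature, shifted_test))
```

A value near 0 suggests the feature is distributed similarly in both sets, while a value near 1 signals a shift that is likely to hurt the model on the test data.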

 


In this episode I explain the differences between L1 and L2 regularization, the two penalty terms you can find in the function minimization behind basically any machine learning model.
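One way to see the difference, sketched under simple assumptions: with a penalty of lam/2 * w^2, a gradient step on the L2 term shrinks every weight by the same factor, whereas the L1 penalty lam * |w| is commonly handled with a soft-thresholding (proximal) step that sets small weights exactly to zero. The helper names below are hypothetical.

```python
import math
from typing import List


def l2_step(weights: List[float], lr: float, lam: float) -> List[float]:
    """Gradient step on (lam / 2) * w**2: multiply each weight by (1 - lr * lam)."""
    return [w - lr * lam * w for w in weights]


def l1_step(weights: List[float], lr: float, lam: float) -> List[float]:
    """Proximal step on lam * |w|: move each weight towards zero by lr * lam, stopping at zero."""
    threshold = lr * lam
    return [math.copysign(max(abs(w) - threshold, 0.0), w) for w in weights]


weights = [0.05, -0.08, 1.5, -2.0]
print("L2:", l2_step(weights, lr=0.1, lam=1.0))
print("L1:", l1_step(weights, lr=0.1, lam=1.0))
```

After the L2 step all four weights are still non-zero, only smaller. After the L1 step the two small weights are exactly zero, which is why L1 regularization tends to produce sparse models.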

 


Despite what researchers claim about genetic evolution, in this episode we give a realistic view of the field.


Continuing the discussion of the last two episodes, there is one more aspect of deep learning that I would love to consider, and to which I have therefore dedicated a full episode: parallelising and distributing deep learning on relatively large clusters.

As a matter of fact, computing architectures are changing in a way that encourages parallelism more than ever before. Deep learning is no exception: despite the great improvements brought by commodity GPUs (graphics processing units), there is still room for improvement when it comes to speed.

Together with the last two episodes, this one completes the picture of deep learning at scale. Indeed, as I mentioned in the previous episode, “How to master optimisation in deep learning”, the function optimizer is the horsepower of deep learning and neural networks in general. A slow and inaccurate optimisation method leads to networks that slowly converge to unreliable results.

In another episode titled “Additional strategies for optimizing deep learning” I explained some ways to improve function minimisation and model tuning in order to get better parameters in less time. So feel free to listen to these episodes again, share them with your friends, even re-broadcast or download them for your commute.

While the methods that I have explained so far represent a good starting point for prototyping a network, when you need to switch to production environments or take advantage of the most recent and advanced hardware capabilities of your GPU, well... in all those cases, you would like to do something more.  
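One common way to go further is synchronous data parallelism: each worker computes the gradient on its own shard of the batch, and the averaged gradient is used for the update. The sketch below simulates the workers sequentially in plain Python for a one-parameter linear model. It illustrates the idea under these assumptions, is not necessarily the setup discussed in the episode, and all names are made up for the example.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (input x, target y)


def mse_gradient(w: float, samples: List[Sample]) -> float:
    """Gradient with respect to w of the mean squared error of the model y = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in samples) / len(samples)


def data_parallel_gradient(w: float, batch: List[Sample], num_workers: int) -> float:
    """Split the batch into equal shards, compute one gradient per shard, then average."""
    if len(batch) % num_workers != 0:
        raise ValueError("batch size must be divisible by num_workers")
    shard_size = len(batch) // num_workers
    shards = [batch[i * shard_size:(i + 1) * shard_size] for i in range(num_workers)]
    # On a real cluster each shard gradient is computed on a different device.
    local_gradients = [mse_gradient(w, shard) for shard in shards]
    return sum(local_gradients) / num_workers


batch = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]
w = 0.5

full = mse_gradient(w, batch)
parallel = data_parallel_gradient(w, batch, num_workers=2)
print("single worker:", full)
print("two workers:", parallel)
```

With equally sized shards the averaged gradient matches the full-batch gradient up to floating-point rounding, so the speed-up comes from computing the shards at the same time rather than from changing the optimisation itself.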
