Data Science at Home
Episodes
Saturday Jan 14, 2023
Accelerating Perception Development with Synthetic Data (Ep. 214)
In this episode I speak with Kevin McNamara, founder and CEO of Parallel Domain, about a very effective method for generating synthetic data that is currently in production at Parallel Domain.
Enjoy the show!
References
Parallel Domain Synthetic Data Improves Cyclist Detection (blog post):
https://paralleldomain.com/parallel-domain-synthetic-data-improves-cyclist-detection/
Beating the State of the Art in Object Tracking with Synthetic Data:
https://paralleldomain.com/beating-the-state-of-the-art-in-object-tracking-with-synthetic-data/
Parallel Domain Open Synthetic Dataset:
https://paralleldomain.com/open-datasets/bicycle-detection
How Toyota Research Institute Trains Better Computer Vision Models with PD Synthetic Data (interview):
https://www.youtube.com/watch?v=QIYttoVxf2w
Career Opportunities:
https://paralleldomain.com/careers
Saturday Aug 29, 2020
Testing in machine learning: generating tests and data (Ep. 117)
In this episode I speak with Adam Leon Smith, CTO at DragonFly and an expert in testing strategies for software and machine learning.
On September 15th there will be a live@Manning Rust conference. In one Rust-full day you will attend many talks about what's special about Rust, building high-performance web services or video games, WebAssembly, and much more. If you want to meet the tribe, tune in on September 15th to the live@Manning Rust conference.
Sunday Jul 26, 2020
GPT-3 cannot code (and never will) (Ep. 114)
The hype around GPT-3 is alarming, and it paints an awful picture of how widely artificial intelligence is misunderstood. In response to comments claiming that GPT-3 will take developers' jobs, in this episode I express some personal opinions about the state of AI in generating source code (and about GPT-3 in particular).
If you have comments about this episode or just want to chat, come join us on the official Discord channel.
This episode is supported by Amethix Technologies.
Amethix works to create and maximize the impact of the world’s leading corporations, startups, and nonprofits, so they can create a better future for everyone they serve. They are a consulting firm focused on data science, machine learning, and artificial intelligence.
Monday Nov 18, 2019
How to improve the stability of training a GAN (Ep. 88)
Generative Adversarial Networks, or GANs, are very powerful tools to generate data. However, training a GAN is not easy. More specifically, GANs suffer from three major issues: instability of the training procedure, mode collapse, and vanishing gradients.
In this episode I explain not only the most challenging issues one encounters while designing and training Generative Adversarial Networks, but also some methods and architectures to mitigate them. In addition, I describe three specific strategies that researchers are considering to improve the accuracy and reliability of GANs.
The most troublesome issues of GANs
Convergence to equilibrium
A typical GAN consists of at least two networks: a generator G and a discriminator D. The generator's task is to generate samples from random noise; in turn, the discriminator has to learn to distinguish fake samples from real ones. While it is theoretically possible for the generator and discriminator to converge to a Nash equilibrium (at which both networks are in their optimal state), reaching such an equilibrium is not easy in practice.
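To make the setup concrete, here is a minimal sketch in PyTorch (the framework and the toy MLP architecture are illustrative assumptions, not the setup discussed in the episode):

```python
import torch
import torch.nn as nn

latent_dim = 100  # size of the random noise vector fed to the generator

# Generator G: maps random noise to a fake sample (here a flattened
# 28x28 image, purely as a toy example).
G = nn.Sequential(
    nn.Linear(latent_dim, 256),
    nn.ReLU(),
    nn.Linear(256, 784),
    nn.Tanh(),  # outputs in [-1, 1]
)

# Discriminator D: maps a sample to a single real-vs-fake logit.
D = nn.Sequential(
    nn.Linear(784, 256),
    nn.LeakyReLU(0.2),
    nn.Linear(256, 1),
)

z = torch.randn(64, latent_dim)  # a batch of random noise
fake = G(z)                      # generated (fake) samples
logits = D(fake)                 # discriminator scores for the fakes
```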
Vanishing gradients
Moreover, a very accurate discriminator drives its loss towards lower and lower values. This, in turn, can cause the generator's gradients to vanish and the entire network to stop learning completely.
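The classic remedy from the original GAN paper is the non-saturating generator loss: instead of minimizing log(1 - D(G(z))), the generator minimizes -log D(G(z)). A small sketch of the difference (in PyTorch; `logits_fake` stands in for the discriminator's raw output on generated samples):

```python
import torch
import torch.nn.functional as F

# Stand-in for D(G(z)) before the sigmoid; in a real model this comes
# from the discriminator.
logits_fake = torch.randn(64, 1, requires_grad=True)

# Saturating loss: minimize log(1 - D(G(z))).
# Its gradient shrinks towards zero once D confidently rejects fakes.
p_fake = torch.sigmoid(logits_fake)
loss_saturating = torch.log1p(-p_fake).mean()

# Non-saturating loss: minimize -log D(G(z)), i.e. train G as if the
# fakes were labeled real. Gradients stay usable even when D is strong.
loss_non_saturating = F.binary_cross_entropy_with_logits(
    logits_fake, torch.ones_like(logits_fake))
```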
Mode collapse
Another phenomenon that is easy to observe when dealing with GANs is mode collapse: the inability of the model to generate diverse samples. The generated data become more and more similar to one another, until the entire generated dataset is concentrated around a particular statistical mode.
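One well-known mitigation (from Karras et al.'s progressive GANs, mentioned here as an illustration rather than the episode's specific recommendation) is to let the discriminator see how diverse a whole minibatch is, by appending the minibatch standard deviation as an extra feature:

```python
import torch

def minibatch_stddev(x: torch.Tensor) -> torch.Tensor:
    """x: (batch, features). Appends one column holding the average
    per-feature standard deviation across the batch, so a collapsed
    (overly uniform) batch becomes easy for the discriminator to spot."""
    std = x.std(dim=0).mean()          # scalar measure of batch diversity
    col = std.expand(x.shape[0], 1)    # broadcast it to every sample
    return torch.cat([x, col], dim=1)

x = torch.randn(64, 784)
print(minibatch_stddev(x).shape)       # torch.Size([64, 785])
```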
The solution
Researchers have considered several approaches to overcome these issues, experimenting with architectural changes, different loss functions, and ideas from game theory.
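As one example of the loss-function route, here is a sketch of the WGAN-GP gradient penalty (Gulrajani et al., 2017), which stabilizes training by keeping the critic approximately 1-Lipschitz. Again, this illustrates the family of fixes, not necessarily the recipe from the episode:

```python
import torch
import torch.nn as nn

def gradient_penalty(critic, real, fake):
    """Penalize the critic when the norm of its gradient, evaluated on
    random interpolations of real and fake samples, deviates from 1."""
    eps = torch.rand(real.shape[0], 1)                  # mixing weights
    mix = (eps * real + (1 - eps) * fake).requires_grad_(True)
    score = critic(mix).sum()
    grad, = torch.autograd.grad(score, mix, create_graph=True)
    return ((grad.norm(2, dim=1) - 1.0) ** 2).mean()

# Toy critic and data, for illustration only.
critic = nn.Sequential(nn.Linear(784, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))
real = torch.randn(64, 784)
fake = torch.randn(64, 784)
print(gradient_penalty(critic, real, fake))             # scalar penalty
```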
Listen to the full episode to know more about the most effective strategies to build GANs that are reliable and robust. Don't forget to join the conversation on our new Discord channel. See you there!
Monday Nov 04, 2019
[RB] How to generate very large images with GANs (Ep. 85)
Join the discussion on our Discord server
In this episode I explain how a research group from the University of Lübeck tamed the curse of dimensionality to generate large medical images with GANs. The problem is not as trivial as it seems: many researchers have failed to generate large images with GANs before. One interesting application of such an approach is in medicine, for the generation of CT and X-ray images. Enjoy the show!
References
Multi-scale GANs for Memory-efficient Generation of High Resolution Medical Images (paper):
https://arxiv.org/abs/1907.01376