Artificial Intelligence | Page 12 | Data Science at Home

Episodes

Wednesday May 20, 2020

Compressing deep learning models: distillation (Ep.104)

Wednesday May 20, 2020

Using large deep learning models on limited hardware or edge devices is definitely prohibitive. There are methods to compress large models by orders of magnitude and maintain similar accuracy during inference.
In this episode I explain one of the first methods: knowledge distillation
Come join us on Slack
Reference
Distilling the Knowledge in a Neural Network https://arxiv.org/abs/1503.02531
Knowledge Distillation and Student-Teacher Learning for Visual Intelligence: A Review and New Outlooks https://arxiv.org/abs/2004.05937

Wednesday Apr 01, 2020

Activate deep learning neurons faster with Dynamic RELU (ep. 101)

Wednesday Apr 01, 2020

In this episode I briefly explain the concept behind activation functions in deep learning. One of the most widely used activation function is the rectified linear unit (ReLU). While there are several flavors of ReLU in the literature, in this episode I speak about a very interesting approach that keeps computational complexity low while improving performance quite consistently.
This episode is supported by pryml.io. At pryml we let companies share confidential data. Visit our website.
Don't forget to join us on discord channel to propose new episode or discuss the previous ones.
References
Dynamic ReLU https://arxiv.org/abs/2003.10027

Monday Mar 23, 2020

WARNING!! Neural networks can memorize secrets (ep. 100)

Monday Mar 23, 2020

One of the best features of neural networks and machine learning models is to memorize patterns from training data and apply those to unseen observations. That's where the magic is. However, there are scenarios in which the same machine learning models learn patterns so well such that they can disclose some of the data they have been trained on. This phenomenon goes under the name of unintended memorization and it is extremely dangerous.
Think about a language generator that discloses the passwords, the credit card numbers and the social security numbers of the records it has been trained on. Or more generally, think about a synthetic data generator that can disclose the training data it is trying to protect.
In this episode I explain why unintended memorization is a real problem in machine learning. Except for differentially private training there is no other way to mitigate such a problem in realistic conditions.At Pryml we are very aware of this. Which is why we have been developing a synthetic data generation technology that is not affected by such an issue.

This episode is supported by Harmonizely. Harmonizely lets you build your own unique scheduling page based on your availability so you can start scheduling meetings in just a couple minutes.Get started by connecting your online calendar and configuring your meeting preferences.Then, start sharing your scheduling page with your invitees!

References
The Secret Sharer: Evaluating and Testing Unintended Memorization in Neural Networkshttps://www.usenix.org/conference/usenixsecurity19/presentation/carlini

Friday Feb 07, 2020

A big welcome to Pryml: faster machine learning applications to production (Ep. 94)

Friday Feb 07, 2020

Why so much silence? Building a company! That's why :) I am building pryml, a platform that allows data scientists build their applications on data they cannot get access to. This is the first of a series of episodes in which I will speak about the technology and the challenges we are facing while we build it.
Happy listening and stay tuned!

Tuesday Dec 31, 2019

It's cold outside. Let's speak about AI winter (Ep. 93)

Tuesday Dec 31, 2019

In the last episode of 2019 I speak with Filip Piekniewski about some of the most worth noting findings in AI and machine learning in 2019. As a matter of fact, the entire field of AI has been inflated by hype and claims that are hard to believe. A lot of the promises made a few years ago have revealed quite hard to achieve, if not impossible. Let's stay grounded and realistic on the potential of this amazing field of research, not to bring disillusion in the near future.
Join us to our Discord channel to discuss your favorite episode and propose new ones. I would like to thank all of you for supporting and inspiring us. I wish you a wonderful 2020!Francesco and the team of Data Science at Home

Saturday Dec 28, 2019

The dark side of AI: bias in the machine (Ep. 92)

Saturday Dec 28, 2019

This is the fourth and last episode of mini series "The dark side of AI". I am your host Francesco and I’m with Chiara Tonini from London. The title of today’s episode is Bias in the machine

C: Francesco, today we are starting with an infuriating discussion. Are you ready to be angry?

F: yeah sure is this about brexit? No, I don’t talk about that. In 1986 the New York City’s Rockefeller University conducted a study on breast and uterine cancers and their link to obesity. Like in all clinical trials up to that point, the subjects of the study were all men. So Francesco, do you see a problem with this approach?

F: No problem at all, as long as those men had a perfectly healthy uterus.In medicine, up to the end of the 20th century, medical studies and clinical trials were conducted on men, medicine dosage and therapy calculated on men (white men). The female body has historically been considered an exception, or variation, from a male body.

F: Like Eve coming from Adam’s rib. I thought we were past that...When the female body has been under analysis, the focus was on the difference between it and the male body, the so-called “bikini approach”: the reproductive organs are different, therefore we study those, and those only. For a long time medicine assumed this was the only difference.

Oh good ...This has led to a hugely harmful fallout across society. Because women had reproductive organs, they should reproduce, and all else about them was deemed uninteresting. Still today, they consider a woman without children somehow to have betrayed her biological destiny. This somehow does not apply to a man without children, who also has reproductive organs.

F: so this is an example of a very specific type of bias in medicine, regarding clinical trials and medical studies, that is not only harmful for the purposes of these studies, but has ripple effects in all of societyOnly in the 2010 a serious conversation has started about the damage caused by not including women in clinical trials. There are many many examples (which we list in the references for this episode).

Give me oneResearchers consider cardiovascular disease a male disease - they even call it “the widower”. They conduct studies on male samples. But it turns out, the symptoms of a heart attack, especially the ones leading up to one, are different in women. This led to doctors not recognising or dismissing the early symptoms in women.

F: I was reading that women are also subject to chronic pain much more than men: for example migraines, and pain related to endometriosis. But there is extensive evidence now of doctors dismissing women’s pain, as either imaginary, or “inevitable”, like it is a normal state of being and does not need a cure at all.

The failure of the medical community as a whole to recognise this obvious bias up to the 21st century is an example of how insidious the problem of bias is.

There are 3 fundamental types of bias:

One: Stochastic drift: you train your model on a dataset, and you validate the model on a split of the training set. When you apply your model out in the world, you systematically add bias in the predictions due to the training data being too specific
Two: The bias in the model, introduced by your choice of the parameters of your model.
Three: The bias in your training sample: people put training samples together, and people have culture, experience, and prejudice. As we will see today, this is the most dangerous and subtle bias. Today we’ll talk about this bias.

Bias is a warping of our understanding of reality. We see reality through the lens of our experience and our culture. The origin of bias can date back to traditions going back centuries, and is so ingrained in our way of thinking, that we don’t even see it anymore.

F: And let me add, when it comes to machine learning, we see reality through the lens of data. Bias is everywhere, and we could spend hours and hours talking about it. It’s complicated.

It’s about to become more complicated.

F: of course, if I know you…Let’s throw artificial intelligence in the mix.

F: You know, there was a happier time when this sentence didn’t fill me with a sense of dread... ImageNet is an online database of over 14 million photos, compiled more than a decade ago at Stanford University. They used it to train machine learning algorithms for image recognition and computer vision, and played an important role in the rise of deep learning. We’ve all played with it, right? The cats and dogs classifier when learning Tensorflow? (I am a dog by the way. )

F: ImageNet has been a critical asset for computer-vision research. There was an annual international competition to create algorithms that could most accurately label subsets of images. In 2012, a team from the University of Toronto used a Convolutional Neural Network to handily win the top prize. That moment is widely considered a turning point in the development of contemporary AI. The final year of the ImageNet competition was 2017, and accuracy in classifying objects in the limited subset had risen from 71% to 97%. But that subset did not include the “Person” category, where the accuracy was much lower... ImageNet contained photos of thousands of people, with labels. This included straightforward tags like “teacher,” “dancer” and “plumber”, as well as highly charged labels like “failure, loser” and “slut, slovenly woman, trollop.”

F: Uh Oh. Then “ImageNet Roulette” was created, by an artist called Trevor Paglen and a Microsoft researcher named Kate Crawford. It was a digital art project, where you could upload your photo and let the classifier identify you, based on the labels of the database. Imagine how well that went.

F: I bet it did’t workOf course it didn’t work. Random people were classified as “orphans” or “non-smoker” or “alcoholic”. Somebody with glasses was a “nerd”. Tabong Kima, a 24-year old African American, was classified as “offender” and “wrongdoer”.

F: and there it is. Quote from Trevor Paglen: “We want to show how layers of bias and racism and misogyny move from one system to the next. The point is to let people see the work that is being done behind the scenes, to see how we are being processed and categorized all the time.”

F: The ImageNet labels were applied by thousands of unknown people, most likely in the United States, hired by the team from Stanford, and working through the crowdsourcing service Amazon Mechanical Turk. They earned pennies for each photo they labeled, churning through hundreds of labels an hour. The labels were not verified in any way : if a labeler thought someone looks “shady”, this label is just a result of their prejudice, but has no basis in reality.As they did, biases were baked into the database. Paglen quote again: “The way we classify images is a product of our worldview,” he said. “Any kind of classification system is always going to reflect the values of the person doing the classifying.” They defined what a “loser” looked like. And a “slut.” And a “wrongdoer.”

F: The labels originally came from another sprawling collection of data called WordNet, a kind of conceptual dictionary for machines built by researchers at Princeton University in the 1980s. But with these inflammatory labels included, the Stanford researchers may not have realized what they were doing.What is happening here is the transferring of bias from one system to the next.

Tech jobs, in past decades but still today, predominantly go to white males from a narrow social class. Inevitably, they imprint the technology with their worldview. So their algorithms learn that a person of color is a criminal, and a woman with a certain look is a slut.

I’m not saying they do it on purpose, but the lack of diversity in the tech industry translates into a narrower world view, which has real consequences in the quality of AI systems.

F: Diversity in tech teams is often framed as an equality issue (which of course it is), but there are enormous advantages in it: it allows to create that cognitive diversity that will reflect into superior products or services. I believe this is an ongoing problem. In recent months, researchers have shown that face-recognition services from companies like Amazon, Microsoft and IBM can be biased against women and people of color.

Crawford and Paglen argue this: “In many narratives around AI it is assumed that ongoing technical improvements will resolve all problems and limitations. But what if the opposite is true? What if the challenge of getting computers to “describe what they see” will always be a problem? The automated interpretation of images is an inherently social and political project, rather than a purely technical one. Understanding the politics within AI systems matters more than ever, as they are quickly moving into the architecture of social institutions: deciding whom to interview for a job, which students are paying attention in class, which suspects to arrest, and much else.”

F: You are using the words “interpretation of images” here, as opposed to “description” or “classification”. Certain images depict something concrete, with an objective reality. Like an apple. But other images… not so much?

ImageNet contain images only corresponding to nouns (not verbs for example). Noun categories such as “apple” are well defined. But not all nouns are created equal. Linguist George Lakoff points out that the concept of an “apple” is more nouny than the concept of “light”, which in turn is more nouny than a concept such as “health.” Nouns occupy various places on an axis from concrete to abstract, and from descriptive to judgmental. The images corresponding to these nouns become more and more ambiguous.These gradients have been erased in the logic of ImageNet. Everything is flattened out and pinned to a label. The results can be problematic, illogical, and cruel, especially when it comes to labels applied to people.

F: so when an image is interpreted as Drug Addict, Crazy, Hypocrite, Spinster, Schizophrenic, Mulatto, Red Neck… this is not an objective description of reality, it’s somebody’s worldview coming to the surface. The selection of images for these categories skews the meaning in ways that are gendered, racialized, ableist, and ageist. ImageNet is an object lesson in what happens when people are categorized like objects. And this practice has only become more common in recent years, often inside the big AI companies, where there is no way for outsiders to see how images are being ordered and classified.

The bizarre thing about these systems is that they remind of early 20th century criminologists like Lombroso, or phrenologists (including Nazi scientists), and physiognomy in general. This was a discipline founded on the assumption that there is a relationship between an image of a person and the character of that person. If you are a murderer, or a Jew, the shape of your head for instance will tell.

F: In reaction to these ideas, Rene’ Magritte produced that famous painting of the pipe with the tag “This is not a pipe”.

You know that famous photograph of the soldier kissing the nurse at the end of the second world war? The nurse came public about it when she was like 90 years old, and told how this total stranger in the street had grabbed her and kissed her. This is a picture of sexual harassment. And knowing that, it does not seem romantic anymore.

F: not romantic at all indeedImages do not describe themselves. This is a feature that artists have explored for centuries. We see those images differently when we see how they’re labeled. The correspondence between image, label, and referent is fluid. What’s more, those relations can change over time as the cultural context of an image shifts, and can mean different things depending on who looks, and where they are located. Images are open to interpretation and reinterpretation. Entire subfields of philosophy, art history, and media theory are dedicated to teasing out all the nuances of the unstable relationship between images and meanings.The common mythos of AI and the data it draws on, is that they are objectively and scientifically classifying the world. But it’s not true, everywhere there is politics, ideology, prejudices, and all of the subjective stuff of history.

F: When we survey the most widely used training sets, we find that this is the rule rather than the exception.Training sets are the foundation on which contemporary machine-learning systems are built. They are central to how AI systems recognize and interpret the world.By looking at the construction of these training sets and their underlying structures, we discover many unquestioned assumptions that are shaky and skewed. These assumptions inform the way AI systems work—and fail—to this day.And the impenetrability of the algorithms, the impossibility of reconstructing the decision-making of a NN, hides the bias further away from scrutiny. When an algorithm is a black box and you can’t look inside, you have no way of analysing its bias.

And the skewness and bias of these algorithms have real effects in society, the more you use AI in the judicial system, in medicine, the job market, in security systems based on facial recognition, the list goes on and on.

Last year Google unveiled BERT (Bidirectional Encoder Representations from Transformers). It’s an AI system that learns to talk: it’s a Natural Language Processing engine to generate written (or spoken) language.

F: we have an episode in which we explain all that

They trained it from lots and lots of digitized information, as varied as old books, Wikipedia entries and news articles. They baked decades and even centuries of biases — along with a few new ones — into all that material. So for instance BERT is extremely sexist: it associates with male almost all professions and positive attributes (except for “mom”).

BERT is widely used in industry and academia. For example it can interpret news headlines automatically. Even Google’s search engine use it.

Try googling “CEO”, and you get out a gallery of images of old white men.

F: such a pervasive and flawed AI system can propagate inequality at scale. And it’s super dangerous because it’s subtle. Especially in industry, query results will not be tested and examined for bias. AI is a black box and researchers take results at face value.

There are many cases of algorithm-based discrimination in the job market. Targeting candidates for tech jobs for instance, may be done by algorithms that will not recognise women as potential candidates. Therefore, they will not be exposed to as many job ads as men. Or, automated HR systems will rank them lower (for the same CV) and screen them out.

In the US, algorithms are used to calculate bail. The majority of the prison population in the US is composed of people of colour, as a result of a systemic bias that goes back centuries. An algorithm learns that a person of colour is more likely to commit a crime, is more likely to not be able to afford bail, is more likely to violate parole. Therefore, people of colour will receive harsher punishments for the same crime. This amplifies this inequality at scale.

Conclusion

Question everything, never take predictions of your models at face value. Always question how your training samples have been put together, who put them together, when and in what context. Always remember that your model produces an interpretation of reality, not a faithful depiction. Treat reality responsibly.

Tuesday Dec 03, 2019

The dark side of AI: social media and the optimization of addiction (Ep. 89)

Tuesday Dec 03, 2019

Chamath Palihapitiya, former Vice President of User Growth at Facebook, was giving a talk at Stanford University, when he said this: “I feel tremendous guilt. The short-term, dopamine-driven feedback loops that we have created are destroying how society works ”.
He was referring to how social media platforms leverage our neurological build-up in the same way slot machines and cocaine do, to keep us using their products as much as possible. They turn us into addicts.

F: how many times do you check your Facebook in a day?
I am not a fan of Facebook. I do not have it on my phone. Still, I check it in the morning on my laptop, and maybe twice more per day. I have a trick though: I do not scroll down. I only check the top bar to see if someone has invited me to an event, or contacted me directly. But from time to time, this resolution of mine slips, and I catch myself scrolling down, without even realising it!

F: is it the first thing you check when you wake up?
No because usually I have a message from you!! :) But yes, while I have my coffee I do a sweep on Facebook and twitter and maybe Instagram, plus the news.

F: Check how much time you spend on Facebook
And then sum it up to your email, twitter, reddit, youtube, instagram, etc. (all viable channels for ads to reach you)
We have an answer. More on that later. Clearly in this episode there is some form of addiction we would like to talk about. So let’s start from the beginning: how does addiction work?
Dopamine is a hormone produced by our body, and in the brain it works as a neurotransmitter, a chemical that neurons use to transmit signals to each other. One of the main functions of dopamine is to shape the “reward-motivated behaviour”: this is the way our brain learns through association, positive reinforcement, incentives, and positively-valenced emotions, in particular, pleasure. In other words, it makes our brain desire more of the things that make us feel good. These things can be for example good food, sex, and crucially, good social interactions, like hugging your friends or your baby, or having a laugh together. Because we are evolved to be social animals with complex social structures, successful social interactions are an evolutionary advantage, and therefore they trigger dopamine release in our brain, which makes us feel good, and reinforces the association between the action and the reward. This feeling motivates us to repeat the behaviour.

F: now that you mention reinforcement, I recall that this mechanism is so powerful and effective that in fact we have been inspired by nature and replicated it in-silico with reinforcement learning. The idea is to motivate (and eventually create an addictive pattern) an agent to follow what is called the optimal policy by giving it positive rewards or punishing it when things don’t go the way we planned.
In our brain, every time an action produces a reward, the connection between action and reward becomes stronger. Through reinforcement, a baby learns to distinguish a cat from a dog, or that fire hurts (that was me).

F: and so this means that all the social interactions people get from social media platforms are in fact doing the same, right?
Yes, but with a difference: smartphones in our pockets keep us connected to an unlimited reserve of constant social interactions. This constant flux of notifications - the rewards - flood our brain with dopamine. The mechanism of reinforcement can spin out of control. The reward pathways in our brain can malfunction, and this leads to addiction.

F: you are saying that social media has LITERALLY the effect of a drug?
Yes. In fact, social media platforms are DESIGNED to exploit the rewards systems in our brain. They are designed to work like a drug. Have you been to a casino and played roulette or the slot machines?

F: ...maybe?
Why is it fun to play roulette? The fun comes from the WAIT before the reward. You put a chip on a number, you don’t know how it’s going to go. You wait for the ball to spin, you get excited. And from time to time, BAM! Your number comes out. Now, compare this with posting something on facebook. You write a message into the void, wait…. And then the LIKES start coming in.

F: yeah i find that familiar...
Contrary to the casino, social media platforms do not want our money, in fact they are free. What they want is, and what we are buying into with, is our time. Because the longer we stay on, the longer they can show us ads, and the more money advertisers can pay them. This is no accident, this is the business model. But asking for our time out loud would not work, we would probably not consciously give it to them. So, like a casino, they make it hard for us to get off, once we are on: they make us crave the likes, the right-swipes, the retweets, the subscriptions. So we check in, we stay on, we keep scrolling, because we hope to get those rewards. The short-term satisfaction of getting a “like” is a little boost of dopamine in our brain. We get used to it, and we want more.

F: a lot of machine learning is also being deployed to amplify this form of addiction and make it.... Well more addictive :) But the question is: how such powerful ads and scenarios are so effective because of the algorithms and how much just because humans are just wired to obey such dynamics? My question is: are we essentially flawed or are these algorithms truly powerful?
It is not a flaw, it’s a feature. The way our brain has evolved has been in response to very specific needs. In particular for this conversation, our brain is wired to favour social interactions, because it is an evolutionary advantage. These algorithms exploit these features of the brain on purpose, they are designed to exploit them.

F: I believe so, but I also believe that the human brain is a powerful machine, so it should be able to predict what satisfaction it can get from social media. So how does it happen that we become addicted?
An example of optimisation strategy that social media platforms use is based on the principle of “reward prediction error coding”. Our brain learns to find patterns in data - this is a basic survival skill - and therefore learns when to expect a reward for a given set of actions. I eat cake, therefore I am happy. Every time. Imagine a scenario, where we have learnt through experience that when we play slot machines in a casino, we learn that we win some money once every 100 times we pull the lever. The difference between predicted and received rewards is a known, fixed quantity. If so, just after winning once, we have almost zero incentive to play again. So the casino fixes the slot machines, to introduce a random element to the timing of the reward. Suddenly our prediction error increases substantially. In this margin of error, in the time between the action (pull the lever) and the reward (maybe) our brain has time to make us anticipate the result and make us excited at the possibility, and this releases dopamine. Playing in itself becomes a reward. F: There is an equivalent in reinforcement learning called the grid world which consists in a mouse getting to the cheese in a maze. In reinforcement learning, everything works smooth as long as the cheese stays in the same place.
Exactly! Now social media apps implement an equivalent trick, called “variable reward schedules”.
In our brain, after an action we get a reward or punishment, and we generate positive or negative feedback to that action. Social media apps optimise their algorithms for the ideal balance of negative and positive feedback in our brains caused by the difference between these predicted and received rewards.
If we perceive a reward to be delivered at random, and - crucially - if checking for the reward comes at little cost, like opening the Facebook app, we end up checking for rewards all the time. Every time we are just a little bit bored, without even thinking, we check the app. The Facebook reward system (the schedule and triggers of notification and likes) has been optimised to maximise this behaviour.

F: are you saying that buffering some likes and then finding the right moment to show them to the user can make the user crave for reward?
Oh yes. Instagram will withhold likes for a period of time, causing a dip in reward compared to the expected level. It will then deliver them later in larger bundles, thus boosting the reward above the expected value, which trigger extra dopamine release, which sends us on a high akin to a cocaine hit.

F: Dear audience, do you remember my question? How much time do each of you spend on social media (or similar) in a day? And why do we still do it?
The fundamental feature here is how little is the perceived cost to check for the reward: I just need to open the app. We perceive this cost to be minimal, so we don’t even think about it. YouTube for instance had the autoplay feature, so you need to do absolutely nothing to remain on the app. But the cost is cumulative over time, it becomes hours in our day, days in a month, years in our lives!! 2 hours of social media per day amounts to 1 month per year.

F: But it’s so EASY, it has become so natural to use social media for everything. To use Google for everything.
The convenience that the platforms give us is one of the most dangerous things about them, and not only for our individual life. The convenience of reaching so many users, together with the business model of monetising attention is one of the causes of the centralisation of the internet, i.e. the fact a few giant platforms control most of the internet traffic. Revenue from ads is concentrated on big platforms, and content creators have no other choice but to use them, if they want to be competitive. The internet went from looking like a distributed network to a centralised network. And this in turn causes data to be centralised, in a self-reinforcing loop. Most of human conversations and interactions pass through the servers of a handful of private corporations.
Conclusion
As Data scientists we should be aware of this (and we think mostly we are). We should also be ethically responsible. I think that being a data scientist no longer has a neutral connotation. Algorithms have this huge power of manipulating human behaviour, and let’s be honest, we are the only ones who really understand how they work. So we have a responsibility here.
There are some organisations, like Data For Democracy for example, who are advocating for something equivalent to the Hippocratic Oath for data scientists. Do no harm.

References
Dopamine reward prediction error coding https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4826767/
Skinner - Operant Conditioning https://www.simplypsychology.org/operant-conditioning.html
Dopamine, Smartphones & You: A battle for your time http://sitn.hms.harvard.edu/flash/2018/dopamine-smartphones-battle-time/
Reward system https://en.wikipedia.org/wiki/Reward_system
Data for democracy datafordemocracy.org

Wednesday Nov 27, 2019

More powerful deep learning with transformers (Ep. 84) (Rebroadcast)

Wednesday Nov 27, 2019

Some of the most powerful NLP models like BERT and GPT-2 have one thing in common: they all use the transformer architecture. Such architecture is built on top of another important concept already known to the community: self-attention.In this episode I explain what these mechanisms are, how they work and why they are so powerful.
Don't forget to subscribe to our Newsletter or join the discussion on our Discord server

References
Attention is all you need https://arxiv.org/abs/1706.03762
The illustrated transformer https://jalammar.github.io/illustrated-transformer
Self-attention for generative models http://web.stanford.edu/class/cs224n/slides/cs224n-2019-lecture14-transformers.pdf

Monday Nov 18, 2019

How to improve the stability of training a GAN (Ep. 88)

Monday Nov 18, 2019

Generative Adversarial Networks or GANs are very powerful tools to generate data. However, training a GAN is not easy. More specifically, GANs suffer of three major issues such as instability of the training procedure, mode collapse and vanishing gradients.

In this episode I not only explain the most challenging issues one would encounter while designing and training Generative Adversarial Networks. But also some methods and architectures to mitigate them. In addition I elucidate the three specific strategies that researchers are considering to improve the accuracy and the reliability of GANs.

The most tedious issues of GANs

Convergence to equilibrium

A typical GAN is formed by at least two networks: a generator G and a discriminator D. The generator's task is to generate samples from random noise. In turn, the discriminator has to learn to distinguish fake samples from real ones. While it is theoretically possible that generators and discriminators converge to a Nash Equilibrium (at which both networks are in their optimal state), reaching such equilibrium is not easy.

Vanishing gradients

Moreover, a very accurate discriminator would push the loss function towards lower and lower values. This in turn, might cause the gradient to vanish and the entire network to stop learning completely.

Mode collapse

Another phenomenon that is easy to observe when dealing with GANs is mode collapse. That is the incapability of the model to generate diverse samples. This in turn, leads to generated data that are more and more similar to the previous ones. Hence, the entire generated dataset would be just concentrated around a particular statistical value.

The solution

Researchers have taken into consideration several approaches to overcome such issues. They have been playing with architectural changes, different loss functions and game theory.

Listen to the full episode to know more about the most effective strategies to build GANs that are reliable and robust. Don't forget to join the conversation on our new Discord channel. See you there!

Tuesday Nov 12, 2019

What if I train a neural network with random data? (with Stanisław Jastrzębski) (Ep. 87)

Tuesday Nov 12, 2019

What happens to a neural network trained with random data?
Are massive neural networks just lookup tables or do they truly learn something?
Today’s episode will be about memorisation and generalisation in deep learning, with Stanislaw Jastrzębski from New York University.
Stan spent two summers as a visiting student with Prof. Yoshua Bengio and has been working on
Understanding and improving how deep network generalise
Representation Learning
Natural Language Processing
Computer Aided Drug Design

What makes deep learning unique?
I have asked him a few questions for which I was looking for an answer for a long time. For instance, what is deep learning bringing to the table that other methods don’t or are not capable of? Stan believe that the one thing that makes deep learning special is representation learning. All the other competing methods, be it kernel machines, or random forests, do not have this capability. Moreover, optimisation (SGD) lies at the heart of representation learning in the sense that it allows finding good representations.

What really improves the training quality of a neural network?
We discussed about the accuracy of neural networks depending pretty much on how good the Stochastic Gradient Descent method is at finding minima of the loss function. What would influence such minima?Stan's answer has revealed that training set accuracy or loss value is not that interesting actually. It is relatively easy to overfit data (i.e. achieve the lowest loss possible), provided a large enough network, and a large enough computational budget. However, shape of the minima, or performance on validation sets are in a quite fascinating way influenced by optimisation. Optimisation in the beginning of the trajectory, steers such trajectory towards minima of certain properties that go much further than just training accuracy.
As always we spoke about the future of AI and the role deep learning will play.
I hope you enjoy the show!
Don't forget to join the conversation on our new Discord channel. See you there!

References

Homepage of Stanisław Jastrzębski https://kudkudak.github.io/
A Closer Look at Memorization in Deep Networks https://arxiv.org/abs/1706.05394
Three Factors Influencing Minima in SGD https://arxiv.org/abs/1711.04623
Don't Decay the Learning Rate, Increase the Batch Size https://arxiv.org/abs/1711.00489
Stiffness: A New Perspective on Generalization in Neural Networks https://arxiv.org/abs/1901.09491