data science | Page 10 | Data Science at Home

Episodes

Friday Jun 19, 2020

Rust and machine learning #2 with Luca Palmieri (Ep. 108)

In the second episode of Rust and Machine Learning I am speaking with Luca Palmieri, who has spent a large part of his career at the intersection of machine learning and data engineering. In addition, Luca has contributed to several projects closer to the machine learning community using the Rust programming language. Linfa is an ambitious project that definitely deserves the attention of the data science community (and it's written in Rust, with Python bindings! How cool??!).

References
Series Announcement - Zero to Production in Rust https://www.lpalmieri.com/posts/2020-05-10-announcement-zero-to-production-in-rust/
Zero To Production #0: Foreword https://www.lpalmieri.com/posts/2020-05-24-zero-to-production-0-foreword/
Taking ML to production with Rust: a 25x speedup https://www.lpalmieri.com/posts/2019-12-01-taking-ml-to-production-with-rust-a-25x-speedup/

Wednesday Jun 17, 2020

Rust and machine learning #1 (Ep. 107)

This is the first episode of a series about the Rust programming language and the role it can play in the machine learning field.
Rust is one of the most beautiful languages I have ever studied. I personally come from the C programming language, though for professional activities in machine learning I had to switch to the loved and hated Python language.
This episode is clearly not an exhaustive list of the benefits of Rust, nor of its capabilities. For that you can check the references and start getting familiar with what I think is going to be the language of the next 20 years.

Sponsored
This episode is supported by Pryml Technologies. Pryml offers secure and cost-effective data privacy solutions for your organisation. It generates a synthetic alternative without disclosing your confidential data.

References
The Rust Programming Language
Cookin' with Rust

Monday Jun 15, 2020

Protecting workers with artificial intelligence (with Sandeep Pandya, CEO Everguard.ai) (Ep. 106)

In this episode I have a chat with Sandeep Pandya, CEO at Everguard.ai, a company that uses sensor fusion, computer vision and more to provide safer working environments to workers in heavy industry. Sandeep is a senior executive who can hide the complexity of the topic with great talent.

This episode is supported by Pryml.io. Pryml is an enterprise-scale platform to synthesise data and deploy applications built on that data back to a production environment. Test ideas. Launch new products. Fast. Secure.

Friday May 08, 2020

Pandemics and the risks of collecting data (Ep. 103)

Covid-19 is an emergency. True. Let's just not prepare for another emergency about privacy violation when this one is over.

Join our new Slack channel

This episode is supported by Proton. You can check them out at protonmail.com or protonvpn.com

Sunday Apr 19, 2020

Why average can get your predictions very wrong (ep. 102)

Whenever people reason about probability of events, they have the tendency to consider average values between two extremes. In this episode I explain why such a way of approximating is wrong and dangerous, with a numerical example.
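To make the point concrete, here is a small hypothetical illustration (not the numerical example from the episode). It assumes a made-up trip whose speed is one of two extremes with equal probability, and compares the travel time at the average speed with the average of the two travel times. All numbers and names are assumptions chosen for illustration.

```python
# Hypothetical scenario: a 120 km trip where the speed is either
# 20 km/h (heavy traffic) or 120 km/h (clear road), each with probability 0.5.
DISTANCE_KM = 120.0
SPEEDS_KMH = [20.0, 120.0]
PROBABILITIES = [0.5, 0.5]


def travel_time_hours(speed_kmh):
    """Travel time is a nonlinear (1/x) function of speed."""
    return DISTANCE_KM / speed_kmh


# Naive reasoning: take the average of the two extremes, then predict.
average_speed = sum(p * s for p, s in zip(PROBABILITIES, SPEEDS_KMH))
time_at_average_speed = travel_time_hours(average_speed)

# Correct reasoning: predict for each extreme, then average the outcomes.
expected_time = sum(
    p * travel_time_hours(s) for p, s in zip(PROBABILITIES, SPEEDS_KMH)
)

print(f"time at average speed: {time_at_average_speed:.2f} h")
print(f"expected travel time:  {expected_time:.2f} h")
```

In this toy case the naive estimate is about half of the true expected time, because the outcome depends nonlinearly on the uncertain quantity.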
We are moving our community to Slack. See you there!

Wednesday Apr 01, 2020

Activate deep learning neurons faster with Dynamic RELU (ep. 101)

In this episode I briefly explain the concept behind activation functions in deep learning. One of the most widely used activation functions is the rectified linear unit (ReLU). While there are several flavors of ReLU in the literature, in this episode I speak about a very interesting approach that keeps computational complexity low while improving performance quite consistently.
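As a rough sketch of the idea, the snippet below shows the standard ReLU, a leaky variant, and a toy "dynamic" variant. It assumes, based on my reading of the referenced paper, that a dynamic ReLU takes the maximum over a few linear pieces whose coefficients depend on the input. The function `toy_hyper_function` is a hand-written, hypothetical stand-in for the small learned network that would produce those coefficients; it is not the paper's implementation.

```python
import math


def relu(x):
    """Standard ReLU: passes positive values, zeroes out negative ones."""
    return max(0.0, x)


def leaky_relu(x, slope=0.01):
    """Leaky ReLU: a fixed small slope for negative inputs (0 < slope < 1)."""
    return max(x, slope * x)


def piecewise_linear(x, slopes, intercepts):
    """Maximum over K linear pieces a_k * x + b_k."""
    return max(a * x + b for a, b in zip(slopes, intercepts))


def toy_hyper_function(inputs):
    """Hypothetical stand-in for a learned hyper function.

    Summarises the whole input (here: its mean) and maps it to the
    coefficients of two linear pieces. With a zero mean it returns
    slopes [1, 0] and intercepts [0, 0], which is the plain ReLU.
    """
    context = math.tanh(sum(inputs) / len(inputs))  # squashed into (-1, 1)
    slopes = [1.0 + 0.5 * context, 0.25 * context]
    intercepts = [0.0, 0.0]
    return slopes, intercepts


def dynamic_relu(inputs):
    """Apply an activation whose shape adapts to the input as a whole."""
    slopes, intercepts = toy_hyper_function(inputs)
    return [piecewise_linear(x, slopes, intercepts) for x in inputs]


print([relu(x) for x in [-2.0, 0.5]])
print(dynamic_relu([-1.0, 1.0]))
print(dynamic_relu([-1.0, 3.0]))
```

The only extra cost over a static ReLU in this sketch is computing a handful of coefficients once per input, which is the intuition behind keeping the overhead low.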
This episode is supported by pryml.io. At pryml we let companies share confidential data. Visit our website.
Don't forget to join us on our Discord channel to propose new episodes or discuss the previous ones.
References
Dynamic ReLU https://arxiv.org/abs/2003.10027

Monday Mar 23, 2020

WARNING!! Neural networks can memorize secrets (ep. 100)

One of the best features of neural networks and machine learning models is their ability to memorize patterns from training data and apply them to unseen observations. That's where the magic is. However, there are scenarios in which the same machine learning models learn patterns so well that they can disclose some of the data they have been trained on. This phenomenon goes under the name of unintended memorization and it is extremely dangerous.
Think about a language generator that discloses the passwords, the credit card numbers and the social security numbers of the records it has been trained on. Or more generally, think about a synthetic data generator that can disclose the training data it is trying to protect.
In this episode I explain why unintended memorization is a real problem in machine learning. Except for differentially private training, there is no other way to mitigate such a problem in realistic conditions. At Pryml we are very aware of this, which is why we have been developing a synthetic data generation technology that is not affected by such an issue.
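One way to quantify unintended memorization is the exposure metric which, as I understand it, is the one used in the paper referenced below: insert a random "canary" secret into the training data, then check how highly the trained model ranks it among all candidate secrets. The sketch below assumes exposure is defined as log2 of the number of candidates minus log2 of the canary's rank. The candidate strings and their scores are made up for illustration and do not come from any real model.

```python
import math


def exposure(canary, log_perplexities):
    """Exposure of a canary given model scores for every candidate secret.

    log_perplexities maps each candidate to the model's log-perplexity
    (lower means the model finds the candidate more likely).
    Rank 1 is the most likely candidate.
    """
    ranked = sorted(log_perplexities, key=log_perplexities.get)
    rank = ranked.index(canary) + 1
    return math.log2(len(ranked)) - math.log2(rank)


# Hypothetical scores for four candidate secrets (illustrative values only).
memorizing_model = {"1234": 2.0, "5678": 9.1, "2468": 8.7, "1357": 9.4}
well_behaved_model = {"1234": 9.3, "5678": 9.1, "2468": 8.7, "1357": 9.2}

print(exposure("1234", memorizing_model))    # canary ranked first
print(exposure("1234", well_behaved_model))  # canary ranked last
```

A canary ranked first among all candidates gets the maximum exposure, while one ranked last gets zero, meaning the model reveals nothing special about it.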

This episode is supported by Harmonizely. Harmonizely lets you build your own unique scheduling page based on your availability so you can start scheduling meetings in just a couple of minutes. Get started by connecting your online calendar and configuring your meeting preferences. Then, start sharing your scheduling page with your invitees!

References
The Secret Sharer: Evaluating and Testing Unintended Memorization in Neural Networks https://www.usenix.org/conference/usenixsecurity19/presentation/carlini

Saturday Mar 14, 2020

Attacks to machine learning model: inferring ownership of training data (Ep. 99)

In this episode I explain a very effective technique that allows one to infer whether any record at hand belongs to the (private) training dataset used to train the target model. The effectiveness of such a technique is due to the fact that it works on black-box models, for which there is no access to the training data, the model parameters or the hyperparameters. Such a scenario is very realistic and typical of machine-learning-as-a-service APIs.
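To give a feel for the black-box setting, the sketch below swaps in a much simpler heuristic than the attack explained in the episode: it guesses that a record was in the training set when the model's confidence on it is unusually high. The target model here is a hypothetical, deliberately overfit toy (`make_toy_black_box`), and the threshold is an arbitrary assumption. The attacker only ever calls the prediction function.

```python
import math


def make_toy_black_box(training_records):
    """Build a hypothetical overfit model exposed only through predictions.

    The returned function gives a top-class confidence in (0.5, 1.0] that is
    highest on records the model has memorized (its training points).
    """
    def predict_confidence(record):
        nearest = min(math.dist(record, t) for t in training_records)
        return 0.5 + 0.5 * math.exp(-nearest)

    return predict_confidence


def infer_membership(predict_confidence, record, threshold=0.95):
    """Guess 'member' when the black box is very confident on the record."""
    return predict_confidence(record) >= threshold


# The attacker never sees this list, only the prediction function below.
private_training_set = [(0.0, 0.0), (1.0, 2.0), (3.0, 1.0)]
black_box = make_toy_black_box(private_training_set)

member_guess = infer_membership(black_box, (1.0, 2.0))
non_member_guess = infer_membership(black_box, (5.0, 5.0))

print(member_guess, non_member_guess)
```

Real models leak far less cleanly than this toy, but the interface is the same: queries in, confidence scores out.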
This episode is supported by pryml.io, a platform I am personally working on that enables data sharing without giving up confidentiality.

As promised, below is the schema of the attack explained in the episode.

References
Membership Inference Attacks Against Machine Learning Models

Sunday Mar 08, 2020

Don't be naive with data anonymization (Ep. 98)

Masking, obfuscating, stripping, shuffling. All of these techniques try to do one simple thing: keep the data private while sharing it with third parties. Unfortunately, they are not a silver bullet for confidentiality. All the players in the synthetic data space rely on simplistic techniques that are not secure, might not be compliant, and are risky for production. At pryml we do things differently.

Sunday Mar 01, 2020

Why sharing real data is dangerous (Ep. 97)

There are very good reasons why a financial institution should never share its data. Actually, it should never even move its data. Ever. In this episode I explain why.