Data Science at Home

Episodes

Tuesday Aug 28, 2018

Episode 45: why do machine learning models fail?

The success of a machine learning model depends on several factors and events. True generalization to data that the model has never seen before is more a chimera than a reality. But under specific conditions a well-trained machine learning model can generalize well and reach a testing accuracy similar to its training accuracy.
In this episode I explain when and why machine learning models fail when moving from training to testing datasets.
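To make the gap between training and testing accuracy concrete, here is a minimal sketch built on assumptions of our own: a toy one-dimensional task with 20% label noise and a 1-nearest-neighbour classifier, which simply memorizes its training set. Neither the task nor the model comes from the episode; they only illustrate one way a model can look perfect on training data and still fail on data it has never seen.

```python
import random

def make_data(n, noise, rng):
    """Hypothetical toy task: label is 1 when x > 0.5, with a fraction of labels flipped."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = 1 if x > 0.5 else 0
        if rng.random() < noise:
            y = 1 - y  # label noise: no model can learn this part
        data.append((x, y))
    return data

def predict_1nn(train, x):
    """1-nearest-neighbour: copy the label of the closest training point."""
    _, label = min(train, key=lambda point: abs(point[0] - x))
    return label

def accuracy(train, data):
    """Fraction of points in data that the 1-NN model labels correctly."""
    hits = sum(predict_1nn(train, x) == y for x, y in data)
    return hits / len(data)

rng = random.Random(0)  # fixed seed so the run is reproducible
train = make_data(200, noise=0.2, rng=rng)
test = make_data(200, noise=0.2, rng=rng)

train_acc = accuracy(train, train)  # each point is its own nearest neighbour
test_acc = accuracy(train, test)
print(f"train={train_acc:.2f} test={test_acc:.2f} gap={train_acc - test_acc:.2f}")
```

Because every training point is its own nearest neighbour, training accuracy is 1.0, while the memorized noisy labels pull test accuracy well below it.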

Tuesday Aug 21, 2018

Episode 44: The predictive power of metadata

In this episode I don't talk about data. In fact, I talk about metadata.
While many machine learning models rely on large amounts of data, e.g. text, images, audio and video, research has shown how powerful the signal carried by metadata is, that is, all the data that is invisible to the end user. Behind a tweet of 140 characters there are more than 140 fields of data that draw a much more detailed profile of the sender and the content she is producing... without ever considering the tweet itself.

References
You are your Metadata: Identification and Obfuscation of Social Media Users using Metadata Information https://www.ucl.ac.uk/~ucfamus/papers/icwsm18.pdf
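As a rough illustration of how metadata alone can single out a sender, the sketch below matches an unlabeled post to a user by comparing numeric account fields, without reading any text. The records, the field values and the nearest-profile rule with log scaling are hypothetical choices made for this example; this is not the method of the referenced paper.

```python
import math

# Hypothetical metadata records: (user, followers_count, friends_count, statuses_count).
# No tweet text is used anywhere.
history = [
    ("alice", 1200, 300, 5400),
    ("alice", 1210, 302, 5460),
    ("bob", 80, 900, 20000),
    ("bob", 82, 905, 20150),
]

def profiles(records):
    """Average each user's metadata vectors into a single profile."""
    sums, counts = {}, {}
    for user, *feats in records:
        acc = sums.setdefault(user, [0.0] * len(feats))
        for i, value in enumerate(feats):
            acc[i] += value
        counts[user] = counts.get(user, 0) + 1
    return {u: [v / counts[u] for v in s] for u, s in sums.items()}

def identify(profs, feats):
    """Return the user whose profile is closest in log-scaled Euclidean distance."""
    def dist(profile):
        return math.sqrt(sum((math.log1p(a) - math.log1p(b)) ** 2
                             for a, b in zip(profile, feats)))
    return min(profs, key=lambda u: dist(profs[u]))

profs = profiles(history)
# Metadata of an "anonymous" post: the text itself is never inspected.
print(identify(profs, [1215, 301, 5480]))  # → alice
```

The log scaling keeps large counters such as statuses_count from dominating the distance; it is a design choice for this toy, not a requirement.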

Tuesday Aug 14, 2018

Episode 43: Applied Text Analysis with Python (interview with Rebecca Bilbro)

Today’s episode is about text analysis with Python. Python is the de facto standard in machine learning: it has a large community and a generous choice of libraries, sometimes at the price of lower performance. Overall, it is a decent language for typical data science tasks.
I am with Rebecca Bilbro, co-author of Applied Text Analysis with Python with Benjamin Bengfort and Tony Ojeda.
We speak about the evolution of applied text analysis, tools and pipelines, and chatbots.
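A text analysis pipeline chains small stages, each feeding its output to the next. The following is a minimal standard-library sketch of that idea, assuming a tiny hand-written stopword list and a bag-of-words representation; it is not code from the book.

```python
import re
from collections import Counter

# Assumed, deliberately tiny stopword list for illustration.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "is", "in"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def remove_stopwords(tokens):
    """Drop very common words that carry little signal."""
    return [t for t in tokens if t not in STOPWORDS]

def vectorize(tokens):
    """Bag-of-words: map each remaining token to its frequency."""
    return Counter(tokens)

def pipeline(text, steps=(tokenize, remove_stopwords, vectorize)):
    """Apply each stage in order, feeding the output of one into the next."""
    out = text
    for step in steps:
        out = step(out)
    return out

bag = pipeline("The chatbot answers the questions of the users and the chatbot learns.")
print(bag)
```

Keeping each stage a plain function makes it easy to swap one out, for example replacing the bag-of-words step with a different vectorizer, without touching the rest.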

Tuesday Jul 31, 2018

Episode 41: How can deep neural networks reason?

Today’s episode will be about deep learning and reasoning. There has been a lot of discussion about the effectiveness of deep learning models and their capability to generalize, not only across domains but also on data that such models have never seen.
But a research group from the Department of Computer Science at Duke University seems to be onto something with deep learning and interpretability in computer vision.

References
Prediction Analysis Lab, Duke University https://users.cs.duke.edu/~cynthia/lab.html
This looks like that: deep learning for interpretable image recognition https://arxiv.org/abs/1806.10574
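The title of the second reference hints at the idea: explain a prediction by pointing at a stored prototype that a part of the image resembles. The sketch below is a heavily simplified, hypothetical version working on hand-made feature vectors instead of a trained network; the prototypes, their values and the log-ratio similarity (which grows as the squared distance shrinks) are assumptions chosen for illustration.

```python
import math

# Hypothetical learned prototypes: small feature vectors, each tied to a class and a part.
PROTOTYPES = {
    ("bird", "beak"): [0.9, 0.1, 0.0],
    ("bird", "wing"): [0.7, 0.6, 0.1],
    ("car", "wheel"): [0.0, 0.2, 0.9],
}

def similarity(patch, proto, eps=1e-4):
    """Large when the patch is close to the prototype, small when far."""
    d2 = sum((a - b) ** 2 for a, b in zip(patch, proto))
    return math.log((d2 + 1) / (d2 + eps))

def explain(patches):
    """Score each class by its best-matching prototype and report which part matched."""
    best = {}
    for (label, part), proto in PROTOTYPES.items():
        score = max(similarity(p, proto) for p in patches)
        if label not in best or score > best[label][0]:
            best[label] = (score, part)
    label = max(best, key=lambda k: best[k][0])
    return label, best[label][1]

image_patches = [[0.88, 0.12, 0.02], [0.3, 0.3, 0.3]]
label, part = explain(image_patches)
print(f"this looks like a {label} because this patch looks like that {part}")
```

The output names both the class and the prototype responsible for it, which is the kind of case-based explanation the title suggests.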

Thursday Jul 19, 2018

Episode 39: What are L1-norm and L2-norm?

In this episode I explain the differences between L1 and L2 regularization, which you can find in the function minimization of basically any machine learning model.
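One concrete way to see the difference is to apply a single regularization step to the same weight vector. The sketch below assumes the usual penalties λ·‖w‖₁ and (λ/2)·‖w‖₂² and uses their proximal updates with a unit step size: soft-thresholding for L1 and uniform shrinkage for L2. The weights and λ = 1.0 are arbitrary values for this example.

```python
def l1_norm(w):
    """Sum of absolute values."""
    return sum(abs(x) for x in w)

def l2_norm(w):
    """Euclidean length: square root of the sum of squares."""
    return sum(x * x for x in w) ** 0.5

def l1_step(w, lam):
    """Proximal step for lam * ||w||_1 (soft-thresholding): small weights become exactly 0."""
    return [x - lam if x > lam else x + lam if x < -lam else 0.0 for x in w]

def l2_step(w, lam):
    """Proximal step for (lam / 2) * ||w||_2^2: every weight shrinks toward 0, none reaches it."""
    return [x / (1 + lam) for x in w]

w = [3.0, -0.5, 0.2, -4.0]
print(l1_norm(w), l2_norm(w))
print(l1_step(w, 1.0))  # → [2.0, 0.0, 0.0, -3.0]
print(l2_step(w, 1.0))  # → [1.5, -0.25, 0.1, -2.0]
```

The L1 step zeroes the two small weights, which is why L1 regularization tends to produce sparse models, while the L2 step halves every weight and leaves none at zero.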

Tuesday Jul 17, 2018

Episode 38: Collective intelligence (Part 2)

In the second part of this episode I am interviewing Johannes Castner from CollectiWise, a platform for collective intelligence. I move the conversation towards the more practical aspects of the project, asking about the centralised AGI and blockchain components that are an essential part of the platform.

References
Opencog.org
Thaler, Richard H., Sunstein, Cass R. and Balz, John P. (April 2, 2010). "Choice Architecture". doi:10.2139/ssrn.1583509. SSRN 1583509
Teschner, F., Rothschild, D. & Gimpel, H. Group Decis Negot (2017) 26: 953. https://doi.org/10.1007/s10726-017-9531-0
Firas Khatib, Frank DiMaio, Foldit Contenders Group, Foldit Void Crushers Group, Seth Cooper, Maciej Kazmierczyk, Miroslaw Gilski, Szymon Krzywda, Helena Zabranska, Iva Pichova, James Thompson, Zoran Popović, Mariusz Jaskolski & David Baker, Crystal structure of a monomeric retroviral protease solved by protein folding game players, Nature Structural & Molecular Biology volume 18, pages 1175–1177 (2011)
Rosenthal, Franz; Dawood, Nessim Yosef David (1969). The Muqaddimah: an introduction to history; in three volumes. 1. Princeton University Press. ISBN 0-691-01754-9.
Kevin J. Boudreau and Karim R. Lakhani, Using the Crowd as an Innovation Partner, April 2013.
Sam Bowles, The Moral Economy: Why Good Incentives are No Substitute for Good Citizens.
Amartya K. Sen, Rational Fools: A Critique of the Behavioral Foundations of Economic Theory, Philosophy & Public Affairs, Vol. 6, No. 4 (Summer, 1977), pp. 317-344, Published by: Wiley, Stable URL: http://www.jstor.org/stable/2264946

Tuesday Jul 03, 2018

Episode 36: The dangers of machine learning and medicine

Humans seem to have reached a crossroads, where they are asked to choose between functionality and privacy. But not both. Not both at all. No data, no service. That’s what companies building personal finance services say. The same applies to marketing companies, social media companies, search engine companies, and healthcare institutions.
In this episode I speak about the reasons to aggregate data for precision medicine, the consequences of such strategies, and how researchers and organizations can provide services to individuals while respecting their privacy.

Friday Jun 29, 2018

Episode 35: Attacking deep learning models

Compromising AI for fun and profit

Deep learning models have shown very promising results in computer vision and sound recognition. As more and more deep learning-based systems are integrated into disparate domains, they will keep affecting people’s lives. Autonomous vehicles, medical imaging and banking applications, surveillance cameras and drones, and digital assistants are only a few real applications where deep learning plays a fundamental role. A malfunction in any of these applications will affect the quality of such integrated systems and compromise the security of the individuals who directly or indirectly use them.
In this episode, we explain how machine learning models can be attacked and what we can do to protect intelligent systems from being compromised.
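One well-known attack, chosen here for illustration and not necessarily the one discussed in the episode, is the fast gradient sign method: nudge every input feature by a small ε in the direction that increases the model's loss. The sketch below applies it to a hypothetical logistic regression model with hand-picked weights, where the input gradient of the binary cross-entropy loss has the closed form (p − y)·w.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, b, x):
    """Probability of class 1 under a logistic regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def sign(g):
    """Return -1, 0 or 1 according to the sign of g."""
    return (g > 0) - (g < 0)

def fgsm(w, b, x, y, eps):
    """Fast gradient sign method: move each input feature by eps in the
    direction that increases the binary cross-entropy loss for label y."""
    p = predict_proba(w, b, x)
    grad = [(p - y) * wi for wi in w]  # d(loss)/dx for logistic regression
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]

# Hypothetical trained weights and a correctly classified input of class 1.
w, b = [2.0, -1.0], 0.0
x, y = [0.5, 0.2], 1
x_adv = fgsm(w, b, x, y, eps=0.3)
print(predict_proba(w, b, x) > 0.5, predict_proba(w, b, x_adv) > 0.5)  # → True False
```

A perturbation of at most 0.3 per feature flips the prediction from class 1 to class 0. Against a deep network the input gradient would come from backpropagation instead of a closed form, but the update rule is the same.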

Friday Jun 22, 2018

Episode 34: Get ready for AI winter

Today I am having a conversation with Filip Piękniewski, a researcher working on computer vision and AI at Koh Young Research America. His adventure with AI started in the 90s, and since then a long list of experiences at the intersection of computer science and physics has led him to the conclusion that deep learning might be neither sufficient nor appropriate to solve the problem of intelligence, specifically artificial intelligence. I have read some of his publications and become familiar with some of his ideas. Honestly, I have been attracted by the fact that Filip does not buy the hype around AI, and deep learning in particular. He doesn’t seem to share the vision of folks like Elon Musk, who claimed that we are going to see an exponential improvement in self-driving cars among other things (he actually said that before a Tesla drove over a pedestrian).

Monday Jun 11, 2018

Episode 33: Decentralized Machine Learning and the proof-of-train

In an attempt to democratize machine learning, data scientists should be able to train their models on data they do not necessarily own, nor see. A privately trained model should be verified and uniquely identified across its entire life cycle, from its random initialization to the optimal values of its parameters. How does blockchain allow all this? Fitchain is the decentralized machine learning platform that provides models with an identity and a certification of their training procedure: the proof-of-train.
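To give an intuition for how a model could be uniquely identified across its life cycle, the sketch below chains SHA-256 hashes of successive parameter snapshots, so the final fingerprint commits to the whole training history. This is a hypothetical illustration of hash chaining only; it is not Fitchain's proof-of-train protocol, and it leaves out the blockchain itself and any verification of training on data the verifier cannot see.

```python
import hashlib
import json

def fingerprint(params, prev_hash):
    """Hash a parameter snapshot together with the previous fingerprint,
    so each entry commits to the whole history before it."""
    payload = json.dumps({"params": params, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def training_log(snapshots):
    """Chain fingerprints from the random initialization to the final weights."""
    chain, prev = [], "0" * 64  # genesis value: an arbitrary convention for this sketch
    for params in snapshots:
        prev = fingerprint(params, prev)
        chain.append(prev)
    return chain

def verify(snapshots, chain):
    """Recompute the chain; any altered snapshot changes every later hash."""
    return training_log(snapshots) == chain

# Hypothetical snapshots of a two-weight model during training.
snapshots = [{"w": [0.13, -0.42]}, {"w": [0.50, -0.10]}, {"w": [0.71, 0.05]}]
chain = training_log(snapshots)
print(chain[-1][:16], verify(snapshots, chain))

tampered = [snapshots[0], {"w": [0.99, -0.10]}, snapshots[2]]
print(verify(tampered, chain))  # → False
```

Publishing only the last fingerprint is enough to detect a rewritten history later, because changing any intermediate snapshot produces a different final hash.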