Data Science at Home
Episodes

Attacking machine learning for fun and profit (with the authors of SecML Ep. 80)
Tuesday Oct 01, 2019
Join the discussion on our Discord server
As machine learning plays an increasingly relevant role in many domains of everyday life, attacks against ML systems are becoming more and more common. In this episode we talk about the most popular attacks against machine learning systems and some of the mitigations designed by researchers Ambra Demontis and Marco Melis from the University of Cagliari (Italy). The guests are also among the authors of SecML, an open-source Python library for the security evaluation of Machine Learning (ML) algorithms. Both Ambra and Marco are members of the PRAlab research group, under the supervision of Prof. Fabio Roli.
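To make the kind of attack discussed in the episode concrete, here is a minimal sketch of an FGSM-style evasion attack on a toy logistic-regression classifier, written in plain NumPy. It is only an illustration of the idea and deliberately does not use the SecML API, which provides far more general and rigorous attack implementations.

```python
# Illustrative FGSM-style evasion attack on a toy logistic-regression model.
# This is NOT the SecML API: just a NumPy sketch of the idea that SecML
# implements in a far more general and rigorous way.
import numpy as np

rng = np.random.default_rng(0)

# Toy 2D dataset: two Gaussian blobs (class 0 around -1, class 1 around +1).
X = np.vstack([rng.normal(-1, 0.5, (50, 2)), rng.normal(+1, 0.5, (50, 2))])
y = np.hstack([np.zeros(50), np.ones(50)])

# Train a tiny logistic-regression classifier by gradient descent.
w, b = np.zeros(2), 0.0
for _ in range(500):
    p = 1 / (1 + np.exp(-(X @ w + b)))
    w -= 0.1 * X.T @ (p - y) / len(y)
    b -= 0.1 * np.mean(p - y)

def predict(x):
    return int((x @ w + b) > 0)

# Evasion: perturb a class-1 point against the gradient of its score, within
# an L-infinity budget eps. For a linear model the input gradient is simply w.
x0 = X[-1]      # a correctly classified point from class 1
eps = 1.5       # attacker's perturbation budget
x_adv = x0 - eps * np.sign(w)

print("clean prediction:      ", predict(x0))     # 1
print("adversarial prediction:", predict(x_adv))  # usually flips to 0 once eps exceeds
                                                  # the point's distance to the boundary
```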
SecML Contributors
Marco Melis (Ph.D. Student, Project Maintainer, https://www.linkedin.com/in/melismarco/)
Ambra Demontis (Postdoc, https://pralab.diee.unica.it/it/AmbraDemontis)
Maura Pintor (Ph.D. Student, https://it.linkedin.com/in/maura-pintor)
Battista Biggio (Assistant Professor, https://pralab.diee.unica.it/it/BattistaBiggio)
References
SecML: an open-source Python library for the security evaluation of Machine Learning (ML) algorithms https://secml.gitlab.io/.
A. Demontis et al., “Why Do Adversarial Attacks Transfer? Explaining Transferability of Evasion and Poisoning Attacks,” presented at the 28th USENIX Security Symposium (USENIX Security 19), 2019, pp. 321–338. https://www.usenix.org/conference/usenixsecurity19/presentation/demontis
P. W. Koh and P. Liang, “Understanding Black-box Predictions via Influence Functions,” in International Conference on Machine Learning (ICML), 2017. https://arxiv.org/abs/1703.04730
M. Melis, A. Demontis, B. Biggio, G. Brown, G. Fumera, and F. Roli, “Is Deep Learning Safe for Robot Vision? Adversarial Examples Against the iCub Humanoid,” in 2017 IEEE International Conference on Computer Vision Workshops (ICCVW), 2017, pp. 751–759. https://arxiv.org/abs/1708.06939
B. Biggio and F. Roli, “Wild Patterns: Ten Years After the Rise of Adversarial Machine Learning,” Pattern Recognition, vol. 84, pp. 317–331, 2018. https://arxiv.org/abs/1712.03141
B. Biggio et al., “Evasion attacks against machine learning at test time,” in Machine Learning and Knowledge Discovery in Databases (ECML PKDD), Part III, 2013, vol. 8190, pp. 387–402. https://arxiv.org/abs/1708.06131
B. Biggio, B. Nelson, and P. Laskov, “Poisoning attacks against support vector machines,” in 29th Int’l Conf. on Machine Learning (ICML), 2012, pp. 1807–1814. https://arxiv.org/abs/1206.6389
N. Dalvi, P. Domingos, Mausam, S. Sanghai, and D. Verma, “Adversarial classification,” in Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), Seattle, 2004, pp. 99–108. https://dl.acm.org/citation.cfm?id=1014066
M. Sundararajan, A. Taly, and Q. Yan, “Axiomatic attribution for deep networks,” in Proceedings of the 34th International Conference on Machine Learning (ICML), 2017. https://arxiv.org/abs/1703.01365
M. T. Ribeiro, S. Singh, and C. Guestrin, “Model-agnostic interpretability of machine learning,” arXiv preprint arXiv:1606.05386, 2016. https://arxiv.org/abs/1606.05386
W. Guo et al., “LEMNA: Explaining deep learning based security applications,” in Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security (CCS), 2018. https://dl.acm.org/citation.cfm?id=3243792
S. Bach et al., “On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation,” PLoS ONE, vol. 10, no. 7, p. e0130140, 2015. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0130140
[RB] How to scale AI in your organisation (Ep. 79)
Thursday Sep 26, 2019
Join the discussion on our Discord server
Scaling technology and scaling business processes are not the same thing. Since the early days of enterprise technology, scaling software inside large organisations has been difficult to get right. When it comes to Artificial Intelligence and Machine Learning, it becomes vastly more complicated.
In this episode I propose a framework - in five pillars - for the business side of artificial intelligence.

Monday Sep 23, 2019
Join the discussion on our Discord server
In this episode, I am with Aaron Gokaslan, computer vision researcher and AI Resident at Facebook AI Research. Aaron is an author of OpenGPT-2, an open replication of the much-discussed NLP model that OpenAI decided not to release because it was considered too powerful to be published.
We discuss image-to-image translation, the dangers of the GPT-2 model, and the future of AI. Moreover, Aaron provides some very interesting links and demos that will blow your mind!
Enjoy the show!
References
Multimodal image-to-image translation (not all mentioned in the podcast, but recommended by Aaron)
Pix2Pix: https://phillipi.github.io/pix2pix/
CycleGAN: https://junyanz.github.io/CycleGAN/
GANimorph: Paper: https://arxiv.org/abs/1808.04325, Code: https://github.com/brownvc/ganimorph
UNIT: https://arxiv.org/abs/1703.00848
MUNIT: https://github.com/NVlabs/MUNIT
DRIT: https://github.com/HsinYingLee/DRIT
GPT-2 and related
Try OpenAI's GPT-2: https://talktotransformer.com/
Blogpost: https://blog.usejournal.com/opengpt-2-we-replicated-gpt-2-because-you-can-too-45e34e6d36dc
The Original Transformer Paper: https://arxiv.org/abs/1706.03762
Grover: The FakeNews generator and detector: https://rowanzellers.com/grover/
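If you want to play with GPT-2 locally rather than through the demo site above, here is a minimal sketch that samples text from the publicly released GPT-2 weights using the Hugging Face transformers library (assumed installed together with PyTorch). This is not OpenGPT-2 itself, just a quick way to experiment with the same kind of model.

```python
# Minimal sketch: sample a continuation from the publicly released GPT-2
# weights with the Hugging Face `transformers` library (requires PyTorch).
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = "Machine learning models can be attacked by"
input_ids = tokenizer.encode(prompt, return_tensors="pt")

# Top-k sampling keeps the continuation varied rather than repetitive.
output = model.generate(
    input_ids,
    max_length=60,
    do_sample=True,
    top_k=50,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```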
Training neural networks faster without GPU [RB] (Ep. 77)
Tuesday Sep 17, 2019
Join the discussion on our Discord server
Training neural networks faster usually involves powerful GPUs. In this episode I explain an interesting method from a group of researchers at Google Brain who train neural networks faster by tailoring the training pipeline to the hardware and reusing data that is already in memory, so the accelerator is never left idle.
Enjoy the show!
References
Faster Neural Network Training with Data Echoing https://arxiv.org/abs/1907.05550
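To make the idea concrete, here is a minimal sketch of data echoing with a slow batch source and a simple SGD step as stand-ins for a real training job. It only illustrates the technique described in the paper above, not the authors' code.

```python
# Minimal sketch of data echoing: when the input pipeline (I/O, decoding,
# augmentation) is the bottleneck, each batch it produces is reused ("echoed")
# several times before the next one is fetched, keeping the accelerator busy.
import numpy as np

rng = np.random.default_rng(0)

def slow_batch_source(num_batches, batch_size=32, dim=10):
    """Pretend-expensive pipeline that yields (features, labels) batches."""
    for _ in range(num_batches):
        X = rng.normal(size=(batch_size, dim))
        y = (X.sum(axis=1) > 0).astype(float)
        yield X, y

def train_step(w, X, y, lr=0.05):
    """One SGD step of logistic regression, standing in for a real model update."""
    p = 1 / (1 + np.exp(-X @ w))
    return w - lr * X.T @ (p - y) / len(y)

echo_factor = 3          # optimizer steps to run per fetched batch
w = np.zeros(10)

for X, y in slow_batch_source(num_batches=100):
    for _ in range(echo_factor):   # echo: reuse the batch before fetching again
        w = train_step(w, X, y)

print("trained weights:", np.round(w, 2))
```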

How to generate very large images with GANs (Ep. 76)
Friday Sep 06, 2019
Join the discussion on our Discord server
In this episode I explain how a research group from the University of Lübeck tamed the curse of dimensionality to generate large medical images with GANs. The problem is not as trivial as it seems: many researchers have failed at generating large images with GANs before. One interesting application of this approach is in medicine, for the generation of CT and X-ray images.
Enjoy the show!
References
Multi-scale GANs for Memory-efficient Generation of High Resolution Medical Images https://arxiv.org/abs/1907.01376
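As a rough illustration of the multi-scale idea (and explicitly not the architecture from the paper above), the PyTorch sketch below generates a coarse low-resolution image first and then refines it patch by patch, so the full high-resolution image never needs to sit in accelerator memory at once. All module shapes here are invented for illustration only.

```python
# Illustrative multi-scale, patch-wise generation (NOT the paper's architecture).
# A coarse generator produces a small image; a refinement network then upsamples
# it one patch at a time, assembling the full high-resolution image on the host.
import torch
import torch.nn as nn

class CoarseGenerator(nn.Module):
    """Noise vector -> 1x32x32 low-resolution image."""
    def __init__(self, z_dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(z_dim, 32 * 32), nn.Tanh())
    def forward(self, z):
        return self.net(z).view(-1, 1, 32, 32)

class PatchRefiner(nn.Module):
    """Upsamples an 8x8 coarse patch to a 32x32 high-resolution patch."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1), nn.Tanh(),
        )
    def forward(self, patch):
        return self.net(patch)

coarse_gen, refiner = CoarseGenerator(), PatchRefiner()

with torch.no_grad():
    z = torch.randn(1, 64)
    coarse = coarse_gen(z)                  # 1 x 1 x 32 x 32 coarse image
    full = torch.zeros(1, 1, 128, 128)      # assembled high-resolution image
    for i in range(0, 32, 8):               # iterate over 8x8 coarse patches
        for j in range(0, 32, 8):
            patch = coarse[:, :, i:i + 8, j:j + 8]
            full[:, :, 4 * i:4 * i + 32, 4 * j:4 * j + 32] = refiner(patch)

print("generated image shape:", tuple(full.shape))   # (1, 1, 128, 128)
```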
[RB] Complex video analysis made easy with Videoflow (Ep. 75)
Thursday Aug 29, 2019
In this episode I am with Jadiel de Armas, senior software engineer at Disney and author of Videoflow, a Python framework that facilitates the quick development of complex video analysis applications and other series-processing based applications in a multiprocessing environment.
I have inspected the Videoflow repo on GitHub and some of the capabilities of this framework, and I must say that it’s really interesting. Jadiel is going to tell us a lot more than what you can read on GitHub.
References
Videoflow official GitHub repository https://github.com/videoflow/videoflow
[RB] Validate neural networks without data with Dr. Charles Martin (Ep. 74)
Tuesday Aug 27, 2019
In this episode, I am with Dr. Charles Martin from Calculation Consulting, a machine learning and data science consulting company based in San Francisco. We speak about the nuts and bolts of deep neural networks and some impressive findings about the way they work.
The questions that Charles answers in the show are essentially two:
Why is regularisation in deep learning seemingly quite different than regularisation in other areas of ML?
How can we understand DNNs in a theoretically principled way?
References
The WeightWatcher tool for predicting the accuracy of Deep Neural Networks https://github.com/CalculatedContent/WeightWatcher
Slack channel https://weightwatcherai.slack.com/
Dr. Charles Martin Blog http://calculatedcontent.com and channel https://www.youtube.com/c/calculationconsulting
C. H. Martin and M. W. Mahoney, “Implicit Self-Regularization in Deep Neural Networks: Evidence from Random Matrix Theory and Implications for Learning.” https://arxiv.org/abs/1810.01075
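For readers who want to try the tool, here is a minimal sketch, assuming the weightwatcher package is installed and exposes the WeightWatcher/analyze interface described in the repository README (check the repo above for the current API). The ResNet-18 model is just a stand-in for any trained Keras or PyTorch network.

```python
# Minimal sketch of analyzing a trained network without any test data, assuming
# the `weightwatcher` package exposes WeightWatcher(model=...).analyze() as
# described in its README. The pretrained ResNet-18 is only an example model.
import weightwatcher as ww
import torchvision.models as models

model = models.resnet18(pretrained=True)   # any trained PyTorch (or Keras) model

watcher = ww.WeightWatcher(model=model)
details = watcher.analyze()                # per-layer spectral metrics (e.g. power-law alpha)
summary = watcher.get_summary(details)     # aggregate quality indicators for the whole model

print(summary)
```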

How to cluster tabular data with Markov Clustering (Ep. 73)
Wednesday Aug 21, 2019
In this episode I explain how a community detection algorithm known as Markov Clustering can be constructed by combining simple concepts like random walks, graphs, and similarity matrices. Moreover, I highlight how one can build a similarity graph from tabular data and then run a community detection algorithm on that graph to find clusters.
You can find a simple hands-on code snippet to play with on the Amethix Blog.
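For readers who want to experiment right away, here is a minimal NumPy re-implementation of the Markov Clustering (MCL) iteration on a similarity graph built from toy tabular data. It follows the textbook expansion/inflation scheme rather than any particular library, so treat it as an illustration of the idea discussed in the episode.

```python
# Minimal NumPy sketch of Markov Clustering (MCL) on a similarity matrix
# built from tabular data: column-normalize, then alternate expansion
# (matrix power) and inflation (element-wise power + renormalization).
import numpy as np

def mcl(similarity, expansion=2, inflation=2.0, iterations=100):
    """Run MCL on a symmetric, non-negative similarity matrix."""
    M = similarity + np.eye(len(similarity))      # self-loops stabilize the walk
    M = M / M.sum(axis=0, keepdims=True)          # column-normalize: random-walk matrix
    for _ in range(iterations):
        M = np.linalg.matrix_power(M, expansion)  # expansion: let the walk flow
        M = M ** inflation                        # inflation: strengthen strong edges
        M = M / M.sum(axis=0, keepdims=True)      # renormalize columns
    # Attractors are nodes with mass on the diagonal; each attractor's
    # non-zero row entries form one cluster.
    attractors = [i for i in range(len(M)) if M[i, i] > 1e-6]
    clusters = {tuple(np.nonzero(M[i] > 1e-6)[0]) for i in attractors}
    return [sorted(c) for c in clusters]

# Toy tabular data: two well-separated groups of points.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (5, 2)), rng.normal(5, 0.3, (5, 2))])

# Similarity graph via a Gaussian (RBF) kernel on pairwise distances.
dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
similarity = np.exp(-dists ** 2)

for cluster in mcl(similarity):
    print("cluster:", cluster)
```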
Enjoy the show!
References
[1] S. Fortunato, “Community detection in graphs”, Physics Reports, volume 486, issues 3-5, pages 75-174, February 2010.
[2] Z. Yang, et al., “A Comparative Analysis of Community Detection Algorithms on Artificial Networks”, Scientific Reports volume 6, Article number: 30750 (2016)
[3] S. van Dongen, “A cluster algorithm for graphs”, Technical Report, CWI (Centre for Mathematics and Computer Science), Amsterdam, The Netherlands, 2000.
[4] A. J. Enright, et al., “An efficient algorithm for large-scale detection of protein families”, Nucleic Acids Research, volume 30, issue 7, pages 1575-1584, 2002.

Training neural networks faster without GPU (Ep. 71)
Tuesday Aug 06, 2019
Training neural networks faster usually involves powerful GPUs. In this episode I explain an interesting method from a group of researchers at Google Brain who train neural networks faster by tailoring the training pipeline to the hardware and reusing data that is already in memory, so the accelerator is never left idle.
Enjoy the show!
References
Faster Neural Network Training with Data Echoing https://arxiv.org/abs/1907.05550

Validate neural networks without data with Dr. Charles Martin (Ep. 70)
Tuesday Jul 23, 2019
In this episode, I am with Dr. Charles Martin from Calculation Consulting, a machine learning and data science consulting company based in San Francisco. We speak about the nuts and bolts of deep neural networks and some impressive findings about the way they work.
The questions that Charles answers in the show are essentially two:
Why is regularisation in deep learning seemingly quite different than regularisation in other areas of ML?
How can we understand DNNs in a theoretically principled way?
References
The WeightWatcher tool for predicting the accuracy of Deep Neural Networks https://github.com/CalculatedContent/WeightWatcher
Slack channel https://weightwatcherai.slack.com/
Dr. Charles Martin Blog http://calculatedcontent.com and channel https://www.youtube.com/c/calculationconsulting
C. H. Martin and M. W. Mahoney, “Implicit Self-Regularization in Deep Neural Networks: Evidence from Random Matrix Theory and Implications for Learning.” https://arxiv.org/abs/1810.01075