Data Science at Home
Episodes

Friday Sep 24, 2021
Speaking about data with Mikkel Settnes from Dreamdata.io (Ep. 170)
In this episode Mikkel and Francesco have a really interesting conversation about some key differences between how large and small organizations approach machine learning. Listen to the episode to learn more.
References
https://dreamdata.io/b2b-attribution
https://dreamdata.io/services
https://www.nature.com/articles/s43586-020-00001-2

Tuesday Sep 14, 2021
Send compute to data with POSH data-aware shell (Ep. 169)
Our Sponsors
Quantum Metric
Stay off the naughty list this holiday season by reducing customer friction, increasing conversions, and personalizing the shopping experience. Want a sneak peek? Visit us at quantummetric.com/podoffer and see if you qualify to receive our “12 Days of Insights” offer with code DATASCIENCE. This offer gives you 12-day access to our platform coupled with a bespoke insight report that will help you identify where customers are struggling or engaging in your digital product.
Amethix Technologies
Amethix uses advanced Artificial Intelligence and Machine Learning to build data platforms and predictive engines in domains like finance, healthcare, pharmaceuticals, logistics, and energy. Amethix provides solutions to collect and secure data with higher transparency and disintermediation, and builds the statistical models that will support your business.
References
Paper https://deeptir.me/papers/posh-atc20.pdf
Code https://github.com/deeptir18/posh

Tuesday Aug 24, 2021
CSV sucks. Here is why. (Ep. 166)
It's time we got serious about replacing the CSV format with something that, guess what, has been around for a long time.
In this episode I explain the good parts of CSV files and the not-so-good ones. It's time we evolved to something better.
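Two commonly cited drawbacks of CSV (not necessarily the ones discussed in the episode) are easy to see with Python's standard `csv` module: the file carries no types, so every value comes back as a string, and a missing value (`None`) cannot be told apart from an empty string. A minimal sketch; `csv_roundtrip` is a hypothetical helper name.

```python
import csv
import io
from typing import Dict, List


def csv_roundtrip(rows: List[dict], fieldnames: List[str]) -> List[Dict[str, str]]:
    """Hypothetical helper: write rows to an in-memory CSV and read them back."""
    buf = io.StringIO(newline="")  # newline="" as the csv module docs recommend
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)  # None is written as an empty field; other values via str()
    buf.seek(0)
    return [dict(r) for r in csv.DictReader(buf)]


rows = [
    {"id": 1, "price": 9.99, "active": True, "note": None},
    {"id": 2, "price": 0.5, "active": False, "note": ""},
]
back = csv_roundtrip(rows, ["id", "price", "active", "note"])
print(back[0])  # every value is now a str, and the None has become ""
```

Formats that store a schema and an explicit null representation alongside the data avoid both problems.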
Our Sponsors
Quantum Metric
Stay off the naughty list this holiday season by reducing customer friction, increasing conversions, and personalizing the shopping experience. Want a sneak peek? Visit us at quantummetric.com/podoffer and see if you qualify to receive our “12 Days of Insights” offer with code DATASCIENCE. This offer gives you 12-day access to our platform coupled with a bespoke insight report that will help you identify where customers are struggling or engaging in your digital product.
Amethix Technologies
Amethix uses advanced Artificial Intelligence and Machine Learning to build data platforms and predictive engines in domains like finance, healthcare, pharmaceuticals, logistics, and energy. Amethix provides solutions to collect and secure data with higher transparency and disintermediation, and builds the statistical models that will support your business.

Thursday Jul 08, 2021
Apache Arrow, Ballista and Big Data in Rust with Andy Grove RB (Ep. 160)
Do you want to know the latest in big data analytics frameworks? Have you ever heard of Apache Arrow? Rust? Ballista? In this episode I speak with Andy Grove, one of the main authors of Apache Arrow and the Ballista compute engine. Andy explains some challenges he faced while designing the Arrow and Ballista memory models and describes some amazing solutions.
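Arrow's memory model is columnar: each column lives in contiguous buffers instead of being stored row by row. The sketch below hand-encodes a nullable 32-bit integer column roughly the way the Arrow columnar format describes it, as a validity bitmap (one bit per slot, least-significant bit first) plus a fixed-width little-endian values buffer. It is a stdlib-only illustration under simplifying assumptions (no buffer padding or alignment, no offsets), not pyarrow's or Ballista's API; `encode_int32_column` and `read_slot` are hypothetical names.

```python
import struct
from typing import List, Optional, Tuple


def encode_int32_column(values: List[Optional[int]]) -> Tuple[bytes, bytes]:
    """Hypothetical helper: encode a nullable int32 column as
    (validity bitmap, values buffer). Padding/alignment is omitted."""
    bitmap = bytearray((len(values) + 7) // 8)  # one validity bit per slot
    data = bytearray()
    for i, v in enumerate(values):
        if v is None:
            data += struct.pack("<i", 0)  # a null slot still occupies 4 bytes
        else:
            bitmap[i // 8] |= 1 << (i % 8)  # set validity bit, LSB-first numbering
            data += struct.pack("<i", v)
    return bytes(bitmap), bytes(data)


def read_slot(bitmap: bytes, data: bytes, i: int) -> Optional[int]:
    """Hypothetical helper: return slot i, or None when its validity bit is 0."""
    if not (bitmap[i // 8] >> (i % 8)) & 1:
        return None
    (value,) = struct.unpack_from("<i", data, 4 * i)  # fixed width: slot i starts at byte 4*i
    return value


bitmap, data = encode_int32_column([1, None, 3])
print(bitmap, len(data), [read_slot(bitmap, data, i) for i in range(3)])
# prints: b'\x05' 12 [1, None, 3]
```

Because every slot has the same width, slot `i` is located by arithmetic alone, and a scan over the column walks one contiguous buffer; that locality is a large part of why columnar layouts suit analytics.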
Our Sponsors
If building software is your passion, you’ll love the ThoughtWorks Technology Podcast. It’s a podcast for techies by techies. Their team of experienced technologists takes a deep dive into a tech topic that’s piqued their interest: it could be how machine learning is being used in astrophysics or maybe how to succeed at continuous delivery.
Amethix uses advanced Artificial Intelligence and Machine Learning to build data platforms and predictive engines in domains like finance, healthcare, pharmaceuticals, logistics, and energy. Amethix provides solutions to collect and secure data with higher transparency and disintermediation, and builds the statistical models that will support your business.
References
https://arrow.apache.org/
https://ballistacompute.org/
https://github.com/ballista-compute/ballista

Friday Jun 04, 2021
True Machine Intelligence just like the human brain (Ep. 155)
In this episode I have a really interesting conversation with Karan Grewal, a member of the research staff at Numenta, where he investigates how biological principles of intelligence can be translated into silicon. We speak about the Thousand Brains Theory and why neural networks forget.
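Forgetting in neural networks is commonly explained by shared weights being overwritten when the model is trained on a new task. A deliberately tiny toy makes that mechanism visible: a one-weight linear model trained with plain stochastic gradient descent on task A (y = 2x) and then on task B (y = -x). This is an illustration of weight interference only, not Numenta's model or method; `train` and `mse` are hypothetical helper names.

```python
from typing import List, Tuple

Data = List[Tuple[float, float]]


def train(w: float, data: Data, lr: float = 0.1, epochs: int = 200) -> float:
    """Plain SGD on the squared error (w*x - y)**2 for a one-weight model."""
    for _ in range(epochs):
        for x, y in data:
            grad = 2.0 * (w * x - y) * x  # derivative of (w*x - y)**2 with respect to w
            w -= lr * grad
    return w


def mse(w: float, data: Data) -> float:
    """Mean squared error of the model y_hat = w * x on data."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


task_a = [(x, 2.0 * x) for x in (0.5, 1.0, 1.5)]   # task A: y = 2x
task_b = [(x, -1.0 * x) for x in (0.5, 1.0, 1.5)]  # task B: y = -x

w_after_a = train(0.0, task_a)
err_a_before = mse(w_after_a, task_a)   # near zero: task A has been learned
w_after_b = train(w_after_a, task_b)    # same weight is reused for task B
err_a_after = mse(w_after_b, task_a)    # task A performance collapses
print(f"error on A before B: {err_a_before:.3g}, after B: {err_a_after:.3g}")
```

Real networks have many weights, so the damage is partial rather than total, but the cause sketched here (one set of parameters pulled toward a new objective) is the same kind of interference.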
References
Main paper on the Thousand Brains Theory: https://www.frontiersin.org/articles/10.3389/fncir.2018.00121/full
Blog post on Thousand Brains Theory: https://numenta.com/blog/2019/01/16/the-thousand-brains-theory-of-intelligence/
GLOM paper by Geoff Hinton: https://arxiv.org/pdf/2102.12627.pdf
Blog post on why neural networks forget: https://numenta.com/blog/2021/02/04/why-neural-networks-forget-and-lessons-from-the-brain

Thursday Apr 08, 2021
Polars: the fastest dataframe crate in Rust - with Ritchie Vink (Ep. 146)
In this episode I speak with Ritchie Vink, the author of Polars, a crate that is the fastest dataframe library at the time of recording :) If you want to contribute to an amazing open-source Rust project, this is your chance: the official repository is in the references.
References
https://github.com/ritchie46/polars

Friday Mar 26, 2021
Apache Arrow, Ballista and Big Data in Rust with Andy Grove (Ep. 145)
Do you want to know the latest in big data analytics frameworks? Have you ever heard of Apache Arrow? Rust? Ballista? In this episode I speak with Andy Grove, one of the main authors of Apache Arrow and the Ballista compute engine. Andy explains some challenges he faced while designing the Arrow and Ballista memory models and describes some amazing solutions.
Our Sponsors
This episode is supported by Chapman’s Schmid College of Science and Technology, where master’s and PhD students join in cutting-edge research as they prepare to take the next big leap in their professional journey. To learn more about the innovative tools and collaborative approach that distinguish the Chapman program in Computational and Data Sciences, visit chapman.edu/datascience
If building software is your passion, you’ll love the ThoughtWorks Technology Podcast. It’s a podcast for techies by techies. Their team of experienced technologists takes a deep dive into a tech topic that’s piqued their interest: it could be how machine learning is being used in astrophysics or maybe how to succeed at continuous delivery.
References
https://arrow.apache.org/
https://ballistacompute.org/
https://github.com/ballista-compute/ballista

Friday Mar 19, 2021
Pandas vs Rust (Ep. 144)
Pandas is the de facto standard for data loading and manipulation. Python is the de facto programming language for such operations. Rust is the underdog. Or is it? In this episode I show you why that is no longer the case.
Our Sponsors
This episode is supported by Chapman’s Schmid College of Science and Technology, where master’s and PhD students join in cutting-edge research as they prepare to take the next big leap in their professional journey. To learn more about the innovative tools and collaborative approach that distinguish the Chapman program in Computational and Data Sciences, visit chapman.edu/datascience
Amethix uses advanced Artificial Intelligence and Machine Learning to build data platforms and predictive engines in domains like finance, healthcare, pharmaceuticals, logistics, and energy. Amethix provides solutions to collect and secure data with higher transparency and disintermediation, and builds the statistical models that will support your business.
Useful Links
https://github.com/haixuanTao/Data-Manipulation-Rust-Pandas
https://github.com/ritchie46/polars
https://github.com/rust-ndarray/ndarray

Saturday Mar 13, 2021
Concurrent is not parallel - Part 2 (Ep. 143)
In plain English, concurrent and parallel are synonyms. Not for a CPU. And definitely not for programmers. In this episode I summarize the ways to parallelize on different architectures and operating systems.
Rock-star data scientists must know how concurrency works and when to use it IMHO.
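In CPython, threads are a good fit for overlapping waits (network, disk, `time.sleep`) because blocking calls release the global interpreter lock, while CPU-bound pure-Python code generally needs separate processes (for example the `multiprocessing` module) to run in parallel. A minimal sketch of the first case with `concurrent.futures.ThreadPoolExecutor`, using `time.sleep` as a stand-in for I/O; `fake_io`, `run_sequential` and `run_threaded` are hypothetical names.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def fake_io(seconds: float) -> float:
    """Stand-in for a blocking I/O call; time.sleep releases the GIL while waiting."""
    time.sleep(seconds)
    return seconds


def run_sequential(delays: List[float]) -> Tuple[List[float], float]:
    start = time.perf_counter()
    results = [fake_io(d) for d in delays]  # each wait starts after the previous one ends
    return results, time.perf_counter() - start


def run_threaded(delays: List[float]) -> Tuple[List[float], float]:
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(delays)) as pool:
        results = list(pool.map(fake_io, delays))  # waits overlap; map keeps input order
    return results, time.perf_counter() - start


delays = [0.2, 0.2, 0.2]
seq_results, seq_time = run_sequential(delays)
thr_results, thr_time = run_threaded(delays)
print(f"sequential: {seq_time:.2f}s, threaded: {thr_time:.2f}s")
```

The threaded run finishes in roughly the time of one wait, not three. Swapping the sleep for a pure-Python CPU-bound loop would remove most of that gain, which is where processes come in.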
Our Sponsors
This episode is supported by Chapman’s Schmid College of Science and Technology, where master’s and PhD students join in cutting-edge research as they prepare to take the next big leap in their professional journey. To learn more about the innovative tools and collaborative approach that distinguish the Chapman program in Computational and Data Sciences, visit chapman.edu/datascience
Amethix uses advanced Artificial Intelligence and Machine Learning to build data platforms and predictive engines in domains like finance, healthcare, pharmaceuticals, logistics, and energy. Amethix provides solutions to collect and secure data with higher transparency and disintermediation, and builds the statistical models that will support your business.
Useful Links
http://web.mit.edu/6.005/www/fa14/classes/17-concurrency/
https://doc.rust-lang.org/book/ch16-00-concurrency.html
https://urban-institute.medium.com/using-multiprocessing-to-make-python-code-faster-23ea5ef996ba

Wednesday Mar 10, 2021
Concurrent is not parallel - Part 1 (Ep. 142)
In plain English, concurrent and parallel are synonyms. Not for a CPU. And definitely not for programmers. In this episode I summarize the ways to parallelize on different architectures and operating systems. Rock-star data scientists must know how concurrency works and when to use it IMHO.
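The distinction can be shown with Python's standard library. In this minimal sketch two coroutines interleave on a single thread under `asyncio`: they are concurrent (their executions overlap in time) but never parallel (no two steps run at the same instant). The names `worker` and `main` are illustrative, not from the episode.

```python
import asyncio
import threading
from typing import List, Set, Tuple


async def worker(name: str, steps: int, log: List[str], threads: Set[int]) -> None:
    for i in range(steps):
        log.append(f"{name}{i}")
        threads.add(threading.get_ident())  # record which OS thread ran this step
        await asyncio.sleep(0)  # yield control back to the event loop


async def main() -> Tuple[List[str], Set[int]]:
    log: List[str] = []
    threads: Set[int] = set()
    # Both coroutines make progress by taking turns on ONE thread.
    await asyncio.gather(worker("a", 3, log, threads), worker("b", 3, log, threads))
    return log, threads


log, threads = asyncio.run(main())
print(log, len(threads))
# prints: ['a0', 'b0', 'a1', 'b1', 'a2', 'b2'] 1
```

The interleaved log shows concurrency; the single thread id shows there was no parallelism. Getting both requires multiple cores doing work at once, for example through separate processes.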
Our Sponsors
This episode is supported by Chapman’s Schmid College of Science and Technology, where master’s and PhD students join in cutting-edge research as they prepare to take the next big leap in their professional journey. To learn more about the innovative tools and collaborative approach that distinguish the Chapman program in Computational and Data Sciences, visit chapman.edu/datascience
Amethix uses advanced Artificial Intelligence and Machine Learning to build data platforms and predictive engines in domains like finance, healthcare, pharmaceuticals, logistics, and energy. Amethix provides solutions to collect and secure data with higher transparency and disintermediation, and builds the statistical models that will support your business.