computer science | Page 8 | Data Science at Home

Episodes

Sunday Jul 26, 2020

GPT-3 cannot code (and never will) (Ep. 114)

The hype around GPT-3 is alarming and paints an awful picture of how badly people misunderstand artificial intelligence. In response to comments claiming that GPT-3 will take developers' jobs, in this episode I share some personal opinions about the state of AI in generating source code (and GPT-3 in particular).

If you have comments about this episode or just want to chat, come join us on the official Discord channel.

This episode is supported by Amethix Technologies.
Amethix works to create and maximize the impact of the world’s leading corporations, startups, and nonprofits, so they can create a better future for everyone they serve. They are a consulting firm focused on data science, machine learning, and artificial intelligence.

Sunday Jul 19, 2020

What data transformation library should I use? Pandas vs Dask vs Ray vs Modin vs Rapids (Ep. 112)

In this episode I speak about the data transformation frameworks available to data scientists who write Python code. The usual suspect is clearly Pandas, the most widely used library and the de facto standard. However, when data volumes increase and distributed algorithms come into play (following a map-reduce paradigm of computation), Pandas no longer performs as expected. Other frameworks play a role in such contexts.
In this episode I explain the frameworks that are the best equivalents to Pandas in big data contexts.
Don't forget to join our Discord channel to comment on previous episodes or propose new ones.
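
As a rough illustration of the map-reduce paradigm mentioned above, here is a minimal sketch in plain Python. The function names (`partition`, `map_partial_sums`, `reduce_sums`, `grouped_sum`) and the sample rows are made up for this example. It is not how any of the frameworks discussed in the episode are implemented; it only shows the general idea of splitting a grouped sum into independent partial results that are merged afterwards.

```python
from functools import reduce


def partition(rows, n_parts):
    """Split rows into n_parts chunks (hypothetical stand-ins for cluster nodes)."""
    return [rows[i::n_parts] for i in range(n_parts)]


def map_partial_sums(chunk):
    """Map step: aggregate one chunk on its own, without seeing the others."""
    sums = {}
    for key, value in chunk:
        sums[key] = sums.get(key, 0) + value
    return sums


def reduce_sums(left, right):
    """Reduce step: merge two partial aggregates into one."""
    merged = dict(left)
    for key, value in right.items():
        merged[key] = merged.get(key, 0) + value
    return merged


def grouped_sum(rows, n_parts=3):
    """Grouped sum computed as map (per chunk) followed by reduce (merge)."""
    partials = [map_partial_sums(chunk) for chunk in partition(rows, n_parts)]
    return reduce(reduce_sums, partials, {})


# Illustrative data only: (group key, value) pairs.
rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
print(grouped_sum(rows))
```

In this sketch each chunk is processed independently, so the map step could in principle run on separate workers; only the small partial dictionaries need to be brought together in the reduce step.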

This episode is supported by Amethix Technologies.
Amethix works to create and maximize the impact of the world’s leading corporations, startups, and nonprofits, so they can create a better future for everyone they serve. Amethix is a consulting firm focused on data science, machine learning, and artificial intelligence.

References
Pandas - a fast, powerful, flexible and easy to use open source data analysis and manipulation tool - https://pandas.pydata.org/
Modin - scale your pandas workflows by changing one line of code - https://github.com/modin-project/modin
Dask - advanced parallelism for analytics - https://dask.org/
Ray - a fast and simple framework for building and running distributed applications - https://github.com/ray-project/ray
RAPIDS - GPU data science - https://rapids.ai/

Monday Jun 29, 2020

Rust and machine learning #4: practical tools (Ep. 110)

In this episode I go through a non-exhaustive list of machine learning tools and frameworks written in Rust. Not all of them are mature enough for production environments, but I believe that community effort can change this very quickly.
To make a comparison with the Python ecosystem, I cover frameworks for linear algebra (NumPy), dataframes (pandas), off-the-shelf machine learning (scikit-learn), deep learning (TensorFlow) and reinforcement learning (OpenAI).
Rust is the language of the future. Happy coding!
References
BLAS linear algebra https://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms
Rust dataframe https://github.com/nevi-me/rust-dataframe
Rustlearn https://github.com/maciejkula/rustlearn
Rusty machine https://github.com/AtheMathmo/rusty-machine
TensorFlow bindings https://lib.rs/crates/tensorflow
Juice (machine learning for hackers) https://lib.rs/crates/juice
Rust reinforcement learning https://lib.rs/crates/rsrl

Monday Jun 22, 2020

Rust and machine learning #3 with Alec Mocatta (Ep. 109)

In the 3rd episode of Rust and machine learning I speak with Alec Mocatta. Alec is a professional programmer with more than 20 years of experience who has been working at the intersection of distributed systems and data analytics. He's the founder of two startups in the distributed systems space and the author of Amadeus, an open-source framework that encourages you to write clean and reusable code that works regardless of data scale, locally or distributed across a cluster.
Only for June 24th, LDN *Virtual* Talks June 2020 with Bippit (Alec speaking about Amadeus)

Wednesday Jun 17, 2020

Rust and machine learning #1 (Ep. 107)

This is the first episode of a series about the Rust programming language and the role it can play in the machine learning field.
Rust is one of the most beautiful languages I have ever studied. I personally come from the C programming language, though for professional activities in machine learning I had to switch to the loved and hated Python language.
This episode clearly does not provide an exhaustive list of the benefits of Rust, nor of its capabilities. For that you can check the references and start getting familiar with what I think is going to be the language of the next 20 years.

Sponsored
This episode is supported by Pryml Technologies. Pryml offers secure and cost-effective data privacy solutions for your organisation. It generates a synthetic alternative without disclosing your confidential data.

References
The Rust Programming Language
Cookin' with Rust

Friday Feb 07, 2020

A big welcome to Pryml: faster machine learning applications to production (Ep. 94)

Why so much silence? Building a company! That's why :) I am building Pryml, a platform that allows data scientists to build their applications on data they cannot get access to. This is the first of a series of episodes in which I will speak about the technology and the challenges we are facing while we build it.
Happy listening and stay tuned!

Tuesday Oct 15, 2019

What is wrong with reinforcement learning? (Ep. 82)

Join the discussion on our Discord server

After reinforcement learning agents have done so well at playing Atari video games and Go (AlphaGo), at financial trading and at language modeling, let me tell you the real story. In this episode I want to shine some light on reinforcement learning (RL) and the limitations that every practitioner should consider before taking certain directions. RL seems to work so well! What is wrong with it?

Are you a listener of the Data Science at Home podcast? A reader of the Amethix Blog? Or did you subscribe to the Artificial Intelligence at your fingertips newsletter? In any case, let’s stay in touch! https://amethix.com/survey/

References
Emergence of Locomotion Behaviours in Rich Environments https://arxiv.org/abs/1707.02286
Rainbow: Combining Improvements in Deep Reinforcement Learning https://arxiv.org/abs/1710.02298
AlphaGo Zero: Starting from scratch https://deepmind.com/blog/article/alphago-zero-starting-scratch

Thursday Oct 10, 2019

Have you met Shannon? Conversation with Jimmy Soni and Rob Goodman about one of the greatest minds in history (Ep. 81)

In this episode I have an amazing conversation with Jimmy Soni and Rob Goodman, authors of “A Mind at Play”, a book entirely dedicated to the life and achievements of Claude Shannon. Claude Shannon does not need any introduction. But for those who need a refresher, Shannon is the inventor of the information age.
Have you heard of binary code, entropy in information theory, data compression theory (the stuff behind mp3, mpg, zip, etc.), error correcting codes (the stuff that makes your RAM work well), n-grams, block ciphers, the beta distribution, the uncertainty coefficient?
Claude Shannon invented, or laid the foundations for, much of that stuff :)
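
To make one item from that list concrete, here is a small sketch of entropy in information theory: for symbols with probabilities p, the entropy is the sum of p * log2(1/p), measured in bits per symbol. The function name `shannon_entropy` and the sample strings are just for illustration, and symbol probabilities are assumed to be the observed frequencies in the message.

```python
import math
from collections import Counter


def shannon_entropy(message):
    """Entropy in bits per symbol, using observed symbol frequencies as probabilities."""
    counts = Counter(message)
    total = len(message)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# A message with a single repeated symbol carries no surprise at all,
# while four equally likely symbols need two bits each.
print(shannon_entropy("aaaa"))
print(shannon_entropy("abab"))
print(shannon_entropy("abcd"))
```

The more evenly spread the symbols are, the higher the entropy, which is why predictable data compresses well and random-looking data does not.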

Articles:
https://medium.com/the-mission/10-000-hours-with-claude-shannon-12-lessons-on-life-and-learning-from-a-genius-e8b9297bee8f
https://medium.com/the-mission/on-claude-shannons-103rd-birthday-here-are-103-memorable-claude-shannon-quotes-maxims-and-843de4c716cf?source=your_stories_page---------------------------
http://nautil.us/issue/51/limits/how-information-got-re_invented
http://nautil.us/issue/50/emergence/claude-shannon-the-las-vegas-cheat

Claude's papers:
https://medium.com/the-mission/a-genius-explains-how-to-be-creative-claude-shannons-long-lost-1952-speech-fbbcb2ebe07f
http://www.math.harvard.edu/~ctm/home/text/others/shannon/entropy/entropy.pdf

A Mind at Play (book links):
http://amzn.to/2pasLMz -- Hardcover
https://amzn.to/2oCfVL0 -- Audio

Thursday Sep 26, 2019

[RB] How to scale AI in your organisation (Ep. 79)

Scaling technology and scaling business processes are not the same thing. Since the beginning of enterprise technology, scaling software has been a difficult task to get right inside large organisations. When it comes to Artificial Intelligence and Machine Learning, it becomes vastly more complicated.
In this episode I propose a framework - in five pillars - for the business side of artificial intelligence.

Tuesday Sep 17, 2019

Training neural networks faster without GPU [RB] (Ep. 77)

Training neural networks faster usually involves powerful GPUs. In this episode I explain an interesting method from a group of researchers at Google Brain, who train neural networks faster by squeezing more out of the hardware they already have and making the training pipeline denser.
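
A minimal sketch of the data echoing idea behind this episode, under simplifying assumptions: when the input pipeline is slower than the training step, each batch coming out of the slow stage is reused a few times instead of leaving the hardware idle. The names `slow_input_pipeline`, `echo` and `echo_factor` are hypothetical and chosen for illustration. The sketch ignores details such as where in the pipeline to echo and how to shuffle echoed data.

```python
def slow_input_pipeline(n_batches):
    """Hypothetical stand-in for an expensive read/decode/augment stage."""
    for i in range(n_batches):
        yield [i, i + 1]  # a toy "batch" of two examples


def echo(batches, echo_factor):
    """Yield every upstream batch echo_factor times before pulling the next one."""
    for batch in batches:
        for _ in range(echo_factor):
            yield batch


# With echo_factor=2 the training loop sees twice as many steps
# for the same number of expensive upstream batches.
steps = list(echo(slow_input_pipeline(3), echo_factor=2))
print(len(steps))
```

The trade-off in this sketch is that echoed batches are repeats, so each extra step is likely to be less useful than a step on fresh data; the bet is that it is still better than an idle step.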
Enjoy the show!

References
Faster Neural Network Training with Data Echoing https://arxiv.org/abs/1907.05550