Archive for the 'data science' Category

Hey there! Having the best time of my life ;)

This is the first episode I recorded live on my new Twitch channel :) So much fun!

Feel free to follow me for the next live stream. You can also watch me coding machine learning stuff in Rust :))

Don't forget to jump on the usual Discord and have a chat

I'll see you there!

 

 

 

 

Read Full Post »

In this episode I speak with Adam Leon Smith, CTO at DragonFly and expert in testing strategies for software and machine learning.
We cover testing of deep learning models (neuron coverage, threshold coverage, sign change coverage, layer coverage, etc.), combinatorial testing, and the practical aspects of both.
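To give a feeling for what a coverage metric looks like in practice, here is a minimal sketch of neuron coverage in plain Python. It assumes the common definition (the fraction of neurons activated above a threshold by at least one test input) and uses a hypothetical single dense layer with ReLU. The names `WEIGHTS`, `BIASES`, `layer_forward` and `neuron_coverage` are made up for illustration and are not taken from the episode.

```python
def relu(x):
    return max(0.0, x)


def layer_forward(weights, biases, inputs):
    """One dense layer with ReLU: returns one activation per neuron."""
    return [
        relu(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def neuron_coverage(weights, biases, test_inputs, threshold=0.0):
    """Fraction of neurons whose activation exceeds `threshold`
    for at least one input in the test set."""
    covered = [False] * len(weights)
    for inputs in test_inputs:
        for i, activation in enumerate(layer_forward(weights, biases, inputs)):
            if activation > threshold:
                covered[i] = True
    return sum(covered) / len(covered)


# Hypothetical layer: 3 neurons, 2 inputs each (illustrative values only).
WEIGHTS = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
BIASES = [0.0, 0.0, 0.0]

if __name__ == "__main__":
    small_suite = [[1.0, 0.0]]
    larger_suite = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
    print(neuron_coverage(WEIGHTS, BIASES, small_suite))
    print(neuron_coverage(WEIGHTS, BIASES, larger_suite))
```

A test suite that only ever activates a small part of the network leaves the rest of its behaviour unexercised, which is the intuition behind using coverage as a test adequacy measure.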

On September 15th there will be a live@Manning Rust conference. In one Rust-full day you can attend many talks about what's special about Rust, building high-performance web services or video games, WebAssembly, and much more.
If you want to meet the tribe, tune in on September 15th to the live@Manning Rust conference.

 

 

Read Full Post »


In this episode I speak about a testing methodology for machine learning models that are meant to be integrated into production environments.

Don't forget to come chat with us in our Discord channel

 

Enjoy the show!

 

--

This episode is supported by Amethix Technologies.

 

Amethix works to create and maximize the impact of the world’s leading corporations, startups, and nonprofits, so they can create a better future for everyone they serve. They are a consulting firm focused on data science, machine learning, and artificial intelligence.

Read Full Post »

There is definitely room for improvement in the stochastic gradient descent family of algorithms. In this episode I explain a relatively simple method that has been shown to improve on the Adam optimizer. But watch out! This approach does not generalize well.

Join our Discord channel and chat with us.

 


Read Full Post »

In this episode I speak about data transformation frameworks available to the data scientist who writes Python code.
The usual suspect is clearly Pandas, the most widely used library and de facto standard. However, when data volumes increase and distributed algorithms come into play (following a map-reduce paradigm of computation), Pandas no longer performs as expected. Other frameworks play a role in such contexts.

In this episode I explain which frameworks are the best equivalents to Pandas in big data contexts.
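For readers who have not met the map-reduce paradigm before, here is a minimal single-machine sketch of a grouped sum, using only the Python standard library. It is an illustration of the idea and not the API of any particular framework: each partition is aggregated on its own (map), then the partial results are merged (reduce). The names `map_partition`, `merge`, `grouped_sum` and `PARTITIONS` are hypothetical.

```python
from functools import reduce


def map_partition(rows):
    """Map step: aggregate a single partition into partial sums per key."""
    partial = {}
    for key, value in rows:
        partial[key] = partial.get(key, 0) + value
    return partial


def merge(left, right):
    """Reduce step: combine two partial results into one."""
    merged = dict(left)
    for key, value in right.items():
        merged[key] = merged.get(key, 0) + value
    return merged


def grouped_sum(partitions):
    """Grouped sum over data that is split into partitions."""
    return reduce(merge, map(map_partition, partitions), {})


# Hypothetical data, already split into three partitions.
PARTITIONS = [
    [("a", 1), ("b", 2)],
    [("a", 3)],
    [("b", 4), ("c", 5)],
]

if __name__ == "__main__":
    print(grouped_sum(PARTITIONS))
```

Because the map step only ever sees one partition, it can run on different machines in parallel, and the full dataset never has to fit in the memory of a single process. That is the main constraint a single in-memory dataframe runs into.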

Don't forget to join our Discord channel to comment on previous episodes or propose new ones.

 

This episode is supported by Amethix Technologies

Amethix works to create and maximize the impact of the world’s leading corporations, startups, and nonprofits, so they can create a better future for everyone they serve. Amethix is a consulting firm focused on data science, machine learning, and artificial intelligence.

 


Read Full Post »

In this episode I speak with Filip Piekniewski about some of the most noteworthy findings in AI and machine learning in 2019. As a matter of fact, the entire field of AI has been inflated by hype and claims that are hard to believe. A lot of the promises made a few years ago have proven quite hard to achieve, if not impossible. Let's stay grounded and realistic about the potential of this amazing field of research, so as not to bring disillusionment in the near future.

Join us on our Discord channel to discuss your favorite episode and propose new ones.

 

This episode is brought to you by Protonmail

Click on the link in the description or go to protonmail.com/datascience and get 20% off their annual subscription.

Read Full Post »

In this episode I make a non-exhaustive list of machine learning tools and frameworks written in Rust. Not all of them are mature enough for production environments. I believe that community effort can change this very quickly.

To make a comparison with the Python ecosystem, I cover frameworks for linear algebra (NumPy), dataframes (pandas), off-the-shelf machine learning (scikit-learn), deep learning (TensorFlow) and reinforcement learning (OpenAI).

Rust is the language of the future.
Happy coding!
 

References

  1. BLAS linear algebra https://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms
  2. Rust dataframe https://github.com/nevi-me/rust-dataframe
  3. Rustlearn https://github.com/maciejkula/rustlearn
  4. Rusty machine https://github.com/AtheMathmo/rusty-machine
  5. Tensorflow bindings https://lib.rs/crates/tensorflow
  6. Juice (machine learning for hackers) https://lib.rs/crates/juice
  7. Rust reinforcement learning https://lib.rs/crates/rsrl

Read Full Post »

In the third episode of Rust and machine learning I speak with Alec Mocatta.
Alec is a professional programmer with 20+ years of experience who has been spending time at the intersection of distributed systems and data analytics. He's the founder of two startups in the distributed systems space and the author of Amadeus, an open-source framework that encourages you to write clean and reusable code that works regardless of data scale, locally or distributed across a cluster.

Only for June 24th, LDN *Virtual* Talks June 2020 with Bippit (Alec speaking about Amadeus)

 

Read Full Post »

In the second episode of Rust and machine learning I speak with Luca Palmieri, who has spent a large part of his career at the intersection of machine learning and data engineering.
In addition, Luca has contributed to several Rust projects close to the machine learning community. Linfa is an ambitious project that definitely deserves the attention of the data science community (and it's written in Rust, with Python bindings! How cool??!).

 


Read Full Post »

This is the first episode of a series about the Rust programming language and the role it can play in the machine learning field.

Rust is one of the most beautiful languages I have ever studied. I personally come from the C programming language, though for professional activities in machine learning I had to switch to the loved and hated Python language.

This episode clearly does not provide an exhaustive list of the benefits of Rust, nor of its capabilities. For this you can check the references and start getting familiar with what I think is going to be the language of the next 20 years.

 

Sponsored

This episode is supported by Pryml Technologies. Pryml offers secure and cost-effective data privacy solutions for your organisation. It generates a synthetic alternative without disclosing your confidential data.

 

References

 

Read Full Post »

In this episode I have a chat with Sandeep Pandya, CEO at Everguard.ai, a company that uses sensor fusion, computer vision and more to provide safer working environments for workers in heavy industry.
Sandeep is a senior executive who can hide the complexity of the topic with great talent.

 

This episode is supported by Pryml.io
Pryml is an enterprise-scale platform to synthesise data and deploy applications built on that data back to a production environment.
Test ideas. Launch new products. Fast. Secure.

Read Full Post »

Covid-19 is an emergency. True. Let's just not prepare for another emergency about privacy violations when this one is over.

 

Join our new Slack channel

 

This episode is supported by Proton. You can check them out at protonmail.com or protonvpn.com

Read Full Post »

Whenever people reason about the probability of events, they tend to consider average values between two extremes.
In this episode I explain why such a way of approximating is wrong and dangerous, with a numerical example.
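Here is a small sketch of why averaging between two extremes can mislead. It is not necessarily the numerical example from the episode. The scenario is an assumption made up for illustration: a system of 10 independent components, where the per-component failure probability is either 0.01 or 0.30 with equal chance.

```python
def prob_at_least_one_failure(p, n):
    """Probability that at least one of n independent components fails,
    when each fails with probability p."""
    return 1.0 - (1.0 - p) ** n


# Hypothetical scenario (illustrative values only).
N_COMPONENTS = 10
P_LOW, P_HIGH = 0.01, 0.30

# Shortcut: average the two extremes first, then compute the risk once.
p_average = (P_LOW + P_HIGH) / 2
risk_at_average_p = prob_at_least_one_failure(p_average, N_COMPONENTS)

# Correct expectation: compute the risk in each scenario, then average.
average_risk = (
    prob_at_least_one_failure(P_LOW, N_COMPONENTS)
    + prob_at_least_one_failure(P_HIGH, N_COMPONENTS)
) / 2

if __name__ == "__main__":
    print(f"risk at the average p: {risk_at_average_p:.3f}")
    print(f"average of the risks:  {average_risk:.3f}")
```

Under these assumptions the shortcut gives roughly 0.81 while the actual expected risk is roughly 0.53. The gap appears because the risk is a nonlinear function of p, so the function of the average is not the average of the function.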

We are moving our community to Slack. See you there!

 

 

Read Full Post »

In this episode I briefly explain the concept behind activation functions in deep learning. One of the most widely used activation functions is the rectified linear unit (ReLU).
While there are several flavors of ReLU in the literature, in this episode I speak about a very interesting approach that keeps computational complexity low while improving performance quite consistently.
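As a quick refresher, here is a minimal sketch of ReLU and one of its flavors in plain Python, plus a piecewise-linear form that takes the maximum over a few linear functions. My understanding is that Dynamic ReLU uses this max-of-linear-pieces shape with coefficients computed from the input by a small auxiliary network. That part is left out here, so the coefficients are plain arguments, and the function names are my own.

```python
def relu(x):
    """Rectified linear unit: passes positives, zeroes out negatives."""
    return max(0.0, x)


def leaky_relu(x, slope=0.01):
    """Leaky ReLU: keeps a small slope for negative inputs (0 < slope < 1)."""
    return max(x, slope * x)


def piecewise_linear(x, slopes, intercepts):
    """Maximum over K linear pieces a_k * x + b_k.
    In this sketch the coefficients are fixed arguments; an input-dependent
    variant would compute them from the input itself."""
    return max(a * x + b for a, b in zip(slopes, intercepts))


if __name__ == "__main__":
    for x in (-2.0, 0.0, 3.0):
        # With slopes (1, 0) and intercepts (0, 0) the piecewise form is ReLU.
        print(x, relu(x), leaky_relu(x),
              piecewise_linear(x, (1.0, 0.0), (0.0, 0.0)))
```

Seen this way, ReLU and leaky ReLU are special cases with two fixed pieces, and the interesting question is how to choose the pieces.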

This episode is supported by pryml.io. At pryml we let companies share confidential data. Visit our website.

Don't forget to join us on our Discord channel to propose new episodes or discuss the previous ones.

References

Dynamic ReLU https://arxiv.org/abs/2003.10027

Read Full Post »
