Data Science at Home
Episodes
Wednesday May 03, 2023
This is the first episode about the latest trend in artificial intelligence that's shaking up the industry - running large language models locally on your machine. This new approach allows you to bypass the limitations and constraints of cloud-based models controlled by big tech companies, and take control of your own AI journey.
We'll delve into the benefits of running models locally, such as increased speed, improved privacy and security, and greater customization and flexibility. We'll also discuss the technical requirements and considerations for running these models on your own hardware, and provide practical tips and advice to get you started.
Join us as we uncover the secrets to unleashing the full potential of large language models and taking your AI game to the next level!
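As a rough rule of thumb for the hardware requirements mentioned above (an illustrative sketch, not something taken from the episode), you can estimate the memory needed just to hold a model's weights from its parameter count and quantization level:

```python
def weights_memory_gib(n_params_billion: float, bits_per_weight: int) -> float:
    """Rough lower bound: memory to hold the model weights alone,
    ignoring activations, KV cache, and runtime overhead."""
    n_bytes = n_params_billion * 1e9 * bits_per_weight / 8
    return n_bytes / 2**30  # bytes -> GiB

# A 7B-parameter model, the scale of LLaMA/Alpaca:
fp16 = weights_memory_gib(7, 16)  # ~13.0 GiB: needs a high-end GPU
int4 = weights_memory_gib(7, 4)   # ~3.3 GiB: fits on most consumer hardware
```

Quantizing from 16-bit to 4-bit cuts the weight footprint by a factor of four, which is why 4-bit builds of 7B models run comfortably on ordinary laptops.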
Sponsors
AI-powered Email Security: Best-in-class protection against the most sophisticated attacks, from phishing and impersonation to BEC and zero-day threats. https://www.mimecast.com/
References
https://agi-sphere.com/llama-models/
https://crfm.stanford.edu/2023/03/13/alpaca.html
https://beebom.com/how-run-chatgpt-like-language-model-pc-offline/
https://sharegpt.com/
https://stability.ai/
Friday Oct 23, 2020
Neural search (Ep. 123)
Come join me on our Discord channel to chat about all things data science.
Follow me on Twitch during my live coding sessions, usually in Rust and Python.
This episode is supported by Monday.com
The Monday Apps Challenge brings developers from around the world together to compete in building apps that improve the way teams work on monday.com.
Sunday Jul 26, 2020
GPT-3 cannot code (and never will) (Ep. 114)
The hype around GPT-3 is alarming and paints a bleak picture of how badly people misunderstand artificial intelligence. In response to comments claiming that GPT-3 will take developers' jobs, in this episode I share some personal opinions about the state of AI in source code generation (and GPT-3 in particular).
If you have comments about this episode or just want to chat, come join us on the official Discord channel.
This episode is supported by Amethix Technologies.
Amethix works to maximize the impact of the world's leading corporations, startups, and nonprofits, so they can create a better future for everyone they serve. It is a consulting firm focused on data science, machine learning, and artificial intelligence.
Friday Oct 18, 2019
Join the discussion on our Discord server
In this episode, I am with Aaron Gokaslan, computer vision researcher and AI Resident at Facebook AI Research. Aaron is the author of OpenGPT-2, an open replication of the much-discussed language model that OpenAI initially declined to release, considering it too dangerous to publish.
We discuss image-to-image translation, the dangers of the GPT-2 model, and the future of AI. Moreover, Aaron provides some very interesting links and demos that will blow your mind!
Enjoy the show!
References
Multimodal image to image translation (not all mentioned in the podcast but recommended by Aaron)
Pix2Pix: https://phillipi.github.io/pix2pix/
CycleGAN: https://junyanz.github.io/CycleGAN/
GANimorph
Paper: https://arxiv.org/abs/1808.04325
Code: https://github.com/brownvc/ganimorph
UNIT: https://arxiv.org/abs/1703.00848
MUNIT: https://github.com/NVlabs/MUNIT
DRIT: https://github.com/HsinYingLee/DRIT
GPT-2 and related
Try OpenAI's GPT-2: https://talktotransformer.com/
Blogpost: https://blog.usejournal.com/opengpt-2-we-replicated-gpt-2-because-you-can-too-45e34e6d36dc
The Original Transformer Paper: https://arxiv.org/abs/1706.03762
Grover, the fake news generator and detector: https://rowanzellers.com/grover/
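The cycle-consistency idea underpinning CycleGAN's unpaired image-to-image translation can be sketched in a few lines. This is a toy illustration using simple affine maps as stand-ins for the two generator networks, not the actual model:

```python
import numpy as np

# Toy "generators": G maps domain X -> Y, F maps Y -> X.
# Here they are invertible affine maps standing in for neural nets.
def G(x):  # X -> Y
    return 2.0 * x + 1.0

def F(y):  # Y -> X
    return (y - 1.0) / 2.0

def cycle_consistency_loss(x, y):
    """L1 cycle loss as in the CycleGAN paper:
    mean|F(G(x)) - x| + mean|G(F(y)) - y|."""
    forward = np.mean(np.abs(F(G(x)) - x))
    backward = np.mean(np.abs(G(F(y)) - y))
    return forward + backward

x = np.random.rand(4, 8)  # a batch of samples from domain X
y = np.random.rand(4, 8)  # a batch of samples from domain Y
loss = cycle_consistency_loss(x, y)
# Since F and G here are exact inverses, the cycle loss is ~0
```

In the real model the generators are learned, so this loss is added to the adversarial losses during training to push each translation to be reversible, which is what lets CycleGAN train without paired examples.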
Monday Sep 23, 2019