Wednesday Nov 27, 2019
More powerful deep learning with transformers (Ep. 84) (Rebroadcast)
Some of the most powerful NLP models like BERT and GPT-2 have one thing in common: they all use the transformer architecture.
This architecture is built on top of another important concept already known to the community: self-attention.
In this episode I explain what these mechanisms are, how they work, and why they are so powerful.
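As a rough illustration of the self-attention mechanism discussed in the episode (not code from the show itself), here is a minimal NumPy sketch of scaled dot-product self-attention; the matrix names, shapes, and the toy dimensions are illustrative assumptions.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of token embeddings X.

    X          : (seq_len, d_model) input embeddings
    Wq, Wk, Wv : (d_model, d_k) projection matrices (illustrative shapes)
    """
    Q = X @ Wq                       # queries
    K = X @ Wk                       # keys
    V = X @ Wv                       # values
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of every token with every other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V               # each output is a weighted mix of all value vectors

# Toy usage: 4 tokens, model dimension 8, key/value dimension 4 (arbitrary numbers)
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 4)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # (4, 4)
```

Each output row attends to every input token, which is what lets transformers model long-range dependencies without recurrence.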
Don't forget to subscribe to our newsletter or join the discussion on our Discord server.
References
- Attention is all you need: https://arxiv.org/abs/1706.03762
- The illustrated transformer: https://jalammar.github.io/illustrated-transformer
- Self-attention for generative models: http://web.stanford.edu/class/cs224n/slides/cs224n-2019-lecture14-transformers.pdf