Umar Jamil
ML Interpretability: feature visualization, adversarial example, interp. for language models
In this video, I introduce Machine Learning Interpretability, a vast field that studies the inner mechanisms by which machine learning models make their predictions, with the goal of debugging them and making them more transparent and trustworthy.
I will start by reviewing deep learning and the back-propagation algorithm, which are necessary for understanding adversarial example generation and feature visualization for computer vision classification models. In the second part, I will show how we can apply the knowledge built in the first part of the video to language models. In particular, we will see how to gain insight into the bias of a language model by generating a prompt that maximizes the likelihood of the next token being a certain concept of our choice. This allows us to answer questions like:
"What does my language model think of women?"
"What does my language model think of minorities?"
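As a rough illustration of the idea behind adversarial example generation (using the gradient of the loss with respect to the input rather than the weights), here is a minimal sketch in plain Python. It applies one fast-gradient-sign step to a toy logistic classifier. The weights, the input and the step size `epsilon` are made-up values chosen for illustration, and this is not necessarily the method used in the linked repositories.

```python
import math

# Hypothetical, hand-picked parameters of a tiny 2-feature logistic classifier.
W = [2.0, -3.0]
B = 0.5

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(x):
    """Return the predicted probability of class 1 for input x."""
    return sigmoid(sum(w * xi for w, xi in zip(W, x)) + B)

def input_gradient(x, label):
    """Gradient of the cross-entropy loss with respect to the input x.

    For logistic regression this has the closed form (p - label) * w_i,
    which is what back-propagation would compute for this model.
    """
    p = predict(x)
    return [(p - label) * w for w in W]

def sign(v):
    return (v > 0) - (v < 0)

def fgsm_step(x, label, epsilon):
    """Move each feature by epsilon in the direction that increases the loss."""
    grad = input_gradient(x, label)
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]

x = [1.0, 0.5]                       # classified as class 1
x_adv = fgsm_step(x, label=1, epsilon=0.3)

print("original prediction:   ", predict(x))
print("adversarial prediction:", predict(x_adv))
```

With these illustrative numbers, a perturbation of at most 0.3 per feature is enough to push the prediction to the other side of the 0.5 decision threshold. For a deep network the same gradient would come from automatic differentiation instead of a closed-form expression.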
This video was made in collaboration with Leap Labs, an AI research lab that works on machine learning interpretability and built the Leap Labs Interpretability Engine, which gives insights into how computer vision models work and how to improve them by generating prototypes, isolating features and understanding entanglement between classes.
Leap Labs: www.leap-labs.com/
Leap Labs Tutorials: docs.leap-labs.com/tutorial
As usual, the code and PDF slides are available at the following links:
- PDF slides: github.com/hkproj/ml-interpretability-notes
- Adversarial Example Generation (tricking a classifier): github.com/hkproj/adversarial_example_generator
- Generate inputs for language models: github.com/jessicarumbelow/Backwards
Views: 4,669

Videos

Kolmogorov-Arnold Networks: MLP vs KAN, Math, B-Splines, Universal Approximation Theorem
23K views · 1 month ago
In this video, I will be explaining Kolmogorov-Arnold Networks, a new type of network that was presented in the paper "KAN: Kolmogorov-Arnold Networks" by Liu et al. I will start the video by reviewing Multilayer Perceptrons, to show how the typical Linear layer works in a neural network. I will then introduce the concept of data fitting, which is necessary to understand Bézier Curves and then ...
Direct Preference Optimization (DPO) explained: Bradley-Terry model, log probabilities, math
7K views · 2 months ago
In this video I will explain Direct Preference Optimization (DPO), an alignment technique for language models introduced in the paper "Direct Preference Optimization: Your Language Model is Secretly a Reward Model". I start by introducing language models and how they are used for text generation. After briefly introducing the topic of AI alignment, I start by reviewing Reinforcement Learning (R...
Reinforcement Learning from Human Feedback explained with math derivations and the PyTorch code.
13K views · 3 months ago
In this video, I will explain Reinforcement Learning from Human Feedback (RLHF), which is used to align models such as ChatGPT. I will start by introducing how Language Models work and what we mean by AI alignment. In the second part of the video, I will derive the Policy Gradient Optimization algorithm from first principles, also explaining the problems with the gradient calculat...
Mamba and S4 Explained: Architecture, Parallel Scan, Kernel Fusion, Recurrent, Convolution, Math
34K views · 5 months ago
Explanation of the paper Mamba: Linear-Time Sequence Modeling with Selective State Spaces In this video I will be explaining Mamba, a new sequence modeling architecture that can compete with the Transformer. I will first start by introducing the various sequence modeling architectures (RNN, CNN and Transformer) and then deep dive into State Space Models. To fully understand State Space Models, ...
Mistral / Mixtral Explained: Sliding Window Attention, Sparse Mixture of Experts, Rolling Buffer
23K views · 5 months ago
In this video I will be introducing all the innovations in the Mistral 7B and Mixtral 8x7B model: Sliding Window Attention, KV-Cache with Rolling Buffer, Pre-Fill and Chunking, Sparse Mixture of Experts (SMoE); I will also guide you in understanding the most difficult part of the code: Model Sharding and the use of xformers library to compute the attention for multiple prompts packed into a sin...
Distributed Training with PyTorch: complete tutorial with cloud infrastructure and code
10K views · 6 months ago
A complete tutorial on how to train a model on multiple GPUs or multiple servers. I first describe the difference between Data Parallelism and Model Parallelism. Later, I explain the concept of gradient accumulation (including all the maths behind it). Then, we get to the practical tutorial: first we create a cluster on Paperspace with two servers (each having two GPUs) and then training a mode...
Quantization explained with PyTorch - Post-Training Quantization, Quantization-Aware Training
15K views · 6 months ago
In this video I will introduce and explain quantization: we will start with a short introduction to the numerical representation of integers and floating-point numbers in computers, then see what quantization is and how it works. I will explore topics like Asymmetric and Symmetric Quantization, Quantization Range, Quantization Granularity, Dynamic and Static Quantization, Post-Training Quant...
Retrieval Augmented Generation (RAG) Explained: Embedding, Sentence BERT, Vector Database (HNSW)
47K views · 6 months ago
Get your $5 coupon for Gradient: gradient.1stcollab.com/umarjamilai In this video we explore the entire Retrieval Augmented Generation pipeline. I will start by reviewing language models, their training and inference, and then explore the main ingredient of a RAG pipeline: embedding vectors. We will see what embedding vectors are, how they are computed, and how we can compute embedding vectors ...
BERT explained: Training, Inference, BERT vs GPT/LLamA, Fine tuning, [CLS] token
31K views · 7 months ago
Full explanation of the BERT model, including a comparison with other language models like LLaMA and GPT. I cover topics like: training, inference, fine tuning, Masked Language Models (MLM), Next Sentence Prediction (NSP), [CLS] token, sentence embedding, text classification, question answering, self-attention mechanism. Everything is visually explained step by step. I also review the backgroun...
Coding Stable Diffusion from scratch in PyTorch
91K views · 8 months ago
Full coding of Stable Diffusion from scratch, with full explanation, including explanation of the mathematics. Visual explanation of text-to-image, image-to-image, inpainting Repository with PDF slides: github.com/hkproj/pytorch-stable-diffusion Prerequisites: 1) Transformer explained: ua-cam.com/video/bCz4OMemCcA/v-deo.html Chapters 00:00:00 - Introduction 00:04:30 - What is Stable Diffusion? ...
Coding LLaMA 2 from scratch in PyTorch - KV Cache, Grouped Query Attention, Rotary PE, RMSNorm
29K views · 9 months ago
Full coding of LLaMA 2 from scratch, with full explanation, including Rotary Positional Embedding, RMS Normalization, Multi-Query Attention, KV Cache, Grouped Query Attention (GQA), the SwiGLU Activation function and more! I explain the most used inference methods: Greedy, Beam Search, Temperature Scaling, Random Sampling, Top K, Top P I also explain the math behind the Rotary Positional Embedd...
LLaMA explained: KV-Cache, Rotary Positional Embedding, RMS Norm, Grouped Query Attention, SwiGLU
50K views · 10 months ago
Full explanation of the LLaMA 1 and LLaMA 2 model from Meta, including Rotary Positional Embeddings, RMS Normalization, Multi-Query Attention, KV-Cache, Grouped Multi-Query Attention (GQA), the SwiGLU Activation function and more! I also review the Transformer concepts that are needed to understand LLaMA and everything is visually explained! As always, the PDF slides are freely available on Git...
Segment Anything - Model explanation with code
15K views · 10 months ago
Full explanation of the Segment Anything Model from Meta, along with its code. As always the slides are freely available: github.com/hkproj/segment-anything-slides Chapters 00:00 - Introduction 01:20 - Image Segmentation 03:28 - Segment Anything 06:58 - Task 08:20 - Model (Overview) 09:51 - Image Encoder 10:07 - Vision Transformer 12:30 - Masked Autoencoder Vision Transformer 15:32 - Prompt Enc...
LoRA: Low-Rank Adaptation of Large Language Models - Explained visually + PyTorch code from scratch
20K views · 10 months ago
A full visual explanation of LoRA, with PyTorch code from scratch! Full code and slides are available on my GitHub: github.com/hkproj/pytorch-lora Chapters 00:00 - Introduction 00:47 - How neural networks work 01:48 - How fine tuning works 03:50 - LoRA 08:58 - Math intuition 10:25 - Math explanation 14:05 - PyTorch implementation from scratch
LongNet: Scaling Transformers to 1,000,000,000 tokens: Python Code + Explanation
3.9K views · 11 months ago
How diffusion models work - explanation and code!
8K views · 11 months ago
Variational Autoencoder - Model, ELBO, loss function and maths explained easily!
20K views · 1 year ago
Attention is all you need (Transformer) - Model explanation (including math), Inference and Training
328K views · 1 year ago
Coding a Transformer from scratch on PyTorch, with full explanation, training and inference.
147K views · 1 year ago
CLIP - Paper explanation (training and inference)
4.1K views · 1 year ago
Wav2Lip (generate talking avatar videos) - Paper reading and explanation
2.5K views · 1 year ago

COMMENTS

  • @TheArmaan81 · 1 day ago

    This is so bonkers. Cheers mate, you've saved me some time. Thanks.

  • @Elrevisor2k · 1 day ago

    Great, great video. What is the process for colorizing a B&W image? Can this also be trained and programmed in Python? 👍👍

  • @mohammadyahya78 · 1 day ago

    amazing

  • @ronraisch2073 · 1 day ago

    Great video, it’s explained really well 😊

  • @mohammadyahya78 · 2 days ago

    amazing

  • @ChadieRahimian · 2 days ago

    Thanks for the amazing explanation!

  • @mlloving · 2 days ago

    Amazing video. You explained it so clearly. Thank you for putting effort into this lecture. If possible, would you please create a lecture about the YOLO code?

  • @Akuma7499 · 2 days ago

    Can anyone explain how to load and save the LoRA weights?

  • @ZaindTV · 2 days ago

    Will you do a video about the training code?

  • @supratimsaha8541 · 2 days ago

    Brilliant explanation

  • @usr-34-gambaman · 2 days ago

    Does Leap Labs provide open-source libraries?

    • @umarjamilai · 2 days ago

      You can play with the LLM interpretability notebook, which is open source. The link is in the description.

  • @adityakulkarni5577 · 3 days ago

    phenomenal video

  • @capyk5455 · 3 days ago

    Superb explanation, love your channel :)

  • @dzenathan6003 · 3 days ago

    That was really lovely and great of you, thanks a lot. I would be even happier if you showed us how to fine-tune your model; that would make the whole video simply perfect.

  • @n.8642 · 3 days ago

    Thanks! I learned a lot from your excellent video.

  • @ebadsayed487 · 4 days ago

    Your video is truly amazing, thanks a lot for this. I want to train this model on a summarization task; what changes do I need to make?

  • @maxvell77 · 4 days ago

    Thanks!

  • @gkmocastro · 5 days ago

    Thanks a lot for this amazing video. It helped me understand diffusion models better for my master's.

  • @NamitJain-te3sc · 6 days ago

    Amazing work!! Would be really thankful if you can share the code or the resources on how to make that vocab.json, merges.txt and model ckpt file.

  • @bibhutibaibhavbora8770 · 6 days ago

    When is the new video coming?

  • @serinevcim5390 · 6 days ago

    I could not understand something: is token2, which we append to Q after the first inference, equal to attention1?

  • @selayan4985 · 6 days ago

    Such brilliant work you have done. Really learned a lot, thanks!!!

  • @evgeniic · 7 days ago

    Can someone explain why we want to use a VAE instead of an AE? Aren't diffusion models good at reconstructing distributions?

  • @yuningliu6300 · 7 days ago

    At 2:21 you mentioned the documentation. Where can I find it?

  • @ivancruz2783 · 7 days ago

    Great work! Thanks for putting this all together. Very easy to follow and simple explanations of complex ideas! It helps a lot to code along the explanation

  • @laodrofotic7713 · 8 days ago

    This is amazing! I just finished the whole video and I truly understand it now. Thank you Umar Jamil, you are the greatest!

  • @laodrofotic7713 · 8 days ago

    I must say it started off a bit badly when you began writing with the red stick; I almost tuned out. It turns out this is the best explanation of self-attention I have seen on YouTube. Congratulations, this is really good and properly explained, especially the QKV.

  • @flakky626 · 8 days ago

    I followed the code and could understand some of it, but I feel overwhelmed seeing such large code bases. When will I be able to code stuff like that at such a scale?

  • @FranciscoSantiburcioCortes · 8 days ago

    Awesome explanation, thanks for this tutorial.

  • @codevacaphe3763 · 9 days ago

    Hi Umar, can I ask where you got the LLaMA architecture from? Which paper is it from?

    • @umarjamilai · 8 days ago

      I've built it myself by studying the code.

    • @codevacaphe3763 · 5 days ago

      @umarjamilai Wow, that's amazing, thank you for sharing.

  • @SarisKiattithapanayong-hx3dw · 9 days ago

    Can I request ControlNet as well?

  • @rachadlakis1 · 9 days ago

    Wow, this is an incredibly detailed explanation of the Transformer Model! Thank you for sharing all the insights and resources. Understanding the layers and processes involved is crucial for anyone working with this model. Keep up the great work!

  • @reopjk6226 · 10 days ago

    You really love Chinese

  • @user-nu2fm4cb6v · 10 days ago

    This is the best explanation I have seen among all the resources, including even paid Coursera courses. ❤❤

  • @harshjha6774 · 10 days ago

    51:20 funny dude xd:)))))

    • @harshjha6774 · 10 days ago

      hahaha still laughing but video was amazing

  • @ThuyTamPhuocBinh · 10 days ago

    Thank you for this tutorial packed with substance!

  • @aanchalmahajan3821 · 10 days ago

    Best explanations in such a beautiful manner. Please share video on GPT also. Would be highly thankful for such. It's great to learn from you. Thanks a lot Sir ☺☺. I highly request to share video on GPT.

  • @xrg-hm9ye · 11 days ago

    Such a wonderful video!

  • @sudarshannambiar9055 · 11 days ago

    legend

  • @matiasbunsterraby2176 · 11 days ago

    Umar, thank you very much. It is a very clarifying presentation. One question: in slide 7 of the PDF, how do you build the embedding vector of each word?

    • @umarjamilai · 11 days ago

      Randomly. Embeddings are initialized randomly and then trained with backpropagation.

  • @federicoottomano8619 · 11 days ago

    This is great! Going through the CLIP part right now ^^

  • @georgejunior8975 · 11 days ago

    Great explanation!!!

  • @user-uu3ud5sp5h · 11 days ago

    1:32:45 self-attention, 2:57:13 cross-attention

  • @jiegong529 · 11 days ago

    You are just too amazing! You understand this stuff in great detail, then you take the time to explain it to us in educational videos. A true gem of a channel!

  • @user-fs8dw2lc6w · 12 days ago

    Very helpful.

  • @benji6296 · 12 days ago

    Umar, thank you for the content, it really helps to grasp what the concepts are.

  • @kqb540 · 12 days ago

    Umar, Andrew Ng, 3Blue1Brown and Andrej are all you need. You are one of the best educators of deep learning. Thank you.

  • @Philip8888888 · 12 days ago

    Wow. This video is pure gold. Very nicely explained and I'm still only 30 minutes into it!

  • @Philip8888888 · 12 days ago

    Wow. 5 hours. I need to grab some drinks and snacks first!

  • @QunFengDai · 13 days ago

    You are my mentor