407: What the heck are tokens, embeddings, and GPTs anyways?

Released Monday, 22nd April 2024

Episode Transcript

[00:00:00] James: Frank, Frank, Frank, Frank, Frank. We're talking about AI. And welcome back, everyone, to Merge Conflict, your weekly developer podcast. If you're brand new here, jam that subscribe button, hit like, hit follow, or give us a review, whatever you want to do. Just follow this podcast, because this is now the official unofficial, sometimes-on-occasion show where we talk about AI, and today we're talking about everyone's favorite topic, Frank: tokens. Ding, ding, ding, tokens. Okay, here's how I describe tokens. It's like when you go to the boardwalk, right? And there are all of these games. And you take your money and you put it into the machine, and it gives you a bunch of tokens. So there's a currency, like a dollar for a hundred tokens, and then each machine model has an input of how many tokens are needed to play, a limit of tokens that you can put in. Now, often in this scenario... oh, which is bad. It's all wrong.

[00:01:12] Frank: That is the worst analogy I've heard. And I think your analogy is wrong too. I just have to interrupt. It's wrong. The machines give you tickets. You put tokens in, tickets come out. So this has absolutely nothing to do with...

[00:01:33] James: TikTok? Is that what we're talking about? No. Okay. We're talking about tokenizers, tokenization. We're talking about Claudes and Geminis and GPTs. I just watched a Fireship video, and like every five seconds it's some new fancy crazy thing.

[00:01:50] Frank: Okay, I think we almost have to apologize for this episode. We just got into a discussion about tokens and tokenization and what it really means to be a token in an AI neural-network world, and somehow it's bled over to a podcast. So hi, everyone. Let's just talk about some fundamentals of AI here, because why not? It's like the most overused word at this point. I don't think it's entered the common vernacular; I haven't heard any of my cousins saying "tokens." But you just spontaneously started talking about tokens today, so this isn't even my fault. This is on you.

[00:02:34] James: Now, we've talked about tokens on previous episodes of the podcast, and when Frank was talking about it, I didn't really know what he was talking about. But now I almost know what he's talking about.

[00:02:48] Frank: Do you? Okay. Cool. Yeah. So tokens have become the currency of AI. With the dawn of the large language models, the LLMs, the oracles from on high, our great overlords... are we talking into the future?

[00:03:08] James: Sure. We've got to be nice to the robots.

[00:03:09] Frank: They all work with text. Isn't that weird? Who would have thought we would do a whole episode on consoles and how to make console apps good in this modern world, and then this week we're going to talk about tokens. Which, by the way, tokens are just text. They're just a funny way to represent text. FYI, I'm just cutting through 30 minutes of this episode and getting to the end: tokens are text. There's a simple transformation. It's not that simple, but tokens are just text.

James: Aren't they really numbers, though?
Frank: Ah, okay. We assign them numbers, okay?

[00:03:54] James: Yes! I win! I win! One to nothing, baby! That's right.

[00:03:59] Frank: Is the letter A a number? Well, in ASCII it is. In Unicode it is. You can assign numbers to anything. Anything that you can enumerate, anything that you can list, and we're programmers here, we know this, anything that you can make into a collection, you can assign a number to. So yes, they're numbers. But they're numbers in two forms. When you see a token, it's usually an integer, because there will be a vocabulary, and we can get into what that actually means. But a neural network, and I'm going to throw out numbers that are probably wrong, you can look up the real correct numbers, but a network like GPT-3, an older network, I think it had something like 50,000 different tokens.

[00:04:41] James: That's correct. Yes. 52,000, I'm pretty sure.

[00:04:44] Frank: Is that 52,000? Thank you for looking that up.

[00:04:47] James: I just know it off the top of my head. Side tangent: I was at Fred Meyer the other day and we were buying some firewood. I got the axe thingy, it's amazing. And the firewood has the tiniest barcode in the entire world on it, and it doesn't scan. So this guy comes over at the self-checkout and, from memory, types in this 20-digit code. Anyway, that's my association: this wood has numbers associated with it. Basically they're UPC codes. Everything's a UPC code.

[00:05:26] Frank: Oh my gosh. But it's a serial system, yeah. But you know, those numbers are arbitrary, right? We could put them in any order and all that. So there's a lot of things we could talk about with tokens. We can talk about why we tokenize at all. You know, if we're writing English sentences, you need the letters of the alphabet, the space character, the period, the question mark, one or two emojis. You really don't need more than that; you could really just use individual characters and things like that.

[00:06:03] James: So we could talk about why do we tokenize. Why, Frank? You prompted it. Let me ask you: why do we use tokens? Why can't I just send in the text?

[00:06:16] Frank: Compression. We use tokens because, and this is jumping ahead a little bit, all these neural networks that we're inputting all this text into, all this data, it has to go into a buffer that a neural network can understand. And the way we design these things, that buffer's got a fixed size, or I should say a maximum size. We often call that the context size of these networks. And the context also includes your history, the system prompt, and your current prompt. The history becomes the bulk of it, but there's a lot there, and that's a fixed size. And if you take, what, a thousand-character sentence, how many tokens would that turn into if you don't have compression? It would turn into a thousand tokens.

James: A thousand, yeah.

Frank: But we all know the English language repeats itself a lot. We use some character combinations, and we don't use others.
So someone one day got a very clever idea in their head: what if we write a little compressor that figures out the most common combination of two characters, the most common combination of three characters, the most common combination of four characters? And we come up with a fancy way. People love writing compression algorithms. Programmers love this stuff. Computer scientists live on it; they write books on compression algorithms. So there's a million ways we know how to do this. And the compression algorithm people came up with is this tokenization idea, where you find substrings that occur very often in the source text. So the word "the": if you have a thousand words of English, the word "the" is probably going to be in there a couple of times. So you might as well just assign the whole word "the" to one token, and now you have a compression of four to one.

[00:08:25] James: Yeah.

[00:08:25] Frank: And the compression's usually not that good, but it's good enough. So in the case where I would need a thousand neural network entries to represent a thousand-character sentence, now I only need 250, and the network can run a lot faster. That's one aspect, and there are multiple, but I should take a pause because I just said a lot of words.

[00:08:50] James: No, I think that makes sense to me, a hundred percent. One way of looking at this that's really helpful, by the way, is to go to platform.openai.com/tokenizer. I'll put it in the show notes. What this does is give you a brief overview of how OpenAI's LLMs, you know, the GPTs, process text using tokens. Its definition, and we'll see how close it is, is that tokens are common sequences of characters found in a set of text. The models learn to understand the statistical relationships between these tokens and excel at producing the next token in a sequence of tokens. It's just tokens followed by tokens, and those tokens refer to something. Now, the cool part is that you can enter text into an open text box and it will show you the characters and also the tokens that are there. So for example, if I type "hello world", that is two tokens, 11 characters. If I say, I guess, "hello Frank"...

[00:09:55] Frank: I said "hello James".

[00:09:57] James: Yeah.

[00:09:58] Frank: So, right, this is the beauty of the compression. The word "hello" comes up a lot in English, so they have one token that represents the entire word "hello". And then I put in "James", and "James" comes up a lot in English, I guess, because "James" is an entire single token also. And so, like all compression, it's just a heuristic. They just collect a bunch of statistics. You actually train the tokenizer. When I'm doing my cuneiform work, I'm training tokenizers to look at a bunch of text and figure out what the common patterns are, and take those common patterns and merge them into a single token. And keep doing that, keep doing that, until you get maximum compression. So when I gave the example of the word "the", that was a four-to-one compression, but "hello James" was 11 characters down to two tokens. So that's 11-to-2, or 5.5-to-one, compression.
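What Frank describes here, repeatedly finding the most frequent adjacent pair and fusing it into a new token, is essentially byte-pair encoding. Below is a toy Python sketch of that training loop, illustrative only: real tokenizers work on bytes, respect word boundaries, and train on vastly more text.

    from collections import Counter

    def most_common_pair(tokens):
        # Count every adjacent pair and return the most frequent one.
        return Counter(zip(tokens, tokens[1:])).most_common(1)[0][0]

    def merge_pair(tokens, pair):
        # Replace each occurrence of `pair` with a single fused token.
        out, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
                out.append(tokens[i] + tokens[i + 1])
                i += 2
            else:
                out.append(tokens[i])
                i += 1
        return out

    text = "the cat and the dog and the bird"
    tokens = list(text)              # start from single characters
    for _ in range(10):              # ten merge rounds
        tokens = merge_pair(tokens, most_common_pair(tokens))
    print(tokens)                    # common substrings like "the " fuse into one token

Each round shortens the token list, which is exactly the compression Frank is after: fewer entries in the network's fixed-size context buffer.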
[00:11:02] James: Pretty good. And here's another good example. We just did a Patreon-exclusive half-hour video and podcast; basically, if you want more of us, patreon.com, shout out, we have Patreon bonus episodes every week. It was called AI Kitty Litter Box. So: "AI", that's one token. "Kitty", one token. "Litter"... litter is not in there. So actually "L" is its own token, and then "itter", apparently.

[00:11:39] Frank: Because there are a lot of "itters" out there.

[00:11:40] James: A lot of "itters", most likely. "Litter" is downranked compared to other "itters", so it makes that trade-off, right, instead of putting all the "itters" in there. And then "box" is another token. So that's a good example. I was trying to find something that wasn't just one-to-one with words, because not every word is in there. We have more than 52,000 words.

[00:12:01] Frank: Now, can I give one more good thing about tokenization and one more bad thing about tokenization? Do you want the good news first or the bad news?

[00:12:13] James: Bad news. I've been doing good news, bad news with Heather, and she's just like, give me the bad news. And I was like, the good news isn't that good. That's why I said the good news isn't that good.

[00:12:23] Frank: Is this Heather's doing? Okay. I see. I thought you were going to be more positive. I really thought you were going to go for the positive one there. All right, the bad news: numbers. Numbers don't do well with tokenization. If you think about what I just said, what things become tokens? Common strings in, let's say, English. And they do try to train these things multilanguage, but the big ones are really English-dominant right now. Anyway, numbers are a bad case, because what is the difference between 1.0, 1.00, and 1.00000001? To a human being, a thinking person who knows anything about numbers, these are all roughly the same thing: that number. But those all have very different tokenizations to the neural network. So the concept of the number one, which you would think is a unified, general concept, the tokenizer can really mess with it. The tokenizer can really influence what the network's able to learn, because now the network can't just look at one token and know, oh, that's the number one. It now has to learn 50 different tokens; literally a million different tokens, and combinations thereof, can actually be the number one. The number one with a million zeros behind it is still the number one. So that's unfortunate. It's wasted space in the neural network. So a lot of the reputation that these language models get, that they're bad at math: 50 to 90 percent of them being bad at math is because of the tokenizer.

[00:14:31] James: That makes sense. Yeah.

[00:14:32] Frank: Just the way it represents numbers is not ideal.

[00:14:35] James: Yeah. I typed in a lot of zeros. It seems like it also maxes out at three, at least in this tokenizer example. If I do "0", one token. If I do "00", one token. If I do "000", one token. If I do "0000", two tokens. It's like the max is three digits, basically, is what it seems like. And then, that's the other part: if you do "1.0" and then another zero, that's going to be three tokens, and it continues in this way. So it does become a pretty lackluster experience. That makes sense.

[00:15:13] Frank: It's a difficult problem, because "1.1", that could be the number one and one tenth, or that could be section 1.1, you know. So text is not a great input. History has a sense of humor, and it's funny: we all wanted natural language input, it's been the goal of machine learning and AI forever, but it turns out natural language is not that great at some things. Numbers, and being specific about the objectiveness of things. A lot of it's just gleaned from context. So there's not a great way to represent 1.1. It could be a number, but at the tokenizer level you just don't know. So they play it safe. That one might even be a token, that one comes up, but you can imagine a lot of numbers that come up and a lot of other numbers that never do.
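The playground behavior James describes, whole common words collapsing to single tokens while digit strings fragment, can be reproduced with OpenAI's tiktoken library. A minimal sketch, assuming tiktoken is installed; exact counts depend on which encoding you pick:

    import tiktoken

    # cl100k_base is the encoding used by the GPT-3.5/GPT-4-era models.
    enc = tiktoken.get_encoding("cl100k_base")

    for text in ["hello James", "the", "0000", "1.00000001"]:
        ids = enc.encode(text)
        pieces = [enc.decode([i]) for i in ids]  # each token as a substring
        print(f"{text!r}: {len(ids)} tokens -> {pieces}")

    # Common words collapse to one token apiece, while long digit runs
    # fragment into arbitrary chunks: the "bad at math" problem.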
[00:16:18] James: That makes sense. I like that. So these tokens all have numbers assigned to them, right? The words are tokens, and those tokens are assigned numbers, basically.

[00:16:31] Frank: Substrings, really. Technically it's not words. For the common words, it's a word, but really just think of it as a substring of all the text in the human world here.

[00:16:43] James: So what do we do with these tokens? I mean, if I have a bunch of tokens...

[00:16:51] Frank: Are you ready, James? Are you ready for the good news? Because you didn't want the good news before.

[00:16:54] James: Do I get tickets back? Am I cashing this in for a prize at the end of the day?

[00:16:58] Frank: No, you don't get tickets back. Okay, well, there are two parts to the good news, but I'll start really high level. The neat thing about tokenization is that the concept of the word "the", T-H-E in English, is nebulous, but it comes up constantly. It means a lot of things in a lot of contexts. And that becomes a single token, so the network can learn a lot about that token. It's kind of the opposite of the number problem. See, I really thought you were going to go for the positive first, and I had a great segue: here's a good thing, and here's how that good thing falls apart with the numbers. It's the opposite of the numbers. The problem with numbers is there's a million ways to represent the number one, but there's one way to represent the word "the", and so the network can learn a lot and get really good at using the word "the". There's a word going around that, if you see it in text, the text is probably AI-generated, because these models just favor some words. They got good at some words and they like some words. I really can't remember which word it was; I think it was an adverb or something. Anyway, the network becomes really good at the word "the", and it becomes really good at other words that become a little annoying, that it keeps repeating, just because it has a lot of information about those words.

[00:18:21] James: That makes sense. It makes a lot of sense, yeah, because those tokens and those words are going to be used over and over again.
And it can infer a lot from the surrounding parts as well, I assume, in the learning process.

[00:18:36] Frank: Ready for good news part two?

[00:18:36] James: Yeah, I'm ready for great news part two. Hit me. I'm ready.

[00:18:42] Frank: Okay. So we took all of English text, broke it up into tokens, and assigned numbers to them. Completely arbitrary. You know, something gets number 42 just because we're programmers and we've got to put this stuff in an array; it's got to have some order when it goes in the array. So we assign numbers to them, and that's really the whole reason. It does not matter. What does matter is that these fancy networks learn another number, a more advanced number. A tensor, some would say. An embedding, others would say. For every single token, a thing called an embedding is learned. This actually has information associated with it. Information that only the neural network really knows how to interpret, but it's information that it chose to learn during its training process. Sorry, I'm giving these things a lot of animus, but in my mind these networks are doing their own thing. So we get this thing called an embedding, and it turns out we have learned how to do very interesting things with these embeddings, because they are information-packed. In the olden days, there were always interesting things about the embeddings. Like, if you took the embedding for the word "king", or the couple of tokens of "king", whatever, just make your array longer, and you subtracted "man" from it, you would get "queen". So it learned this fun arithmetic that it could do over these. We used to call them word vectors; now they're called embeddings, because they're not quite linear the way they used to be. Those were called variational, but whatever. But they still have a lot of meaning like that, and we can do fancy things. And you were telling me about a project you did with fancy things like this. But I think that's what's really neat about these tokens: once you have a network trained on them, all of a sudden there is actually a chunk of information, tricky to interpret, but associated with each token.

[00:20:55] James: Yeah. I've watched a few YouTube videos, I spoke with several people, and I've been doing a lot of reading. Often you'll see it described in the simplest terms as XYZ. We as humans can understand an XYZ coordinate system, right? We mostly understand an X and a Y; Z is a little complicated, but Z is the depth.

[00:21:20] Frank: Your eyes are working. Yeah.

[00:21:22] James: So my understanding is: you take the example of the king, but you also have something for queen, and you have XYZs that are kind of pointed in a direction. And maybe king and queen are in different places, because of how the machine learned it, these things are very different, but they might be pointed in some shared direction, because king and queen are of some timeline, of days of yore, right, of castles and whatever. So they might be near each other, just like woman and man, or aunt and uncle, right? And the idea is that, most likely, things that are really relevant,
like you were saying, king and man over here, and queen and princess over there, and all these other things in a similar vein, are probably pointing in very similar directions. Which means, if you subtract, like you're saying, you will probably find those things, basically the distances between them.

[00:22:27] Frank: Yeah. And that's my favorite metaphor, actually, because it's honestly really close to what's really going on, even at the math level, if you're implementing these things from scratch with code. So in your case, the XYZ, that's three dimensions, so that would be three numbers that you would associate with each token. And those three numbers, just listed out, would be the embedding for the token. And then if you have a whole bunch of text, the embedding for that text is all the numbers just strung out one after another. And I love that analogy so much because we do talk about these things as dimensions in code. It's a metaphor itself, but in code it's called a dimension. I was programming on them today. The only difference is that in your example there were three dimensions, and in the case of something like the GPTs, there's 700, a thousand, you know. So it is a thousand-dimensional space out there where you can locate nebulous concepts. Or not fully concepts, because there are more concepts in the network than that, but information. It's almost like a database.

[00:23:50] James: Yeah, it is. And it's doing these lookups, in a way. I'll put a link to the video from 3Blue1Brown, "But what is a GPT?", where he uses a very similar metaphor and does really fun stuff with it. There's a whole series, basically, that he has. So he has one where it's like: imagine you put in "king", and then another vector that's like "lived in Scotland", right? And those are two things pointing somewhere. And then there's another one that points to "murdered predecessor", in Shakespearean language. So it's building this up. The idea is, he gave it a prompt like "the king doth wake tonight and takes his rouse", and it's building out this context, and it ends up pointing along these arrows in this XYZ space so it's able to look up a similar king, or queen, or whatever it is. And you could change it to "queen" and it would follow a similar path, basically. I'm doing a bad job of explaining what's going on, but it's a great video.

[00:25:10] Frank: You're creeping into how the networks actually work themselves. So, if we're drawing lines between layers: everything I just said is basically all that exists at the token layer. Tokens take text and compress it down, and those compressed versions of the text get some kind of data associated with them, called the embedding. After that, more interesting things start to happen with these language models. They have the transformer attention mechanism: for every token in the input, it looks at every other token in the context.
And it starts comparing and contrasting, calculating, doing crazy matrix math, doing neural network stuff. It's fun after that. That box is really big and hard to investigate, but what it really is doing is finding associations between words in order to predict the next word. That elaborate network of associations you're walking through, when you talk about this arrow pointing at that, is all these associations, so that it figures out how to output its answer. Which is quite vast and takes a lot of CPU time.

[00:26:27] James: He had another one, which is a better example. Imagine Germany and Japan, and you also have sushi in there. What you can do is draw the distance, in this 3D space, from Japan to sushi. If you apply that same arrow starting from Germany, it actually points at bratwurst. Great example, right?

[00:26:52] Frank: Yeah. And that's where, sorry, you just pointed out where I made a mistake with my analogy. When I said "king minus man", I knew I was doing that wrong, because you want to find a delta that you already know. You want to find an arrow, a delta, and then apply it to a different location, just to see what kind of answer comes up.

[00:27:12] James: Exactly.

[00:27:13] Frank: Yeah. It's a fun way to navigate the database that's created out of all of this. Because in the end, you want to query the network itself. That's the interesting thing, where it generates new answers. But just this whole tokenization and embedding process, the process of it learning, associates all this interesting data with the tokens themselves.
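A minimal sketch of the delta trick Frank just corrected himself to: learn an arrow from one known pair, then apply it somewhere else. The 3-D vectors here are made up for illustration, not real learned embeddings, which have hundreds of dimensions.

    import numpy as np

    vocab = {
        "japan":     np.array([0.9, 0.1, 0.3]),
        "sushi":     np.array([0.8, 0.9, 0.2]),
        "germany":   np.array([0.2, 0.1, 0.7]),
        "bratwurst": np.array([0.1, 0.9, 0.6]),
    }

    def nearest(query, exclude):
        # Return the vocab word whose vector is closest (cosine) to query.
        def cos(a, b):
            return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
        return max((w for w in vocab if w not in exclude),
                   key=lambda w: cos(vocab[w], query))

    delta = vocab["sushi"] - vocab["japan"]   # the known arrow: country -> food
    print(nearest(vocab["germany"] + delta,   # apply it to a new location
                  exclude={"germany", "japan", "sushi"}))  # -> bratwurst

The mechanics are the whole point: addition, subtraction, and a nearest-neighbor search are enough to "navigate the database" once the embeddings carry meaning.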
[00:27:39] James: Yeah, it's very fancy and really interesting to think about. This video does a great job, and hopefully we've done a bit of a good job just talking through it, because I do think that end part is fascinating: the guessing of the next word based on the embeddings and the numbers. There are all these pluses and minuses that go into it. And then each of these different models also has a temperature associated with it, because there are all sorts of things. If you had zero temperature... basically, the temperature is the weighting, right? Or, no.

[00:28:20] Frank: It's the amount of randomness.

[00:28:23] James: Sorry, okay, there's weighting already. But yeah, it's the amount of randomness that goes into the weighting, and it allows it to have a little bit of fun. A good example is: imagine you started typing "once upon a time, there was a..."

[00:28:40] Frank: Right. Do you want a random answer, or do you want the most predictable possible thing? Now, if you are working at work and you're querying the company database, you want the most accurate, predictable answer. But if you are brainstorming, you want some craziness in your life. You want a high temperature. Let's boil that pot, see what comes out.

[00:29:02] James: Exactly. And in fact, if you go into, let's say, Copilot on your Windows machine, or Bing Chat, there's the "more creative", "more balanced", "more precise" choice. And now I understand what the heck that thing is doing: it's just slightly adjusting the temperature.

[00:29:19] Frank: For the record, we have literally talked about that on this podcast. But great, I'm glad it's sinking in. We're getting it into you.

[00:29:26] James: Frank, now I understand a little bit. Just a little bit.

[00:29:30] Frank: Yeah. So this is a good chance to rewind a tiny bit and do the layers one more time, just because you jumped up to layer three. Layer one: we take text and turn it into tokens. This is the most basic layer, and it's mostly what we've talked about. Layer two: a lot of magic happens. This is the generative prediction model. It takes those tokens and decides what token is going to come next. That is the AI in the AI. Okay, so that's level two, let's call it. Level three: it outputs a bunch of probabilities, and you have to decide what to do with those probabilities. If someone said to you, you have a 90 percent chance of winning a million dollars with a 10 percent chance of dying tomorrow, do you take it? Probabilities are tricky things; they're not always obvious. But anyway, this thing outputs probabilities, so we have to decide what to do with them, and that's what you were talking about with the temperature. When it's a high temperature, you're feeling saucy: if something's a low probability, you're like, let's do it anyway. When you're a low temperature, you're a little cold, and now the metaphor breaks down, but you just want the most successful, most optimistic result coming out of the thing. So, three layers: tokenization, magic, sampling the probabilities. Those are the three layers.

[00:30:50] James: There you go. I'll link to this video, which does a better job than me getting in the way of Frank explaining it in more detail. I think what's nice, if you're listening to this, and hopefully it wasn't too hard to follow, maybe a little bit, is that I do recommend watching the video as well. Like I said, I'll put it in the show notes, because it's all visual, and I think that's what's been helpful. When you and I talked about this before, I didn't understand it. So if you've made it to the end, what I'd recommend is: go watch the video, then go listen to this podcast again. That might be a good place to start, because the visualization helped me better understand. Like, oh, when you say a token, what does that mean? You actually see the number. When you say an embedding, what are the numbers that are there? You get some visualization. And obviously you can't visualize the entire coordinate system, like you said. He even makes a joke that it's literally impossible for us to fathom a 10,000-dimensional coordinate system. You just can't do it. But it kind of goes through it, and I think the arrow-pointing stuff is really spot on.

[00:32:04] Frank: Yeah. And it's a tricky one, because tokenization is taken a little bit for granted these days. People in the ML world, we don't talk about it much. It's something we all suffer through, because we acknowledge it needs to happen, but we don't like to talk about it. So you'll get fewer nice introduction videos on it and things like that. So it's kind of fun that we spent some time talking about the low-level feature.
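Frank's layer three, sampling the output probabilities, is where temperature lives. A minimal sketch with fake scores (logits): dividing the scores by the temperature before the softmax makes low temperatures nearly deterministic and high temperatures adventurous.

    import numpy as np

    rng = np.random.default_rng(0)

    def sample(logits, temperature):
        # Pick a token index from logits softened by `temperature`.
        z = np.asarray(logits, dtype=float) / max(temperature, 1e-6)
        z -= z.max()                            # numerical stability
        probs = np.exp(z) / np.exp(z).sum()     # softmax
        return rng.choice(len(probs), p=probs)

    logits = [4.0, 3.5, 1.0, 0.2]               # fake scores for four tokens
    for t in (0.1, 1.0, 2.0):
        picks = [sample(logits, t) for _ in range(1000)]
        print(t, np.bincount(picks, minlength=4))  # higher t spreads the picks

At temperature 0.1 almost every draw is the top-scoring token ("more precise"); at 2.0 the low-probability tokens get picked regularly ("more creative").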
Frank: But yeah, these things are a real deep dive. And I do have to say: if you're interested in how these things work, and you can handle reading a research paper, I know, small font, they can be quite obnoxious sometimes, there is a research paper called "Attention Is All You Need", and it's kind of what set this architecture afloat. And in it, since you were talking about visual explanations, is a graphic showing how the transformer network works. In my head I always reference that graphic. Even during our conversation today I had that graphic up in my head, because it's just a nice little reference for me. So, 3Blue1Brown, I'm sure they're great, but I just want to give props to the original authors of the original paper, because they drew a graphic that has stuck in my head.
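For reference, the core operation from "Attention Is All You Need", each token's query scored against every other token's key, with the scores weighting a mix of values, fits in a few lines. A bare-bones sketch with random vectors, omitting the learned projection weights, masking, and multiple heads of the real transformer:

    import numpy as np

    def attention(Q, K, V):
        d = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d)           # tokens x tokens comparisons
        scores -= scores.max(axis=-1, keepdims=True)
        weights = np.exp(scores)
        weights /= weights.sum(axis=-1, keepdims=True)  # softmax per row
        return weights @ V                      # weighted mix of the values

    rng = np.random.default_rng(0)
    x = rng.normal(size=(5, 8))      # 5 tokens, 8-dimensional embeddings
    print(attention(x, x, x).shape)  # (5, 8): one mixed vector per token

This is the "every token looks at every other token" step Frank mentioned, and it is why the cost grows quickly with context size: the score matrix is tokens by tokens.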
[00:33:30] James: Yeah, I think that's a good thing. If you have a link, I can shove it into a note somewhere over here as well; I think it's an arXiv paper, yeah. And I think that's the hardest part, too: a lot of times you're thrown into it. When I watch demos of people doing things, it's like, oh, I don't even know those basic words, and I know I don't know them a hundred percent yet. Even in this video of mine, not out yet, it'll be out this Thursday, the one Frank was reviewing and we were talking about on our Patreon: the reason I didn't go further in that video is because I don't feel comfortable a hundred percent doing a tutorial on what that is actually doing, you know? I enjoy talking about things that I actually understand at a high level. Because it's really easy to go from crawling to sprinting on this stuff, and sometimes you get into a session or watch a video and they're already sprinting, and you're like, whoa, whoa, whoa. And I get flack sometimes on that too. Like, wow, James, why do you do so much beginner video content? Because, one, that's what I'm good at. And also, I've never been sprinting when it comes to mobile development. I stopped at a run. I cannot run, you know, that fast. I'm a runner and that's it; I don't go any more than that. That's who I am, and it makes me really good at explaining those other layers. Anyway, these analogies are terrible. That's why I'm not a professional analogy maker, Frank.

[00:35:02] Frank: No, I think you could probably make some big bucks, but you'll find another way.

[00:35:09] James: I will find another way. All right, before we get out of here: we have a bunch of other exciting stuff we'll talk about next week. All sorts of SLNX solution files, back to XML.

[00:35:20] Frank: XML, XML, XML forever.

[00:35:22] James: Kind of makes sense, yes. Projects are already XML. I'm just... you know, it's fine. Frank says it's fine. Let's see if we got any questions on our YouTube. And on Spotify we're doing a free Q&A. It's Q&A time. I think we just got one piece of feedback, which is on the Adobe voice enhancer. I did learn my lesson, actually: the first time I ever did it, I took the finished track, with the music, and put it into the enhancer, and that messed up the audio.

[00:36:11] Frank: Funny. Yes. Okay.

[00:36:13] James: So I didn't do that on the last episode. So, Dunkle, give that a listen. Let me just double-check a few things before we get out of here. Let's check the podcast episode comments. We got... nothing. Perfect. Let's look at our email. We got emails; oh, there's too many for me to parse in here. I don't see any feedback. I like YouTube comments, everyone, so I appreciate YouTube comments. That's the easiest: YouTube.com. All right. Well, I think that's going to do it, Frank. What do you think?

[00:36:49] Frank: I think we had fun. I can't believe we talked about tokens for 30 minutes. We'll see if we can talk about XML for 30 minutes.

[00:36:56] James: There we go. All right, well, we'll catch up with you next week. So until then, I'm James Montemagno.

Frank: And I'm Frank Krueger.

James: Thanks for watching and listening. Peace.
