Episode Transcript
Transcripts are displayed as originally observed. Some content, including advertisements, may have changed.
0:06
Welcome to Practical AI. If
0:09
you work in artificial intelligence, aspire
0:11
to, or are curious how AI-related
0:14
tech is changing the world, this
0:16
is the show for you. We
0:19
just dropped Dance Party, our
0:21
third full-length album on changelog
0:23
beats. Buy it on Bandcamp
0:25
and iTunes, or stream it on Spotify,
0:28
Apple Music, and the rest. Link in
0:30
the show notes. Thank you to our
0:32
partners at fly.io. Launch
0:34
your app close to your users. Find
0:36
out how at fly.io. Welcome
0:44
to another episode of Practical AI.
0:46
This is Daniel Whitenack. I am
0:48
the CEO and founder at Prediction
0:51
Guard. I'm joined as always
0:53
by my co-host, Chris Benson, who is
0:55
a tech strategist at Lockheed Martin. How
0:58
are you doing, Chris? Doing great today. It was nice seeing
1:00
you a few days ago in person. In
1:02
the flesh. In the flesh. Yeah, that
1:04
was great. I think you posted a
1:07
picture on LinkedIn, so if
1:09
anybody doesn't know what we look like and has
1:11
some crazy reason to want to know, there's a
1:13
smiling mug of us on Daniel's profile.
1:16
Yes. Yes. The
1:19
reason we met is I was on a client
1:22
visit on site and we were
1:24
prototyping out some stuff like chat
1:26
over your docs and natural language
1:28
to SQL stuff and all sorts
1:31
of things with Prediction Guard. One
1:34
of the models that we were using was
1:36
from Nous Research. That
1:38
works out great because we
1:40
have Karan Malhotra here, who
1:42
is from Nous Research, co-founder
1:45
and researcher there. Welcome.
1:48
Glad to have you, Karan. Hey, all. Thanks
1:50
for having me. I'm extremely excited to chat with you
1:52
guys. Yeah, like I said, I'm
1:55
a huge... Well, this is
1:57
our first time meeting, but I feel like we're
1:59
already friends. because I've had
2:01
so much of my own benefit
2:03
and interaction in working with models
2:05
from Nous Research, a lot of
2:07
amazing models that you posted on
2:09
Hugging Face and research that you're
2:12
doing. I'm wondering if you could
2:14
just give us a little bit
2:16
of a background about Nous
2:18
specifically and kind of
2:21
how you came together as
2:23
researchers and started. To me,
2:26
from the sidelines, it seemed like, oh, all of
2:28
a sudden there's these amazing models on Hugging
2:30
Face and I don't know who these people are,
2:32
these Nous Research people, but they're amazing. So
2:35
give us a little bit of the
2:37
backstory there. Absolutely. Yeah. So just as
2:39
a general overview, we are one part
2:42
like open source research organization. We put
2:44
these models out for free. We
2:46
put a lot of research out for free, some data
2:48
sets so people can build on top of these open
2:50
models. On the other
2:52
hand, we're very recently a company
2:55
as well, as a C Corp. So
2:57
we've been working pretty hard after
2:59
getting some seed funding on building
3:02
together some exciting stuff I won't go
3:04
into during this overview, but
3:07
we're continuing to do our open source
3:09
research and development and release of models
3:11
indefinitely. The way we started
3:14
is very interesting. And it would be
3:16
pretty out of nowhere from the outside,
2:18
for sure. It was extremely
3:20
fast for us. We're a
3:22
collective of people who have been playing
3:24
around in the open source language
3:26
model space for a while, ranging
3:28
from like the GPT-2 release to the Llama
3:30
release to like the first Transformers paper,
3:33
we've got people from various
3:35
eras of gen AI of when they
3:37
came in. And for myself, it was
3:40
GPT-2. I stumbled upon a
3:42
Colab notebook and started fine-
3:44
tuning, made some Edgar Allan Poe and
3:46
Lovecraft tunes. I've
3:48
done the same. That's awesome. And we
3:50
just got pulled into this
3:53
world of look at these next
3:55
token predictors that are just mashing this
3:57
matter together into the most wonderful and amazing
3:59
stories. That slowly turned into
4:01
a deeper and deeper dive of, well,
4:03
how can I use this for learning
4:05
information? How can I learn to use
4:08
this for production and automation? It's evolved
4:10
over time. For us, we started
4:13
off just working with
4:15
different open source collectives actually.
4:17
Once OpenAI released GPT-3 and
4:19
had closed sourced it, we
4:21
were used to open source GPT-2. We're like, oh man, what are
4:24
we going to do? How are we
4:26
going to continue to play with a level
4:28
of customization and interactivity that we had with
4:30
GPT-2? Then Iluther had
4:32
released GPT-J-6B, and the
4:35
KoboldAI community, this community of people
4:37
who tune models and inference models, started
4:39
to pop up, I think, around 2020,
4:41
2021 in the face of this. A
4:47
lot of us started to have places to
4:49
centralize and play with these models. We
4:51
got to contribute and learn how to
4:53
become better open source AI developers, etc.
4:57
Eventually, there was a need for
5:00
more concrete organizations to do this
5:02
focused work on the creation of
5:04
these models. We were
5:07
stuck with OK architectures for a
5:09
while, like Pythia. But thanks
5:11
to Meta, we wouldn't be
5:13
here without Meta. I'll say that
5:16
first and foremost. The great Llama.
5:18
Yeah. Prior to Llama, everyone's like,
5:20
oh, Facebook evil, my
5:22
data, etc. Here
5:24
we are. They are like the
5:26
shepherds of this new era of the open
5:28
source AI movement. When Llama
5:30
came out, there was a paper that
5:33
came out called Alpaca, from a Stanford lab.
5:36
This was about distilling data
5:38
from bigger models like
5:40
GPT-3, ChatGPT, GPT-4, and being
5:42
able to train smaller
5:45
models on that distilled synthetic data,
5:48
something they called the instruction data. The
5:51
Alpaca format really opened up the
5:53
playing field for everybody to start making
5:55
these instruct-style models, these actual for-
5:58
prod-use style models. So
6:01
there was an idea I had in my
6:03
head of, well, the Alpaca guys are using
6:05
only GPT-3.5 outputs. What
6:08
if I only generated GPT-4 outputs? It'll
6:10
be a little expensive, but you'll
6:12
probably get a better model out of it than
6:14
alpaca. At the same time that I
6:16
was looking at this, there was a
6:18
guy on Twitter named Teknium who had just
6:21
started putting together his own synthetic data set
6:23
based off Alpaca and GPT-4 only
6:25
as well. So I was
6:27
working with a group at the time called
6:29
Open Assistant, under LAION. They're
6:32
a really big nonprofit. And
6:34
while I was working on that, we had
6:36
some GPUs they were cool with us using
6:38
towards the development of new models. So I
6:40
reached out to Teknium and said, hey,
6:42
I have a little bit of compute. You
6:44
have GPT-4 data in the same format.
6:46
I have GPT-4 data in the same
6:49
format. Let's train a model. So
6:51
we trained a model called GPT4-x-
6:53
Vicuna. This model was built on
6:56
the Vicuna fine-tune. We fine-tuned a
6:58
fine-tune, basically. The Vicuna model was
7:00
an Alpaca-style fine-tune and we tried our
7:02
data set on top of it. It
7:05
was good. It was okay. Then we
7:07
thought, you know, we'll probably get a better result
7:09
if we just train on the base llama model.
7:12
And the resulting model was the
7:14
very first Hermes model. Gotcha.
7:17
The OG. The OG. And that's kind
7:19
of how it started to come together
7:22
was we both had
7:24
a data thesis of: use GPT-4
7:26
only and follow Alpaca. And
7:28
we trained on Llama and we got Hermes. And
7:31
we didn't know what benchmarks were. We didn't
7:33
know anything about any of this
7:35
stuff. We just made a model. And
7:38
it got a ton of attention. We
7:40
put it out under this name Nous Research. Nous
7:43
comes from the Greek word for intellect.
7:45
We thought it was a good name
7:47
for an AI company. But
7:50
it was just a place for, you know, fun
7:52
projects and fine tunes and stuff. It was just
7:54
a name we were using for our collaboration. And
7:57
people started swarming and asking, you know, what's
7:59
your name? Nous Research? Like what's this sudden
8:02
like mystical like open source
8:04
organization that like put out this like best
8:06
model and we're like, yeah, best model like
8:08
we just you know, we just tried something.
8:11
It was really organic. And
8:13
it got to the point that people started telling
8:16
us, you know, you must have trained on the
8:18
benchmarks, like these are doing too well. And we
8:20
were like, what's benchmarks? We're not really like
8:24
coming from an academic place as much
8:26
as from like an enthusiast place that became
8:28
so committed that it became our life, right?
8:31
It became our day to day. Yeah. So from
8:33
there, people started to ask us,
8:35
can I join Nous Research? Now,
8:37
there wasn't a Nous Research to join.
8:40
Just two guys, right? What
8:42
ended up happening was we formed a
8:44
private discord server. And we thought there's
8:46
a lot of people who range
8:48
from somebody who's like
8:50
16, 17 years old, a savant on
8:52
Twitter, hasn't even been to
8:54
college yet, insane at transformer stuff, to
8:57
mid 30s, you
9:00
know, working a really, really good FAANG-
9:02
esque job, and just wants to
9:04
really create and let loose. That was another class
9:06
of volunteer. And then you have, you know, an older
9:09
gentleman who has already exited a company or
9:11
something who has just been playing with code
9:13
for a while and wants to jump in
9:15
and hang out. So we ended up being
9:17
this really eclectic group, you know, we don't
9:19
know what your name is, we don't know
9:21
what your race is, we don't know your
9:23
gender or anything. It's just Discord profile picture,
9:25
Twitter profile picture, right? So we
9:28
came together, grew to about like 40
9:30
people all working together on various
9:33
different projects like Hermes tunes, data
9:35
synthesis, the Capybara series, context length
9:37
extension, etc. And just from
9:41
this kind of interaction between Twitter and Discord and
9:43
bringing people in that we thought were cool, we ended
9:46
up becoming what people would call an open
9:48
source research org. Yeah,
9:51
that you sort of stumbled into
9:53
creating this amazing research
9:55
organization, which is ruling the world,
9:58
which is... It's
10:01
what OpenAI might have been. Oh, well,
10:03
yeah. It's really sweet.
10:05
Thank you, guys. Yeah. And
10:07
I love it. It's so cool to
10:09
hear that story and that background. And
10:12
I see, like, in my own sort
10:14
of little snapshots here and there, like,
10:16
connecting that in my mind over the
10:18
past couple of years as I've seen
10:20
you all post different models and that
10:23
sort of thing. This is something, you
10:25
know, we've definitely touched on on the
10:27
show before, but some of our listeners
10:29
might not kind of fully
10:31
grasp when you say this sort
10:33
of, like, synthetic data sets that
10:35
you were focused on in this
10:37
alpaca format. Could you kind of
10:39
explain a little bit, like, we've
10:41
talked a lot about fine tuning
10:43
and, you know, preference tuning and
10:45
RLHF and different things, but what
10:47
does it specifically mean that, like,
10:49
you would take synthetic data?
10:51
What does that mean in your case?
10:54
And like, why does that result
10:57
in something good in fine-tuning an
10:59
open model? People might think, oh, this
11:01
is synthetic data. Why should I expect
11:03
it to, like, be any good? So
11:05
could you kind of help explain that subject
11:07
a little bit? Yeah, absolutely.
11:10
So, I mean, out of context, synthetic is
11:13
like as meaningless as, like, artificial, right? It
11:15
could be... data is data. But
11:17
in this case, it's referring to a particular
11:19
class of data that's been generated by another
11:22
language model or another AI, another
11:24
diffusion model, etc., that can actually
11:26
be used to further train models. Now, you might
11:28
say, why would you want to do something like
11:30
that? How is it helpful? What
11:32
was important to us is we were all GPU poor,
11:35
right? We were all running on laptops or maybe a
11:37
3090, maybe a 4090. As
11:40
individuals, we don't have data centers. So
11:43
training or even tuning, like, a large
11:45
model in the early days, like, 70
11:47
billion parameters, something like that, was just
11:49
unfeasible for us. And knowing
11:51
that GPT-3 is like something like 175 billion parameters
11:53
and 3.5 and 4 can only
11:57
go up from there, the
11:59
question became how can we make these
12:01
small 7 billion parameter
12:03
models even compete with
12:05
these massive ones? These
12:08
ones that I want to run offline, these ones that
12:10
I might want to run on an edge device, on
12:12
a phone, on a drone, etc. How can
12:14
I make them even useful? So
12:16
there's two things to talk about here. One
12:18
is synthetic data and the other is distillation.
12:22
Synthetic data is just referring to any
12:24
kind of data that's created by a
12:27
model in this case. The
12:29
reason that's useful is in
12:31
particular distillation. So if I told
12:33
you to go
12:36
study comp sci for 10
12:38
years, for example, and put in that massive
12:41
time investment and really focus on general
12:43
programming, and then I told you
12:45
now it's time for you to learn about AI and
12:47
transformers and stuff and put you through all the math
12:50
prerequisites, etc. You're going to come
12:52
out with a really strong foundation of
12:54
how to do the work, but the problem
12:56
is you've put in a massive time investment.
12:58
Now, if I take that guy who spent
13:01
10 years doing engineering, another
13:03
five years doing AI, and I ask
13:05
him, hey, can you teach somebody like
13:08
just really important, like compressed tidbits that
13:10
will help them just get up and
13:12
running to do the work? That's data
13:15
distillation, right? That's knowledge distillation. So you
13:17
look at these big models, like
13:19
a Claude or a 70B model or GPT-4,
13:21
and you can see like, they're amazing. They're
13:24
brilliant at everything. They have a bunch of
13:26
high-quality data they're trained on, and they have
13:28
a bunch of low-quality data
13:30
they're trained on that they
13:32
can interact with and express
13:34
in a high-quality form. So
13:36
instead of me having to read a massive
13:39
10 pager for why some
13:42
chemical reaction or some like tax-based process, whatever
13:44
you want it to be, like, instead of
13:46
reading a massive document on that and then
13:49
feeding that to a language model, we
13:51
can just have that really smart model
13:53
that already understands it really well, compress
13:55
that information into an instruction
13:59
or into a conversation, into like
14:01
two sentences, three sentences, five sentences,
14:03
like half a page. And
14:06
we can just train a much smaller model
14:08
on that compressed information.
14:12
And it will learn the compressed
14:14
information, you know, to the degree
14:16
that a language model learned something, you know,
14:18
not perfectly. But because of that, what
14:21
the Alpaca guys did was they generated
14:23
a bunch of seed tasks from GPT-
14:25
3.5 across various different domains
14:27
and topics and created these kind
14:29
of compressed instructions with the instruction
14:31
and input question from the user
14:33
and then an answer. So
14:36
the instruction could be like, given the following
14:38
math equation, explain step by step why
14:40
this is the answer. And then
14:43
the input is the equation, which is your
14:45
question. And then the output is the compressed
14:48
answer. So all of that we can
14:50
take as one sample in the data
14:52
set, and we can make hundreds of
14:54
thousands or millions of samples like that
14:56
of various different domains and various different
14:58
tasks. So the Alpaca guys did this, less
15:01
than 100k examples, I believe, and they
15:03
trained the llama models on
15:06
these, and they found massive
15:08
boosts to performance, that this
15:10
distilled information, like a human's,
15:12
successfully compresses and transfers over.
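To make the instruction/input/output sample described above concrete, here is a minimal Python sketch. The field names follow the Alpaca format; the preamble wording in the template is a close paraphrase of the Alpaca project's prompt, not an exact quote, and the math example is made up for illustration.

```python
# One Alpaca-style training sample: an instruction, an optional input,
# and the distilled output from the teacher model.
sample = {
    "instruction": "Given the following math equation, explain step by "
                   "step why this is the answer.",
    "input": "2x + 6 = 10, x = 2",
    "output": "Subtract 6 from both sides to get 2x = 4, then divide "
              "both sides by 2, giving x = 2.",
}

# Prompt template in the style popularized by Alpaca (paraphrased).
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n{output}"
)

def to_training_text(s: dict) -> str:
    """Render one sample into the flat text string a fine-tune trains on."""
    return ALPACA_TEMPLATE.format(**s)

print(to_training_text(sample))
```

Hundreds of thousands of rendered strings like this, one per sample, are what the fine-tuning run actually consumes.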
15:15
So when I saw that, and then independently when
15:17
Teknium saw that, and then independently when many others
15:19
saw that we were like, this
15:21
is so intuitive. This is exactly
15:24
how I've learned anything by just
15:26
going on discord and Twitter and bothering people to give
15:28
me the compressed bit of how I do something. We
15:31
should try doing this with even higher quality models than
15:33
3.5. So we
15:36
created, I can't remember the exact
15:38
number at the moment, but at least 50,000, maybe
15:42
100,000 examples originally for Hermes 1,
15:44
like this, just using GPT-4.
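As a rough sketch of that distillation loop: seed tasks go to a strong teacher model, and each completion becomes one instruction sample. The `teacher_complete` function below is a stub standing in for a real GPT-4 API call (an assumption for illustration, so the sketch runs offline); the seed tasks are likewise invented.

```python
import json

def teacher_complete(instruction: str, input_text: str) -> str:
    """Stub for a call to a strong teacher model (e.g. the GPT-4 API).
    In practice this would be a network request; here it just returns
    a placeholder string so the sketch is self-contained."""
    return f"[distilled answer for: {instruction} / {input_text}]"

# Hypothetical seed tasks spanning different domains.
seed_tasks = [
    ("Explain this chemical reaction step by step.", "2H2 + O2 -> 2H2O"),
    ("Summarize the following tax process in two sentences.", "..."),
]

def build_dataset(tasks):
    """Query the teacher once per seed task; collect Alpaca-style rows."""
    rows = []
    for instruction, input_text in tasks:
        rows.append({
            "instruction": instruction,
            "input": input_text,
            "output": teacher_complete(instruction, input_text),
        })
    return rows

dataset = build_dataset(seed_tasks)
# Datasets like this usually ship as JSON lines, one sample per line.
print("\n".join(json.dumps(r) for r in dataset))
```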
15:47
And then we trained on that
15:49
and ended up getting performance that
15:52
was extremely, extremely like massive boost
15:54
compared to the other models that
15:56
were not trained using this kind
15:58
of method. So without
16:01
these giants that have
16:03
already established themselves in the space, we wouldn't
16:05
be here. Like without OpenAI, without Meta,
16:07
like we literally wouldn't have the model and
16:10
the data to do the kind of work
16:12
that we did to make Hermes. What
16:14
it allowed for us is like for
16:17
local models to finally be like comprehensible
16:19
and for us to finally have like
16:21
offline capabilities to kind of take the
16:23
good stuff from something like GPT four
16:26
or something else and make
16:28
it uncensored. So it still has all
16:30
this understanding of all these
16:32
topics, but it doesn't have all that
16:34
RLHF inside it necessarily that safety-izes
16:37
it, so that when people
16:39
utilize the model, it has all this intelligence, but
16:41
it has more freedom of thought to kind
16:43
of converse with you on topics that
16:45
OpenAI may reject. Gotcha. One of the things
16:47
I was curious about as you were going
16:49
through that was a few episodes
16:51
back, Daniel and I were kind of talking
16:53
about the effect of model licensing, you know,
16:55
on the community and the different
16:58
kind of licensing concerns that were coming out
17:00
from, whether it be, you know, Meta,
17:02
OpenAI, you name the organization, is that ever
17:04
a challenge for you since you're kind of
17:06
using those to get started in terms of the
17:09
inputs, has that been a
17:11
concern or do you anticipate it
17:13
being a concern? I think that
17:15
of course, like generally like the
17:17
US and international regulation on this stuff
17:19
is evolving. The conversation is evolving
17:21
very much. So naturally there's like,
17:23
you have to keep it top of mind. You
17:26
have to think about these kinds of things. But
17:28
thankfully, because all of our model releases are like
17:30
open source and we don't profit from them. Like
17:32
if somebody goes off and creates a product using
17:34
our model, you know, good for
17:37
them, but we don't necessarily take on
17:39
that liability or that worry of saying,
17:41
Hey, like we're going to sell you
17:43
this model that was created with GPT-
17:45
4 outputs. We actually actively try
17:47
to stay away from doing that. But
17:50
because the data distillation paradigm is so
17:52
effective, you know, if a model comes
17:55
out that's better than GPT-4, and it's open
17:57
source and I can use it locally. And
18:00
in their TOS, it says, you know, you
18:02
can use this to make a commercial model,
18:04
then we can apply the same techniques that
18:07
we've been preparing and researching and understanding from
18:09
these closed models and use it there. So
18:11
right now, like, we don't stand to or
18:13
try to or have any plans
18:16
to profit from using any of these outputs.
18:19
We're not about that, because we want to
18:21
be careful and respectful of these model creators.
18:23
But that and these companies, but that being
18:25
said, we're learning all these techniques
18:27
and developing all these techniques that will be
18:30
useful for when that time comes and for
18:32
when that's available, especially with the advent of
18:34
something like Mistral. If we
18:36
do distillation from a Mistral model like Mistral
18:38
Medium or something like that, that's
18:41
completely, from my understanding, you know,
18:43
barring their TOS saying otherwise, but
18:45
I believe it doesn't. It's
18:47
completely okay in that situation
18:49
for us to create models like this
18:51
that can be used commercially, etc. Regarding
18:54
the TOS stuff, though, like, as
18:57
much as we err on the side of caution, I
19:00
find it hard to see
19:02
a company enforce their TOS
19:05
when these larger models
19:07
are likely trained on
19:11
not all copyright free stuff.
19:13
Like, I'd be
19:15
hard pressed to believe that these
19:17
closed source companies, their models are,
19:19
you know, totally copyright free and
19:21
totally copyright clean. So if
19:24
some other company that was feeling a little
19:26
more rambunctious than ourselves was to say,
19:28
you know, we're going to commercially release some
19:30
of this, I imagine it'd
19:33
be difficult for them to
19:35
be come after without the other group
19:37
opening their books. And there's actually
19:39
a pretty interesting interaction that happened regarding
19:41
this between Google and
19:44
OpenAI, if you guys are familiar. So
19:47
yeah, I saw this interesting picture the other
19:49
day, it was like the interesting web of
19:51
AI, and it was like how Microsoft,
19:54
Google, OpenAI, like, it's like on
19:56
one side, there's the ones and it
19:58
shows how they're connected to
20:00
the other ones is like
20:02
this visualization and like how
20:05
many of them overlap in
20:07
these strange ways between like,
20:09
whether it's together or Mistral
20:11
or Meta, Google, Microsoft, OpenAI
20:13
is sort of very interesting
20:16
web of connections that probably
20:18
makes some of these things rather difficult.
20:20
Leave it for the lawyers to sort
20:22
out. Yeah. Yeah, that's the
20:24
thing is like, we can look at an
20:27
example, right? Like you hear that phrase like
20:29
good artists copy, great artists steal, right? Like, so
20:31
the data distillers, we're copying, right? Like, we're
20:34
just distilling this information, like we're trying to
20:36
like, make our models more like those. And
20:38
we don't really plan to commercialize, we're just
20:40
doing it for free for everyone. But the
20:43
great artists are, you know, Google, you know,
20:45
like, you look at Bard, and
20:47
it tells you, you know, I was made by OpenAI.
20:49
Now, it's fine for our open source model to say I
20:51
was made by OpenAI, because we're very transparent that this
20:53
is trained on GPT outputs. But when
20:55
Bard violates the TOS with a paid
20:57
product, bold,
21:00
yeah, that sounds like I was trained by OpenAI, right?
21:03
You think that OpenAI would come
21:05
after this multi billion dollar company,
21:07
like immediately, right? Instead,
21:09
first, you see
21:11
Google deny it, then you see
21:13
a tweet from Sam Altman, which was something
21:15
along the lines of, I'm paraphrasing,
21:17
something along the lines of, I'm not
21:19
mad that they trained on our outputs. I'm
21:21
mad that they lied about it. And
21:24
I'm sitting there like, okay, you're mad
21:26
about this. But like, aren't
21:29
you going to pursue the legal action in
21:31
your terms of service? No, no, because everyone
21:33
would have to open their books up. That
21:35
being said, I don't condone the
21:38
commercial use of that kind of stuff.
21:40
Like making a paid model
21:43
from GPT-4 outputs, like, I wouldn't advise
21:45
anyone to sell a model made with that, just
21:47
because like, you know, we want to
21:49
respect people's like TOS and stuff. They worked
21:51
hard and spent billions, or hundreds of millions,
21:53
however much they spent, to make this stuff.
21:56
But there is certainly room for
21:58
hypotheticals in
22:00
that realm of the law
22:03
courts. So that's my thoughts
22:05
on the licensing stuff. And that's definitely my
22:08
own individual thoughts. Like we're
22:10
a pretty decentralized collective at Nous.
22:12
So you'll find people with all
22:14
sorts of opinions all over the
22:16
place. And as a company, we
22:18
don't hold any view whatsoever on
22:20
that. Yeah, I'm wondering, maybe this
22:22
gets a little bit to the
22:24
distributed nature of this, but I
22:26
know that there's sort of various
22:28
collections of what the Nous
22:31
Research Group has done over
22:33
time. You mentioned Hermes, but
22:35
then there's these other kind
22:37
of categories of things too,
22:39
like the YaRN models, Capybara,
22:41
Puffin, Obsidian, just looking over
22:43
the Hugging Face now. I'm wondering if you
22:45
could just give us, like from your perspective,
22:47
a little bit of a map of
22:50
these different things and like how
22:52
people might categorize the different collections
22:55
of what Nous has done. I
22:57
definitely want to talk about like the future
22:59
things and ongoing things as well, but
23:02
as it stands now, what are
23:04
the kind of major categories of
23:06
what the collective has invested
23:09
their time in over time? Certainly,
23:11
certainly. So within the stuff
23:13
that's viewable on Hugging Face at least, we've
23:15
got the Hermes series of which,
23:18
like I told you guys, the initial
23:20
story of how it went down. But
23:22
from there, Teknium kept going. I haven't
23:24
personally had any interaction with the Hermes
23:26
model since the initial. From there, Tech
23:28
just continued to create more and more
23:30
synthetic data, collect from more and more
23:32
sources, use more and more open data
23:35
sets. And he's just got the, I
23:37
guess, award-winning data thesis. The guy
23:40
really knows how to go about
23:42
curating and synthesizing good data. So
23:45
Teknium, it's his baby, the
23:47
Hermes project. So everything you've seen since is
23:49
really his work and anyone who's kind of
23:51
collaborated with him, but, like,
23:54
you almost can't call it a solo project,
23:56
because of the open data sets we use too.
23:59
Everything is built on the shoulders of giants and
24:01
the shoulders of each other as little people. But, uh,
24:03
Tech really has helmed the Hermes initiative
24:06
so far. I think that's our most
24:08
popular model series and he released the
24:10
OpenHermes as well, because we
24:12
had some data in the original Hermes that
24:14
we never released publicly and, uh, we wanted
24:17
to make that kind of an option for
24:19
everybody. So that's Hermes. It still
24:22
follows the same kind of philosophy of
24:24
synthetic data. And it now uses the
24:26
ChatML format instead of the
24:28
Alpaca format, is what we kind of upgraded to.
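For readers who haven't seen it, ChatML wraps each conversational turn in special `<|im_start|>`/`<|im_end|>` markers with a role header, rather than Alpaca's `### Instruction`/`### Response` sections. A minimal sketch; the message contents here are invented for illustration:

```python
def to_chatml(messages):
    """Render a list of {role, content} messages in ChatML style,
    one <|im_start|>role ... <|im_end|> block per turn."""
    return "\n".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>"
        for m in messages
    )

conversation = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What does 'nous' mean?"},
    {"role": "assistant", "content": "It is the Greek word for intellect."},
]
print(to_chatml(conversation))
```

Because every turn carries an explicit role, the format handles multi-turn chat and system prompts naturally, which single-shot instruction templates like Alpaca's do not.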
24:31
Then you've got Capybara and Puffin, which
24:33
are both done by a volunteer and, uh,
24:36
you know, OG member LDJ. You
24:38
may be familiar with Luigi Danielle Jr. So
24:41
the Capybara series was, uh,
24:43
using an Amplify-Instruct method,
24:45
this novel method that, uh,
24:48
LDJ had worked on alongside another
24:50
one of our researchers, Jay. So
24:53
LDJ and Jay can get confusing, but,
24:55
uh, the two of them worked
24:57
on the Capybara series, created the dataset,
24:59
trained the models, and then Puffin was,
25:01
uh, the idea of using handpicked
25:03
smaller samples from some of our
25:05
larger datasets to make sleek datasets for
25:08
an easy tune and see how
25:10
that works kind of, uh, in the
25:12
spirit of the LIMA paper, where
25:15
they just used a few examples to get really
25:17
good results. Those are really
25:19
the popular tunes using synthetic data
25:21
for like general use. YaRN
25:24
is this novel context length extension
25:26
method, at the time of creation,
25:28
by Emozilla, also known as
25:30
Jeffrey Quesnelle, and Bowen Peng,
25:32
also known as bloc97, alongside,
25:36
uh, Enrico Shippole and,
25:38
uh, EleutherAI. So what
25:41
happened there was these guys were already looking
25:43
into context length extension for a while.
25:45
And, uh, when we kind of
25:47
came under the Nous banner to do the work,
25:50
uh, it opened up a little bit
25:52
of resources from compute sponsorships. It opened
25:54
up a more centralized place for them
25:56
to be able to do that collaboration.
25:59
I had no hand in the
26:02
YaRN models whatsoever. And that's the exciting
26:04
thing is everyone really gets to
26:06
work in their own spheres, in their own kind
26:09
of autonomous circles. And then we just check
26:11
in and see, you know, how's the research going?
26:13
How's it coming along? Because we really work with
26:15
people that we heavily believe in and we believe
26:17
in their idea. So if
26:20
we don't already have an idea, we're kind of
26:22
just saying, you know, please freely create because
26:25
we brought you in because what you will
26:27
freely create will push forth our agenda anyway.
26:29
So I think those are our big model
26:32
releases and series that we have available. Outside
26:35
of that, we have a bunch of stuff
26:37
on our GitHub as well. Stuff
26:39
that's being worked on, stuff that hasn't necessarily come
26:41
out yet. There's a lot of that. So
26:45
I got a question for you as a follow up. It's
26:47
pretty fascinating, the story that you've been telling
26:50
us here, because of that kind of organic,
26:52
you know, creation of the organization or collective.
26:56
And I'm wondering as you've done that and you
26:58
kind of went through and talked about the different
27:00
model groups, and kind of talked about, you know,
27:02
the owners or spiritual owners, if you will, of
27:04
each of those families, how do
27:06
the different members of the collective interact
27:09
to kind of share? Like how do
27:11
you each push each other along or
27:13
share information or give ideas so
27:16
that cross-family efforts can kind of benefit
27:18
from the overall collective? And as you
27:20
said now, a C corp and you
27:23
guys are more organized at this point.
27:25
So what kind of culture has developed
27:27
around those communications and learnings? Yeah,
27:30
absolutely. I mean, when it started, it was
27:32
just like a small discord, maybe like 10
27:34
people. From there, like we kind of
27:36
created more channels as people wanted to work on
27:38
more things. And we had
27:40
initially split up into like three,
27:43
four different topics or sectors that people
27:45
could assign themselves to. One
27:47
being data synthesis, of course, so we can
27:49
kind of find new novel methods and formats
27:51
for distillation and the creation of synthetic data.
27:54
One being training, like people who are
27:56
just like really good at training, hyperparam
27:59
stuff, people who will come
28:01
up with new architectures and new
28:03
techniques. And another being agents, a group
28:05
of people who want to actually try to build
28:07
tools and do autonomous work with this stuff.
28:09
then we have this one category that at
28:11
the time was a prediction for the future:
28:14
the simulation group. We were
28:16
very interested in kind of bringing this stuff
28:18
into simulation and Unity and seeing
28:20
how all these things came together. And it's
28:22
interesting because the training builds
28:24
on the data synthesis, the agents build on
28:27
the training, and then the sim would build
28:29
on the agents. So the idea was
28:31
that everybody needed to work together, because all
28:33
those things are so intrinsically connected, but people
28:36
would have specializations and could float
28:38
wherever they wanted to work. We
28:40
didn't end up doing a lot on the sim
28:42
side of things. Now recently, there's a lot more
28:44
interest there, because we have a
28:46
lot more, you know, capability generally, as
28:49
the AI community does, you know. But
28:51
as we've grown: we went
28:53
to forty people, it was fine. But
28:55
going to like five thousand, like with
28:57
the Discord, it's a little different.
28:59
So what we do is we
29:02
kind of tier people in. You come into
29:04
the Discord, you get to see maybe two channels,
29:06
and then we'll give people a developer role.
29:08
We don't really let people select their
29:10
own roles, because we want to kind
29:12
of naturally sort through people, vet them and
29:15
let them through. And even as we do
29:17
open source research, a lot of it is unreleased,
29:19
and we want to make sure that it's
29:21
kind of protected before release. So we
29:23
create this developer role so people
29:26
can then see, like, way more
29:28
channels of just general development and developer
29:30
conversation. And from there, as we
29:32
see, you know, contributors who have
29:35
started to do more work or show
29:37
more passion towards contributing to Noose in
29:39
a particular field, or who have some
29:42
reputation or some portfolio in a particular
29:44
field, then we'll assign them one of
29:46
those roles. And that will open
29:48
up the family of channels relating to
29:50
those roles and our current projects surrounding
29:52
that role. So there's data synthesis
29:54
projects, agent projects, training projects, etc. So
29:56
we kind of just tier it out
29:59
so people can progress. And
30:01
people who have been around for a while, or people
30:03
we consider fellows or part of the core,
30:05
they can usually see pretty much everything. So
30:08
They're pretty effective in serving as
30:10
coordinators for the cross-communication between
30:12
these different channels and groups. And
30:15
even if something has, like, a
30:17
particular... ah, someone has a particular
30:19
role, or a channel has a
30:21
particular role it's supposed to be a
30:23
part of, like, it's still Discord and
30:26
we're still very chill, so, like, people
30:28
will still work on various different
30:30
overlaps instead of just one thing.
30:44
Friends, you know
30:46
that artificial intelligence is revolutionizing
30:48
the way we produce information,
30:50
changing society, culture, politics, the
30:52
economy. But it's also created
30:54
a world of AI-generated
30:56
content, including deepfakes.
30:58
So how can we tell what's
31:00
real online? Read Write
31:02
Own: Building the Next Era of
31:04
the Internet, a new book
31:06
from entrepreneur and investor Chris
31:08
Dixon, explores one possible solution
31:11
to the internet's authenticity problem:
31:13
blockchains. From AI that
31:15
tracks its source material to generative
31:17
programs that compensate rather than
31:19
cannibalize creators, Read Write Own is
31:21
a call to action for a
31:23
more open, transparent, and democratic internet,
31:25
one that opens the black box
31:27
of AI, tracks the origins
31:29
of what we see online, and much
31:31
more. It's time to
31:33
reimagine the world's most important technologies and
31:35
build the internet we want, not
31:37
the one we inherited. Order your
31:39
copy of Read Write Own today,
31:41
or go to
31:43
readwriteown.com and learn more.
31:51
And I
31:57
have a selfish question.
32:00
Which, now that I think about it, is one of
32:02
the advantages of doing a podcast: I
32:04
get to talk to all the amazing
32:06
people doing amazing things and learn from
32:09
them. But I'm wondering, as a person
32:11
who is also trying to fine-tune
32:13
some models, either just for my
32:15
own enjoyment and learning but also
32:17
fine-tuning models for specific tasks and
32:20
specific, ah, customer use cases
32:22
and that sort of thing, there's a
32:24
lot of people out there, I think
32:26
many of our listeners, who are thinking
32:28
like this. Since you, being part of
32:31
this collective, have worked, you know,
32:33
since the sort of dawn of
32:35
these, you know, the proliferation of
32:37
fine-tunes from Llama, Mistral, etc.,
32:39
and as you've seen all that, as
32:41
you're doing more and more fine-tunes
32:43
now, as you're looking towards the future,
32:46
do you have any kind
32:48
of good advice or things
32:50
to keep in mind for all those
32:53
fine-tuners out there that are
32:55
thinking about grabbing something off of Hugging
32:57
Face, creating their own versions of these
32:59
models? Maybe they have their own ideas
33:02
about a specific take on a
33:04
model. Any general tips that you found
33:06
to be really useful over time, or
33:08
like pitfalls that you'd like to highlight?
33:11
Yeah, I mean, I can
33:13
try to think of a few off
33:15
the top of my head. I'll say that
33:18
hyperparameters are really important, and,
33:20
ah, it's important to try to get that right.
33:23
It's going to vary from model to model, but
33:25
a lot of the time, some people think
33:27
hyperparams, like, don't really matter as much,
33:29
not to obsess over them, and some people think
33:31
it's the secret sauce as well. So I'd
33:34
say, like, try to do a lot of
33:36
research into good hyperparams, a good
33:38
learning rate. I'd also say, like,
33:40
and I could be totally wrong about this, as
33:43
I'm not the trainer of Hermes today or
33:45
a lot of these models, but something I
33:47
personally believe in a lot is: ignore
33:49
people telling you to only train for
33:51
x amount of time. If you're
33:53
not overfitting, like, just keep going. If you
33:56
have the compute, keep training
33:58
and keep going. Train for
34:00
more tokens, more epochs. That's something I
34:02
heavily believe in. In terms
34:04
of trainers to use, there's a lot
34:06
of people who make their own scripts
34:08
for specialty stuff. And there's, of
34:11
course, you can just use Hugging Face. The
34:14
library we use is called
34:16
Axolotl, A-X-O-L-O-T-L, like the animal.
34:20
It's by Wing Lian
34:22
of the OpenAccess AI Collective. We
34:24
think Axolotl is probably the best
34:26
general purpose trainer for LoRAs, QLoRAs,
34:28
fine-tunes, et cetera. Any
34:31
open source repository has bugs and stuff
34:33
you're going to have to work out.
34:36
But it's, in my opinion,
34:38
probably the easiest and most effective
34:40
trainer to use for pretty much
34:42
any model architecture available right now.
34:45
So I definitely point everybody towards
34:47
Axolotl. Awesome. Yeah, that's
34:49
super useful. We'll share some links in
34:52
our show notes as well. So people
34:54
make sure and check that stuff out.
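To make the hyperparameter advice above a little more concrete, here's a minimal sketch of one of the knobs mentioned, the learning rate, treated as a schedule with linear warmup and cosine decay, a setup many fine-tuners use. The function name and default values here are our own illustration, not taken from Axolotl or any specific training run:

```python
import math

def lr_at_step(step, total_steps, max_lr=2e-5, warmup_steps=100, min_lr=0.0):
    """Linear warmup to max_lr, then cosine decay down toward min_lr."""
    if step < warmup_steps:
        # Ramp the learning rate up linearly during warmup.
        return max_lr * (step + 1) / warmup_steps
    # Progress through the decay phase, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))

# The peak LR is reached right after warmup and decays toward zero.
schedule = [lr_at_step(s, total_steps=1000) for s in range(1000)]
```

The point is only that the learning rate is usually a schedule, not a single number, which is part of why it rewards per-model research rather than copy-pasting defaults.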
34:56
Another interesting question, as
35:00
you see, I
35:02
think we saw these waves
35:04
of models that came out
35:06
maybe around synthetic data fine-tunes,
35:08
or other types of fine-tunes.
35:10
I see this interesting thing
35:12
happening over the past, however
35:17
many months, not that long in the
35:19
scheme of things, but in the AI
35:21
world, maybe a while, where we are now.
35:24
There's a lot of interesting approaches,
35:26
more so than just fine-tunes, but
35:28
mixture of experts and merging, and
35:31
of course, multimodal stuff coming out. Now
35:33
I see Noose kind of dabbling in that.
35:35
You don't have to answer for the whole
35:37
collective. But as there's so many of these
35:40
things coming out and different approaches, what
35:42
are some of the things within
35:44
that? It doesn't have to be one
35:46
of those. But what are some of the things on
35:48
your mind moving forward or
35:51
on Noose's mind more
35:53
generally? Sure. I'll try to
35:55
go from simple to
35:57
complex on the kind of stuff. I
36:00
think that definitely just like straight
36:02
up instruction tuning is great. There's
36:05
other ways to tune like the Evol
36:07
instruct method. I would advise
36:09
people to try to create new instruction
36:12
methodologies that allow us to make even
36:14
better formatted data. People don't
36:16
spend enough time trying to create new instruct
36:18
formats. And we've definitely been
36:20
guilty of not doing that as well. So
36:22
I think towards the general community, it's a
36:24
really easy place to get started. You don't
36:26
need to really know how to code so
36:29
much as think about how a human might
36:31
more effectively phrase something or format something
36:34
and kind of remix from there. I think that's
36:36
like probably the easiest place to start. Then
36:39
there's a model merging, right? Model merging
36:41
is great. You can just like take
36:43
two models and Frankenstein them together to question
36:45
mark results. You know, you got to
36:47
just try and see what happens and feel it out.
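As a hedged sketch of the simplest version of that Frankensteining, a linear weight merge averages corresponding parameters from two models with a blend ratio. Real merges operate on framework state dicts, and dedicated tools add fancier methods; here the "models" are plain dicts of float lists purely for illustration:

```python
def merge_models(a, b, alpha=0.5):
    """Linearly interpolate two 'state dicts': alpha * a + (1 - alpha) * b.

    a and b map parameter names to equal-length lists of floats; both
    models must share the same architecture (same keys, same shapes).
    """
    if a.keys() != b.keys():
        raise ValueError("models must have identical parameter names")
    merged = {}
    for name in a:
        wa, wb = a[name], b[name]
        if len(wa) != len(wb):
            raise ValueError(f"shape mismatch for {name}")
        merged[name] = [alpha * x + (1 - alpha) * y for x, y in zip(wa, wb)]
    return merged

# Two toy "models" with one parameter tensor each; alpha=0.5 is a plain average.
m1 = {"layer.weight": [1.0, 2.0]}
m2 = {"layer.weight": [3.0, 4.0]}
print(merge_models(m1, m2))  # {'layer.weight': [2.0, 3.0]}
```

Whether the merged model is any good is exactly the question-mark part: you try it and feel it out, as the speaker says.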
36:50
Then from there, I would say
36:52
there's stuff like DPO. There's
36:55
RLHF, DPO, these kind of rewards
36:57
things that can let you like
37:00
enable rejections or create censorship or
37:02
put some kind of general concept
37:04
or attitude towards the model. We
37:07
found that to be pretty effective with the
37:10
latest Noose Hermes Mixtral DPO. It
37:12
seems like people really like it and prefer it over just
37:14
the SFT. So
37:17
that's another thing that I'd heavily recommend. From
37:19
there, we get a little more
37:22
complex. We have some reward model
37:24
stuff we're working on that I won't speak to just
37:26
yet outside of saying we're working on it that we
37:28
think is going to be like pretty big for reasoning
37:30
boosts. Of course, there's techniques like
37:32
chain of thought and tree of thought for like
37:34
multi-step prompting. Creating
37:37
data sets even out of that for
37:39
any of these purposes that are already mentioned is
37:41
going to be really effective. Now,
37:44
on to stuff that maybe not everybody can do, though
37:46
actually a lot of people would already be able
37:48
to do this. Here's something that we
37:50
like to call, over at Noose, activations hacking,
37:53
where you're kind of messing with the
37:55
way that a model... I'm trying to think
37:57
about how to say this in, like, the
37:59
most layman's terms... like, you're trying to
38:01
mess with how a model, like, generally vibes
38:04
about something. So
38:06
rather than just doing a system prompt or something
38:08
like that you can actually like
38:10
change the model's vectors to kind
38:12
of be, like, more political about something,
38:14
less political about something, more terse, more
38:16
specific. Again, that's far more effective
38:19
control over a model than a system
38:21
prompt. It's basically like a system prompt
38:23
that, like, tells it to embody certain
38:25
characteristics, but it's not something you can
38:27
really jailbreak or get around, as
38:30
far as my testing has shown, certainly not
38:32
as easily as a system prompt. Like,
38:35
we have no problem jailbreaking even the
38:37
most censored closed models today; like,
38:39
it can be done by anybody with
38:41
the right words, right? But, um, this activation
38:44
stuff really creates a bit more of
38:46
a robustness and fidelity to the concepts that
38:48
you're trying to tell it to embody. There's
38:51
a few more I'm trying to think of that would
38:53
be useful for people. One
38:56
thing is soft prompting. It's not really
38:58
around anymore; it used to be pretty
39:00
big during the GPT-J, like, pre-Llama
39:02
days, when the KoboldAI
39:04
guys really pioneered the use of it in
39:06
the open source community. But a
39:08
soft prompt basically takes a massive prompt
39:10
and compresses it down to, like, way
39:12
fewer tokens, so you can give your
39:14
model, like, a huge
39:17
system prompt or a huge amount of information
39:19
and use way fewer tokens. So
39:22
soft prompting is cool. It's not gonna be
39:24
too difficult to, like, update it for, like,
39:27
Llama, Mistral, today's architectures; it's just that
39:29
nobody has really done it that I've seen.
39:32
So, you know, to the community: if
39:34
you guys do that, please share. That's
39:39
actually much easier than the activation
39:41
stuff I think and then finally
39:43
probably the hardest unsolved is like
39:46
sampling methods. Like, today
39:48
we use, like, top-k, top-p,
39:51
like, you know, nucleus sampling, et cetera, or
39:53
whatever. Like, there's better ways to
39:55
pick tokens, for sure. There's better ways
39:57
to judge the value of tokens, for
39:59
sure. Everyone has been too
40:01
kind of concerned with higher levels to
40:03
get that low-level and do whatever
40:06
the magic math is, that I can't do, that
40:08
would, you know,
40:10
enable some steering and
40:12
even some beyond-steering,
40:14
like, alternative sampling paradigms.
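For listeners who haven't seen the samplers being referenced, here's a minimal pure-Python sketch of top-p (nucleus) sampling: keep the smallest set of highest-probability tokens whose mass reaches p, renormalize, and draw from that set. This is the standard baseline the speaker is saying we should move beyond, written without any framework for clarity:

```python
import random

def top_p_sample(probs, p=0.9, rng=random):
    """Sample a token index using nucleus (top-p) sampling.

    probs: list of probabilities over the vocabulary (sums to 1).
    Keeps the smallest set of highest-probability tokens whose mass
    reaches p, renormalizes, and samples from that nucleus.
    """
    # Sort token indices by probability, highest first.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, mass = [], 0.0
    for i in order:
        nucleus.append(i)
        mass += probs[i]
        if mass >= p:  # stop once cumulative mass reaches the threshold
            break
    # Renormalize within the nucleus and draw from it.
    r = rng.random() * mass
    cum = 0.0
    for i in nucleus:
        cum += probs[i]
        if r <= cum:
            return i
    return nucleus[-1]

# With p=0.5, only the 0.6-probability token survives, so it is always picked.
probs = [0.6, 0.3, 0.1]
print(top_p_sample(probs, p=0.5))  # 0
```

Notice how crude the cutoff is, a fixed probability-mass threshold with no notion of what the tokens mean, which is the gap the speaker is pointing at.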
40:17
And I think that would probably
40:19
bring the biggest change in transformation
40:21
to literally all models regardless of
40:24
the tune, regardless of the architecture,
40:26
etc., if it can get pulled
40:28
off. So really looking forward to something like
40:30
that happening in the space. That
40:32
was a lot of really good advice that you have
40:34
there. I was sitting there trying to take notes while
40:37
you were talking through it, going, wait, but
40:39
he said that too, and he said that too.
40:41
There's a really good answer there. Thank
40:44
you for that. As we're starting to wind
40:46
up here, I wanted to ask
40:48
you, I know about as we're recording this, it
40:50
looks like it was just over three weeks ago,
40:53
about four weeks ago by the time we release this
40:55
episode. You guys announced your
40:58
$5.2 million seed financing
41:01
round. So congratulations on
41:03
that. That was pretty amazing. Thank you.
41:06
And I'm kind of wondering, so like you've
41:08
kind of started with this kind of fairy
41:10
tale story of kind of organically building from
41:12
the ground up, you know
41:14
yourself, you connect with somebody else,
41:17
a few other people join, you
41:19
get to thousands of people contributing
41:21
and really producing amazing
41:23
work. And then
41:25
you're incorporating and now you got
41:27
the seed round coming. Where
41:29
does that lead you? It's kind of a sky's
41:32
the limit kind of scenario it seems, you know,
41:34
that now that you're kind of launching
41:36
and, you know, on that, you know, as
41:38
a corporation, as you said, where can
41:40
you go from here? What do you anticipate
41:42
over the next couple of years or
41:45
even several years out? You know, what's the vision?
41:47
What do you want to achieve? You've come a
41:49
long way so far. What's next? AGI.
41:51
No, I'm just kidding. I'd believe
41:55
you if you said it, actually. I
41:58
mean, like, you know, someone will do it. But
42:00
then you'll distill the knowledge.
42:03
Then we'll distill and then you'll run
42:05
the API on your neural link, on
42:07
your contact lens or something. But
42:12
for us, there's a huge focus
42:14
on locality. There's a huge focus on offline.
42:16
There's a huge focus on take the power
42:18
back, run the model yourself, do everything at
42:20
home. That's big for us.
42:22
And at the same time, of course, we believe in scale.
42:24
But there's this idea that there's so much unsolved at
42:26
the small model size. Why don't we
42:29
do that before we go to a trillion
42:31
params? Because we can scale those realizations. But
42:34
for us, there's certainly a transformation
42:36
and change in attitude and pressures
42:38
from going from pure open source
42:41
volunteer to as well having this
42:43
more corporate branch created as well.
42:46
But that being said, it's been pretty
42:49
consistent, our ethos and our motivation for
42:51
why we do this. And
42:54
like you said, it really was organic in
42:56
the sense that we're a product of the
42:58
times, we're a product of the atmosphere of
43:00
the community. People have said
43:02
nice things like you guys are setting the trend. And
43:04
it's not really true so much as the truth is
43:06
like, we are one of many embodiments
43:09
of the sentiment that the community has and
43:11
that the world has, we think. There's
43:14
more than one Noose Research in this world. There's
43:16
Alignment Lab, there's Pygmalion, there's Kobold, there's people who
43:18
have been around before us, people who will come
43:20
along the way, people who have already formed
43:22
since we have. And
43:24
there's lots of people who have kind
43:27
of embodied the Noose Research ethos. And
43:29
it's not really just our ethos as
43:31
much as the overall community's ethos. There are
43:33
people who have come before us, people who
43:36
will come along the way, who do
43:38
very, very similar style of work
43:41
as us, this kind of open work. And
43:43
I think that's got everything to do with the fact
43:45
that like, this is what the
43:47
people want. We're just the everyman, just
43:50
like everybody else. We're not like billionaires
43:52
or super like all ex-Facebook
43:55
or anything like that. We're
43:57
just a bunch of people who really,
44:00
really care about this, who want to
44:02
see everyone have access to
44:04
language models, everyone be able to automate
44:06
their lives, everyone be able to push
44:09
their understanding of any topic to the
44:11
next level. And our
44:13
work as we become an organization that's
44:16
looking to be a company
44:18
and create revenue, etc. We
44:21
won't let it hamper or hinder
44:23
any of the open source work
44:25
we do. In fact, we want
44:27
it to empower all of that
44:29
work because we believe that the
44:31
tools and the developments and services
44:33
that we will be providing as
44:35
a corporation will only serve to
44:37
better feed the entire open source
44:39
community. We're not really looking to
44:42
suddenly make like a closed Hermes
44:44
or something like that. We're more
44:46
looking to create tools and do
44:48
research that makes your open Hermes
44:50
far more effective, far better and, you know,
44:53
good enough that you may want to pay
44:55
for that tool. It
44:58
sounds like something I would pay for. That's for sure.
45:02
Yeah, it's super inspiring. I
45:04
really appreciate you taking
45:06
time, Karen, to talk with us. I've
45:08
thoroughly enjoyed this because I am such
45:10
a fan of everything you all are
45:12
doing and the community that you've built.
45:14
So thank you for staying true to
45:16
that culture and what you're doing. And
45:18
I'm really looking forward to seeing what
45:21
happens in the future and where things
45:23
head. And I hope that we can
45:25
talk again and have Noose back on
45:27
the show. And in a year when,
45:29
of course, everything will be different in
45:31
the world, and I'm sure you'll still
45:33
be doing interesting things. So yeah, you're
45:35
always welcome back on the show. Thank
45:37
you so much. It's been a pleasure
45:39
to chat with you guys. Thanks for
45:41
being so candid. I'm glad
45:43
we were able to kind of push our message forth
45:45
more and thanks for the validation you and the community
45:47
have given us to keep doing this great work. All
45:50
right. Thanks. We'll talk soon. See ya. That
46:00
is Practical AI for this week, thanks for listening.
46:03
Subscribe now, if you haven't yet,
46:05
head to practicalai.fm for all the
46:07
ways. And don't forget to check
46:09
out our fresh changelog beats. The
46:12
dance party album is on Spotify, Apple Music,
46:14
and the rest. There's a link in the
46:16
show notes for ya. Thanks
46:18
once again to our partners at fly.io,
46:21
to our beat freakin' residents, Breakmaster Cylinder,
46:23
and to you for listening. That's all
46:25
for now, we'll talk to you again
46:27
next time.