Episode Transcript
0:06
Welcome to Practical AI. If
0:09
you work in artificial intelligence, aspire
0:12
to, or are curious how
0:15
AI-related tech is changing the world,
0:17
this is the show for you.
0:20
Thank you to our partners
0:22
at fly.io, the home of
0:24
changelog.com. Fly transforms
0:26
containers into microvms that run on
0:28
their hardware in 30 plus regions
0:30
on six continents. So you can
0:33
launch your app near your users.
0:36
Learn more at fly.io. Welcome
0:44
to another fully connected episode
0:46
of the Practical AI podcast.
0:49
In these fully connected episodes, we try
0:51
to keep you up to date with
0:53
everything that's happening in the AI and
0:55
machine learning world and try to give
0:57
you a few learning resources to level
1:00
up your AI game. This
1:02
is Daniel Whitenack. I'm the founder and
1:04
CEO of Prediction Guard, and I'm joined
1:06
as always by my co-host Chris Benson,
1:09
who's a tech strategist at Lockheed Martin.
1:11
How are you doing, Chris? Doing
1:13
very well, Daniel. Just enjoying
1:15
the day. And by
1:17
the way, since you've traveled to the
1:20
Atlanta area tonight, we haven't gotten together,
1:22
but you're just a few minutes away.
1:24
Actually, so welcome to Atlanta. Just got
1:26
in. Yeah, we're within
1:28
maybe a short drive, depending
1:31
on your view of what
1:33
a short drive is. Anything under
1:35
three hours is short in Atlanta. And I think you're
1:37
like 45 minutes away from me right now. Yeah,
1:40
so hopefully we'll get a chance to
1:42
catch up tomorrow, which will be awesome
1:44
because we rarely get to
1:47
see each other in person. It's
1:49
been an interesting couple of weeks for
1:52
me. I, so for
1:54
those that are listening from
1:56
abroad, maybe we had some
1:58
major ice and snow type
2:01
storms recently. And
2:03
my great and embarrassing
2:05
moment was, I was walking back
2:07
from the office in the, like,
2:10
freezing rain, and I slipped and
2:12
fell and my laptop
2:14
bag with laptop in it broke
2:16
my fall, which is maybe good,
2:19
but that also broke the laptop.
2:22
So actually the laptop works,
2:24
it's just the screen doesn't work. So maybe
2:26
I'll be able to resolve that, but it's
2:28
like a mini portable server there, isn't
2:30
it? Yeah, exactly. You
2:32
have enough monitors around, it's not that much of
2:34
an issue, but yeah, I
2:37
had to put Ubuntu on a
2:39
burner laptop for the trip, so
2:41
yeah, it's always a fun time.
2:45
Speaking of personal devices,
2:47
there's been a lot of interesting
2:49
news and releases, not
2:53
of, well, I guess of
2:55
models, but also of
2:58
interesting actual hardware
3:00
devices related to
3:02
AI. Recently, one
3:04
of those is the Rabbit R1,
3:07
which was announced and
3:10
sort of launched to pre-orders with a
3:12
lot of acclaim. Another
3:14
one that I saw was the AI Pin,
3:17
P-I-N, which is like a little, I don't
3:19
know, my grandma would call it a brooch
3:21
maybe, like a
3:24
large pin you put on your jacket
3:26
or something like that. I am
3:28
wondering, Chris, as you see
3:31
these devices, and I wanna dig a lot
3:33
more into some of the interesting research
3:35
and models and data behind some
3:37
of these things like Rabbit, but
3:39
just generally, what are your thoughts
3:41
on this sort of trend of
3:46
AI-driven personal devices to
3:48
help you with all of your personal
3:50
things and plugged into all of your
3:52
personal data and sort of
3:54
AI attached to everything in your life? Well,
3:57
I think it's coming. Maybe it's here.
4:00
But I know that I am definitely torn. I
4:02
mean, I love the idea of all this help
4:04
along the way. There's so
4:06
many... like, I forget everything. I'm terrible if
4:09
I don't write something down and then follow
4:11
up on the list. I
4:13
am NOT a naturally organized person. So
4:16
my wife is, and my wife is always
4:18
reminding me that I really struggle in this
4:20
area, and usually she's... she's
4:22
not being very nice in the way that she
4:24
says it. So, um, it's all love, I'm sure,
4:26
but yes. So part of me
4:28
is like wow, this is the way I
4:31
could actually, you know, be all there,
4:33
get all the things done. But
4:35
the idea of just giving up all my data and
4:37
just being... it's like
4:39
so many others, that
4:42
aspect is not appealing. So yeah, I
4:44
guess I'm not leaping at it.
4:46
How much different do you think
4:48
this sort of thing is, then, from
4:51
everything we already give over with our
4:53
smartphones? It's a good point you're making.
4:56
I mean we've had computing devices with
4:58
us in our pocket or
5:00
on our person 24-7 for
5:03
what, at least the past 10
5:06
years, you know, at least for
5:08
those that adopted the iPhone or whatever
5:10
when it came out. But
5:12
um, yeah, so in terms
5:14
of location, certainly
5:16
account access, and certain
5:19
automations. What do you
5:21
think makes it because obviously this
5:23
is something on the mind of the makers
5:25
of the devices because I
5:27
think both the AI
5:30
Pin and the Rabbit make
5:32
some sort of explicit statements in
5:34
their launch and on their website, like "privacy
5:37
is really important to us, this is how
5:39
we're doing things, because we really care
5:42
about this." So obviously they
5:44
anticipated some kind of additional reaction,
5:46
but we we all already have
5:48
smartphones I think most of us if
5:50
we are willing to admit
5:52
it, we know that we're being tracked
5:55
everywhere and all of our data goes everywhere.
5:57
So I don't know. What is it
5:59
about this AI element that you
6:01
think either makes a an
6:03
actual difference in terms of the substance
6:05
of what's happening with the data or
6:08
is it just a perception thing? It's
6:10
probably a perception thing with me. I
6:12
mean because everything that you said I
6:14
agree with you're dead on and
6:17
we've been giving this data up for years
6:19
and we've gotten comfortable with it and that's
6:21
just something that we all kind of don't
6:23
like about it but we've been accepting it
6:25
for years and I guess
6:27
it's the expectation that with these AI
6:29
assistants that we've been hearing about for
6:31
so long coming and we're starting to
6:34
see things like the rabbit come into market
6:36
and such that there's probably a whole new
6:38
level of kind of analysis of
6:40
us and all the things and in a
6:43
sense knowing you better than you do that
6:45
is uncomfortable and probably will not
6:47
be as uncomfortable in the years to come
6:49
because we'll grow used to that as well
6:52
but I have to admit right now it's
6:54
an emotional reaction it makes me
6:56
a little bit leery. Yeah maybe
6:58
it's prior to these sorts
7:01
of devices there was
7:03
sort of the perception at
7:05
least that yes my
7:08
data is going somewhere maybe
7:10
there's a nefarious person behind
7:12
this but there's sort of
7:14
a person behind this like the data
7:16
is going all to Facebook or meta
7:19
and they're like maybe they're
7:21
even listening in on me and
7:23
putting ads for mattresses in my
7:26
feed or whatever the thing is
7:28
right so that perception has been
7:30
around for quite some time regardless
7:32
of whether Facebook is
7:35
actually listening in or whatever or it's
7:37
another party like you know
7:39
the NSA and the government's listening in
7:41
but I think all of those perceptions
7:43
really relied on this idea that even
7:46
if there's something bad happening that I
7:48
don't want happening with my data there's
7:51
sort of a group of people back
7:53
there doing something with it and now
7:55
there's this sort of idea of this
7:57
agentic entity behind the
7:59
scenes that's doing something with my
8:02
data without human oversight. I think
8:04
maybe that's if
8:06
there's anything sort of fundamentally different
8:08
here I think it's the level
8:10
of automation and the sort of
8:12
agentic nature of this which does
8:15
provide some sort of difference. Although
8:18
there's always like you know if
8:20
you're processing voice or something there's
8:23
voice analytics and you can put that
8:25
to text and then there
8:27
are always NLP models in the background
8:29
doing very various things or whatever so
8:31
there's some level of automation that's already
8:33
been there. I agree and I think
8:35
but you mentioned perception up front and
8:37
I think that makes a big difference.
8:40
I guess with like you mentioned NSA intelligence
8:43
agencies I think we all
8:45
just assume that they're all listening to all the
8:47
things all the time now and
8:49
that's one of those things that's completely beyond
8:52
your control and so there's
8:54
almost no reason to worry about it
8:56
I suppose unless you happen to be one
8:58
of the people that an intelligence agency would care
9:00
about which I don't particularly think I am. So
9:03
it just goes someplace and you just kind of
9:05
shrug it off. There's a certain amount of what
9:08
we've done these years with mobile where you're
9:11
opting in. I think it's leveling up to,
9:13
we're saying, with some of these AI agents
9:15
coming out, we know how much data about
9:17
ourselves is going to be there, and so
9:19
it's just escalating the opt-in up to a
9:21
whole new level. So hopefully
9:24
we'll see what happens. Yeah hope it
9:26
works out well. We haven't really, for
9:28
the listeners maybe that are just listening
9:31
to this and haven't actually... maybe you're
9:33
in parallel doing the search and looking at these
9:35
devices, but in case you're on your run or
9:37
in your car, we can
9:40
describe a little bit. So I described
9:42
the AI Pin thing a little bit; the
9:44
Rabbit, I thought, was a really, really
9:46
cool design. I don't know if
9:48
there's any nerds out there that love
9:51
this sort of synthesizer
9:54
analog sequencer Teenage Engineering
9:57
stuff that's out there
10:00
But actually the hardware design, Teenage
10:02
Engineering was involved in that in some way.
10:04
So it's like a little square thing, the
10:06
Rabbit R1. It's got one
10:09
button you can push and speak
10:12
a command. It's got a little
10:14
actual hardware wheel that you can
10:16
spin to scroll. The
10:19
screen, they show it as black
10:21
most of the time, but it
10:23
pops up with the song you're
10:25
playing on Spotify, or some of the
10:27
things you would expect to be happening
10:29
on a touch screen or that sort
10:31
of thing. But the primary interface is
10:34
thought to be, in my
10:36
understanding, speech. Not
10:38
that you would be pulling up a keyboard
10:40
on the thing and typing in a lot.
10:42
That's not the point. The point would be
10:45
this speech-driven, conversational,
10:48
and even call it an
10:50
operating system. Conversational operating system
10:52
to do certain actions or
10:54
tasks, which we'll talk a
10:56
lot more about the research
10:58
behind that. But that's what
11:00
the device is and looks
11:02
like. It's interesting that going
11:04
with the device route and
11:06
the fact that they're selling
11:08
the actual unit itself. Over
11:11
the years, we started on desktops and then went to
11:13
laptops, and
11:17
then went to our phones, and the phones
11:19
have evolved over time. We've been
11:21
talking about wearables and things like that
11:23
over the years as they've evolved. But
11:25
I think there's a little bit of
11:27
a gamble in actually having it as
11:30
a physical device because that's something else
11:32
that they're presuming you're going to put
11:34
at the center of your life. That
11:36
versus being the traditional phone app approach
11:38
where you're using the thing that your
11:40
customer already has in their hands. What
11:43
are your thoughts about the physicalness of this
11:45
offering? I think it's interesting. One of
11:47
the points, if you watch the release
11:50
or launch or promotion video for
11:52
the Rabbit R1, he
11:55
talks about the app-driven
11:57
nature of smartphones. There's
12:00
an app for everything. And
12:02
there's so many apps now that
12:04
navigating apps is kind of
12:06
a task in and of itself. And the
12:08
Silicon Valley meme, no one ever deletes an
12:11
app, so you just accumulate more and more
12:13
and more apps, and they
12:15
kind of build up on your phone and now
12:17
you have to organize them into
12:20
little groupings or whatever. So
12:22
I think the point being that it's
12:25
nice that there's an app
12:27
for everything, but the navigation
12:29
and orchestration of those various
12:31
apps is sometimes
12:33
not seamless and burdensome. I'm
12:35
even thinking about myself and
12:37
kind of checking over here,
12:39
you know, I got in
12:41
the Uber, oh, I
12:43
forgot to switch over my payment on
12:45
my Uber app, so now I've got
12:47
to open my bank app, right, and
12:49
then grab my virtual card number and
12:52
copy that over, but then I've
12:54
got to go to my password
12:56
management app to copy my password.
12:58
There's all these sorts of interactions
13:00
between various things that aren't seamless
13:02
as you might think they would
13:04
be, but it's easy for me to say
13:07
in words conversationally, hey,
13:10
I want to update the payment on my
13:12
current Uber ride or whatever, right?
13:14
So the thought that that would
13:17
be an easy thing to express
13:19
conversationally is interesting
13:21
and then have that be accomplished in
13:24
the background if it actually
13:26
works, so it's also quite interesting. I agree
13:28
with that and I can't help
13:30
but wonder if you look back
13:32
at the advent of the phone and the
13:34
smartphone and, you know, the iPhone
13:37
comes out and it isn't really so much
13:39
a phone anymore but a little computer, and
13:41
so we kind of, the idea of the
13:43
phone being the base device in your life
13:46
has been something that's been with us now
13:48
for, you know, over 15 years, and so
13:50
one of the things I wonder is, could
13:52
there be a trend where maybe the phone
13:55
doesn't become, if you think about it,
13:57
you're texting but a lot of your texting isn't really texting,
13:59
it's messaging in apps. Maybe the
14:01
phone is no longer the central device
14:04
in your life going forward and maybe
14:06
you're actually having your primary thing. And
14:08
so that would obviously play into rabbit's
14:11
approach where they're giving you another device
14:13
and packages everything together in that AI
14:16
OS that they're talking about or conversationally
14:18
it runs your life if
14:20
you expose your life to it the way you are
14:22
across many apps on the phone. It's
14:24
an opportunity potentially to take a
14:27
left turn with the way we
14:29
think about devices and maybe the phone is
14:31
no longer in the future, in the not so
14:33
distant future, maybe the phone is no longer the
14:35
centerpiece. If
14:50
you're listening, you know that
14:52
artificial intelligence is revolutionizing the
14:54
way we produce information, changing
14:56
society, culture, politics, the economy,
14:58
but it's also created a
15:00
world of AI generated content,
15:02
including deep fakes. So how
15:04
can we tell what's real
15:06
online? Read Write Own:
15:08
Building the Next Era of
15:10
the Internet. A new book
15:12
from entrepreneur and investor Chris
15:14
Dixon explores one possible solution
15:16
to the internet's authenticity problem,
15:18
blockchains. From AI that
15:21
tracks its source material to generative
15:23
programs that compensate rather than cannibalize
15:25
creators. Read Write Own
15:27
is a call to action for
15:29
a more open, transparent and democratic
15:31
internet, one that opens the black
15:33
box of AI, tracks the origins of what
15:35
we see online, and much more.
15:37
This is our chance to reimagine
15:39
world changing technologies to build the
15:42
internet we want, not the
15:44
one we inherited. Order your copy
15:46
of Read Write Own today,
15:48
or go to readwriteown.com to
15:50
learn more. All
16:13
right, Chris. Well, there's a
16:15
few things interacting in the
16:17
background here in terms of
16:19
the technology behind the Rabbit
16:22
device, and I'm sure other similar
16:24
types of devices that have come
16:26
out. Actually, there's some
16:28
of this sort of technology
16:30
that we've talked a little bit about
16:32
on the podcast before. I don't know
16:35
if you remember, we had the episode
16:37
with AskUI, which they
16:39
had this sort of multimodal
16:41
model that I think a
16:43
lot of their focus over time was
16:45
on testing. A lot of people might
16:47
test web applications or
16:50
websites using something like Selenium
16:52
or something like that that
16:54
automates desktop activity or interactions
16:56
with web applications and
16:59
actually automates that for testing purposes
17:01
or other purposes. AskUI
17:04
had some of this technology a
17:06
while back to kind of perform
17:08
certain actions using AI on a
17:10
user interface without sort of hard
17:12
coding like click on 100 pixels
17:16
this way and 20 pixels down
17:18
this way, right? That
17:20
I think has been going on for some time.
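For listeners who haven't used this kind of UI automation, here is a minimal sketch of the scripted, Selenium-style approach Daniel is contrasting with the AskUI idea. The URL and element IDs are made up for illustration; the point is that each step is pinned to specific selectors or pixel offsets, so it breaks whenever the layout changes.

```python
# Hypothetical Selenium sketch: every step is hard-coded against specific
# element IDs (or pixel offsets), so any UI change breaks the automation.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver import ActionChains

driver = webdriver.Chrome()
driver.get("https://example.com/checkout")  # placeholder URL

# Selector-based step: assumes a button with this exact id exists.
driver.find_element(By.ID, "update-payment").click()

# Coordinate-based step, i.e. "click 100 pixels this way and 20 pixels down".
ActionChains(driver).move_by_offset(100, 20).click().perform()

driver.quit()
```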
17:23
This adds a sort of different element
17:25
to it in that there's the voice
17:27
interaction, but then they're
17:29
really emphasizing the flexibility of
17:32
this and the updating
17:34
of it. So actually, they
17:36
emphasize like, I think
17:38
some of the examples they gave is, I have a
17:41
certain configuration on my laptop
17:44
or on my screen that I'm using
17:46
with a browser with certain plugins that
17:48
make it look a certain way and
17:50
everything sort of looks different for everybody
17:52
and it's all configured in their own
17:54
sort of way. Even
17:56
app-wise, apps kind of
17:58
are very personalized now, right? Which
18:00
makes it a challenge to say, click
18:03
on this button at this place. It
18:05
might not be at the same place
18:07
for everybody all the time. And of
18:10
course, apps update and that sort of
18:12
thing. So the solution that Rabbit has
18:14
come out with to deal
18:16
with this is what they're calling a
18:19
large action model. And
18:22
specifically, they're talking about this
18:24
large action model being a
18:26
neurosymbolic model. And I
18:28
want to talk through a little bit
18:30
of that. But before I do, I
18:33
think we sort of have to back
18:35
up and talk a little bit about
18:37
AI models, large language models. ChatGPT
18:39
has been interacting with external things for
18:41
some time now. I think
18:43
there's confusion, at least about
18:46
how that happens and
18:48
what the model is doing. So it
18:50
might be good just to kind of
18:53
set the stage for this in
18:56
terms of how these models are interacting with
18:58
external things. So the way that
19:00
this looks, at least in the Rabbit case, is you
19:02
click the button and you say, oh, I
19:05
want to change the payment card on my
19:07
Uber, unclick,
19:09
and stuff happens in the background.
19:11
And somehow the large action model
19:14
interacts with Uber and maybe my
19:16
bank app or whatever, and actually
19:18
makes the update. So
19:20
the question is how this happens.
19:23
Have you used any of the
19:25
plugins or anything in ChatGPT
19:27
or the kind of search the
19:29
web type of plugin to a
19:31
chat interface or anything like that?
19:33
Absolutely. I mean, that's what makes
19:36
the, I mean, I think
19:38
people tend to focus on the model itself, you
19:40
know, I mean, that's where all the glory is.
19:42
And people say, ah, this model versus that. But
19:45
so much of the power comes in the
19:47
plugins themselves and or other ways in
19:49
which they interact with the world. And
19:52
so as we're trying to kind of
19:54
pave our way into the future and figure out how
19:56
we're going to use these and how they're
19:58
going to impact our lives. whether it be the
20:01
Rabbit way or whether you're talking ChatGPT
20:03
with its plugins, that's the key. It's all
20:05
those interactions as the touch points with
20:07
the different things that you care about which
20:10
makes it worthwhile. So yes, absolutely. And I'm
20:12
looking forward to doing it some more here.
20:15
Yeah, so there's a couple of things
20:17
maybe that we can talk about and
20:19
actually some of them are
20:21
even highlighted in recent things that happened that
20:24
we may wanna highlight also. One
20:26
of those is if you
20:28
think about a large language model
20:30
like that used in ChatGPT
20:32
or Neural Chat, Llama 2, whatever it
20:35
is, you put text in and you
20:37
get text out. We've talked about that a lot on
20:39
the show. And so you
20:41
put your prompt in and you get a
20:43
completion. It's like fancy auto complete and you
20:45
get this completion out, right? Not that interesting.
20:48
We've talked a little bit about RAG
20:50
on the show which means I
20:53
am programming some logic
20:55
around my prompt such
20:58
that when I get my user input,
21:01
I'm searching some of my own data
21:03
or some external data that I've stored
21:05
in a vector database or in a
21:07
set of embeddings to retrieve
21:09
text that's semantically similar
21:12
to my query and
21:14
just pushing that into the prompt as
21:17
a sort of grounding mechanism to sort
21:19
of ground the answer in
21:21
that external data.
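For anyone who wants to see that in code, a minimal sketch of the RAG flow Daniel just described might look like the following. The embed, vector_db, and llm arguments are hypothetical stand-ins for whatever embedding model, vector store, and language model you actually use; only the shape of the flow is the point.

```python
def answer_with_rag(user_query: str, embed, vector_db, llm, k: int = 3) -> str:
    # 1. Embed the query and retrieve the k most semantically similar chunks
    #    from the external data that was stored ahead of time.
    query_vector = embed(user_query)
    chunks = vector_db.search(query_vector, top_k=k)

    # 2. Push the retrieved text into the prompt as a grounding mechanism.
    context = "\n\n".join(chunk.text for chunk in chunks)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {user_query}\nAnswer:"
    )

    # 3. Ordinary text-in, text-out completion, now grounded in that data.
    return llm.complete(prompt)
```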
21:24
So you've got sort of basic autocomplete, you've got retrieval
21:26
to insert external
21:28
data via a
21:30
vector database. You've
21:32
got some multimodal input.
21:35
So, and by multimodal
21:37
models, I'm meaning things like
21:39
LLaVA. And actually
21:41
this week, there was a great paper
21:44
published on January 24th. I
21:47
saw it in the Daily Papers on
21:49
Hugging Face: "MM-LLMs: Recent
21:51
Advances in MultiModal Large Language Models."
21:53
So if you're wanting to know
21:55
sort of the state of the
21:57
art and what's going on multimodal
22:00
large language models, like I just mentioned,
22:03
that's probably a much deeper dive that you can
22:05
go into. So check out that and we'll link
22:07
it in our show notes. But these are models
22:10
that would not only take a text prompt
22:12
but might take a text prompt paired with
22:14
an image, right? So you could put an
22:16
image in and you say, also
22:19
have a text prompt that says, is
22:21
there a raccoon in this image, right?
22:23
And then, you know, hopefully the reasoning
22:25
happens and says yes or no if
22:27
there's a- Is there always a
22:29
raccoon in the image? There's always a
22:31
raccoon everywhere. That's one element of
22:33
this, that would
22:35
be a specialized model that
22:39
allows you to integrate multiple modes
22:41
of data. And there's similar ones
22:43
out there for audio and text
22:46
and other things.
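As a concrete example of the image-plus-text prompting being described, here is roughly how an open model like LLaVA can be called through Hugging Face transformers. The checkpoint name, prompt template, and image URL follow the llava-hf examples as we understand them; details vary between versions, so treat this as a sketch rather than exact API documentation.

```python
import requests
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"  # example checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(model_id)

# One image plus one text question go into the same prompt.
image = Image.open(requests.get("https://example.com/photo.jpg", stream=True).raw)
prompt = "USER: <image>\nIs there a raccoon in this image? ASSISTANT:"

inputs = processor(text=prompt, images=image, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(output[0], skip_special_tokens=True))
```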
22:48
So again, summary: you've got text-to-text
22:50
auto complete. You've got this retrieval
22:52
mechanism to pull in some
22:54
external text data into your text
22:56
prompt. You've got specialized
22:59
models that allow you to
23:01
bring in an image and
23:03
text. All of that's super
23:05
interesting. And I think it's connected to
23:07
what Rabbit is doing. But
23:10
there's actually more to what's
23:12
going on with, let's
23:14
say when people perform actions
23:16
on external systems or integrate
23:19
external systems with these sorts
23:21
of AI models. And
23:24
this is what in the sort of
23:26
LangChain world, if you've interacted with
23:29
LangChain at all, they
23:31
would call this maybe tools. And
23:33
you even saw things in the past like
23:35
Toolformer and other models
23:37
where the idea was, well,
23:40
okay, I have, maybe it's the
23:42
Google search API, right? Or SERP
23:44
API or one of these search
23:47
APIs, right? I know
23:49
that I can take a JSON object,
23:51
send it off to that API and get a search
23:54
result, right? Okay, so now if
23:56
I wanna call that search
23:58
API with an... AI model,
24:01
what I need to do is get the
24:03
AI model to generate the right JSON
24:06
structured output that I can
24:08
then just programmatically, not with
24:10
any sort of fancy AI logic,
24:13
but programmatically take that JSON
24:15
object and send it off
24:17
to the API, get the response, and
24:20
either plug that in in the sort of
24:22
retrieval way that we talked about before, just
24:24
give it back to the user as the
24:26
response that they wanted, right? So this
24:29
has been happening for quite a
24:31
while. This is kind of
24:33
like we saw one of these
24:35
cool AI demos every week, right?
24:38
Where, oh, the AI is
24:40
integrated with Kayak now to get me
24:42
a rental car and the AI is
24:44
integrated with, you know, this
24:46
external system and all really cool, but
24:48
at the heart of that was the
24:50
idea that I would generate structured output
24:53
that I could use in a
24:55
regular computer programming way to
24:58
call an API and then
25:00
get a result back which I would
25:02
then use in my system.
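A rough sketch of that tool pattern, with a hypothetical llm client and a placeholder search endpoint: the model's only job is to emit structured JSON, and ordinary code makes the actual API call.

```python
import json
import requests

def search_tool(llm, user_request: str) -> dict:
    # Ask the model for structured arguments, not a natural-language answer.
    prompt = (
        "Turn the user's request into a JSON object with the fields "
        '"query" (string) and "num_results" (integer). Respond with JSON only.\n'
        f"Request: {user_request}\nJSON:"
    )
    args = json.loads(llm.complete(prompt))  # e.g. {"query": "...", "num_results": 5}

    # Plain programmatic step: send the generated arguments to the API.
    response = requests.get(
        "https://api.example-search.com/v1/search",  # placeholder endpoint
        params=args,
        timeout=10,
    )
    # The result can be returned directly, or pushed back into the prompt,
    # retrieval-style, for the model to summarize.
    return response.json()
```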
25:04
So that's kind of this tool idea,
25:07
which is still not quite what Rabbit is
25:09
doing, but I think that's something that people
25:11
don't realize is happening behind the scenes in
25:14
these tools. I think that's really popular in
25:16
the enterprise, you know, and I'm, you know,
25:18
in the enterprise with air quotes there, because
25:21
that approach is, you know,
25:23
in large organizations, they're
25:26
going to other, you know, the cloud
25:28
providers with their APIs, you know, Microsoft
25:30
is, you know, has their relationship with
25:32
OpenAI and they're wrapping that, you
25:35
know, Google has their APIs and they're using
25:37
RAG, you know, in
25:39
that same way to try to integrate with
25:41
systems instead of actually creating the models on
25:44
their own. I would say that's a
25:46
very, very popular approach right now in
25:48
enterprise-y environments that are still more software-driven
25:50
and still trying to figure out how to
25:53
use APIs for AI models.
25:55
Yeah, I can give you a
25:57
concrete example of something we did
25:59
with a... customer of Prediction Guard,
26:01
which is the Shopify API,
26:03
right? So ecommerce customer, the
26:06
Shopify API has this
26:08
sort of Shopify, I think
26:11
it's called ShopifyQL query language, it's
26:13
structured, right? And you can call
26:15
the regular API via GraphQL, right?
26:18
And so it's very structured sort
26:20
of way you can call this
26:22
API to get sales
26:25
information or order information or
26:27
do certain tasks, right? And
26:29
so you can create a
26:31
natural language query and say,
26:34
okay, well, don't try to give me
26:36
natural language out, but give me ShopifyQL
26:38
or give me something that I can
26:40
plug into a GraphQL query, and then
26:42
I'm going to go off and query
26:44
the Shopify API and either perform some
26:46
interaction or get some data, right? So
26:49
this is very popular. This is
26:51
how you sort of get AI on top
26:53
of tools.
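Here is a hypothetical sketch of that Shopify pattern: ask the model for a structured query (GraphQL in this case) rather than a natural-language answer, then run the query yourself. The endpoint, API version, and auth header are illustrative only; check Shopify's Admin API docs for the real values.

```python
import json
import requests

def shopify_query_via_llm(llm, question: str, shop: str, token: str) -> dict:
    # The model's output is a query, not an answer.
    prompt = (
        "Write a Shopify Admin GraphQL query that answers the question below. "
        "Return only the query text.\n"
        f"Question: {question}\nQuery:"
    )
    graphql_query = llm.complete(prompt)  # e.g. "{ orders(first: 10) { edges { node { id } } } }"

    # Ordinary code executes the query against the (illustrative) endpoint.
    response = requests.post(
        f"https://{shop}.myshopify.com/admin/api/2024-01/graphql.json",
        headers={"X-Shopify-Access-Token": token, "Content-Type": "application/json"},
        data=json.dumps({"query": graphql_query}),
        timeout=10,
    )
    return response.json()
```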
26:55
What's interesting, I think, is what Rabbit
26:57
observes in what they're saying
27:00
and others have observed as
27:02
well. I think, you know, you take
27:04
the case like AskUI,
27:06
like we talked about before. And
27:09
the observation is that
27:11
not everything has this
27:13
sort of nice structured way you can
27:15
interact with it with an API. So
27:18
think about pull out your phone. You've
27:20
got all of these apps on your phone. Some
27:23
of them will have a nice API
27:26
that's well defined. Some of
27:28
them will have an API that me
27:30
as a user, I know nothing about, right? There's
27:32
maybe an API that exists there, but it's hard
27:35
to use or not that well documented,
27:38
or maybe I don't have
27:40
the right account to use it or
27:42
something. There's all of these
27:44
interactions that I want to do
27:47
on my accounts with
27:49
my web apps, with my apps that
27:52
have no defined structured API
27:54
to execute all of those
27:56
things. So then the question
27:58
comes in. And that's why I wanted
28:00
to lead up to this is because even
28:03
if you can retrieve data to get grounded
28:05
answers, even if you can integrate images, even
28:07
if you can interact with
28:09
APIs, all of that gets
28:11
you pretty far as we've seen. But
28:13
ultimately not everything is going to
28:15
have a nice structured API, or
28:18
it's not going to have an API that's updated
28:21
or has all the features that you
28:23
want or does all the things you
28:25
want, right? So the question, I think
28:27
the fundamental question that the Rabbit research
28:29
team is thinking about is how
28:31
do we then reformulate the
28:34
problem in a flexible way
28:37
to allow a user to trigger an
28:40
AI system to perform arbitrary
28:43
actions across an arbitrary
28:45
number of applications or
28:47
an application without
28:50
knowing beforehand the
28:52
structure of that application or its
28:54
API? So I think that's the really
28:56
interesting question. I agree with
28:58
you completely. And there's so
29:01
much complexity. They refer to it
29:03
as human intentions expressed
29:05
through actions on a computer. And that
29:07
sounds really, really simple. When
29:09
you say it like that, but that's
29:12
quite a challenge to make that
29:14
work in an unstructured world. So I'm
29:17
really curious. They
29:20
have their research page, but I don't guess
29:22
they've put out any papers that
29:24
describe some of the research they've done yet, have they?
29:27
Just in general terms. And that's
29:29
where we get to
29:31
the exciting world of large
29:34
action models. Somehow
29:39
that makes me think of Arnold Schwarzenegger.
29:41
Large action heroes.
30:07
You know, when we started podcasting back
30:09
in 2009, an online store
30:11
was the furthest thing from our
30:13
minds, now we have merch.changelog.com. And
30:15
you can go there right now
30:17
and order some t-shirts. And that's
30:19
all powered by Shopify. It's so
30:22
easy, all because Shopify is amazing.
30:25
Shopify is the global commerce platform that
30:27
helps you sell at every stage of
30:29
your business. From the launch
30:31
your online shop stage to the first
30:34
real life store stage, all the way to
30:36
the, did we just hit a million dollar stage?
30:39
Shopify is there to help you grow. Whether
30:41
you're selling security systems or marketing
30:43
memory modules, Shopify helps you sell
30:45
everywhere from their all-in-one e-commerce platform
30:48
to their in-person POS system, wherever
30:50
and whatever you're selling, Shopify has
30:52
got you covered. Shopify
30:54
helps you turn browsers into buyers with
30:56
the internet's best converting checkout, up to
30:59
36% better compared to other leading commerce
31:01
platforms. And sell more with less effort
31:03
thanks to Shopify magic, your AI powered
31:05
all-star. You know, nothing gets me and
31:08
Jerod more excited than when our guests
31:10
get that coupon code in their email
31:12
and their shirt ships, or to everyone
31:14
out there who loves Changelog podcasts
31:16
and can go to merch.changelog.com and get
31:18
your favorite threads to support our podcast.
31:21
It is just the best thing ever.
31:24
From stickers to threads, all
31:26
that is at merch.changewell.com. And
31:29
did you know that Shopify powers 10% of
31:31
all e-commerce in the US and
31:34
Shopify is the global force
31:36
behind Allbirds, Rothy's, and Brooklinen, and
31:38
millions of other entrepreneurs of every
31:41
size across 175 countries. Plus,
31:45
Shopify's extensive help resources are there
31:47
to support you and your success
31:49
every step of the way. Those
31:52
businesses that grow, grow with Shopify. Sign
31:54
up for a $1 per
31:56
month trial period at shopify.com/practicalai,
32:00
all lowercase. Go to
32:03
shopify.com/practicalai
32:05
now to grow your business no
32:07
matter what stage you're in. Again,
32:10
shopify.com/practicalai.
32:30
Yeah Chris,
32:35
so coming from Arnold
32:37
Schwarzenegger and large action
32:39
heroes to large action
32:41
models, I was wondering if
32:44
this was a term that Rabbit came up
32:46
with. I think it has existed for some
32:48
amount of time. I at least saw it
32:50
at least as far back
32:52
as June of last year, 2023. I saw Silvio Savarese's article
33:00
on the Salesforce AI Research
33:03
blog about LAMs: from large
33:05
language models to large action
33:08
models. I think the
33:10
focus of that article was very much
33:12
on the sort of agentic stuff that
33:14
we talked about before in terms of
33:16
interacting with different systems but in a
33:18
very automated way. The
33:21
term large action model as far
33:23
as Rabbit refers to it, it's
33:25
this new architecture that they are saying
33:28
that they've come up with and I'm sure
33:30
they have, because it seems like the
33:32
device works. We don't know I think all of
33:35
the details about it at least I haven't
33:37
seen all of the
33:39
details or it is sort of not
33:42
transparent in the way that maybe
33:44
a model release would be on Hugging
33:46
Face with code associated with it in
33:48
a long research paper. Maybe
33:50
I'm missing that somewhere or listeners can
33:52
tell me if they found it but
33:54
I couldn't find that. They do have
33:56
a research page though which gives us
33:59
a few clues as
34:01
to what's going on and some
34:03
explanation in kind of general
34:06
terms. And what
34:08
they've described is that
34:10
their goal is to
34:13
observe human interactions with
34:15
a UI. And
34:17
there seems to be some sort
34:20
of multimodal model that is detecting
34:22
what things are where in the
34:24
UI. And they're
34:26
mapping that onto some
34:28
kind of flexible, symbolic,
34:32
synthesized representation of a
34:35
program. So the
34:37
user is doing this thing, right? So
34:40
I'm changing the payment on my Uber app.
34:43
And that's represented or synthesized
34:46
behind the scenes
34:49
in some sort of structured way and
34:52
kind of updated over time
34:54
as it sees demonstrations, human
34:56
demonstrations of this going on.
34:59
And so the words that they... I'll just
35:01
kind of read this so people, if
35:03
they're not looking at the article, they say
35:05
we designed the technical stack from the ground
35:07
up from the data collection
35:09
platform to the new network architecture.
35:13
And here's the sort of very dense
35:15
loaded wording that probably
35:17
has a lot packed into
35:19
it. They say that it utilizes
35:21
both transformer style attention and
35:24
graph-based message passing combined
35:27
with program synthesizers
35:31
that are demonstration and
35:33
example guided. So
35:36
that's a lot in that statement. And of course,
35:38
they've mentioned a few in more
35:40
description in other places. But
35:43
it seems like my sort
35:45
of interpretation of this is
35:48
that the requested
35:51
action comes in to
35:53
the system, to the network
35:56
architecture, right? And there's a
35:58
neural layer. So this is a neurosymbolic
36:00
model. So there's a neural
36:02
layer that somehow interprets that
36:05
user action into a
36:07
set of symbols or
36:09
representations that it's learned about the
36:11
UI, I mean
36:13
the Shopify UI or the Uber
36:16
UI or whatever. And
36:18
then they use some sort
36:20
of symbolic logic processing of
36:23
this sort of synthesized program to
36:26
actually execute a series of actions
36:28
within the app and
36:30
perform an action that it's learned
36:33
through demonstration. So this
36:35
is sort of what
36:37
they mean, I think, when they're
36:40
talking about neurosymbolic. So there's a
36:42
neural network portion of this, kind
36:45
of like when you put something
36:47
into ChatGPT or
36:49
a transformer-based large language model and
36:51
you get something out. In
36:54
the case of we were talking about getting JSON
36:57
structured out when we're interacting with
36:59
an external tool, but here it
37:01
seems like you're getting some
37:03
sort of thing out, whatever
37:06
that is, a set of symbols or
37:08
some sort of structured thing that's then
37:10
passed through symbolic
37:12
processing layers that
37:15
are essentially symbolic
37:17
and rule-based ways to
37:19
execute a learned program
37:22
over this application. And by program
37:24
here, I think they mean, they
37:26
reference a couple papers, and my
37:29
best interpretation is that they mean
37:31
not a computer program in the
37:34
sense of Python code, but
37:36
a logical program that
37:38
represents an action like here is
37:41
the logical program to update the
37:43
payment on the Uber app. You
37:46
go here and then you click this and
37:48
then you enter that and then you blah
37:50
blah blah, you do those things, right? Except
37:53
here, those programs, those synthesized programs
37:55
are learned by looking at the
37:57
data, looking
37:59
at human intentions
38:01
and what they do in
38:04
an application. And that's
38:06
how those programs are synthesized.
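To make that interpretation a bit more concrete, here is a purely speculative illustration (not Rabbit's published design) of what a synthesized "program" of symbolic UI actions might look like: a neural front end would map the spoken request and the current screen onto these symbols, and a rule-based executor would replay the learned steps.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class UIAction:
    verb: str          # e.g. "tap", "type", "scroll"
    target: str        # a learned symbol for a UI element, not pixel coordinates
    argument: str = ""

# Hypothetical learned program for "update the payment on my Uber ride".
update_uber_payment: List[UIAction] = [
    UIAction("tap", "payment_methods_button"),
    UIAction("tap", "add_card"),
    UIAction("type", "card_number_field", "<card number>"),
    UIAction("tap", "save_button"),
]

def execute(program: List[UIAction], ui) -> None:
    # The symbolic half: deterministic replay of steps learned from human
    # demonstrations, grounded in the concrete UI by the neural layer.
    for step in program:
        ui.perform(step.verb, step.target, step.argument)
```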
38:08
So that was a long one. I
38:10
don't know how well that held together, but that was my
38:13
best at this point without seeing anything
38:16
else from a single sort of
38:18
blog post. When you can keep me quiet for
38:20
a couple of minutes there, it means you're doing
38:22
a pretty good job. I
38:24
have a question I wanna throw out and I
38:27
don't know that you'd be able to answer it obviously, but
38:29
it just to speculate. While
38:31
we were talking about that and
38:33
thinking about multimodal, I'm
38:35
wondering the device itself comes
38:38
with many of the same sensors that
38:40
you're gonna find in
38:42
a cell phone these days, but
38:44
I'm wondering if that feeds in more than
38:47
just the speech and that
38:49
obviously has the camera on it. It
38:51
comes with a magnetometer, I can't
38:53
say the word, GPS, accelerometer, and gyroscope.
38:56
And obviously, so it's detecting motion,
38:59
it knows location, all the things. Has
39:01
the camera, has the mic. How
39:03
much of that do you think is
39:05
relevant to the LAMs, to
39:07
the large action model in terms of
39:10
inputs? Do you think that there is
39:12
potentially relevance in the non-speech and
39:15
non-camera concerns on it? Do you
39:17
think the way people move could
39:19
have some play in there? I
39:22
know we're being purely speculative. I'm just,
39:24
it caught my imagination. Yeah,
39:26
I'm not sure. I mean, it
39:28
could be that that's used in
39:31
ways similar to how those
39:33
sensors are used on smartphones
39:35
these days. Like if I'm
39:38
asking Rabbit to book
39:40
me an Uber to here or
39:42
something like that. Now, it
39:45
could infer the location maybe of where I am based
39:49
on where I'm wanting to go or ask me where I
39:51
am. But likely
39:53
the easiest thing would be to use
39:55
a GPS sensor, to know
39:58
my location and just, like, put that
40:00
as the pin in the Uber app and now it
40:02
knows. So I think there's
40:04
some level of interaction between these things.
40:06
I'm not sure how much, but it
40:08
seems like, at least in
40:10
terms of location, I could definitely see that
40:13
coming into play. I'm not sure on the
40:15
other ones. Well, it looks a lot like,
40:18
physically, it looks a lot like
40:20
a smartphone without the phone. Yeah, yeah,
40:22
a smartphone, different sort of aspect
40:25
ratio, but still kind of touchscreen.
40:27
I think you can still pull
40:29
up a keyboard and that sort
40:32
of thing. And you see
40:34
things when you prompt it. So
40:36
yeah, I imagine that that's
40:38
maybe an evolution of this over
40:41
time is sensory
40:43
input of various things.
40:45
Like I could imagine that being very
40:47
interesting in running or fitness
40:49
type of scenarios, right? If I've
40:51
got my rabbit with me and
40:54
I instruct rabbit to post
40:57
a celebratory social media post every
40:59
time I keep my mileage or
41:01
my time per mile at
41:07
a certain level or something, and it's using
41:09
some sort of sensors on
41:11
the device to do that. I think
41:14
there's probably ways that that will work
41:16
out. I'm not sure about now. It'll be
41:18
interesting that if this approach
41:21
sticks, and I might
41:23
make an analogy to things like the
41:26
Oura Ring for health, wearing that, and
41:28
then competitors started coming out, and then
41:31
Amazon has their own version of
41:33
a health ring that's coming out.
41:36
Along those lines, you have
41:38
all these incumbent players in the AI space that
41:40
are, for the most
41:42
part, very large, well-funded cloud
41:45
companies, and in at least
41:47
one case, a retail company blended
41:50
in there. If
41:52
this might be an alternative in
41:54
some ways to the smartphone being
41:56
the dominant device, and it has
41:59
all the best, the same capabilities plus
42:01
more, and they have the LAM
42:03
behind it to drive that functionality.
42:05
How long does it take for
42:08
an Amazon or a Google or
42:10
a Microsoft to come along after
42:12
this and start producing their own
42:14
variant because they already have the
42:16
infrastructure that they need to produce
42:18
the back end and they're going to
42:20
be able to produce... you know, Google and Amazon certainly
42:23
produce front-end stuff quite a lot as well. So
42:25
it'll be interesting to see if this is the
42:27
beginning of a new marketplace opening up
42:30
in the AI space as an
42:32
entrant. So there's already really
42:34
great hardware out there for
42:36
smartphones and I
42:38
wonder if something like this is kind
42:41
of a shock to the market but in some
42:46
ways you know
42:48
just as phones with
42:50
external key buttons sort of
42:52
morphed into smartphones
42:55
with touch screens,
42:57
I could see smartphones that
43:00
are primarily app-driven in
43:02
the way that we interact with them now being
43:05
pushed in a certain direction
43:07
because of these interfaces and
43:10
so smartphones won't look the
43:12
same in two years as they
43:14
do now and they won't follow that same
43:16
sort of app-driven trajectory like
43:18
they are now probably because of
43:20
things that are rethought and
43:23
it might not be that we all
43:25
have rabbits in our
43:27
pocket but maybe smartphones become
43:29
more like rabbits over time.
43:32
I'm not sure, but I think that's
43:34
very likely a thing that will happen.
43:36
It's also interesting to me it's a
43:38
little bit hard to parse out
43:41
for me what's happening, what's
43:43
the workload like between what's
43:46
happening on the device and
43:48
what's happening in the cloud and what
43:51
sort of connectivity is
43:53
actually needed for full functionality
43:55
with the device. Maybe that's something
43:57
if you want to share your own findings
44:00
on that in our Slack community
44:03
at changelog.com/community. We'd love to hear
44:05
about it. My understanding is there
44:07
is at least a good
44:09
portion of the LAM and the
44:12
LAM-powered routines that are operating
44:14
in a centralized sort of platform
44:17
and hardware. So there's not this
44:19
kind of huge large model running
44:22
on a very low power device
44:25
that might suck away all the, all
44:27
the energy. But I think that's also an
44:30
interesting direction is how far could
44:32
we get, especially with local models
44:34
getting so good recently
44:37
with fine-tuned, locally optimized,
44:40
quantized models, doing action
44:42
related things on edge
44:45
devices in our
44:47
pockets that aren't relying on
44:50
stable and high speed internet
44:53
connections, which also of course
44:55
helps with the privacy related
44:57
issues as well. I
44:59
agree. I, by the way, I'm going to
45:01
make a prediction. I'm predicting that a large
45:04
cloud computing service provider
45:06
will purchase Rabbit. All
45:09
right. You heard it here first. Uh, uh, I
45:12
don't know what sort of odds Chris is,
45:15
is giving, um, or I'm not going to
45:17
bet against him, that's for sure. But,
45:20
uh, yeah, I think, I think that's interesting. That
45:22
is definitely a lot of, I think there will
45:24
be a lot of action
45:26
models of some type, whether those
45:29
be tool using
45:31
LLMs or LAMs or
45:34
SLMs or whatever, whatever we've got
45:37
coming up. Um, and
45:39
they should have named it. They could have named
45:42
it a lamb instead of a rabbit, you
45:44
know, I just want to point out they're, they're
45:46
getting their animals mixed up, man. I actually, yeah,
45:48
that's a really good point. I don't know if
45:50
they came up with rabbit before lamb, but maybe
45:52
they just had the lack of the B there,
45:55
but I think they probably could have figured
45:57
out something. Yeah. And the only thing that could
45:59
beat a rabbit or a lamb is a raccoon, of
46:02
course, but that's beside the point. You have to
46:04
come around full circle there. Of
46:06
course, of course. We'll leave that
46:08
device up to you as well.
46:10
Yeah. All
46:13
right. Well, this has been fun, Chris. I
46:15
do recommend in terms of if people wanna
46:17
learn more, there's a really good research
46:20
page on rabbit.tech, rabbit.tech
46:22
slash research. And down at the
46:24
bottom of the page, there's
46:27
a list of references
46:29
that they share throughout that
46:32
people might find interesting as
46:35
they explore the technology. I
46:37
would also recommend that people look at
46:40
LangChain's documentation on tools.
46:43
And also maybe just check out a couple
46:45
of these tools. They're not that complicated. Like
46:47
I say, there's sort of, they
46:49
expect JSON input and then they run a
46:52
software function and do a thing. That's sort
46:54
of what's happening there.
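As a tiny example of that, a LangChain tool is essentially a named function plus an argument schema. This sketch assumes the tool decorator from langchain_core; import paths have moved between versions, so check the current docs.

```python
from langchain_core.tools import tool

@tool
def convert_amount(amount: float, rate: float) -> float:
    """Convert an amount of money using a fixed exchange rate."""
    return amount * rate

# An agent or model is steered to emit JSON-like arguments for the tool;
# invoking it just runs a normal Python function on them.
print(convert_amount.name)                                   # "convert_amount"
print(convert_amount.invoke({"amount": 10.0, "rate": 1.1}))  # 11.0
```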
46:56
So maybe check out some of those in the array of
46:58
tools that people have built for LangChain and
47:01
try using them. So yeah, this has
47:03
been fun, Chris. Thanks, it was great.
47:05
Thanks for bringing the rabbit to our
47:08
attention. Yeah, hopefully see you in person
47:10
soon. That's right. And
47:12
yeah, we'll include some links in our show
47:14
notes. Everyone take a look at them. Talk
47:16
to you soon, Chris. Have a good one.
47:27
Thank you for listening to practical AI.
47:30
Your next step is to subscribe now,
47:32
if you haven't already. And
47:34
if you're a longtime listener of the show, help
47:36
us reach more people by sharing practical AI with
47:39
your friends and colleagues. Thanks
47:41
once again to Fastly and Fly for partnering
47:43
with us to bring you all Changelog
47:45
podcasts. Check out what they're
47:47
up to at fastly.com and fly.io. And
47:50
to our beat-freaking residents, Breakmaster Cylinder, for
47:53
continuously cranking out the best beats in the
47:55
biz. That's all for now. We'll
47:57
talk to you again next time.