Large Action Models (LAMs) & Rabbits 🐇

Released Tuesday, 30th January 2024

Episode Transcript

Transcripts are displayed as originally observed. Some content, including advertisements may have changed.


0:06

Welcome to Practical AI. If

0:09

you work in artificial intelligence, aspire

0:12

to, or are curious how

0:15

AI-related tech is changing the world,

0:17

this is the show for you.

0:20

Thank you to our partners

0:22

at fly.io, the home of

0:24

changelog.com. Fly transforms

0:26

containers into micro VMs that run on

0:28

their hardware in 30 plus regions

0:30

on six continents. So you can

0:33

launch your app near your users.

0:36

Learn more at fly.io. Welcome

0:44

to another fully connected episode

0:46

of the Practical AI podcast.

0:49

In these fully connected episodes, we try

0:51

to keep you up to date with

0:53

everything that's happening in the AI and

0:55

machine learning world and try to give

0:57

you a few learning resources to level

1:00

up your AI game. This

1:02

is Daniel Whitenack. I'm the founder and

1:04

CEO of Prediction Guard, and I'm joined

1:06

as always by my co-host Chris Benson,

1:09

who's a tech strategist at Lockheed Martin.

1:11

How are you doing, Chris? Doing

1:13

very well, Daniel. Just enjoying

1:15

the day. And by

1:17

the way, since you've traveled to the

1:20

Atlanta area tonight, we haven't gotten together,

1:22

but you're just a few minutes away.

1:24

Actually, so welcome to Atlanta. Just got

1:26

in. Yeah, we're within not

1:28

maybe a short drive, depending

1:31

on your view of what

1:33

a short drive is. Anything under

1:35

three hours is short in Atlanta. And I think you're

1:37

like 45 minutes away from me right now. Yeah,

1:40

so hopefully we'll get a chance to

1:42

catch up tomorrow, which will be awesome

1:44

because we rarely get to

1:47

see each other in person. It's

1:49

been an interesting couple of weeks for

1:52

me. I, so for

1:54

those that are listening from

1:56

abroad, maybe we had some

1:58

major ice and snow type

2:01

storms recently. And

2:03

my great and embarrassing

2:05

moment was, I was walking back

2:07

from the office and the like

2:10

freezing rain and I flipped and

2:12

fell and my laptop

2:14

bag with laptop in it broke

2:16

my fall, which is maybe good,

2:19

but that also broke the laptop.

2:22

So actually the laptop works,

2:24

it's just the screen doesn't work. So maybe

2:26

I'll be able to resolve that, but it's

2:28

like a mini portable server there, isn't

2:30

it? Yeah, exactly. You

2:32

have enough monitors around, it's not that much of

2:34

an issue, but yeah, I

2:37

had to put Ubuntu on a

2:39

burner laptop for the trip, so

2:41

yeah, it's always a fun time.

2:45

Speaking of personal devices,

2:47

there's been a lot of interesting

2:49

news and releases, not

2:53

of, well, I guess of

2:55

models, but also of

2:58

interesting actual hardware

3:00

devices related to

3:02

AI. Recently, one

3:04

of those is the Rabbit R1,

3:07

which was announced and

3:10

sort of launched to pre-orders with a

3:12

lot of acclaim. Another

3:14

one that I saw was the AI Pin,

3:17

P-I-N, which is like a little, I don't

3:19

know, my grandma would call it a brooch

3:21

maybe, like a

3:24

large pin you put on your jacket

3:26

or something like that. I am

3:28

wondering, Chris, as you see

3:31

these devices, and I wanna dig a lot

3:33

more into some of the interesting research

3:35

and models and data behind some

3:37

of these things like Rabbit, but

3:39

just generally, what are your thoughts

3:41

on this sort of trend of

3:46

AI-driven personal devices to

3:48

help you with all of your personal

3:50

things and plugged into all of your

3:52

personal data and sort of

3:54

AI attached to everything in your life? Well,

3:57

I think it's coming. Maybe it's here.

4:00

But I know that I am definitely torn. I

4:02

mean, I love the idea of all this help

4:04

along the way There's so

4:06

many like I forget everything I'm terrible if

4:09

I don't write something down and then follow

4:11

up on the list I

4:13

am NOT a naturally organized person So

4:16

my wife is and my wife is always

4:18

reminding me that I really struggle in this

4:20

area, and usually she's

4:22

not being very nice in the way that she

4:24

says it. So, um, it's all love I'm sure

4:26

but yes So part of me

4:28

is like wow, this is the way I

4:31

could actually you know be all there

4:33

get all the things done But

4:35

the idea of just giving up all my data and

4:37

just being... It's like

4:39

so many others, that

4:42

aspect is not appealing. So yeah, I

4:44

guess I'm not leaping in.

4:46

How much different do you think

4:48

this sort of thing is than

4:51

everything we already give over with our

4:53

smartphones? It's a good point you're making

4:56

I mean we've had computing devices with

4:58

us in our pocket or

5:00

on our person 24-7 for

5:03

what at least the past 10

5:06

years, you know for at least for

5:08

those that adopted the iPhone or whatever

5:10

when it came out But

5:12

um, yeah, so in terms

5:14

of location Certainly

5:16

account access and certain

5:19

automations What do you

5:21

think makes it because obviously this

5:23

is something on the mind of the makers

5:25

of the devices because I

5:27

think both the AI

5:30

Pin and the Rabbit make

5:32

some sort of explicit statements in

5:34

their launch and in their website about Privacy

5:37

is really important to us. This is how

5:39

we're doing things to because we really care

5:42

about this So obviously they

5:44

anticipated some kind of additional reaction,

5:46

but we we all already have

5:48

smartphones I think most of us if

5:50

we are willing to admit

5:52

it We know that we're being tracked

5:55

everywhere and all of our data goes everywhere.

5:57

So I don't know. What is it

5:59

about this AI? element that you

6:01

think either makes a an

6:03

actual difference in terms of the substance

6:05

of what's happening with the data or

6:08

is it just a perception thing? It's

6:10

probably a perception thing with me. I

6:12

mean because everything that you said I

6:14

agree with you're dead on and

6:17

we've been giving this data up for years

6:19

and we've gotten comfortable with it and that's

6:21

just something that we all kind of don't

6:23

like about it but we've been accepting it

6:25

for years and I guess

6:27

it's the expectation that with these AI

6:29

assistants that we've been hearing about for

6:31

so long coming and we're starting to

6:34

see things like the rabbit come into market

6:36

and such that there's probably a whole new

6:38

level of kind of analysis of

6:40

us and all the things and in a

6:43

sense knowing you better than you do that

6:45

is uncomfortable and probably will not

6:47

be as uncomfortable in the years to come

6:49

because we'll grow used to that as well

6:52

but I have to admit right now it's

6:54

an emotional reaction it makes me

6:56

a little bit leery. Yeah maybe

6:58

it's prior to these sorts

7:01

of devices there was

7:03

sort of the perception at

7:05

least that yes my

7:08

data is going somewhere maybe

7:10

there's a nefarious person behind

7:12

this but there's sort of

7:14

a person behind this like the data

7:16

is going all to Facebook or meta

7:19

and they're like maybe they're

7:21

even listening in on me and

7:23

putting ads for mattresses in my

7:26

feed or whatever the thing is

7:28

right so that perception has been

7:30

around for quite some time regardless

7:32

of whether Facebook is

7:35

actually listening in or whatever or it's

7:37

another party like you know

7:39

the NSA and the government's listening in

7:41

but I think all of those perceptions

7:43

really relied on this idea that even

7:46

if there's something bad happening that I

7:48

don't want happening with my data there's

7:51

sort of a group of people back

7:53

there doing something with it and now

7:55

there's this sort of idea of this

7:57

agentic entity behind the

7:59

scenes that's doing something with my

8:02

data without human oversight. I think

8:04

maybe that's if

8:06

there's anything sort of fundamentally different

8:08

here I think it's the level

8:10

of automation and the sort of

8:12

agentic nature of this which does

8:15

provide some sort of difference. Although

8:18

there's always like you know if

8:20

you're processing voice or something there's

8:23

voice analytics and you can put that

8:25

to text and then there

8:27

are always NLP models in the background

8:29

doing very various things or whatever so

8:31

there's some level of automation that's already

8:33

been there. I agree and I think

8:35

but you mentioned perception up front and

8:37

I think that makes a big difference.

8:40

I guess with like you mentioned NSA intelligence

8:43

agencies I think we all

8:45

just assume that they're all listening to all the

8:47

things all the time now and

8:49

that's one of those things that's completely beyond

8:52

your control and so there's

8:54

almost no reason to worry about it

8:56

I suppose unless you happen to be one

8:58

of the people that an intelligence agency would care

9:00

about which I don't particularly think I am. So

9:03

it just goes someplace and you just kind of

9:05

shrug it off. There's a certain amount of what

9:08

we've done these years with mobile where it's you're

9:11

opting in I think it's leveling up to

9:13

we're saying with some of these AI agents

9:15

coming out we know how much data about

9:17

ourselves is going to be there and so

9:19

it's just escalating the opt-in up to a

9:21

whole new level. So hopefully

9:24

we'll see what happens. Yeah hope it

9:26

works out well. We haven't really for

9:28

the listeners maybe that are just listening

9:31

to this and haven't actually maybe you're

9:33

in parallel doing the search and looking at these

9:35

devices but in case you're on your run or

9:37

in your car we can

9:40

describe a little bit. So I described

9:42

the AI Pin thing a little bit. The

9:44

rabbit I thought was really really

9:46

cool design it I don't know if

9:48

there's any nerds out there that love

9:51

this sort of synthesizer

9:54

analog sequencer Teenage Engineering

9:57

stuff that's out there

10:00

But actually, for the hardware design, Teenage

10:02

Engineering was involved in that in some way.

10:04

So it's like a little square thing, the

10:06

Rabbit R1. It's got one

10:09

button you can push and speak

10:12

a command. It's got a little

10:14

actual hardware wheel that you can

10:16

spin to scroll. The

10:19

screen, they show it as black

10:21

most of the time, but it

10:23

pops up with the song you're

10:25

playing on Spotify, or some of the

10:27

things you would expect to be happening

10:29

on a touch screen or that sort

10:31

of thing. But the primary interface is

10:34

thought to be, in my

10:36

understanding, speech. Not

10:38

that you would be pulling up a keyboard

10:40

on the thing and typing in a lot.

10:42

That's not the point. The point would be

10:45

this speech-driven, conversational,

10:48

and even call it an

10:50

operating system. Conversational operating system

10:52

to do certain actions or

10:54

tasks, which we'll talk a

10:56

lot more about the research

10:58

behind that. But that's what

11:00

the device is and looks

11:02

like. It's interesting that going

11:04

with the device route and

11:06

the fact that they're selling

11:08

the actual unit itself. Over

11:11

the years, we started on desktops and then went to

11:13

laptops, and

11:17

then went to our phones, and the phones

11:19

have evolved over time. We've been

11:21

talking about wearables and things like that

11:23

over the years as they've evolved. But

11:25

I think there's a little bit of

11:27

a gamble in actually having it as

11:30

a physical device because that's something else

11:32

that they're presuming you're going to put

11:34

at the center of your life. That

11:36

versus being the traditional phone app approach

11:38

where you're using the thing that your

11:40

customer already has in their hands. What

11:43

are your thoughts about the physicalness of this

11:45

offering? I think it's interesting. One of

11:47

the points, if you watch the release

11:50

or launch or promotion video for

11:52

the Rabbit R1, he

11:55

talks about the app-driven

11:57

nature of smartphones: there's

12:00

an app for everything. And

12:02

there's so many apps now that

12:04

navigating apps is kind of

12:06

a task in and of itself. And the

12:08

Silicon Valley meme, no one ever deletes an

12:11

app, so you just accumulate more and more

12:13

and more apps, and they

12:15

kind of build up on your phone and now

12:17

you have to organize them into

12:20

little groupings or whatever. So

12:22

I think the point being that it's

12:25

nice that there's an app

12:27

for everything, but the navigation

12:29

and orchestration of those various

12:31

apps is sometimes

12:33

not seamless and burdensome. I'm

12:35

even thinking about myself and

12:37

kind of checking over here,

12:39

you know, I got in

12:41

the Uber, oh, I

12:43

forgot to switch over my payment on

12:45

my Uber app, so now I've got

12:47

to open my bank app, right, and

12:49

then grab my virtual card number and

12:52

copy that over, but then I've

12:54

got to go to my password

12:56

management app to copy my password.

12:58

There's all these sorts of interactions

13:00

between various things that aren't seamless

13:02

as you might think they would

13:04

be, but it's easy for me to say

13:07

in words conversationally, hey,

13:10

I want to update the payment on my

13:12

current Uber ride or whatever, right?

13:14

So the thought that that would

13:17

be an easy thing to express

13:19

conversationally is interesting

13:21

and then have that be accomplished in

13:24

the background if it actually

13:26

works, so it's also quite interesting. I agree

13:28

with that and I can't help

13:30

but wonder if you look back

13:32

at the advent of the phone and the

13:34

smartphone and, you know, the iPhone

13:37

comes out and it really isn't really so much

13:39

a phone anymore but a little computer, and

13:41

so we kind of, the idea of the

13:43

phone being the base device in your life

13:46

has been something that's been with us now

13:48

for, you know, over 15 years, and so

13:50

one of the things I wonder is, could

13:52

there be a trend where maybe the phone

13:55

doesn't become, if you think about it,

13:57

you're texting but a lot of your texting isn't really texting,

13:59

it's messaging in apps, maybe the

14:01

phone is no longer the central device

14:04

in your life going forward and maybe

14:06

you're actually having your primary thing. And

14:08

so that would obviously play into rabbit's

14:11

approach where they're giving you another device

14:13

and packages everything together in that AI

14:16

OS that they're talking about or conversationally

14:18

it runs your life if

14:20

you expose your life to it the way you are

14:22

across many apps on the phone. It's

14:24

an opportunity potentially to take a

14:27

left turn with the way we

14:29

think about devices and maybe the phone is

14:31

no longer in the future, in the not so

14:33

distant future, maybe the phone is no longer the

14:35

centerpiece. If

14:50

you're listening, you know that

14:52

artificial intelligence is revolutionizing the

14:54

way we produce information, changing

14:56

society, culture, politics, the economy,

14:58

but it's also created a

15:00

world of AI generated content,

15:02

including deep fakes. So how

15:04

can we tell what's real

15:06

online? Read Write Own: Building

15:08

the Next Era of

15:10

the Internet. A new book

15:12

from entrepreneur and investor Chris

15:14

Dixon explores one possible solution

15:16

to the internet's authenticity problem,

15:18

blockchains. From AI that

15:21

tracks its source material to generative

15:23

programs that compensate rather than cannibalize

15:25

creators. Read Write Own

15:27

is a call to action for

15:29

a more open, transparent and democratic

15:31

internet, one that opens the black

15:33

box of AI tracks the origins

15:35

we see online and much more.

15:37

This is our chance to reimagine

15:39

world changing technologies to build the

15:42

internet we want, not the

15:44

one we inherited. Order your copy

15:46

of Read Write Own today

15:48

or go to readwriteown.com to

15:50

learn more. All

16:13

right, Chris. Well, there's a

16:15

few things interacting in the

16:17

background here in terms of

16:19

the technology behind the Rabbit

16:22

device, and I'm sure other similar

16:24

types of devices that have come

16:26

out. Actually, there's some

16:28

of this sort of technology

16:30

that we've talked a little bit about

16:32

on the podcast before. I don't know

16:35

if you remember, we had the episode

16:37

with AskUI, which they

16:39

had this sort of multimodal

16:41

model that I think a

16:43

lot of their focus over time was

16:45

on testing. A lot of people might

16:47

test web applications or

16:50

websites using something like Selenium

16:52

or something like that that

16:54

automates desktop activity or interactions

16:56

with web applications and

16:59

actually automates that for testing purposes

17:01

or other purposes. AskUI

17:04

had some of this technology a

17:06

while back to kind of perform

17:08

certain actions using AI on a

17:10

user interface without sort of hard

17:12

coding like click on 100 pixels

17:16

this way and 20 pixels down

17:18

this way, right? That

17:20

I think has been going on for some time.
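
As a rough illustration of that difference, here is a minimal Selenium sketch in Python; the URL and the button label are made-up placeholders, and this is just the conventional scripted approach that vision-driven tools like AskUI try to generalize beyond.

    # Hypothetical example: the URL and the "Checkout" label are placeholders.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.action_chains import ActionChains

    driver = webdriver.Chrome()
    driver.get("https://example.com/store")

    # Brittle approach: click a fixed pixel offset, which breaks whenever the
    # layout, window size, or theme changes.
    ActionChains(driver).move_by_offset(100, 20).click().perform()

    # Slightly more robust approach: locate the element by its visible text,
    # closer in spirit to how a multimodal model "finds" the button on screen.
    button = driver.find_element(By.XPATH, "//button[normalize-space()='Checkout']")
    button.click()

    driver.quit()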

17:23

This adds a sort of different element

17:25

to it in that there's the voice

17:27

interaction, but then they're

17:29

really emphasizing the flexibility of

17:32

this and the updating

17:34

of it. So actually, they

17:36

emphasize like, I think

17:38

some of the examples they gave is, I have a

17:41

certain configuration on my laptop

17:44

or on my screen that I'm using

17:46

with a browser with certain plugins that

17:48

make it look a certain way and

17:50

everything sort of looks different for everybody

17:52

and it's all configured in their own

17:54

sort of way. Even

17:56

app-wise, apps kind of

17:58

are very personalized now, right? Which

18:00

makes it a challenge to say, click

18:03

on this button at this place. It

18:05

might not be at the same place

18:07

for everybody all the time. And of

18:10

course, apps update and that sort of

18:12

thing. So the solution that Rabbit has

18:14

come out with to deal

18:16

with this is what they're calling a

18:19

large action model. And

18:22

specifically, they're talking about this

18:24

large action model being a

18:26

neurosymbolic model. And I

18:28

want to talk through a little bit

18:30

of that. But before I do, I

18:33

think we sort of have to back

18:35

up and talk a little bit about

18:37

AI models, large language models. ChatGPT

18:39

has been interacting with external things for

18:41

some time now. I think

18:43

there's confusion, at least about

18:46

how that happens and

18:48

what the model is doing. So it

18:50

might be good just to kind of

18:53

set the stage for this in

18:56

terms of how these models are interacting with

18:58

external things. So the way that

19:00

this looks, at least in the Rabbit cases, you

19:02

click the button and you say, oh, I

19:05

want to change the payment card on my

19:07

Uber, unclick,

19:09

and stuff happens in the background.

19:11

And somehow the large action model

19:14

interacts with Uber and maybe my

19:16

bank app or whatever, and actually

19:18

makes the update. So

19:20

the question is how this happens.

19:23

Have you used any of the

19:25

plugins or anything in ChatGPT

19:27

or the kind of search the

19:29

web type of plugin to a

19:31

chat interface or anything like that?

19:33

Absolutely. I mean, that's what makes

19:36

the, I mean, I think

19:38

people tend to focus on the model itself, you

19:40

know, I mean, that's where all the glory is.

19:42

And people say, ah, this model versus that. But

19:45

so much the power comes in the

19:47

plugins themselves and or other ways in

19:49

which they interact with the world. And

19:52

so as we're trying to kind of

19:54

pave our way into the future and figure out how

19:56

we're going to use these and how they're

19:58

going to impact our lives. whether it be the

20:01

Rabbit way or whether you're talking ChatGPT

20:03

with its plugins, that's the key. It's all

20:05

those interactions as the touch points with

20:07

the different things that you care about which

20:10

makes it worthwhile. So yes, absolutely. And I'm

20:12

looking forward to doing it some more here.

20:15

Yeah, so there's a couple of things

20:17

maybe that we can talk about and

20:19

actually some of them are

20:21

even highlighted in recent things that happen that

20:24

we may wanna highlight also. One

20:26

of those is if you

20:28

think about a large language model

20:30

like that used in ChatGPT

20:32

or Neural Chat, Llama 2, whatever it

20:35

is, you put text in and you

20:37

get text out. We've talked about that a lot on

20:39

the show. And so you

20:41

put your prompt in and you get a

20:43

completion. It's like fancy auto complete and you

20:45

get this completion out, right? Not that interesting.

20:48

We've talked a little bit about RAG

20:50

on the show which means I

20:53

am programming some logic

20:55

around my prompt such

20:58

that when I get my user input,

21:01

I'm searching some of my own data

21:03

or some external data that I've stored

21:05

in a vector database or in a

21:07

set of embeddings to retrieve

21:09

text that's semantically similar

21:12

to my query and

21:14

just pushing that into the prompt as

21:17

a sort of grounding mechanism to sort

21:19

of ground the answer in

21:21

that external data. So you got sort

21:24

of basic auto complete, you've got retrieval

21:26

to insert external

21:28

data via a

21:30

vector database. You've

21:32

got some multimodal input.
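
Before getting to the multimodal piece, here is a minimal sketch of the retrieval step just summarized; the embedding model, the tiny in-memory stand-in for a vector database, and the prompt wording are all illustrative assumptions, not any particular product.

    # Illustrative RAG sketch: embed a few documents, retrieve the most similar
    # one for a query, and push it into the prompt as grounding context.
    import numpy as np
    from sentence_transformers import SentenceTransformer

    embedder = SentenceTransformer("all-MiniLM-L6-v2")

    docs = [
        "Our refund policy allows returns within 30 days of purchase.",
        "Support hours are 9am to 5pm Eastern, Monday through Friday.",
    ]
    doc_vectors = embedder.encode(docs, normalize_embeddings=True)

    def retrieve(query: str, k: int = 1) -> list[str]:
        """Return the k documents most semantically similar to the query."""
        q = embedder.encode([query], normalize_embeddings=True)[0]
        scores = doc_vectors @ q  # cosine similarity (vectors are normalized)
        return [docs[i] for i in np.argsort(scores)[::-1][:k]]

    question = "Can I return something I bought three weeks ago?"
    context = "\n".join(retrieve(question))

    # The grounded prompt is what gets sent to the text-to-text model;
    # the actual LLM call is omitted here.
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    print(prompt)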

21:35

So, and by multimodal

21:37

models, I'm meaning things like

21:39

LLaVA. And actually

21:41

this week, there was a great paper,

21:44

published on January 24th, I

21:47

saw it in the daily papers on

21:49

Hugging Face. MM-LLMs: Recent

21:53

Advances in MultiModal Large Language Models.

21:53

So if you're wanting to know

21:55

sort of the state of the

21:57

art and what's going on multimodal

22:00

large language models, like I just mentioned,

22:03

that's probably a much deeper dive that you can

22:05

go into. So check out that and we'll link

22:07

it in our show notes. But these are models

22:10

that would not only take a text prompt

22:12

but might take a text prompt paired with

22:14

an image, right? So you could put an

22:16

image in and you say, also

22:19

have a text prompt that says, is

22:21

there a raccoon in this image, right?
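
As a loose sketch of that kind of image-plus-text prompt, assuming the openly available llava-hf/llava-1.5-7b-hf checkpoint on Hugging Face; the image path is a placeholder and the prompt format follows that checkpoint's usual convention.

    # Illustrative multimodal example; the image file is a placeholder.
    from PIL import Image
    from transformers import AutoProcessor, LlavaForConditionalGeneration

    model_id = "llava-hf/llava-1.5-7b-hf"
    processor = AutoProcessor.from_pretrained(model_id)
    model = LlavaForConditionalGeneration.from_pretrained(model_id)

    image = Image.open("backyard.jpg")
    prompt = "USER: <image>\nIs there a raccoon in this image? ASSISTANT:"

    inputs = processor(text=prompt, images=image, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=20)
    print(processor.decode(output_ids[0], skip_special_tokens=True))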

22:23

And then, you know, hopefully the reasoning

22:25

happens and says yes or no if

22:27

there's a... Is there always a

22:29

raccoon in the image? There's always a

22:31

raccoon everywhere. That's one element of

22:33

this as a, that would

22:35

be a specialized model that

22:39

allows you to integrate multiple modes

22:41

of data. And there's similar ones

22:43

out there for audio and text

22:46

and other things. So again,

22:48

summary, you've got text to text,

22:50

auto complete. You've got this retrieval

22:52

mechanism to pull in some

22:54

external text data into your text

22:56

prompt. You've got specialized

22:59

models that allow you to

23:01

bring in an image and

23:03

text. All of that's super

23:05

interesting. And I think it's connected to

23:07

what Rabbit is doing. But

23:10

there's actually more to what's

23:12

going on with, let's

23:14

say when people perform actions

23:16

on external systems or integrate

23:19

external systems with these sorts

23:21

of AI models. And

23:24

this is what in the sort of

23:26

LangChain world, if you've interacted with

23:29

LangChain at all, they

23:31

would call this maybe tools. And

23:33

you even saw things in the past like

23:35

Toolformer and other models

23:37

where the idea was, well,

23:40

okay, I have, maybe it's the

23:42

Google search API, right? Or SERP

23:44

API or one of these search

23:47

APIs, right? I know

23:49

that I can take a JSON object,

23:51

send it off to that API and get a search

23:54

result, right? Okay, so now if

23:56

I wanna call that search

23:58

API with an... AI model,

24:01

what I need to do is get the

24:03

AI model to generate the right JSON

24:06

structured output that I can

24:08

then just programmatically, not with

24:10

any sort of fancy AI logic,

24:13

but programmatically take that JSON

24:15

object and send it off

24:17

to the API, get the response, and

24:20

either plug that in in the sort of

24:22

retrieval way that we talked about before, just

24:24

give it back to the user as the

24:26

response that they wanted, right? So this

24:29

has been happening for quite a

24:31

while. This is kind of

24:33

like we saw one of these

24:35

cool AI demos every week, right?

24:38

Where, oh, the AI is

24:40

integrated with Kayak now to get me

24:42

a rental car and the AI is

24:44

integrated with, you know, this

24:46

external system and all really cool, but

24:48

at the heart of that was the

24:50

idea that I would generate structured output

24:53

that I could use in a

24:55

regular computer programming way to

24:58

call an API and then

25:00

get a result back which I would

25:02

then use in my system. So that's

25:04

kind of this tool idea,

25:07

which is still not quite what Rabbit is

25:09

doing, but I think that's something that people

25:11

don't realize is happening behind the scenes in

25:14

these tools. I think that's really popular in

25:16

the enterprise, you know, and I'm, you know,

25:18

in the enterprise with air quotes there, because

25:21

that approach is, you know,

25:23

in large organizations, they're

25:26

going to other, you know, the cloud

25:28

providers with their APIs, you know, Microsoft

25:30

is, you know, has their relationship with

25:32

OpenAI and they're wrapping that, you

25:35

know, Google has their APIs and they're using

25:37

RAG, you know, in

25:39

that same way to try to integrate with

25:41

systems instead of actually creating the models on

25:44

their own. I would say that's a

25:46

very, very popular approach right now in

25:48

enterprise-y environments that are still more software-driven

25:50

and still trying to figure out how to

25:53

use APIs for AI models.
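
To make that pattern concrete, here is a hedged sketch of the tool flow described above: the model is prompted to emit JSON, and ordinary code validates it and makes the call. The JSON below stands in for an LLM response, and the search endpoint and key are placeholders rather than any specific provider's API.

    # The model's output is hard-coded here to stand in for a real completion.
    import json
    import requests

    llm_output = '{"tool": "web_search", "query": "rabbit r1 large action model"}'

    try:
        call = json.loads(llm_output)  # plain programmatic parsing, no AI logic
    except json.JSONDecodeError:
        raise ValueError("Model did not return valid JSON; re-prompt and retry.")

    if call.get("tool") == "web_search":
        resp = requests.get(
            "https://api.example-search.com/v1/search",  # placeholder endpoint
            params={"q": call["query"], "api_key": "YOUR_KEY"},
            timeout=10,
        )
        results = resp.json()
        # results would then be pushed back into the prompt (retrieval-style)
        # or returned to the user directly.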

25:55

Yeah, I can give you a

25:57

concrete example of something we did

25:59

with a customer at Prediction Guard,

26:01

which is the Shopify API,

26:03

right? So ecommerce customer, the

26:06

Shopify API has this

26:08

sort of Shopify, I think

26:11

it's called ShopifyQL query language, it's

26:13

structured, right? And you can call

26:15

the regular API via GraphQL, right?
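
Loosely, that flow might look like the sketch below; the generated query, shop name, token, API version, and endpoint details are hypothetical placeholders, not the actual Prediction Guard integration.

    # Hypothetical sketch: an LLM is asked to emit a GraphQL query instead of
    # prose, and ordinary code sends it to the Shopify GraphQL Admin API.
    import requests

    question = "How many orders did we get in the last 7 days?"

    # In the real flow the model would generate this; it is hard-coded here.
    generated_query = """
    {
      orders(first: 50, query: "created_at:>=2024-01-23") {
        edges { node { id createdAt } }
      }
    }
    """

    resp = requests.post(
        "https://your-shop.myshopify.com/admin/api/2024-01/graphql.json",
        headers={
            "X-Shopify-Access-Token": "YOUR_TOKEN",
            "Content-Type": "application/json",
        },
        json={"query": generated_query},
        timeout=10,
    )
    print(resp.json())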

26:18

And so it's very structured sort

26:20

of way you can call this

26:22

API to get sales

26:25

information or order information or

26:27

do certain tasks, right? And

26:29

so you can create a

26:31

natural language query and say,

26:34

okay, well, don't try to give me

26:36

natural language out, but give me ShopifyQL

26:38

or give me something that I can

26:40

plug into a GraphQL query, and then

26:42

I'm going to go off and query

26:44

the Shopify API and either perform some

26:46

interaction or get some data, right? So

26:49

this is very popular. This is

26:51

how you sort of get AI on top

26:53

of tools. What's

26:55

interesting, I think that Rabbit

26:57

observes in what they're saying

27:00

and others have observed as

27:02

well. I think, you know, you take

27:04

the case like ask UI,

27:06

like we talked about before. And

27:09

the observation is that

27:11

not everything has this

27:13

sort of nice structured way you can

27:15

interact with it with an API. So

27:18

think about pull out your phone. You've

27:20

got all of these apps on your phone. Some

27:23

of them will have a nice API

27:26

that's well defined. Some of

27:28

them will have an API that me

27:30

as a user, I know nothing about, right? There's

27:32

maybe an API that exists there, but it's hard

27:35

to use or not that well documented,

27:38

or maybe I don't have

27:40

the right account to use it or

27:42

something. There's all of these

27:44

interactions that I want to do

27:47

on my accounts with

27:49

my web apps, with my apps that

27:52

have no defined structured API

27:54

to execute all of those

27:56

things. So then the question

27:58

comes in. And that's why I wanted

28:00

to lead up to this is because even

28:03

if you can retrieve data to get grounded

28:05

answers, even if you can integrate images, even

28:07

if you can interact with

28:09

APIs, all of that gets

28:11

you pretty far as we've seen. But

28:13

ultimately not everything is going to

28:15

have a nice structured API, or

28:18

it's not going to have an API that's updated

28:21

or has all the features that you

28:23

want or does all the things you

28:25

want, right? So the question, I think

28:27

the fundamental question that the Rabbit research

28:29

team is thinking about is how

28:31

do we then reformulate the

28:34

problem in a flexible way

28:37

to allow a user to trigger an

28:40

AI system to perform arbitrary

28:43

actions across an arbitrary

28:45

number of applications or

28:47

an application without

28:50

knowing beforehand the

28:52

structure of that application or its

28:54

API? So I think that's the really

28:56

interesting question. I agree with

28:58

you completely. And there's so

29:01

much complexity. They refer to it

29:03

as human intentions expressed

29:05

through actions on a computer. And that

29:07

sounds really, really simple. When

29:09

you say it like that, but they're so, that's

29:12

quite a challenge to make that

29:14

work in an unstructured world. So I'm

29:17

really curious. They

29:20

have their research page, but I don't guess

29:22

they've put out any papers that

29:24

describe some of the research they've done yet, have they?

29:27

Just in general terms. And that's

29:29

where we get to

29:31

the exciting world of large

29:34

action models. Somehow

29:39

that makes me think of Arnold Schwarzenegger.

29:41

Large action heroes.

30:07

You know, when we started podcasting back

30:09

in 2009, and online stores

30:11

were the furthest thing from our

30:13

minds, now we have merch.changelog.com. And

30:15

you can go there right now

30:17

and order some t-shirts. And that's

30:19

all powered by Shopify. It's so

30:22

easy, all because Shopify is amazing.

30:25

Shopify is the global commerce platform that

30:27

helps you sell at every stage of

30:29

your business. From the launch

30:31

your online shop stage to the first

30:34

real life store stage, all the way to

30:36

the, did we just hit a million dollar stage?

30:39

Shopify is there to help you grow. Whether

30:41

you're selling security systems or marketing

30:43

memory modules, Shopify helps you sell

30:45

everywhere from their all-in-one e-commerce platform

30:48

to their in-person POS system, wherever

30:50

and whatever you're selling, Shopify has

30:52

got you covered. Shopify

30:54

helps you turn browsers into buyers with

30:56

the internet's best converting checkout, up to

30:59

36% better compared to other leading commerce

31:01

platforms. And sell more with less effort

31:03

thanks to Shopify magic, your AI powered

31:05

all-star. You know, nothing gets me and

31:08

Jerod more excited than when our guests

31:10

get that coupon code in their email,

31:12

and their shirt ships, or to everyone

31:14

out there who loves Changelog podcasts

31:16

and can go to merch.changelog.com and get

31:18

your favorite threads to support our podcast.

31:21

It is just the best thing ever.

31:24

From stickers to threads, all

31:26

that is at merch.changelog.com. And

31:29

did you know that Shopify powers 10% of

31:31

all e-commerce in the US and

31:34

Shopify is the global force

31:36

behind Allbirds, Rothy's, and Brooklinen, and

31:38

millions of other entrepreneurs of every

31:41

size across 175 countries. Plus,

31:45

Shopify's extensive help resources are there

31:47

to support you and your success

31:49

every step of the way. Those

31:52

businesses that grow, grow with Shopify. Sign

31:54

up for a $1 per

31:56

month trial period at shopify.com/practicalai,

32:00

all lowercase. Go to

32:03

shopify.com/practicalai

32:05

now to grow your business no

32:07

matter what stage you're in. Again,

32:10

shopify.com/practicalai.

32:30

Yeah Chris,

32:35

so coming from Arnold

32:37

Schwarzenegger and large action

32:39

heroes to large action

32:41

models, I was wondering if

32:44

this was a term that Rabbit came up

32:46

with. I think it has existed for some

32:48

amount of time. I at least saw it

32:50

at least as far as back

32:52

as June of last year, 2023. I saw Silvio Savarese's article

33:00

on Salesforce AI research

33:03

blog about LAMs from large

33:05

language models to large action

33:08

models. I think the

33:10

focus of that article was very much

33:12

on the sort of agentic stuff that

33:14

we talked about before in terms of

33:16

interacting with different systems but in a

33:18

very automated way. The

33:21

term large action model as far

33:23

as Rabbit refers to it, it's

33:25

this new architecture that they are saying

33:28

that they've come up with and I'm sure

33:30

they have because seems like the

33:32

device works. We don't know I think all of

33:35

the details about it at least I haven't

33:37

seen all of the

33:39

details or it is sort of not

33:42

transparent in the way that maybe

33:44

a model release would be on hugging

33:46

face with code associated with it in

33:48

a long research paper. Maybe

33:50

I'm missing that somewhere or listeners can

33:52

tell me if they found it but

33:54

I couldn't find that. They do have

33:56

a research page though which gives us

33:59

a few clues as

34:01

to what's going on and some

34:03

explanation in kind of general

34:06

terms. And what

34:08

they've described is that

34:10

their goal is to

34:13

observe human interactions with

34:15

a UI. And

34:17

there seems to be some sort

34:20

of multimodal model that is detecting

34:22

what things are where in the

34:24

UI. And they're

34:26

mapping that onto some

34:28

kind of flexible, symbolic,

34:32

synthesized representation of a

34:35

program. So the

34:37

user is doing this thing, right? So

34:40

I'm changing the payment on my Uber app.

34:43

And that's represented or synthesized

34:46

behind the scenes

34:49

in some sort of structured way and

34:52

kind of updated over time

34:54

as it sees demonstrations, human

34:56

demonstrations of this going on.
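
Rabbit hasn't published code, so purely as a speculative sketch of the shape being described here (every name and step below is invented for illustration and is not Rabbit's architecture): a neural front end resolves UI elements wherever they happen to be on screen, and a symbolic layer replays a program synthesized from those demonstrations.

    # Speculative, invented sketch; nothing here reflects Rabbit's implementation.
    from dataclasses import dataclass

    @dataclass
    class Step:
        element: str   # symbol for a UI element, e.g. "Payment method"
        action: str    # symbolic action, e.g. "tap" or "type"
        value: str = ""

    # A "synthesized program": symbolic steps learned from human demonstrations
    # rather than hand-coded pixel coordinates.
    UPDATE_PAYMENT = [
        Step("Account", "tap"),
        Step("Payment method", "tap"),
        Step("Card number", "type", "4242 4242 4242 4242"),
        Step("Save", "tap"),
    ]

    def locate(element: str) -> tuple[int, int]:
        """Stand-in for the neural part: a vision model would find the element
        on the current screen, wherever the app happens to place it today."""
        return (0, 0)  # dummy coordinates

    def run(program: list[Step]) -> None:
        """Stand-in for the symbolic part: replay the learned program in order."""
        for step in program:
            x, y = locate(step.element)
            print(f"{step.action} {step.element!r} at ({x}, {y}) {step.value}".strip())

    run(UPDATE_PAYMENT)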

34:59

And so the words that they... I'll just

35:01

kind of read this so people, if

35:03

they're not looking at the article, they say

35:05

we designed the technical stack from the ground

35:07

up from the data collection

35:09

platform to the new network architecture.

35:13

And here's the sort of very dense

35:15

loaded wording that probably

35:17

has a lot packed into

35:19

it. They say that utilizes

35:21

both transformer style attention and

35:24

graph-based message passing combined

35:27

with program synthesizers

35:31

that are demonstration and

35:33

example guided. So

35:36

that's a lot in that statement. And of course,

35:38

they've mentioned a few in more

35:40

description in other places. But

35:43

it seems like my sort

35:45

of interpretation of this is

35:48

that the requested

35:51

action comes in to

35:53

the system, to the network

35:56

architecture, right? And there's a

35:58

neural layer. So this is a neurosymbolic

36:00

model. So there's a neural

36:02

layer that somehow interprets that

36:05

user action into a

36:07

set of symbols or

36:09

representations that it's learned about the

36:11

AI, like the UI, I mean

36:13

the Shopify UI or the Uber

36:16

UI or whatever. And

36:18

then they use some sort

36:20

of symbolic logic processing of

36:23

this sort of synthesized program to

36:26

actually execute a series of actions

36:28

within the app and

36:30

perform an action that it's learned

36:33

through demonstration. So this

36:35

is sort of what

36:37

they mean, I think, when they're

36:40

talking about neurosymbolic. So there's a

36:42

neural network portion of this, kind

36:45

of like when you put something

36:47

into ChatGPT or

36:49

a transformer-based large language model and

36:51

you get something out. In

36:54

the case of we were talking about getting JSON

36:57

structured out when we're interacting with

36:59

an external tool, but here it

37:01

seems like you're getting some

37:03

sort of thing out, whatever

37:06

that is, a set of symbols or

37:08

some sort of structured thing that's then

37:10

passed through symbolic

37:12

processing layers that

37:15

are essentially symbolic

37:17

and rule-based ways to

37:19

execute a learned program

37:22

over this application. And by program

37:24

here, I think they mean, they

37:26

reference a couple papers, and my

37:29

best interpretation is that they mean

37:31

not a computer program in the

37:34

sense of Python code, but

37:36

a logical program that

37:38

represents an action like here is

37:41

the logical program to update the

37:43

payment on the Uber app. You

37:46

go here and then you click this and

37:48

then you enter that and then you blah

37:50

blah blah, you do those things, right? Except

37:53

here, those programs, those synthesized programs

37:55

are learned by looking at the

37:57

data, looking

37:59

at human intentions

38:01

and what they do in

38:04

an application. And that's

38:06

how those programs are synthesized. So

38:08

that was a long, I

38:10

don't know how well that held together, but that was my

38:13

best at this point without seeing anything

38:16

else from a single sort of

38:18

blog post. When you can keep me quiet for

38:20

a couple of minutes there, it means you're doing

38:22

a pretty good job. I

38:24

have a question I wanna throw out and I

38:27

don't know that you'd be able to answer it obviously, but

38:29

it just to speculate. While

38:31

we were talking about that and

38:33

thinking about multimodal, I'm

38:35

wondering the device itself comes

38:38

with many of the same sensors that

38:40

you're gonna find in

38:42

a cell phone these days, but

38:44

I'm wondering if that feeds in more than

38:47

just the speech and that

38:49

obviously has the camera on it. It

38:51

comes with a magnetometer, I can't

38:53

say the word, GPS, accelerometer, and gyroscope.

38:56

And obviously, so it's detecting motion,

38:59

it knows location, all the things. Has

39:01

the camera, has the mic. How

39:03

much of that do you think is

39:05

relevant to the LAMs, to

39:07

the large action model in terms of

39:10

inputs? Do you think that there is

39:12

potentially relevance in the non-speech and

39:15

non-camera concerns on it? Do you

39:17

think the way people move could

39:19

have some play in there? I

39:22

know we're being purely speculative. I'm just,

39:24

it caught my imagination. Yeah,

39:26

I'm not sure. I mean, it

39:28

could be that that's used in

39:31

ways similar to how those

39:33

sensors are used on smartphones

39:35

these days. Like if I'm

39:38

asking Rabbit to book

39:40

me an Uber to here or

39:42

something like that. Now, it

39:45

could infer the location maybe of where I am based

39:49

on where I'm wanting to go or ask me where I

39:51

am. But likely

39:53

the easiest thing would be to use

39:55

a GPS sensor, to know

39:58

my location and just, like, put that

40:00

as the pin in the Uber app and now it

40:02

knows. So I think there's

40:04

some level of interaction between these things.

40:06

I'm not sure how much, but it

40:08

seems like, at least in

40:10

terms of location, I could definitely see that

40:13

coming into play. I'm not sure on the

40:15

other ones. Well, it looks a lot like,

40:18

physically, it looks like a lot like

40:20

a smartphone without the phone. Yeah, yeah,

40:22

a smartphone, different sort of aspect

40:25

ratio, but still kind of touchscreen.

40:27

I think you can still pull

40:29

up a keyboard and that sort

40:32

of thing. And you see

40:34

things when you prompt it. So

40:36

yeah, I imagine that that's

40:38

maybe an evolution of this over

40:41

time is sensory

40:43

input of various things.

40:45

Like I could imagine that being very

40:47

interesting in running or fitness

40:49

type of scenarios, right? If I've

40:51

got my rabbit with me and

40:54

I instruct rabbit to post

40:57

a celebratory social media post every

40:59

time I keep my mileage or

41:01

my time per mile at

41:07

a certain level or something, and it's using

41:09

some sort of sensors on

41:11

the device to do that. I think

41:14

there's probably ways that that will work

41:16

out. I'm not sure about now. It'll be

41:18

interesting that if this approach

41:21

sticks, and I might

41:23

make an analogy to things like the

41:26

Oura Ring for health, wearing that, and

41:28

then competitors started coming out, and then

41:31

Amazon has their own version of

41:33

a health ring that's coming out.

41:36

Along those lines, you have

41:38

all these incumbent players in the AI space that

41:40

are, for the most

41:42

part, very large, well-funded cloud

41:45

companies, and in at least

41:47

one case, a retail company blended

41:50

in there. If

41:52

this might be an alternative in

41:54

some ways to the smartphone being

41:56

the dominant device, and it has

41:59

all the best, the same capabilities plus

42:01

more and they have the LAM

42:03

behind it to drive that functionality.

42:05

How long does it take for

42:08

an Amazon or a Google or

42:10

a Microsoft to come along after

42:12

this and start producing their own

42:14

variant because they already have the

42:16

infrastructure that they need to produce

42:18

the back end and they're going to

42:20

be able to produce you know Google and Amazon certainly

42:23

produce front-end stuff on quite a lot as well. So

42:25

it'll be interesting to see if this is the

42:27

beginning of a new marketplace opening up

42:30

in the AI space as an

42:32

entrant. So there's already really

42:34

great hardware out there for

42:36

smartphones and I

42:38

wonder if something like this is kind

42:41

of a shock to the market but in some

42:46

ways you know

42:48

just as phones with

42:50

external key buttons sort of

42:52

morphed into smartphones

42:55

with touch screens. Otherwise

42:57

I could see smartphones that

43:00

are primarily app-driven in

43:02

the way that we interact with them now being

43:05

pushed in a certain direction

43:07

because of these interfaces and

43:10

so smartphones won't look the

43:12

same in two years as they

43:14

do now and they won't follow that same

43:16

sort of app-driven trajectory like

43:18

they are now probably because of

43:20

things that are rethought and

43:23

it might not be that we all

43:25

have rabbits in our

43:27

pocket but maybe smartphones become

43:29

more like rabbits over time.

43:32

I'm not sure I think that that's

43:34

very likely a thing that happened.

43:36

It's also interesting to me it's a

43:38

little bit hard to parse out

43:41

for me what's happening, what's

43:43

the workload like between what's

43:46

happening on the device and

43:48

what's happening in the cloud and what

43:51

sort of connectivity is

43:53

actually needed for full functionality

43:55

with the device. Maybe that's something

43:57

if you want to share your own findings

44:00

on that in our Slack community

44:03

at changelog.com/community. We'd love to hear

44:05

about it. My understanding is there

44:07

is at least a good

44:09

portion of the LAM and the

44:12

LAM-powered routines that are operating

44:14

in a centralized sort of platform

44:17

and hardware. So there's not this

44:19

kind of huge large model running

44:22

on a very low power device

44:25

that might suck away all the, all

44:27

the energy. But I think that's also an

44:30

interesting direction is how far could

44:32

we get, especially with local models

44:34

getting so good recently

44:37

with fine tune, local optimized

44:40

quantized models, doing action

44:42

related things on edge

44:45

devices in our

44:47

pockets that aren't relying on

44:50

stable and high speed internet

44:53

connections, which also of course

44:55

helps with the privacy related

44:57

issues as well. I

44:59

agree. I, by the way, I'm going to

45:01

make a prediction. I'm predicting that a large

45:04

cloud computing service provider

45:06

will purchase rabbit. All

45:09

right. You heard it here first. Uh, uh, I

45:12

don't know what sort of odds Chris is,

45:15

is giving, um, or I'm not going to

45:17

bet against him, that's for sure. But,

45:20

uh, yeah, I think, I think that's interesting. That

45:22

is definitely a lot of, I think there will

45:24

be a lot of action

45:26

models of some type, whether those

45:29

be tool using

45:31

LLMs or lambs or

45:34

SLMs or whatever, whatever we've got

45:37

coming up. Um, and

45:39

they should have named it. They could have named

45:42

it a lamb instead of a rabbit, you

45:44

know, I just want to point out, they're

45:46

getting their animals mixed up, man. I actually, yeah,

45:48

that's a really good point. I don't know if

45:50

they came up with rabbit before lamb, but maybe

45:52

they just had the lack of the B there,

45:55

but I think they probably could have figured

45:57

out something. Yeah. And the only thing that could.

45:59

a bit, just in the lamb is raccoon, of

46:02

course, but that's beside the point. You have to

46:04

come around full circle there. Of

46:06

course, of course. We'll leave that

46:08

device up to you as well.

46:10

Yeah. All

46:13

right. Well, this has been fun, Chris. I

46:15

do recommend in terms of if people wanna

46:17

learn more, there's a really good research

46:20

page on rabbit.tech, rabbit.tech

46:22

slash research. And down at the

46:24

bottom of the page, there's

46:27

a list of references

46:29

that they share throughout that

46:32

people might find interesting as

46:35

they explore the technology. I

46:37

would also recommend that people look at

46:40

LangChain's documentation on tools.

46:43

And also maybe just check out a couple

46:45

of these tools. They're not that complicated. Like

46:47

I say, there's sort of, they

46:49

expect JSON input and then they run a

46:52

software function and do a thing. That's sort

46:54

of what's happening there. So maybe

46:56

check out some of those in the array of

46:58

tools that people have built for LangChain and

47:01

try using them. So yeah, this has

47:03

been fun, Chris. Thanks, it was great.

47:05

Thanks for bringing the rabbit to our

47:08

attention. Yeah, hopefully see you in person

47:10

soon. That's right. And

47:12

yeah, we'll include some links in our show

47:14

notes. Everyone take a look at them. Talk

47:16

to you soon, Chris. Have a good one.

47:27

Thank you for listening to practical AI.

47:30

Your next step is to subscribe now,

47:32

if you haven't already. And

47:34

if you're a longtime listener of the show, help

47:36

us reach more people by sharing practical AI with

47:39

your friends and colleagues. Thanks

47:41

once again to Fastly and Fly for partnering

47:43

with us to bring you all Changelog

47:45

podcasts. Check out what they're

47:47

up to at fastly.com and fly.io. And

47:50

to our beat freakin residents, Breakmaster Cylinder for

47:53

continuously cranking out the best beats in the

47:55

biz. That's all for now. We'll

47:57

talk to you again next time.
