Episode Transcript
0:06
Welcome to Practical AI. If
0:09
you work in artificial intelligence, aspire
0:12
to, or are curious how
0:15
AI-related tech is changing the world,
0:17
this is the show for you.
0:20
Thank you to our partners
0:22
at fly.io, the home of
0:24
changelog.com. Fly transforms
0:26
containers into microvms that run on
0:28
their hardware in 30 plus regions
0:30
on six continents. So you can
0:33
launch your app near your users.
0:36
Learn more at fly.io. Welcome
0:44
to another fully connected episode
0:46
of the Practical AI podcast.
0:49
In these fully connected episodes, we try
0:51
to keep you up to date with
0:53
everything that's happening in the AI and
0:55
machine learning world and try to give
0:57
you a few learning resources to level
1:00
up your AI game. This
1:02
is Daniel Whitenack. I'm the founder and
1:04
CEO of Prediction Guard, and I'm joined
1:06
as always by my co-host Chris Benson,
1:09
who's a tech strategist at Lockheed Martin.
1:11
How are you doing, Chris? Doing
1:13
very well, Daniel. Just enjoying
1:15
the day. And by
1:17
the way, since you've traveled to the
1:20
Atlanta area tonight, we haven't gotten together,
1:22
but you're just a few minutes away.
1:24
Actually, so welcome to Atlanta. Just got
1:26
in. Yeah, we're within
1:28
maybe a short drive, depending
1:31
on your view of what
1:33
a short drive is. Anything under
1:35
three hours is short in Atlanta. And I think you're
1:37
like 45 minutes away from me right now. Yeah,
1:40
so hopefully we'll get a chance to
1:42
catch up tomorrow, which will be awesome
1:44
because we rarely get to
1:47
see each other in person. It's
1:49
been an interesting couple of weeks for
1:52
me. I, so for
1:54
those that are listening from
1:56
abroad, maybe we had some
1:58
major ice and snow type
2:01
storms recently. And
2:03
my great and embarrassing
2:05
moment was, I was walking back
2:07
from the office in the, like,
2:10
freezing rain, and I slipped and
2:12
fell and my laptop
2:14
bag with laptop in it broke
2:16
my fall, which is maybe good,
2:19
but that also broke the laptop.
2:22
So actually the laptop works,
2:24
it's just the screen doesn't work. So maybe
2:26
I'll be able to resolve that, but it's
2:28
like a mini portable server there, isn't
2:30
it? Yeah, exactly. You
2:32
have enough monitors around, it's not that much of
2:34
an issue, but yeah, I
2:37
had to put Ubuntu on a
2:39
burner laptop for the trip, so
2:41
yeah, it's always a fun time.
2:45
Speaking of personal devices,
2:47
there's been a lot of interesting
2:49
news and releases, not
2:53
of, well, I guess of
2:55
models, but also of
2:58
interesting actual hardware
3:00
devices related to
3:02
AI. Recently, one
3:04
of those is the Rabbit R1,
3:07
which was announced and
3:10
sort of launched to pre-orders with a
3:12
lot of acclaim. Another
3:14
one that I saw was the AI Pin,
3:17
P-I-N, which is like a little, I don't
3:19
know, my grandma would call it a brooch
3:21
maybe, like a
3:24
large pin you put on your jacket
3:26
or something like that. I am
3:28
wondering, Chris, as you see
3:31
these devices, and I wanna dig a lot
3:33
more into some of the interesting research
3:35
and models and data behind some
3:37
of these things like Rabbit, but
3:39
just generally, what are your thoughts
3:41
on this sort of trend of
3:46
AI-driven personal devices to
3:48
help you with all of your personal
3:50
things and plugged into all of your
3:52
personal data and sort of
3:54
AI attached to everything in your life? Well,
3:57
I think it's coming. Maybe it's here.
4:00
But I know that I am definitely torn. I
4:02
mean, I love the idea of all this help
4:04
along the way. There's so
4:06
many... like, I forget everything. I'm terrible if
4:09
I don't write something down and then follow
4:11
up on the list. I
4:13
am NOT a naturally organized person. So
4:16
my wife is, and my wife is always
4:18
reminding me that I really struggle in this
4:20
area, and usually she's... she's
4:22
not being very nice in the way that she
4:24
says it. So, um, it's all love, I'm sure,
4:26
but yes. So part of me
4:28
is like wow, this is the way I
4:31
could actually, you know, be all there,
4:33
get all the things done. But
4:35
the idea of just giving up all my data and
4:37
just being... it's like
4:39
so many others, that
4:42
aspect is not appealing. So yeah, I
4:44
guess I'm not leaping at it.
4:46
How much different do you think
4:48
this sort of thing is, then, from
4:51
everything we already give over with our
4:53
smartphones? It's a good point you're making.
4:56
I mean we've had computing devices with
4:58
us in our pocket or
5:00
on our person 24-7 for
5:03
what, at least the past 10
5:06
years, you know, at least for
5:08
those that adopted the iPhone or whatever
5:10
when it came out. But
5:12
um, yeah, so in terms
5:14
of location, certainly
5:16
account access, and certain
5:19
automations. What do you
5:21
think makes it because obviously this
5:23
is something on the mind of the makers
5:25
of the devices because I
5:27
think both the AI
5:30
Pin and the Rabbit make
5:32
some sort of explicit statements in
5:34
their launch and on their website, like "privacy
5:37
is really important to us, this is how
5:39
we're doing things, because we really care
5:42
about this." So obviously they
5:44
anticipated some kind of additional reaction,
5:46
but we we all already have
5:48
smartphones I think most of us if
5:50
we are willing to admit
5:52
it, we know that we're being tracked
5:55
everywhere and all of our data goes everywhere.
5:57
So I don't know. What is it
5:59
about this AI element that you
6:01
think either makes a an
6:03
actual difference in terms of the substance
6:05
of what's happening with the data or
6:08
is it just a perception thing? It's
6:10
probably a perception thing with me. I
6:12
mean because everything that you said I
6:14
agree with you're dead on and
6:17
we've been giving this data up for years
6:19
and we've gotten comfortable with it and that's
6:21
just something that we all kind of don't
6:23
like about it but we've been accepting it
6:25
for years and I guess
6:27
it's the expectation that with these AI
6:29
assistants that we've been hearing about for
6:31
so long coming and we're starting to
6:34
see things like the rabbit come into market
6:36
and such that there's probably a whole new
6:38
level of kind of analysis of
6:40
us and all the things and in a
6:43
sense knowing you better than you do that
6:45
is uncomfortable and probably will not
6:47
be as uncomfortable in the years to come
6:49
because we'll grow used to that as well
6:52
but I have to admit right now it's
6:54
an emotional reaction it makes me
6:56
a little bit leery. Yeah maybe
6:58
it's prior to these sorts
7:01
of devices there was
7:03
sort of the perception at
7:05
least that yes my
7:08
data is going somewhere maybe
7:10
there's a nefarious person behind
7:12
this but there's sort of
7:14
a person behind this like the data
7:16
is going all to Facebook or meta
7:19
and they're like maybe they're
7:21
even listening in on me and
7:23
putting ads for mattresses in my
7:26
feed or whatever the thing is
7:28
right so that perception has been
7:30
around for quite some time regardless
7:32
of whether Facebook is
7:35
actually listening in or whatever or it's
7:37
another party like you know
7:39
the NSA and the government's listening in
7:41
but I think all of those perceptions
7:43
really relied on this idea that even
7:46
if there's something bad happening that I
7:48
don't want happening with my data there's
7:51
sort of a group of people back
7:53
there doing something with it and now
7:55
there's this sort of idea of this
7:57
agentic entity behind the
7:59
scenes that's doing something with my
8:02
data without human oversight. I think
8:04
maybe that's if
8:06
there's anything sort of fundamentally different
8:08
here I think it's the level
8:10
of automation and the sort of
8:12
agentic nature of this which does
8:15
provide some sort of difference. Although
8:18
there's always like you know if
8:20
you're processing voice or something there's
8:23
voice analytics and you can put that
8:25
to text and then there
8:27
are always NLP models in the background
8:29
doing very various things or whatever so
8:31
there's some level of automation that's already
8:33
been there. I agree and I think
8:35
but you mentioned perception up front and
8:37
I think that makes a big difference.
8:40
I guess with like you mentioned NSA intelligence
8:43
agencies I think we all
8:45
just assume that they're all listening to all the
8:47
things all the time now and
8:49
that's one of those things that's completely beyond
8:52
your control and so there's
8:54
almost no reason to worry about it
8:56
I suppose unless you happen to be one
8:58
of the people that an intelligence agency would care
9:00
about which I don't particularly think I am. So
9:03
it just goes someplace and you just kind of
9:05
shrug it off. There's a certain amount of what
9:08
we've done these years with mobile where you're
9:11
opting in. I think it's leveling up to,
9:13
we're saying, with some of these AI agents
9:15
coming out, we know how much data about
9:17
ourselves is going to be there, and so
9:19
it's just escalating the opt-in up to a
9:21
whole new level. So hopefully
9:24
we'll see what happens. Yeah hope it
9:26
works out well. We haven't really, for
9:28
the listeners maybe that are just listening
9:31
to this and haven't actually... maybe you're
9:33
in parallel doing the search and looking at these
9:35
devices, but in case you're on your run or
9:37
in your car, we can
9:40
describe a little bit. So I described
9:42
the AI Pin thing a little bit; the
9:44
Rabbit, I thought, was a really, really
9:46
cool design. I don't know if
9:48
there's any nerds out there that love
9:51
this sort of synthesizer
9:54
analog sequencer Teenage Engineering
9:57
stuff that's out there
10:00
But actually the hardware design, Teenage
10:02
Engineering was involved in that in some way.
10:04
So it's like a little square thing, the
10:06
Rabbit R1. It's got one
10:09
button you can push and speak
10:12
a command. It's got a little
10:14
actual hardware wheel that you can
10:16
spin to scroll. The
10:19
screen, they show it as black
10:21
most of the time, but it
10:23
pops up with the song you're
10:25
playing on Spotify, or some of the
10:27
things you would expect to be happening
10:29
on a touch screen or that sort
10:31
of thing. But the primary interface is
10:34
thought to be, in my
10:36
understanding, speech. Not
10:38
that you would be pulling up a keyboard
10:40
on the thing and typing in a lot.
10:42
That's not the point. The point would be
10:45
this speech-driven, conversational,
10:48
and even call it an
10:50
operating system. Conversational operating system
10:52
to do certain actions or
10:54
tasks, which we'll talk a
10:56
lot more about the research
10:58
behind that. But that's what
11:00
the device is and looks
11:02
like. It's interesting that going
11:04
with the device route and
11:06
the fact that they're selling
11:08
the actual unit itself. Over
11:11
the years, we started on desktops and then went to
11:13
laptops, and
11:17
then went to our phones, and the phones
11:19
have evolved over time. We've been
11:21
talking about wearables and things like that
11:23
over the years as they've evolved. But
11:25
I think there's a little bit of
11:27
a gamble in actually having it as
11:30
a physical device because that's something else
11:32
that they're presuming you're going to put
11:34
at the center of your life. That
11:36
versus being the traditional phone app approach
11:38
where you're using the thing that your
11:40
customer already has in their hands. What
11:43
are your thoughts about the physicalness of this
11:45
offering? I think it's interesting. One of
11:47
the points, if you watch the release
11:50
or launch or promotion video for
11:52
the Rabbit R1, he
11:55
talks about the app-driven
11:57
nature of smartphones. There's
12:00
an app for everything. And
12:02
there's so many apps now that
12:04
navigating apps is kind of
12:06
a task in and of itself. And the
12:08
Silicon Valley meme, no one ever deletes an
12:11
app, so you just accumulate more and more
12:13
and more apps, and they
12:15
kind of build up on your phone and now
12:17
you have to organize them into
12:20
little groupings or whatever. So
12:22
I think the point being that it's
12:25
nice that there's an app
12:27
for everything, but the navigation
12:29
and orchestration of those various
12:31
apps is sometimes
12:33
not seamless and burdensome. I'm
12:35
even thinking about myself and
12:37
kind of checking over here,
12:39
you know, I got in
12:41
the Uber, oh, I
12:43
forgot to switch over my payment on
12:45
my Uber app, so now I've got
12:47
to open my bank app, right, and
12:49
then grab my virtual card number and
12:52
copy that over, but then I've
12:54
got to go to my password
12:56
management app to copy my password.
12:58
There's all these sorts of interactions
13:00
between various things that aren't seamless
13:02
as you might think they would
13:04
be, but it's easy for me to say
13:07
in words conversationally, hey,
13:10
I want to update the payment on my
13:12
current Uber ride or whatever, right?
13:14
So the thought that that would
13:17
be an easy thing to express
13:19
conversationally is interesting
13:21
and then have that be accomplished in
13:24
the background if it actually
13:26
works, so it's also quite interesting. I agree
13:28
with that and I can't help
13:30
but wonder if you look back
13:32
at the advent of the phone and the
13:34
smartphone and, you know, the iPhone
13:37
comes out and it isn't really so much
13:39
a phone anymore but a little computer, and
13:41
so we kind of, the idea of the
13:43
phone being the base device in your life
13:46
has been something that's been with us now
13:48
for, you know, over 15 years, and so
13:50
one of the things I wonder is, could
13:52
there be a trend where maybe the phone
13:55
doesn't become, if you think about it,
13:57
you're texting but a lot of your texting isn't really texting,
13:59
it's messaging in apps. Maybe the
14:01
phone is no longer the central device
14:04
in your life going forward and maybe
14:06
you're actually having your primary thing. And
14:08
so that would obviously play into rabbit's
14:11
approach where they're giving you another device
14:13
and packages everything together in that AI
14:16
OS that they're talking about or conversationally
14:18
it runs your life if
14:20
you expose your life to it the way you are
14:22
across many apps on the phone. It's
14:24
an opportunity potentially to take a
14:27
left turn with the way we
14:29
think about devices and maybe the phone is
14:31
no longer in the future, in the not so
14:33
distant future, maybe the phone is no longer the
14:35
centerpiece. If
14:50
you're listening, you know that
14:52
artificial intelligence is revolutionizing the
14:54
way we produce information, changing
14:56
society, culture, politics, the economy,
14:58
but it's also created a
15:00
world of AI generated content,
15:02
including deep fakes. So how
15:04
can we tell what's real
15:06
online? Read Write Own:
15:08
Building the Next Era of
15:10
the Internet. A new book
15:12
from entrepreneur and investor Chris
15:14
Dixon explores one possible solution
15:16
to the internet's authenticity problem,
15:18
blockchains. From AI that
15:21
tracks its source material to generative
15:23
programs that compensate rather than cannibalize
15:25
creators. Read Write Own
15:27
is a call to action for
15:29
a more open, transparent and democratic
15:31
internet, one that opens the black
15:33
box of AI, tracks the origins of what
15:35
we see online, and much more.
15:37
This is our chance to reimagine
15:39
world changing technologies to build the
15:42
internet we want, not the
15:44
one we inherited. Order your copy
15:46
of Read Write Own today,
15:48
or go to readwriteown.com to
15:50
learn more. All
16:13
right, Chris. Well, there's a
16:15
few things interacting in the
16:17
background here in terms of
16:19
the technology behind the Rabbit
16:22
device, and I'm sure other similar
16:24
types of devices that have come
16:26
out. Actually, there's some
16:28
of this sort of technology
16:30
that we've talked a little bit about
16:32
on the podcast before. I don't know
16:35
if you remember, we had the episode
16:37
with AskUI, which they
16:39
had this sort of multimodal
16:41
model that I think a
16:43
lot of their focus over time was
16:45
on testing. A lot of people might
16:47
test web applications or
16:50
websites using something like Selenium
16:52
or something like that that
16:54
automates desktop activity or interactions
16:56
with web applications and
16:59
actually automates that for testing purposes
17:01
or other purposes. AskUI
17:04
had some of this technology a
17:06
while back to kind of perform
17:08
certain actions using AI on a
17:10
user interface without sort of hard
17:12
coding like click on 100 pixels
17:16
this way and 20 pixels down
17:18
this way, right? That
17:20
I think has been going on for some time.
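For listeners who haven't used this kind of UI automation, here is a minimal sketch of the scripted, Selenium-style approach Daniel is contrasting with the AskUI idea. The URL and element IDs are made up for illustration; the point is that each step is pinned to specific selectors or pixel offsets, so it breaks whenever the layout changes.

```python
# Hypothetical Selenium sketch: every step is hard-coded against specific
# element IDs (or pixel offsets), so any UI change breaks the automation.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver import ActionChains

driver = webdriver.Chrome()
driver.get("https://example.com/checkout")  # placeholder URL

# Selector-based step: assumes a button with this exact id exists.
driver.find_element(By.ID, "update-payment").click()

# Coordinate-based step, i.e. "click 100 pixels this way and 20 pixels down".
ActionChains(driver).move_by_offset(100, 20).click().perform()

driver.quit()
```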
17:23
This adds a sort of different element
17:25
to it in that there's the voice
17:27
interaction, but then they're
17:29
really emphasizing the flexibility of
17:32
this and the updating
17:34
of it. So actually, they
17:36
emphasize like, I think
17:38
some of the examples they gave is, I have a
17:41
certain configuration on my laptop
17:44
or on my screen that I'm using
17:46
with a browser with certain plugins that
17:48
make it look a certain way and
17:50
everything sort of looks different for everybody
17:52
and it's all configured in their own
17:54
sort of way. Even
17:56
app-wise, apps kind of
17:58
are very personalized now, right? Which
18:00
makes it a challenge to say, click
18:03
on this button at this place. It
18:05
might not be at the same place
18:07
for everybody all the time. And of
18:10
course, apps update and that sort of
18:12
thing. So the solution that Rabbit has
18:14
come out with to deal
18:16
with this is what they're calling a
18:19
large action model. And
18:22
specifically, they're talking about this
18:24
large action model being a
18:26
neurosymbolic model. And I
18:28
want to talk through a little bit
18:30
of that. But before I do, I
18:33
think we sort of have to back
18:35
up and talk a little bit about
18:37
AI models, large language models. ChatGPT
18:39
has been interacting with external things for
18:41
some time now. I think
18:43
there's confusion, at least about
18:46
how that happens and
18:48
what the model is doing. So it
18:50
might be good just to kind of
18:53
set the stage for this in
18:56
terms of how these models are interacting with
18:58
external things. So the way that
19:00
this looks, at least in the Rabbit case, is you
19:02
click the button and you say, oh, I
19:05
want to change the payment card on my
19:07
Uber, unclick,
19:09
and stuff happens in the background.
19:11
And somehow the large action model
19:14
interacts with Uber and maybe my
19:16
bank app or whatever, and actually
19:18
makes the update. So
19:20
the question is how this happens.
19:23
Have you used any of the
19:25
plugins or anything in ChatGPT
19:27
or the kind of search the
19:29
web type of plugin to a
19:31
chat interface or anything like that?
19:33
Absolutely. I mean, that's what makes
19:36
the, I mean, I think
19:38
people tend to focus on the model itself, you
19:40
know, I mean, that's where all the glory is.
19:42
And people say, ah, this model versus that. But
19:45
so much of the power comes in the
19:47
plugins themselves and or other ways in
19:49
which they interact with the world. And
19:52
so as we're trying to kind of
19:54
pave our way into the future and figure out how
19:56
we're going to use these and how they're
19:58
going to impact our lives. whether it be the
20:01
Rabbit way or whether you're talking ChatGPT
20:03
with its plugins, that's the key. It's all
20:05
those interactions as the touch points with
20:07
the different things that you care about which
20:10
makes it worthwhile. So yes, absolutely. And I'm
20:12
looking forward to doing it some more here.
20:15
Yeah, so there's a couple of things
20:17
maybe that we can talk about and
20:19
actually some of them are
20:21
even highlighted in recent things that happened that
20:24
we may wanna highlight also. One
20:26
of those is if you
20:28
think about a large language model
20:30
like that used in ChatGPT
20:32
or Neural Chat, Llama 2, whatever it
20:35
is, you put text in and you
20:37
get text out. We've talked about that a lot on
20:39
the show. And so you
20:41
put your prompt in and you get a
20:43
completion. It's like fancy auto complete and you
20:45
get this completion out, right? Not that interesting.
20:48
We've talked a little bit about RAG
20:50
on the show which means I
20:53
am programming some logic
20:55
around my prompt such
20:58
that when I get my user input,
21:01
I'm searching some of my own data
21:03
or some external data that I've stored
21:05
in a vector database or in a
21:07
set of embeddings to retrieve
21:09
text that's semantically similar
21:12
to my query and
21:14
just pushing that into the prompt as
21:17
a sort of grounding mechanism to sort
21:19
of ground the answer in
21:21
that external data.
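For anyone who wants to see that in code, a minimal sketch of the RAG flow Daniel just described might look like the following. The embed, vector_db, and llm arguments are hypothetical stand-ins for whatever embedding model, vector store, and language model you actually use; only the shape of the flow is the point.

```python
def answer_with_rag(user_query: str, embed, vector_db, llm, k: int = 3) -> str:
    # 1. Embed the query and retrieve the k most semantically similar chunks
    #    from the external data that was stored ahead of time.
    query_vector = embed(user_query)
    chunks = vector_db.search(query_vector, top_k=k)

    # 2. Push the retrieved text into the prompt as a grounding mechanism.
    context = "\n\n".join(chunk.text for chunk in chunks)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {user_query}\nAnswer:"
    )

    # 3. Ordinary text-in, text-out completion, now grounded in that data.
    return llm.complete(prompt)
```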
21:24
So you've got sort of basic autocomplete, you've got retrieval
21:26
to insert external
21:28
data via a
21:30
vector database. You've
21:32
got some multimodal input.
21:35
So, and by multimodal
21:37
models, I'm meaning things like
21:39
LLaVA. And actually
21:41
this week, there was a great paper
21:44
published on January 24th. I
21:47
saw it in the Daily Papers on
21:49
Hugging Face: "MM-LLMs: Recent
21:51
Advances in MultiModal Large Language Models."
21:53
So if you're wanting to know
21:55
sort of the state of the
21:57
art and what's going on multimodal
22:00
large language models, like I just mentioned,
22:03
that's probably a much deeper dive that you can
22:05
go into. So check out that and we'll link
22:07
it in our show notes. But these are models
22:10
that would not only take a text prompt
22:12
but might take a text prompt paired with
22:14
an image, right? So you could put an
22:16
image in and you say, also
22:19
have a text prompt that says, is
22:21
there a raccoon in this image, right?
22:23
And then, you know, hopefully the reasoning
22:25
happens and says yes or no if
22:27
there's a- Is there always a
22:29
raccoon in the image? There's always a
22:31
raccoon everywhere. That's one element of
22:33
this, that would
22:35
be a specialized model that
22:39
allows you to integrate multiple modes
22:41
of data. And there's similar ones
22:43
out there for audio and text
22:46
and other things.
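As a concrete example of the image-plus-text prompting being described, here is roughly how an open model like LLaVA can be called through Hugging Face transformers. The checkpoint name, prompt template, and image URL follow the llava-hf examples as we understand them; details vary between versions, so treat this as a sketch rather than exact API documentation.

```python
import requests
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"  # example checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(model_id)

# One image plus one text question go into the same prompt.
image = Image.open(requests.get("https://example.com/photo.jpg", stream=True).raw)
prompt = "USER: <image>\nIs there a raccoon in this image? ASSISTANT:"

inputs = processor(text=prompt, images=image, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(output[0], skip_special_tokens=True))
```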
22:48
So again, summary: you've got text-to-text
22:50
auto complete. You've got this retrieval
22:52
mechanism to pull in some
22:54
external text data into your text
22:56
prompt. You've got specialized
22:59
models that allow you to
23:01
bring in an image and
23:03
text. All of that's super
23:05
interesting. And I think it's connected to
23:07
what Rabbit is doing. But
23:10
there's actually more to what's
23:12
going on with, let's
23:14
say when people perform actions
23:16
on external systems or integrate
23:19
external systems with these sorts
23:21
of AI models. And
23:24
this is what in the sort of
23:26
LangChain world, if you've interacted with
23:29
LangChain at all, they
23:31
would call this maybe tools. And
23:33
you even saw things in the past like
23:35
Toolformer and other models
23:37
where the idea was, well,
23:40
okay, I have, maybe it's the
23:42
Google search API, right? Or SERP
23:44
API or one of these search
23:47
APIs, right? I know
23:49
that I can take a JSON object,
23:51
send it off to that API and get a search
23:54
result, right? Okay, so now if
23:56
I wanna call that search
23:58
API with an... AI model,
24:01
what I need to do is get the
24:03
AI model to generate the right JSON
24:06
structured output that I can
24:08
then just programmatically, not with
24:10
any sort of fancy AI logic,
24:13
but programmatically take that JSON
24:15
object and send it off
24:17
to the API, get the response, and
24:20
either plug that in in the sort of
24:22
retrieval way that we talked about before, just
24:24
give it back to the user as the
24:26
response that they wanted, right? So this
24:29
has been happening for quite a
24:31
while. This is kind of
24:33
like we saw one of these
24:35
cool AI demos every week, right?
24:38
Where, oh, the AI is
24:40
integrated with Kayak now to get me
24:42
a rental car and the AI is
24:44
integrated with, you know, this
24:46
external system and all really cool, but
24:48
at the heart of that was the
24:50
idea that I would generate structured output
24:53
that I could use in a
24:55
regular computer programming way to
24:58
call an API and then
25:00
get a result back which I would
25:02
then use in my system.
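A rough sketch of that tool pattern, with a hypothetical llm client and a placeholder search endpoint: the model's only job is to emit structured JSON, and ordinary code makes the actual API call.

```python
import json
import requests

def search_tool(llm, user_request: str) -> dict:
    # Ask the model for structured arguments, not a natural-language answer.
    prompt = (
        "Turn the user's request into a JSON object with the fields "
        '"query" (string) and "num_results" (integer). Respond with JSON only.\n'
        f"Request: {user_request}\nJSON:"
    )
    args = json.loads(llm.complete(prompt))  # e.g. {"query": "...", "num_results": 5}

    # Plain programmatic step: send the generated arguments to the API.
    response = requests.get(
        "https://api.example-search.com/v1/search",  # placeholder endpoint
        params=args,
        timeout=10,
    )
    # The result can be returned directly, or pushed back into the prompt,
    # retrieval-style, for the model to summarize.
    return response.json()
```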
25:04
So that's kind of this tool idea,
25:07
which is still not quite what Rabbit is
25:09
doing, but I think that's something that people
25:11
don't realize is happening behind the scenes in
25:14
these tools. I think that's really popular in
25:16
the enterprise, you know, and I'm, you know,
25:18
in the enterprise with air quotes there, because
25:21
that approach is, you know,
25:23
in large organizations, they're
25:26
going to other, you know, the cloud
25:28
providers with their APIs, you know, Microsoft
25:30
is, you know, has their relationship with
25:32
OpenAI and they're wrapping that, you
25:35
know, Google has their APIs and they're using
25:37
RAG, you know, in
25:39
that same way to try to integrate with
25:41
systems instead of actually creating the models on
25:44
their own. I would say that's a
25:46
very, very popular approach right now in
25:48
enterprise-y environments that are still more software-driven
25:50
and still trying to figure out how to
25:53
use APIs for AI models.
25:55
Yeah, I can give you a
25:57
concrete example of something we did
25:59
with a... customer of Prediction Guard,
26:01
which is the Shopify API,
26:03
right? So ecommerce customer, the
26:06
Shopify API has this
26:08
sort of Shopify, I think
26:11
it's called ShopifyQL query language, it's
26:13
structured, right? And you can call
26:15
the regular API via GraphQL, right?
26:18
And so it's very structured sort
26:20
of way you can call this
26:22
API to get sales
26:25
information or order information or
26:27
do certain tasks, right? And
26:29
so you can create a
26:31
natural language query and say,
26:34
okay, well, don't try to give me
26:36
natural language out, but give me ShopifyQL
26:38
or give me something that I can
26:40
plug into a GraphQL query, and then
26:42
I'm going to go off and query
26:44
the Shopify API and either perform some
26:46
interaction or get some data, right? So
26:49
this is very popular. This is
26:51
how you sort of get AI on top
26:53
of tools.
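Here is a hypothetical sketch of that Shopify pattern: ask the model for a structured query (GraphQL in this case) rather than a natural-language answer, then run the query yourself. The endpoint, API version, and auth header are illustrative only; check Shopify's Admin API docs for the real values.

```python
import json
import requests

def shopify_query_via_llm(llm, question: str, shop: str, token: str) -> dict:
    # The model's output is a query, not an answer.
    prompt = (
        "Write a Shopify Admin GraphQL query that answers the question below. "
        "Return only the query text.\n"
        f"Question: {question}\nQuery:"
    )
    graphql_query = llm.complete(prompt)  # e.g. "{ orders(first: 10) { edges { node { id } } } }"

    # Ordinary code executes the query against the (illustrative) endpoint.
    response = requests.post(
        f"https://{shop}.myshopify.com/admin/api/2024-01/graphql.json",
        headers={"X-Shopify-Access-Token": token, "Content-Type": "application/json"},
        data=json.dumps({"query": graphql_query}),
        timeout=10,
    )
    return response.json()
```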
26:55
What's interesting, I think, is what Rabbit
26:57
observes in what they're saying
27:00
and others have observed as
27:02
well. I think, you know, you take
27:04
the case like AskUI,
27:06
like we talked about before. And
27:09
the observation is that
27:11
not everything has this
27:13
sort of nice structured way you can
27:15
interact with it with an API. So
27:18
think about pull out your phone. You've
27:20
got all of these apps on your phone. Some
27:23
of them will have a nice API
27:26
that's well defined. Some of
27:28
them will have an API that me
27:30
as a user, I know nothing about, right? There's
27:32
maybe an API that exists there, but it's hard
27:35
to use or not that well documented,
27:38
or maybe I don't have
27:40
the right account to use it or
27:42
something. There's all of these
27:44
interactions that I want to do
27:47
on my accounts with
27:49
my web apps, with my apps that
27:52
have no defined structured API
27:54
to execute all of those
27:56
things. So then the question
27:58
comes in. And that's why I wanted
28:00
to lead up to this is because even
28:03
if you can retrieve data to get grounded
28:05
answers, even if you can integrate images, even
28:07
if you can interact with
28:09
APIs, all of that gets
28:11
you pretty far as we've seen. But
28:13
ultimately not everything is going to
28:15
have a nice structured API, or
28:18
it's not going to have an API that's updated
28:21
or has all the features that you
28:23
want or does all the things you
28:25
want, right? So the question, I think
28:27
the fundamental question that the Rabbit research
28:29
team is thinking about is how
28:31
do we then reformulate the
28:34
problem in a flexible way
28:37
to allow a user to trigger an
28:40
AI system to perform arbitrary
28:43
actions across an arbitrary
28:45
number of applications or
28:47
an application without
28:50
knowing beforehand the
28:52
structure of that application or its
28:54
API? So I think that's the really
28:56
interesting question. I agree with
28:58
you completely. And there's so
29:01
much complexity. They refer to it
29:03
as human intentions expressed
29:05
through actions on a computer. And that
29:07
sounds really, really simple. When
29:09
you say it like that, but that's
29:12
quite a challenge to make that
29:14
work in an unstructured world. So I'm
29:17
really curious. They
29:20
have their research page, but I don't guess
29:22
they've put out any papers that
29:24
describe some of the research they've done yet, have they?
29:27
Just in general terms. And that's
29:29
where we get to
29:31
the exciting world of large
29:34
action models. Somehow
29:39
that makes me think of Arnold Schwarzenegger.
29:41
Large action heroes.
30:07
You know, when we started podcasting back
30:09
in 2009, an online store
30:11
was the furthest thing from our
30:13
minds, now we have merch.changelog.com. And
30:15
you can go there right now
30:17
and order some t-shirts. And that's
30:19
all powered by Shopify. It's so
30:22
easy, all because Shopify is amazing.
30:25
Shopify is the global commerce platform that
30:27
helps you sell at every stage of
30:29
your business. From the launch
30:31
your online shop stage to the first
30:34
real life store stage, all the way to
30:36
the, did we just hit a million dollar stage?
30:39
Shopify is there to help you grow. Whether
30:41
you're selling security systems or marketing
30:43
memory modules, Shopify helps you sell
30:45
everywhere from their all-in-one e-commerce platform
30:48
to their in-person POS system, wherever
30:50
and whatever you're selling, Shopify has
30:52
got you covered. Shopify
30:54
helps you turn browsers into buyers with
30:56
the internet's best converting checkout, up to
30:59
36% better compared to other leading commerce
31:01
platforms. And sell more with less effort
31:03
thanks to Shopify magic, your AI powered
31:05
all-star. You know, nothing gets me and
31:08
Jerod more excited than when our guests
31:10
get that coupon code in their email
31:12
and their shirt ships, or to everyone
31:14
out there who loves Changelog podcasts
31:16
and can go to merch.changelog.com and get
31:18
your favorite threads to support our podcast.
31:21
It is just the best thing ever.
31:24
From stickers to threads, all
31:26
that is at merch.changewell.com. And
31:29
did you know that Shopify powers 10% of
31:31
all e-commerce in the US and
31:34
Shopify is the global force
31:36
behind Allbirds, Rothy's, and Brooklinen, and
31:38
millions of other entrepreneurs of every
31:41
size across 175 countries. Plus,
31:45
Shopify's extensive help resources are there
31:47
to support you and your success
31:49
every step of the way. Those
31:52
businesses that grow, grow with Shopify. Sign
31:54
up for a $1 per
31:56
month trial period at shopify.com/practicalai,
32:00
all lowercase. Go to
32:03
shopify.com/practicalai
32:05
now to grow your business no
32:07
matter what stage you're in. Again,
32:10
shopify.com/practicalai.
32:30
Yeah Chris,
32:35
so coming from Arnold
32:37
Schwarzenegger and large action
32:39
heroes to large action
32:41
models, I was wondering if
32:44
this was a term that Rabbit came up
32:46
with. I think it has existed for some
32:48
amount of time. I at least saw it
32:50
at least as far back
32:52
as June of last year, 2023. I saw Silvio Savarese's article
33:00
on the Salesforce AI Research
33:03
blog about LAMs: from large
33:05
language models to large action
33:08
models. I think the
33:10
focus of that article was very much
33:12
on the sort of agentic stuff that
33:14
we talked about before in terms of
33:16
interacting with different systems but in a
33:18
very automated way. The
33:21
term large action model as far
33:23
as Rabbit refers to it, it's
33:25
this new architecture that they are saying
33:28
that they've come up with and I'm sure
33:30
they have, because it seems like the
33:32
device works. We don't know I think all of
33:35
the details about it at least I haven't
33:37
seen all of the
33:39
details or it is sort of not
33:42
transparent in the way that maybe
33:44
a model release would be on Hugging
33:46
Face with code associated with it in
33:48
a long research paper. Maybe
33:50
I'm missing that somewhere or listeners can
33:52
tell me if they found it but
33:54
I couldn't find that. They do have
33:56
a research page though which gives us
33:59
a few clues as
34:01
to what's going on and some
34:03
explanation in kind of general
34:06
terms. And what
34:08
they've described is that
34:10
their goal is to
34:13
observe human interactions with
34:15
a UI. And
34:17
there seems to be some sort
34:20
of multimodal model that is detecting
34:22
what things are where in the
34:24
UI. And they're
34:26
mapping that onto some
34:28
kind of flexible, symbolic,
34:32
synthesized representation of a
34:35
program. So the
34:37
user is doing this thing, right? So
34:40
I'm changing the payment on my Uber app.
34:43
And that's represented or synthesized
34:46
behind the scenes
34:49
in some sort of structured way and
34:52
kind of updated over time
34:54
as it sees demonstrations, human
34:56
demonstrations of this going on.
34:59
And so the words that they... I'll just
35:01
kind of read this so people, if
35:03
they're not looking at the article, they say
35:05
we designed the technical stack from the ground
35:07
up from the data collection
35:09
platform to the new network architecture.
35:13
And here's the sort of very dense
35:15
loaded wording that probably
35:17
has a lot packed into
35:19
it. They say that it utilizes
35:21
both transformer style attention and
35:24
graph-based message passing combined
35:27
with program synthesizers
35:31
that are demonstration and
35:33
example guided. So
35:36
that's a lot in that statement. And of course,
35:38
they've mentioned a few in more
35:40
description in other places. But
35:43
it seems like my sort
35:45
of interpretation of this is
35:48
that the requested
35:51
action comes in to
35:53
the system, to the network
35:56
architecture, right? And there's a
35:58
neural layer. So this is a neurosymbolic
36:00
model. So there's a neural
36:02
layer that somehow interprets that
36:05
user action into a
36:07
set of symbols or
36:09
representations that it's learned about the
36:11
UI, I mean
36:13
the Shopify UI or the Uber
36:16
UI or whatever. And
36:18
then they use some sort
36:20
of symbolic logic processing of
36:23
this sort of synthesized program to
36:26
actually execute a series of actions
36:28
within the app and
36:30
perform an action that it's learned
36:33
through demonstration. So this
36:35
is sort of what
36:37
they mean, I think, when they're
36:40
talking about neurosymbolic. So there's a
36:42
neural network portion of this, kind
36:45
of like when you put something
36:47
into ChatGPT or
36:49
a transformer-based large language model and
36:51
you get something out. In
36:54
the case of we were talking about getting JSON
36:57
structured out when we're interacting with
36:59
an external tool, but here it
37:01
seems like you're getting some
37:03
sort of thing out, whatever
37:06
that is, a set of symbols or
37:08
some sort of structured thing that's then
37:10
passed through symbolic
37:12
processing layers that
37:15
are essentially symbolic
37:17
and rule-based ways to
37:19
execute a learned program
37:22
over this application. And by program
37:24
here, I think they mean, they
37:26
reference a couple papers, and my
37:29
best interpretation is that they mean
37:31
not a computer program in the
37:34
sense of Python code, but
37:36
a logical program that
37:38
represents an action like here is
37:41
the logical program to update the
37:43
payment on the Uber app. You
37:46
go here and then you click this and
37:48
then you enter that and then you blah
37:50
blah blah, you do those things, right? Except
37:53
here, those programs, those synthesized programs
37:55
are learned by looking at the
37:57
data, looking
37:59
at human intentions
38:01
and what they do in
38:04
an application. And that's
38:06
how those programs are synthesized.
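To make that interpretation a bit more concrete, here is a purely speculative illustration (not Rabbit's published design) of what a synthesized "program" of symbolic UI actions might look like: a neural front end would map the spoken request and the current screen onto these symbols, and a rule-based executor would replay the learned steps.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class UIAction:
    verb: str          # e.g. "tap", "type", "scroll"
    target: str        # a learned symbol for a UI element, not pixel coordinates
    argument: str = ""

# Hypothetical learned program for "update the payment on my Uber ride".
update_uber_payment: List[UIAction] = [
    UIAction("tap", "payment_methods_button"),
    UIAction("tap", "add_card"),
    UIAction("type", "card_number_field", "<card number>"),
    UIAction("tap", "save_button"),
]

def execute(program: List[UIAction], ui) -> None:
    # The symbolic half: deterministic replay of steps learned from human
    # demonstrations, grounded in the concrete UI by the neural layer.
    for step in program:
        ui.perform(step.verb, step.target, step.argument)
```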
38:08
So that was a long one. I
38:10
don't know how well that held together, but that was my
38:13
best at this point without seeing anything
38:16
else from a single sort of
38:18
blog post. When you can keep me quiet for
38:20
a couple of minutes there, it means you're doing
38:22
a pretty good job. I
38:24
have a question I wanna throw out and I
38:27
don't know that you'd be able to answer it obviously, but
38:29
it just to speculate. While
38:31
we were talking about that and
38:33
thinking about multimodal, I'm
38:35
wondering the device itself comes
38:38
with many of the same sensors that
38:40
you're gonna find in
38:42
a cell phone these days, but
38:44
I'm wondering if that feeds in more than
38:47
just the speech and that
38:49
obviously has the camera on it. It
38:51
comes with a magnetometer, I can't
38:53
say the word, GPS, accelerometer, and gyroscope.
38:56
And obviously, so it's detecting motion,
38:59
it knows location, all the things. Has
39:01
the camera, has the mic. How
39:03
much of that do you think is
39:05
relevant to the LAMs, to
39:07
the large action model in terms of
39:10
inputs? Do you think that there is
39:12
potentially relevance in the non-speech and
39:15
non-camera concerns on it? Do you
39:17
think the way people move could
39:19
have some play in there? I
39:22
know we're being purely speculative. I'm just,
39:24
it caught my imagination. Yeah,
39:26
I'm not sure. I mean, it
39:28
could be that that's used in
39:31
ways similar to how those
39:33
sensors are used on smartphones
39:35
these days. Like if I'm
39:38
asking Rabbit to book
39:40
me an Uber to here or
39:42
something like that. Now, it
39:45
could infer the location maybe of where I am based
39:49
on where I'm wanting to go or ask me where I
39:51
am. But likely
39:53
the easiest thing would be to use
39:55
a GPS sensor, to know
39:58
my location and just, like, put that
40:00
as the pin in the Uber app and now it
40:02
knows. So I think there's
40:04
some level of interaction between these things.
40:06
I'm not sure how much, but it
40:08
seems like, at least in
40:10
terms of location, I could definitely see that
40:13
coming into play. I'm not sure on the
40:15
other ones. Well, it looks a lot like,
40:18
physically, it looks a lot like
40:20
a smartphone without the phone. Yeah, yeah,
40:22
a smartphone, different sort of aspect
40:25
ratio, but still kind of touchscreen.
40:27
I think you can still pull
40:29
up a keyboard and that sort
40:32
of thing. And you see
40:34
things when you prompt it. So
40:36
yeah, I imagine that that's
40:38
maybe an evolution of this over
40:41
time is sensory
40:43
input of various things.
40:45
Like I could imagine that being very
40:47
interesting in running or fitness
40:49
type of scenarios, right? If I've
40:51
got my rabbit with me and
40:54
I instruct rabbit to post
40:57
a celebratory social media post every
40:59
time I keep my mileage or
41:01
my time per mile at
41:07
a certain level or something, and it's using
41:09
some sort of sensors on
41:11
the device to do that. I think
41:14
there's probably ways that that will work
41:16
out. I'm not sure about now. It'll be
41:18
interesting that if this approach
41:21
sticks, and I might
41:23
make an analogy to things like the
41:26
Oura Ring for health, wearing that, and
41:28
then competitors started coming out, and then
41:31
Amazon has their own version of
41:33
a health ring that's coming out.
41:36
Along those lines, you have
41:38
all these incumbent players in the AI space that
41:40
are, for the most
41:42
part, very large, well-funded cloud
41:45
companies, and in at least
41:47
one case, a retail company blended
41:50
in there. If
41:52
this might be an alternative in
41:54
some ways to the smartphone being
41:56
the dominant device, and it has
41:59
all the best, the same capabilities plus
42:01
more, and they have the LAM
42:03
behind it to drive that functionality.
42:05
How long does it take for
42:08
an Amazon or a Google or
42:10
a Microsoft to come along after
42:12
this and start producing their own
42:14
variant because they already have the
42:16
infrastructure that they need to produce
42:18
the back end and they're going to
42:20
be able to produce... you know, Google and Amazon certainly
42:23
produce front-end stuff quite a lot as well. So
42:25
it'll be interesting to see if this is the
42:27
beginning of a new marketplace opening up
42:30
in the AI space as an
42:32
entrant. So there's already really
42:34
great hardware out there for
42:36
smartphones and I
42:38
wonder if something like this is kind
42:41
of a shock to the market but in some
42:46
ways you know
42:48
just as phones with
42:50
external key buttons sort of
42:52
morphed into smartphones
42:55
with touch screens,
42:57
I could see smartphones that
43:00
are primarily app-driven in
43:02
the way that we interact with them now being
43:05
pushed in a certain direction
43:07
because of these interfaces and
43:10
so smartphones won't look the
43:12
same in two years as they
43:14
do now and they won't follow that same
43:16
sort of app-driven trajectory like
43:18
they are now probably because of
43:20
things that are rethought and
43:23
it might not be that we all
43:25
have rabbits in our
43:27
pocket but maybe smartphones become
43:29
more like rabbits over time.
43:32
I'm not sure, but I think that's
43:34
very likely a thing that will happen.
43:36
It's also interesting to me it's a
43:38
little bit hard to parse out
43:41
for me what's happening, what's
43:43
the workload like between what's
43:46
happening on the device and
43:48
what's happening in the cloud and what
43:51
sort of connectivity is
43:53
actually needed for full functionality
43:55
with the device. Maybe that's something
43:57
if you want to share your own findings
44:00
on that in our Slack community
44:03
at changelog.com/community. We'd love to hear
44:05
about it. My understanding is there
44:07
is at least a good
44:09
portion of the LAM and the
44:12
LAM-powered routines that are operating
44:14
in a centralized sort of platform
44:17
and hardware. So there's not this
44:19
kind of huge large model running
44:22
on a very low power device
44:25
that might suck away all the, all
44:27
the energy. But I think that's also an
44:30
interesting direction is how far could
44:32
we get, especially with local models
44:34
getting so good recently
44:37
with fine-tuned, locally optimized,
44:40
quantized models, doing action
44:42
related things on edge
44:45
devices in our
44:47
pockets that aren't relying on
44:50
stable and high speed internet
44:53
connections, which also of course
44:55
helps with the privacy related
44:57
issues as well. I
44:59
agree. I, by the way, I'm going to
45:01
make a prediction. I'm predicting that a large
45:04
cloud computing service provider
45:06
will purchase Rabbit. All
45:09
right. You heard it here first. Uh, uh, I
45:12
don't know what sort of odds Chris is,
45:15
is giving, um, or I'm not going to
45:17
bet against him, that's for sure. But,
45:20
uh, yeah, I think, I think that's interesting. That
45:22
is definitely a lot of, I think there will
45:24
be a lot of action
45:26
models of some type, whether those
45:29
be tool using
45:31
LLMs or LAMs or
45:34
SLMs or whatever, whatever we've got
45:37
coming up. Um, and
45:39
they should have named it. They could have named
45:42
it a lamb instead of a rabbit, you
45:44
know, I just want to point out they're, they're
45:46
getting their animals mixed up, man. I actually, yeah,
45:48
that's a really good point. I don't know if
45:50
they came up with rabbit before lamb, but maybe
45:52
they just had the lack of the B there,
45:55
but I think they probably could have figured
45:57
out something. Yeah. And the only thing that could
45:59
beat a rabbit or a lamb is a raccoon, of
46:02
course, but that's beside the point. You have to
46:04
come around full circle there. Of
46:06
course, of course. We'll leave that
46:08
device up to you as well.
46:10
Yeah. All
46:13
right. Well, this has been fun, Chris. I
46:15
do recommend in terms of if people wanna
46:17
learn more, there's a really good research
46:20
page on rabbit.tech, rabbit.tech
46:22
slash research. And down at the
46:24
bottom of the page, there's
46:27
a list of references
46:29
that they share throughout that
46:32
people might find interesting as
46:35
they explore the technology. I
46:37
would also recommend that people look at
46:40
LangChain's documentation on tools.
46:43
And also maybe just check out a couple
46:45
of these tools. They're not that complicated. Like
46:47
I say, there's sort of, they
46:49
expect JSON input and then they run a
46:52
software function and do a thing. That's sort
46:54
of what's happening there.
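As a tiny example of that, a LangChain tool is essentially a named function plus an argument schema. This sketch assumes the tool decorator from langchain_core; import paths have moved between versions, so check the current docs.

```python
from langchain_core.tools import tool

@tool
def convert_amount(amount: float, rate: float) -> float:
    """Convert an amount of money using a fixed exchange rate."""
    return amount * rate

# An agent or model is steered to emit JSON-like arguments for the tool;
# invoking it just runs a normal Python function on them.
print(convert_amount.name)                                   # "convert_amount"
print(convert_amount.invoke({"amount": 10.0, "rate": 1.1}))  # 11.0
```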
46:56
So maybe check out some of those in the array of
46:58
tools that people have built for LangChain and
47:01
try using them. So yeah, this has
47:03
been fun, Chris. Thanks, it was great.
47:05
Thanks for bringing the rabbit to our
47:08
attention. Yeah, hopefully see you in person
47:10
soon. That's right. And
47:12
yeah, we'll include some links in our show
47:14
notes. Everyone take a look at them. Talk
47:16
to you soon, Chris. Have a good one.
47:27
Thank you for listening to practical AI.
47:30
Your next step is to subscribe now,
47:32
if you haven't already. And
47:34
if you're a longtime listener of the show, help
47:36
us reach more people by sharing practical AI with
47:39
your friends and colleagues. Thanks
47:41
once again to Fastly and Fly for partnering
47:43
with us to bring you all Changelog
47:45
podcasts. Check out what they're
47:47
up to at fastly.com and fly.io. And
47:50
to our beat-freaking residents, Breakmaster Cylinder, for
47:53
continuously cranking out the best beats in the
47:55
biz. That's all for now. We'll
47:57
talk to you again next time.