Declarative Machine Learning For High Performance Deep Learning Models With Predibase

Released Monday, 5th December 2022
Episode Transcript

Transcripts are displayed as originally observed. Some content, including advertisements may have changed.


0:13

Hello, and welcome to Podcast.__init__,

0:15

the podcast about Python and the people who make

0:17

it great. When

0:18

you're ready to launch your next app or want to try a

0:20

project you hear about on the show, you'll need somewhere to

0:22

deploy it. So check out our friends over at Linode.

0:24

With their managed Kubernetes platform,

0:27

it's easy to get started with the next generation

0:29

of deployment and scaling powered by the battle

0:31

tested Linode platform, including simple

0:33

pricing, node balancers, forty gigabit

0:36

networking, and dedicated CPU and

0:38

GPU instances. And now

0:40

you can launch a managed MySQL, Postgres,

0:42

or MongoDB cluster in minutes to keep

0:44

your critical data safe with automated backups

0:46

and failover. Go to python podcast

0:48

dot com slash linode today to get a

0:50

one hundred dollar credit to try out their new database

0:53

service, and don't forget to thank them for the continued

0:55

support of this show. Your

0:56

host, as usual, is Tobias Macey, and

0:59

this month, I'm running a series about python's use

1:01

in machine learning. If you enjoy this

1:03

episode, you can explore further on my new show,

1:05

the Machine Learning Podcast, which helps you go

1:07

from idea to production with machine learning.

1:10

To find out more, you can go to the machine

1:12

learning podcast dot com. Your

1:13

host is Tobias Macey, and today I'm

1:16

interviewing Travis Addair about Predibase,

1:18

a low code platform for building ML models

1:20

in a declarative format. Travis, can you

1:22

start by introducing yourself? Thanks

1:24

for having me on today, Tobias. So

1:26

I'm Travis. I'm the CTO of Predibase.

1:29

Predibase is a low code

1:31

platform designed to make machine

1:33

learning more accessible and

1:35

more useful to the enterprise. Before

1:39

that, I was a tech lead manager

1:41

for Uber's machine learning platform, leading

1:43

a team focused on deep learning training, one

1:46

of the lead maintainers on the Horovod

1:48

open source project

1:50

and also a core contributor to

1:52

the Ludwig project as well, which is one of

1:54

the foundational technologies for

1:56

Predibase. And

1:58

do you remember how you first got started working

2:00

in the area of machine learning? Absolutely.

2:01

Yeah. So it goes

2:03

back a bit to twenty eleven

2:06

or so. I was working for Lawrence

2:08

Livermore National Lab

2:10

on processing about ten terabytes

2:12

of seismic data, and our goal

2:15

was to try to do some

2:17

analysis of it, to detect weapons

2:20

interestingly enough. But

2:21

what we found was that

2:23

a lot of it, over fifty

2:25

percent of the data or so, was noise, and

2:27

we had no good way to detect it. So

2:29

started pulling out some of my undergrad AI

2:32

textbooks, started implementing some

2:34

support vector machine and ran it on

2:37

top of the tube.

2:38

and got just really excited about the whole

2:40

thing, published an article in Computers

2:42

and Geosciences journal

2:44

and decided to go to grad

2:46

school and get my master's in machine

2:48

learning and

2:49

became more active in the industry.

2:51

And, yeah, that's all I'll say.

2:53

And now that has

2:55

brought you where you are today with the

2:58

business and the project. And I'm wondering

3:00

if you can describe a bit more about what it is

3:02

that you're building there and some of the story behind

3:05

how you decided that this was a problem

3:07

space that you wanted to spend your time and focus

3:09

on. When

3:10

I was at Uber, I actually started off

3:12

as a machine learning engineer, so

3:14

working on kind of the vertical problems

3:16

of ML.

3:17

And what I found was that

3:20

there were a lot of things I wanted to do. Like, I

3:22

wanted to, you know, try deep learning

3:24

and try training on large data sets

3:26

and multi GPU and all these sorts of things.

3:29

but

3:29

there just wasn't a lot of good tooling available

3:31

at the time to do that. There was TensorFlow

3:34

and if we

3:35

wanted to run TensorFlow scalably, there

3:37

was, like, a whole lot of hoops you had to

3:39

jump through. And

3:40

then integrating that with something like Spark

3:42

that we were using for data processing was

3:44

just, like, forget

3:45

about it.

3:46

So what I realized was that if I wanted

3:48

to solve ML problems, I really needed

3:50

to start with the ML tooling and

3:52

infrastructure. And so I joined the

3:55

Michelangelo deep learning team.

3:57

And while I was there working

3:59

on this horizontal

3:59

problem, you know,

4:01

we worked with a lot of customers that

4:03

had very similar patterns that emerged

4:06

to mine where

4:08

we

4:08

often found that there was this struggle

4:10

to get something that was productionizable,

4:13

like, at scale. Right? And there is

4:15

also a repeating pattern

4:17

of there

4:18

just being, frankly, more ML problems

4:21

than there was ML expertise in the company.

4:23

And so from a kind of horizontal platform

4:26

standpoint of Uber AI,

4:28

tasked with figuring out, you know, how can

4:30

we help get models to production

4:32

faster we kind of

4:33

realized that there was a need for better

4:36

abstractions, better infrastructure

4:38

more generally. And

4:39

so the tool Ludwig

4:42

that my co-founder, Piero, put together,

4:44

ended

4:44

up being a perfect encapsulation of

4:46

that vision of being able to say,

4:48

you know, let's let researchers build

4:51

state-of-the-art models and put them

4:53

as components into this framework. And

4:56

then that gives the vertical teams in the

4:58

company this very easy declarative

5:00

interface to just kind of swap

5:02

in and out different components for their

5:04

data without having to rewrite,

5:06

you know, huge swaths of python

5:08

code every time. And we

5:10

realized that this was like a very successful pattern

5:12

for Uber. And at the same time, you know,

5:15

Ludwig became open source, saw

5:17

that it resonated very strongly with the community

5:19

and realized that there was like a very real

5:21

need for this kind of better

5:23

abstraction layer, if you will, in

5:25

the industry,

5:26

and decided to form Predibase with the

5:28

intent of pushing the

5:30

state of the art forward in terms of

5:33

what kinds of tools data science

5:35

and machine learning teams have available

5:37

to them, to make them more productive and

5:39

kind of decrease the time to value of machine

5:41

learning in the enterprise. In

5:43

terms of the audience that you're

5:45

focused on, particularly given

5:47

that you're very early in your journey

5:49

of being a new company and starting

5:51

to work with some of the first

5:53

sets of customers I'm curious

5:55

how you think about the

5:57

priorities that you're trying to support

5:59

and how that

5:59

influences the areas of focus

6:02

that you're investing in and the user

6:04

experience and feature development that you're prioritizing?

6:07

Yeah. So we like to say that we

6:10

don't expect at Predibase that we're gonna be

6:12

your first machine learning model that you've

6:14

ever trained. Right? So oftentimes, we're

6:16

coming into organizations that

6:18

have a lot of machine learning problems that they

6:20

wanna solve. maybe they have some

6:22

kind of horizontal team that's focused

6:24

on trying to build out a platform for

6:27

doing machine learning. And maybe

6:29

they've tried using some AutoML tools

6:31

in past and have struggled with getting

6:33

them into production. And so what

6:35

we identify is, you know, we

6:37

see these teams that have struggled in this way

6:39

And the value proposition that we

6:41

wanna bring to them is to say, you know, if

6:43

you've used, like, you know,

6:45

some traditional ML systems in the past,

6:47

like Spark and MLlib or what have you, and

6:49

you're struggling to kind of up

6:51

level to deep learning and more state

6:53

of the art techniques in the industry. we

6:56

provide, like, a platform that gives you those

6:58

capabilities and a form factor that's, like,

7:00

much more familiar to you. And if you're

7:02

struggling to kind of keep up with the amount

7:04

of problems that the organization has, like,

7:06

maybe you have teams of engineers

7:08

that have ML problems that maybe are

7:10

not the top priority for the whole company,

7:13

but a very important priority for that

7:15

team. Predibase provides a

7:17

platform that allows

7:19

those teams to be unblocked and allows

7:21

everyone in the organization to collaborate

7:23

together

7:24

towards building these solutions. And

7:26

so this focus on you

7:29

know, collaboration and kind of

7:31

mixed modality, like, you know, very

7:33

broad set of tasks that people might wanna solve.

7:35

those are very core focuses for

7:37

us when we look at companies that we

7:39

wanna partner with at this stage.

7:41

One

7:41

of the interesting things that you mentioned is

7:44

that you're working with companies who have

7:46

a lot of machine learning problems

7:48

to solve. And I'm wondering if you can

7:50

talk to what that really

7:52

means? Like, how you can identify that a

7:54

problem that you have is a machine learning

7:56

problem or whether machine learning is the

7:58

right approach to being

7:59

able to provide value in

8:02

utility for a given

8:04

objective that you're trying to achieve?

8:05

I

8:06

would say that it comes in a few

8:08

different flavors. Like on the one hand, you

8:10

have kind of traditional

8:13

data warehouse type systems

8:15

that have tables that have historical

8:17

data or transactional data And

8:19

so very often, the story there is

8:21

people wanna do some kind of predictive analytics.

8:24

Right? So we know who

8:25

churned last month, you know, we wanna

8:27

predict who's gonna churn next month. So

8:29

that kind of forward looking predictive

8:32

capability is, like, one,

8:34

like, type

8:34

of problem that we see a lot with companies.

8:37

that

8:37

fits very nicely into machine learning.

8:39

So you have a lot of data in your

8:41

database. You wanna be able to, you

8:43

know, predict or, like, make forward

8:45

looking statements about that data.

8:47

That's one area where we saw it.

8:49

But beyond that kind of structured

8:51

problem, there's also this question of

8:53

unstructured data as well. Right? And so you have a

8:55

lot of companies that have text

8:57

data or image data or audio data

9:00

sitting around. that they've collected, and they

9:02

don't really know what to do with it. And it's

9:04

maybe not so much a question of, you know,

9:06

I have data that says, what customers

9:08

submitted in support tickets in the past, I

9:10

want to predict what support tickets are going to say in the

9:12

future. It's not anything like that, but it's more

9:14

about just understanding semantically

9:16

what's going on in the data and then how

9:18

that can be used to better

9:20

inform the predictive, forward-looking models that

9:22

we want to build on

9:23

that more transactional data.

9:25

So this idea of unlocking the power

9:27

of unstructured data is another really core one. And

9:29

so one of the things I think is

9:31

very unique about Predibase and Ludwig is

9:33

the ability that we can kind of bridge

9:35

this gap between structured and unstructured

9:37

data. So the platform is very flexible.

9:40

It's data oriented in such a way that

9:42

if you have some transactional

9:44

tabular data and you also have some

9:46

unstructured image or audio or

9:48

text data, those can be combined

9:50

together into a unified machine

9:52

learning model in a very simple and straightforward

9:55

way. And so we can unblock

9:57

organizations that have all this disparate

9:59

data and they

9:59

want to drive value from it, but they haven't

10:02

figured out how in an effective way.

10:04

That's

10:04

where we think there's a lot of power for machine

10:06

learning, particularly in Predibase to kind of

10:08

slot in. Yeah. One

10:09

of the ways that I've seen those

10:11

different applications categorized is

10:13

the difference between predictive

10:15

analytics, which is the first category

10:17

that you mentioned versus prescriptive

10:19

analytics of this is what you should

10:21

do and then descriptive analytics

10:23

to say, just wanna understand what

10:25

this is trying to tell me.

10:27

Right. Right. Absolutely. I think that

10:29

descriptive component is

10:30

one that not a lot of people have tapped

10:33

into. In

10:33

terms of the way

10:36

that you have formulated the product

10:38

that you're building at Predibase, you're positioning

10:40

it as declarative ML And

10:43

you've mentioned earlier that organizations may

10:45

have had experience trying to use

10:47

the category of tools

10:49

called AutoML And I'm wondering if you can

10:51

just talk to the differences in

10:54

the nomenclature as far

10:56

as what that really means and

10:58

how the sort of expectations are

11:01

different between an AutoML

11:03

category of tool and a declarative

11:05

ML category of tool. Absolutely. Yeah.

11:07

I think this is like a very

11:09

key differentiator between how

11:11

we're thinking about the problem and how a lot of

11:13

other companies out there are. So

11:15

the way that we think about it is that at a

11:17

high level, there are very similar capabilities

11:19

in terms of being able to

11:21

put ML in the hands of non experts at

11:23

kind of the starting point. But we believe that

11:26

declarative ML provides a

11:28

more principled and flexible

11:30

path forward. So whereas a lot

11:32

of AutoML solutions, I think, in

11:34

its current state, AutoML often

11:36

becomes this kind of kitchen

11:38

sink style approach where the

11:40

system will throw everything it can think of at the

11:42

problem. And

11:44

if something works, then great. You

11:46

know, you kinda take that baton and run

11:48

with it. If something doesn't work,

11:50

then there's not really a lot of options

11:52

that you have in terms of what's going to happen

11:55

next, like how someone who is maybe

11:57

a domain expert or a data expert

11:59

can come in

11:59

and kind of help out with, you know,

12:02

unblocking things

12:03

where we think declarative ML provides

12:05

a difference here is

12:07

that because it gives you this

12:09

very complete specification,

12:12

you can start at something very high level and say,

12:14

you know, I just want to predict this target

12:16

given these input variables. And

12:18

you can get a baseline from there.

12:21

but

12:21

it's not the end of the story. And so because it's

12:23

very explicit about saying, here's everything that

12:25

the system did, and you can

12:27

modify and customize any

12:29

aspect of this down to individual,

12:32

you know, layers of a neural network.

12:34

Right? It allows people to

12:36

then iterate on these systems, on

12:38

these implementations over time.

12:40

and kind of build towards

12:43

a working solution in a more

12:45

principled way. So for example,

12:47

they can say, oh, well, I initially

12:49

tried building this model with this set of

12:51

parameters. And then for V2, I

12:53

swapped out this model architecture and

12:55

I changed the learning rate from this

12:57

to this. and it gives you that audit

12:59

trail of being able to say, here are all the things I

13:01

tried, here's what changed, and

13:03

here's the effect that that had on

13:05

model performance. And so we

13:07

think that this is also very

13:09

powerful for enabling collaboration.
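What follows is a minimal, hypothetical sketch of the iteration and audit-trail idea described above, written against the open-source Ludwig config format that Predibase builds on; the dataset, column names, and the specific v2 tweaks are illustrative assumptions rather than details from the conversation.

```python
# Hypothetical sketch: declarative configs make each iteration an explicit,
# diffable artifact. Dataset path, feature names, and the v2 changes are
# made up for illustration; the hosted Predibase config may differ from
# open-source Ludwig.
from ludwig.api import LudwigModel

config_v1 = {
    "input_features": [
        {"name": "plan_type", "type": "category"},
        {"name": "monthly_usage", "type": "number"},
    ],
    "output_features": [{"name": "churned", "type": "binary"}],
}

# v2: same specification, but swap the combiner and lower the learning rate.
# The diff between the two dicts is the "audit trail" of what changed.
config_v2 = {
    **config_v1,
    "combiner": {"type": "tabnet"},
    "trainer": {"learning_rate": 0.0005},
}

for name, config in [("v1", config_v1), ("v2", config_v2)]:
    model = LudwigModel(config)
    model.train(dataset="churn.csv")  # hypothetical dataset
    print(f"{name} trained")
```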

13:12

So if, you know, someone who is an

13:14

engineer maybe wants to train

13:15

their first model in

13:16

Predibase can do that without having to know a

13:19

lot of details about the

13:21

how of what's going on under the hood.

13:23

But if they get a solution and then they want

13:25

to maybe have a more expert data

13:27

scientist, take a look and say, you know, what do

13:29

you think we should try in order

13:31

to get better performance? they can take a

13:33

look at the config and say, oh, well, you

13:35

know, I see that you're using this parameter here.

13:37

Let's maybe try swapping that out,

13:39

see what happens. And so

13:42

it gives you that ability to make

13:44

these incremental changes in a very simple

13:46

way that if you were kind of

13:48

going down to just pure low level

13:50

tools like PyTorch or TensorFlow, it'd

13:52

be much more difficult to do that because

13:54

you'd be having to ship over entire

13:57

Jupyter notebooks and Python

13:59

libraries and, you know, what's the

14:01

execution environment for all this?

14:03

It's much more difficult for someone to just

14:05

quickly take a look and kinda provide

14:07

feedback and kind of next steps.

14:09

And we also believe that this

14:11

ties in very nicely to our

14:13

version of AutoML, which we

14:15

would call iterative ML,

14:17

where we see it being

14:19

much more of a conversation that you're having

14:21

with the system where you try

14:23

something out the system can

14:25

propose some new things to

14:27

modify in the specification for

14:29

the model. You can choose to either accept

14:31

or reject any of those things, train for

14:33

a little bit more and then use the results of

14:36

that previous run to

14:38

then inform what you're gonna try next. So it

14:40

becomes a very human-in-the-loop back

14:42

and

14:42

forth process

14:43

that progresses

14:45

a way that we think is much more like how

14:47

traditional software development is done, right, where you

14:49

have a Git repo, make some

14:51

code commits. And over time, you

14:53

can see the code evolve and change to

14:55

kind of better conform to the end

14:57

state. as opposed to just trying

14:59

to get all out there at once and, you

15:01

know, not have any way of knowing, you

15:04

know, what was the history? What was the

15:06

effect of every change that we did?

15:08

In this

15:08

category of declarative ML,

15:11

another company that I've seen using

15:13

that terminology is Continual,

15:15

which is based on being able

15:17

to build machine

15:19

learning pipelines on top of your data

15:21

warehouse so that you can just treat

15:23

your machine learning workflow as SQL,

15:25

effectively. I'm wondering if you

15:27

can just characterize the

15:29

relative strengths and use cases of what you're

15:31

building at Predibase versus what they're

15:33

building at Continual? Yeah.

15:34

Absolutely. So I guess the first thing I'd say

15:37

is that, you know, in general, we're very glad

15:39

to see other companies kind

15:41

of validate the idea behind

15:43

declarative ML. You know, from following

15:45

the work that the folks at

15:47

Continual have been doing, it's always

15:49

been very nice to see that they've referenced our

15:51

work on Ludwig and Overton. So

15:53

one of our co-founders, Chris Ré,

15:55

had a company called Lattice that

15:57

had a product called Overton that was acquired

15:59

by Apple, which was another early

16:01

declarative ML system. And so I think in

16:03

general, there's, like, a really good shared

16:05

vision of kind of moving the conversation

16:07

forward about better abstractions in ML.

16:09

So I think there's definitely an element

16:11

of, you

16:11

know, rising tides, you know, raising all

16:14

ships to it. where

16:15

I think their differences would be

16:17

they definitely are very linked into the

16:19

kind of modern data stack operation

16:21

side of things. I think

16:23

that's their value proposition resonates very

16:25

nicely with people who are

16:27

active dbt users, for example. That's

16:29

a big part of kind of how they're approaching

16:31

the problem, which I think is a totally valid

16:33

way to think about it. For us,

16:35

we definitely think that we can do

16:38

a lot not just on the operation

16:40

side, but on the model development

16:42

side as well. So with

16:44

Ludwig, we provide a framework

16:46

that is also pushing forward the

16:48

state of the art of what ML

16:50

models can do. Right? And so that's

16:52

a big part of the story for

16:54

us is trying to figure out how do

16:56

we help users get good

16:58

models in the first place and do it in a

17:00

way that is very low barriers

17:02

to entry, but very high performance

17:04

and high ceiling. Right? We also believe

17:06

the operations component is a big part of

17:08

that, but it's not the only part. I think there's still

17:10

a lot of work that needs to be done on just

17:13

getting to a good model that you wanna put in

17:15

production. And so

17:16

that's where I think Ludwig

17:18

is a bit different from what some of the other tools

17:20

out there provide, and that's

17:22

we're

17:22

also tackling that aspect of the problem as

17:24

well. And as far

17:25

as the implementation and

17:28

architecture of what you've built

17:30

at Predibase, can you talk to the

17:32

overall system design and the ways

17:34

that you've thought about the

17:36

architecture of how to

17:38

approach this problem of making

17:40

declarative ML accessible

17:42

and easy to operate so

17:44

that teams who don't necessarily

17:46

want to invest in building

17:48

the entire ML ops stack

17:50

can be able to pick it up and run with

17:52

it and start to gain value very

17:54

quickly. Predibase

17:54

is built as a multi

17:57

cloud native platform. We're built

17:59

on top of Kubernetes.

18:01

And so, you know, we have deployments

18:03

that run on AWS, Azure, GCP,

18:05

and also on some on premise

18:07

Kubernetes clusters. So we believe that that's

18:09

like a very core part to make

18:11

it flexible so that wherever your data

18:13

happens to live, you know, we can

18:16

push the compute down to be as close

18:18

to that data as possible to

18:20

minimize the latency and minimize

18:22

the egress costs and all of those things.

18:24

So that's

18:24

a very core part of how it's architected.

18:27

We

18:27

also have a separation between

18:29

the control plane and data plane of

18:32

the system. Our data plane

18:34

is built on top of the Ludwig

18:36

and Ray

18:36

open source work that we've done.

18:39

So we use Ray for doing the

18:40

distributed aspect of, you know,

18:43

scaling to large datasets and

18:45

parallelizing the work. We

18:46

use Horovod for doing distributed data

18:49

parallel training. And then

18:50

we also have a serving layer

18:52

as well that's built on top of that.
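The open-source pieces just mentioned can be sketched roughly as follows; this assumes Ludwig's documented Ray backend (which uses Horovod under the hood for data-parallel training), and the exact keys, dataset path, and cluster setup are illustrative and may differ by version.

```python
# Rough sketch, not Predibase's internal code: Ludwig's Ray backend
# distributes preprocessing and training across a Ray cluster.
import ray
from ludwig.api import LudwigModel

ray.init(address="auto")  # connect to an existing Ray cluster (assumed to be running)

config = {
    "input_features": [{"name": "review_text", "type": "text"}],
    "output_features": [{"name": "rating", "type": "category"}],
    "backend": {"type": "ray"},  # run preprocessing + training on Ray workers
}

model = LudwigModel(config)
model.train(dataset="s3://bucket/reviews.parquet")  # hypothetical dataset location
```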

18:54

And then

18:55

we also have a separate control

18:57

plane that provides a serverless

18:59

abstraction layer on top of this data

19:01

plane so that from the user

19:04

perspective, they don't need to be as concerned

19:06

about provisioning Ray

19:08

clusters that run and, like, how to

19:10

right size it for the workload and

19:12

whether I wanna use this GPU

19:14

or that GPU. So that's a

19:16

big aspect of what we provide on

19:18

the infrastructure side is this kind

19:20

of intelligent provisioning life

19:22

cycle management of the compute

19:24

resources and making sure that these

19:26

long running training and prediction

19:28

workloads can be processed end to

19:30

end in an efficient way. And then,

19:32

of course, there's a whole another serving stack

19:34

as well that we're building out that's built

19:36

on top of NVIDIA

19:37

Triton, and we'll hopefully have a lot

19:39

to say in terms of our work

19:41

there, kind of some blog posts coming out in the

19:44

future. But that's

19:45

something that we're also looking to push into the open

19:47

source to some extent as well as some

19:49

of the

19:49

serving capabilities for Ludwig

19:52

that we're bringing to

19:53

the enterprise.

19:54

as you started to go down the

19:57

path of starting to build out this

19:59

platform and explore the

20:01

capabilities that you wanted to offer, I'm

20:03

wondering how the initial design

20:05

and ideas and vision around where

20:07

you wanted to end up have

20:10

shifted and evolved and some of

20:12

the directions that you have

20:14

moved in order to be able to

20:16

accommodate some of the early feedback that you've gotten as

20:18

you work with design partners and

20:20

to some of the overall evolution of

20:22

the platform as you started to dig

20:24

deeper into this space? That's

20:26

a great

20:26

question. So definitely, we had a certain

20:28

set of assumptions. coming in about

20:30

what the market was looking for and kind of

20:33

what level the user wanted to think about the

20:35

problem. Right? Whether this was a

20:37

production problem first and foremost for

20:39

them. a

20:39

research problem somewhere in

20:42

between. Right? And so

20:43

we definitely had a very early focus

20:45

on thinking a lot about the analyst

20:48

use case. and

20:48

how, you know, there are people who have data

20:50

but don't

20:51

have a background in ML who want to

20:53

be, you know, up leveled with that

20:56

capability. And so we thought a lot about, you

20:58

know, making it kind of operations

21:00

and production oriented to begin with. What

21:02

we found in kind of the early working with early

21:04

customers is that there's still a lot

21:06

of interest in, you know, the model development

21:09

aspect. And, you know, you're never going to at

21:11

least, you know, with how

21:12

AI and machine learning works today,

21:14

get that perfect model every

21:16

time, like, without any kind of

21:18

manual intervention or kind of domain expertise.

21:21

And

21:21

so we definitely, from a very early

21:23

on working with customers realize that

21:25

having a teaching

21:26

element of the platform was very

21:29

important as well explaining,

21:31

you know, here

21:31

are the options you have available to you

21:34

in this declarative specification. Here's how

21:36

you

21:36

should think about using one option

21:38

versus the other and what the appropriate ranges

21:41

are. how you should go about doing model development in

21:43

terms of starting with, you know, a really

21:45

complex

21:45

model or baseline,

21:48

understanding

21:48

how the model

21:50

performed kind of in a post hoc way and

21:52

saying, you

21:52

know, these were the features that

21:55

contributed the most to the model's

21:57

predictions, or these were fields that

21:58

maybe, you know, were

21:59

imbalanced in some way, and so there are

22:02

other corrections we need to make. So

22:04

definitely

22:04

that aspect of iteration

22:07

and instilling kind of machine learning

22:09

best practices is something

22:11

that has been a

22:13

learned experience so, you know, one of the

22:15

most recent additions we made to the

22:17

platform just before coming

22:18

out of stealth was investing really

22:20

heavily in a Python SDK

22:24

similar to what the Ludwig Python SDK

22:26

does, but provides it with some more

22:28

enterprise features that

22:30

really make it well integrated into

22:32

a data science stack where you're able

22:35

to experiment

22:36

with data, experiment with models,

22:38

iterate, as opposed to just going straight

22:40

toward the production model

22:42

from day one. Right? So that was definitely

22:44

something we learned earlier on

22:46

in the process of working with customers.
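The Predibase SDK itself is not shown in the conversation; as a stand-in, the sketch below uses the open-source Ludwig Python API to illustrate the experiment-evaluate-promote loop being described. Paths, metric names, and the promotion threshold are hypothetical.

```python
# Illustrative only: train a candidate model, evaluate it on a held-out set,
# and promote it only if it beats the current production metric.
from ludwig.api import LudwigModel

config = {
    "input_features": [{"name": "text", "type": "text"}],
    "output_features": [{"name": "label", "type": "category"}],
}

candidate = LudwigModel(config)
candidate.train(dataset="train.csv")  # hypothetical training data

eval_stats, _, _ = candidate.evaluate(dataset="holdout_test.csv")
candidate_acc = eval_stats["label"]["accuracy"]

production_acc = 0.82  # metric of the currently deployed model (hypothetical)
if candidate_acc > production_acc:
    candidate.save("models/candidate_v2")  # promote: persist and hand off to deployment
```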

22:48

And it's

22:48

also interesting to think about which

22:52

areas of the overall machine

22:54

learning problem and life

22:56

cycle you're looking to be

22:58

able to facilitate because

23:00

there are, you know, boundless

23:03

capabilities where, you know, there's the

23:05

experiment tracking and model

23:07

tracking. There's the model

23:09

monitoring to be able to understand, you know,

23:11

what the concept drift is happening once it's

23:13

in production. You know, there's the

23:15

kind of pipelineing So

23:17

there are definitely a number of different

23:20

areas that you could try to focus on, a number of

23:22

different directions you could try to push

23:24

into. I'm curious given the fact that you were

23:26

starting with Ludwig as

23:28

the kind of core building

23:30

block, how that helped to

23:32

shape your overall consideration

23:34

of the appropriate scope for what you're

23:36

trying to build and how you were thinking about

23:39

what are the maybe boundaries

23:41

and interfaces that you

23:43

want to incorporate to be able

23:45

to let it fit

23:47

into the overall workflow

23:49

and life cycle of machine learning

23:51

in an organization, while being able

23:54

to be very opinionated and

23:56

drive the conversation around the

23:58

areas that you wanted to own. It's

24:00

a really good question, I think, because this,

24:02

I think,

24:02

is at the core of the problem of

24:05

startups in the space trying to define

24:07

their category. Right? because

24:09

I think when you look at the space in

24:11

general, what you see is that there are

24:13

a lot of really good tools that are

24:15

what you might call point solutions that

24:17

are solving one aspect of the problem,

24:19

whether it's

24:20

explainability, model training,

24:22

model serving, what have you. Right?

24:24

But,

24:24

really, what organizations need is

24:26

they do need something that is end to

24:29

end. Right? That is actually a platform

24:31

that is fundamentally delivering

24:33

business value, not just, you know,

24:35

model training or something like that. So the

24:37

way that we think about it broadly

24:40

is that we want

24:42

to be able to provide a

24:44

story that goes from data to

24:46

deployment for the users. So

24:48

connecting data from your

24:50

data warehouse or database,

24:51

providing best in class model

24:54

training at scale with the

24:56

serverless infrastructure and

24:57

then providing a really clean and simple

25:00

path to deployment that can be

25:02

either a REST API for

25:04

low latency real time prediction, or

25:07

the PQL SQL-like language

25:09

that provides batch prediction capabilities

25:11

to the user. And starting

25:13

with that core vision of, like, this

25:15

is the journey for the user. There are, as you

25:18

said, a lot of other aspects to it as

25:20

well, like, you know, model explainability, data

25:23

preparation, and data quality and data

25:25

versioning, model monitoring, model

25:27

drift detection. And the way we're thinking

25:29

about these things today is pretty

25:31

similar to how we think about them actually on the

25:33

open source side with Ludwig, where we

25:35

want to try to be as

25:37

integrated with the community as possible.

25:39

Right? So in Ludwig, we have integrations

25:41

with Comet ML, Weights and

25:43

Biases, whylogs, ML

25:46

flow, and more that we're

25:47

working on. And

25:48

so these tools, you know, provide,

25:51

like, different capabilities of different

25:53

parts of the process. Right? Like, experiment

25:55

tracking or

25:56

model monitoring.

25:58

But the way we wanna think about it

25:59

is if you already have a

26:02

tool that you like for these

26:04

problems. Like, we don't want to have to say Predibase

26:06

is a rip

26:07

and replace solution for you. Right?

26:09

We wanna be well integrated. So

26:11

if you

26:11

wanna use weights and biases or

26:14

Comet, you know, you

26:14

just give us an API key and we'll log

26:16

things there, and then have a nice way to

26:18

link back and forth between the two. or

26:20

if you're using whylogs slash Why

26:23

Labs for doing model

26:25

monitoring, you know, we're thinking

26:26

about ways that we can integrate there

26:28

to do automated model retraining based

26:31

on triggers that come

26:33

from

26:33

WhyLabs. So that's the way we're thinking

26:35

about the problem today is let's integrate as

26:37

much as possible in the parts of the platform

26:39

where we don't feel we're providing

26:41

strong differentiated value or that we

26:43

could provide, like, a best in class

26:45

value proposition on.

26:46

while still telling the user, like,

26:49

hey. If you are starting from scratch,

26:51

right, and you don't have an ML

26:53

platform today, Predibase isn't a point

26:55

solution. It's something that we'll give you end to

26:57

end from the

26:58

data to a deployed model that can

27:00

start delivering value. And then you can layer on

27:02

more

27:02

tooling on top of that, you know, as you

27:05

see fit. And

27:06

so in terms of that workflow,

27:08

in the case where somebody is green fielding.

27:10

They say, I want to adopt ML. This

27:12

is my first foray into that. I'm going

27:14

to use Predibase to be able to experiment with

27:17

how can I take this data that I have and turn it into

27:19

something useful that I can do with it. Just wondering

27:21

if you can just talk through that kind of

27:23

end to end workflow of starting with

27:25

the data and ending with I have

27:27

a model running in production and I'm doing something

27:29

with it. In the tool,

27:31

there are different ways that users can do it.

27:33

We do have a web UI that people

27:35

can use to do all

27:37

the actions. Everything that you

27:38

can do in the platform can also be done

27:40

through our SQL-like language, PQL,

27:43

as well as the Python SDK.

27:45

So we

27:45

have many different views depending on the

27:48

persona that's using the platform that do the

27:50

same thing. But

27:50

regardless of which entry point you choose

27:53

to use, the steps are largely the

27:55

same as

27:55

you first start with the data.

27:58

So if you have your data in

27:59

Snowflake or s three or big query

28:02

or whatever, you

28:02

just give us some credentials, point

28:05

us to what table or what bucket

28:07

you're interested in working with,

28:08

and then we can start with any data

28:11

that is structured in some kind of

28:13

table like form. Right?

28:15

So that can be an actual database

28:17

table. That

28:17

can also be a Parquet

28:20

file, CSV, anything like that. And then,

28:22

you know, maybe a question that comes after

28:24

that is, what if I wanna use unstructured

28:26

data like images or audio?

28:29

So

28:29

the way we think about that today is that

28:31

give us the URLs to

28:33

those images or those audio files as

28:35

columns in your tabular data, and

28:37

then we can pick those up, join all that

28:39

together into, like, a single flat

28:42

tabular view for training. Right? Once you've

28:44

pointed us to the data that you wanna

28:46

work with, we automatically do

28:48

all the metadata extraction and

28:50

schema extraction from the data for you.
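The "URLs as columns" idea above can be illustrated with a small, hypothetical Ludwig-style config, where the image column simply holds file paths or URLs alongside ordinary tabular columns; column names and the CSV layout are assumptions for the example.

```python
# Hedged illustration: mixed-modality training from a single flat table.
# The image column contains paths/URLs; the config declares its type.
from ludwig.api import LudwigModel

config = {
    "input_features": [
        {"name": "product_image", "type": "image"},   # column of image paths/URLs
        {"name": "description", "type": "text"},
        {"name": "price", "type": "number"},
    ],
    "output_features": [{"name": "defective", "type": "binary"}],
}

model = LudwigModel(config)
model.train(dataset="products.csv")  # flat table whose image column points to files
```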

28:52

So we know, you know, what

28:53

data types the data is and

28:56

in that sort of thing. And then you can

28:57

start creating models. So you, you know,

28:59

go into the model builder UI or

29:01

use the SDK to build a model in

29:03

a way that's very similar to how you do it in Ludwig. And

29:06

all you need to specify

29:06

to get started is just the target or

29:09

targets since we support multi

29:11

task learning. that you

29:13

wanna predict.

29:13

From there, you can customize any aspects

29:16

of the training through either

29:17

layering on full

29:19

kind of like hyper parameter optimization.

29:22

AutoML-suggested, like,

29:24

configurations for everything, or start

29:26

with

29:26

just a very simple baseline. Either

29:29

way, you can go, like, any level of

29:31

the extremes. Right? Or any level of customization

29:34

in between, and

29:34

then start training a model. Once you

29:37

start training a model, it will be sent to one

29:39

of the what we call engines. So that's one

29:41

of our serverless clusters that

29:43

does the computation for you. That

29:45

lives, you know, wherever your data happens to

29:47

live and the same kind of

29:49

region. Right? Model will get trained.

29:51

And from there, you can start using

29:53

PQL or the Python SDK to

29:55

validate it. We also provide a full set of visualizations for

29:57

the user to explore in terms

29:59

of,

29:59

you know,

30:00

understanding the explainability of, like, feature

30:03

importance. We also have

30:05

calibration plots, all sorts of

30:06

other things like confusion matrices,

30:09

etcetera, that you can dig into. And

30:11

From there, you can either iterate on the model,

30:13

continue to develop in kind of

30:15

an incremental way with a fully kind

30:17

of versioned and lineaged process,

30:20

And

30:20

then once you're happy with the model, there's kind of a one click

30:22

deployment that we have where you

30:24

can deploy it to a rest end point and

30:26

then start curling it with, you

30:28

know, JSON objects as you

30:30

see fit. And then if

30:31

you'd like to, you know, retire the model

30:33

or replace it with a new model version, it's

30:35

a similar one click kind of deployment process

30:38

as well. And

30:39

then there's, of course, ways that you can automate this as well

30:41

to do retraining

30:43

as well as do validation

30:45

to determine when you want

30:47

to trigger redeployment.

30:49

Right? So if you say, I have a

30:51

held out test data set, I only want to redeploy

30:53

when the, you know, new model does

30:55

better than the old model on that data set,

30:57

you know, that's

30:58

something that you can configure with

31:00

the platform as well. And

31:01

at a high level, that's the journey that

31:03

we see as being the core flow that

31:05

the user wants to go through. So connect data, train model,

31:08

deploy. And then there's a lot in

31:09

between, of course, that kind of fills in the gaps,

31:11

but that's fundamentally what the

31:14

platform provides. And you

31:15

mentioned the PQL dialect a

31:17

couple of times, and I noticed

31:19

that when I was going through some of

31:21

the blog posts and some

31:23

of the early material that you have about what you're building at Predibase.

31:26

So I'm wondering if you can speak

31:28

to some of the motivation

31:30

behind creating this new dialect

31:33

and this new, I guess, language you could call it, and

31:35

some of the ways that you think about the

31:37

semantic and syntactic design

31:40

of it. Yeah. So

31:40

with PQL, we think it's a

31:42

very natural way to extend the

31:45

declarative idea because I think it plays

31:47

very nicely into the

31:49

way that we think about ML systems today compared

31:52

to databases a few decades

31:54

back, right, where you

31:55

had lower-level languages,

31:57

like COBOL, that people would be writing,

31:59

would

31:59

be interacting with databases, and

32:02

then SQL comes along and provides this

32:04

very nice declarative way

32:06

of expressing all sorts of complex

32:08

data analysis that you might wanna do.

32:11

And we

32:11

see PQL as being the natural extension

32:13

of that idea to the ML domain.

32:16

where, since you already have this declarative

32:19

specification that provides a very tight

32:21

semantic link between

32:23

your data fields, the fields of your

32:25

data set, and

32:26

the fields that are the inputs and outputs of your model

32:28

and then everything that happens in between.

32:30

PQL

32:30

provides a very natural way

32:33

to express you

32:35

know, the model prediction request that

32:37

you might wanna do. So what I think is

32:39

very powerful about PQL is that you

32:41

can do something as complex as

32:43

a batch prediction over, you know,

32:45

a ten terabyte data set

32:48

using

32:48

some model that you wanna write out

32:50

to a downstream table. Normally,

32:52

you would end up writing like an ETL job

32:54

in Spark to do something like this, but

32:56

that's just a one-line PQL query, which

32:58

would be predict

33:00

target, given, select

33:02

star from data or whatever. Right? And you

33:04

can, of course, then do all

33:06

sorts of more complex things from there

33:08

in terms of joining

33:09

tables across different data sets, filtering them,

33:12

doing sliced analysis, you have ways of

33:14

doing what we call hypothetical

33:16

queries

33:16

that are kinda similar to

33:18

what you might do for real time prediction where

33:20

you want to

33:21

take like an entirely new data point

33:23

and then express it as a query that then can

33:25

be predicted on. And so I think,

33:27

you know, certainly one powerful use case

33:29

of PQL is this idea of a

33:32

more efficient way of doing batch prediction that

33:34

fits in nicely with other tools

33:36

that do ELT. Like,

33:38

dbt is a really good example there

33:40

where we already have a dbt integration

33:42

that we've written, that some of our users are

33:44

using. And so if you want to

33:45

be able to express your

33:48

prediction pipeline as

33:50

SQL, essentially, PQL provides a

33:52

very natural way to do that. But

33:54

we

33:54

also think that PQL is a very

33:56

powerful enabler of putting

33:59

ML

33:59

inference and, like, letting people interact with

34:02

and understand the model, for stakeholders

34:04

who might not today

34:05

ever really interact with an ML

34:07

model. Right? So anyone now who

34:10

understands SQL can start

34:12

making predictions start to play around

34:14

with understanding, like, what do I have to change in

34:15

this input to make the model predict something else? Really? It's

34:17

a very fun kind of just interactive process

34:19

that users can go

34:22

through. And

34:22

these sorts of PQL queries are a very nice sharing

34:25

point as well. So if the data scientist has a model

34:27

that they've trained and the

34:29

analyst wants to play

34:31

around

34:31

with it or wants to see the result of

34:34

some prediction on some slice of

34:36

data, you just need

34:36

to share that PQL query with them

34:39

and say, hey, go run this and, you know, let

34:41

me know how it goes instead of having

34:43

to ship whole notebooks and

34:45

Python files and whatever

34:47

So

34:48

that's kind of where we see the value of PQL: as batch prediction,

34:50

as well as for kind of

34:52

pipelineing and doing, you know, BLT type

34:56

workloads. as

34:56

well as this kind of shareability of making ML inference

34:58

more accessible to the broader organization.
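The exact PQL grammar is not spelled out in the conversation; the strings below only mirror the query shapes the speaker describes (a batch prediction over a table, and a "hypothetical" single-row query), with made-up table and column names and no real client call.

```python
# Illustrative PQL-style queries, held as plain strings.

# Batch prediction over an existing table, written back downstream --
# the "one-line query instead of a Spark ETL job" case.
batch_query = """
PREDICT churned
GIVEN SELECT * FROM warehouse.customers
"""

# A "hypothetical" query: score a brand-new, hand-written data point,
# the way an analyst might poke at a model interactively.
what_if_query = """
PREDICT churned
GIVEN plan_type = 'enterprise', monthly_usage = 120
"""

print(batch_query)
print(what_if_query)
```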

35:01

Noting the

35:02

PQL acronym My

35:06

initial thought when I first read that in the blog post was, oh, obviously,

35:09

the P stands for Predibase, but

35:11

it stands for predictive. And

35:14

so I'm wondering if you can talk to what your overall vision

35:16

is for this syntax and

35:18

if you intend for it to

35:21

be something that is maybe

35:24

adopted outside of Predibase as

35:26

kind of a general standard for this

35:28

means of interacting with machine

35:30

learning and just some of the overall vision

35:33

there. So the interesting

35:33

thing there is that the name PQL

35:35

actually predated the name Predi

35:37

base. So we had the

35:39

name PQL in mind before we came up with Predibase.

35:42

But to

35:42

your point, I definitely do

35:45

see PQL being something larger

35:47

than Predibase in a lot of

35:49

ways. Like, we want to see

35:51

more folks in the industry adopt it.

35:53

And so the vision for PQL is

35:55

that Now you do

35:55

have a lot of BI tools today

35:58

that have very tight integration with

35:59

SQL, and we'd like to

36:01

be able to see you

36:03

know, very nice integration with PQL in a lot

36:05

of these tools in the future as well, you

36:07

know, thinking about how we

36:09

can make it a standard that the larger community

36:12

embraces, I think there is a lot of value

36:13

to that. And so we do work

36:15

very closely with some

36:17

companies in the BI space

36:19

through the Linux Foundation, actually, where

36:22

Ludwig and Horovod, the projects that we

36:24

maintain are hosted. And

36:25

so we do have a collaboration there with

36:27

the AI plus BI committee that

36:29

is working on exactly

36:31

this problem of integrating,

36:32

you know, machine learning prediction into

36:34

BI systems. And

36:36

that's where I think things can

36:38

go, if the

36:40

standard ends up becoming well adopted in the

36:42

future. And another interesting

36:44

element of the overall

36:46

ML space is the question of

36:49

collaboration. You mentioned PQL allows you

36:51

to say, I've got this model. I wanna pass

36:53

it off to this analyst to be able to play with

36:55

it and experiment with its

36:58

capabilities. Maybe provide some feedback on ways that I should tweak it

37:00

to, you know, make it more powerful

37:02

for a certain use case. And I'm just

37:04

curious how you think about that

37:06

collaboration aspect

37:08

of Predibase and how you've designed the platform to

37:10

be able to be kind

37:12

of idiomatic and recognizable for

37:15

different roles and stakeholders across

37:18

the organization who are interacting with

37:20

the different capabilities of the model and

37:22

the overall workflow? So

37:23

I do think that collaboration is

37:26

very core to what we're doing because we

37:28

see this as being a tool, not just

37:30

for an

37:30

individual data scientist or engineer,

37:32

but a tool for an organization. Right?

37:35

And so

37:36

we do have different metaphors

37:38

that we think about that relate

37:40

to different stakeholders that you can

37:42

see kind of visions of each of them in

37:44

the platform. So for folks

37:46

who are more on the analytics side,

37:48

we do

37:48

have a query editor built into

37:50

the UI that lets you just start writing

37:53

PQL or even ordinary SQL

37:55

queries, the

37:55

parser is expressive enough to kind of

37:58

support both in the editor

37:59

and kind of

37:59

playing around with things as you would if

38:02

you're using superset or some other,

38:04

like, BI slash analytics tool.

38:06

But we also then for the kind of data

38:08

scientists and the

38:10

engineer personas, we

38:10

have a lot of tools that kind of adhere to

38:12

more of like a Github style workflow where, you know,

38:15

folks will be able to

38:17

kind incrementally update models in a way that is,

38:20

you know, versioned and so you can kind of

38:22

diff between different models

38:24

and then have this ability to do

38:26

experiments in separate branches. And then once you're happy with how the experiment

38:28

is doing, saying, oh, this experiment is

38:30

now doing better than what is currently in

38:34

production, let's,

38:34

you know, merge this back into main, similar to how

38:37

you would in Git. And then, you know,

38:39

that kind of becomes a concept very similar to

38:41

a pull request where people

38:44

comment and kind of say, hey, I don't agree with this particular parameter

38:46

choice. Can we maybe revisit this?

38:48

So there are different

38:50

ways that we've thought about

38:52

doing it to make it more

38:54

approachable to people by having those call

38:57

outs

38:57

that harken back to things that they are

38:59

already familiar with. but

39:01

at

39:01

the same time giving them something that's net new. Right? Like,

39:04

I think the problem with just, you know, using Git

39:06

for ML today is that

39:08

Git doesn't provide a story

39:10

about the non source code artifacts. Right? So you need to

39:12

use external tools for that. And

39:14

things are not super tightly integrated,

39:15

and that's where something like

39:17

Predibase slots in to

39:19

fill that gap for that

39:21

particular persona. Right? So it's

39:23

in large part all about providing the

39:25

right metaphors for the right persona.

39:27

Digging

39:29

into the Ludwig aspect of what you're building, as you

39:31

mentioned, it's an open source project, it

39:33

predates the business, you have

39:35

used it as sort

39:37

of the core building block of what you're

39:40

providing. I'm curious if you can talk to

39:42

some of the ways that you're thinking

39:44

about the governance of the open

39:46

source project. and how you identify

39:48

which pieces of the engineering that you're doing

39:50

on and around Ludwig are part

39:54

of the business and which parts belong with the open source

39:56

project. And along with that,

39:58

some of the ways that your work on Predibase

39:59

has fed back into the

40:02

Ludwig project. From

40:03

the governance standpoint, we have

40:05

been making a concerted effort to

40:07

get more folks involved and, you know,

40:10

we hold regular monthly meetings with

40:12

the community to talk

40:13

about the roadmap, get buy-in from

40:15

different people about what features

40:17

are important. So right now, one thing that

40:19

we've been working on on the open source

40:21

side because a lot

40:21

of other companies have been interested is

40:24

working on a model hub that

40:26

provides, you know, some ability to

40:28

share different trained Ludwig models

40:30

and configurations.

40:31

So that's something that's

40:31

definitely been a community driven effort to

40:34

date. And then I would say that

40:36

in terms of how we see the relationship

40:38

between what's Predibase and what's

40:40

Ludwig, we

40:41

do have a very substantial part of

40:43

the engineering team that works almost

40:45

exclusively on the open source. And so it's

40:47

very important to us that we not

40:50

just take Ludwig as something that

40:52

we consume downstream, but that we

40:54

also actively, you know,

40:56

are investing back into Ludwig and

40:58

making it better, and that will make

41:00

Predibase better by default.

41:02

Right? But a really good example there is

41:04

on the work we've done on

41:06

scalability recently. We have some customers that we've worked with that are

41:08

training on larger datasets,

41:10

terabyte plus. And so we've

41:12

had to

41:13

think a lot about you

41:16

know, what are the bottlenecks. And Ludwig, you know, Ludwig is a

41:18

very complex system in a lot of ways. We

41:20

deal with every type of modality of data

41:22

at the same time potentially. Right?

41:25

and need to have an efficient way to pipeline

41:28

it all in terms of both the data

41:30

processing and the model training and

41:32

the prediction. And

41:32

so we've invested quite a lot in building that out

41:35

specifically to improve Predibase. But

41:37

the nice thing is all of

41:39

those features ultimately become

41:42

part of Ludwig, because that's where

41:44

the

41:44

core of those capabilities live. I'd

41:46

say

41:46

two other big features that are

41:49

coming to Ludwig, driven by requirements on the Predibase

41:51

side. One would be the

41:53

improved

41:53

AutoML capabilities that we've been

41:56

investing in. So this would be

41:58

kind of suggesting

41:59

configurations and suggesting hyperparameter search

42:02

ranges based on

42:03

the data, based on

42:06

past trials, trainings, and things like that. And then the other

42:08

is on the serving side.

42:09

One thing we definitely found on the Predibase

42:10

side is that there's a

42:13

very strong need to

42:15

make sure that the serving environment

42:17

is isolated and doesn't

42:20

have tons of

42:22

external dependencies that

42:22

blow up the deployment size and add to

42:24

your overhead. Since moving from TensorFlow to PyTorch

42:26

last year, we've invested quite

42:30

a lot in

42:30

building out a TorchScript layer for doing serving, which

42:33

allows us to strip out all

42:35

of the

42:35

Python dependencies on Ludwig

42:37

at serving time.

42:40

and

42:40

provide a very low latency, end to end servable

42:42

that does not only model inference,

42:44

but also does the preprocessing, so

42:47

the data transformation. as

42:49

well as

42:49

the post processing. And this is

42:51

something that, you know, was very important for what we're doing

42:54

in Predibase, but we've made all of that open

42:56

source as part of Ludwig as well. So now

42:58

the community can take advantage of

43:00

it as well.
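This is not Ludwig's actual export API, but a generic PyTorch sketch of the idea just described: bundle preprocessing, the model, and postprocessing into one scripted artifact so the serving runtime needs no Python-level dependencies. The module, its constants, and the file name are hypothetical.

```python
import torch
import torch.nn as nn

class ServableModel(nn.Module):
    """Wraps preprocessing, the trained model, and postprocessing together."""
    def __init__(self, model: nn.Module, mean: float, std: float):
        super().__init__()
        self.model = model
        self.mean = mean
        self.std = std

    def forward(self, raw: torch.Tensor) -> torch.Tensor:
        x = (raw - self.mean) / self.std   # preprocessing baked in
        logits = self.model(x)
        return torch.sigmoid(logits)       # postprocessing baked in

core = nn.Linear(4, 1)  # stand-in for a trained model
scripted = torch.jit.script(ServableModel(core, mean=0.5, std=0.25))
scripted.save("servable.pt")  # loadable from a C++/Triton runtime without Python deps
```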

43:01

In terms of the

43:03

early applications of the Predibase

43:05

platform that you're building and how you've

43:07

been working with some of your early design partners. I'm

43:09

wondering what are some of most interesting or innovative or

43:12

unexpected ways that you've seen the

43:14

platform used? Yeah. There have

43:15

definitely been some ones that

43:18

surprised us. So we

43:20

definitely

43:20

expected that tabular data

43:22

was

43:22

going to be a very important use case.

43:24

for folks. And so we invested a lot in making sure that we had state of

43:27

the art architectures and capabilities on

43:29

tabular data. And so that

43:31

turned out to be true.

43:33

But we've also found that there are quite a lot

43:35

of really interesting unstructured

43:37

data sets that people have been

43:39

working with as well where

43:40

they're trying to predict, you know, anomalies in, like,

43:42

image data, very large image data sets

43:45

or doing kind of a

43:46

really interesting mixed modality training

43:50

with like text and tabular. We've also found that

43:52

there

43:52

are a lot of situations where users

43:55

wanna do machine learning training without a

43:57

lot of labeled data. And

43:59

that's I think

43:59

particularly interesting one because

44:02

it's been leading

44:02

us to invest a lot more heavily

44:04

in building out self supervised learning

44:07

capabilities into Ludwig. And

44:09

so, you know, one thing that we're working

44:11

on actively right now is

44:13

building out a really sleek pretraining

44:15

API for Lululeaks so

44:17

that you can without

44:18

needing to specify, you know, a target column or anything like that,

44:20

do some initial training to learn a good

44:22

representation of the data that

44:24

you can

44:25

then apply downstream to a

44:27

lot of different task. And so that's one that has definitely

44:29

been informed by what we've been

44:31

seeing from

44:32

customers as being a a very critical

44:35

need for them. that's

44:36

now informing a lot of products right now. In your own
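Since that pretraining API was still being designed at the time of this conversation, the following is only a generic sketch of the underlying idea in plain PyTorch: learn a representation from unlabeled data first, then reuse the encoder for a downstream task. None of this reflects Ludwig's eventual interface; the data and model shapes are placeholders.

# Generic illustration of "pretrain without a target column, then reuse the
# representation downstream" using a tiny autoencoder. Not Ludwig's API.
import torch
import torch.nn as nn

# Unlabeled tabular data (no target column is needed for this phase).
unlabeled = torch.rand(1024, 16)

encoder = nn.Sequential(nn.Linear(16, 8), nn.ReLU())
decoder = nn.Sequential(nn.Linear(8, 16))
autoencoder = nn.Sequential(encoder, decoder)

opt = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)
for _ in range(50):  # self-supervised pretraining: reconstruct the inputs
    opt.zero_grad()
    loss = nn.functional.mse_loss(autoencoder(unlabeled), unlabeled)
    loss.backward()
    opt.step()

# Downstream task: attach a small head to the pretrained encoder and fine-tune
# it on whatever labeled data is available.
classifier = nn.Sequential(encoder, nn.Linear(8, 2))
print(classifier(torch.rand(4, 16)).shape)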

44:38

In your own experience of going from working at Uber and helping to solve the problems that they have for machine learning, and producing these useful open source projects that have been available to the community, and then turning that into building a business around those capabilities, and from the lessons that you learned at Uber, I'm wondering if you can talk to some of the most interesting or unexpected or challenging lessons that you've learned in the process of building Predibase?

45:06

Yeah. So I would say that there have been some really interesting problems that mirror a lot of problems that we encountered at Uber. So I think when you look back at my time at Uber, the story there was that I was very keen on unification of infrastructure. And so one of the things that I was really heavily pushing towards the end of my time there was moving away from a Spark plus random bespoke training architecture built on top of Horovod and some other things, towards using Ray as a unified infrastructure layer. And so that very heavily informed the direction that we took with Predibase in terms of building out our training system as this single compute cluster that is capable of doing the preprocessing, the training, the batch prediction, kind of the whole thing end to end. That's worked out really well.

46:02

And then, you know, when we were starting to build Predibase, we had to take this kind of data plane that came from all those years of working at Uber, and all the lessons that we learned along the way, and think about, okay, now how are we gonna make this into a truly serverless enterprise experience? Right? And so we did a lot in the early days of, like, building out the control plane layer. I think there were quite a lot of lessons we learned along the way about how you should think about coupling in these sorts of big, complex distributed systems, where, you know, we had interface boundaries between the control plane and the data plane that were not particularly well defined, and there was a lot of tight coupling. So sometimes failures would occur, and certain things that should not have failed would fail because there was too much coupling in there. And what we've done over time is rearchitect the platform to be much better isolated, so that we use more of a kind of event driven architecture, so more message brokers and things like that, that make things very clearly separated.
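To illustrate the decoupling being described, here is a small, generic sketch of a control plane that publishes events which a data plane worker consumes through a broker-like queue. The in-process queue stands in for a real message broker, and the event names are invented; this is a sketch of the pattern, not Predibase's actual architecture.

# Generic sketch of control plane / data plane decoupling via a message queue.
# An in-process queue.Queue stands in for a real message broker.
import json
import queue
import threading

broker: "queue.Queue[str]" = queue.Queue()


def control_plane_submit(model_config: dict) -> None:
    # The control plane only publishes an event; it never calls the data plane
    # directly, so a data plane failure cannot crash this request path.
    broker.put(json.dumps({"event": "train_requested", "config": model_config}))


def data_plane_worker() -> None:
    # The data plane consumes events at its own pace and isolates its own failures.
    while True:
        message = json.loads(broker.get())
        try:
            print(f"training model with config: {message['config']}")
        except Exception as exc:  # failures stay on this side of the boundary
            print(f"training failed, will retry or report: {exc}")
        finally:
            broker.task_done()


threading.Thread(target=data_plane_worker, daemon=True).start()
control_plane_submit({"input_features": ["text"], "output_features": ["label"]})
broker.join()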

47:08

And that's been a very big learning in building an enterprise platform: how important it is to really define the service boundaries well between the different parts of the system. And overall, you know, we've found that reliability, robustness, stability, these have been concerns that, when you start building the company, you don't initially think, oh yeah, these are gonna be the top things that I'm gonna put on the roadmap. Right? But now that's definitely top of mind for us at all times: how do we build the platform in a way where we account for as many things going wrong as possible, and have a story around making sure that, at the end of the day, the user gets a very clean and a very responsive experience, right, that doesn't fail in some weird, unexpected way.

47:49

Because of the fact that you are running a large and scalable and multi cloud system with a lot of distributed systems going on, I'm curious how you have approached making sure that, as you iterate on the product, you're able to very quickly get feedback as to whether a change has caused a regression in terms of, you know, your ability to quickly recover, or being able to identify potential issues with fault tolerance, and just how you think about managing forward progress and iterative development on the platform while ensuring that you maintain those principles of stability and scalability and fault tolerance?

48:30

Yeah. That has been, I'd say, one of the more difficult challenges to solve. I have to say that we're still figuring out the right way to think about some of these things. But we've definitely invested quite a lot in benchmarking on the Ludwig side. And so there's an active project from one of our employees working on building out an entire benchmarking pipeline for Ludwig, so that every time a change happens, we can, you know, validate it against different workloads and make sure that model performance is good, GPU utilization is good, memory utilization is good, that all the kinds of metrics that we care about for the workload are there at that level. So that we know that, okay, it's not a change in Ludwig that is causing failures to spike or something like that after the change is made. So that's, I think, the first aspect we have to get right: making sure that the open source is very stable and meets the requirements that we have set.
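That kind of per-change validation can be boiled down to comparing fresh benchmark metrics against a stored baseline with tolerances. The sketch below shows that pattern with invented metric names and thresholds; it is not the actual Ludwig benchmarking pipeline.

# Minimal sketch of a benchmark regression gate: compare this run's metrics
# against a stored baseline and fail if any metric degrades beyond a tolerance.
# Metric names, baseline values, and tolerances are illustrative only.
BASELINE = {"roc_auc": 0.91, "peak_gpu_mem_gb": 6.2, "train_minutes": 14.0}
TOLERANCE = {"roc_auc": -0.01, "peak_gpu_mem_gb": 0.5, "train_minutes": 2.0}


def check_regression(current: dict) -> list:
    failures = []
    # Quality metrics must not drop by more than the allowed delta.
    if current["roc_auc"] < BASELINE["roc_auc"] + TOLERANCE["roc_auc"]:
        failures.append("roc_auc regressed")
    # Resource metrics must not grow by more than the allowed delta.
    for key in ("peak_gpu_mem_gb", "train_minutes"):
        if current[key] > BASELINE[key] + TOLERANCE[key]:
            failures.append(f"{key} regressed")
    return failures


if __name__ == "__main__":
    run = {"roc_auc": 0.905, "peak_gpu_mem_gb": 6.4, "train_minutes": 13.1}
    problems = check_regression(run)
    print("OK" if not problems else f"FAILED: {problems}")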

49:26

And then from there, we have quite a lot that we've done on the platform side in terms of building out continuous integration and different tiers of deployments for the whole system, to make sure that it's all well tested before we do a release. So we do have a regular release cadence that we have set up with our customers. Every change we make goes into a live staging environment that we test out internally, and it goes through a full battery of integration tests that actually run on a Kubernetes cluster on, you know, live compute resources and make sure that all the different models that we regularly test out are working correctly and not failing in any unexpected ways.

50:04

And then we've also invested a lot on the observability side as well, in terms of making sure that we know, okay, if this workload used to take a minute to run and now it takes five minutes, you know, what's the part of the system that's suddenly taking longer? What's the part of the system that's suddenly taking more memory? Right? Being able to see what that trend line looks like and where the inflection point was. And so that's been a big area of focus for us lately, because it's just very important for us to ensure that, as we get more and more people contributing code and more and more moving parts, we identify as quickly as possible when something changes, and then can go back and address it. Right? So having every single commit go through a full CI process has been very critical to that, and I think having pretty good policies in place where, you know, we make sure that we don't commit anything to the mainline if the tests aren't in a good state, and we always make sure that we prioritize stability and bug fixes above new feature development. So all of those best practices, I think, are very key to getting it right, but it's still something we're learning as we go.

51:14

And so for individuals or organizations that are looking to accelerate the rate at which they're able to experiment with and adopt machine learning to address some of the organizational and product problems that they're trying to solve for, what are the cases where Predibase is the wrong choice?

51:31

I mean, that's, I think, a very valid question. And I think there are definitely times when it might not be the right choice for your organization. When we think about, like, the market segments, you know, you can kind of think of it as four quadrants, I guess, right, or maybe two axes. On the one hand, you have organizations that have low data versus organizations that have high data. And then on the other axis, you have organizations that have high ML experience and low ML experience. Right?

52:04

And so definitely, you know, the bread and butter customer for us would be a company that's very high in terms of, like, data volume and quantity, but not as high in terms of, you know, having a big, sophisticated ML team. They can certainly have an ML team, but, you know, I wouldn't necessarily want to say that, like, Google Research should be a target customer of ours. Right? And then on the flip side, you have organizations that maybe don't have a lot of data at all. And certainly, I think there are companies out there that are trying to think about ways they can serve companies that don't have much data, you know, for specialized use cases, like using pre trained models and things like that. But that's not what we're currently looking at. We definitely are still thinking about, you know, companies that have a lot of data and don't quite know how to get enough value out of it. That's very core to what we do well. Right?

52:55

And I would also say that it's very important for a customer of Predibase to have some variety of use cases that they wanna solve. It's definitely not a prerequisite, but I would say that, when you look at the market, there are companies that only do fraud detection or only do computer vision or something like that. And, you know, I wouldn't necessarily wanna say that Predibase is gonna beat all of them all the time on every task. Right? So, like, what I would say is that we provide a very good solution for time to value relative to these other platforms if you have a good variety of different things you wanna do in the space. So certainly, I think if you wanna do, you know, computer vision and NLP, from, like, a purely cost benefit standpoint, I think that we have a much stronger value proposition there than if you were to try to do point solutions for all of these different things. Right? So that's the other aspect that's maybe less of a hard requirement, but still, I think, an important differentiator.

53:51

As you continue to iterate on the product, and now that you have gone out of stealth and you're starting to accept new customers onto the platform, what are some of the things you have planned for the near to medium term, or any particular areas of focus or new features that you're excited to dig into?

54:08

I'm certainly very excited about having a full SaaS version of the product that people can try out. Right now, we're in a closed beta. So, you know, we are certainly really excited when people come to us and say they wanna try it out, and we'll set up some time to do a pilot with them. But I'm, you know, very excited about the possibility of having a website people can just log in to and start using it, you know, without any commitment. Right? And so that's something that we're definitely working on right now and thinking about how we can put it in people's hands.

54:39

From a product standpoint, there's also a lot that we're thinking about right now. So I mentioned the self supervised learning work before. There's also some work that we're contributing to the open source community as well around better support for custom components and kind of user defined functions, if you will. So, you know, with Ludwig and Predibase, there's quite a lot of flexibility in terms of, like, the degree to which you can specify every parameter of the model. But if you wanna add new model architectures, it's possible today, but we wanna make that experience even easier for folks, so that there's just a very lightweight interface you implement, and then you can register that component as just another option in your config within Predibase that other people in the organization can use.
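As a sketch of what such a lightweight registration interface could look like, here is a hypothetical decorator-based registry that lets a custom encoder be referenced by name from a declarative config. The decorator, class, and config keys are invented for illustration and are not Ludwig's or Predibase's actual extension API.

# Hypothetical sketch of a decorator-based component registry, so a custom
# encoder can be referenced by name from a declarative config.
import torch
import torch.nn as nn

ENCODER_REGISTRY: dict = {}


def register_encoder(name: str):
    # Registering under a name lets configs refer to the component declaratively.
    def wrap(cls):
        ENCODER_REGISTRY[name] = cls
        return cls
    return wrap


@register_encoder("tiny_mlp")
class TinyMLPEncoder(nn.Module):
    def __init__(self, input_size: int, hidden_size: int = 32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(input_size, hidden_size), nn.ReLU())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


# A config can now name the component, and the framework looks it up at build time.
config = {"encoder": "tiny_mlp", "input_size": 10}
encoder = ENCODER_REGISTRY[config["encoder"]](input_size=config["input_size"])
print(encoder(torch.rand(2, 10)).shape)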

55:19

And then also this concept of the model hub slash model registry, I think, is one that I'm very excited about, and it will provide benefits for both the open source users as well as the commercial users, where you can do things like define canonical components that you wanna use in your organization. So if there's, like, a feature that gets used all the time in different models. I remember at Uber, we had some features related to, like, customers and related to, like, locations that were just used in all different types of models. Right? So being able to have canonical encoders for those that are maybe even pre trained on, you know, a very large data set, so there's very low cost to fine tuning them. I'm very excited about building out that capability as well.

56:05

Well, for anybody who wants to get in touch with you and follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And as a final question, I'd like to get your perspective on what you see as being the biggest barrier to adoption for machine learning today.

56:19

So definitely, I think that there is a very big barrier to adoption that comes from just not having good enough abstractions to start getting value out of machine learning. So I think the analogy that I like to draw here is really about software and kind of what's enabled software to eat the world, as the famous article in The Wall Street Journal once said. Right? And it really comes down to this idea of modularity and being able to kind of stand on the shoulders of giants. So instead of having to reimplement every, you know, great new idea that comes along, you just download a library and use that software. I think machine learning hasn't had this abstraction before, right? And I think that's been a very big inhibitor to people actually being able to adopt it: you know, a new idea comes out from research, but companies aren't able to productionize it and actually get it to deliver value, because they're too busy trying to reinvent the wheel and reinvent the infrastructure and figure out how to get data from one place to another and clean up their data.

57:19

So definitely, I think having better abstractions, and better canonical sources of data as well, are the two biggest barriers, in my opinion. So I think once you get to a point where all the data is clean and in standard data warehouse systems and is ready for machine learning, and then you have very powerful abstractions like Predibase that allow you to take best in class models and just run them right on this, you know, nice, clean, canonical data source, then you'll have a very, very fast path to production. And so we definitely think we can move the needle on the modeling side, and I think certainly companies like dbt, Snowflake, and others are doing a great job on the data side. And once these two things converge, then hopefully we'll be able to really start, you know, delivering more value. But that's definitely, I think, where companies struggle the most today.

58:06

Alright. Well, thank you very much for taking the time today to join me and share the work that you've been doing at Predibase. It's definitely a very interesting platform and product that you're building there, so I'm excited to see where you go from here. So thank you again for all the time and energy that you and your team are putting into making it easier for organizations to get onboarded with ML and be able to experiment with it and gain some of the value from its capabilities. Thank you again for that, and I hope you enjoy the rest of your day.

58:34

Awesome. Thank you, Tobias. I really appreciate it, and you as well.

58:40

Thank you for listening. Don't forget to check out our other shows, the Data Engineering Podcast, which covers the latest on modern data management, and the Machine Learning Podcast, which helps you go from idea to production with machine learning. Visit the site at pythonpodcast dot com to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email hosts at pythonpodcast dot com with your story. And to help other people find the show, please leave a review on Apple Podcasts and tell your friends and coworkers.
