Episode Transcript
0:15
Welcome back, friends. This is
0:17
the Changelog. This week we
0:19
were joined by Stefano Maffulli,
0:21
the Executive Director of the
0:23
Open Source Initiative, the OSI.
0:25
The Open Source Initiative is
0:28
responsible for representing the idea and
0:30
the definition of Open Source globally.
0:32
Stefano shares the challenges they face
0:34
as a US-based organization with
0:36
a global impact. We discuss the
0:38
work Stefano and the Open Source
0:40
Initiative are doing to define Open
0:43
Source AI and why we
0:45
need an accepted and shared definition.
0:47
Of course, we also talk about
0:49
the potential impact if a poorly
0:51
defined Open Source AI emerges from
0:53
these efforts. Also, a mention that
0:56
Stefano was a little under the
0:58
weather for this conversation, but he
1:00
powered through because of how important
1:02
this topic is. A massive thank
1:04
you to our friends and our
1:07
partners at Fly.io, the home of
1:09
changelog.com. It's simple: launch apps
1:11
near users. They transform containers into
1:13
micro VMs that run on their
1:15
hardware in thirty-plus regions on
1:18
six continents. Launch an app
1:20
for free
1:22
at fly.io.
1:34
What up, friends? This episode of
1:37
the Changelog is brought to you by our
1:39
friends over at Vercel, and I'm
1:41
here with Lee Robinson, VP
1:43
of Product. Lee, you know the
1:45
tagline for Vercel: Develop, Preview, Ship,
1:47
which has been perfect, but now
1:49
there's more. After the ship process
1:51
you have to worry about security,
1:53
observability, and other parts
1:55
of just running an application in production.
1:57
What's the story there? What's beyond
1:59
shipping? Yeah, you know,
2:01
when I'm building my side projects or when I'm
2:04
building my personal site, it often looks like develop
2:06
preview ship. I try out some new features, I
2:08
try out a new framework, I'm just hacking around
2:10
with something on the weekends. Everything
2:12
looks good, great. I ship it, I'm done. But
2:14
as we talk to more customers, as we've grown
2:16
as a company, as we've added new products, there's
2:19
a lot more to the product portfolio
2:21
of Vercel nowadays that goes beyond that
2:23
experience. So when you're building larger, more
2:26
complex products and when you're
2:28
working with larger teams, you want to
2:30
have more features, more functionality. So tangibly,
2:32
what that means is features like our
2:34
Vercel firewall product to help you be
2:36
safe and to have that layer of
2:38
security. Features like our logging and observability
2:40
tools so you can understand and observe
2:42
your application in production, understand if there's
2:45
errors, understand if things are running smoothly
2:47
and get alerted on those. And
2:49
also then really an expansion of our
2:51
integration suite as well too, because you
2:53
might already be using a tool like
2:55
a Datadog or you might already be
2:58
using a tool at the end of
3:00
this software development lifecycle that you want
3:02
to integrate with to continue to scale
3:04
and secure and observe your application and
3:06
we try to fit into those as
3:08
well too. So we've kind of continued
3:10
to bolster and improve the last mile
3:12
of delivery. That sounds amazing.
3:15
So who's using the Vercel platform like that?
3:17
Can you share some names? Yeah,
3:19
I'm thrilled that we have some
3:21
amazing customers like Under Armour, Nintendo,
3:24
Washington Post, Zapier who use
3:26
Vercel's frontend cloud to not only help
3:28
scale their infrastructure, scale their business and
3:31
their product, but then also enable their
3:33
team of many developers to be able
3:35
to iterate on their products really quickly
3:37
and take their ideas and build the
3:40
next great thing. Very cool.
3:42
With zero configuration for over 35 frameworks,
3:44
Vercel's frontend cloud makes it easy for
3:46
any team to deploy their apps. So
3:49
you can get started with a 14 day free
3:51
trial of Vercel Pro or get
3:53
a customized enterprise demo from their
3:55
team. Visit vercel.com/changelogpod
3:58
to get started. That's
4:01
vercel.com/changelogpod.
4:03
Well,
4:33
Stefano, it's been a while. Actually,
4:35
never, which is a good thing, I
4:37
suppose, but now we're here. Fantastic. We
4:39
were at All Things Open recently, and
4:41
we tried to sync up with you,
4:43
but we missed the message. And
4:45
so we were like, we got to get you on the podcast. And obviously,
4:47
you know, this show, the Changelog,
4:49
was born around open source. And
4:52
I kind of find it strange and
4:55
sad that we've never had anybody from
4:57
the open source initiative on this podcast.
4:59
It's – I'm glad you're here to
5:02
change that, so welcome. Thank
5:04
you. Thank you for having me. It's a pleasure.
5:07
Sorry we missed it. We missed
5:09
each other in North Carolina. It
5:11
was a great event. Oh, man. We love
5:13
All Things Open. We love Todd and their
5:15
team there. We think All Things Open is
5:18
the place to be at the end
5:20
of the year. If you're a fan of open
5:22
source, you're an advocate of open
5:24
source, and just the way that it's permeating all
5:27
the software, right? It's won. Open source has won.
5:29
And now we're just living in a hopefully
5:31
mostly open source world, right? Absolutely.
5:35
Absolutely. I mean, just last week
5:37
there was an article published that
5:41
estimated the value of open source
5:43
software as a whole. The
5:46
numbers are incredible. These
5:48
researchers from Harvard Business
5:51
School went and looked
5:53
at the value of open source as it
5:55
is consumed or produced.
6:00
They put dollar numbers on it. I'm
6:02
curious, though, because I
6:04
don't know how an analyst would even
6:06
approach this; it seems like it would
6:08
be partly a guess. How do you
6:11
quantify the value of open source?
6:13
How do you value, how do you
6:15
quantify the value of open source? Like,
6:17
what were the metrics?
6:19
Did they count lines of code, or did
6:22
they estimate the hours that it
6:24
would take to rewrite from scratch
6:26
all the software that is in use?
6:28
They used data sets that
6:31
are debatable already, and with
6:33
some, with some of those counts,
6:35
and using those, together with other estimates
6:37
of the time it would take
6:40
to replicate all of the open source software
6:42
that is available, they put the
6:44
number at around eight
6:46
point eight trillion dollars. Wow.
6:48
I would, I would have just said all the dollars.
6:51
Really? Personally, I would have said all the dollars.
6:53
Yeah, it's a huge number. All the dollars, right?
6:55
Doesn't every dollar today really depend
6:57
on open source at some layer?
6:59
So,
7:02
Like, really, it could be all the dollars. Well,
7:04
all right. It's an impressive number, and it's
7:06
really hard to picture how much, how big
7:08
it is. I went and had to
7:10
look it up, and it said
7:12
it's three times as much as
7:14
Microsoft's market cap. And,
7:17
ah, it's larger than the whole
7:19
United States budget, like the twenty
7:21
twenty-two budget of the
7:23
United States, which includes mandatory spending,
7:25
things like Medicare: six point
7:27
three trillion. So yeah, these are
7:29
trillions we're talking about, right?
7:31
It's hard to get your head around
7:33
trillions of anything,
7:36
really. We're not even used to
7:38
trillions; the numbers you usually hear
7:40
are the billions of
7:42
the biggest banks and CEOs.
7:45
I was thinking about it earlier:
7:47
it's eight point eight trillion,
7:49
and I started rounding it up to
7:51
nine, and I realized that's a couple
7:53
tenths of a trillion dollars getting rounded
7:55
away; it's hard to wrap my
7:57
head around that. A nice rounding error
8:00
in your favor, a couple hundred billion dollars, right?
8:02
I'd love a nice rounding error
8:05
like that, rounded off in our favor,
8:07
to fund us and to maintain things.
8:10
You know, that'd be nice.
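As a back-of-the-envelope illustration of the methodology discussed above (count the code, estimate the hours to rewrite it from scratch, price those hours at a developer wage), here is a minimal sketch. The productivity and wage numbers are made-up assumptions for illustration, not figures from the study:

```python
# Toy replacement-cost estimate, in the spirit of the study discussed
# above: lines of code -> hours to rewrite -> dollars of labor.
# All numbers here are illustrative assumptions, not the study's figures.

def replacement_cost(lines_of_code: int,
                     loc_per_hour: float = 5.0,
                     hourly_wage: float = 75.0) -> float:
    """Dollar cost to rewrite `lines_of_code` from scratch."""
    hours = lines_of_code / loc_per_hour
    return hours * hourly_wage

# Hypothetical package with two million lines of code:
print(f"${replacement_cost(2_000_000):,.0f}")  # prints "$30,000,000"
```

Scaled over every open source package in use, estimates like this are how a headline number in the trillions is reached.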
8:12
Yeah, well. I don't know,
8:14
everybody listening to this podcast will be,
8:16
I think a lot of them will be,
8:18
but, you know, in light of
8:20
recent events, let's not assume that
8:22
our listenership is super informed
8:24
about what the Open Source Initiative is.
8:27
I've read about it recently,
8:29
Stefano, but I'd prefer that you kind
8:31
of give us a taste
8:33
of what the OSI is
8:35
really about. What is the organization? It's
8:37
a 501(c)(3), you know,
8:39
it's a public benefit corporation in California.
8:41
But what exactly is the Open Source
8:43
Initiative? For all that we've talked
8:45
about it, what is it? Oh yeah, in
8:47
a nutshell, we are the maintainers
8:49
of the Open Source Definition. And
8:51
the Open Source Definition
8:53
is a ten-point checklist
8:56
that has been used for twenty-six years.
8:58
We celebrated twenty-five
9:01
years last year. It's the checklist
9:03
that has been used to evaluate licenses.
9:06
These are the legal documents that come
9:08
together with software packages to
9:10
make sure that the packages, the
9:12
software, come with certain freedoms.
9:15
Those freedoms are written down and
9:18
can be summarized as four freedoms, which come from
9:20
the free software definition: these are
9:22
the freedom to use the software without
9:25
having to ask for permission;
9:27
the freedom to study it, to make sure that,
9:30
you know, to understand what
9:33
it does and what it's supposed to be doing, and
9:35
nothing else, and for that you
9:38
need access to the source code; and
9:40
then the freedom to modify it,
9:42
to fix it and to increase its
9:44
capabilities, or to help yourself;
9:47
and the freedom to make
9:49
copies, be it for yourself or
9:52
to help others. And then
9:54
those freedoms were written down
9:56
in the eighties by
9:58
the free software foundation and
10:01
the open source initiative started a couple
10:03
of decades after that, picking
10:05
up the principles and spreading them out
10:07
a little bit in a more practical
10:10
way. In
10:13
a time, at a time when a lot
10:16
of software was being deployed and
10:18
powering the internet, basically, this
10:21
definition, and the list of
10:23
approved licenses, gives users
10:26
and developers clarity about the
10:28
things that they can do. It provides
10:31
them agency and independence and
10:33
control and all
10:35
of that clarity is what
10:37
has propelled and generated that huge ecosystem
10:40
that is worth 8.8 trillion.
10:45
So who formed the initiative and
10:47
then how did it sustain
10:50
and continue? Seems like the definition
10:52
is pretty set but like what is
10:54
the work that goes on continually? Yeah,
10:56
well, the work that goes on continuously
10:59
is, especially now recently,
11:01
it's the policy,
11:03
the monitoring of policy works and
11:06
everything that goes around it. The
11:09
concept of open source seems to
11:11
be set but it's constantly under
11:13
threat because of the evolution
11:15
of technology, changes of business models,
11:17
the rise in the
11:20
importance and power of new actors
11:22
constantly shifts and
11:26
tends to push the definition itself
11:29
of open source in different directions, the
11:31
meaning of open source in different directions
11:34
and regulation also tends to
11:36
introduce hurdles that
11:38
we need to be aware of. The
11:41
organization, what we do, we
11:43
have three programs. One is
11:45
called the Legal and Licenses
11:47
Program and that's where we
11:49
maintain the definition. We review
11:51
new licenses as they get proposed, and
11:54
we also keep a database of
11:56
licensing information for packages
11:58
because often... developers don't
12:00
use the right words or miss
12:03
some pieces, a lot of packages don't have
12:05
the right data. And
12:07
we have, we are maintaining
12:10
the community that maintains this
12:12
system called ClearlyDefined. On
12:14
the policy front, that's another program, the
12:17
policy and standards front,
12:19
we monitor the activity of standard
12:21
setting organizations and
12:23
the activity of regulators in the United
12:25
States and Europe mostly to make
12:27
sure that all the new laws and rules and
12:31
the standards can be implemented with
12:33
open source code and the regulation
12:35
doesn't stop or doesn't block the
12:38
development distribution of open source software.
12:40
And then a third program is
12:43
on advocacy and
12:45
outreach and that's the
12:47
activities that we do with maintaining
12:49
the blog, having the communication, running
12:51
events. And in this
12:53
program, we're also hosting the conversations
12:55
around defining open source AI,
12:58
which is a requirement that
13:01
came out especially a couple of years ago,
13:03
getting rapidly hotter
13:06
and hotter for
13:08
us. So we were
13:10
basically forced to start this
13:12
process because AI is
13:14
a brand new system, with
13:17
brand new artifacts, and that forces us
13:20
to review the principles to see if
13:22
they still apply and how they need
13:24
to be modified so we can apply them
13:26
to AI systems as a whole. And
13:29
we are a charity organization, you mentioned that.
13:32
So our sponsors are individuals
13:34
who donate to become members and
13:37
they can donate any amounts from
13:39
$50 a year up to what have you.
13:42
And we have a few hundred of those,
13:44
almost a thousand. And then we
13:47
have corporate sponsors who give
13:49
us money also, donations to
13:52
keep this work going. It's in
13:54
their interest to have an independent
13:56
organization that maintains the definition and
13:59
having a breadth of all of these
14:01
individual and corporate donors makes the
14:04
organization stronger. So we don't
14:06
depend on any
14:08
single one of them individually. So
14:11
despite the fact that we get money from
14:13
Google or Amazon or Microsoft and
14:15
GitHub, we don't have to surrender our
14:17
own agency to them. Do
14:19
you also defend the license so far as
14:22
going to court with people who would
14:25
misuse it or no? It hasn't happened,
14:27
but we do have, I
14:29
mean, not under my watch, but
14:31
we do have experts and now
14:34
on our board and in our
14:36
circle of licensing
14:39
experts, we do have lawyers who
14:41
go to court constantly to
14:44
defend the license and to
14:46
protect users. And
14:48
they're there as like expert witnesses? Exactly.
14:51
And we do provide, we
14:53
have provided briefs for courts,
14:56
opinion pieces for regulators and
14:59
responses to requests for
15:01
information in various legislation
15:03
here. How challenging is it
15:05
to be a US-based,
15:08
US-founded idea, and now an
15:12
organization, that represents
15:14
and defends this definition that really,
15:16
you know, going back to the trillions,
15:18
like, I mean, all the money, all the
15:21
dollars, like it's a world problem.
15:23
It's not just a United States
15:25
problem. How does this organization operate
15:28
internationally? What challenges do
15:30
you face as a
15:32
US-based nonprofit, but representative
15:34
of the idea of open source
15:36
that really impacts everyone globally?
15:39
Yeah, that's a very good question. In
15:41
fact, it is challenging. So I started
15:43
at the organization a little over
15:45
two years ago, and I'm
15:48
Italian. And so I do have connections
15:50
to Europe and knowledge about Europe. We
15:52
do have board members that are based
15:54
in Europe and other board members in
15:57
the United States. And it is actually
15:59
quite a challenge, quite challenging, to
16:01
be involved in these global
16:03
conversations because now, a little
16:05
bit like maybe in
16:07
the late 90s, open source
16:10
is increasingly finding itself
16:12
at the center of geopolitical challenges
16:14
and not because of open source
16:16
per se, but because
16:19
software is so incredibly present,
16:21
existing everywhere, and most of the software that
16:23
exists is open source. So
16:26
there have been a lot of challenges
16:29
as the relationship, the trade relationship
16:31
with other actors like
16:33
Russia, Ukraine, now with
16:35
the war in Israel and
16:38
Gaza and the trade
16:40
wars with China, between China and
16:42
United States. There are a
16:44
lot of geopolitical issues that we are at
16:46
the center of and we're
16:48
finding it really complicated. In
16:51
fact, we do have, we
16:54
have raised more money to increase
16:56
our visibility on the policy front.
16:58
We have right now, at
17:01
the moment, we have two people working,
17:03
one in Europe and one is more
17:05
focused in the United States. Both
17:07
of them are part-time, but we do have budget
17:10
to hire at least another
17:12
one, if not two
17:14
policy analysts to help
17:16
us review the incredible amount of legislation
17:19
that is coming. We're just talking about
17:21
United States and Europe. I
17:23
guess even one more layer than that is
17:26
that I don't know
17:28
if it's a self-professed defendership
17:30
of the term of open source. I understand
17:32
where it came from to some degree, you
17:34
know, and I wonder if, how
17:36
do you all handle the
17:38
responsibility of not
17:40
so much owning the trademark term of open source,
17:42
but to defending it? So in a way, you
17:45
kind of own it by
17:47
defending it because you have to defend it. Like it's
17:49
some version of responsibility, which is maybe
17:51
a by-product of ownership, right? There's
17:54
a pushback happening out there. Like there's even
17:56
a recent conversation where,
17:58
you know, they can't describe their software
18:00
as open source because
18:03
the term means something. And we all agree
18:05
on that, right? We understand that. And
18:07
I'm not trying to defend that,
18:09
but like how do
18:11
you operate as an organization that defends this term? Yeah,
18:14
I mean this is really funny because we
18:17
don't have a trademark on the term
18:19
open source applied to software. We have
18:21
a soft power, if you will, that
18:23
is given to us by all
18:25
the people who, just like you just said,
18:29
recognize that the term open source is
18:31
what we have designed, we have defined.
18:34
We maintain the definition. It's
18:36
kind of recursive if you want. But
18:39
corporations, individual developers,
18:42
all their institutions like academia,
18:44
researchers, they recognize that open
18:47
source means exactly that:
18:50
the list of licenses, those 10 points,
18:52
or, if you want, the four freedoms that
18:54
are listed. And we maintain
18:56
that. And
18:58
this has become quite visible also even
19:00
in court where they do
19:03
understand that. For example,
19:05
there was a recent case involving
19:08
the company Neo4j, and
19:11
that litigation is
19:13
quite complicated and entrenched.
19:16
I'm not a lawyer. I'm not going to dive
19:19
into legal things. But
19:21
the one key takeaway that is easy
19:23
for me to grasp and communicate is
19:26
that the judge recognized that the
19:29
value of open source is in the
19:31
definition that we maintain and
19:34
calling open source something that is not
19:36
under a license, a license
19:39
that we have vetted and approved, is
19:42
false advertising. And that held up in
19:44
court? Oh, yeah. And
19:46
so is that what you would say to people who are
19:49
perhaps maybe nonchalant isn't the
19:51
best word but unimpressed
19:54
by open source as a
19:56
definition and they think it's
19:59
stodgy and dated, and the thing that
20:01
they're doing is close enough and they
20:03
like the term, they're going to use
20:05
the term and they've got open-ish code
20:08
or source available or business source.
20:11
Because a lot of people are kind of pushing not just
20:13
against the definition itself but like against
20:15
the idea that we need a definition or like
20:17
you guys get to have the definition. What
20:20
do you say to them? Yeah, you know
20:22
they're self-serving. They try to
20:24
be self-serving and they're trying to destroy
20:26
the commons that way quite
20:29
visibly. I think that users see
20:31
through them and it's not
20:33
even in their interest but you
20:35
know how it works sometimes corporations,
20:38
their greed goes up and
20:40
they care only about the next quarter and
20:43
who cares about what happens next. Maybe
20:46
the next CEO will have to take care meanwhile they're
20:48
just going to laugh all the way to the bank.
20:51
And that is the approach that I see
20:53
many of these people who complain
20:56
or who try to
20:58
redefine open source because it doesn't serve
21:00
their purpose. What we maintain doesn't
21:02
fully serve their purpose. So instead of
21:05
respecting the commons and the
21:07
shared ideas, they act
21:10
like bullies and find all
21:12
sorts of excuses to redefine it.
21:14
We've seen it happening. I've been
21:16
in free software and open source
21:18
posts in my career since I was
21:20
in my 20s and I've seen
21:23
what was happening with the early
21:25
days with the proprietary Unix
21:28
guys that were going around telling us
21:30
that this Linux thing is
21:32
never going to work, you're joking, you're
21:36
giving it away. Then they started to
21:38
be scared and started saying, hey, you're giving away
21:40
your jewels. You know why are you
21:42
doing this? You're depriving us of our life
21:45
support. The families are
21:47
going to be begging on the streets. I
21:49
remember having this conversation with a sales guy
21:51
from one of those Unix companies. And then
21:54
Microsoft coming up with their
21:56
shared source program in the early 2000s,
22:00
because they just
22:02
could not wrap their heads around
22:05
the fact that you could make money sharing
22:07
your source code. But they were forced by
22:09
the market to show at least a little
22:11
bit of what was happening behind the scenes.
22:13
They were losing deals. So
22:16
we've seen it already. They're
22:19
gonna keep on going like this, but
22:21
there is plenty of interest in maintaining it,
22:24
plenty more forces on the other side
22:26
pushing to maintain it, to
22:28
keep the bar straight, to keep
22:30
going where we're going. Because that
22:33
clarity is, so
22:35
it's such a powerful, such a
22:37
powerful instrument to be able to
22:39
say, I'm open source, therefore, I
22:42
know what I can do, I know what I cannot do, and
22:45
have that collaboration straightened up. The
22:48
legal departments, the compliance departments,
22:50
the public tenders, they all
22:52
tend to have very clear
22:55
and speedy review
22:58
processes, instead of everyone having
23:00
a different understanding of what open source
23:02
means. Yeah, we go back to the
23:04
brand, right? I'm in
23:07
Italy now and I'm surprised to see a
23:09
lot of Starbucks stores
23:11
opening. And I'm
23:14
absolutely baffled, like why is this happening?
23:16
This country has plenty of bars on every
23:18
corner, so there's a cafe with
23:20
decent coffee everywhere. Why do you need
23:22
a brand? Because people have been going
23:24
around traveling the world, they see the brand, they
23:27
recognize it, they know what they can do, they
23:29
know what they're gonna get,
23:31
and they go out there. And it's the same
23:33
with open source.
23:56
What up
23:59
friends, this episode is brought
24:01
to you by our friends at
24:03
Synadia. Synadia is helping teams take
24:06
NATS to the next level via
24:08
a global multi-cloud, multi-geo, and extensible
24:10
service fully managed by Synadia. They
24:13
take care of all the infrastructure,
24:15
management, monitoring, and maintenance for you
24:17
so you can focus on building
24:20
exceptional distributed applications. And
24:22
I'm here with VP of product and engineering, Byron Ruth.
24:24
So Byron, in the NATS
24:27
versus Kafka conversation, I hear
24:29
a couple different things. One
24:31
I hear out there, I hate Kafka with
24:33
a passion. That's quoted by the way on
24:35
Hacker News. I hear Kafka
24:37
is dead, long live Kafka. And then
24:39
I hear Kafka is the default, but
24:42
I hate it. So what's the deal
24:44
with NATS versus Kafka? Yeah,
24:46
so Kafka is an interesting one. I've
24:48
personally followed Kafka for quite some time
24:50
ever since the LinkedIn days. And I
24:52
think what they've done in terms of
24:55
transitioning the landscape to event streaming
24:57
has been wonderful. I think they
25:00
definitely were the sort of
25:02
first market for persistent data streaming.
25:04
However, over time, as people
25:07
have adopted it, they were the first
25:09
to market, they provided a solution, but
25:11
you don't know what you don't know in
25:13
terms of you need this solution, you need
25:16
this capability, but inevitably, there's also
25:18
all this operational pain and overhead
25:20
that people have come to associate
25:22
with Kafka deployments. Based on our
25:24
experience and what users and customers
25:26
have come to us with, they
25:29
would say, we are spending a
25:31
ton of money on spend on
25:33
a team to maintain our Kafka
25:35
clusters, or managed services,
25:37
or something like that. The paradigm
25:40
of how they model topics and
25:43
how you partition topics and how you
25:45
scale them is not really
25:47
in line with what they fundamentally want to
25:49
do. And that's where NATS
25:51
can provide, as we refer to
25:54
it, subject-based addressing, which has a
25:56
much more granular way of addressing
25:58
messages, sending messages, subscribing
26:00
to messages and things like that
26:02
which is very different from what
26:04
Kafka does. And the second that
26:07
we introduced persistence with our Jetstream
26:09
subsystem as we refer to it
26:11
a handful of years ago, we
26:13
literally had a flood of people
26:15
saying, can I replace my Kafka
26:17
deployments with this NATS JetStream alternative?
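For listeners unfamiliar with NATS, the subject-based addressing Byron describes can be sketched in a few lines. This is an illustrative re-implementation of the matching rules (dot-separated tokens, `*` matching one token, `>` matching one or more trailing tokens), not the actual NATS client code:

```python
# Minimal sketch of NATS-style subject matching (illustrative only, not
# the real client): subjects are dot-separated tokens; '*' matches
# exactly one token, '>' matches one or more trailing tokens.

def subject_matches(pattern: str, subject: str) -> bool:
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":                      # '>' swallows the rest...
            return len(s_tokens) > i      # ...but must match >= 1 token
        if i >= len(s_tokens):
            return False                  # subject ran out of tokens
        if p != "*" and p != s_tokens[i]:
            return False                  # literal token mismatch
    return len(p_tokens) == len(s_tokens)

print(subject_matches("orders.*.created", "orders.eu.created"))  # True
print(subject_matches("orders.>", "orders.eu.created.v2"))       # True
print(subject_matches("orders.*", "orders.eu.created"))          # False
```

The granularity comes from subscribers choosing any level of the hierarchy to listen on, rather than binding to topics and partitions fixed up front.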
26:20
And we've been getting constant inbounds,
26:22
constant customers asking, hey, can you
26:24
enlighten us with what NATs can
26:26
do? And oh, by the way,
26:28
here's all these other dependencies like
26:31
Redis and other things and some of
26:33
our services based things that we could
26:35
potentially migrate and evolve over time by
26:38
adopting NATs as a technology, as a
26:40
core technology to people's systems and platforms.
26:42
So this has been largely organic. We
26:45
never from day one, with our persistence
26:47
layer Jetstream, the intention was never to
26:49
say we're going to go after Kafka.
26:52
But because of how we
26:54
layered the persistence on top of this
26:56
really nice PubSub core NATs foundation, and
26:58
then we promoted it and we say, hey,
27:00
now we have the same semantics,
27:03
same paradigm with these new primitives
27:05
that introduce persistence in terms
27:07
of streams and consumers, the flood
27:10
gate just opened and everyone was
27:12
frankly coming to us and wanting
27:14
to simplify their architecture, reduce costs,
27:16
operational costs, get all of
27:18
these other advantages that NATs has to offer
27:21
that Kafka does not whatsoever, or any of
27:23
the other similar offerings out there. And you
27:25
get all these other advantages that NATs has
27:27
to offer. So there's someone out
27:29
there listening to this right now, they're the
27:31
Kafka cluster admin, the person in
27:33
charge of this cluster going down or not,
27:36
they manage the team, they feel the pain, all
27:38
the things, give a prescription, what should they
27:40
do? What we always recommend
27:43
is that you can go to the
27:45
NATs website, download the server, look at
27:47
the client and model a stream. There's
27:49
some guides on doing that. We also
27:51
have, Synadia has provided, basically a
27:53
packet of resources to inform
27:55
people because we get, again, so many
27:57
inbound requests about how do you compare
28:00
NATS and Kafka, and we're like, let's actually
28:02
just put a thing together that can inform
28:04
people how to compare and contrast them.
28:06
So we have a link on the website
28:08
that we can share and you
28:10
can basically go get those set
28:12
of resources. This includes a very
28:14
like lengthy white paper from an
28:16
outside consultant that did performance benchmarks
28:18
and stuff like that and discuss
28:20
basically the different trade-offs
28:22
that are made and they also
28:25
do a total cost of ownership
28:27
assessment between people who are organizations
28:29
running Kafka versus running NATs
28:31
for comparable workloads. Well
28:34
there you go. You have
28:36
a prescription. Check for a
28:38
link in the show notes
28:40
to those resources. Yesterday's tech
28:42
is not cutting it. NATS,
28:42
powered by the global multi-cloud,
28:44
multi-geo, and extensible service that
28:46
is fully managed by Synadia,
28:48
is the way of the
28:51
future. Learn more at synadia.com/changelog.
28:53
That's synadia.com/changelog. So
29:07
last year on this time, Meta released
29:11
Llama, their large
29:13
language model, and
29:15
to much fanfare and applause and they
29:18
announced it as open source. We
29:21
know a lot has transpired since then but at
29:23
the time what was your response to that
29:26
even personally or as the executive
29:28
director of the OSI? Like what were you thinking? What
29:30
were you doing in the wake of that
29:32
announcement? Well we were
29:34
already looking at open source AI in
29:37
general. We were trying to understand what
29:39
this new world meant and what the
29:41
impact was on the principles of open
29:43
source as they applied to new artifacts
29:46
that are being created in AI and
29:49
we already had come to the
29:51
conclusion that open source AI
29:53
is a different animal than open source
29:55
software. There are many
29:58
many differences. So
30:00
when they hired me two years ago, over
30:02
two years ago, that was one of the
30:04
first things that I started was to really
30:06
push the board and to push the community
30:08
to think about AI as
30:10
a new artifact that required
30:13
and deserved also a deep
30:16
understanding and a deep
30:18
analysis to see how we could transport
30:20
the benefits of open source software into
30:22
this world. But the release
30:24
of Llama 2 kind of cemented that
30:27
idea. It is a
30:29
completely new artifact because they
30:31
have released, sure they have released a lot
30:34
of information, a lot of details, but
30:36
for example, we don't know exactly what
30:38
went into the training data. And
30:41
well, Llama 2 also came out
30:43
with a license that
30:46
really has a lot of restrictions
30:48
on use. So it's having restrictions
30:50
on use is one of the
30:52
things that we don't like, I
30:54
mean, the open source definition
30:56
forbids them. You cannot have any restrictions on use.
30:59
And you know, at surface value, the
31:02
license for Llama 2 seems innocent, right? One
31:04
of the things says, well, you cannot
31:06
use Llama 2 for commercial applications if you
31:08
have more than a few million, I
31:11
don't remember exactly how many, a few
31:13
million active users, monthly active
31:16
users. Okay, you
31:18
know, maybe that's a fair limitation.
31:21
And in my mind, I was like, so
31:23
what does it mean that the government of
31:25
India cannot use it? The
31:27
government of Italy, maybe, you know, if
31:30
you want to embed this into something, well,
31:33
that's already an exclusion,
31:35
and you start to have to think about it,
31:37
you know, think about, yeah, I'm a startup,
31:39
yeah, I'm a small thing. But
31:42
what happens when you get to the six million
31:44
users when, you know, all of a sudden you
31:46
have to lower up and change completely your processes.
31:48
But then there are a couple of other restrictions inside
31:51
that license that are even more
31:53
innocent on the surface. But when you
31:55
start diving deeper, like you cannot do anything
31:57
illegal with it. Okay. All
32:00
right, so let's say if I
32:02
help someone decide whether
32:04
they can or they should have an
32:07
abortion or if I
32:09
want to have this tool used in
32:11
applications to help me, I don't
32:14
know, get refugees out of
32:16
war zones into a lot of places.
32:19
And maybe I'm considered a
32:21
terrorist organization by the government
32:24
that is using that. So am
32:26
I doing something illegal? And
32:28
so on whose side, you know, who
32:30
needs to be evaluating that? These
32:33
are licensing terms that the Open
32:35
Source Initiative really doesn't think are
32:37
useful or valuable, and they
32:39
should not be part of a license, they
32:42
should not be part of a contract
32:44
in general, and they need to be
32:46
dealt with at a separate level. So
32:48
that's what I was looking at. I
32:51
was like, oh, Llama 2, oh my God.
32:53
It's not open source because clearly this
32:55
licensing thing would never pass our approval.
32:59
And at the same time, we don't even know
33:01
exactly what open source means. Why are you polluting
33:03
this space? So I was really
33:05
upset. Yeah. So then
33:07
do you spring into action? Like what does
33:09
the OSI do? Because you're the defenders of
33:11
the definition, and here's a misuse, a huge
33:13
public misuse. Do you write a blog post?
33:15
Do you send a letter
33:17
from a lawyer? What do you do?
33:20
Luckily, we were already into this two-year process
33:22
of defining open source AI. So
33:29
we have... Actually I
33:31
was already in conversations with Meta
33:34
to have them join the process
33:36
and support the process to
33:38
find the shared definition of open source
33:40
AI. And in fact, they're part of
33:42
this conversation that I'm having with
33:45
not just corporations like Google,
33:47
Microsoft, GitHub, Amazon, et cetera.
33:50
But also we invited researchers
33:53
and academia, creators of AI,
33:56
experts of ethics and
33:58
philosophy, organizations that
34:00
deal with open in general,
34:03
open knowledge, open data, like
34:05
Wikimedia, Creative Commons, Open Knowledge
34:07
Foundation, Mozilla Foundation. And
34:10
we're talking also with experts in
34:13
these thematics, but also organizations like
34:15
digital rights groups, like the
34:17
EFF and other organizations around the
34:19
world who are helping in
34:22
this debate.
34:24
Like we had to first go through an
34:26
exercise to understand and come to a shared
34:29
agreement that AI is a different thing
34:31
than software. Then we
34:34
went through an exercise to find the
34:36
shared values that we want to have represented and
34:39
why we want to have the same
34:41
sort of advantages that
34:44
we have for software also ported over to the
34:46
AI system. And
34:49
then we have identified the freedoms
34:51
that we want to have exercised.
34:54
And now we're at the point where we
34:57
are trying to build
34:59
the list of components of
35:02
AI systems, which is
35:04
not as simple as binary
35:07
code, compiler, and source
35:09
code. So it's not as simple
35:11
as that. It's a lot more complicated.
35:14
So we're building this list of components for
35:16
specific systems. And
35:19
the idea is by the end
35:21
of the spring, early summer, to
35:24
have the equivalent of what we have now
35:26
as a checklist for legal documents for
35:28
software and have the equivalent for AI
35:31
systems and their components so
35:34
that we will know basically we have
35:36
a release candidate for an open source AI
35:38
definition. And you mentioned that,
35:41
and there's, I think you posted this
35:43
eight days ago, a new draft of
35:46
the open source AI definition version 0.0.5 is available.
35:50
I'm going to read from, I think, what you
35:52
might be alluding to, which is exactly what is
35:54
open source AI. And it says, linked
35:56
up to the HackMD document, it
35:59
says, what is open source AI? To be open
36:01
source, an AI system needs to be available
36:03
under legal terms that grant the freedoms to,
36:05
one, use the system for
36:07
any purpose and without having to ask for permission,
36:10
two, study how the system works and
36:13
inspect its components, three, modify
36:15
the system for any purpose, including
36:17
to change its output, and
36:19
four, share the system for others
36:21
to use with or without modifications for any purpose.
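Those four freedoms can be sketched as a simple checklist. This is just an illustration of the draft's logic; the names and structure below are not from any OSI artifact:

```python
# A minimal sketch of the four freedoms from the draft Open Source AI
# Definition (v0.0.5), as read out above. The structure and function
# names are illustrative, not part of any OSI document.

FREEDOMS = {
    "use": "use the system for any purpose without asking for permission",
    "study": "study how the system works and inspect its components",
    "modify": "modify the system for any purpose, including changing its output",
    "share": "share the system, with or without modifications, for any purpose",
}

def is_open_source_ai(granted):
    """A system qualifies only if its legal terms grant all four freedoms."""
    return set(FREEDOMS) <= set(granted)

print(is_open_source_ai({"use", "study", "modify", "share"}))  # True
print(is_open_source_ai({"use", "share"}))                     # False
```

On this reading, a restriction on use, like Llama 2's monthly-active-user cap, would fail the very first check.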
36:25
So those seem to be the four hinges that this
36:27
– what is open source AI is hinging upon,
36:29
at least in its current draft. Is that
36:32
pretty accurate, considering it's recent
36:34
eight days ago? Yeah. Those are
36:36
the four principles that we want
36:38
to have represented. Now,
36:41
the very crucial question is what comes
36:43
next, is what is – if you
36:45
are familiar with the four freedoms of
36:48
software, those set by the
36:50
Free Software Foundation in the late 80s. They
36:53
have one – those freedoms have one
36:55
little sentence attached to it, to the
36:57
freedom to study and the freedom to
37:00
modify. They both say access
37:02
to the source code is a precondition for this,
37:05
which really is just
37:07
that little addition. It's meant to
37:09
clarify that if you
37:11
want to study a system, if
37:13
you want to modify it, you need to
37:15
have a way to make modifications to it.
37:18
It is not just – it's
37:20
the preferred form to make modifications from the
37:22
human perspective. It's not that you give me
37:24
a binary and then I have to
37:26
decompile it or try to figure
37:28
out from reverse engineering how it works.
37:31
Give me the source code. I need the source code here
37:33
to study. For the
37:35
AI systems, we haven't
37:37
really found yet a
37:39
shared understanding or a shared
37:41
agreement on what it means
37:44
to have access to the
37:46
preferred form to make modification to an
37:48
AI system. That's the exercise
37:50
that we're running now and we – yeah. Yeah,
37:53
that's interesting. The preferred form of modification
37:55
is really interesting. Like you
37:57
said, you don't want to give a binary and expect
38:00
reverse engineering, because that's
38:02
possible, right? That's possible maybe to a small
38:04
subset. It's not the preferred route to
38:06
get to Rome. It's just like that's not the route I
38:08
want to go down, right? I want a different way. Yeah.
38:11
And you want to have a simple way. So
38:14
some licenses even have more
38:16
specific wording around defining
38:19
what source code actually means, like
38:21
the GNU GPL, which has
38:24
very clear descriptions and prescriptions about
38:27
what needs to be given to
38:30
users in order to exercise those freedoms.
38:33
Their freedoms as a user. So
38:35
for AI, yeah, for AI, it's
38:37
complicated because there are a few
38:39
new things for which we
38:42
don't even have, there are no court
38:44
cases yet. You know, I keep
38:46
repeating the same story. When software came out
38:48
for the first time, started to come out
38:50
of the labs, research labs,
38:53
it started to become a commercial artifact
38:56
that people could just sell. There
38:59
was a conscious decision to apply copyright
39:01
to it. It was not a given
39:04
fact that it was going to be using
39:07
copyright, like copyright law. So
39:09
that decision was either a lucky one, honestly,
39:12
or a well thought out one,
39:14
I don't know which of the two,
39:17
because copyright as a legal
39:19
system is very similar across the
39:21
world. And building the open
39:23
source definition, the principle of the definition,
39:26
the legal documents that go with
39:28
software for open source software and
39:30
free software, those legal documents built
39:32
on top of copyright mean that they're
39:35
very, very similarly applied pretty much
39:37
everywhere around the world. The alternatives
39:39
at the time were
39:41
conversations around treating
39:43
software as an invention and therefore
39:46
covered by patents. Patent
39:48
law is a whole different mess around
39:50
the world, with wholly different applications that
39:52
have all different terms, much
39:55
more complicated to deal with. So
39:57
for AI, we're pretty much at the same
40:00
stage where there are some new
40:02
artifacts like the model after
40:05
you train a model and that
40:07
produces weights and parameters that
40:09
go into the model. Those
40:11
models, honestly, it's not clear
40:14
what kind of legal frameworks apply to
40:16
those things. And we might be at
40:18
the same time in history where
40:20
we could have to imagine
40:22
and think and maybe suggest and recommend,
40:25
what the best course of action will be,
40:27
whether it makes sense to treat them as
40:29
copyrightable entities, artifacts,
40:32
or nothing at all, or
40:34
inventions, or any, you know,
40:37
some other rights or exclusive
40:39
rights. And the same
40:41
goes into the other big
40:44
conversation that is happening already,
40:46
but for which there is no, I don't have
40:49
a clear view of where it's going to end,
40:51
these are the conversations around the
40:54
right to data mining. And
40:57
if you follow the conversations around
40:59
ChatGPT being sued by The New
41:01
York Times, Stability
41:03
AI being sued by Getty Images, and
41:05
GitHub being sued by anonymous plaintiffs,
41:08
etc., etc. A lot of
41:10
those lawsuits hinge on
41:13
what's happening, why are
41:15
these powerful corporations going
41:17
around and crawling the internet, aggregating
41:20
all of this information and data
41:22
that we have provided, uploaded, we
41:25
society, some commercial actors,
41:27
some non-commercial actors, we have created
41:30
this wealth of data on the
41:33
internet, and they're going around collecting
41:35
it and basically making it
41:37
proprietary, building models that
41:39
they have for themselves. And on top of that,
41:42
you can already start seeing like, oh my God,
41:44
they're going to be eventually making a lot of
41:46
money out of the things that
41:48
we have created. Or even more scarily,
41:50
like sometimes I think about this myself,
41:52
I've been uploading my pictures for
41:55
many years without paying too much attention. So
41:57
there is a database out there, I'm sure
41:59
of that, I've been building a database out
42:01
there of my pictures as I was aging. And
42:04
now these pictures are being – can be used, could
42:06
be used by an evil government
42:08
or evil actor to recognize me around
42:10
the streets at any time. And
42:16
I allowed it, of course. Is that fair?
42:18
Is that not fair? Those are
42:20
big questions and there is no easy
42:22
or simple answer. Yeah. So
42:25
did you enumerate – and I missed
42:27
it – or can we enumerate the
42:29
components that you have decided
42:31
so far are part of an
42:34
AI system, the code I heard,
42:36
the training data, etc.? Yeah.
42:39
There are three main categories. Or maybe four.
42:42
Like one is – yeah, one is in
42:44
the category of data. One is
42:46
in the category of code. Then
42:48
the other category is models.
42:52
And there is a fourth category that
42:54
goes into other things, like documentation, for example.
42:57
Instructions on how to use it, or
42:59
scientific papers. In the data
43:01
parts, some of the components
43:04
are the training data, the testing
43:06
data. In the code
43:08
part goes the tooling –
43:11
like the code for the architecture, the inference code
43:13
to run the model. Things
43:15
that are written by humans in general.
43:17
You also have in there the code
43:20
to filter and set
43:22
up the data sets and prepare
43:25
them for the training. And
43:27
then in the models, you have
43:29
the model architecture, the model
43:31
parameters, including weights, other parameters,
43:33
and things like that. There
43:36
might be intermediate steps
43:39
during the training. And
43:41
the last bit is documentation,
43:43
how to use it, samples,
43:45
output. So there is
43:47
an initial list of all of these components
43:50
that have been – we worked with
43:53
or actually the Linux Foundation
43:55
worked on creating this list
43:58
specifically for generative AI and large
44:00
language models. Yeah, we're working
44:02
with them. I mean, we're using
44:04
their list as
44:06
a backdrop, or as a
44:09
starting point to move this conversation forward.
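The four categories just described might be represented like this; the component names are placeholders drawn from the conversation, not the Linux Foundation's actual list:

```python
# Sketch of the four component categories described above: data, code,
# models, and documentation. Component names are illustrative only.

categories = {
    "data": ["training data", "testing data"],
    "code": ["architecture code", "inference code", "data preparation code"],
    "model": ["model architecture", "model parameters (weights)",
              "intermediate training checkpoints"],
    "documentation": ["usage instructions", "scientific papers", "sample outputs"],
}

def category_of(component):
    """Look up which of the four categories a component belongs to."""
    for cat, items in categories.items():
        if component in items:
            return cat
    return None

print(category_of("inference code"))  # code
```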
44:11
Now, the question that we need to ask,
44:13
and I'm adding this space – if you go to
44:16
draft five you will see an attempt
44:18
to do a matrix. Basically, from this list of components
44:20
that are sixteen, or if I remember correctly seventeen
44:23
components, then in
44:25
a row next to them there
44:27
is a question: do I need it to
44:29
run it? Do I need it to use
44:31
it? I mean, to do anything, to use
44:33
it, do I need to have it in hand? Do I need it
44:35
to study it? Do I need this component
44:37
to modify the system? And we keep
44:40
referring to the system – this is
44:42
one of the important things. The
44:44
open source definition refers to the program.
44:47
And a program is never defined,
44:49
but a program, pretty much
44:51
we know what it is. An
44:53
AI system? And,
44:55
again, it's a bit complicated. The question looks
44:58
very simple on the surface, but when you
45:00
study it, it becomes complicated,
45:02
because what is an AI system, right?
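That per-component questionnaire could be captured as a small table: one row per component, with a yes/no answer for each of use, study, and modify. The rows and answers below are made-up placeholders; the real matrix in the draft has sixteen or seventeen components and the answers are still being debated:

```python
# Hypothetical sketch of the draft's component matrix. For each component
# we record whether it is needed to use, to study, or to modify the
# system. All entries here are illustrative placeholders, not OSI answers.

matrix = {
    #  component              needed to:  use    study  modify
    "training data":                     (False, True,  True),
    "inference code":                    (True,  True,  True),
    "data preparation code":             (False, True,  True),
    "model architecture":                (True,  True,  True),
    "model parameters (weights)":        (True,  True,  True),
    "documentation":                     (False, True,  False),
}

def needed_for(activity):
    """List the components required to exercise a given freedom."""
    index = {"use": 0, "study": 1, "modify": 2}[activity]
    return [name for name, flags in matrix.items() if flags[index]]

print(needed_for("use"))
```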
45:05
So we started using a definition
45:07
that has been – it's becoming quite
45:09
popular in regulations around
45:11
the world. It's
45:13
work done by the Organisation for
45:15
Economic Co-operation and Development,
45:18
the OECD. And
45:20
they have defined an AI system
45:22
in very broad
45:25
terms. And this definition is
45:27
being used in many regulations, like
45:29
the United States
45:31
Executive Order on AI –
45:34
NIST also uses it – and in
45:36
Europe the AI Act uses
45:38
it, although with a
45:41
slight, very small, minor variation.
45:43
It seems to be quite popular,
45:45
but it does have detractors, and indeed it
45:48
is quite generic. Sometimes,
45:50
if you don't read it
45:52
carefully, you could even cover
45:55
a spreadsheet. It's really bizarre. So
45:58
Let's say that hypothetically, I'm like a
46:00
medical company that has been working
46:02
on a large language model and
46:05
I have proprietary data. So
46:08
I have like readings and reports
46:10
and stuff that we've accumulated over
46:12
years and I create an
46:14
LLM based on that data that
46:16
ultimately can answer questions about medicine
46:19
or whatever. And I want to
46:21
open source that. I need to be
46:23
able to make it so it's usable,
46:26
studyable, modifiable, and shareable.
46:29
And it seems like the training data even
46:32
though that's the most proprietary part and
46:34
perhaps the most difficult part to actually make
46:36
available or sometimes impossible is
46:39
necessary not to use but
46:42
to study and modify. It seems like
46:45
so if I release the
46:47
model, the code,
46:50
all the parameters, everything we use to build
46:52
a model, everything except for like the source
46:54
original data under what you guys are
46:56
currently working on, that would not be open source AI, would
46:58
it? Honestly, that is
47:01
a very good case. Example
47:03
for why I think we
47:06
need to carefully reason around
47:08
what exactly do I need to
47:10
study? What kind of access, what
47:12
sort of access do I need?
47:15
Is that the original data set? Because
47:17
if it is the original data set, then
47:19
we're never going to have an open source AI. Right.
47:23
That's where I'm getting to. Like it's not going to happen. It's not
47:25
going to happen. Yeah. So
47:27
maybe, and this is my working
47:29
hypothesis that I threw out there, maybe
47:32
what we need is a very good
47:34
description of what that data is.
47:37
Maybe samples, maybe instructions
47:39
on how to replicate it. Because
47:42
for example, there might be data that
47:45
is copyrighted. You might have
47:47
the right under fair use or under
47:49
different exclusions of copyright. You may have
47:51
the rights to create a copy and
47:54
create a derivative like around the training.
47:57
But not to redistribute it. Because if you redistribute
47:59
it, then you start infringing. So
48:01
I think we need to be carefully
48:03
thinking about data. And the
48:05
reason why I became more
48:08
and more convinced that we don't need
48:10
the original data set is
48:12
because I've seen wonderful
48:15
mixing, wonderful remixing
48:18
of models, even
48:21
splitting of models
48:23
and recombinations of
48:25
models, creating whole new
48:28
capabilities, new AI
48:30
capabilities, without having
48:32
to retrain a single thing. So
48:36
I'm starting to believe really
48:38
that the AI weights in
48:40
machine learning, the weights in
48:42
the architecture, it's
48:44
not binary code. It's not a binary
48:46
system, not binary code that you have
48:48
to reverse engineer. If you
48:51
have sufficiently detailed instructions on how
48:53
it's been built and
48:55
what went into it, you
48:57
should be able, you might be
49:00
able to create new systems, reassemble
49:02
it, study how it works,
49:04
and execute it, and modify it. So the
49:06
preferred form to make modifications is
49:09
not necessarily going through
49:11
the pipeline or rebuilding the whole system
49:13
from scratch, which for many
49:15
reasons may be impossible. I
49:18
do like the idea of a small
49:20
subset of the data set that's
49:22
anonymized or sanitized
49:24
in some way, shape, or form. It's like this is
49:26
the acceptable sample
49:29
amount required for the study
49:31
portion or the modification portion. Yeah.
49:34
It could be the schema for example. It could
49:36
be the... Right. Provide your own
49:38
data in here if you can, which you
49:40
can obviously find other ways to use
49:43
artificial intelligence to generate more data. So that's
49:45
the whole thing, right? But I
49:47
feel like that's acceptable
49:50
to me to provide some
49:52
sort of sampling or as you said, the schema. I think that
49:54
makes sense to me. Yeah. Yeah.
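The "schema plus samples" idea floated here could look something like a small machine-readable data card. Everything below (field names, format) is a hypothetical illustration, not a format the OSI has proposed:

```python
# Hypothetical "data card" standing in for a training set that cannot be
# redistributed: it ships the schema, provenance, replication notes, and
# synthetic sample rows instead of the original records.

data_card = {
    "name": "clinical-notes-corpus",  # placeholder name
    "schema": {
        "note_text": "string",
        "specialty": "string",
        "year": "integer",
    },
    "provenance": "de-identified clinical reports, 2005-2023",
    "how_to_replicate": "collect notes matching the schema; filter out PII",
    "samples": [  # synthetic rows, not real patient data
        {"note_text": "Patient reports mild headache.",
         "specialty": "neurology", "year": 2019},
    ],
}

def matches_schema(row, schema):
    """Check that a replacement row has the documented fields and types."""
    types = {"string": str, "integer": int}
    return set(row) == set(schema) and all(
        isinstance(row[key], types[typ]) for key, typ in schema.items()
    )

print(matches_schema(data_card["samples"][0], data_card["schema"]))  # True
```

A consumer who cannot get the original data could then validate their own replacement data against the published schema.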
49:57
The research is also going in this direction
49:59
with... data cards
50:01
and model cards, lots
50:03
of that
50:23
you put up in a picture in a museum
50:26
and nobody can do anything with it.
50:28
It needs to be practical. I
50:31
keep repeating the open source definition,
50:33
how success because it
50:35
enabled something practical and it
50:37
has success because other people
50:40
have written it, other people have decided
50:42
to use it. If you
50:44
should keep on insisting from your pedestal that
50:46
you shall do this
50:49
and that, you may
50:51
not find enough of a
50:53
crowd to follow you. If
50:57
no one is using it, what's the point? What's
51:22
up friends? I'm here with one of my
51:24
new friends, Zane Hamilton from CIQ. Zane,
51:27
we're coming up on a hard deadline with
51:29
the CentOS end of life later this year
51:31
in July and there are still folks out
51:34
there considering what their next move should be.
51:36
Then last year we had a bunch of
51:38
change around Red Hat Enterprise Linux that makes
51:40
it quote less open source in the eyes
51:43
of the community, with many saying, RHEL is
51:45
open source but where is the source and
51:47
why can't I download and install it? Now,
51:49
Rocky Linux is fully open source and CIQ
51:52
is a founding support partner
51:54
that offers paid support for
51:57
migration, installation, configuration, training, etc.
52:00
But what exactly does an enterprise
52:02
or a Linux user get
52:05
when they choose the free and open source Rocky
52:07
Linux and then ultimately the support from CIQ if
52:09
they need it? There's a lot going on in
52:11
the enterprise Linux space today. There's a lot of
52:14
end of life of CentOS. People are making decisions on
52:16
where to go next. The standard of what enterprise Linux
52:18
looks like tomorrow is kind of up in the air.
52:21
What CIQ is doing is we're trying to help
52:23
those people that are going through these different decisions
52:25
that they're having to make and how they go
52:28
about making those decisions. And that's where our expertise
52:30
really comes into play. A lot of people
52:32
who have been through very complex Linux migrations,
52:34
be it from the old days of integrating
52:36
from AIX or Solaris onto Linux and
52:38
even going from version to version because to
52:40
be honest enterprise Linux version to version is
52:42
not always been an easy conversion. It
52:45
hasn't been and you will hear that from us.
52:47
Typically the best idea is to do an in place
52:49
upgrade. Not always a real easy thing to do,
52:51
but what we've done is we have started looking at
52:53
and securing a path of how can we actually
52:55
go through that? How can we help a customer who's
52:57
moving from CentOS 7 because of the end of
52:59
life in July of this year? What
53:01
does that migration path look like and how can we help? And
53:03
that's where we're looking in ways to help automate from an admin
53:05
perspective. If you're working with
53:07
us, we've been through this. We can actually go through and
53:10
build out that new machine and do a lot of the
53:12
back end manual work for you so that all you really
53:14
have to do at the end of the day is validate
53:16
your applications up and running in the new space. And then
53:19
we automate the switch over. So we've worked through
53:21
a lot of that. There's also decisions you're making
53:23
around. I'm paying a very large bill
53:25
for something. I'm not necessarily getting the most value out
53:28
of. I don't want to continue down that path. We
53:31
can help you make that shift over to an
53:33
open source operating system, Rocky Linux and help derive
53:35
what's next. Help you be
53:37
involved in a community and help make sure
53:39
that that environment you have is stable. It's
53:41
going to be validated by the actual vendors
53:43
that you're using today. And that's really where
53:46
we want to be as a partner from
53:48
not just an end user perspective, but in
53:50
an industry perspective. We are
53:52
working with a lot of those top tier vendors out there
53:54
of certifying Rocky, making sure that it gets pushed back to
53:56
the R.E.S.F. Making sure that we
53:58
can validate that everything is there and secure that
54:00
needs to be there and helping you on
54:03
that journey of moving. And that's where we,
54:05
CIQ, really show our value on top of
54:07
an open source operating system is we have
54:09
the expertise. We've done this before. We're in
54:12
the trenches with you and we're defining that
54:14
path of how to move forward. Okay, ops
54:16
and sys admin folks out there, what are
54:18
you choosing? CentOS is end of life soon.
54:20
You may be using it, but if you
54:23
want a support partner in the trenches with
54:25
you, in the open source trenches with you,
54:27
check out CIQ. The founding
54:29
support partner of Rocky Linux.
54:31
They've stood up the RESF,
54:34
which is the home for
54:36
open source enterprise software. The
54:38
Rocky enterprise software foundation. That
54:40
is they've helped to orchestrate
54:42
the OpenELA, a collaboration
54:44
created by and upheld by
54:46
CIQ, Oracle and SUSE. Check
54:48
out Rocky Linux at rockylinux.org,
54:51
the RESF at resf.org. And
54:53
of course, if you need
54:55
support, check out our friends
54:57
at ciq at ciq.com.
55:12
Fully acknowledging that it's a work in progress
55:14
and you're not done. Given
55:16
your current mental model of
55:18
the definition as it is working, are there systems
55:20
out there today that you would rubber stamp and
55:23
say like, this is open source AI, I'm thinking
55:25
of perhaps Mistral has a bunch of stuff going
55:27
on and they're committed to open and transparent, but
55:29
I don't know exactly what that means for them.
55:32
Have you looked at anything and do you
55:34
have like things you're comparing against as you
55:36
build to make sure that there's a set
55:38
of things that exist or could exist that
55:41
are practical? Not yet. I know
55:43
that there is, we have an
55:45
affiliate organization called Eleuther
55:47
AI. They are a
55:49
group of researchers. They recently incorporated
55:52
as a 501(c)(3) nonprofit
55:54
in New York state. And
55:57
from the very beginning, they've been doing
55:59
a lot of research in the open,
56:02
the reasoning, data sets, and structure, and research
56:05
papers, models, and weights, and everything like
56:08
that. So I'm really leaning
56:10
a lot on them to shine
56:12
a light on how this can be done. But
56:15
I don't want to be too
56:17
restricted in my mind. They are
56:20
very open with an open science
56:22
and open research mentality. I
56:25
think that there is an open
56:27
AI and open source AI
56:29
that is not as equally
56:32
open necessarily, but it can
56:35
still practically have meaningful impact.
56:37
It can generate that positive
56:39
reinforcement of innovation and
56:41
permissionless collaboration, et cetera. So
56:44
yes, I'm leaning on EleutherAI,
56:47
but I'm also very open. And
56:49
I'm sure there will be other
56:51
organizations, other groups, as
56:54
we go and elaborate more on what
56:56
we actually need, on what is the preferred
56:58
form to make modifications to an AI
57:00
system, we're going to discover more. So
57:03
no open source AI
57:05
yet. So there's no rubber stamp for
57:08
anything out there currently. Well, I mean,
57:10
I said I could rubber stamp Pythia
57:12
and EleutherAI, but I don't want
57:15
to say that that's necessarily the only thing.
57:17
Right, there might be more stuff. And again,
57:19
those are the ones, the guys that I,
57:21
because I know how they work. Yesterday
57:24
or the other day, OLMo was
57:26
released by the Allen Institute for AI.
57:28
And that seems to be also
57:30
quite openly available for models with
57:33
science behind it, et cetera. I
57:35
haven't looked at their licenses and
57:37
I haven't looked at it carefully.
57:39
So I can't really tell. It
57:41
might very well be an
57:43
open source AI system. I
57:46
was trying to get to a definitive, really. Is
57:48
there not a stamped open
57:50
source AI out there yet?
57:54
Well, I can tell you what is not. Llama
57:56
2 is not. OpenAI
57:58
is not. ChatGPT is out. All right, a
58:00
deny list more than a permit list. Yes. So
58:02
I suppose one of the questions which maybe
58:05
is obvious but I got to ask it
58:08
is what is the benefit if
58:10
I'm building a model and I'm releasing a new
58:12
AI? What is the benefit to
58:15
it being open source? To
58:17
meet this open source AI definition, what is
58:19
the benefit to its originator?
58:22
And then obviously it's humanity, I kind of get
58:24
that, but like what is the
58:26
benefit? It's pretty easy to kind of clarify that with
58:29
software, right? We see how that's working
58:31
because we've got 30 years
58:33
of history or more in a lot of
58:35
cases. We've got track record there. We
58:38
don't have track record here. It's still
58:40
early pioneer days. What's
58:42
the benefit? That is a very good
58:44
question and I don't
58:47
have an answer for it. I mean I
58:49
know the benefit for humanity. I know the
58:52
benefit for the science of it and this
58:54
is what really those
58:56
benefits are what triggered the internet.
58:58
Like if software started to come
59:00
out of the labs without
59:03
the definition of free software, without
59:05
the GPL license, without the
59:07
BSD license, I
59:09
don't think we would have had such a
59:11
fast evolution of software
59:14
computer science. We would
59:16
not have the internet that we see
59:18
today if everyone had
59:20
to buy a license for Solaris from
59:22
Sun, from Oracle, etc.
59:26
If a data center would have to, you know,
59:28
you would have to go and
59:30
call the Sun Microsystems or IBM's
59:32
sales team to be able to
59:35
build a data center instead of
59:37
using just boxes and slapping Linux
59:39
and Apache Web Server on it.
59:43
We would have had a completely different
59:45
history of the digital world, of the
59:47
past. I mean, completely different. So I
59:49
can see the benefit for society and
59:51
science. For some of these
59:53
corporations, I'm assuming that they have made
59:56
some of their calculations on
59:58
stopping the competition, or
1:00:01
creating competitive advantages, maybe
1:00:03
in pure Silicon Valley approach,
1:00:05
like get more users, we'll figure out
1:00:08
the business model later. There
1:00:10
is some of that going on, likely,
1:00:12
most likely. But I can't,
1:00:14
I haven't had that conversation yet with
1:00:17
any smart people I
1:00:19
know thinking about the business models
1:00:21
behind this, the possible ways of privatizing,
1:00:24
or I don't know, finding revenue
1:00:26
streams maybe on top of a properly
1:00:29
open source model. Do
1:00:31
you think that they're becoming commoditized
1:00:33
if we specifically talk about these
1:00:35
large language models? If
1:00:37
we call AI that for now, recognizing
1:00:39
there's an umbrella term and there's other
1:00:41
things that also that represents, do
1:00:43
you think that they are becoming commoditized
1:00:45
and will continue to enough so
1:00:48
that open source can keep up
1:00:50
with proprietary in terms of quality,
1:00:53
or even surpass just
1:00:55
because of the number of smart people releasing
1:00:57
things? I
1:00:59
don't know, that's what I'm asking honestly. What
1:01:01
are your thoughts on it? Honestly, recently I
1:01:03
saw this new system that
1:01:05
it's a text to speech system, and
1:01:08
they built it, this team of developers
1:01:11
from a company called Collabora. They
1:01:13
built this system by
1:01:15
splitting a system from OpenAI,
1:01:18
another from either an Epic
1:01:20
or, now I don't remember
1:01:22
exactly, but they split
1:01:24
an AI system. They took it
1:01:26
and they split it, the inputs
1:01:29
from the outputs, and they attached another
1:01:31
model of their own training with
1:01:33
small data sets, and they built a brand new
1:01:35
thing. I think, I mean,
1:01:38
this is the kind of stuff that
1:01:40
is inspiring, like at one point, there's
1:01:42
gonna be, I'm sure that the quick
1:01:44
evolution of this discipline would make it
1:01:47
so that smaller teams with smaller amount
1:01:49
of data would be able to create
1:01:51
very powerful machines. And maybe
1:01:54
the advantages of these
1:01:56
large corporations are now
1:01:58
deploying, delivering, and
1:02:00
distributing openly accessible
1:02:03
AI models, maybe in
1:02:06
their mind, having optimized hardware, cloud
1:02:08
resources that they can sell, maybe
1:02:11
that's where they're going with one of their
1:02:13
revenue streams that they imagined
1:02:15
would be coming from.
1:02:18
Yeah, that is exciting. I did see, I think
1:02:20
it was like, Codeium AI
1:02:23
just recently announced a
1:02:25
model that beats
1:02:27
DeepMind on code generation, according
1:02:30
to benchmarks that I haven't looked at, as
1:02:32
well as Copilot, and that's
1:02:34
from a smaller player. I'm not sure
1:02:36
if that's open or closed or what, but it
1:02:39
is kind of pointing towards like, okay, there's
1:02:42
significant competition. And like you
1:02:44
said, remixing and the ability
1:02:46
to combine and change,
1:02:49
and even in some cases, swap out
1:02:52
and take the best results, that we
1:02:54
will have a vibrant ecosystem
1:02:56
of these things. And I think open
1:02:58
source is the best model
1:03:00
for vibrant ecosystems. So
1:03:03
that rings true with me. Doesn't
1:03:06
mean it's right, but it sounds right. Yeah,
1:03:09
this is a tough one. This is really
1:03:11
a tough nut to crack, really. I mean,
1:03:13
even at the forums you
1:03:15
have, I believe you're calling it
1:03:17
the Deep Dive, right? It's Deep
1:03:19
Dive: AI. And this
1:03:21
is the place where you're hoping that many
1:03:24
folks can come and organize. You
1:03:26
say it's the global multi-stakeholder effort
1:03:28
to define open source AI, and
1:03:30
that you're bringing together various
1:03:33
organizations and individuals to collaboratively write a
1:03:35
new document, which is what we've been
1:03:37
talking about, directly and indirectly.
1:03:40
Who else is invited to this? Like, how does
1:03:42
this get around? How do people know about
1:03:44
this? Who is invited to the
1:03:47
table to define or help define? Is
1:03:49
this an open way to define
1:03:51
it? What is happening
1:03:53
here? Who's participating? But at
1:03:56
this point, it's now public, so
1:03:58
anyone can really join the forum
1:04:00
and can join me in the
1:04:03
biweekly town hall
1:04:05
meetings. So that part
1:04:07
is public and everybody's
1:04:10
welcome to join. We're going to keep
1:04:12
on going with public reports
1:04:15
and small working groups with
1:04:17
people that we're picking, but only
1:04:20
because of agility in the collaboration.
1:04:22
We want to have, we're picking
1:04:24
people that we know of or that
1:04:26
we have been in touch with coming
1:04:29
from a variety of experiences. Say
1:04:31
we're talking to creators of AI
1:04:34
in academia, large corporations,
1:04:36
small corporations, startups, lawyers,
1:04:39
people who work with regulators,
1:04:42
think tanks and lobbying organizations.
1:04:44
We're talking to experts in
1:04:47
other fields like ethics and
1:04:49
philosophy. We keep on chatting
1:04:53
with, we have identified six
1:04:56
stakeholder categories, and we're
1:04:58
trying to have representation that is also
1:05:01
geographically distributed from North
1:05:04
America, South America, Asia,
1:05:07
Pacific, Europe, Africa.
1:05:10
Last year we had conversations
1:05:12
with about 80 people from
1:05:14
representatives of all these categories
1:05:16
in a private group just
1:05:18
to get things kick-started. And
1:05:21
we have had meetings in
1:05:23
person starting in June
1:05:25
in San Francisco and
1:05:28
July in Portland and
1:05:30
then other meetings in Bilbao
1:05:32
in Europe like we had meetings
1:05:34
in person with some of these
1:05:37
people during different conferences. But
1:05:40
starting this year we're going to be, this first half
1:05:42
of the year we're going to be super public. We're
1:05:44
going to gather, we're going to
1:05:46
be publishing all the results of the working groups
1:05:49
and we're going to be taking comments
1:05:51
on the forums and then we're
1:05:53
going to have an in-person meeting. We're
1:05:56
aiming late May or June with
1:05:59
at least two representatives for each
1:06:01
of the stakeholder categories to
1:06:03
get in a room and put
1:06:05
together
1:06:07
the last pieces of
1:06:10
the definition based on the comments, and
1:06:12
come out of that meeting with
1:06:14
a release candidate, something that we feel
1:06:16
like there is endorsement from a dozen
1:06:19
different organizations across the world and
1:06:21
across experiences. Then we're
1:06:23
going to use that, and we're raising funds for it,
1:06:26
to have at least four events in
1:06:29
different parts of the world between June
1:06:31
and the end of October. One
1:06:34
of these events is definitely going to be at All
1:06:36
Things Open, where we're
1:06:38
going to gather more potential
1:06:40
endorsements. As soon as
1:06:42
we get to five endorsements from
1:06:45
each of the different categories. I
1:06:48
think we're going to be able to say this is version one. We
1:06:51
can start working with it and see
1:06:53
where we land and maybe next year
1:06:55
we're going to have, by that time, I
1:06:57
mean by October, November, the board
1:06:59
will also have a process for
1:07:01
the maintenance of this definition
1:07:04
because most likely we're going to have
1:07:06
to think about how to maintain it,
1:07:08
how to respond to
1:07:11
challenges, whether they're technological
1:07:13
or regulatory challenges,
1:07:16
or just we missed the mark and
1:07:19
we realize later we'll have to fix it. Yeah,
1:07:23
kind of want to backtrack slightly
1:07:25
I guess as I hear you
1:07:27
talk about this and kind of coming to a
1:07:30
version, at last, sometime this year based
1:07:32
upon certain details. Like when I ask you and
1:07:34
I know this is your response and not so
1:07:36
much a corporate response in
1:07:38
terms of what's the benefit of
1:07:40
being an open source artificial intelligence, like what's
1:07:43
the benefit of being open source AI? Like
1:07:46
all this effort to define it and
1:07:49
then what if there's not that many people
1:07:51
who really want to be defined by it?
1:07:53
Like I guess that's an interesting consideration is
1:07:55
that all this effort to define it but
1:07:58
maybe there is no real benefit, or
1:08:00
the benefit is unclear and then folks just,
1:08:03
it's almost like saying, it's definitely
1:08:05
a line, right? It's like, okay, everything is
1:08:07
basically not, and there's very few that
1:08:09
are, basically, at least initially. Maybe as iteration
1:08:12
and progress happens, more and more will
1:08:14
see a benefit, and maybe that benefit permeates
1:08:17
more clearly than we can see it now. Yeah.
1:08:20
I don't want to think about that. I
1:08:23
don't want to think about that. No, it's
1:08:26
one of those things. Like if you start anything
1:08:28
thinking you are the winner, you're
1:08:31
probably going to fail, right? So it's
1:08:33
not one of the outcomes that I
1:08:36
see. There is a tremendous amount of pressure. I
1:08:38
mean, it's unlikely that that's going to
1:08:40
happen. That's what I want to say. I
1:08:43
have had a lot of pressure
1:08:46
from corporations, regulators.
1:08:50
The AI act has a provision in
1:08:52
there, a text that
1:08:54
provides some exclusions to
1:08:57
the mandates of the law for
1:08:59
open source AI. There is
1:09:02
no definition in there. So regulators
1:09:04
need it. Largest corporations
1:09:06
need it. Researchers
1:09:08
need some clarity. I
1:09:12
hear a lot of researchers,
1:09:14
they want data. And
1:09:17
they want data. It doesn't mean that they
1:09:19
necessarily want the original data. Some
1:09:22
of them do not. But they do
1:09:24
want to have good data sets. And that
1:09:26
only comes if there is clarity
1:09:29
about the boundaries of what
1:09:31
they are allowed to do to accumulate data.
1:09:35
Because data becomes very, very messy very
1:09:37
quickly. Privacy law,
1:09:39
copyright law, trade secrets,
1:09:42
illegal content, content that is illegal in
1:09:44
some parts of a country, or in
1:09:46
some countries and not in other countries. It
1:09:50
becomes really, really
1:09:52
messy very quickly. And
1:09:54
researchers don't have a way to deal with
1:09:56
it right now. They need help.
1:10:00
I agree that you should keep doing it. I didn't mean
1:10:02
to sound like it should be a failure. Sometimes
1:10:04
I think it might be beneficial to think about failure at the
1:10:06
beginning because it's like, well, you got to consider
1:10:09
your exit before you go in, in a way. I'm not
1:10:11
saying you should do that, but I'm glad you
1:10:13
are defining it. It does need to be defined.
1:10:15
I didn't mean to be necessarily like what if,
1:10:17
but there's a lot of effort going
1:10:19
into this. I can see how a
1:10:22
lot of your attention is
1:10:24
probably spent simply on defining
1:10:26
this and working with all the folks, all
1:10:29
the stakeholders, all the opinion makers, etc.
1:10:32
that are necessary to define
1:10:34
what it is. It's a
1:10:36
lot of work. It's all work.
1:10:38
You're absolutely right. This is taking
1:10:40
most of my attention. And yes,
1:10:42
I do see a couple of
1:10:44
failure options. We can fail if
1:10:46
we're late and if we get it wrong. But
1:10:50
for getting it wrong, the
1:10:52
fact that it's defined with a version
1:10:54
number, I think we can
1:10:56
fix it over time. And we really shouldn't
1:10:59
be expecting to have a perfect first time.
1:11:03
It's changing too quickly, the
1:11:05
whole landscape. And the other one, getting
1:11:08
in late, is also part of
1:11:10
the reason why I'm pushing to get
1:11:12
something out of the door because
1:11:15
a lot of pressure
1:11:17
exists in the market to have
1:11:19
something. Everyone is
1:11:21
calling their models
1:11:23
open source AI, recognizing
1:11:26
that there is value in
1:11:28
that term implicitly. But if there is
1:11:30
no clarity, it's going to be diluted
1:11:32
very, very, very rapidly. Before
1:11:34
Jared and I got on this call, we had a loose
1:11:37
discussion about one thing, and I quickly stopped talking because we have
1:11:39
a term. I think it's
1:11:41
pretty well known in broadcasting and podcasting:
1:11:43
don't waste tape. And
1:11:46
I didn't want to share my deep sentiment, although I
1:11:48
loosely mentioned it to Jared in our pre-call just kind
1:11:51
of 10 minutes before we met up. It was
1:11:54
basically what is at stake. I
1:11:57
know we talked just loosely here about
1:11:59
failure as an option, and what
1:12:01
is failure, and is it iterative on the
1:12:03
version numbers you just mentioned, but is there
1:12:05
a bigger concern at stake if the
1:12:08
definition that you come up with collectively
1:12:11
is not perfectly suited? Like does the
1:12:13
term open source in software
1:12:15
now, is the term now fractured because
1:12:18
the arbiter of the
1:12:20
term open source has
1:12:22
not been able to carefully and
1:12:24
accurately define open source AI?
1:12:27
Is there a bigger loss that could
1:12:29
happen? And I'm sorry to have to ask that
1:12:31
question, but I have to. Yeah,
1:12:34
you don't want me to sleep tonight, huh? Sorry
1:12:38
about that. I think
1:12:40
so far we've been able to win,
1:12:43
in quotes, win in
1:12:45
the public when we push back
1:12:47
on the term of open source
1:12:49
because it's pretty well accepted, right?
1:12:53
And whether, and I'm going
1:12:55
to say this, but whether we like it or not, OSI
1:12:57
has been the guardian,
1:12:59
so to speak, of that term. Some
1:13:01
say you've taken that right. I
1:13:04
think you've been given that right over decades
1:13:07
of trust. And then in
1:13:09
some cases there's some mistrust, and that's not so
1:13:11
much me. It's just out there and not
1:13:13
everybody's been happy with every decision you come up with
1:13:16
and that's going to be the case, right? If you're
1:13:18
not making some enemies, you're not doing something right,
1:13:20
I suppose, in the world, because not everybody's going to like
1:13:22
your choices, right? But I
1:13:24
think, I wonder that. I
1:13:26
personally wonder if you can't define this well,
1:13:29
does the term open source change or
1:13:32
does it become open to change? There
1:13:35
is that risk, but that's one of
1:13:37
the reasons why I'm being extra
1:13:39
careful to make sure that everyone's
1:13:42
involved and has a voice
1:13:44
and has a chance to voice their opinion
1:13:46
and all of these opinions are recorded
1:13:48
publicly so we can go back
1:13:50
and point out the
1:13:52
place where we made a bad choice and
1:13:55
be able to correct it or not. Yeah.
1:13:59
Stefano, real quick. What's the number one
1:14:01
place people should go if they were to
1:14:03
get involved, like the URL? Here's
1:14:06
how you can be part of that discussion:
1:14:08
discuss.opensource.org. There we go.
1:14:10
Yes, well, we're gonna be having, you know,
1:14:12
all of our conversations there. All right,
1:14:14
you heard it. That'll be in the show notes so
1:14:17
if you are interested in this, even if you just want to
1:14:19
listen and be lurking and watching
1:14:21
as it makes progress, definitely hit that up
1:14:23
if you want your voice heard and you
1:14:25
want to help Stefano and his team make
1:14:28
this definition awesome and encompassing and
1:14:30
successful. Yes, I think the
1:14:32
more voices the better, and the earlier on the better, so
1:14:35
that we can have a great open
1:14:37
source AI definition. Thank you. Thanks
1:14:39
Stefano. Appreciate your time. Thank you so much
1:14:41
Thank you. It's
1:14:46
a big question mark what the future
1:14:49
of the Open Source AI Definition will
1:14:51
be. Well, the first draft of the
1:14:53
Open Source AI Definition is linked in
1:14:55
the show notes. I highly encourage you
1:14:58
to check this out, dig in, learn
1:15:00
about what's happening here, voice your opinion
1:15:02
if you have a strong opinion, but
1:15:04
definitely pay attention, as you can hear
1:15:06
with some of the discomfort with
1:15:09
the questions we asked about what
1:15:11
happens if the open source AI
1:15:13
definition falls a little
1:15:16
short, or what the ramifications
1:15:18
or potential impact might be. I
1:15:20
think we all need to pay
1:15:22
close attention to how this definition
1:15:24
evolves and lands. Links are
1:15:26
in the show notes, so check them out.
1:15:28
And again, thank you to Stefano, because he
1:15:31
did have a cold during
1:15:33
this conversation and he
1:15:35
powered through because he knew this was an important
1:15:37
conversation to have here on this podcast and to
1:15:39
share with you. So thank you, Stefano.
1:15:42
Up next on the pod is our friendly-turned-friend
1:15:45
Jamie Tanna coming up on Friends, and
1:15:48
next week it's about making your shell
1:15:50
magical with Ellie Huxtable, talking about
1:15:52
Atuin. Check it out at atuin.
1:15:55
sh. Okay, once again a
1:15:57
big thank you to our friends and our partners
1:16:00
at fly.io, our friends
1:16:02
at typesense.org, and
1:16:04
of course, our friends at
1:16:06
sentry.io. Use the code
1:16:08
changelog to get $100 off the team
1:16:11
plan. You can do so at sentry.io.
1:16:14
Okay, BMC, those beats are banging. We have
1:16:16
that album out there, Dance Party. I don't
1:16:19
know about you, but I've been dancing a
1:16:21
lot more because that album has been on
1:16:23
repeat on all my places that I
1:16:25
listen to music. So I've been dancing a lot. Dance
1:16:28
Party is out there. Check it out
1:16:30
at changelog.com/beats. That's
1:16:33
it, the show's done. Thank you for tuning in. We'll
1:16:35
see you soon.