Episode Transcript
0:15
Welcome back, friends. This is
0:17
the Changelog. This week we
0:19
were joined by Stefano Maffulli,
0:21
the Executive Director of the
0:23
Open Source Initiative, the OSI.
0:25
The Open Source Initiative is
0:28
responsible for representing the idea and
0:30
the definition of Open Source globally.
0:32
Stefano shares the challenges they face
0:34
as a US-based organization with
0:36
a global impact. We discuss the
0:38
work Stefano and the Open Source
0:40
Initiative are doing to define Open
0:43
Source AI and why we
0:45
need an accepted and shared definition.
0:47
Of course, we also talk about
0:49
the potential impact if a poorly
0:51
defined Open Source AI emerges from
0:53
these efforts. Also, a mention that
0:56
Stefano was a little under the
0:58
weather for this conversation, but he
1:00
powered through because of how important
1:02
this topic is. A massive thank
1:04
you to our friends and our
1:07
partners at Fly.io, the home of
1:09
changelog.com. It's simple: launch apps
1:11
near users. They transform containers into
1:13
micro VMs that run on their
1:15
hardware in thirty-plus regions on
1:18
six continents. Launch an app
1:20
for free
1:22
at fly.io.
1:34
What up, friends? This episode of
1:37
the Changelog is brought to you by our
1:39
friends over at Vercel, and I'm
1:41
here with Lee Robinson, VP
1:43
of Product. Lee, you know the
1:45
tagline for Vercel: Develop, Preview, Ship,
1:47
which has been perfect, but now
1:49
there's more. After the ship process
1:51
you have to worry about security,
1:53
observability, and other parts
1:55
of just running an application in production.
1:57
What's the story there? What's beyond
1:59
shipping? Yeah, you know,
2:01
when I'm building my side projects or when I'm
2:04
building my personal site, it often looks like develop
2:06
preview ship. I try out some new features, I
2:08
try out a new framework, I'm just hacking around
2:10
with something on the weekends. Everything
2:12
looks good, great. I ship it, I'm done. But
2:14
as we talk to more customers, as we've grown
2:16
as a company, as we've added new products, there's
2:19
a lot more to the product portfolio
2:21
of Vercel nowadays that goes beyond that
2:23
experience. So when you're building larger, more
2:26
complex products and when you're
2:28
working with larger teams, you want to
2:30
have more features, more functionality. So tangibly,
2:32
what that means is features like our
2:34
Vercel firewall product to help you be
2:36
safe and to have that layer of
2:38
security. Features like our logging and observability
2:40
tools so you can understand and observe
2:42
your application in production, understand if there's
2:45
errors, understand if things are running smoothly
2:47
and get alerted on those. And
2:49
also then really an expansion of our
2:51
integration suite as well too, because you
2:53
might already be using a tool like
2:55
a Datadog or you might already be
2:58
using a tool at the end of
3:00
this software development lifecycle that you want
3:02
to integrate with to continue to scale
3:04
and secure and observe your application and
3:06
we try to fit into those as
3:08
well too. So we've kind of continued
3:10
to bolster and improve the last mile
3:12
of delivery. That sounds amazing.
3:15
So who's using the Vercel platform like that?
3:17
Can you share some names? Yeah,
3:19
I'm thrilled that we have some
3:21
amazing customers like Under Armour, Nintendo,
3:24
Washington Post, Zapier who use
3:26
Vercel's frontend cloud to not only help
3:28
scale their infrastructure, scale their business and
3:31
their product, but then also enable their
3:33
team of many developers to be able
3:35
to iterate on their products really quickly
3:37
and take their ideas and build the
3:40
next great thing. Very cool.
3:42
With zero configuration for over 35 frameworks,
3:44
Vercel's frontend cloud makes it easy for
3:46
any team to deploy their apps. So
3:49
you can get started with a 14 day free
3:51
trial of Vercel Pro or get
3:53
a customized enterprise demo from their
3:55
team. Visit vercel.com/changelogpod
3:58
to get started. That's
4:01
vercel.com/changelogpod.
4:03
Well,
4:33
Stefano, it's been a while. Actually,
4:35
never, which is a good thing, I
4:37
suppose, but now we're here. Fantastic. We
4:39
were at All Things Open recently, and
4:41
we tried to sync up with you,
4:43
but we missed the message. And
4:45
so we were like, we got to get you on the podcast. And obviously,
4:47
you know, this show, the Changelog,
4:49
was born around open source. And
4:52
I kind of find it strange and
4:55
sad that we've never had anybody from
4:57
the open source initiative on this podcast.
4:59
It's – I'm glad you're here to
5:02
change that, so welcome. Thank
5:04
you. Thank you for having me. It's a pleasure.
5:07
Sorry we missed it. We missed
5:09
each other in North Carolina. It
5:11
was a great event. Oh, man. We love
5:13
All Things Open. We love Todd and their
5:15
team there. We think All Things Open is
5:18
the place to be at the end
5:20
of the year. If you're a fan of open
5:22
source, you're an advocate of open
5:24
source, and just the way that it's permeating all
5:27
the software, right? It's won. Open source has won.
5:29
And now we're just living in a hopefully
5:31
mostly open source world, right? Absolutely.
5:35
Absolutely. I mean, just last week
5:37
there was an article published that
5:41
estimated the value of open source
5:43
software as a whole. The
5:46
numbers are incredible. These
5:48
researchers from Harvard Business
5:51
School went and looked
5:53
at the value of open source as it
5:55
is consumed or produced.
6:00
They put dollar numbers on it. I'm
6:02
curious, though, because I
6:04
don't know how an analyst would even
6:06
approach this; it seems like it would
6:08
be partly a guess. How do you
6:11
quantify the value of open source?
6:13
How do you value, how do you
6:15
quantify the value of open source? Like,
6:17
what were the metrics?
6:19
Did they count lines of code, or did
6:22
they estimate the hours that it
6:24
would take to rewrite from scratch
6:26
all the software that is in use?
6:28
They used data sets that
6:31
are debatable already, and with
6:33
some, with some of those counts,
6:35
and using those, together with other estimates
6:37
of the time it would take
6:40
to replicate all of the open source software
6:42
that is available, they put the
6:44
number at around eight
6:46
point eight trillion dollars. Wow.
6:48
I would, I would have just said all the dollars.
6:51
Really? Personally, I would have said all the dollars.
6:53
Yeah, it's a huge number. All the dollars, right?
6:55
Doesn't every dollar today really depend
6:57
on open source at some layer?
6:59
So,
7:02
Like, really, it could be all the dollars. Well,
7:04
all right. It's an impressive number, and it's
7:06
really hard to picture how much, how big
7:08
it is. I went and had to
7:10
look it up, and it said
7:12
it's three times as much as
7:14
Microsoft's market cap. And,
7:17
ah, it's larger than the whole
7:19
United States budget, like the twenty
7:21
twenty-two budget of the
7:23
United States, which includes mandatory spending,
7:25
things like Medicare: six point
7:27
three trillion. So yeah, these are
7:29
trillions we're talking about, right?
7:31
It's hard to get your head around
7:33
trillions of anything,
7:36
really. We're not even used to
7:38
trillions; the numbers you usually hear
7:40
are the billions of
7:42
the biggest banks and CEOs.
7:45
I was thinking about it earlier:
7:47
it's eight point eight trillion,
7:49
and I started rounding it up to
7:51
nine, and I realized that's a couple
7:53
tenths of a trillion dollars getting rounded
7:55
away; it's hard to wrap my
7:57
head around that. A nice rounding error
8:00
in your favor, a couple hundred billion dollars, right?
8:02
I'd love a nice rounding error
8:05
like that, rounded off in our favor,
8:07
to fund us and to maintain things.
8:10
You know, that'd be nice.
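As a back-of-the-envelope illustration of the methodology discussed above (count the code, estimate the hours to rewrite it from scratch, price those hours at a developer wage), here is a minimal sketch. The productivity and wage numbers are made-up assumptions for illustration, not figures from the study:

```python
# Toy replacement-cost estimate, in the spirit of the study discussed
# above: lines of code -> hours to rewrite -> dollars of labor.
# All numbers here are illustrative assumptions, not the study's figures.

def replacement_cost(lines_of_code: int,
                     loc_per_hour: float = 5.0,
                     hourly_wage: float = 75.0) -> float:
    """Dollar cost to rewrite `lines_of_code` from scratch."""
    hours = lines_of_code / loc_per_hour
    return hours * hourly_wage

# Hypothetical package with two million lines of code:
print(f"${replacement_cost(2_000_000):,.0f}")  # prints "$30,000,000"
```

Scaled over every open source package in use, estimates like this are how a headline number in the trillions is reached.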
8:12
Yeah, well. I don't know,
8:14
everybody listening to this podcast will be,
8:16
I think a lot of them will be,
8:18
but, you know, in light of
8:20
recent events, let's not assume that
8:22
our listenership is super informed
8:24
about what the Open Source Initiative is.
8:27
I've read about it recently,
8:29
Stefano, but I'd prefer that you kind
8:31
of give us a taste
8:33
of what the OSI is
8:35
really about. What is the organization? It's
8:37
a 501(c)(3), you know,
8:39
it's a public benefit corporation in California.
8:41
But what exactly is the Open Source
8:43
Initiative? For all that we've talked
8:45
about it, what is it? Oh yeah, in
8:47
a nutshell, we are the maintainers
8:49
of the Open Source Definition. And
8:51
the Open Source Definition
8:53
is a ten-point checklist
8:56
that has been used for twenty-six years.
8:58
We celebrated twenty-five
9:01
years last year. It's the checklist
9:03
that has been used to evaluate licenses.
9:06
These are the legal documents that come
9:08
together with software packages to
9:10
make sure that the packages, the
9:12
software, come with certain freedoms.
9:15
Those freedoms are written down and
9:18
can be summarized as four freedoms, which come from
9:20
the free software definition: these are
9:22
the freedom to use the software without
9:25
having to ask for permission;
9:27
the freedom to study it, to make sure that,
9:30
you know, to understand what
9:33
it does and what it's supposed to be doing, and
9:35
nothing else, and for that you
9:38
need access to the source code; and
9:40
then the freedom to modify it,
9:42
to fix it and to increase its
9:44
capabilities, or to help yourself;
9:47
and the freedom to make
9:49
copies, be it for yourself or
9:52
to help others. And then
9:54
those freedoms were written down
9:56
in the eighties by
9:58
the free software foundation and
10:01
the open source initiative started a couple
10:03
of decades after that, picking
10:05
up the principles and spreading them out
10:07
a little bit in a more practical
10:10
way. In
10:13
a time, at a time when a lot
10:16
of software was being deployed and
10:18
powering the internet, basically, this
10:21
definition, and the list of
10:23
approved licenses, gives users
10:26
and developers clarity about the
10:28
things that they can do. It provides
10:31
them agency and independence and
10:33
control and all
10:35
of that clarity is what
10:37
has propelled and generated that huge ecosystem
10:40
that is worth 8.8 trillion.
10:45
So who formed the initiative and
10:47
then how did it sustain
10:50
and continue? Seems like the definition
10:52
is pretty set but like what is
10:54
the work that goes on continually? Yeah,
10:56
well, the work that goes on continuously
10:59
is, especially now recently,
11:01
it's the policy,
11:03
the monitoring of policy works and
11:06
everything that goes around it. The
11:09
concept of open source seems to
11:11
be set but it's constantly under
11:13
threat because of the evolution
11:15
of technology, changes of business models,
11:17
the rise in the
11:20
importance and power of new actors
11:22
constantly shifts and
11:26
tends to push the definition itself
11:29
of open source in different directions, the
11:31
meaning of open source in different directions
11:34
and regulation also tends to
11:36
introduce hurdles that
11:38
we need to be aware of. The
11:41
organization, what we do, we
11:43
have three programs. One is
11:45
called the Legal and Licenses
11:47
Program and that's where we
11:49
maintain the definition. We review
11:51
new licenses as they get proposed, and
11:54
we also keep a database of
11:56
licensing information for packages
11:58
because often... developers don't
12:00
use the right words or miss
12:03
some pieces, a lot of packages don't have
12:05
the right data. And
12:07
we have, we are maintaining
12:10
the community that maintains this
12:12
system called ClearlyDefined. On
12:14
the policy front, that's another program, the
12:17
policy and standards front,
12:19
we monitor the activity of standard
12:21
setting organizations and
12:23
the activity of regulators in the United
12:25
States and Europe mostly to make
12:27
sure that all the new laws and rules and
12:31
the standards can be implemented with
12:33
open source code and the regulation
12:35
doesn't stop or doesn't block the
12:38
development distribution of open source software.
12:40
And then a third program is
12:43
on advocacy and
12:45
outreach and that's the
12:47
activities that we do with maintaining
12:49
the blog, having the communication, running
12:51
events. And in this
12:53
program, we're also hosting the conversations
12:55
around defining open source AI,
12:58
which is a requirement that
13:01
came out especially a couple of years ago,
13:03
getting rapidly hotter
13:06
and hotter for
13:08
us. So we were
13:10
basically forced to start this
13:12
process because AI is
13:14
a brand new system, with
13:17
brand new artifacts, and that forces us
13:20
to review the principles to see if
13:22
they still apply and how they need
13:24
to be modified so we can apply them
13:26
to AI systems as a whole. And
13:29
we are a charity organization, you mentioned that.
13:32
So our sponsors are individuals
13:34
who donate to become members and
13:37
they can donate any amounts from
13:39
$50 a year up to what have you.
13:42
And we have a few hundred of those,
13:44
almost a thousand. And then we
13:47
have corporate sponsors who give
13:49
us money also, donations to
13:52
keep this work going. It's in
13:54
their interest to have an independent
13:56
organization that maintains the definition and
13:59
having a breadth of all of these
14:01
individual and corporate donors makes the
14:04
organization stronger. So we don't
14:06
depend on any
14:08
single one of them individually. So
14:11
despite the fact that we get money from
14:13
Google or Amazon or Microsoft and
14:15
GitHub, we don't have to surrender our
14:17
own agency to them. Do
14:19
you also defend the license so far as
14:22
going to court with people who would
14:25
misuse it or no? It hasn't happened,
14:27
but we do have, I
14:29
mean, not under my watch, but
14:31
we do have experts and now
14:34
on our board and in our
14:36
circle of licensing
14:39
experts, we do have lawyers who
14:41
go to court constantly to
14:44
defend the license and to
14:46
protect users. And
14:48
they're there as like expert witnesses? Exactly.
14:51
And we do provide, we
14:53
have provided briefs for courts,
14:56
opinion pieces for regulators and
14:59
responses to requests for
15:01
information in various legislation
15:03
here. How challenging is it
15:05
to be a US-based,
15:08
US-founded idea, and now an
15:12
organization, that represents
15:14
and defends this definition that really,
15:16
you know, going back to the trillions,
15:18
like, I mean, all the money, all the
15:21
dollars, like it's a world problem.
15:23
It's not just a United States
15:25
problem. How does this organization operate
15:28
internationally? What challenges do
15:30
you face as a
15:32
US-based nonprofit, but representative
15:34
of the idea of open source
15:36
that really impacts everyone globally?
15:39
Yeah, that's a very good question. In
15:41
fact, it is challenging. So I started
15:43
at the organization a little over
15:45
two years ago, and I'm
15:48
Italian. And so I do have connections
15:50
to Europe and knowledge about Europe. We
15:52
do have board members that are based
15:54
in Europe and other board members in
15:57
the United States. And it is actually
15:59
quite a challenge, quite challenging, to
16:01
be involved in these global
16:03
conversations because now, a little
16:05
bit like maybe in
16:07
the late 90s, open source
16:10
is increasingly finding itself
16:12
at the center of geopolitical challenges
16:14
and not because of open source
16:16
per se, but because
16:19
software is so incredibly present,
16:21
existing everywhere, and most of the software that
16:23
exists is open source. So
16:26
there have been a lot of challenges
16:29
as the relationship, the trade relationship
16:31
with other actors like
16:33
Russia, Ukraine, now with
16:35
the war in Israel and
16:38
Gaza and the trade
16:40
wars with China, between China and
16:42
United States. There are a
16:44
lot of geopolitical issues that we are at
16:46
the center of and we're
16:48
finding it really complicated. In
16:51
fact, we do have, we
16:54
have raised more money to increase
16:56
our visibility on the policy front.
16:58
We have right now, at
17:01
the moment, we have two people working,
17:03
one in Europe and one is more
17:05
focused in the United States. Both
17:07
of them are part-time, but we do have budget
17:10
to hire at least another
17:12
one, if not two
17:14
policy analysts to help
17:16
us review the incredible amount of legislation
17:19
that is coming. We're just talking about
17:21
United States and Europe. I
17:23
guess even one more layer than that is
17:26
that I don't know
17:28
if it's a self-professed defendership
17:30
of the term of open source. I understand
17:32
where it came from to some degree, you
17:34
know, and I wonder if, how
17:36
do you all handle the
17:38
responsibility of not
17:40
so much owning the trademark term of open source,
17:42
but to defending it? So in a way, you
17:45
kind of own it by
17:47
defending it because you have to defend it. Like it's
17:49
some version of responsibility, which is maybe
17:51
a by-product of ownership, right? There's
17:54
a pushback happening out there. Like there's even
17:56
a recent conversation where,
17:58
you know, they can't describe their software
18:00
as open source because
18:03
the term means something. And we all agree
18:05
on that, right? We understand that. And
18:07
I'm not trying to defend that,
18:09
but like how do
18:11
you operate as an organization that defends this term? Yeah,
18:14
I mean this is really funny because we
18:17
don't have a trademark on the term
18:19
open source applied to software. We have
18:21
a soft power, if you will, that
18:23
is given to us by all
18:25
the people who, just like you just said,
18:29
recognize that the term open source is
18:31
what we have designed, we have defined.
18:34
We maintain the definition. It's
18:36
kind of recursive if you want. But
18:39
corporations, individual developers,
18:42
all their institutions like academia,
18:44
researchers, they recognize that open
18:47
source means exactly that:
18:50
the list of licenses, those 10 points,
18:52
or, if you want, the four freedoms that
18:54
are listed. And we maintain
18:56
that. And
18:58
this has become quite visible also even
19:00
in court where they do
19:03
understand that. For example,
19:05
there was a recent case involving
19:08
the company Neo4j, and
19:11
that litigation is
19:13
quite complicated and entrenched.
19:16
I'm not a lawyer. I'm not going to dive
19:19
into legal things. But
19:21
the one key takeaway that is easy
19:23
for me to grasp and communicate is
19:26
that the judge recognized that the
19:29
value of open source is in the
19:31
definition that we maintain and
19:34
calling open source something that is not
19:36
under a license, a license
19:39
that we have vetted and approved, is
19:42
false advertising. And that held up in
19:44
court? Oh, yeah. And
19:46
so is that what you would say to people who are
19:49
perhaps maybe nonchalant isn't the
19:51
best word but unimpressed
19:54
by open source as a
19:56
definition and they think it's
19:59
stodgy and dated, and the thing that
20:01
they're doing is close enough and they
20:03
like the term, they're going to use
20:05
the term and they've got open-ish code
20:08
or source available or business source.
20:11
Because a lot of people are kind of pushing not just
20:13
against the definition itself but like against
20:15
the idea that we need a definition or like
20:17
you guys get to have the definition. What
20:20
do you say to them? Yeah, you know
20:22
they're self-serving. They try to
20:24
be self-serving and they're trying to destroy
20:26
the commons that way quite
20:29
visibly. I think that users see
20:31
through them and it's not
20:33
even in their interest but you
20:35
know how it works sometimes corporations,
20:38
their greed goes up and
20:40
they care only about the next quarter and
20:43
who cares about what happens next. Maybe
20:46
the next CEO will have to take care meanwhile they're
20:48
just going to laugh all the way to the bank.
20:51
And that is the approach that I see
20:53
many of these people who complain
20:56
or who try to
20:58
redefine open source because it doesn't serve
21:00
their purpose. What we maintain doesn't
21:02
fully serve their purpose. So instead of
21:05
respecting the commons and the
21:07
shared ideas, they act
21:10
like bullies and find all
21:12
sorts of excuses to redefine it.
21:14
We've seen it happening. I've been
21:16
in free software and open source
21:18
posts in my career since I was
21:20
in my 20s and I've seen
21:23
what was happening with the early
21:25
days with the proprietary Unix
21:28
guys that were going around telling us
21:30
that this Linux thing is
21:32
never going to work, you're joking, you're
21:36
giving it away. Then they started to
21:38
be scared and started saying, hey, you're giving away
21:40
your jewels. You know why are you
21:42
doing this? You're depriving us of our life
21:45
support. The families are
21:47
going to be begging on the streets. I
21:49
remember having this conversation with a sales guy
21:51
from one of those Unix companies. And then
21:54
Microsoft coming up with their
21:56
shared source program in the early 2000s,
22:00
because they just
22:02
could not wrap their heads around
22:05
the fact that you could make money sharing
22:07
your source code. But they were forced by
22:09
the market to show at least a little
22:11
bit of what was happening behind the scenes.
22:13
They were losing deals. So
22:16
we've seen it already. They're
22:19
gonna keep on going like this, but
22:21
there is plenty of interest in maintaining it,
22:24
plenty more forces on the other side
22:26
pushing to maintain it, to
22:28
keep the bar straight, to keep
22:30
going where we're going. Because that
22:33
clarity is, so
22:35
it's such a powerful, such a
22:37
powerful instrument to be able to
22:39
say, I'm open source, therefore, I
22:42
know what I can do, I know what I cannot do, and
22:45
have that collaboration straightened up. The
22:48
legal departments, the compliance departments,
22:50
the public tenders, they all
22:52
tend to have very clear
22:55
and speedy review
22:58
processes, instead of everyone having
23:00
a different understanding of what open source
23:02
means. Yeah, we go back to the
23:04
brand, right? I'm in
23:07
Italy now and I'm surprised to see a
23:09
lot of Starbucks stores
23:11
opening. And I'm
23:14
absolutely baffled, like why is this happening?
23:16
This country has plenty of bars on every
23:18
corner, so there's a cafe with
23:20
decent coffee everywhere. Why do you need
23:22
a brand? Because people have been going
23:24
around traveling the world, they see the brand, they
23:27
recognize it, they know what they can do, they
23:29
know what they're gonna get,
23:31
and they go out there. And it's the same
23:33
with open source.
23:56
What up
23:59
friends, this episode is brought
24:01
to you by our friends at
24:03
Synadia. Synadia is helping teams take
24:06
NATS to the next level via
24:08
a global multi-cloud, multi-geo, and extensible
24:10
service fully managed by Synadia. They
24:13
take care of all the infrastructure,
24:15
management, monitoring, and maintenance for you
24:17
so you can focus on building
24:20
exceptional distributed applications. And
24:22
I'm here with VP of product and engineering, Byron Ruth.
24:24
So Byron, in the NATS
24:27
versus Kafka conversation, I hear
24:29
a couple different things. One
24:31
I hear out there, I hate Kafka with
24:33
a passion. That's quoted by the way on
24:35
Hacker News. I hear Kafka
24:37
is dead, long live Kafka. And then
24:39
I hear Kafka is the default, but
24:42
I hate it. So what's the deal
24:44
with NATS versus Kafka? Yeah,
24:46
so Kafka is an interesting one. I've
24:48
personally followed Kafka for quite some time
24:50
ever since the LinkedIn days. And I
24:52
think what they've done in terms of
24:55
transitioning the landscape to event streaming
24:57
has been wonderful. I think they
25:00
definitely were the sort of
25:02
first market for persistent data streaming.
25:04
However, over time, as people
25:07
have adopted it, they were the first
25:09
to market, they provided a solution, but
25:11
you don't know what you don't know in
25:13
terms of you need this solution, you need
25:16
this capability, but inevitably, there's also
25:18
all this operational pain and overhead
25:20
that people have come to associate
25:22
with Kafka deployments. Based on our
25:24
experience and what users and customers
25:26
have come to us with, they
25:29
would say, we are spending a
25:31
ton of money on spend on
25:33
a team to maintain our Kafka
25:35
clusters, or managed services,
25:37
or something like that. The paradigm
25:40
of how they model topics and
25:43
how you partition topics and how you
25:45
scale them is not really
25:47
in line with what they fundamentally want to
25:49
do. And that's where NATS
25:51
can provide, as we refer to
25:54
it, subject-based addressing, which has a
25:56
much more granular way of addressing
25:58
messages, sending messages, subscribing
26:00
to messages and things like that
26:02
which is very different from what
26:04
Kafka does. And the second that
26:07
we introduced persistence with our Jetstream
26:09
subsystem as we refer to it
26:11
a handful of years ago, we
26:13
literally had a flood of people
26:15
saying, can I replace my Kafka
26:17
deployments with this NATS JetStream alternative?
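For listeners unfamiliar with NATS, the subject-based addressing Byron describes can be sketched in a few lines. This is an illustrative re-implementation of the matching rules (dot-separated tokens, `*` matching one token, `>` matching one or more trailing tokens), not the actual NATS client code:

```python
# Minimal sketch of NATS-style subject matching (illustrative only, not
# the real client): subjects are dot-separated tokens; '*' matches
# exactly one token, '>' matches one or more trailing tokens.

def subject_matches(pattern: str, subject: str) -> bool:
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":                      # '>' swallows the rest...
            return len(s_tokens) > i      # ...but must match >= 1 token
        if i >= len(s_tokens):
            return False                  # subject ran out of tokens
        if p != "*" and p != s_tokens[i]:
            return False                  # literal token mismatch
    return len(p_tokens) == len(s_tokens)

print(subject_matches("orders.*.created", "orders.eu.created"))  # True
print(subject_matches("orders.>", "orders.eu.created.v2"))       # True
print(subject_matches("orders.*", "orders.eu.created"))          # False
```

The granularity comes from subscribers choosing any level of the hierarchy to listen on, rather than binding to topics and partitions fixed up front.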
26:20
And we've been getting constant inbounds,
26:22
constant customers asking, hey, can you
26:24
enlighten us with what NATs can
26:26
do? And oh, by the way,
26:28
here's all these other dependencies like
26:31
Redis and other things and some of
26:33
our services based things that we could
26:35
potentially migrate and evolve over time by
26:38
adopting NATs as a technology, as a
26:40
core technology to people's systems and platforms.
26:42
So this has been largely organic. We
26:45
never from day one, with our persistence
26:47
layer Jetstream, the intention was never to
26:49
say we're going to go after Kafka.
26:52
But because of how we
26:54
layered the persistence on top of this
26:56
really nice PubSub core NATs foundation, and
26:58
then we promoted it and we say, hey,
27:00
now we have the same semantics,
27:03
same paradigm with these new primitives
27:05
that introduce persistence in terms
27:07
of streams and consumers, the flood
27:10
gate just opened and everyone was
27:12
frankly coming to us and wanting
27:14
to simplify their architecture, reduce costs,
27:16
operational costs, get all of
27:18
these other advantages that NATs has to offer
27:21
that Kafka does not whatsoever, or any of
27:23
the other similar offerings out there. And you
27:25
get all these other advantages that NATs has
27:27
to offer. So there's someone out
27:29
there listening to this right now, they're the
27:31
Kafka cluster admin, the person in
27:33
charge of this cluster going down or not,
27:36
they manage the team, they feel the pain, all
27:38
the things, give a prescription, what should they
27:40
do? What we always recommend
27:43
is that you can go to the
27:45
NATs website, download the server, look at
27:47
the client and model a stream. There's
27:49
some guides on doing that. We also
27:51
have, Synadia has provided, basically a
27:53
packet of resources to inform
27:55
people because we get, again, so many
27:57
inbound requests about how do you compare
28:00
NATS and Kafka, and we're like, let's actually
28:02
just put a thing together that can inform
28:04
people how to compare and contrast them.
28:06
So we have a link on the website
28:08
that we can share and you
28:10
can basically go get those set
28:12
of resources. This includes a very
28:14
like lengthy white paper from an
28:16
outside consultant that did performance benchmarks
28:18
and stuff like that and discuss
28:20
basically the different trade-offs
28:22
that are made and they also
28:25
do a total cost of ownership
28:27
assessment between people who are organizations
28:29
running Kafka versus running NATs
28:31
for comparable workloads. Well
28:34
there you go. You have
28:36
a prescription. Check for a
28:38
link in the show notes
28:40
to those resources. Yesterday's tech
28:42
is not cutting it. NATS,
28:42
powered by the global multi-cloud,
28:44
multi-geo, and extensible service that
28:46
is fully managed by Synadia,
28:48
is the way of the
28:51
future. Learn more at synadia.com/changelog.
28:53
That's synadia.com/changelog. So
29:07
last year on this time, Meta released
29:11
Llama, their large
29:13
language model, and
29:15
to much fanfare and applause and they
29:18
announced it as open source. We
29:21
know a lot has transpired since then but at
29:23
the time what was your response to that
29:26
even personally or as the executive
29:28
director of the OSI? Like what were you thinking? What
29:30
were you doing in the wake of that
29:32
announcement? Well we were
29:34
already looking at open source AI in
29:37
general. We were trying to understand what
29:39
this new world meant and what the
29:41
impact was on the principles of open
29:43
source as they applied to new artifacts
29:46
that are being created in AI and
29:49
we already had come to the
29:51
conclusion that open source AI
29:53
is a different animal than open source
29:55
software. There are many
29:58
many differences. So
30:00
when they hired me two years ago, over
30:02
two years ago, that was one of the
30:04
first things that I started was to really
30:06
push the board and to push the community
30:08
to think about AI as
30:10
a new artifact that required
30:13
and deserved also a deep
30:16
understanding and a deep
30:18
analysis to see how we could transport
30:20
the benefits of open source software into
30:22
this world. But the release
30:24
of Llama 2 kind of cemented that
30:27
idea. It is a
30:29
completely new artifact because they
30:31
have released, sure they have released a lot
30:34
of information, a lot of details, but
30:36
for example, we don't know exactly what
30:38
went into the training data. And
30:41
well, Llama 2 also came out
30:43
with a license that
30:46
really has a lot of restrictions
30:48
on use. So it's having restrictions
30:50
on use is one of the
30:52
things that we don't like, I
30:54
mean, the open source definition
30:56
forbids them. You cannot have any restrictions on use.
30:59
And you know, at surface value, the
31:02
license for Llama 2 seems innocent, right? One
31:04
of the things says, well, you cannot
31:06
use Llama 2 for commercial applications if you
31:08
have more than a few million, I
31:11
don't remember exactly how many, a few
31:13
million active users, monthly active
31:16
users. Okay, you
31:18
know, maybe that's a fair limitation.
31:21
And in my mind, I was like, so
31:23
what does it mean that the government of
31:25
India cannot use it? The
31:27
government of Italy, maybe, you know, if
31:30
you want to embed this into something, well,
31:33
that's already an exclusion,
31:35
and you start to have to think about it,
31:37
you know, think about, yeah, I'm a startup,
31:39
yeah, I'm a small thing. But
31:42
what happens when you get to the six million
31:44
users when, you know, all of a sudden you
31:46
have to lower up and change completely your processes.
31:48
But then there are a couple of other restrictions inside
31:51
that license that are even more
31:53
innocent on the surface. But when you
31:55
start diving deeper, like you cannot do anything
31:57
illegal with it. Okay. All
32:00
right, so let's say if I
32:02
help someone decide whether
32:04
they can or they should have an
32:07
abortion or if I
32:09
want to have this tool used in
32:11
applications to help me, I don't
32:14
know, get refugees out of
32:16
war zones into a lot of places.
32:19
And maybe I'm considered a
32:21
terrorist organization by the government
32:24
that is using that. So am
32:26
I doing something illegal? And
32:28
so on whose side, you know, who
32:30
needs to be evaluating that? These
32:33
are licensing terms that the Open
32:35
Source Initiative really doesn't think are
32:37
useful or valuable, and they
32:39
should not be part of a license, they
32:42
should not be part of a contract
32:44
in general, and they need to be
32:46
dealt with at a separate level. So
32:48
that's what I was looking at. I
32:51
was like, oh, Llama 2, oh my God.
32:53
It's not open source because clearly this
32:55
licensing thing would never pass our approval.
32:59
And at the same time, we don't even know
33:01
exactly what open source means. Why are you polluting
33:03
this space? So I was really
33:05
upset. Yeah. So then
33:07
do you spring into action? Like what does
33:09
the OSI do? Because you're the defenders of
33:11
the definition, and here's a misuse, a huge
33:13
public misuse. Do you write a blog post?
33:15
Do you send a letter
33:17
from a lawyer? What do you do?
33:20
Luckily, we were already into this two-year process
33:22
of defining open source AI. So
33:29
we have... Actually I
33:31
was already in conversations with Meta
33:34
to have them join the process
33:36
and support the process to
33:38
find the shared definition of open source
33:40
AI. And in fact, they're part of
33:42
this conversation that I'm having with
33:45
not just corporations like Google,
33:47
Microsoft, GitHub, Amazon, et cetera.
33:50
But also we invited researchers
33:53
and academia, creators of AI,
33:56
experts of ethics and
33:58
philosophy, organizations that
34:00
deal with open in general,
34:03
open knowledge, open data, like
34:05
Wikimedia, Creative Commons, Open Knowledge
34:07
Foundation, Mozilla Foundation. And
34:10
we're talking also with experts in
34:13
these thematics, but also organizations like
34:15
digital rights groups, like the
34:17
EFF and other organizations around the
34:19
world who are helping in
34:22
this debate.
34:24
Like we had to first go through an
34:26
exercise to understand and come to a shared
34:29
agreement that AI is a different thing
34:31
than software. Then we
34:34
went through an exercise to find the
34:36
shared values that we want to have represented and
34:39
why we want to have the same
34:41
sort of advantages that
34:44
we have for software also ported over to the
34:46
AI system. And
34:49
then we have identified the freedoms
34:51
that we want to have exercised.
34:54
And now we're at the point where we
34:57
are trying to build
34:59
the list of components of
35:02
AI systems, which is
35:04
not as simple as binary
35:07
code, compiler, and source
35:09
code. So it's not as simple
35:11
as that. It's a lot more complicated.
35:14
So we're building this list of components for
35:16
specific systems. And
35:19
the idea is by the end
35:21
of the spring, early summer, to
35:24
have the equivalent of what we have now
35:26
as a checklist for legal documents for
35:28
software and have the equivalent for AI
35:31
systems and their components so
35:34
that we will know basically we have
35:36
a release candidate for an open source AI
35:38
definition. And you mentioned that,
35:41
and there's, I think you posted this
35:43
eight days ago, a new draft of
35:46
the open source AI definition version 0.0.5 is available.
35:50
I'm going to read from, I think, what you
35:52
might be alluding to, which is exactly what is
35:54
open source AI. And it says, linked
35:56
up to the HackMD document, it
35:59
says, what is open source AI? To be open
36:01
source, an AI system needs to be available
36:03
under legal terms that grant the freedoms to,
36:05
one, use the system for
36:07
any purpose and without having to ask for permission,
36:10
two, study how the system works and
36:13
inspect its components, three, modify
36:15
the system for any purpose, including
36:17
to change its output, and
36:19
four, share the system for others
36:21
to use with or without modifications for any purpose.
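Those four freedoms can be sketched as a simple checklist. This is just an illustration of the draft's logic; the names and structure below are not from any OSI artifact:

```python
# A minimal sketch of the four freedoms from the draft Open Source AI
# Definition (v0.0.5), as read out above. The structure and function
# names are illustrative, not part of any OSI document.

FREEDOMS = {
    "use": "use the system for any purpose without asking for permission",
    "study": "study how the system works and inspect its components",
    "modify": "modify the system for any purpose, including changing its output",
    "share": "share the system, with or without modifications, for any purpose",
}

def is_open_source_ai(granted):
    """A system qualifies only if its legal terms grant all four freedoms."""
    return set(FREEDOMS) <= set(granted)

print(is_open_source_ai({"use", "study", "modify", "share"}))  # True
print(is_open_source_ai({"use", "share"}))                     # False
```

On this reading, a restriction on use, like Llama 2's monthly-active-user cap, would fail the very first check.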
36:25
So those seem to be the four hinges that this
36:27
– what is open source AI is hinging upon,
36:29
at least in its current draft. Is that
36:32
pretty accurate, considering it's recent
36:34
eight days ago? Yeah. Those are
36:36
the four principles that we want
36:38
to have represented. Now,
36:41
the very crucial question is what comes
36:43
next, is what is – if you
36:45
are familiar with the four freedoms of
36:48
software, those set by the
36:50
Free Software Foundation in the late 80s. They
36:53
have one – those freedoms have one
36:55
little sentence attached to it, to the
36:57
freedom to study and the freedom to
37:00
modify. They both say access
37:02
to the source code is a precondition for this,
37:05
which really is just
37:07
that little addition. It's meant to
37:09
clarify that if you
37:11
want to study a system, if
37:13
you want to modify it, you need to
37:15
have a way to make modifications to it.
37:18
It is not just – it's
37:20
the preferred form to make modifications from the
37:22
human perspective. It's not that you give me
37:24
a binary and then I have to
37:26
decompile it or try to figure
37:28
out from reverse engineering how it works.
37:31
Give me the source code. I need the source code here
37:33
to study. For the
37:35
AI systems, we haven't
37:37
really found yet a
37:39
shared understanding or a shared
37:41
agreement on what it means
37:44
to have access to the
37:46
preferred form to make modification to an
37:48
AI system. That's the exercise
37:50
that we're running now and we – yeah. Yeah,
37:53
that's interesting. The preferred form of modification
37:55
is really interesting. Like you
37:57
said, you don't want to give a binary and expect
38:00
reverse engineering, because that's
38:02
possible, right? That's possible maybe to a small
38:04
subset. It's not the preferred route to
38:06
get to Rome. It's just like that's not the route I
38:08
want to go down, right? I want a different way. Yeah.
38:11
And you want to have a simple way. So
38:14
some licenses even have more
38:16
specific wording around defining
38:19
what source code actually means, like
38:21
the GNU GPL, which has
38:24
very clear descriptions and prescriptions about
38:27
what needs to be given to
38:30
users in order to exercise those freedoms.
38:33
Their freedoms as a user. So
38:35
for AI, yeah, for AI, it's
38:37
complicated because there are a few
38:39
new things for which we
38:42
don't even have, there are no court
38:44
cases yet. You know, I keep
38:46
repeating the same story. When software came out
38:48
for the first time, started to come out
38:50
of the labs, research labs,
38:53
it started to become a commercial artifact
38:56
that people could just sell. There
38:59
was a conscious decision to apply copyright
39:01
to it. It was not a given
39:04
fact that it was going to be using
39:07
copyright, like copyright law. So
39:09
that decision was either a lucky one, honestly,
39:12
or a well thought out one,
39:14
I don't know which of the two,
39:17
because copyright as a legal
39:19
system is very similar across the
39:21
world. And building the open
39:23
source definition, the principle of the definition,
39:26
the legal documents that go with
39:28
software for open source software and
39:30
free software, those legal documents built
39:32
on top of copyright mean that they're
39:35
very, very similarly applied pretty much
39:37
everywhere around the world. The alternatives
39:39
at the time were
39:41
conversations around treating
39:43
software as an invention and therefore
39:46
covered by patents. Patent
39:48
law is a whole different mess around
39:50
the world, with wholly different applications that
39:52
have all different terms, much
39:55
more complicated to deal with. So
39:57
for AI, we're pretty much at the same
40:00
stage where there are some new
40:02
artifacts like the model after
40:05
you train a model and that
40:07
produces weights and parameters that
40:09
go into the model. Those
40:11
models, honestly, it's not clear
40:14
what kind of legal frameworks apply to
40:16
those things. And we might be at
40:18
the same time in history where
40:20
we could have to imagine
40:22
and think and maybe suggest and recommend,
40:25
what the best course of action will be,
40:27
whether it makes sense to treat them as
40:29
copyrightable entities, artifacts,
40:32
or nothing at all, or
40:34
inventions, or any, you know,
40:37
some other rights or exclusive
40:39
rights. And the same
40:41
goes into the other big
40:44
conversation that is happening already,
40:46
but for which there is no, I don't have
40:49
a clear view of where it's going to end,
40:51
these are the conversations around the
40:54
right to data mining. And
40:57
if you follow the conversations around
40:59
ChatGPT being sued by The New
41:01
York Times, Stability
41:03
AI being sued by Getty Images, and
41:05
GitHub being sued by anonymous plaintiffs,
41:08
etc., etc. A lot of
41:10
those lawsuits hinge on
41:13
what's happening, why are
41:15
these powerful corporations going
41:17
around and crawling the internet, aggregating
41:20
all of this information and data
41:22
that we have provided, uploaded, we
41:25
society, some commercial actors,
41:27
some non-commercial actors, we have created
41:30
this wealth of data on the
41:33
internet, and they're going around collecting
41:35
it and basically making it
41:37
proprietary, building models that
41:39
they have for themselves. And on top of that,
41:42
you can already start seeing like, oh my God,
41:44
they're going to be eventually making a lot of
41:46
money out of the things that
41:48
we have created. Or even more scarily,
41:50
like sometimes I think about this myself,
41:52
I've been uploading my pictures for
41:55
many years without paying too much attention. So
41:57
there is a database out there, I'm sure
41:59
of that, I've been building a database out
42:01
there of my pictures as I was aging. And
42:04
now these pictures are being – can be used, could
42:06
be used by an evil government
42:08
or evil actor to recognize me around
42:10
the streets at any time. And
42:16
I allowed it, of course. Is that fair?
42:18
Is that not fair? Those are
42:20
big questions and there is no easy
42:22
or simple answer. Yeah. So
42:25
did you enumerate – and I missed
42:27
it – or can we enumerate the
42:29
components that you have decided
42:31
so far are part of an
42:34
AI system, the code I heard,
42:36
the training data, etc.? Yeah.
42:39
There are three main categories. Or maybe four.
42:42
Like one is – yeah, one is in
42:44
the category of data. One is
42:46
in the category of code. Then
42:48
the other category is models.
42:52
And there is a fourth category that
42:54
goes into other things, like documentation, for example.
42:57
Instructions on how to use it, or
42:59
scientific papers. In the data
43:01
parts, some of the components
43:04
are the training data, the testing
43:06
data. In the code
43:08
part goes the tooling –
43:11
like the code for the architecture, the inference code
43:13
to run the model. Things
43:15
that are written by humans in general.
43:17
You also have in there the code
43:20
to filter and set
43:22
up the data sets and prepare
43:25
them for the training. And
43:27
then in the models, you have
43:29
the model architecture, the model
43:31
parameters, including weights, other parameters,
43:33
and things like that. There
43:36
might be intermediate steps
43:39
during the training. And
43:41
the last bit is documentation,
43:43
how to use it, samples,
43:45
output. So there is
43:47
an initial list of all of these components
43:50
that have been – we worked with
43:53
or actually the Linux Foundation
43:55
worked on creating this list
43:58
specifically for generative AI and large
44:00
language models. Yeah, we're working
44:02
with them. I mean, we're using
44:04
their list as
44:06
a backdrop, or as a
44:09
starting point to move this conversation forward.
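The four categories just described might be represented like this; the component names are placeholders drawn from the conversation, not the Linux Foundation's actual list:

```python
# Sketch of the four component categories described above: data, code,
# models, and documentation. Component names are illustrative only.

categories = {
    "data": ["training data", "testing data"],
    "code": ["architecture code", "inference code", "data preparation code"],
    "model": ["model architecture", "model parameters (weights)",
              "intermediate training checkpoints"],
    "documentation": ["usage instructions", "scientific papers", "sample outputs"],
}

def category_of(component):
    """Look up which of the four categories a component belongs to."""
    for cat, items in categories.items():
        if component in items:
            return cat
    return None

print(category_of("inference code"))  # code
```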
44:11
Now, the question that we need to ask,
44:13
and I'm adding this space – if you go to
44:16
draft five you will see an attempt
44:18
to do a matrix. Basically, from this list of components
44:20
that are sixteen, or if I remember correctly seventeen
44:23
components, then in
44:25
a row next to them there
44:27
is a question: do I need it to
44:29
run it? Do I need it to use
44:31
it? I mean, to do anything, to use
44:33
it, do I need to have it in hand? Do I need it
44:35
to study it? Do I need this component
44:37
to modify the system? And we keep
44:40
referring to the system – this is
44:42
one of the important things. The
44:44
open source definition refers to the program.
44:47
And a program is never defined,
44:49
but a program, pretty much
44:51
we know what it is. An
44:53
AI system? And,
44:55
again, it's a bit complicated. The question looks
44:58
very simple on the surface, but when you
45:00
study it, it becomes complicated,
45:02
because what is an AI system, right?
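That per-component questionnaire could be captured as a small table: one row per component, with a yes/no answer for each of use, study, and modify. The rows and answers below are made-up placeholders; the real matrix in the draft has sixteen or seventeen components and the answers are still being debated:

```python
# Hypothetical sketch of the draft's component matrix. For each component
# we record whether it is needed to use, to study, or to modify the
# system. All entries here are illustrative placeholders, not OSI answers.

matrix = {
    #  component              needed to:  use    study  modify
    "training data":                     (False, True,  True),
    "inference code":                    (True,  True,  True),
    "data preparation code":             (False, True,  True),
    "model architecture":                (True,  True,  True),
    "model parameters (weights)":        (True,  True,  True),
    "documentation":                     (False, True,  False),
}

def needed_for(activity):
    """List the components required to exercise a given freedom."""
    index = {"use": 0, "study": 1, "modify": 2}[activity]
    return [name for name, flags in matrix.items() if flags[index]]

print(needed_for("use"))
```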
45:05
So we started using a definition
45:07
that has been – it's becoming quite
45:09
popular in regulations around
45:11
the world. It's
45:13
work done by the Organisation for
45:15
Economic Co-operation and Development,
45:18
the OECD. And
45:20
they have defined an AI system
45:22
in very broad
45:25
terms. And this definition is
45:27
being used in many regulations, like
45:29
the United States
45:31
Executive Order on AI –
45:34
NIST also uses it – and in
45:36
Europe the AI Act uses
45:38
it, although with a
45:41
slight, very small, minor variation.
45:43
It seems to be quite popular,
45:45
but it does have detractors, and indeed it
45:48
is quite generic. Sometimes,
45:50
if you don't read it
45:52
carefully, you could even cover
45:55
a spreadsheet. It's really bizarre. So
45:58
Let's say that hypothetically, I'm like a
46:00
medical company that has been working
46:02
on a large language model and
46:05
I have proprietary data. So
46:08
I have like readings and reports
46:10
and stuff that we've accumulated over
46:12
years and I create an
46:14
LLM based on that data that
46:16
ultimately can answer questions about medicine
46:19
or whatever. And I want to
46:21
open source that. I need to be
46:23
able to make it so it's usable,
46:26
studyable, modifiable, and shareable.
46:29
And it seems like the training data even
46:32
though that's the most proprietary part and
46:34
perhaps the most difficult part to actually make
46:36
available or sometimes impossible is
46:39
necessary not to use but
46:42
to study and modify. It seems like
46:45
so if I release the
46:47
model, the code,
46:50
all the parameters, everything we use to build
46:52
a model, everything except for like the source
46:54
original data under what you guys are
46:56
currently working on, that would not be open source AI, would
46:58
it? Honestly, that is
47:01
a very good case. Example
47:03
for why I think we
47:06
need to carefully reason around
47:08
what exactly do I need to
47:10
study? What kind of access, what
47:12
sort of access do I need?
47:15
Is that the original data set? Because
47:17
if it is the original data set, then
47:19
we're never going to have an open source AI. Right.
47:23
That's where I'm getting to. Like it's not going to happen. It's not
47:25
going to happen. Yeah. So
47:27
maybe, and this is my working
47:29
hypothesis that I threw out there, maybe
47:32
what we need is a very good
47:34
description of what that data is.
47:37
Maybe samples, maybe instructions
47:39
on how to replicate it. Because
47:42
for example, there might be data that
47:45
is copyrighted. You might have
47:47
the right under fair use or under
47:49
different exclusions of copyright. You may have
47:51
the rights to create a copy and
47:54
create a derivative like around the training.
47:57
But not to redistribute it. Because if you redistribute
47:59
it, then you start infringing. So
48:01
I think we need to be carefully
48:03
thinking about data. And the
48:05
reason why I became more
48:08
and more convinced that we don't need
48:10
the original data set is
48:12
because I've seen wonderful
48:15
mixing, wonderful remixing
48:18
of models, even
48:21
splitting of models
48:23
and recombinations of
48:25
models, creating whole new
48:28
capabilities, new AI
48:30
capabilities, without having
48:32
to retrain a single thing. So
48:36
I'm starting to believe really
48:38
that the AI weights in
48:40
machine learning, the weights in
48:42
the architecture, it's
48:44
not binary code. It's not a binary
48:46
system, not binary code that you have
48:48
to reverse engineer. If you
48:51
have sufficiently detailed instructions on how
48:53
it's been built and
48:55
what went into it, you
48:57
should be able, you might be
49:00
able to create new systems, reassemble
49:02
it, study how it works,
49:04
and execute it, and modify it. So the
49:06
preferred form to make modifications is
49:09
not necessarily going through
49:11
the pipeline or rebuilding the whole system
49:13
from scratch, which for many
49:15
reasons may be impossible. I
49:18
do like the idea of a small
49:20
subset of the data set that's
49:22
anonymized or sanitized
49:24
in some way, shape, or form. It's like this is
49:26
the acceptable sample
49:29
amount required for the study
49:31
portion or the modification portion. Yeah.
49:34
It could be the schema for example. It could
49:36
be the... Right. Provide your own
49:38
data in here if you can, which you
49:40
can obviously find other ways to use
49:43
artificial intelligence to generate more data. So that's
49:45
the whole thing, right? But I
49:47
feel like that's acceptable
49:50
to me to provide some
49:52
sort of sampling or as you said, the schema. I think that
49:54
makes sense to me. Yeah. Yeah.
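The "schema plus samples" idea floated here could look something like a small machine-readable data card. Everything below (field names, format) is a hypothetical illustration, not a format the OSI has proposed:

```python
# Hypothetical "data card" standing in for a training set that cannot be
# redistributed: it ships the schema, provenance, replication notes, and
# synthetic sample rows instead of the original records.

data_card = {
    "name": "clinical-notes-corpus",  # placeholder name
    "schema": {
        "note_text": "string",
        "specialty": "string",
        "year": "integer",
    },
    "provenance": "de-identified clinical reports, 2005-2023",
    "how_to_replicate": "collect notes matching the schema; filter out PII",
    "samples": [  # synthetic rows, not real patient data
        {"note_text": "Patient reports mild headache.",
         "specialty": "neurology", "year": 2019},
    ],
}

def matches_schema(row, schema):
    """Check that a replacement row has the documented fields and types."""
    types = {"string": str, "integer": int}
    return set(row) == set(schema) and all(
        isinstance(row[key], types[typ]) for key, typ in schema.items()
    )

print(matches_schema(data_card["samples"][0], data_card["schema"]))  # True
```

A consumer who cannot get the original data could then validate their own replacement data against the published schema.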
49:57
The research is also going in this direction
49:59
with... data cards
50:01
and model cards, lots
50:03
of that
50:23
you put up in a picture in a museum
50:26
and nobody can do anything with it.
50:28
It needs to be practical. I
50:31
keep repeating the open source definition,
50:33
how success because it
50:35
enabled something practical and it
50:37
has success because other people
50:40
have written it, other people have decided
50:42
to use it. If you
50:44
should keep on insisting from your pedestal that
50:46
you shall do this
50:49
and that, you may
50:51
not find enough of a
50:53
crowd to follow you. If
50:57
no one is using it, what's the point? What's
51:22
up friends? I'm here with one of my
51:24
new friends, Zane Hamilton from CIQ. Zane,
51:27
we're coming up on a hard deadline with
51:29
the CentOS end of life later this year
51:31
in July and there are still folks out
51:34
there considering what their next move should be.
51:36
Then last year we had a bunch of
51:38
change around Red Hat Enterprise Linux that makes
51:40
it quote less open source in the eyes
51:43
of the community, with many saying, RHEL is
51:45
open source but where is the source and
51:47
why can't I download and install it? Now,
51:49
Rocky Linux is fully open source and CIQ
51:52
is a founding support partner
51:54
that offers paid support for
51:57
migration, installation, configuration, training, etc.
52:00
But what exactly does an enterprise
52:02
or a Linux user get
52:05
when they choose the free and open source Rocky
52:07
Linux and then ultimately the support from CIQ if
52:09
they need it? There's a lot going on in
52:11
the enterprise Linux space today. There's a lot of
52:14
end of life of CentOS. People are making decisions on
52:16
where to go next. The standard of what enterprise Linux
52:18
looks like tomorrow is kind of up in the air.
52:21
What CIQ is doing is we're trying to help
52:23
those people that are going through these different decisions
52:25
that they're having to make and how they go
52:28
about making those decisions. And that's where our expertise
52:30
really comes into play. A lot of people
52:32
who have been through very complex Linux migrations,
52:34
be it from the old days of integrating
52:36
from AIX or Solaris onto Linux and
52:38
even going from version to version because to
52:40
be honest enterprise Linux version to version is
52:42
not always been an easy conversion. It
52:45
hasn't been and you will hear that from us.
52:47
Typically the best idea is to do an in place
52:49
upgrade. Not always a real easy thing to do,
52:51
but what we've done is we have started looking at
52:53
and securing a path of how can we actually
52:55
go through that? How can we help a customer who's
52:57
moving from CentOS 7 because of the end of
52:59
life in July of this year? What
53:01
does that migration path look like and how can we help? And
53:03
that's where we're looking in ways to help automate from an admin
53:05
perspective. If you're working with
53:07
us, we've been through this. We can actually go through and
53:10
build out that new machine and do a lot of the
53:12
back end manual work for you so that all you really
53:14
have to do at the end of the day is validate
53:16
your applications up and running in the new space. And then
53:19
we automate the switch over. So we've worked through
53:21
a lot of that. There's also decisions you're making
53:23
around. I'm paying a very large bill
53:25
for something. I'm not necessarily getting the most value out
53:28
of. I don't want to continue down that path. We
53:31
can help you make that shift over to an
53:33
open source operating system, Rocky Linux and help derive
53:35
what's next. Help you be
53:37
involved in a community and help make sure
53:39
that that environment you have is stable. It's
53:41
going to be validated by the actual vendors
53:43
that you're using today. And that's really where
53:46
we want to be as a partner from
53:48
not just an end user perspective, but in
53:50
an industry perspective. We are
53:52
working with a lot of those top tier vendors out there
53:54
of certifying Rocky, making sure that it gets pushed back to
53:56
the R.E.S.F. Making sure that we
53:58
can validate that everything is there and secure that
54:00
needs to be there and helping you on
54:03
that journey of moving. And that's where we,
54:05
CIQ, really show our value on top of
54:07
an open source operating system is we have
54:09
the expertise. We've done this before. We're in
54:12
the trenches with you and we're defining that
54:14
path of how to move forward. Okay, ops
54:16
and sys admin folks out there, what are
54:18
you choosing? CentOS is end of life soon.
54:20
You may be using it, but if you
54:23
want a support partner in the trenches with
54:25
you, in the open source trenches with you,
54:27
check out CIQ. The founding
54:29
support partner of Rocky Linux.
54:31
They've stood up the RESF,
54:34
which is the home for
54:36
open source enterprise software. The
54:38
Rocky enterprise software foundation. That
54:40
is they've helped to orchestrate
54:42
the OpenELA, a collaboration
54:44
created by and upheld by
54:46
CIQ, Oracle and SUSE. Check
54:48
out Rocky Linux at rockylinux.org,
54:51
the RESF at resf.org. And
54:53
of course, if you need
54:55
support, check out our friends
54:57
at ciq at ciq.com.
55:12
Fully acknowledging that it's a work in progress
55:14
and you're not done. Given
55:16
your current mental model of
55:18
the definition as it is working, are there systems
55:20
out there today that you would rubber stamp and
55:23
say like, this is open source AI, I'm thinking
55:25
of perhaps Mistral has a bunch of stuff going
55:27
on and they're committed to open and transparent, but
55:29
I don't know exactly what that means for them.
55:32
Have you looked at anything and do you
55:34
have like things you're comparing against as you
55:36
build to make sure that there's a set
55:38
of things that exist or could exist that
55:41
are practical? Not yet. I know
55:43
that there is, we have an
55:45
affiliate organization called Eleuther
55:47
AI. They are a
55:49
group of researchers. They recently incorporated
55:52
as a 501(c)(3) nonprofit
55:54
in New York state. And
55:57
from the very beginning, they've been doing
55:59
a lot of research in the open,
56:02
the reasoning, data sets, and structure, and research
56:05
papers, models, and weights, and everything like
56:08
that. So I'm really leaning
56:10
a lot on them to shine
56:12
a light on how this can be done. But
56:15
I don't want to be too
56:17
restricted in my mind. They are
56:20
very open with an open science
56:22
and open research mentality. I
56:25
think that there is an open
56:27
AI and open source AI
56:29
that is not as equally
56:32
open necessarily, but it can
56:35
still practically have meaningful impact.
56:37
It can generate that positive
56:39
reinforcement of innovation and
56:41
permissionless collaboration, et cetera. So
56:44
yes, I'm leaning on EleutherAI,
56:47
but I'm also very open. And
56:49
I'm sure there will be other
56:51
organizations, other groups, as
56:54
we go and elaborate more on what
56:56
we actually need, on what is the preferred
56:58
form to make modifications to an AI
57:00
system, we're going to discover more. So
57:03
no open source AI
57:05
yet. So there's no rubber stamp for
57:08
anything out there currently. Well, I mean,
57:10
I said I could rubber stamp Pythia
57:12
and EleutherAI, but I don't want
57:15
to say that that's necessarily the only thing.
57:17
Right, there might be more stuff. And again,
57:19
those are the ones, the guys that I,
57:21
because I know how they work. Yesterday
57:24
or the other day, OLMo was
57:26
released by the Allen Institute for AI.
57:28
And that seems to be also
57:30
quite openly available for models with
57:33
science behind it, et cetera. I
57:35
haven't looked at their licenses and
57:37
I haven't looked at it carefully.
57:39
So I can't really tell. It
57:41
might very well be an
57:43
open source AI system. I
57:46
was trying to get to a definitive, really. Is
57:48
there not a stamped open
57:50
source AI out there yet?
57:54
Well, I can tell you what is not. Llama
57:56
2 is not. OpenAI
57:58
is not. ChatGPT is out. All right, a
58:00
deny list more than a permit list. Yes. So
58:02
I suppose one of the questions which maybe
58:05
is obvious but I got to ask it
58:08
is what is the benefit if
58:10
I'm building a model and I'm releasing a new
58:12
AI? What is the benefit to
58:15
it being open source? To
58:17
meet this open source AI definition, what is
58:19
the benefit to its originator?
58:22
And then obviously it's humanity, I kind of get
58:24
that, but like what is the
58:26
benefit? It's pretty easy to kind of clarify that with
58:29
software, right? We see how that's working
58:31
because we've got 30 years
58:33
of history or more in a lot of
58:35
cases. We've got track record there. We
58:38
don't have track record here. It's still
58:40
early pioneer days. What's
58:42
the benefit? That is a very good
58:44
question and I don't
58:47
have an answer for it. I mean I
58:49
know the benefit for humanity. I know the
58:52
benefit for the science of it and this
58:54
is what really those
58:56
benefits are what triggered the internet.
58:58
Like if software started to come
59:00
out of the labs without
59:03
the definition of free software, without
59:05
the GPL license, without the
59:07
BSD license, I
59:09
don't think we would have had such a
59:11
fast evolution of software
59:14
computer science. We would
59:16
not have the internet that we see
59:18
today if everyone had
59:20
to buy a license for Solaris from
59:22
Sun, from Oracle, etc.
59:26
If a data center would have to, you know,
59:28
you would have to go and
59:30
call the Sun Microsystems or IBM's
59:32
sales team to be able to
59:35
build a data center instead of
59:37
using just boxes and slapping Linux
59:39
and Apache Web Server on it.
59:43
We would have had a completely different
59:45
history of the digital world, of the
59:47
past. I mean, completely different. So I
59:49
can see the benefit for society and
59:51
science. For some of these
59:53
corporations, I'm assuming that they have made
59:56
some of their calculations on
59:58
stopping the competition, or
1:00:01
creating competitive advantages, maybe
1:00:03
in pure Silicon Valley approach,
1:00:05
like get more users, we'll figure out
1:00:08
the business model later. There
1:00:10
is some of that going on, likely,
1:00:12
most likely. But I can't,
1:00:14
I haven't had that conversation yet with
1:00:17
any smart people I
1:00:19
know thinking about the business models
1:00:21
behind this, the possible ways of privatizing,
1:00:24
or I don't know, finding revenue
1:00:26
streams maybe on top of a properly
1:00:29
open source model. Do
1:00:31
you think that they're becoming commoditized
1:00:33
if we specifically talk about these
1:00:35
large language models? If
1:00:37
we call AI that for now, recognizing
1:00:39
there's an umbrella term and there's other
1:00:41
things that also that represents, do
1:00:43
you think that they are becoming commoditized
1:00:45
and will continue to enough so
1:00:48
that open source can keep up
1:00:50
with proprietary in terms of quality,
1:00:53
or even surpass just
1:00:55
because of the number of smart people releasing
1:00:57
things? I
1:00:59
don't know, that's what I'm asking honestly. What
1:01:01
are your thoughts on it? Honestly, recently I
1:01:03
saw this new system that
1:01:05
it's a text to speech system, and
1:01:08
they built it, this team of developers
1:01:11
from a company called Collabora. They
1:01:13
built this system by
1:01:15
splitting a system from OpenAI,
1:01:18
another from either an Epic
1:01:20
or, now I don't remember
1:01:22
exactly, but they split
1:01:24
an AI system. They took it
1:01:26
and they split it, the inputs
1:01:29
from the outputs, and they attached another
1:01:31
model of their own training with
1:01:33
small data sets, and they built a brand new
1:01:35
thing. I think, I mean,
1:01:38
this is the kind of stuff that
1:01:40
is inspiring, like at one point, there's
1:01:42
gonna be, I'm sure that the quick
1:01:44
evolution of this discipline would make it
1:01:47
so that smaller teams with smaller amount
1:01:49
of data would be able to create
1:01:51
very powerful machines. And maybe
1:01:54
the advantages of these
1:01:56
large corporations are now
1:01:58
deploying, delivering, and
1:02:00
distributing openly accessible
1:02:03
AI models, maybe in
1:02:06
their mind, having optimized hardware, cloud
1:02:08
resources that they can sell, maybe
1:02:11
that's where they're going with one of their
1:02:13
revenue streams that they imagined
1:02:15
would be coming from.
1:02:18
Yeah, that is exciting. I did see, I think
1:02:20
it was like, Codeium AI
1:02:23
just recently announced a
1:02:25
model that beats
1:02:27
DeepMind on code generation, according
1:02:30
to benchmarks that I haven't looked at, as
1:02:32
well as Copilot, and that's
1:02:34
from a smaller player. I'm not sure
1:02:36
if that's open or closed or what, but it
1:02:39
is kind of pointing towards like, okay, there's
1:02:42
significant competition. And like you
1:02:44
said, remixing and the ability
1:02:46
to combine and change,
1:02:49
and even in some cases, swap out
1:02:52
and take the best results, that we
1:02:54
will have a vibrant ecosystem
1:02:56
of these things. And I think open
1:02:58
source is the best model
1:03:00
for vibrant ecosystems. So
1:03:03
that rings true with me. Doesn't
1:03:06
mean it's right, but it sounds right. Yeah,
1:03:09
this is a tough one. This is really
1:03:11
a tough nut to crack, really. I mean,
1:03:13
even at the forums you
1:03:15
have, I believe you're calling it
1:03:17
the Deep Dive, right? It's Deep
1:03:19
Dive: AI. And this
1:03:21
is the place where you're hoping that many
1:03:24
folks can come and organize. You
1:03:26
say it's the global multi-stakeholder effort
1:03:28
to define open source AI, and
1:03:30
that you're bringing together various
1:03:33
organizations and individuals to collaboratively write a
1:03:35
new document, which is what we've been
1:03:37
talking about, directly and indirectly.
1:03:40
Who else is invited to this? Like, how does
1:03:42
this get around? How do people know about
1:03:44
this? Who is invited to the
1:03:47
table to define or help define? Is
1:03:49
this an open way to define
1:03:51
it? What is happening
1:03:53
here? Who's participating? But at
1:03:56
this point, it's now public, so
1:03:58
anyone can really join the forum
1:04:00
and can join me in the
1:04:03
biweekly town hall
1:04:05
meetings. So that part
1:04:07
is public and everybody's
1:04:10
welcome to join. We're going to keep
1:04:12
on going with public reports
1:04:15
and small working groups with
1:04:17
people that we're picking, but only
1:04:20
because of agility in the collaboration.
1:04:22
We want to have, we're picking
1:04:24
people that we know of or that
1:04:26
we have been in touch with coming
1:04:29
from a variety of experiences. Say
1:04:31
we're talking to creators of AI
1:04:34
in academia, large corporations,
1:04:36
small corporations, startups, lawyers,
1:04:39
people who work with regulators,
1:04:42
think tanks and lobbying organizations.
1:04:44
We're talking to experts in
1:04:47
other fields like ethics and
1:04:49
philosophy. We keep on chatting
1:04:53
with, we have identified six
1:04:56
stakeholder categories, and we're
1:04:58
trying to have representation that is also
1:05:01
geographically distributed from North
1:05:04
America, South America, Asia,
1:05:07
Pacific, Europe, Africa.
1:05:10
Last year we had conversations
1:05:12
with about 80 people from
1:05:14
representatives of all these categories
1:05:16
in a private group just
1:05:18
to get things kick-started. And
1:05:21
we have had meetings in
1:05:23
person starting in June
1:05:25
in San Francisco and
1:05:28
July in Portland and
1:05:30
then other meetings in Bilbao
1:05:32
in Europe like we had meetings
1:05:34
in person with some of these
1:05:37
people during different conferences. But
1:05:40
starting this year we're going to be, this first half
1:05:42
of the year we're going to be super public. We're
1:05:44
going to gather, we're going to
1:05:46
be publishing all the results of the working groups
1:05:49
and we're going to be taking comments
1:05:51
on the forums and then we're
1:05:53
going to have an in-person meeting. We're
1:05:56
aiming late May or June with
1:05:59
at least two representatives for each
1:06:01
of the stakeholder categories to
1:06:03
get in a room and put
1:06:05
together
1:06:07
the last pieces of
1:06:10
the definition based on the comments, and
1:06:12
come out of that meeting with
1:06:14
a release candidate, something that we feel
1:06:16
like there is endorsement from a dozen
1:06:19
different organizations across the world and
1:06:21
across experiences. Then we're
1:06:23
going to use that, and we're raising funds for it,
1:06:26
to have at least four events in
1:06:29
different parts of the world between June
1:06:31
and the end of October. One
1:06:34
of these events is definitely going to be at All
1:06:36
Things Open, where we're
1:06:38
going to gather more potential
1:06:40
endorsements. As soon as
1:06:42
we get to five endorsements from
1:06:45
each of the different categories. I
1:06:48
think we're going to be able to say this is version one. We
1:06:51
can start working with it and see
1:06:53
where we land and maybe next year
1:06:55
we're going to have, by that time, I
1:06:57
mean by October, November, the board
1:06:59
will also have a process for
1:07:01
the maintenance of this definition
1:07:04
because most likely we're going to have
1:07:06
to think about how to maintain it,
1:07:08
how to respond to
1:07:11
challenges, whether they're technological
1:07:13
or regulatory challenges,
1:07:16
or just we missed the mark and
1:07:19
we realize later we'll have to fix it. Yeah,
1:07:23
kind of want to backtrack slightly
1:07:25
I guess as I hear you
1:07:27
talk about this and kind of coming to a
1:07:30
version, at last, sometime this year based
1:07:32
upon certain details. Like when I ask you and
1:07:34
I know this is your response and not so
1:07:36
much a corporate response in
1:07:38
terms of what's the benefit of
1:07:40
being an open source artificial intelligence, like what's
1:07:43
the benefit of being open source AI? Like
1:07:46
all this effort to define it and
1:07:49
then what if there's not that many people
1:07:51
who really want to be defined by it?
1:07:53
Like I guess that's an interesting consideration is
1:07:55
that all this effort to define it but
1:07:58
maybe there is no real benefit, or
1:08:00
the benefit is unclear and then folks just,
1:08:03
it's almost like saying, it's definitely
1:08:05
a line, right? It's like, okay, everything is
1:08:07
basically not, and there's very few that
1:08:09
are, basically, at least initially. Maybe as iteration
1:08:12
and progress happens, more and more will
1:08:14
see a benefit, and maybe that benefit permeates
1:08:17
more clearly than we can see it now. Yeah.
1:08:20
I don't want to think about that. I
1:08:23
don't want to think about that. No, it's
1:08:26
one of those things. Like if you start anything
1:08:28
thinking you are the winner, you're
1:08:31
probably going to fail, right? So it's
1:08:33
not one of the outcomes that I
1:08:36
see. There is a tremendous amount of pressure. I
1:08:38
mean, it's unlikely that that's going to
1:08:40
happen. That's what I want to say. I
1:08:43
have had a lot of pressure
1:08:46
from corporations, regulators.
1:08:50
The AI act has a provision in
1:08:52
there, a text that
1:08:54
provides some exclusions to
1:08:57
the mandates of the law for
1:08:59
open source AI. There is
1:09:02
no definition in there. So regulators
1:09:04
need it. Largest corporations
1:09:06
need it. Researchers
1:09:08
need some clarity. I
1:09:12
hear a lot of researchers,
1:09:14
they want data. And
1:09:17
they want data. It doesn't mean that they
1:09:19
necessarily want the original data. Some
1:09:22
of them do not. But they do
1:09:24
want to have good data sets. And that
1:09:26
only comes if there is clarity
1:09:29
about the boundaries of what
1:09:31
they are allowed to do to accumulate data.
1:09:35
Because data becomes very, very messy very
1:09:37
quickly. Privacy law,
1:09:39
copyright law, trade secrets,
1:09:42
illegal content, content that is illegal in
1:09:44
some parts of a country, or in
1:09:46
some countries and not in other countries. It
1:09:50
becomes really, really
1:09:52
messy very quickly. And
1:09:54
researchers don't have a way to deal with
1:09:56
it right now. They need help.
1:10:00
I agree that you should keep doing it. I didn't mean
1:10:02
to sound like it should be a failure. Sometimes
1:10:04
I think it might be beneficial to think about failure at the
1:10:06
beginning because it's like, well, you got to consider
1:10:09
your exit before you go in, in a way. I'm not
1:10:11
saying you should do that, but I'm glad you
1:10:13
are defining it. It does need to be defined.
1:10:15
I didn't mean to be necessarily like what if,
1:10:17
but there's a lot of effort going
1:10:19
into this. I can see how a
1:10:22
lot of your attention is
1:10:24
probably spent simply on defining
1:10:26
this and working with all the folks, all
1:10:29
the stakeholders, all the opinion makers, etc.
1:10:32
that are necessary to define
1:10:34
what it is. It's a
1:10:36
lot of work. It's all work.
1:10:38
You're absolutely right. This is taking
1:10:40
most of my attention. And yes,
1:10:42
I do see a couple of
1:10:44
failure options. We can fail if
1:10:46
we're late and if we get it wrong. But
1:10:50
for getting it wrong, the
1:10:52
fact that it's defined with a version
1:10:54
number, I think we can
1:10:56
fix it over time. And we really shouldn't
1:10:59
be expecting to have a perfect first time.
1:11:03
It's changing too quickly, the
1:11:05
whole landscape. And the other one, getting
1:11:08
in late, is also part of
1:11:10
the reason why I'm pushing to get
1:11:12
something out of the door because
1:11:15
a lot of pressure
1:11:17
exists in the market to have
1:11:19
something. Everyone is
1:11:21
calling their models
1:11:23
open source AI, recognizing
1:11:26
that there is value in
1:11:28
that term implicitly. But if there is
1:11:30
no clarity, it's going to be diluted
1:11:32
very, very, very rapidly. Before
1:11:34
Jared and I got on this call, we had a loose
1:11:37
discussion about one thing, and I quickly stopped talking because we have
1:11:39
a term. I think it's
1:11:41
pretty well known in broadcasting and podcasting:
1:11:43
don't waste tape. And
1:11:46
I didn't want to share my deep sentiment, although I
1:11:48
loosely mentioned it to Jared in our pre-call just kind
1:11:51
of 10 minutes before we met up. It was
1:11:54
basically what is at stake. I
1:11:57
know we talked just loosely here about
1:11:59
failure as an option, and what
1:12:01
is failure, and is it iterative on the
1:12:03
version numbers you just mentioned, but is there
1:12:05
a bigger concern at stake if the
1:12:08
definition that you come up with collectively
1:12:11
is not perfectly suited? Like does the
1:12:13
term open source in software
1:12:15
now, is the term now fractured because
1:12:18
the arbiter of the
1:12:20
term open source has
1:12:22
not been able to carefully and
1:12:24
accurately define open source AI?
1:12:27
Is there a bigger loss that could
1:12:29
happen? And I'm sorry to have to ask that
1:12:31
question, but I have to. Yeah,
1:12:34
you don't want me to sleep tonight, huh? Sorry
1:12:38
about that. I think
1:12:40
so far we've been able to win,
1:12:43
in quotes, win in
1:12:45
the public when we push back
1:12:47
on the term of open source
1:12:49
because it's pretty well accepted, right?
1:12:53
And whether, and I'm going
1:12:55
to say this, but whether we like it or not, OSI
1:12:57
has been the guardian,
1:12:59
so to speak, of that term. Some
1:13:01
say you've taken that right. I
1:13:04
think you've been given that right over decades
1:13:07
of trust. And then in
1:13:09
some cases there's some mistrust, and that's not so
1:13:11
much me. It's just out there and not
1:13:13
everybody's been happy with every decision you come up with
1:13:16
and that's going to be the case, right? If you're
1:13:18
not making some enemies, you're not doing something right,
1:13:20
I suppose, in the world, because not everybody's going to like
1:13:22
your choices, right? But I
1:13:24
think, I wonder that. I
1:13:26
personally wonder if you can't define this well,
1:13:29
does the term open source change or
1:13:32
does it become open to change? There
1:13:35
is that risk, but that's one of
1:13:37
the reasons why I'm being extra
1:13:39
careful to make sure that everyone's
1:13:42
involved and has a voice
1:13:44
and has a chance to voice their opinion
1:13:46
and all of these opinions are recorded
1:13:48
publicly so we can go back
1:13:50
and point out the
1:13:52
place where we made a bad choice and
1:13:55
be able to correct it or not. Yeah.
1:13:59
Stefano, real quick. What's the number one
1:14:01
place people should go if they were to
1:14:03
get involved, like the URL? Here's
1:14:06
how you can be part of that discussion:
1:14:08
discuss.opensource.org. There we go.
1:14:10
Yes, well, we're gonna be having, you know,
1:14:12
all of our conversations there. All right,
1:14:14
you heard it. That'll be in the show notes so
1:14:17
if you are interested in this, even if you just want to
1:14:19
listen and be lurking and watching
1:14:21
as it makes progress, definitely hit that up
1:14:23
if you want your voice heard and you
1:14:25
want to help Stefano and his team make
1:14:28
this definition awesome and encompassing and
1:14:30
successful. Yes, I think the
1:14:32
more voices the better, and the earlier on the better, so
1:14:35
that we can have a great open
1:14:37
source AI definition. Thank you. Thanks
1:14:39
Stefano. Appreciate your time. Thank you so much
1:14:41
Thank you. It's
1:14:46
a big question mark what the future
1:14:49
of the Open Source AI Definition will
1:14:51
be. Well, the first draft of the
1:14:53
Open Source AI Definition is linked in
1:14:55
the show notes. I highly encourage you
1:14:58
to check this out, dig in, learn
1:15:00
about what's happening here, voice your opinion
1:15:02
if you have a strong opinion, but
1:15:04
definitely pay attention, as you can hear
1:15:06
with some of the discomfort with
1:15:09
the questions we asked about what
1:15:11
happens if the open source AI
1:15:13
definition falls a little
1:15:16
short, or what the ramifications
1:15:18
or potential impact might be. I
1:15:20
think we all need to pay
1:15:22
close attention to how this definition
1:15:24
evolves and lands. Links are
1:15:26
in the show notes, so check them out.
1:15:28
And again, thank you to Stefano, because he
1:15:31
did have a cold during
1:15:33
this conversation and he
1:15:35
powered through because he knew this was an important
1:15:37
conversation to have here on this podcast and to
1:15:39
share with you. So thank you, Stefano.
1:15:42
Up next on the pod is our friendly-turned-friend
1:15:45
Jamie Tanna coming up on Friends, and
1:15:48
next week it's about making your shell
1:15:50
magical with Ellie Huxtable, talking about
1:15:52
Atuin. Check it out at atuin.
1:15:55
sh. Okay, once again a
1:15:57
big thank you to our friends and our partners
1:16:00
at fly.io, our friends
1:16:02
at typesense.org, and
1:16:04
of course, our friends at
1:16:06
sentry.io. Use the code
1:16:08
changelog to get $100 off the team
1:16:11
plan. You can do so at sentry.io.
1:16:14
Okay, BMC, those beats are banging. We have
1:16:16
that album out there, Dance Party. I don't
1:16:19
know about you, but I've been dancing a
1:16:21
lot more because that album has been on
1:16:23
repeat on all my places that I
1:16:25
listen to music. So I've been dancing a lot. Dance
1:16:28
Party is out there. Check it out
1:16:30
at changelog.com/beats. That's
1:16:33
it, the show's done. Thank you for tuning in. We'll
1:16:35
see you soon.