Episode Transcript
0:11
Hello, and welcome to the Data Engineering
0:13
Podcast, the show about modern data management. You
0:16
shouldn't have to throw away the database to build
0:18
with fast-changing data. You should be able
0:20
to keep the familiarity of SQL and
0:22
the proven architecture of cloud warehouses, but
0:24
swap the decades-old batch computation model for
0:26
an efficient incremental engine to get complex
0:28
queries that are always up to date.
0:31
With Materialize, you can. It's the only true
0:34
SQL streaming database built from the ground up
0:36
to meet the needs of modern data products.
0:39
Whether it's real-time dashboarding and analytics,
0:41
personalization and segmentation, or automation and
0:43
alerting, Materialize gives you the ability
0:45
to work with fresh, correct, and
0:47
scalable results, all in a familiar
0:50
SQL interface. Go to dataengineeringpodcast.com/materialize
0:52
today to get two
0:54
weeks free. Introducing
0:57
RudderStack Profiles. RudderStack
0:59
Profiles takes the SaaS guesswork and
1:01
SQL grunt work out of building complete
1:03
customer profiles so you can quickly ship
1:06
actionable, enriched data to every downstream team.
1:09
You specify the customer traits, then Profiles
1:11
runs the joins and computations for you
1:14
to create complete customer profiles. Get
1:16
all of the details and try the
1:19
new product today at dataengineeringpodcast.com/rudderstack.
1:23
Your host is Tobias Macey, and today
1:25
I'm going to be sharing an update
1:27
on my own experience of the journey
1:29
of building a data platform from scratch.
1:32
And today I'm in particular going
1:35
to be focusing on the challenges
1:37
of integrating the disparate tools that
1:39
are required to build a comprehensive
1:41
platform and some of
1:44
the complexities around being able
1:46
to maintain a single
1:48
source of truth or a single interface for
1:50
being able to... So
2:00
for the better part of six years now,
2:03
maybe close to seven, I
2:05
have been working in technology for
2:07
over a decade. And I have
2:09
been spending the past year and
2:11
a half coming up on
2:13
maybe two years building a
2:16
data platform from scratch (to get all
2:18
the buzzwords out of the way) using
2:21
a cloud-first data
2:23
lakehouse architecture, focusing
2:25
on dbt for transformation,
2:28
Airbyte for extract and load, Dagster
2:30
for the full integration, and using
2:32
Trino as the query engine for
2:35
data on top of S3. So
2:38
recognizing that that puts me in the
2:41
minority among people who are building
2:43
a data platform, particularly if they have
2:45
small teams, that has put
2:47
me in the position of needing to figure
2:51
out some of the interfaces
2:53
for integration. For
2:56
a lot of people who are just
2:58
getting started with building up the data
3:00
platform, they're probably going to be going
3:03
with the managed platform route or picking
3:05
from a selection of different vendors. The
3:08
canonical one for the so-called modern
3:10
data stack is probably Fivetran, Snowflake,
3:13
and dbt, with
3:15
maybe Looker as your business
3:18
intelligence layer. I'm
3:20
not going to spend a lot of time
3:22
in this episode digging into the motivations for
3:24
why I selected those different components of the
3:26
stack, because I've covered it in other episodes,
3:28
which I will link to in the show
3:30
notes. But it has
3:32
definitely led to a certain
3:34
amount of friction as I try to
3:37
manage some of the different integrations out
3:39
of the box, although the story for
3:41
that particular set of technologies has been
3:44
steadily getting better as time goes by.
3:47
Today, I really want to talk about
3:49
where I am in my journey currently.
3:52
I have a core set of
3:54
functioning capabilities. I can ingest data.
3:57
I can transform it. I can
3:59
query it. But now I'm
4:01
getting to the point of needing to be able to
4:04
onboard more people, provide a
4:06
more seamless user experience, and be
4:09
able to manage some of the
4:11
different means of data sharing or
4:13
data delivery, where maybe not everybody
4:15
who's going to be accessing the
4:17
data lives within the bounds of
4:19
my team or my department. And
4:23
there are definitely data sharing
4:26
capabilities that are part of some
4:28
of the different platforms, most notable
4:30
being probably BigQuery and Snowflake with
4:32
the ways that they manage data
4:35
sharing. But there are
4:37
a number of different ways of approaching that.
4:40
Given that I am in the
4:42
world of building a lakehouse
4:44
architecture, I've got my data in
4:46
S3, I'm using the Iceberg
4:48
table format, so all of the data
4:50
is already representable as a
4:53
table. And
4:55
so then the question is, okay, for
4:57
somebody who just wants to be able
4:59
to access the data, what's the best
5:01
way to deliver it? Do I need
5:03
to provide one-off jobs to generate
5:06
a CSV and send it to them
5:08
via email? Do I need to give
5:10
an S3 bucket to be able to load things from? It
5:13
all depends on what the level of sophistication is of
5:15
the people who are going to be consuming the data.
5:17
Maybe it's just a dashboard format and people just need
5:19
to be able to look at the data.
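To make that first option concrete, here is a minimal sketch of a one-off CSV export against Trino using the trino Python client. The host, catalog, schema, and table names are placeholders rather than details of any real deployment.

```python
import csv

import trino  # pip install trino

# Hypothetical connection details; swap in your own cluster, catalog, and schema.
conn = trino.dbapi.connect(
    host="trino.example.internal",
    port=8080,
    user="data-platform",
    catalog="iceberg",
    schema="analytics",
)

cur = conn.cursor()
# Placeholder query against a hypothetical table.
cur.execute("SELECT * FROM daily_orders WHERE order_date = DATE '2023-11-20'")

with open("daily_orders.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow([col[0] for col in cur.description])  # header row from cursor metadata
    writer.writerows(cur.fetchall())
```

Something like this can be wrapped in a scheduled job, but it only makes sense for consumers who actually want a file handed to them.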
5:23
One of the challenges of that handoff
5:25
is also when you need to
5:27
be able to be
5:29
considerate of how that data is
5:31
going to be used after you
5:33
present it, because maybe
5:35
you want to be able to give a
5:38
visual representation or give a way for somebody to
5:41
access the data, but
5:43
you don't want them to then
5:45
exfiltrate it into another system
5:47
by means of a CSV export, for
5:49
instance. And that's where you
5:51
start getting into questions of governance
5:54
and who has access
5:56
to what, being able to audit data access.
6:00
But harking back to one
6:02
of the episodes I did a while ago
6:04
on the idea of shadow IT in the
6:06
data industry, the
6:08
best way to prevent people from
6:12
taking data out of the context in which
6:14
you want it to be presented and bringing
6:16
it into other tools is
6:18
to reduce any friction
6:22
or pain that they're experiencing accessing
6:24
the data in the way that
6:26
you have presented it to them. Because
6:29
if your option is the best
6:31
option and the most accessible option for
6:34
the people who are viewing
6:36
that data, then they're not going to want to
6:38
bring it out of that system, because you are
6:40
giving them the best experience. And
6:43
so that's where I am right now: figuring
6:45
out what is that best experience for everybody? What
6:48
are the requirements? How do I then manage
6:50
that? And a lot
6:52
of the complexity comes in with
6:54
the elements of interoperability
6:57
and integration as
6:59
you start to add more layers
7:01
and components and capabilities into
7:04
the overall platform. And
7:07
I'm using the term platform deliberately because
7:09
I am aiming for a holistic experience
7:12
for end users versus just a number
7:14
of point solutions where somebody
7:16
can maybe plug something in and they do their
7:18
thing and then somebody else can plug something in
7:20
and do their own thing. I
7:24
want to figure out what is the minimal set
7:26
of interfaces that I
7:28
need to build and support to be able
7:30
to address the widest variety of
7:33
needs while still being extensible for the
7:35
case where somebody has a bespoke requirement
7:37
that I need to be able to
7:40
fulfill. How do I make sure that
7:43
that doesn't add an undue amount of
7:45
maintenance burden on myself and my team
7:48
while still being able to deliver on that
7:50
request? In
7:52
general, the way that that
7:55
degree of interoperability is managed
7:57
is through adoption of open
8:00
standards that everybody has agreed
8:03
upon. So SQL is probably
8:05
the longest-lived one of those
8:07
open standards, although to be fair, there
8:09
are multiple different varieties of it, but
8:11
at its core, SQL is understandable. So
8:13
if you have a means of using
8:16
SQL to query data, that is
8:18
going to make it
8:20
easy for a lot of different tools
8:22
and people to be able to do
8:24
their exploration and self-serve. dbt
8:27
has capitalized on that as
8:29
a means of being able to build their
8:32
products to be able to say, okay, the
8:34
majority of structured data sources are going to
8:36
be addressable by SQL, so we're going to
8:38
build a tool that allows
8:41
people to build a
8:43
full engineering development and delivery flow
8:45
for that SQL data, manage the
8:48
transformations through our tool, and then
8:50
add a lot of niceties around
8:52
it in terms of the lineage
8:54
and data documentation, et cetera. Because
8:57
of the fact that they have
8:59
invested so much in the ease of
9:01
use of the tool as well as
9:04
doing a lot of advocacy in the
9:06
community to drive adoption,
9:08
that has led to a number
9:11
of different other tools
9:13
integrating with them. So they
9:15
have become a de facto
9:17
interface for managing transformations in
9:20
a warehouse or warehouse-like
9:22
context. So there are tools such
9:24
as Preset that
9:26
are building
9:28
integrations into dbt as
9:31
far as the data types
9:33
and the metadata that they generate. There
9:36
are entire products that are built
9:38
on top of the metadata that
9:40
dbt generates. Lightdash is one of
9:44
those.
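As a rough illustration of why that metadata is so easy to build on: every dbt compile or run writes a manifest.json artifact describing models, columns, and dependencies, and downstream tools simply parse it. The sketch below assumes dbt's documented manifest layout (a "nodes" map plus a "parent_map" of lineage edges) and is illustrative rather than a complete integration.

```python
import json

# dbt writes project metadata to target/manifest.json after each compile or run;
# tools like Lightdash read it to discover models and lineage edges.
with open("target/manifest.json") as f:
    manifest = json.load(f)

for unique_id, node in manifest["nodes"].items():
    if node["resource_type"] == "model":
        upstream = manifest["parent_map"].get(unique_id, [])
        print(f"{unique_id} depends on {upstream}")
```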
9:46
So that has given dbt as a
9:48
tool and as a company a
9:50
lot of momentum, a lot
9:52
of inertia. So it makes
9:55
it more difficult for other
9:57
people who maybe are targeting a similar
9:59
type of tool or use case but
10:01
want to add a step
10:03
change to some of the capabilities
10:05
around that tool to be
10:08
able to actually break in and overtake that
10:10
market. So the most notable one that I'm
10:12
aware of so far is SQLMesh, who
10:15
are actually adding a compatibility layer
10:17
with dbt for being able to
10:20
run SQLMesh on top of a dbt
10:22
project where you can actually execute the transformations
10:24
without having to do any code changes, which
10:27
solves for the adoption step
10:30
of "I just want to be able to try out
10:32
this tool." But then there is
10:34
still the challenge that they're going to have
10:36
to overcome of building up
10:38
that set of integrations with
10:40
the broader ecosystem that dbt
10:43
has benefited from, so
10:45
that makes that road to adoption
10:47
and the road to being a
10:50
viable competitor a lot
10:52
longer. Another
10:54
tool that has benefited from this status
10:57
of being a de facto standard and
10:59
a reference implementation is Airflow
11:01
where they have been around for a long
11:04
enough time and they have been adopted by a
11:06
large enough user base that if
11:08
you as a new tool vendor build
11:11
your initial integration with Airflow
11:14
then you have a large enough addressable market
11:16
that it is worth the time of building
11:18
that integration, but it's not
11:21
necessarily worth it to build
11:23
that same integration for Dagster or Prefect
11:26
or one of the other
11:29
orchestration systems that are out there. So as
11:31
you are building
11:34
your own platforms and doing this
11:36
tool selection, it's worth considering: am
11:38
I choosing something that is going
11:40
to be able to benefit from the existing
11:43
weight of the community integrations that
11:45
are available for it? If
11:48
not, then is the value that
11:50
it is providing worth that additional
11:52
complexity of me having to go
11:54
out and build those integrations or
11:56
work with other vendors to build
11:58
those integrations? Developing
12:04
event-driven pipelines is going to be
12:06
a lot easier. Meet Functions. Memphis
12:08
Functions enable developers and data engineers
12:11
to build an organizational toolbox of
12:13
functions to process, transform, and enrich
12:15
ingested events on the fly, in
12:17
a serverless manner using AWS Lambda
12:20
syntax, without boilerplate. It includes
12:22
orchestration, error handling, and infrastructure
12:24
in almost any language, including
12:26
Go, Python, JavaScript, .NET, Java,
12:28
SQL, and more. Go
12:31
to dataengineeringpodcast.com/memphis today to
12:33
get started. And
12:37
then another element of
12:39
that integration question is the
12:43
challenge that a lot of the
12:45
different tools in the stack will
12:47
want to be the single source
12:49
of truth for certain aspects of
12:51
your platform. One of
12:53
the ones that I've been dealing with
12:55
recently is the question of access control
12:57
and role definition and security and auditability
13:00
of the data. So because
13:03
I am building with a
13:05
disaggregated database engine where the
13:08
storage and the query access
13:10
are separated, that gives two
13:12
different locations where those
13:15
definitions can be stored. So
13:18
one camp will say, well, because
13:20
of the fact that the storage layer is
13:23
the lowest level and will have
13:25
potentially multiple different points of access
13:27
where maybe I'm using Trino to
13:29
query it via SQL, but I
13:31
may also just be accessing
13:34
the data in S3 directly using
13:36
some Python tool, or maybe I'm
13:38
using something like Spark or Flink
13:40
to do other processing approaches to
13:42
it beyond just SQL. So
13:44
all of that access information needs to
13:46
live in the storage layer. So that's
13:48
the approach that companies like Tabular are
13:50
taking with the Iceberg table format, where
13:54
as long as you have the storage layer secured, then
13:56
it doesn't matter what all the other layers on top
13:58
of it might have to say about
14:00
access control, because it's going to be enforced at the
14:02
table level. And then
14:05
you go the next level up,
14:07
engines like Trino or the Galaxy
14:09
platform from Starburst say we want
14:11
to own the access control because
14:14
we are going to
14:16
have more visibility into the specifics of
14:18
the queries and we can add things
14:20
like row-level filtering in the role definitions.
14:23
So we should be the ones to own that role-based
14:26
access control information or attribute-based
14:29
access control and then
14:32
that's definitely another
14:34
viable option but then you go
14:36
another layer up the stack into
14:38
business intelligence and maybe
14:41
there's even more granularity available because
14:43
you can say, oh well, this
14:45
person has access to this
14:47
particular data set with this row-level
14:50
filtering and they can view these visualizations
14:52
on that data, but maybe they can't
14:54
write their own queries against that data.
14:57
So there's that challenge of, okay,
15:00
well, do I have to define
15:02
the roles in three different places? Do
15:05
I have to have slightly different roles across all
15:07
those three places? How do I align them to
15:09
be able to say well this user in the
15:11
business intelligence is the same as this user in
15:14
the query engine is the same as this user
15:16
in the storage layer and make sure that they
15:18
are getting a cohesive experience across those boundaries? So
15:22
particularly when you have
15:24
tool systems that maybe
15:26
don't want to, or maybe
15:29
are disincentivized to, do
15:31
that propagation of role information. So,
15:33
for instance, if I use Tabular
15:35
and I say I'm going to
15:38
define a role that will grant read access
15:40
to this subset of tables
15:43
but no other visibility into the rest of
15:45
the data set, how
15:48
then do I reflect that information to the
15:50
query layer of who that user is to
15:53
be able to enforce those permissions
15:55
and then all the way up to the business
15:57
intelligence layer to say, from the
15:59
storage layer, these are the permissions that
16:01
you have in the UI. So that
16:04
brings in the need to have some
16:06
manner of single sign-on and
16:08
single source of identity for people
16:10
across all of those boundaries. Beyond
16:13
the question of permissions, there's another set
16:15
of information that is disjoint and
16:19
wants to be owned by different components
16:21
within the stack. And that is the
16:23
question of data flow, data processing,
16:26
data lineage, where
16:29
each tool maybe has a certain
16:31
view of what the lineage graph
16:33
looks like or what the processing
16:35
looks like. But
16:38
you don't necessarily have the complete
16:40
end-to-end view of a piece of
16:42
data from maybe where it
16:44
lands in an application database all the way
16:47
through to where it's being
16:49
presented in a business intelligence dashboard
16:51
or incorporated as a feature in a
16:53
machine learning model training workflow. So
16:56
for instance, dbt has the table
16:58
level lineage of the transformations that
17:01
it's providing. Tools like
17:03
Airflow and Dagster have the view of
17:05
the lineage of all of the tasks
17:07
that they are responsible for executing, but
17:10
they don't necessarily have information
17:13
about out-of-band transformations that
17:15
are being done by some analysts who
17:17
are running ad hoc queries in
17:19
the data warehouse. Or they maybe
17:22
don't have visibility into the
17:25
data as it's landing in an
17:27
application database and they only see the data
17:29
once it lands in the warehouse, or maybe
17:31
they can see the data
17:34
integration step with an Airbyte or Fivetran to
17:36
say, okay, this
17:38
table in the application database feeds into this table
17:40
in the warehouse and then these dbt flows happen,
17:43
but maybe it loses sight of the
17:45
data dashboards that are being generated. And
17:49
so again, this is the question of open
17:52
protocols, interoperability. So tools like OpenLineage
17:55
are designed to help address that,
17:57
where you fire events that
18:00
can then be constituted into a
18:02
more cohesive lineage graph.
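As a sketch of what "firing events" looks like in practice with the openlineage-python client: a job reports its run, inputs, and outputs, and the collector assembles those reports into a graph. The backend URL, namespaces, and dataset names below are made up, and the exact import paths can vary between client versions.

```python
from datetime import datetime, timezone
from uuid import uuid4

from openlineage.client import OpenLineageClient
from openlineage.client.run import Dataset, Job, Run, RunEvent, RunState

# Point the client at whatever collects the events (Marquez, a metadata
# platform, etc.); the URL here is a placeholder.
client = OpenLineageClient(url="http://localhost:5000")

client.emit(
    RunEvent(
        eventType=RunState.COMPLETE,
        eventTime=datetime.now(timezone.utc).isoformat(),
        run=Run(runId=str(uuid4())),
        job=Job(namespace="data-platform", name="load_orders"),
        producer="https://example.com/my-ingestion-job",
        inputs=[Dataset(namespace="postgres://app-db", name="public.orders")],
        outputs=[Dataset(namespace="s3://lake", name="analytics.orders")],
    )
)
```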
18:05
And then you also have systems such
18:07
as metadata platforms that are designed
18:09
to be more
18:11
holistic views of the entire
18:14
ecosystem and incorporate things like
18:16
data discovery, data governance, which
18:18
gives you a single place to be able
18:21
to view all that information. But then you're
18:23
back to that question of integration of, OK,
18:25
well, this can store and convey all of
18:27
this information, but how do
18:29
I get all this information into it? So
18:31
each of those tools are going to have
18:33
different means of being able to
18:35
push or pull data, but
18:38
you have to make sure as the platform designer
18:40
and operator that
18:43
those data flows are also happening. So it's
18:46
an additional set of tasks that you need
18:48
to make sure are running, you need to
18:50
make sure they're reliable. So
18:52
it's useful, but it's also an
18:54
additional burden. So
18:57
these are all things that I've been dealing
18:59
with recently. And then in
19:02
the metadata catalog situation, even if you
19:04
do manage to feed all of your
19:06
data into that, it
19:09
is useful as a means of discovery
19:12
or a means of being able to keep
19:16
tabs on what's happening. But
19:18
then it also feeds back into, OK, well,
19:20
if I want to use this as my
19:22
single source of truth, how
19:24
then do I propagate that truth back into other
19:26
systems? And that's where you start to get into
19:28
questions of things like active metadata. And
19:31
then you have another set of integrations
19:33
and another direction of integrations
19:35
where if I say, OK, I have
19:37
my metadata catalog, this is my source
19:39
of truth for role information and who
19:41
can access what. Now I need to
19:43
be able to push that back down
19:45
into the storage layer and into the
19:47
query engine and into the business intelligence
19:49
dashboard. And I need to
19:51
make sure that all of those integrations are reliable
19:54
and that there are appropriate
19:56
mappings between the different concepts throughout
19:58
the different systems. Data
20:04
lakes are notoriously complex. For
20:06
data engineers who battle to build
20:08
and scale high-quality data workflows on
20:10
the data lake, Starburst powers petabyte-scale
20:12
SQL analytics fast at a fraction
20:14
of the cost of traditional methods,
20:16
so that you can meet all
20:18
of your data needs, ranging from
20:20
AI to data applications to complete
20:22
analytics. Trusted by teams of all
20:24
sizes, including Comcast and DoorDash, Starburst
20:26
is a data lake analytics platform
20:28
that delivers the adaptability and flexibility
20:30
a lakehouse ecosystem promises. And
20:33
Starburst does all of this on an
20:35
open architecture, with first-class support for Apache
20:38
Iceberg, Delta Lake, and Hudi, so you
20:40
always maintain ownership of your data. Want
20:43
to see Starburst in action? Go
20:46
to dataengineeringpodcast.com/Starburst and get $500 in
20:48
credits to try Starburst Galaxy today,
20:50
the easiest and fastest way to
20:53
get started using Trino. So
20:58
this is definitely one of the benefits
21:00
that fully vertically integrated platforms have:
21:02
you don't have to fight with all
21:04
of those different layers of integration. But
21:08
the problem there is that you have
21:10
to rely on the integrations that have
21:12
been built, and maybe you are constrained
21:14
by what that vertically integrated platform offers.
21:17
You don't necessarily have the means of
21:19
being able to extend it into areas
21:22
that the platform developers
21:25
haven't had the time to implement
21:27
or haven't been exposed to as
21:30
a necessity. So
21:33
it's the continual
21:37
story of software
21:40
design, software development, of how do
21:42
we build these systems that are
21:44
extensible and can be integrated, but
21:46
also make sure that the product
21:49
that you're building doesn't just
21:51
get crushed under the weight of
21:53
having to maintain all of these
21:55
different point solutions. I think that
21:58
we're definitely iterating towards
22:00
these community standards. I think that tools
22:02
like OpenLineage, I think that the
22:05
work that the OpenMetadata folks are
22:07
doing with their schema-first approach to
22:09
metadata is interesting. I
22:11
think that the work that dbt has done to
22:13
become that de facto standard for how your
22:15
transformations are represented, so other tools can
22:18
build on top of that, is
22:20
all valuable. So it's great to see the
22:22
direction that the community has taken on
22:24
all of these fronts. But I do
22:27
think that we are definitely still not to
22:30
the point where we have
22:32
a lot of the answers fully baked. I
22:34
think that everybody
22:36
who is investing in this ecosystem, everybody who
22:38
is building these tools and using these tools
22:41
and giving feedback to the tool vendors is
22:43
helping to bring us to a better place. But
22:47
as somebody who is trying to integrate
22:50
so many different pieces and trying to figure
22:52
out what does that holistic platform look like,
22:55
how do I build it in a way
22:57
that I can maintain it with a small
23:00
team, but also be able to have
23:02
the flexibility required to address a wide set
23:04
of audiences, it's definitely
23:07
still a challenge, but one
23:09
that I've been enjoying having
23:11
the opportunity to explore and
23:13
invest in. So now
23:15
I have been iterating on these challenges and
23:17
thinking through how best to build that holistic
23:21
platform and something that is going
23:23
to be enjoyable and usable
23:25
by a number of different people. The next
23:28
main architectural component that I've
23:30
been starting to work towards
23:32
is that metadata platform so that I
23:34
can have that more cross-cutting view, so
23:37
that I can improve the data discovery story
23:39
for people who aren't part of the engineering
23:41
team, who just want to see what data
23:43
do you have, where is it, how do
23:46
I access it. So that's where I'm going
23:48
to be doing some of my next work
23:50
of picking the metadata platform, getting it
23:52
integrated, getting all of the data flows
23:54
propagated to that tool so that we
23:56
can see how everything
23:58
is flowing, and then being
24:00
able to start integrating
24:03
single sign-on as a means of identity
24:05
management across the different layers and then
24:07
being able to say, okay, you
24:10
came through the metadata platform to do data
24:12
discovery. Now you say, okay, here's the data
24:14
set that I want to explore
24:17
or here is the chart that I
24:19
want to view, then being able to
24:21
have a simple means of
24:23
clicking a button and jumping them into
24:25
the experience that they're requesting, for instance,
24:27
or being able to say, okay, I
24:29
need to query this database and
24:31
then giving them the pre-filled set of
24:34
credentials or the pre-filled client connection to
24:36
be able to run those
24:38
queries with their tool of choice, whether
24:40
that be something like a Tableau or
24:42
a SQL command line, etc. And
24:45
as I have been doing
24:48
this work, the most interesting
24:50
or innovative or unexpected aspects
24:53
have definitely been these integration
24:55
boundaries of saying, okay, this is the tool that I
24:57
have, this is the other tool that I have, I
24:59
would like to be able to use them together in
25:02
this manner, now how do I go about doing that?
25:04
So I am
25:06
definitely happy that the Python
25:10
language has become one of
25:12
the most widely adopted ecosystems
25:15
for data engineering because it does simplify
25:18
some of that work where you
25:20
can pretty easily assume that there's going to be
25:22
a Python library that does at least 80% of
25:25
what you're trying to do. So, for
25:28
instance, the OpenMetadata platform: they have
25:30
a Python library for doing
25:32
that metadata ingestion. So even if they
25:34
don't have an out of the box
25:36
solution for ingesting metadata from the tool
25:39
that you're using, they do have
25:41
a Python client through which you can provide
25:43
metadata to them in order
25:46
to be able to propagate information from a
25:48
system that they haven't already addressed.
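The snippet below is deliberately not the actual OpenMetadata SDK; it is a generic sketch, with a hypothetical endpoint and payload, of the pattern that kind of client enables: when no built-in connector exists, you describe the asset yourself and push it into the catalog.

```python
import requests

# Hypothetical catalog endpoint and payload shape, purely to illustrate
# pushing metadata from a source the catalog has no connector for.
CATALOG_API = "https://metadata.example.internal/api/v1/tables"
API_TOKEN = "replace-me"

table_metadata = {
    "name": "analytics.orders",
    "description": "Orders table loaded from the application database",
    "columns": [
        {"name": "order_id", "dataType": "BIGINT"},
        {"name": "order_date", "dataType": "DATE"},
        {"name": "total_amount", "dataType": "DECIMAL"},
    ],
    "owner": "data-platform-team",
}

response = requests.put(
    CATALOG_API,
    json=table_metadata,
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    timeout=10,
)
response.raise_for_status()
```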
25:50
So I do think that that is another
25:54
good trend in the ecosystem of providing
25:56
a good set of software clients
25:58
to be able to integrate with
26:01
their tool, even if it's from a
26:04
system that they themselves didn't already plan
26:06
to integrate with. So while it does
26:08
add a bit of extra burden to
26:10
the people who are trying to use
26:13
those systems, there is at least a
26:15
path to success for it. I'd
26:17
also say that that's probably the most
26:19
challenging lesson that I've learned as well,
26:21
of figuring out what are the points
26:24
of integration that are worth investing in, and
26:26
what are the ones that I
26:29
don't have to invest in right now, and maybe
26:31
I can wait to see if the ecosystem around
26:33
that tool grows up. So SQLMesh is a
26:36
tool that I've been keeping an eye on. When
26:39
I first came across it, it didn't have
26:41
support for Trino, so that was an obvious
26:43
no for being able to use it. Now
26:45
they have it, but they don't yet have
26:47
an integration with Dagster. And
26:49
then also, as I was pointing to earlier,
26:51
the existing weight of integration with dbt gives me
26:53
a bit of pause of, okay, even if I
26:56
do get it working with Trino and Dagster, what
26:59
about all the other pre-built
27:01
integrations that I am not
27:03
going to be able to take advantage
27:05
of because I have decided to use
27:07
a newer tool that hasn't been as
27:10
widely adopted and integrated. So definitely
27:12
a challenging aspect of that as well. If
27:15
you have found my musings useful
27:17
or informative, or if they have
27:19
inspired something that you would like
27:21
to discuss further, I'm definitely always
27:23
happy to take suggestions or
27:25
bring people on the show or
27:29
take some feedback. So for anybody who wants to
27:31
get in touch with me, I'll add my contact
27:33
information to the show notes, but the best way
27:35
is on dataengineeringpodcast.com.
27:38
That has links to how you can find me. And
27:42
for the final question of what I see
27:44
as being the biggest gap in the tooling
27:46
or technology for data management today, I
27:49
definitely think it's that single source
27:51
of truth for identity and
27:53
access across the data stack of
27:55
being able to figure out how
27:57
do I manage permissions
28:00
and roles across a variety
28:02
of tools without having to
28:05
build different integrations every
28:07
single time. So
28:10
I definitely look forward to seeing more
28:12
investment in that, maybe even using
28:14
something like Open Policy Agent from
28:16
the cloud-native ecosystem.
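To sketch what that could look like: each layer (storage, query engine, business intelligence) could ask a shared OPA server for decisions instead of keeping its own copy of the rules. The policy package path and the input fields below are hypothetical, but the /v1/data endpoint with an {"input": ...} request and a {"result": ...} response follows OPA's standard REST data API.

```python
import requests

# Ask a running OPA server whether a user may read a table.
# The package path "data_platform/authz/allow" and the input fields are
# made up for illustration; /v1/data/<path> is OPA's data API.
decision = requests.post(
    "http://localhost:8181/v1/data/data_platform/authz/allow",
    json={
        "input": {
            "user": "analyst@example.com",
            "action": "read",
            "resource": "iceberg.analytics.orders",
        }
    },
    timeout=5,
).json()

if decision.get("result") is True:
    print("access allowed")
else:
    print("access denied")
```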
28:20
So I'm definitely happy to continue investigating
28:22
that as well. So thank you for taking the
28:24
time to listen to me. I hope
28:26
it has been, if not
28:28
informative, at least entertaining. I'm very
28:30
thankful for being able to
28:33
run this show and bring
28:35
all of these ideas to everybody
28:37
who listens. So in
28:39
honor of this being the Thanksgiving week,
28:41
I just wanted to share that gratitude
28:43
to everybody who takes the time out
28:46
of their life to pay
28:48
attention to myself and the people I bring
28:50
on the show. So thank you again,
28:52
and I hope you all have a good rest of your day.
29:23
I hope you enjoyed this show. If you enjoyed
29:25
this show, please like, share, and subscribe. If
29:30
you want to know more about the show, please leave a
29:32
review on Apple Podcasts.