Great Expectations: Data Pipeline Testing with Abe Gong

Released Monday, 17th February 2020

A data pipeline is a series of steps that takes large data sets and creates usable results from them. At the beginning of a data pipeline, a data set might be pulled from a database, a distributed file system, or a Kafka topic. Throughout a data pipeline, different data sets are joined, filtered, and statistically analyzed.

At the end of a data pipeline, data might be put into a data warehouse or Apache Spark for ad-hoc analysis and data science. At this point, the end user of the data set expects that data to be clean and accurate. But what guarantees do we have that it is correct?

Abe Gong is the creator of Great Expectations, a system for data pipeline testing. In Great Expectations, the developer creates tests called “expectations”, which verify certain characteristics of the data set at different phases in a data pipeline. This helps ensure that the end result of a multi-stage data pipeline is correct.
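
To make the idea of checking data at each phase concrete, here is a minimal sketch in plain Python. It does not use the Great Expectations library or its API; the helper names (`expect_not_null`, `expect_between`, `check_stage`, `run_pipeline`) and the `user_id` and `age` columns are hypothetical, chosen only to illustrate declaring expectations and verifying them between pipeline stages.

```python
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]
# An expectation is a human-readable label plus a check over all rows.
Expectation = Tuple[str, Callable[[List[Row]], bool]]


def expect_not_null(column: str) -> Expectation:
    """Hypothetical helper: every row must have a non-None value in `column`."""
    return (
        f"{column} is not null",
        lambda rows: all(r.get(column) is not None for r in rows),
    )


def expect_between(column: str, low: float, high: float) -> Expectation:
    """Hypothetical helper: every value in `column` must lie in [low, high]."""
    return (
        f"{column} in [{low}, {high}]",
        lambda rows: all(low <= r[column] <= high for r in rows),
    )


def check_stage(stage: str, rows: List[Row], expectations: List[Expectation]) -> List[Row]:
    """Run all expectations for one stage; raise if any of them fails."""
    failed = [label for label, check in expectations if not check(rows)]
    if failed:
        raise ValueError(f"stage '{stage}' failed: {', '.join(failed)}")
    return rows


def run_pipeline(raw: List[Row]) -> List[Row]:
    """Two-stage toy pipeline: load, then filter, with checks after each stage."""
    loaded = check_stage("load", raw, [expect_not_null("user_id")])
    # Keep only rows with a known age of at least 18.
    adults = [r for r in loaded if r["age"] is not None and r["age"] >= 18]
    return check_stage("filter", adults, [expect_between("age", 18, 120)])
```

Calling `run_pipeline` on rows that satisfy every expectation returns the filtered rows; a row with a missing `user_id` makes the "load" stage raise before any later stage runs, so the bad data never reaches the end of the pipeline.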

Abe joins the show to discuss the architecture of a data pipeline and the use cases of Great Expectations.

Sponsorship inquiries: sponsor@softwareengineeringdaily.com

The post Great Expectations: Data Pipeline Testing with Abe Gong appeared first on Software Engineering Daily.
