With all of the messaging about treating data as a product, it is becoming difficult to know what that even means. Vishal Singh is the head of products at Starburst, which means that he spends all of his time thinking and talking about the details of product thinking and its application to data. In this episode he shares his thoughts on the strategic and tactical elements of moving your work as a data professional from being task-oriented to being product-oriented, and on the long-term improvements in productivity that this shift provides.
Can you describe what your definition of a "data product" is?
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
Sponsored By:
RudderStack: RudderStack provides all your customer data pipelines in one platform. You can collect, transform, and route data across your entire stack with its event streaming, ETL, and reverse ETL pipelines. RudderStack’s warehouse-first approach means it does not store sensitive information, and it allows you to leverage your existing data warehouse/data lake infrastructure to build a single source of truth for every team. RudderStack also supports real-time use cases. You can implement RudderStack SDKs once, then automatically send events to your warehouse and 150+ business tools, and you’ll never have to worry about API changes again. Visit [dataengineeringpodcast.com/rudderstack](https://www.dataengineeringpodcast.com/rudderstack) to sign up for free today, and snag a free T-Shirt just for being a Data Engineering Podcast listener.
Upsolver: Build Real-Time Pipelines. Not Endless DAGs! Creating real-time ETL pipelines is extremely time-consuming and engineering-intensive. Why? Because when we attempt to shoehorn a 30-year-old batch process into a real-time pipeline, we create an orchestration hell that makes every pipeline a data engineering project. Every pipeline is composed of transformation logic (the what) and orchestration (the how). If you run daily batches, orchestration is simple and there’s plenty of time to recover from failures. However, real-time pipelines with per-hour or per-minute batches make orchestration intricate, and data engineers find themselves burdened with building Directed Acyclic Graphs (DAGs) in tools like Apache Airflow, with tens to hundreds of steps intended to address all success and failure modes, handle task dependencies, and maintain temporary data copies. Ori Rafael, CEO and co-founder of Upsolver, will unpack this problem that bottlenecks real-time analytics delivery and describe a new approach that completely eliminates the need for orchestration, so you can remove Airflow from your development critical path and deliver reliable production pipelines quickly. Go to [dataengineeringpodcast.com/upsolver](https://www.dataengineeringpodcast.com/upsolver) to start your 30-day trial with unlimited data, and see for yourself how to avoid DAG hell.
Datafold: Datafold helps you deal with data quality in your pull requests. It provides automated regression testing throughout your schema and pipelines so you can address quality issues before they affect production. No more shipping and praying; you can now know exactly what will change in your database ahead of time. Datafold integrates with all major data warehouses as well as frameworks such as Airflow & dbt and seamlessly plugs into CI, so in a few minutes you can get from zero to automated testing of your analytical code. Visit our site at [dataengineeringpodcast.com/datafold](https://www.dataengineeringpodcast.com/datafold) today to book a demo with Datafold.
Linode: Your data platform needs to be scalable, fault tolerant, and performant, which means that you need the same from your cloud provider. Linode has been powering production systems for over 17 years, and now they’ve launched a fully managed Kubernetes platform. With the combined power of the Kubernetes engine for flexible and scalable deployments, and features like dedicated CPU instances, GPU instances, and object storage, you’ve got everything you need to build a bulletproof data pipeline. If you go to [dataengineeringpodcast.com/linode](https://www.dataengineeringpodcast.com/linode) today, you’ll even get a $100 credit to use on building your own cluster, or object storage, or reliable backups, or… And while you’re there, don’t forget to thank them for being a long-time supporter of the Data Engineering Podcast!