Pushing The Limits Of Scalability And User Experience For Data Processing With Jignesh Patel

Released Sunday, 7th January 2024

Summary
Data processing technologies have dramatically improved in their sophistication and raw throughput. Unfortunately, the volumes of data being generated continue to double, requiring further advancements in platform capabilities to keep up. As sophistication increases, so does complexity, creating challenges for user experience. Jignesh Patel has been researching these areas for several years in his work as a professor at Carnegie Mellon University. In this episode he illuminates the landscape of problems we face and how his research aims to help solve them.

Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management.
Data lakes are notoriously complex. For data engineers who battle to build and scale high-quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs, ranging from AI to data applications to complete analytics. Trusted by teams of all sizes, including Comcast and DoorDash, Starburst is a data lake analytics platform that delivers the adaptability and flexibility a lakehouse ecosystem promises. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake, and Hudi, so you always maintain ownership of your data. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst (https://www.dataengineeringpodcast.com/starburst) and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino.
Your host is Tobias Macey, and today I'm interviewing Jignesh Patel about the research he is conducting on technical scalability and user experience improvements around data management.

Interview
Introduction
How did you get involved in the area of data management?
Can you start by summarizing your current areas of research and the motivations behind them?
What are the open questions today in the technical scalability of data engines?
What experimental methods are you using to understand the opportunities and practical limits of those systems?
As you strive to push the limits of technical capacity in data systems, how does that impact the usability of the resulting systems?
When performing research and building prototypes, what is your process for incorporating user experience into the implementation?
What are the main sources of tension between technical scalability and user experience/ease of comprehension?
What are some of the positive synergies that you have been able to realize between your teaching, research, and corporate activities?
In what ways do they produce conflict, whether personally or technically?
What are the most interesting, innovative, or unexpected ways that you have seen your research used?
What are the most interesting, unexpected, or challenging lessons that you have learned while researching the scalability limits of data systems?
What is your heuristic for when a given research project needs to be terminated or productionized?
What do you have planned for the future of your academic research?

Contact Info
Website (https://jigneshpatel.org/)
LinkedIn (https://www.linkedin.com/in/jigneshmpatel/)

Parting Question
From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements
Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ (https://www.pythonpodcast.com) covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast (https://www.themachinelearningpodcast.com) helps you go from idea to production with machine learning.
Visit the site (https://www.dataengineeringpodcast.com) to subscribe to the show, sign up for the mailing list, and read the show notes.
If you've learned something or tried out a project from the show, then tell us about it! Email us with your story.
To help other people find the show, please leave a review on Apple Podcasts (https://podcasts.apple.com/us/podcast/data-engineering-podcast/id1193040557) and tell your friends and co-workers.

Links
Carnegie Mellon University (https://www.cmu.edu/)
Parallel Databases (https://en.wikipedia.org/wiki/Parallel_database)
Genomics (https://en.wikipedia.org/wiki/Genomics)
Proteomics (https://en.wikipedia.org/wiki/Proteomics)
Moore's Law (https://en.wikipedia.org/wiki/Moore%27s_law)
Dennard Scaling (https://en.wikipedia.org/wiki/Dennard_scaling)
Generative AI (https://en.wikipedia.org/wiki/Generative_artificial_intelligence)
Quantum Computing (https://en.wikipedia.org/wiki/Quantum_computing)
Voltron Data (https://voltrondata.com/)
  Podcast Episode (https://www.dataengineeringpodcast.com/voltron-data-apache-arrow-episode-346/)
Von Neumann Architecture (https://en.wikipedia.org/wiki/Von_Neumann_architecture)
Two's Complement (https://en.wikipedia.org/wiki/Two%27s_complement)
OtterTune (https://ottertune.com/)
  Podcast Episode (https://www.dataengineeringpodcast.com/ottertune-database-performance-optimization-episode-197/)
dbt (https://www.getdbt.com/)
Informatica (https://www.informatica.com/)
Mozart Data (https://mozartdata.com/)
  Podcast Episode (https://www.dataengineeringpodcast.com/mozart-data-modern-data-stack-episode-242/)
DataChat (https://datachat.ai/)
Von Neumann Bottleneck (https://www.techopedia.com/definition/14630/von-neumann-bottleneck)

The intro and outro music is from The Hug (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/Love_death_and_a_drunken_monkey/04_-_The_Hug) by The Freak Fandango Orchestra (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/) / CC BY-SA (http://creativecommons.org/licenses/by-sa/3.0/)