Dask: Scalable Python with Matthew Rocklin

Released Monday, 27th April 2020

Python is the most widely used language for data science, and Python data scientists commonly rely on several libraries, including NumPy, pandas, and scikit-learn. These libraries improve the experience of Python data scientists by giving them access to high-level APIs.

Data science is often performed over huge datasets, and the data structures instantiated from those datasets need to be spread across multiple machines. To manage large distributed datasets, a library such as scikit-learn can use a system called Dask. Dask provides distributed data structures such as the Dask dataframe and the Dask array.
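To make the idea of spreading a data structure across workers concrete, here is a minimal sketch using only the Python standard library. It does not use Dask's API. The function names (`split_into_chunks`, `partial_sum_and_count`, `chunked_mean`) and the sample data are hypothetical, and threads stand in for the separate machines a distributed system would use. Each chunk is reduced to a small summary on its own, and the summaries are then combined into the final answer.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(values, chunk_size):
    """Split a list into consecutive chunks of at most chunk_size items."""
    return [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]


def partial_sum_and_count(chunk):
    """Reduce one chunk to a small summary that is cheap to combine."""
    return sum(chunk), len(chunk)


def chunked_mean(values, chunk_size=4, max_workers=2):
    """Compute the mean by reducing each chunk separately, then combining."""
    chunks = split_into_chunks(values, chunk_size)
    # Assumption: threads stand in for the machines of a real cluster.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(partial_sum_and_count, chunks))
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


data = list(range(1, 11))  # hypothetical dataset: the integers 1 through 10
result = chunked_mean(data)
print(result)
```

The point of the design is that no single worker ever needs the whole dataset: only the per-chunk summaries travel back to be combined.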

Matthew Rocklin is the creator of Dask. He joins the show to talk about distributed computing with Dask, its use cases, and the Python ecosystem. He also provides a detailed comparison between Dask and Spark, another system used for distributed data science.

Sponsorship inquiries: sponsor@softwareengineeringdaily.com

The post Dask: Scalable Python with Matthew Rocklin appeared first on Software Engineering Daily.
