Please take the Dask User Survey for 2019. Your response helps to prioritize future work.

We are pleased to announce the release of Dask version 2.0. This is a major release with bug fixes and new features.

Most major version changes of software signal many new and exciting features. That is not the case with this release. Instead, we’re bumping the major version number because we’ve broken a few APIs to improve maintainability, and because we decided to drop support for Python 2.

This blogpost outlines these changes.

Install

As always, you can conda install Dask:

conda install dask

or pip install from PyPI:

pip install "dask[complete]" --upgrade

Full changelogs are available here:

Drop support for Python 2

Python 2 reaches end of life in 2020, just six months away. Most major PyData projects are dropping Python 2 support around now. See the Python 3 Statement for more details about some of your favorite projects.

Python 2 users can continue to use older versions of Dask, which are in widespread use today. Institutions looking for long term support of Dask in Python 2 may wish to reach out to for-profit consulting companies, like Quansight.

Dropping Python 2 will allow maintainers to spend more of their time fixing bugs and developing new features. It will also allow the project to adopt more modern development practices going forward.

Small breaking changes

We now include a list with a brief description of most of the breaking changes:

The distributed.bokeh module has moved to distributed.dashboard

Various ncores keywords have been moved to nthreads

Client.map/gather/scatter no longer accept iterators and Python queue objects. Users can handle this themselves with submit / as_completed or can use the Streamz library.

The worker /main route has moved to /status

Cluster.workers is now a dictionary mapping worker name to worker, rather than a list as it was before
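
For code that used to pass an iterator to Client.map, the submit / as_completed route mentioned above might look roughly like the sketch below. The helper names (square, squares_from_iterator, main) are hypothetical and only for illustration, and an in-process client is assumed so the example runs on a single machine.

```python
from dask.distributed import Client, as_completed


def square(x):
    """Toy task standing in for real work."""
    return x ** 2


def squares_from_iterator(client, values):
    """Submit one task per item of an arbitrary iterator and gather results.

    Hypothetical helper: it consumes the iterator itself, instead of
    handing the iterator to Client.map.
    """
    futures = [client.submit(square, value) for value in values]
    results = []
    # as_completed yields futures in the order in which they finish
    for future in as_completed(futures):
        results.append(future.result())
    return sorted(results)


def main():
    # processes=False keeps the scheduler and worker in this process
    with Client(processes=False) as client:
        print(squares_from_iterator(client, iter(range(5))))


if __name__ == "__main__":
    main()
```

Because results arrive in completion order, the sketch sorts them before returning; code that needs them in submission order can call result() on the futures list directly.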

Some larger fun changes

We didn’t only break things. We also added some new things :)

Array metadata

Previously Dask Arrays were defined by their shape, chunkshape, and datatype, like float, int, and so on.

Now, Dask Arrays also know the type of their chunks. Historically this was almost always a NumPy array, so it didn’t make sense to store, but now that Dask Arrays are being used more frequently with sparse array chunks and GPU array chunks we now maintain this information as well in a ._meta attribute. This is already how Dask dataframes work, so it should be familiar to advanced users of that module.

>>> import dask.array as da
>>> x = da.eye(1000000)
>>> x._meta
array([], shape=(0, 0), dtype=float64)

>>> import sparse
>>> s = x.map_blocks(sparse.COO.from_numpy)
>>> s._meta
<COO: shape=(0, 0), dtype=float64, nnz=0, fill_value=0.0>

This work was largely done by Peter Entschev.
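
As a small illustration of how the ._meta attribute described above can be used, the sketch below inspects the chunk type of an array. The chunk_type helper is hypothetical, and since ._meta is a private attribute, code that relies on it may need adjusting in later releases.

```python
import numpy as np
import dask.array as da


def chunk_type(arr):
    """Return the type of the chunks backing a Dask Array.

    Hypothetical helper that relies on the private ``_meta`` attribute,
    an empty array of the same kind as the real chunks.
    """
    return type(arr._meta)


# A small NumPy-backed array, split into four 2x2 chunks
x = da.ones((4, 4), chunks=(2, 2))

print(chunk_type(x))
print(x._meta.shape, x._meta.dtype)
```

Checking the chunk type this way does not trigger any computation, because the metadata is stored alongside the task graph.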

Array HTML output

Dask arrays now print themselves nicely in Jupyter notebooks, showing a table of information about their size and chunk size, and also a visual diagram of their chunk structure.

import dask.array as da
x = da.ones((10000, 1000, 1000))

        Array                  Chunk
Bytes   80.00 GB               125.00 MB
Shape   (10000, 1000, 1000)    (250, 250, 250)
Count   640 Tasks              640 Chunks
Type    float64                numpy.ndarray

[Diagram: chunk structure of the array, with axes labeled 10000, 1000, and 1000]
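
The numbers in this summary can be checked by hand. The sketch below redoes the arithmetic in plain Python, taking the array shape and the (250, 250, 250) chunk shape from the output above and assuming 8 bytes per float64 element. The function names are hypothetical and not part of Dask.

```python
from functools import reduce
from operator import mul


def product(values):
    """Multiply all values together (1 for an empty sequence)."""
    return reduce(mul, values, 1)


def array_summary(shape, chunks, itemsize):
    """Compute total bytes, bytes per chunk, and number of chunks.

    Hypothetical helper; assumes one uniform chunk shape for the array.
    """
    # Chunks along each axis, rounding up to cover a ragged final chunk
    chunks_per_axis = [-(-s // c) for s, c in zip(shape, chunks)]
    return {
        "array_bytes": product(shape) * itemsize,
        "chunk_bytes": product(chunks) * itemsize,
        "n_chunks": product(chunks_per_axis),
    }


summary = array_summary((10000, 1000, 1000), (250, 250, 250), itemsize=8)
print(summary)
```

With 40 chunks along the first axis and 4 along each of the other two, this gives the 640 chunks shown in the table.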

Proxy Worker dashboards from the Scheduler dashboard

If you’ve used Dask.distributed then you’re probably familiar with Dask’s scheduler dashboard, which shows the state of computations on the cluster with a real-time interactive Bokeh dashboard. However, you may not be aware that Dask workers also have their own dashboard, which shows a completely separate set of plots for the state of that individual worker.

Historically these worker dashboards haven’t been as commonly used because it’s hard to connect to them. Users don’t know their address, or network rules don’t enable direct web connections. Fortunately, the scheduler dashboard is now able to proxy a connection from the user to the worker dashboard.

You can access this by clicking on the “Info” tab and then selecting the “dashboard” link next to any of the workers. You will also need to install jupyter-server-proxy:

pip install jupyter-server-proxy

Thanks to Ben Zaitlen for this fun addition. We hope that now that these plots are made more visible, people will invest more into developing plots for them.

Black everywhere

We now use the Black code formatter throughout most Dask repositories. These repositories include pre-commit hooks, which we recommend when developing on the project.

cd /path/to/dask
git checkout master
git pull upstream master
pip install pre-commit
pre-commit install

Git will then call black and flake8 whenever you attempt to commit code.

Dask Gateway

We would also like to point readers to the relatively new Dask Gateway project, which enables institutions and IT departments to control many Dask clusters for a variety of users.

Acknowledgements

There have been several releases since the last time we had a release blogpost. The following people contributed to the following repositories since the 1.1.0 release on January 23rd:

dask/dask (Rick) Richard J Zamora Abhinav Ralhan Adam Beberg Alistair Miles Álvaro Abella Bascarán Anderson Banihirwe Aploium Bart Broere Benjamin Zaitlen Bouwe Andela Brett Naul Brian Chu Bruce Merry Christian Hudon Cody Johnson Dan O’Donovan Daniel Saxton Daniel Severo Danilo Horta Dimplexion Elliott Sales de Andrade Endre Mark Borza Genevieve Buckley George Sakkis Guillaume Lemaitre HSR05 Hameer Abbasi Henrique Ribeiro Henry Pinkard Hugo Ian Bolliger Ian Rose Isaiah Norton James Bourbeau Janne Vuorela John Kirkham Jim Crist Joe Corbett Jorge Pessoa Julia Signell JulianWgs Justin Poehnelt Justin Waugh Ksenia Bobrova Lijo Jose Marco Neumann Mark Bell Martin Durant Matthew Rocklin Michael Eaton Michał Jastrzębski Nathan Matare Nick Becker Paweł Kordek Peter Andreas Entschev Philipp Rudiger Philipp S. Sommer Roma Sokolov Ross Petchler Scott Sievert Shyam Saladi Søren Fuglede Jørgensen Thomas Zilio Tom Augspurger Yu Feng aaronfowles amerkel2 asmith26 btw08 gregrf mbarkhau mcsoini severo tpanza

dask/distributed Adam Beberg Benjamin Zaitlen Brett Jurman Brett Randall Brian Chu Caleb Chris White Daniel Farrell Elliott Sales de Andrade George Sakkis James Bourbeau Jim Crist John Kirkham K.-Michael Aye Loïc Estève Magnus Nord Manuel Garrido Marco Neumann Martin Durant Mathieu Dugré Matt Nicolls Matthew Rocklin Michael Delgado Michael Spiegel Muammar El Khatib Nikos Tsaousis Olivier Grisel Peter Andreas Entschev Sam Grayson Scott Sievert Tom Augspurger Torsten Wörtwein amerkel2 condoratberlin deepthirajagopalan7 jukent plbertrand

dask/dask-ml Alejandro Florian Rohrer James Bourbeau Julien Jerphanion Matthew Rocklin Nathan Henrie Paul Vecchio Ryan McCormick Saadullah Amin Scott Sievert Sriharsha Atyam Tom Augspurger

dask/dask-jobqueue Andrea Zonca Guillaume Eynard-Bontemps Kyle Husmann Levi Naden Loïc Estève Matthew Rocklin Matyas Selmeci ocaisa

dask/dask-kubernetes Brian Phillips Jacob Tomlinson Jim Crist Joe Hamman Joseph Hamman Matthew Rocklin Tom Augspurger Yuvi Panda adam

dask/dask-examples Christoph Deil Genevieve Buckley Ian Rose Martin Durant Matthew Rocklin Matthias Bussonnier Robert Sare Tom Augspurger Willi Rath

dask/dask-labextension Daniel Bast Ian Rose Matthew Rocklin Yuvi Panda


