Here we describe key technical improvements made in the last three years.

Data structures

Sparse matrices

scipy.sparse offers seven sparse matrix data structures, also known as sparse formats. The most important ones are the row- and column-compressed formats (CSR and CSC, respectively). These offer fast major-axis indexing and fast matrix-vector multiplication, and are used heavily throughout SciPy and dependent packages.

Over the last three years, our sparse matrix handling internals were rewritten and performance was improved. Iterating over and slicing of CSC and CSR matrices is now up to 35% faster, and the speed of coordinate (COO)/diagonal (DIA) to CSR/CSC matrix format conversions has increased. SuperLU63 was updated to version 5.2.1, enhancing the low-level implementations leveraged by a subset of our sparse offerings.

From a new features standpoint, scipy.sparse matrices and linear operators now support the Python matrix multiplication (@) operator. We added scipy.sparse.norm and scipy.sparse.random for computing sparse matrix norms and drawing random variates from arbitrary distributions, respectively. Also, we made a concerted effort to bring the scipy.sparse API into line with the equivalent NumPy API where possible.

cKDTree

The scipy.spatial.ckdtree module, which implements a space-partitioning data structure that organizes points in k-dimensional space, was rewritten in C++ with templated classes. Support was added for periodic boundary conditions, which are often used in simulations of physical processes.

In 2013, the time complexity of the k-nearest-neighbor search from cKDTree.query was approximately loglinear68, consistent with its formal description69. Since then, we enhanced cKDTree.query by reimplementing it in C++, removing memory leaks and allowing release of the global interpreter lock (GIL) so that multiple threads may be used70. This generally improved performance on any given problem while preserving the asymptotic complexity.

In 2015, SciPy added the sparse_distance_matrix routine for generating approximate sparse distance matrices between KDTree objects by ignoring all distances that exceed a user-provided value. This routine is not limited to the conventional L2 (Euclidean) norm but supports any Minkowski p-norm between 1 and infinity. By default, the returned data structure is a dictionary of keys (DOK)-based sparse matrix, which is very efficient for matrix construction. This hashing approach to sparse matrix assembly can be seven times faster than constructing with CSR format71, and the C++ level sparse matrix construction releases the Python GIL for increased performance. Once the matrix is constructed, distance value retrieval has an amortized constant time complexity72, and the DOK structure can be efficiently converted to a CSR, CSC or COO matrix to allow for speedy arithmetic operations.

In 2015, the cKDTree dual tree counting algorithm73 was enhanced to support weights74, which are essential in many scientific applications, for example, computing correlation functions of galaxies75.

Unified bindings to compiled code

LowLevelCallable

As of SciPy version 0.19, it is possible for users to wrap low-level functions in a scipy.LowLevelCallable object that reduces the overhead of calling compiled C functions, such as those generated using Numba or Cython, directly from Python. Supported low-level functions include PyCapsule objects, ctypes function pointers and cffi function pointers. Furthermore, it is possible to generate a low-level callback function automatically from a Cython module using scipy.LowLevelCallable.from_cython.

Cython bindings for BLAS, LAPACK and special

SciPy has provided special functions and leveraged basic linear algebra subprograms (BLAS) and linear algebra package (LAPACK)76 routines for many years. SciPy now additionally includes Cython40 wrappers for many BLAS and LAPACK routines (added in 2015) and the special functions provided in the scipy.special subpackage (added in 2016), which are available in scipy.linalg.cython_blas, scipy.linalg.cython_lapack and scipy.special.cython_special, respectively. When writing algorithms in Cython, it is typically more efficient to call directly into the libraries SciPy wraps rather than indirectly, using SciPy’s Python APIs. These low-level interfaces for Cython can also be used outside of the SciPy codebase to gain access to the functions in the wrapped libraries while avoiding the overhead of Python function calls. This can give performance gains of one or two orders of magnitude for many use cases.

Developers can also use the low-level Cython interfaces without linking against the wrapped libraries77. This lets other extensions avoid the complexity of finding and using the correct libraries. Avoiding this complexity is especially important when wrapping libraries written in Fortran. Not only can these low-level wrappers be used without a Fortran compiler, they can also be used without having to handle all the different Fortran compiler ABIs and name mangling schemes.

Most of these low-level Cython wrappers are generated automatically to help with both correctness and ease of maintenance. The wrappers for BLAS and LAPACK are primarily generated using type information that is parsed from the BLAS and LAPACK source files using F2PY19, though a small number of routines use hand-written type signatures instead. The input and output types of each routine are saved in a data file that is read at build time and used to generate the corresponding Cython wrapper files. The wrappers in scipy.special.cython_special are also generated from a data file containing type information for the wrapped routines.

Since SciPy can be built with LAPACK 3.4.0 or later, Cython wrappers are only provided for the routines that maintain a consistent interface across all supported LAPACK versions. The standard BLAS interface provided by the various existing BLAS libraries is not currently changing, so changes are not generally needed in the wrappers provided by SciPy. Changes to the Cython wrappers for the functions in scipy.special follow corresponding changes to the interface of that subpackage.

Numerical optimization

The scipy.optimize subpackage provides functions for the numerical solution of several classes of root finding and optimization problems. Here we highlight recent additions through SciPy 1.0.

Linear optimization

A new interior-point optimizer for continuous linear programming problems, linprog with method = ’interior-point’, was released with SciPy 1.0. Implementing the core algorithm of the commercial solver MOSEK78, it solves all of the 90+ NETLIB LP benchmark problems79 tested. Unlike some interior point methods, this homogeneous self-dual formulation provides certificates of infeasibility or unboundedness as appropriate.

A presolve routine80 solves trivial problems and otherwise performs problem simplifications, such as bound tightening and removal of fixed variables, and one of several routines for eliminating redundant equality constraints is automatically chosen to reduce the chance of numerical difficulties caused by singular matrices. Although the main solver implementation is pure Python, end-to-end sparse matrix support and heavy use of SciPy’s compiled linear system solvers—often for the same system with multiple right hand sides owing to the predictor-corrector approach—provide speed sufficient for problems with tens of thousands of variables and constraints.

Nonlinear optimization: local minimization

The minimize function provides a unified interface for finding local minima of nonlinear optimization problems. Four new methods for unconstrained optimization were added to minimize in recent versions of SciPy: dogleg, trust-ncg, trust-exact and trust-krylov. All are trust-region methods that build a local model of the objective function based on first and second derivative information, approximate the best point within a local ‘trust region’ and iterate until a local minimum of the original objective function is reached, but each has unique characteristics that make it appropriate for certain types of problems. For instance, trust-exact achieves fast convergence by solving the trust-region subproblem almost exactly, but it requires the second derivative Hessian matrix to be stored and factored every iteration, which may preclude the solution of large problems (≥1,000 variables). In contrast, trust-ncg and trust-krylov are well suited to large-scale optimization problems because they do not need to store and factor the Hessian explicitly, instead using second derivative information in a faster, approximate way. We compare the characteristics of all minimize methods in detail in Table 1, which illustrates the level of completeness that SciPy aims for when covering a numerical method or topic.

Table 1 Optimization methods from minimize Full size table

Nonlinear optimization: global minimization

As minimize may return any local minimum, some problems require the use of a global optimization routine. The new scipy.optimize.differential_evolution function81,82 is a stochastic global optimizer that works by evolving a population of candidate solutions. In each iteration, trial candidates are generated by combination of candidates from the existing population. If the trial candidates represent an improvement, then the population is updated. Most recently, the SciPy benchmark suite gained a comprehensive set of 196 global optimization problems for tracking the performance of existing solvers over time and for evaluating whether the performance of new solvers merits their inclusion in the package.

Statistical distributions

The scipy.stats subpackage contains more than 100 probability distributions: 96 continuous and 13 discrete univariate distributions, and 10 multivariate distributions. The implementation relies on a consistent framework that provides methods to sample random variates, to evaluate the cumulative distribution function (CDF) and the probability density function (PDF), and to fit parameters for every distribution. Generally, the methods rely on specific implementations for each distribution, such as a closed-form expression of the CDF or a sampling algorithm, if available. Otherwise, default methods are used based on generic code, for example, numerical integration of the PDF to obtain the CDF. Key recent distributions added to scipy.stats include the histogram-based distribution in scipy.stats.rv_histogram and the multinomial distribution in scipy.stats.multinomial (used, for example, in natural language processing83).

Polynomial interpolators

Historically, SciPy relied heavily on the venerable FITPACK Fortran library by P. Dierckx53,84 for univariate interpolation and approximation of data, but the original monolithic design and API for interaction between SciPy and FITPACK was limiting for both users and developers.

Implementing a new, modular design of polynomial interpolators was spread over several releases. The goals of this effort were to have a set of basic objects representing piecewise polynomials, to implement a collection of algorithms for constructing various interpolators, and to provide users with building blocks for constructing additional interpolators.

At the lowest level of the new design are classes that represent univariate piecewise polynomials: PPoly (SciPy 0.13)85, BPoly (SciPy 0.13) and BSpline (SciPy 0.19)86, which allow efficient vectorized evaluations, differentiation, integration and root-finding. PPoly represents piecewise polynomials in the power basis in terms of breakpoints and coefficients at each interval. BPoly is similar and represents piecewise polynomials in the Bernstein basis (which is suitable, for example, for constructing Bézier curves). BSpline represents spline curves, that is, linear combinations of B-spline basis elements87.

In the next layer, these polynomial classes are used to construct several common ways of interpolating data: CubicSpline (SciPy 0.18)88 constructs a twice differentiable piecewise cubic function, Akima1DInterpolator and PCHIPInterpolator implement two classic prescriptions for constructing a C1 continuous monotone shape-preserving interpolator89,90.

Test and benchmark suite

Test suite

Test-driven development has been described as a way to manage fear and uncertainty when making code changes91. For each component of SciPy, we write multiple small executable tests that verify its intended behavior. The collection of these, known as a ‘test suite’, increases confidence in the correctness and accuracy of the library, and allows us to make code modifications known not to alter desired behavior. According to the practice of continuous integration92, all proposed contributions to SciPy are temporarily integrated with the master branch of the library before the test suite is run, and all tests must be passed before the contribution is permanently merged. Continuously monitoring the number of lines of code in SciPy covered by unit tests is one way we maintain some certainty that changes and new features are correctly implemented.

The SciPy test suite is orchestrated by a continuous integration matrix that includes POSIX and Windows (32/64-bit) platforms managed by Travis CI and AppVeyor, respectively. Our tests cover Python versions 2.7, 3.4, 3.5, 3.6, and include code linting with pyflakes and pycodestyle. There are more than 13,000 unit tests in the test suite, which is written for usage with the pytest (https://docs.pytest.org/en/latest) framework. In Fig. 2, we show historical test coverage data generated using a Docker-based approach (https://github.com/tylerjereddy/scipy-cov-track). With the exception of the removal of ∼61,000 lines of compiled code for SciPy v0.14, the volume of both compiled (C, C++ and Fortran) and Python code has increased between releases, as have the number of lines covered by unit tests. Test coverage at the SciPy 1.0 release point was at 87% for Python code according to pytest-cov (https://pypi.org/project/pytest-cov/). Coverage of compiled (C, C++ and Fortran) code was only 45% according to gcov (https://gcc.gnu.org/onlinedocs/gcc/Gcov.html), but the compiled codebase is much more robust than this figure would suggest as the figure does not correct for the inclusion of reputable vendor code, the original library of which is well-tested; generated code, for which full coverage is impractical; and deprecated code, which does not require unit tests. Documentation for the code is automatically built and published by the CircleCI service to facilitate evaluation of documentation changes/integrity.

Fig. 2 Python and compiled code volume in SciPy over time. Full size image

Benchmark suite

In addition to ensuring that unit tests are passing, it is important to confirm that the performance of the SciPy codebase improves over time. Since February 2015, the performance of SciPy has been monitored with Airspeed Velocity (asv https://github.com/airspeed-velocity/asv). SciPy’s run.py script conveniently wraps asv features such that benchmark results over time can be generated with a single console command. For example, in Fig. 3 we illustrate the improvement of scipy.spatial.cKDTree.query over roughly nine years of project history. The tree used in the benchmark was generated without application of toroidal topology (boxsize = None), and tests were performed by Airspeed Velocity 0.4 using Python 2.7, NumPy 1.8.2 and Cython versions 0.27.3, 0.21.1 and 0.18 (for improved backward compatibility). Substantial performance improvements were realized when cKDTree was fully Cythonized and again when it was rewritten in C++.