Announcement: We have started the process of making this project open source. The source codes are being rewritten for clarity, simplicity and consistency. As soon as the process is completed, all the new codes and running scripts will be made available.

-------------------------------------------------------------------------------- -----------------------------------

HISTORY:

September 13, 2018: Added R numbers for the Fibonacci Number test case (Problem 1)

September 13, 2018: Corrected R numbers for the Laplace Equation test case (Problem 5)

This report is the continuation of the work done in:

Basic Comparison of Python, Julia, R, Matlab and IDL

Here we:

Add new versions of languages Add JAVA Add more test cases. For each language, consistantly use the same method to measure the elapsed time. Provide source codes for all the test cases. Present all the timing results to the fourth digit accuracy (any number less tha 0.0001 is rounded to 0).

While reading this report, be mindful of the following:

Our intention is not to claim that one language is better than the other.

In our work, we are often asked to address users’ issues on the computing languages Python, Matlab, IDL, etc. We only have few hours to understand the coding principles of those languages and quickly write codes that resolve users’ issues. We present results in the point of view of a novice programmer.

If you are an advanced programmer or a language developer and you have results (obtained with optimization techniques) you want to share, feel free to contact us (with a web link) and we wil provide a link to your results here.

All the experiments presented here were done on Intel Xeon Haswell processor node. Each node has 28 cores (2.6 GHz each) and 128 Gb of available memory. We consider the following versions of the languages:







Problem 1: Fibonacci Number

The Fibonacci numbers are the sequence of numbers defined by the linear recurrence equation:

with .

Fibonacci numbers find applications in the fields of economics, computer science, biology, combinatoric, etc.

We implemented both the iterative method and the recursive one, and we record the elepased time for generating the Fibonacci numbers for a given n.

Language Option n=25 n=35 n=45 Python

0 0 0 Python + Numba

0 0 0 Julia

0 0 0 IDL

0 0 0 Matlab

0.0098 0.0032 0.0025 R

0.034 0.034 0.034 JAVA

0 0 0 Scala

0 0 0 Fortran gfortran 0 0 0

gfortran -O3 0 0 0

ifort 0 0 0

ifort -O3 0 0 0 C gcc 0 0 0

gcc -Ofast 0 0 0

icc 0 0 0

icc -Ofast 0 0 0

Table 1.1: Elapsed times (in seconds) obtained by computing Fibonacci numbers using then iterative method.

Language

Option n=25 n=35 n=45 Python

0.0211 2.5284 311.2046 Python + Numba

0.03 0.1 8.82 Julia

0 0.0335 4.130 IDL

0.0301 2.2573 304.2285 Matlab

0.0128 0.5149 58.9283 R

0.008 0.008 0.008 JAVA

0.0016 0.0414 4.8609 Scala

0.001 0.045 5.289 Fortran gfortran 0 0.0840 10.4326

gfortran -O3 0 0.0280 3.4602

ifort 0 0 0

ifort -O3 0 0 0 C gcc 0 0.04 5.07

gcc -Ofast 0 0.01 1.66

icc 0 0.02 3.15

icc -Ofast 0 0.02 3.07

Table 1.2: Elapsed times (in seconds) obtained by computing Fibonacci numbers using the recursive method.

Problem 2: Copy Arrays

This test case is meant to show how fast languages access non-contiguous memory locations.

Consider an arbitrary nxnx3 matrix A. We want to perform the following operations on A:

A(i,j,1) = A(i,j,2) A(i,j,3) = A(i,j,1) A(i,j,2) = A(i,j,3)

For instance, in Python the code looks like:

for i in range(n): for j in range(n): A[i,j,1] = A[i,j,2] A[i,j,3] = A[i,j,1] A[i,j,2] = A[i,j,3]

The above code segment uses loops. We are also interested on how the same operations are done using vectorization:

A[:,:,1] = A[:,:,2] A[:,:,3] = A[:,:,1] A[:,:,2] = A[:,:,3]

The problem allows us to see how each language handles loops and vectorization. We record the elapsed time needed to do the array assignments.

Language Option n=5000 n=7000 n=9000 Python

18.6055 37.1279 61.0172 Python + Numba

0.26 0.26 0.34 Julia

0.0907 0.1386 0.2274 IDL

6.8773 13.2422 21.9349 Matlab

0.2787 0.5223 0.8437 R

19.750 38.635 63.820 JAVA

0.1420 0.2680 0.4350 Scala

0.204 0.349 0.51 Fortran gfortran 0.1760 0.3480 0.5760

gfortran -O3 0.0720 0.1360 0.2200

ifort 0.0680 0.1360 0.2120

ifort -O3 0.0680 0.1320 0.2120 C gcc 0.1700 0.3400 0.5700

gcc -Ofast 0.0900 0.1800 0.2900

icc 0.0900 0.1800 0.3000

icc -Ofast 0.0900 0.1800 0.3000

Table 2.1: Elapsed times (in seconds) obtained by copying a matrix using loops.

Language Option n=5000 n=7000 n=9000 Python

0.4953 0.9689 1.5962 Python + Numba

0.834 1.29 1.96 Julia

0.2926 0.5471 0.8964 IDL

0.4091 0.8093 1.3315 Matlab

0.2845 0.5841 0.9193 R

2.956 5.785 9.566 Fortran gfortran 0.0960 0.2480 0.3080

gfortran -O3 0.0920 0.1840 0.3040

ifort 0.1200 0.2320 0.3760

ifort -O3 0.1200 0.2320 0.3880

Table 2.2: Elapsed times (in seconds) obtained by copying a matrix using vectorization.

Problem 3: Matrix Multiplication

We multiply two randomly generated nxn matrices A and B:

C=AxB

This problem shows the importance of taking advantage of built-in libraries available in each language. The elapsed times presented here only measure the times spent on the multiplication (as the size of the matrix varies).

Language Option n=1500 n=1750 n=2000 Python intrinsic 0.58 0.96 0.97 Python + Numba (loop) 3.64 6.33 13.57 Julia intrinsic 0.1494 0.2391 0.3497 IDL intrinsic 0.3028 0.3613 0.4797 Matlab intrinsic 0.9567 0.2575 0.2943 R

0.920 1.158 0.951 JAVA (loop) 6.8530 13.4700 29.2320 Scala (loop) 9.258 14.482 23.363 Fortran gfortran (loop) 17.2450 31.2299 60.1837

gfortran -O3 (loop) 3.3202 5.3043 12.3367

gfortran (matmul) 0.3520 0.5600 0.8280

gfortran -O3 (matmult) 0.3480 0.5560 0.7840

ifort (loop) 1.1400 1.8081 3.1001

ifort -O3 (loop) 0.5200 0.8240 1.2760

ifort (matmul) 1.1400 1.8121 2.9001

ifort -O3 (matmul) 1.1400 1.8121 2.9881

ifort (DGEMM) 0.2120 0.2280 0.3320 C gcc (loop) 13.4900 20.9600 31.4800

gcc -Ofast (loop) 1.3500 2.3900 4.3700

icc (loop) 1.2100 2.1600 4.0200

icc -Ofast (loop) 1.1500 1.7000 2.6600

Table 3.1: Elapsed times (in seconds) obtained by multiplying two randomly generated matrices.

Problem 4: Gauss-Legendre Quadrature

The Gauss-Legendre quadrature formulas approximate the integral of a functionby a weighted sum of function-values. When m function-values are used, the formula is exact for polynomials of degree zero through 2m — 1.

Language Option n=50 n=75 n=100 Python

0.1345 0.0183 0.0186 Julia

1.2962 1.3553 1.3556 IDL

0.0006 0.0009 0.0014 R







JAVA







Matlab

0.7739 0.7197 0.0853 Fortran gfortran 0 0.004 0.008

gfortran -O3 0 0.004 0.008

ifort 0 0.004 0.008

ifort -O3 0 0.004 0.008 C gcc







gcc -Ofast







icc







icc -Ofast







Table 4.1: Elapsed times (in seconds) obtained by performing the Gauss-Legendre qudrature.

Problem 5: Numerical Approximation of the 2D Laplace Equation

We find the numerical solution of the 2D Laplace equation:

U xx + U yy = 0

We use the Jacobi iterative solver. We are interested in fourth-order compact finite difference scheme (Gupta, 1984):

U i,j = (4(U i-1,j + U i,j-1 + U i+1,j + U i,j+1 ) + U i-1,j-1 + U i+1,j-1 + U i+1,j+1 + U i-1,j+1 )/20

The Jacobi iterative solver stops when the difference of two consecutive approximations falls below 10^{-6}.

Language Option n=100 n=150 n=200 Python

142.7886 705.268 2188.007 Python + Numba

1.2764 5.4262 16.396 Julia

1.0309 5.1724 16.1657

optimized 1.0987 5.5039 17.1473

optimized_smind 0.6215 3.0289 9.4964 IDL

83.6360 416.5523 1298.777 Matlab

1.8199 4.9914 9.1465 R

128.131 635.674 1971.329 JAVA

0.4850 2.0210 5.5980 Scala

0.545 2.289 6.202 Fortran gfortran 0.840 3.800 10.945

gfortran -O3 0.668 3.068 8.881

ifort 0.5360 2.4680 7.1520

ifort -O3 0.5360 2.4640 7.1520 C gcc 0.500 2.4200 7.7000

gcc -Ofast 0.2100 1.0400 3.1800

gcc -fPIC -Ofast -O3 -xc -shared 1.1410 5.5953 17.3381

icc 0.4500 2.2300 6.7900

icc -Ofast 0.3200 1.6000 4.8700

Table 5.1: Elapsed times (in seconds) obtained by numerically solving the Posson equation using a Jacobi iterative solver with loops.

Language Option n=100 n=150 n=200 Python

2.3209 10.7638 41.2477 Python + Numba

3.5021 12.5186 36.1285 Julia optimized_vectorized 2.3787 14.0944 42.1255 IDL

1.9159 10.1320 32.2211 Matlab

3.5102 6.4710 16.4999 R

21.177 102.229 333.366 Fortran gfortran 0.876 3.948 11.329

gfortran -O3 0.3560 1.7880 5.0880

ifort 0.3000 1.5440 4.4400

ifort -O3 0.2840 1.5680 4.4520

Table 5.2: Elapsed times (in seconds) obtained by numerically solving the Posson equation using a Jacobi iterative solver with vectorization.

Problem 6: Belief Propagation

The Belief Propagation can be applied to fields such as speech recognition, computer vision, image processing, medical diagnostics, parity check codes, etc. Its calculations involve a repeated sequence of matrix multiplications, followed by normalization. The Matlab, C and Julia codes are shown in the Justin Domke's weblog (Domke 2012). We report the computing times for various values of the number of iterations (N) when the matrix dimension is 5000x5000.





Language Option n=250 n=500 n=1000 Python

4.8186 7.3240 13.9176 Julia

3.957 7.684 14.855 IDL

18.3229 35.8977 71.0820 Matlab

2.6299 4.0708 6.8691 R

25.463 46.985 92.654 JAVA

321.403 642.395 1284.106 Fortran gfortran 22.5574 39.9224 89.9696

gfortran -O3 5.1603 9.5885 18.7051

ifort 4.6082 8.8605 17.3810

ifort -O3 4.6322 8.7325 17.4130 C gcc 2.6400 5.2800 10.5700

gcc -Ofast 2.3500 4.7200 9.4400

icc 1.4500 2.9000 5.8000

icc -Ofast 1.4400 2.9000 5.8100

Table 6.1: Elapsed times (in seconds) obtained by doing the Belief Propagation computations.

Problem 7: Metropolis-Hastings Algorithm

The Metropolis–Hastings (M–H) algorithm is a method for obtaining random samples from a probability distribution. We perform calculations for the implementation of a Metropolis-Hastings algorithm using a two dimensional distribution (Domke 2012). Results are shown when the number of iterations (N) varies.





Language Option n=5000 n=10000 n=15000 Python

0.02642 0.0637 0.0937 Julia

0.0002 0.0004 0.0006 IDL

0.0058 0.0219 0.0291 Matlab

0.0164 0.0194 0.0276 R

0.105 0.166 0.24 JAVA

0.006 0.007 0.009 Scala

0.009 0.012 0.014 Fortran gfortran 0 0.0040 0.0040

gfortran -O3 0 0 0

ifort 0 0 0

ifort -O3 0 0.0040 0 C gcc 0 0 0

gcc -Ofast 0 0 0

icc 0 0 0

icc -Ofast 0 0 0

Table 7.1: Elapsed times (in seconds) obtained by doing the Metropolis algorithm computations.

Problem 8: Manipulation of netCDF Files

We have a set of daily NetCDF files (7305) covering a period of 20 years (1990-2009). The files for a given year are in a sub-directory labeled YYYY (for instance 1990, 1991, 1992, etc.). We want to write a script that opens each file, reads a three-dimensional variable (longitude/latitude/level), and manipulates it. A pseudo code for the script reads:

Loop over the years Obtain the list of NetCDF files Loop over the files Read the variable (longitude/latitude/ level ) Compute the zonal mean average ( new array of latitude/ level ) Extract the column array at latitude 86 degree South Append the column array to a " master " array ( or matrix)

The goal here is to be able to do a generate the data to do a contour plot that looks like (obtained with Python):





This is the kind of problems that a typical user we support faces: a collection of thousands of files that needs to be manipulated to extract the desired information. Having tools that allow us to quickly read data from files (in formats such as NetCDF, HDF4, HDF5, grib) is critical for the work we do.

Note that unlike in Problem 4 of the previous report (where the daily files are in directories associated with months), the daily files to be read in in this case are stored in directories associated with the years. The access to the files is easier in this current problem and we expect the timing numbers to be reduced.

We report in Table 8.1 the elapsed times it took to solve Problem 8 with the various languages.

Language

Elapsed Time (s) Python 558.4496 Julia 580.5683 IDL 504.5634 Matlab 646.2261

Table 8.1: Elapsed time (in seconds) obtained by manipulating 7305 NetCDF files on a single processor.

All the above runs were conducted on a node that has 28 cores. Basically, only one core was used. We want to take advantage of all the available cores by spreading the reading of the files and making sure that the data of interest are gathered in the proper order. We use the multi-processing capabilities of the various languages to slightly modify the scripts. For each year, the daily files are read in by different threads (cores).The results are shown in Table 8.2.





Language numThreads=2

numThreads= 4

numThreads= 8

numThreads=16

Python 352.7964 238.1065 170.9945 105.3949

Table 8.2: Elapsed time (in seconds) obtained by manipulating 7305 NetCDF files using multiple threading.

Problem 9: Function Evaluations

We create an array x of length n and loop several times to perform the six operations:

y = sin(x) x = asin(y) y = cos(x) x = acos(y) y = tan(x) x = atan(y)

Language n=80000 n=90000 n=100000 Python 52.1014 58.4591 64.8276 Julia 55.5550 62.3450 69.2350 IDL 37.4798 42.0187 34.8829 Matlab 5.1866 5.6523 4.6116 R 89.500 101.439 112.269

Problem 10: Simple FFT

We create a nxn random complex matrix M and compute the following:

r = fft(M) r = abs(r)

Language n=10000 n=15000 n=20000 Python 10.5087 25.5764 45.1959 Julia 3.916 11.489 20.632 IDL 16.6154 36.5711 73.3394 Matlab 2.6606 6.0293 10.7011 R 60.722 157.626 269.651

Problem 11: Square Root of a Matrix

We consider an nxn matrix A with 6s on the diagonal and 1s everywhele else. We are lloking for the matrix B such that BxB = A. We record the time for determining B.

Language Option n=1000 n=2000 n=4000 Python SciPy sqrtm 2.2227 5.2814 45.7643 Julia sqrtm 0.4129 2.511 19.111 Matlab sqrtm 0.9683 1.3916 2.3767 R

1.057 3.602 19.122

Problem 12: Look and Say Sequence

We write codes to determine the look and say number of order n. Instead of starting with a single digit, we begin with 1223334444.

This test case highlights how languages manipulate strings of arbitrary length.

Language Options n=40 n=45 n=48 Python

2.2921 37.4429 224.4362 Julia

2.769 44.333 345.069 IDL

19.9563 296.4768 1570.4234 Matlab

412.5993 4501.6751

R

0.509 1.678 3.611 Java

0.0487 0.0947 0.1582 Scala

0.0390 0.1020 0.1720 Fortran gfortran 0.0160 0.0200 0.0200

gfortran -O3 0.0200 0.0240 0.0240

ifort 0.0120 0.0160 0.0120

ifort -O3 0.0160 0.0200 0.0080 C gcc 0.0800 0.2600 0.5300

gcc -Ofast 0.0400 0.2500 0.5000

icc 0.0700 0.2600 0.4800

icc -Ofast 0.0700 0.2100 0.4600

References

Source Files

All the source files for the problems presented here are in the attached file: sourceFiles2018.tar.gz

If you have a comment/suggestion/question, contact Jules Kouatchou (Jules.Kouatchou@nasa.gov)