Overview of the Julia-Python-R Universe

A side-by-side review of the main open source ecosystems supporting the Data Science domain: Julia, Python and R, the three languages referenced in the name of the Jupyter project.

Motivation

A large component of Quantitative Risk Management relies on data processing and quantitative tools (also known as Data Science). In recent years, open source software targeting Data Science has found increasing adoption across diverse applications. The Overview of the Julia-Python-R Universe article is a side-by-side comparison of a wide range of aspects of the Python, Julia and R language ecosystems. The comparison of the three ecosystems has the following aims:

To be useful for people who are somewhat familiar with programming and want to survey the options and choose the most appropriate tool

To promote interoperability, cross-validation and overall best-practices

To be as factual as possible without drifting into judgements / opinions

To cover use cases relevant for the implementation of quantitative risk models



The comparison does not have the following aims:

To be a detailed / comprehensive catalog of all available libraries (which number in the many thousands!)

To cover use cases far removed from quantitative risk models

To be totally exhaustive (e.g. to identify all the possible computer systems one can run a Python interpreter on, or to count all the possible ways one can perform linear regression in R)

Disclaimers

The comparison does not in any way provide an assessment of which system is "better". The proper way to use the comparison is to start from one's objectives, knowledge level and use case.

The comparison attempted here is not entirely appropriate, as the three systems have quite different origins and architectural design choices. For example, strictly speaking R is not a general-purpose programming language: R is a system for statistical computation and graphics. It consists of a sufficiently general language plus a run-time environment with graphics, a debugger, access to certain system functions, and the ability to run programs stored in script files. Despite this caveat, a comparison is justified because across a very large domain of applications and use cases the three frameworks can be used interchangeably (or nearly so).

Structure

The comparison data are provided in several distinct tables. Each table documents a relevant language or ecosystem subdomain. The number and focus areas of the tables are somewhat arbitrary and may expand in the future. The order runs roughly from more generic aspects towards more specialized / advanced areas, concluding with interoperability.

Each table entry (row) highlights key functionality within the subdomain. The language columns point to information or packages and, where applicable, the comment column adds commentary. Reference links are included where useful.

At the bottom of some tables there is a row labelled Package Review. This row collects links to the CRAN Task Views, which summarize the large number of R packages available for particular data science tasks. There are also links to a mirror effort to create Python Task Views (this content is still work in progress; contributors are welcome, see below).

Getting Involved

You can provide simple and anonymous feedback on the wiki version of the overview using the feedback button at the bottom of the page. Alternatively, you can become an Open Risk Manual author and actively edit the page. If you are more comfortable using GitHub / markdown, there is a mirror page available here. Please note that the tables are in HTML format, as they are generated automatically.

People interested in developing the Python Task Views can do so via the GitHub repository.

History and Community

The objective of this section is to provide an overall comparison of the history of the three ecosystems, towards answering the question: who is really behind Python, R and Julia?

Devices and Operating Systems

This section aims to answer the question: Where (as in what kind of device and operating system) can I use Python, R or Julia? NB: This is not a guide on how to install Python, R or Julia on your system, just an overview of what is available where.

| Aspect | Python | R | Julia | Comment |
|---|---|---|---|---|
| Linux Desktop | Comes pre-installed | apt-get install r-base | apt-get install julia / Linux installer file | Python is generally pre-installed as it is used by the Linux system itself. Different distributions may include different (potentially very old) versions of the three languages. |
| Windows | Windows installer | Windows installer | Windows installer | All three languages are available for both Windows 7 and Windows 10, in 32-bit and 64-bit versions. |
| MacOS | 2.7 version is pre-installed | MacOS installer | MacOS installer file | |
| Raspbian | Pre-installed | apt-get install r-base | apt-get install julia | Linux is the operating system of choice for IoT devices, which means a basic Python installation is generally available. |
| Android | Via python-for-android | No | No | Python, R and Julia are not readily integrated on mobile devices (see also the Deployment entry). Check Termux for an alternative option. |
| iOS | No | No | No | |
| Cloud Servers | As per Linux Desktop above | As per Linux Desktop above | As per Linux Desktop above | Cloud servers typically run the Linux operating system and have Python installations available. |
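Since the installed versions vary by device and distribution, it can help to check from within Python what is actually available on a given machine. The sketch below reports the operating system and interpreter version and looks for R and Julia on the PATH. It assumes the executables are called `Rscript` and `julia`, which is common but not guaranteed; the helper name `runtime_report` and the 3.8 version threshold are illustrative choices.

```python
import platform
import shutil
import sys


def runtime_report() -> dict:
    """Summarise the platform and which of the three language runtimes can be found."""
    return {
        # Operating system as reported by Python, e.g. "Linux", "Windows" or "Darwin" (macOS)
        "os": platform.system(),
        # Version of the Python interpreter running this script, e.g. "3.11.4"
        "python": platform.python_version(),
        # shutil.which returns the full path of an executable on the PATH, or None
        "Rscript": shutil.which("Rscript"),
        "julia": shutil.which("julia"),
    }


if __name__ == "__main__":
    # Distributions may ship old interpreters, so check the minimum version explicitly
    if sys.version_info < (3, 8):
        print("Warning: this Python interpreter is older than 3.8")
    for name, value in runtime_report().items():
        print(f"{name:8s} {value or 'not found'}")
```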

Package Management

This section aims to answer the question: How can I extend the functionality of Python, R or Julia with existing libraries? The ease of finding and installing packages is a very important factor in the popularity of all three ecosystems, in marked contrast to languages such as C++.

| Aspect | Python | R | Julia | Comment |
|---|---|---|---|---|
| Discovery of Packages | Online search, built-in PyCharm access to PyPI | RStudio built-in access to CRAN | Julia Docs, Julia Observer | Python packages are released on PyPI, R packages are released on CRAN |
| Number of Packages (Jun 2020) | 242,551 | 15,845 | ~3,821 | Check here for the latest count: Python, R, Julia. Comparing package counts across different universes comes with many caveats, because the conventions about what constitutes a "package", quality control etc. are not harmonized. |
| Online Repositories | PyPI, via Linux distributions | CRAN | | GitHub, GitLab, Bitbucket etc. are used for releasing Python, R and Julia open source packages online, for coordinating development and for other community support |
| Package Installation | Done at OS level (PyPI, setup, conda, pip, easy_install, apt) | Built-in install.packages | Built-in Pkg package manager | Python installation methods are quite varied (and have evolved over time) and can be either system-wide (e.g. a Linux distro package) or user-specific |
| Dependency Management | pip, virtualenv | packrat | Federated package management | virtualenv enables using isolated Python distributions and package collections within the same system. Julia uses project environments |
| Loading Packages | import statement | library statement | import / using statements | |
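The dependency management row mentions virtualenv; the Python standard library has shipped the closely related `venv` module since Python 3.3. The sketch below uses `venv` (not the third-party virtualenv package) to create an isolated environment programmatically. The helper name `make_isolated_env` and the directory name are illustrative; from the shell the same is usually done with `python -m venv <dir>`.

```python
import os
import tempfile
import venv


def make_isolated_env(path: str) -> str:
    """Create a minimal isolated Python environment at `path`; return its config file path."""
    # with_pip=False keeps the sketch fast and offline; set it to True to bootstrap pip
    builder = venv.EnvBuilder(with_pip=False, clear=True)
    builder.create(path)
    # Every environment contains a pyvenv.cfg file that marks the directory as a venv
    return os.path.join(path, "pyvenv.cfg")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cfg = make_isolated_env(os.path.join(tmp, "risk-env"))
        with open(cfg) as fh:
            print(fh.read())
```

Packages installed with the environment's own `pip` then stay inside that directory, separate from the system-wide installation.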

Package Documentation

This section aims to answer the question: How can I document a Python, R or Julia module? The ease and quality of documentation is an important factor in the adoption and efficient use of a language, as it both helps beginners learn new functionality and helps experienced users produce better quality work.

| Aspect | Python | R | Julia | Comment |
|---|---|---|---|---|
| Source level documentation | Built-in docstrings | Docstrings | Docstrings | |
| Formats | Markdown, reStructuredText | Markdown, LaTeX | Markdown | R packages on CRAN include Reference Manuals (PDF, typically generated from LaTeX) |
| Documentation Generator | Sphinx | roxygen2 | Documenter | |
| Online Documentation | Read the Docs | CRAN, bookdown | Julia Docs | |
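To make the Python column concrete, here is a function documented with a docstring in the reStructuredText field-list style (`:param:` / `:returns:`) that Sphinx understands. The function and its expected-loss formula are only an illustrative example chosen for the risk-management context; Sphinx rendering itself is not shown.

```python
import inspect


def expected_loss(pd: float, lgd: float, ead: float) -> float:
    """Return the expected loss of a single exposure.

    :param pd: probability of default, between 0 and 1
    :param lgd: loss given default, between 0 and 1
    :param ead: exposure at default, in currency units
    :returns: the product ``pd * lgd * ead``
    """
    return pd * lgd * ead


if __name__ == "__main__":
    # The docstring is stored on the function object; help() and Sphinx read it from there
    print(inspect.getdoc(expected_loss))
```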

Language Characteristics

This section aims to answer the question: What does code in Python, R or Julia look like from a programming perspective? Many standard aspects of programming languages are available in all three systems and are therefore not included.

| Aspect | Python | R | Julia | Comment |
|---|---|---|---|---|
| Compiled / Interpreted | Interpreted | Interpreted | Just-in-time (JIT) compiled | Julia code can be executed interactively |
| Main Implementation Language | C (CPython) | C and Fortran | Julia | This is the language used to implement the interpreter that runs a Python or R script. Julia is largely written in Julia itself |
| Alternative Implementations | Jython (Java), RustPython etc. | pqR, Renjin, FastR etc. | | Many alternative implementations of the underlying interpreter exist for both Python and R. A newer approach, available for Python and Julia, is to compile to WebAssembly for native execution in the browser: Python/Pyodide, Julia/Charlotte |
| Type System | Dynamic (duck) typing | Dynamic | Dynamic (duck) typing | All three systems have essentially dynamic type systems (in contrast with languages such as C++, Java or Rust) |
| Primitive Data Types | Numbers (Integers, Float), Strings, Boolean | Numeric, Integer, Character, Logical (and the pairlist) | Numbers, Char, Bool | Double precision is the standard floating point type in all three systems; higher precision generally requires libraries (Julia also provides BigFloat in Base). Julia has a native 128-bit integer type |
| Native Data Structures | List, Tuple, Dict, Set | List, Vector, Data Frame, Factor | Tuple, Dict, Set, Array, Vector, Matrix and more | |
| Object Oriented | Yes | Yes | Selective | R has a variety of object-oriented implementations with different designs and functionality, denoted S3, S4, R5 and R6 respectively. Julia implements selected OO aspects via the struct composite type combined with multiple dispatch |
| Code Structure | Based on indentation | Free style | Free style | |
| Standard Libraries | Extensive | Built-in functions | Base | Python has an extensive standard library as it covers a larger computer science domain. In contrast, R and Julia include a more extensive set of data science oriented features by default |
| Building Packages / Extensions | Modules, via bindings to C/C++ | Creating R packages | Julia packages | See below under HPC for more specific options |
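Two of the rows above, dynamic (duck) typing and primitive numeric types, are easy to see in a few lines of Python. The hypothetical `total_exposure` function declares no types and accepts any iterable of numbers; the last lines confirm that Python's `float` is double precision and show that Python integers, unlike fixed-width integer types, have arbitrary precision.

```python
import sys
from collections import deque


def total_exposure(positions) -> float:
    """Sum the exposures in any iterable of numbers (duck typing: no container type is required)."""
    return sum(float(p) for p in positions)


if __name__ == "__main__":
    # The same function works on a list, a tuple or a deque,
    # because it only relies on the argument being iterable
    print(total_exposure([100, 250.5]))
    print(total_exposure((100, 250.5)))
    print(total_exposure(deque([100, 250.5])))
    # Python floats are IEEE 754 double precision: 53 bits of mantissa
    print(sys.float_info.mant_dig)
    # Python integers have arbitrary precision, so 2**128 is computed exactly
    print(2**128)
```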

Development Environment

This section aims to answer the question: How can I develop and test code / applications written in Python, R or Julia?
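Whatever editor or IDE is chosen, code also needs tests. In Python, the standard library's `unittest` module is one way to write them; the sketch below tests a small hypothetical helper, `annualise_volatility`, which applies the square-root-of-time rule with an assumed default of 252 trading days. Such test modules are usually run from the shell with `python -m unittest`.

```python
import unittest


def annualise_volatility(daily_vol: float, trading_days: int = 252) -> float:
    """Scale a daily volatility to an annual one using the square-root-of-time rule."""
    if daily_vol < 0:
        raise ValueError("volatility must be non-negative")
    return daily_vol * trading_days ** 0.5


class AnnualiseVolatilityTest(unittest.TestCase):
    def test_scaling(self):
        # With 4 "days" the scaling factor is exactly 2
        self.assertAlmostEqual(annualise_volatility(0.01, trading_days=4), 0.02)

    def test_rejects_negative(self):
        with self.assertRaises(ValueError):
            annualise_volatility(-0.01)


if __name__ == "__main__":
    # Build and run the suite explicitly, which also works outside the unittest CLI
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(AnnualiseVolatilityTest)
    unittest.TextTestRunner(verbosity=2).run(suite)
```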

Files, Databases and Data Manipulation

This section aims to answer the following questions: What direct connectors to files stored on disk or data stored in databases are available for Python, R and Julia? Further, once we have connected to a data source, how can we fetch, store in memory and do preliminary work with the imported data?
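As a Python illustration of connecting to a data source, fetching the data into memory and doing some preliminary work on it, the sketch below uses the standard-library `sqlite3` module. The table name and columns are hypothetical and an in-memory database stands in for a real one; drivers for other databases typically follow the same DB-API 2.0 pattern of connect, execute and fetch.

```python
import sqlite3


def load_exposures(conn: sqlite3.Connection) -> list:
    """Fetch all rows of a (hypothetical) exposures table as a list of dicts."""
    conn.row_factory = sqlite3.Row  # rows can then be accessed by column name
    rows = conn.execute("SELECT counterparty, amount FROM exposures").fetchall()
    return [dict(row) for row in rows]


def totals_by_counterparty(records: list) -> dict:
    """Preliminary in-memory work: aggregate the exposure amount per counterparty."""
    totals = {}
    for rec in records:
        totals[rec["counterparty"]] = totals.get(rec["counterparty"], 0.0) + rec["amount"]
    return totals


if __name__ == "__main__":
    # An in-memory database stands in for a real data source
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE exposures (counterparty TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO exposures VALUES (?, ?)",
        [("A", 100.0), ("B", 250.0), ("A", 50.0)],
    )
    print(totals_by_counterparty(load_exposures(conn)))
```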

Workflow Management

This section aims to answer the question: What tools are available to help manage data science workflows in Python, R and Julia respectively?

| Aspect | Python | R | Julia | Comment |
|---|---|---|---|---|
| ETL | Bonobo, petl, pygrametl | | | |
| Programmatic Workflow Management | Airflow, Luigi | | | |
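The extract-transform-load (ETL) pattern behind libraries such as Bonobo, petl and pygrametl can be sketched with plain Python generators; the code below does not use any of those libraries' APIs. The stage functions `extract`, `transform` and `load` are hypothetical, the CSV contents are made up, and the fixed FX rate is an assumption for illustration.

```python
import csv
import io


def extract(source):
    """Extract: yield each CSV row as a dict."""
    yield from csv.DictReader(source)


def transform(rows, fx_rate):
    """Transform: convert amounts to a reporting currency and drop records without an amount."""
    for row in rows:
        if not row["amount"]:
            continue
        yield {"id": row["id"], "amount_eur": round(float(row["amount"]) * fx_rate, 2)}


def load(rows, target):
    """Load: write the transformed rows to a CSV target and return the number of rows written."""
    writer = csv.DictWriter(target, fieldnames=["id", "amount_eur"])
    writer.writeheader()
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count


if __name__ == "__main__":
    raw = io.StringIO("id,amount\n1,100\n2,\n3,40.5\n")
    out = io.StringIO()
    # The generators are chained, so rows stream through one at a time
    n = load(transform(extract(raw), fx_rate=0.9), out)
    print(n)
    print(out.getvalue())
```

Because each stage is a generator, the pipeline processes one record at a time rather than holding the whole file in memory.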

General Purpose Mathematical Libraries

This section aims to answer the question: What building blocks are available for basic quantitative (numerical) work in Python, R and Julia respectively? NB: The division between core mathematics and specialized domains is somewhat arbitrary.

| Aspect | Python | R | Julia | Comment |
|---|---|---|---|---|
| General purpose vectors and n-dimensional arrays (as storage) | numpy | Built-in | Array | The R system comes with many basic array functionalities built in |
| Numerical Linear Algebra (matrix operations) | numpy.linalg | Matrix, RcppArmadillo, RcppEigen | Built-in support (LinearAlgebra standard library), StaticArrays, BandedMatrices, IterativeSolvers | For specialized operations (large / sparse matrices) see below under HPC. eigenpy and pybind11 provide alternative means of using C++ numerical linear algebra in Python |
| Mathematical (Special) Functions such as Gamma, Beta, Bessel | scipy | Built-in functions | SpecialFunctions.jl | The R system comes with many basic functions built in |
| Random Number Generation | Built-in, numpy.random | Built-in functions | Built-in (Random standard library) | This entry is about generic random numbers; more specialized applications are mentioned below |
| Mathematical Optimisation | | | JuMP | |
| Symbolic Algebra | sympy | | Symata | |
| Curve Fitting | scipy.optimize, numpy.polyfit | Built-in | ApproxFun | |
| Package Reviews | Mathematics Task Views | Numerical Mathematics, Optimization | | |
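Several of the Python entries in the table above can be exercised in a few lines. This sketch assumes NumPy is installed; it uses the standard library's `math.gamma` rather than SciPy so that SciPy is not required, and the matrix, seed and data are arbitrary.

```python
import math

import numpy as np

# Numerical linear algebra: solve the linear system A x = b
A = np.array([[4.0, 1.0], [2.0, 3.0]])
b = np.array([1.0, 2.0])
x = np.linalg.solve(A, b)

# Special functions: Gamma(5) = 4! = 24 (scipy.special offers a much wider set,
# including Beta and Bessel functions)
g = math.gamma(5)

# Random number generation with an explicitly seeded generator
rng = np.random.default_rng(seed=42)
samples = rng.standard_normal(1_000)

# Curve fitting: least-squares fit of a degree-1 polynomial (a straight line);
# np.polyfit returns coefficients from the highest power down
t = np.linspace(0.0, 1.0, 11)
y = 2.0 * t + 1.0
slope, intercept = np.polyfit(t, y, deg=1)

print(x, g, samples.mean(), slope, intercept)
```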

Core Statistics Libraries

This section aims to answer the question: What libraries are available for undertaking standard statistical studies in Python, R or Julia? There is a large number of packages / modules with significant duplication / overlap, especially for the R system, hence only the major / indicative ones are considered.
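For basic descriptive statistics, Python's standard library includes a `statistics` module. The sketch below uses only that module on a small made-up sample (`quantiles` requires Python 3.8 or later) and shows the difference between the sample and population standard deviations.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative data only

mean = statistics.mean(data)          # arithmetic mean
median = statistics.median(data)      # average of the two middle values for an even count
sd = statistics.stdev(data)           # sample standard deviation (n - 1 denominator)
psd = statistics.pstdev(data)         # population standard deviation (n denominator)
quartiles = statistics.quantiles(data, n=4)  # three cut points: Q1, median, Q3

print(mean, median, sd, psd, quartiles)
```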

Stochastic Processes

This section aims to answer the question: What libraries are available for estimating and/or simulating stochastic processes in Python, R or Julia?

| Aspect | Python | R | Julia | Comment |
|---|---|---|---|---|
| Survival Analysis | lifelines | survival | Survival.jl | |
| Gaussian Processes | GPy | GauPro, GPfit, kergp, mlegp | GaussianProcesses.jl | |
| Poisson Processes | tick, py-hawkes | poisson, NHPoisson, hawkes, emhawkes | | |
| Package Reviews | | Survival Analysis | | |
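As an illustration of simulating one of the processes listed above, a homogeneous Poisson process can be generated from exponential inter-arrival times using only the standard library. This is a textbook construction, not the API of tick or py-hawkes; the function name, rate and horizon are illustrative. Self-exciting (Hawkes) processes need more elaborate methods such as thinning.

```python
import random


def simulate_poisson_process(rate: float, horizon: float, rng: random.Random) -> list:
    """Event times of a homogeneous Poisson process on [0, horizon).

    Inter-arrival times of a Poisson process with intensity `rate` are
    independent exponential variables with mean 1 / rate.
    """
    times = []
    t = rng.expovariate(rate)
    while t < horizon:
        times.append(t)
        t += rng.expovariate(rate)
    return times


if __name__ == "__main__":
    rng = random.Random(2020)
    events = simulate_poisson_process(rate=2.0, horizon=5.0, rng=rng)
    # The expected number of events is rate * horizon = 10
    print(len(events), events[:3])
```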

Econometrics / Timeseries Libraries

This section aims to answer the question: What libraries are available for undertaking econometric / time series studies in Python, R or Julia?
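A typical building block of such studies is fitting an autoregressive model. The sketch below, which assumes NumPy, simulates an AR(1) process and recovers its coefficient by ordinary least squares without an intercept; the function names and parameter values are illustrative. In practice a dedicated time series library would be used, which also provides standard errors and diagnostics.

```python
import numpy as np


def simulate_ar1(phi: float, n: int, rng: np.random.Generator) -> np.ndarray:
    """Simulate x_t = phi * x_{t-1} + e_t with standard normal shocks e_t and x_0 = 0."""
    x = np.zeros(n)
    shocks = rng.standard_normal(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + shocks[t]
    return x


def estimate_ar1(x: np.ndarray) -> float:
    """Least-squares estimate of phi from regressing x_t on x_{t-1} (no intercept)."""
    lagged, current = x[:-1], x[1:]
    return float(lagged @ current / (lagged @ lagged))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    series = simulate_ar1(phi=0.6, n=5_000, rng=rng)
    print(round(estimate_ar1(series), 3))  # should be close to the true value 0.6
```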

Machine Learning Libraries

This section aims to answer the question: What libraries are available for machine learning projects in Python, R or Julia? The term machine learning is not very specific, so we use this category to group various advanced / specialized libraries that are relevant for data science (but not, e.g., computer vision and other specialized ML applications). NB: Machine learning algorithms are typically compute-intensive and are thus often implemented in systems languages, with bindings and APIs provided for the Python or R environments.

GeoSpatial Libraries

This section aims to answer the question: What libraries are available for working with GIS / geospatial data in Python, R or Julia? The geospatial package space is particularly fragmented; the selection focuses on some key anchor concepts.

Visualization

This section aims to answer the question: What functionality is available to produce data-driven visualizations in Python, R or Julia?

Web, Desktop and Mobile Deployment

This section aims to answer the question: What tools does each language ecosystem provide for the deployment of data-based applications, whether via the web, desktop or mobile apps?
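For the web route, Python defines the WSGI interface (PEP 3333), which many Python web frameworks build on, and the standard library includes a reference implementation in `wsgiref`. The sketch below is a minimal WSGI application serving a hard-coded, hypothetical risk figure as JSON, run with the standard library's development server; it is not a production deployment setup.

```python
import json


def app(environ, start_response):
    """A minimal WSGI application that returns a (hypothetical) risk metric as JSON."""
    body = json.dumps({"metric": "expected_loss", "value": 9.0}).encode("utf-8")
    start_response(
        "200 OK",
        [("Content-Type", "application/json"), ("Content-Length", str(len(body)))],
    )
    return [body]


if __name__ == "__main__":
    from wsgiref.simple_server import make_server

    # Development server only; production deployments use a dedicated WSGI server
    with make_server("127.0.0.1", 8000, app) as server:
        print("Serving on http://127.0.0.1:8000")
        server.serve_forever()
```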

Semantic Web / Semantic Data

This section aims to answer the question: What tools and libraries are available for working with semantic data (RDF, OWL, JSON-LD etc) and other relevant domain specific metadata schemas?

| Aspect | Python | R | Julia | Comment |
|---|---|---|---|---|
| RDF Format | rdflib | rrdf | | |
| JSON-LD Format | rdflib.jsonld | | | JSON-LD is an alternative web-friendly serialization format for RDF |
| OWL Ontologies | ontospy, owlready2 | | | |
| Querying RDF (SPARQL) | rdflib | Rredland | | |
| Serving RDF (SPARQL) | rdflib | | | |
| SDMX Format | pandasdmx | rsdmx | | SDMX is the statistical data and metadata exchange format |
| Package Review | Semantic Data Task View | | | |

High Performance Computing

For our purposes, high performance computing (HPC) is any use case that requires more than a single CPU (and its own RAM or disk). This section aims to answer the question: What are my options if I have performance bottlenecks in terms of CPU, memory or disk? It therefore covers topics such as concurrency and GPU computing. NB: Julia aims to address performance issues through compilation and other design choices.
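For CPU-bound Python code, one standard-library option is to distribute independent tasks over several processes with `concurrent.futures.ProcessPoolExecutor`, which sidesteps the global interpreter lock (GIL) of CPython. The Monte Carlo estimate of pi below is only a toy workload: the task sizes are arbitrary, and the `parallel=False` path is included to compare against a single-process run. Worker processes need to import the module, so the `if __name__ == "__main__":` guard is required when this is run as a script.

```python
import random
from concurrent.futures import ProcessPoolExecutor


def count_in_circle(args) -> int:
    """Count random points in the unit square that fall inside the quarter circle."""
    seed, n = args
    rng = random.Random(seed)  # independent, reproducible generator per task
    hits = 0
    for _ in range(n):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(n_tasks: int = 4, n_per_task: int = 200_000, parallel: bool = True) -> float:
    """Monte Carlo estimate of pi, optionally splitting the sampling across processes."""
    tasks = [(seed, n_per_task) for seed in range(n_tasks)]
    if parallel:
        # Each task runs in a separate worker process, so CPU-bound work is not serialised by the GIL
        with ProcessPoolExecutor() as pool:
            hits = sum(pool.map(count_in_circle, tasks))
    else:
        hits = sum(map(count_in_circle, tasks))
    return 4.0 * hits / (n_tasks * n_per_task)


if __name__ == "__main__":
    print(estimate_pi())
```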

Using R, Python and Julia together

This section aims to answer the question: How can I use R from Python, Python from Julia, Julia from R and vice versa? :-) The first rows of the table use a From/To format (from X, call Y) for native integration between the three systems, where "native" means that the integration is done using language bindings within the respective interpreters / REPL (not explicitly using the operating system or a server API).
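By way of contrast with native bindings, the simplest non-native route is to call another language's command-line interpreter through the operating system. The Python sketch below runs an R expression via `Rscript -e`, assuming R is installed and `Rscript` is on the PATH; the helper name `run_r_expression` is hypothetical. Exchanging anything beyond small text results this way usually means passing files (e.g. CSV) between the processes.

```python
import shutil
import subprocess


def run_r_expression(expr: str):
    """Evaluate an R expression with Rscript and return what it prints to stdout.

    Returns None if Rscript is not found on the PATH.
    """
    rscript = shutil.which("Rscript")
    if rscript is None:
        return None
    result = subprocess.run(
        [rscript, "-e", expr],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if R exits with an error
    )
    return result.stdout


if __name__ == "__main__":
    # cat() prints the value without R's "[1]" prefix
    print(run_r_expression("cat(sum(1:10))"))
```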