Numerical Analysis & Statistics: MATLAB, R, NumPy, Julia

a side-by-side reference sheet

sheet one: grammar and invocation | variables and expressions | arithmetic and logic | strings | regexes | dates and time | tuples | arrays | arithmetic sequences | 2d arrays | 3d arrays | dictionaries | functions | execution control | file handles | directories | processes and environment | libraries and namespaces | reflection | debugging

sheet two: tables | import and export | relational algebra | aggregation

vectors | matrices | sparse matrices | optimization | polynomials | descriptive statistics | distributions | linear regression | statistical tests | time series | fast fourier transform | clustering | images | sound

bar charts | scatter plots | line charts | surface charts | chart options

univariate charts | bivariate charts | multivariate charts

The version of software used to check the examples in the reference sheet.

How to determine the version of an installation.

Code which examples in the sheet assume to have already been executed.

r:

The ggplot2 library must be installed and loaded to use the plotting functions qplot and ggplot.

How to invoke the interpreter on a script.

How to launch a command line read-eval-print loop for the language.

r:

R installations come with a GUI REPL.

The shell zsh has a built-in command r which re-runs the last command. Shell built-ins take precedence over external commands, but one can invoke the R REPL with:

$ command r

How to pass the code to be executed to the interpreter as a command line argument.

How to get and set an environment variable.

Punctuation or keywords which define blocks.

matlab:

The list of keywords which define blocks is not exhaustive. Blocks are also defined by:

switch, case, otherwise, endswitch

unwind_protect, unwind_protect_cleanup, end_unwind_protect

try, catch, end_try_catch

How statements are separated.

matlab:

Semicolons are used at the end of lines to suppress output. Output echoes the assignment performed by a statement; if the statement is not an assignment, the value of the statement is assigned to the special variable ans.

In Octave, but not MATLAB, newlines are not separators when preceded by a backslash.

Character used to start a comment that goes to the end of the line.

octave:

Octave, but not MATLAB, also supports shell-style comments which start with #.

r:

Traditionally <- was used in R for assignment. Using an = for assignment was introduced in version 1.4.0 sometime before 2002. -> can also be used for assignment:

3 -> x

The compound assignment operators.

octave:

Octave, but not MATLAB, has compound assignment operators for arithmetic and bit operations:

+= -= *= /= **= ^= &= |=

Octave, but not MATLAB, also has the C-style increment and decrement operators ++ and --, which can be used in prefix and postfix position.

The operator for incrementing the value in a variable; the operator for decrementing the value in a variable.

matlab:

NaN can be used for missing numerical values. Comparing NaN with ==, <, <=, >, or >= always returns false, including NaN == NaN; only the inequality test returns true. Using a logical operator on NaN raises an error.
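The comparison behavior of NaN comes from IEEE 754 floating point rather than from MATLAB itself, so it can be observed in other languages. A minimal sketch in plain Python, used here only as a neutral illustration and not as one of the sheet's MATLAB examples:

```python
import math

nan = float("nan")

# Equality and ordered comparisons against NaN are all false.
eq = nan == nan
lt = nan < 1.0
gt = nan > 1.0

# The inequality test is the one comparison that is true.
ne = nan != nan

# Because equality fails, NaN has to be detected with a dedicated predicate.
detected = math.isnan(nan)

print(eq, lt, gt, ne, detected)
```

Note that Python differs from MATLAB on logical operators: Python treats NaN as a truthy value instead of raising an error.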

octave:

Octave, but not MATLAB, provides NA which is a synonym of NaN.

r:

Relational operators return NA when one of the arguments is NA. In particular NA == NA is NA. When acting on values that might be NA, the logical operators observe the rules of ternary logic, treating NA as the unknown value.
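The rules of ternary logic can be sketched in Python by letting None stand in for NA. The helper names and3, or3, and not3 are hypothetical and exist only for this illustration; they are not R functions:

```python
def and3(a, b):
    """Three-valued AND: a definite False wins, otherwise unknown propagates."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def or3(a, b):
    """Three-valued OR: a definite True wins, otherwise unknown propagates."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def not3(a):
    """Three-valued NOT: the negation of unknown is still unknown."""
    return None if a is None else (not a)


print(and3(None, False), and3(None, True), or3(None, True), or3(None, False))
```

The result is known whenever one operand settles it on its own: unknown AND false is false, and unknown OR true is true.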

How to test if a value is null.

octave:

Octave, but not MATLAB, has isna and isnull, which are synonyms of isnan and isempty.

A conditional expression.

The boolean literals.

matlab:

true and false are functions which return matrices of ones and zeros of type logical. If no arguments are specified they return single entry matrices. If one argument is provided, a square matrix is returned. If two arguments are provided, they are the row and column dimensions.

Values which evaluate to false in a conditional test.

matlab:

When used in a conditional, matrices evaluate to false unless they are nonempty and all their entries evaluate to true. Because strings are matrices of characters, an empty string ('' or "") will evaluate to false. Most other strings will evaluate to true, but it is possible to create a nonempty string which evaluates to false by inserting a null character; e.g. "false\000".
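The rule above (true only when nonempty and every entry is nonzero) can be sketched in Python. The helpers matrix_is_true and char_codes are hypothetical names for this illustration; the sketch ignores NaN entries, which MATLAB rejects in a conditional:

```python
def matrix_is_true(entries):
    """Hypothetical helper: truth value of a flat list of matrix entries."""
    return len(entries) > 0 and all(entry != 0 for entry in entries)


def char_codes(text):
    """Model a string as the list of its character codes."""
    return [ord(ch) for ch in text]


empty_string = matrix_is_true(char_codes(""))
ordinary_string = matrix_is_true(char_codes("false"))
with_null = matrix_is_true(char_codes("false\x00"))
mixed = matrix_is_true([1, 0, 1])

print(empty_string, ordinary_string, with_null, mixed)
```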

r:

When used in a conditional, a vector evaluates to the boolean value of its first entry. Using a vector with more than one entry in a conditional results in a warning message (an error as of R 4.2.0). Using an empty vector in a conditional, c() or NULL, raises an error.

The boolean operators.

octave:

Octave, but not MATLAB, also uses the exclamation point '!' for negation.

The relational operators.

octave:

Octave, but not MATLAB, also uses != for an inequality test.

The arithmetic operators: addition, subtraction, multiplication, division, quotient, remainder.

matlab:

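mod is a function and not an infix operator. mod returns a result with the same sign as the second argument (the divisor), whereas rem returns a result with the same sign as the first argument (the dividend). The two agree when both arguments have the same sign.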
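The two sign conventions can be compared in Python, where the % operator follows the sign of the divisor and math.fmod follows the sign of the dividend. This is a sketch of the conventions only, not MATLAB code:

```python
import math

# The % operator takes the sign of the divisor (second operand).
mod_neg_dividend = -7 % 3
mod_neg_divisor = 7 % -3

# math.fmod takes the sign of the dividend (first operand).
rem_neg_dividend = math.fmod(-7, 3)
rem_neg_divisor = math.fmod(7, -3)

print(mod_neg_dividend, mod_neg_divisor, rem_neg_dividend, rem_neg_divisor)
```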

How to compute the quotient of two integers.

What happens when an integer is divided by zero.

How to perform float division, even if the arguments are integers.

What happens when a float is divided by zero.

octave:

Octave, but not MATLAB, supports ** as a synonym of ^.

The square root function.

The result of taking the square root of a negative number.

The standard transcendental functions.

Constants for pi and e.

Ways of converting a float to a nearby integer.

The absolute value and signum of a number.

What happens when an expression evaluates to an integer which is too big to be represented.

What happens when an expression evaluates to a float which is too big to be represented.

The machine epsilon; the largest representable float and the smallest (i.e. closest to negative infinity) representable float.
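For double precision floats these three values can be read in plain Python from sys.float_info. A short sketch, assuming the platform uses IEEE 754 doubles:

```python
import sys

eps = sys.float_info.epsilon      # gap between 1.0 and the next larger float
largest = sys.float_info.max      # largest finite float
smallest = -sys.float_info.max    # finite float closest to negative infinity

# Adding eps to 1.0 changes the value; adding half of it does not.
changes = (1.0 + eps) != 1.0
absorbed = (1.0 + eps / 2) == 1.0

print(eps, largest, smallest, changes, absorbed)
```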

Literals for complex numbers.

How to decompose a complex number into its real and imaginary parts; how to decompose a complex number into its absolute value and argument; how to get the complex conjugate.

How to generate a random integer from a uniform distribution; how to generate a random float from a uniform distribution.

How to set, get, and restore the seed used by the random number generator.

matlab:

At startup the random number generator is seeded using operating system entropy.

r:

At startup the random number generator is seeded using the current time.

numpy:

On Unix the random number generator is seeded at startup from /dev/urandom.
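Setting, saving, and restoring generator state follows the same pattern in most environments. A minimal sketch using Python's standard library random module (not numpy.random, which has its own seed and state functions):

```python
import random

random.seed(17)                    # set the seed
state = random.getstate()          # save the generator state

first = [random.random() for _ in range(3)]

random.setstate(state)             # restore the saved state
second = [random.random() for _ in range(3)]

random.seed(17)                    # reseeding also reproduces the sequence
third = [random.random() for _ in range(3)]

print(first == second == third)
```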

The bit operators left shift, right shift, and, or, xor, and negation.

matlab/octave:

bitshift takes a second argument which is positive for left shift and negative for right shift.

bitcmp takes a second argument which is the size in bits of the integer being operated on. Octave is not compatible with MATLAB in how the integer size is indicated.
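The behavior of the two functions can be sketched in Python. The helpers below are hypothetical re-implementations that borrow the names bitshift and bitcmp; in particular, passing the width as a plain bit count is an assumption of this sketch and is not how either MATLAB or Octave spells it:

```python
def bitshift(x, n):
    """Shift left when n is positive, right when n is negative."""
    return x << n if n >= 0 else x >> -n


def bitcmp(x, bits):
    """Complement x within a fixed width of `bits` bits."""
    return ~x & ((1 << bits) - 1)


left = bitshift(100, 3)
right = bitshift(100, -3)
complement = bitcmp(1, 8)

print(left, right, complement)
```

The width argument is needed because the complement of an integer depends on how many bits it is considered to have.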

r:

There is a library on CRAN called bitops which provides bit operators.

The syntax for a string literal.

Can a newline be included in a string literal? Equivalently, can a string literal span more than one line of source code?

octave:

Double quote strings are Octave specific.

A newline can be inserted into a double quote string using the backslash escape \n.

A double quote string can be continued on the next line by ending the line with a backslash. No newline is inserted into the string.

Escape sequences for including special characters in string literals.

matlab:

C-style backslash escapes are not recognized by string literals, but they are recognized by the IO system; the string 'foo\n' contains 5 characters, but emits 4 characters when written to standard output.

How to concatenate strings.

How to create a string which consists of a character or substring repeated a fixed number of times.

How to get the index of first occurrence of a substring.

How to get the substring at a given index.

octave:

Octave supports indexing string literals directly: 'hello'(1:4).

How to split a string into an array of substrings. In the original string the substrings must be separated by a character, string, or regex pattern which will not appear in the array of substrings.

The split operation can be used to extract the fields from a field delimited record of data.

matlab:

Cell arrays, which are essentially tuples, are used to store variable-length strings.

A two dimensional array of characters can be used to store strings of the same length, one per row. Regular arrays cannot otherwise be used to store strings.

How to join an array of substrings into a single string. The substrings can be separated by a specified character or string.

Joining is the inverse of splitting.
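A short Python sketch of the round trip, using a made-up comma-delimited record:

```python
record = "alice,42,engineer"

# Split extracts the fields; the delimiter does not appear in the result.
fields = record.split(",")

# Join with the same delimiter reverses the split.
rejoined = ",".join(fields)

print(fields, rejoined == record)
```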

How to remove whitespace from the beginning and the end of a string.

Trimming is often performed on user provided input.

How to pad the edge of a string with spaces so that it is a prescribed length.

How to convert a number to a string.

How to convert a string to a number.

How to put a string into all caps. How to put a string into all lower case letters.

How to create a string using a printf style format.

How to get the number of characters in a string.

How to get the character in a string at a given index.

octave:

Octave supports indexing string literals directly: 'hello'(1).

How to convert an ASCII code to a character; how to convert a character to its ASCII code.

The supported character class abbreviations.

A character class is a set of one or more characters. In regular expressions, an arbitrary character class can be specified by listing the characters inside square brackets. If the first character is a circumflex ^, the character class is all characters not in the list. A hyphen - can be used to list a range of characters.
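Bracketed character classes work the same way in most regex engines. A sketch using Python's re module on a made-up input string:

```python
import re

text = "a1-b2_c3"

digits = re.findall(r"[0-9]", text)          # range of characters
letters = re.findall(r"[a-c]", text)         # another range
others = re.findall(r"[^a-z0-9]", text)      # negated class: everything else

print(digits, letters, others)
```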

matlab:

The C-style backslash escapes, which can be regarded as character classes which match a single character, are a feature of the regular expression engine and not string literals like in other languages.

The supported anchors.

The \< and \> anchors match the start and end of a word respectively.

How to test whether a string matches a regular expression.

How to perform a case insensitive match test.

How to replace all substrings which match a pattern with a specified string; how to replace the first substring which matches a pattern with a specified string.

How to use backreferences in a regex; how to use backreferences in the replacement string of a substitution.
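Substitution and both kinds of backreference can be sketched with Python's re module; the input strings are made up for the illustration:

```python
import re

# Replace every match, then only the first match.
replace_all = re.sub(r"o", "0", "foo boo")
replace_first = re.sub(r"o", "0", "foo boo", count=1)

# Backreference inside the pattern: \1 must repeat what group 1 matched.
doubled = re.search(r"(\w)\1", "bookkeeper").group(0)

# Backreferences inside the replacement string: swap two words.
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "hello world")

print(replace_all, replace_first, doubled, swapped)
```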

How to get the current date and time.

r:

Sys.time() returns a value of type POSIXct.

The data type used to hold a combined date and time value.

matlab:

The Gregorian calendar was introduced in 1582. The Proleptic Gregorian Calendar is sometimes used for earlier dates, but in the Proleptic Gregorian Calendar the year 1 CE is preceded by the year 1 BCE. The MATLAB epoch thus starts at the beginning of the year 1 BCE, but uses a zero to refer to this year.
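The shifted epoch can be illustrated by comparison with Python, whose date.toordinal() counts days starting from January 1 of year 1. The sketch assumes that a MATLAB serial date number counts days with day 1 falling on January 1 of year 0, and that year 0 is a leap year of 366 days; the helper to_datenum is hypothetical:

```python
from datetime import date

# Assumed offset: the 366 days of (leap) year 0 that precede year 1.
DAYS_IN_YEAR_ZERO = 366


def to_datenum(d):
    """Hypothetical helper: serial day number counted from year 0."""
    return d.toordinal() + DAYS_IN_YEAR_ZERO


print(to_datenum(date(1, 1, 1)), to_datenum(date(2000, 1, 1)))
```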

The data type used to hold the difference between two date/time types.

How to get the year, the month as an integer from 1 through 12, and the day of the month from a date/time value.

octave:

In Octave, but not MATLAB, one can use index notation on the return value of a function:

t = now
datevec(t)(1)

How to get the hour as an integer from 0 through 23, the minute, and the second from a date/time value.

How to build a date/time value from the year, month, day, hour, minute, and second as integers.

How to convert a date value to a string using the default format for the locale.

How to parse a date/time value from a string in the manner of strptime from the C standard library.

How to write a date/time value to a string in the manner of strftime from the C standard library.
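Python's datetime module exposes the same strptime and strftime format directives as the C library, so it serves as a compact sketch of parsing, extracting parts, and formatting. The timestamp is made up:

```python
from datetime import datetime

fmt = "%Y-%m-%d %H:%M:%S"

# Parse a string into a date/time value.
parsed = datetime.strptime("2011-05-03 10:00:00", fmt)

# Extract the individual parts.
parts = (parsed.year, parsed.month, parsed.day,
         parsed.hour, parsed.minute, parsed.second)

# Format the value back into a string.
formatted = parsed.strftime(fmt)

print(parts, formatted)
```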

The name of the data type which implements tuples.

How to create a tuple, which we define as a fixed length, inhomogeneous list.

How to access an element of a tuple.

How to change one of a tuple's elements.

How to get the number of elements in a tuple.

This section covers one-dimensional arrays which map integers to values.

Multidimensional arrays are a generalization which map tuples of integers to values.

Vectors and matrices are one-dimensional and two-dimensional arrays respectively containing numeric values. They support additional operations including the dot product, matrix multiplication, and norms.

Here are the data types covered in each section:

section | matlab | r | numpy | julia
arrays | matrix (ndims = 1) | vector | list |
multidimensional arrays | matrix | array | np.array |
vectors | matrix (ndims = 1) | vector | np.array (ndim = 1) |
matrices | matrix (ndims = 2) | matrix | np.matrix |

How to get the type of the elements of an array.

Permitted data types for array elements.

matlab:

Arrays in Octave can only contain numeric elements.

Array literals can have a nested structure, but Octave will flatten them. The following literals create the same array:

[ 1 2 3 [ 4 5 6] ]
[ 1 2 3 4 5 6 ]

Logical values can be put into an array because true and false are synonyms for 1 and 0. Thus the following literals create the same arrays:

[ true false false ]
[ 1 0 0 ]

If a string is encountered in an array literal, the string is treated as an array of ASCII values and it is concatenated with other ASCII values to produce a string. The following literals all create the same string:

[ 'foo', 98, 97, 114]
[ 'foo', 'bar' ]
'foobar'

If the other numeric values in an array literal that includes a string are not integer values that fit into an ASCII byte, then they are converted to byte sized values.

r:

Array literals can have a nested structure, but R will flatten them. The following literals produce the same array of 6 elements:

c(1,2,3,c(4,5,6))
c(1,2,3,4,5,6)

If an array literal contains a mixture of booleans and numbers, then the boolean literals will be converted to 1 (for TRUE and T) and 0 (for FALSE and F).

If an array literal contains strings and either booleans or numbers, then the booleans and numbers will be converted to their string representations. For the booleans the string representations are "TRUE" and "FALSE".

The syntax, if any, for an array literal.

matlab:

The array literal

[1,'foo',3]

will create an array with 5 elements of class char.

r:

The array literal

c(1,'foo',3)

will create an array of 3 elements of class character, which is the R string type.

How to get the number of values in an array.

How to make an address copy, a shallow copy, and a deep copy of an array.

After an address copy is made, modifications to the copy also modify the original array.

After a shallow copy is made, the addition, removal, or replacement of elements in the copy does not modify the original array. However, if elements in the copy are modified, those elements are also modified in the original array.

A deep copy is a recursive copy. The original array is copied and a deep copy is performed on all elements of the array. No change to the contents of the copy will modify the contents of the original array.
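The three kinds of copy can be told apart most easily on a nested structure. A sketch using Python lists and the standard copy module:

```python
import copy

original = [[1, 2], [3, 4]]

alias = original                  # address copy: same object
shallow = copy.copy(original)     # new outer list, shared inner lists
deep = copy.deepcopy(original)    # fully independent

shallow.append([5, 6])            # adding to the shallow copy: original unaffected
shallow[0][0] = 99                # modifying a shared element: original changes
deep[1][0] = -1                   # modifying the deep copy: original unaffected
alias.append([7, 8])              # modifying the alias: original changes

print(original)
print(shallow)
print(deep)
```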

An arithmetic sequence is a sequence of numeric values in which consecutive terms have a constant difference.

An arithmetic sequence with a difference of 1.

An arithmetic sequence with a difference of 10.

An arithmetic sequence with a difference of 0.1.

An arithmetic sequence where the difference is computed using the start and end values and the number of elements.

How to iterate over an arithmetic sequence.

How to convert an arithmetic sequence to an array.
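The sequences described in this section can be sketched in plain Python. range covers integer differences; for a difference computed from the endpoints and the element count, the helper linspace below is a hypothetical pure-Python stand-in named after the MATLAB and NumPy function, and it assumes count is at least 2:

```python
unit_steps = list(range(1, 11))        # difference of 1: 1, 2, ..., 10
tens = list(range(0, 101, 10))         # difference of 10: 0, 10, ..., 100


def linspace(start, stop, count):
    """Return `count` evenly spaced values from start to stop inclusive."""
    step = (stop - start) / (count - 1)
    return [start + i * step for i in range(count)]


tenths = linspace(0.0, 1.0, 11)        # difference of 0.1

# Iterating over a sequence does not require converting it to a list first.
total = 0
for n in range(1, 11):
    total += n

print(unit_steps, tens, tenths, total)
```

Computing each term as start + i * step avoids the rounding drift that repeated addition of 0.1 would accumulate.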





Multidimensional arrays are a generalization of arrays which map tuples of integers to values. All tuples in the domain of a multidimensional array have the same length; this length is the dimension of the array.

The multidimensional arrays described in this sheet are homogeneous, meaning that the values are all of the same type. This restriction allows the implementation to store the values of the multidimensional array in a contiguous region of memory without the use of references or pointers.

Multidimensional arrays should be contrasted with nested arrays. When arrays are nested, the innermost nested arrays contain the values and the other arrays contain references to arrays. The syntax for looking up a value is usually different:

# nested:
a[1][2]

# multidimensional:
a[1, 2]
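The difference in storage can be sketched in Python. The class Array2D is a hypothetical toy that keeps its values in one flat list and computes the position from the index tuple; it is not how any particular library is implemented. Indices here are zero-based:

```python
# Nested: the outer list holds references to the inner lists.
nested = [[1, 2, 3], [4, 5, 6]]
value_nested = nested[1][2]


class Array2D:
    """Toy two-dimensional array stored in a single flat list (row-major)."""

    def __init__(self, rows, cols, values):
        if len(values) != rows * cols:
            raise ValueError("values must contain rows * cols entries")
        self.rows = rows
        self.cols = cols
        self.values = list(values)

    def __getitem__(self, index):
        i, j = index                      # index arrives as a tuple of integers
        return self.values[i * self.cols + j]


flat = Array2D(2, 3, [1, 2, 3, 4, 5, 6])
value_flat = flat[1, 2]

print(value_nested, value_flat)
```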

How to get the type of the values stored in a multidimensional array.

r:

circular shift—2d

rotate—2d

apply function element-wise

apply function to linear subarrays

The syntax for a dictionary literal.

How to get the number of keys in a dictionary.

How to use a key to look up a value in a dictionary.

How to add a key-value pair or change the value for an existing key.

What happens when looking up a key that isn't in the dictionary.

How to delete a key-value pair from a dictionary.

How to iterate over the key-value pairs.

How to get an array containing the keys; how to get an array containing the values.

How to merge two dictionaries.

How to define a function.

How to invoke a function.

What happens when a function is invoked with too few arguments.

What happens when a function is invoked with too many arguments.

How to assign a default argument to a parameter.

How to define a function which accepts a variable number of arguments.

How the return value of a function is determined.

How to return multiple values from a function.

The syntax for an anonymous function.

How to store a function in a variable.

How to write a branch statement.

How to write a conditional loop.

How to write a C-style for statement.

How to break out of a loop. How to jump to the next iteration of a loop.

How to raise an exception.

How to handle an exception.

Standard input, standard output, and standard error.

How to write a line to stdout.

matlab:

The backslash escape sequence \n is stored as two characters in the string and interpreted as a newline by the IO system.

How to get and set the working directory.

How to get the command line arguments.

How to get and set an environment variable.

How to load a library.

Show the list of libraries which have been loaded.

The list of directories the interpreter will search looking for a library to load.

How to source a file.

r:

When sourcing a file, the suffix if any must be specified, unlike when loading a library. Also, a library may contain a shared object, but a sourced file must consist of just R source code.

How to install a package.

How to list the packages which have been installed.

How to get the data type of a value.

r:

For vectors, class returns the mode of the vector, which is the type of data contained in it. The possible modes are:

numeric

complex

logical

character

raw

Some of the more common class types for non-vector entities are:

matrix

array

list

factor

data.frame

How to get the attributes for an object.

r:

Arrays and vectors do not have attributes.

How to get the methods for an object.

How to list the variables in scope.

How to undefine a variable.

How to undefine all variables.

How to interpret a string as source code and execute it.

How to get the documentation for a function.

How to list the functions and other definitions in a library.

How to search the documentation by keyword.

How to benchmark code.

Octave Manual

MATLAB Documentation

Differences between Octave and MATLAB

Octave-Forge Packages

The basic data type of MATLAB is a matrix of floats. There is no distinction between a scalar and a 1x1 matrix, and functions that work on scalars typically work on matrices as well by performing the scalar function on each entry in the matrix and returning the results in a matrix with the same dimensions. Operators such as the logical operators ('&' '|' '!'), relational operators ('==', '!=', '<', '>'), and arithmetic operators ('+', '-') all work this way. However the multiplication '*' and division '/' operators perform matrix multiplication and matrix division, respectively. The .* and ./ operators are available if entry-wise multiplication or division is desired.
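The distinction between matrix and entry-wise multiplication can be sketched in plain Python on nested lists. The helpers matmul and elementwise are hypothetical names for this illustration, standing in for MATLAB's * and .* operators:

```python
def elementwise(a, b):
    """Entry-wise product of two matrices with the same dimensions."""
    return [[x * y for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


def matmul(a, b):
    """Matrix product: each entry is a row of a dotted with a column of b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]

entry_product = elementwise(a, b)
matrix_product = matmul(a, b)

print(entry_product, matrix_product)
```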

Floats are by default double precision; single precision can be specified with the single constructor. MATLAB has convenient matrix literal notation: commas or spaces can be used to separate row entries, and semicolons or newlines can be used to separate rows.

Arrays and vectors are implemented as single-row (1xn) matrices. As a result an n-element vector must be transposed before it can be multiplied on the right of an mxn matrix.

Numeric literals that lack a decimal point such as 17 and -34 create floats, in contrast to most other programming languages. To create an integer, an integer constructor which specifies the size such as int8 and uint16 must be used. Matrices of integers are supported, but the entries in a given matrix must all have the same numeric type.

Strings are implemented as single-row (1xn) matrices of characters. Matrices cannot contain strings. If a string is put in a matrix literal, each character in the string becomes an entry in the resulting matrix. This is consistent with how matrices are treated if they are nested inside another matrix. The following literals all yield the same string or 1xn matrix of characters:

'foo'
[ 'f' 'o' 'o' ]
[ 'foo' ]
[ [ 'f' 'o' 'o' ] ]

true and false are functions which return matrices of ones and zeros. The ones and zeros have type logical instead of double, which is created by the literals 1 and 0. Other than having a different class, the 0 and 1 of type logical behave the same as the 0 and 1 of type double.

MATLAB has a tuple type (in MATLAB terminology, a cell array) which can be used to hold multiple strings. It can also hold values with different types.

An Introduction to R

Advanced R Programming

The Comprehensive R Archive Network

The primitive data types of R are vectors of floats, vectors of strings, and vectors of booleans. There is no distinction between a scalar and a vector with one entry in it. Functions and operators which accept a scalar argument will typically accept a vector argument, returning a vector of the same size with the scalar operation performed on each of the entries of the original vector.

The scalars in a vector must all be of the same type, but R also provides a list data type which can be used as a tuple (entries accessed by index), record (entries accessed by name), or even as a dictionary.

In addition R provides a data frame type which is a list (in R terminology) of vectors all of the same length. Data frames are equivalent to the data sets of other statistical analysis packages.

NumPy and SciPy Documentation

matplotlib intro

NumPy for Matlab Users

Pandas Documentation

Pandas Method/Attribute Index

NumPy is a Python library which provides a data type called array. It differs from the Python list data type in the following ways:

N-dimensional. Although the list type can be nested to hold higher dimension data, the array can hold higher dimension data in a space efficient manner without using indirection.

homogeneous. The elements of an array are restricted to be of a specified type. The NumPy library introduces new primitive types not available in vanilla Python. However, the element type of an array can be object, which permits storing anything in the array.

In the reference sheet the array section covers the vanilla Python list and the multidimensional array section covers the NumPy array.

List the NumPy primitive types

SciPy, Matplotlib, and Pandas are libraries which depend on NumPy.

http://julialang.org/