How fast can Python be?

Python is a high-level programming language. Like other high-level languages, it is known to be slow, but for scientific computation there are useful tools and tricks that can make your code a lot faster. The experiments presented hereafter illustrate this.

Experiments Pt. 1: Object attributes vs. functions

The aim of the following experiments is to compare the use of object attributes (or methods) with the corresponding numpy functions.

import numpy as np

N = 1000
v = np.ones(N)
%timeit np.size(v)
168 ns ± 1.39 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

%timeit v.size
45.1 ns ± 0.674 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

%timeit np.sum(v)
2.77 µs ± 74 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit v.sum()
2.08 µs ± 57.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
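
These timings were obtained with IPython's %timeit magic. Outside IPython, the same comparison can be reproduced with the standard timeit module, as in the following sketch (the exact figures will of course depend on your machine):

import timeit

import numpy as np

v = np.ones(1000)

# Python-level call to the numpy function vs. direct attribute access
t_func = timeit.timeit(lambda: np.size(v), number=1_000_000)
t_attr = timeit.timeit(lambda: v.size, number=1_000_000)

print(f"np.size(v): {t_func:.3f} s for 10**6 calls")
print(f"v.size    : {t_attr:.3f} s for 10**6 calls")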

Conclusions: Always use object attributes and methods rather than the equivalent functions.

Experiments Pt. 2: Low-level functions vs. convenience functions

The aim of the following experiments is to compare low-level functions with their more convenient wrappers.

import numpy as np

a = np.random.rand(1000)
b = np.random.rand(1000)
%timeit np.hstack((a, b))
2.77 µs ± 86.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

# equivalent to np.hstack((a, b)):
%timeit np.concatenate((a, b))
1.26 µs ± 40.5 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

%timeit np.vstack((a, b))
3.23 µs ± 55.8 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

# equivalent to np.vstack((a, b)):
%timeit np.concatenate((a[np.newaxis, :], b[np.newaxis, :]))
1.76 µs ± 34 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
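
Since np.hstack and np.vstack are convenience wrappers that end up calling np.concatenate, the low-level calls return exactly the same arrays. A quick sanity check, using the arrays a and b defined above:

# the low-level calls produce the same arrays as the convenience wrappers
print(np.array_equal(np.hstack((a, b)), np.concatenate((a, b))))
# True
print(np.array_equal(np.vstack((a, b)),
                     np.concatenate((a[np.newaxis, :], b[np.newaxis, :]))))
# True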

Conclusions: Prefer the low-level numpy functions over their convenience wrappers.

Experiments Pt. 3: Explicit vs. implicit loops

The three functions presented hereafter produce the same result using three different formulations:

  • floop() uses explicit loops

  • fslice() uses slicing properties of ndarray objects

  • fnumba() uses the explicit formulation with a ‘just-in-time’ compiler (numba)

import numpy as np
import numba as nb


def floop(N):

    v = np.arange(N**2).reshape(N, N)
    out = np.zeros_like(v)

    # centered difference along the first axis, computed element by element
    for i in range(1, N-1):
        for j in range(N):
            out[i, j] = 0.5*(v[i+1, j] - v[i-1, j])

    return out


def fslice(N):

    v = np.arange(N**2).reshape(N, N)
    out = np.zeros_like(v)
    # same centered difference, written with slices instead of loops
    out[1:-1, :] = 0.5*(v[2:, :] - v[:-2, :])

    return out


@nb.jit
def fnumba(N):

    v = np.arange(N**2).reshape(N, N)
    out = np.zeros_like(v)

    # same explicit loops as in floop(); numba compiles them on the first call
    for i in range(1, N-1):
        for j in range(N):
            out[i, j] = 0.5*(v[i+1, j] - v[i-1, j])

    return out
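
Before comparing timings, a small sanity check (with the definitions above and a modest N) confirms that the three formulations indeed return the same array:

N = 100
ref = fslice(N)
print(np.array_equal(floop(N), ref))   # True
print(np.array_equal(fnumba(N), ref))  # True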

The running time for each function is as follows:

%timeit floop(1000)
2.1 s ± 78 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit fslice(1000)
17.5 ms ± 559 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit fnumba(1000)
8.07 ms ± 223 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

It is worth noting that when a function decorated with nb.jit is called for the first time, it is compiled to low-level machine code (via LLVM) and then executed, so the first call takes much longer than the following ones!
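
This overhead is easy to observe by re-decorating the explicit-loop function, so that nothing is compiled yet, and timing the first two calls. The sketch below uses time.perf_counter; fnumba_fresh is just a throwaway name, and the figures will vary from one machine to another:

import time

fnumba_fresh = nb.jit(floop)   # new dispatcher: nothing is compiled yet

t0 = time.perf_counter()
fnumba_fresh(1000)             # first call: compiles, then runs
t1 = time.perf_counter()
fnumba_fresh(1000)             # second call: runs the already compiled code
t2 = time.perf_counter()

print(f"first call : {t1 - t0:.2f} s (includes compilation)")
print(f"second call: {t2 - t1:.4f} s")

Numba also provides a cache=True option (e.g. nb.jit(cache=True)) to store the compiled code on disk, so that later sessions can skip the compilation step.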

Conclusions:

  • Avoid explicit loops whenever possible.

  • When a function is called only once, prefer slices to explicit loops.

  • When a function is called a large number of times, prefer the compiled (JIT) explicit formulation.