NumPy arrays

A NumPy array (ndarray) is the core data structure of the NumPy library: a multi-dimensional array whose elements all share one fixed data type, designed for efficient numerical computation. Unlike Python lists, NumPy arrays support fast vectorized operations and broadcasting, and they use far less memory per element for large numerical datasets.

NumPy is the foundation of the scientific Python stack — pandas, scikit-learn, TensorFlow, and PyTorch all build on or interoperate with NumPy arrays.

Why NumPy

Python lists are flexible but inefficient for numerical computation:

Lists store references to Python objects, each carrying its own type tag and reference count, so every element has substantial overhead.
Lists don’t support vectorized operations — you need explicit for loops for element-wise math.
A list holds a contiguous array of pointers, but the objects they point to are scattered across the heap, hurting cache performance.

NumPy arrays solve all three:

Contiguous memory layout, like a C array.
Single homogeneous data type per array (int32, float64, etc.).
Vectorized operations implemented in C with SIMD where available.

The result: often 10–100× faster element-wise computation on large arrays (the exact ratio depends on the operation, array size, and hardware), plus cleaner code.
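As a rough illustration of the difference, the sketch below times the same element-wise computation written as a list comprehension and as a vectorized expression. The array size, repeat count, and function names are arbitrary choices made for this example, and the measured ratio will vary with hardware and NumPy version, so treat the printed numbers as indicative only.

```python
import timeit

import numpy as np

n = 100_000                                   # arbitrary size chosen for illustration
values_list = list(range(n))
values_arr = np.arange(n, dtype=np.float64)


def square_plus_one_list():
    # Explicit Python-level loop over the list elements
    return [x * x + 1 for x in values_list]


def square_plus_one_array():
    # One vectorized expression; the per-element loop runs in compiled code
    return values_arr * values_arr + 1


t_list = timeit.timeit(square_plus_one_list, number=20)
t_arr = timeit.timeit(square_plus_one_array, number=20)
print(f"list:  {t_list:.4f} s")
print(f"array: {t_arr:.4f} s")
```

Both functions compute the same values; only the place where the loop runs differs.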

Creating arrays

From Python lists:

import numpy as np
 
a = np.array([1, 2, 3, 4])
b = np.array([[1, 2], [3, 4]])    # 2D array

From specific patterns:

np.zeros(5)          # [0., 0., 0., 0., 0.] (float64 by default)
np.ones((3, 3))      # 3x3 of ones (float64 by default)
np.arange(0, 10, 2)  # [0, 2, 4, 6, 8]
np.linspace(0, 1, 5) # [0.0, 0.25, 0.5, 0.75, 1.0]
np.random.rand(3, 3) # 3x3 of random floats in [0, 1)
np.eye(4)            # 4x4 identity matrix
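np.random.rand belongs to NumPy’s legacy random interface. For new code, the NumPy documentation recommends the Generator API instead. A minimal sketch, with the seed value 42 chosen arbitrarily so the run is reproducible:

```python
import numpy as np

rng = np.random.default_rng(42)      # seed is an arbitrary choice for this example
r = rng.random((3, 3))               # 3x3 of random floats in [0, 1)
k = rng.integers(0, 10, size=5)      # 5 random integers in [0, 10)

print(r.shape, r.dtype)
print(k)
```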

Array attributes

Every array exposes:

arr = np.array([[1, 2, 3], [4, 5, 6]])
 
arr.shape       # (2, 3) — dimensions
arr.size        # 6 — total elements
arr.ndim        # 2 — number of dimensions
arr.dtype       # int64 on most platforms (the element type)
arr.itemsize    # 8 (bytes per element for int64)
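These attributes are related: the data buffer occupies size × itemsize bytes, which NumPy also exposes as nbytes. A short sketch, with the dtype fixed to int64 so the byte counts are predictable on any platform:

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int64)

print(arr.nbytes)                    # 48: total bytes in the data buffer
print(arr.size * arr.itemsize)       # 48: 6 elements * 8 bytes each

small = arr.astype(np.int8)          # same values stored in 1 byte per element
print(small.itemsize, small.nbytes)  # 1 6
```

Choosing the smallest dtype that safely holds the data is the main lever for reducing memory use.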

Indexing

Basic integer indexing is similar to Python lists:

arr = np.array([10, 20, 30, 40, 50])
arr[0]         # 10
arr[-1]        # 50
arr[2]         # 30

For multi-dimensional arrays, use comma-separated indices:

m = np.array([[1, 2, 3], [4, 5, 6]])
m[0, 1]        # 2 — row 0, column 1
m[1, 2]        # 6
m[1]           # [4, 5, 6] — entire row 1
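A few further properties of basic indexing, sketched on the same matrix: the comma form and chained brackets select the same element, negative indices work on every axis, indexed assignment modifies the array in place, and an out-of-range index raises IndexError. The variable names (same, last, error_message) are illustrative only.

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6]])

same = m[0, 1] == m[0][1]    # both forms select the same element
last = m[-1, -1]             # negative indices count from the end: 6

m[0, 1] = 20                 # assignment through an index changes m in place

try:
    m[2, 0]                  # only rows 0 and 1 exist
except IndexError as exc:
    error_message = str(exc)

print(same, last, m[0, 1])
print(error_message)
```

The comma form m[0, 1] is generally preferred, since m[0][1] first builds an intermediate row and then indexes into it.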

For slicing, see NumPy array slicing. For boolean and condition-based selection, see NumPy advanced indexing.

Reshaping

Change the shape of an array without changing its data:

a = np.arange(12)                # [0, 1, 2, ..., 11]
a.reshape(3, 4)                  # 3×4 matrix
a.reshape(2, 2, 3)               # 2x2x3 tensor
a.reshape(-1, 4)                 # automatically infer first dimension

reshape returns a view (not a copy) whenever the existing memory layout allows it, so no data is moved; otherwise it falls back to a copy.
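One way to check whether a reshape produced a view is np.shares_memory. The sketch below shows a view that writes through to the original, and a case (reshaping a transposed array) where the memory layout forces a copy:

```python
import numpy as np

a = np.arange(12)
b = a.reshape(3, 4)

print(np.shares_memory(a, b))   # True: b is a view onto a's buffer
b[0, 0] = 99                    # writing through the view...
print(a[0])                     # 99: ...changes the original

# Flattening a transposed (non-contiguous) array cannot reuse the buffer,
# so reshape falls back to a copy without any warning.
c = b.T.reshape(-1)
print(np.shares_memory(b, c))   # False
```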

Flatten a multi-dim array to 1D:

m.flatten()      # makes a copy, returns 1D array
m.ravel()        # makes a view if possible, otherwise copy
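The practical consequence of copy versus view shows up when the flattened result is modified. A small sketch (flat_copy and flat_view are illustrative names):

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6]])

flat_copy = m.flatten()      # always an independent copy
flat_view = m.ravel()        # m is contiguous here, so this is a view

flat_copy[0] = -1            # does not affect m
flat_view[1] = -2            # writes through to m

print(m[0, 0], m[0, 1])      # 1 -2
```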

Iteration

You can loop over a NumPy array, but you usually shouldn’t:

for x in arr:
    print(x)        # works, but slow

Vectorized operations (see NumPy arithmetic and comparison operations) are almost always preferable. Treat explicit loops as a last resort.
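As a small illustration, here is the same computation (sum of squares) written first as a loop and then as a vectorized expression. The variable names are chosen for this example only.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

# Loop version: one Python-level iteration per element
total_loop = 0
for x in arr:
    total_loop += x * x

# Vectorized version: square every element at once, then reduce
total_vec = (arr ** 2).sum()

print(total_loop, total_vec)   # 5500 5500
```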

Type considerations

NumPy infers the dtype from the input data:

np.array([1, 2, 3])          # int64 on most modern systems (NumPy ≥ 2.0)
np.array([1.0, 2.0, 3.0])    # float64
np.array([True, False])      # bool

The default integer dtype used to be platform-dependent: NumPy 1.x followed the C long type, which is 64-bit on 64-bit Linux and macOS but 32-bit on Windows. NumPy 2.0 (June 2024) changed the default to a pointer-sized integer, so on 64-bit platforms, including Windows, the default is now int64. If you need to support pre-2.0 NumPy or want a guaranteed width regardless of version, pass dtype=np.int64.
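Inference can be checked directly by printing the dtype. The sketch below also shows that a single float in the input promotes the whole array to float64, and that an explicit dtype removes the platform dependence:

```python
import numpy as np

default_int = np.array([1, 2, 3])
mixed = np.array([1, 2.5, 3])
explicit = np.array([1, 2, 3], dtype=np.int64)

print(default_int.dtype)   # default integer dtype for this platform and NumPy version
print(mixed.dtype)         # float64: one float promotes every element
print(explicit.dtype)      # int64 regardless of platform
```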

Force a specific type:

np.array([1, 2, 3], dtype=np.float32)
np.zeros(5, dtype=np.int8)

Be careful: NumPy integer dtypes have a fixed width, so array operations such as cumprod() wrap around silently, without raising an error, once a result exceeds the dtype’s range. Cast to a wider type, or use dtype=object to fall back on Python’s arbitrary-precision integers at the cost of speed.
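A sketch of the overflow using running factorials: 20! still fits in int64, but 21! does not, so the int64 result is wrong from that point on. Converting to object dtype is one possible workaround; it stores Python integers and is exact but much slower.

```python
import math

import numpy as np

factors = np.arange(1, 26, dtype=np.int64)    # 1, 2, ..., 25

wrapped = factors.cumprod()                   # wraps around silently from 21! onward
exact = factors.astype(object).cumprod()      # Python ints: exact, but slower

print(int(wrapped[19]) == math.factorial(20))   # True: 20! fits in int64
print(int(wrapped[-1]) == math.factorial(25))   # False: 25! overflowed
print(exact[-1] == math.factorial(25))          # True
```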

Why NumPy in EE

NumPy is the workhorse for:

Signal processing: FFTs, filters, sampled data manipulation.
Linear algebra: matrix multiplication, eigenvalues, decompositions.
Image processing: images as 3D arrays (height, width, channels).
Statistics and data analysis: histograms, correlations.
Machine learning data prep: feature matrices, batch tensors.
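As one concrete instance of the signal-processing use case, the sketch below builds a test signal from two sine waves and uses NumPy’s FFT to find the strongest frequency component. The sampling rate, tone frequencies, and amplitudes are assumptions made for this example.

```python
import numpy as np

fs = 1000                                  # sampling rate in Hz (assumed)
t = np.arange(fs) / fs                     # one second of sample times
signal = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)

spectrum = np.fft.rfft(signal)                    # FFT of a real-valued signal
freqs = np.fft.rfftfreq(signal.size, d=1 / fs)    # frequency of each bin in Hz

dominant = freqs[np.argmax(np.abs(spectrum))]     # bin with the largest magnitude
print(dominant)                                   # 50.0
```

With one second of data at 1000 Hz the bins are spaced 1 Hz apart, so both tones fall exactly on a bin and the 50 Hz component, having the larger amplitude, dominates.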

For arithmetic operations on arrays, see NumPy arithmetic and comparison operations. For slicing patterns, see NumPy array slicing. For boolean filtering, see NumPy advanced indexing.

Idriss Rami — Notes
