Pandas DataFrame

A Pandas DataFrame is a two-dimensional labelled data structure: a table with rows and columns, where columns are identified by name (usually strings) and rows by an index, which defaults to the integers 0, 1, 2, ... unless you set a custom one. It’s the central data structure of Pandas and the standard format for tabular data in Python data science.

A useful mental model for a DataFrame is a spreadsheet. Each column has a name (Height, Weight, Age) and a single dtype, in most cases backed by a NumPy array: typically float64, int64, bool, datetime64, or object for strings (newer Pandas versions may use a dedicated string dtype instead). Each row corresponds to one observation: one student, one purchase, one sensor reading.
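As a minimal sketch of the "one dtype per column" idea, the snippet below builds a small DataFrame by hand instead of reading a file. The sample values and the Enrolled column are made up for illustration; only Name, Height and Age echo the column names used in these notes.

```python
import pandas as pd

# Hypothetical sample data: each dict key becomes a column,
# each list position becomes one row (one observation).
df = pd.DataFrame({
    "Name": ["Ana", "Ben", "Chloe"],
    "Height": [5.4, 6.1, 5.9],
    "Age": [23, 31, 27],
    "Enrolled": [True, False, True],
})

print(df.dtypes)    # one dtype per column
print(df.shape)     # (3, 4): three observations, four columns
print(df.iloc[0])   # a single row, returned as a Series
```

Note that a single row mixes types across columns, which is why the "homogeneous type" guarantee applies per column and not per row.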

import pandas as pd
df = pd.read_csv("my_data.csv")
print(df.head())          # first 5 rows
print(df.shape)           # (n_rows, n_columns)
print(df.dtypes)          # types of each column
print(df.columns)         # column names

Three access patterns dominate everyday work:

Column access. df['Name'] returns a one-dimensional Pandas Series:

heights = df['Height']

Position-based access with .iloc (integer location). Same syntax as NumPy slicing:

df.iloc[0, 0]            # first row, first column
df.iloc[0:5, :]          # first 5 rows, all columns
df.iloc[:, -1]           # all rows, last column (often the label)
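To make the "position, not label" point concrete, here is a small sketch on a DataFrame whose row labels are deliberately not 0, 1, 2. The data and the Label column are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical data with non-default row labels (10, 20, 30),
# so positions and labels cannot be confused.
df = pd.DataFrame(
    {
        "Name": ["Ana", "Ben", "Chloe"],
        "Height": [5.4, 6.1, 5.9],
        "Label": [0, 1, 1],
    },
    index=[10, 20, 30],
)

first_cell = df.iloc[0, 0]    # row at position 0, column at position 0
first_two = df.iloc[0:2, :]   # positions 0 and 1; the slice end is exclusive
labels = df.iloc[:, -1]       # last column, returned as a Series

print(first_cell)
print(first_two)
print(labels)
```

Even though the first row is labelled 10, .iloc[0] still reaches it, because .iloc ignores the index labels entirely.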

Label-based access with .loc. Indexes by row label and column name (label slices, unlike .iloc slices, include the end label), and supports conditional row selection:

df.loc[df['Height'] > 5.8, ['Name', 'Height']]

This reads as: select rows where the Height column is greater than 5.8, and return the Name and Height columns. The Boolean expression df['Height'] > 5.8 is evaluated element-wise over the whole column and produces a Boolean Series with one True/False value per row; .loc keeps the rows where that value is True.
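The selection above can be split into two steps to show the intermediate Boolean Series. This is a sketch on made-up data; the Age column and the second, combined condition are additions for illustration.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Name": ["Ana", "Ben", "Chloe"],
    "Height": [5.4, 6.1, 5.9],
    "Age": [23, 31, 27],
})

# Step 1: the comparison yields a Boolean Series aligned with df's index.
mask = df["Height"] > 5.8

# Step 2: .loc keeps the rows where the mask is True, and the listed columns.
tall = df.loc[mask, ["Name", "Height"]]

# Conditions combine with & and |; each one needs its own parentheses.
tall_and_young = df.loc[(df["Height"] > 5.8) & (df["Age"] < 30), "Name"]

print(mask)
print(tall)
print(tall_and_young)
```

The parentheses matter because & binds more tightly than > in Python, so omitting them changes the expression and typically raises an error.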

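DataFrames support the methods you’d expect for tabular operations: .groupby(...), .sort_values(...), .merge(...), .pivot_table(...), .fillna(...), .dropna(...), .interpolate(...), .rolling(...). These return new objects rather than modifying the original. Some of them (such as .sort_values, .fillna, .dropna and .interpolate) accept inplace=True to modify the original, but .groupby, .merge, .pivot_table and .rolling do not, and reassigning the result (df = df.dropna()) is generally the safer habit.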
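A short sketch of the "returns a new object" behaviour, using three of the methods listed above. The Group and Score columns and their values are hypothetical.

```python
import pandas as pd

# Hypothetical data with one missing value.
df = pd.DataFrame({
    "Group": ["a", "b", "a", "b"],
    "Score": [3.0, None, 1.0, 5.0],
})

# Each call returns a new object; df itself is left unchanged.
filled = df.fillna({"Score": 0.0})                      # replace NaN in Score
ordered = filled.sort_values("Score", ascending=False)  # highest score first
means = filled.groupby("Group")["Score"].mean()         # one mean per group

print(df["Score"].isna().sum())   # the original still has its missing value
print(ordered)
print(means)
```

Because each step returns a new object, calls can be chained or assigned to fresh names, which keeps the original data available for comparison.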

For one-dimensional data, a Pandas Series is the analogous structure — a single column with a name and an index, but no other columns alongside it.
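For comparison with the DataFrame examples, here is a minimal Series sketch. The values and the name-based index are assumptions made up for illustration.

```python
import pandas as pd

# Hypothetical Series: one column of values, an index, and a name.
heights = pd.Series(
    [5.4, 6.1, 5.9],
    index=["Ana", "Ben", "Chloe"],
    name="Height",
)

by_label = heights["Ben"]       # label-based lookup
by_position = heights.iloc[0]   # position-based lookup
average = heights.mean()        # reductions work on the whole column

# A Series can be promoted to a one-column DataFrame; its name becomes
# the column name.
as_frame = heights.to_frame()

print(by_label, by_position, average)
print(as_frame)
```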

Idriss Rami — Notes
