A Pandas DataFrame is a two-dimensional labelled data structure: essentially a table with rows and columns, where columns are identified by name (usually strings) and rows by an index (integer positions 0, 1, 2, ... by default, or custom labels). It’s the central data structure of Pandas and the standard format for tabular data in Python data science.
The mental model for a DataFrame is a spreadsheet. Each column has a name (Height, Weight, Age) and a single type for all of its values (usually NumPy-backed: typically float64, int64, object for strings, bool, or datetime64). Each row corresponds to one observation: one student, one purchase, one sensor reading.
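As a minimal sketch of this model, the snippet below builds a small DataFrame in memory instead of reading a file. The column names and values are invented for illustration and are not taken from any real dataset.

```python
import pandas as pd

# Hypothetical data: one row per student, one column per attribute.
students = pd.DataFrame(
    {
        "Name": ["Ana", "Ben", "Chloe"],
        "Height": [5.4, 6.1, 5.9],
        "Age": [19, 22, 21],
        "Enrolled": [True, False, True],
    }
)

print(students.shape)   # (3, 4): three observations, four columns
print(students.dtypes)  # one dtype per column
```

Each column carries its own dtype, which is why a numeric column and a text column can sit side by side in the same table.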
import pandas as pd
df = pd.read_csv("my_data.csv")
print(df.head()) # first 5 rows
print(df.shape) # (n_rows, n_columns)
print(df.dtypes) # types of each column
print(df.columns) # column names
Three access patterns dominate everyday work:
Column access. df['Name'] returns a one-dimensional Pandas Series:
heights = df['Height']
Position-based access with .iloc (integer location). Same syntax as NumPy slicing:
df.iloc[0, 0] # first row, first column
df.iloc[0:5, :] # first 5 rows, all columns
df.iloc[:, -1] # all rows, last column (often the label)
Label-based access with .loc. Indexes by row label and column name, and supports conditional row selection:
df.loc[df['Height'] > 5.8, ['Name', 'Height']]
This reads as: select rows where the Height column is greater than 5.8, and return the Name and Height columns. The Boolean expression df['Height'] > 5.8 produces a Boolean Series with one True/False value per row, and .loc keeps the rows where that value is True.
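The three access patterns can be tried side by side on a small table. This is a sketch on made-up data: the Name, Height and Label columns and their values are assumptions chosen only to make the results easy to check by eye.

```python
import pandas as pd

# Hypothetical data, used only to make the three access patterns concrete.
df = pd.DataFrame(
    {
        "Name": ["Ana", "Ben", "Chloe", "Dev"],
        "Height": [5.4, 6.1, 5.9, 5.7],
        "Label": [0, 1, 1, 0],
    }
)

heights = df["Height"]        # column access: a Series
first_cell = df.iloc[0, 0]    # position-based: row 0, column 0
last_column = df.iloc[:, -1]  # position-based: every row, last column

mask = df["Height"] > 5.8                # Boolean Series, one value per row
tall = df.loc[mask, ["Name", "Height"]]  # label-based, filtered by the mask

print(mask.tolist())  # [False, True, True, False]
print(tall)           # the rows for Ben and Chloe
```

Storing the mask in its own variable is optional, but it makes the intermediate Boolean Series visible and reusable.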
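DataFrames support the methods you’d expect for tabular operations: .groupby(...), .sort_values(...), .merge(...), .pivot_table(...), .fillna(...), .dropna(...), .interpolate(...), .rolling(...). None of these modify the original by default. Most return a new DataFrame, while .groupby(...) and .rolling(...) return intermediate objects that you then aggregate. A few of them (.sort_values, .fillna, .dropna, .interpolate) accept inplace=True to modify the original, though reassigning the result (df = df.dropna()) is usually clearer.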
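A short sketch of the "returns a new object" behaviour, using a hypothetical purchases table with one missing amount. The column names and values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical purchase data with one missing amount.
purchases = pd.DataFrame(
    {
        "customer": ["a", "b", "a", "c"],
        "amount": [20.0, None, 5.0, 12.5],
    }
)

# fillna and sort_values return new DataFrames; `purchases` is unchanged.
cleaned = purchases.fillna({"amount": 0.0})
ordered = cleaned.sort_values("amount", ascending=False)

# groupby returns an intermediate GroupBy object; the aggregation
# (here .sum()) is what produces the per-customer totals.
totals = cleaned.groupby("customer")["amount"].sum()

print(purchases["amount"].isna().sum())  # 1: the original still has its gap
print(ordered)
print(totals)
```

Because each call returns a new object, the steps can also be chained into a single expression without touching the original table.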
For one-dimensional data, a Pandas Series is the analogous structure — a single column with a name and an index, but no other columns alongside it.
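To make the relationship concrete, the sketch below pulls one column out of a DataFrame both ways. The data and the index labels are hypothetical.

```python
import pandas as pd

# Hypothetical data; the index labels are chosen for illustration.
df = pd.DataFrame(
    {"Height": [5.4, 6.1], "Age": [19, 22]},
    index=["ana", "ben"],
)

heights = df["Height"]       # single brackets: a Series
heights_df = df[["Height"]]  # a list of column names: a one-column DataFrame

print(type(heights).__name__)  # Series
print(heights.name)            # Height
print(heights.index.tolist())  # ['ana', 'ben']
print(heights_df.shape)        # (2, 1)
```

The Series keeps the column's name and the DataFrame's row index, so it can be aligned back against the original table later.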