Pandas is the Python library for tabular data manipulation. It is built on top of NumPy and provides a structure called a DataFrame: essentially a table with rows and columns, where the column names act as labels. A useful mental model is a spreadsheet living inside a Python program.
The conventional import:
import pandas as pd
Reading data into a DataFrame is one function call per format:
df = pd.read_csv("my_data.csv") # CSV
df = pd.read_json("my_data.json") # JSON
df = pd.read_excel("my_data.xlsx") # Excel
df = pd.read_sql(query, connection) # SQL database
df = pd.read_hdf("my_data.h5", "key") # HDF5
Three patterns cover most of what we do with DataFrames in practice:
Column access by name. Returns a one-dimensional Pandas Series:
df['Name']
Position-based indexing with .iloc. NumPy-style slicing on numeric row and column positions:
df.iloc[0:2, 0] # rows 0 and 1, column 0
df.iloc[:, -1] # all rows, last column
Conditional filtering with .loc and a Boolean expression:
df.loc[df['Height'] > 5.8, :] # rows where Height > 5.8, all columns
The expression df['Height'] > 5.8 produces a Boolean Series with one True/False value per row, and .loc keeps the rows where the value is True.
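The three access patterns above can be tried end to end on a small table. The sketch below is illustrative only: the column names (Name, Age, Height) and the values are made up, and the CSV text is inlined through io.StringIO so that nothing needs to exist on disk.

```python
import io

import pandas as pd

# Hypothetical sample data, inlined so the example runs without a file on disk.
csv_text = """Name,Age,Height
Ana,31,5.5
Ben,45,6.1
Cleo,28,5.9
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# 1. Column access by name returns a Series.
names = df["Name"]

# 2. Position-based indexing: rows 0 and 1 of column 0 (slice end is excluded).
first_two_names = df.iloc[0:2, 0]

# All rows, last column (Height in this table).
last_column = df.iloc[:, -1]

# 3. Conditional filtering: the comparison yields a Boolean Series,
#    and .loc keeps the rows where it is True.
mask = df["Height"] > 5.8
tall = df.loc[mask, :]

print(tall)
```

Naming the mask before passing it to .loc is optional, but it makes the intermediate Boolean Series easy to inspect when a filter does not return the rows you expect.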
Pandas also provides the rolling method for windowed computations (moving averages, rolling features), and fillna, interpolate, and dropna for handling missing data. The pd.merge() and pd.concat() functions handle table joins and concatenation.
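A minimal sketch of these helpers on toy data follows. The series values, table contents, and variable names are all hypothetical, and the window size of 3 is an arbitrary choice for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical readings with one missing value.
readings = pd.Series([1.0, 2.0, np.nan, 4.0, 5.0])

# Three ways to handle the gap.
filled = readings.fillna(0.0)          # replace NaN with a constant
interpolated = readings.interpolate()  # linear by default: the gap becomes 3.0
dropped = readings.dropna()            # discard the missing entry

# 3-point moving average; the first two entries lack a full window and are NaN.
moving_avg = interpolated.rolling(window=3).mean()

# Hypothetical tables sharing an "id" column.
people = pd.DataFrame({"id": [1, 2, 3], "name": ["Ana", "Ben", "Cleo"]})
heights = pd.DataFrame({"id": [1, 2], "height": [5.5, 6.1]})

# Inner join (the default): only ids present in both tables survive.
joined = pd.merge(people, heights, on="id")

# Stack rows from two frames; ignore_index renumbers the result from 0.
more_people = pd.DataFrame({"id": [4], "name": ["Dev"]})
stacked = pd.concat([people, more_people], ignore_index=True)
```

Which missing-data strategy is appropriate depends on the data: a constant fill or interpolation keeps the row count stable, while dropna shortens the series.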
Pandas pairs naturally with Matplotlib (DataFrames have a .plot() method that wraps Matplotlib calls), scikit-learn (most sklearn estimators accept DataFrames directly), and NumPy (under the hood, columns are backed by NumPy arrays by default, though recent pandas versions also support PyArrow-backed dtypes). It’s essentially the lingua franca for tabular data in the Python data-science stack.
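The NumPy relationship can be seen directly. This is a small sketch assuming default (NumPy-backed) dtypes; the column names and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical table with one float column and one integer column.
df = pd.DataFrame({"Height": [5.5, 6.1, 5.9], "Age": [31, 45, 28]})

# Each column carries a dtype; with default settings these are NumPy dtypes.
height_dtype = df["Height"].dtype

# to_numpy() exposes the column as a plain NumPy array.
heights = df["Height"].to_numpy()

# NumPy functions can also be applied to a Series directly.
mean_height = np.mean(df["Height"])
```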