CSV

CSV (comma-separated values) is the simplest tabular file format. Each row of the table is a line of text, and the values within a row are separated by commas. The first line is usually a header naming the columns.

name,age,email
Alice,31,alice@example.com
Bob,24,bob@example.com

CSV is human-readable, every spreadsheet program can open it, and most programming languages can parse it with a standard-library module or a widely used package. In Python with Pandas:

import pandas as pd
df = pd.read_csv("my_data.csv")
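Python's standard library also includes a csv module, so a third-party package is not strictly required. The following is a minimal sketch that parses the sample table from above; the text is held in an in-memory string (sample_text is a name chosen here for illustration) so the snippet runs without a file on disk.

```python
import csv
import io

# The sample table from this note, held in memory instead of a file.
sample_text = (
    "name,age,email\n"
    "Alice,31,alice@example.com\n"
    "Bob,24,bob@example.com\n"
)

# DictReader uses the first line as the header and yields one dict per row.
rows = list(csv.DictReader(io.StringIO(sample_text)))

for row in rows:
    # Every value comes back as a string; convert types explicitly if needed.
    print(row["name"], int(row["age"]), row["email"])
```

Unlike read_csv, the csv module does no type inference: age arrives as the string "31" and has to be converted by hand.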

The closely related TSV uses tabs instead of commas as separators. Strictly speaking, CSV can represent commas inside fields by quoting ("Smith, John" is legal per RFC 4180), but real-world CSV consumers handle quoting inconsistently and tabs almost never appear inside data values, so TSV sidesteps the most common parsing headaches.
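To make the quoting issue concrete, here is a small sketch using Python's standard csv module. The sample strings are invented for illustration. It compares a naive split on commas, a quoting-aware CSV parse, and the same record stored as TSV.

```python
import csv
import io

# Invented sample data: the name contains a comma, so the CSV version quotes it.
csv_text = 'name,age\n"Smith, John",42\n'
tsv_text = 'name\tage\nSmith, John\t42\n'

# Naive splitting ignores the quotes and cuts the name in two.
naive = csv_text.splitlines()[1].split(",")

# csv.reader understands RFC 4180-style quoting and keeps the field whole.
csv_rows = list(csv.reader(io.StringIO(csv_text)))

# The same reader parses TSV when the delimiter is set to a tab;
# no quoting is needed because the field contains no tab.
tsv_rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))

print(naive)        # ['"Smith', ' John"', '42']
print(csv_rows[1])  # ['Smith, John', '42']
print(tsv_rows[1])  # ['Smith, John', '42']
```

The naive split is roughly what an inconsistent CSV consumer does, which is why the TSV version is the more robust of the two when the tooling is unknown.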

CSV is the right format when the data is naturally tabular, when each record has the same fields, when no field contains nested structure, and when the dataset is small to medium-sized. For nested or irregular records, JSON is the right format. For large scientific arrays with hierarchical organization, HDF5 is. For tables with formulas, formatting, or multiple sheets, the spreadsheet formats XLS and XLSX exist but are harder to work with programmatically.

A subtle pitfall is that the definition of CSV is loose. Some files use semicolons (;) instead of commas, which is common in European locales where the comma serves as the decimal separator. Some files quote string fields that contain embedded commas; some don’t. Pandas’ read_csv handles most variants, but its parameters (sep, quotechar, encoding) often need tweaking for real-world files.
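As an illustration of that tweaking, the sketch below reads a semicolon-separated export with decimal commas and a Latin-1 encoding. The data is invented and kept in memory as bytes to stand in for a file on disk. The decimal parameter is not mentioned above; it is another read_csv option that tells Pandas which character marks the decimal point.

```python
import io

import pandas as pd

# Invented European-style export: semicolon separators, decimal commas,
# a quoted field containing the separator, and Latin-1 encoded bytes.
raw_bytes = 'name;price\n"Müller; Anna";3,50\nBob;12,25\n'.encode("latin-1")

df = pd.read_csv(
    io.BytesIO(raw_bytes),  # stands in for a file path
    sep=";",                # field separator
    quotechar='"',          # character that wraps fields containing the separator
    decimal=",",            # parse 3,50 as the number 3.5
    encoding="latin-1",     # decode the bytes correctly
)

print(df)
print(df.dtypes)
```

Without decimal=",", the price column would most likely be read as text rather than numbers, and with the default sep="," the whole header would be treated as a single column name.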

Idriss Rami — Notes
