A relational database stores data in tables with rows and columns, where tables can be linked to each other through relationships. A non-relational database might store data as documents, or as key-value pairs, or as graphs of nodes and edges; a relational database stores it specifically as tables that reference each other.
The reason this matters is that relational databases let us perform efficient complex queries. Suppose we have a Students table, a Supervisors table, and a Majors table. In a relational database we can ask show me all students whose supervisor has a Ph.D. and who are majoring in Kinesiology, and the engine will efficiently combine information across all three tables to answer. In a non-relational database, we’d have to write that combining logic ourselves.
Relational databases are compact, well-structured, and efficient — three adjectives the Introduction to Data Science course uses in exactly that order.
Some basic terminology:
- A table is a structured collection of related data, organized in rows and columns.
- A row (also called a record) is one entry in the table — one student, one purchase, one sensor reading.
- A column (also called a field) is a named attribute that takes a particular type of value — a student’s first name, a purchase’s amount, a sensor reading’s timestamp.
Two more concepts make the linking work: a Primary key is a column (or combination) whose values uniquely identify each row, and a Foreign key is a column whose values must exist in some other table’s primary key, pointing one table at another.
The design step is typically sketched with an Entity-relationship diagram (high-level conceptual) and a Relational schema (closer to implementation). The query language is SQL. The software that manages a relational database is a Database management system (DBMS).