Data Structure & Normalization

Data Structure Dependencies

Data can be structured in ways that create fragile dependencies. Ordering dependence means programs rely on records being stored in a specific sequence; change that order and the programs break. Indexing dependence ties programs to the existence of particular indices, typically because they refer to those indices by name; drop or replace an index and those programs break. Access-path dependence makes programs follow particular tree or network paths through the data; restructure those paths and the programs fail, even though the data itself is unchanged. These dependencies couple applications tightly to physical storage, making maintenance expensive and evolution risky.
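A small sketch can make ordering dependence concrete. The records, field names, and both functions below are hypothetical, chosen only for illustration: one function silently assumes the stored order, the other states its condition on the data itself.

```python
# Hypothetical employee records, deliberately stored out of emp_id order.
records = [
    {"emp_id": 3, "name": "Chen"},
    {"emp_id": 1, "name": "Adams"},
    {"emp_id": 2, "name": "Baker"},
]


def lowest_id_fragile(rows):
    # Ordering-dependent: assumes rows are stored sorted by emp_id.
    return rows[0]["name"]


def lowest_id_robust(rows):
    # Order-independent: expresses the condition in terms of the values.
    return min(rows, key=lambda r: r["emp_id"])["name"]


# The same data, stored in the order the fragile function expects.
sorted_rows = sorted(records, key=lambda r: r["emp_id"])
```

The fragile version gives the right answer only for sorted_rows; the robust version gives the same answer for either storage order.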

Simple and Nonsimple Domains

A simple domain is made up of atomic values: individual numbers or strings that cannot be decomposed further. A nonsimple domain is one whose values are themselves relations, so each entry in that column is not a single item but an entire nested table. Nonsimple domains complicate queries, updates, and integrity checking, so relational theory pushes toward eliminating them.
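As an illustration, a relation can be modelled as a list of dicts, with a nested relation represented as a list inside a row. That representation, the attribute names, and the helper nonsimple_attributes are all assumptions made for this sketch.

```python
# Hypothetical employee relation; attribute names are illustrative.
employee = [
    {
        "emp_id": 1,          # simple: atomic integer
        "name": "Adams",      # simple: atomic string
        "job_history": [      # nonsimple: each value is a nested relation
            {"job_date": "1968-03", "title": "Clerk"},
            {"job_date": "1969-07", "title": "Analyst"},
        ],
    },
]


def nonsimple_attributes(relation):
    """Return the sorted names of attributes whose values are nested relations."""
    found = set()
    for row in relation:
        for attr, value in row.items():
            if isinstance(value, list):  # assumption: a list marks a nested relation
                found.add(attr)
    return sorted(found)
```

Here nonsimple_attributes(employee) reports job_history as the only attribute that would have to be eliminated.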

Normalization

Normalization is the process of eliminating nonsimple domains and reshaping relations into normal form, where all domains are simple. The method works by copying the parent relation's primary key into each child relation, expanding child keys to include the copied key, and then removing the nonsimple domain from the parent. Repeat this down the tree until every relation is flat.
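The procedure above can be sketched as a short recursive function. This is a minimal sketch, not a definitive implementation: it assumes relations are lists of dicts, that a list-valued attribute marks a nested relation, and that the caller supplies each relation's own key attributes. The name normalize, the keys mapping, and the sample data are all hypothetical.

```python
def normalize(name, rows, keys, parent_key=None, out=None):
    """Flatten a nested relation into a dict of flat relations.

    keys maps each relation name to the list of its own key attributes.
    parent_key holds the key values copied down from ancestor relations.
    """
    parent_key = parent_key or {}
    out = {} if out is None else out
    flat_rows = out.setdefault(name, [])
    for row in rows:
        # Copy the inherited key into this row.
        flat = dict(parent_key)
        nested = {}
        for attr, value in row.items():
            if isinstance(value, list):
                nested[attr] = value  # the nonsimple domain leaves this relation
            else:
                flat[attr] = value
        flat_rows.append(flat)
        # A child's key is the inherited key expanded with this row's own key.
        full_key = dict(parent_key)
        full_key.update({k: row[k] for k in keys[name]})
        for child_name, child_rows in nested.items():
            normalize(child_name, child_rows, keys, full_key, out)
    return out


# Hypothetical nested data: employees with job and salary histories.
employee = [
    {"emp_id": 1, "name": "Adams", "job_history": [
        {"job_date": "1968-03", "title": "Clerk", "salary_history": [
            {"salary_date": "1968-03", "salary": 400},
            {"salary_date": "1968-09", "salary": 450},
        ]},
        {"job_date": "1969-07", "title": "Analyst", "salary_history": [
            {"salary_date": "1969-07", "salary": 600},
        ]},
    ]},
]
keys = {
    "employee": ["emp_id"],
    "job_history": ["job_date"],
    "salary_history": ["salary_date"],
}
flat = normalize("employee", employee, keys)
```

After the call, flat holds three flat relations: employee keyed by emp_id, job_history keyed by (emp_id, job_date), and salary_history keyed by (emp_id, job_date, salary_date).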

The benefits are practical: normalized data is easy to store and send between systems, requires no pointers or hash-based addresses in the logical view, and simplifies naming because each attribute lives in exactly one place with a clear meaning.

Universal Data Sublanguage

A universal data sublanguage is a general way of talking to any relational database, independent of how the data is physically stored inside. SQL later grew out of this idea. The sublanguage R handles defining tables, columns, keys, and constraints, as well as writing queries and updates. The host language H deals with files, memory, loops, user interfaces, and other utilities: the general-purpose programming that wraps around database operations.
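The split between R and H can be illustrated with a modern stand-in: SQL playing the role of the sublanguage and Python playing the role of the host. This pairing, the part table, and its columns are assumptions chosen for the sketch; the example uses Python's standard sqlite3 module with an in-memory database.

```python
import sqlite3

# Host language: opens and manages the connection.
conn = sqlite3.connect(":memory:")

# Sublanguage: declare a relation, its columns, its key, and a constraint.
conn.execute(
    "CREATE TABLE part ("
    " part_no INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " qty INTEGER NOT NULL CHECK (qty >= 0))"
)

# Sublanguage: updates, with the host supplying the values.
conn.executemany(
    "INSERT INTO part (part_no, name, qty) VALUES (?, ?, ?)",
    [(1, "bolt", 120), (2, "nut", 0), (3, "washer", 45)],
)
conn.commit()

# Sublanguage: a query that says what is wanted, not how to find it.
rows = conn.execute(
    "SELECT name, qty FROM part WHERE qty > 0 ORDER BY part_no"
).fetchall()

# Host language: looping and formatting around the database operations.
report = [f"{name}: {qty}" for name, qty in rows]

conn.close()
```

Nothing in the SQL strings mentions files, pointers, or storage layout; everything outside the strings is ordinary host-language work.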

Redundancy

Strong redundancy means a table is completely derivable from other tables by one fixed rule, so you do not need to store it; you can always derive it on demand. Weak redundancy means a table is constrained by other tables but cannot be derived from them by a single rule that holds for all time, so you usually keep it stored to avoid expensive recalculations or because the dependency might change over time.
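A strongly redundant table can be sketched as one that a fixed rule rebuilds from other tables. In this illustration relations are Python sets of tuples, and the rule is a join on part followed by a projection; the relation names, the data, and derive_supplier_project are hypothetical.

```python
# Hypothetical base relations as sets of tuples.
supplies = {("acme", "bolt"), ("acme", "nut"), ("zenith", "bolt")}  # (supplier, part)
uses = {("bolt", "bridge"), ("nut", "tower")}                       # (part, project)


def derive_supplier_project(supplies, uses):
    """Join the two relations on part, then project onto (supplier, project)."""
    return {
        (supplier, project)
        for (supplier, part_a) in supplies
        for (part_b, project) in uses
        if part_a == part_b
    }


# A stored copy of the derivable relation: redundant as long as the rule holds.
stored_supplier_project = {
    ("acme", "bridge"),
    ("acme", "tower"),
    ("zenith", "bridge"),
}

derived = derive_supplier_project(supplies, uses)
is_consistent = derived == stored_supplier_project
```

Because the rule is fixed, the stored copy adds no information; comparing it against the derived result is one way to detect that the copies have drifted apart.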

Rule Checking

Integrity rules can be checked continuously — every time data changes — which is slower but catches errors immediately, or periodically in batches, which is faster day to day but means errors are discovered later. The tradeoff is between responsiveness and throughput, and different systems choose different points on that spectrum depending on their workload and tolerance for inconsistency.
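The two checking styles can be contrasted in a few lines. The class CheckedTable, its continuous flag, and the non_negative rule are hypothetical names introduced for this sketch; a real DBMS enforces constraints very differently.

```python
def non_negative(row):
    """Example integrity rule: quantity must not be negative."""
    return row["qty"] >= 0


class CheckedTable:
    def __init__(self, rule, continuous):
        self.rule = rule
        self.continuous = continuous
        self.rows = []

    def insert(self, row):
        # Continuous checking: validate on every change, reject immediately.
        if self.continuous and not self.rule(row):
            raise ValueError(f"rule violated: {row}")
        self.rows.append(row)

    def audit(self):
        # Periodic checking: scan everything later and report violations.
        return [row for row in self.rows if not self.rule(row)]


strict = CheckedTable(non_negative, continuous=True)
lax = CheckedTable(non_negative, continuous=False)
```

With strict, a bad row never enters the table but every insert pays for the check; with lax, inserts are cheap and the bad row sits in the table until the next audit.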

The DBMS Challenge

The main challenge for any DBMS is turning high-level relational queries — statements about what you want — into fast low-level operations on physical storage, without breaking correctness or sacrificing data independence. Achieving this balance is what separates a good database engine from a slow or brittle one.
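A toy version of that challenge: one logical request ("parts with a given part number") answered by two physical strategies that must return the same result. The functions and data below are hypothetical, and the dict-based index only stands in for whatever structure a real engine would use.

```python
# Hypothetical stored rows.
rows = [
    {"part_no": 1, "name": "bolt"},
    {"part_no": 2, "name": "nut"},
    {"part_no": 3, "name": "washer"},
]


def find_by_scan(rows, part_no):
    # Strategy 1: examine every row; needs no extra structure.
    return [row for row in rows if row["part_no"] == part_no]


def build_index(rows):
    # Extra physical structure: part_no -> rows with that part_no.
    index = {}
    for row in rows:
        index.setdefault(row["part_no"], []).append(row)
    return index


def find_by_index(index, part_no):
    # Strategy 2: go straight to the matching rows through the index.
    return index.get(part_no, [])


index = build_index(rows)
```

The caller only says which part number is wanted; choosing between the scan and the index is the engine's job, and either choice has to give the same answer.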