
# 🧠 Database Normalization & Data Integrity Design

I design database structures using normalization to ensure systems remain consistent, maintainable, and scalable as they grow across modules, workflows, and complex business rules.


## 🎯 How I Think About Database Design

Data issues rarely come from queries; they come from poor structure.

If the data model is not designed properly, it leads to:

- duplicated data
- inconsistent updates
- broken business logic
- difficult maintenance

So I treat normalization as a core system design concern, not just a textbook topic from a Database Management Systems course.


## 🧩 What Normalization Means to Me

I don’t treat normalization as theory; I treat it as a practical design mindset:

> Each piece of data should live in the right place, once, with clear ownership.


## 📐 Normal Forms (How I Apply Them)

I don’t memorize the rules; I apply them through design thinking.

### 1️⃣ First Normal Form (1NF): Atomic Data

- no arrays or grouped values
- each column stores a single value

👉 I avoid:

- comma-separated values
- JSON for relational data
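
For instance, here is the shape of that fix as a minimal sketch (table and column names are illustrative, not from a real system):

```sql
-- ❌ Violates 1NF: one column packs several values
CREATE TABLE employees_bad (
    id     INT PRIMARY KEY,
    name   VARCHAR(100),
    skills VARCHAR(255)          -- e.g. 'sql,php,laravel'
);

-- ✅ 1NF: one atomic value per column, one skill per row
CREATE TABLE employees (
    id   INT PRIMARY KEY,
    name VARCHAR(100)
);

CREATE TABLE employee_skills (
    employee_id INT REFERENCES employees(id),
    skill       VARCHAR(100),
    PRIMARY KEY (employee_id, skill)
);
```

Now finding everyone with a given skill is an indexed lookup instead of a `LIKE` scan over a packed string.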

### 2️⃣ Second Normal Form (2NF): Full Dependency

- every column must depend on the full primary key
- especially important for composite keys

👉 I check:

> “Is this field dependent on the whole context, or just part of it?”
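
A classic composite-key case that question catches (names are illustrative):

```sql
-- ❌ Violates 2NF: product_name depends on product_id alone,
--    not on the full (order_id, product_id) key
CREATE TABLE order_items_bad (
    order_id     INT,
    product_id   INT,
    product_name VARCHAR(100),
    quantity     INT,
    PRIMARY KEY (order_id, product_id)
);

-- ✅ 2NF: the partial dependency moves to the table
--    whose key fully determines it
CREATE TABLE products (
    id   INT PRIMARY KEY,
    name VARCHAR(100)
);

CREATE TABLE order_items (
    order_id   INT,
    product_id INT REFERENCES products(id),
    quantity   INT,
    PRIMARY KEY (order_id, product_id)
);
```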


### 3️⃣ Third Normal Form (3NF): No Hidden Dependencies

- non-key columns should not depend on other non-key columns

👉 I validate:

> “Does this data belong here, or is it derived from another table?”
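
The classic transitive-dependency case that question catches (illustrative names again):

```sql
-- ❌ Violates 3NF: department_name depends on department_id,
--    which is itself a non-key column
CREATE TABLE staff_bad (
    id              INT PRIMARY KEY,
    name            VARCHAR(100),
    department_id   INT,
    department_name VARCHAR(100)
);

-- ✅ 3NF: the derived fact lives in its own table
CREATE TABLE departments (
    id   INT PRIMARY KEY,
    name VARCHAR(100)
);

CREATE TABLE staff (
    id            INT PRIMARY KEY,
    name          VARCHAR(100),
    department_id INT REFERENCES departments(id)
);
```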


## 🧩 Design Approach

### Separation of Responsibilities

I design tables based on business responsibilities, not convenience.

**Example: KPI System**

- KPI Group → classification
- KPI → definition
- Employee → identity
- KPI Employee → transaction/result

👉 Each table owns one responsibility
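
Sketched as a schema, it looks roughly like this (table names follow the example above; the columns are illustrative, not the exact production layout):

```sql
CREATE TABLE kpi_groups (                 -- classification
    id   INT PRIMARY KEY,
    name VARCHAR(100)
);

CREATE TABLE kpis (                       -- definition
    id           INT PRIMARY KEY,
    kpi_group_id INT REFERENCES kpi_groups(id),
    name         VARCHAR(100)
);

CREATE TABLE employees (                  -- identity
    id   INT PRIMARY KEY,
    name VARCHAR(100)
);

CREATE TABLE kpi_employees (              -- transaction/result
    kpi_id      INT REFERENCES kpis(id),
    employee_id INT REFERENCES employees(id),
    target      DECIMAL(12, 2),
    actual      DECIMAL(12, 2),
    PRIMARY KEY (kpi_id, employee_id)
);
```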


### Master vs Transaction Data

I clearly separate:

- **Master Data**: stable, defines structure (KPI, policy, category)
- **Transaction Data**: dynamic, represents operations (KPI results, expenses)

👉 This is a direct result of applying normalization


### Dependency Awareness

I always validate:

> “Does this field depend on the correct key?”

An example of the kind of mistake I avoid:

```
kpi_employees
(
  kpi_id,
  employee_id,
  kpi_name,
  kpi_group_name,
  target,
  actual
)
```

Because:

- `kpi_name` and `kpi_group_name` depend on `kpi_id`, not on the KPI result itself

👉 So they belong in:

- `kpis`
- `kpi_groups`
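
When the names are needed in a result set, joins derive them on demand; a sketch against the illustrative schema above:

```sql
-- kpi_name and kpi_group_name live once, in their own tables;
-- the result rows derive them through joins
SELECT ke.employee_id,
       k.name AS kpi_name,
       g.name AS kpi_group_name,
       ke.target,
       ke.actual
FROM kpi_employees ke
JOIN kpis       k ON k.id = ke.kpi_id
JOIN kpi_groups g ON g.id = k.kpi_group_id;
```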

## 🧠 Good Practices I Follow

- design tables with single responsibility
- store each data point in one place
- avoid redundant columns
- define clear foreign key relationships
- separate master and transaction data
- validate dependencies before adding fields
- think in business domains, not just schema

## ⚖️ Tradeoff Awareness

**Normalized design:**

- ✅ strong consistency
- ✅ easier updates
- ❌ more joins

**Denormalized design:**

- ✅ faster reads
- ❌ duplication risk

👉 I balance both based on system needs
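
One middle ground I find useful: keep writes fully normalized and materialize the denormalized shape only for reads. A Postgres-flavored sketch over the illustrative KPI schema above:

```sql
-- Reads get the flat shape; writes still hit the normalized tables.
-- Trades some freshness and storage for cheaper reads.
CREATE MATERIALIZED VIEW kpi_results AS
SELECT ke.employee_id,
       k.name AS kpi_name,
       g.name AS kpi_group_name,
       ke.target,
       ke.actual
FROM kpi_employees ke
JOIN kpis       k ON k.id = ke.kpi_id
JOIN kpi_groups g ON g.id = k.kpi_group_id;

-- Refresh on a schedule or after bulk updates
REFRESH MATERIALIZED VIEW kpi_results;
```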


## 🚀 When I Intentionally Break Normalization

Normalization ensures clean structure, but real systems also require business correctness over time.


### 💡 My Principle

> I break normalization only when business logic depends on historical state


### 📌 Snapshot-Based Design (Real Case)

**Example: Expense System**

An expense is created based on:

- policy
- expense category
- approval flow
- the rules in effect at that time

But over time:

- policies may change
- categories may be disabled
- approval rules may be updated
- approvers may be replaced

👉 If I rely only on normalized configuration tables:

- old expenses may become inconsistent
- approval history may break
- business logic becomes unreliable

### ✅ My Approach (Transaction-Level Snapshot)

I do not pollute the expenses table with snapshot columns.

Instead, I preserve business state through transaction-level tables.


#### 1️⃣ Core Expense (Clean Transaction)

```
expenses
(
  id,
  uuid,
  name,
  description,
  attributed_employee_id,
  submitted_by,
  amount,
  currency,
  submitted_date,
  paid_out_status,
  paid_out_date,
  pl_effective_date,
  expense_status,
  created_at,
  updated_at,
  deleted_at
)
```

👉 This table only represents the expense itself.


#### 2️⃣ Approval Snapshot (Key Design)

```
expense_approvers
(
  id,
  uuid,
  expense_id,
  approver_type,
  approver_lvl,
  approver_id,
  status,
  approver_reason,
  response_date,
  created_at,
  updated_at
)
```

✔ Derived from `policy_expense_approvers`

👉 When an expense is submitted, the approval configuration is copied and stored as `expense_approvers` rows.

So even if:

- policy changes
- approval rules update
- approvers change

👉 The expense keeps its original approval flow
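
A sketch of that copy step, assuming illustrative column names on `policy_expense_approvers`, bind parameters for the ids, and database defaults for `id`, `uuid`, and the timestamps:

```sql
-- Freeze the approval chain for this expense at submission time;
-- later edits to the policy tables never touch these rows.
-- (id, uuid, created_at, updated_at assumed to have defaults)
INSERT INTO expense_approvers
    (expense_id, approver_type, approver_lvl, approver_id, status)
SELECT :expense_id,
       pea.approver_type,
       pea.approver_lvl,
       pea.approver_id,
       'pending'
FROM policy_expense_approvers pea
WHERE pea.policy_id = :policy_id;
```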


#### 3️⃣ File Association (Independent Context)

```
expense_files
(
  expense_id,
  file_id,
  type
)
```

Linked with:

```
files
(
  id,
  uuid,
  name,
  path,
  extension,
  size,
  ...
)
```

👉 Files are stored independently and linked via a pivot table
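
Retrieving an expense's attachments is then a plain pivot join (a sketch, using a bind parameter for the id):

```sql
SELECT f.name, f.path, f.extension, f.size, ef.type
FROM expense_files ef
JOIN files f ON f.id = ef.file_id
WHERE ef.expense_id = :expense_id;
```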


## 🎯 Why This Matters

This ensures:

- historical accuracy
- auditability
- a stable approval workflow
- correct past approval records
- no dependency on mutable configuration

👉 The system reflects:

> what was true when the expense was submitted, not what is true now


## ⚠️ Important Design Insight

This is controlled denormalization, not poor design:

- configuration remains normalized
- transactions store the necessary snapshots
- responsibilities remain clear

## 📌 Practical Approach

I design databases around:

> data ownership, lifecycle, and business boundaries

not just:

- normalization rules
- table structure

## 💬 Summary

My approach to database normalization focuses on:

- clear data ownership
- strong consistency
- separation of concerns
- correct dependency design

At the same time, I:

- selectively break normalization
- preserve business history through transaction-level tables
- prioritize correctness over theory

This ensures systems remain reliable, scalable, and production-ready as they evolve.