
# 🧠 Database Normalization & Data Integrity Design

I design database structures using normalization to ensure systems remain consistent, maintainable, and scalable as they grow across modules, workflows, and complex business rules.


## 🎯 How I Think About Database Design

Data issues rarely come from queries; they come from poor structure.

If the data model is not designed properly, it leads to:

- duplicated data
- inconsistent updates
- broken business logic
- difficult maintenance

So I treat normalization as a core system design concern, not just a textbook topic from a Database Management Systems course.


## 🧩 What Normalization Means to Me

I don’t treat normalization as theory; I treat it as a practical design mindset:

> Each piece of data should live in the right place, once, with clear ownership.


## 📐 Normal Forms (How I Apply Them)

I don’t memorize the rules; I apply them through design thinking.

### 1️⃣ First Normal Form (1NF): Atomic Data

- no arrays or grouped values
- each column stores a single value

👉 I avoid:

- comma-separated values
- JSON for relational data
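
For instance, here is the shape of that fix as a minimal sketch (table and column names are illustrative, not from a real system):

```sql
-- ❌ Violates 1NF: one column packs several values
CREATE TABLE employees_bad (
    id     INT PRIMARY KEY,
    name   VARCHAR(100),
    skills VARCHAR(255)          -- e.g. 'sql,php,laravel'
);

-- ✅ 1NF: one atomic value per column, one skill per row
CREATE TABLE employees (
    id   INT PRIMARY KEY,
    name VARCHAR(100)
);

CREATE TABLE employee_skills (
    employee_id INT REFERENCES employees(id),
    skill       VARCHAR(100),
    PRIMARY KEY (employee_id, skill)
);
```

Now finding everyone with a given skill is an indexed lookup instead of a `LIKE` scan over a packed string.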

### 2️⃣ Second Normal Form (2NF): Full Dependency

- every column must depend on the full primary key
- especially important for composite keys

👉 I check:

> “Is this field dependent on the whole context, or just part of it?”
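
A classic composite-key case that question catches (names are illustrative):

```sql
-- ❌ Violates 2NF: product_name depends on product_id alone,
--    not on the full (order_id, product_id) key
CREATE TABLE order_items_bad (
    order_id     INT,
    product_id   INT,
    product_name VARCHAR(100),
    quantity     INT,
    PRIMARY KEY (order_id, product_id)
);

-- ✅ 2NF: the partial dependency moves to the table
--    whose key fully determines it
CREATE TABLE products (
    id   INT PRIMARY KEY,
    name VARCHAR(100)
);

CREATE TABLE order_items (
    order_id   INT,
    product_id INT REFERENCES products(id),
    quantity   INT,
    PRIMARY KEY (order_id, product_id)
);
```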


### 3️⃣ Third Normal Form (3NF): No Hidden Dependencies

- non-key columns should not depend on other non-key columns

👉 I validate:

> “Does this data belong here, or is it derived from another table?”
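
The classic transitive-dependency case that question catches (illustrative names again):

```sql
-- ❌ Violates 3NF: department_name depends on department_id,
--    which is itself a non-key column
CREATE TABLE staff_bad (
    id              INT PRIMARY KEY,
    name            VARCHAR(100),
    department_id   INT,
    department_name VARCHAR(100)
);

-- ✅ 3NF: the derived fact lives in its own table
CREATE TABLE departments (
    id   INT PRIMARY KEY,
    name VARCHAR(100)
);

CREATE TABLE staff (
    id            INT PRIMARY KEY,
    name          VARCHAR(100),
    department_id INT REFERENCES departments(id)
);
```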


## 🧩 Design Approach

### Separation of Responsibilities

I design tables based on business responsibilities, not convenience.

**Example: KPI System**

- KPI Group → classification
- KPI → definition
- Employee → identity
- KPI Employee → transaction/result

👉 Each table owns one responsibility
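
Sketched as a schema, it looks roughly like this (table names follow the example above; the columns are illustrative, not the exact production layout):

```sql
CREATE TABLE kpi_groups (                 -- classification
    id   INT PRIMARY KEY,
    name VARCHAR(100)
);

CREATE TABLE kpis (                       -- definition
    id           INT PRIMARY KEY,
    kpi_group_id INT REFERENCES kpi_groups(id),
    name         VARCHAR(100)
);

CREATE TABLE employees (                  -- identity
    id   INT PRIMARY KEY,
    name VARCHAR(100)
);

CREATE TABLE kpi_employees (              -- transaction/result
    kpi_id      INT REFERENCES kpis(id),
    employee_id INT REFERENCES employees(id),
    target      DECIMAL(12, 2),
    actual      DECIMAL(12, 2),
    PRIMARY KEY (kpi_id, employee_id)
);
```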


### Master vs Transaction Data

I clearly separate:

- **Master Data**: stable, defines structure (KPI, policy, category)
- **Transaction Data**: dynamic, represents operations (KPI results, expenses)

👉 This is a direct result of applying normalization


### Dependency Awareness

I always validate:

> “Does this field depend on the correct key?”

An example of the kind of mistake I avoid:

```
kpi_employees
(
  kpi_id,
  employee_id,
  kpi_name,
  kpi_group_name,
  target,
  actual
)
```

Because:

- `kpi_name` and `kpi_group_name` depend on `kpi_id`, not on the KPI result itself

👉 So they belong in:

- `kpis`
- `kpi_groups`
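
When the names are needed in a result set, joins derive them on demand; a sketch against the illustrative schema above:

```sql
-- kpi_name and kpi_group_name live once, in their own tables;
-- the result rows derive them through joins
SELECT ke.employee_id,
       k.name AS kpi_name,
       g.name AS kpi_group_name,
       ke.target,
       ke.actual
FROM kpi_employees ke
JOIN kpis       k ON k.id = ke.kpi_id
JOIN kpi_groups g ON g.id = k.kpi_group_id;
```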

## 🧠 Good Practices I Follow

- design tables with single responsibility
- store each data point in one place
- avoid redundant columns
- define clear foreign key relationships
- separate master and transaction data
- validate dependencies before adding fields
- think in business domains, not just schema

## ⚖️ Tradeoff Awareness

**Normalized design:**

- ✅ strong consistency
- ✅ easier updates
- ❌ more joins

**Denormalized design:**

- ✅ faster reads
- ❌ duplication risk

👉 I balance both based on system needs
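
One middle ground I find useful: keep writes fully normalized and materialize the denormalized shape only for reads. A Postgres-flavored sketch over the illustrative KPI schema above:

```sql
-- Reads get the flat shape; writes still hit the normalized tables.
-- Trades some freshness and storage for cheaper reads.
CREATE MATERIALIZED VIEW kpi_results AS
SELECT ke.employee_id,
       k.name AS kpi_name,
       g.name AS kpi_group_name,
       ke.target,
       ke.actual
FROM kpi_employees ke
JOIN kpis       k ON k.id = ke.kpi_id
JOIN kpi_groups g ON g.id = k.kpi_group_id;

-- Refresh on a schedule or after bulk updates
REFRESH MATERIALIZED VIEW kpi_results;
```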


## 🚀 When I Intentionally Break Normalization

Normalization ensures clean structure, but real systems also require business correctness over time.


### 💡 My Principle

> I break normalization only when business logic depends on historical state


### 📌 Snapshot-Based Design (Real Case)

**Example: Expense System**

An expense is created based on:

- policy
- expense category
- approval flow
- the rules in effect at that time

But over time:

- policies may change
- categories may be disabled
- approval rules may be updated
- approvers may be replaced

👉 If I rely only on normalized configuration tables:

- old expenses may become inconsistent
- approval history may break
- business logic becomes unreliable

### ✅ My Approach (Transaction-Level Snapshot)

I do not pollute the expenses table with snapshot columns.

Instead, I preserve business state through transaction-level tables.


#### 1️⃣ Core Expense (Clean Transaction)

```
expenses
(
  id,
  uuid,
  name,
  description,
  attributed_employee_id,
  submitted_by,
  amount,
  currency,
  submitted_date,
  paid_out_status,
  paid_out_date,
  pl_effective_date,
  expense_status,
  created_at,
  updated_at,
  deleted_at
)
```

👉 This table only represents the expense itself.


#### 2️⃣ Approval Snapshot (Key Design)

```
expense_approvers
(
  id,
  uuid,
  expense_id,
  approver_type,
  approver_lvl,
  approver_id,
  status,
  approver_reason,
  response_date,
  created_at,
  updated_at
)
```

✔ Derived from `policy_expense_approvers`

👉 When an expense is submitted, the approval configuration is copied and stored as `expense_approvers` rows.

So even if:

- policy changes
- approval rules update
- approvers change

👉 The expense keeps its original approval flow
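
A sketch of that copy step, assuming illustrative column names on `policy_expense_approvers`, bind parameters for the ids, and database defaults for `id`, `uuid`, and the timestamps:

```sql
-- Freeze the approval chain for this expense at submission time;
-- later edits to the policy tables never touch these rows.
-- (id, uuid, created_at, updated_at assumed to have defaults)
INSERT INTO expense_approvers
    (expense_id, approver_type, approver_lvl, approver_id, status)
SELECT :expense_id,
       pea.approver_type,
       pea.approver_lvl,
       pea.approver_id,
       'pending'
FROM policy_expense_approvers pea
WHERE pea.policy_id = :policy_id;
```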


#### 3️⃣ File Association (Independent Context)

```
expense_files
(
  expense_id,
  file_id,
  type
)
```

Linked with:

```
files
(
  id,
  uuid,
  name,
  path,
  extension,
  size,
  ...
)
```

👉 Files are stored independently and linked via a pivot table
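
Retrieving an expense's attachments is then a plain pivot join (a sketch, using a bind parameter for the id):

```sql
SELECT f.name, f.path, f.extension, f.size, ef.type
FROM expense_files ef
JOIN files f ON f.id = ef.file_id
WHERE ef.expense_id = :expense_id;
```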


## 🎯 Why This Matters

This ensures:

- historical accuracy
- auditability
- a stable approval workflow
- correct past approval records
- no dependency on mutable configuration

👉 The system reflects:

> what was true when the expense was submitted, not what is true now


## ⚠️ Important Design Insight

This is controlled denormalization, not poor design:

- configuration remains normalized
- transactions store the necessary snapshots
- responsibilities remain clear

## 📌 Practical Approach

I design databases around:

> data ownership, lifecycle, and business boundaries

not just:

- normalization rules
- table structure

## 💬 Summary

My approach to database normalization focuses on:

- clear data ownership
- strong consistency
- separation of concerns
- correct dependency design

At the same time, I:

- selectively break normalization
- preserve business history through transaction-level tables
- prioritize correctness over theory

This ensures systems remain reliable, scalable, and production-ready as they evolve.