Anomalies in Unnormalized Databases
Data anomalies are inconsistencies or errors that arise when a database is poorly structured—typically due to redundant or unnormalized data. These anomalies compromise data integrity, consistency, and efficiency.
Types of Anomalies
There are three primary types of anomalies:
Insert Anomaly
Occurs when you cannot insert data into a table without the presence of unrelated data.
Example
Suppose we have a single table storing both student and course enrollment data:
| StudentID | StudentName | CourseID | CourseName |
|-----------|-------------|----------|------------|
| 1 | Alice | CS101 | Databases |
Now, you want to add a new course before any student enrolls. You can’t—because the table requires a StudentID.
Why it happens
This happens when multiple entities (like Students and Courses) are stored in the same table. The design assumes that every course must have a student, which isn’t always true.
Solution
Split the tables:
erDiagram
Student {
int StudentID
string StudentName
}
Course {
string CourseID
string CourseName
}
Enrollment {
int StudentID
string CourseID
}
Student ||--o{ Enrollment : enrolls
Course ||--o{ Enrollment : includes
Now, you can insert a course independently.
Update Anomaly
Occurs when updating a single piece of data requires multiple updates across rows.
Example
-- Redundant data
| CourseID | CourseName | ProfessorEmail |
|----------|------------|--------------------|
| CS101 | Databases | prof@uni.edu |
| CS102 | AI | prof@uni.edu |
If the professor changes their email, you must update every row. Miss one? You’ve got inconsistent data.
Why it happens
This is due to data redundancy—the same piece of information (email) is stored in multiple places.
Solution
Separate professors into their own table:
erDiagram
direction LR
Professor {
int ProfessorID
string Email
}
Course {
string CourseID
string CourseName
int ProfessorID
}
Professor ||--o{ Course : teaches
Now, one update to the Professor table updates the email for all courses.
Deletion Anomaly
Occurs when deleting data unintentionally removes other valuable data.
Example
-- ENROLLMENT table
| StudentID | CourseID | CourseName |
|-----------|----------|------------|
| 1 | CS101 | Databases |
If the only student enrolled in CS101 drops the course and you delete the row, you also lose the course information.
Why it happens
This happens when multiple entities are stored together, and deleting one entity (student enrollment) removes another (course).
Solution
erDiagram
direction LR
Student {
int StudentID
string StudentName
}
Course {
string CourseID
string CourseName
}
Enrollment {
int StudentID
string CourseID
}
Student ||--o{ Enrollment : enrolls
Enrollment }o--|| Course: includes
Again, normalize the data into separate tables for Student, Course, and Enrollment.
Try it yourself :rocket:
Part 1: Analyze the Table Below
A database designer created the following table for a university’s course registrations:
| StudentID | StudentName | CourseID | CourseName | ProfessorName | ProfessorEmail |
|---|---|---|---|---|---|
| 101 | Alice Smith | CS101 | Databases | Dr. Fox | fox@uni.edu |
| 101 | Alice Smith | CS102 | Networks | Dr. Lin | lin@uni.edu |
| 102 | Bob Lee | CS101 | Databases | Dr. Fox | fox@uni.edu |
Question 1: Identify all possible anomalies in this table. Provide specific examples:
- Insertion anomaly
- Update anomaly
- Deletion anomaly
Question 2: Why do these anomalies occur? Hint: Consider redundancy and entity relationships.
Part 2: Normalize the Data
Redesign the table structure by dividing it into three or more normalized tables, removing redundancy and eliminating the identified anomalies.
- Draw ER diagrams (or tables) for the new structure on paper.
- Indicate primary keys and foreign keys.
- Explain how your new schema prevents each of the anomalies identified in Part 1.
Bonus challange
Suppose a new requirement is added: each professor may teach multiple courses, and some courses are co-taught by more than one professor. Modify your schema to accommodate this and preserve normalization. What normal form are you aiming for now?
Possible solution
```mermaid erDiagram Student { int StudentID string StudentName } Course { string CourseID string CourseName } Professor { int ProfessorID string ProfessorName string ProfessorEmail } Enrollment { int StudentID string CourseID } Teaches { string CourseID int ProfessorID } Student ||--o{ Enrollment : enrolls Course ||--o{ Enrollment : includes Professor ||--o{ Teaches : instructs Course ||--o{ Teaches : is_taught_in ``` Splitting data into: - Student(StudentID, StudentName) - Course(CourseID, CourseName) - Professor(ProfessorID, ProfessorName, Email) - A junction table for Teaches(CourseID, ProfessorID) - A junction table for Enrollment(StudentID, CourseID)