Database Design

Course list http://www.c-jump.com/bcc/

Database Design

General Design Considerations
Database Entities
Primary Key
Business Rules
Entity Relationships
Entity Relationships, cont.
Normalization
Enrollment Table
College Database
Entity Attributes
Foreign Key
Example 1: Instructor's Weekly Schedule
Example 2: Student's Weekly Schedule
Additional Info

General Design Considerations

The process of deciding what the database will look like is called database design. Designing a database involves choosing
- The tables that belong in the database
- The columns that belong in each table
- How tables and columns interact with each other
In this class, everybody is going to be working with a database in a single-user mode. My goal as your instructor is to make sure that our database structure is straightforward enough so that everybody can create it and play with some data.

Database Entities

To begin working with the real database, we first need to define our business model.
Our main theme of business in this class will be the business model of a college.
The school model includes a number of useful entities, such as: a student, an instructor, a course, a department, the schedule, and so on.
Entities describe objects and things that we have to deal with in business. An entity often refers to something that really exists in real life. Keep in mind that if car is a genuine physical object, legal units or contracts, such as driver license, an insurance policy, or a traffic violation ticket are also good examples of entities.
Typically, database design begins with pen and paper. The design model often goes far beyond the database storage. It can extend to describe an entire application or project, such as patient management system in a hospital or a claim support system for an insurance company.
In context of a database design, an entity eventually becomes a record in a corresponding database table. Each table is designated to store one particular type of record. A record contains values for every column in the table. Some values may be left blank, indicating that the data in that column is optional or is not available at the moment. For example, a student may choose not to have a personal email outside of the college, so this field can be left blank.
Since we focus on the database, and not overall application design, it is permissible to use the terms table row or record in place of an entity.

Primary Key

A record stored in a Student table is an example of an entity. The table could have the following columns:
```
    Student
    ---------
    ID
    FirstName
    LastName
    CollegeID
    Email
```
The ID column contains internal ID, or system ID for the student. This special ID isn't visible to a typical end-user: instead, it plays a role of a short key to the record. Since each student has official College ID, it is recognized as an external ID, or logical ID of the student. External IDs are typically strings. On the other hand, internal IDs are numbers. Numerical values are more efficient when the data is manipulated.
The internal ID also serves as a Primary Key in the Student table.
Primary key is the column or combination of columns that uniquely identifies the row.
One primary key uniquely identifies one record in the table. Multiple records with the same primary key are not permitted. More so, the database software will intentionally reject any attempt to insert a new record with already existing ID. To simplify management of internal IDs, most tables configure their primary key column to be an auto-generated number.
To highlight the presence of the primary key in the table, the column name is preceeded by an asterisk:
```
    Student
    ---------
   *ID
    FirstName
    LastName
    CollegeID
    Email
```

Business Rules

Database design consists of entities and relationships between them. (Recall that a conceptual entity was defined in the previous section as a row in a database table.)
Business Rules specify meaningful relationships between entities in the database. This can be illustrated by Entity Relationship Diagram, or ERD. If you google the term "ERD", the search engine will yield thousands of relevant websites, images, books, and videos:

The diagram suggests the following business rules:
- Customer Receipt can have multiple Payments.
- The Payment has date, total price, and is linked to the Customer Information record, which provides detailed info about the customer.
- The Payment has multiple Items.
- Each sales Item contains product name and quantity.
Receipt, Payment, Customer Information, and Item are entities. The lines connecting them on the diagram show entity relationships. Entity relationships can be converted into a set of business rules, written in plain English. However, the diagram is a convinient way of sharing the concepts of the design with different audiences of people.

Entity Relationships

Different books and design tools use different formats, symbols, and notations to describe database design.
Relations are the bonds between entities. They tell about certain rules that apply if you want to insert, update or delete occurrences (rows) from what later becomes tables. Relations also tell us much about the logic connections between entities. In the physical design of the database, relationships become constraints that govern how you are allowed to manipulate data in different tables. The relationships are there to enforce referential integrity. This refers to the internal logic of your system and ensures that your objects are referring to actual items in tables. To stay as close as possible to "pen and paper" diagramming, I choose a very basic diagramming tool: plain text. In my diagrams, capitalized words (such as Student) identify entities.
Primitive lines and arrows show relationships between entities. The relationships also have descriptions. For example, the diagram
```
    Student ---enrolls-----> Course
```
suggests that "Student enrolls in a course." Here, the Student and Course are examples of entities, and "enrolls" is the name of the relationship between them.
Alternatively,
```
    Course ------has------> Students
```
suggests the same thing: "Course has [enrolled] students." Either way, the Course and the Student are related.

Entity Relationships, cont.

There is a significant issue that makes the above relationships between students and courses more complex:
1. Any student can be enrolled in multiple courses
2. Any course can have multiple students.

This is very important! Because of such "multiplicity" we have to add a new entity, namely, the Section:

    Course ------has------> Section <-----enroll------ Students

or in plain English,

    "Course has sections"
    "Students enroll in sections"

Note that the arrows suggest the direction in which the relationships apply. Direction helps to translate the diagram into plain English sentences.
The database table storing course sections should have a column for the Instructor. Since course can have multiple sections, different sections of the same course can be taught by different instructors:
```
    Section ------has------> Instructor
```

Normalization

At this early database design stage, when we say "Students enroll in sections",
```
    Section <-----enroll------ Students
```
does it mean that the student record must keep a list of every section where the student is currently enrolled? Maybe. But how do we keep track of sections/students from previous semesters?
The answer is: we need another table to make connection between the academic year, the student, and the section in which the student is enrolled. Potential name for such new entity could be Enrollment. The diagram becomes
```
    Section <------has------ Enrollment ------has------> Students
```
Discovering the need for Enrollment table is a step towards database normalization.
Normalization procedures are design changes that follow guidelines based on widely-accepted database organization standards. Making your tables match these standards is called normalization. Typical normalization steps include:
- Splitting tables into two or more tables with fewer columns and well-defined primary key.
- Reducing number of data redundancies within your tables.
In previous example, the Payment has multiple sales Items. The payment table is a master table that holds the top level of information - date of the sale and the total price. Individual sale items are placed in a separate Item table. Each Item stores an ID of the Payment record to which the item belongs.
You can read more about database normalization here:
http://www.profsr.com/sql/sqless02.htm -- the database design lesson from "Database Design and SQL" series of lessons online.
Detailed aspects of database normalization procedures are beyond the scope of this course. However, you should consider learning more on this subject when you are involved in a software project using relational database.

Enrollment Table

There is no need to be afraid to create new entities at design phase, especially if they serve as a placeholder for a new idea.
Each record in the Enrollment table specifies academic year, term (Spring, Summer, or Fall), section ID, student ID of the student enrolled, midterm and final grades.
```
    Enrollment 
    -----------
   *ID
    AcademicYear
    Term: Spring/Summer/Fall
    SectionID 
    StudentID
    DateEnrolled
    MidtermGrade
    FinalGrade
```

College Database

Here is a list of potential candidates to become database tables when we build it in Access:

 Entity     Entity
 Name       Description
 ---------  ----------------------------------------------------------
College     General info about the college as a legal entity.
            It includes corporate tax ID, Name, Address,
            Name of the President, Main phone number, etc.

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Student     Anybody who can enroll in a Course at the College

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Instructor  Anybody who can teach a course at the college.

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Course      Standard description out of the college catalog:
            Every course has an short name and description, e.g.
            CIT122, Fluency II.

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Section     Section of a course. Each course can have multiple
            Sections. Each section specifies the instructor, the
            term, such as Spring, Summer, or Fall, room, and the
            actual time slot in the schedule.

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Schedule    A list of time slots that can be allocated to a course/section.
            The data values are:
            ----------------------------
            Name  Day StartTime EndTime
            ----- --- --------- --------
            M08AM M   08:00 AM  09:00 AM
            M09AM M   09:00 AM  10:00 AM
            ...
            T08AM T   08:00 AM  09:00 AM
            ...
            This is a fixed table: it serves as a "dictionary"
            of existing college-wide scheduling items.

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Enrollment  Each record in this table has specifies semester academic
            year, term (Spring, Summer, Fall), section ID, ID of the
            student enrolled, midterm and final grades.

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Entity Attributes

As we define different entities, we find that we are digging deeper and deeper into details.
It is also a good idea to keep track of entity attributes as we keep researching the business model.

Entity attributes become table columns when the actual database tables are created. For example,

    Enrollment 
    -----------
   *ID
    AcademicYear
    Term: Spring/Summer/Fall
    SectionID 
    StudentID
    DateEnrolled
    MidtermGrade
    FinalGrade


    Student
    ---------
   *ID
    FirstName
    LastName
    CollegeID
    Email


    Instructor
    ----------
   *ID
    CollegeID
    FirstName
    LastName
    Rank: I=Instructor, P=Professor C=Dept.Chair
    Type: F=Full Time, A=Adjunct
    DepartmentID


    Department
    ----------
   *ID
    Name
    ChairID: Department Chair's InstructorID
    ContactPhone
    ContactEmail

    Course
    -------
   *ID
    Name
    Description
    Type: Online/Lecture/Hybrid
    Term: Spring/Summer/Fall

    Section
    ---------
   *ID
    Name
    CourseID
    ScheduleID
    InstructorID
    Room

    Schedule
    ----------
   *ID
    Name
    Day:       M, T, W, R=Thursday, F, S, U=Sunday
    StartTime: possibly stored as text, e.g. "08:30 am"
    EndTime:   possibly stored as text

Please note that every table begins with its unique ID column. This is standard practice: each record in every table in the database should have unique identifier, automatically generated for us by the database software when the record is inserted. This original value is a read-only attribute: it cannot be changed, but it can be referenced by other tables to indicate logical relationship between the tables. Such cross-identifiers become pointers from one table to another. For example, each record in the Section table points to the record in the Course, Schedule, and Instructor tables via CourseID, ScheduleID, and InstructorID attributes, respectively.

Unique ID attributes are known as primary keys (I often use the terms "data field", "data attribute", and "data column" interchangeably.) Columns in other tables that match a primary key in the original table are defined as foreign keys. This arrangement reinforces database support for related tables. Furthermore, primary/foreign key relationships allow SQL SELECT statements to join the data in one proposed table to that in another (related) table.

Foreign Key

Foreign key is a column in a table, the values of which match those of the primary key in another table.
Designing proper primary key/foreign key consistency of the database becomes a foundation for referential integrity of the database.
In example with Payment and sales Items, the payment table is a master table that holds the top level of information - date of the sale and the total price. Individual sale items are placed in a separate Item table. Each Item stores an ID of the Payment record to which the item belongs:
```
        Payment             Item
        ----------          -----------
       *ID   <---------.   *ID
        Date           |    ProductName
        TotalPrice     |    Quantity
                       `----PaymentID
```
The PaymentID column in the Item table is the foreign key pointing to the Payment record to which the sales item belongs.

Example 1: Instructor's Weekly Schedule

Based on primary key/foreign key relationships between tables,

 Instructor        Section               Course
 ----------        ---------             -------
*ID <---------.   *ID             .---> *ID
 CollegeID    |    Name           |      Name
 FirstName    |    CourseID ------'      Description
 LastName     |    ScheduleID ----.      Type
 Rank         `--- InstructorID   |      Term
 Type              Room           |
 DepartmentID                     |
                                  |        Schedule
                                  |        ----------
                                  `-----> *ID
                                           Name
                                           Day
                                           StartTime
                                           EndTime

we can construct various algorithms to search and match data in the database. For example, the following steps can be taken to print instructor's weekly schedule:

                  ALGORITHMIC SOLUTION FOR INSTRUCTOR'S WEEKLY SCHEDULE
 -----------------------------------------------------------------------------------
 Step# Input data         Processing                    Output and decisions
 ----- -----------------  ----------------------------- ----------------------------
 1.    Instructor's       Traverse Instructor table     Instructor.ID, Instructor.Name
       College ID         until the record with         (If not found - DONE:
                          specified College ID          - nothing to print.)
                          is found.

 2.    Instructor.ID      Traverse Section table        Set of records from the
                          and locate every record       Section table.
                          for the given InstructorID.   (If no records found - DONE:
                                                        - nothing to print.)

 3.    Set of Section     Match records from the        Print:    
       IDs from records   Schedule table with            Instructor.Name
       found in step 2.   the subset of Section          Course.Name
                          records. Also match            Section.Name
                          records from the               Section.Room
                          Course table with              Schedule.Day
                          the subset of Section          Schedule.StartTime
                          records.                       Schedule.EndTime

 4.                       Done!

Please note that I am using combined "Table.Column" notation to indicate which data field is used. This is common SQL practice when referencing specific data in a table.

Example 2: Student's Weekly Schedule

A simplified algorithm to print current weekly schedule for a particular student could look like this:

              ALGORITHMIC SOLUTION FOR PRINTING STUDENT SCHEDULE
 ----------------------------------------------------------------------------------
 Step# Input data        Processing                    Output and decisions
 ----- ----------------- ----------------------------- ----------------------------
 1.    Student's         Search Student table and      Valid Student.ID (Primary Key)
       College ID        verify that the student's
                         college ID has a record.
                         Therefore, it is valid.


 2.    Student ID        Search Enrollment table for    A set of Student's
       (Student table    records that contain the       enrollment records
       Primary Key)      Student ID.


 3.    Set of enrollment Search Section table for the   Section info
       records           Section ID provided by the
                         enrollment record.
                         Retrieve section info.


 4.    Section record,   Retrieve and print student     Printout on screen or a printer
       Enrollment        name, course, section info,
       record            day of the week, starting and
                         ending times, and room number.

 5.                      Done!

Additional Info

http://www.profsr.com/sql/sqless01.htm