Course list http://www.c-jump.com/bcc/
In the previous exercise, you've created an empty database, reserving the space for the tables and queries. You may continue using an existing database file, or create a new one, e.g. "College.accdb"
Based on a proposed design for the "College" database, we should be able to create the tables, insert some data, and start testing ways to retrieve the information.
Many ideas expressed in this handout are influenced by one of my favorite books, "The Practical SQL Handbook: Using SQL Variants", Fourth Edition, by Judith S. Bowman, Sandra L. Emerson, Marcy Darnovsky. Addison Wesley, ISBN 0-201-70309-2
Tables are the basic building blocks of any database. They hold the rows and columns of your data. With SQL's data definition commands, you can create tables, drop tables, and alter them (add, remove, or rearrange columns or change their characteristics).
When you create a table with SQL, you do at least the following:
Name the table
Name the columns it contains
Specify the datatype of each column
Specify the NULL status of each column - whether that column permits or forbids NULL values
Recall the structure of the Student table from the Database Design handout:
Student
---------
*ID
FirstName
LastName
CollegeID
Email
An example of the CREATE TABLE SQL statement for this table might be
_____________________________________________________________________ ________________________________________________ CREATE_TABLE_Student CREATE TABLE Student ( ID COUNTER NOT NULL CONSTRAINT PK_StudentID PRIMARY KEY, LastName TEXT(50) NOT NULL, FirstName TEXT(50) NOT NULL, CollegeID TEXT(15) NOT NULL, Email TEXT(50) NULL )
NOTE: There are many SQL statements to deal with in this lesson. As I am typing the material, I am running each query on my own computer. I expect you to do the same. I also expect you to save every query that you execute. To streamline the process of saving each query, I am providing a suggested query name, such as CREATE_TABLE_Student on top of every SQL statement. The two lines with query name on the right side are not part of the actual SQL syntax - I am adding them only to pretty up the format of this handout.
Let's go back and examine the CREATE TABLE syntax.
The first line, known as CREATE TABLE clause, is straightforward: make sure your table name is made of alphabetical characters. The name may also include numbers, but not at the beginning of the table name. The underscores can be used to make your name more readable. For example,
Student123 -- okay, numbers can be present in the name Student_123 -- okay, underscores are fine 45table -- error: numbers cannot appear at the beginning of the name
MS Access also allows spaces in names of tables and columns. To make the SQL interpreter understand that the space is part of the name, it must be enclosed in a pair of square brackets like this:
[Customer Invoice]
Although spaces in database object names may appear attractive to a beginner, I must warn you that other SQL databases do not support such notation. If one day you decided to move your Access database to an Oracle server, the spaces will become a huge roadblock. This is unnecessary, plus you still must use the square brackets, which even further complicate the syntax of your SQL. My choice is to never include spaces in names (including names of files and folders on disc.) If you feel like adding spaces, use underscore characters instead:
Customer_Invoice
It's much cleaner and portable.
The second line of the CREATE TABLE syntax begins with an opening parenthesis, which you must be sure to type.
Then we give the name of the first column, named ID, followed by a space and the name of its datatype, COUNTER. The COUNTER datatype specifies that numeric values should be automatically generated for this column when the rows of data are inserted into the table.
The remaining columns, LastName, FirstName, CollegeID, and Email, are all using the TEXT datatype, which is a character string of a variable length. The TEXT datatype permits to specify additional information. In our case, TEXT(50) specifies a maximum length of 50 characters. This is usually done by putting additional value in parentheses immediately following the datatype.
NULL or NOT NULL specify whether the data in the column may be legally left unspecified. In other words,
if NOT NULL is present, the data in the column is required,
otherwise, the data is optional.
For example, CollegeID is a required field. The Email is optional.
Microsoft Access assumes NULL in the absence of NOT NULL, but I use it explicitly in every column of the CREATE TABLE statement, because other databases prefer it this way.
Note that individual column definitions are separated by commas.
The "CONSTRAINT PK_StudentID PRIMARY KEY" construct tells the database to recognize the ID column as primary key. This means that ID uniquely identifies each row in the table. Given the fact that values in ID column are auto-generated numbers, we don't have to be concerned about providing the ID values when inserting data rows into the table.
The last line of the SQL syntax ends with a closing parenthesis, which is mandatory.
So far, the most complex part of this SQL is the part that defines the ID column. But guess what? It will be very similar in every table. So there should be no need to worry too much about technical details of this column; we can just copy and paste most of its definition from one CREATE TABLE statement to another, knowing that it adds unique primary key to each table in our database.
The datatype of a column specifies what kind of information the column will hold:
characters, numbers, dates, and so on.
Choosing the proper datatypes for your columns is very important!
The database will know how the data is to be physically stored. It will also know how each datatype can be manipulated.
For example:
A character datatype holds letters, numbers, and special characters.
An integer datatype holds whole numbers only. Data in integer columns can be manipulated with arithmetic operators, while character data cannot.
Standard SQL provides many datatype conversion functions. Those allow data stored as one type to be treated as another type. For example, a StudentID, stored as TEXT, if it contains only digits, such as "900012345", can be converted to an actual number, 900012345. Another example is when you have a date column and want to extract date, month, or year parts. (More on this later.)
Relational systems can handle a wide variety of datatypes. Microsoft Access supports the following:
_____________________________________________________________________ ________________________________________ CREATE_TABLE_DatatypeExample CREATE TABLE DatatypeExample ( yes_no BIT, amount MONEY, birthday DATETIME, fraction FLOAT, number_32k SMALLINT, big_number INTEGER, text_data TEXT, any_data BINARY, picture IMAGE )
The descriptions of the Microsoft Access datatypes are:
BIT -- either zero or minus one: 0 or -1
MONEY -- a datatype to store currency: 100.55
DATETIME -- a column to store date or time, or both.
FLOAT -- a double-precision floating-point value, such as 12345.33333 holds fractional numbers with using decimal point.
SMALLINT -- a short integer between (–32,768) and (+32,767)
INTEGER -- a long integer between (–2,147,483,648) and (+2,147,483,647)
TEXT -- a variable-length text column of up to 255 unicode characters: ABC def
BINARY -- a variable-length string of binary bytes of data, up to 510 bytes. The binary strings are ideal for storing unreadable strings, like encrypted user passwords.
IMAGE -- if you create this field, you can right-click and "Insert Object...", and insert an image file, like a JPG, or a video file, or some other media-type file, into this column. The IMAGE field can hold maximum of 2.14 gigabytes of data.
Understanding characteristics of MS Access datatypes allows us to effectively define physical storage of database tables:
Character datatypes, like TEXT and BINARY, hold letters, numbers, and special characters. TEXT is always the best choice for zip codes and phone numbers! Zip codes are best stored in character columns because they often need to be sorted. Zip codes that begin with zero may not sort properly if they are stored as numbers.
Whole-number datatypes, like BIT, SMALLINT, and INTEGER hold integers only - no fractions or decimals.
Decimal datatypes, like FLOAT, hold numbers with fractions
The MONEY datatype is optimized to hold currency values
The DATETIME stores date, time, and combinations of date and time. Special date functions for determining the interval between two dates and for adding or subtracting a given amount of time to a date are provided.
Finally, the COUNTER datatype maintains a sequentially increasing number. It is also known as a Serial datatype in other relational databases besides Microsoft Access.
NULL represents some unknown, unavailable, or inapplicable value -- it's not the same as blank or zero.
Like most relational databases, MS Access allows us to assign NULL or NOT NULL to each column when the table is created.
Assigning NULL to a column instructs the system to insert NULL in that column if the user does not enter a value.
Assigning NOT NULL to a column means that NULL entries in that column are not allowed. Therefore, the user always must provide a value. A record in the table is considered unacceptable without that value, like a driver license without the person's date of birth.
Primary key columns should never allow NULLs, because the primary keys uniquely identify the row in the table. Unlike primary keys, foreign keys sometimes can have null values. (More about this later.)
NULL can be used with the list of values of the INSERT statement. For example,
_____________________________________________________________________ _________________________________________ INSERT_Student_ROBERT_SMITH INSERT INTO Student ( LastName, FirstName, CollegeID, Email ) VALUES ( 'Smith', 'Robert', '9001234', NULL )
Note that when our values are strings of characters, they must be decorated by single quotes, e.g. 'Smith'
The attributes of the Schedule table
Schedule
----------
*ID
Name
Day
StartTime
EndTime
can be expressed in Access SQL as
_____________________________________________________________________ _______________________________________________ CREATE_TABLE_Schedule CREATE TABLE Schedule ( ID COUNTER NOT NULL CONSTRAINT PK_ScheduleID PRIMARY KEY, Name TEXT(50) NOT NULL, Day TEXT(1) NOT NULL, StartTime DATETIME NOT NULL, EndTime DATETIME NOT NULL )
I want you to note that the name of the primary key constraint, such as PK_ScheduleID, must be unique in every table that you create. It is easy to copy and paste from one CREATE TABLE to another, but primary key part must be changed manually:
CONSTRAINT PK_ScheduleID PRIMARY KEY
The schedule records are for various time periods as follows:
_____________________________________________________________________ ______________________________________ INSERT_Schedule_M_0830_0945_AM INSERT INTO Schedule ( Name, Day, StartTime, EndTime ) VALUES ( 'M 08:30-09:45 AM', 'M', '8:30 AM', '9:45 AM' ) _____________________________________________________________________ ______________________________________ INSERT_Schedule_M_1000_1115_AM INSERT INTO Schedule ( Name, Day, StartTime, EndTime ) VALUES ( 'M 10:00-11:15 AM', 'M', '10:00 AM', '11:15 AM' )
The process of entering INSERT SQL statements for every period in the schedule is very tedious and boring. It's okay to open the Schedule table in Access and enter the data rows directly. I recommend that you create the Schedule table and type in a few rows for at least three weekdays, such as Monday, Tuesday, and Wednesday. Be sure to click the save button often to prevent any data loss.
The advantage of using INSERT SQL queries is that they can be re-run multiple times. For example, at this early stage, we may decide to redo the Schedule from scratch a few times. It should be possible to:
Delete all data rows from the Schedule:
_________________________________________________________________ ____________________________________________ DELETE_FROM_Schedule DELETE FROM Schedule
Change period VALUES in the INSERT statements as necessary
Re-run the INSERT statements.
The Schedule is one of those delicate parts of the database that must be properly setup before we can start filling in the rows of the Section table.
The following SQL SELECT statement demonstrates usage of the DATEDIFF function. The query prints every period in the schedule and computes each period's duration:
_____________________________________________________________________ _____________________________________________ SELECT_Schedule_Periods SELECT Day, StartTime, EndTime, DATEDIFF( 'h', StartTime, EndTime ) AS [HH] , DATEDIFF( 'n', StartTime, EndTime ) Mod 60 AS [MM] FROM Schedule
The result may look like this:
While records are retrieved from the table, the difference between two DATETIME columns, StartTime and EndTime, is computed in form of HH:MM. It is computed twice, first - to show the hours, then the minutes.
Computing the hours is pretty straight forward:
DATEDIFF( 'h', StartTime, EndTime )
First argument, 'h', specifies the type of difference in hours.
Displaying the minutes is a bit more complicated: since hours are already shown, we only want to display the remaining minutes:
DATEDIFF( 'n', StartTime, EndTime ) Mod 60
Because DATEDIFF function returns an integer result, the Mod operator is applied to extract the remainder (or "left over") after the division of all minutes by 60, which is the number of minutes in one hour. First argument, 'n', instructs DATEDIFF to return the difference in minutes.
There are many more operators and functions in Access SQL, which allow you to do computations with the data already stored in the database. We will discuss some of them in the future. Sometimes it is hard to find an online resource that describes operators and functions available in Access. For the impatient, here's a couple of good references:
Various MS Access Topics:
http://www.techonthenet.com/access/index.php
A tutorial from Microsoft:
http://office.microsoft.com/en-us/access-help/guide-to-expression-syntax-HA010131581.aspx
You already know that a table is created by issuing a SQL CREATE TABLE statement.
Using Foreign Key column introduces a constraint to enforce referential integrity. A constraint is a form of dependency between two tables. For example, when a new Instructor record is inserted, her DepartmentID attribute must reference the ID of a valid Department. It also means that a valid Department record must exist in the database before an instructor can be added (or assigned) to that department.
Referential integrity affects not only the order in which the data is added to the database, but also the order in which the tables are created.
Considering the College Database Design , the following relationships exist between tables, listed in the order of their dependencies:
|
|
|
|
The order of relationships reveals dependency of foreign key columns on the existence of columns defined as the original primary keys. Our design model uncovers a pattern that shows the order that must be followed when creating database tables. For example, the Enrollment table cannot be created before both Section and Student are created. Likewise, the Section table cannot be created before Instructor, Schedule, and Course. The Instructor table, in its own turn, depends on the existence of the Department.
I must say that a restriction placed on the order of table creation is somewhat an inconvenience. However, our plan to create the tables is still on track, because now we have even better understanding of the relational design imposed by the College business model.
In Department and Instructor tables, the Department.ID is the primary key, the Instructor.DepartmentID is the foreign key column. Before a foreign key constraint can be configured, the table with the primary key must be created first:
_____________________________________________________________________ _____________________________________________ CREATE_TABLE_Department CREATE TABLE Department ( ID COUNTER NOT NULL CONSTRAINT PK_DepartmentID PRIMARY KEY, Name TEXT(50) NOT NULL, ChairID INTEGER NULL, ContactPhone TEXT(15) NOT NULL, ContactEmail TEXT(50) NULL ) _____________________________________________________________________ _____________________________________________ CREATE_TABLE_Instructor CREATE TABLE Instructor ( ID COUNTER NOT NULL CONSTRAINT PK_InstructorID PRIMARY KEY, CollegeID TEXT(15) NOT NULL, LastName TEXT(50) NOT NULL, FirstName TEXT(50) NOT NULL, Rank TEXT(1) NOT NULL, Type TEXT(1) NOT NULL, DepartmentID INTEGER NOT NULL CONSTRAINT FK_DepartmentID REFERENCES Department (ID) )
The DepartmentID column in the Instructor table is configured as a numeric field (its datatype is INTEGER.) In addition, the SQL statement uses keywords CONSTRAINT and REFERENCES to make it a foreign key, pointing to the ID column in the Department table. (It's a lot of tech-talk, but in plain English, it means the department record must be inserted prior to any instructors assigned to that department.)
There are two methods of setting up new database tables:
(a) create all tables in any order without foreign key constraints and then use ALTER TABLE statements to add the foreign keys later.
(b) create tables with foreign keys using CREATE TABLE statements. In such case, the tables must be created in particular order, providing that a table with primary key must be created before any table with a corresponding foreign key reference.
These two methods are not mutually exclusive. For instance, SQL statement
_____________________________________________________________________ ______________________________________________ ALTER_TABLE_Department ALTER TABLE Department ADD CONSTRAINT FK_ChairID FOREIGN KEY (ChairID) REFERENCES Instructor (ID)
adds a new foreign key constraint to the Department.ChairID column. This foreign key requires that the value stored in Department.ChairID field must have a corresponding record in the Instructor table, identified by its Instructor.ID. In plain English, the Department Chairperson is also an instructor.
In case if you'd like to know what is the syntax of SQL Data Definition Language to remove database objects, here it is:
_____________________________________________________________________ __________________________________________ DROP_CONSTRAINT_FK_ChairID ALTER TABLE Department DROP CONSTRAINT FK_ChairID _____________________________________________________________________ _______________________________________________ DROP_TABLE_Instructor DROP TABLE Instructor _____________________________________________________________________ _______________________________________________ DROP_TABLE_Department DROP TABLE Department
The foreign key constraint FK_ChairID must be deleted first, otherwise, Access won't allow you to drop either of these tables.
The Instructor table should be deleted next, along with its foreign key, which references the Department.ID.
Finally, the Department table can be dropped.
Referential integrity rules of the relational database protect itself against deletion of tables, unless the DROP TABLE statements are applied in such order that eliminates constraints starting at the level where no dependency conflicts exist.
(You don't need to run these queries, but if you try them, make sure to go back and re-create the tables again to continue with this lesson.)
I am using SQL INSERT INTO statement to add the 'CIS' department:
_____________________________________________________________________ _______________________________________________ INSERT_Department_CIS INSERT INTO Department ( Name, ChairID, ContactPhone, ContactEmail ) VALUES ( 'CIS', NULL, '508-678-2811', 'cis@bristolcc.edu' )
Note that the department chair ID is NULL, because it must point to an instructor. However, at this point the Instructor table is still empty. To insert the record for CIS Department Chairperson into the Instructor table, I will use the INSERT syntax, which is based on SELECT query:
_____________________________________________________________________ ____________________________________________ INSERT_Instructor_Grocer INSERT INTO Instructor ( CollegeID, FirstName, LastName, Rank, Type, DepartmentID ) SELECT '99992222' AS CollegeID, 'Priscilla' AS FirstName, 'Grocer' AS LastName, 'C' AS Rank, 'F' AS Type, Department.ID AS DepartmentID FROM Department WHERE Department.Name = 'CIS'
The underlying SELECT statement creates a list of hard-coded data values that match the columns of the Instructor table. The last item on this list,
Department.ID AS DepartmentID
is not hard-coded: it asks the database to find the ID of the 'CIS' department that we know is already in the database.
Two records already exist in the database:
CIS record in the Department table, and
the CIS Chairperson record in the Instructor table.
However, the Department.ChairID has to be changed from NULL to the Instructor.ID that was automatically generated for Prof. Grocer:
_____________________________________________________________________ _______________________________________________ UPDATE_Department_CIS UPDATE Department INNER JOIN Instructor ON Department.ID = Instructor.DepartmentID SET Department.ChairID = Instructor.ID WHERE Instructor.LastName = 'Grocer'
This UPDATE SQL statement demonstrates an update with the value from a field in another table. The most critical part of this statement is the INNER JOIN. It tells the database to use the Instructor table as the look up table. Such lookup is formulated via the relationship
Department.ID = Instructor.DepartmentID
and the selection condition
Instructor.LastName = 'Grocer'
The JOIN operations will be covered in the upcoming lesson demonstrating formats of the SELECT SQL statement.
See also: Update Query Examples,
http://www.fmsinc.com/microsoftaccess/query/snytax/update-query.html
The Course table is next:
Course
-------
*ID
Name
Description
Type: Online/Lecture/Hybrid
Term: Spring/Summer/Fall
which in Access SQL becomes
_____________________________________________________________________ _________________________________________________ CREATE_TABLE_Course CREATE TABLE Course ( ID COUNTER NOT NULL CONSTRAINT PK_CourseID PRIMARY KEY, Name TEXT(50) NOT NULL, Description TEXT NULL, Type TEXT(25) NULL, Term TEXT(25) NULL )
A sample data row is
_____________________________________________________________________ ________________________________________________ INSERT_Course_CIT122 INSERT INTO Course ( Name, Description, Type, Term ) VALUES ( 'CIT122', 'Information Technology Fluency II', 'Online', 'Spring' )
The Section table is next:
Section
---------
*ID
Name
CourseID
ScheduleID
InstructorID
Room
which in Access SQL becomes
_____________________________________________________________________ ________________________________________________ CREATE_TABLE_Section CREATE TABLE Section ( ID COUNTER NOT NULL CONSTRAINT PK_SectionID PRIMARY KEY, Name TEXT(25) NOT NULL, CourseID INTEGER NOT NULL CONSTRAINT FK_CourseID REFERENCES Course (ID), ScheduleID INTEGER NULL CONSTRAINT FK_ScheduleID REFERENCES Schedule (ID), InstructorID INTEGER NULL CONSTRAINT FK_InstructorID REFERENCES Instructor (ID), Room TEXT(25) NULL )
A sample data row is
_____________________________________________________________________ ___________________________________________ INSERT_Section_CIT122_A2W INSERT INTO Section ( Name, CourseID, ScheduleID, InstructorID, Room ) SELECT 'A2W' AS Name, Course.ID AS CourseID, NULL AS ScheduleID, NULL AS InstructorID, 'Online' AS Room FROM Course WHERE Course.Name = 'CIT122'
Again -- an example of the INSERT combined with SELECT that creates a list of mostly hard-coded data values, except
Course.ID AS CourseID
which instructs the database to locate the ID of the 'CIT122' course already on record.
At this point, the section has no particular schedule period and an unknown instructor. That's okay: records are often inserted with the most critical data, and later updated with the rest of the info.
Finally, the last table on our list,
Enrollment
---------
*ID
AcademicYear
Term: Spring/Summer/Fall
StudentID
DateEnrolled
SectionID
MidtermGrade
FinalGrade
which in Access SQL becomes
_____________________________________________________________________ _____________________________________________ CREATE_TABLE_Enrollment CREATE TABLE Enrollment ( ID COUNTER NOT NULL CONSTRAINT PK_EnrollmentID PRIMARY KEY, AcademicYear INTEGER NOT NULL, Term TEXT(25) NOT NULL, DateEnrolled DATETIME NOT NULL, StudentID INTEGER NOT NULL CONSTRAINT FK_StudentID REFERENCES Student (ID), SectionID INTEGER NULL CONSTRAINT FK_SectionID REFERENCES Section (ID), MidtermGrade TEXT(5) NULL, FinalGrade TEXT(5) NULL )
A sample data row is
_____________________________________________________________________ ______________________________________________ INSERT_Enrollment_9001234 INSERT INTO Enrollment ( AcademicYear, Term, DateEnrolled, StudentID ) SELECT 2012 AS AcademicYear, 'Spring' AS Term, DATE() AS DateEnrolled, Student.ID AS StudentID FROM Student WHERE Student.CollegeID = '9001234'
The values in SectionID, MidtermGrade, and FinalGrade fields are left undefined (initially set as NULL.) These fields can be updated manually using the datasheet view of the Enrollment table.
The DATE() is Access SQL function that returns current date and time, which is then stored in Enrollment.DateEnrolled column.
Both MS Access and Excel can store large amounts of data.
Excel performs instant formula-based calculations and return visual results on the data that is stored as text.
Excel is very convenient for a single PC user. An Excel worksheet is called flat data or nonrelational data. A single worksheet may contain the data that in Access would be split into multiple tables.
When complexity of the data model grows, for example, when a simple list of customers suddenly requires multiple billing and shipping addresses, multiple contact persons for each customer, and other pieces that no longer fit into a "flat list" model, Access (or another relational database platform) becomes the better solution.
Access, like all relational databases, focuses on data organization, which includes strong requirements on data types and validation of matches between records in multiple tables.
The goal of relational database is to maintain data integrity and store data in best format that can be accessed concurrently by multiple users from different computers.
The SQL is a "database programming language" which supports many features that are similar to Excel formulas.
SQL lacks design for immediate Excel-like data analysis. SQL also lacks support for graphical charts. Many small businesses store and collect data using Access, and analyze the extracted data snapshots in Excel, gaining the benefits of both programs.
Relational database is a type of persistent storage that maintains information in tables.
It uses matching values from two tables to relate data in one table to the data in the other table.
An important indicator of a good relational database design is when each specific type of data is stored just once, as a record in a single table.
An AutoNumber datatype automatically generates unique identifiers for each record in the database. These identifiers become primary keys and may be used to relate a record in one table to one or more records in another table.
The relations between tables are formally enforced by the relational database software, preventing users from depositing records that may be incomplete or inconsistent with the business model used as a source of rules for formulating foreign key/primary key relationships between the tables.
For example, you would not want to have a sales order that did not include the information about the customer. In fact, such order would be invalid from the prospective of the business rules.
The idea of matching values between records from different tables is called referential integrity.
By contrast, Excel does not support relational data, therefore, it cannot provide referential integrity.
Structured Query Language (SQL) is a database query and programming language widely used for accessing, querying, updating, and managing data in relational database systems.
This lesson focuses primarily on the data definition aspect of SQL, which is creating tables and establishing relationships between them.
Next time we will begin to extract the data from the database, shifting our attention to the data manipulation. SQL allows expressions in queries to create calculated fields. Using an expression in SQL is similar to using formulas in Excel to calculate values. SQL queries also can summarize data and present the results as aggregate values, such as sums, averages, and counts.