29,99 €
This book provides a comprehensive guide to resolving database security issues during design, implementation, and production phases. It emphasizes specific measures and controls unique to database security, beyond general information security. Topics include account credential management, data access management, and techniques like database normalization, referential integrity, transactions, locks, and check constraints.
The importance of database security lies in protecting sensitive data from unauthorized access and ensuring data integrity. This book is designed for professionals, workshops, and self-learners, offering hands-on demonstrations with major Database Management Systems (MySQL, Oracle, and Microsoft SQL Server) across various computing platforms (Linux/UNIX, MacOS, Windows).
Starting with an introduction to information, data, and database security, the book covers database design, management, administration, user accounts, privileges, roles, and security controls for confidentiality. It also delves into transactions and data integrity with concurrent access. Each chapter includes questions and projects to reinforce learning and comprehension.
Das E-Book können Sie in Legimi-Apps oder einer beliebigen App lesen, die das folgende Format unterstützen:
Seitenzahl: 343
Veröffentlichungsjahr: 2024
LICENSE, DISCLAIMER OF LIABILITY, AND LIMITED WARRANTY
By purchasing or using this book and its companion files (the “Work”), you agree that this license grants permission to use the contents contained herein, but does not give you the right of ownership to any of the textual content in the book or ownership to any of the information, files, or products contained in it. This license does not permit uploading of the Work onto the Internet or on a network (of any kind) without the written consent of the Publisher. Duplication or dissemination of any text, code, simulations, images, etc. contained herein is limited to and subject to licensing terms for the respective products, and permission must be obtained from the Publisher or the owner of the content, etc., in order to reproduce or network any portion of the textual material (in any media) that is contained in the Work.
MERCURY LEARNING AND INFORMATION (“MLI” or “the Publisher”) and anyone involved in the creation, writing, production, accompanying algorithms, code, or computer programs (“the software”), and any accompanying Web site or software of the Work, cannot and do not warrant the performance or results that might be obtained by using the contents of the Work. The author, developers, and the Publisher have used their best efforts to ensure the accuracy and functionality of the textual material and/or programs contained in this package; we, however, make no warranty of any kind, express or implied, regarding the performance of these contents or programs. The Work is sold “as is” without warranty (except for defective materials used in manufacturing the book or due to faulty workmanship).
The author, developers, and the publisher of any accompanying content, and anyone involved in the composition, production, and manufacturing of this work will not be liable for damages of any kind arising out of the use of (or the inability to use) the algorithms, source code, computer programs, or textual material contained in this publication. This includes, but is not limited to, loss of revenue or profit, or other incidental, physical, or consequential damages arising out of the use of this Work. The data used throughout this text, including names of persons and companies are for instructional purposes only. They have been researched with care but are not guaranteed for any intent beyond their educational purpose.
The sole remedy in the event of a claim of any kind is expressly limited to replacement of the book and only at the discretion of the Publisher. The use of “implied warranty” and certain “exclusions” vary from state to state, and might not apply to the purchaser of this product.
Companion files are available for download from the publisher by writing to [email protected].
Problems and Solutions
Christopher Diaz, Ph.D.
MERCURY LEARNING AND INFORMATION
Dulles, Virginia
Boston, Massachusetts
New Delhi
Copyright ©2022 by MERCURY LEARNING AND INFORMATION LLC. All rights reserved.
This publication, portions of it, or any accompanying software may not be reproduced in any way, stored in a retrieval system of any type, or transmitted by any means, media, electronic display or mechanical display, including, but not limited to, photocopy, recording, Internet postings, or scanning, without prior permission in writing from the publisher.
Publisher: David Pallai
MERCURY LEARNING AND INFORMATION
22841 Quicksilver Drive
Dulles, VA 20166
www.merclearning.com
1-800-232-0223
C. Diaz. Database Security.
ISBN: 978-1-68392-663-4
The publisher recognizes and respects all marks used by companies, manufacturers, and developers as a means to distinguish their products. All brand names and product names mentioned in this book are trademarks or service marks of their respective companies. Any omission or misuse (of any kind) of service marks or trademarks, etc. is not an attempt to infringe on the property of others.
Library of Congress Control Number: 2022940435
222324321 Printed on acid-free paper in the United States of America.
Our titles are available for adoption, license, or bulk purchase by institutions, corporations, etc. For additional information, please contact the Customer Service Dept. at 800-232-0223(toll free).
All of our titles are available in digital format at academiccourseware.com and other digital vendors. The sole obligation of MERCURY LEARNING AND INFORMATION to the purchaser is to replace the book, based on defective materials or faulty workmanship, but not based on the operation or functionality of the product.
I thank my wife Sindy, who assisted with developing case studies,as well as my family for inspiration and support.
Dedication
Preface
CHAPTER 1: Introduction to Information Security, Data Security, and Database Security
1.1 Information Security
Confidentiality
Integrity
Availability
1.2 Security Threats, Controls, and Requirements
Security threats
Security controls
Security requirements
1.3 Data Security
1.4 Database Security
Data confidentiality
Data integrity
Data Availability
1.5 Summary
CHAPTER 2: Database Design
2.1 Normalization
2.2 Surrogate Keys and Data Integrity
2.3 Normalization, Access Restrictions, and Beyond
2.4 Summary
CHAPTER 3: Database Management and Administration
3.1 Backup and Recovery
Backup and restore of a specific database
Backup and restore of multiple specific databases
Backup and restore of specific tables
Backup of users, privileges, and other components
Deciding what to backup
3.2 User Account Security Configurations
Password expiration
Disabling/enabling user accounts
3.3 Summary
CHAPTER 4: Database User Accounts
4.1. Creating and Removing Database User Accounts
4.2. Listing User Accounts
4.3 Host-Restricted Accounts
4.4 Summary
CHAPTER 5: Database Privileges
5.1 Overview of Privileges and Database-Level Privileges
5.2 Capability to Manage Privileges
5.3 Listing Privileges
5.4 Removing Privileges
5.5 Working with TLS and Table-Level Privileges
5.6 TLS and Normalization Revisited
5.7 Column Level Security (CLS)
5.8 CLS and Evolving Data Access Requirements and Data
The capability for CEO and CFO to read salary data
The capability for employees to see address data
The capability for executives to keep private notes in the budget table
5.9 Row Level Security
5.10 Summary
CHAPTER 6: Roles
6.1 Defining Role Members and Data Access Requirements
6.2 Creating a Database Role, Showing Role Privileges, and Removing a Role
6.3 Assigning Privileges to Roles
6.4 Database Users and Role
Adding and removing a database user to a role
Listing, setting, and testing a user’s role
The default role
Listing privileges and roles revisited
6.5 Roles and Evolution
A new employee is hired
An employee adds a role or moves to another role
An employee leaves a role or the organization
6.6 Summary
CHAPTER 7: Database Security Controls for Confidentiality
7.1 Views
Concept of a view
Creating a view
Showing a list of views and a view definition
Accessing the data of a view
Security considerations of a view
Deleting and redefining views
Views and multiple data access requirements
7.2 Encryption, Decryption, and Hashing
Encryption
Decryption
Hashing
Salting
7.3 Stored Routines
Stored functions
Stored procedures
Revisiting the password authentication implementation
7.4 Summary
CHAPTER 8: Transactions for Data Integrity
8.1 Commits, Rollbacks, and Automatic Commits
8.2 Beginning a Transaction with COMMIT or ROLLBACK
8.3 Beginning a Transaction with START TRANSACTION
8.4 Condition Issued COMMIT or ROLLBACK
8.5 Exception Issued ROLLBACK
8.6 A Larger Demonstration of Transactions
8.7 Summary
CHAPTER 9: Data Integrity with Concurrent Access
9.1 Concurrent Access and Backups
9.2 Concurrent Access with DML Statements
Table-level locking
Row-level locking
UPDATE locks
SHARE locks
9.3 Deadlock
9.4 Summary
Appendix
Index
After authoring my first textbook, An Introduction to UNIX/Linux, I later wanted to author another textbook. I considered a follow-up to that same topic, but realized a different topic had a larger void and greater need: Database Security. The idea of authoring a textbook in the field of database security arose with the rising trends of data science the past few years, the plethora of digital information that is created and used each day, and the ongoing needs for information security. While there are a few good database security textbooks that exist, many of those were written 10 to 20 years ago, so I believed it was time for a fresh look at this important topic.
Database security and information security may sound like similar concepts, but they are different in perspective and coverage. Database security does involve the information security principles of confidentiality, integrity, and availability. However, database security considers and implements those security measures or controls in more specific ways than are generally realized in the broader realm of information security. For example, to uphold the principle of confidentiality, we often turn to the information security control of encryption. But database security also involves other confidentiality approaches, such as techniques for account credential management, techniques to manage access to data, as well as techniques to manage the types of access. These are among the topics that we cover in Chapters 1, 3, 4, 5, 6, and 7.
To uphold the principle of integrity, we often consider the information security controls of hashing or digital signatures. With database security, in addition to those techniques we must also consider other, less realized, approaches such as database normalization, referential integrity, transactions, locks, and check constraints, all of which are some of the topics we cover in Chapters 1, 2, 7, 8, and 9.
Last but not least, to uphold the principle of availability we likewise discuss a variety of approaches in Chapters 1, 3, and 9.
The audiences for this textbook include professionals and self-learners, as well as classroom or workshop settings. The concepts presented in the text are demonstrated against databases that are provided, so that one can follow along in a hands-on approach and better learn these concepts. Each chapter also has a set of questions and follow up projects that one can use to reinforce their understanding of the material.
This textbook is not meant to be a complete reference of database security concepts and techniques, but rather focuses on the more typical ones. In addition, in this text we focus on DBMS considerations, and not database application considerations (such as SQL injection), which itself can involve enough content for its own textbook. With this background, the reader can expand on these concepts as necessary with various print and online resources.
Security is a vital need in many facets of everyday life. Whether we think of security for a room or security for digital information, the goal is similar: protect something important from unauthorized access or tampering! In terms of protecting a physical area such as a room, we consider physical security with doors, locks, and other mechanisms to prevent unauthorized physical access. In terms of protecting digital information, we can also employ physical security to prevent unauthorized physical access to devices that contain the information. However, for digital information we also must consider other forms of security, because physical security alone does not provide complete protection when digital information is accessible through an application, through a system, or over a network.
When it comes to forms of security involved with storing and managing digital information, we often think or hear of the terms information security, data security, and database security. These terms may seem equivalent and interchangeable when it comes to protecting digital information. However, while each term has security goals in mind, the actual goals of each vary in both scope as well as how to achieve those goals. In this first chapter, we introduce these terms and explain how they provide the basis for security concepts presented in subsequent chapters.
Information security refers to protecting data in general and in any form. Technically, the data may be digital or nondigital, although this text focuses on digital data. The data may be sitting in storage, in the act of being processed, and/or being communicated between parties. Information security is considered a broad definition of protection, encompassing every state or form in which the information exists. This broad definition also leads us to three goals, or principles, by which information security has been established: confidentiality, integrity, and availability. These principles are often referred to as the CIA (for confidentiality, integrity, and availability) Triad for Information Security.
Confidentiality refers to the protection of information against unauthorized access. This principle keeps sensitive information confidential and therefore accessible only to an authorized agent (such as a person, application program, or system service). An example of confidentiality in practice is allowing only an authorized user to access certain information, such as their own account or an information that is delivered to them over a network. Such a restriction prevents access or disclosure of the information to any unauthorized party.
The information security principle of integrity refers to the protection of information against unauthorized modification or deletion. A goal of integrity is to maintain information in a manner that is expected and accurate. As an example of this integrity goal, consider a financial document where a value in that document is accidentally—or intentionally—modified in an unauthorized manner. Such a modification could portray a higher or lower value than in reality, and that could lead to inaccurate, false, or invalid information being presented.
Another goal of integrity is to keep information consistent with other information, so that if one piece of information is changed (whether in an authorized or unauthorized manner), that change does not conflict—or become inconsistent—with other information. As an example of such inconsistency, again consider an unauthorized modification of a value in a financial document, and that value is also referenced to derive a second value, say a sum or average. But if that second value remains unchanged, the correlations between the modified value and the derived value become inconsistent with each other. In other words, to maintain integrity with consistency, both values must be updated so that the first modified value is reflected in the second value such as a sum or average. Even though these examples involve unauthorized modification of information, the same idea holds for unauthorized addition or removal of information, where that information derives other information such as a sum, average or even a count.
The third information security principle of availability provides timely accessibility of information to agents authorized to access that information. The types of information can be broad, such as information stored within a file or information provided by an application or service, such as a web server or database server. The goal of availability is to prevent situations where an authorized agent is unable to access information that should be accessible to them. As an example of availability in practice, consider a database server that stores financial information about bank accounts and is accessed by users that work at the bank or are customers of the bank to obtain such information. Such a user should be able to access the information of an account to which they are authorized and achieve this access in a timely manner. If the information is retrieved to the user within the expected time, the principle of availability is met. The problem lies when that information is not retrieved or made available within the specified time, presenting a delay (or maybe even no response at all) to the authorized user.
The factors that can affect availability are numerous and broad, but can be categorized into the following problems and solutions:
Hardware failure
, which usually involves a faulty storage device that prevents access to information stored on that device.
System outages
, caused by power failure, environmental damage, and even catastrophic events.
Software bugs or faulty software
that does not operate as intended.
User attacks
that overwhelm the system with busy or non-legitimate work, thereby preventing the system from being able to process legitimate work or the work that the system really should handle. Such an attack is commonly called a
Denial of Service (DOS) attack
when the attack is carried out by one source, and a
Distributed Denial of Service (DDOS) attack
when the attack is carried out by multiple sources.
Now that we have a broad understanding of information security, before we move into more detailed approaches involving data security and database security, let’s describe the concepts of security threats, security measures, and security requirements. These concepts are applicable to information security as well as data security and database security.
A security threat is a malicious user, program, or service that attempts to compromise confidentiality, integrity, and/or availability. Typically, we view a security threat as occurring by an agent that is not associated with (or even not known by) the organizational environment. Such security threats are often referred to as external threats. As an example of an external threat, consider a person who is not affiliated with an organization and attempts to access sensitive information within the organization over a network. Certainly that user is not authorized to access that sensitive information. The sensitive information may be stored on one of the organization’s servers, contained in one of the organization’s transmitted emails, or other numerous possibilities. The principle of confidentiality helps prevent the unauthorized access of the sensitive information through one or more mechanisms (or controls, which we describe later), thereby keeping such information confidential against unauthorized parties.
In addition to external threats, we must also consider internal threats. An internal threat involves an agent that is affiliated with or recognized by the organization. Such threats can pose a greater security challenge, because the person has a familiarity with the environment, or the agent may already have authorized access to certain information or resources. As an example of an internal threat, consider a person that is part of the organizational environment and is allowed to access financial information but is not allowed to access human resource information. The person attempts to access human resource information about someone else. In this scenario, the principle of confidentiality aims to keep that human resource information confidential from that person. The concept of allowing or disallowing access to certain information not only applies to people or users, but to any agent within or outside of an organization, such as a running program or service, like a web server, file server, or application communicating on a network.
Each security principle can be enforced by an implementation of one or more security mechanisms, or controls. The exact mechanism(s) or control(s) in a given situation can vary by many factors, such as the type of environment, operational needs, and organizational policies. As an example, a common control for confidentiality involves encryption, where an agent must have the required key (typically a password) to access the information. Without that key the information remains encrypted and thus confidential. However, in an environment where users are mobile and/or unable to practically provide a password, a control for confidentiality may instead involve a physical component that provides the required key—such as a card, fob, or wearable device.
A common control for integrity involves checksums or hashes on data to detect modifications to the data. As with confidentiality, other controls for integrity may be required depending on the situation.
Controls for availability can also vary depending on the situation. With regard to storage device failure, a variety of controls may be implemented. One approach may rely on regular data backups, so that data on a failed device can be restored onto a replacement device. For a faster and even automated approach, we may consider the use of redundant array of independent devices (RAID), where multiple storage devices are configured and used in ways to prevent the failure of a device to impact the accessibility to the stored data. Many RAID configurations exist, but the main idea is to use redundancy to store data one more than one device, so the failure of one of those devices still leaves the data accessible on a functional device. In terms of system outages, controls may range from uninterruptible power supplies (UPSs), backup power generators, antistatic measures, to even redundant or backup systems. Faulty programs may involve controls such as proper design, implementation, testing, and code reviews. Last but not least, threats by DOS and DDOS attacks may be mitigated by controls involving anti-malware programs, intrusion detection systems, or network devices such as firewalls.
Note that requirements as well as controls for confidentiality, integrity, and availability can vary greatly across types of applications or systems. We have listed some general security threats and solutions, and depending on the purpose or environment of the application or system, we may have to consider any of the threats and solutions we described, or other ones altogether. Later we will discuss more such threats and solutions in the context of a database environment.
Requirements of confidentiality, integrity, and availability can also vary greatly across organizations. One organization (such as a financial institution) may have greater demands for confidentiality of information compared to another organization (such as an advertising agency that gathers mailing addresses). The greater demand may be in the form of requiring confidentiality for larger amounts or percentages of overall data. The greater demand may also be in a stronger form of confidentiality, such as requiring two factor authentication (for example, requiring a password and physical token or device card) rather than a password alone to access sensitive data. Integrity requirements can vary, depending on the amount of data to verify as well as how to verify the data. This can be further broken down into how the data is accessed, who accesses the data, how data is processed, as well as how data may relate with other data. Lastly, availability requirements can vary, depending on what data or resources are to be made accessible in a timely manner. We also must consider the window of time that defines a “timely manner.” In more critical environments, that window may be within a minute or second of time. In less critical environments, that window may be an hour or even longer.
An organization’s requirements for confidentiality, integrity, and availability will be identified within the organization’s security and operational policies. It is important that the objectives of the organization’s security and operational policies are met or exceeded with the security solution design, implementation, and configuration. The security solution should be tested to ensure those objectives are met not only once deployed, but also routinely afterwards to ensure that the security solution is effective as the data and environment changes or evolves.
Data security has similar goals and objectives as information security—with the same fundamental principles of confidentiality, integrity, and availability. However, data security is often associated with only data that is at rest or saved in persistent storage, such as a magnetic disk, flash drive, or solid state device. Even though data security follows the same goals and objectives of information security, the goals are often much more focused on how to achieve and implement them. Such focus allows the goal to be more precisely defined and detailed in how the solution is achieved. As an example, to protect data against unauthorized modification (integrity), a data security approach may involve monitoring accesses to certain data to reveal who accesses the data as well as how the data is accessed. Such analysis can reveal anomalies to identify potential threats, such as when data is written unexpectedly (say, outside of normal work hours or outside of the expected processing pattern).
“Database security refers to the range of tools, controls, and measures designed to establish and preserve database confidentiality, integrity, and availability.”1 We can view database security to have the same objectives and principles as data security and information security, but within the scope or level of a database environment. This further focuses the choice, design, implementation, and configuration of controls available at the database level.
To achieve confidentiality in database security, we can employ controls such as privileges (which we cover in Chapter 5, “Database Privileges“) and encryption of stored data (which we cover in Chapter 7, “Other Database Security Controls”). Privileges are a database system control where the database system itself manages data access. As described earlier, encryption can provide confidentiality against agents that do not have the proper credentials and may be implemented by the database system, modules, or other programs.
To achieve database integrity, in addition to checksums or hashing, we can use a range of controls that appear as early as the database design phase. This includes proper database design with normalization, defining referential integrity constraints, and identifying when concurrent access may occur. We cover database design and normalization approaches to help maintain data integrity in Chapter 2, “Database Design,” and other data integrity controls in Chapter 8, “Transactions for Data Integrity.” We also cover mechanisms that help maintain integrity when multiple applications and/or users access the same data in Chapter 9, “Data Integrity with Concurrent Access.”
We previously defined the term integrity to detect when data has been modified in an unauthorized manner. Within a database environment we also have four types of data integrity that we must also recognize, and describe now.
The first type of data integrity to recognize within a database environment is entity integrity. The idea behind entity integrity is similar to the definition of a relation in the relational database model, which we describe in more detail shortly. In a database design model, an entity describes a set of data for a particular theme or context, such as for an employee. In the relational database model, we can consider an entity to be implemented as a table or relation. Each instance of an entity (or row in that table) represents one specific instance or case of that theme. For example, an instance of the employee entity (or each row in the employee table) represents one specific person and contains data to just that person.
Entity integrity requires that each entity instance (or each table row) to be uniquely identified by a primary key value. The primary key may consist of one or more attributes of the entity (or columns of the table). If the primary key is a single attribute, then all of the values in that one column must be unique. If the primary key contains multiple attributes (a composite key), the combined values of the columns must be unique. Primary key values for an entity should be not only unique, but non-null and contain no more attributes or columns than is necessary to provide uniqueness.
As examples, let’s consider the following set of tables that hold data for an organization that manages employees, departments, and projects (Figure 1.1). Each theme is implemented as a table and contains data specific to that theme. For representation purposes, we will use one of the popular conventions of representing table names and column names with PascalCase, also known as CapitalCase, where the first letter of each word in the name is capitalized and all other letters are lower case. When reading a name, such capital letters indicate word separation without the use of other character separators such as an underscore symbol, hyphen, or space. Also, when using PascalCase, abbreviations within a name are commonly represented with all capital letters, such as ID for “identifier” or SSN for “social security number.”
This textbook also follows the principles of the most common database model, the relational database model. The relational database model defines that each table (or relation) consists of a set of rows (or records), where each row contains a set of data to a specific entity of the table. Each column (or attribute) contains at most one piece of data in each row. In certain cases, no value (a null value) may be given to a column in a row. No two rows are identical; that is, no two rows have the same values for each corresponding column.
For the employee entity, a table named Employee holds data about the organization’s employees, and each row in the table represents one employee. In order to uniquely identify an employee, we need to specify a primary key of one (or possibly more) attributes or columns, whose data values would refer to exactly one row in that table. The EmployeeId attribute serves this purpose, as every employee has a unique EmployeeId value in the organization. At times we may consider other attribute possibilities as unique identifiers, and that may be valid for certain tables and attributes. While this particular set of data for the Employee table does have unique last names, we may be tempted to specify LastName (or LastName and FirstName) as a primary key. However, in a practical scenario we could not assume that every employee does or will have a unique last name (or unique last name and first name). As such, relying on last name, or last name and first name, as a unique identifier would not provide entity integrity in general.
In a similar manner, data about departments is implemented with a Department table and each row contains data of a specific department. DepartmentID is chosen as a primary key value. We may have alternatively chosen DeptName as a primary key, presuming that no two departments have the same name. While that may be a valid assumption, we may also consider that DepartmentID may be an introduced key (or surrogate key) to help provide a more uniform and possibly easier way to specify a primary key value for a particular department (this idea may become clearer when we get to the ProjectAssignment table later).
The next form of data integrity, referential integrity, may be the more familiar type of integrity to database users or administrators. Here, we look at requirements needed to support relationships of data across tables. In a database, data in one row can relate with data in another row (of another table or possibly the same table) to generate more complete information. For example, consider these tables in an organization where employees are assigned to at most one department.
FIGURE 1.1. Example database tables for an organization.
The Employee table provides the first name, last name, and office of a particular employee. However, if we need more information about a given employee’s department, such as department name or administrative assistant, we can derive that information by using the DeptID column value at that employee’s row in Employee as a foreign key that relates to the primary key DepartmentID in Department. Take for example the Employee table row for Sam Smith with the DeptID attribute D2. We can then retrieve in the Department table the row identified by D2 to derive that employee Sam Smith is with the Sales department.
A relationship refers to the mapping of rows in one table to rows in another (or even the same) table to provide more information with combining data. A relationship instance refers to a specific example of a given relationship. For example, we just mentioned that the relationship between Employee and Department allows us to derive the department information of a given employee. A relationship instance refers to a specific example of a mapping from Employee to Department, for example, that the row for Sam Smith in Employee relates to the row for Sales in Department.
Referential integrity requires that for each relationship instance, a foreign key value must refer to an existing primary key value. Between the Employee and Department tables, referential integrity is maintained because each DeptID value in Employee refers to an existing DepartmentID in Department. As a specific example, or instance, of referential integrity being maintained between Employee and Department, consider another EmployeeId 1003, which has a DeptID of D1, associating Susan Shu with the Accounting department. Because D1 is an existing DepartmentID in Department, referential integrity is maintained. However, if an employee were to have a DepartmentID of, say D5, referential integrity is not maintained because DepartmentID D5 does not exist in Department.
The Employee and Department tables also have a second relationship, where a row in Department relates to a row in Employee to provide more information about a department’s administrative assistant. For example, with DepartmentID D2, the AdminAsstEmpId value is 1005, referring to EmployeeId 1005 in Employee, which is Gary Grant. Because each foreign key value of AdminAsstEmpId in Department refers to an existing primary key value of EmployeeId in Employee, referential integrity is maintained for that particular relationship.
ProjectAssignment happens to be a table with two relationships, one with Employee and another with Project. In ProjectAssignment, EmpId is a foreign key to EmployeeId in the Employee table, and ProjID is a foreign key to ProjectID in Project. Referential integrity likewise requires that each row in ProjectAssignment has a foreign key value that corresponds to an existing primary key value in Employee as well as an existing primary key value in Project.
ProjectAssignment is also an example of an entity with a composite (multi-attribute) identifier, where EmpId and ProjID together form the primary key. Here, one must provide a value to both EmpId and ProjID to uniquely identify a row in ProjectAssignment. In the larger picture, ProjectAssignment is actually an intersection table that implements a many-to-many relationship between Employee and Project, where a given row in ProjectAssignment relates an employee with a project. As such, the composite primary key of ProjectAssignment not only uniquely identifies a row in ProjectAssignment, but also represents a relationship instance between a specific row in Employee and a specific row in Project.
While referential integrity requires a foreign key value to correspond to an existing primary key value, referential integrity does not require a foreign key value to always be defined or specified in every row of a table that has a relationship, as long as that foreign key is not part of a composite primary key to that table. When a foreign key is not part of the table’s primary key, the idea here is that a foreign key value may not be known or not exist yet, in which case there may not be a foreign key value to specify (at least at that time). The assumption is that in the future this foreign key value will become known and added to the row, say after the value is determined, or after the corresponding primary key is added to the related table. As an example, in the Department table, Marketing currently does not have an administrative assistant. Perhaps there may not be a person to specify at this time because that person is not yet an employee and hence does not exist in the Employee table, or perhaps the person is a current employee and does exist in Employee but has not yet been appointed. In either case, a null value may be specified for AdminAsstEmpId in the Department row for Marketing, because AdminAsstEmpId is not part of Department’s primary key. After the person is added to Employee or appointed, their EmployeeId value may be set as the AdminAsstEmpId value in Department for the Marketing row.
Entity integrity also plays a part in supporting this many-to-many relationship, requiring that every row of ProjectAssignment contain a value for EmpId as well as ProjID. This means that in order to specify a project assignment, we must have both an employee and project—the absence of either is an invalid project assignment. Even though EmpId and ProjID are foreign keys in ProjectAssignment, because they also form the primary key in that table (and because of entity integrity), we will require that both EmpId and ProjID have non-null values in ProjectAssignment. In general, if a foreign key in a table also is part of the table’s primary key, then entity integrity does require a non-null value of that foreign key in every row of that table.
Domain integrity in a database environment refers to a data value that exists when expected and in the proper form. When a table is created, the definition and constraints defined for a column causes the database management system (DBMS) to play a huge role with enforcing domain integrity. For example, by defining the data type for a column when a table is created, we are defining the domain of values that are stored in that column. A column defined with an integer data type will store a value as a whole number, even if the value is provided as a decimal number. And if an inserted value cannot be converted to an integer, the DBMS will generate an error.
Domain integrity may even involve enforcing that a row’s value for a column is present when required. Such a specification for a column can be defined by indicating its value must be NOT NULL. In contrast, a NULL specification indicates that the value may or may not be specified.
One can also employ the use of constraints to ensure that data values conform to an expected value or format. For example, consider a column that stores a value in U.S. dollars and cents. By specifying the data type of that column as a monetary type or number type with two decimal places, we ensure that a value stored has two digits for the cents value. An inserted number that has, say, one or three decimal digits that cannot be converted properly, will not be allowed by the DBMS. Other forms of constraints can ensure other types of criteria, such as ensuring a column value is within a range of values.
Even with the other three forms of data integrity that have been described, in a particular environment or situation we may still have requirements to define. User-defined integrity refers to the addition of those requirements, typically in the form of business rules. An example of a business rule involves a minimum number of items that can be placed in an order. Another example may be that a department must have at least two employees to exist. If data does not fulfill those requirements, then the data is considered to violate the data integrity of that organization. The concept of user-defined integrity may not be as formally defined as the others, but it may play a significant part in the data integrity of an organization.
With database security, if availability is compromised such that data is lost or corrupted, as part of recovery we may have to use data backups, as described with information security. There are often various backup and restore approaches based on facilities provided by the operating system and/or applications that carry out the backup and recovery tasks. Regarding databases, there are often built-in mechanisms within the DBMS itself that can be used to quickly and conveniently issue backup and restore operations. We cover some of those approaches in Chapter 3, “Database Management and Administration” and Chapter 9, “Data Integrity with Concurrent Access.”
As with information security, in addition to hardware failure of storage devices, we may also have to consider other forms of hardware failure, such as with RAM, CPU, network connection, or network card. Controls to mitigate these threats also involve redundancy of hardware components, so that the operation of one failed component can be quickly replaced by operation of a similar functional component. If such a switchover is automatic, we can achieve high availability, where the system continues to operate in a manner that minimizes or eliminates the effect of the failure to users. High availability can involve RAID systems or other redundant hardware involving multiple RAM banks, CPUs, network connections, or network interface cards. High availability often requires special hardware, operating systems, and/or software that is particular to the system platform, operating system, and availability objectives. Depending on the security objectives of the database system, we may even involve multiple database servers to provide availability of the database information in the event that one of the databases fails, say by power outage or catastrophe.
Database security may also involve many of the availability controls mentioned earlier. This may include hardware and/or software controls such as UPSs, firewalls, and other network security mechanisms. When database applications are present, we may also have to consider attacks that could compromise availability (as well as confidentiality or integrity). Controls that include proper code design, reviews, and testing of database applications or mechanisms can reduce those threats.
In this chapter, we introduced the concepts of information security as well how data security and database security involve more focused and specific usage of those concepts. Other security controls involving user account management can also play a role with database security. As with general user accounts, a security control may involve proper review of database user accounts to mitigate certain threats. Such threats include ensuring that an account of a former employee, or an account created for testing or backdoor purposes is properly disabled and inaccessible to a malicious agent. For accounts that are legitimately active, we must also ensure that proper data access or privileges are assigned to the account, so that an agent has access to the data in which they are allowed and does not have access to data that is disallowed. We describe the concepts of database user accounts and managing access to data in Chapter 4, “Database User Accounts,” Chapter 5, “Database Privileges,” and Chapter 6, “Roles.”
1https://www.ibm.com/cloud/learn/database-security