A relational database (RDB[1]) is a database based on the relational model of data, as proposed by E. F. Codd in 1970.[2] A database management system used to maintain relational databases is a relational database management system (RDBMS). Many relational database systems are equipped with the option of using SQL (Structured Query Language) for querying and updating the database.[3]

History

The concept of relational database was defined by E. F. Codd at IBM in 1970. Codd introduced the term relational in his research paper "A Relational Model of Data for Large Shared Data Banks".[2] In this paper and later papers, he defined what he meant by relation. One well-known definition of what constitutes a relational database system is composed of Codd's 12 rules. However, no commercial implementations of the relational model conform to all of Codd's rules,[4] so the term has gradually come to describe a broader class of database systems, which at a minimum:

  1. Present the data to the user as relations (a presentation in tabular form, i.e. as a collection of tables with each table consisting of a set of rows and columns);
  2. Provide relational operators to manipulate the data in tabular form.

In 1974, IBM began developing System R, a research project to develop a prototype RDBMS.[5][6] The first system sold as an RDBMS was Multics Relational Data Store (June 1976).[citation needed] Oracle was released in 1979 by Relational Software, now Oracle Corporation.[7] Ingres and IBM BS12 followed. Other examples of an RDBMS include IBM Db2, SAP Sybase ASE, and Informix. In 1984, development began on the first RDBMS for the Macintosh, code-named Silver Surfer; it was released in 1987 as 4th Dimension and is known today as 4D.[8]

The first systems that were relatively faithful implementations of the relational model were from:

  • University of Michigan – Micro DBMS (1969)[9]
  • Massachusetts Institute of Technology (1971)[10]
  • IBM UK Scientific Centre at Peterlee – IS1 (1970–72),[11] and its successor, PRTV (1973–79).[12]

The most common definition of an RDBMS is a product that presents a view of data as a collection of rows and columns, even if it is not based strictly upon relational theory. By this definition, RDBMS products typically implement some but not all of Codd's 12 rules.

A second school of thought argues that if a database does not implement all of Codd's rules (or the current understanding of the relational model, as expressed by Christopher J. Date, Hugh Darwen and others), it is not relational. This view, shared by many theorists and other strict adherents to Codd's principles, would disqualify most DBMSs as not relational. For clarification, they often refer to some RDBMSs as truly-relational database management systems (TRDBMS), naming others pseudo-relational database management systems (PRDBMS).[citation needed]

As of 2009, most commercial relational DBMSs employ SQL as their query language.[13]

Alternative query languages have been proposed and implemented, notably the pre-1996 implementation of Ingres QUEL.

Relational model

A relational model organizes data into one or more tables (or "relations") of columns and rows, with a unique key identifying each row. Rows are also called records or tuples.[14] Columns are also called attributes. Generally, each table/relation represents one "entity type" (such as customer or product). The rows represent instances of that type of entity (such as "Lee" or "chair") and the columns represent values attributed to that instance (such as address or price).

For example, each row of a class table corresponds to a class, and a class corresponds to multiple students, so the relationship between the class table and the student table is "one to many".[15]
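
The class/student relationship can be sketched with Python's standard-library sqlite3 module and an in-memory SQLite database. The table and column names (class, student, class_id) and the sample rows are illustrative assumptions, not taken from the source. Each student row stores the key of exactly one class, so a single class row can be matched by many student rows.

```python
import sqlite3

# In-memory database; table names, column names, and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE class (
        class_id INTEGER PRIMARY KEY,
        name     TEXT NOT NULL
    );
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        -- Each student belongs to one class; a class may have many students.
        class_id   INTEGER NOT NULL REFERENCES class(class_id)
    );
    INSERT INTO class VALUES (1, 'Algebra'), (2, 'Biology');
    INSERT INTO student VALUES (10, 'Lee', 1), (11, 'Kim', 1), (12, 'Ng', 2);
""")

# Count students per class: the "many" side grouped under the "one" side.
counts = dict(conn.execute("""
    SELECT c.name, COUNT(s.student_id)
    FROM class AS c JOIN student AS s ON s.class_id = c.class_id
    GROUP BY c.class_id
""").fetchall())
# counts == {'Algebra': 2, 'Biology': 1}
```

The join matches each student to its single class; grouping by the class key then shows how many rows on the "many" side point at each row on the "one" side.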

Keys

Each row in a table has its own unique key. Rows in a table can be linked to rows in other tables by adding a column for the unique key of the linked row (such columns are known as foreign keys). Codd showed that data relationships of arbitrary complexity can be represented by a simple set of concepts.[2]

Part of this processing involves consistently being able to select or modify one and only one row in a table. Therefore, most physical implementations have a unique primary key (PK) for each row in a table. When a new row is written to the table, a new unique value for the primary key is generated; this is the key that the system uses primarily for accessing the table. System performance is optimized for PKs. Other, more natural keys may also be identified and defined as alternate keys (AK). Often several columns are needed to form an AK (this is one reason why a single integer column is usually made the PK). Both PKs and AKs have the ability to uniquely identify a row within a table. Additional technology may be applied to ensure a unique ID across the world, a globally unique identifier, when there are broader system requirements.

The primary keys within a database are used to define the relationships among the tables. When a PK migrates to another table, it becomes a foreign key (FK) in the other table. When each cell can contain only one value and the PK migrates into a regular entity table, this design pattern can represent either a one-to-one or one-to-many relationship. Most relational database designs resolve many-to-many relationships by creating an additional table that contains the PKs from both of the other entity tables: the relationship becomes an entity; the resolution table is then named appropriately and the two FKs are combined to form a PK. The migration of PKs to other tables is the second major reason why system-assigned integers are normally used as PKs; migrating wide or multi-column keys of other types is usually neither efficient nor clear.
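
A sketch of a many-to-many resolution table, again assuming SQLite via Python's sqlite3 module. The student, course, and enrollment tables are hypothetical. Both entity tables use system-assigned integer PKs, and the enrollment table's PK is the combination of the two migrated FKs, so the same pairing cannot be stored twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,   -- system-assigned integer PK
        name       TEXT NOT NULL
    );
    CREATE TABLE course (
        course_id INTEGER PRIMARY KEY,
        title     TEXT NOT NULL
    );
    -- Resolution (junction) table: the relationship becomes an entity.
    -- Its PK is the combination of the two migrated FKs.
    CREATE TABLE enrollment (
        student_id INTEGER NOT NULL REFERENCES student(student_id),
        course_id  INTEGER NOT NULL REFERENCES course(course_id),
        PRIMARY KEY (student_id, course_id)
    );
    INSERT INTO student VALUES (1, 'Lee'), (2, 'Kim');
    INSERT INTO course  VALUES (100, 'Databases'), (200, 'Compilers');
    INSERT INTO enrollment VALUES (1, 100), (1, 200), (2, 100);
""")

# Courses taken by Lee, reached through the junction table.
lee_courses = sorted(title for (title,) in conn.execute("""
    SELECT c.title
    FROM student s
    JOIN enrollment e ON e.student_id = s.student_id
    JOIN course c     ON c.course_id  = e.course_id
    WHERE s.name = 'Lee'
"""))

# The composite PK rejects a duplicate enrollment.
try:
    conn.execute("INSERT INTO enrollment VALUES (1, 100)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

SQLite checks REFERENCES clauses only after `PRAGMA foreign_keys = ON`; many other RDBMSs enforce them by default.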

Relationships

A relationship is a logical connection between different tables (entities), established on the basis of interaction among these tables. These relationships can be modelled as an entity-relationship model.

Transactions

In order for a database management system (DBMS) to operate efficiently and accurately, it must use ACID (atomicity, consistency, isolation, durability) transactions.[16][17][18]
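
As a small illustration of atomicity (the A in ACID), the sketch below assumes SQLite through Python's sqlite3 module, whose connection object, when used as a context manager, commits on success and rolls back if an exception escapes. The account table and the transfer function are hypothetical; a CHECK constraint stands in for a consistency rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one atomic transaction (hypothetical helper)."""
    # `with conn:` commits on success and rolls back if an exception escapes.
    with conn:
        # Credit first, so a later failure must undo an already-applied change.
        conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
        # Violates the CHECK if src lacks funds, aborting the whole transfer.
        conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))

transfer(conn, "A", "B", 30)          # succeeds
try:
    transfer(conn, "A", "B", 500)     # fails: A would go negative
except sqlite3.IntegrityError:
    pass

balances = dict(conn.execute("SELECT id, balance FROM account"))
```

The failed transfer had already credited B before the debit was rejected; the rollback undoes that credit, so no partial transfer survives.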

Stored procedures

Part of the programming within an RDBMS is accomplished using stored procedures (SPs). Procedures can often greatly reduce the amount of information transferred within and outside of a system. For increased security, the system design may grant access to only the stored procedures and not directly to the tables. Fundamental stored procedures contain the logic needed to insert new and update existing data. More complex procedures may be written to implement additional rules and logic related to processing or selecting the data.

Terminology

[Figure: Relational database terminology]

The relational database was first defined in June 1970 by Edgar Codd, of IBM's San Jose Research Laboratory.[2] Codd's view of what qualifies as an RDBMS is summarized in Codd's 12 rules. A relational database has become the predominant type of database. Other models besides the relational model include the hierarchical database model and the network model.

The table below summarizes some of the most important relational database terms and the corresponding SQL term:

SQL term | Relational database term | Description
--- | --- | ---
Row | Tuple or record | A data set representing a single item
Column | Attribute or field | A labeled element of a tuple, e.g. "Address" or "Date of birth"
Table | Relation or base relvar | A set of tuples sharing the same attributes; a set of columns and rows
View or result set | Derived relvar | Any set of tuples; a data report from the RDBMS in response to a query

Relations or tables

In a relational database, a relation is a set of tuples that have the same attributes. A tuple usually represents an object and information about that object. Objects are typically physical objects or concepts. A relation is usually described as a table, which is organized into rows and columns. All the data referenced by an attribute are in the same domain and conform to the same constraints.

The relational model specifies that the tuples of a relation have no specific order and that the tuples, in turn, impose no order on the attributes. Applications access data by specifying queries, which use operations such as select to identify tuples, project to identify attributes, and join to combine relations. Relations can be modified using the insert, delete, and update operators. New tuples can supply explicit values or be derived from a query. Similarly, queries identify tuples for updating or deleting.

Tuples by definition are unique. If a tuple contains a candidate or primary key, it is obviously unique; however, a primary key need not be defined for a row or record to be a tuple, because the definition of a tuple requires uniqueness, not a declared key. Because a tuple is unique, its attributes by definition constitute a superkey.
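
The set semantics described above can be sketched in plain Python by modelling a relation as a set of tuples. The attribute names, data, and helper functions are hypothetical. Keys found this way hold only for the sample data; in a real schema, keys are declared from the meaning of the data rather than inferred from one instance.

```python
from itertools import combinations

# A relation as a Python set of tuples: duplicates collapse automatically.
ATTRS = ("student_id", "name", "city")
students = {
    (1, "Lee", "Oslo"),
    (2, "Kim", "Oslo"),
    (3, "Lee", "Rome"),
    (1, "Lee", "Oslo"),   # duplicate: the set stores it only once
}

def is_superkey(relation, attrs, key):
    """True if the values of the `key` attributes never repeat across tuples."""
    idx = [attrs.index(a) for a in key]
    seen = {tuple(t[i] for i in idx) for t in relation}
    return len(seen) == len(relation)

def candidate_keys(relation, attrs):
    """Minimal superkeys: no proper subset is itself a superkey."""
    found = []
    for size in range(1, len(attrs) + 1):
        for key in combinations(attrs, size):
            # Skip supersets of keys already found; they are not minimal.
            if any(set(k) <= set(key) for k in found):
                continue
            if is_superkey(relation, attrs, key):
                found.append(key)
    return found
```

Because the set holds no duplicate tuples, the full attribute list is always a superkey; on this sample, `candidate_keys` reports `student_id` alone and the pair `(name, city)`.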

Base and derived relations

All data are stored and accessed via relations. Relations that store data are called "base relations", and in implementations are called "tables". Other relations do not store data, but are computed by applying relational operations to other relations. These relations are sometimes called "derived relations". In implementations these are called "views" or "queries". Derived relations are convenient in that they act as a single relation, even though they may draw information from several relations. Also, derived relations can be used as an abstraction layer.
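
A sketch of a base relation versus a derived relation, assuming SQLite via Python's sqlite3 module. The customer and orders tables and the customer_spend view are hypothetical. The view stores no rows of its own; it is recomputed from the base tables each time it is queried.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Base relations (tables) actually store the data.
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        total       REAL NOT NULL
    );
    INSERT INTO customer VALUES (1, 'Lee'), (2, 'Kim');
    INSERT INTO orders VALUES (10, 1, 20.0), (11, 1, 5.0), (12, 2, 7.5);

    -- Derived relation: computed from the base relations on each query.
    CREATE VIEW customer_spend AS
        SELECT c.name, SUM(o.total) AS spent
        FROM customer c JOIN orders o ON o.customer_id = c.customer_id
        GROUP BY c.customer_id;
""")

# The view is queried exactly like a table.
spend = dict(conn.execute("SELECT name, spent FROM customer_spend"))

# New base data shows up immediately, because the view was never materialized.
conn.execute("INSERT INTO orders VALUES (13, 2, 2.5)")
spend_after = dict(conn.execute("SELECT name, spent FROM customer_spend"))
```

Callers see one relation, `customer_spend`, without knowing that it joins two tables; that is the abstraction-layer role described above.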

Domain

A domain describes the set of possible values for a given attribute, and can be considered a constraint on the value of the attribute. Mathematically, attaching a domain to an attribute means that any value for the attribute must be an element of the specified set. The character string "ABC", for instance, is not in the integer domain, but the integer value 123 is. Another example of a domain describes the possible values for the field "CoinFace" as ("Heads", "Tails"). So, the field "CoinFace" will not accept input values like (0, 1) or (H, T).
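
Standard SQL provides CREATE DOMAIN (supported, for example, by PostgreSQL), but SQLite does not, so this sketch emulates the "CoinFace" domain with a CHECK constraint, using Python's sqlite3 module. The toss table and the try_insert helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite has no CREATE DOMAIN, so the "CoinFace" domain is emulated with a CHECK.
conn.execute("""
    CREATE TABLE toss (
        toss_id  INTEGER PRIMARY KEY,
        CoinFace TEXT NOT NULL CHECK (CoinFace IN ('Heads', 'Tails'))
    )
""")

def try_insert(value):
    """Return True if `value` is accepted as a CoinFace, False if rejected."""
    try:
        conn.execute("INSERT INTO toss (CoinFace) VALUES (?)", (value,))
        return True
    except sqlite3.IntegrityError:
        return False

results = {v: try_insert(v) for v in ("Heads", "Tails", "H", "T", 0, 1)}
```

Only "Heads" and "Tails" are accepted; the inputs (0, 1) and (H, T) from the example above are all rejected.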

Constraints

Constraints are often used to further restrict the domain of an attribute. For instance, a constraint can restrict a given integer attribute to values between 1 and 10. Constraints provide one method of implementing business rules in the database and support subsequent data use within the application layer. SQL implements constraint functionality in the form of check constraints. Constraints restrict the data that can be stored in relations. These are usually defined using expressions that result in a Boolean value, indicating whether or not the data satisfies the constraint. Constraints can apply to single attributes, to a tuple (restricting combinations of attributes) or to an entire relation. Since every attribute has an associated domain, every attribute is also subject to a constraint (a domain constraint). The two principal rules for the relational model are known as entity integrity and referential integrity.
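
The sketch below shows an attribute-level constraint (an integer restricted to 1 through 10) and a tuple-level constraint (relating two attributes of the same row), assuming SQLite via Python's sqlite3 module. The review table and the accepted helper are hypothetical, and the opened/closed columns are assumed to hold ISO 8601 date strings so that text comparison matches date order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE review (
        review_id INTEGER PRIMARY KEY,
        -- Attribute-level constraint: restricts the integer domain to 1..10.
        score     INTEGER NOT NULL CHECK (score BETWEEN 1 AND 10),
        opened    TEXT NOT NULL,
        closed    TEXT,
        -- Tuple-level constraint: relates two attributes of the same row.
        CHECK (closed IS NULL OR closed >= opened)
    )
""")

def accepted(score, opened, closed=None):
    """Return True if the row satisfies every CHECK constraint."""
    try:
        conn.execute(
            "INSERT INTO review (score, opened, closed) VALUES (?, ?, ?)",
            (score, opened, closed),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

For example, `accepted(7, "2024-01-01")` succeeds, while a score of 11, or a `closed` date earlier than `opened`, is rejected.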

Primary key

Every relation/table has a primary key, this being a consequence of a relation being a set.[19] A primary key uniquely specifies a tuple within a table. While natural attributes (attributes used to describe the data being entered) are sometimes good primary keys, surrogate keys are often used instead. A surrogate key is an artificial attribute assigned to an object which uniquely identifies it (for instance, in a table of information about students at a school they might all be assigned a student ID in order to differentiate them). The surrogate key has no intrinsic (inherent) meaning, but rather is useful through its ability to uniquely identify a tuple. Another common occurrence, especially in regard to N:M cardinality, is the composite key. A composite key is a key made up of two or more attributes within a table that (together) uniquely identify a record.[20]
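
In SQLite, a column declared INTEGER PRIMARY KEY is assigned a new value automatically when an insert omits it. This sketch, using Python's sqlite3 module and a hypothetical student table, relies on that behaviour to give two students with identical natural attributes distinct surrogate keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        -- Surrogate key: assigned by the system, no meaning beyond identity.
        student_id INTEGER PRIMARY KEY,
        -- Natural attributes: they describe the student but need not be unique.
        name       TEXT NOT NULL,
        birth_date TEXT NOT NULL
    )
""")

# Two different students who share every natural attribute.
ids = []
for name, born in [("Lee", "2005-03-01"), ("Lee", "2005-03-01")]:
    cur = conn.execute("INSERT INTO student (name, birth_date) VALUES (?, ?)", (name, born))
    ids.append(cur.lastrowid)  # the key the system generated for this row
```

Neither the name nor the birth date could serve as a primary key here, but the generated student IDs (1 and 2) still tell the two rows apart.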

Foreign key

A foreign key is a field in a relational table that matches the primary key column of another table, linking the two tables. Foreign keys need not have unique values in the referencing relation. A foreign key can be used to cross-reference tables, and it effectively uses the values of attributes in the referenced relation to restrict the domain of one or more attributes in the referencing relation. The concept is described formally as: "For all tuples in the referencing relation projected over the referencing attributes, there must exist a tuple in the referenced relation projected over those same attributes such that the values in each of the referencing attributes match the corresponding values in the referenced attributes."
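
A sketch of referential integrity, assuming SQLite via Python's sqlite3 module with foreign-key enforcement switched on. The department and employee tables and the violations helper are hypothetical. The helper's NOT EXISTS query mirrors the formal definition quoted above: it lists referencing tuples that have no matching referenced tuple.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        -- Foreign key: values restricted to existing department.dept_id values.
        -- Not unique: many employees may reference the same department.
        dept_id INTEGER NOT NULL REFERENCES department(dept_id)
    );
    INSERT INTO department VALUES (1, 'Sales');
    INSERT INTO employee VALUES (100, 1), (101, 1);
""")

try:
    conn.execute("INSERT INTO employee VALUES (102, 99)")  # no department 99
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

def violations(conn):
    """Referencing tuples with no matching referenced tuple (should be empty)."""
    return conn.execute("""
        SELECT e.emp_id FROM employee e
        WHERE NOT EXISTS (SELECT 1 FROM department d WHERE d.dept_id = e.dept_id)
    """).fetchall()
```

The insert that references a nonexistent department is refused, so `violations(conn)` stays empty.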

Stored procedures

A stored procedure is executable code that is associated with, and generally stored in, the database. Stored procedures usually collect and customize common operations, like inserting a tuple into a relation, gathering statistical information about usage patterns, or encapsulating complex business logic and calculations. Frequently they are used as an application programming interface (API) for security or simplicity. Implementations of stored procedures on SQL RDBMSs often allow developers to take advantage of procedural extensions (often vendor-specific) to the standard declarative SQL syntax. Stored procedures are not part of the relational database model, but all commercial implementations include them.

Index

An index is one way of providing quicker access to data. Indices can be created on any combination of attributes on a relation. Queries that filter using those attributes can find matching tuples directly using the index (similar to hash table lookup), without having to check each tuple in turn. This is analogous to using the index of a book to go directly to the page on which the information you are looking for is found, so that you do not have to read the entire book to find what you are looking for. Relational databases typically supply multiple indexing techniques, each of which is optimal for some combination of data distribution, relation size, and typical access pattern. Indices are usually implemented via B+ trees, R-trees, and bitmaps. Indices are usually not considered part of the database, as they are considered an implementation detail, though indices are usually maintained by the same group that maintains the other parts of the database. The use of efficient indexes on both primary and foreign keys can dramatically improve query performance. This is because B-tree indexes result in query times proportional to log(n), where n is the number of rows in a table, and hash indexes result in constant time queries (no size dependency as long as the relevant part of the index fits into memory).
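
The effect of an index can be observed with SQLite's EXPLAIN QUERY PLAN, which reports whether a query scans the whole table or searches an index. This sketch uses Python's sqlite3 module; the people table, the idx_people_name index, and the plan helper are hypothetical, and the exact wording of the plan text varies between SQLite versions, so only the scan keyword and the index name are relied on. SQLite implements its indexes as B-trees.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (person_id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO people (name, city) VALUES (?, ?)",
    [(f"person{i}", f"city{i % 50}") for i in range(10_000)],
)

def plan(sql, params=()):
    """Return the 'detail' column(s) of SQLite's EXPLAIN QUERY PLAN output."""
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params))

query = "SELECT person_id FROM people WHERE name = ?"
before = plan(query, ("person42",))   # no index on name: every row is checked
conn.execute("CREATE INDEX idx_people_name ON people(name)")  # B-tree index
after = plan(query, ("person42",))    # the lookup now goes through the index
```

Before the index exists, the plan reports a full SCAN of the table; afterwards it names idx_people_name, meaning matching rows are found by a logarithmic-time B-tree search instead of checking all 10,000 tuples.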

Relational operations

Queries made against the relational database, and the derived relvars in the database, are expressed in a relational calculus or a relational algebra. In his original relational algebra, Codd introduced eight relational operators in two groups of four operators each. The first four operators were based on the traditional mathematical set operations:

  • The union operator (∪) combines the tuples of two relations and removes all duplicate tuples from the result. The relational union operator is equivalent to the SQL UNION operator.
  • The intersection operator (∩) produces the set of tuples that two relations share in common. Intersection is implemented in SQL in the form of the INTERSECT operator.
  • The set difference operator (−) acts on two relations and produces the set of tuples from the first relation that do not exist in the second relation. Difference is implemented in SQL in the form of the EXCEPT or MINUS operator.
  • The Cartesian product (×) of two relations is a join that is not restricted by any criteria, resulting in every tuple of the first relation being matched with every tuple of the second relation. The Cartesian product is implemented in SQL as the CROSS JOIN operator.
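
The four set-based operators map directly onto SQL, as the sketch below shows, assuming SQLite via Python's sqlite3 module. The relations r, s, and colors and the rows helper are hypothetical. SQLite spells set difference EXCEPT; MINUS is the spelling used by some other systems, such as Oracle. Results are sorted only because SQL result order is otherwise unspecified.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Two union-compatible relations: same attributes, same domains.
    CREATE TABLE r (x INTEGER);
    CREATE TABLE s (x INTEGER);
    INSERT INTO r VALUES (1), (2), (3);
    INSERT INTO s VALUES (2), (3), (4);
    CREATE TABLE colors (c TEXT);
    INSERT INTO colors VALUES ('red'), ('blue');
""")

def rows(sql):
    """Run a query and return its rows sorted, since relations are unordered."""
    return sorted(conn.execute(sql).fetchall())

union     = rows("SELECT x FROM r UNION SELECT x FROM s")          # duplicates removed
intersect = rows("SELECT x FROM r INTERSECT SELECT x FROM s")
except_   = rows("SELECT x FROM r EXCEPT SELECT x FROM s")         # set difference r - s
product   = rows("SELECT r.x, colors.c FROM r CROSS JOIN colors")  # 3 x 2 = 6 tuples
```

The union keeps each of 1 through 4 once, the intersection is {2, 3}, the difference r − s is {1}, and the Cartesian product pairs every x with every color.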

The remaining operators proposed by Codd involve special operations specific to relational databases:

  • The selection, or restriction, operation (σ) retrieves tuples from a relation, limiting the results to only those that meet a specific criterion, i.e. a subset in terms of set theory. The SQL equivalent of selection is the SELECT query statement with a WHERE clause.
  • The projection operation (π) extracts only the specified attributes from a tuple or set of tuples.
  • The join operation defined for relational databases is often referred to as a natural join (⋈). In this type of join, two relations are connected by their common attributes. MySQL's approximation of a natural join is the INNER JOIN operator. In SQL, an INNER JOIN prevents a Cartesian product from occurring when there are two tables in a query. For each table added to an SQL query, one additional INNER JOIN is added to prevent a Cartesian product. Thus, for N tables in an SQL query, there must be N−1 INNER JOINs to prevent a Cartesian product.
  • The relational division (÷) operation is a slightly more complex operation and essentially involves using the tuples of one relation (the divisor) to partition a second relation (the dividend). The relational division operator is effectively the opposite of the Cartesian product operator (hence the name).
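
The four relation-specific operators can be sketched in plain Python over in-memory sets. This is only an illustration of their definitions, not how an RDBMS executes them. The representation (each tuple a frozenset of attribute/value pairs, so attribute order does not matter) and all names and data are assumptions. Division follows the classic definition: R ÷ S keeps the R-only attribute values that appear paired with every tuple of S.

```python
# Relations as frozensets of "tuples"; each tuple is a frozenset of
# (attribute, value) pairs, so attribute order does not matter.
def rel(*rows):
    return frozenset(frozenset(r.items()) for r in rows)

def select(r, pred):
    """sigma: keep tuples satisfying pred (pred receives a dict)."""
    return frozenset(t for t in r if pred(dict(t)))

def project(r, attrs):
    """pi: keep only the named attributes; duplicates collapse in the set."""
    return frozenset(frozenset((a, v) for a, v in t if a in attrs) for t in r)

def natural_join(r, s):
    """Pair tuples that agree on all shared attributes (none shared: Cartesian product)."""
    out = set()
    for t in r:
        dt = dict(t)
        for u in s:
            du = dict(u)
            if all(dt[a] == du[a] for a in dt.keys() & du.keys()):
                out.add(frozenset({**dt, **du}.items()))
    return frozenset(out)

def divide(r, s):
    """r / s: the r-only attribute values that occur with *every* tuple of s."""
    s_attrs = {a for u in s for a, _ in u}
    r_attrs = {a for t in r for a, _ in t}
    candidates = project(r, r_attrs - s_attrs)
    return frozenset(c for c in candidates if all((c | u) in r for u in s))

# Hypothetical data: which students take every required course?
enrolled = rel({"student": "Lee", "course": "DB"},
               {"student": "Lee", "course": "OS"},
               {"student": "Kim", "course": "DB"})
required = rel({"course": "DB"}, {"course": "OS"})
```

Here `divide(enrolled, required)` yields only Lee, who is enrolled in both required courses. Dividing a Cartesian product by one of its factors recovers the other factor, which is the sense in which division undoes the product.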

Other operators have been introduced or proposed since Codd's introduction of the original eight, including relational comparison operators and extensions that offer support for nesting and hierarchical data, among others.

Normalization

Normalization was first proposed by Codd as an integral part of the relational model. It encompasses a set of procedures designed to eliminate non-simple domains (non-atomic values) and the redundancy (duplication) of data, which in turn prevents data manipulation anomalies and loss of data integrity. The most common forms of normalization applied to databases are called the normal forms.
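
A minimal sketch of removing redundancy by decomposition, in plain Python with hypothetical order data. Each flat row repeats the customer's city, so changing a city would require editing several rows (an update anomaly). Splitting the data along the transitive dependency order_id → customer_id → city, which third normal form disallows, stores each fact once. Rejoining on customer_id then reproduces the original rows, showing that the decomposition is lossless for this data.

```python
# Hypothetical unnormalized data: customer details repeated on every order.
flat = [
    {"order_id": 1, "customer_id": 7, "city": "Oslo", "total": 20},
    {"order_id": 2, "customer_id": 7, "city": "Oslo", "total": 5},
    {"order_id": 3, "customer_id": 8, "city": "Rome", "total": 9},
]

# Decompose: customer_id -> city is stored once per customer.
customers = {}
for r in flat:
    # Assumes the functional dependency customer_id -> city holds in the data.
    assert customers.setdefault(r["customer_id"], r["city"]) == r["city"]

# Orders keep only the foreign key to the customer, not the customer's city.
orders = [{k: r[k] for k in ("order_id", "customer_id", "total")} for r in flat]

# Rejoin on customer_id; a lossless decomposition reproduces the original rows.
rejoined = [{**o, "city": customers[o["customer_id"]]} for o in orders]
```

After the split, moving customer 7 to another city is a single update to `customers`, instead of one edit per order row.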

RDBMS

[Figure: The general structure of a relational database]

Connolly and Begg define database management system (DBMS) as a "software system that enables users to define, create, maintain and control access to the database".[21]RDBMS is an extension of that initialism that is sometimes used when the underlying database is relational.

An alternative definition for a relational database management system is a database management system (DBMS) based on the relational model. Most databases in widespread use today are based on this model.[22]

RDBMSs have been a common option for the storage of information in databases used for financial records, manufacturing and logistical information, personnel data, and other applications since the 1980s. Relational databases have often replaced legacy hierarchical databases and network databases, because RDBMSs were easier to implement and administer. Nonetheless, relational databases faced continued but unsuccessful challenges from object database management systems in the 1980s and 1990s (which were introduced in an attempt to address the so-called object–relational impedance mismatch between relational databases and object-oriented application programs), as well as from XML database management systems in the 1990s.[23] However, with the spread of technologies such as horizontal scaling of computer clusters, NoSQL databases have recently become popular as an alternative to RDBMS databases.[24]

Distributed relational databases

Distributed Relational Database Architecture (DRDA) was designed by a workgroup within IBM in the period 1988 to 1994. DRDA enables network-connected relational databases to cooperate to fulfill SQL requests.[25][26] The messages, protocols, and structural components of DRDA are defined by the Distributed Data Management Architecture.

List of database engines

According to DB-Engines, in January 2023 the most popular systems on the db-engines.com web site were:[27]

  1. Oracle Database
  2. MySQL
  3. Microsoft SQL Server
  4. PostgreSQL (free software)
  5. IBM Db2
  6. Microsoft Access
  7. SQLite (free software)
  8. MariaDB (free software)
  9. Snowflake
  10. Microsoft Azure SQL Database
  11. Apache Hive (free software)
  12. Teradata Vantage

According to research company Gartner, in 2011, the five leading proprietary software relational database vendors by revenue were Oracle (48.8%), IBM (20.2%), Microsoft (17.0%), SAP including Sybase (4.6%), and Teradata (3.7%).[28]

References

  1. Hastings, Jordan (2003). Portable Software Tools for Managing and Referencing Taxonomies. Digital Mapping Techniques '03 Workshop Proceedings. Vol. U.S. Geological Survey Open-File Report 03–471. 2. Relational Database Technology and Taxonomic Representation. Archived from the original on 2014-10-21. Retrieved 2024-04-06 – via United States Geological Survey.
  2. Codd, E. F. (1970). "A Relational Model of Data for Large Shared Data Banks". Communications of the ACM. 13 (6): 377–387. doi:10.1145/362384.362685. S2CID 207549016.
  3. Ambler, Scott. "Relational Databases 101: Looking at the Whole Picture".[better source needed]
  4. Date, Chris (5 May 2005). Database in depth: relational theory for practitioners. O'Reilly. ISBN 0-596-10012-4.
  5. Funding a Revolution: Government Support for Computing Research. National Academies Press. 8 Jan 1999. ISBN 0309062780.
  6. Sumathi, S.; Esakkirajan, S. (13 Feb 2008). Fundamentals of Relational Database Management Systems. Springer. ISBN 978-3540483977. The product was called SQL/DS (Structured Query Language/Data Store) and ran under the DOS/VSE operating system environment.
  7. "Oracle Timeline" (PDF). Profit Magazine. 12 (2). Oracle: 26. May 2007. Retrieved 2013-05-16.
  8. "New Database Software Program Moves Macintosh Into The Big Leagues". tribunedigital-chicagotribune. 28 June 1987. Retrieved 2016-03-17.
  9. Hershey, W.R.; Easthope, C.H. (1 December 1972). "A set theoretic data structure and retrieval language". ACM SIGIR Forum. 7 (4). Association for Computing Machinery: 45–55. doi:10.1145/1095495.1095500. Retrieved 4 January 2024.
  10. SIGFIDET '74: Proceedings of the 1974 ACM SIGFIDET (Now SIGMOD) Workshop on Data Description, Access and Control: Data Models: Data-Structure-Set versus Relational. Association for Computing Machinery. 1 January 1975. ISBN 978-1-4503-7418-7. Retrieved 4 January 2024.
  11. Notley, M.G. (1972). The Peterlee IS/1 System. IBM United Kingdom Scientific Centre. Retrieved 4 January 2024.
  12. Todd, Stephen (1976). "The Peterlee Relational Test Vehicle - A System Overview". IBM Systems Journal. 15 (4): 285–308. doi:10.1147/sj.154.0285.
  13. Ramakrishnan, Raghu; Donjerkovic, Donko; Ranganathan, Arvind; Beyer, Kevin S.; Krishnaprasad, Muralidhar (1998). "SRQL: Sorted Relational Query Language" (PDF). E Proceedings of SSDBM.
  14. "A Relational Database Overview". oracle.com.
  15. "A universal relation model for a nested database". The Nested Universal Relation Database Model. Lecture Notes in Computer Science, vol. 595. Berlin, Heidelberg: Springer Berlin Heidelberg. pp. 109–135. 1992. doi:10.1007/3-540-55493-9_5. ISBN 978-3-540-55493-6. Retrieved 2020-11-01.
  16. "Gray to be Honored With A. M. Turing Award This Spring". Microsoft PressPass. 1998-11-23. Archived from the original on 6 February 2009. Retrieved 2009-01-16.
  17. Gray, Jim (September 1981). "The Transaction Concept: Virtues and Limitations" (PDF). Proceedings of the 7th International Conference on Very Large Databases. Cupertino, CA: Tandem Computers. pp. 144–154. Retrieved 2006-11-09.
  18. Gray, Jim; Reuter, Andreas (1993). Distributed Transaction Processing: Concepts and Techniques. Morgan Kaufmann. ISBN 1-55860-190-2.
  19. Date (1984), p. 268.
  20. Connolly, Thomas M.; Begg, Carolyn E. (2015). Database systems: a practical approach to design, implementation, and management (global ed.). Boston Columbus Indianapolis: Pearson. p. 416. ISBN 978-1-292-06118-4.
  21. Connolly, Thomas M.; Begg, Carolyn E. (2014). Database Systems – A Practical Approach to Design Implementation and Management (6th ed.). Pearson. p. 64. ISBN 978-1292061184.
  22. Pratt, Philip J.; Last, Mary Z. (2014-09-08). Concepts of Database Management (8th ed.). Course Technology. p. 29. ISBN 9781285427102.
  23. Feuerlich, George (21 April 2010). Dateso 10; Database Trends and Directions: Current Challenges and Opportunities (1st ed.). Prague, Sokolovsk: MATFYZPRESS. pp. 163–174. ISBN 978-80-7378-116-3.
  24. "NoSQL databases eat into the relational database market". 4 March 2015. Retrieved 2018-03-14.
  25. Reinsch, R. (1988). "Distributed database for SAA". IBM Systems Journal. 27 (3): 362–389. doi:10.1147/sj.273.0362.
  26. Distributed Relational Database Architecture Reference. IBM Corp. SC26-4651-0. 1990.
  27. "DB-Engines Ranking of Relational DBMS". DB-Engines. Retrieved 2022-04-29.
  28. "Oracle the clear leader in $24 billion RDBMS market". 2012-04-12. Retrieved 2013-03-01.
