Tuple relational calculus

Tuple calculusis acalculusthat was created and introduced byEdgar F. Coddas part of therelational model,in order to provide adeclarativedatabase-query language for data manipulation in thisdata model.It formed the inspiration for the database-query languagesQUELandSQL,of which the latter, although far less faithful to the original relational model and calculus, is now the de facto standard database-query language; a dialect of SQL is used by nearly everyrelational-database-management system.Michel Lacroix andAlain Pirotteproposeddomain calculus,which is closer tofirst-order logicand together with Codd showed that both of these calculi (as well asrelational algebra) are equivalent in expressive power. Subsequently, query languages for the relational model were calledrelationally completeif they could express at least all of these queries.

Definition of the calculus

Relational database

Since the calculus is a query language forrelational databaseswe first have to define a relational database. The basic relational building block is thedomain(somewhat similar, but not equal to, adata type). Atupleis a finite sequence ofattributes,which areordered pairsof domains and values. Arelationis a set of (compatible) tuples. Although these relational concepts are mathematically defined, those definitions map loosely to traditional database concepts. Atableis an accepted visual representation of a relation; a tuple is similar to the concept of arow.

We first assume the existence of a setCof column names, examples of which are "name", "author", "address", etcetera. We defineheadersas finite subsets ofC.Arelational database schemais defined as atupleS= (D,R,h) whereDis the domain of atomic values (seerelational modelfor more on the notions ofdomainandatomic value),Ris a finite set of relation names, and

h:R→ 2^C

afunctionthat associates a header with each relation name inR.(Note that this is a simplification from the full relational model where there is more than one domain and a header is not just a set of column names but also maps these column names to a domain.) Given a domainDwe define atupleoverDas apartial functionthat maps some column names to an atomic value inD.An example would be (name: "Harry", age: 25).

t:C⇸D

The set of all tuples overDis denoted asT_D.The subset ofCfor which a tupletis defined is called thedomainoft(not to be confused with the domain in the schema) and denoted asdom(t).

Finally we define arelational databasegiven a schemaS= (D,R,h) as a function

db:R→ 2^T_D

that maps the relation names inRto finite subsets ofT_D,such that for every relation namerinRand tupletindb(r) it holds that

dom(t) =h(r).

The latter requirement simply says that all the tuples in a relation should contain the same column names, namely those defined for it in the schema.

Atoms

For the construction of the formulas we will assume an infinite setVof tuple variables. The formulas are defined given a database schemaS= (D,R,h) and a partial functiontype:V⇸ 2^C,called attype assignment,that assigns headers to some tuple variables. We then define theset ofatomic formulasA[S,type] with the following rules:

ifvandwinV,aintype(v) andbintype(w) then the formulav.a=w.bis inA[S,type],
ifvinV,aintype(v) andkdenotes a value inDthen the formulav.a=kis inA[S,type], and
ifvinV,rinRandtype(v) =h(r) then the formular(v) is inA[S,type].

Examples of atoms are:

(t.age =s.age) —thas an age attribute andshas an age attribute with the same value
(t.name = "Codd" ) — tuplethas a name attribute and its value is "Codd"
Book(t) — tupletis present in relation Book.

The formal semantics of such atoms is defined given a databasedboverSand a tuple variable bindingval:V→T_Dthat maps tuple variables to tuples over the domain inS:

v.a=w.bis true if and only ifval(v)(a) =val(w)(b)
v.a=kis true if and only ifval(v)(a) =k
r(v) is true if and only ifval(v) is indb(r)

Formulas

The atoms can be combined into formulas, as is usual in first-order logic, with the logical operators ∧ (and), ∨ (or) and ¬ (not), and we can use the existential quantifier (∃) and the universal quantifier (∀) to bind the variables. We define theset of formulasF[S,type] inductively with the following rules:

every atom inA[S,type] is also inF[S,type]
iff₁andf₂are inF[S,type] then the formulaf₁∧f₂is also inF[S,type]
iff₁andf₂are inF[S,type] then the formulaf₁∨f₂is also inF[S,type]
iffis inF[S,type] then the formula ¬fis also inF[S,type]
ifvinV,Ha header andfa formula inF[S,type_[v->H]] then the formula ∃v:H(f) is also inF[S,type], wheretype_[v->H]denotes the function that is equal totypeexcept that it mapsvtoH,
ifvinV,Ha header andfa formula inF[S,type_[v->H]] then the formula ∀v:H(f) is also inF[S,type]

Examples of formulas:

t.name = "C. J. Date" ∨t.name = "H. Darwen"
Book(t) ∨ Magazine(t)
∀t:{author, title, subject} ( ¬ ( Book(t) ∧t.author = "C. J. Date" ∧ ¬ (t.subject = "relational model" )))

Note that the last formula states that all books that are written by C. J. Date have as their subject the relational model. As usual we omit brackets if this causes no ambiguity about the semantics of the formula.

We will assume that the quantifiers quantify over the universe of all tuples over the domain in the schema. This leads to the following formal semantics for formulas given a databasedboverSand a tuple variable bindingval:V->T_D:

f₁∧f₂is true if and only iff₁is true andf₂is true,
f₁∨f₂is true if and only iff₁is true orf₂is true or both are true,
¬fis true if and only iffis not true,
∃v:H(f) is true if and only if there is a tupletoverDsuch thatdom(t) =Hand the formulafis true forval_[v->t],and
∀v:H(f) is true if and only if for all tuplestoverDsuch thatdom(t) =Hthe formulafis true forval_[v->t].

Queries

Finally we define what a query expression looks like given a schemaS= (D,R,h):

{v:H|f(v) }

wherevis a tuple variable,Ha header andf(v) a formula inF[S,type] wheretype= { (v,H) } and withvas its only free variable. The result of such a query for a given databasedboverSis the set of all tuplestoverDwithdom(t) =Hsuch thatfis true fordbandval= { (v,t) }.

Examples of query expressions are:

{t:{name} | ∃s:{name, wage} ( Employee(s) ∧s.wage = 50.000 ∧t.name =s.name ) }
{t:{supplier, article} | ∃s:{s#, sname} ( Supplier(s) ∧s.sname =t.supplier ∧ ∃p:{p#, pname} ( Product(p) ∧p.pname =t.article ∧ ∃a:{s#, p#} ( Supplies(a) ∧s.s# =a.s# ∧a.p# =p.p# ))) }

Semantic and syntactic restriction of the calculus

Domain-independent queries

Because the semantics of the quantifiers is such that they quantify over all the tuples over the domain in the schema it can be that a query may return a different result for a certain database if another schema is presumed. For example, consider the two schemasS₁= (D₁,R,h) andS₂= (D₂,R,h) with domainsD₁= { 1 },D₂= { 1, 2 }, relation namesR= { "r₁"} and headersh= { ( "r₁",{" a "}) }. Both schemas have a common instance:

db= { ( "r₁",{ (" a ", 1) } ) }

If we consider the following query expression

{t:{a} |t.a =t.a }

then its result ondbis either { (a: 1) } underS₁or { (a: 1), (a: 2) } underS₂.It will also be clear that if we take the domain to be an infinite set, then the result of the query will also be infinite. To solve these problems we will restrict our attention to those queries that aredomain independent,i.e., the queries that return the same result for a database under all of its schemas.

An interesting property of these queries is that if we assume that the tuple variables range over tuples over the so-calledactive domainof the database, which is the subset of the domain that occurs in at least one tuple in the database or in the query expression, then the semantics of the query expressions does not change. In fact, in many definitions of the tuple calculus this is how the semantics of the quantifiers is defined, which makes all queries by definition domain independent.

Safe queries

In order to limit the query expressions such that they express only domain-independent queries a syntactical notion ofsafe queryis usually introduced. To determine whether a query expression is safe we will derive two types of information from a query. The first is whether a variable-column pairt.aisboundto the column of a relation or a constant, and the second is whether two variable-column pairs are directly or indirectly equated (denotedt.v==s.w).

For deriving boundedness we introduce the following reasoning rules:

in "v.a=w.b"no variable-column pair is bound,
in "v.a=k"the variable-column pairv.ais bound,
in "r(v) "all pairsv.aare bound foraintype(v),
in "f₁∧f₂"all pairs are bound that are bound either inf₁or inf₂,
in "f₁∨f₂"all pairs are bound that are bound both inf₁and inf₂,
in "¬f"no pairs are bound,
in "∃v:H(f) "a pairw.ais bound if it is bound infandw<>v,and
in "∀v:H(f) "a pairw.ais bound if it is bound infandw<>v.

For deriving equatedness we introduce the following reasoning rules (next to the usual reasoning rules for equivalence relations: reflexivity, symmetry and transitivity):

in "v.a=w.b"it holds thatv.a==w.b,
in "v.a=k"no pairs are equated,
in "r(v) "no pairs are equated,
in "f₁∧f₂"it holds thatv.a==w.bif it holds either inf₁or inf₂,
in "f₁∨f₂"it holds thatv.a==w.bif it holds both inf₁and inf₂,
in "¬f"no pairs are equated,
in "∃v:H(f) "it holds thatw.a==x.bif it holds infandw<>vandx<>v,and
in "∀v:H(f) "it holds thatw.a==x.bif it holds infandw<>vandx<>v.

We then say that a query expression { v: H | f(v) } issafeif

for every column nameainHwe can derive thatv.ais equated with a bound pair inf,
for every subexpression offof the form "∀w:G(g) "we can derive that for every column nameainGwe can derive thatw.ais equated with a bound pair ing,and
for every subexpression offof the form "∃w:G(g) "we can derive that for every column nameainGwe can derive thatw.ais equated with a bound pair ing.

The restriction to safe query expressions does not limit the expressiveness since all domain-independent queries that could be expressed can also be expressed by a safe query expression. This can be proven by showing that for a schemaS= (D,R,h), a given setKof constants in the query expression, a tuple variablevand a headerHwe can construct a safe formula for every pairv.awithainHthat states that its value is in the active domain. For example, assume thatK={1,2},R={ "r" } andh= { ( "r", { "a," b "}) } then the corresponding safe formula forv.b is:

v.b = 1 ∨v.b = 2 ∨ ∃w( r(w) ∧ (v.b =w.a ∨v.b =w.b ) )

This formula, then, can be used to rewrite any unsafe query expression to an equivalent safe query expression by adding such a formula for every variablevand column nameain its type where it is used in the expression. Effectively this means that we let all variables range over the active domain, which, as was already explained, does not change the semantics if the expressed query is domain independent.

Systems

References

Edgar F. Codd:A Relational Model of Data for Large Shared Data Banks.Communications of the ACM,13(6):377–387, 1970.