Category Archives: SQL

SQL (Structured Query Language) is a domain-specific language used in programming designed to manage and retrieve information management systems relational databases

SQL

SQL

To manage the data base data model most commonly used is the relational model and to communicate with the manager can use the SQL language

It is NOT a programming language, but a database management language that has the following characteristics:

  • DDL (Data Definition Language)

    The DDL SQL provides commands for defining relation schemas, deleting and changing relationships schemes relationship

  • DML (Data Manipulation Language)

    The DML SQL includes a query language based on both relational algebra and tuple relational calculation. It also includes commands to insert, delete, and modify tuples

  • Integrity

    The DDL includes commands to specify integrity constraints that must meet the data stored in the database. Updates that violate integrity restrictions are rejected

  • Definition of views

    The SQL DDL includes commands for defining views

  • Transaction control

    SQL includes commands to specify the start and end of transactions

  • Built-in SQL and dynamic SQL

    Built-in SQL and Dynamic SQL define how SQL statements can be incorporated into general-purpose programming languages such as C, C++, Java, PL/L, Cobol, Pascal or Fortran, among others

  • Authorization
    The SQL DDL includes commands to specify access rights to relationships and views

Evolution of SQL

IBM developed the original version of SQL, called Sequel, as part of the System R project in the early 1970s by the San Jose, California

The Sequel language has evolved since then and its name has become SQL (Structured Query Language)

Today, many programming languages support the SQL language and have been set as the standard language for the use of relational databases

1981, IBM released the first commercial SQL-based program

1986, ANSI (American National Standards Institute) published a SQL standard, called SQL-86 or SQL1 and was confirmed by ISO (International Standards Organization) in 1987

1989 ANSI released an extension of the standard for SQL named SQL-89

1992, a new version of the standard called SQL-92 or SQL2 appeared

1999, a new version called SQL:1999 or SQL2000 appeared. In which regular expressions, recursive queries (for hierarchical relationships), triggers, and some object-oriented features were added

2003, a new version called SQL:2003 appeared. In which some XML features, changes in functions, standardization of the sequence object and autonumeric columns were added

2005, a new version called SQL:2005 appeared and corresponds to the ISO/IEC 9075-14:2005 standard. Defining how SQL can be used in conjunction with XML:

  • Import and save XML data into a SQL database, manipulating it within the database and publishing conventional XML and SQL data in XML
  • Facilities that allow applications to integrate into their SQL code the use of XQuery, XML query language published by the World Wide Web Consortium (W3C) for concurrent access to ordinary SQL data and XML documents

2008, a new version called SQL:2008 appeared. In which the use of the ORDER BY clause was added outside the cursors definitions. Including triggers of the INSTEAD OF type. And the inclusion of the TRUNCATE judgment

2011, a new version called SQL:2011 appeared. In which the temporary data (PERIOD FOR) and improvements in the window functions and the FETCH clause were added

2016, a new version called SQL:2016 appeared. In which the search was added using patterns, polymorphic table functions and compatibility with JSON files

DDL

Data Definition Language (DDL)

Data Definition Language (DDL) is responsible for modifying the structure of database objects

Includes orders to modify, delete, or define relationships (tables) in which database data is stored

There are four basic operations:

Definition of data

The relationship set for each database must be specified in the system in terms of a data definition language (DDL)

The SQL DDL not only allows the specification of a set of relationships, but also information regarding those relationships, including:

  • The outline of each relationship
  • The value domain associated with each attribute
  • Integrity restrictions
  • The set of indexes that must be maintained for each relationship
  • The security and authorization information for each relationship
  • The physical storage structure of each relationship on disk

Basic types of domains

The SQL standard supports a wide variety of predefined domain types:

  • CHAR(n)

    A fixed length character string, with a length n specified by the user. You can also use the full word character

  • VARCHAR(n)

    A variable length character string with a maximum length n specified by the user. The complete form, character varying, is equivalent

  • INT

    An integer (a finite subset of computer dependent integers). The whole word, integer, is equivalent

  • SMALLINT

    A small integer (a computer dependent subset of the entire domain type)

  • NUMERIC(p, d)

    A fixed comma number, the accuracy of which is specified by the user. The number is made up of p digits (plus the sign) and those p digits, d belongs to the decimal part

  • DOUBLE

    A floating-point number and double precision floating point numbers, with computer dependent precision

  • FLOAT(n)

    A floating point number whose accuracy is, at least n digits

  • DATE

    Date type data, with an accuracy of up to second

Basic definition of schemes

CREATE TABLE

Relationships are defined by the command CREATE TABLE:

\text{CREATE TABLE r} (\\ A_1 D_1, \cdots, A_n D_n,\\ \{rest-integrity_1\}, \cdots, \{rest-integrity_k\} )

Where r is the name of relationship, every Ai is the name of an attribute in the relationship scheme r and Di is the domain type of the attribute domain values Ai

There are several valid integrity restrictions

One of the most commonly used is the primary key (primary key), which takes the following form:

\text{primary key}(A_{j1}, \cdots, A_{jm})

The primary key specification determines the attributes A_{j1}, \cdots, A_{jm} form the primary key to the relationship

The attributes of the primary key must be non null and unique; that is, no tuple can have a null value for an attribute of the primary key and no pair of tuples in the relationship can be equal across all attributes of the primary key

Although the primary key specification is optional, it is usually a good idea to specify a primary key for each relationship

The example created a relationship called acount in which the attributes have been defined:

number_acount, you can save a word of up to 10 characters
name_branch, you can save a word of up to 15 characters
balance, you can save a decimal number with 12 integers and 2 decimal places
And with the restriction on number_acount as the primary key to the relationship

If a newly inserted or newly modified tuple in a relationship contains null values for any of the attributes that are part of the primary key, or if they have the same value as another tuple in the relationship, SQL will report the error and prevent the update

The example created another relationship called client in which the attributes have been defined:

number_acount, you can save a word of up to 10 characters
name_client, you can save a word of up to 15 characters
And with the restriction on number_acount as the primary key to the relationship

As can be seen we can create more than one relationship

DROP TABLE

The command is used to delete a relationship from the database DROP TABLE

This command removes all information about the relationship from the database

The example has removed all information about the relationship from the database acount

This command is more drastic than DELETE, which only erases your tuples while maintaining the outline of the relationship

However, DROP, if removed acount, no tuples can be reinserted into that relationship unless it is recreated with the command CREATE TABLE

ALTER TABLE

The command ALTER TABLE is used to add attributes to an existing relationship

As the value of the new attribute, all tuples in the relationship are assigned null value

The example added a new attribute

Where acount is the name of the existing relationship, year is the name of the attribute you want to add and DATETIME is the domain of the added attribute

You can also remove attributes using ALTER TABLE

The example removed the attribute we added earlier

Where acount is the name of the existing relationship and year is the name of the attribute you want to delete

In this case it is not necessary to declare the domain of the attribute to be deleted

Many database systems do not allow the deletion of attributes, although they do allow the deletion of entire tables

TRUNCATE

This command applies only to tables and its function is to clear the entire contents of the specified relationship

The advantage over the command DELETE, is that if you want to erase all the content of the relationship, it is much faster, especially if the relationship is very large

The downside is that TRUNCATE only serves when you want to remove absolutely all tuples, as the clause is not allowed WHERE

While, at first, this statement would appear to be Data Manipulation Language (DML), it is actually a DDL, because internally, the TRUNCATE command clears the table and recreates it and does not execute any transactions

In the example we've deleted all the content of the called relationship acount

DML

Data Manipulation Language (DML)

Data Manipulation Language (DML) is responsible for carrying out the tasks of querying or manipulating the data, organized by the appropriate data model

Includes commands to query, modify, delete, or define tuples (rows) that are stored in the relationships (tables) of the database

There are four basic operations:

Basic definition of schemes

Alias Operation

SQL provides an operation to rename using an alias for both relationships and attributes

It has the following syntax: original_name AS alias

The clause AS it is optional although if we write it will be clearer because it informs us that we are defining an alias

Situations where it is useful to use an alias:

  • Where we have two relationships in the clause FROM who have the same name for their attributes

    In these situations there may be ambiguity and we can use the alias operation to disassemble the attributes and make it clear which relationship they belong to

  • When we have a calculated operation that generates an attribute in the clause SELECT

  • When we have a calculated operation that generates an attribute after applying a aggregation operation

To access its attributes, you must use the alias followed by a period and the attribute name

SELECT

You can use the command SELECT to query the data stored in a database relationship

It will return a set of tuples from that relationship

The most common form of consultation will be:

\text{SELECT} A_1, \cdots, A_2, \cdots, A_n\\ \text{FROM} r_1, r_2, \cdots, r_m\\ \text{WHERE} P

Where every Ai represents an attribute, each ri a relationship and P it's a predicate

So an SQL query will have the following equivalent relational algebra expression:

\prod_{A_1, \cdots, A_2, \cdots, A_n} (\sigma p(r_1 \Theta r_2 \Theta \cdots \Theta r_m) )

If the clause is omitted WHERE, the predicate P it's always True

But unlike relational algebra expression, the result of an SQL query can contain multiple repeated tuples

When we introduce more than one relationship into the aclausula FROM, SQL will form a Cartesian product of all included relationships

The selection of relational algebra is then carried out using the predicate of the WHERE and then projects the result on the attributes of the clause SELECT

SELECT clause

The example resolved the query to get all the accounts in the relationship acount with attributes number_acount, name_branch, balance

In this case, when using DISTINCT we are forcing the repeated to be removed

The example resolved the query to get all the accounts in the relationship acount with attributes number_acount, name_branch, balance

In this case, when using ALL we are forcing the repeated ones not to be eliminated

The example replaced the command ALL by the symbol * and the attributes have been dispensed with, as the result is equivalent

The example resolved the query to get all the accounts in the relationship acount with attributes number_acount, name_branch, with an increase of 1500 in the balance

As can be seen, we can also use arithmetic expressions that operate on attributes

The example created a calculated attribute (non relationship) from those operations, giving it a name or alias to that operation, in this case it's called increase

FROM clause

The clause FROM itself defines a Cartesian product of the relationships that appear in the consultation

Because natural meeting is defined in terms of a Cartesian product, a selection, and a projection, it is relatively easy to write an SQL expression for the natural meeting

We have to write it in relational algebra as:

\prod_{A_1, \cdots, A_2, \cdots, A_n} (r_1 \Theta r_2 \Theta \cdots \Theta r_m)

The example resolves the query to get all customers who have an account at the 'Navacerrada

For this purpose, relationships have been introduced acount (you have been assigned the alias c) and client (you have been assigned the alias cl)

Only with the clause FROM we couldn't solve the query

Since the Cartesian product is made on each of the possible couples that can be formed with the relationship acount and client

Giving us repeated results and also all those who were not from the branch of 'Navacerrada

That's why the condition that relates both relationships, the attribute, has been added number_acount

After adding it, we will have only the values that belong to the relationship, although it can still have been repeated

And equality for the attribute name_branch with the value 'Navacerrada

JOIN Clause

The example used the clause JOIN that will give us back an equivalent result

The clause JOIN allows meeting operations, which take two relationships and return another relationship

The clause can normally be used FROM for external meetings or for any situation where a meeting can be used

To distinguish whether the meeting is external, the keyword will be used OUTER and for the intern INNER

Both are optional, although it makes it clearer what kind of meeting we're using

There are four types of meeting:

  • INNER JOIN

    Returns all tuples when there is at least one match in both relationships

    The example used the clause INNER JOIN that will give us back an equivalent result

    Because as we said, the keyword INNER was optional

    However, we now have clearer that the meeting was internal

  • LEFT OUTER JOIN

    Returns all tuples in the left relationship, and the matching tuples of the right relationship

    The example used the clause LEFT JOIN that will return a result that is no longer equivalent, as they will appear nulls values

    We've dispensed with the keyword OUTER because the result is equivalent

    In this case LEFT JOIN It keeps all tuples in the left relation (acount)

    The tuples of the right relationship (client) will be displayed if there is a match with those on the left

    If values exist in the left relationship but not in the right relationship, it will display null

  • RIGHT OUTER JOIN

    Returns all tuples in the relationship on the right, and the matching tuples of the relationship on the left

    The example used the clause RIGHT JOIN that will return a result that is no longer equivalent, as they will appear nulls values

    We've dispensed with the keyword OUTER because the result is equivalent

    In this case RIGHT JOIN keeps us all the tuples of the right relationship (client)

    The tuples of the left relationship (acount) will be displayed if there is a match with those on the right

    If there are values in the right relationship but not in the left relationship, it will display null

  • FULL OUTER JOIN

    Returns all tuples of the two relationships, left and right

    The example used the clause FULL JOIN that will return a result that is no longer equivalent, as they will appear nulls values

    We've dispensed with the keyword OUTER because the result is equivalent

    In this case FULL JOIN returns all rows in the left table (acount) and the right table (client)

    It combines the outcome of the LEFT JOIN and RIGHT JOIN

    Null will appear on each of the tables alternately when there is no match

The type of assembly defines how to treat tuples of each relation that do not marry the tuple of the other relation

There are three types of conditions that apply to the meetings:

  • meeting NATURAL

    It is used when the attributes which meet relationships have the same name

    The example used the clause NATURAL JOIN which will return an equivalent result without using the condition ON

    Since the attribute number_acount It called equal relations acount and client

  • condition ON

    Its use is similar to that of the clause WHERE, but it specializes in comparing attributes of both relations

  • condition USING

    Its use is similar to that of the meeting NATURAL, because it is used when the attributes which meet relationships have the same name

    However, only return a result of the meeting, not all as the meeting does NATURAL

    The example used condition USING with the attribute number_acount, which will return a result that is no longer equivalent

    It will only return only the first match of each pair of the relationship, not all

Using conditions is mandatory meeting at external meetings, but is optional on the internal (if omitted, a Cartesian product is obtained)

WHERE clause

In the example, the query get resolved only accounts of the relationship acount that meet the condition introduced in this case in its attribute balance have a value less than 100

To achieve this we have used the clause WHERE, thanks to which we can filter the query to admitting a condition using the logical comparison operator <

Logical comparison operators
Operator Description
< Less than
<= Less than or equal to
> Greater than
>= Greater than or equal to
= Like
<> Other than that

In the example, the query get resolved only accounts of the relationship acount that meet the condition introduced in this case in its attribute balance have a value equal to 'Navacerrada' and its attribute balance have a value less than 100

To achieve this we have used the logical connector AND

Logical connection operators
Operator Description
AND It's true if and only if the two operands are true
OR It's true if and only if the one or both operands are true
NOT Reverses the value of the operand
BETWEEN Operator

In the example, the query get resolved only accounts of the relationship acount that meet the condition introduced in this case in its attribute name_branch have a value equal to 'Navacerrada' and its attribute balance they have a value between 100 and 1000

The example used an equivalent query, using the comparison operator BETWEEN, which simplifies us some clauses WHERE

In the example it used a query negating the comparison operator NOT BETWEEN, which will return those values ​​outside that range

LIKE operator

In the example, the query get resolved only accounts of the relationship acount that meet the condition introduced in this case in its attribute name_branch have a value equal to 'Navacerrada

The example used an equivalent query, using the comparison operator LIKE, which simplifies us some clauses WHERE on allowing the use of patterns with strings

For the description of patterns we can use two special characters:

  • %

    The percentage matches any substring

  • _

    Underlining matches any character

These patterns can be combined to obtain different types of queries:

In the example it used the comparison operator LIKE using the pattern % after the text, which will return the values ​​that begin with the string 'Nava

In the example it used the comparison operator LIKE using the pattern % before and after the text, which will return values ​​that include as substring 'Nava

In the example it used the comparison operator LIKE using the pattern _ repeated three times, which will return values ​​that are exactly three characters

In the example it used the comparison operator LIKE using the pattern _ repeated three times and pattern %, Which will return to those values ​​that have the least three characters

In the example it used the comparison operator LIKE using the pattern % together with the special escape character '\' which will return those values ​​containing 'Nava%rrete

The escape character (which can be anything we choose) is used before special characters pattern to indicate that this character be treated as a normal character and not as a pattern

It is defined using the keyword escape

In the example it used the comparison operator NOT LIKE using the pattern %, which will return those values ​​that do not contain 'Nava%rrete

In this way we can also look for inconsistencies, but denying the operator LIKE

Nulls values

SQL allows the use of null values ​​to indicate the absence of information on the value of an attribute

You can use the special keyword NULL in a predicate to check if a value is null

In the example, the query get resolved all accounts whose name NO branch has been filled

The example resolved the query to get all accounts whose branch name has been filled in

As can be seen, it can also be checked the absence of nulls using the logical operator NOT

Since the predicate clauses WHERE may include Boolean operations such as AND, OR and NOT on the results of the comparisons, values NULL Boolean operations expand

AND TRUE FALSE NULL
TRUE TRUE FALSE NULL
FALSE FALSE FALSE FALSE
NULL NULL FALSE NULL
OR TRUE FALSE NULL
TRUE TRUE TRUE TRUE
FALSE TRUE FALSE NULL
NULL TRUE NULL NULL
NOT TRUE FALSE NULL
FALSE TRUE NULL

ORDER BY clause

Sometimes we need to return the information of the tuples in a certain order

SQL allows you to sort the attributes by using the clause ORDER BY

In the example it used the comparison operator ASC with clause ORDER BY for the attribute name_branch, Which will return the values ​​sorted in ascending order (in alphabetical order because it was an alphanumeric attribute) by the attribute name_branch

In the example it used the comparison operator DESC with clause ORDER BY for the attribute name_branch, Which will return the values ​​sorted in descending order (in alphabetical order because it was an alphanumeric attribute) by the attribute name_branch

The sort type returned by the comparison operators ASC and DESC depend on the type of the attribute because not ordered in the same way a number, a string or a date

If you need to order more than one attribute, we'll separated by commas and followed the comparison operator (ASC or DESC )

The default sorting method is ASC, is optional and could do without writing

Operations on sets

SQL allows operations on sets of relational algebra union \cup, intersection \cap and difference \neg

To use them we will have the only restriction that the names of the attributes of the relationships involved must be equal

UNION operation

The most common form of consultation will be:

\text{SELECT} A_1, \cdots, A_2, \cdots, A_n\\ \text{FROM} r_1\\ \text{WHERE} P\\ \text{UNION}\\ \text{SELECT} A_1, \cdots, A_2, \cdots, A_n\\ \text{FROM} r_2\\ \text{WHERE} Q

Where every Ai represents an attribute, each ri a relationship and P, Q are predicates

So a SQL query that contains a union will have the following equivalent expression of relational algebra:

\prod_{A_1, \cdots, A_2, \cdots, A_n} (\sigma p (r_1) ) \cup \prod_{A_1, \cdots, A_2, \cdots, A_n} (\sigma q (r_2) )

If the clause is omitted WHERE, the predicate P or Q (It depends on what is omitted), it is always true

The example query resolved all account numbers that exist in the bank or have a customer associated with them

Unlike the clause SELECT the operation UNION eliminates duplicate values

For this reason it is good idea to use operation UNION to create keys, because with SELECT would not get unique keys (allows repeated)

The example query resolved all account numbers that exist in the bank or have a customer associated with them

In the use operator ALL in conjunction with the operation UNION, we are allowing it to behave like a SELECT, as it supports repeated

INTERSECT operation

The most common form of consultation will be:

\text{SELECT} A_1, \cdots, A_2, \cdots, A_n\\ \text{FROM} r_1\\ \text{WHERE} P\\ \text{INTERSECT}\\ \text{SELECT} A_1, \cdots, A_2, \cdots, A_n\\ \text{FROM} r_2\\ \text{WHERE} Q

Where every Ai represents an attribute, each ri a relationship and P, Q are predicates

So a SQL query that contains an intersection will have the following equivalent expression of relational algebra:

\prod_{A_1, \cdots, A_2, \cdots, A_n} (\sigma p (r_1) ) \cap \prod_{A_1, \cdots, A_2, \cdots, A_n} (\sigma q (r_2) )

If the clause is omitted WHERE, the predicate P or Q (It depends on what is omitted), it is always true

The example query resolved all account numbers that exist in the bank and have a customer associated with them

The operation INTERSECT eliminates duplicate values ​​and returns only those values ​​that match the relationship acount and in client at once, not just one of them

The example query resolved all account numbers that exist in the bank and have a customer associated with them

In the use operator ALL in conjunction with the operation INTERSECT, we are allowing it to behave like a SELECT, as it supports repeated

EXCEPT operation

The most common form of consultation will be:

\text{SELECT} A_1, \cdots, A_2, \cdots, A_n\\ \text{FROM} r_1\\ \text{WHERE} P\\ \text{EXCEPT}\\ \text{SELECT} A_1, \cdots, A_2, \cdots, A_n\\ \text{FROM} r_2\\ \text{WHERE} Q

Where every Ai represents an attribute, each ri a relationship and P, Q are predicates

So a SQL query that contains a difference will have the following equivalent expression of relational algebra:

\prod_{A_1, \cdots, A_2, \cdots, A_n} (\sigma p (r_1) ) \cap \neg (\prod_{A_1, \cdots, A_2, \cdots, A_n} (\sigma q (r_2) ) )

If the clause is omitted WHERE, the predicate P or Q (It depends on what is omitted), it is always true

The example query resolved all account numbers that exist in the bank and do not have a customer associated with them

The operation EXCEPT eliminates duplicate values ​​and returns only those values ​​that match the relationship acount but do not match client

The example query resolved all account numbers that exist in the bank and do not have a customer associated with them

In the use operator ALL in conjunction with the operation EXCEPT, we are allowing it to behave like a SELECT, as it supports repeated

IN operation

SQL lets you check the membership of tuples to a relation which is equivalent to check the membership of a set of relational algebra

The most common form of consultation will be:

\text{SELECT} A_1, \cdots, A_2, \cdots, A_n\\ \text{FROM} r_1\\ \text{WHERE} P\\ \text{IN}\\ (\text{SELECT} A_1, \cdots, A_2, \cdots, A_n\\ \text{FROM} r_2\\ \text{WHERE} Q)

Where every Ai represents an attribute, each ri a relationship and P, Q are predicates

So a SQL query that contains a membership will have the following equivalent expression of relational algebra:

\prod_{A_1, \cdots, A_2, \cdots, A_n} (\sigma p (r_1) ) \in \prod_{A_1, \cdots, A_2, \cdots, A_n} (\sigma q (r_2) )

If the clause is omitted WHERE, the predicate P or Q (It depends on what is omitted), it is always true

The example resolves the query to get all customers who have an account at the 'Navacerrada

Previously we solved it using the clause JOIN

But as you can see, we can also solve it using set theory

Membership will return the values and if there are none, it will show null

Resolves example query to get all customers who do not have accounts on the branch 'Navacerrada

To deny membership will return all other values ​​and should be none, will show null

The example resolves the query to get all customers who have an account at the 'Logroño‘ ó ‘Navacerrada

As can be seen, belonging or not belonging apart from a subquery as in the previous examples, also it supports sets listed

Whenever these sets are listed in the same domain as the attribute that you want to check membership

Comparison operators

SQL provides the ability to compare sets using subqueries

SOME operator

The most common form of consultation will be:

\text{SELECT} A_1, \cdots, A_2, \cdots, A_n\\ \text{FROM} r_1\\ \text{WHERE} P\\ \text{SOME}\\ (\text{SELECT} A_1, \cdots, A_2, \cdots, A_n\\ \text{FROM} r_2\\ \text{WHERE} Q)

Where every Ai represents an attribute, each ri a relationship and P, Q are predicates

So a SQL query that contains a membership will have the following equivalent expression of relational algebra:

\prod_{A_1, \cdots, A_2, \cdots, A_n} (\sigma p (r_1) ) \approx \prod_{A_1, \cdots, A_2, \cdots, A_n} (\sigma q (r_2) )

If the clause is omitted WHERE, the predicate P or Q (It depends on what is omitted), it is always true

Example resolves the query to get a client who has an account on the branch 'Navacerrada

Previously we solved using the operation IN

But as can be seen, we can also solve using a comparison of sets

In this case using the symbol = in front of the operator SOME to denote that they are equal

Example resolves the query to get a client who does not have an account at the branch 'Navacerrada

Although it might seem that is equivalent to the previous example NOT INT, the result may not be the same

We can also use the rest of the logical comparison operators

ANY operator

The operator can also be used ANY since it is equivalent to SOME

Early versions of SQL only allowed the operator ANY, but was added SOME to be more asemejase to natural language, as created any ambiguity in the English language

The most common form of consultation will be:

\text{SELECT} A_1, \cdots, A_2, \cdots, A_n\\ \text{FROM} r_1\\ \text{WHERE} P\\ \text{ANY}\\ (\text{SELECT} A_1, \cdots, A_2, \cdots, A_n\\ \text{FROM} r_2\\ \text{WHERE} Q)

Where every Ai represents an attribute, each ri a relationship and P, Q are predicates

So a SQL query that contains a membership will have the following equivalent expression of relational algebra:

\prod_{A_1, \cdots, A_2, \cdots, A_n} (\sigma p (r_1) ) \approx \prod_{A_1, \cdots, A_2, \cdots, A_n} (\sigma q (r_2) )

If the clause is omitted WHERE, the predicate P or Q (It depends on what is omitted), it is always true

We can also use the rest of the logical comparison operators

Example resolves the query to get a client who has an account on the branch 'Navacerrada

Previously we solved using the operation SOME

As seen is fully equivalent and would use only for compatibility with older versions of SQL

ALL operator

The most common form of consultation will be:

\text{SELECT} A_1, \cdots, A_2, \cdots, A_n\\ \text{FROM} r_1\\ \text{WHERE} P\\ \text{ALL}\\ (\text{SELECT} A_1, \cdots, A_2, \cdots, A_n\\ \text{FROM} r_2\\ \text{WHERE} Q)

Where every Ai represents an attribute, each ri a relationship and P, Q are predicates

So a SQL query that contains a membership will have the following equivalent expression of relational algebra:

\prod_{A_1, \cdots, A_2, \cdots, A_n} (\sigma p (r_1) ) \forall \prod_{A_1, \cdots, A_2, \cdots, A_n} (\sigma q (r_2) )

If the clause is omitted WHERE, the predicate P or Q (It depends on what is omitted), it is always true

We can also use the rest of the logical comparison operators

Resolves example query to get all customers who have accounts in the following branches'Navacerrada' (alphabetical)

Actually the use the symbol > in front of the operator ALL we are denoting that we look for all values that have an alphabetical order greater than 'Navacerrada

Repeated control

SQL lets you control whether a subquery has repeated tuples in its result

The most common form of consultation will be:

\text{SELECT} A_1, \cdots, A_2, \cdots, A_n\\ \text{FROM} r_1\\ \text{WHERE UNIQUE}\\ (\text{SELECT} A_1, \cdots, A_2, \cdots, A_n\\ \text{FROM} r_2, \cdots, r_n\\ \text{WHERE} P)

Where every Ai represents an attribute, each ri a relationship and P is preached

So a SQL query that contains a membership will have the following equivalent expression of relational algebra:

\prod_{A_1, \cdots, A_2, \cdots, A_n} (\sigma \exists \prod_{A_1, \cdots, A_2, \cdots, A_n} (\sigma p (r_2, \cdots, r_n) ) (r_1) )

If the clause is omitted WHERE, the predicate P, es siempre cierto

Formally assessment UNIQUE on a relationship is defined as false if and only if the ratio of two tuples ri and rk such that rik

Like checking ri = rk it is false if any attribute ri or rk is zero, it is possible that the outcome of UNIQUE is true even if several duplicates of the same tuple

Provided that at least one of the attributes of the tuple is null

Resolves example query to get all customers who have at most one account at the branch 'Navacerrada

This has been introduced relations account (has been assigned the alias c) and client (has been assigned the alias cl)

Only with the clause FROM we couldn't solve the query

We could have used the clause JOIN, although there may still be repeated

Why it has added the operator UNIQUE relating the two relationships with the attribute number_acount and equality for the attribute name_branch with the value 'Navacerrada

After adding them, we only values ​​pertaining to the relationship

Resolves example query to get all customers who have more than one account at the branch office 'Navacerrada

As can be seen, the denial of UNIQUE we can check more than one repeated

Aggregation functions

The aggregation functions are functions that take a collection (a set or multiset) value as input and return a single value

SQL allows for five aggregation functions:

  • AVG

    Allows the arithmetic mean of the collection, which must be numeric

    If there are nulls values at the entrance, they are ignored

    In the example query is resolved to obtain the arithmetic mean of the balance of all accounts

    It was created a calculated attribute (not belonging to the relation) from these operations, assigning a name or alias to that operation, in this case it's called average

  • MIN

    It allows determine minimum value of the collection, which is not required to be numerical

    If there are nulls values at the entrance, they are ignored

    In the example query is resolved to obtain the lowest balance of all accounts

    It was created a calculated attribute (not belonging to the relation) from these operations, assigning a name or alias to that operation, in this case it's called less

  • MAX

    It allows determine the maximum value of the collection, which is not required to be numerical

    If there are nulls values at the entrance, they are ignored

    In the example query is resolved to obtain the highest balance of all accounts

    It was created a calculated attribute (not belonging to the relation) from these operations, assigning a name or alias to that operation, in this case it's called greater

  • SUM

    Allows the sum of the collection, which must be numeric

    If there are nulls values at the entrance, they are ignored

    In the example query is resolved to obtain the total balance of all accounts

    It was created a calculated attribute (not belonging to the relation) from these operations, assigning a name or alias to that operation, in this case it's called liquid

  • COUNT

    Counts the number of elements that make up the collection, which is not required to be numerical

    If there are nulls values at the entrance, they are taken as 0

    In the example, the query get resolved many accounts are there altogether

    It was created a calculated attribute (not belonging to the relation) from these operations, assigning a name or alias to that operation, in this case it's called total

GROUP BY clause

There are circumstances in which it would be desirable to apply the aggregation functions not only to a single set of tuples, but also a group of sets of tuples

In SQL we can specify by clause GROUP BY

The most common form of consultation will be:

\text{SELECT} A_1, \cdots, A_2, \cdots, A_n\\ \text{FROM} r\\ \text{WHERE} P\\ \text{GROUP BY} A_1, \cdots, A_2, \cdots, A_n\\ \text{HAVING} Q

Where every Ai represents an attribute, each ri a relationship and P, Q are predicates

So a SQL query containing a group will have the following equivalent expression of relational algebra:

\prod_{A_1, \cdots, A_2, \cdots, A_n} (\sigma q (\prod_{A_1, \cdots, A_2, \cdots, A_n} (\sigma p (r) ) )

If the provisions are omitted WHERE or HAVING, the predicate P or Q (It depends on what is omitted), it is always true

In the example, the query get resolved there are many accounts in total for each branch

It was created a calculated attribute (not belonging to the relation) from these operations, assigning a name or alias to that operation, in this case it's called total

In order for the grouping operation can be performed without errors we use the same order of attributes in both the clause SELECT as in GROUP BY

In case you are not going to intervene in the group, we can omit these attributes placing them on the right of those that will be used

HAVING clause

Sometimes we need a condition that applies to groups rather than tuples

To express such queries use the clause HAVING

In the example, the query get resolved there are many accounts in total branch 'Navacerrada

It was created a calculated attribute (not belonging to the relation) from these operations, assigning a name or alias to that operation, in this case it's called total

It has been applied on the attribute condition name_branch with the value 'Navacerrada' using clause HAVING to apply on the grouping and not on the relationship

In the event that a clause would WHERE and one HAVING:

  1. SQL first apply the predicate clause WHERE
  2. The result will be divided into groups that satisfy the predicate clause HAVING

In the example, the query get resolved there are many accounts in total branch 'Navacerrada' and have a balance greater than 100

It was created a calculated attribute (not belonging to the relation) from these operations, assigning a name or alias to that operation, in this case it's called total

It has been applied on the attribute condition name_branch with the value 'Navacerrada

Additionally the condition applied on the attribute balance applying the grouping operation COUNT with the proviso that they are older than 100

Both have been on the clause aplicacadas HAVING

As can be seen, they can be applied without problems aggregation operations within clause HAVING but also they appear in clause SELECT

INSERT

Newly created relationships are initially empty

You can use the command INSERT to add data to the relationship

The most common way to insert query will be:

\text{INSERT INTO r}\\ \text{VALUES} (A_1, A_2, \cdots, A_n)

Where every Ai It represents an attribute, r relationship and each Pi it's a predicate

The example has solved the query to add the C-9732 has the branch 'Navacerrada' with a balance of 1200 €

The values are specified in the order in which the corresponding attributes are related in the relationship schema

The example has changed the order in which the corresponding attributes are related in the relationship schema and that is why the data has been entered in a mandatory manner, in the new order that we have specified

In this way we can choose in what order we enter the data

In the case of not introducing an attribute, SQL will try to enter a default or null value and if the primary key, notify the error and prevent the update

In the example, the query has decided to add the client 'Pedro Almodovar' for all account numbers that exist in the branch'Navacerrada' and they do not have a client associated with them

Instead of specifying a tuple it has been used as a subquery clause SELECT will allow us to specify a set of tuples

The most common form of subquery for insertion will be:

\text{INSERT INTO} r_1\\ \text{SELECT} A_1, \cdots, A_2, \cdots, A_n\\ \text{FROM} r_2\\ \text{WHERE} P

Where every Ai It represents an attribute, ri relationships and P it's a predicate

The clause is evaluated first SELECT and are inserted in the relation client tuples obtained from the subquery

We must be careful to the nested subqueries use, since abusing them, could create an infinite loop subqueries and insertion would be made infinitely, being locked data base

Luckily management systems databases have mechanisms or tools for mass insertion in a controlled manner avoiding blocking of the database

UPDATE

You can use the command UPDATE to modify the values ​​of a set of tuples existing relationship

The most common form of update query is:

\text{UPDATE r}\\ \text{SET} A_1 = P_1, A_2 = P_2, \cdots, A_n = P_n

Where every Ai It represents an attribute, r relationship and each Pi it's a predicate

In the example it has been updated relationship called acount, in particular to the attribute balance we have changed their value 100

The command will update all tuples of the relationship acount

In the example they have been updated only tuples that satisfy the condition introduced in this case in its attribute number_acount have a value equal to 'C-9732

Case operator

The order of two instructions UPDATE It is important because if the order of the upgrade instructions changes we can get inconsistent data

To solve this problem we have the operator CASE

The most common form of update query is:

\text{UPDATE r}\\ \text{SET} A_1 = P_1, A_2 = P_2, \cdots, A_n = P_n

Where every Ai It represents an attribute, r relationship and each Pi is a predicate which can be replaced by the following predicate Q:

\text{CASE}\\ \text{WHEN} P_1 \text{THEN} r_1\\ \text{WHEN} P_2 \text{THEN} r_2\\ \cdots\\ \text{WHEN} P_n \text{THEN} r_n\\ \text{ELSE} r_0\\ \text{END}

Where every Pi each predicate can be satisfied, each ri the result to be used in updating

If not met any of the predicates, the result would apply r0

In the example it has been resolved update query to add 0.05 to the balance when it is less than or equal to 100 with a modification of 0.05, when it is equal to 500 with a change of 0.07 when it is greater than 1000 with a modification of 0.25 and a change of 0.01 in any other case

Clause has been used CASE to solve for the equate with the attribute balance which it was the value update

DELETE

You can use the command DELETE to erase tuples from a relationship

The example has deleted the relationship called acount

The command will clear all tuples from the relationship acount

And only in this relationship, because it is not able to work on several

The most common form of consultation will be deleted:

\text{DELETE FROM r}\\ \text{WHERE} P

Where r It represents the relationship and each P it's a predicate

In the example, only tuples that meet the introduced condition have been deleted, in this case those in their attribute balance have a value less than 100

It should be borne in mind that the command DELETE first check each tuple of the relation acount and then erase all tuples that meet the condition

It is important to perform all checks before erasing since the order in which tuples are processed can vary the erase operation

Views

Views

The views are used in situations where for safety or performance, we want to hide certain data to users

In the example, a client needs to know the number of a loan operation to its name, the 'C-9732' and the name of the branch that has processed, but for that operation does not need to see the balance or more related data

With the above query we solved what the client needed, but we gave more data than necessary and may put the bank in a situation that could be unsafe

Relationships that are not part of the logical model but are visible to users through virtual relationships are called views

Basic definition of views

CREATE VIEW

Relationships are defined by the command CREATE VIEW:

\text{CREATE VIEW v} AS r

Where r is a query expression and v is the name of the view

The example has created a view that tries to solve the problem of the above example

However, even though we have improved safety to the user not show balances, does not solve the problem, we should still do the following subquery to solve:

The example has treated the view as if it were a more

However, for the management system database it is a virtual connection being a view

Being a virtual relationship we can do query operations SELECT

But we can not do update operations as INSERT, UPDATE or DELETE

In the example we have created a view called loan(name_branch, total) that allows you to get the names of all branches and the total balance they have

As shown, it is possible to return any attribute and usually work with queries

In this case the attribute has been returned name_branch and it used the aggregation function COUNT after using clause GROUP BY with attributes name_branch and balance

In the example it defined a view from another to solve the query that retrieves all account numbers branch 'Navacerrada

So we've used the view named all_accounts

Since it is taken as a virtual relationship management system for the database to be a view

Keep in mind that nest views on themselves can produce a recursive effect and if ill defined block database falling into an infinite loop

Updating views

Generally a view can not be updated, however a management database system can allow make in individual cases

You have to consult the manual system manager databases to know what specific cases permits and how to solve it

In the event that updating is permitted, the view must meet:

  • The clause FROM only has a relationship
  • The clause SELECT only contains names of attributes of the relationship without alias, added value without specifying DISTINCT
  • Any attribute that does not appear in clause SELECT should be defined as null
  • The query must not contain aggregate functions (GROUP BY clause, HAVING clause)

DROP VIEW

You can use the command DROP VIEW to remove a view

And only from that view, as he is not able to work on several

The most common form of view deletion query will be:

\text{DROP VIEW v}

Where v is the name of the view

In the example it has deleted the view named all_accounts

WITH clause

The clause WITH was introduced in SQL: 1999 standard and allows you to define a temporary view

This temporary view can only be used in the query which has been defined

The most common way is:

\text{WITH} v \text{AS} r_0\newline\\ \text{SELECT} A_1, \cdots, A_2, \cdots, A_n\\ \text{FROM} r_0, r_1, r_2, \cdots, r_m, r\\ \text{WHERE} P

Where every Ai represents an attribute, each ri a relationship (r0 is a query), P is a predicate and v is the name of the temporary view

This example is equivalent to the view that we saw in a previous example

Only this time we have not used a defined view in the database

We have used a defined time view with clause WITH

Transactions

Transactions

The transactions They are a sequence of instructions to query or update

The SQL standard specifies that a transaction implicitly begins when a SQL statement is executed

At the end of the instruction must run one of the following SQL statements:

  • COMMIT WORK

    Validates updates made: so far and done so to become permanent in the database

    A complete turn, automatically starts the next transaction

    In the example we have created a relationship called acount and we populated with 7 tuples

    The query or update instructions are separated from each other by the symbol ; indicating that SQL is a new instruction

    At the end we have implemented COMMIT so that the changes are saved to the database

  • ROLLBACK WORK

    Causes the current transaction to be aborted, that is, it cancels all operations performed on the database

    The database returns to the initial state, before the first instruction of the transaction

    In the example we have created a relationship called acount and we populated with 7 tuples

    The query or update instructions are separated from each other by the symbol ; indicating that SQL is a new instruction

    At the end we have implemented ROLLBACK so that the changes will not be kept in the database

The key word WORK is optional in both directions, so we could skip in the examples

Use ROLLBACK It is useful when an error is detected and it is necessary to a stable state of the database

Once it executed COMMIT, it is not possible to make a ROLLBACK, as data changes become permanent

Therefore, when the database systems detect a failure (either a voltage drop, a drop of the system, a bad connection in the network), internally perform one ROLLBACK it is denied the use of the COMMIT, to prevent the loss of information

In the special case of the system crash, the ROLLBACK will be done when the system restarts

Starting with the SQL:1999 standard, the contents of a transaction can be enclosed in the following format:

\text{BEGIN}\\ S_1;\\ S_2;\\ \cdots;\\ S_n;\\ \text{END}

Where si it's a consultation instruction or an update

The example is similar to that shown above for COMMIT

However, on this occasion we have locked up the transaction in the format of the SQL:1999 standard

This way we can tell where different transactions start and end without having to look for instructions COMMIT or ROLLBACK

JDBC

JDBC

JDBC is a set of Java classes and interfaces for executing statements SQL

It is the CLI part of Java (Call-Level Interface)

It was jointly developed by JavaSoft, Sybase, Informix and IBM among others

JDBC allows the manipulation of any SQL database

You don't need to do a specific program to manipulate Oracle, Sybase, etc...

Our program can handle any database

Bringing together Java with JDBC, we get programs that can run on any platform, and they can manipulate any database

The classes and interfaces JDBC is located within the package java.sql

Process of working with JDBC

The JDBC worker process consists of the following steps:

  1. Connect to the database (using DriverManager)

  2. Issue SQL statements (using Statement, PreparedStatement, CallableStatement)

  3. Processing the results (ResultSet)

JDBC Driver

Each particular database implements the JDBC interface in a particular way, as do the helper and utility classes that are required for that database

That is why they are necessary drivers of databases

The same application will be able to connect to different databases simply by changing that driver (we should not confuse the database driver with the class Driver, because they are different things)

The interface Driver specifies the methods that any JDBC driver must implement, so drivers can be loaded in two ways:

  1. When you start the class Driver, the DriverManager consultation the property jdbc.drivers

    This property contains a list of drivers (classes) that must be loaded

    To do this, you must run a command like this (to use ODBC) on the command line:

  2. If you want to enter a new driver after the DriverManager is initialized we should use the method forname of the class Class

    To do this we must include in our code the following statement (to use ODBC):

    It is advisable to use of static in our application (because the driver will only load once, when loading the class)

Examples of drivers
Database Driver
ODBC sun.jdbc.odbc.JdbcOdbcDriver
Oracle oracle.jdbc.driver.OracleDriver
SQLServer com.microsoft.jdbc.sqlserver.SQLServerDriver
MySQL com.mysql.jdbcDriver

Tipos de driver:

  • Bridge JDBC-ODBC

    Translates JDBC to ODBC and then relays it to the ODBC driver of the machine

    Is the ODBC driver actually communicates with the database

    Is included in the JDK but does not include JDBC2

    Inconvenience:

    • It's useful for testing, but it's slow in production

    • ODBC driver required on client (lower portability)

  • Driver JDBC on a native driver of the database

    Relays JDBC the native driver installed on the machine

    The native driver is the one that actually communicates with the database

    Drawback:

    • need for native driver (lower portability)
  • Driver Java on network

    Translates JDBC calls into a network protocol independent of the platform that contact with the server

    The server translates these requests to the specific protocol of each database

    It uses a middleware on the network server that is capable of connecting customers with pure Java to many different databases

    Advantage:

    • is fast, platform-independent and requires no installation on the client

  • Driver pure Java and native protocol

    Translates the JDBC calls to the specific protocol of the database by directly contacting it

Get connections

Using the specific driver we will make the connections, but connection requests must be made using the DriverManager

Once the class Driver has been uploaded and registered, the DriverManager you can establish connections to the database by using these two steps:

  1. The method is called DriverManager.getConnection(url, user, pass); and you get an object of type Connection

  2. The DriverManager tests the registered drivers to see if you can establish the connection and if it wasn't possible, launch a SQLException

A single application can have multiple connections to the same database or multiple connections to other databases (up to the maximum allowed by the database)

The parameters supported by getConnection are the url (which is a subprotocol that uses the database to make the connection), the user (the username to be connected) and the pass (the password that the user uses to connect)

The parameter url has the following format:

The subprotocol is particular to each database, and uses the DriverManager to find the proper driver to handle it

The subname depends on the subprotocol specific

The driver will be responsible to interpret and will help you to locate the database

Our app won't work with the specific driver, but will work on the DriverManager

In this way applications can work with the object Connection without worrying about the type of database we're working with (no code modifications need to be made, just change the parameter url)

The connection should always be closed at the end, because otherwise resources are consumed unnecessarily, although connections can be reused, it is not advisable, it is better to always use a new connection

In the example we have used a skeleton of connection to Oracle

To use the connection we need to create an object of type Connection, which represents an open session with the database

The object provides a context with which to issue SQL statements and get results

The object Connection must be initialized to an initial null, so we can check if there was a connection because we'll include a block try to handle bugs, which will launch SQLException

To close the connection must be made using the method close(), within a finaly (which will also have its block try that spear SQLException)

The interface Statement

The connection allows us to create objects Statement by using the method createStatemen()

The objects Statement allow sql statements to be executed and results obtained (using the objects ResultSet)

The methods to execute SQL accept String (JDBC is CLI), which can be composed dynamically based on values contained in variables

To compose the String SQL, concatenate the different SQL fragments (static and variable)

There are three ways to execute SQL statements:

  1. By using the method executeQuery(<sql>) for queries using the statement SELECT that produces tuples as a result (returns an object of type ResultSet)

    • It is like a table that stores the result of the query

    • You have a number of query methods

    • It's like a cursor (in PL/SQL terminology)

  2. By using the method executeUpdate(<sql>) for updates by using the sentences INSERT, DELETE, UPDATE, as well as DDL commands (CREATE, DROP, ALTER TABLE, ADD) and PL/SQL blocks (between begin and end blocks)

    • Returns an integer that indicates the number of rows modified by the command

    • With DDL commands it returns 0

  3. By using the method execute(<sql>) that executes any SQL statement

    • If we used a QUERYReturn True

      • The ResultSet can be obtained by the method getResultSet()

    • If we used a UPDATEReturn False

      • The total number of modified rows can be obtained by using the method getUpdateCount()

After processing the result, the Statement by using the method close()

This method will also close the ResultSet associate, however, Sun recommended closing the ResultSet explicitly to avoid errors unwanted

The same Statement can be reused in the same connection to execute different statements

The interface ResultSet

The object ResultSet acts as a cursor within the results

The first time you read it points to the first result, but does not read it, so you have to read the first row moving forward using the method next()

The method next() returns True if you were able to move forward or False if you could not

To obtain the values of the ResultSet methods are used get that have the following format get<type>(<column>)

To name the columns we can either do it by name or by an index, which starts at 1, their numbering is given by the order in which it was entered in the SELECT

You have to pay attention to the dates, because they are kept as java.sql.Date, not as java.util.Date as you might expect

It should be close ResultSet although it is implicitly closes when you close or reuse the Statement that created it

When you have read a null SQL using one of the methods get<tipo>, it returns:

  • A value null Java for those methods that return Java objects (getString(), getBigDecimal(), getDate(), getTime(), getTimestamp(), getObject(), etc)

  • A value of 0 for those methods that return numeric types (getByte(), getShort(), getInt(), getLong(), getFloat(), getDouble() )

  • A value False for the method getBoolean()

To determine if a given value was null, you must first try to read the column and use the ResulSet wasNull() to know if it was null

Return True yes it was, False otherwise

The interface PreparedStatement

Every time you launch a Statement the database must interpret it and calculate a query plan

But when using PreparedStatement, can be executed multiple times, getting an increase in performance by having the query already analyzed and optimized

Not all databases support PreparedStatement, you have to read the documentation of it to know if they can be used

The objects PreparedStatement derived from the Statement obtained from the connection, using the prepareStatement(<sql>);

A PreparedStatement to launch precompiled SQL statements, you can parameterize one or more entries using the ?, whose values may be changed in different executions of the statement

Parameterize entries is useful when we do not know the values of certain SQL data types in the target database,

The parameterized value gets the correct value from the database driver, so we won't have to worry about the types

Before the SQL statement can be executed we will need to assign a value to the input parameters

To assign values use the methods set<type>(column, value); being the type compatible with the parameter

To run methods are used execute of Statement

In the example we have used a PreparedStatement for Oracle