Some Practice Problems

The questions appear in this font

And the answers in this font

SQL Overview

Beer SQL and Relational Algebra

Consider the beer drinkers' database with the tables

FREQUENT(Drinker, Bar)

SERVES(Bar, Beer)

LIKES(Drinker, Beer)

SELLS(Bar, Beer, Amount)

 

Beer Aggregation

Find the Bar that sells the highest total Amount of Beer. Hint: You can use a nested query in the HAVING clause of an SQL query.

SELECT Bar FROM SELLS

GROUPBY Bar

HAVING SUM(Amount) >= ALL

(SELECT SUM(Amount) FROM SELLS GROUPBY Bar)

Storage

Sequential Vs Random Access

Consider a table R which is stored in consecutive pages of the disk. We know that reading pages of the disk in the sequential order in which they are stored is at least ten times faster than reading pages in some random order. Give two reasons why we may still prefer to read in random order instead of reading sequentially.

Indexing

No More Indexes Please…

Provide reasons why we should not keep an index on an attribute A of some relation R.

Estimating the Cost of Operations on a Dense Index

Consider a relation R(K,A,B,…) where the attribute K is an integer key. The relation is sorted along the key K, i.e., K is a primary key. Then there is a multilevel dense index on K. Assume that the top level of the multilevel index always has just one page.

Each tuple has a block pointer pointing to the next tuple. Also, each index entry has a pointer to the next entry. As usual, the system attempts to "pack" consecutive tuples in contiguous blocks and uses the overflow when this is not possible.

Assume that n tuples are inserted in R.

Questions

  1. Assume that initially the tuples with keys m*i, i=1,…,n and the corresponding index entries are stored in consecutive blocks, no overflow is needed and space utilization is 66%. Display the state of the indexes and the relation assuming p=6, r=44, k=20, d=160, m=1000, n=30.
  2.  Assume that at every minute j there's a burst of inserting 100 tuples with keys m*j+k, k=1,…,l in ascending order. If you were the administrator how often would you reorganize this database?

The Weird Properties of C+ Trees

The C+ trees are a variation of B+ trees. They have the following properties:

Write an O(logn) procedure for insertion of a new element e into a C+ tree.

The problem with insertion in a C+ tree is that if the page p, where e should appear, is full we cannot simply allocate one more page and split the elements of p because then each page will have (3m+1)/4 elements - unacceptable. Furthermore, it may not be sufficient to find adjacent sibling pages p1 p2 … pa merge their contents and split them in a+1 pages. Here goes the solution

locate the leaf page p where e should appear

if p has less than 3m/2 elements (before the insertion)

insert e in p

if e is now the first element in p

update the appropriate parent node

else if p has exactly 3m/2 elements and there is an adjacent sibling p' with less than 3m/2 elements

if p' is the left adjacent sibling of p

transfer the first element of p to p'

insert e in p

update the appropriate parent node of p to reflect the change of the first element of p

if p' is the right adjacent sibling of p

if e is the largest than any element of p

insert e in p'

update the parent node to reflect that e is the first element of p'

else

transfer the last element of p to p'

update the parent node to reflect the new first element of p'

insert e in p

else if p has no adjacent sibling with less than 3m/2 elements

pick randomly one adjacent sibling p' of p

(Note that p' definitely has 3m/2 elements. For simplicity let us assume that p' is the right adjacent sibling of p. The case for left adjacent sibling is similar.)

concatenate the elements of p', p, and the element e into a list l and then sort l. (note that l has exactly 3m/2 + 3m/2 + 1 = 3m+1 elements)

allocate a new page p''

put the first m elements of l in p

put the second m elements of l in p'

put the last m+1 elements of l in p''

update the parent to reflect the new first element of p'

insert in the parent node of p the first element of p'' along with a pointer to p''

 

Query Processing

B+ trees for joins

Consider

Check the join techniques, from the list below, that can make use of the index. For each one, describe very briefly why the index is useful (i.e., how it is used) if it is useful.

Consider only the following four join techniques:

  1. nested loops join
  2. index join
  3. merge join
  4. hash join

A B+ tree index is useful for

 

Relational Algebra Transformation Rules

For each one of the two rules below state whether the rule holds or not. If the rule holds show why every tuple generated by the left hand expression will also be generated by the right hand expression, for every R and S. Otherwise generate an example where the rule breaks down.

Assume R1 and R2 have the same schema, they have an attribute named B, and they do not have an A attribute. S has an A attribute.

 

Choosing Selection Technique

System R-CSE232 supports the techniques of System R for performing selections and in addition it supports a selection execution technique based on pointer intersection. This is the complete list of selections that System R-CSE232 can do:

Consider the following query

SELECT * FROM R WHERE B=10 AND C=20 AND D=30

Consider the following statistics

State which is the best plan for each of the following combinations of V(B,R), V(C,R), and V(D,R). Provide approximate calculations that prove your statement (do not worry for non-dominating factors in the cost.)

For example, you may say sD=30SCAN(sB=10 AND C=20PINTR) is the optimal plan for the third case.

Hint: It is only the first selection that has a cost and is important. Whatever conditions are not handled by the first selection can be put in a subsequent scan selection that is executed directly on the result of the first selection. In the example, sD=30SCANR does not cost anything.