Nested-Loop Join

Basic Strategy

For the relational operation Join[a=b](R, S):

result = set()

for pageR in R:
    for pageS in S:
        for tupleR, tupleS in cartesianProduct(pageR, pageS):
            if tupleR.a == tupleS.b:
                newTuple = concat(tupleR, tupleS)
                result = result.union(newTuple)

Needs input buffers for R and S and output buffer for joined tuples.

Cost = b_R * b_S (without buffering).

Block Nested Loop Join

Method (with N memory buffers):

Read N - 2 page chunks of R into memory buffers.
For each page of S:
- Check join condition on all (t_r, t_s) pairs in buffers.
Repeat for all N - 2 page chunks of R.

Cost analysis:

Best case: b_R <= N - 2.
- Read b_R pages of relation R into buffers.
- While whole R is buffered, read b_S pages of S.
- Cost = b_R + b_S.
Typical case: b_R > N - 2.
- Read ceil(b_r / (n - 2)) chunks of pages from R.
- For each chunk, read b_S pages of S.
- Cost = b_R + b_S * ceil(b_R / (N - 2)).

Note: always requires r_R * r_S checks of the join condition.

Index Nested Loop Join

Problem with nested-loop join:

Needs repeated scans of entire inner relation S.

If there is an index on S, we can avoid such repeated scanning. Consider Join[i=j](R, S):

for tupleR in R:
    # use index to select tuples from S where s.j = r.i
    for s in selectedTuples(S):
        # add (r, s) to result

This method requires:

One scan of R relation (b_R):
- Only one buffer needed, since we use R tuple-at-a-time.
For each tuple in R (r_R), one index lookup on S.
- Cost depends on type of index and number of results.
- Best case is when each R.i matches few S tuples.

Cost = b_R + r_R * Sel_S where Sel_S is the cost of performing a select on S.

Typically Sel_S =

1 or 2 for hashing.
b_q for unclustered index.

Tradeoff: r_R * Sel_S vs b_R * b_S where b_R << r_R and Sel_S << b_s.