# B-Trees

## Overview

B-Trees are **multi-way search trees** with properties:

- Remains balanced after updates.
- Each node has at least
`(n - 1) / 2`

entries in it. - Each tree node occupies an entire disk page.

B-Trees are better than general multi-way search trees:

- Better storage utilisation.
- Better worst case performance.

## Selection with B-Trees

### One Queries

```
def find(key, tree) -> Node:
return search(key, root_of(tree))
def search(key, node: Node) -> Node:
if is_leaf(node):
return node
keys = # array of nk key values in node
pages = # array of nk + 1 ptrs to child nodes
if key <= keys[0]:
return search(key, pages[0])
elif k > keys[nk - 1]:
return search(key, pages[nk])
else:
for i in range(0, nk):
if keys[i] < keys <= keys[i + 1]:
return search(key, pages[i + 1])
```

`Cost_one = (D + 1)_r`

### Range Queries

```
lowNode = find(lowKey, tree)
curNode = lowNode
while curNode.val <= highKey:
# add pageOf(tid) to Pages to be scanned
# each curNode has pointer to immediately right neighbour on same level
curNode = curNode.next
# scan Pages looking for matching tuples
```

`Cost_range = (D + b_i + b_q)_r`

.

## Insertion

Overview of method:

- Find leaf node and position in node where new key belongs.
- If node is not full, insert entry into appropriate position.
- If node is full:
- Promote middle element to parent.
- Split node into two half full-nodes (< middle, >= middle).
- Insert new key into appropriate half-full node.

- If parent full, split and promote upwards.
- If reach root, and root is full, make new root upwards.

### Insertion Cost

`Cost_insert = Cost_treeSearch + Cost_treeInsert + Cost_dataInsert`

.

**Best case:** write one page (most of time).

- Traverse from root to leaf.
- Read/write data page, write updated leaf.
`Cost_insert = D_r + 1_w + 1_r + 1_w`

**Common case:** 3 node writes (rearrange 2 leaves + parent).

- Traverse from root to leaf, holding nodes in buffer.
- Read/write data page.
- Update/write leaf, parent and sibling.
`Cost_insert = D_r + 3_w + 1_r + 1_w`

.

**Worst case:** propagate to root.

- Traverse from root to leaf.
- Read/write data page.
- Update/write leaf, parent and sibling.
- Repeat previous step
`(D - 1)`

times. `Cost_insert = D_r + D * 3_w + 1_r + 1_w`

.