Heap Files

Overview

Heap files are:

Sequence of pages containing tuples.
No inherent ordering of tuples.
Pages may contain free space from deleted tuples.
Does not generally involve overflow pages.

Selection in Heaps

No strategy other than a linear scan.

Insertion

Main strategy:

Append tuple to file in last page.

Alternative strategy:

Find any page from for the relation with enough space.
Preferably a page already loaded into a buffer.

PostgreSQL strategy:

Use last updated page of relation in buffer pool if enough space.
Otherwise, search buffer pool for page with enough space.
Assisted by free space map (FSM) associated with each table.

Oversized Tuples

for attribute in t:
    if attribute not oversized:
        continue
    
    off = appendToFile(overflow_file, attribute)
    attribute = (OVERSIZE, off)

# insert into buffer as before

PostgreSQL Implementation

heap_insert(Relation, relation, HeapTuple newTup, CommandId cid, ...)

Finds page which has enough free space for newtup.
Ensures page loaded into buffer pool and locked.
Copies tuple data into page buffer, sets xmin, etc.
Marks buffer as dirty.
Writes details of insertion in transaction log.
Returns OID of new tuple if relation has OIDs.

Deletion in Heaps

rel = openRelation("RelationName", READ | WRITE)

for page_num page in enumerate(pages(rel)):
    n_deletions = 0
    for tuple_num, _tuple in enumerate(tuples(page)):
        if satisfies_condition(_tuple):
            n_deletions += 1
            delete_record(buf, tuple_num)

    if n_deletions > 0:
        put_page(rel, page_num, buf)
    if n_deletions > 0 and unique:
        break

PostgreSQL Implementation

heap_delete(Relation rel, ItemPointer tid, ..., CommandId cid, ...)

Gets page containing tuple tid into buffer pool and locks it.
Sets flags, commandID and xmax in tuple; dirties buffer.
Writes indication of deletion to transaction log.

Vacuuming eventually compacts space in each page.

Updates in Heaps

Analysis for updates similar to deletion:

Scan all pages.
Replace any updated tuples (within each page).
Write affected pages to disk.

Complication: new tuple larger than old version (too big for page). Solution: delete, reorganise free space, then insert.

PostgreSQL Implementation

heap_update(Relation rel, ItemPointer otid, HeapTuple newtup, ..., CommandId cid, ...)

Essentially does delete(otid), then insert(newtup).
Also sets old tuple’s ctid field to reference new tuple.
Can also update-in-place if no referencing transactions.

Heaps in PostgreSQL

PostgreSQL stores all table data in heap files by default.

Typically there are also associated index files.

If a file is more useful in some other form:

PostgreSQL may make a transformed copy during query execution.
Programmer can set it via create index ... using hash.

PostgreSQL “heap file” may use multiple physical files.

Files are named after the OID of the corresponding table.
First data file is called simply OID.
If size exceeds 1GB, create a fork called OID.1.
Add more forks as necessary.
Other files:
- Free space map (OID_fsm), visibility map (OID_vm).
- Optionally, TOAST File (if table has large varlen attributes).