Tuple Representation

Records vs Tuples

Tables are defined by a schema, e.g..

create table Employee (
   id   integer primary key,
   name varchar(20) not null,
   job  varchar(10), 
   dept smallint references Dept(id)
);

Tuples are collection of attribute values based on a schema. Records are a sequence of bytes, containing data for one tuple. Bytes need to be interpreted relative to a schema to get a tuple.

Converting Records to Tuples

A Record is an array of bytes. - Represents the data values from a typed Tuple. - Stored on disk (persistent) or in a memory buffer.

A Tuple is a collection of named, typed values (similar to C structs). - An interpretable structure is needed to manipulate the values. - Stored in working memory (temporary).

Information to Interpret Bytes in a Record

May be contained in schema data in DBMS catalog.
May be stored in page directory.
May be stored in the record (header).
May be stored partly in record and partly in schema.

For variable-length records, formatting information is required:

Must be stored in the record or in the page directory.
- Since it refers to individual records and not the whole table.
At the least, need to know how many bytes in each variable length value.

Operations on Records

Access record via RecordId:

Record get_record(Relation rel, RecordId rid) {
    (pid, tid) = rid;
    Page buf = get_page(rel, pid);
    return get_bytes(rel, buf, tid);
}

Cannot use a Record directly; need a Tuple:

Relation rel = ... // relation schema
Record rec = get_record(rel, rid);
Tuple t = makeTuple(rel, rec);

Once we have a Tuple, we can access individual attributes/fields (with index).

int x = getIntField(t, 1);
char *s = getStrField(t, 2);

Fixed Length Records

Possible encoding scheme:

Record format (length + offsets) stored in catalog.
Data values stored in fixed-size slots in data pages.

Variable Length Records

Possible encoding scheme:

Prefix each field by length.
Terminate fields by delimiter.
- Note: similar to C strings.
Array of offsets.

Data Types

DBMSs typically define a fixed set of base types which have a corresponding implementation-level data type for field values:

DBMS Type	C Level Type
`DATE`	`time_t`
`FLOAT`	`float, double`
`INTEGER`	`int, long`
`NUMBER(n)`	`int[]`
`VARCHAR(n)`	`char[]`

Field Descriptors

Tuple can be implemented as:

List of field descriptors for a record instance.
- Where FieldDesc gives (offset, length, type) information.
Along with a reference to the Record data.

typedef struct {
  ushort    nfields;   // number of fields/attrs
  ushort    data_off;  // offset in struct for data
  FieldDesc fields[];  // field descriptions
  Record    data;      // pointer to record in buffer
} Tuple;

Fields are derived from relation descriptor + record instance data.

Tuple data could be:

A pointer to bytes stored elsewhere in memory.
Or, appended to Tuple struct (used widely in PostgreSQL).