File Management
Aims of File Management Subsystem
- Organize layout of data within FS.
- Handle mapping from database ID to file address.
- Transfer blocks of data between buffer pool and FS.
- Attempts to handle file access error problems.
Typical DBMS file management subsystems build on top of OS file operations.
DBMS File Organization
Different DBMS make different choices when arranging DB objects in the FS. Examples:
- By-pass the FS and use a raw disk partition.
- Have a single very large file containing all DB data.
- Have several large files, with tables spread across them.
- Have multiple data files, one for each table.
- Have multiple files for each table.
- Etc.
Single-file DBMS (E.g. SQLite)
Objects are are allocated to regions (segments) of the file. If an object grows too large for allocated segment, allocate an extension.
Allocating space in Unix files is easy:
- Seek to the place you want and write the data.
- If nothing there already, data is appended to the file.
- If something there already, it gets overwritten.
If seek goes way beyond end of file:
- Unix does not (yet) allocate disk space for the “hole”.
- Allocate disk storage only when data is written there.
Example Layout
SpaceMap = [ (0,10,U), (10,10,U), (20,600,U), (620,100,U), (720,20,F) ]
TableMap = [ ("employee",20,500), ("project",620,40) ]
Pages
Each file segment consists of a number of fixed-size blocks.
#define PAGESIZE 2048 // bytes per page
typdef long PageId; // block index, pageOffset = PageId * PAGESIZE
typedef char *Page; // pointer to page/block buffer
Possible Data Structures For Opened DBs and Tables
typedef struct DBrec {
char *dbname; // copy of db name
int fd; // database file
SpaceMap map; // map of free/used areas
TableMap names; // map names to areas + sizes
} *DB;
typedef struct Relrec {
char *relname; // copy of table name
int start; // page index of start of table data
int npages; // number of pages of table data
...
} *Rel;
Scanning a Relation
select name from Employee
may be implemented as
DB db = openDatabase("myDB");
Rel r = openRelation(db, "Employee");
Page buffer = malloc(PAGESIZE * sizeof(char));
for (int i = 0; i < r->npages; i++) {
PageId pid = r->start + i;
get_page(db, pid, buffer);
for (each tuple in buffer) {
get tuple data and extract name
add name to result tuples
}
}
// start using DB, buffer meta-data.
DB openDatabase(char *name) {
DB db = new(struct DBrec);
db->dbname = strdup(name);
db->fd = open(name, O_RDWR);
db->map = readSpaceTable(db->fd);
db->names = readNameTable(db->fd);
return db;
}
// stop using DB and update all meta-data.
void closeDatabase(DB db) {
writeSpaceTable(db->fd, db->map);
writeNameTable(db->fd, db->map);
fsync(db->fb);
free(db->dbname);
free(db);
}
// set up struct describing relation.
Rel openRelation(DB db, char *rname) {
Rel r = new(struct Relrec);
r->relname = strdup(rname);
// get relation data from map tables
r->start = ...;
r->npages = ...;
return r;
}
// stop using a relation.
void closeRelation(Rel r) {
free(r->relname);
free(r);
}
// assume that Page = byte[PageSize].
// assume that PageId = block number in file.
// read page from file into memory buffer.
void get_page(DB db, PageId p, Page buf) {
lseek(db->fd, p * PAGESIZE, SEEK_SET);
read(db->fd, buf, PAGESIZE);
}
// write page from memory buffer to file
void put_page(DB db, PageId p, Page buf) {
lseek(db->fd, p * PAGESIZE, SEEK_SET);
write(db->fd, buf, PAGESIZE);
}
// managing contents of space mapping table can be complex
// assume an array of (offset, length, status) records.
// allocate n new pages
PageId allocate_pages(int n) {
if (no existing free chunks are large enough) {
int endfile = lseek(db->fd, 0, SEEK_END);
addNewEntry(db->map, endfile, n);
} else {
grab "worst fit" chunk
split off unused section as new chunk
}
}
// similar complexity for freeing chunks
// drop n pages starting from p
void deallocate_pages(PageId p, int n) {
if (no adjacent free chunks) {
markUnused(db->map, p, n);
} else {
merge adjacent free chunks
compress mapping table
}
// note that file itself is not changed
// changes take effect when closeDatabase() executed
}
Multiple-file Disk Manager
Most DBMSs don’t use a single large file for all data. They typically provide:
- Multiple files partitioned physically or logically.
- Mapping from DB-level objects to files (e.g. via catalog meta-data).
Precise file structure varies between DBMSs.
Using multiple files (one file per relation) can be easier, e.g.
- Adding a new relation.
- Extending the size of a relation.
- Computing page offsets within a relation.
Example Layout
Structure of PageId
If system uses one file per table:
struct PageId {
long relid; // relation identifier, can be mapped to filename
long pagenum; // to identify page within file
} *PageId;
If system uses several files per page:
struct PageId {
long relid; // relation identifier
long fileid; // combined with relid, gives filename
long pagenum; // to identify page within file
} *PageId;
DBMS File Parameters
Typical DBMS/table parameter values
Quantity | Symbol | Example Value |
---|---|---|
Total # tuples | r |
10^6 |
Record size | R |
128 bytes |
Total # pages | b |
10^5 |
Page size | B |
8192 bytes |
# tuples per page | c |
60 |
Page read/write time | T_r, T_w |
10 msec |
Cost to process one page in memory | ~=0 |