Transaction Isolation
Overview
Serial execution (T1 ; T2; T3 ;...
) is the simplest form of isolation.
Problem: serial execution yields poor throughput.
Concurrency control schemes (CCSs) aim for “safe” concurrency.
Abstract view of DBMS concurrency mechanisms:
Serialisability
Consider two schedules S1
and S2
produced by:
- Executing the same set of transactions
T1 .. Tn
concurrently. - But with a non-serial interleaving of read/write operations.
S1
and S2
are equivalent if StateAfter(S1) == StateAfter(S2)
.
- This is a very restrictive requirement.
S
is a serializable schedule if S
is equivalent to some serial schedule S'
of T1 .. Tn
.
- This is a less restrictive requirement.
- Enough to guarantee consistency.
Classes of Serialisability
Conflict serialisability::
- Conflicting read/write operations occur in the “right order”.
- Check via precdence graph; look for absence of cycles.
View serialisability:
- Read operations see the correct version of data.
- Check via VS conditions on likely equivalent schedules.
View serialisability is strictly weaker than conflict serialisability.
Checking Conflict Serialisability
construct graph with just nodes, one for each T_i.
for each pair of operations across transactions:
if T_i and T_j have conflicting ops on variable X:
put directed edge between T_i and T_j
where direction goes from first tx to access X
to second tx to access X
if this new edge forms a cycle:
return NotConflictSerialisable
return ConflictSerialisable
Checking View Serializability
// T_{C,i} denotes transaction i in concurrent schedule
for each serial schedule S:
for each shared variable X:
if T_{C, i} reads same version of X as T_{S, i}
(either initial value or value written by T_j)
continue
else
give up on this serial schedule
if T_{C, i} and T_{S, i} write the final version of X:
continue
else:
give up on this serial schedule
return ViewSerialisable
return NotViewSerialisable
Transaction Isolation Levels
SQl programmers’ concurrency control mechanism:
set transaction
read only -- so weaker isolation may be ok
read write -- suggests stronger isolation needed
isolation level
-- weakest isolation, maximum concurrency
read uncommitted
read committed
repeatable read
serializable
-- strongest isolation, minimum concurrency
Implication of Isolation Levels
Isolation Level | Dirty Read | Non-repeatable Read | Phantom Read |
---|---|---|---|
Read Uncommitted | Possible | Possible | Possible |
Read Committed | Not Possible | Possible | Possible |
Repeatable Read | Not Possible | Not Possible | Possible |
Serialisable | Not Possible | Not Possible | Not Possible |
View of Database
A PostgreSQL tx consists of a sequence of SQL statements:
BEGIN S_1; S_2; ... S_n; COMMIT;
Isolation levels affect view of DB provided to each S_i
.
- Read Committed:
- Each
S_i
sees snapshot of DB at start ofS_i
.- This snapshot does NOT include any uncommitted changes from other concurrent tx’s.
- This snapshot may however include committed changes from other concurrent tx’s.
- Each
- Repeatable Read and Serialisable:
- Each
S_i
sees snapshot of DB at start of tx.- This snapshot does NOT include even committed changes from other concurrent tx’s.
- Serialisable checks for extra conditions.
- Each
Transactions fail if the system detects violation of isolation level.
Repeatable Read vs Serialisable
Table R(class, value)
containing (1,10) (1,20) (2,100) (2,200)
.
T1: X = sum(value) where class=1; insert R(2,X); commit
.T2: X = sum(value) where class=2; insert R(1,X); commit
.- With repeatable read, both transactions commit, giving:
- Updated table:
(1,10) (1,20) (2,100) (2,200) (1,300) (2,30)
.
- Updated table:
- With serial transactions, only one transaction commits:
T1;T2
gives(1,10) (1,20) (2,100) (2,200) (2,30) (1,330)
.T2;T1
gives(1,10) (1,20) (2,100) (2,200) (1,300) (2,330)
.
- PG recognises that committing both gives serialisation violations.
Concurrency Control
Isolation requires some method to control concurrency.
Possible approaches to implementing concurrency control:
- Lock-based:
- Synchronise tx execution via locks on relevant part of DB.
- Version-based (multi-version concurrency control):
- Allow multiple consistent versions of the data to exist.
- Each tx has access only to version existing at start of tx.
- Validation-based (optimistic concurrency control):
- Execute all tx’s; check for validity problems on commit.
- Timestamp-based:
- Organise tx execution via timestamps assigned to actions.