Query Cost Estimation

Overview

Query optimisers estimate costs via:

Result size estimated by statistical measures on relations. For example:

Easy, since we know:

Number of tuples in output:
- r_out = |π_{a,b,...}(T)| = |T| = r_T (in SQL, because of bag semantics).
Size of tuples in output:
- R_out = sizeof(a) + sizeof(b) + ... + (tuple overhead).
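
A minimal sketch of the two projection estimates above, extended to a page count. The function name, the byte sizes, and the page calculation (whole tuples per page, page headers ignored) are assumptions made for illustration, not taken from these notes.

```python
import math

def projection_estimate(r_T, attr_sizes, tuple_overhead, page_size):
    """Estimate the output of a projection without DISTINCT (bag semantics).

    Hypothetical helper: all sizes are in bytes, page headers are ignored.
    """
    r_out = r_T                                # one output tuple per input tuple
    R_out = sum(attr_sizes) + tuple_overhead   # bytes per output tuple
    if R_out > page_size:
        raise ValueError("an output tuple does not fit in one page")
    tuples_per_page = page_size // R_out       # whole tuples that fit in a page
    pages_out = math.ceil(r_out / tuples_per_page)
    return r_out, R_out, pages_out

# Example: 10,000 input tuples, projecting a 4-byte and a 20-byte attribute,
# with an assumed 8-byte tuple overhead and 4096-byte pages.
r_out, R_out, pages_out = projection_estimate(
    r_T=10_000, attr_sizes=[4, 20], tuple_overhead=8, page_size=4096)
print(r_out, R_out, pages_out)
```

With these numbers each output tuple takes 32 bytes, 128 tuples fit in a page, and the result needs 79 pages.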

Assume page size B:

If using select distinct:

Selectivity: fraction of tuples expected to satisfy a condition.

Common assumption: attribute values uniformly distributed. Using this, can incorporate:
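
A sketch of how the uniformity assumption turns simple statistics into selectivity estimates. The function names and the particular statistics used (number of distinct values, min and max of a numeric attribute) are assumptions for illustration; they are the usual textbook choices, not necessarily the ones these notes listed.

```python
def eq_selectivity(n_distinct):
    """Selectivity of A = c when each distinct value is equally likely."""
    return 1.0 / n_distinct

def range_selectivity(value, lo, hi):
    """Selectivity of A <= value for a numeric A spread uniformly over [lo, hi]."""
    if hi == lo:
        return 1.0 if value >= lo else 0.0
    frac = (value - lo) / (hi - lo)
    return min(1.0, max(0.0, frac))   # clamp values outside [lo, hi]

def expected_tuples(n_tuples, selectivity):
    """Expected result size = relation size * selectivity."""
    return n_tuples * selectivity

# Example: 10,000 tuples, attribute with 50 distinct values in the range 0..100.
n_eq = expected_tuples(10_000, eq_selectivity(50))
n_range = expected_tuples(10_000, range_selectivity(25, 0, 100))
print(n_eq, n_range)
```

Under these assumptions an equality condition is expected to match about 200 tuples and the range condition about 2,500.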

Effective ways to handle non-uniform attribute value distributions:

Disadvantage: cost of storing/maintaining statistics.
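
One widely used technique for non-uniform data is a histogram. The sketch below builds an equi-width histogram and uses it to estimate the selectivity of A <= value, assuming uniformity only within each bucket. The function names and the sample data are hypothetical. The bucket counts are exactly the kind of statistic that must be stored and refreshed as the data changes.

```python
def build_histogram(values, lo, hi, n_buckets):
    """Count values per equal-width bucket over [lo, hi]."""
    width = (hi - lo) / n_buckets
    counts = [0] * n_buckets
    for v in values:
        i = min(int((v - lo) / width), n_buckets - 1)  # v == hi goes in last bucket
        counts[i] += 1
    return counts

def hist_range_selectivity(counts, lo, hi, value):
    """Estimate the fraction of tuples with A <= value from bucket counts."""
    total = sum(counts)
    if total == 0 or value < lo:
        return 0.0
    if value >= hi:
        return 1.0
    pos = (value - lo) / ((hi - lo) / len(counts))
    full = int(pos)              # buckets lying entirely below value
    partial = pos - full         # fraction of the bucket containing value
    matched = sum(counts[:full]) + partial * counts[full]
    return matched / total

# Skewed sample data: 80 tuples with value 1, 20 tuples with value 9.
values = [1] * 80 + [9] * 20
counts = build_histogram(values, lo=0, hi=10, n_buckets=5)
sel = hist_range_selectivity(counts, lo=0, hi=10, value=4)
print(counts, sel)
```

For this data the histogram gives a selectivity of about 0.8 for A <= 4, whereas a pure uniformity assumption over 0..10 would give 0.4.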

Analysis relies on semantic knowledge about data/relations.

Consider equijoin on common attribute: R ⨝_a S:
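
A sketch of one standard textbook estimate for the size of an equijoin, offered as an illustration rather than as the analysis these notes give. It assumes uniformly distributed join values and that every value in the relation with fewer distinct values also appears in the other. The function and parameter names are hypothetical.

```python
def equijoin_size(r_R, r_S, distinct_R, distinct_S):
    """Estimate the number of tuples in R join S on attribute a.

    r_R, r_S: tuple counts; distinct_R, distinct_S: distinct values of a.
    Each tuple on one side is expected to match (tuples / distinct values)
    tuples on the side with more distinct values.
    """
    return r_R * r_S / max(distinct_R, distinct_S)

# a is a key of R (1000 tuples, 1000 distinct values) and a foreign key in S:
# each S tuple joins with exactly one R tuple, so the estimate equals |S|.
fk_join = equijoin_size(r_R=1000, r_S=5000, distinct_R=1000, distinct_S=800)

# a is not a key in either relation.
general_join = equijoin_size(r_R=1000, r_S=2000, distinct_R=50, distinct_S=100)
print(fk_join, general_join)
```

The key/foreign-key case shows where semantic knowledge helps: knowing that a is a key of R fixes the result size at |S| without consulting any value distribution.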

Above methods can (sometimes) give inaccurate estimates that lead to poor evaluation plans. To get more accurate cost estimates:

Either way, the optimisation process costs more.

Tradeoff between optimiser performance and query performance.