Get better quality, in fewer bytes
A video codec created to match or exceed the quality of current-generation video codecs such as VP9 and HEVC. An ongoing effort by Xiph and Mozilla to introduce a royalty-free video codec.
Under Shannon's entropy theory, the performance of lossless compression is bounded by the entropy of the input: on average, no encoder can spend fewer bits than the source's entropy
LZW, RLE, ZIP, etc. work great...
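A minimal sketch of that bound (assuming Python with zlib; the order-0 byte entropy is used purely for illustration): for an i.i.d. source, the compressed size cannot, on average, drop below entropy × length.

```python
import math
import os
import zlib
from collections import Counter

def entropy_bits_per_byte(data: bytes) -> float:
    """Order-0 Shannon entropy of the byte stream, in bits per byte."""
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())

# An i.i.d. source over two byte values: 1 bit per byte of entropy.
data = bytes(b & 1 for b in os.urandom(100_000))
bound = entropy_bits_per_byte(data) * len(data) / 8  # Shannon bound, in bytes
actual = len(zlib.compress(data, 9))
print(f"bound ~{bound:.0f} B, zlib output {actual} B")  # actual stays above bound
```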
(Behind this text, you're seeing a 512 × 512 pixel image, not just a white background)
Image | Size (bytes) |
---|---|
BMP | 786554 |
PNG | 1529 |
JPG | 3598 |
Except not so well for more complex images
Image | Size (bytes) |
---|---|
BMP | 786554 |
PNG | 476235 |
JPG | 408932 |
Even worse for noise
Image | Size (bytes) |
---|---|
BMP | 786554 |
PNG | 788485 |
JPG | 770446 |
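We can reproduce this trend with a hedged sketch (using zlib, the same Deflate engine PNG uses, on raw 512 × 512 grayscale buffers; the sizes won't match the tables above, which came from actual image files):

```python
import os
import zlib

N = 512 * 512  # one 8-bit sample per pixel

flat = bytes(N)        # all-white/flat image: maximal redundancy
noise = os.urandom(N)  # pure noise: ~8 bits per byte of entropy

for name, buf in [("flat", flat), ("noise", noise)]:
    print(name, len(buf), "->", len(zlib.compress(buf, 9)))
# flat shrinks to a few hundred bytes; noise barely shrinks at all
```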
Humans are very forgiving of some loss in quality
Before DCT | After DCT
The block prior to the DCT is in the spatial domain; the block after is in the frequency domain
N.B. The top-left coefficient in the frequency domain is called DC, and the rest are called AC
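A minimal sketch of the transform step, assuming SciPy's dctn/idctn are available: the 8 × 8 spatial block becomes an 8 × 8 coefficient block with DC at the top left, and the DCT by itself is perfectly invertible.

```python
import numpy as np
from scipy.fft import dctn, idctn

rng = np.random.default_rng(0)
block = rng.integers(0, 256, size=(8, 8)).astype(float)  # spatial domain

coeffs = dctn(block - 128, norm="ortho")  # frequency domain
print(f"DC = {coeffs[0, 0]:.1f}")         # top-left coefficient

recon = idctn(coeffs, norm="ortho") + 128  # the DCT alone loses nothing
assert np.allclose(recon, block)
```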
Before Quantization | After Quantization
Encoding f\colon A \to B
Decoding g\colon B \to C
Where B \neq A
Where C \subseteq A
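Quantization is where the loss actually happens. A minimal sketch with a uniform scalar quantizer of step Q (step size and values are illustrative): the decoder can only land on multiples of Q, so C is a coarse subset of A.

```python
import numpy as np

Q = 16.0  # quantizer step size (illustrative)

def encode(coeffs):            # f : A -> B, quantization indices
    return np.round(coeffs / Q).astype(int)

def decode(indices):           # g : B -> C, multiples of Q only
    return indices * Q

a = np.array([3.0, 21.0, -40.0, 100.0])
c = decode(encode(a))
print(c)                       # [  0.  16. -32.  96.] -- close, not equal
print(np.abs(c - a).max())     # error is bounded by Q/2 per coefficient
```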
Before Loss | After Loss
However, most modern encoders use some variation of the JPEG pipeline
Before we apply the DCT, we apply a pre-filter
We then overlap the results of the pre-filter onto the DCT (see the toy sketch below)
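To make the structure concrete, here is a toy sketch of the lapped idea (my own simplification: a 2-tap lifting pre-filter straddling each block boundary; Daala's real pre-filter uses different, carefully tuned coefficients): pre-filter across boundaries, DCT per block, and on the way back, IDCT then post-filter.

```python
import numpy as np
from scipy.fft import dct, idct

B = 4  # block size (toy value)

def prefilter(x):
    """Invertible 2-tap lifting filter across each block boundary."""
    y = x.astype(float).copy()
    for b in range(B, len(x), B):  # y[b-1], y[b] straddle a block edge
        y[b] -= y[b - 1]           # predict step
        y[b - 1] += y[b] / 2       # update step
    return y

def postfilter(y):
    x = y.copy()
    for b in range(B, len(y), B):  # undo the lifting steps in reverse
        x[b - 1] -= x[b] / 2
        x[b] += x[b - 1]
    return x

sig = np.arange(16, dtype=float)
pre = prefilter(sig)
coeffs = [dct(pre[i:i + B], norm="ortho") for i in range(0, len(pre), B)]
rec = postfilter(np.concatenate([idct(c, norm="ortho") for c in coeffs]))
assert np.allclose(rec, sig)  # the lapped structure is perfectly invertible
```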
Let's go back to the DCT
Instead of applying an O(n \log n) algorithm, use an O(1) approximation
We use lifting
Paper on lifting
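A minimal sketch of the lifting idea using the 2-point Haar transform (not Daala's actual lifted DCT): each step only adds a value computed from the other sample, so undoing the steps in reverse order gives exact integer reconstruction.

```python
def haar_lift(a, b):
    """Forward lifting: difference (predict) then average (update)."""
    d = b - a          # predict step
    s = a + (d >> 1)   # update step; integer shifts, no multiplies
    return s, d

def haar_unlift(s, d):
    """Inverse lifting: undo the steps in reverse order."""
    a = s - (d >> 1)
    b = d + a
    return a, b

for a, b in [(7, 3), (-5, 12), (255, 0)]:
    assert haar_unlift(*haar_lift(a, b)) == (a, b)  # perfect reconstruction
```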
Same idea for the prefilter
Result
Paper regarding the lapped transform
A type of vector quantization
Instead of quantizing every scalar element individually, group adjacent ones and quantize them collectively
Problem: lost information + squandered range
All quantized regions are used
We have a function G to get k = G(g), k \in \mathbb{N}
With k, we get the codebook W = \{\mathbf{w} \in \mathbb{Z}^N \mid \sum\limits_{i = 1}^N |\mathbf{w}_i| = k\}
We then have a function Q, such that for an input vector \mathbf{v} we can compute Q(\mathbf{v} \mid k) = \mathbf{q}, \mathbf{q} \in W
Fischer, T.R. "A Pyramid Vector Quantizer." IEEE Transactions on Information Theory, vol. 32, no. 4 (1986): 568–583. Print.
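A hedged sketch of the codebook search, in the greedy pulse-allocation style used by Opus's PVQ (Daala's actual search differs in details): place k pulses one at a time, each time where the pulse most improves the match with the input direction.

```python
import numpy as np

def pvq_search(x, k):
    """Greedily place k pulses to maximize the cosine match with x."""
    signs = np.where(x < 0, -1, 1)
    a = np.abs(x).astype(float)
    q = np.zeros(len(x), dtype=int)
    rxy, ryy = 0.0, 0.0  # running correlation <q, a> and energy ||q||^2
    for _ in range(k):
        # a pulse at j raises the correlation to rxy + a[j] and the
        # energy to ryy + 2*q[j] + 1; pick j with the best cos^2 angle
        j = int(np.argmax((rxy + a) ** 2 / (ryy + 2 * q + 1)))
        q[j] += 1
        rxy += a[j]
        ryy += 2 * q[j] - 1
    return q * signs

q = pvq_search(np.array([0.7, -0.1, 0.2, 0.0]), k=5)
print(q, np.abs(q).sum())  # a codeword in W: the |q_i| sum to exactly k
```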
Prediction
We compute \alpha_u, \alpha_v, \beta_u, \beta_v by performing a linear regression of the U and V channels against the luma channel
We can then send \alpha_u, \alpha_v, \beta_u, \beta_v, and the decoder is able to infer the final colour
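A minimal sketch for one chroma plane (using NumPy's polyfit for the regression; the data is synthetic):

```python
import numpy as np

def cfl_fit(luma, chroma):
    """Least-squares fit of chroma ~ alpha * luma + beta over a block."""
    alpha, beta = np.polyfit(luma.ravel(), chroma.ravel(), 1)
    return alpha, beta

def cfl_predict(luma, alpha, beta):
    return alpha * luma + beta

rng = np.random.default_rng(1)
luma = rng.integers(0, 256, size=(8, 8)).astype(float)
u = 0.5 * luma + 20 + rng.normal(0, 2, size=(8, 8))  # U correlated with luma

alpha_u, beta_u = cfl_fit(luma, u)           # encoder sends these two scalars
pred_u = cfl_predict(luma, alpha_u, beta_u)  # decoder rebuilds U from luma
print(round(alpha_u, 2), round(beta_u, 1))   # close to 0.5 and 20
```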
Algorithm for finding the direction
Enough fun; let's ask "to paint or not to paint"
w = \min(1, \alpha\frac{Q^2}{12\sigma^2})
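Here Q is the quantizer step and \sigma^2 the local variance; Q^2/12 is the error variance of a uniform quantizer, so w paints fully where quantization noise dominates the signal. A minimal sketch (the blend function is my illustration, not Daala's exact code):

```python
def paint_weight(q_step, sigma2, alpha=1.0):
    """w = min(1, alpha * Q^2 / (12 * sigma^2))."""
    return min(1.0, alpha * q_step * q_step / (12.0 * sigma2))

def blend(decoded, painted, w):
    """Illustrative blend: mix the painted pixel into the decoded one."""
    return w * painted + (1.0 - w) * decoded

print(paint_weight(q_step=16, sigma2=4.0))     # flat area: w = 1, paint fully
print(paint_weight(q_step=16, sigma2=1000.0))  # detailed area: w ~ 0.02
```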
The end result
https://people.xiph.org/~xiphmont/demo/daala/update1-tool2b.shtml