Using Daala Intra Frames for Still Picture Coding

Get better quality, in fewer bytes

What Is Daala?

A video codec created to either match or exceed the quality of current-generation of video codecs, such a VP9 and HEVC. Currently an ongoing effort by Xiph and Mozilla to introduce a royalty-free video codec.

Under Shannon Entropy theory, the performance of lossless compression is bounded by the entropy of the input

Problem: Compressing Images

LZW, RLE, ZIP, etc. work great...

(Behind, you're seeing a 512 by 512 pixel image, not just a white background)

Image	Size (bytes)
BMP	786554
PNG	1529
JPG	3598

Except not so well for more complex images

Image	Size (bytes)
BMP	786554
PNG	476235
JPG	408932

Even worse for noise

Image	Size (bytes)
BMP	786554
PNG	788485
JPG	770446

Solution: Fool the Human

Humans are very forgiving for loss of some quality

https://youtu.be/Sp7HiqULakk

Let's Look At JPEG

What Does JPEG Do?

Spatial to frequency conversion
- similar to audio but for images, and instead of 1D, we use 2D
Fourier's Theorem
- “A wave is the sum of many sinusoidal waves.”
DFT is often used to extract the sine waves of a wave (FFT for the fast variant)
But we're going to use the DCT
- Benefit over DFT
  - no complex numbers involved
  - wave coefficient packed closer to lower frequencies

pre-DCT

Before DCT

pre-DCT

After DCT

The block prior to the DCT: spatial domain. the block after: frequency domain

N.B. The top-left pixel in the frequency domain is called DC, and the rest are called AC

pre-DCT

Before Quantization

pre-DCT

After Quantization

Quantization in a nutshell

Encoding $f:A \to B$

Decoding $f:B \to C$

Where $B$ <> $A$

Where $C \subseteq A$

Run-Length-Encoding (RLE)

We Send This

pre-DCT

Before Loss

pre-DCT

After Loss

However, most modern encoders pretty much use a variation of JPEG

A contender: Daala

Lapped Transform

Before we go ahead and use the DCT, we apply a pre-filter

Prefilter

We then overlap the results of the pre-filter onto the DCT

Prefilter

Prefilter through lifting

Let's go back to the DCT

Instead of applying an $O(nlog(n))$ algorithm, use an $O(1)$ approximation

We use lifting

Prefilter and the lifting cont'd

Paper on lifting

http://www.sfu.ca/~jiel/papers/c003-bindct-ieee.pdf

Same idea for the prefilter

Result

Lapped Transform Example

Paper regarding the lapped transform

http://thanglong.ece.jhu.edu/Tran/Pub/prepost.pdf

Gain-Shape Quantization

A type of vector quantization

Vector quantization

Instead of quantizing every scalar elements individually, group adjecent ones and quantize collectively

Problem: lost information + squandered range

All quantized regions are used

Gain-Shape Quantization

Given a block, we treat it like a vector
- Let's call it $\mathbf{v}$
- $\mathbf{v} \in \mathbb{R}^N$
Given \mathbf{v}, we get
- The length (gain), which we will call $g$
- The direction (shape), which we will call $\mathbf{w}$
We then quantize $g$ and $\mathbf{w}$

Pyramid Vector Quantization

Instead of look-up tables, arithmetically group vectors
Saves space

We have a function $G$ to get $k = G(g)$ , $k \in \mathbb{N}$

With $k$ , we get a $W = \{\mathbf{w} \in \mathbb{Z}^N | \sum\limits_{i = 1}^N |\mathbf{v}_i| = k\}$

We then have a function $Q$ , such that we can compute $Q(\mathbf{w}|k) = \mathbf{q}$ , $\mathbf{q} \in W$

Fischer, T.R. "A pyramid vector quantizer." IEEE Transactions on Information Theory. Issue 4 Volume 32 (1986): 568—583. Print.

RFC: https://tools.ietf.org/html/draft-valin-videocodec-pvq

Removing Pixels

Prediction

Compressing Colours

Humans don't notice colours as much
And even then, colours are 3D; you're practically sending the same image three times if you send RGB; better to distinguish chroma from luma
Use YUV

Chroma From Luma Prediction

We compute a $\alpha_u$ , $\alpha_v$ , $\beta_u$ $\beta_v$ , by performing a linear regression on the U and V channels

We can then send $\alpha_u$ , $\alpha_v$ , $\beta_u$ $\beta_v$ , and the decoder should be able to infer the final colour

$DC_u = \alpha_u + \beta_uDC_y$
$AC_u[x, y] = \beta_uAC_y[x, y]$
$DC_v = \alpha_v + \beta_vDC_y$
$AC_v[x, y] = \beta_vAC_y[x, y]$

Paint Deringing

Direction search
Boundary pixel
Painting

Algorithm on finding the direction

http://jmvalin.ca/notes/intra_paint.pdf

Some Fun

Enough fun; let's ask "to paint or not to paint"

w = \min(1, \alpha\frac{Q^2}{12\sigma^2})

$Q$ is the quantization amount (quality)
$\alpha$ is a tunable value between 0 and 1
$\sigma^2$ is the mean squared distance between decoded image and painted image

The end-result

https://people.xiph.org/~xiphmont/demo/daala/update1-tool2b.shtml