Using Daala Intra Frames for Still Picture Coding

Get better quality, in fewer bytes

What Is Daala?

A video codec created to either match or exceed the quality of current-generation of video codecs, such a VP9 and HEVC. Currently an ongoing effort by Xiph and Mozilla to introduce a royalty-free video codec.

Under Shannon Entropy theory, the performance of lossless compression is bounded by the entropy of the input

Problem: Compressing Images

LZW, RLE, ZIP, etc. work great...

(Behind, you're seeing a 512 by 512 pixel image, not just a white background)

ImageSize (bytes)
BMP786554
PNG1529
JPG3598

Except not so well for more complex images

ImageSize (bytes)
BMP786554
PNG476235
JPG408932

Even worse for noise

ImageSize (bytes)
BMP786554
PNG788485
JPG770446

Solution: Fool the Human

Humans are very forgiving for loss of some quality

Let's Look At JPEG

What Does JPEG Do?

  • Spatial to frequency conversion
    • similar to audio but for images, and instead of 1D, we use 2D
  • Fourier's Theorem
    • “A wave is the sum of many sinusoidal waves.”
  • DFT is often used to extract the sine waves of a wave (FFT for the fast variant)
  • But we're going to use the DCT
    • Benefit over DFT
      • no complex numbers involved
      • wave coefficient packed closer to lower frequencies

pre-DCT

Before DCT

pre-DCT

After DCT

The block prior to the DCT: spatial domain. the block after: frequency domain

N.B. The top-left pixel in the frequency domain is called DC, and the rest are called AC

pre-DCT

Before Quantization

pre-DCT

After Quantization

Quantization in a nutshell

Encoding f:A \to B

Decoding f:B \to C

Where B<>A

Where C \subseteq A

Run-Length-Encoding (RLE)

RLE

We Send This

pre-DCT

Before Loss

pre-DCT

After Loss

However, most modern encoders pretty much use a variation of JPEG

A contender: Daala

Lapped Transform

Before we go ahead and use the DCT, we apply a pre-filter

Prefilter

We then overlap the results of the pre-filter onto the DCT

Prefilter

Prefilter through lifting

Let's go back to the DCT

Instead of applying an O(nlog(n)) algorithm, use an O(1) approximation

We use lifting

Prefilter and the lifting cont'd

Paper on lifting

http://www.sfu.ca/~jiel/papers/c003-bindct-ieee.pdf

Same idea for the prefilter

Result

Lapped Transform Example

Paper regarding the lapped transform

http://thanglong.ece.jhu.edu/Tran/Pub/prepost.pdf

Gain-Shape Quantization

A type of vector quantization

Vector quantization

Instead of quantizing every scalar elements individually, group adjecent ones and quantize collectively

Lena Lena
Quantized Tuples

Problem: lost information + squandered range

Better quantized tuples

All quantized regions are used

Gain-Shape Quantization

  • Given a block, we treat it like a vector
    • Let's call it \mathbf{v}
    • \mathbf{v} \in \mathbb{R}^N
  • Given \mathbf{v}, we get
    • The length (gain), which we will call g
    • The direction (shape), which we will call \mathbf{w}
  • We then quantize g and \mathbf{w}

Pyramid Vector Quantization

  • Instead of look-up tables, arithmetically group vectors
  • Saves space

We have a function G to get k = G(g), k \in \mathbb{N}

With k, we get a W = \{\mathbf{w} \in \mathbb{Z}^N | \sum\limits_{i = 1}^N |\mathbf{v}_i| = k\}

We then have a function Q, such that we can compute Q(\mathbf{w}|k) = \mathbf{q}, \mathbf{q} \in W

Fischer, T.R. "A pyramid vector quantizer." IEEE Transactions on Information Theory. Issue 4 Volume 32 (1986): 568—583. Print.

RFC: https://tools.ietf.org/html/draft-valin-videocodec-pvq

Removing Pixels

Prediction

Prediction

Compressing Colours

  • Humans don't notice colours as much
  • And even then, colours are 3D; you're practically sending the same image three times if you send RGB; better to distinguish chroma from luma
  • Use YUV

Chroma From Luma Prediction

We compute a \alpha_u, \alpha_v, \beta_u\beta_v, by performing a linear regression on the U and V channels

We can then send \alpha_u, \alpha_v, \beta_u\beta_v, and the decoder should be able to infer the final colour

  • DC_u = \alpha_u + \beta_uDC_y
  • AC_u[x, y] = \beta_uAC_y[x, y]
  • DC_v = \alpha_v + \beta_vDC_y
  • AC_v[x, y] = \beta_vAC_y[x, y]

Paint Deringing

  • Direction search
  • Boundary pixel
  • Painting

Algorithm on finding the direction

http://jmvalin.ca/notes/intra_paint.pdf

Some Fun

Enough fun; let's ask "to paint or not to paint"

w = \min(1, \alpha\frac{Q^2}{12\sigma^2})

  • Q is the quantization amount (quality)
  • \alpha is a tunable value between 0 and 1
  • \sigma^2is the mean squared distance between decoded image and painted image

The end-result

https://people.xiph.org/~xiphmont/demo/daala/update1-tool2b.shtml