I've been taking a brief look at Theora, since it seems to be winning its battle to be one of the standard HTML5 video-tag codecs.
It's not entirely stupid: it has obviously been designed for PCs, not by anyone involved in TV or video conferencing, and it's in much the same state as WMV9 Main Profile (which is nearly, but not quite, VC-1 Main Profile).
Theora turns out to be a fairly standard I- and P-frame-only block-structured codec. We have no B-frames, but we can predict either from the preceding frame or from the preceding I-frame (Theora calls them INTRA or INTER rather than I and P). Theora is progressive-only and nominally fixed-frame-rate only, though I suspect variable frame rate by PTS will become quite common.
Theora's blocks are the right size (8x8). It has two block groupings - macroblocks of 2x2 blocks and superblocks of 4x4 blocks - and the blocks within a superblock are ordered along a Hilbert curve rather than in the raster order that MPEG-2, H.264 and VC-1 use.
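To make the Hilbert ordering concrete, here is a minimal sketch of the textbook index-to-coordinate Hilbert mapping, applied to a 4x4 superblock. The function names are mine, and the choice of starting corner and orientation is an assumption (bottom-left origin, first step to the right); check the spec's figure before relying on it.

```python
def _rotate(s, x, y, rx, ry):
    # Rotate/flip a quadrant so the sub-curves join end to end.
    if ry == 0:
        if rx == 1:
            x, y = s - 1 - x, s - 1 - y
        x, y = y, x
    return x, y

def hilbert_d2xy(n, d):
    """Map curve index d to (x, y) on an n x n grid (n a power of two)."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        x, y = _rotate(s, x, y, rx, ry)
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def superblock_order():
    """Block coordinates inside one 4x4 superblock, in coding order.
    Assumption: (0, 0) is the bottom-left block."""
    return [hilbert_d2xy(4, d) for d in range(16)]

if __name__ == "__main__":
    grid = [[0] * 4 for _ in range(4)]
    for index, (x, y) in enumerate(superblock_order()):
        grid[y][x] = index
    # Print with y increasing upwards, to match a bottom-left origin.
    for row in reversed(grid):
        print(" ".join(f"{v:2d}" for v in row))
```

The useful property is that consecutive blocks in coding order are always spatial neighbours, which raster order can't offer at the end of each row.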
Raster order for Theora is bottom-to-top left-to-right, so (0,0) is bottom left rather than top left. It's unclear why this happened.
There's a fairly conventional three-plane colour structure, two supported colour spaces (NTSC-M and PAL), you can code in 4:2:0, 4:2:2 or 4:4:4, and chroma sits between luma samples in both X and Y - there's no variable chroma positioning.
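As a quick illustration of what the three sampling formats mean for plane sizes, here is a small sketch. The table and function names are hypothetical, and it assumes the luma plane is already a whole number of macroblocks, so the divisions are exact.

```python
# Horizontal and vertical chroma decimation factors per sampling format.
SUBSAMPLING = {
    "4:4:4": (1, 1),  # chroma at full resolution
    "4:2:2": (2, 1),  # chroma halved horizontally only
    "4:2:0": (2, 2),  # chroma halved in both directions
}

def chroma_plane_size(luma_w, luma_h, fmt):
    """Return (width, height) of each chroma plane for a given luma plane."""
    sx, sy = SUBSAMPLING[fmt]
    return luma_w // sx, luma_h // sy

if __name__ == "__main__":
    for fmt in SUBSAMPLING:
        print(fmt, chroma_plane_size(1920, 1088, fmt))
```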
The decoded region is in whole macroblocks, but the visible frame can be any window on it so we can have arbitrary amounts of invisible picture. It appears that superblocks are the unit of coding but macroblocks the unit of motion compensation.
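A rough sketch of the geometry this implies for the luma plane, assuming the simplest case where the coded frame is just the picture rounded up to whole macroblocks. The names are mine rather than the spec's, and the bottom-edge offset convention is an assumption based on the bottom-left origin noted above.

```python
MB = 16  # macroblock edge in luma samples (2x2 blocks of 8x8)
SB = 32  # superblock edge in luma samples (4x4 blocks of 8x8)

def round_up(value, multiple):
    """Smallest multiple of `multiple` that is >= value."""
    return -(-value // multiple) * multiple

def coded_geometry(pic_w, pic_h):
    """Luma-plane sizes for the smallest coded frame holding the picture."""
    frame_w = round_up(pic_w, MB)
    frame_h = round_up(pic_h, MB)
    return {
        "frame_w": frame_w,
        "frame_h": frame_h,
        "macroblocks": (frame_w // MB, frame_h // MB),
        # Superblocks may overhang the coded frame, so round up here too.
        "superblocks": (-(-frame_w // SB), -(-frame_h // SB)),
    }

def window_top_row(frame_h, pic_y, pic_h):
    """Convert a visible-window offset measured from the bottom edge
    (assumed convention) into a top-left-origin row index."""
    return frame_h - pic_y - pic_h

if __name__ == "__main__":
    geometry = coded_geometry(1920, 1080)
    print(geometry)
    print(window_top_row(geometry["frame_h"], 0, 1080))
```

For 1080-line material this gives a 1088-line coded frame, so there are eight invisible lines to put somewhere; which edge they sit on depends on the window offset.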
There's a fairly conventional MV/residual/deblock filter structure with only one, fairly simple in-loop deblock. You'll probably want an out-of-loop dering and deblock for low bitrate.
The transform is a very particularly specified DCT - the butterflies and the cosine approximation values are laid down in the spec. It's effectively yet another explicit integer-only frequency transform, and a quick read suggests that it's exact.
Motion vector derivation and motion compensation are pretty standard; we get motion vectors down to quarter-pel and the filter is a round-and-average beast rather than anything FIR-like.
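For flavour, here is a sketch of what a two-sample averaging predictor looks like, as opposed to a multi-tap FIR interpolator. This is my reading of the idea, not the spec's text: it assumes vectors in half-sample units, a truncating shift for the average, and a reference block that is never read out of bounds. The function name is hypothetical.

```python
def predict_sample(ref, x, y, mvx2, mvy2):
    """Predict one sample from ref[row][col].

    mvx2, mvy2: motion vector in half-sample units (assumption).
    A fractional component is handled by rounding the vector both down
    and up, fetching the two integer-position samples, and averaging.
    """
    x0 = x + (mvx2 // 2)       # component rounded towards -infinity
    x1 = x - ((-mvx2) // 2)    # component rounded towards +infinity
    y0 = y + (mvy2 // 2)
    y1 = y - ((-mvy2) // 2)
    a = ref[y0][x0]
    b = ref[y1][x1]
    # Assumption: truncating average; the spec fixes the exact rounding.
    return (a + b) >> 1

if __name__ == "__main__":
    ref = [[10, 20],
           [30, 40]]
    print(predict_sample(ref, 0, 0, 0, 0))  # whole-sample vector
    print(predict_sample(ref, 0, 0, 1, 0))  # half a sample to the right
    print(predict_sample(ref, 0, 0, 1, 1))  # half a sample diagonally
```

Note that a diagonal half-sample vector only ever touches two samples here, not four, which is much cheaper than bilinear or FIR interpolation but also rather softer.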
The bitstream is run-length Huffman coded and packets are bit-counted, so we have to rely on out-of-ES framing to recover from synchronisation errors. There's obviously no emulation prevention. Presumably for coding efficiency reasons Theora groups bits by role rather than by macroblock, so we get all the coding markers, then all the MVs, then all the coefficients - a bit like some of the bitplane coding in VC-1.
This means we need more memory (and more memory I/O) than necessary, but it's not entirely fatal and at least we get the coefficients last.
Theora has almost entirely dynamic quant and coding tables, stored in the decoder initialisation headers, which may be quite big - the standard suggests 16kish. This means that for effective MPEG-TS/PS/PES use we're going to need some kind of out-of-ES SPS framing and effectively means that Theora has no ES. This is a right pain, both because it means that Theora ES streams don't exist and because it means we can't optimise quant tests in zigzag decode.
The zigzag table, oddly, is fixed, so we can optimise that. Go figure.
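Because the scan is fixed, it can be baked in as a constant table at build time. The sketch below generates a conventional JPEG-style 8x8 zigzag; I'm assuming that is the pattern in use here, so treat the spec's table as authoritative. The names are mine.

```python
def zigzag_order(n=8):
    """Return raster indices (row * n + col) of an n x n block in
    conventional zigzag scan order."""
    order = []
    for s in range(2 * n - 1):
        # All (row, col) cells on the anti-diagonal row + col == s.
        diagonal = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        if s % 2 == 0:
            # Even diagonals are walked from bottom-left to top-right.
            diagonal.reverse()
        order.extend(r * n + c for r, c in diagonal)
    return order

# Computed once; a decoder can index this directly in its inner loop.
ZIGZAG = tuple(zigzag_order())

if __name__ == "__main__":
    for row in range(8):
        print(" ".join(f"{v:2d}" for v in ZIGZAG[row * 8:(row + 1) * 8]))
```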
The Ogg framing format is quite odd - its plethora of structures is reminiscent of ASF - but should be dealable with. It's clearly where Matroska got its odd thread ideas from.
Next up: Dirac ..