1 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
5 <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-15"/>
6 <title>Ogg Documentation</title>
8 <style type="text/css">
10 margin: 0 18px 0 18px;
12 font-family: Verdana, Arial, Helvetica, sans-serif;
26 margin: 30px 0 16px 0;
33 h1, h1 a, h2, h2 a, h3, h3 a {
36 margin: 1.3em 0 8px 0;
66 background-color: #aabbff;
85 background-color: #ffffff;
103 <a href="http://www.xiph.org/"><img src="fish_xiph_org.png" alt="Fish Logo and Xiph.org"/></a>
106 <h1>Ogg bitstream overview</h1>
108 <p>This document serves as a starting point for understanding the design
109 and implementation of the Ogg container format. If you're new to Ogg
110 or merely want a high-level technical overview, start reading here.
111 Other documents linked from the <a href="index.html">index page</a>
112 give distilled technical descriptions and references of the container
113 mechanisms. This document is intended to aid understanding.
115 <h2>Container format design points</h2>
117 <p>Ogg is intended to be a simplest-possible container, concerned only
118 with framing, ordering, and interleave. It can be used as a stream delivery
119 mechanism, for media file storage, or as a building block toward
120 implementing a more complex, non-linear container (for example, see
121 the <a href="skeleton.html">Skeleton</a> or <a
122 href="http://en.wikipedia.org/wiki/Annodex">Annodex/CMML</a>).
124 <p>The Ogg container is not intended to be a monolithic
125 'kitchen-sink'. It exists only to frame and deliver in-order stream
126 data and as such is vastly simpler than most other containers.
127 Elementary and multiplexed streams are both constructed entirely from a
128 single building block (an Ogg page) composed of eight fields
129 totalling twenty-seven bytes (the page header), a list of packet lengths
130 (up to 255 bytes), and payload data (up to 65025 bytes). The structure
131 of every page is the same. There are no optional fields or alternate
134 <p>Stream and media metadata is contained in Ogg and not built into
135 the Ogg container itself. Metadata is thus compartmentalized and
136 layered rather than part of a monolithic design, an especially good
137 idea as no two groups seem able to agree on what a complete or
138 complete-enough metadata set should be. In this way, the container and
139 container implementation are isolated from unnecessary metadata design
144 <p>The Ogg container is primarily a streaming format,
145 encapsulating chronological, time-linear mixed media into a single
146 delivery stream or file. The design is such that an application can
147 always encode and/or decode all features of a bitstream in one pass
148 with no seeking and minimal buffering. Seeking to provide optimized
149 encoding (such as two-pass encoding) or interactive decoding (such as
150 scrubbing or instant replay) is not disallowed or discouraged; however,
151 no container feature requires nonlinear access to the bitstream.
153 <h3>Variable Bit Rate, Variable Payload Size</h3>
155 <p>Ogg is designed to contain any size data payload with bounded,
156 predictable efficiency. Ogg packets have no maximum size and a
157 zero-byte minimum size. There is no restriction on size changes from
158 packet to packet. Variable size packets do not require the use of any
159 optional or additional container features. There is no optimal
160 suggested packet size, though special consideration was paid to make
161 sure 50-200 byte packets were no less efficient than larger packet
162 sizes. The original design criterion was a 2% overhead at 50 byte
163 packets, dropping to a maximum working overhead of 1% with larger
164 packets, and a typical working overhead of 0.5-0.7% for most practical
167 <h3>Simple pagination</h3>
169 <p>Ogg is a byte-aligned container with no context-dependent, optional
170 or variable-length fields. Ogg requires no repacking of codec data.
171 The page structure is written out in-line as packet data is submitted
172 to the streaming abstraction. In addition, it is possible to
173 implement both Ogg mux and demux as MT-hot zero-copy abstractions (as
174 is done in the Tremor sourcebase).
178 <p>Ogg is designed for efficient and immediate stream capture with
179 high confidence. Although packets have no size limit in Ogg, pages
180 are a maximum of just under 64kB, meaning that any Ogg stream can be
181 captured with confidence after seeing 128kB of data or less [worst
182 case; typical figure is 6kB] from any random starting point in the
187 <p>Ogg implements simple coarse- and fine-grained seeking by design.
189 <p>Coarse seeking may be performed by simply 'moving the tone arm' to a
190 new position and 'dropping the needle'. Rapid capture with
191 accompanying timecode from any location in an Ogg file is guaranteed
192 by the stream design. From the acquisition of the first timecode,
193 all data needed to play back from that time code forward is ahead of
196 <p>Ogg implements full sample-granularity seeking using an
197 interpolated bisection search built on the capture and timecode
198 mechanisms used by coarse seeking. As above, once a search finds
199 the desired timecode, all data needed to play back from that time code
200 forward is ahead of the stream cursor.
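<p>The bisection search described above can be sketched as follows. This is
only an illustration under stated assumptions: the <tt>PAGES</tt> list stands
in for a file, <tt>capture()</tt> stands in for scanning forward to the next
page, and all names are hypothetical rather than part of any Ogg library. The
search interpolates a byte offset from the known timestamps, falls back to a
plain midpoint when a probe finds no new page, and finishes with a short
linear scan once the window is smaller than one maximum-size page.</p>

```python
from bisect import bisect_left

# Hypothetical stand-in for a file: one (byte_offset, time_in_seconds) entry
# per page. A real demuxer would locate pages by scanning for the capture
# pattern rather than consulting a list.
PAGES = [(i * 4000, i * 0.5) for i in range(200)]
OFFSETS = [offset for offset, _ in PAGES]
FILE_END = 200 * 4000
END_TIME = 200 * 0.5


def capture(offset):
    """Return the first page that starts at or after `offset`, or None."""
    i = bisect_left(OFFSETS, offset)
    return PAGES[i] if i < len(PAGES) else None


def seek(target, scan_window=65307):
    """Return the last page whose timestamp is <= target."""
    lo, lo_time = PAGES[0]            # a page known to be at or before target
    hi, hi_time = FILE_END, END_TIME  # every page from `hi` on is after target
    bisect_next = False
    while hi - lo > scan_window:
        span = hi_time - lo_time
        if bisect_next or span <= 0:
            guess = (lo + hi) // 2    # plain bisection fallback
        else:
            guess = lo + int((hi - lo) * (target - lo_time) / span)
        guess = max(lo + 1, min(guess, hi - 1))
        page = capture(guess)
        if page is None or page[0] >= hi:
            hi = guess                # no new page in [guess, hi)
            bisect_next = True
        elif page[1] <= target:
            lo, lo_time = page        # target is at or after this page
            bisect_next = False
        else:
            hi, hi_time = page        # target is before this page
            bisect_next = False
    # The window is now small: scan forward page by page.
    best = (lo, lo_time)
    while True:
        page = capture(best[0] + 1)
        if page is None or page[1] > target:
            return best
        best = page
```

<p>No index is consulted at any point; the only inputs are the timestamps of
pages captured at the probed offsets.</p>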
202 <p>Both coarse and fine seeking use the page structure and sequencing
203 inherent to the Ogg format. All Ogg streams are fully seekable from
204 creation; seekability is unaffected by truncation or missing data, and
205 is tolerant of gross corruption. Seek operations are neither 'fuzzy' nor
208 <p>Seeking without use of an index is a major point of the Ogg
209 design. There are two primary reasons why Ogg transport forgoes an index:
213 <li>An index is only marginally useful in Ogg for the complexity
214 added; it adds no new functionality and seldom improves performance
215 noticeably. Empirical testing shows that indexless interpolation
216 search does not require many more seeks in practice than using an
219 <li>'Optional' indexes encourage lazy implementations that can seek
220 only when indexes are present, or that implement indexless seeking
221 only by building an internal index after reading the entire file
222 beginning to end. This has been the fate of other containers that
223 specify optional indexing.
227 <p>In addition, it must be possible to create an Ogg stream in a
228 single pass. Although an optional index can simply be tacked on the
229 end of the created stream, some software groups object to
230 end-positioned indexes and claim to be unwilling to support indexes
231 not located at the stream beginning.
233 <p><i>All this said, it's become clear that an optional index is a
234 demanded feature. For this reason, the <a
235 href="http://wiki.xiph.org/Ogg_Index">OggSkeleton now defines a
236 proposed index.</a></i>
238 <h3>Simple multiplexing</h3>
240 <p>Ogg multiplexes streams by interleaving pages from multiple elementary streams into a
241 multiplexed stream in time order. The multiplexed pages are not
242 altered. Muxing an Ogg AV stream out of separate audio,
243 video and data streams is akin to shuffling several decks of cards
244 together into a single deck; the cards themselves remain unchanged.
245 Demultiplexing is similarly simple (as the cards are marked).
247 <p>The goal of this design is to make the mux/demux operation as
248 trivial as possible to allow live streaming systems to build and
249 rebuild streams on the fly with minimal CPU usage and no additional
250 storage or latency requirements.
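<p>The card-shuffling analogy can be made concrete with a minimal sketch. The
<tt>Page</tt> tuple and its field names are assumptions made for
illustration; real pages carry a granule position that the codec maps to a
time. Muxing is a merge of already-ordered page lists, and demuxing sorts
pages back into piles by serial number.</p>

```python
import heapq
from collections import defaultdict, namedtuple

# Hypothetical page model: stream serial number, timestamp, opaque payload.
Page = namedtuple("Page", "serial time payload")


def mux(*streams):
    """Interleave pages from several streams in time order, unmodified."""
    return list(heapq.merge(*streams, key=lambda page: page.time))


def demux(pages):
    """Route each page to its logical stream using the serial number."""
    streams = defaultdict(list)
    for page in pages:
        streams[page.serial].append(page)
    return dict(streams)


audio = [Page(1, t, b"a") for t in (0.1, 0.2, 0.3, 0.4)]
video = [Page(2, t, b"v") for t in (0.15, 0.35)]
muxed = mux(audio, video)
```

<p>Because the pages themselves are never rewritten, demuxing the result
returns exactly the page lists that went in.</p>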
252 <h3>Continuous and Discontinuous Media</h3>
254 <p>Ogg streams belong to one of two categories, "Continuous" streams and
255 "Discontinuous" streams.
257 <p>A stream that provides a gapless, time-continuous media type with a
258 fine-grained timebase is considered to be 'Continuous'. A continuous
259 stream should never be starved of data. Examples of continuous data
260 types include broadcast audio and video.
262 <p>A stream that delivers data in a potentially irregular pattern or
263 with widely spaced timing gaps is considered to be 'Discontinuous'. A
264 discontinuous stream may be best thought of as data representing
265 scattered events; although they happen in order, they are typically
266 unconnected data often located far apart. One example of a
267 discontinuous stream type would be captioning such as <a
268 href="http://wiki.xiph.org/OggKate">Ogg Kate</a>. Although it's
269 possible to design captions as a continuous stream type, it's most
270 natural to think of captions as widely spaced pieces of text with
271 little happening between.
273 <p>The fundamental reason for the distinction between continuous and
274 discontinuous streams concerns buffering.
278 <p>A continuous stream is, by definition, gapless. Ogg buffering is based
279 on the simple premise of never allowing an active continuous stream
280 to starve for data during decode; buffering works ahead until all
281 continuous streams in a physical stream have data ready and no further.
283 <p>Discontinuous stream data is not assumed to be predictable. The
284 buffering design takes discontinuous data 'as it comes' rather than
285 working ahead to look for future discontinuous data for a potentially
286 unbounded period. Thus, the buffering process makes no attempt to fill
287 discontinuous stream buffers; their pages simply 'fall out' of the
288 stream when continuous streams are handled properly.
290 <p>Buffering requirements in this design need not be explicitly
291 declared or managed in the encoded stream. The decoder simply reads as
292 much data as is necessary to keep all continuous stream types gapless
293 and no more, with discontinuous data processed as it arrives in the
294 continuous data. Buffering is implicitly optimal for the given
295 stream. Because all pages of all data types are stamped with absolute
296 timing information within the stream, inter-stream synchronization
297 timing is always maintained without the need for explicitly declared
298 buffer-ahead hinting.
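<p>A minimal sketch of this buffering rule, assuming a hypothetical
<tt>Page</tt> model and a decoder that already knows which serial numbers the
codecs have declared continuous: pages are read until every continuous stream
has data, and discontinuous pages are simply collected as they go by.</p>

```python
from collections import deque, namedtuple

# Hypothetical page model: stream serial number and timestamp.
Page = namedtuple("Page", "serial time")


def fill_buffers(pages, buffers, continuous):
    """Read pages until every continuous stream has data buffered.

    `pages` is an iterator over the physical stream, `buffers` maps each
    serial number to a deque, and `continuous` is the set of serial numbers
    declared continuous. Returns the number of pages read.
    """
    pages_read = 0
    while any(not buffers[serial] for serial in continuous):
        page = next(pages, None)
        if page is None:
            break                      # end of the physical stream
        buffers[page.serial].append(page)
        pages_read += 1
    return pages_read


# Serials 1 (audio) and 2 (video) are continuous; 3 (captions) is not.
stream = iter([Page(1, 0.1), Page(3, 0.1), Page(1, 0.2),
               Page(2, 0.25), Page(1, 0.3), Page(3, 5.0)])
buffers = {1: deque(), 2: deque(), 3: deque()}
pages_read = fill_buffers(stream, buffers, continuous={1, 2})
```

<p>In this example the reader stops as soon as the first video page arrives.
The caption page at 0.1 seconds has been picked up along the way, and no
attempt is made to read ahead to the caption at 5.0 seconds.</p>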
300 <h3>Codec metadata</h3>
302 <p>Ogg does not replicate codec-specific metadata into the mux layer
303 in an attempt to make the mux and codec layer implementations 'fully
304 separable'. Things like specific timebase, keyframing strategy, frame
305 duration, etc, do not appear in the Ogg container. The mux layer is,
306 instead, expected to query a codec through a centralized interface,
307 left to the implementation, for this data when it is needed.
309 <p>Though modern design wisdom usually prefers to predict all possible
310 needs of current and future codecs then embed these dependencies and
311 the required metadata into the container itself, this strategy
312 increases container specification complexity, fragility, and rigidity.
313 The mux and codec code becomes more independent, but the
314 specifications become logically less independent. A codec can't do
315 what a container hasn't already provided for. Novel codecs are harder
316 to support, and you can do fewer useful things with the ones you've
317 already got (eg, try to make a good splitter without using any codecs.
318 Such a splitter is limited to splitting at keyframes only, or building
319 yet another new mechanism into the container layer to mark what frames
322 <p>Ogg's design goes the opposite direction, where the specification
323 is to be as simple, easy to understand, and 'proofed' against novel
324 codecs as possible. When an Ogg mux layer requires codec-specific
325 information, it queries the codec (or a codec stub). This trades a
326 more complex implementation for a simpler, more flexible
329 <h3>Stream structure metadata</h3>
331 <p>The Ogg container itself does not define a metadata system for
332 declaring the structure and interrelations between multiple media
333 types in a muxed stream. That is, the Ogg container itself does not
334 specify data like 'which stream is the subtitle stream?' or 'which
335 video stream is the primary angle?'. This metadata still exists, but
336 is stored by the Ogg container rather than being built into the Ogg
337 container itself. Xiph specifies the 'Skeleton' metadata format for Ogg
338 streams, but this decoupling of container and stream structure
339 metadata means it is possible to use Ogg with any metadata
340 specification without altering the container itself, or without stream
341 structure metadata at all.
343 <h3>Frame accurate absolute position</h3>
345 <p>Every Ogg page is stamped with a 64 bit 'granule position' that
346 serves as an absolute timestamp for mux and seeking. A few nifty
347 little tricks are usually also embedded in the granpos state, but
348 we'll leave those aside for the moment (strictly speaking, they're
349 part of each codec's mapping, not Ogg).
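<p>As a rough illustration of codec-side mapping, the sketch below shows two
simplified ways a codec might turn a granule position into seconds. Both
functions and their parameters are hypothetical; they are loosely modelled on
how audio codecs count samples and how some video codecs pack a keyframe
index and a frame offset into one value, and they omit the details that the
actual codec mappings define.</p>

```python
def sample_count_time(granulepos, sample_rate):
    """Audio-style mapping: the granule position counts PCM samples."""
    return granulepos / sample_rate


def split_granule_time(granulepos, shift, fps_num, fps_den):
    """Video-style mapping: high bits hold the frame index of the last
    keyframe, the low `shift` bits hold the frames elapsed since then."""
    keyframe = granulepos >> shift
    offset = granulepos & ((1 << shift) - 1)
    return (keyframe + offset) * fps_den / fps_num


audio_seconds = sample_count_time(44100 * 3, sample_rate=44100)
video_seconds = split_granule_time((100 << 6) | 5, shift=6, fps_num=30, fps_den=1)
```

<p>The mux layer never needs to know which of these schemes is in use; it
only asks the codec for the resulting time.</p>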
351 <p>As mentioned above, granule positions are mapped into
352 absolute timestamps by the codec, rather than being a hard timestamp.
353 This allows maximally efficient use of the available 64 bits to
354 address every sample/frame position without approximation while
355 supporting new and previously unknown timebase encodings without
356 needing to extend or update the mux layer. When a codec needs a novel
357 timebase, it simply brings the code for that mapping along with it.
358 This is not a theoretical curiosity; new, wholly novel timebases were
359 deployed with the adoption of both Theora and Dirac. "Rolling INTRA"
360 (keyframeless video) also benefits from novel use of the granule
363 <h2>Ogg stream arrangement</h2>
365 <h3>Packets, pages, and bitstreams</h3>
367 <p>Ogg codecs place raw compressed data into <em>packets</em>.
368 Packets are octet payloads containing the data needed for a single
369 decompressed unit, eg, one video frame. Packets have no maximum size
370 and may be zero length. They do not generally have any framing
371 information; strung together, the unframed packets form a <em>logical
372 bitstream</em> of codec data with no internal landmarks.
374 <div class="caption">
375 <img src="packets.png" alt="packets"/>
377 <p> Packets of raw codec data are not typically internally framed.
378 When they are strung together into a stream without any container to
379 provide framing, they lose their individual boundaries. Seek and
380 capture are not possible within an unframed stream, and for many
381 codecs with variable length payloads and/or early-packet termination
382 (such as Vorbis), it may become impossible to recover the original
383 frame boundaries even if the stream is scanned linearly from
388 <p>Logical bitstream packets are grouped and framed into Ogg pages
389 along with a unique stream <em>serial number</em> to produce a
390 <em>physical bitstream</em>. An <em>elementary stream</em> is a
391 physical bitstream containing only a single logical bitstream. Each
392 page is a self contained entity, although a packet may be split and
393 encoded across one or more pages. The page decode mechanism is
394 designed to recognize, verify and handle single pages at a time from
395 the overall bitstream.
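<p>To illustrate how a single page can be recognized and handled on its own,
here is a minimal parsing sketch. The field layout follows the published
framing specification; the function names and the returned dictionary are
assumptions made for this example, and checksum verification, which a real
decoder uses to reject false capture-pattern matches, is left out.</p>

```python
import struct

# Fixed header fields, little-endian: capture pattern, version, header type
# flags, granule position, stream serial number, page sequence number,
# CRC checksum, segment count.
HEADER = struct.Struct("<4sBBqIIIB")
CONTINUED, BOS, EOS = 0x01, 0x02, 0x04


def parse_page(buf, pos=0):
    """Parse one page starting at `pos`; CRC verification is omitted here."""
    magic, version, flags, granulepos, serial, seq, crc, nsegs = \
        HEADER.unpack_from(buf, pos)
    if magic != b"OggS" or version != 0:
        raise ValueError("no Ogg page at this offset")
    table = pos + HEADER.size          # start of the packet length list
    lacing = buf[table:table + nsegs]
    body = table + nsegs               # start of the payload data
    return {
        "flags": flags, "granulepos": granulepos, "serial": serial,
        "sequence": seq, "lacing": list(lacing),
        "body": buf[body:body + sum(lacing)],
        "next": body + sum(lacing),    # offset just past this page
    }


def capture(buf, start=0):
    """Find and parse the first page at or after `start`, or return None."""
    pos = buf.find(b"OggS", start)
    return parse_page(buf, pos) if pos >= 0 else None


# A synthetic page holding one 300-byte packet, preceded by unrelated bytes.
page = HEADER.pack(b"OggS", 0, BOS, 0, 0x1234, 0, 0, 2) + bytes([255, 45]) + b"x" * 300
info = capture(b"junk" + page)
```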
397 <div class="caption">
398 <img src="pages.png" alt="pages"/>
400 <p> The primary purpose of a container is to provide framing for raw
401 packets, marking the packet boundaries so the exact packets can be
402 retrieved for decode later. The container also provides secondary
403 functions such as capture, timestamping, sequencing, stream
404 identification and so on. Not all of these functions are represented in the diagram.
406 <p>In the Ogg container, pages do not necessarily contain
407 integer numbers of packets. Packets may span across page boundaries
408 or even multiple pages. This is necessary as pages have a maximum
409 possible size in order to provide capture guarantees, but packet
414 <p><a href="framing.html">Ogg Bitstream Framing</a> specifies
415 the page format of an Ogg bitstream, the packet coding process
416 and elementary bitstreams in detail.
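<p>As a small worked example of the packet coding that the framing
specification describes, the sketch below encodes a packet length as a run of
255-valued entries followed by a final entry below 255, then estimates
container overhead for pages filled with equal-size packets. The helper names
are hypothetical, and the overhead figure assumes whole packets per page and
a 27-byte fixed header.</p>

```python
def lacing_values(packet_len):
    """Encode a packet length as 255-valued entries plus a final entry < 255."""
    return [255] * (packet_len // 255) + [packet_len % 255]


def page_overhead(packet_len, packets_per_page):
    """Fraction of a page spent on header bytes for equal-size packets."""
    segments = len(lacing_values(packet_len)) * packets_per_page
    if segments > 255:
        raise ValueError("a page holds at most 255 length entries")
    header = 27 + segments             # fixed header plus the length list
    payload = packet_len * packets_per_page
    return header / (header + payload)


small = page_overhead(50, 255)         # pages packed with 50-byte packets
large = page_overhead(4000, 15)        # pages packed with 4000-byte packets
```

<p>A packet whose length is an exact multiple of 255 still ends with a zero
entry, which is how the decoder tells a finished packet from one that
continues.</p>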
418 <h3>Multiplexed bitstreams</h3>
420 <p>Multiple logical/elementary bitstreams can be combined into a single
421 <em>multiplexed bitstream</em> by interleaving whole pages from each
422 contributing elementary stream in time order. The result is a single
423 physical stream that multiplexes and frames multiple logical streams.
424 Each logical stream is identified by the unique stream serial number
425 stamped in its pages. A physical stream may include a 'meta-header'
426 (such as the <a href="skeleton.html">Ogg Skeleton</a>) comprising its
427 own Ogg page at the beginning of the physical stream. A decoder
428 recovers the original logical/elementary bitstreams out of the
429 physical bitstream by taking the pages in order from the physical
430 bitstream and redirecting them into the appropriate logical decoding
433 <div class="caption">
434 <img src="multiplex1.png" alt="multiplex"/>
436 <p>Multiple media types are multiplexed into a single Ogg stream by
437 interleaving the pages from each elementary physical stream.
441 <p><a href="ogg-multiplex.html">Ogg Bitstream Multiplexing</a> specifies
442 proper multiplexing of an Ogg bitstream in detail.
446 <p>Multiple Ogg physical bitstreams may be concatenated into a single new
447 stream; this is <em>chaining</em>. The bitstreams do not overlap; the
448 final page of a given logical bitstream is immediately followed by the
449 initial page of the next.</p>
451 <p>Each logical bitstream in a chain must have a unique serial number
452 within the scope of the full physical bitstream, not only within a
453 particular <em>link</em> or <em>segment</em> of the chain.</p>
455 <h3>Continuous and discontinuous streams</h3>
457 <p>Within Ogg, each stream must be declared (by the codec) to be
458 continuous- or discontinuous-time. Most codecs treat all streams they
459 use as either inherently continuous- or discontinuous-time, although
460 this is not a requirement. A codec may, as part of its mapping, choose
461 according to data in the initial header.
463 <p>Continuous-time pages are stamped by end-time; discontinuous pages
464 are stamped by begin-time. Pages in a multiplexed stream are
465 interleaved in order of the time stamp regardless of stream type.
466 Both continuous and discontinuous logical streams are used to seek
467 within a physical stream; however, only continuous streams are used to
468 determine buffering depth; because discontinuous streams are stamped
469 by start time, they will always 'fall out' at the proper time when
470 buffering the continuous streams. See 'Examples' for an illustration
471 of the buffering mechanism.
473 <h2>Multiplexing Requirements</h2>
475 <p>Multiplexing requirements within Ogg are straightforward. When
476 constructing a single-link (unchained) physical bitstream consisting
477 of multiple elementary streams:
481 <li><p> The initial header for each stream appears in sequence, each
482 header on a single page. All initial headers must appear with no
483 intervening data (no auxiliary header pages or packets, no data pages
484 or packets). Order of the initial headers is unspecified. The
485 'beginning of stream' flag is set on each initial header.
487 <li><p> All auxiliary headers for all streams must follow. Order
488 is unspecified. The final auxiliary header of each stream must flush
491 <li><p>Data pages for each stream follow, interleaved in time order.
493 <li><p>The final page of each stream sets the 'end of stream' flag.
494 Unlike initial pages, terminal pages for the logical bitstreams need
495 not occur contiguously; indeed it may not be possible for them to do so.
498 <p>Each grouped bitstream must have a unique serial number within the
499 scope of the physical bitstream.</p>
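<p>The requirements above can be checked mechanically. The following sketch
validates a single link given only each page's serial number and header type
flags; the tuple model and the function name are assumptions made for
illustration, and the auxiliary header and time ordering rules are not
checked.</p>

```python
BOS, EOS = 0x02, 0x04                  # header type flag bits


def check_link(pages):
    """Validate one unchained link given (serial, flags) per page.

    Raises ValueError on the first violated requirement.
    """
    seen, ended = set(), set()
    in_initial_headers = True
    for serial, flags in pages:
        if flags & BOS:
            if not in_initial_headers:
                raise ValueError("initial header after other pages")
            if serial in seen:
                raise ValueError("duplicate serial number")
            seen.add(serial)
        else:
            in_initial_headers = False
            if serial not in seen:
                raise ValueError("page for a stream with no initial header")
            if serial in ended:
                raise ValueError("page after end of stream")
        if flags & EOS:
            ended.add(serial)
    if ended != seen:
        raise ValueError("stream without an end-of-stream page")


good = [(1, BOS), (2, BOS), (1, 0), (2, 0), (1, EOS), (2, 0), (2, EOS)]
check_link(good)                       # passes silently
```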
501 <h3>Chaining and multiplexing</h3>
503 <p>Multiplexed and/or unmultiplexed bitstreams may be chained
504 consecutively. Such a physical bitstream obeys all the rules of both
505 chained and multiplexed streams. Each link, when unchained, must
506 stand on its own as a valid physical bitstream. Chained streams do
507 not mix or interleave; a new segment may not begin until all streams
508 in the preceding segment have terminated. </p>
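<p>One way to picture unchaining is the sketch below, which splits a page
sequence into links by starting a new link whenever a 'beginning of stream'
page arrives while no stream is still open. The tuple model and function name
are hypothetical, and the rule is a simplification: two multiplexed streams
that each consist of a single page would be reported as two links.</p>

```python
BOS, EOS = 0x02, 0x04                  # header type flag bits


def split_links(pages):
    """Split (serial, flags) pages of a chained stream into its links."""
    links, open_streams = [], set()
    for serial, flags in pages:
        if flags & BOS and not open_streams:
            links.append([])           # every earlier stream has terminated
        if not links:
            raise ValueError("stream does not begin with an initial header")
        links[-1].append((serial, flags))
        if flags & BOS:
            open_streams.add(serial)
        if flags & EOS:
            open_streams.discard(serial)
    return links


chain = [(1, BOS), (2, BOS), (1, 0), (2, EOS), (1, EOS),
         (3, BOS), (3, 0), (3, EOS)]
links = split_links(chain)
```

<p>Each resulting link can then be handled as an independent physical
bitstream.</p>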
510 <h2>Codec Mapping Requirements</h2>
512 <p>Each codec is allowed some freedom in deciding how its logical
513 bitstream is encapsulated into an Ogg bitstream (even if it is a
514 trivial mapping, eg, 'plop the packets in and go'). This is the
515 codec's <em>mapping</em>. Ogg imposes a few mapping requirements
520 <li><p>The <a href="framing.html">framing specification</a> defines
521 'beginning of stream' and 'end of stream' page markers via a header
522 flag (it is possible for a stream to consist of a single page). A
523 correct stream always consists of an integer number of pages, an easy
524 requirement given the variable size nature of pages.</p>
526 <li><p>The first page of an elementary Ogg bitstream consists of a single,
527 small 'initial header' packet that must include sufficient information
528 to identify the exact CODEC type. From this initial header, the codec
529 must also be able to determine its timebase and whether or not it is a
530 continuous- or discontinuous-time stream. The initial header must fit
531 on a single page. If a codec makes use of auxiliary headers (for
532 example, Vorbis uses two auxiliary headers), these headers must follow
533 the initial header immediately. The last header finishes its page;
534 data begins on a fresh page.
536 <p>As an example, Ogg Vorbis places the name and revision of the
537 Vorbis CODEC, the audio rate and the audio quality into this initial
538 header. Vorbis comments and detailed codec setup appear in the larger
539 auxiliary headers.</p>
541 <li><p>Granule positions must be translatable to an exact absolute
542 time value. As described above, the mux layer is permitted to query a
543 codec or codec stub plugin to perform this mapping. It is not
544 necessary for an absolute time to be mappable into a single unique
545 granule position value.
547 <li><p>Codecs are not required to use a fixed duration-per-packet (for
548 example, Vorbis does not). The mux layer is permitted to query a
549 codec or codec stub plugin for the time duration of a packet.
551 <li><p>Although an absolute time need not be translatable to a unique
552 granule position, a codec must be able to determine the unique granule
553 position of the current packet using the granule position of a
556 <li><p>Packets and pages must be arranged in ascending
557 granule-position and time order.
563 <em>[More to come shortly; this section is currently being revised and expanded]</em>
565 <p>Below, we present an example of a multiplexed and chained bitstream:</p>
567 <p><img src="stream.png" alt="stream"/></p>
569 <p>In this example, we see pages from five total logical bitstreams
570 multiplexed into a physical bitstream. Note the following
574 <li>Multiplexed bitstreams in a given link begin together; all of the
575 initial pages must appear before any data pages. When concurrently
576 multiplexed groups are chained, the new group does not begin until all
577 the bitstreams in the previous group have terminated.</li>
579 <li>The ordering of pages of concurrently multiplexed bitstreams is
580 governed by timestamp (not shown here); there is no regular
581 interleaving order. Pages within a logical bitstream appear in
586 The Xiph Fish Logo is a
587 trademark (™) of Xiph.Org.<br/>
589 These pages © 1994 - 2010 Xiph.Org. All rights reserved.