Sunday, April 25, 2010

Reading Outcast PAK files


Good Old Games just released Outcast, one of my favorite games of the late 1990s. They're selling it for $6 USD. In honor of the occasion I've dug up some of my file-format hacking notes. I'm indebted to Dmitry Andreev for much of this info.

Outcast's archive files are stored in the relatively obscure PKWARE Data Compression Library format, a variant of Lempel-Ziv-Huffman dictionary compression with static tables for the Huffman codes. The archive file contains an uncompressed directory followed by all the files, individually compressed. Multi-byte values are stored little-endian (least significant byte first).
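Since every count, offset and size below is a little-endian 4-byte value, it helps to have one small reader for them. This is a minimal Python sketch that assumes the values are unsigned (the post doesn't say). `read_u32` is a hypothetical helper name, and it returns the advanced position so calls can be chained through the file.

```python
import struct

def read_u32(buf: bytes, pos: int) -> tuple[int, int]:
    """Read an unsigned 32-bit little-endian value at pos (assumed unsigned).

    Returns (value, position just past the value).
    """
    (value,) = struct.unpack_from("<I", buf, pos)  # "<" = little-endian, "I" = uint32
    return value, pos + 4

# The bytes 03 00 00 00 hold the value 3 when read least significant byte first.
value, pos = read_u32(bytes([0x03, 0x00, 0x00, 0x00]), 0)
print(value, pos)  # 3 4
```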

All Outcast files start with a 2-byte magic number, 71 6e (hex). For .pak files this is followed by ten more magic bytes: 00 00 d3 d4 7d 09 01 00 00 00. Next comes a 4-byte value indicating whether the files in the archive are compressed: a value of 1 means they are compressed, and a value of 3 means they are not.
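Putting those 16 header bytes together, a header check might look like the sketch below. It treats the whole archive as one in-memory `bytes` object, and `parse_pak_header` is a hypothetical name. It rejects any flag other than 1 or 3, since those are the only two values described here.

```python
import struct

PAK_MAGIC = bytes([0x71, 0x6E])  # common to all Outcast files
PAK_EXTRA_MAGIC = bytes([0x00, 0x00, 0xD3, 0xD4, 0x7D, 0x09, 0x01, 0x00, 0x00, 0x00])

def parse_pak_header(buf: bytes) -> tuple[bool, int]:
    """Validate the .pak header; return (files_are_compressed, offset past header)."""
    if buf[0:2] != PAK_MAGIC:
        raise ValueError("not an Outcast file: bad 2-byte magic")
    if buf[2:12] != PAK_EXTRA_MAGIC:
        raise ValueError("not a .pak file: bad 10-byte magic")
    (flag,) = struct.unpack_from("<I", buf, 12)  # 4-byte little-endian flag
    if flag == 1:
        return True, 16   # files are compressed
    if flag == 3:
        return False, 16  # files are stored uncompressed
    raise ValueError(f"unknown compression flag {flag}")
```

The returned offset (16) is where the file count and directory begin.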

Next up is a 4-byte value containing the number of files in the archive, followed by that many variable-length directory entries. Each directory entry starts with a variable-length string (a 4-byte count followed by that many bytes, with no null terminator). After the string come three 4-byte values: the offset from the start of the archive to the file's contents, the compressed file size, and the uncompressed file size.
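Here is a sketch of walking that directory, starting at the position just past the 16-byte header. `PakEntry` and `parse_directory` are hypothetical names. The string is kept as raw bytes because its text encoding isn't documented here. Nothing is assumed about entry order; each entry simply carries its own offset.

```python
import struct
from typing import NamedTuple

class PakEntry(NamedTuple):
    name: bytes         # raw string bytes; encoding not specified, so left undecoded
    offset: int         # offset of the file's data from the start of the archive
    packed_size: int    # compressed size in bytes
    unpacked_size: int  # uncompressed size in bytes

def parse_directory(buf: bytes, pos: int) -> list[PakEntry]:
    """Parse the file count and directory entries that start at pos."""
    (count,) = struct.unpack_from("<I", buf, pos)
    pos += 4
    entries = []
    for _ in range(count):
        (name_len,) = struct.unpack_from("<I", buf, pos)  # length-prefixed, no terminator
        pos += 4
        name = buf[pos:pos + name_len]
        if len(name) != name_len:
            raise ValueError("directory string runs past end of archive")
        pos += name_len
        offset, packed, unpacked = struct.unpack_from("<3I", buf, pos)
        pos += 12
        entries.append(PakEntry(name, offset, packed, unpacked))
    return entries
```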

After the directory comes the data for the individual files, each compressed separately. I used code by Mark Adler to do the decompression.
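Pulling one file out is then just slicing the archive at the entry's offset and compressed size. Python's standard library has no PKWARE DCL decoder, so this sketch takes one as a parameter. A wrapper around Mark Adler's blast.c (from zlib's contrib/blast directory) is one possible choice, but the `decompress` callable and `extract_file` name are hypothetical.

```python
from typing import Callable, Optional

def extract_file(buf: bytes, offset: int, packed_size: int, unpacked_size: int,
                 compressed: bool,
                 decompress: Optional[Callable[[bytes], bytes]] = None) -> bytes:
    """Return one file's contents using the values from its directory entry."""
    raw = buf[offset:offset + packed_size]
    if len(raw) != packed_size:
        raise ValueError("file data runs past end of archive")
    if not compressed:
        return raw  # header flag 3: stored as-is
    if decompress is None:
        # Hypothetical hook: caller must supply a PKWARE DCL decompressor.
        raise ValueError("archive is compressed; supply a PKWARE DCL decompressor")
    data = decompress(raw)
    if len(data) != unpacked_size:  # sanity check against the directory
        raise ValueError(f"expected {unpacked_size} bytes, got {len(data)}")
    return data
```

Checking the result against the uncompressed size from the directory is a cheap way to catch a misparsed entry or a decoder mismatch.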
