FF7/LZSS format

656 bytes added, 12:01, 23 April 2019
=== LZSS Compressed archive for PSX by [[User:Ficedula|Ficedula]] ===
==== Format ====
The LZSS archive has a very small header at 0x00 that holds the length of the compressed file as an unsigned 32-bit integer. After that is the compressed data. Some files use the .lzs extension, probably to make the extension 3 characters long. This has caused some confusion, since LZS is a different compression method.
==== LZSS compression ====
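As a minimal sketch, the header could be split off like this. The page does not state the byte order, so little-endian is an assumption here (the PSX is a little-endian machine), and the function name <code>split_lzss_header</code> is hypothetical.

```python
import struct

def split_lzss_header(blob: bytes) -> tuple[int, bytes]:
    """Return (declared_length, compressed_data) for an LZSS archive."""
    if len(blob) < 4:
        raise ValueError("archive is shorter than its 4-byte header")
    # Assumption: the length is an unsigned 32-bit little-endian integer ("<I").
    (declared_length,) = struct.unpack_from("<I", blob, 0)
    # Everything after the 4-byte header is the compressed stream.
    return declared_length, blob[4:]
```

Comparing the declared length with the length of the remaining data is a cheap sanity check before decompressing.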
FF7 uses LZSS compression on some of its files - more properly, a slightly modified version of LZSS compression as devised by Professor Haruhiko Okumura. LZSS data works on a control byte scheme. Each block in the file begins with a single byte indicating how much of the block is uncompressed ('literal data'), and how much is compressed ('references'). You read the bits LSB-first, with 1=literal, 0=reference.
Literal data means just that: read one byte in from the source (compressed) data, and write it straight to the output.
References take up two bytes, and are essentially a pointer to a piece of data that's been written out (i.e. is part of the data you've already decompressed). LZSS uses a 4KiB buffer, so it can only reference data in the last 4KiB of data.
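The control byte can be unpacked as in the following sketch, which reads the bits LSB-first and treats 1 as literal and 0 as reference. The helper name <code>control_flags</code> is made up for illustration.

```python
def control_flags(control_byte: int) -> list[bool]:
    """Return eight flags in reading order (LSB first).

    True means the next item is a literal byte,
    False means the next item is a two-byte reference.
    """
    return [bool((control_byte >> bit) & 1) for bit in range(8)]


# 0xFD is 11111101 in binary: read LSB-first, that is one literal,
# then one reference, then six more literals.
flags = control_flags(0xFD)
```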
==== Reference format ====
A reference takes up two bytes, and has two pieces of information in it: offset (where to find the data, or which piece of data is going to be repeated), and length (how long the piece of data is going to be). The two reference bytes look like this:
OOOOOOOO OOOOLLLL
(O = Offset, L = Length)
 
The 1st byte is the least significant byte of the offset. The second byte has the remaining 4 bits of the offset as its **high** nibble, so some shifting is required to extract it properly. The remaining 4 bits are the length minus 3.
So you get a 12-bit offset and a 4-bit length, but both of these values need modifying to work on directly. The length is easy to work with: just add 3 to it. This is because if a piece of repeated data was less than 3 bytes long, you wouldn't bother repeating it - it'd take up no more space to actually just put literal data in. So all references are at least 3 in length: a length of 0 means 3 bytes repeated, 1 means 4 bytes repeated, and so on.
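The bit layout above can be decoded as in this sketch. The function name <code>parse_reference</code> is hypothetical; the shifts follow the description of the two bytes given here.

```python
def parse_reference(byte1: int, byte2: int) -> tuple[int, int]:
    """Split the two reference bytes into (raw_offset, length)."""
    # byte1 is the low 8 bits of the offset; the high nibble of byte2
    # supplies the top 4 bits, so it is moved up to bits 8..11.
    raw_offset = byte1 | ((byte2 & 0xF0) << 4)
    # The low nibble of byte2 stores the length minus 3.
    length = (byte2 & 0x0F) + 3
    return raw_offset, length


# Bytes 0x34 0x12 give offset 0x134 and length 2 + 3 = 5.
offset, length = parse_reference(0x34, 0x12)
```

With a 4-bit field the length ranges from 3 to 18 bytes.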
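The offset takes more work. If you decompress into a flat output buffer, convert it to a position in that buffer like this:
real_offset = tail - ((tail - 18 - raw_offset) mod 4096)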
Here, 'tail' is your current output position (e.g. 10,000), 'raw_offset' is the 12-bit data value you've retrieved from the compressed reference, and 'real_offset' is the position in your output buffer you can begin reading from. This is a bit complex because it's not exactly the way LZSS traditionally does decompression. If you use a 4KiB circular buffer, you can use the offset directly. The offset is absolute, and not relative to the cursor position or the position in the input stream. You should initialize the buffer position to 0xFEE and not zero. The buffer content should be initialized to zero.
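As a worked example of the formula for a flat output buffer, here is a small sketch. The function name <code>real_offset</code> mirrors the variable in the formula and is otherwise an illustration only. Python's <code>%</code> always returns a non-negative result for a positive modulus, which is the behaviour the formula needs.

```python
def real_offset(tail: int, raw_offset: int) -> int:
    """Map a 12-bit raw offset to a position in a flat output buffer."""
    return tail - ((tail - 18 - raw_offset) % 4096)


# tail = 10000, raw_offset = 0x134 (308):
#   (10000 - 18 - 308) mod 4096 = 9674 mod 4096 = 1482
#   real_offset = 10000 - 1482 = 8518
position = real_offset(10000, 0x134)
```

The result can be negative early in a file, for instance <code>real_offset(3, 0xFED)</code> is -1. That position lies before the start of the output, which corresponds to the zero-initialized part of the circular buffer.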
Once you've got to the start position for your reference, you just copy the appropriate length of data over to your output, and you've dealt with that piece of data.
The FF7 files use both of these 'tricks', so you can't ignore them.
 
If you use a circular 4KiB buffer, you can ignore these issues completely, as long as you do a one-byte-at-a-time copy for the references.
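Putting the pieces together, a decompressor built on a circular 4KiB buffer might look like the sketch below. It assumes the input is the compressed stream with the 4-byte header already removed; the name <code>decompress_lzss</code> and the handling of a truncated stream (simply stopping) are choices made for this example, not something the format description specifies.

```python
def decompress_lzss(data: bytes) -> bytes:
    """Decompress an LZSS stream (header already stripped)."""
    window = bytearray(4096)  # circular buffer, initialized to zero
    pos = 0xFEE               # initial buffer position, not zero
    out = bytearray()
    i, n = 0, len(data)

    while i < n:
        control = data[i]
        i += 1
        for bit in range(8):          # control bits are read LSB-first
            if i >= n:
                break                 # stream ended inside a block
            if (control >> bit) & 1:  # 1 = literal: copy one byte through
                byte = data[i]
                i += 1
                out.append(byte)
                window[pos] = byte
                pos = (pos + 1) & 0xFFF
            else:                     # 0 = reference: two bytes
                if i + 1 >= n:
                    i = n             # truncated reference, stop here
                    break
                raw_offset = data[i] | ((data[i + 1] & 0xF0) << 4)
                length = (data[i + 1] & 0x0F) + 3
                i += 2
                # Copy one byte at a time so that a reference may
                # overlap the bytes it is currently writing.
                for k in range(length):
                    byte = window[(raw_offset + k) & 0xFFF]
                    out.append(byte)
                    window[pos] = byte
                    pos = (pos + 1) & 0xFFF
    return bytes(out)


# Three literals "abc", then a reference to buffer offset 0xFEE, length 3.
sample = bytes([0x07]) + b"abc" + bytes([0xEE, 0xF0])
result = decompress_lzss(sample)  # b"abcabc"
```

Because every output byte is also written into the window before the next one is read, a reference that runs into its own output repeats the data, and a reference to a part of the window not yet written reads zeros.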
Anonymous user
