The TGX/TGW file format is a proprietary archive format used by TimeGate Studios to store asset files for some of their games, notably Kohan: Immortal Sovereigns and Kohan: Ahriman's Gift.
It starts with a header giving information about the file as a whole, as well as about all the files contained within. The rest of the file contains the archived files themselves, seemingly spaced at intervals that make for nice round numbers in hexadecimal.
The header has several sections:
- General file information
- Specifications of individual files
- Lengths of individual files
- Locations of individual files
The general file information section holds data such as the version number of the file, its checksum, its length, and the number of files stored. I can reliably read these values, although the number of files appears more than once and there are many other fields that I do not understand, so it is not yet possible to write these files.
The structure of it seems to be as follows:
Offset | Description |
---|---|
0x00 | magic number |
0x04 | unknown |
0x08 | constant? |
0x0c | packed version |
0x10 | XOR checksum |
0x14 | file length |
... | unknown |
0x3c | filespec offset |
0x40 | filespec count |
0x44 | filelength offset |
0x48 | filelength count |
0x4c | filepos offset |
0x50 | filepos count |
The magic number appears to denote a .tgx file with 0x0001000f and a .tgw file with 0x0001000c.
Only the first of the next 4 bytes seems to actually be used. It is constant at 0x1F in the tgw files for both Kohan: Immortal Sovereigns (KIS) and Kohan: Ahriman's Gift (KAG), but varies in different mods.
The part labelled 'constant?' is 0xFA7E843F in every example I have seen of either tgx or tgw files.
The file version number is packed in a strange way in that it is only readable once converted to decimal, and appears to be generated by replacing every decimal point with a zero.
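As a rough illustration of that packing, the sketch below assumes the version starts life as a dotted string. The function name `pack_version` and the version "1.2.3" are made up for the example and do not come from any real archive.

```python
def pack_version(version: str) -> int:
    """Pack a dotted version string by replacing every '.' with '0'.

    This mirrors the observed behaviour only; the game's actual
    routine is unknown.
    """
    return int(version.replace(".", "0"))


# "1.2.3" is a hypothetical version used purely for illustration.
packed = pack_version("1.2.3")
print(packed)       # 10203
print(hex(packed))  # 0x27db
```

The hexadecimal form gives no hint of the original version, which is why the field only makes sense once it is printed in decimal. Note that the packing is not reversible in general, since a zero in the packed value may be either a separator or a genuine digit.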
The checksum follows, and is simply the bitwise XOR of all 32-bit values in a file. I wrote a Python function to calculate this, but it runs very slowly, so I will probably refactor it into C (callable from Python) at some point. The use of XOR means that once the checksum has been written to the file, running the calculation again will return 0. The game seems to ignore the checksum, however, and has no qualms about loading mods where it has been zeroed out.
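A minimal sketch of such a checksum is shown below. It is not the code in checksum.py. It assumes little-endian 32-bit words and zero-pads any trailing bytes, neither of which has been confirmed against the game; the function name `xor_checksum` is illustrative.

```python
import struct
from functools import reduce
from operator import xor


def xor_checksum(data: bytes) -> int:
    """XOR together every 32-bit word in data.

    Assumptions: words are little-endian, and data whose length is
    not a multiple of 4 is padded with zero bytes.
    """
    data = data + b"\x00" * (-len(data) % 4)
    words = struct.unpack(f"<{len(data) // 4}I", data)
    return reduce(xor, words, 0)


# Synthetic 24-byte blob with the checksum field (0x10) zeroed out.
blob = bytearray(b"\x01\x02\x03\x04" * 6)
blob[0x10:0x14] = b"\x00" * 4
checksum = xor_checksum(bytes(blob))

# Writing the checksum into the file makes the next run return 0.
blob[0x10:0x14] = struct.pack("<I", checksum)
print(hex(checksum), xor_checksum(bytes(blob)))
```

Unpacking the whole buffer in one `struct.unpack` call avoids a Python-level loop over individual words, which is likely to be the main cost in a naive implementation.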
The number of files appears 3 times: once for each of the following header sections, each time immediately after the offset of that section in the file. The first offset should always be 0x74, as the first part of the header seems to have constant size, while the following offsets depend on the number of stored files, as each section will be larger or smaller.
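The known fields of the general header can be read with a few `struct` calls, as in the sketch below. The field names are my own labels for the table entries, and little-endian byte order is an assumption. The demo data is synthetic rather than taken from a real archive.

```python
import struct

# Offsets from the header table; the names are illustrative labels.
HEADER_FIELDS = {
    "magic": 0x00,
    "unknown_04": 0x04,
    "constant": 0x08,
    "packed_version": 0x0C,
    "checksum": 0x10,
    "file_length": 0x14,
    "filespec_offset": 0x3C,
    "filespec_count": 0x40,
    "filelength_offset": 0x44,
    "filelength_count": 0x48,
    "filepos_offset": 0x4C,
    "filepos_count": 0x50,
}


def parse_header(data: bytes) -> dict:
    """Read each known 32-bit field (assumed little-endian)."""
    return {
        name: struct.unpack_from("<I", data, offset)[0]
        for name, offset in HEADER_FIELDS.items()
    }


# Synthetic header: a .tgx magic number and three file specs at 0x74.
demo = bytearray(0x74)
struct.pack_into("<I", demo, 0x00, 0x0001000F)
struct.pack_into("<I", demo, 0x3C, 0x74)
struct.pack_into("<I", demo, 0x40, 3)

header = parse_header(bytes(demo))
print(hex(header["magic"]), hex(header["filespec_offset"]), header["filespec_count"])
```

Keeping the offsets in a single dictionary makes it easy to add fields as the unknown parts of the header are worked out.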
In the file specification section, each contained file is described, including its location in the game's filesystem, its length, and some other attributes that I've not worked out yet.
Each file spec begins with 80 bytes reserved for the path, stored as a null-terminated C-style string. This is presumably the location used by the game to reference the files it needs.
The rest of the file spec is arranged as follows:
Offset | Description |
---|---|
0x00 | File identifier |
0x04 | File length |
0x08 | Constant 1? |
0x0c | Index in archive |
0x10 | Header offset |
0x14 | Header length |
The file identifier is generated using a fairly simple string hashing algorithm. Exactly how it is used is not yet understood, but it does seem to determine the order of the subfiles within the archive, which are sorted from the lowest identifier to the highest. The identifier is followed by the length of the included file.
The next two values relate to the position of the file in the archive itself, with the first seeming to always be 1, and the second being the index of the file within the archive.
For file types with no header, the last two entries are always 0. For types with a header these contain the header offset (the sum of all previous header lengths in the archive), and the length of this file's header. For example, a .wav file has a header of 36 bytes, and so for an archive containing only .wav files, the third file will have a header offset of 0x48 and a length of 0x24.
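A single file spec, as described above, could be decoded along these lines. This is a sketch that assumes the six fields follow the 80-byte path directly, that values are little-endian, and that paths are ASCII. The `FileSpec` type, its field names, and the example path are hypothetical.

```python
import struct
from typing import NamedTuple

# 80-byte path followed by six 32-bit words (little-endian assumed).
SPEC_FORMAT = "<80s6I"
SPEC_SIZE = struct.calcsize(SPEC_FORMAT)  # 104 bytes under these assumptions


class FileSpec(NamedTuple):
    path: str
    identifier: int
    length: int
    constant: int
    index: int
    header_offset: int
    header_length: int


def parse_filespec(data: bytes, offset: int = 0) -> FileSpec:
    """Decode one file spec starting at offset."""
    raw_path, *words = struct.unpack_from(SPEC_FORMAT, data, offset)
    # Keep only the bytes before the first null terminator.
    path = raw_path.split(b"\x00", 1)[0].decode("ascii", errors="replace")
    return FileSpec(path, *words)


# Synthetic spec for the third .wav file in an all-.wav archive:
# two earlier 36-byte headers give a header offset of 0x48.
raw = struct.pack(SPEC_FORMAT, b"example/third.wav", 0x12345678, 1000, 1, 2, 0x48, 0x24)
spec = parse_filespec(raw)
print(spec.path, spec.index, hex(spec.header_offset), hex(spec.header_length))
```

Under these assumptions the n-th spec would start at the filespec offset plus `n * SPEC_SIZE`.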
The following section consists of the file lengths followed by the indices of the corresponding files in the form: 0xlength 0x000001 0x0index.
Each of these is padded with two empty 32-bit words, so the full length spec follows the format:
Offset | Description |
---|---|
0x00 | Padding 0x0 |
0x04 | Padding 0x0 |
0x08 | File length |
0x0c | Constant 1 |
0x10 | Index in archive |
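Reading the length section then comes down to stepping through 20-byte entries. The sketch below assumes little-endian words and entries packed back to back; `parse_length_entries` is an illustrative name and the data is synthetic.

```python
import struct

# Two padding words, then length, constant and index (little-endian assumed).
LENGTH_ENTRY = struct.Struct("<5I")


def parse_length_entries(data: bytes, offset: int, count: int) -> list:
    """Return (length, constant, index) for each entry in the section."""
    entries = []
    for i in range(count):
        _pad0, _pad1, length, constant, index = LENGTH_ENTRY.unpack_from(
            data, offset + i * LENGTH_ENTRY.size
        )
        entries.append((length, constant, index))
    return entries


# Two synthetic entries: files of 100 and 200 bytes at indices 0 and 1.
section = LENGTH_ENTRY.pack(0, 0, 100, 1, 0) + LENGTH_ENTRY.pack(0, 0, 200, 1, 1)
print(parse_length_entries(section, 0, 2))
```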
The last header section contains the offsets of the subfiles in the archive file, stored as the start offset followed by the end offset, both as 32-bit words. The entries carry no index or other label; instead, the order specified in the previous sections is assumed. I have not yet tested whether changing the order of the elements in the above sections has any effect on the assumed order of this section.
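With the start and end offsets in hand, extracting the subfiles is a matter of slicing. The following sketch assumes little-endian words and treats the end offset as exclusive, which is a guess that should be checked against real archives. The function names and the demo blob are illustrative.

```python
import struct


def parse_positions(data: bytes, offset: int, count: int) -> list:
    """Return (start, end) pairs from the file position section."""
    return [struct.unpack_from("<2I", data, offset + 8 * i) for i in range(count)]


def extract(data: bytes, positions: list) -> list:
    """Slice each subfile out of the archive (end treated as exclusive)."""
    return [data[start:end] for start, end in positions]


# Synthetic blob: two position entries followed by two tiny payloads.
table = struct.pack("<4I", 16, 19, 19, 24)
blob = table + b"abc" + b"hello"

positions = parse_positions(blob, 0, 2)
print(positions)
print(extract(blob, positions))
```

Because the entries are unlabelled, the i-th pair is matched to a file purely by position, so the result is only meaningful if the order matches that of the earlier sections.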
- tgxlib.py This is a library that contains all the abstractions and handles the heavy lifting for finding and accessing the different files contained within a .tgx or .tgw file.
- tgxdumper.py This takes a path to a .tgx file as argument and extracts all the subfiles into a directory, creating a subdirectory structure that mirrors the directories given within the header of the source file.
- checksum.py Takes a file as argument, and returns the 32-bit XOR checksum of the file. This isn't essential, as Kohan doesn't seem to care whether the checksum is valid before loading, but I think it's nice to have anyway. It is slow in its current implementation.
- magic_number.py Takes a string as argument and returns the file identifier generated from that string. Developed based on the disassembly of the K-Mod executable.