
Gildor's Forums

Author Topic: Unreal Tournament 3 UPK File Format Questions  (Read 32373 times)
Poobah
Guest
« on: November 26, 2009, 13:32 »

Hello,

When parsing the UPK files used by Unreal Tournament 3, I am unsure of how to alter the parsing when the Compression Flags DWORD located directly after the Cooker Version DWORD is non-zero. When I seek to the beginning of the name table as specified in the file header, it appears that the rest of, or some of, the contents of the file are in a compressed format. What possible values should I expect the Compression Flags DWORD to be able to have in this game's UPK files, and how should I process the file differently when the value is non-zero?

Thanks.
Logged
Gildor
Administrator
Hero Member
*****
Posts: 7978



View Profile WWW
« Reply #1 on: November 26, 2009, 15:50 »

Possible values of CompressionFlags:
Code:
#define COMPRESS_ZLIB 1
#define COMPRESS_LZO 2
#define COMPRESS_LZX 4
After CompressionFlags, the package header (FPackageFileSummary) contains
Code:
TArray<FCompressedChunk> CompressedChunks;
(the real name of the structure is unknown) with the following declaration:
Code:
struct FCompressedChunk
{
    int  UncompressedOffset;
    int  UncompressedSize;
    int  CompressedOffset;
    int  CompressedSize;
};
The layout of this structure on disk is the same as in memory.
When a package is compressed, UE3 replaces the generic file reader with a compressed file reader. This reader looks up the CompressedChunks array to find the chunk that holds the requested position. All requests are made in UncompressedOffset "space" and are translated internally (transparently to the rest of UE) into CompressedOffset positions.
The data that each FCompressedChunk points to starts with a header, FCompressedChunkHeader (the real name is unknown):
Code:
struct FCompressedChunkBlock
{
    int  CompressedSize;
    int  UncompressedSize;
};

struct FCompressedChunkHeader
{
    int  Tag;         // equals to PACKAGE_FILE_TAG (0x9E2A83C1)
    int  BlockSize;   // maximal size of uncompressed block, always the same
    int  CompressedSize;
    int  UncompressedSize;
    TArray<FCompressedChunkBlock> Blocks;
};
Each FCompressedChunkBlock describes one block of compressed data. The block is compressed with the algorithm given by Header.CompressionFlags; it occupies Block.CompressedSize bytes on disk and Block.UncompressedSize bytes after decompression (or before compression ;)). ZLIB and LZO decompression is very simple (standard usage of these libraries). LZX is more complex because it has an additional subdivision into blocks with headers.
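As a small illustration of the flag values listed above, here is a sketch of how a parser might pick the decompressor. The helper name CompressionName is hypothetical, and testing the values as individual bits is an assumption; the thread only gives the three constants.

```cpp
#include <string>

// Flag values as listed earlier in this thread.
enum CompressionFlag
{
    COMPRESS_ZLIB = 1,
    COMPRESS_LZO  = 2,
    COMPRESS_LZX  = 4
};

// Hypothetical helper: names the algorithm selected by CompressionFlags,
// or returns "none" when none of the three known bits is set.
// Assumption: the values are bit flags, so they are tested with '&'.
std::string CompressionName(int flags)
{
    if (flags & COMPRESS_ZLIB) return "zlib";
    if (flags & COMPRESS_LZO)  return "lzo";
    if (flags & COMPRESS_LZX)  return "lzx";
    return "none";
}
```

A real reader would call the matching library routine for each block instead of returning a name.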
Logged
Poobah
Guest
« Reply #2 on: November 27, 2009, 04:09 »

Thank you very much! I'll let you know if I have any further problems.

EDIT:
Ok, I have a few more questions about this.
1. What does the value immediately following CompressionFlags represent?
2. How should I determine how many FCompressedChunks to read?
3. I noticed that the data before the offset pointed to by FCompressedChunk.CompressedOffset does not appear to be part of the FCompressedChunk array. Does this mean that there is some sort of additional data after the FCompressedChunk array and before the first FCompressedChunkHeader? Do you know what this data is?
4. How do I determine the amount of FCompressedChunkBlocks to read at the end of the FCompressedChunkHeader? I tried calculating this as FCompressedChunkHeader.UncompressedSize / FCompressedChunkHeader.BlockSize, but I am uncertain of whether it is correct.
5. I tried seeking to the beginning of the first block of compressed data in a compressed package, and then passing CompressedChunkBlock.CompressedSize bytes of this data to the miniLZO lzo1x_decompress function, and it failed. From what I have told you, do you know what I may have done wrong?
6. Do you know if any Unreal Tournament 3 files would use zlib compression, or is that mainly used for other platforms or games?
« Last Edit: November 27, 2009, 06:02 by Poobah » Logged
Gildor
Administrator
« Reply #3 on: November 27, 2009, 10:18 »

Ok, I understand. You don't know what TArray is.

TArray is a C++ template class that first appeared in the first Unreal Engine. Its declaration can be seen in the "UT public headers", in the file Core/Template.h.
It is a dynamic array class for elements of any type. Example:
Code:
TArray<SomeStruc> VarName;
declares a dynamic array VarName in which each item is of type SomeStruc.
A TArray is serialized (stored in a package) as "int32 Count" followed by Count structures of type SomeStruc.
Note: in UE2, Count was serialized as a compressed index.
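The serialization rule above can be sketched for the CompressedChunks array as follows. This is only an illustration: ReadInt32 and ReadCompressedChunks are hypothetical helper names, and the code assumes the file has been loaded into memory and stores little-endian values.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

struct FCompressedChunk
{
    int32_t UncompressedOffset;
    int32_t UncompressedSize;
    int32_t CompressedOffset;
    int32_t CompressedSize;
};

// Hypothetical helper: reads one little-endian int32 at 'pos' and advances 'pos'.
static int32_t ReadInt32(const std::vector<uint8_t>& data, size_t& pos)
{
    if (data.size() < 4 || pos > data.size() - 4)
        throw std::runtime_error("unexpected end of data");
    uint32_t v = uint32_t(data[pos])
               | uint32_t(data[pos + 1]) << 8
               | uint32_t(data[pos + 2]) << 16
               | uint32_t(data[pos + 3]) << 24;
    pos += 4;
    return static_cast<int32_t>(v);
}

// Reads a serialized TArray<FCompressedChunk>: an int32 Count followed by
// Count structures of four int32 fields each.
std::vector<FCompressedChunk> ReadCompressedChunks(const std::vector<uint8_t>& data, size_t& pos)
{
    int32_t count = ReadInt32(data, pos);
    if (count < 0)
        throw std::runtime_error("negative array count");
    std::vector<FCompressedChunk> chunks;
    for (int32_t i = 0; i < count; i++)
    {
        FCompressedChunk c;
        c.UncompressedOffset = ReadInt32(data, pos);
        c.UncompressedSize   = ReadInt32(data, pos);
        c.CompressedOffset   = ReadInt32(data, pos);
        c.CompressedSize     = ReadInt32(data, pos);
        chunks.push_back(c);
    }
    return chunks;
}
```

Called with 'pos' set to the byte right after CompressionFlags, this would leave 'pos' just past the last chunk record.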
Logged
Poobah
Guest
« Reply #4 on: November 28, 2009, 03:55 »

That helps a bit. So I can see that after the CompressionFlags value, the TArray starts with the number of elements followed by the actual structures. After this first array of FCompressedChunks, I've noticed that there's an int value before the series of FCompressedChunkHeaders starts. Do you know what this is?

I've noticed that in uncompressed files, the CompressionFlags value appears to always be followed by two values. The first of these may be some sort of count, since it always seems to be 1, but I'm not sure what the next value is. Do you know what this is?

Also, within each FCompressedChunkHeader, the FCompressedChunkBlock array does not appear to begin with a count; it appears to just go straight into the FCompressedChunkBlock structures, from what I can tell, so I'm still unsure about this.

EDIT:
Good news: I made a bit more progress on the format! I used FCompressedChunkHeader.UncompressedSize / FCompressedChunkHeader.BlockSize to calculate the number of blocks present within the FCompressedChunkHeader and then seeked forward by 8 bytes times that number, to the end of the array of FCompressedChunkBlocks. I then had to seek past an additional two int values, which I could not identify, and then passed the following data to the LZO decompressor, which was able to decompress that section into part of the name table without errors! Are you certain that the number of FCompressedChunkBlocks is specified somewhere in the file? Also, do you know what the two int values are after the array of FCompressedChunkBlocks?

One more question: I have been assuming that most of the 32-bit values in the package files are unsigned. Do you know if this is a correct assumption, or should I treat them all as signed?

Thanks a lot for your help so far!
« Last Edit: November 28, 2009, 05:31 by Poobah » Logged
Gildor
Administrator
« Reply #5 on: November 28, 2009, 18:29 »

Yes, my mistake.
FCompressedChunkHeader.Blocks is not serialized as a TArray; I only use a TArray structure internally. You should read FCompressedChunkBlock BlockCount times:
Code:
BlockCount = (H.UncompressedSize + H.BlockSize - 1) / H.BlockSize;
Quote
I then had to seek past an additional two int values, which I could not identify
You should round the number of blocks up (as in the code sample above).
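A short worked sketch of the rounding, assuming for illustration a BlockSize of 131072 (the actual value is whatever the chunk header stores). The helper names BlockCount and FirstBlockDataOffset are hypothetical; the 16-byte header size follows from the four int fields of FCompressedChunkHeader, and the 8-byte record size from the two int fields of FCompressedChunkBlock.

```cpp
#include <cstdint>

// Number of FCompressedChunkBlock records stored after the chunk header:
// UncompressedSize / BlockSize, rounded up.
int32_t BlockCount(int32_t uncompressedSize, int32_t blockSize)
{
    return (uncompressedSize + blockSize - 1) / blockSize;
}

// File offset of the first compressed block, given the offset of the chunk
// header: skip the four int fields (16 bytes), then one 8-byte record per block.
int32_t FirstBlockDataOffset(int32_t headerOffset, int32_t blockCount)
{
    return headerOffset + 16 + blockCount * 8;
}
```

With UncompressedSize = 300000, plain integer division gives 2 blocks while rounding up gives 3. The record that truncation drops is exactly 8 bytes, which matches the "additional two int values" described in the quote.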
Quote
One more question: I have been assuming that most of the 32-bit values in the package files are unsigned. Do you know if this is a correct assumption, or should I treat them all as signed?
Usually this does not matter.
Logged
Poobah
Guest
« Reply #6 on: November 29, 2009, 00:56 »

Ok, that should be enough information for me to parse the compressed files.

Just to make sure that I am not making any bad assumptions, could you please clarify these three points?
1. How should the two ints following the CompressionFlags int in the header be interpreted?
2. When seeking to the first block of compressed data after reading a CompressedChunkHeader, should I just seek past the size of ( BlockCount + 1 ) * sizeof( FCompressedChunkBlock ) to get there?
3. After reading the first block of compressed data, does the second one immediately follow it, or does additional data appear in between the blocks?
Logged
Gildor
Administrator
« Reply #7 on: November 29, 2009, 01:29 »

Quote
1. How should the two ints following the CompressionFlags int in the header be interpreted?
These two ints are the last FCompressedChunkBlock structure, which you missed because of the incorrect calculation of the block count (check my post above).
Quote
2. When seeking to the first block of compressed data after reading a CompressedChunkHeader, should I just seek past the size of ( BlockCount + 1 ) * sizeof( FCompressedChunkBlock ) to get there?
To seek to position Pos in a compressed archive, I use the following algorithm:
  • find Pos in CompressedChunks:
Code:
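// Select the chunk whose uncompressed range contains Pos
// (Pos is assumed to lie inside one of the chunks).
const FCompressedChunk *Chunk = NULL;
for (int ChunkIndex = 0; ChunkIndex < CompressedChunks.Num(); ChunkIndex++)
{
    Chunk = &CompressedChunks[ChunkIndex];
    if (Pos >= Chunk->UncompressedOffset && Pos < Chunk->UncompressedOffset + Chunk->UncompressedSize)
        break;
}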
  • seek to Chunk->CompressedOffset
  • read FCompressedChunkHeader ChunkHeader (and all of its FCompressedChunkBlock blocks)
  • find Pos in ChunkHeader.Blocks (track the compressed and uncompressed positions while iterating over that array)
Code:
int ChunkPosition = Chunk->UncompressedOffset; // uncompressed position of the current block
int ChunkData = Reader->Tell();                // file position of the current block's compressed data
const FCompressedChunkBlock *Block = NULL;
for (int BlockIndex = 0; BlockIndex < ChunkHeader.Blocks.Num(); BlockIndex++)
{
    Block = &ChunkHeader.Blocks[BlockIndex];
    // stop at the block whose uncompressed range contains Pos
    if (ChunkPosition + Block->UncompressedSize > Pos)
        break;
    // otherwise skip this block in both "spaces"
    ChunkPosition += Block->UncompressedSize;
    ChunkData     += Block->CompressedSize;
}
  • now you can read the data block at position ChunkData; its size is Block->CompressedSize. After decompression, this block corresponds to uncompressed position ChunkPosition, and the size of the uncompressed block is Block->UncompressedSize
Quote
3. After reading the first block of compressed data, does the second one immediately follow it, or does additional data appear in between the blocks?
Immediately (for all blocks that belong to the same FCompressedChunkHeader).
Logged
Poobah
Guest
« Reply #8 on: November 29, 2009, 09:11 »

Thanks again for your help! I have almost finished implementing this now.

Quote
1. How should the two ints following the CompressionFlags int in the header be interpreted?
These two ints are the last FCompressedChunkBlock structure, which you missed because of the incorrect calculation of the block count (check my post above).

With this question, I was referring to the package file header. I've noticed that with an uncompressed file, the CompressionFlags int is followed by two ints, but I do not know what they are for. Do you?
Logged
Gildor
Administrator
« Reply #9 on: November 29, 2009, 19:57 »

There is only one int following the package header (at least for UT3). I don't know its meaning.
Logged
Poobah
Guest
« Reply #10 on: December 01, 2009, 12:31 »

I've got the compressed file parsing working now. If you hadn't explained the details about that then I probably would have just ended up giving up on the program I'm making, so thanks for helping!

I've also just encountered my first string with a negative length specified. It appears that a negative length indicates that the string is a wide-character or UTF-16 string (2 bytes per character), but I've also noticed that the last character of the string is stored differently to the one displayed by UT3 pkginfo. The particular example that I am looking at is name 2044 in UTGame.u. In the text output from pkginfo, it is displayed as "Alternatywny strzal". In the file (after decompression), it appears as "Alternatywny strzaB". Do you know how this works?
« Last Edit: December 01, 2009, 12:50 by Poobah » Logged
Gildor
Administrator
« Reply #11 on: December 01, 2009, 13:32 »

Do you mean that "pkginfo" is a UT3 ucc commandlet?
Which "decompression" do you mean?
Logged
Poobah
Guest
« Reply #12 on: December 01, 2009, 13:41 »

Quote
Do you mean that "pkginfo" is a UT3 ucc commandlet?

Yes.

Quote
Which "decompression" do you mean?

The zlib/LZO/LZX decompression.

Just to clarify, I'm wondering exactly how I should interpret the negative-length strings in UT3 package files.
Logged
Gildor
Administrator
« Reply #13 on: December 01, 2009, 13:46 »

As you wrote before, a negative length in an FString indicates a UTF-16 string. I cannot say more.
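A minimal sketch of reading such a string, under assumptions the thread does not confirm: the data is little-endian, the absolute value of the length counts characters including a terminating null, and positive-length strings hold one byte per character. ReadFString is a hypothetical helper name.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: decodes one serialized FString starting at 'pos'.
// Assumptions: little-endian int32 length; a negative length means
// 2 bytes per character; |length| includes a terminating null character.
std::u16string ReadFString(const std::vector<uint8_t>& data, size_t& pos)
{
    if (data.size() < 4 || pos > data.size() - 4)
        throw std::runtime_error("truncated string length");
    uint32_t raw = uint32_t(data[pos])
                 | uint32_t(data[pos + 1]) << 8
                 | uint32_t(data[pos + 2]) << 16
                 | uint32_t(data[pos + 3]) << 24;
    int32_t length = static_cast<int32_t>(raw);
    pos += 4;

    const bool wide = length < 0;
    const size_t count = wide ? size_t(-int64_t(length)) : size_t(length);
    const size_t bytes = count * (wide ? 2 : 1);
    if (bytes > data.size() - pos)
        throw std::runtime_error("truncated string body");

    std::u16string result;
    for (size_t i = 0; i < count; i++)
    {
        char16_t ch;
        if (wide) // combine low and high byte of each UTF-16 code unit
            ch = char16_t(data[pos + 2 * i] | (data[pos + 2 * i + 1] << 8));
        else      // one byte per character, widened as-is
            ch = char16_t(data[pos + i]);
        result.push_back(ch);
    }
    pos += bytes;

    if (!result.empty() && result.back() == u'\0')
        result.pop_back(); // drop the terminating null
    return result;
}
```

One possible explanation for the "strzaB" observation, offered as a guess: if the last letter of the name is the Polish "ł" (U+0142), its two bytes on disk are 0x42 0x01, and 0x42 alone is the ASCII code of "B". A reader that keeps only the low byte of each character would therefore show "B" in that position.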
Logged
Poobah
Guest
« Reply #14 on: December 01, 2009, 14:02 »

Quote
As you wrote before, a negative length in an FString indicates a UTF-16 string. I cannot say more.

Ok, so you also don't know how or why the UT3 engine transforms the UTF-16 name "Alternatywny strzaB" into "Alternatywny strzal"?
Logged