Gildor's Forums

Author Topic: Static Mesh Data for Bioshock  (Read 252 times)
espnsThirtyForThirty
« on: December 23, 2021, 23:34 »

TL;DR Can I get an overview of how the .bsm files for BioShock are parsed and where in the code this happens? I'm confused on what LoadWholePackage is doing, and what Serialize does.


Long Explanation:

I've been following runs of the source code in a debugger and I wanted to see if I could get some clarification on how the static mesh data gets found for BioShock. To me it seems like a basic overview is that the application goes into the .bsm file, calls InitClassAndExportSystems where all of the exporters and classes are registered. A bit later, LoadWholePackage is called. I think that this is where the .bsm file is parsed, but I'm confused how some of the data is already present. For instance, we can already get the class name of the object, as is done in the loop.

CreateExport is called, and somehow GetExport works. My confusion is that Exp.ObjectName already exists but not Exp.Object. Then later in the function CreateClass is called. I assume that Type->Constructor(Obj) should be doing something, but following the debugger it doesn't seem like anything really happens here.

Later after BeginLoad is called, it seems that the real loading happens. Here when we check if OuterExp.Object exists, it does. I'm really not sure how it exists though. It doesn't seem like anything has happened yet that should've filled that object.

And that completes one loop in LoadWholePackage.

So I'm wondering where did the data for the object get parsed from that file? How did it happen? Am I completely off? Before I was checking the Serialize functions, where it seems data is literally read in from the file, but I don't know how it's getting parsed. A little overview would be helpful.
Gildor
« Reply #1 on: December 23, 2021, 23:56 »

Quote
... but I'm confused how some of the data is already present. For instance, we can already get the class name of the object, as is done in the loop.
Objects get created before serialization. This is exactly how Unreal itself works. They're created as empty objects of a particular class in UnPackage::CreateExport().

Quote
CreateExport is called, and somehow GetExport works. My confusion is that Exp.ObjectName already exists but not Exp.Object.
Exp.ObjectName is a part of the package's export table. Exp.Object is a "transient" field; it doesn't exist in the package itself. When the package is loaded, it's NULL. Then, when CreateExport is called, it checks: if Exp.Object is not NULL, the object is already created and is returned. If not, the object is created according to its class name, with "factory" functions registered in the global class system. The created (empty) object is placed into Exp.Object. It's similar to how Unreal works; it's hard to do things in a different way.
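
To make the "create on first request" idea concrete, here is a minimal sketch. It is not UModel's actual code: ExportEntry, classRegistry and createExport are hypothetical names, and the real export table holds more fields than the two strings shown.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-ins, not UModel's real types.
struct Object {
    virtual ~Object() = default;
    virtual const char* className() const = 0;
};
struct StaticMesh : Object {
    const char* className() const override { return "StaticMesh"; }
};

// One export-table row: the names come from the file, `object` does not.
struct ExportEntry {
    std::string objectName;          // read from the package
    std::string className;           // read from the package
    std::unique_ptr<Object> object;  // transient: null until created
};

using Factory = std::function<std::unique_ptr<Object>()>;

// Global class registry: class name -> factory function.
std::map<std::string, Factory>& classRegistry() {
    static std::map<std::string, Factory> registry;
    return registry;
}

// Returns the existing object, or creates an empty one on first use.
// Returns nullptr when the class is not registered (unsupported type).
Object* createExport(ExportEntry& exp) {
    if (exp.object) return exp.object.get();  // already created
    auto it = classRegistry().find(exp.className);
    if (it == classRegistry().end()) return nullptr;
    exp.object = it->second();  // empty object, nothing serialized yet
    assert(exp.object != nullptr);
    return exp.object.get();
}
```

Calling createExport twice on the same entry returns the same pointer, which is why a later check of the Object field finds it already filled in even though no data has been read yet.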

Quote
Then later in the function CreateClass is called. I assume that Type->Constructor(Obj) should be doing something, but following the debugger it doesn't seem like anything really happens here.
It calls the object's constructor. Before that, the "object" is just zeroed memory, without even a virtual method table. The constructor wrapper is auto-generated by the DECLARE_CLASS macro - it generates several "static" functions which call class methods or return class information, like its name and parent class.
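
A rough sketch of what such a macro can generate, assuming placement new is used to run the constructor on pre-zeroed memory. DECLARE_CLASS_SKETCH, ClassInfo, createObject and destroyObject are hypothetical names for illustration; the real DECLARE_CLASS macro differs in detail.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <new>

struct ObjectBase {
    virtual ~ObjectBase() = default;
    virtual const char* GetClassName() const { return "ObjectBase"; }
};

// Hypothetical simplified macro: generates static helpers for a class.
#define DECLARE_CLASS_SKETCH(Class, Base)                                         \
    typedef Base Super;                                                           \
    static const char* StaticGetName() { return #Class; }                         \
    static ObjectBase* InternalConstructor(void* mem) { return new (mem) Class; } \
    const char* GetClassName() const override { return #Class; }

struct MeshSketch : ObjectBase {
    DECLARE_CLASS_SKETCH(MeshSketch, ObjectBase)
    int numVerts = 0;
};

// What a class registry would store for each registered type.
struct ClassInfo {
    const char* name;
    std::size_t size;
    ObjectBase* (*constructor)(void*);
};

ObjectBase* createObject(const ClassInfo& type) {
    void* mem = std::malloc(type.size);
    assert(mem != nullptr);
    std::memset(mem, 0, type.size);  // zeroed memory, no vtable pointer yet
    return type.constructor(mem);    // placement new runs the constructor in place
}

void destroyObject(ObjectBase* obj) {
    void* mem = dynamic_cast<void*>(obj);  // pointer to the most derived object
    obj->~ObjectBase();
    std::free(mem);
}
```

Stepping through the constructor call in a debugger shows almost nothing because the visible work is only installing the vtable pointer and default member values; virtual calls such as GetClassName start working only after that step.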

Quote
Later after BeginLoad is called, it seems that the real loading happens. Here when we check if OuterExp.Object exists, it does. I'm really not sure how it exists though. It doesn't seem like anything has happened yet that should've filled that object.
I think you meant EndLoad. The serialization is called there. It picks the already created (but still empty, "defaulted") object and calls its virtual Serialize method, passing an FArchive as the object which represents a "stream of data" (could be a file, memory, anything). FArchive has many implementations, and sometimes they're "nested". For instance, UnPackage is an FArchive, but all of its serialization calls are passed to its "Loader" member, which is ALSO an FArchive. This abstracts away the data storage method - it could be a separate file in the OS, a file in a pak, a compressed upk file, etc.
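
The nesting can be sketched as follows. Archive, MemoryReader, PackageSketch and MeshStub are hypothetical stand-ins for FArchive, a concrete reader, UnPackage and a mesh class; the little-endian decoding of 32-bit integers is an assumption made so the example is portable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

// Abstract "stream of data".
struct Archive {
    virtual ~Archive() = default;
    virtual void Serialize(void* data, std::size_t size) = 0;
};

// Reads 4 bytes and decodes them as a little-endian int32.
inline Archive& operator<<(Archive& ar, std::int32_t& value) {
    unsigned char b[4];
    ar.Serialize(b, 4);
    std::uint32_t u = b[0] | (b[1] << 8) | (b[2] << 16) | (std::uint32_t(b[3]) << 24);
    value = static_cast<std::int32_t>(u);
    return ar;
}

// Innermost archive: copies raw bytes and advances the position.
struct MemoryReader : Archive {
    std::vector<unsigned char> bytes;
    std::size_t pos = 0;
    explicit MemoryReader(std::vector<unsigned char> b) : bytes(std::move(b)) {}
    void Serialize(void* data, std::size_t size) override {
        assert(pos + size <= bytes.size());
        std::memcpy(data, bytes.data() + pos, size);
        pos += size;
    }
};

// Outer archive: forwards every call to its loader, whatever that is.
struct PackageSketch : Archive {
    Archive* loader;
    explicit PackageSketch(Archive* inner) : loader(inner) {}
    void Serialize(void* data, std::size_t size) override { loader->Serialize(data, size); }
};

// The object decides the meaning of the bytes in its own Serialize.
struct MeshStub {
    std::int32_t numVerts = 0, numFaces = 0;
    virtual ~MeshStub() = default;
    virtual void Serialize(Archive& ar) { ar << numVerts << numFaces; }
};
```

The object's Serialize never learns whether the bytes come from memory, a plain file or a decompressor; swapping the loader changes the storage without touching any object code.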

Quote
So I'm wondering where did the data for the object get parsed from that file? How did it happen? Am I completely off? Before I was checking the Serialize functions, where it seems data is literally read in from the file, but I don't know how it's getting parsed. A little overview would be helpful.
Data is parsed in UObject::Serialize (the function is overridden for most classes; UObject::Serialize itself only parses UProperty data).
espnsThirtyForThirty
« Reply #2 on: December 27, 2021, 02:25 »

Thank you so much for the explanation, that clears up a lot. (Also, merry Christmas if you celebrate!)

TL;DR I think I'm just trying to understand the layout of the .bsm file, and trying to figure out how to figure that out. For instance, how do I know where the mesh data is, and its materials?

Full post:

I think my confusion then may just be with the ExportTable and its construction. It looks like it's an array of FObjectExports which contain data such as the object's name and different size properties. To me it looks like it has a SerialSize and SerialOffset, which sound like (and from what I've seen from the serialize functions) they give the offset in the .bsm file of the (for instance) mesh data, and how large the data is. I'm a bit confused then on the role of the class and package indices.

It looks like very many of the << operators are, at the base, calls to FFileReader::Serialize, which seems to just be copying bytes of the .bsm file. For this reason, I assume that the export table is the data structure which tells us where to find the data in the .bsm file, and serialize just reads in that data (although it reads in other types of data, like offsets, as I talk about below)


The line:

Ar << NameCount << NameOffset << ExportCount << ExportOffset << ImportCount << ImportOffset;

in the function FPackageFileSummary::Serialize2 (located in UnPackage2.cpp) seems to do a lot. It looks like it gets a lot of the initial data for the Summary variable in UnPackage.cpp. It seems like the program reads in 4 bytes from the file for each of these variables and advances the location in the file by 4 bytes. So is this layout of the .bsm file (that it starts with int32s in the order of name count, name offset, export count, etc.) just something that you know from the UE source?
Gildor
« Reply #3 on: December 27, 2021, 10:29 »

Quote
I think my confusion then may just be with the ExportTable and its construction. It looks like it's an array of FObjectExports which contain data such as the object's name and different size properties. To me it looks like it has a SerialSize and SerialOffset, which sound like (and from what I've seen from the serialize functions) they give the offset in the .bsm file of the (for instance) mesh data, and how large the data is. I'm a bit confused then on the role of the class and package indices.
Class is the type of the object. There are SkeletalMesh, StaticMesh, Material, MaterialInstance, Texture, and LOTS of other object types there - hundreds of them. Class is used to call the appropriate object's constructor, and then SerialOffset + SerialSize are used to load the object data out of the file.
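
As an illustration of how SerialOffset and SerialSize bound an object's data, here is a small sketch. Reader, ExportEntry, MeshStub and loadExport are hypothetical names, and the two-integer "mesh" layout is invented purely for the example. The size check at the end is a sanity test a loader can apply; it is shown here as an assumption, not as a description of UModel's exact behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reader over an already-decompressed package image.
struct Reader {
    const std::vector<unsigned char>& bytes;
    std::size_t pos = 0;
    explicit Reader(const std::vector<unsigned char>& b) : bytes(b) {}
    void Seek(std::size_t p) { assert(p <= bytes.size()); pos = p; }
    std::size_t Tell() const { return pos; }
    std::int32_t ReadInt32() {  // little-endian, advances by 4 bytes
        assert(pos + 4 <= bytes.size());
        std::uint32_t u = bytes[pos] | (bytes[pos + 1] << 8) | (bytes[pos + 2] << 16) |
                          (std::uint32_t(bytes[pos + 3]) << 24);
        pos += 4;
        return static_cast<std::int32_t>(u);
    }
};

// The two export-table fields that locate the object's data.
struct ExportEntry {
    std::int32_t serialOffset;
    std::int32_t serialSize;
};

struct MeshStub {
    std::int32_t numVerts = 0;
    std::int32_t numFaces = 0;
    void Serialize(Reader& r) { numVerts = r.ReadInt32(); numFaces = r.ReadInt32(); }
};

// Seeks to the object's data, lets the object parse itself, and reports
// whether it consumed exactly serialSize bytes.
bool loadExport(Reader& r, const ExportEntry& exp, MeshStub& obj) {
    r.Seek(static_cast<std::size_t>(exp.serialOffset));
    obj.Serialize(r);
    return r.Tell() == std::size_t(exp.serialOffset) + std::size_t(exp.serialSize);
}
```

The export table says only where the bytes are and how many there are; what they mean is decided entirely by the Serialize function of the class that was constructed for that entry.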

PackageIndex contains more detailed information about the object - the name of the package where the object is placed (.bsm aggregates data from many packages; they borrowed this behavior from the UE3 cooker). Also, objects have a "group", a kind of folder inside the package. Look at UnPackage::GetFullExportName() - you'll see how the full object name is combined from FObjectExport + PackageIndex.
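
A sketch of how a full name could be assembled by walking the PackageIndex chain. This is not the body of UnPackage::GetFullExportName(); ExportEntry and fullExportName are hypothetical, and the index convention used (0 means no outer, N > 0 means the outer is export N - 1) is an assumption. Indices that refer to the import table are ignored here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct ExportEntry {
    std::string objectName;
    int packageIndex;  // assumption: 0 = no outer, N > 0 = outer is exports[N - 1]
};

// Builds "Package.Group.Object" by prepending each outer's name in turn.
std::string fullExportName(const std::vector<ExportEntry>& exports, std::size_t index) {
    assert(index < exports.size());
    std::string name = exports[index].objectName;
    int outer = exports[index].packageIndex;
    std::size_t guard = 0;  // stops the walk if the table contains a cycle
    while (outer > 0 && guard++ < exports.size()) {
        assert(std::size_t(outer) <= exports.size());
        const ExportEntry& parent = exports[outer - 1];
        name = parent.objectName + "." + name;
        outer = parent.packageIndex;
    }
    return name;
}
```

With entries "Pkg" (index 0), "Rocks" (outer 1) and "Rock01" (outer 2), the third entry resolves to "Pkg.Rocks.Rock01".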

Regarding the list of types - just try the "-list" command line option on any of the .bsm files. You'll see how many objects and how many object types are inside the file. UModel supports probably only 10 of those object types.

Quote
... in the function FPackageFileSummary::Serialize2 (located in UnPackage2.cpp) seems to do a lot. It looks like it gets a lot of the initial data for the Summary variable in UnPackage.cpp. It seems like the program reads in 4 bytes from the file for each of these variables and advances the location in the file by 4 bytes. So is this layout of the .bsm file (that it starts with int32s in the order of name count, name offset, export count, etc.) just something that you know from the UE source?
The layout is much more complex. There are many more data fields in a package than just the import/export table pointers.

Also, bsm has data compression - again, borrowed from UE3 code. All data pointers in the ExportTable etc. point to UNCOMPRESSED offsets. The compression table is a map between compressed and uncompressed file data, mostly consisting of an array of "compressed block size" and "uncompressed block size" pairs. For Bioshock, the decompressor is created in UnPackage::ReplaceLoader() - it grabs the compression table and passes it to a proxy object, FUE3ArchiveReader, which does decompression transparently for UModel. E.g. UModel says "seek to position 10000", and this class determines where this position is located in the compressed file, reads the compressed data block, decompresses it into a data buffer, and then copies the requested data from this buffer.
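
The lookup step of such a proxy can be sketched like this. Block, BlockLocation and locate are hypothetical names, not FUE3ArchiveReader's API, and the sketch assumes the compressed blocks are stored back to back; the actual decompression is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One row of the compression table.
struct Block {
    std::size_t compressedSize;
    std::size_t uncompressedSize;
};

struct BlockLocation {
    std::size_t blockIndex;
    std::size_t compressedStart;  // where the block begins, relative to the first block
    std::size_t offsetInBlock;    // position inside the decompressed block
};

// Maps an uncompressed position to the block that holds it.
// Returns false when the position lies past the end of the data.
bool locate(const std::vector<Block>& blocks, std::size_t uncompressedPos, BlockLocation& out) {
    std::size_t compressedStart = 0, uncompressedStart = 0;
    for (std::size_t i = 0; i < blocks.size(); ++i) {
        if (uncompressedPos < uncompressedStart + blocks[i].uncompressedSize) {
            out = {i, compressedStart, uncompressedPos - uncompressedStart};
            assert(out.offsetInBlock < blocks[i].uncompressedSize);
            return true;
        }
        compressedStart += blocks[i].compressedSize;
        uncompressedStart += blocks[i].uncompressedSize;
    }
    return false;
}
```

After locating the block, a reader would load compressedSize bytes starting at compressedStart, decompress them into a buffer, and copy the requested bytes starting at offsetInBlock.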

Compression appeared in UE3, and Bioshock has this code integrated into UE2. UE4 also has package compression, but it was deprecated in some engine version, and .pak-level compression is used instead.