donmai

Don't mind me— I occasionally post about my findings in reverse engineering games. Saltpack key: kex1996qewz7hgxzyrhlvunnspm4wdzt0m9c3xvl9wfaxvrz0pdlm88shhwfc5

Criware's USM format Part 1

If you have played a lot of recent Japanese games, you might have heard of Criware, or CRI Middleware Co. by its full name. Whether it's a Japanese gambling game disguised as a cutesy anime mobile game or a AAA action game where you fight God, chances are you have seen their name or logo in the start-up screen or credits. As the name CRI Middleware Co. implies, Criware provides middleware for use in video game development. Criware has multiple middleware products for a game developer's audio and video needs, from delivering autogenerated lip-sync data for voice files to playback of video and audio. One of these is Sofdec 2, whose container you might recognise by its extension, .usm. USM is a container for video and audio to be played by Criware's player, internally named Mana. You may even have encountered this file format in video games produced outside of Japan; I'll explain why later.

Why would a game developer bother with proprietary middleware for video playback? Don't Unity and Unreal Engine provide frameworks for media playback, you might ask? It is true that Sofdec 2 taps into the engine's native media framework. But Sofdec 2, together with ADX 2, Criware's audio middleware, offers the ability to incorporate audio and subtitles into video playback with ease. Sofdec 2 also allows videos to be textures on solids with full transparency support. There are more reasons why game developers use Criware's SDKs, but this isn't an ad for Criware and I'm not a game developer. So let's get on with dissecting this proprietary format.

This will be the first part of a series of posts about Criware's USM format. By the end of this series (or sooner), I'll have a surprise for everyone looking to extract and make their own custom USM files. I promise that I'll post more regularly compared to last year so bear with me. Lastly, I still don't know much about USM, so corrections and additional info are always welcome.

By the end of the first part, we should have learned the following:

  • The basic building block of USM, chunks. First, the general format.
  • Then, the different types of payloads a chunk can have.
  • Finally, how dictionary payloads are encoded.

History

Before we talk about the technical details regarding CRI Movie 2, we'll first briefly discuss its history and how it gained some adoption outside of Japan.

Criware released Sofdec 2 to supersede Sofdec 1, Criware's earlier video playback library. While Sofdec 1 used the SFD container, Sofdec 2 uses a new container format called USM. USM's advantages over SFD are video seek playback support, cue point support, Unicode support, and support for more video and audio codecs. 1 Unlike SFD, USM includes metadata about the video and audio data present in the file in its initial header chunks.

In 2009, Scaleform partnered with Criware to use CRI Movie 2 for Scaleform GFx 3's Scaleform Video. 2 Scaleform, just like Criware, develops middleware for use in video games.

In 2011, Autodesk acquired Scaleform for $36 Million. 3 And in 2012, Autodesk announced their middleware suite for game developers called Autodesk Gameware. 4 Numerous game companies have used their middleware, such as CD Projekt Red and Valve, which may explain how USM has received adoption in video games produced outside of Japan.

In 2017, Autodesk removed Autodesk Gameware from their offerings and announced end of support. 5

In 2018, Criware announced a plugin for Sofdec 2 that adds support for VP9, increasing the number of supported video codecs to three. 6 7

In 2019, Criware established a subsidiary in China for the booming video game industry there. 8 Since then, Sofdec 2 and ADX 2 have been used in numerous Chinese games like Azur Lane and Genshin Impact.

The discontinuation of Autodesk's middleware suite means the end of the adoption of CRI Movie 2 for some of the few western game developers that have embraced it. But the adoption of Criware middleware in Chinese video games makes up for it. The success of Azur Lane, Girls Frontline, Genshin Impact, and the loads of Japanese games that use their middleware proves that Criware is far from irrelevant.

Note: Sofdec 2 is the actual name of Criware's video middleware library. It is unclear to me whether CRI Movie 2 refers to the USM container format or is just Criware USA's marketing department rebranding Sofdec 2 to CRI Movie 2. However, I have to agree that Sofdec isn't a nice name, nor is it as catchy as CRI Movie. For this blog post, I'll treat USM as synonymous with CRI Movie 2, and Sofdec 2 as the entire video playback suite that includes the SDK and the USM format.


Format

Let's start the technical discussion with an overview of the building blocks of USM. A USM file is made up of chunks laid out serially: where one chunk ends, the next one begins. It is also important to note that all data stored in a USM file is big-endian, meaning the most significant byte is stored first. The format of a USM chunk is:


Chunk header

Chunk identifier

The first four bytes of a chunk header are its identifier. A chunk identifier indicates the chunk type, whether audio or video related or something else entirely. The identifier is a four-letter ASCII string, and in total, there are three chunk identifiers for USM:

  • CRID
  • @SFV
  • @SFA

A chunk with an identifier of CRID contains information on all the video and audio streams available in the USM file. It also includes information on the format version of the USM file itself. A CRID chunk only appears at the beginning of the USM file and will only have a header payload type.

A chunk with an identifier of @SFV or @SFA contains information on a video or audio stream, respectively. It could be a header, metadata, or the actual frame packet, depending on the chunk's payload type.

Chunk size

Chunk size is the size of the chunk data (payload header, payload, and padding) and does not include the 8-byte chunk header.
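As a rough illustration, the two header fields described so far are already enough to walk a file chunk by chunk. This is a minimal sketch: the function name iter_chunks is my own invention, and it assumes the input contains nothing but back-to-back chunks.

```python
import struct
from typing import Iterator, Tuple


def iter_chunks(data: bytes) -> Iterator[Tuple[str, bytes]]:
    """Yield (identifier, chunk_data) pairs from raw USM bytes."""
    pos = 0
    while pos + 8 <= len(data):
        # 4-byte ASCII identifier followed by a 4-byte big-endian size.
        ident, size = struct.unpack_from(">4sI", data, pos)
        pos += 8
        # The size covers the chunk data only, not the 8-byte chunk header.
        yield ident.decode("ascii"), data[pos:pos + size]
        pos += size
```

Each yielded chunk_data still contains the payload header, payload, and padding, which the following sections describe.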


Chunk data

Payload offset

As the name suggests, this is the offset from the start of the chunk data to the payload. From existing USM files, this is always 0x18.

Padding size

Padding size is the number of padding bytes appended at the end of the payload.

Channel number

The channel number of a chunk is a 1-byte integer that begins at 0. Typical use is for USMs with one video track and multiple audio tracks for localisation.

Channel numbers are not exclusive, and a video and audio track can have the same channel number. However, two audio tracks will never share the same channel number. A CRID chunk will always have a channel number of 0.

Payload type

Payload type is an enum type packed into one byte and can be represented as:

enum payload_type {
    stream = 0,
    header = 1,
    section_end = 2,
    seek = 3,
};

A stream payload is binary data from a video or audio stream, and a header payload contains media metadata about a video or audio track. A seek payload contains data about the seek positions of a video track. A section_end payload marks the end of a stream, header, or seek chunk or series of chunks.

Frame time

A frame time is a 4-byte integer used to synchronise audio and video stream chunks. It is only used for stream chunks and is 0 for everything else.

Frame rate

A chunk's frame rate is a 4-byte integer whose value depends on the chunk's type and, for video, on the track's actual frame rate. For chunks that are not stream chunks, the value is always 30, and for audio stream chunks, the value is always 2997. For video stream chunks, the value is 100 times the video track's frame rate.

From actual USM files, table 1 contains a list of typical frame rates and their corresponding stream chunk's frame rate:

Video frame rate | Stream chunk frame rate
24               | 2400
29.97            | 2997
30               | 3000
60               | 6000

Table 1: Common video frame rates and their corresponding stream chunk framerates.
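The frame rate rules above can be sketched as a small helper. The names PayloadType and expected_frame_rate are hypothetical, and the is_audio flag stands in for checking whether the chunk identifier is @SFA; this only restates the observations from existing files, it is not taken from any SDK.

```python
from enum import IntEnum


class PayloadType(IntEnum):
    STREAM = 0
    HEADER = 1
    SECTION_END = 2
    SEEK = 3


def expected_frame_rate(payload_type: PayloadType, is_audio: bool, fps: float = 0.0) -> int:
    """Return the frame rate field value a chunk is expected to carry."""
    if payload_type != PayloadType.STREAM:
        return 30                 # every non-stream chunk
    if is_audio:
        return 2997               # audio stream chunks
    return round(fps * 100)       # video stream chunks: 100 x the track's frame rate
```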

Payload

Payload is the chunk's actual data; it could be metadata or packets of data from a video or audio track. For stream payloads, the payload is just the bytes from a frame of a video track or a packet of an audio track.

For other payload types, which we'll refer to as dictionary payloads, the payload is just information presented in an array of dictionaries with key strings. An example of this for video seek information is:

video_seek = [
    {
        "ofs_byte": 5696,
        "ofs_frmid": 0,
        "num_skip": 0,
        ...
    },
    {
        "ofs_byte": 569632,
        "ofs_frmid": 60,
        "num_skip": 0,
        ...
    },
    {
        "ofs_byte": 3864416,
        "ofs_frmid": 120,
        "num_skip": 0,
        ...
    },
    ...
]

I excluded some information for brevity, but the vital thing to note is that every dictionary in the array has the same keys. All values of the same key have the same type. The types for a dictionary value as C types are:

  • Char (1 byte)
  • Unsigned char (1 byte)
  • Short (2 bytes)
  • Unsigned short (2 bytes)
  • Integer (4 bytes)
  • Unsigned integer (4 bytes)
  • Long long (8 bytes)
  • Unsigned long long (8 bytes)
  • Float (4 bytes)
  • String (variable and null-terminated)
  • Byte array (variable)

Padding

Padding is just null bytes (0x00) whose count is declared in the chunk's padding size field. It is important to note that the USM format was designed for CDs; padding is therefore essential to align some parts of the file to sector boundaries for more efficient reads.


Payload encoding

Let's elaborate more on how dictionary payloads encode their data. To reiterate what I've written in the previous section with additional information:

  • Dictionary payload contains an array of dictionaries.
  • Each dictionary payload has a name.
  • All dictionaries in the array have the same set of keys.
  • All dictionaries in the array have ASCII string keys.
  • All dictionaries in the array have the same order of keys.
  • All values of the same key have the same type.
  • The same key's value may or may not differ from the others.
  • A value with a string type can be encoded in either Shift-JIS, UTF-8, or UTF-16.

There are four arrays in a dictionary payload. They are:

  • An array for shared data.
  • An array for unique data.
  • An array for C-strings.
  • And an array for byte arrays.

Before I describe what the four arrays contain, I'll give a simple example. The first part of the example is a Python code snippet that gives a high-level view of the contents and structure of the payload. Next are the equivalent byte arrays (presented in hex), as they would appear if converted to an actual payload.

Note: The following example is for demonstration purposes and is not used in any actual USM file.

Sample payload

payload.name = "Example payload"
payload.dicts = [
    {
        "filename": (ValueType.string, "foo.txt"),
        "filesize": (ValueType.int, 12345678),
        "version": (ValueType.char, 1),
        "owner": (ValueType.string, "donmai")
    },
    {
        "filename": (ValueType.string, "bar.txt"),
        "filesize": (ValueType.int, 87654321),
        "version": (ValueType.char, 1),
        "owner": (ValueType.string, "donmai")
    },
]

Shared array

Shared array is 25 bytes.

5A 00 00 00 17 54 00 00 00 20 30 00 00 00 29 01
3A 00 00 00 31 00 00 00 3F

Unique array

Unique array is 16 bytes.

00 00 00 37 00 BC 61 4E 00 00 00 46 05 39 7F B1

C-string array

C-string array is 78 bytes. A \x00 denotes a null-byte.

<NULL>\x00Example payload\x00filename\x00filesize\x00version\x00owner\x00foo.txt\x00donmai\x00bar.txt\x00

Byte array

Byte array is empty.

Explanation

Let's look at this byte by byte, starting with the shared array. The bytes in a shared array are grouped, and the number of groups equals the number of keys in a dictionary. Our example's dictionary has four keys, so there are four groups in the shared array. The first byte in a group contains two pieces of information: the value type and whether the value is unique or recurring. From our example, we know that the first value, "foo.txt", is a string and unique. To pack these two pieces of information into one byte, we first convert the value's type to a number using table 2. Next, we convert the value's occurrence to a number using table 3. Finally, we combine the two by adding the value type's number to the value occurrence's number shifted 5 bits to the left. For our first value, we get 0x1A for our value type and 2 for our value occurrence. We then add them like this: 0x1A + (2 << 5) = 0x5A.

Value type         | Number | Size
Char               | 0x10   | 1
Unsigned char      | 0x11   | 1
Short              | 0x12   | 2
Unsigned short     | 0x13   | 2
Integer            | 0x14   | 4
Unsigned integer   | 0x15   | 4
Long long          | 0x16   | 8
Unsigned long long | 0x17   | 8
Float              | 0x18   | 4
String             | 0x1A   | Pointer size is 4 bytes
Bytes              | 0x1B   | Start and end pointers are 4 bytes

Table 2: Conversion table for a value's type and its corresponding number.

Value occurrence             | Number
Recurring                    | 1
At least one value is unique | 2

Table 3: Conversion table for a value's occurrence and its corresponding number.
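The packing of the first byte can be sketched in a few lines. The function names are hypothetical; the one thing worth stressing is that reproducing the example's 5A, 54, 30 and 3A bytes requires shifting the occurrence number 5 bits to the left, so that is what this sketch does.

```python
STRING = 0x1A      # value type number for strings (table 2)
RECURRING = 1      # occurrence numbers (table 3)
UNIQUE = 2


def pack_type_byte(value_type: int, occurrence: int) -> int:
    """Combine a value type and an occurrence number into one byte."""
    return value_type + (occurrence << 5)


def unpack_type_byte(byte: int) -> tuple:
    """Split a packed byte back into (value_type, occurrence)."""
    return byte & 0x1F, byte >> 5
```

For the first value in the example, pack_type_byte(STRING, UNIQUE) gives 0x5A.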

After the first byte, the following four bytes are the offset of the key in the C-string array. 00 00 00 17 is 23 and points to the start of filename\x00filesize\x00ver.... Since the string is null-terminated, our key with the null byte discarded is filename, which is indeed the key for the first item in the dictionary. Finally, since our value is unique, we find the value in the unique array. Since our value's type is a string, the pointer size is 4 bytes. Taking the first four bytes of the unique array, we get 00 00 00 37. This pointer points to the start of foo.txt\x00donmai\x00..., which means our value is foo.txt. Reading the value ends the first group of bytes in the shared array.

Let's move on to the second group, and let's make it brief:

  1. Shared array second group: first byte = 54 = 0x14 + (2 << 5). Second value is an integer and unique.
  2. Next four bytes in shared array: 00 00 00 20 => filesize\x00. Second key is filesize.
  3. Since value is unique and an integer. Next four bytes in unique array = 00 BC 61 4E. Second value is 12345678.

Third group:

  1. Shared array third group: first byte = 30 = 0x10 + (1 << 5). Third value is a char and recurring.
  2. Next four bytes in shared array: 00 00 00 29 => version\x00. Third key is version.
  3. Since value is recurring and a char. Next byte in shared array = 01. Third value is 1.

Final group:

  1. Shared array fourth group: first byte = 3A = 0x1A + (1 << 5). Fourth value is a string and recurring.
  2. Next four bytes in shared array: 00 00 00 31 => owner\x00. Fourth key is owner.
  3. Since value is recurring and a string. Next four bytes in shared array = 00 00 00 3F => donmai\x00. Fourth value is donmai.

We would have the following dictionary discarding value type and occurrence:

{
    "filename": "foo.txt",
    "filesize": 12345678,
    "version": 1,
    "owner": "donmai",
}

Which is precisely the dictionary we made in the example.

Now that we're done with the first dictionary in the array, let's move on to the second and final dictionary. To do that, we need to point back to the start of the shared array and retain our pointer in the unique array. Effectively, the shared and unique arrays are now equivalent to this:

  • Shared array: 5A 00 00 00 17 54 00 00 00 20 30 00 00 00 29 01 3A 00 00 00 31 00 00 00 3F
  • Unique array: 00 00 00 46 05 39 7F B1

The C-string and byte arrays are still the same since the offsets stored in the shared and unique arrays are absolute. Now let's do what we did before, but this time for the second dictionary.

First group:

  1. Shared array first group: first byte = 5A = 0x1A + (2 << 5). First value is a string and unique.
  2. Next four bytes in shared array: 00 00 00 17=> filename\x00. First key is filename.
  3. Since value is unique and a string. Next four bytes in unique array = 00 00 00 46 => bar.txt\x00. First value is bar.txt.

Second group:

  1. Shared array second group: first byte = 54 = 0x14 + (2 << 5). Second value is an integer and unique.
  2. Next four bytes in shared array: 00 00 00 20 => filesize\x00. Second key is filesize.
  3. Since value is unique and an integer. Next four bytes in unique array = 05 39 7F B1. Second value is 87654321.

Third group:

  1. Shared array third group: first byte = 30 = 0x10 + (1 << 5). Third value is a char and recurring.
  2. Next four bytes in shared array: 00 00 00 29 => version\x00. Third key is version.
  3. Since value is recurring and a char. Next byte in shared array = 01. Third value is 1.

Final group:

  1. Shared array fourth group: first byte = 3A = 0x1A + (1 << 5). Fourth value is a string and recurring.
  2. Next four bytes in shared array: 00 00 00 31 => owner\x00. Fourth key is owner.
  3. Since value is recurring and a string. Next four bytes in shared array = 00 00 00 3F => donmai\x00. Fourth value is donmai.

From the procedures we've done we derive this dictionary, again, discarding value types and occurrence:

{
    "filename": "bar.txt",
    "filesize": 87654321,
    "version": 1,
    "owner": "donmai",
}

Which is the same as the second dictionary in our example. To summarize the process:

  • Each item in a dictionary is represented as a group in the shared array.
  • The first byte is: T + (O << 5), where T is the value type's corresponding number and O is the value's occurrence number.
  • The next four bytes are the offset of the key in the C-string array.
  • If the value is recurring - meaning all the values for the same key across all dictionaries are the same - the value or pointer/s are stored in the shared array.
  • Else value or pointer/s are stored in the unique array.
  • For strings, the pointer is stored in either the shared or unique array and points to the start of the string in the C-string array.
  • For bytes, the first and second pointer in either the shared or unique array is the start and end pointers, respectively.
  • When moving to the following dictionary in the array, we point back to the start of the shared array and retain where we point to the unique array.

I hope this gave you a clear understanding of how USM encodes an array of dictionaries. Now let's move on to how to put everything together. A dictionary payload is structured as follows:

  • Header
    • Identifier (4 bytes)
    • Payload size (4 bytes)
  • Data
    • Unique array offset (4 bytes)
    • C-string array offset (4 bytes)
    • Byte array offset (4 bytes)
    • Payload name offset (4 bytes)
    • Number of items per dictionary (2 bytes)
    • Unique array size per dictionary (2 bytes)
    • Number of dictionaries (4 bytes)
    • Shared array
    • Unique array
    • C-string array
    • Byte array

The header of the payload is 8 bytes and is composed of two fields. The first is the identifier, a four-letter ASCII string with the value @UTF. The second is the payload size, a 4-byte integer that states the size of the actual data and does not include this 8-byte header.

The unique, C-string, and byte array offsets are 4-byte integers that point to the start of their respective arrays, relative to the end of the 8-byte header. Next, the payload name offset points to the start of the payload name in the C-string array. The number of items per dictionary and the number of dictionaries are self-explanatory. Finally, the unique array size per dictionary is the number of bytes each dictionary consumes in the unique array.

For our example above the payload is:

  • Header
    • Identifier: 40 55 54 46
    • Payload size: 00 00 00 8F
  • Data
    • Unique array offset: 00 00 00 31
    • C-string array offset: 00 00 00 41
    • Byte array offset: 00 00 00 8F
    • Payload name offset: 00 00 00 07
    • Number of items per dictionary: 00 04
    • Unique array size per dictionary: 00 08
    • Number of dictionaries: 00 00 00 02
    • Shared array: 5A 00 00 00 17 54 00 00 00 20 30 00 00 00 29 01 3A 00 00 00 31 00 00 00 3F
    • Unique array: 00 00 00 37 00 BC 61 4E 00 00 00 46 05 39 7F B1
    • C-string array: 3C 4E 55 4C 4C 3E 00 45 78 61 6D 70 6C 65 20 70 61 79 6C 6F 61 64 00 66 69 6C 65 6E 61 6D 65 00 66 69 6C 65 73 69 7A 65 00 76 65 72 73 69 6F 6E 00 6F 77 6E 65 72 00 66 6F 6F 2E 74 78 74 00 64 6F 6E 6D 61 69 00 62 61 72 2E 74 78 74 00
    • Byte array: NONE

Putting everything in one contiguous block of bytes would produce our actual payload:

40 55 54 46 00 00 00 8F 00 00 00 31 00 00 00 41
00 00 00 8F 00 00 00 07 00 04 00 08 00 00 00 02
5A 00 00 00 17 54 00 00 00 20 30 00 00 00 29 01
3A 00 00 00 31 00 00 00 3F 00 00 00 37 00 BC 61
4E 00 00 00 46 05 39 7F B1 3C 4E 55 4C 4C 3E 00
45 78 61 6D 70 6C 65 20 70 61 79 6C 6F 61 64 00
66 69 6C 65 6E 61 6D 65 00 66 69 6C 65 73 69 7A
65 00 76 65 72 73 69 6F 6E 00 6F 77 6E 65 72 00
66 6F 6F 2E 74 78 74 00 64 6F 6E 6D 61 69 00 62
61 72 2E 74 78 74 00
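To tie the walkthrough together, here is a minimal decoder sketch for the payload above. Everything named here (decode_dict_payload, FIXED, the helper functions) is my own and hypothetical. It assumes strings are UTF-8 (Shift-JIS or UTF-16 values would need extra handling), that the shared array starts right after the 24 bytes of fixed fields, and that the occurrence number sits in the upper three bits of the first byte of each group.

```python
import struct

# Value type number -> struct format character for fixed-size types (table 2).
FIXED = {0x10: "b", 0x11: "B", 0x12: "h", 0x13: "H", 0x14: "i",
         0x15: "I", 0x16: "q", 0x17: "Q", 0x18: "f"}


def decode_dict_payload(payload: bytes):
    """Return (payload name, list of dictionaries) for an @UTF payload."""
    ident, size = struct.unpack_from(">4sI", payload, 0)
    if ident != b"@UTF":
        raise ValueError("not a dictionary payload")
    data = payload[8:8 + size]           # all offsets are relative to this
    (unique_ofs, cstr_ofs, bytes_ofs, name_ofs,
     n_items, _unique_size, n_dicts) = struct.unpack_from(">IIIIHHI", data, 0)
    cstrings = data[cstr_ofs:bytes_ofs]
    blobs = data[bytes_ofs:]

    def cstring(ofs):
        end = cstrings.index(b"\x00", ofs)
        return cstrings[ofs:end].decode("utf-8")   # assumption: UTF-8

    def read_value(value_type, pos):
        """Read one value at data[pos]; return (value, next position)."""
        if value_type == 0x1A:           # string: one 4-byte pointer
            (ofs,) = struct.unpack_from(">I", data, pos)
            return cstring(ofs), pos + 4
        if value_type == 0x1B:           # bytes: start and end pointers
            start, end = struct.unpack_from(">II", data, pos)
            return blobs[start:end], pos + 8
        fmt = ">" + FIXED[value_type]
        (value,) = struct.unpack_from(fmt, data, pos)
        return value, pos + struct.calcsize(fmt)

    dicts = []
    unique_pos = unique_ofs              # carried over between dictionaries
    for _ in range(n_dicts):
        shared_pos = 24                  # rewind to the start of the shared array
        entry = {}
        for _ in range(n_items):
            value_type = data[shared_pos] & 0x1F
            occurrence = data[shared_pos] >> 5
            (key_ofs,) = struct.unpack_from(">I", data, shared_pos + 1)
            shared_pos += 5
            if occurrence == 1:          # recurring: value is in the shared array
                value, shared_pos = read_value(value_type, shared_pos)
            else:                        # unique: value is in the unique array
                value, unique_pos = read_value(value_type, unique_pos)
            entry[cstring(key_ofs)] = value
        dicts.append(entry)
    return cstring(name_ofs), dicts


payload = bytes.fromhex(
    "40 55 54 46 00 00 00 8F 00 00 00 31 00 00 00 41 "
    "00 00 00 8F 00 00 00 07 00 04 00 08 00 00 00 02 "
    "5A 00 00 00 17 54 00 00 00 20 30 00 00 00 29 01 "
    "3A 00 00 00 31 00 00 00 3F 00 00 00 37 00 BC 61 "
    "4E 00 00 00 46 05 39 7F B1 3C 4E 55 4C 4C 3E 00 "
    "45 78 61 6D 70 6C 65 20 70 61 79 6C 6F 61 64 00 "
    "66 69 6C 65 6E 61 6D 65 00 66 69 6C 65 73 69 7A "
    "65 00 76 65 72 73 69 6F 6E 00 6F 77 6E 65 72 00 "
    "66 6F 6F 2E 74 78 74 00 64 6F 6E 6D 61 69 00 62 "
    "61 72 2E 74 78 74 00"
)
name, dicts = decode_dict_payload(payload)
```

Run against the bytes above, name is the payload name and dicts holds the two dictionaries from the sample payload, without the value types and occurrences.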

Conclusion

If you are with me up to this point, we should have a sufficient understanding of the building blocks of a USM file and how they are encoded. In the next part, I'll discuss the different types of chunks used in an actual USM file, their purpose and how they are structured. I'll try my best to post the following parts of this as soon as possible. I was initially going to include everything in one long post but it took too much time, and external factors forced me to post at least a part of this as soon as possible. Thank you for your time, and see you again soon.



Newly Released Simai to SDT Converter

I've finally published the code for my simai to sdt converter on my GitHub. The rest of this blog post is mostly about my time developing this Python program. Please keep in mind that not much thought has been given to this post, so some explanations may be unclear.

The simai chart format was a bit hard to translate to sdt due to two major differences in the two formats:

  • Simai allows BPM changes while sdt does not
  • Some slide patterns in simai require some additional steps before you find the equivalent slide pattern in sdt

BPM Changes

A hacky way of solving the first problem is to keep writing everything in the initial BPM, converting the length of each note or rest under another BPM to its equivalent length in the chart's initial BPM. For example, a chart has an initial BPM of 120, but along the way there's a change to 240 BPM, twice the initial BPM. How can we get the equivalent length, in the initial tempo of 120 BPM, of a quarter note at 240 BPM? First, we assume that the tempo is in terms of quarter notes. We then take the ratio of the initial tempo to the changed tempo and multiply it by the length of the note. So we get 0.25*(120/240) = 0.125. The equivalent length is 0.125 measures.
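The conversion is a one-liner. The function name equivalent_length is hypothetical and is not necessarily what the converter itself uses; lengths are in measures.

```python
def equivalent_length(length: float, initial_bpm: float, current_bpm: float) -> float:
    """Length, in measures at initial_bpm, of a note written at current_bpm."""
    return length * (initial_bpm / current_bpm)


# A quarter note (0.25 measures) at 240 BPM in a chart that starts at 120 BPM.
quarter_at_240 = equivalent_length(0.25, 120, 240)
```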

The sdt format's time signature seems to be 4/4, so we can be confident in our computation above. For songs with funky time signatures, we can use sdt's four-decimal-place measure precision to compensate.

Slide Patterns

'^' Pattern

For the second difference, we have to find the direction of a slide before we can find the equivalent slide pattern. An example is simai's "^" slide pattern. It is supposed to connect a start and end position by an arc along the judgement ring. The direction of the arc, clockwise or counter-clockwise, is determined by which direction produces the shorter arc. I first approached this thinking it was a modular arithmetic problem because I viewed the locations of the buttons as a clock. The simpler approach is to think of it as a simple geometric problem.

The first step is to find the two possible distances (CW and CCW) between the start and end positions. These are the absolute value of the difference between the end and start positions (we'll call this the first difference), and the absolute value of the difference between the total number of positions and the first difference (we'll call this the other difference). The next step is to find which rotation is associated with which distance. The shortest arc is clockwise if the start position is greater than the end position and the first difference is greater than the other difference, or if the start position is less than the end position and the first difference is less than the other difference. If neither condition holds, the shortest arc must be a counter-clockwise rotation.

Example

Given the slide note 6^3, the first difference is |3 - 6| = 3, the other difference is |8 - 3| = 5. Our start position is greater than our end position, and the other difference is greater than the first difference; therefore, the slide goes counter-clockwise. In sdt, a counter-clockwise arc along the judgement ring has a slide pattern value of 2.
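The rule can be written down directly. This is a minimal sketch with a hypothetical function name, assuming simai's button numbers 1 to 8 and a total of 8 positions; it follows the rule exactly as stated, so a tie between the two distances falls through to counter-clockwise.

```python
def arc_direction(start: int, end: int, total: int = 8) -> str:
    """Return "CW" or "CCW" for the shorter arc from start to end."""
    first = abs(end - start)       # the first difference
    other = abs(total - first)     # the other difference
    if (start > end and first > other) or (start < end and first < other):
        return "CW"
    return "CCW"


direction = arc_direction(6, 3)    # the 6^3 slide from the example
```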

'V' Pattern

For the simai 'V' slide pattern, we also determine whether the arc goes clockwise or counter-clockwise. This is done by taking the first and second points (not the end point) and finding whether the path between them is clockwise or counter-clockwise. We can reuse our simple algorithm above for finding the rotation. In my program, the second point is used only to find the rotation and is then discarded. The sdt format only allows the second point to be two places away from the starting point, unlike simai, where the second point can be set freely.

We can probably get around this in sdt by defining two straight slides: start to second point, and second point to end point. The total duration of the two must be equal to the duration of the 'V' slide. The rest note of the second slide must be equal to the duration of the first slide so that it starts to move only when the first slide is finished.

Example

Given the slide note 1V74, we find the rotation of the slide by taking the first and second points, 1 and 7. Using the previous algorithm, the first difference and the other difference are 6 and 2, respectively. Our second point is greater than our first point, and the first difference is greater than the other difference; therefore, our rotation is counter-clockwise. The equivalent counter-clockwise 'V' slide in sdt has a slide pattern value of 11.

Ending thoughts

This post just explains a small part of the conversion that happens between simai and sdt. I've hastily put the Python program together with this blog post, so I hope that people who want to convert a simai file to sdt can find it useful. In the next post, I'll finally discuss the binary versions of the Maimai chart formats and how to convert to and from the two formats.

Until then, I hope you learned something and I'll see you next time.



The Four Chart Formats of Maimai Classic

Erratum (10 March, 2021): I have made an error about SRT slide pattern values and have fixed them.
Edit (23 March, 2021): I have replaced "slide tap note" with "star note". I also added some more information about star and slide notes in utage charts.

Maimai fascinates me because of its unique take on rhythm games. When I played it for the first time, I was caught off-guard by how hard it is to play when the notes are coming from the centre towards the machine's buttons. It was different from other arcade rhythm games, where the judgement line is in one area. You'd typically look at that area to accurately time the notes. But in Maimai, the judgement line is a judgement ring. With a ring, only a part of it is ever in the centre of the player's view and the rest sits in their peripheral vision, making the notes harder to time. Add to that the various slide notes in the game that make you trace your hands across the screen, blocking a part of your view. It can get pretty overwhelming at the start, and you have to get a good feel for the game to get even decent at it. Talking about it makes me itch to play some right now, but the current pandemic makes it hard.

The creative use of buttons and slides to choreograph moves in time with the music made me curious about how this game works. After digging around and studying the game files for a while, I'm here to give my findings. I'm still not sure about many of these things, so if you have any corrections or additional information, please message me at my guestbook here on Listed.to or on Twitter.

Maimai classic, over eight years, used four different chart formats for their game. Each was incrementally building on the previous for additional features or refinements to the game. Since the differences between them are minimal, I'll start with things they share in common.


Common traits of the four formats

Note Types

There are three note types in Maimai classic: tap, hold and slide. Tap notes are notes that come from the centre towards one of the buttons. The player has to press the button or tap the area on the touchscreen near the button when the note reaches the judgement ring. Hold notes are just like tap notes, but the player has to hold the button for a set amount of time before they let go. In Maimai classic, releasing the hold note early or late will give a low judgement. A slide note begins like a tap note, but the rest of the slide only appears after you hit or miss that tap. The player then has to trace their hands across the touchscreen, following the pattern shown. Tap and slide notes can have a "break" modifier to have a higher value in grading.

Position

Maimai Classic's chart formats have no way of representing touchscreen zones. They are only used to keep track of slide notes and as an additional input to the buttons. It's only in Maimai DX, where they are included in the format, but that's for another time. For Maimai Classic, there are eight positions, all corresponding to the eight buttons on the machine. Internally, the buttons start from 0 to 7. The first (or 0th) button is the button at the 1 o'clock position, and the last button (7th) is the button at the 11 o'clock position. The ordering of the buttons is in a clockwise manner.

Time

Maimai classic chart formats have no concept of BPM changes or fine time offsets; heck, they don't even say what BPM the song is. All they do is track which measure a particular note falls on, with 6 or 4 decimal places of precision, starting at measure 1.0000. The engine grabs a chart file and the BPM of the song from one of its tables to make sense of a song's timing. Even though the formats don't support BPM changes or offsets, you can compensate via the decimal precision of the measure values. Nothing is stopping you from charting a song in a BPM different from the song's correct BPM. As long as you give the engine your chart file and the song's BPM, you'll have no problem.

In general, the chart format has no absolute definition of time. All notes' timing is in terms of measures, which is dependent on BPM. For holds and slides, the starting and ending measure determines the duration.

Internally, the chart formats use three floats to keep track of what time a note happens. Below is a simplified sample of 4 tap notes: One that occurs at the beginning of the first measure, the middle of the second measure, the third quarter of the same measure, and the first quarter of the fourth measure.

1.0000, 0.0000, 0.0625,
2.0000, 0.5000, 0.0625,
2.0000, 0.7500, 0.0625,
4.0000, 0.2500, 0.0625,

Even though the first column looks like a float, it's an integer representing the whole number part of the current measure. The second column is the fractional part. The third column is the duration of the note. So if you have two hold notes at the same time, one lasting for half a measure and the other lasting for two bars, you would write them like this:

1.0000, 0.0000, 0.5000,
1.0000, 0.0000, 2.0000,

So what was the '0.0625' in the first example? To be honest, I don't know. The value doesn't matter for tap notes, and I haven't seen any pattern from looking at official charts or from my testing. If you do know, please message me so I can add it here.
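As a rough sketch of how the three time columns could be read, the helper below combines the whole and fractional parts into one measure position. Both function names are hypothetical. The conversion to seconds is purely my assumption: it presumes four beats per measure and a single BPM supplied from outside the chart, as the engine does.

```python
def parse_time_columns(line: str):
    """Return (measure position, duration) from the first three columns."""
    whole, frac, duration = (float(x) for x in line.split(",")[:3])
    return int(whole) + frac, duration


def measures_to_seconds(measures: float, bpm: float, beats_per_measure: int = 4) -> float:
    """Convert a length in measures to seconds (assumes a fixed BPM)."""
    return measures * beats_per_measure * 60.0 / bpm


position, duration = parse_time_columns("2.0000, 0.5000, 0.0625,")
# Charts start at measure 1.0000, so subtract 1 to get the elapsed measures.
seconds_in = measures_to_seconds(position - 1.0, bpm=120)
```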


Differences

SRT

SRT is the first chart format created for Maimai Classic. It has seven columns: 3 for time, 1 for location, 1 for note type, 2 for slide information. Each line represents a note and looks like the one below. In the example below, it's a tap note at button one at measure 1.000000.

1.000000, 0.000000, 0.062500, 1,   0,   0,   0,
  • Column 1: Measure (whole part)
  • Column 2: Measure (fractional part)
  • Column 3: Hold/slide duration or unknown
  • Column 4: Note location
  • Column 5: Note Type
  • Column 6: Slide ID
  • Column 7: Slide Pattern

Column 5 (Note type) Values for SRTs

Value Description
0 Tap note
2 Hold note
4 Break tap note
128 End slide

A tap note (0) paired with an end slide (128) generates a slide note. A slide note's tap note should have a unique non-zero slide ID and the slide pattern you want. The corresponding end slide note should have the same slide ID and slide pattern. Below is an example of a straight slide from button 0 to 4 that lasts for one measure. The following section contains more information on slide patterns.

1.000000, 0.000000, 1.000000, 0,   0,   1,   0,
2.000000, 0.000000, 0.000000, 4, 128,   1,   0,

Column 7 (Slide Pattern) Values for SRTs

Value  Description                                               Simai equivalent
0      Straight line                                             -
1      Along the judgement ring, CW (can only go 3 places max)   <, >, or ^
2      Along the judgement ring, CCW (can only go 3 places max)  <, >, or ^

Note that slide patterns 1 and 2 can only travel up to three places, even if given an end position further away than that.


SZT

The SZT format brings a significant change to the SRT format by changing how slides work and adding and changing note types. But it's still a seven-column format like SRT. To show the difference, let's use the same example of a tap note at button one at measure 1.

1.000000, 0.000000, 0.062500, 1,   1,   0,   0,
  • Column 1: Measure (whole part)
  • Column 2: Measure (fractional part)
  • Column 3: Hold/slide duration or unknown
  • Column 4: Note location
  • Column 5: Note Type
  • Column 6: Slide ID
  • Column 7: Slide Pattern

Column 5 (Note type) Values for SZTs

Value Description
0 Start slide
1 Regular Tap note
2 Hold note
3 Break note
4 Star note
5 Break star note
128 End slide

Slides now need three notes instead of two. A slide needs:

  1. A star note or break star note (with zero slide id and slide pattern)
  2. Start slide (with a unique slide id and a non-zero slide pattern)
  3. End slide (with same slide id and slide pattern as paired start slide)

Actually, you only need 2 and 3. The star note is sometimes discarded in some utage charts. And in the case of some utage charts like その群青が愛しかったようだった, all tap notes are replaced with star notes with no slides.
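Since only the start slide and end slide are really required, a quick sanity check for an SZT chart is to make sure every start slide has a partner. Here's a rough C sketch of that check. The struct and function names are made up for illustration, and it only looks at the three columns that matter for pairing.

```c
#include <assert.h>
#include <stddef.h>

enum { SZT_START_SLIDE = 0, SZT_END_SLIDE = 128 };

/* Hypothetical cut-down view of an SZT line: columns 5, 6 and 7 only. */
typedef struct {
    int note_type;
    int slide_id;
    int slide_pattern;
} SztNote;

/* Counts start slides that have no end slide with the same slide ID
 * and slide pattern anywhere in the chart. */
size_t count_unpaired_slides(const SztNote *notes, size_t count)
{
    assert(notes != NULL || count == 0);
    size_t unpaired = 0;
    for (size_t i = 0; i < count; i++) {
        if (notes[i].note_type != SZT_START_SLIDE)
            continue;
        int found = 0;
        for (size_t j = 0; j < count && !found; j++) {
            found = notes[j].note_type == SZT_END_SLIDE &&
                    notes[j].slide_id == notes[i].slide_id &&
                    notes[j].slide_pattern == notes[i].slide_pattern;
        }
        if (!found)
            unpaired++;
    }
    return unpaired;
}
```

Star notes are deliberately ignored here, so a chart that drops them still passes.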

Column 7 (Slide Pattern) Values for SZTs

Value  Description                                           Simai equivalent
1      Straight line                                         -
2      Along the judgement ring (CCW)                        <, >, or ^
3      Along the judgement ring (CW)                         <, >, or ^
4      Arc CCW around the center                             p
5      Arc CW around the center                              q
6      Zigzag (S)                                            s
7      Zigzag (Z)                                            z
8      Start to center (straight) to end (straight)          v
9      Start to center (straight) to end (CCW arc)           pp
10     Start to center (straight) to end (CW arc)            qq
11     Start to two places CCW (straight) to end (straight)  V
12     Start to two places CW (straight) to end (straight)   V
13     Fan                                                   w

Note that patterns 2 and 3 are no longer limited to 3 places and can now complete a 360-degree rotation.
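If you ever need to carry an SRT slide over to the newer numbering, the two pattern tables line up as in the C sketch below. This is just me reading the SRT and SZT tables side by side; the function names are hypothetical, and the note type mapping only covers notes that aren't part of a slide.

```c
/* Maps an SRT slide pattern (column 7) to the SZT numbering.
 * Returns -1 for values the SRT table doesn't list. */
int srt_to_szt_pattern(int srt_pattern)
{
    switch (srt_pattern) {
    case 0:  return 1;   /* straight line */
    case 1:  return 3;   /* along the judgement ring, CW */
    case 2:  return 2;   /* along the judgement ring, CCW */
    default: return -1;
    }
}

/* Maps an SRT note type (column 5) to the SZT numbering for plain notes.
 * Slide heads need extra handling in SZT and are not covered here. */
int srt_to_szt_note_type(int srt_type)
{
    switch (srt_type) {
    case 0:   return 1;    /* tap note */
    case 2:   return 2;    /* hold note */
    case 4:   return 3;    /* break note */
    case 128: return 128;  /* end slide */
    default:  return -1;
    }
}
```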


SCT

SCT adds a feature to create multiple slides coming from one position at the same time. It does so by adding another column, which we'll call the slide amount for lack of a better term. Using the previous example, let's add a star note at button one that will spawn two straight slides to button four and button six.

1.000000, 0.000000, 0.062500, 1,   1,   0,   0,   0,
1.000000, 0.500000, 0.062500, 1,   4,   0,   0,   2,
1.000000, 0.500000, 0.500000, 1,   0,   1,   1,   0,
1.000000, 0.500000, 0.500000, 1,   0,   2,   1,   0,
2.000000, 0.000000, 0.000000, 4, 128,   1,   1,   0,
2.000000, 0.000000, 0.000000, 6, 128,   2,   1,   0,

From the example above, we begin with a tap note in button one at measure 1.0. Then a star note at measure 1.5 at button 1. Notice the value of 2 in the 8th column; this indicates that there will be two slides coming out of button one simultaneously. The following two lines state the beginning of two straight slides from button 1 with a duration of 0.5 measures. The last two lines at measure 2.0 state that the two straight slides we defined earlier will end at buttons 4 and 6.

Slide mechanics are mostly the same as SZTs, but we have to specify how many slides a star or break star note will make. Other than that, there are no other differences I can find for SCT files.

  • Column 1: Measure (whole part)
  • Column 2: Measure (fractional part)
  • Column 3: Hold/slide duration or unknown
  • Column 4: Note location
  • Column 5: Note Type
  • Column 6: Slide ID
  • Column 7: Slide Pattern
  • Column 8: Slide Amount
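One way to double-check the slide amount column is to count the start slides that sit on the same measure and button as the star. The C sketch below does that. It assumes a slide belongs to a star when both share the measure and the button, which holds for the example above but is my guess rather than a confirmed rule; the names are made up too.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cut-down view of an SCT line. */
typedef struct {
    int    measure_whole;
    double measure_frac;
    int    button;
    int    note_type;     /* 0 = start slide, 4 = star, 5 = break star */
    int    slide_amount;  /* column 8 */
} SctNote;

static int same_position(const SctNote *a, const SctNote *b)
{
    double diff = a->measure_frac - b->measure_frac;
    if (diff < 0.0)
        diff = -diff;
    /* Half of the last printed decimal place is close enough. */
    return a->measure_whole == b->measure_whole &&
           diff < 0.00005 && a->button == b->button;
}

/* Returns 1 when the star's slide amount equals the number of start
 * slides found at the same measure and button, 0 otherwise. */
int slide_amount_matches(const SctNote *notes, size_t count, size_t star_index)
{
    assert(notes != NULL && star_index < count);
    int slides = 0;
    for (size_t i = 0; i < count; i++) {
        if (notes[i].note_type == 0 && same_position(&notes[i], &notes[star_index]))
            slides++;
    }
    return slides == notes[star_index].slide_amount;
}
```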

SDT

SDT is the final chart format created for Maimai Classic. It adds a new feature to slides that allows charters to specify the delay of a slide. The delay is how much time, in measures, will pass before the star in the slide begins to move. In older chart formats, the delay is always a quarter note, with no way to change it.

This way, you can create slides faster than a quarter note without the bug caused by the delay being greater than the slide duration. Before, when the slide duration was less than a quarter note, the slide would finish without the star moving, confusing the player. You could also make gimmicks out of this by creating a slide with a long duration and a delay that's a bit smaller than the slide duration, making the star wait for a while at the start and then quickly move to the end. You can see this gimmick at the end of QZKago Requiem Re:master, where the fan slide is visible for a long time but only moves quickly at the end.
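Put as arithmetic, the star only has whatever is left of the slide duration after the delay to do its moving. Here's a tiny C sketch of that, assuming the slide duration in column 3 includes the delay, which is how I read the behaviour described above. The names are my own.

```c
#include <assert.h>

/* Delay used by the older formats (SRT, SZT, SCT): a quarter note. */
#define LEGACY_SLIDE_DELAY 0.25

/* Returns how many measures the star spends moving. A result of 0 means
 * the slide is over before the star gets a chance to move. */
double slide_travel_measures(double duration, double delay)
{
    assert(duration >= 0.0 && delay >= 0.0);
    double travel = duration - delay;
    return travel > 0.0 ? travel : 0.0;
}
```

With the legacy delay, a slide of 0.125 measures leaves nothing for the star, which is the bug described above. A long slide with a delay just under its duration gives the "wait, then dash" gimmick.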

Using the previous example in the SCT format, let's give the straight slide from button 1 to 4 a quarter note delay (0.2500) and remove the delay from the straight slide from button 1 to 6.

1.0000, 0.0000, 0.0625, 1,   1,   0,   0,   0,  0.0000,
1.0000, 0.5000, 0.0625, 1,   4,   0,   0,   2,  0.0000,
1.0000, 0.5000, 0.5000, 1,   0,   1,   1,   0,  0.2500,
1.0000, 0.5000, 0.5000, 1,   0,   2,   1,   0,  0.0000,
2.0000, 0.0000, 0.0000, 4, 128,   1,   1,   0,  0.0000,
2.0000, 0.0000, 0.0000, 6, 128,   2,   1,   0,  0.0000,

You might have noticed that there are just four decimal places instead of the six used previously. Looking at chart files in the previous formats, they only ever seem to use four decimal places; the last two are always zeroes. It looks like the engine only parses four decimal places in the first place.

  • Column 1: Measure (whole part)
  • Column 2: Measure (fractional part)
  • Column 3: Hold/slide duration or unknown
  • Column 4: Note location
  • Column 5: Note Type
  • Column 6: Slide ID
  • Column 7: Slide Pattern
  • Column 8: Slide Amount
  • Column 9: Slide Delay
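Because each format only appends columns, one reader can cope with all four. Below is a minimal C sketch that parses a single line with sscanf and reports how many columns it found. The struct and function names are hypothetical, and filling in a quarter note delay for the older formats is my choice based on the default described earlier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

typedef struct {
    int    measure_whole;
    double measure_frac;
    double duration;
    int    button;
    int    note_type;
    int    slide_id;
    int    slide_pattern;
    int    slide_amount;   /* SCT and SDT only, 0 otherwise */
    double slide_delay;    /* SDT only, 0.25 otherwise */
} ChartNote;

/* Parses one chart line. Returns the number of columns read (7, 8 or 9),
 * or 0 when the line has fewer than seven usable columns. */
int parse_chart_line(const char *line, ChartNote *out)
{
    assert(line != NULL && out != NULL);
    double whole = 0.0;
    ChartNote n = {0};
    int cols = sscanf(line, "%lf , %lf , %lf , %d , %d , %d , %d , %d , %lf",
                      &whole, &n.measure_frac, &n.duration, &n.button,
                      &n.note_type, &n.slide_id, &n.slide_pattern,
                      &n.slide_amount, &n.slide_delay);
    if (cols < 7)
        return 0;
    n.measure_whole = (int)whole;   /* written like a float, used as an integer */
    if (cols < 9)
        n.slide_delay = 0.25;       /* fixed quarter note delay before SDT */
    *out = n;
    return cols;
}
```

Seven columns can mean either SRT or SZT, so the file extension is still needed to tell how to interpret the note type and slide pattern values.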

Slide Samples

To give a better picture of the various patterns and combinations, I've compiled all of them in a video with information. The SDT file used to create the samples shown and the equivalent Simai chart are available.

Straight

  • SRT slide pattern: 0
  • SZT, SCT and SDT pattern: 1
  • Simai equivalent: -

Pattern 1

NOTE: End position should be at least two places away from the start position. Otherwise, it will default to an end position two places away (CCW) from the start position.


Around the judgement ring (CCW)

  • SRT slide pattern: 2
  • SZT, SCT and SDT pattern: 2
  • Simai equivalent: <, >, or ^

Pattern 2

NOTE: For SRT files, it can only do end positions three places max.


Around the judgement ring (CW)

  • SRT slide pattern: 1
  • SZT, SCT and SDT pattern: 3
  • Simai equivalent: <, >, or ^

Pattern 3

NOTE: For SRT files, it can only do end positions three places max.


Arc along the center (CCW)

  • SZT, SCT and SDT pattern: 4
  • Simai equivalent: p

Pattern 4


Arc along the center (CW)

  • SZT, SCT and SDT pattern: 5
  • Simai equivalent: q

Pattern 5


Zigzag (S)

  • SZT, SCT and SDT pattern: 6
  • Simai equivalent: s

Pattern 6

NOTE: There is only one possible end position for a given start position.


Zigzag (Z)

  • SZT, SCT and SDT pattern: 7
  • Simai equivalent: z

Pattern 7

NOTE: There is only one possible end position for a given start position. It will only work when the end position is at 180 degrees with the start position. Otherwise, it will look like a zigzag (s).


Start to center (straight) to end (straight)

  • SZT, SCT and SDT pattern: 8
  • Simai equivalent: v

Pattern 8


Start to center (straight) to end (CCW arc)

  • SZT, SCT and SDT pattern: 9
  • Simai equivalent: pp

Pattern 9


Start to center (straight) to end (CW arc)

  • SZT, SCT and SDT pattern: 10
  • Simai equivalent: qq

Pattern 10


Start to two places CCW (straight) to end (straight)

  • SZT, SCT and SDT pattern: 11
  • Simai equivalent: V

Pattern 11

NOTE: Will crash the game when the start and end position are the same. A bug happens when the end position is in-between the start and second point or is at the second point.


Start to two places CW (straight) to end (straight)

  • SZT, SCT and SDT pattern: 12
  • Simai equivalent: V

Pattern 12

NOTE: Will crash the game when the start and end position are the same. A bug happens when the end position is in-between the start and second point or is at the second point.


Fan

  • SZT, SCT and SDT pattern: 13
  • Simai equivalent: w

Pattern 13

NOTE: Will always fan to the opposite button regardless of the end position.


Conclusion

Writing and preparing for this was long and tedious. I hope this will be valuable information for those who want it. Much information is still unknown or unclear to me, so please let me know of corrections or additions. If I do find some additional information, I'll make a follow-up post.

For now, I'm working on a continuation of this. You might have noticed that Maimai Finale chart files are not in the same format as what I've discussed here. I'll talk about that in the next blog post about Maimai Classic.

I'm also working on a Simai to SDT converter. I just used it for the slide samples above: the chart was first written in Simai format before being converted to SDT. I'll release the complete code on my newly created GitHub account @donmai-me.

I hope you learned something from this and I'll see you next time.


Saltpack signed message

Twitter

Not maimai or reverse engineering related but I just created a twitter account. I don't really know what I'm going to tweet there aside from announcing new blog posts but hey it's there. I'm @donmai_me on twitter.



Maimai Classic and Xact

My first actual post on this blog will hopefully be the first in a series about my findings in Maimai. Hopefully, we will end up with the knowledge and tools to make custom charts, and an appreciation for the underlying engine that powers the classic Maimai games from 2012 to 2019 and the new Maimai Deluxe (stylised as Maimai DX) that was just released. We'll start with how Maimai stores game sounds (i.e. music, sound effects, and voices).

Let's start with what Maimai is. Maimai is an arcade rhythm game developed and published by Sega. It was released in Japan in July 2012 and has since produced 15 releases (updates included.) The game has eight buttons and a touch screen, where the player has to press buttons and swipe the screen in time with the music. Watching a video of actual Maimai plays will give you a better idea of how the game works than my lame explanation. So go watch some if you're new to Maimai.

A picture of a person playing Maimai Pink by Julien Haack

I will refer to the Maimai versions from 2012 up to Finale as Maimai Classic since they all share a similar underlying engine. Maimai DX (and DX Plus), the new version, is a complete rewrite of the game released on a more recent cabinet. It shares almost none of the code from Maimai Classic aside from chart files.

Maimai classic uses the Xact engine, part of DirectX 9, for audio playback. I will not detail how Xact works, but basically, the sounds are stored as a collection composed of two files, a sound bank (.xsb) and a wave bank (.xwb). The wave bank contains audio data, while the sound bank contains cues and other information about the sound. The following is the header information [1] for one of the game's xsb files (sample from 776):

struct SoundBankHeader SBH_776 = {
    .magic = 0x5344424B, // "KBDS"
    .toolVersion = 45,
    .formatVersion = 43,
    .crc = 0xED95,
    .lastModifiedLow = 0xBE495CDF,
    .lastModifiedHigh = 0xEB69D401,
    .platform = 0x01, // Windows
    .numSimpleCues = 0x01,
    .numComplexCues = 0,
    .unkn3 = 0,
    .numTotalCues = 0x10, // ????? should be 0x1
    .numWaveBanks = 0x1,
    .numSounds = 0x1,
    .cueNameTableLen = 0xB,
    .simpleCuesOffset = 0xD6,
    .complexCuesOffset = 0xffffffff,
    .cueNameOffset = 0x101,
    .unknOffset = 0xffffffff,
    .variationTableOffset = 0xffffffff,
    .unknOffset2 = 0xffffffff,
    .waveBankNameTableOffset = 0x8A,
    .cueNameHashTableOffset = 0xDB,
    .cueNameValsTableOffset = 0xFB,
    .soundsOffset = 0xCA,
    .name = "776"
};

struct WaveBankNameTableEntry WBNTE_776 = {
    .name = "776"
};

struct SoundEntry SE_776 = {
    .flags = 0x0,
    .category = 0x1,
    .unkn2 = 0x97,
    .volume = 0x0,
    .unkn3 = 0x0, // Pitch??
    .entryLength = 0xC,
    // flags is 0 therefore not a complex sound
    .trackIndex = 0x0,
    .waveBankIndex = 0x0
};

struct SimpleCueEntry SCE_776 = {
    .flags = 0x4,
    .soundOffset = 0xca
};

struct CueNameHashVal CNHV_776 = {
    .nameOffset = 0x101, //Links to CueName struct below
    .unkn = 0xFFFF
};

struct CueName CN_776 = {
    .name = "GM_BGM_776"
};

Header information for Maimai's xwb file (Sample from 776):

struct WaveBankHeader WBH_776 = {
    .magic = 0x444E4257, // "DNBW"
    .toolVersion = 45, 
    .formatVersion = 43,
    .waveBankData1Offset = 0x34,
    .waveBankData1Length = 0x60,
    //Not sure about these
    .waveBankData2Offset = 0x94,
    .waveBankData2Length = 0x18,
    .waveBankData3Offset = 0xAC,
    .waveBankData3Length = 0x00,
    .waveBankData4Offset = 0x00,
    .waveBankData4Length = 0x00,
    .waveBankData5Offset = 0x800,
    .waveBankData5Length = 0x4CE000,
};

struct WaveBankData WBD1_776 = {
    .flags = 0x00080001, // Streaming or mask, has seektables
    .entries = 0x1,
    .name = "776", // name size is 64 bytes
    .metadataElementSize = 0x0,
    .nameElementSize = 0x0,
    .alignment = 0x0,
    .compactFormat = 0x18, // if 18, nSamplesPerSecond is 18
    .buildTimeLow = 0x40,
    .buildTimeHigh = 0x800,
};

struct WaveBankData WB2_776 = {
    .flags = 0x040BA400, //???
    .entries = 0x815888a, 
    //........
};

// WB3_776 and WB4_776 are 0

struct WaveBankData WB5_776 = {
    .flags = 0x00100000, //
};


I'm not sure about most of these fields; however, we can gather crucial information from this. The sound and wave banks from the game both have a tool version of 45 and a format version of 43, as shown in the headers above. If we can find an Xact creation tool that produces the same tool and format versions as those from the game, we can create our own songs to replace or add [2]. After some digging, I've found that the Xact creation tool from the DirectX SDK from March 2009 [3] produces the same tool and format versions.
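If you want to check a wave bank you built against the game's, comparing those two version numbers is enough to start with. Here's a rough C sketch that pulls them out of the first 12 bytes of an .xwb file. It assumes the magic, tool version and format version are three consecutive 32-bit little-endian integers, which matches the header layout shown above but isn't something I've confirmed beyond that. The function names are mine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Wave bank magic as listed in the header dump. */
#define XWB_MAGIC 0x444E4257u

static uint32_t read_u32_le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 and fills in both versions when buf starts with the wave
 * bank magic; returns 0 when the buffer is too short or the magic differs. */
int xwb_read_versions(const uint8_t *buf, size_t len,
                      uint32_t *tool_version, uint32_t *format_version)
{
    assert(buf != NULL && tool_version != NULL && format_version != NULL);
    if (len < 12 || read_u32_le(buf) != XWB_MAGIC)
        return 0;
    *tool_version = read_u32_le(buf + 4);
    *format_version = read_u32_le(buf + 8);
    return 1;
}
```

Read the first 12 bytes of both files with fread, run them through this, and compare the results. The sound bank header is laid out differently, so this only applies to .xwb files.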

When creating our wave and sound banks, it is essential that the following be done so the game will read the custom wave and sound banks:

  • Wavebank name should be the three-digit numerical id of the chart (e.g. 776, 534, etc.)
  • Wavebank should only contain the song.
  • Wavebank should be compressed with ADPCM.
  • Wavebank should be streaming and not in memory.
  • Soundbank name should be the numerical id of the chart.
  • Cue type should be simple.
  • Cue name should be GM_BGM_NumericalID where NumericalID is the numerical id of the chart
  • Cue should only contain one track from the wave bank.
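The naming rules are simple enough to generate. Below is a small C sketch that builds the cue name from a chart id. Padding ids below 100 with zeroes is an assumption on my part, since every id I've mentioned already has three digits, and the function name is made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Writes "GM_BGM_<id>" into out. Returns 1 on success, 0 when the id
 * isn't in 0..999 or the buffer is too small for the whole name. */
int make_cue_name(int chart_id, char *out, size_t out_size)
{
    assert(out != NULL);
    if (chart_id < 0 || chart_id > 999)
        return 0;
    /* Assumption: ids are zero-padded to three digits. */
    int written = snprintf(out, out_size, "GM_BGM_%03d", chart_id);
    return written > 0 && (size_t)written < out_size;
}
```

For chart 776 this produces GM_BGM_776, the same cue name as in the sound bank dump earlier in this post.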

If these are followed correctly, we should end up with wave and sound banks that are compatible with the game. I was supposed to add an actual tutorial, but I want to keep this post short and simple. I've already given enough information for you to make compatible banks, and there are plenty of tutorials out there on the Xact creation tool. Maybe I'll do a follow-up tutorial if I feel like it. But there are more important posts in the future regarding Maimai. I hope that you learned from this post. See you next time.




  1. More info about it on multimedia.cx wiki 

  2. For now we can only replace existing songs. To add songs we still need to modify the game's database which will be covered in a future post. 

  3. Can be downloaded from archive.org or from Microsoft 

Hello world

Welcome to my small blog!

Edit (14 April, 2021): All Saltpack signed messages will now be in a separate private post. The link should be at the bottom of every public post.

I'm Donmai. Unfortunately I can't show you anything interesting, yet. This blog will mostly contain my findings in reverse engineering games and probably other software.

I'll also sign every correspondence with my EdX25519 key:

kex1996qewz7hgxzyrhlvunnspm4wdzt0m9c3xvl9wfaxvrz0pdlm88shhwfc5

Every correspondence will contain a Saltpack signed message at the end. Tools to verify Saltpack signed messages can be found at keys.pub. I trust that you'll save the key to verify that the message does indeed come from me. Hope to see you again soon.

