Kevin Vance - Nintendo DS Texture Compression

Entries | Archive | Friends | Friends' Friends | User Info

09:51 am

Nintendo DS Texture Compression

Tuesday, March 24th, 2009
Previous Entry Share Next Entry

As promised: a long, boring article about Nintendo DS texture compression. This article describes the compression format. I may follow it up with another one about how my compressor works specifically.


Hardware texture compression on the NDS allows you to display very large textures for the device. You can store two compressed 1024x512 textures on a device with a total screen size of 256x384. There is no performance penalty, as the hardware can decompress them on the fly. Unfortunately, the tools to generate compressed textures are not available to the homebrew community.

Compressed textures have a slightly complicated format, and the hardware is not very forgiving about how they are stored. The texture is divided into three parts:

  • Pixmap: The pixels of the texture, grouped into 4x4 blocks
  • Index: A map from each pixel block to a part of the palette
  • Palette: The list of colors, used in groups of 2 or 4

The pixmap can be stored in texture slots 0 or 2. The index must be stored in slot 1. This prevents textures larger than 1024x512, since the index would be in the middle of them. Palettes are stored normally.

The Pixmap

Pixmaps are packed into 32-bit blocks of 4x4 pixels. Each pixel is two bits (valued 0-3). The top-left pixel is the least significant 2 bits, and the bottom-right pixel is the most significant 2 bits. Each 32-bit word is stored one after another, from the top-left of the image to the bottom-right. The pixmaps can be stored in texture slot 0 or slot 2.

A bitfield as a 4x4 block

The Index

The color each pixel value represents depends on its color index. Each index entry is a 16-bit field containing two values.

Bits 14-15Bits 0-13
ModePalette Index (increments of 4 bytes)

Bits 0-13 contain an offset into the palette in increments of 4 bytes, i.e. a value of 1 is an offset of 4 bytes, a value of 2 is an offset of 8 bytes, etc. Since each color is a 16 bit word, this gives you a granularity of 2 colors. Bits 14-15 contain the mode, which specifies how the pixel values map to colors in the palette. These modes are as follows:

  • Mode 0: Three colors and transparent
  • Mode 1: Two colors, a 50/50 blend of those colors, and transparent
  • Mode 2: Four colors
  • Mode 3: Two colors, and 3/8 and 5/8 blends of those colors

This means a pixel valued 0-4 can have any of these meanings (quoting GBATEK):

PixelMode 0Mode 1Mode 2Mode 3
0Color 0Color 0Color 0Color 0
1Color 1Color 1Color 1Color 1
2Color 2(Color 0 + Color 1)/2Color 2(Color0 * 5 + Color1 * 3) / 8
3TransparentTransparentColor 3(Color0 * 3 + Color1 * 5) / 8

The index must be stored in texture slot 1. The first 16-bit word affects the first 4x4 block in texture slot 0, continuing on for 64 KiB. At offset 0x10000 from slot 1, this word affects the first 4x4 block in texture slot 2, and so on until the end of slot 1. There is no flexibility at all in storing this index.

Slot 1 OffsetMemory Contents
00000 - 0FFFFIndex for texture in slot 0
10000 - 1FFFFIndex for texture in slot 2

The Palette

The palette is stored no differently than a palette for an 8-bit paletted texture. Each color is packed in 5 bit/channel framebuffer format.

Bit 15Bits 10-14Bits 5-9Bits 0-4

Because each block can only address two to four consecutive colors, there needs to be a lot of redundancy to get decent results. Luckily, you can use a palette up to 64 KiB in size. Picking this palette is the key to writing a good texture compressor.


Nintendo gives you a lot of room for cleverness in writing a texture compressor. If each block had its own palette, compression would be relatively straightforward. Since VRAM is too small for that, you will have to find blocks to share palettes, overlap palettes, reduce them to 2 colors, and find even more tricks.

Link )Reply )

(Deleted comment)
[User Picture]From: kvance
2009-03-24 04:08 pm (UTC)
And I love apropos userpics!
(Reply) (Parent) (Thread)
From: entropy512
2009-03-25 01:48 am (UTC)

While I have not done any DS homebrew, I need to reread this when I'm a bit more sober (yay for consuming wheat beer supplies in preparation for a parental visit).

FYI, I have a bit of background (both theoretical and practical) in information theory and compression techniques so might be able to provide some input when the wheat beer is gone. :)

Either way, good job on a wicked cool achievement. :)
(Reply) (Thread)
[User Picture]From: kvance
2009-03-25 09:02 pm (UTC)
That would be great. I'd appreciate your input.

The quick version of my compressor is like this:
  1. Convert and dither the source image to 5 bits/channel.
  2. For each 4x4 block in the image:
    1. Create a block with the best possible palette in modes 1, 2, and 3.
    2. Select the block with the least error (usually mode 2).
  3. Split the palettes into groups made of 2 colors, and those made of 4.
  4. Determine the number of 2-color and 4-color palettes that will fit in the requested amount of space.
  5. Treating each pixmap block as a palette vector (e.g. [R1 G1 B1 R2 G2 B2] for a 2-color palette), use k-means clustering to create the requested number of new 2-color and 4-color palettes.
  6. Link each block to its new palette, as decided by k-means.
That produces some okay results. I've tried adding a second pass where I redo every block with too high an error by brute force quantizing it to every palette until the error is acceptable. But this approach isn't working well: trying all 64k palettes takes too long even running on 4 cores. Maybe I could make this work with a decent data structure for searching for the nearest palettes.

I'm considering going back to step 2 and accepting any 2-color palette within some error even when there is a better 4-color palette. More 2-color palettes means I'll have a smaller total palette size, which means I'll be able to delete fewer palettes at the end.

Edited at 2009-03-25 09:02 pm (UTC)
(Reply) (Parent) (Thread)