Field packing

So far we’ve focused on treating a multibit integral value as an array of individual bits, and performing Boolean operations on them. However, similar bit hacking can enable us to pack and unpack arbitrary fields in memory, in order to match the requirements of a file format or network protocol, or to trade off between memory usage and processing time. To illustrate what I mean, let us explore different choices about how we might represent playing cards in C.

Suits and ranks

A playing card has two attributes, its suit and its rank. The suits, following French tradition, are spades, hearts, clubs, and diamonds; there are four of them. There are thirteen ranks: ten numbered cards, one (‘ace’) through ten, and three ‘face’ cards, jack, queen, and king. There are fifty-two (4×13=52) cards in a deck.

A common first step in representing abstract values is to number them, since we have so many tools at our disposal for representing numbers already. So I will define a four-element array of strings with the names of the suits, and a thirteen-element array of strings with the names of the ranks, and the corresponding indices will be the numbers that represent each idea.

const char *suits[] = { "spades", "hearts", "clubs", "diamonds" };
const char *ranks[] =
    { "A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K" };

So, for example, the 5 of clubs would be rank 4 and suit 2; the ace of spaces is rank 0 and suit 0; the kind of diamonds is rank 12 and suit 3. These could be integral values in any type big enough to hold the necessary range. The suits range from 0 through 3, i.e. four non-negative values, so they can be represented in two bits. The ranks range over 13 non-negative values, and log2(13)3.7, so 3 bits would be inadequate (only 8 possible values) whereas 4 bits is slightly more than needed (16 possible values). We lack the representational power for 0.7 of a bit, so four bits for a rank will do. Note that a suit or a rank fits in less than a byte, so any of C’s built in integral types will suffice.

/* it's the five of clubs! */
uint8_t suit = 2;
uint8_t rank = 4;
printf("it's the %s of %s!\n", suits[suit], ranks[rank]);

Since the smallest type is a byte, and a card has two attributes, each card requires two bytes.

Packing the fields together

There are only 52 unique cards, and log2(52)5.7, so 16 bits (two bytes) seems wasteful for a single card since we know 6 bits will be more than adequate. How can we at least pack those two fields together into a single byte? One approach is to number the cards in the deck directly, 0 through 51, and determine functions that can ‘unpack’ the suit and rank numbers from a card number. Here is such a set of functions.

uint8_t pack_card(uint8_t suit, uint8_t rank) {
    return suit * 13 + rank;
}

uint8_t unpack_suit(uint8_t card) {
    return card / 13;
}

uint8_t unpack_rank(uint8_t card) {
    return card % 13;
}

With this scheme we can represent cards with one byte for each. So, a hand of five cards would be five bytes. However, thinking of each card drawn independently, five cards represent 525 different possibilities, and log2(525)28.5, so if we pack them aggressively enough, it should be possible to fit a five-card hand into less space than a 32-bit integer (saving a whole byte).

uint32_t pack_hand(uint8_t cards[5]) {
    uint32_t hand = 0;
    int i;
    for (i = 4; i >= 0; i--)
        hand = hand * 52 + cards[i];
    return hand;
}

uint8_t unpack_card(uint32_t hand, int index) {
    int i;
    for (i = 0; i < index; i++)
        hand /= 52;
    return hand % 52;
}

This is a fairly aggressive space–time tradeoff; saving a byte per card, or six bytes per hand compared to a naïve strategy could be a huge benefit if a huge number of them have to be stored, e.g. to explore a tree of possible moves for a game. However, accessing the components of those packed values requires extensive computation rather than being directly addressable in memory. The implementations I’ve shown could be optimized significantly, but even with significant loop unrolling and precomputation, there’s going to end up being a division for each access.

Bit packing

Arranging to use powers of two allows a middle ground in this tradeoff. The reason to multiply and divide by 13 when combining suits and ranks is to ensure that all 52 unique combinations are numbered, with no overlapping encodings. However, any number at least 13 would work, and 16 is a power of two; choosing 16 corresponds to rounding up log2(13) to 4 and accepting the cost of four bits for a rank. The benefit of choosing a power of two is that we can implement multiplication and division with the very cheap shifting operators, and replace modulo (equivalent to a division) with bit masking.

uint8_t pack_card(uint8_t suit, uint8_t rank) {
    return suit << 4 | rank;
}

uint8_t unpack_suit(uint8_t card) {
    return card >> 4;
}

uint8_t unpack_rank(uint8_t card) {
    return card & 0xf;
}

At 6 bits per card (4 for rank and 2 for suit), a hand of five cards would take 30 bits, which while pessimistic compared to less than 29, still fits in a 32-bit integer.

uint32_t pack_hand(uint8_t cards[5]) {
    uint32_t hand = 0;
    int i;
    for (i = 4; i >= 0; i--)
        hand = hand << 6 | cards[i];
    return hand;
}

uint8_t unpack_card(uint32_t hand, int index) {
    return hand >> (6 * index) & 0x3f;
}

(You might have also noticed that while 13 is not a power of two, the number of suits—four—is a power of two, so if I had chosen to pack the suit in the low bits and the rank in the higher bits, rather than the other order as I did, the factor of four would have been easy to optimize into bit operations.)

Conclusion

Bit-level field packing is often the ideal point in trading off storage space for encoding effort. The heavier-encoding strategy above that employed full-on division can approach the ideal of encoding a card in log2(13) bits, even though that isn’t a whole number, essentially by delaying the point when we have to round to a whole number of bits to store the result in memory. (Taken to extremes, this is essentially the idea behind ‘arithmetic coding’.) Only using memory-addressable fields for every single attribute of our data, on the other hand, can waste vast amounts of storage to avoid any encoding at all. Choosing a whole number of bits for each field, but packing the fields together using shifts and masks, uses operations the hardware can perform very cheaply to save what can sometimes amount to a large amount of space.

Of course, what is true of all optimizations is worth repeating here as well: Don’t try to guess what you will have to optimize. Write the clearest, easiest-to-get-right version of your code first, and then, only if you measure an actual performance problem, is it worth worrying about how to fix it.

Performance aside, packing bit fields is also an essential component of writing code to work at a low level with storage layouts like binary file formats or network protocols that aren’t always aligned to bytes.

You have attempted 1 of 1 activities on this page