# Chess Move Compression

**Executive Summary**

A routine move representation of 16 or 32 bits makes sense for normal chess computation. For archive or transfer compression makes sense. It is possible to compress chess moves to much less than 8 bits (on average), although information theory dictates that sometimes 8 bits are required. In this article/blog post I describe a kind of hybrid system that does not seek to compress aggressively, instead using 8 bits for every move. The benefits of this system are; a useful amount of compression, extreme convenience (what could be simpler than representing games as strings, once move per 8 bit ‘character’?), and (very importantly) negligible consumption of CPU cycles compared to more aggressive schemes. This representation method has worked out well for implementing the new database features of the Tarrasch V3 program.

Please note that I pursue my hobby of chess programming more or less in isolation. I don’t use (or even look at) other people’s chess code. I haven’t read any academic research. I play around and come to my own conclusions. I don’t pretend this is a strength, it’s clearly a weakness. But for now at least it suits what I am trying to achieve – basically personal satisfaction. So my conclusions are (far from) peer reviewed best practice. It’s just a random dude’s thoughts on the internet, buyer beware.

I absolutely don’t expect any of what follows to be original, I expect it to be essentially a rediscovery and/or restatement of existing ideas.

**One Move in 40 Bits**

Unsurprisingly normal chess move notation is not particularly space efficient. Two common conventions are SAN or standard algebraic notation and a more terse form used to communicate with UCI (Universal Chess Interface) engines. A single move can take up to eight characters in SAN. for example e7xf8=Q+. UCI uses either four or five characters, five characters are required only for promotions, our example in UCI move format is e7f8q.

Already we can see an issue that will affect our quest for efficient representation. How much information are we trying to store? In the SAN example, we can tell that the move is a capture, a promotion and gives check. In the UCI example we can see promotion directly and infer capture (since only pawns promote and a pawn can only go from e7 to f8 by capturing). In our quest we are going for lean and mean – we seek to store only the information required to uniquely identify any legal move in any legal position. Very importantly (more about this later) we seek a scheme that will always produce exactly the same encoding for a given move in a given position, irrespective of how that position arose.

**One Move in 12 Bits**

My first naive dabbling led me to the conclusion that to store a chess move requires somewhat more than 12 bits (so somewhat more than 1.5 bytes, for our purposes a byte is 8 bits, and is also essentially equivalent to a character in the examples above). A chessboard of course has 64 squares and since 2^6 = 64, six bits neatly accommodate a single square. So a move requires two squares (source and destination) or twelve bits. Unfortunately just as UCI sometimes requires five characters, 12 bits seemed to me to be sufficient usually, but insufficient universally because of the need to accommodate underpromotion. Four possible promotions are always possible, so since 2^2 = 4, 2 more bits are required, for a total of 14. Right? Well no, even I quickly realised it was possible to avoid a 2 bit promotion tax. Imagine a pawn promoting from e7. It can go only to d8, e8 or f8. That’s only three possibilities, yet we are allocating 6 whole bits to encode this destination. Wasteful! Our first piece of cleverness today comes into play. We will adjust the destination square to indicate underpromotion, so e7-f8 indicates a rightward capture and Queen promotion, e7-f7 indicates a Rook, e7-f6 indicates a Bishop, e7-f5 indicates a Knight (say). Of course we apply the same idea to promoting leftward captures and non capturing promotions.

**One Move in 5.5 Bits**

Now if I had been paying better attention, it should have occurred to me that this was really too easy , it’s hardly a case of squeezing blood from a stone. This is contra-indicative of an optimum strategy. We can use information theory to probe the limits of compression, and to inform us whether we are really getting close to an optimum solution.

Information theory tells us that the number of bits needed to encode one of N equally likely possibilities is log2(N). We have already seen log2(64) = 6 bits and log2(4) = 2 bits in the discussion above. As an aside, all these even powers of two, 2 colours, 4 possible promotions, 8 pawns per side, 8 pieces per side, 64 squares make the chess programmer’s life easier than it would otherwise be. So the key question becomes, how many moves are possible? That will dictate the theoretical minimum number of bits needed to encode one of the possible moves.

For the sake of argument, let’s assume there are 32 legal moves available in a position. Log2(32)=5 or equivalently 2^5 = 32. So 5 bits can theoretically encode a move. How? All that is required is a list of the legal moves. This isn’t a problem since generating a list of legal moves is a fundamental building block of computer chess. The order of the moves in the list is all important, so we need a convention to ensure the move ordering is deterministic, perhaps we always sort the list according to some agreed criteria. Then we simply find the move we wish to encode in the list – if it’s a legal move it will be in there somewhere. The offset (or index, or position) of the move in the list is a 5 bit number (or code) in the range 0-31 and uniquely identifies our move. Voila.

Of course there aren’t always 32 legal moves available in any position, but a straightforward generalisation of this procedure already gets us to a reasonably practical and close to optimal scheme. The generalisation is to vary the number of bits per move according to the length of the list of legal moves in the position. The number of bits used in any position is obtained from our old friend, the formula log2(N) where N is the length of the list of legal moves. In practice most moves will need 5 or 6 bits, 6 bits is required if there are more than 2^5=32 possibilities, but that is almost always enough. Practical positions with more than 2^6 = 64 legal moves are rare. In “The Joys of Chess” by Christian Hesse the author addresses the upper limits in the chapter “How Many Moves?” and concludes that the current record holders are 75 for a real game (Podhorzer-Palda Vienna 1933, White’s 25th move) and 218 for a constructed positition (Petrovic 1964, FEN notation “R6R/3Q4/1Q4Q1/4Q3/2Q4Q/Q4Q2/pp1Q4/kBNN1KB1 w – – 0 1” note that WordPress is converting normal hyphens into something else in this FEN string – sorry about that – retype the two hyphens to get Tarrasch or some other chess program to read the string). Naturally, in the Petrovic position White has promoted all 8 pawns to Queens, so now there are 9 Queens all strategically arranged a knight’s move apart in order to minimise interference. The most important thing we can take from this is that 8 bits will absolutely, positively, be sufficient in any legal position. Even a casual inspection of the Petrovic position reveals it must be reasonably close to the upper limit. The empirical evidence is that even in such an esoteric field as this, no record would stand for fifty years if it could be absolutely decimated, and it would require absolute decimation to get from 218 to beyond 2^8 = 256.

Although we are ostensibly discussing compressing individual moves, a variable bit scheme like this is probably best thought about in the context of a whole game. A game as a series of moves is represented as a stream of bits. Each move is represented as somewhere between 0 and 8 bits according to the number of possible moves in the position. Most moves would need 5 or 6 bits, it would be quite common to use less (for example endings with much reduced material – and replies to checks) and very rare to use more. An amusing point is that some moves really would require zero bits – this happens when there is only one legal move in the position, there’s no need to store anything at all in that case. Our fundamental formula expresses this concept in the language of mathematics, log2(1)=0 [or equivalently 2^0=1].

**One Move in 4 Bits**

It would be a mistake to think no further compression is possible. For a start, there are usually unused codes in the scheme, since usually the number of legal moves in a position is not normally an exact power of two. Consider the initial position, there are twenty possible moves, so 5 bits are required, and consequently 12 unused codes (2^5-20=12). These codes could represent common opening sequences. Later in the game such gap codes could be used to represent common multi-move sequences when they are uniquely identifiable (eg O-O then O-O, PxP then PxP, RxR then RxR, h6 then BxNf6 then recapture [or the same pattern starting with a6,h3,a3]).

We can get more ambitious. In our magic log2(N) formula there are N equally likely possibilities. But all chess moves are not created equal, they are not equally likely. This is an example of the principle that non random information is compressible. Consider the order of the moves in the legal moves list. Instead of using some simplistic sort technique to establish a deterministic order, imagine instead using a chess engine to rank the moves in order of strength. Now vary our basic scheme so that moves early in the list use less bits than moves later in the list. Designing the details of such a scheme can be a lot of fun. One idea that suggests itself is a 3 bit code sometimes followed by extended bits. 3 bits allows 2^3=8 codes. 6 of these codes represent the first 6 moves in the list. The other 2 codes indicate one of two possible formats for extended bits. One format is 2 extended bits coding for 2^2=4 further moves from the list (moves 7 through 10). The other format is a fallback scheme, a variable number of extended bits, as many as required to represent moves 11 and above in the list. In this scheme well played games are more compressed than poorly played games. At a guess, a reasonable game would comprise perhaps 70% 3 bit moves, 20% 3+2=5 bit moves, 8% 3+5=8 bit moves bits, 2% 3+6=9 bit moves. This works out at an average of 3.9 bits per move. So we have improved on a naive 12 bit scheme by at a factor of about three.

In all of this we can see one of the basic tenets of all computation playing out- there is no such thing as a free lunch. In this case space costs us time (it’s a bit like chess!). What I mean by this is that reducing the amount of space (memory) required costs a lot of time (CPU cycles). Move list generation takes significant CPU time. Ranking moves by strength (to get more compression) takes (comparatively) vast amounts of time.

**One Move in 2 Bits, Maybe?**

A couple of weeks after I wrote this post it got some attention on Hacker News and Reddit. That experience convinced me to add the introductory paragraph that now starts this post, which hopefully helps make this work as an article not just a blog post. I also learned some new things from the commentary there and I’ve added this paragraph to reflect that. Specific thanks are due to Reddit commentator Dag Ågren. It turns out that there’s a way to turbo charge the chess engine based compression scheme. Consider the 3.9 bits per move scheme above. It works best when there are 6 or less good moves available, reasonably when there are 10 or less, poorly when there are more than 10. It’s a compromise scheme, it attempts to be at least adequate in every position. But what if we could tailor a scheme for each position we encounter? We could potentially do a lot better in many, maybe most, positions. Especially positions in which there aren’t many good moves. For example, if there is only one good move in a position, we’d design a scheme that used only one bit if that move was played. If there were two good moves, we’d use a scheme that used only two bits if either of those moves were played. And so on. Surprisingly (to me anyway) it turns out it is possible to do this in a practical way. The trick is to dynamically select a scheme well suited to each position in the sequence of positions that constitute a game. There is no need waste extra bits indicating which scheme is being used, because the decoder is synchronised to the encoder. It follows along through the same sequence of positions. As long as it uses exactly the same algorithm as the encoder to select the scheme best suited to any position, it will always pick the right scheme for decoding. It turns out that a technique called “arithmetic coding” can automatically generate optimal schemes given a list of symbols (moves in our context) and symbol probabilities (which can reasonably be inferred from the engine’s move score). The title of this section is just speculation on my part, if there are an average of 4 good moves in a position, and if a strong player (almost) always plays a good move, then 2 bits per move is theoretically the best we could do, for high quality games. Actually it has just occurred to me that when playing through a master game with kibitzing in Tarrasch gives a feel for this, as the “best” four lines are shown. The percentage of moves played that come from the engine’s top four list often does feel very high. But remember the downside, even if we can get reasonable output from a chess engine in 1 millisecond (a stretch), a big database has of the order of 1 billion moves, so compressing or uncompressing a complete database is going to take of the order of a million seconds (more than a week).

**The Sweet Spot – One Move in 8 Bits**

I’ve given hints in the language I’ve used that I actually haven’t implemented any of the complicated schemes above for Tarrasch 3. Basically they are far too slow for my purposes. The scheme I adopted instead trades off compression for time. It turns out it is possible to compress moves into 8 bits with negligible CPU cycles. For the pragmatic programmer this represents great value. A fixed one move per byte system is very convenient for the programmer, much more convenient than 12 bits. In particular, 8 bit moves allow move sequences to be represented as strings, the length of the string is the number of moves, and simple indexing into the string locates individual moves in a natural way. Although further compression with a variable bit scheme has some attractions, it would not only cost a lot of time, it would also be less simple and convenient to use.

Our 8 bit scheme works as follows. We split each 8 bit code into two 4 bit fields. Each field has 2^4=16 codes. There are a maximum of 16 pieces (let’s use the word ‘piece’ as shorthand for ‘pawn or piece’) on each side of the board, so one of our two fields (let’s call it the piece code) selects one of the 16 pieces in the army of the side to move. The other field (let’s call that one the vector code) selects one of 16 possible moves for the selected piece. Too easy! As they say in Australia.

Of course if it were really that easy I probably wouldn’t be spending hours writing this up. There are some difficulties still to overcome. We’ll deal with the most obvious one first. A queen can make up to 27 moves, it’s the one piece on the board that can exceed the 16 possibilities our vector code allows. We solve this problem by conceptually splitting the Queen into two parts, a bishop and a rook. The bishop and rook each get their own piece code. This seems to need a 17^{th} piece code we don’t have, but we balance splitting a queen into two by combining both knights into one. The 16 vector codes of this dual knight are sufficient for both knights since they can only make up to 8 moves each.

Another fundamental problem we need to overcome is extra pieces appearing on the board due to promotion. Extra queens are particularly problematic since they need two piece codes, so it’s not possible to simply recycle the code formerly used by the pawn that promoted. Our strategy is to have a fall back mode we can call upon in some (quite rare in practice) circumstances. The fall back mode is the slow system we are already familiar with; we generate a list of all legal moves, sort them, and then index into the list to select our move. There’s no need to mess about with variable bit fields, we’ll just use 8 bits, which we know is always sufficient, every time. More about this later.

Our 16 available piece codes comprise eight pawn codes, one king code, one dual knight code, three bishop codes and three rook codes. Remember that we are treating the queen as a bishop plus a rook. A vector coding scheme is required for each type of piece code. A king makes the least demands on the 16 codes available, we simply define 8 direction vectors, N,S,E,W,NE,NW,SE,SW. Castling can be accommodated by using two more codes, or we can be a little tricky and use the fact that a king cannot move in all 8 directions on its initial square. So we can represent White O-O (for example) as the White King moving South East from e1.

For the dual knight, we use 3 of the 4 available vector bits to define 8 direction vectors NNE, NNW, NWW, NEE, SSE, SSW, SEE, SWW corresponding to the eight possible knight moves. The other bit selects one of the two knights.

Rook and bishop vector codes also split the 4 available bits into 1 and 3. The rook code uses the 1 bit field to indicate whether the move is changing the rank or the file, the 3 bit field then indicates one of the 8 ranks or files. A bishop code uses the one bit field to indicate whether the move is along a diagonal that rises from left to right, or falls from left to right. The 3 bit field indicates a destination file.

Due to the need to accommodate the possibility of underpromotion, the humble pawn definitely needs all 4 vector code bits. The most obvious system is to use 2 bits to indicate one of the 4 possible types of pawn move (advance, double advance, left capture, right capture) and the other 2 bits to indicate one of the four possible promotions, if it is a promoting move. But we can exploit the fact that a double advance never promotes to improve on this scheme; We use 2 bits to select one of 4 possible types of pawn move – non promoting, promoting advance, promoting left capture or promoting right capture. If the move is one of the three promoting types, then the other 2 bits select one of the four possible promotions. If the move is non-promoting, then the other 2 bits indicate the type of move (advance, double advance, left capture, right capture). This is a better scheme because it is more efficient, it encodes extra information (is the move a promotion or not?) in the same space.

To calculate the 8 bit move code for any legal move in any legal position we first scan through all pieces of the side to move. Opposing pieces are essentially irrelevant. Up to four pieces are assigned fixed piece codes; the king and (if present), queen, light squared bishop and dark squared bishop. Pawns, rooks and knights are assigned dynamic piece codes according to their square location using an arbitrary square ordering scheme. So up to two knights are identified as knight 0 and knight 1, they share the dual knight piece code. Up to two rooks are identified as rook 0 and rook 1 which are individual piece codes. Up to eight pawns are identified as pawn 0, pawn 1… pawn 7, which are individual piece codes. If there is more than one queen, or more than one light squared bishop, or more than one dark squared bishop, or more than two knights or more than two rooks, then (and only then) we use the slow fall back scheme to calculate the move code. Otherwise the 8 bit move code is a combination of the 4 bit piece code of the piece to move with the appropriate 4 bit vector code.

At first glance this might seem to be unnecessarily computationally expensive. Why scan over the whole board every time? Why not assign fixed piece codes in the starting position and then track them? In fact that’s exactly how I reasoned when I first implemented compressed moves and for a long time I did use fixed codes. It was a huge mistake. Huge! The dynamic coding scheme buys us a very important property. Once we have settled on all the various arbitrary conventions we need, our algorithm always produces the same move code for a given move in a given position. Unfortunately using fixed codes means the history leading to a position influences the move code generated.

Consider the following opening transposition; A) 1.e4 e6 2.c4 d5 3.cxd5 exd5 4. exd5 c6 5.dxc6. B) 1.e4 e6 2.c4 d5 3.exd5 exd5 4. cxd5 c6 5.dxc6. The move 5.dxc6 is the same move in the same position in each case, but because the capturing pawn started its life as an ‘e’ pawn in line A) and as a ‘c’ pawn in line B), a fixed scheme will generate two different 8 bit move codes for the move. This has the effect of potentially messing up basically any chess database feature at least occasionally. Pawns and Knights quite often mischievously swap places like this, rooks less so, but I am sure there are some real life instances. At least theoretically it is possible for a promoted piece to feature in a position normally reached without promotion. I agonised over these things and killed off huge numbers of brain cells trying to think up elaborate “normalisation” tricks to try to repair an inherently faulty underlying scheme. Futility exemplified.

One day I (finally) returned to thinking about using dynamic coding instead. All those pawns that constantly needed to be reordered, surely it was impractically slow. I think all programmers are familiar with the delicious sensation of a momentary flash of insight suddenly dissolving a problem. It turns out that we can use dynamic codes with a negligible amount of extra computation. The way to do it is to track the position of each piece and only change the dynamic codes as required. Of course that much is obvious, and with rooks and knights it is no big burden. Just keep track of both knights and as long as there are two knights on the board, check whether they need to swap places after each knight move. Make adjustments as required if a knight is captured or created by promotion. All of this applies in exactly the same way for rooks.

It’s the pawns that are a problem and the solution starts by using the right type of square ordering convention. My normal convention is; a8=0, b8=1, c8=2 … h8=7,a7=8 … h1=63. It’s normally a good convention but for this particular task it’s horrible. Instead we need a convention where we increment along files not ranks. For example, a1=0,a2=1,a3=2…a8=7,b1=8…h8=63. Now any pawn advance doesn’t affect pawn ordering at all. This is a direct consequence of the laws of chess, a pawn cannot leap over another pawn on the same file. Capturing pawn moves can require significant reordering, but a simple algorithm handles the task, and in the vast majority of cases it’s computationally cheap.

To understand the simple pawn reordering algorithm, consider that although a capturing pawn may need to be reordered relative to its neighbours, no neighbour needs to be reordered relative to any other neighbour. For a right capture, check if our pawn needs to swap places with its adjacent neighbour to the right. If it does, swap the pawns and repeat with the next neighbour to the right. The first time no swap is needed, we know the pawns have been reordered successfully.

An example might help. White has three pawns on a2 (pawn 0),b2 (pawn 1) and c2 (pawn 2). After right capture a2xb3, we initially have b3 (pawn 0), b2 (pawn 1) and c2 (pawn 2). Does our relocated pawn on b3 need to swap with b2, its adjacent neighbour to the right? Yes. So b2 (pawn 0) b3 (pawn 1) c2 (pawn 2). Does the b3 pawn need to swap with c2, its adjacent neighbour to the right? No. We are done with one simple swap. Zero or one swaps are by far the most common cases.

Let’s do another example with three swaps, the most I observed when compressing a database of 4.5 million games;

**b5** (pawn 0) c2 (pawn 1) c3 (pawn 2) c5 (pawn 3) f5 (pawn 4), White plays b5xc6

**c6** (pawn 0) c2 (pawn 1) c3 (pawn 2) c5 (pawn 3) f5 (pawn 4), Swap c6 and c2? Yes

c2 (pawn 0) **c6** (pawn 1) c3 (pawn 2) c5 (pawn 3) f5 (pawn 4), Swap c6 and c3? Yes

c2 (pawn 0) c3 (pawn 1) **c6** (pawn 2) c5 (pawn 3) f5 (pawn 4), Swap c6 and c5? Yes

c2 (pawn 0) c3 (pawn 1) c5 (pawn 2) **c6** (pawn 3) f5 (pawn 4), Swap c6 and f5? No

Left captures work the same way (of course). An example of a left capture;

a4 (pawn 0) b4 (pawn 1) **c2** (pawn 2), White plays c2xb3

a4 (pawn 0) b4 (pawn 1) **b3** (pawn 2), Swap b3 and b4? Yes

a4 (pawn 0) **b3** (pawn 1) b4 (pawn 2), Swap b3 and a4? No

The conceptual breakthrough that allows computationally inexpensive dynamic piece coding prompted me to completely rework my move compression algorithm. I introduced the simplifying concept that only the side with the move needs to be considered when calculating the compressed move code. Previously I would switch into fallback mode globally (i.e. for both sides) immediately an inconvenient promotion occurred. But more often than not an inconvenient new promoted piece (typically a second queen) is immediately captured before the promoting side gets to move again. So my improved approach is not only much simpler, it often avoids a completely unnecessary switch to fallback mode.

Statistically speaking, it turns out about 0.015% of compressed moves are calculated with fallback mode, and about one in 250 games requires some such moves. A reasonably simple change would improve these statistics by perhaps a factor of 100 or so (I will come back and edit the blog when I find out more precisely [see edit below]). I am sorely tempted to implement the change, which would basically confine fallback mode to freakish games. The idea is to allow a second queen, in much the same way as two knights and two rooks are allowed. When two queens (queen 0 and queen 1) are on the board, the piece codes for pawn 6 and pawn 7 are recycled and used for queen 1. Two queens would not be compatible with seven pawns, that really would necessitate fallback mode. In other words, the side with two queens would need to have lost at least one pawn to capture in addition to the pawn ‘lost’ in the act of promotion.

Successful program design is all about effectively managing trade-offs. I am not a hundred percent convinced that the extra ugly complexity is justified in this case. Time will tell. In the meantime you can check my current implementation at;

https://github.com/billforsternz/tarrasch-chess-gui/blob/development/src/t3/CompressMoves.h

and

https://github.com/billforsternz/tarrasch-chess-gui/blob/development/src/t3/CompressMoves.cpp

Edit: I decided to go ahead and implement the “second queen” enhancement I envisaged above. It felt like the right thing to do. Before making the change scanning a 4.5M game dataset produced 17971 games where fallback mode was required due to two (or more) queens, and 99 games where fallback mode was required for other reasons. Implementing the “second queen” option reduces the 17971 games to 211, an improvement by a factor of 94. The number of moves requiring fallback mode improved by a factor of 40, from 0.015% to 0.00037%. Interestingly, the change improved the performance fairly dramatically as well, the time to compress then uncompress 350M moves went from 257 seconds to 110 seconds. So I can compress and decompress 3 million moves per second on my commodity laptop. At this point I have made absolutely no effort to optimise the code itself, (as opposed to optimising the underlying algorithm). The code is in the first phase of the “first get it working, then get it fast” mantra, and is still full of diagnostic instrumentation. I think with some effort I could improve performance by a factor of 10 or so, which I’ll need every bit of if I ever get to implement my idea of abandoning database index position searches in favour of scanning the moves of all games [Edit 7Dec2016: By the time Tarrasch V3 was released in November 2016 I had optimised the code and gone ahead with replacing the index position searches with scanning the moves of all games, it turned out to be a much more practical approach as I describe in a follow up blog post .

One final point of detail, for the record. I actually made a second performance tweak as I implemented “second queen”. Fallback mode requires a move list sort, and I have previously used the (extravagantly slow with my chess API) approach of converting each move to SAN text equivalent (eg “exf8=Q+”) and then sorting the strings. Writing this post jogged the realisation that it would be hugely faster and just as documentable (to coin an adjective) to use UCI text instead (eg “e7f8q”). This was an almost trivial change to make. I felt the need to measure the impact of each tweak, and as often happens when you measure performance (as opposed to just speculating and assuming) something surprising emerged. Implementing **either tweak alone** gets the full 257 -> 110 second improvement! I find that sufficiently surprising that I’d like to repeat the whole experiment when I get a chance, but assuming I didn’t make a dumb experimental method error, I can rationalise this result as follows;

The super slow SAN conversions caused fallback mode to be a performance issue, even at a rate of 0.015%. Improving the rate to 0.00037% or using fast UCI conversions is each sufficient to effectively eliminate fallback mode as a performance issue.

You can store a move in 1 byte by enumerating the legal moves in the position and storing the move index. 7 bits are not enough for this scheme, but 8 bits are.

I cover this in the article, including a comprehensive description of exactly why 8 bits are always enough. The problem with the indexing scheme is that it’s slow. Very slow. No matter how quick your move generator is, it is going to be at least an order of magnitude slower than the piece plus vector scheme I describe. The indexing scheme does have one important advantage over the piece plus vector scheme though; it can be applied 100% of the time, whereas the piece plus vector scheme can only be applied 99.9+% of the time. That is why I advocate a hybrid system, using the piece plus vector scheme normally and the indexing scheme in the edge cases where piece plus vector breaks down. This gives you the best of both worlds. This is all explained in the article.

One subtle point I don’t explain thoroughly is why it is not a good idea to simply index into the enumerated legal move list without sorting it first (unfortunately of course, the sort slows down the scheme even more). If you don’t sort the most serious problem is that in practice you will be forced to freeze your move generator, you cannot make any improvements that could conceivably change the order that moves are generated. A second problem also concerns future-proofing. Without a sorting stage it becomes a practical impossibility to document and standardise the scheme in a way that allows interoperability with future systems.

Just brilliant, thank you for sharing!

Fascinating, thanks for the write up!

I like your idea, and I see a way to push it forward.

First, let’s ignore the pieces and fully concentrate on moves. There are:

– Rook move (requires full 4 bit description)

– Bishop move (4 bit description)

– Knight move (3 bit)

– King move (3 bit)

– Pawn move (2 bit)

– Promoting pawn move (4 bit)

I think you already see where I’m going, but let me continue anyway.

Let’s see how many piece codes do we need to describe which piece does which move.

There are often 3 pieces that can do rook or bishop move and 1 pawn to do promoting move.

That’s 7 for 4-bit moves. In practice 9 is a stretch.

There are often 3 pieces to do King or Knight move. 2 codes are enough to describe them with compression.

All pawns together need just 2 codes.

Even if you don’t use mix-King-with-Knights, separating compressed pawns from promoting pawns will let you go under 8 bits in almost all cases.

Sorry I don’t understand. (I’m not saying you’re wrong, just that I don’t understand) For example;

> There are often 3 pieces that can do rook or bishop move and 1 pawn to do promoting move.

> That’s 7 for 4-bit moves. In practice 9 is a stretch.

a) Where does 7 come from? (it looked to me as if you were going 3+1=4 not 7), 9 is a stretch for what?

> All pawns together need just 2 codes.

b) In what way? There are 8 pawns after all.

a) 3 (bishop) + 3 (rook) + 1 (pawn promotion) = 7

b) You are already combining the 2 Knights. I say you can combine 4 pawns.

Let’s have an example. Let’s take the starting setup and use the same ordering I used before

There are 3 pieces that could do the Rook move.

There are 3 pieces that could do the Bishop move.

There are 2 pieces that could do the Knight move. (can be combined into 1 code)

There is 1 piece that could do the Kings move.

There are 8 pieces that could do the Pawn move. (combine into 2)

There are 0 pieces that could do the Promoting Pawn move. (this one amount can change dynamically for the same number of pieces on the board)

The highest valid code is 10 (out of 16)

I’ll be using hexadecimal numbers. Lets decode some of the notations:

7-1: Code 7 means the knights. Description 1 means 1st knight to NNE

7-2: Code 7 means the knights. Description 2 means 1st knight to NNW

7-9: Code 7 means the knights. Description 9 means 2nd knight to NNE

9-1: Code 9 means the pawns(1-4). Description 1 means 1st pawn single advance

9-5: Code 9 means the pawns(1-4). Description 5 means 2nd pawn single advance

10-9: Code 10 means the pawns(5-8). Description 9 means 3rd pawn single advance of the pawns 5-8, so actually 7th pawn single advance.

Now when I think about it, you could go even further and separate advancing pawns from capturing pawns. Then all advancing pawns could use just one code, and all capturing pawns would take another code. In this case the pawn codes above would mean:

9-1: Code 9 means the pawns advance. Description 1 means 1st pawn single advance

9-5: Code 9 means the pawns advance. Description 5 means 3rd pawn single advance

9-6: Code 9 means the pawns advance. Description 6 means 3rd pawn double advance

10-1: Code 10 means the pawns capture. Description 1 means 1st pawn capture to the left (invalid move in the beginning)

Thanks for answering my questions, now I understand. Very clever, I do like your idea. The refinement whereby you have one code to represent all advancing pawns and another code to represent all capturing pawns is particularly elegant. You have six spare codes available. You could use them to represent up to 6 seventh rank pawns (or third rooks, bishops or knights/half knights). Or they could represent up to 3 extra queens. Or some combination. Handling the transitions is more complicated than with my system, because it’s a kind of three speed system rather than a two speed system. But if you really need to avoid fallback mode in all but the most extreme cases it’s a good system because it exploits the fact that pawns don’t have many moves available (usually) so well.

Thanks for posting this article; it was a fun read. 🙂 I agree that your sweet spot scheme strikes a great balance between storage efficiency, runtime performance, and ease of implementation and understanding. Well done!

I haven’t thought about this nearly as much as you, and I’d bet that there is some hole in my thinking here and I’ll probably facepalm after you point it out. But, quickly, couldn’t you handle most tricky promotion cases while avoiding an expensive fallback mode with something along the following lines:

We only need to do something tricky if a player gets to make a move when they have more than one queen, right? (At least this is what I inferred from the article; I’ve rarely ever played chess and don’t know how promotion rules work.) When a player is in that state, have the encoding/decoding system determine (either through scanning the board or by explicitly tracking for this state; whichever form of complexity you prefer) whether two things are true:

1) Can the player’s non-queen pieces be encoded in 3 bits? (Practically speaking, this means determine whether the player has 8 or fewer non-queen pieces on the board, or 9 or fewer non-queen pieces in the case where both of their knights are still alive.)

2) Does the player have 4 or fewer queens alive on the board?

If both conditions are true, then encode/decode the 8-bit code as follows: Use 1 bit to indicate whether the move is a queen move or not. If it is a queen move, the next two bits indicate which queen was moved (works for up to 4 queens for a player) and the next 5 bits determine which of the 27 possible moves for that queen is performed. If it is not a queen move, the next 3 bits determine which other piece was moved and the move vector code is the same as usual for the type of piece in question.

In this way, you avoid an expensive fallback mode for what seems like the most likely cases of promotion. First, you only need to worry about it when a player actually gets to play with their multiple queens (I think you cover this in the article already). Second, it seems likely that the player would’ve lost 5+ pieces by the time they are getting a pawn promoted to a queen and getting to make a move after that, meaning the player’s remaining pieces can be represented by 3 bits. (I’d be curious to know empirically if that’s the case, though. I’m merely guesstimating.) Third, it seems highly unlikely that a player would have more than 4 queens at one time during their turn. So, it would seem unlikely that the system would need to fallback to a more expensive encoding/decoding scheme by leveraging this interstitial scheme, right?

Let me know if I’m off my rocker. I’m sure there’s a smarter/more efficient way to do this too, but just wanted to toss the general concept out there, in case this is useful. Seems we’re definitely optimizing on the margin here, but sometimes in terms of user experience that really matters. If you’re implementing a game/move database search or some such, you want to avoid outlier situations as much as possible for the sake of consistency in user experience, avoiding problematic system load situations, etc.

PS- If it’s untrue that multiple-queens is the only promotion state to worry about, I would think (but haven’t thought through it thoroughly) that a similar scheme could be used to handle other promotion cases too. So long as the encoder/decoder can accurately track or determine what pieces were promoted and identify or track them deterministically, I suppose the first bit could mean we’re in a moving-with-promotion case, and the next 2 bits (or 3 bits, if no promoted pieces are queens) could be used to determine which of the up-to-4 (or up-to-8, or more for knights, if you can promote to knights?) promoted pieces is being moved and the remaining 5 (or 4) bits for the move code would determine where that piece moved.

Thanks for a very clear and complete explanation of your idea. It would work well. The only problem is that you would still need to fall back to the move indexing scheme *sometimes*. For example consider the Petrovic position I reference in the article. I wish WordPress would let me put a diagram here, but basically it is a perfectly legal position where White has nine queens, two knights, two bishops, two rooks (and no pawns). No conceivable variation of my vector system or yours is going to cope with that! Of course you can argue that it’s not going to happen in a real game. I don’t think there are many examples of even three queens in a real game. But accommodating every legal move in every legal position is a lovely property to have. I am not willing to sacrifice it without some huge reward (that just isn’t there).

The system I have has only one relatively straightforward code path to handle *all* edge cases (including 3 or more knights, multiple bishops etc.). As part of my testing I have automatically collected all the games I’ve encountered that require the fallback mode. I have 17971 games where at some stage in the game a player to move had more than one queen and 99 games where the problem was too many knights, rooks or bishops. As I say in the article, I think that I can painlessly reduce the two queen problem by two orders of magnitude or so. The fact that I still haven’t got around to this reflects the fact that in a 4 million game database these edge cases are tiny numbers that don’t create any performance problems.

Makes good sense, and thanks for the kindly reply. 🙂 Glad to know I wasn’t being a complete idiot.

To be clear, I wasn’t purporting that this scheme would handle all edge cases. Rather, I was just saying it’d be relatively simple solution that works for what I would guess are the majority of the edge cases. It’d be like a pre-fallback mode, or first fallback.

Agree that it’d increase code complexity a little! Was only trying to help keep the nice 8-bit encoding as much as possible, and avoid needing to resort to an expensive fallback as much as possible.

What do you mean exactly when you say that you can reduce the two queen problem by two orders of magnitude or so? I’m not asking about the scheme/solution there, just trying to understand what you mean by that. Does it mean that rather than having to use the expensive fallback in ~18K cases (of your 4M game database), it’d be down to hundreds of cases?

Thanks again for posting this (and for your excellent comments/responses here and on hn and reddit).

Thanks for your kind words. If there is a complete idiot here it is me since I took far too long to find the right combination of ingredients for this particular recipe! To answer your question, yes I think maybe 18K cases will reduce to probably less than 100 cases. This is a guess based on prior experience. For months my system was tuned quite differently. I would recycle codes of captured pieces to aggressively avoid fallback mode if at all possible (does this sound familiar to you :- ). I found then that (unsurprisingly) it is very rare for a second queen to appear in a nearly complete army (nothing like 18K instances for sure). But despite this the code I have now is much simpler and performs much better. Previously if my recycling schemes failed and I was forced to switch to fallback mode, I did it immediately and for the rest of the game for both sides. The elaborate recycling somehow also made it tricky to exit fallback mode, so I never did. This was a second reason I never had a “any move in any position compresses to a unique code” property (because in the vast majority of positions we *could* theoretically be in fallback mode – by the way the first reason is the “static codes” issue covered at length in the article). The switch to fallback mode was so disruptive then that I contemplated even greater complications by trying to postpone the switch for a half move, in case the newly promoted piece was captured.

A massive improvement came when I realised I could implement fallback mode per side rather than globally. Once I did that if a newly promoted piece was captured immediately (which is very common) then there was no question of fallback mode – and this was inherent and “free” rather than requiring tricky half move delay complications. At the same time I realised that having a comparatively simple set of rules for entry to fallback mode made it easy to cleanly exit fallback mode as well – boosting performance (since now we are usually only in fallback mode for a short time) and more importantly not spoiling the “any move in any position compresses to a unique code” property I’d finally made feasible by adopting dynamic codes.

All in all, I basically had a system that worked reasonably well fairly early. But later on a couple of important conceptual breakthroughs made it simpler, faster, more robust, with better intrinsic properties. The dramatic improvements were very satisfying creatively which is why I decided it was worth documenting the whole thing with this article.

Can’t reply to your most recent comment for some reason (perhaps WordPress limits the nested reply depth?). So, just chiming in here to say thanks for the reply, and I get what you’re saying there now. Good stuff. Was fun to think through this so thanks again for writing up the article and the replies. Appreciate how you walked through your raw thoughts and process, as opposed to making it sound like you came down off a mountaintop with a great solution right away. That is a mark of someone who is smart and is confident about it.

Cool if you don’t approve this comment since it might appear out of context. Just wanted to reply to ya.

Take care!

I was actually pleased to have the opportunity to flesh out my reasoning. Putting all this stuff in the article itself would definitely have thrown it out of balance, it was already hard to go into any detail without becoming incomprehensibly boring. No doubt I actually lost that battle for a lot of readers :- ) I think I will actually be trying out the “two queens if six or less pawns” optimisation in the next few days. I’ll edit this comment if I do, so come back and check if you’re interested.

One final thing that might spark your interest in the great game of chess. You said early on that you don’t know how promotion rules work. It’s very simple, you choose freely from queen, rook, knight or bishop (you can’t leave the pawn unpromoted). People still talk of promoting as “queening”, obviously the queen is the normal choice. Choosing something else is known as underpromotion. Sometimes a knight really is the best choice, usually because it is vital to give check (to keep up momentum or win material with a knight “fork”). In rare cases, a rook or bishop is best – I think this is exclusively a consequence of the stalemate rule. Recently I discovered that even though underpromotion has always been part of the rules, it was basically unknown, certainly unappreciated for centuries. The catalyst for change seems to have been the “Saavedra” position, discovered in the 19th century by an otherwise weak player, who achieved immortality of sorts by creating possibly the most famous single idea in all of chess. Google Saavedra for details, it really does show how chess can be beautiful. (Spoiler: It involves a rook underpromotion). My dear wife indulged me by agreeing to feature the position on our wedding cake :- )

Haha that’s awesome about the wedding cake.

I came back to check if you’d replied, so thanks a lot for doing so, and again for the good discussion.

I hope to get into chess soon!

I’ll check back again for any updates. I’m curious if any of the further optimizations get implemented and pan out.

All the best!

Thanks. I have implemented some more optimisations, check the bottom of the main post for details. Good luck with your chess plans.

I love the ideas presented here and this blog post greatly helped me shape a move compression scheme for my chess game storage format. Notably I’m using 2 different schemes with slightly different compression quality.

I liked the ideas from “One Move in 5.5 Bits”, but was initially put off by the performance of move generation. I needed something much faster. Then I realized that there is some leeway in generating the move list. What’s more, the move list doesn’t have to be constructed explicitly. It doesn’t even have to be strictly about legal moves! My approach encodes the moves into 8 bits, so that no bitpacking is required. Occasionally one move requires 16 bits to encode, but that’s rare (requires more than 2 queens of one color on the board). Quite a bit worse than the original index based approach but oh man it’s fast. It ended up being very friendly for cpus with popcnt instruction and possibly even better for ones with bmi2, where finding the index of the nth set bit is just one instruction. I elaborated on this on chessprogrammingwiki https://www.chessprogramming.org/Encoding_Moves#Per_Piece_and_Square.

The second one has some ideas from the paragraph “The Sweet Spot – One Move in 8 Bits”, but in the end doesn’t try to fit stuff into one fixed length. I opted for variable length encoding here because I find it more natures and we know up front how many bits we need to encode any piece index and each move index for said piece. For the available moves of a pieces I used pseudo-legal destination squares, they are much faster to compute than legal destinations thanks to magic bitboards. Special moves like pawns and castling is also rather easy as you’ve discovered. Empirical tests show that it uses on average about 6 bits per move, which is pretty close to the legal-move-movelist approach (which was on average 5.11 bits per move in my tests).

Currently this stuff is only used internally by me but I want to make a proper tool for handling the files of this format once it matures.

Thank you for making this post. A lot of great insight!

Very interesting thanks. The basic idea of 27+2*14+2*13+2*8+8+8*12 = 201 < 256 and the scheme that falls out of that never occurred to me. It is a relation to my one byte scheme I suppose, maybe a second cousin 🙂 The whole area is a ton of fun, which comes through in your comment. Good luck with your project.