I’ve been plugging away, a few hours each and every week, trying to make Tarrasch V3 a reality. I’d really like to see some light at the end of the tunnel. There are other things I want to work on, other things to explore. I’ve had a wake-up call this year, none of us are immortal, we need to make sure we find time to enjoy life while we can.
So. New goal. I want to make a useful Beta version of Tarrasch V3 available by the end of November. Anyone interested will be able to download and use the program. It won’t initially be as easy to download and install as Tarrasch V2, but it won’t be rocket science either. It won’t be perfect (or anything close to that), but it will be useful and it will be able to do interesting and significant things Tarrasch V2 doesn’t. After that I’ll try to publish progressive enhancements until the real program is available sometime early in the new year.
When I first started this blog the idea was to document step by step the process of developing the Tarrasch Chess GUI. In many ways, using Github as an integral part of my development process renders this original idea moot. Just click on the following link to track me as I develop Tarrasch;
Finally (for now) I did suggest right from the start I might also write more general posts on development of chess software, and I’ve decided it’s well past time for me to do precisely that. I’ve prepared a big blog post on how I compress chess moves offline, and I’ve just published it as the previous post on this blog. I don’t expect this to appeal to many people, but who knows, it might be interesting or useful to someone sometime.
A routine move representation of 16 or 32 bits makes sense for normal chess computation. For archive or transfer, compression makes sense. It is possible to compress chess moves to much less than 8 bits (on average), although information theory dictates that sometimes 8 bits are required. In this article/blog post I describe a kind of hybrid system that does not seek to compress aggressively, instead using 8 bits for every move. The benefits of this system are; a useful amount of compression, extreme convenience (what could be simpler than representing games as strings, one move per 8 bit ‘character’?), and (very importantly) negligible consumption of CPU cycles compared to more aggressive schemes. This representation method has worked out well for implementing the new database features of the Tarrasch V3 program.
Please note that I pursue my hobby of chess programming more or less in isolation. I don’t use (or even look at) other people’s chess code. I haven’t read any academic research. I play around and come to my own conclusions. I don’t pretend this is a strength, it’s clearly a weakness. But for now at least it suits what I am trying to achieve – basically personal satisfaction. So my conclusions are (far from) peer reviewed best practice. It’s just a random dude’s thoughts on the internet, buyer beware.
I absolutely don’t expect any of what follows to be original, I expect it to be essentially a rediscovery and/or restatement of existing ideas.
One Move in 40 Bits
Unsurprisingly normal chess move notation is not particularly space efficient. Two common conventions are SAN or standard algebraic notation and a more terse form used to communicate with UCI (Universal Chess Interface) engines. A single move can take up to eight characters in SAN, for example e7xf8=Q+. UCI uses either four or five characters; five characters are required only for promotions. Our example in UCI move format is e7f8q.
Already we can see an issue that will affect our quest for efficient representation. How much information are we trying to store? In the SAN example, we can tell that the move is a capture, a promotion and gives check. In the UCI example we can see promotion directly and infer capture (since only pawns promote and a pawn can only go from e7 to f8 by capturing). In our quest we are going for lean and mean – we seek to store only the information required to uniquely identify any legal move in any legal position. Very importantly (more about this later) we seek a scheme that will always produce exactly the same encoding for a given move in a given position, irrespective of how that position arose.
One Move in 12 Bits
My first naive dabbling led me to the conclusion that to store a chess move requires somewhat more than 12 bits (so somewhat more than 1.5 bytes; for our purposes a byte is 8 bits, and is also essentially equivalent to a character in the examples above). A chessboard of course has 64 squares and since 2^6 = 64, six bits neatly accommodate a single square. So a move requires two squares (source and destination) or twelve bits. Unfortunately, just as UCI sometimes requires five characters, 12 bits seemed to me to be sufficient usually, but insufficient universally because of the need to accommodate underpromotion. Four different promotions are always available, so since 2^2 = 4, 2 more bits are required, for a total of 14. Right? Well no, even I quickly realised it was possible to avoid a 2 bit promotion tax. Imagine a pawn promoting from e7. It can go only to d8, e8 or f8. That’s only three possibilities, yet we are allocating 6 whole bits to encode this destination. Wasteful! Our first piece of cleverness today comes into play. We will adjust the destination square to indicate underpromotion, so e7-f8 indicates a rightward capture and Queen promotion, e7-f7 indicates a Rook, e7-f6 indicates a Bishop, e7-f5 indicates a Knight (say). Of course we apply the same idea to promoting leftward captures and non capturing promotions.
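To make this concrete, here’s a hypothetical Python sketch of the 12 bit encoding with the promotion trick. The rank-major square numbering (a1=0, b1=1 … h8=63) is my own assumption, as is the exact promotion-to-rank mapping, and only White promotions are handled; Black’s would step the destination rank upward instead.

```python
# Illustrative only: 6 bits source + 6 bits destination = 12 bits,
# with the destination rank adjusted to encode underpromotion.
PROMO_LEVEL = {'Q': 0, 'R': 1, 'B': 2, 'N': 3}

def encode12(src, dst, promo=None):
    """Pack a move into 12 bits. For a promotion, pull the destination
    back one rank per level to signal the piece chosen
    (Q stays on rank 8, R->7, B->6, N->5)."""
    if promo:
        dst -= 8 * PROMO_LEVEL[promo]
    return (src << 6) | dst

def decode12(code, is_promoting_pawn):
    src, dst = code >> 6, code & 0x3F
    promo = None
    if is_promoting_pawn:
        level = 7 - (dst >> 3)        # how far back from the 8th rank
        promo = 'QRBN'[level]
        dst += 8 * level
    return src, dst, promo

# e7xf8=N: e7 is square 52, f8 is square 61 in this numbering
assert decode12(encode12(52, 61, 'N'), True) == (52, 61, 'N')
```

The decoder needs to know the move is a promoting pawn move, but that is derivable from the position, which the decoder always has in hand.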
One Move in 5.5 Bits
Now if I had been paying better attention, it should have occurred to me that this was really too easy; it’s hardly a case of squeezing blood from a stone. This is contra-indicative of an optimum strategy. We can use information theory to probe the limits of compression, and to inform us whether we are really getting close to an optimum solution.
Information theory tells us that the number of bits needed to encode one of N equally likely possibilities is log2(N). We have already seen log2(64) = 6 bits and log2(4) = 2 bits in the discussion above. As an aside, all these exact powers of two – 2 colours, 4 possible promotions, 8 pawns per side, 8 pieces per side, 64 squares – make the chess programmer’s life easier than it would otherwise be. So the key question becomes, how many moves are possible? That will dictate the theoretical minimum number of bits needed to encode one of the possible moves.
For the sake of argument, let’s assume there are 32 legal moves available in a position. log2(32) = 5, or equivalently 2^5 = 32. So 5 bits can theoretically encode a move. How? All that is required is a list of the legal moves. This isn’t a problem since generating a list of legal moves is a fundamental building block of computer chess. The order of the moves in the list is all-important, so we need a convention to ensure the move ordering is deterministic, perhaps we always sort the list according to some agreed criteria. Then we simply find the move we wish to encode in the list – if it’s a legal move it will be in there somewhere. The offset (or index, or position) of the move in the list is a 5 bit number (or code) in the range 0-31 and uniquely identifies our move. Voila.
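A minimal sketch of the idea in Python. Move generation itself is assumed here (any legal move generator will do); the legal moves are just pre-supplied UCI strings, and a plain lexicographic sort stands in for the agreed deterministic ordering.

```python
def encode_by_index(legal_moves, move):
    """Encode a move as its offset into the deterministically ordered
    list of legal moves for the position."""
    ordered = sorted(legal_moves)      # the agreed deterministic order
    return ordered.index(move)

def decode_by_index(legal_moves, code):
    return sorted(legal_moves)[code]

# With 20 legal moves codes fit in 5 bits (0-19); abbreviated list here
initial = ["a2a3", "a2a4", "b1a3", "b1c3", "e2e4"]
code = encode_by_index(initial, "e2e4")
assert decode_by_index(initial, code) == "e2e4"
```

The decoder regenerates the same list from the same position, sorts it the same way, and looks the code up. No tables, no dictionaries, just the position itself.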
Of course there aren’t always 32 legal moves available in any position, but a straightforward generalisation of this procedure already gets us to a reasonably practical and close to optimal scheme. The generalisation is to vary the number of bits per move according to the length of the list of legal moves in the position. The number of bits used in any position is obtained from our old friend, the formula log2(N), rounded up to a whole number, where N is the length of the list of legal moves. In practice most moves will need 5 or 6 bits; 6 bits are required if there are more than 2^5 = 32 possibilities, and 6 bits are almost always enough. Practical positions with more than 2^6 = 64 legal moves are rare. In “The Joys of Chess” by Christian Hesse the author addresses the upper limits in the chapter “How Many Moves?” and concludes that the current record holders are 75 for a real game (Podhorzer-Palda Vienna 1933, White’s 25th move) and 218 for a constructed position (Petrovic 1964, FEN notation “R6R/3Q4/1Q4Q1/4Q3/2Q4Q/Q4Q2/pp1Q4/kBNN1KB1 w - - 0 1”). Naturally, in the Petrovic position White has promoted all 8 pawns to Queens, so now there are 9 Queens all strategically arranged a knight’s move apart in order to minimise interference. The most important thing we can take from this is that 8 bits will absolutely, positively, be sufficient in any legal position. Even a casual inspection of the Petrovic position reveals it must be reasonably close to the upper limit. The empirical evidence is that even in such an esoteric field as this, no record would stand for fifty years if it could be absolutely demolished, and it would require absolute demolition to get from 218 to beyond 2^8 = 256.
Although we are ostensibly discussing compressing individual moves, a variable bit scheme like this is probably best thought about in the context of a whole game. A game as a series of moves is represented as a stream of bits. Each move is represented as somewhere between 0 and 8 bits according to the number of possible moves in the position. Most moves would need 5 or 6 bits; it would be quite common to use fewer (for example in endings with much reduced material – and in replies to checks) and very rare to use more. An amusing point is that some moves really would require zero bits – this happens when there is only one legal move in the position, there’s no need to store anything at all in that case. Our fundamental formula expresses this concept in the language of mathematics, log2(1)=0 [or equivalently 2^0=1].
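The whole-game bitstream idea can be sketched with a toy bit writer/reader. The moves here are just list indices; the only real content is the width rule, ceil(log2(N)) bits for N legal moves, zero bits when the move is forced.

```python
class BitStream:
    """Toy variable-width bit stream: each move's list index is written
    using just enough bits for the number of legal moves in the
    position, and zero bits when the move is forced."""
    def __init__(self):
        self.bits = []
        self.pos = 0

    @staticmethod
    def width(n_legal):
        # ceil(log2(n_legal)): bits needed to index one of n_legal moves
        return (n_legal - 1).bit_length() if n_legal > 1 else 0

    def write(self, index, n_legal):
        w = self.width(n_legal)
        self.bits += [(index >> i) & 1 for i in reversed(range(w))]

    def read(self, n_legal):
        index = 0
        for _ in range(self.width(n_legal)):
            index = (index << 1) | self.bits[self.pos]
            self.pos += 1
        return index

# Three "moves": 20 legal moves (5 bits), a forced move (0 bits),
# then 40 legal moves (6 bits) – 11 bits for three moves in total
stream = BitStream()
for index, n_legal in [(7, 20), (0, 1), (33, 40)]:
    stream.write(index, n_legal)
assert len(stream.bits) == 11
assert [stream.read(n) for n in (20, 1, 40)] == [7, 0, 33]
```

Notice the decoder supplies the same n_legal values in the same order; it can, because it is replaying the same sequence of positions as the encoder.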
One Move in 4 Bits
It would be a mistake to think no further compression is possible. For a start, there are usually unused codes in the scheme, since the number of legal moves in a position is not normally an exact power of two. Consider the initial position: there are twenty possible moves, so 5 bits are required, and consequently 12 unused codes (2^5-20=12). These codes could represent common opening sequences. Later in the game such gap codes could be used to represent common multi-move sequences when they are uniquely identifiable (eg O-O then O-O, PxP then PxP, RxR then RxR, h6 then BxNf6 then recapture [or the same pattern starting with a6,h3,a3]).
We can get more ambitious. In our magic log2(N) formula there are N equally likely possibilities. But all chess moves are not created equal, they are not equally likely. This is an example of the principle that non random information is compressible. Consider the order of the moves in the legal moves list. Instead of using some simplistic sort technique to establish a deterministic order, imagine instead using a chess engine to rank the moves in order of strength. Now vary our basic scheme so that moves early in the list use fewer bits than moves later in the list. Designing the details of such a scheme can be a lot of fun. One idea that suggests itself is a 3 bit code sometimes followed by extended bits. 3 bits allows 2^3=8 codes. 6 of these codes represent the first 6 moves in the list. The other 2 codes indicate one of two possible formats for extended bits. One format is 2 extended bits coding for 2^2=4 further moves from the list (moves 7 through 10). The other format is a fallback scheme, a variable number of extended bits, as many as required to represent moves 11 and above in the list. In this scheme well played games are more compressed than poorly played games. At a guess, a reasonable game would comprise perhaps 70% 3 bit moves, 20% 3+2=5 bit moves, 8% 3+5=8 bit moves, 2% 3+6=9 bit moves. This works out at an average of 3.9 bits per move. So we have improved on a naive 12 bit scheme by a factor of about three.
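The back-of-envelope average is just an expected value over my guessed bucket proportions, easily checked:

```python
# Guessed proportions of moves falling into each code-length bucket
# for the engine-ranked scheme: (probability, total bits for the move)
buckets = [(0.70, 3), (0.20, 5), (0.08, 8), (0.02, 9)]
average = sum(p * bits for p, bits in buckets)
print(round(average, 2))   # → 3.92, i.e. roughly 3.9 bits per move
```

Of course the proportions themselves are pure speculation; real figures would need a real engine run over a real database.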
In all of this we can see one of the basic tenets of all computation playing out – there is no such thing as a free lunch. In this case space costs us time (it’s a bit like chess!). What I mean by this is that reducing the amount of space (memory) required costs a lot of time (CPU cycles). Move list generation takes significant CPU time. Ranking moves by strength (to get more compression) takes (comparatively) vast amounts of time.
One Move in 2 Bits, Maybe?
A couple of weeks after I wrote this post it got some attention on Hacker News and Reddit. That experience convinced me to add the introductory paragraph that now starts this post, which hopefully helps make this work as an article not just a blog post. I also learned some new things from the commentary there and I’ve added this paragraph to reflect that. Specific thanks are due to Reddit commentator Dag Ågren. It turns out that there’s a way to turbo charge the chess engine based compression scheme. Consider the 3.9 bits per move scheme above. It works best when there are 6 or fewer good moves available, reasonably when there are 10 or fewer, poorly when there are more than 10. It’s a compromise scheme, it attempts to be at least adequate in every position. But what if we could tailor a scheme for each position we encounter? We could potentially do a lot better in many, maybe most, positions. Especially positions in which there aren’t many good moves. For example, if there is only one good move in a position, we’d design a scheme that used only one bit if that move was played. If there were two good moves, we’d use a scheme that used only two bits if either of those moves were played. And so on. Surprisingly (to me anyway) it turns out it is possible to do this in a practical way. The trick is to dynamically select a scheme well suited to each position in the sequence of positions that constitute a game. There is no need to waste extra bits indicating which scheme is being used, because the decoder is synchronised to the encoder. It follows along through the same sequence of positions. As long as it uses exactly the same algorithm as the encoder to select the scheme best suited to any position, it will always pick the right scheme for decoding.
It turns out that a technique called “arithmetic coding” can automatically generate optimal schemes given a list of symbols (moves in our context) and symbol probabilities (which can reasonably be inferred from the engine’s move score). The title of this section is just speculation on my part; if there are an average of 4 good moves in a position, and if a strong player (almost) always plays a good move, then 2 bits per move is theoretically the best we could do, for high quality games. Actually it has just occurred to me that playing through a master game with kibitzing in Tarrasch gives a feel for this, as the “best” four lines are shown. The percentage of moves played that come from the engine’s top four list often does feel very high. But remember the downside: even if we can get reasonable output from a chess engine in 1 millisecond (a stretch), a big database has of the order of 1 billion moves, so compressing or uncompressing a complete database is going to take of the order of a million seconds (more than a week).
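The theoretical limit an arithmetic coder approaches is the Shannon entropy of the move probabilities, which gives a way to sanity check the 2 bit speculation. A small sketch (the example probabilities are invented):

```python
import math

def entropy_bits(probs):
    """Shannon entropy: the theoretical minimum average number of bits
    per move that an arithmetic coder can approach, given the
    probability of each legal move being the one actually played."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Four equally likely good moves: exactly 2 bits, as speculated above
print(entropy_bits([0.25, 0.25, 0.25, 0.25]))   # → 2.0

# A sharper position where one move dominates needs even fewer bits
print(entropy_bits([0.8, 0.1, 0.05, 0.05]))
```

The skewed distribution comes out at just over 1 bit, which is why positions with one clearly best move compress so well under this kind of scheme.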
The Sweet Spot – One Move in 8 Bits
I’ve given hints in the language I’ve used that I actually haven’t implemented any of the complicated schemes above for Tarrasch 3. Basically they are far too slow for my purposes. The scheme I adopted instead trades compression for speed. It turns out it is possible to compress moves into 8 bits with negligible CPU cycles. For the pragmatic programmer this represents great value. A fixed one move per byte system is very convenient for the programmer, much more convenient than 12 bits. In particular, 8 bit moves allow move sequences to be represented as strings; the length of the string is the number of moves, and simple indexing into the string locates individual moves in a natural way. Although further compression with a variable bit scheme has some attractions, it would not only cost a lot of time, it would also be less simple and convenient to use.
Our 8 bit scheme works as follows. We split each 8 bit code into two 4 bit fields. Each field has 2^4=16 codes. There are a maximum of 16 pieces (let’s use the word ‘piece’ as shorthand for ‘pawn or piece’) on each side of the board, so one of our two fields (let’s call it the piece code) selects one of the 16 pieces in the army of the side to move. The other field (let’s call that one the vector code) selects one of 16 possible moves for the selected piece. Too easy! As they say in Australia.
Of course if it were really that easy I probably wouldn’t be spending hours writing this up. There are some difficulties still to overcome. We’ll deal with the most obvious one first. A queen can make up to 27 moves, it’s the one piece on the board that can exceed the 16 possibilities our vector code allows. We solve this problem by conceptually splitting the Queen into two parts, a bishop and a rook. The bishop and rook each get their own piece code. This seems to need a 17th piece code we don’t have, but we balance splitting a queen into two by combining both knights into one. The 16 vector codes of this dual knight are sufficient for both knights since they can only make up to 8 moves each.
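The byte layout itself is trivial, which is rather the point. A sketch (which nibble is which is my arbitrary choice here, not necessarily Tarrasch’s exact layout):

```python
def pack(piece_code, vector_code):
    """Combine a 4 bit piece code and a 4 bit vector code into one byte."""
    assert 0 <= piece_code < 16 and 0 <= vector_code < 16
    return (piece_code << 4) | vector_code

def unpack(byte):
    return byte >> 4, byte & 0x0F

# A game becomes a plain byte string, one 'character' per move
game = bytes(pack(p, v) for p, v in [(8, 1), (4, 2), (9, 1)])
assert len(game) == 3                 # string length = number of moves
assert unpack(game[1]) == (4, 2)      # natural indexing into the game
```

This is where the “games as strings” convenience comes from: slicing, concatenating and searching move sequences all become ordinary string operations.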
Another fundamental problem we need to overcome is extra pieces appearing on the board due to promotion. Extra queens are particularly problematic since they need two piece codes, so it’s not possible to simply recycle the code formerly used by the pawn that promoted. Our strategy is to have a fall back mode we can call upon in some (quite rare in practice) circumstances. The fall back mode is the slow system we are already familiar with; we generate a list of all legal moves, sort them, and then index into the list to select our move. There’s no need to mess about with variable bit fields, we’ll just use 8 bits, which we know is always sufficient, every time. More about this later.
Our 16 available piece codes comprise eight pawn codes, one king code, one dual knight code, three bishop codes and three rook codes. Remember that we are treating the queen as a bishop plus a rook. A vector coding scheme is required for each type of piece code. A king makes the least demands on the 16 codes available, we simply define 8 direction vectors, N,S,E,W,NE,NW,SE,SW. Castling can be accommodated by using two more codes, or we can be a little tricky and use the fact that a king cannot move in all 8 directions on its initial square. So we can represent White O-O (for example) as the White King moving South East from e1.
For the dual knight, we use 3 of the 4 available vector bits to define 8 direction vectors NNE, NNW, NWW, NEE, SSE, SSW, SEE, SWW corresponding to the eight possible knight moves. The other bit selects one of the two knights.
Rook and bishop vector codes also split the 4 available bits into 1 and 3. The rook code uses the 1 bit field to indicate whether the move is changing the rank or the file, the 3 bit field then indicates one of the 8 ranks or files. A bishop code uses the 1 bit field to indicate whether the move is along a diagonal that rises from left to right, or falls from left to right. The 3 bit field indicates a destination file.
Due to the need to accommodate the possibility of underpromotion, the humble pawn definitely needs all 4 vector code bits. The most obvious system is to use 2 bits to indicate one of the 4 possible types of pawn move (advance, double advance, left capture, right capture) and the other 2 bits to indicate one of the four possible promotions, if it is a promoting move. But we can exploit the fact that a double advance never promotes to improve on this scheme; we use 2 bits to select one of 4 possible types of pawn move – non promoting, promoting advance, promoting left capture or promoting right capture. If the move is one of the three promoting types, then the other 2 bits select one of the four possible promotions. If the move is non-promoting, then the other 2 bits indicate the type of move (advance, double advance, left capture, right capture). This is a better scheme because it is more efficient: it encodes extra information (is the move a promotion or not?) in the same space.
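The improved pawn vector code is easy to sketch. The particular bit assignments below are my own illustrative choices; any fixed convention works so long as encoder and decoder agree.

```python
MOVE_TYPES = ['advance', 'double advance', 'left capture', 'right capture']
PROMOS = ['Q', 'R', 'B', 'N']
PROMO_KINDS = ['advance', 'left capture', 'right capture']

def pawn_vector(move_type, promo=None):
    """4 bit pawn vector code. Top 2 bits: 0 for non promoting moves,
    else which kind of promoting move. Bottom 2 bits: the move type
    for non promoting moves, the promotion piece for promoting moves."""
    if promo is None:
        return MOVE_TYPES.index(move_type)            # top bits are 00
    return ((1 + PROMO_KINDS.index(move_type)) << 2) | PROMOS.index(promo)

def decode_pawn_vector(code):
    hi, lo = code >> 2, code & 3
    if hi == 0:
        return MOVE_TYPES[lo], None
    return PROMO_KINDS[hi - 1], PROMOS[lo]

assert decode_pawn_vector(pawn_vector('double advance')) == ('double advance', None)
assert decode_pawn_vector(pawn_vector('right capture', 'N')) == ('right capture', 'N')
```

Every one of the 16 codes is either a plain pawn move or a fully specified promotion, so nothing is wasted.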
To calculate the 8 bit move code for any legal move in any legal position we first scan through all pieces of the side to move. Opposing pieces are essentially irrelevant. Up to four pieces are assigned fixed piece codes; the king and (if present), queen, light squared bishop and dark squared bishop. Pawns, rooks and knights are assigned dynamic piece codes according to their square location using an arbitrary square ordering scheme. So up to two knights are identified as knight 0 and knight 1, they share the dual knight piece code. Up to two rooks are identified as rook 0 and rook 1 which are individual piece codes. Up to eight pawns are identified as pawn 0, pawn 1… pawn 7, which are individual piece codes. If there is more than one queen, or more than one light squared bishop, or more than one dark squared bishop, or more than two knights or more than two rooks, then (and only then) we use the slow fall back scheme to calculate the move code. Otherwise the 8 bit move code is a combination of the 4 bit piece code of the piece to move with the appropriate 4 bit vector code.
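To make the scan concrete, here is a hypothetical Python sketch of the counting and code assignment logic. The board representation, square numbering and concrete code values are all my illustrative assumptions; for brevity the queen gets a single code here rather than the rook-plus-bishop pair described above.

```python
def assign_piece_codes(pieces):
    """pieces: dict of square -> letter in 'KQRBNP' for the side to
    move, squares numbered a1=0 ... h8=63. Returns square -> piece
    code, or None if the slow fallback scheme is needed."""
    by_type = {}
    for sq in sorted(pieces):                  # fixed, arbitrary ordering
        by_type.setdefault(pieces[sq], []).append(sq)
    bishops = by_type.get('B', [])
    light = [s for s in bishops if (s // 8 + s % 8) % 2]
    dark = [s for s in bishops if not (s // 8 + s % 8) % 2]
    if (len(by_type.get('Q', [])) > 1 or len(light) > 1 or len(dark) > 1
            or len(by_type.get('N', [])) > 2 or len(by_type.get('R', [])) > 2
            or len(by_type.get('P', [])) > 8):
        return None                            # fall back to the slow scheme
    codes = {}
    for sq in by_type.get('K', []): codes[sq] = 0
    for sq in by_type.get('Q', []): codes[sq] = 1   # really a rook+bishop pair
    for sq in light: codes[sq] = 2
    for sq in dark: codes[sq] = 3
    for sq in by_type.get('N', []): codes[sq] = 4   # shared dual knight code
    for i, sq in enumerate(by_type.get('R', [])): codes[sq] = 5 + i
    for i, sq in enumerate(by_type.get('P', [])): codes[sq] = 8 + i
    return codes

# King e1, Queen d1, Rooks a1 and h1, one pawn on a2
assert assign_piece_codes({4: 'K', 3: 'Q', 0: 'R', 7: 'R', 8: 'P'}) \
    == {4: 0, 3: 1, 0: 5, 7: 6, 8: 8}
# Two queens force fallback mode
assert assign_piece_codes({4: 'K', 3: 'Q', 10: 'Q'}) is None
```

The important property is that the output depends only on the position, never on the move history that produced it.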
At first glance this might seem to be unnecessarily computationally expensive. Why scan over the whole board every time? Why not assign fixed piece codes in the starting position and then track them? In fact that’s exactly how I reasoned when I first implemented compressed moves and for a long time I did use fixed codes. It was a huge mistake. Huge! The dynamic coding scheme buys us a very important property. Once we have settled on all the various arbitrary conventions we need, our algorithm always produces the same move code for a given move in a given position. Unfortunately using fixed codes means the history leading to a position influences the move code generated.
Consider the following opening transposition; A) 1.e4 e6 2.c4 d5 3.cxd5 exd5 4. exd5 c6 5.dxc6. B) 1.e4 e6 2.c4 d5 3.exd5 exd5 4. cxd5 c6 5.dxc6. The move 5.dxc6 is the same move in the same position in each case, but because the capturing pawn started its life as an ‘e’ pawn in line A) and as a ‘c’ pawn in line B), a fixed scheme will generate two different 8 bit move codes for the move. This has the effect of potentially messing up basically any chess database feature at least occasionally. Pawns and Knights quite often mischievously swap places like this, rooks less so, but I am sure there are some real life instances. At least theoretically it is possible for a promoted piece to feature in a position normally reached without promotion. I agonised over these things and killed off huge numbers of brain cells trying to think up elaborate “normalisation” tricks to try to repair an inherently faulty underlying scheme. Futility exemplified.
One day I (finally) returned to thinking about using dynamic coding instead. All those pawns that constantly needed to be reordered, surely it was impractically slow. I think all programmers are familiar with the delicious sensation of a momentary flash of insight suddenly dissolving a problem. It turns out that we can use dynamic codes with a negligible amount of extra computation. The way to do it is to track the position of each piece and only change the dynamic codes as required. Of course that much is obvious, and with rooks and knights it is no big burden. Just keep track of both knights and as long as there are two knights on the board, check whether they need to swap places after each knight move. Make adjustments as required if a knight is captured or created by promotion. All of this applies in exactly the same way for rooks.
It’s the pawns that are a problem and the solution starts by using the right type of square ordering convention. My normal convention is; a8=0, b8=1, c8=2 … h8=7,a7=8 … h1=63. It’s normally a good convention but for this particular task it’s horrible. Instead we need a convention where we increment along files not ranks. For example, a1=0,a2=1,a3=2…a8=7,b1=8…h8=63. Now any pawn advance doesn’t affect pawn ordering at all. This is a direct consequence of the laws of chess, a pawn cannot leap over another pawn on the same file. Capturing pawn moves can require significant reordering, but a simple algorithm handles the task, and in the vast majority of cases it’s computationally cheap.
To understand the simple pawn reordering algorithm, consider that although a capturing pawn may need to be reordered relative to its neighbours, no neighbour needs to be reordered relative to any other neighbour. For a right capture, check if our pawn needs to swap places with its adjacent neighbour to the right. If it does, swap the pawns and repeat with the next neighbour to the right. The first time no swap is needed, we know the pawns have been reordered successfully.
An example might help. White has three pawns on a2 (pawn 0),b2 (pawn 1) and c2 (pawn 2). After right capture a2xb3, we initially have b3 (pawn 0), b2 (pawn 1) and c2 (pawn 2). Does our relocated pawn on b3 need to swap with b2, its adjacent neighbour to the right? Yes. So b2 (pawn 0) b3 (pawn 1) c2 (pawn 2). Does the b3 pawn need to swap with c2, its adjacent neighbour to the right? No. We are done with one simple swap. Zero or one swaps are by far the most common cases.
Let’s do another example with three swaps, the most I observed when compressing a database of 4.5 million games;
b5 (pawn 0) c2 (pawn 1) c3 (pawn 2) c5 (pawn 3) f5 (pawn 4), White plays b5xc6
c6 (pawn 0) c2 (pawn 1) c3 (pawn 2) c5 (pawn 3) f5 (pawn 4), Swap c6 and c2? Yes
c2 (pawn 0) c6 (pawn 1) c3 (pawn 2) c5 (pawn 3) f5 (pawn 4), Swap c6 and c3? Yes
c2 (pawn 0) c3 (pawn 1) c6 (pawn 2) c5 (pawn 3) f5 (pawn 4), Swap c6 and c5? Yes
c2 (pawn 0) c3 (pawn 1) c5 (pawn 2) c6 (pawn 3) f5 (pawn 4), Swap c6 and f5? No
Left captures work the same way (of course). An example of a left capture;
a4 (pawn 0) b4 (pawn 1) c2 (pawn 2), White plays c2xb3
a4 (pawn 0) b4 (pawn 1) b3 (pawn 2), Swap b3 and b4? Yes
a4 (pawn 0) b3 (pawn 1) b4 (pawn 2), Swap b3 and a4? No
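The reordering step above can be sketched directly in code. Squares are (file, rank) pairs here for readability; the key function is the file-major ordering convention, and the loop reproduces both worked examples.

```python
def key(square):
    file, rank = square
    return file * 8 + rank          # file-major: a1=0, a2=1 ... h8=63

def reorder_after_capture(pawns, idx, direction):
    """After pawn number idx has captured, bubble it toward its
    neighbours (direction +1 for a right capture, -1 for a left
    capture) until the file-major order is restored. Returns the
    number of swaps performed."""
    swaps = 0
    nxt = idx + direction
    while 0 <= nxt < len(pawns):
        out_of_order = (key(pawns[idx]) > key(pawns[nxt])
                        if direction > 0 else
                        key(pawns[idx]) < key(pawns[nxt]))
        if not out_of_order:
            break                    # first non-swap: ordering restored
        pawns[idx], pawns[nxt] = pawns[nxt], pawns[idx]
        idx, nxt, swaps = nxt, nxt + direction, swaps + 1
    return swaps

# The three-swap example: b5 c2 c3 c5 f5, White plays b5xc6
pawns = [(2, 5), (2, 1), (2, 2), (2, 4), (5, 4)]   # pawn 0 already on c6
assert reorder_after_capture(pawns, 0, +1) == 3
assert pawns == [(2, 1), (2, 2), (2, 4), (2, 5), (5, 4)]  # c2 c3 c5 c6 f5
```

As the examples show, the loop terminates the first time no swap is needed, so in the overwhelmingly common zero-or-one-swap case it does almost no work.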
The conceptual breakthrough that allows computationally inexpensive dynamic piece coding prompted me to completely rework my move compression algorithm. I introduced the simplifying concept that only the side with the move needs to be considered when calculating the compressed move code. Previously I would switch into fallback mode globally (i.e. for both sides) as soon as an inconvenient promotion occurred. But more often than not an inconvenient new promoted piece (typically a second queen) is immediately captured before the promoting side gets to move again. So my improved approach is not only much simpler, it often avoids a completely unnecessary switch to fallback mode.
Statistically speaking, it turns out about 0.015% of compressed moves are calculated with fallback mode, and about one in 250 games requires some such moves. A reasonably simple change would improve these statistics by perhaps a factor of 100 or so (I will come back and edit the blog when I find out more precisely). I am sorely tempted to implement the change, which would basically confine fallback mode to freakish games. The idea is to allow a second queen, in much the same way as two knights and two rooks are allowed. When two queens (queen 0 and queen 1) are on the board, the piece codes for pawn 6 and pawn 7 are recycled and used for queen 1. Two queens would not be compatible with seven pawns, that really would necessitate fallback mode. In other words, the side with two queens would need to have lost at least one pawn to capture in addition to the pawn ‘lost’ in the act of promotion.
Successful program design is all about effectively managing trade-offs. I am not a hundred percent convinced that the extra ugly complexity is justified in this case. Time will tell. In the meantime you can check my current implementation at;
The last blog post I wrote was a while ago now, nearly 6 months in fact. At the time I was in the midst of a great burst of Tarrasch productivity, and my expectation was that I’d keep my momentum up. Unfortunately that was not what happened. Long time readers will understand that I fall into slumps sometimes, but at least this time I had a good excuse. A kidney cancer diagnosis tends to take your mind off your hobbies and on to other priorities.
That sounds really grim, but before I write another word I want to emphasise that my setback was temporary (I hope) and my prognosis is now excellent. My tumour was found early, and excised completely. In fact I still have 1.8 kidneys, more than enough. My lab results are excellent, and although I will need to be monitored for years, my surgeon is encouraging me to consider the whole incident to be something I can now put behind me.
I did have rather a protracted and unpleasant post-operative recovery, because I fell victim to a procession of annoying and enfeebling complications I won’t bore you with. But in recent weeks I’ve finally overcome these problems, and I now feel pretty much back to normal. More than ready to get on with the rest of my life.
I don’t like to have absolutely nothing to show with a new blog post, so I have made a small but important update to Tarrasch V2 which is available for download at triplehappy.com. Like all Tarrasch maintenance releases, it can be freely installed over the top of an existing Tarrasch installation.
The most important change is that I’ve added new engines to the package. It is true that these engines could be separately downloaded and used previously, but extreme ease of use is probably Tarrasch’s most important benefit. Separate download is a barrier, particularly for those who aren’t particularly computer savvy. A new user can now download Tarrasch and start using the latest version of Stockfish, perhaps the most powerful engine available anywhere, with just a few mouse clicks.
The new engines in the package are Stockfish, and the free demo version of Houdini. Stockfish is now the default engine. For the first time Tarrasch checks for 32 or 64 bit Windows, and uses the 64 bit version as the default if it can. Thanks to the Stockfish team for providing such an amazing free tool, and thanks to Robert Houdart for permission to include Houdini.
If you install Tarrasch over the top of an existing installation, your original preferences are respected. So your engine choice won’t change in these circumstances. One easy way to change to the default Stockfish engine is simply to use the Options menu to “Reset to factory defaults”. I cover this issue in the FAQ which I have updated a little, for the first time in a few years.
I took the opportunity to make a few other small improvements;
- Improve colour of dark squares (the screenshot on triplehappy.com has been updated to show the new colour).
- On 32 bit Windows, don’t allow user to try to run 64 bit engine.
- Change text of commands from ‘promote comments to variation’ to ‘promote comments to moves’.
- Fix bug that stopped DELETE key working in pgn dialog.
- Add more menu keyboard shortcuts.
These are all things I should have done much earlier. Sorry. In particular the lack of contrast between light and dark squares has been completely unnecessary, and really just an artifact of my particular monitor. Of course ideally I should provide customisable colours, but I don’t imagine that will happen until Tarrasch picks up some momentum and I get some programming help. With that in mind, I need to stop tweaking V2 and get back to working on V3, which is the future.
I’ve been putting in quite a lot of effort into my Tarrasch Chess GUI project recently. In fact although I haven’t posted here for quite a while, my progress has been (technically speaking) publicly visible on the programmer’s site Github, where I have been publishing the project source code as work in progress.
It’s easy to get depressed at slow progress on a big, difficult project like Tarrasch, especially when you are labouring entirely by yourself. I remember the same feelings of inadequacy as the months dragged by on Tarrasch V2. At least this time I can take strength from remembering that I did deliver T2 eventually, and no doubt T3 will follow in due course.
Recently I have motivated myself by setting an intermediate goal – get T3 to the point where it is useful to me personally. Today I am happy to announce that I have at least met that goal! With the New Zealand Chess Champs starting tomorrow I am a little bit too late perhaps, but still there is a sense of satisfaction. I have a tool that is flawed but useful. Now all I have to do is remove the flaws!
To celebrate, I decided to make a YouTube video to show off what I have so far. You can see it at http://youtu.be/mxAFlrXGIGI
I’m sorry about the technical quality of the video – it’s the first time I’ve tried to create a “screencast” video and it’s a lot harder than it looks! I can’t believe I said “dig more delvely” or somesuch instead of “delve more deeply”. But fixing that would mean hours of effort at my current level of competence. I actually recorded my voice on my mobile phone as part of the process of patching things together – it’s bad but not as bad as the lamentable microphone on my laptop.
I am afraid that T3 is ready for me, but not ready for any other early adopters yet. There are just too many known minefields to step around. I will be sure to let this audience know when T3 reaches a point where it is ready to be tested by a wider audience.
One difficulty I will face is the requirement to source games to populate the database. For my personal needs I am using games from my copy of Chessbase’s Megabase 2007 supplemented by TWIC for subsequent years (as I mention in the video). But I certainly won’t be distributing proprietary or even semi-proprietary collections like this. I could leave it to the user to source games, but that would not really be consistent with Tarrasch’s ultra easy download and install philosophy. I might be forced to somehow curate my own games collection. Hopefully a nice solution of one kind or another will emerge in due course.
I haven’t posted anything here for a while. I don’t like to leave the radio silence going for too long; I don’t think that’s fair to people who are interested in Tarrasch, no matter how small that audience might actually be. So this is just a little ping if you like, just to indicate that I am still alive, still working a little on Tarrasch, and anticipating being able to devote more time soon. I still hope to have a preview version of the next major Tarrasch version available before the end of 2014.
I do have something vaguely exciting to share, and that is that I have finally jumped on the bandwagon and put the source code on Github. I did put Tarrasch V2 on Bitbucket (a Github competitor), but that was a rather half-hearted effort, and I only did it when Tarrasch V2 was finished. The plan this time is to do open source development “properly”. These days that means working in the open, publishing your code more or less as it is written and modified. The platform of choice for this process is Github.
Before taking this step I spent (wasted) a lot of time reading about git (the technology underpinning Github, git was created by Linus Torvalds of Linux fame). It all seemed very scary and complicated. It turns out, and this so often seems to be the case, that reality is less scary than imagined reality, and that instead of reading and obsessing about git I should have simply jumped in and started using git. You cannot learn to swim by reading about it on the internet.
Github has become a popular phenomenon, and they couldn’t have done that without making the barrier to entry fairly low. I followed their “Getting Started” tutorials on both my Windows and Mac machines and started experimenting. I was pleasantly surprised how easy it all turned out to be, and I am already experiencing benefits, including benefits that I never expected. Unsurprisingly, I now have a good solution for backing up the source code regularly online. That pretty much comes for free from working this way. Additionally, I no longer have to save progress as an endless series of snapshot .zip files, most with comically unhelpful names like Tarrasch3Dec2013AfterFixingThatAnnoyingOffByOneBug.zip. Instead I am using state-of-the-art software for keeping track of (hopefully) steady progress on a long term project.
The unexpected benefit is that developing the code on both Mac and Windows machines is now much simplified. I was worried how I would get git to handle this “workflow” issue, but it turns out to be very simple and natural and it actually solved the annoying problem of keeping code on the two machines in sync. Basically git makes it easy to keep the local code repository on one machine in sync with the remote code repository that lives on Github. If the remote code is advanced relative to the local code you go “git pull” to pull down the newer code from remote. Conversely if the local code is advanced relative to the remote code you go “git push” to push the newer code on to the remote. This works great with one programmer with one Github account (i.e. me) working on two machines. Let’s say I’ve been working on Tarrasch with my Mac. At the end of the session I go git push to update the remote. If I next use my Windows machine, “git status” will warn me that the remote repository is in advance of the local code. I just go “git pull” and I am in sync and up to date. Easy peasy and a great advance on my previous situation, which saw me quite reluctant to switch from Mac to Windows or vice-versa.
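For anyone who hasn’t tried this workflow, the whole round trip can be rehearsed safely with purely local repositories. This is just an illustrative sketch (the directory names, branch name and commit messages are all invented for the demo); a bare repository stands in for Github, and two clones stand in for the two machines:

```shell
# Simulate the two-machine workflow with local repositories.
# 'remote.git' stands in for the repository hosted on Github;
# 'mac' and 'windows' stand in for my two development machines.
set -e
work=$(mktemp -d)
cd "$work"
git init -q --bare remote.git
git clone -q remote.git mac

# A session on the "Mac" ends with a push up to the remote:
cd mac
git checkout -q -b work
echo "first session" > notes.txt
git add notes.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -m "Mac session 1"
git push -q -u origin work
cd ..

# The "Windows" machine clones once, and just pulls thereafter:
git clone -q -b work remote.git windows

# Another Mac session, another push...
cd mac
echo "second session" >> notes.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -am "Mac session 2"
git push -q
cd ..

# ...and "git pull" brings the Windows clone up to date:
cd windows
git pull -q
cat notes.txt
```

At the end the `windows` clone contains both sessions’ work, which is exactly the sync effect described above.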
My single code repository has both Xcode (for Mac) and Visual Studio (for Windows) project files. This is fine, Xcode ignores the Visual Studio project files and vice-versa. Github user KostyaKow (that’s his account name, I recognise him as a friendly fellow who often emails me about Tarrasch) has already noticed that Tarrasch is now on Github, and unsolicited he added a generic Linux project file (these are called “makefiles” in unix speak). He then sent me a “pull request” to ask me to pull down his changes. I have done that, and incorporated them into the master repository. I don’t think KostyaKow actually got Tarrasch working on Linux (yet), but a start has been made, and expert programmers can work on it in whichever platform they prefer (Windows/Mac/Linux). Github is a kind of social network for programming and it seeks to encourage and facilitate collaborative work like this. Hopefully there will be much more collaborative activity in the future.
There is an annoying wrinkle in this otherwise happy picture. As always with the Tarrasch code, it is wxWidgets. Tarrasch relies on wxWidgets as a portable layer of software (basically it makes Tarrasch’s user interface possible). It is quite a nice thing to work with once you have it set up, but unfortunately it is vast and unwieldy, and it is difficult to set up and configure your programming environment to accommodate it. My Xcode and Visual Studio project files assume wxWidgets has been set up just as it is on my machines. KostyaKow’s Linux makefile effectively does the same for his machine. I haven’t got a working Linux wxWidgets environment, which is why I haven’t actually tested KostyaKow’s makefile myself.
The sad wxWidgets situation won’t stop a skillful and resourceful programmer. If you are such a person and you want to play with the Tarrasch source code, the first thing you should do is install wxWidgets, follow the tutorials and other resources, and build some demos and samples. If you are using a Mac, you want the new V3.0 wxCocoa release, which makes modern Macs a first class wxWidgets citizen for the first time. Make sure you can rebuild the “richtext” demo/sample, as the richtext wxWidgets library is not used by the simpler demo projects. After you’ve done that you should be able to modify the existing Tarrasch project files to reference *your* wxWidgets installation rather than *my* wxWidgets installation.
I really hope to improve this rather unfortunate solution to the “wxWidgets problem”. Eventually.
I am working on a standard feature for a chess GUI/database. Given a chess position, the program can generate a list of games in the database that feature the position (obviously). Just a little more sophisticated is a statistical breakdown. Consider the standard opening position after the moves 1.d4 d5 2.c4 e6 3.Nc3 Nf6 4.Nf3. When I enter this position, I get a list of 64749 games (from a test database of about 4.5 million games). The statistical breakdown I generate in this position looks like this:
Be7: 25497 games, white scores 60.5% +10154 -4822 =10514
c6: 14532 games, white scores 56.1% +5653 -3878 =4994
Nbd7: 7930 games, white scores 59.5% +3406 -1907 =2617
Bb4: 7580 games, white scores 56.4% +2838 -1871 =2867
c5: 4491 games, white scores 60.0% +1789 -891 =1810
dxc4: 3072 games, white scores 52.7% +911 -746 =1411
b6: 557 games, white scores 67.8% +317 -119 =121
Nc6: 479 games, white scores 74.5% +310 -75 =94
a6: 262 games, white scores 63.2% +132 -63 =67
h6: 167 games, white scores 68.6% +99 -37 =31
g6: 61 games, white scores 73.8% +40 -11 =10
Bd6: 56 games, white scores 75.0% +39 -11 =6
Ne4: 27 games, white scores 81.5% +19 -2 =6
Bd7: 19 games, white scores 73.7% +13 -4 =2
a5: 1 game, white scores 100.0% +1 -0 =0
Na6: 1 game, white scores 50.0% +0 -0 =1
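For the record, the percentage in each row is just the standard scoring convention: White gets a full point for a win and half a point for a draw. (Incidentally, the +/-/= counts don’t always sum exactly to the games figure; presumably a few games have no recorded result.) Here is a tiny sketch, with a function name of my own invention, that regenerates a row of the table from the raw counts:

```python
def breakdown_line(move, wins, losses, draws):
    # White's score % = (wins + draws/2) / games, the usual chess convention
    games = wins + losses + draws
    score = 100.0 * (wins + 0.5 * draws) / games
    return f"{move}: {games} games, white scores {score:.1f}% +{wins} -{losses} ={draws}"

# Reproduce the Nbd7 row above from its raw counts alone:
print(breakdown_line("Nbd7", 3406, 1907, 2617))
# -> Nbd7: 7930 games, white scores 59.5% +3406 -1907 =2617
```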
Interesting enough, but just routine stuff for a chess database, right? Yes, right. But here’s the interesting thing. As a consequence of the method I am using to compress moves in the database, I found myself needing to generate a list of transposition possibilities leading to a given position. Before reading on, take a guess at how many different transpositions occur in my test database for the position above. I must admit that without thinking too much I assumed maybe 5 or 10.
The actual answer is 61. Yes, that’s right, 61. Here’s the breakdown (warning: wordpress is fighting me a little with this):
1.d4 Nf6 2.c4 e6 3.Nf3 d5 4.Nc3 : 13660 occurrences
1.d4 d5 2.c4 e6 3.Nc3 Nf6 4.Nf3 : 7848 occurrences
1.d4 d5 2.Nf3 Nf6 3.c4 e6 4.Nc3 : 7596 occurrences
1.d4 d5 2.c4 e6 3.Nf3 Nf6 4.Nc3 : 7178 occurrences
1.d4 Nf6 2.Nf3 d5 3.c4 e6 4.Nc3 : 3788 occurrences
1.d4 Nf6 2.c4 e6 3.Nc3 d5 4.Nf3 : 3604 occurrences
1.Nf3 d5 2.d4 Nf6 3.c4 e6 4.Nc3 : 3446 occurrences
1.Nf3 Nf6 2.c4 e6 3.Nc3 d5 4.d4 : 3304 occurrences
1.d4 Nf6 2.Nf3 e6 3.c4 d5 4.Nc3 : 3017 occurrences
1.c4 Nf6 2.Nc3 e6 3.Nf3 d5 4.d4 : 1383 occurrences
1.d4 d5 2.Nf3 e6 3.c4 Nf6 4.Nc3 : 906 occurrences
1.Nf3 d5 2.c4 e6 3.d4 Nf6 4.Nc3 : 788 occurrences
1.c4 e6 2.Nc3 d5 3.d4 Nf6 4.Nf3 : 768 occurrences
1.d4 e6 2.c4 Nf6 3.Nf3 d5 4.Nc3 : 735 occurrences
1.Nf3 Nf6 2.c4 e6 3.d4 d5 4.Nc3 : 626 occurrences
1.Nf3 Nf6 2.d4 d5 3.c4 e6 4.Nc3 : 610 occurrences
1.c4 e6 2.Nf3 d5 3.d4 Nf6 4.Nc3 : 593 occurrences
1.d4 e6 2.Nf3 d5 3.c4 Nf6 4.Nc3 : 413 occurrences
1.Nf3 d5 2.d4 e6 3.c4 Nf6 4.Nc3 : 395 occurrences
1.d4 e6 2.c4 d5 3.Nc3 Nf6 4.Nf3 : 388 occurrences
1.d4 e6 2.c4 d5 3.Nf3 Nf6 4.Nc3 : 376 occurrences
1.c4 e6 2.Nf3 Nf6 3.Nc3 d5 4.d4 : 320 occurrences
1.d4 d5 2.c4 Nf6 3.Nc3 e6 4.Nf3 : 316 occurrences
1.d4 e6 2.Nf3 Nf6 3.c4 d5 4.Nc3 : 304 occurrences
1.c4 Nf6 2.Nf3 e6 3.Nc3 d5 4.d4 : 280 occurrences
1.Nf3 Nf6 2.d4 e6 3.c4 d5 4.Nc3 : 220 occurrences
1.c4 Nf6 2.Nc3 e6 3.d4 d5 4.Nf3 : 220 occurrences
1.d4 e6 2.c4 Nf6 3.Nc3 d5 4.Nf3 : 208 occurrences
1.c4 e6 2.d4 d5 3.Nf3 Nf6 4.Nc3 : 136 occurrences
1.c4 e6 2.Nc3 Nf6 3.Nf3 d5 4.d4 : 132 occurrences
1.c4 Nf6 2.d4 e6 3.Nf3 d5 4.Nc3 : 118 occurrences
1.c4 e6 2.d4 d5 3.Nc3 Nf6 4.Nf3 : 117 occurrences
1.c4 Nf6 2.Nf3 e6 3.d4 d5 4.Nc3 : 116 occurrences
1.c4 e6 2.d4 Nf6 3.Nf3 d5 4.Nc3 : 101 occurrences
1.Nf3 e6 2.c4 Nf6 3.Nc3 d5 4.d4 : 99 occurrences
1.c4 e6 2.Nf3 Nf6 3.d4 d5 4.Nc3 : 93 occurrences
1.d4 Nf6 2.c4 d5 3.Nc3 e6 4.Nf3 : 86 occurrences
1.Nf3 e6 2.c4 d5 3.d4 Nf6 4.Nc3 : 80 occurrences
1.d4 d5 2.c4 Nf6 3.Nf3 e6 4.Nc3 : 65 occurrences
1.c4 Nf6 2.d4 e6 3.Nc3 d5 4.Nf3 : 59 occurrences
1.Nf3 e6 2.d4 d5 3.c4 Nf6 4.Nc3 : 36 occurrences
1.c4 Nf6 2.Nc3 d5 3.d4 e6 4.Nf3 : 32 occurrences
1.c4 e6 2.d4 Nf6 3.Nc3 d5 4.Nf3 : 26 occurrences
1.d4 Nf6 2.c4 d5 3.Nf3 e6 4.Nc3 : 25 occurrences
1.c4 e6 2.Nc3 Nf6 3.d4 d5 4.Nf3 : 24 occurrences
1.Nf3 d5 2.c4 Nf6 3.d4 e6 4.Nc3 : 18 occurrences
1.Nf3 Nf6 2.c4 d5 3.d4 e6 4.Nc3 : 15 occurrences
1.Nf3 e6 2.c4 Nf6 3.d4 d5 4.Nc3 : 15 occurrences
1.Nf3 e6 2.d4 Nf6 3.c4 d5 4.Nc3 : 14 occurrences
1.c4 e6 2.Nc3 d5 3.Nf3 Nf6 4.d4 : 12 occurrences
1.Nf3 d5 2.c4 e6 3.Nc3 Nf6 4.d4 : 11 occurrences
1.c4 Nf6 2.d4 d5 3.Nc3 e6 4.Nf3 : 6 occurrences
1.c4 d5 2.d4 e6 3.Nf3 Nf6 4.Nc3 : 3 occurrences
1.c4 d5 2.d4 e6 3.Nc3 Nf6 4.Nf3 : 2 occurrences
1.c4 e6 2.Nf3 d5 3.Nc3 Nf6 4.d4 : 2 occurrences
1.Nf3 d5 2.c4 Nf6 3.Nc3 e6 4.d4 : 2 occurrences
1.c4 Nf6 2.d4 d5 3.Nf3 e6 4.Nc3 : 1 occurrence
1.c4 d5 2.d4 Nf6 3.Nc3 e6 4.Nf3 : 1 occurrence
1.Nf3 Nf6 2.c4 d5 3.Nc3 e6 4.d4 : 1 occurrence
1.c4 Nf6 2.Nf3 d5 3.d4 e6 4.Nc3 : 1 occurrence
1.c4 d5 2.Nc3 e6 3.d4 Nf6 4.Nf3 : 1 occurrence
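Out of curiosity, that 61 can be sanity-checked against a theoretical ceiling. The back-of-envelope sketch below is my own (it is not the method Tarrasch uses), and it assumes the only ordering constraint within each colour is that c4 must precede Nc3, because a knight sitting on c3 would block the c-pawn; since the moves alternate, any legal White ordering can combine with any Black ordering:

```python
from itertools import permutations

white_moves = ["d4", "c4", "Nf3", "Nc3"]
black_moves = ["d5", "e6", "Nf6"]

# c4 must come before Nc3, otherwise the knight blocks the c-pawn
white_orders = [p for p in permutations(white_moves)
                if p.index("c4") < p.index("Nc3")]
black_orders = list(permutations(black_moves))

# 12 legal White orderings x 6 Black orderings = 72 possible move orders
print(len(white_orders) * len(black_orders))
# -> 72
```

So 72 move orders are theoretically possible, and the test database actually exhibits 61 of them.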
As a chess player, I found this information pretty interesting. As I mentioned initially, I only obtained this information as a side-effect, calculated for internal, programmatic reasons. But having seen this example, I think I will make this “transpositions found” information available to the end user of the program.
One final point I’ll make is that there were a further 5 paths to the same position but with White to move! In these cases Black had played both …d6 and …d5, losing a tempo. I’d actually like a database with fewer games from very weak players, but that’s a topic for another day.