I'm trying to write a parser for blk*.dat files from scratch. Right now, I can read blocks, extract the header fields and calculate the correct block hash based on the header fields.
I just tried to run the parser through all blk-files and it failed after successfully reading the first ~300 blk-files. When inspecting the failing file with a hex-editor, I found two "magic byte fields", only separated by 4 other bytes.
Since the mandatory block-header is supposed to be 80 bytes in length, I'm now quite confused.
I found three questions that are probably related:
- blk file error when reading - did something change in the format?
- How can you tell if you're at the end of an incomplete blk*.dat file?
- Zeros in blk00*.dat files
My guess, based on these other questions, is the following:
- This is an 'incomplete block'
- This happens when a block can only be partially downloaded, or written to disk.
- Deleting the affected blk files and re-downloading parts of the blockchain doesn't make sense, since this can be expected to happen again.
- The parser should be able to skip these blocks.
- The overall blk file is not missing actual data and is not corrupt.
- The 'incomplete block' will be downloaded and written again further down in the blk file (or to another blk file).
Which of my assumptions are correct, and which are not? Am I missing something?

The node was probably killed while writing to disk. – Anonymous – 2018-02-14T20:45:43.673
If you go on 0xd0 bytes further, is there another magic 4-bytes? Theoretically the f9beb4d9 could occur within the data of the block itself, so you can't be sure that any occurrence marks the start of a block. You should be using the lengths. Though the first bytes of the block should be the version number, I think, and I am not sure whether f9beb4d9 would be valid there. – Nate Eldredge – 2018-02-15T02:58:29.297
@eponymous I deleted and re-downloaded all blk files. This time, I made sure that the node didn't stop a single time. After that, I had no issues reading all of the blk files. This supports your theory. – forgemo – 2018-02-15T18:06:07.307
Yeah. My files are in a strange state due to many forced node restarts during IBD. The software is fine with this because there’s an external database of pointers to the blocks on disk which can just skip the corruption. – Anonymous – 2018-02-15T18:07:32.293
@NateEldredge No, there is not. It's also very unlikely for those 4 bytes to occur within the data. https://bitcoin.stackexchange.com/questions/2337/how-was-the-magic-network-id-value-chosen – forgemo – 2018-02-15T18:10:13.143
Unlikely to occur by chance, perhaps, but there is nothing preventing anyone from intentionally inserting those 4 bytes in a transaction (via OP_RETURN, for instance). A miner who was feeling ornery could also twiddle the coinbase transaction until those bytes appeared in the Merkle root; it would take something on the order of 100 million tries, which is not much. In that case you'd see those bytes in the block header. – Nate Eldredge – 2018-02-15T18:30:44.217
Given your additional description, I agree that this is probably not the issue here, but it shows that your code definitely needs to handle this case properly. – Nate Eldredge – 2018-02-15T18:31:34.173
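Handling both cases raised in this thread (a record header with no block behind it, and magic bytes appearing inside block data) can be sketched roughly as follows. This is only a minimal sketch, not how any particular node software does it. It assumes mainnet files, where each record is the 4 magic bytes `f9beb4d9`, a 4-byte little-endian length, and then the block itself. The names `iter_blocks`, `header_has_valid_pow`, `bits_to_target` and `MAINNET_MAX_TARGET` are hypothetical, and the proof-of-work test is an assumption chosen here as a cheap plausibility check; it was not proposed by anyone in the thread.

```python
import hashlib
import struct

MAGIC = bytes.fromhex("f9beb4d9")   # assumption: mainnet blk files
HEADER_LEN = 80                     # fixed size of a block header
MAINNET_MAX_TARGET = 0xFFFF << 208  # target encoded by nBits 0x1d00ffff


def bits_to_target(bits: int) -> int:
    """Expand the compact nBits field into a full target (0 if negative)."""
    if bits & 0x00800000:           # sign bit set: not a valid target
        return 0
    exponent = bits >> 24
    mantissa = bits & 0x007FFFFF
    if exponent <= 3:
        return mantissa >> (8 * (3 - exponent))
    return mantissa << (8 * (exponent - 3))


def header_has_valid_pow(header: bytes, max_target: int = MAINNET_MAX_TARGET) -> bool:
    """Heuristic: does this 80-byte candidate header hash below its own target?"""
    if len(header) != HEADER_LEN:
        return False
    (bits,) = struct.unpack_from("<I", header, 72)   # nBits sits at offset 72
    target = bits_to_target(bits)
    digest = hashlib.sha256(hashlib.sha256(header).digest()).digest()
    return 0 < target <= max_target and int.from_bytes(digest, "little") <= target


def iter_blocks(data: bytes, max_target: int = MAINNET_MAX_TARGET):
    """Yield (offset, block_bytes) for every plausible record in one blk file."""
    pos = 0
    while True:
        pos = data.find(MAGIC, pos)
        if pos < 0 or pos + 8 > len(data):
            return                                   # no further complete record header
        (size,) = struct.unpack_from("<I", data, pos + 4)
        start, end = pos + 8, pos + 8 + size
        header = data[start:start + HEADER_LEN]
        if size >= HEADER_LEN and end <= len(data) and header_has_valid_pow(header, max_target):
            yield pos, data[start:end]
            pos = end        # trust the length: magic bytes inside the block are skipped
        else:
            pos += 1         # orphaned or truncated record: resume the search one byte later
```

Because an accepted record is skipped by its length, magic bytes embedded in a transaction or in the Merkle root are never mistaken for a record start. A record header that is followed directly by another magic field fails the plausibility check, so the scan resumes at the next candidate. Reading a whole file into memory (for example `iter_blocks(open(path, "rb").read())`) keeps the sketch short; a real parser would probably stream or memory-map the file.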
Thanks @NateEldredge, I didn't think of somebody generating those bytes on purpose. I wonder if this has already been tried? You are absolutely right, though. My parser has to handle this properly. – forgemo – 2018-02-15T19:36:52.350
@forgemo: I'd be kind of surprised if it hasn't. Maybe I'll search the blockchain sometime to see. If nobody has done it then maybe I will :-) – Nate Eldredge – 2018-02-15T19:39:50.930
@NateEldredge I couldn't resist searching for it. :) As you already guessed, somebody did it. The following block from 2012 seems to contain such a case: https://blockchain.info/block/00000000000005D7E684BEB913BD73FDC33BFD06C1FDF247E599F4D9D6061B91?format=hex This further underlines your suggestion to handle these cases properly. – forgemo – 2018-02-15T20:17:07.827
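A search like the one described in the last comment could look roughly like this. It is a small sketch that assumes the raw bytes of a single block are already in memory; `magic_offsets` is a hypothetical helper name, and this is not the code that was actually used for the search.

```python
MAGIC = bytes.fromhex("f9beb4d9")  # mainnet magic bytes


def magic_offsets(block: bytes, magic: bytes = MAGIC) -> list[int]:
    """Return every offset at which the magic bytes occur inside one raw block."""
    offsets = []
    pos = block.find(magic)
    while pos >= 0:
        offsets.append(pos)
        pos = block.find(magic, pos + 1)   # continue just past the previous hit
    return offsets
```

A hit that lies entirely within the first 80 bytes is in the block header (for example in the Merkle root); later hits are in the transaction data.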