I'm doing some analysis on the UTXO set by reading from the chainstate database.
I was following the format documented in https://github.com/bitcoin/bitcoin/blob/d4a42334d447cad48fb3996cad0fd5c945b75571/src/coins.h#L19-L34:
/** pruned version of CTransaction: only retains metadata and unspent transaction outputs
*
* Serialized format:
* - VARINT(nVersion)
* - VARINT(nCode)
* - unspentness bitvector, for vout[2] and further; least significant byte first
* - the non-spent CTxOuts (via CTxOutCompressor)
* - VARINT(nHeight)
*
* The nCode value consists of:
* - bit 1: IsCoinBase()
* - bit 2: vout[0] is not spent
* - bit 4: vout[1] is not spent
* - The higher bits encode N, the number of non-zero bytes in the following bitvector.
* - In case both bit 2 and bit 4 are unset, they encode N-1, as there must be at
* least one non-spent output).
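To make the nCode layout above concrete, here is a minimal sketch that splits an already-decoded nCode integer into its fields. The function name `decode_ncode` and the returned tuple are illustrative assumptions, not part of Bitcoin Core; the "bit 1 / bit 2 / bit 4" labels in the comment are read as bit values (masks 1, 2 and 4).

```python
def decode_ncode(n_code):
    """Split a decoded nCode integer into its fields (hypothetical helper)."""
    is_coinbase = bool(n_code & 1)    # "bit 1": IsCoinBase()
    vout0_unspent = bool(n_code & 2)  # "bit 2": vout[0] is not spent
    vout1_unspent = bool(n_code & 4)  # "bit 4": vout[1] is not spent

    # The higher bits encode N, the number of NON-ZERO bytes in the bitvector.
    n_nonzero = n_code >> 3
    # If neither vout[0] nor vout[1] is unspent, the higher bits hold N-1.
    if not (vout0_unspent or vout1_unspent):
        n_nonzero += 1
    return is_coinbase, vout0_unspent, vout1_unspent, n_nonzero
```

Note that N counts only the non-zero bytes, so it is a lower bound on the bitvector length rather than the length itself.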
The parser works fine when the number of UTXOs is small. However, it fails for the following tx (which has 2501 outputs):
2540b961f4a0b231db3bc5a23608307394eae037d8afd0462e9b794e02f00000
For the key 'c' + 2540b961f4a0b231db3bc5a23608307394eae037d8afd0462e9b794e02f00000, the (deobfuscated) value in chainstate looks like this:
01907050e140254150443a0c280004...
Here 01 is the version and 9070 is the nCode, which tells whether it's a coinbase tx, the unspentness of vout[0] and vout[1], and the length of the following unspentness bitvector for vout[2:]. According to blockchain.info there are 2501 outputs, so I expected about (2501 - 2) / 8 ≈ 312 bytes to follow. However, parsing 9070 as a varint, removing the last three bits, and adding 1 only gives me 2288 / 8 + 1 = 287. (I got 2288 from (0x90 - 0x80 + 1) * 0x80 + 0x70, which is the MSB base-128 varint that Bitcoin Core uses.)
Did I miss something here? How exactly does one parse the varint?
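For reference, a minimal sketch of the parsing steps discussed here, assuming the MSB base-128 varint described above and the rule from the quoted comment that N counts only the non-zero bytes of the bitvector. The helper names `read_varint` and `read_unspent_bitvector` are hypothetical, and the bit order within each byte (lowest bit is the lowest vout index) is an assumption based on how Bitcoin Core fills its availability vector.

```python
def read_varint(data, pos=0):
    """Decode one MSB base-128 varint; return (value, next_position)."""
    n = 0
    while True:
        byte = data[pos]
        pos += 1
        n = (n << 7) | (byte & 0x7F)
        if byte & 0x80:
            n += 1  # continuation byte: add one before reading the next byte
        else:
            return n, pos


def read_unspent_bitvector(data, pos, n_nonzero):
    """Read bytes until n_nonzero non-zero bytes were seen.

    Returns (list of unspent vout indices, next_position).
    Zero bytes are consumed but do not count towards n_nonzero.
    """
    unspent = []
    index = 2  # the bitvector starts at vout[2]
    while n_nonzero > 0:
        byte = data[pos]
        pos += 1
        for bit in range(8):
            if byte & (1 << bit):
                unspent.append(index + bit)
        index += 8
        if byte != 0:
            n_nonzero -= 1
    return unspent, pos


# The (truncated) value from the question: only the prefix is available here.
prefix = bytes.fromhex("01907050e140254150443a0c280004")
version, pos = read_varint(prefix)
n_code, pos = read_varint(prefix, pos)
n_nonzero = (n_code >> 3) + (0 if n_code & 6 else 1)
print(version, n_code, n_nonzero)
```

Under these assumptions 287 is the number of non-zero bytes, so the bitvector can be longer than 287 bytes whenever it contains zero bytes (the prefix above already has one, the `00` near the end).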
Hmm, that makes sense. I've read lots of questions and answers by you on this topic. Thanks so much! – h__ – 2017-05-11T22:37:53.887
May I also ask how CTxOuts are serialized? I get that most of them are of the form CompressedAmount + 00 + hash160 of pubkey, but there are lots of nonstandard transactions which are hard to parse. – h__ – 2017-05-11T22:46:07.087
https://github.com/bitcoin/bitcoin/blob/v0.14.1/src/compressor.h#L17L27 – Pieter Wuille – 2017-05-11T22:51:17.663
It is also part of the code I've linked in the answer, let me extend the answer to include that part of the code (check the comments) – sr-gi – 2017-05-11T22:53:23.220
Found it, thanks. https://github.com/sr-gi/bitcoin_tools/blob/d679c41183e315686729ca6b8bd79c7daa499b4d/utils/utils.py#L363-L388 – h__ – 2017-05-11T22:57:28.333
Exactly, there it is. – sr-gi – 2017-05-11T22:59:41.100
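As an illustration of the CompressedAmount part mentioned in the comments, here is a minimal Python sketch of amount decompression, written from memory of the scheme in Bitcoin Core's compressor code. The function name `decompress_amount` is an assumption; check it against the linked sources before relying on it.

```python
def decompress_amount(x):
    """Invert the amount compression used for CTxOut values (sketch)."""
    if x == 0:
        return 0
    x -= 1
    # x = 10 * (9*n + d - 1) + e   or   x = 10 * (n - 1) + 9
    e = x % 10  # number of trailing decimal zeros that were stripped
    x //= 10
    if e < 9:
        d = (x % 9) + 1  # last non-zero decimal digit
        x //= 9
        n = x * 10 + d
    else:
        n = x + 1
    return n * 10 ** e


print(decompress_amount(9))   # 1 BTC expressed in satoshis
print(decompress_amount(50))  # 50 BTC expressed in satoshis
```

The byte after the amount then selects the script type (00 for a pay-to-pubkey-hash output, as in the comment above); the other cases are covered in the linked code.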