I've been working on this for a while as part of a larger Python 2 library of Bitcoin tools. You can find it on GitHub.
Running ldb_parser.py produces a text file with every UTXO in the chainstate, one JSON object per line. (Note that this is a huge amount of data.)
You can then call the decode_utxo function on the "value" field of each JSON entry and store the result in a file to analyze it. (This file will be even bigger!)
Here is an example (to be run once ldb_parser.py has finished):
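    from bitcoin_tools.utils import load_conf_file, decode_utxo
    from json import loads, dumps

    fin_name = "utxos.txt"
    fout_name = "parsed_utxos.txt"

    # Load config file (provides data_path)
    cfg = load_conf_file()

    # Read the raw UTXOs from ldb_parser.py and write one decoded UTXO per line.
    # Both files are closed automatically when the block ends.
    with open(cfg.data_path + fin_name, 'r') as fin, \
            open(cfg.data_path + fout_name, 'w') as fout:
        for line in fin:
            # Each line is a JSON object; loads() ignores the trailing newline
            data = loads(line)
            utxo = decode_utxo(data["value"])
            fout.write(dumps(utxo) + '\n')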
Each line of parsed_utxos.txt will look like this:
{"coinbase": 0, "version": 1, "outs": [{"index": 1, "amount": 14250000, "out_type": 0, "data": "865e218ff25929eee880e0e3b6f95280b2d05443"}], "height": 468349}
{"coinbase": 0, "version": 1, "outs": [{"index": 0, "amount": 132000, "out_type": 1, "data": "0b2a00367244680f6da18acd861a08f0a89cb3b4"}], "height": 449294}
{"coinbase": 0, "version": 1, "outs": [{"index": 1, "amount": 2423800, "out_type": 1, "data": "7f172a63c49c5d03e3307d432bd6b784b69d0e0d"}, {"index": 2, "amount": 10000000, "out_type": 1, "data": "1d0c4b60e8270f9b6ca5f167f08a5466a0cee565"}], "height": 474328}
...
Each entry in outs is one output, and data holds that output's script data (for P2PKH outputs, the hash160 of the public key, i.e. the hash encoded in the address).
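As a quick illustration of that structure, the sketch below loads one line of parsed_utxos.txt (the third sample line above) and totals the amounts of its outputs. summarize_utxo and summarize_file are hypothetical helpers, not part of bitcoin_tools, and the amounts are assumed to be in satoshis, the unit Bitcoin Core stores.

```python
import json

# Third sample line of parsed_utxos.txt shown above
SAMPLE_LINE = (
    '{"coinbase": 0, "version": 1, "outs": ['
    '{"index": 1, "amount": 2423800, "out_type": 1, '
    '"data": "7f172a63c49c5d03e3307d432bd6b784b69d0e0d"}, '
    '{"index": 2, "amount": 10000000, "out_type": 1, '
    '"data": "1d0c4b60e8270f9b6ca5f167f08a5466a0cee565"}], '
    '"height": 474328}'
)


def summarize_utxo(line):
    """Return (height, total amount, output indexes) for one parsed entry."""
    utxo = json.loads(line)
    total = sum(out["amount"] for out in utxo["outs"])
    indexes = [out["index"] for out in utxo["outs"]]
    return utxo["height"], total, indexes


def summarize_file(path):
    """Lazily summarize every non-empty line of a parsed_utxos.txt-style file."""
    with open(path) as f:
        for line in f:
            if line.strip():
                yield summarize_utxo(line)


print(summarize_utxo(SAMPLE_LINE))  # (474328, 12423800, [1, 2])
```

summarize_file is a generator, so it never holds the whole file in memory, which matters given how large parsed_utxos.txt gets.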
There are a few things to keep in mind:
Not every UTXO is P2PKH, so what you find in data depends on the out_type field.
out_types 0 and 1 correspond to P2PKH and P2SH respectively, and carry 20 bytes of data (the hash160 of the public key for P2PKH and the script hash for P2SH).
out_types 2, 3, 4 and 5 correspond to P2PK outputs, and carry 33 bytes of data (1 byte for the public key type and 32 bytes for the key itself).
Any other out_type means the script is stored uncompressed, and the out_type value equals the script size plus the number of special scripts (nSpecialScripts, currently 6). This is the case for P2MS and non-standard outputs.
All this has been directly extracted from the Bitcoin Core source code.
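To make the out_type rules concrete, here is a minimal sketch that turns an (out_type, data) pair from parsed_utxos.txt back into a full scriptPubKey in hex, following the script compression scheme in Bitcoin Core's compressor.cpp. rebuild_script is a hypothetical helper, not part of bitcoin_tools. It assumes data is a hex string; for types 2 to 5 it only relies on the last 32 bytes (the x-coordinate), since it is not verified here whether decode_utxo keeps the leading type byte. Types 4 and 5 are uncompressed public keys that Bitcoin Core stores as their x-coordinate only, so y is recomputed on secp256k1 (4 means even y, 5 means odd y).

```python
# Hypothetical helper (not part of bitcoin_tools): rebuild a scriptPubKey (hex)
# from the (out_type, data) pair in parsed_utxos.txt. Python 2.7 and 3.

N_SPECIAL_SCRIPTS = 6  # nSpecialScripts in Bitcoin Core

# secp256k1 field prime; p % 4 == 3, so a square root is a single pow()
SECP256K1_P = 2 ** 256 - 2 ** 32 - 977


def _uncompressed_pubkey(x_hex, odd_y):
    """Recover the 65-byte uncompressed key (hex) from x and the parity of y."""
    x = int(x_hex, 16)
    rhs = (pow(x, 3, SECP256K1_P) + 7) % SECP256K1_P  # y^2 = x^3 + 7
    y = pow(rhs, (SECP256K1_P + 1) // 4, SECP256K1_P)
    if (y * y) % SECP256K1_P != rhs:
        raise ValueError("x is not on secp256k1")
    if (y % 2 == 1) != odd_y:
        y = SECP256K1_P - y  # pick the root with the requested parity
    return "04%064x%064x" % (x, y)


def _require_hex_len(data_hex, n_bytes, out_type):
    """Raise ValueError unless data_hex encodes exactly n_bytes bytes."""
    if len(data_hex) != 2 * n_bytes:
        raise ValueError("out_type %d expects %d bytes of data, got %d"
                         % (out_type, n_bytes, len(data_hex) // 2))


def rebuild_script(out_type, data_hex):
    data_hex = data_hex.lower()
    if out_type == 0:  # P2PKH: OP_DUP OP_HASH160 <20> OP_EQUALVERIFY OP_CHECKSIG
        _require_hex_len(data_hex, 20, out_type)
        return "76a914" + data_hex + "88ac"
    if out_type == 1:  # P2SH: OP_HASH160 <20> OP_EQUAL
        _require_hex_len(data_hex, 20, out_type)
        return "a914" + data_hex + "87"
    if out_type in (2, 3, 4, 5):  # P2PK: <pubkey> OP_CHECKSIG
        if len(data_hex) not in (64, 66):
            raise ValueError("P2PK data must be 32 or 33 bytes")
        x_hex = data_hex[-64:]  # the 32-byte x-coordinate
        if out_type in (2, 3):  # compressed key, prefix 0x02 / 0x03
            return "21%02x" % out_type + x_hex + "ac"
        # 4 / 5: uncompressed key stored as x only (4 = even y, 5 = odd y)
        return "41" + _uncompressed_pubkey(x_hex, odd_y=(out_type == 5)) + "ac"
    # Anything else is a raw script of out_type - nSpecialScripts bytes
    _require_hex_len(data_hex, out_type - N_SPECIAL_SCRIPTS, out_type)
    return data_hex
```

For example, rebuild_script(0, "865e218ff25929eee880e0e3b6f95280b2d05443") gives "76a914865e218ff25929eee880e0e3b6f95280b2d0544388ac", the standard P2PKH script for the first sample line. Recovering y with a single modular exponentiation works only because the secp256k1 prime is congruent to 3 mod 4.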
Also note that to use the library you need to install the Python dependencies listed in requeriments.txt and create a conf.py file that sets your chainstate path and data path (or modify the code so it doesn't use the config file).
I'm working on merging all this into the master branch (it's currently on dev), so links may change in the future. – sr-gi – 2017-08-30T16:34:56.777
Note that the database format for the UTXO set changes in Bitcoin Core 0.15 (it becomes easier). – Pieter Wuille – 2017-08-31T04:49:41.587
@PieterWuille Thank you for pointing this out, I'll update the code. – sr-gi – 2017-08-31T13:50:22.573
@sr-gi have you updated the code? – D L – 2017-09-03T23:42:26.547
Not yet, but it should definitely work for any version before 0.15. – sr-gi – 2017-09-04T01:22:17.993
@Denis the code has been updated with the decoder for v0.15. It's part of the dev branch and will be merged into master next week. – sr-gi – 2017-11-04T16:02:40.533
@sr-gi could you tell me where the UTXO set is computed and stored in your latest update? – D L – 2017-11-06T07:32:16.660
You can find it in the dev branch (it will be merged into master this week). The decoding function for 0.15 is here: https://github.com/sr-gi/bitcoin_tools/blob/dev/bitcoin_tools/analysis/status/utils.py#L178-L263 – sr-gi – 2017-11-06T15:47:45.957
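For reference on the 0.15 format mentioned in the comments: from 0.15 on, each output is its own LevelDB entry, keyed by 'C' + txid + VARINT(vout), and its value is VARINT(height * 2 + coinbase) followed by a compressed amount and a compressed script that uses the same out_type rules described in the answer. The sketch below decodes such a value under two assumptions: the value has already been XORed with the chainstate obfuscation key, and it is passed as hex. It follows Bitcoin Core's VARINT (serialize.h) and amount/script compression (compressor.cpp); it is not the decoder at the link above, and decode_coin_v015, read_varint and decompress_amount are hypothetical names.

```python
import binascii

N_SPECIAL_SCRIPTS = 6
# Data length (bytes) following each special out_type in the compressed script
SPECIAL_SCRIPT_SIZES = {0: 20, 1: 20, 2: 32, 3: 32, 4: 32, 5: 32}


def read_varint(buf, pos):
    """Decode Bitcoin Core's MSB base-128 VARINT at buf[pos]; return (n, new_pos)."""
    n = 0
    while True:
        ch = buf[pos]
        pos += 1
        n = (n << 7) | (ch & 0x7F)
        if ch & 0x80:
            n += 1  # continuation bytes carry an implicit +1
        else:
            return n, pos


def decompress_amount(x):
    """Inverse of Bitcoin Core's CompressAmount."""
    if x == 0:
        return 0
    x -= 1
    e = x % 10  # number of trailing zeros that were stripped
    x //= 10
    if e < 9:
        d = (x % 9) + 1  # last non-zero digit
        x //= 9
        n = x * 10 + d
    else:
        n = x + 1
    return n * 10 ** e


def decode_coin_v015(value_hex):
    """Decode one de-obfuscated 0.15+ chainstate value given as hex."""
    buf = bytearray(binascii.unhexlify(value_hex))
    code, pos = read_varint(buf, 0)
    height, coinbase = code >> 1, code & 1
    amount_c, pos = read_varint(buf, pos)
    out_type, pos = read_varint(buf, pos)
    if out_type < N_SPECIAL_SCRIPTS:
        size = SPECIAL_SCRIPT_SIZES[out_type]
    else:
        size = out_type - N_SPECIAL_SCRIPTS  # raw, uncompressed script
    data = buf[pos:pos + size]
    if len(data) != size:
        raise ValueError("truncated script data")
    return {"height": height, "coinbase": coinbase,
            "amount": decompress_amount(amount_c),
            "out_type": out_type,
            "data": binascii.hexlify(bytes(data)).decode("ascii")}
```

For instance, the value "033200" followed by a 20-byte hash decodes to height 1, a coinbase output of 5000000000 satoshis, out_type 0 (P2PKH). Note that, unlike the pre-0.15 JSON above, there is no list of outs: the output index lives in the key, not in the value.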