Reliable, efficient way to parse the blockchain into a SQL database

19

4

I need to generate a SQL database (preferably SQLite3) of every transaction that has occurred in the blockchain. Is there an efficient way to do this using open-source software? I'm attempting to do it myself using a Haskell script that makes system calls to bitcoind but I'm having trouble making it sufficiently fast.

Mike

Posted 2013-06-17T17:20:20.730

Reputation: 395

Does it have to be SQL or can you use LevelDB?Dominic Tancredi 2013-06-18T02:07:07.983

I know that bitcoind (if you changed the settings, which I have) creates a LevelDB database for its own use. However, there is very little documentation on LevelDB in general, and even less in my language of choice (Haskell). I want something that I can easily use in a clearly correct way in code that analyzes the data, and SQL seems like the only option for that.Mike 2013-06-18T15:40:24.777

Answers

5

I created a fork of blockparser with a 'csvdump' command that will generate a details dump of the blockchain in CSV format. There are also some support scripts in that repository for creation, population and periodic update of a PostgreSQL database.

Be aware that the resultant database is pretty large; you might want to consider PostgreSQL over SQLite for this one.

jgm

Posted 2013-06-17T17:20:20.730

Reputation: 1 487

I remember blockparser being slow as sin for single transaction and account data when I first tried it, but I'll give the CSV dump a shot and see how long it takes. Do you remember how long it took for you? If so, what were the specs of the machine you ran it on?Mike 2013-06-18T15:44:10.327

blockparser is relatively CPU and memory intensive. I just carried out a full csvdump on my machine and it took 9 mins, maxing out at just under 9GB memory used. It will definitely help if you have a decent CPU and SSDs for storage.jgm 2013-06-18T16:38:33.133

I'm currently running the SQL dump. It's much faster than Bitcoin-Abe; it looks like it's only going to take about five hours. However, I wonder whether it parses the transaction data or just leaves it in raw hex.Mike 2013-06-18T18:01:15.183

Depends what you mean by 'data'. Information is supplied on individual transaction inputs and outputs, but there is no work carried out to parse the signatures.jgm 2013-06-18T18:03:05.160

This works great, but I have one qualm: why don't the transactions have timestamps?Mike 2013-06-21T23:43:26.310

Individual transactions do not have timestamps in the blockchain. See https://en.bitcoin.it/wiki/Transaction for details

jgm 2013-06-22T08:12:58.723

I can't find another way to contact you, but you should consider changing the sql.cpp file in that fork to just make temporary CSV dumps and then parse them into a database using the MariaDB (or equivalent) LOAD DATA INFILE command. That would take less than an hour on a weak machine, where as sql.cpp takes days. I'd help, but I don't know C++.Mike 2013-06-24T22:28:22.583

Haven't looked at the existing sql.cpp much because it didn't gather all the info that I needed. Feel free to fork znort's code and send him a pull request.jgm 2013-06-24T23:26:54.987

Also, have you noticed that InputTxHash and InputTxIndex are always null in outputs.csv?Mike 2013-06-27T21:46:15.793

Yes they will be. You'll need to post-process the data to link the inputs to the outputs. There is some example SQL to do this for PostgreSQL somewhere in the database load files.jgm 2013-06-27T22:07:53.110

Would h2 work? I am interested in storing only the UTXO set, indexed by addresses.Jus12 2017-06-18T18:18:26.223

3

You can always use BlockchainSQL Server for SQL querying the Bitcoin blockchain.

The web version is free, the server is commercial.

Disclaimer: I am the developer.

Herman Schoenfeld

Posted 2013-06-17T17:20:20.730

Reputation: 131

Website seems downShayan 2017-12-21T06:15:34.907

2

Possibly checkout ABE - Alternative Block Parser. It uses python and a SQL-based database but you can hook up other databases based on the driver.

https://github.com/jtobey/bitcoin-abe

Dominic Tancredi

Posted 2013-06-17T17:20:20.730

Reputation: 489

This is a legitimate option. However, it's getting to be unrealistically slow as the blockchain grows. With SQLite3, it now takes days, from what I can tell.Mike 2013-06-18T17:59:49.473

Well start it, then have a client that keeps it up to date. Yes, it'll grow, as you're asking for a database that will encompass the blockchain. There's no immediate solution to that unless you want a database that has a reference to points in the chain. Did you want something "fast"? Use LevelDB.Dominic Tancredi 2013-06-18T21:03:11.030

2Or mcdee's blockparser.Mike 2013-06-27T22:39:54.993

1

To transfer to a SQL Server database you can try BitcoinDatabaseGenerator.
You can download it from https://github.com/ladimolnar/BitcoinDatabaseGenerator/releases
Sources and wiki: https://github.com/ladimolnar/BitcoinDatabaseGenerator

The transfer is very fast. At the time of writing this, the transfer of 266 blockchain files totaling 33 GB took less than one hour and 30 minutes on my development machine. Of course, the time will vary depending on hardware and configuration.

Ladi

Posted 2013-06-17T17:20:20.730

Reputation: 149

is it working under linux?jangorecki 2015-06-08T09:53:04.023

BitcoinDatabaseGenerator is Windows only.Ladi 2015-06-09T21:05:23.973