I'd like to understand taint analysis quantitatively. Blockchain.info offers a service that will calculate taint, but I've found no good explanation for how taint is calculated.
The best (and only) explanation I've found so far appears in the paper Anonymity of Bitcoin Transactions:

The taint analysis works by calculating the percentage of the amount of bitcoins that might origin from another address, thus revealing connections in the transaction graph. In the simplified example in Figure 4, A1 and A3 would have a taint of 75% and A2 a taint of 25%. However, it can only detect direct connections in the graph and does not consider any context information.
This explanation is confusing. Taint is measured between two pseudonyms (addresses). It's not a property of a particular pseudonym. The paper seems to be missing the part that says the taint scores for A1, A2, and A3 are relative to A4. If so, then the scores make sense.
However, it's not clear what would happen for more complex chains of ownership. For example, imagine another pseudonym, A5, that pays A1 5 BTC. What would be the taint score between A5 and A4?
I've seen this, but it doesn't discuss how to compute a taint score between two pseudonyms.
What I'd like to see is the outline of a step-by-step procedure for calculating taint, as done by Blockchain.info. If I had to guess, here's how the procedure would look:
- Find two pseudonyms, S (source) and T (target). Funds flow from S to T.
- Using the block chain, find a chain of ownership C_i from each coin controlled by S to T.
- For each chain of ownership C_i, find the lowest-valued coin transfer m_i.
- Sum all m_i, giving m.
- Sum the value of all outputs received through T, giving s.
- Taint is defined as m/s.
Using this procedure would give a taint score of 50% between A5 and A4 (2 / 4).
Is this correct?
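To make the guess concrete, here is a minimal Python sketch of the procedure above. It only illustrates the guess in this question and is not Blockchain.info's actual algorithm. The function name `guessed_taint` and the input representation are assumptions: the chains of ownership are taken as already extracted from the block chain, each given as a list of the transfer values along its hops. The numbers in the usage line are hypothetical and are not read off Figure 4.

```python
def guessed_taint(chains, received_by_target):
    """Taint between S and T under the guessed procedure (hypothetical helper).

    chains: list of chains of ownership from S to T; each chain is a
            non-empty list of transfer values (in BTC) along its hops.
    received_by_target: values (in BTC) of all outputs received through T.
    """
    # m: sum over chains of the lowest-valued transfer on each chain.
    m = sum(min(chain) for chain in chains)
    # s: total value of all outputs received through T.
    s = sum(received_by_target)
    if s == 0:
        raise ValueError("T has received no value, so taint is undefined")
    return m / s


# Hypothetical numbers: one chain whose smallest hop is 2 BTC,
# and a target that received 4 BTC in total.
print(guessed_taint([[5.0, 2.0]], [2.0, 2.0]))
```

Finding the chains themselves is the hard part and is left out here; the sketch only covers the arithmetic of the last four steps.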
I haven't gotten around to going to the source, but I too find the example confusing. If I had to come up with a metric, I'd measure taint as an attribute of UTXOs, not addresses. I.e., 1 BTC claimed stolen (and therefore 100% tainted) spent together with 0.5 clean BTC would result in a UTXO with a taint of 2/3. – Murch – 2015-06-01T00:32:35.633
Ah sorry, I started reading the paper and realized that it uses taint in a different context than the Bitcoin community usually does. Usually "taint" refers to the amount of coins traceable to a known theft; this paper, however, uses it as a term to measure address correlation. That's why I was confused, and I will remove the tainted-coins tag in a moment. – Murch – 2015-06-01T08:57:59.640
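The UTXO-level metric suggested in the first comment can be read as a value-weighted average of the taints of the inputs being spent together. The sketch below assumes that reading; `utxo_taint` is a hypothetical name, and this is the theft-tracing notion of taint from the comment, not the address-correlation measure used in the paper.

```python
def utxo_taint(inputs):
    """Value-weighted taint of an output (hypothetical helper).

    inputs: list of (value_btc, taint) pairs, with taint between 0 and 1.
    """
    total = sum(value for value, _ in inputs)
    if total == 0:
        raise ValueError("no value spent, so taint is undefined")
    # Tainted value divided by total value spent.
    return sum(value * taint for value, taint in inputs) / total


# The example from the comment: 1 BTC fully tainted plus 0.5 BTC clean.
print(utxo_taint([(1.0, 1.0), (0.5, 0.0)]))  # roughly 2/3
```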