As far as I understand it, a bloom filter is defined by a set of target hashes and a desired false-positive rate. When the blockchain is filtered, all of the target transactions are guaranteed to match, but, by design, a number of false positives will also be returned to the client.
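As a rough illustration of how the target false-positive rate shapes a filter, here is a minimal Python sketch. It uses the standard textbook sizing formulas (m bits and k hash functions for n items at false-positive rate p). As a simplifying assumption, it uses SHA-256 with an index prefix in place of the seeded MurmurHash3 functions that Bitcoin's BIP 37 filters use. `optimal_params`, `BloomFilter` and `tx` are hypothetical names.

```python
import hashlib
import math


def optimal_params(n_items: int, fp_rate: float) -> tuple[int, int]:
    """Textbook Bloom filter sizing: m = -n ln p / (ln 2)^2 bits, k = (m / n) ln 2 hashes."""
    m = math.ceil(-n_items * math.log(fp_rate) / (math.log(2) ** 2))
    k = max(1, round(m / n_items * math.log(2)))
    return m, k


class BloomFilter:
    def __init__(self, n_items: int, fp_rate: float):
        self.m, self.k = optimal_params(n_items, fp_rate)
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item: bytes):
        # Assumption: the i-th hash function is SHA-256 over (i, item), reduced mod m.
        for i in range(self.k):
            digest = hashlib.sha256(i.to_bytes(4, "little") + item).digest()
            yield int.from_bytes(digest[:8], "little") % self.m

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes) -> bool:
        # An item matches only if all k of its bits are set, so added items always match.
        return all(self.bits[pos // 8] >> (pos % 8) & 1 for pos in self._positions(item))


def tx(n: int) -> bytes:
    # Hypothetical stand-in for a transaction identifier.
    return n.to_bytes(4, "little")


targets = [4, 8, 15]
bf = BloomFilter(n_items=len(targets), fp_rate=0.1)
for t in targets:
    bf.add(tx(t))

matched = [n for n in range(1, 101) if tx(n) in bf]
print(bf.m, bf.k)  # 15 3
print(matched)     # always includes 4, 8 and 15; any other entries are false positives
```

The filter never produces false negatives. Lowering `fp_rate` increases m (and usually k), which shrinks the set of colliding non-targets.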
Suppose that an attacker is able to monitor many of the bloom filters that an SPV client sends to its peers. Could he eventually eliminate all the false positives by intersecting the sets of transactions that match each filter?
As an example, let the set of all transactions be the integers from 1 to 100, and suppose that a client is interested in transactions 4, 8 and 15. The first time he connects to the network, he transmits a bloom filter matching (1, 4, 7, 8, 15, 27, 44, 73); the second time, his filter matches (3, 4, 6, 8, 15, 27, 66). An attacker could immediately narrow the possible transactions down to the intersection (4, 8, 15, 27). After several more connections, he would likely be left with exactly the correct answer, (4, 8, 15).
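The attack described above amounts to a set intersection. Below is a minimal Python sketch that reproduces the two-filter example and then simulates further connections. The simulation assumes each new filter matches every non-target independently with probability `fp_rate`, as if built with fresh randomness on each connection. `intersect_matches` and `simulate` are hypothetical helpers.

```python
import random
from functools import reduce


def intersect_matches(observed):
    """Intersect the match sets of every observed filter.

    Targets match every filter (no false negatives), so they always survive;
    a false positive survives only if it matched every single filter.
    """
    return reduce(lambda acc, s: acc & s, (set(m) for m in observed))


# The two filters from the question.
first = {1, 4, 7, 8, 15, 27, 44, 73}
second = {3, 4, 6, 8, 15, 27, 66}
print(sorted(intersect_matches([first, second])))  # [4, 8, 15, 27]


def simulate(targets, universe, fp_rate, rounds, seed=0):
    # Assumption: each filter's false positives are drawn independently per round.
    rng = random.Random(seed)
    observed = []
    for _ in range(rounds):
        false_pos = {x for x in universe if x not in targets and rng.random() < fp_rate}
        observed.append(set(targets) | false_pos)
    return intersect_matches(observed)


print(sorted(simulate({4, 8, 15}, range(1, 101), fp_rate=0.1, rounds=5)))
```

Under that independence assumption, the expected number of false positives surviving r filters is (N − t)·p^r. With N = 100 transactions, t = 3 targets and p = 0.1, that is 97 × 0.1^5 ≈ 0.001 after five connections.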
Am I misunderstanding how bloom filters work (for example, do the false positives stay the same between connections?), or is this a theoretical or even a practical concern?
The Bitcoin specification of bloom filters seems to include a random input to the hash functions (nTweak). Doesn't this also influence the false positives? – lxgr – 2014-04-27T21:52:36.257
@lxgr From a quick look through the code, it seems to me that the nTweak value is meant to help spread out the use of the hash functions and ultimately reduce the number of false positives you get (further reducing the security value that the bloom filter might provide, but increasing its effectiveness as a cache). You're right, though: the implementation is non-canonical in the sense that successive calls are likely to return different, random false positives. – blueberryfields – 2014-04-28T05:24:08.147
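To make the point about the tweak concrete, here is a sketch that builds two filters over the same targets that differ only in the tweak mixed into the hash functions. It is a hypothetical illustration. The hash (SHA-256 over tweak, index and item) is an assumption standing in for the seeded MurmurHash3 that BIP 37 specifies, and the sizes m = 15, k = 3 are arbitrary. Changing the tweak changes which bits each non-target lands on, so the two filters generally report different false positives, while the targets match both.

```python
import hashlib


def positions(item: bytes, k: int, m: int, tweak: int):
    # Hypothetical seeded hash (not BIP 37's MurmurHash3): the tweak is mixed into
    # every hash input, so changing it moves each item to different bit positions.
    for i in range(k):
        data = tweak.to_bytes(4, "little") + i.to_bytes(4, "little") + item
        yield int.from_bytes(hashlib.sha256(data).digest()[:8], "little") % m


def matches(targets, universe, k, m, tweak):
    # Set the targets' bits, then report every item whose k bits are all set.
    bits = {p for t in targets for p in positions(t, k, m, tweak)}
    return {x for x in universe if all(p in bits for p in positions(x, k, m, tweak))}


def tx(n: int) -> bytes:
    return n.to_bytes(4, "little")  # hypothetical transaction identifier


def decode(items):
    return sorted(int.from_bytes(x, "little") for x in items)


universe = {tx(n) for n in range(1, 101)}
targets = {tx(n) for n in (4, 8, 15)}

a = matches(targets, universe, k=3, m=15, tweak=1)
b = matches(targets, universe, k=3, m=15, tweak=2)
print(decode(a))
print(decode(b))
print(decode(a & b))  # targets always survive; other survivors collided under both tweaks
```

This is exactly the situation the question worries about. If the client picks a fresh tweak per connection without changing its targets, intersecting the observed match sets strips away false positives.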