What complexity class is Bitcoin's proof-of-work (hashcash) in?

To formulate this question precisely, I will define an idealized hypothetical "perfect" hash function H(n) which has nice scalability properties, and will formulate a problem PERFECT HASHCASH in terms of that, understanding that practical considerations may end up yielding only an approximation of this ideal.

To keep it simple, we will say that our hash function H(n) takes as input a single natural number n. Then we say that H(n) is a perfect hash function iff:

  1. H(n) maps each natural number to an infinite binary sequence, and any initial segment s of that sequence can be computed in time polynomial in the sizes of n and s (making it a sponge function).
  2. For any initial segment of length d, the set of all natural numbers n such that H(n) shares that initial segment has natural density = 1/(2^d).

The first condition formalizes the scalability of our function, and the second formalizes the idea that we want all hashes to appear roughly "equally often" as an output. Other than that, our perfect hash function is a black box, and we don't care much about exactly how it works, so long as it meets the above properties, as well as the usual desiderata applying to hash functions (easy to compute, hard to invert, hard to find collisions, etc).
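To make the two conditions concrete, here is a minimal Python sketch that uses SHAKE-256 from the standard library as a stand-in for H. This is purely an assumption for illustration: SHAKE-256 is an extendable-output (sponge) function, so a longer output extends a shorter one, but nobody has proven that it satisfies the natural-density condition. The names `H` and `prefix_frequency` are made up for this example.

```python
import hashlib


def H(n: int, bits: int) -> str:
    """Return the first `bits` bits of a stand-in for H(n) as a '0'/'1' string.

    Assumption: SHAKE-256 plays the role of the idealized perfect hash.
    """
    # Encode n as a minimal big-endian byte string (at least one byte).
    data = n.to_bytes(max(1, (n.bit_length() + 7) // 8), "big")
    # Ask for just enough output bytes to cover the requested bits.
    nbytes = max(1, (bits + 7) // 8)
    digest = hashlib.shake_256(data).digest(nbytes)
    return "".join(f"{byte:08b}" for byte in digest)[:bits]


def prefix_frequency(prefix: str, limit: int) -> float:
    """Fraction of n in [0, limit) whose hash starts with `prefix`.

    Condition 2 says this should tend to 1/2**len(prefix) as limit grows.
    """
    hits = sum(1 for n in range(limit) if H(n, len(prefix)) == prefix)
    return hits / limit
```

Calling `prefix_frequency("00", 10_000)` gives an empirical estimate that can be compared against the ideal value 1/4; a finite sample can of course only suggest, not establish, the density property.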

Predicated on the assumption that a perfect hash function exists, we can now define the problem PERFECT HASHCASH as follows: PERFECT HASHCASH takes as input a perfect hash function H, a natural number n, and an all-zeroes vector 0^d of length d, which can be thought of as a unary representation of d. A solution to PERFECT HASHCASH consists of an n and d such that H(n) starts with 0^d.
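As a rough illustration of the problem statement, the following Python sketch implements a verifier and a brute-force solver. It assumes SHAKE-256 as a stand-in for the perfect hash function, which is not known to satisfy the definition above; `H_prefix`, `verify` and `solve` are hypothetical names chosen for this example.

```python
import hashlib
from itertools import count


def H_prefix(n: int, bits: int) -> str:
    """First `bits` bits of a stand-in hash of n (assumption: SHAKE-256)."""
    data = n.to_bytes(max(1, (n.bit_length() + 7) // 8), "big")
    nbytes = max(1, (bits + 7) // 8)
    digest = hashlib.shake_256(data).digest(nbytes)
    return "".join(f"{byte:08b}" for byte in digest)[:bits]


def verify(n: int, d: int) -> bool:
    """Check a candidate solution: does H(n) start with d zero bits?"""
    return H_prefix(n, d) == "0" * d


def solve(d: int) -> int:
    """Brute force: try n = 0, 1, 2, ... and return the first solution."""
    for n in count():
        if verify(n, d):
            return n
```

Checking a solution costs one hash evaluation, which is the "easy to verify" half of the problem. The search loop is the hard half: if the outputs behave as the density condition suggests, it needs on the order of 2^d evaluations, which is exponential in the length d of the unary input 0^d.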

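Given those inputs, it is clear that PERFECT HASHCASH is in the complexity class TFNP, since this is a function problem and a solution is guaranteed to exist: by property 2, the set of n whose hash starts with 0^d has positive natural density 1/(2^d), so it is nonempty.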

Can we also identify PERFECT HASHCASH as a member of any complexity class finer than TFNP?

Could it perhaps be in PPP? PPA? PPAD? Something else?

For background, see Complexity class on Wikipedia.


EDIT: the above question has been overhauled, as in the way that I originally formulated it I assumed that SHA256 is what I'm now calling a perfect hash function. Many people have noted in the comments that this may not be true, so rather than place the emphasis in this question on whether SHA256 specifically has the nice scaling properties we want, I defined an idealized hash function that we hope SHA256 at least approximates nicely enough for real-world purposes, and rephrased the question in terms of that.

As a final note to clear up any potential confusion, to make PERFECT HASHCASH resemble real Hashcash, we'd have to make one more assumption: that there exists some way to start with a block of data (an email, a Bitcoin block, etc) and somehow derive a characteristic perfect hash function from that, perhaps by "salting" a different perfect hash function in such a way that the result is also a perfect hash function. So in the case of a "perfect Bitcoin," all of the miners on the Bitcoin network would be working with their own unique perfect hash functions H'(n) which are somehow tied to the block they're working on, and each miner would simply try H'(0), H'(1), H'(2), ... in order until they find something starting with enough 0's. Each H' would be a different input to PERFECT HASHCASH.
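One way to picture the "salting" step is the Python sketch below. It is only an illustration under stated assumptions: SHAKE-256 stands in for the underlying perfect hash function, the block data is mixed in by simple length-prefixed concatenation, and `derive_hash` and `mine` are hypothetical names. Whether salting of this kind preserves the perfect-hash properties is exactly the extra assumption made above, not something the code demonstrates.

```python
import hashlib
from itertools import count
from typing import Callable

HashFn = Callable[[int, int], str]


def derive_hash(block: bytes) -> HashFn:
    """Derive a block-specific hash H' from block data (assumed construction)."""

    def H_prime(n: int, bits: int) -> str:
        nonce = n.to_bytes(max(1, (n.bit_length() + 7) // 8), "big")
        # Length-prefix the block so every (block, nonce) pair encodes uniquely.
        data = len(block).to_bytes(8, "big") + block + nonce
        nbytes = max(1, (bits + 7) // 8)
        digest = hashlib.shake_256(data).digest(nbytes)
        return "".join(f"{byte:08b}" for byte in digest)[:bits]

    return H_prime


def mine(H_prime: HashFn, d: int) -> int:
    """Try H'(0), H'(1), H'(2), ... until the output starts with d zero bits."""
    for n in count():
        if H_prime(n, d) == "0" * d:
            return n
```

For example, `mine(derive_hash(b"some block data"), 12)` searches for a nonce for that particular block; a miner working on different block data gets a different H' and, in general, a different solution.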

Mike Battaglia

Posted 2013-10-16T01:44:19.440

Reputation: 273

+1 though not everyone on this site is a cryptographer; it may be beneficial to describe the acronyms (FNP, TFNP, etc) or to ask on crypto.stackoverflow.com – goodguys_activate 2013-10-16T14:09:20.230

I think it is the responsibility of the asker to make his question understandable to potential experts answering the question; giving all background information shouldn't be necessary in every question, it would inflate questions too much. – Murch 2013-10-25T10:33:53.723

Polynomial in what value? What's playing the role of input size here? – Nate Eldredge 2014-02-01T20:47:38.920

Nate: Difficulty. – Mike Battaglia 2014-02-02T00:21:16.067

The pigeonhole principle guarantees that there are an infinite number of solutions for some x. It does not guarantee an infinite number of solutions for every x - I'm not sure we can disprove that there is some x for which there is no solution. – Meni Rosenfeld 2014-02-03T14:12:16.763

I'm not sure it makes sense to consider difficulty (or its log) as input size. In particular because if the hash function is fixed as SHA-256, difficulty values above 2^256 are meaningless so you can't talk about asymptotics. – Meni Rosenfeld 2014-02-03T14:15:01.963

@MeniRosenfeld The pp guarantees that there are solutions (more than zero, but not infinitely many, which is impossible for a finite discrete input size).

The correct input size from a complexity standpoint should be the number of bits in x and y. But, to be sloppy, in practice that's going to be essentially the number of bits in the hash function output---less would bring about trouble for the existence of a solution at extreme difficulty, more would presumably(!) quickly become unnecessarily many. – pyramids 2014-02-03T14:30:39.027

@pyramids: The pp guarantees there is a solution for some x. Not for every x. – Meni Rosenfeld 2014-02-03T19:45:21.187

Many of the questions here are particular to the real-world limitations of using SHA-256, so I overhauled the question to consider an idealized perfect hash function which scales perfectly and which we hope SHA-256 at least approximates decently in real-world situations. – Mike Battaglia 2014-02-05T07:00:08.467

@MeniRosenfeld I stand corrected, but then again you already said the "for some x" part; my point was just that "infinite number" was incorrect, I didn't mean to dispute the "for some x" part. I guess it took both of us to find (hopefully) all the errors in your statement. – pyramids 2014-02-05T18:10:32.170

@MikeBattaglia I saw your changes. I'll see if I can find time for it later, but I suppose all I'll be able to do is (likely) concur that you changed the question such that much of my answer doesn't apply anymore. Which I fear will not help you with the core issue. – pyramids 2014-02-05T18:12:34.823

@pyramids: My comments were a reply to the OP. The OP said there are infinitely many preimages, which results from the fact that the input string to SHA-256 can be arbitrarily long (it's not confined to any particular input size). This is true and I only contested the "for all x" part. The OP then said that the input size is difficulty, which I said makes no sense. I see no errors in my statement, so I expect an apology for your disrespectful comment and failure to follow the conversation. – Meni Rosenfeld 2014-02-06T07:45:26.943

@MeniRosenfeld: Sorry to read you're unhappy about my comments. I'm only trying to help in the issue, not to follow orders or wishful thinking along the lines of "I see no errors, so there must be none." If we'd instead look at the issue, we'd probably learn new things---such as that the pigeonhole principle really only works if input and output range have the same size. There is something wrong with the combination of infinitely many and pp here, be it in your answer, the pre-revision question, or somewhere in how we combined them. – pyramids 2014-02-11T00:15:38.640

@pyramids: I suggest you read https://en.wikipedia.org/wiki/Pigeonhole_principle. The PP definitely applies for sets of different size and infinite sets. In a bijection from an infinite set to a finite set, there must be an element in the output range with infinitely many preimages. This argument is pointless - you're not engaging in the technical discussion, you ignore your own faults and you blame me for things I didn't do. "I see no errors" doesn't mean "there must be none", it means the onus on finding errors is on you if you wish to speak disrespectfully as you did. – Meni Rosenfeld 2014-02-12T08:22:28.843

@MeniRosenfeld: Wow, you're right, that statement of the pigeonhole principle does have an (even explicitly given!) extension to infinite input. But it gets even stranger than that: The one you linked to explicitly requires the input to be larger than the output, quite the opposite to the way basically the same pigeonhole principle is applied in the polynomial pigeonhole principle (PPP), see http://en.wikipedia.org/wiki/PPP_%28complexity%29, where input and output need to have exactly the same size! – pyramids 2014-02-12T22:08:25.890

@pyramids: Ok. PPP is an interesting read which I was not previously familiar with, but it's an esoteric concept. What I linked to is what is commonly known as the pigeonhole principle. PPP is so called because of its use of the PP. Note that a solution to PC is "either a preimage of 0, or two preimages of an output". The PP guarantees a solution to this because if there is no preimage of 0, the effective output range is reduced by 1 element - hence there are more input values than output values, the PP applies and there are two preimages to some output. – Meni Rosenfeld 2014-02-12T22:32:46.773

(PS - Where I said above "bijection" it should have been instead just "function") – Meni Rosenfeld 2014-02-12T22:37:47.900

@MeniRosenfeld: Yes, exactly. I agree to that explanation. Plus I suppose with that information we can probably agree that there was no malice or disrespect involved? I'd hate to leave you feeling wronged; I was simply narrowly addressing everything in the context of the PPP (and finer subclasses) specifically asked about by the OP, rather than a more general form of the PP. – pyramids 2014-02-12T22:44:05.217

@pyramids: We can agree that there were misunderstandings. – Meni Rosenfeld 2014-02-14T09:51:27.497

Answers

There's a reason cryptographic hash functions, like the double SHA256 used for proof-of-work in Bitcoin, are not usually described using these complexity classes that classify asymptotic behavior. In fact, there are several.

  1. A technical reason is that hash functions often do not scale. For example, it is not defined how one would extend the proof-of-work to operate on 512 bits. A natural choice would be to use SHA512 then, but going from SHA256 to SHA512 takes a lot of essentially arbitrary choices, like changing the number of rounds from 64 to 80, which are standardized but not in a naturally scaling way and not for arbitrarily large hash sizes.

  2. It is not relevant for the cryptographer. Even an NP-complete hash function, which would be the strongest among the complexity classes you listed for building a strong hash, does not guarantee everything we want from either a cryptographic hash or a proof-of-work function. Qualifying for NP-completeness is merely a strong heuristic that the problem cannot be solved by an algorithm that is, asymptotically, less than exponential. But for a good hash function we want it to be, at the very limited bit count we choose to use it at, maximally exponential in the sense that solving it is really as hard as trying every possibility in a hash function. For its corresponding proof-of-work function with such a difficulty that only a fraction x of the output range is acceptable, this would mean that we should expect to need about 1/x attempts on average to find a proof-of-work. Anything better than that would prompt an academic to call the associated hash function broken, even if it merely cuts the number of tries in half, which would still put it in an exponential complexity class and would easily be possible even with an NP-complete function.

    An impressive (but only superficially related) example of how picking something seemingly NP-complete is insufficient to get something cryptographically hard is knapsack cryptography. Of course, there the problem was that by picking special cases the complexity of the problem was reduced. The point is that even an NP-complete problem can be less difficult than indeed having to try every solution, despite it sometimes being described that way! For cryptography-grade quality, having to try every input is meant literally; for the complexity analysis, it is good enough if the asymptotic scaling remains exponential in the number of bits. So if you could reduce the problem to another NP-complete function taking only every 1000th bit as input, that would be good enough for classifying the problem as NP (and even NP-complete if a similar mapping worked in reverse), but not for being of interest for cryptographic applications.

  3. It's difficult! And I think this difficulty has already led you astray: Even your arguments for placing this problem in TFNP are, whilst really close to the truth, not true in the mathematical sense. For example, if I specify x=0, no y can produce hash(y) < x, contradicting your assertion. Whether all other x are fine or whether there is a required minimum value for x probably depends on how you define the "strings" y you want hashcash to operate on. For Bitcoin with a limited number of bits entering the double-SHA256, I wouldn't be surprised if x=1 also does not have a solution, i.e. if no block hash can become exactly zero. Of course we'll probably never know. In practice, it is desirable that a hash function should produce total proof-of-work functions in the way you describe, but I don't think it is a proven quality. Disclaimer: I honestly don't know. You really should ask a cryptographer.

    What remains to be done to answer your question, after finding how the hash function scales and verifying that it remains polynomial for arbitrarily large sizes, is just this proof that the corresponding proof-of-work function is total. If this proof can be done using the pigeonhole principle, you have shown that it is in PPP, etc.

    So where is the difficulty? For example, if y has at least as many bits as x, and if we change your hashcash to have a less-than-or-equal rather than just less-than, and if we are willing to mutilate it further to the point that either finding a proof-of-work or the existence of a hash collision is good enough to make "hashcash" true, then the pigeonhole principle as explained in the Wikipedia article you linked to would obviously apply.

    But anything less than that, as best as I can see, would not suffice to apply the pigeonhole principle and hence would not answer the question of whether hashcash is in PPP or not. To again refer to your linked Wikipedia article: Only for very few problems is the answer known, even for PPP. For the special cases of PPP, PPA and PPAD, it obviously gets even harder. If you find a solution, submit it to an academic journal, not just here!
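    To illustrate the kind of problem the pigeonhole principle does make total, here is a small Python sketch of the search problem that defines PPP: given a function f from k-bit values to k-bit values, find either a preimage of 0 or two distinct inputs with the same output. The name `solve_pigeon` and the example functions are hypothetical, and the brute-force search is for illustration only (it takes up to 2^k steps).

```python
from typing import Callable, Dict, Tuple


def solve_pigeon(f: Callable[[int], int], k: int) -> Tuple:
    """Find a preimage of 0 under f, or a collision, over inputs 0 .. 2**k - 1.

    Assumes f maps k-bit values to k-bit values. The pigeonhole principle
    then guarantees that one of the two kinds of solution exists.
    """
    seen: Dict[int, int] = {}  # output value -> first input that produced it
    for x in range(2 ** k):
        y = f(x)
        if y == 0:
            return ("zero", x)
        if y in seen:
            return ("collision", seen[y], x)
        seen[y] = x
    # If 0 is never hit, 2**k inputs share 2**k - 1 outputs, so a collision
    # must have been found before the loop ends.
    raise AssertionError("unreachable for a function on k-bit values")
```

    For instance, `solve_pigeon(lambda x: (x * x + 1) % 16, 4)` returns `("collision", 0, 4)`, while `solve_pigeon(lambda x: (x + 1) % 8, 3)` returns `("zero", 7)`. The first case shows the obstacle for hashcash: a collision is a perfectly valid answer here, but it is not a proof-of-work.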

pyramids

Posted 2013-10-16T01:44:19.440

Reputation: 2 978

You make a lot of good points. Some of them dig at the formulation of my question more than the spirit of it, so I've rephrased the question in terms of an idealized "perfect hash function" for which all hashes occur with equal frequency and which scales perfectly. I was assuming SHA256 is a perfect hash function, but if it isn't, then I'm interested in the situation where a perfect hash function is used. I think that addresses your technical point 1. (contd) – Mike Battaglia 2014-02-05T07:06:32.603

I think my reformulation also addresses your point 2, since for some weakness like you mention to exist, it would have to affect every perfect hash function to make a sweeping statement about the entire PERFECT HASHCASH problem in general. Finally, I think my more rigorous formulation also addresses your point #3. You'll note that I formulated it in terms of a number of starting bits being 0 rather than the thing being less than a target just to keep it simple, but generalizing it to the target-based approach is also straightforward if you want. – Mike Battaglia 2014-02-05T07:12:46.473

@MikeBattaglia I agree you've solved many of the technicalities. There's something problematic left, though, that may be more than just a technicality: The pigeonhole principle at the root of the finer complexity classes you hope to get at is essentially a counting argument. Hence it works for problems where you can say "either I hit (on one side of) the target, or I find a collision." Since finding a collision won't solve your (perfect) hashcash problem, applying the pigeonhole principle may not be possible. At least I couldn't. – pyramids 2014-02-07T11:32:10.757

pyramids: that's what I'm thinking, though I wasn't sure if there might exist some crazier and more complicated way to reduce it to something in PPP. But I guess if PPP is out, then that rules out PPAD and PPA as well, since they're in PPP. – Mike Battaglia 2014-02-07T18:53:45.790