It could have been done that way, at the cost of increasing the space needed to store and send block headers. Block header storage seems to have been a big concern for Satoshi (there's even a section in the whitepaper about it), but it has turned out not to matter very much.
Does this mean the 2nd SHA block is padded with 64 - 16 = 48 bytes?
Yes (source), but your reasoning is flawed. Even if the nonce space were so large that it extended into another block, that would just mean that the midstate would represent the state after hashing all but the last block (block in the cryptographic sense, not the Bitcoin sense).
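To make the midstate idea concrete, here is a minimal sketch in Python. It assumes the standard 80-byte header layout and uses `hashlib`'s `copy()` as a stand-in for a saved midstate (real miners keep the raw eight-word SHA-256 state, which `hashlib` doesn't expose). The header bytes are dummy values and `header_hash` is a name I made up for illustration.

```python
import hashlib
import struct

def header_hash(midstate, tail12, nonce):
    """Finish a double-SHA256 starting from a cached state of the first 64 header bytes."""
    h = midstate.copy()                           # reuse the work already done on the first SHA block
    h.update(tail12 + struct.pack("<I", nonce))   # last 16 bytes: 12 fixed bytes + 4-byte nonce
    return hashlib.sha256(h.digest()).digest()    # second SHA-256 pass

# Hypothetical header contents (dummy bytes, not a real block).
header76 = bytes(range(76))                  # everything except the nonce
midstate = hashlib.sha256(header76[:64])     # has absorbed exactly one 64-byte SHA block
tail12 = header76[64:]                       # end of merkle root, time, nBits

# The cached-state result matches hashing the whole header from scratch.
for nonce in (0, 1, 2**32 - 1):
    full = header76 + struct.pack("<I", nonce)
    assert header_hash(midstate, tail12, nonce) == hashlib.sha256(hashlib.sha256(full).digest()).digest()
```

Only the second SHA block (plus the second pass) has to be redone per nonce; the first block is hashed once per header template.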
Also, if the block header were exactly 128 bytes, the padding would extend it to a third block. SHA-256 padding needs at least 9 bytes (one 0x80 byte plus an 8-byte length field), so you only have 128 - 9 = 119 bytes before that happens.
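The padding arithmetic can be checked with a few lines of Python. This is just a sketch of the SHA-256 padding rule (a 0x80 marker byte, zero fill, then an 8-byte length); the function names are mine.

```python
BLOCK = 64        # SHA-256 block size in bytes
MIN_PAD = 1 + 8   # one 0x80 marker byte plus the 8-byte message length

def sha256_blocks(msg_len):
    """Number of 64-byte blocks SHA-256 processes for a msg_len-byte message."""
    return (msg_len + MIN_PAD + BLOCK - 1) // BLOCK   # integer ceiling division

def padding_bytes(msg_len):
    """Total padding added: marker byte, zero fill, and length field."""
    return sha256_blocks(msg_len) * BLOCK - msg_len

for n in (80, 119, 120, 128):
    print(n, sha256_blocks(n), padding_bytes(n))
# 80 2 48
# 119 2 9
# 120 3 72
# 128 3 64
```

So the 80-byte header fills one block plus 16 bytes of the second, with 48 bytes of padding, and anything over 119 bytes spills into a third block.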
That way, extranonce doesn't have to be in the generation transaction, thereby speeding up hashing, no?
Not really. You can check 2^32 hashes before incrementing extranonce, after which you only need to do ten or so hashes before you get back to mining.
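Here's a rough sketch of why it's only "ten or so" hashes. Since the generation transaction is always the leftmost leaf, you can cache its merkle branch (one sibling hash per tree level) and redo only that path when extranonce changes. The txids below are dummy data and the helper names are hypothetical; this isn't how any particular miner implements it.

```python
import hashlib

def dsha(data):
    """Bitcoin's double SHA-256."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def next_level(level):
    """Hash pairs of nodes; an odd-length level duplicates its last hash."""
    if len(level) % 2:
        level = level + [level[-1]]
    return [dsha(level[i] + level[i + 1]) for i in range(0, len(level), 2)]

def merkle_root(txids):
    level = list(txids)
    while len(level) > 1:
        level = next_level(level)
    return level[0]

def coinbase_branch(txids):
    """Sibling hashes needed to lift leaf 0 (the generation tx) up to the root."""
    branch, level = [], list(txids)
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]
        branch.append(level[1])        # sibling of the leftmost node on this level
        level = next_level(level)
    return branch

def root_from_branch(coinbase_txid, branch):
    h = coinbase_txid
    for sibling in branch:
        h = dsha(h + sibling)          # the coinbase path is always the left child
    return h

# Dummy block with 1000 transactions (made-up txids).
txids = [dsha(i.to_bytes(4, "little")) for i in range(1000)]
branch = coinbase_branch(txids)

# Changing extranonce changes the generation tx, hence its txid.
new_coinbase_txid = dsha(b"generation tx with extranonce = 1")
assert root_from_branch(new_coinbase_txid, branch) == merkle_root([new_coinbase_txid] + txids[1:])
print(len(branch))  # 10
```

With 1000 transactions that's 10 double-hashes for the branch plus one to rehash the generation transaction, versus 2^32 header hashes per extranonce value.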
Generally, in modern ASICs, even that part has been offloaded. There will be some sort of small processor within the ASIC, like an ARM core, which takes a block template as input and outputs block headers for the SHA256 cores to work on.
So the cost isn't one of speed, but complexity.
Like many things in Bitcoin, I think this is a technical decision that made sense at the time, but aged poorly.
What you are suggesting has been proposed on bitcointalk (can't seem to find the link). I think the problem with this is that it would require a slight redesign of ASICs and a hard fork, and the cost of updating the extranonce once every 2^32 hashes is negligible anyway. – morsecoder – 2015-06-14T03:38:27.317
@StephenM347 Hashing will have to get fast enough that recomputing the Merkle root for a change in extranonce would be a bottleneck, so why not just put extranonce straight in the midstate SHA block in the first place, for example? – Geremia – 2015-06-14T03:46:37.043
Recalculating the merkle root will never be the bottleneck. Crunch the numbers, I've done it before. Bear in mind that recalculating the merkle root takes log(N) hashes (N = # of transactions in the block), not N. Even if one CPU couldn't generate enough work for an ASIC farm, you'd just have to get a processor with more cores. You have to recalculate the merkle root periodically for new transactions anyway. – morsecoder – 2015-06-14T03:56:33.857
@StephenM347 Why is extranonce updated every 2³² hashes? – Geremia – 2015-10-23T06:25:19.153
The nonce is a 32-bit integer, so there are 2^32 values you can try before you run out and need to change something else. – morsecoder – 2015-10-23T10:56:44.743
@StephenM347 Oh, yes, of course. For some reason I was thinking of extranonce, not nonce. thanks – Geremia – 2015-10-24T04:45:46.890
@Geremia Your block header is missing the field <nBits>: 4 bytes denoting the target threshold. – Mark Messa – 2016-11-16T01:03:44.453