The mining hardware industry is very young, and its chip fabrication technology started out far behind that of traditional chip manufacturers (e.g. Intel, AMD).
Chip energy efficiency depends largely on the process node, loosely the size of (and distance between) the transistors on a chip. The smaller and closer together the transistors are, the better. Intel produced its first 14nm chips in 2014. Mining hardware manufacturers recently produced their first 16nm chips. They started with 28nm (or worse) about 4 years ago.
Still, mining hardware has almost reached the chip density of Intel, which is remarkable for only 4 years of development. If mining incentives remain high, it will be very interesting to see whether it matches or surpasses the big players in chip manufacturing.
Note that mining hardware is now very close to the best available chip technology, so we can expect the energy efficiency of mining to follow Moore's Law from now on (just as Intel hardware does).
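To make the "follows Moore's Law from now on" point concrete, here is a rough sketch of what such a projection looks like. It assumes, as the answer does, that energy per hash halves on the same schedule as transistor density doubles, taken here as roughly every two years. The function name `project_energy_per_hash`, the two-year doubling period, and the starting figure of 0.1 J/GH are illustrative assumptions, not measured data.

```python
def project_energy_per_hash(current_j_per_gh: float,
                            years_ahead: float,
                            doubling_period_years: float = 2.0) -> float:
    """Project energy per gigahash under an idealized Moore's-Law-style trend.

    Assumption: efficiency doubles (energy per hash halves) once every
    `doubling_period_years`. Real hardware will not track this exactly.
    """
    if doubling_period_years <= 0:
        raise ValueError("doubling_period_years must be positive")
    halvings = years_ahead / doubling_period_years
    return current_j_per_gh * 0.5 ** halvings


if __name__ == "__main__":
    start = 0.1  # hypothetical starting point in J/GH, for illustration only
    for years in (0, 2, 4, 6):
        print(years, "years:", project_energy_per_hash(start, years), "J/GH")
```

Under these assumptions the improvement is a steady halving every couple of years, which is much slower than the jump the industry made while it was still catching up on process nodes.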
There is another perspective: the world's joint experience in building chips has been focused on producing hardware with extremely low failure rate. Hardware that is twice as efficient, but produces nonsensical results 40% of the time is a perfectly good deal. This is a very different problem than the one Intel's chips are solving, and there is a lot to be learned about it. – Pieter Wuille – 2016-07-18T20:09:06.853
This is indeed true. However, the massive decrease in consumption so far has come from transistor density. Mining hardware will not see such improvements anymore, only those allowed by Moore's law, as is the case for the big manufacturers. – karask – 2016-07-19T20:21:39.937