I admit that I don't know for sure, but I have a guess. Since Satoshi was mining by himself for the first blocks, he probably set the initial target to whatever would take approximately 10 minutes per block on his CPU. Had he set it to the maximum value for a 256-bit number, he would have mined the first 2,016 blocks almost instantly, and then immediately retargeted.
Because so many blocks would have been mined in such a short time, the retargeting calculation would likely have produced a less-than-perfect result, meaning that the next 2,016 blocks might have been mined either far too quickly or far too slowly.
It was probably just better for him to initialize the target to a reasonable value, rather than force the algorithm to run its course in order to produce one.
The difficulty can only go up or down by a factor of four each time, so it would take 16 re-targets to get to difficulty 1. That would create a pre-mine of 1.6 million Bitcoins (16 × 2016 × 50). – Nick ODell – 2017-01-18T17:22:19.120
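The arithmetic in the comment above can be checked with a rough sketch. It assumes a simplified retarget rule (the measured timespan clamped to between a quarter and four times the two-week goal), approximates the difficulty-1 target as 2^224 (the real value is slightly lower), and ignores the compact target encoding. The function names are made up for illustration and are not taken from any client.

```python
# Simplified model of retargeting, for illustration only.
TARGET_TIMESPAN = 14 * 24 * 60 * 60   # two weeks, in seconds
BLOCKS_PER_PERIOD = 2016
BLOCK_REWARD = 50                     # initial subsidy in BTC

def retarget(old_target, actual_timespan):
    """Scale the target by actual/expected time, limited to a factor of four."""
    clamped = min(max(actual_timespan, TARGET_TIMESPAN // 4), TARGET_TIMESPAN * 4)
    return old_target * clamped // TARGET_TIMESPAN

def retargets_needed(start_target, goal_target, actual_timespan=0):
    """Count retarget periods until the target drops to goal_target."""
    count = 0
    target = start_target
    while target > goal_target:
        target = retarget(target, actual_timespan)
        count += 1
    return count

# Assumption: start at roughly the largest 256-bit value and mine every
# period almost instantly, so each retarget divides the target by four.
periods = retargets_needed(2**256, 2**224)
premine = periods * BLOCKS_PER_PERIOD * BLOCK_REWARD

print(periods)   # 16
print(premine)   # 1612800
```

Under these assumptions the target has to shrink by a factor of 2^32, which is 4^16, hence the 16 periods and roughly 1.6 million coins.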
Yikes! I was just thinking about how long it would take before stabilizing at a 10 minute block time. I forgot about how many coins would be mined during that process. – Jestin – 2017-01-18T19:18:20.350
Of course, it would have been possible to set the initial target to something other than the maximum target, so that the target could go up or down from the initial value. However, if the initial target was a reasonable number for one person's computer(s), there would probably be no point in allowing it to go higher, since that should never happen. – Nate Eldredge – 2017-01-18T20:10:03.157