I was wondering why a soft fork cannot in principle divide the Bitcoin network as a hard fork can.
A soft fork absolutely can divide the Bitcoin network, in principle, if you don't specify any other assumptions.
However, what distinguishes a soft fork from a hard fork is that, with a soft fork, the network will always converge on a single chain as long as a majority of the hashrate enforces the new rules.
The reason for this is that even when old miners produce a block that violates the soft fork's rules, the new miners will not build on top of it. They will ignore such a block and work on a competing branch instead. As they have the majority of the hashrate, this competing branch will at some point overtake the old miners' branch. The backward compatibility of the soft fork means that old miners will at that point switch over to the new branch and abandon their own, because the new branch is longer and is valid to them. Through this, the fork is resolved, and the whole network ends up working on one chain again.
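To make the convergence argument concrete, here is a minimal simulation sketch. It is a toy model, not how any real node works: the function name `simulate_soft_fork` and its parameters are made up for illustration, each block is assumed to be found by a new-rule miner with probability equal to their hashrate share, and "longer" stands in for "more work".

```python
import random


def simulate_soft_fork(new_rule_share, seed=0, max_blocks=10_000):
    """Toy model: count blocks mined until the rule-following branch overtakes.

    Returns the number of blocks mined after the rule-violating block until
    the valid branch is strictly longer, or None if that never happens
    within max_blocks. All names here are hypothetical.
    """
    rng = random.Random(seed)
    invalid_len = 1  # an old miner has just mined a rule-violating block
    valid_len = 0    # new miners start a competing branch from its parent

    for mined in range(1, max_blocks + 1):
        if rng.random() < new_rule_share:
            # New miners ignore the invalid branch and extend their own.
            valid_len += 1
        else:
            # Old miners keep extending the branch they saw first, as long
            # as the competing branch is not strictly longer.
            invalid_len += 1

        if valid_len > invalid_len:
            # The valid branch is longer and also valid under the old rules,
            # so old miners switch to it: the fork is resolved.
            return mined
    return None
```

With a share above 0.5 the valid branch gains on the other one on average, so the loop ends sooner or later; with a share of 0.5 or less there is no such guarantee, which is exactly the majority assumption above.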
As you can see above, the remaining old miners will cause more frequent (temporary) forks in the chain. In order to avoid having these interfere with the security properties of the network, recent soft forks in Bitcoin have employed thresholds of 95% (measured in one way or another) of miner signalling, leaving at most 5% of the hashrate that may produce such forks.
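A rough way to see why a high threshold helps is to estimate how long such a temporary fork lasts. The sketch below uses a simplified random-walk model of my own, not anything from the activation rules themselves: after one rule-violating block, the valid branch has to go from one block behind to one block ahead, and every new block is assumed to come from a new-rule miner with probability equal to their hashrate share. The function name is hypothetical, and it treats the signalling share as if it were the enforcing hashrate, which is an assumption.

```python
def expected_blocks_to_resolve(new_rule_share):
    """Expected number of blocks until the valid branch overtakes (toy model).

    The lead of the valid branch moves +1 with probability new_rule_share
    and -1 otherwise, and has to move from -1 to +1, i.e. a distance of 2.
    For a biased random walk the expected time is distance / drift.
    """
    if not 0.5 < new_rule_share <= 1.0:
        raise ValueError("the model only converges for a share above 0.5")
    old_share = 1.0 - new_rule_share
    drift = new_rule_share - old_share  # average gain per block mined
    return 2.0 / drift


for share in (0.55, 0.75, 0.95):
    print(f"{share:.0%} enforcing: ~{expected_blocks_to_resolve(share):.1f} blocks")
```

In this model a fork takes about 20 blocks to resolve on average with 55% enforcing, but only a little over 2 blocks with 95%.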