@ RU.08 & Laserdisc Master
Although I'm sure you're right to a certain extent, the process of combining the frames is not as straightforward as you seem to suggest. In principle, differences between the frames that are due to differences in depth are taken into account (as are lighting issues, scale issues, and a number of others). The reconstruction is a statistical prediction of what the actual high res frame should look like, combining information from different frames in a non-linear way. This process will by definition be imperfect and have a cost, as you say. That cost may be a loss of depth but, depending on the source material, it may also be undesired depth enhancement in some places.
There is a misconception that super resolution is all about adding micro-detail. Although this certainly is one of the aspects that make it a powerful technique, its main objective is to get a more accurate representation of the high resolution frame. In doing so it reveals more micro-detail, but it also removes many of the artifacts created while compressing and downscaling the original to a lower resolution. You can get hung up on maintaining the shapes in the low resolution frame (sorry Laserdisc Master ;-)), but this assumes these shapes are good representations of the original high res material, which is not necessarily true.
In my opinion, combining the information from different imperfectly compressed low res frames to reconstruct the actual shapes and depth of elements in the original high res frame (whether they be micro-details or larger shapes) is to be preferred to smoothly interpolating a single imperfectly compressed low res frame, which by definition contains much less information about the actual shape and depth of the objects visible in it.
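
To make the information argument a bit more concrete, here is a toy 1-D sketch in Python (NumPy). It is a deliberately simplified illustration and not the statistical, non-linear reconstruction described above: it assumes the sub-pixel offset of every low res frame is known exactly, and that there is no compression noise, blur, or depth variation. The function names are made up for this example.

```python
import numpy as np

def make_low_res_frames(high_res, factor):
    # Hypothetical capture model: each frame samples the same scene
    # at a different, exactly known sub-pixel offset.
    return [high_res[offset::factor] for offset in range(factor)]

def combine_frames(frames, factor):
    # Put every frame's samples back at their offset on the high res grid.
    out = np.empty(sum(len(f) for f in frames))
    for offset, frame in enumerate(frames):
        out[offset::factor] = frame
    return out

def interpolate_single(frame, factor, n):
    # Smooth (linear) upscaling of one low res frame to n samples.
    x_low = np.arange(len(frame)) * factor
    return np.interp(np.arange(n), x_low, frame)

factor = 4
high_res = np.sin(np.arange(64) * 1.3)   # fine detail that aliases at low res
frames = make_low_res_frames(high_res, factor)

multi = combine_frames(frames, factor)
single = interpolate_single(frames[0], factor, len(high_res))

err_multi = np.abs(multi - high_res).max()
err_single = np.abs(single - high_res).max()
print("max error, combined frames:", err_multi)
print("max error, single frame   :", err_single)
```

In this idealised setting the combined frames reproduce the original exactly, while the single interpolated frame cannot recover the detail that fell between its samples. Real footage is far messier (unknown motion, compression artifacts, depth differences), which is exactly where the imperfections and costs mentioned above come from.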