There will be a difference between averaging the 5 AVIs before upscaling and averaging them after. Averaging after upscaling means 5x the render time, which is a serious cost when you're already looking at 24 hours of computing to clean up 2 hours of footage prior to encoding to MPEG-2. I won't run tests, because I don't think there'll be any real quality difference between the two.
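For anyone who wants to check the "no real quality difference" hunch without a 24-hour render, here is a rough sketch in Python. It is not the actual filter chain: `upscale_2x` is a made-up 1-D linear resizer and the five "captures" are synthetic rows of pixels with random noise added. The point is only that a linear resize and a plain average can be done in either order, so the two results should agree to within floating-point error. Real resizers that round to 8 bits, clamp overshoot, or have sharpening in between may differ slightly.

```python
import random

def upscale_2x(row):
    """Hypothetical linear 2x upscale: keep each sample and insert midpoints."""
    out = []
    for a, b in zip(row, row[1:]):
        out.append(a)
        out.append((a + b) / 2)
    out.append(row[-1])
    return out

def average(rows):
    """Per-pixel mean of several equally sized rows."""
    n = len(rows)
    return [sum(vals) / n for vals in zip(*rows)]

random.seed(0)
clean = [float(16 * i) for i in range(8)]  # synthetic clean signal
captures = [[v + random.gauss(0, 4) for v in clean] for _ in range(5)]

# Average first: the upscaler runs once.
avg_then_up = upscale_2x(average(captures))
# Upscale first: the upscaler runs five times.
up_then_avg = average([upscale_2x(c) for c in captures])

max_diff = max(abs(a - b) for a, b in zip(avg_then_up, up_then_avg))
print("largest per-pixel difference:", max_diff)
```

If that holds for the resizer in use, averaging before upscaling gives the same picture for one fifth of the upscaling work.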
One interesting thing: without any extra filtering (sharpening and 2D/3D filters), a screengrab of an unaltered single capture saved as a PNG is 455 KB, whilst a screengrab of the same frame from the 5 merged captures saved as a PNG is 416 KB.
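A likely explanation for the smaller file is that PNG is lossless, so random noise costs bytes, and averaging several captures removes some of that noise. The sketch below is only an illustration under assumptions: the frame is a synthetic checkerboard from a hypothetical `clean_pixel` function, the noise is simulated, and it uses Python's `zlib` (the same DEFLATE compression PNG uses) on raw bytes rather than writing a real PNG with its row filters.

```python
import random
import zlib

WIDTH, HEIGHT, CAPTURES = 64, 64, 5

def clean_pixel(x, y):
    # Hypothetical test pattern: a coarse two-tone checkerboard.
    return 96 + 64 * ((x // 16 + y // 16) % 2)

def noisy_capture(rng):
    """One simulated capture: clean pattern plus Gaussian noise, clipped to 8 bits."""
    return [
        min(255, max(0, round(clean_pixel(x, y) + rng.gauss(0, 6))))
        for y in range(HEIGHT)
        for x in range(WIDTH)
    ]

rng = random.Random(1)  # fixed seed so the run is repeatable
captures = [noisy_capture(rng) for _ in range(CAPTURES)]

single = captures[0]
# Per-pixel mean of the five captures, rounded back to 8-bit values.
merged = [round(sum(px) / CAPTURES) for px in zip(*captures)]

single_size = len(zlib.compress(bytes(single), 9))
merged_size = len(zlib.compress(bytes(merged), 9))
print("single capture:", single_size, "bytes")
print("5 merged captures:", merged_size, "bytes")
```

The merged frame should compress smaller than the single capture, which is the same direction as the 455 KB versus 416 KB screengrabs, though the exact ratio depends on the footage and the noise level.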