For my old analog interlaced home camera footage I used a Canopus device that converts the analog material to DV, and the results were (as far as I know) always the same. But I'm from a PAL country and don't know much about NTSC. It sounds like whatever field enters the capture device first is turned into a top field. If that field already is a top field, nothing changes, but if it is a bottom field, that would explain the shifted scanline (and vice versa if the device starts with a bottom field). Is that what you meant by "kicking in at either top or bottom field"?
This sounds like a question for the doom9.org forum, unless someone here comes up with an answer.

Edit: this question gets more interesting the more I think about it. My captured DV files are always bottom field first (BFF), and I think all DV is. So when capturing starts, the device either begins at the first bottom field it encounters, or it changes the field order if it begins at a top field.
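
To make the shifted-scanline guess concrete, here is a toy Python sketch. It is only my assumption about what might happen, not how the Canopus or any other real capture device is known to work, and the function names are made up for illustration. It splits a frame into its two fields and weaves them back together, once with the correct parity and once with the fields stored in each other's rows.

```python
def split_fields(frame):
    """Split a frame (a list of scanlines) into (top, bottom) fields.

    Top field = even rows (0, 2, 4, ...), bottom field = odd rows.
    """
    return frame[0::2], frame[1::2]


def weave(even_rows, odd_rows):
    """Interleave two fields: even_rows go to rows 0, 2, 4, ...,
    odd_rows go to rows 1, 3, 5, ..."""
    frame = []
    for a, b in zip(even_rows, odd_rows):
        frame.extend([a, b])
    return frame


# Hypothetical 8-line frame; each string stands in for one scanline.
source = [f"line{i}" for i in range(8)]
top, bottom = split_fields(source)

# Correct parity: the top field lands on the even rows.
correct = weave(top, bottom)

# Assumed failure case: the bottom field is treated as the top field,
# so every field ends up one row away from where it belongs.
swapped = weave(bottom, top)

print(correct)
print(swapped)
```

In this toy model the correctly woven frame is identical to the source, while the swapped one has each pair of neighbouring lines exchanged, which is the kind of one-line vertical shift I mean. A real capture would also pair fields from different moments in time, which this sketch ignores.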