The easiest approach is to use After Effects. Use its motion tracking and place the crosshair on an obvious point (something with good contrast or a well-defined shape). The tracker then follows that feature throughout the footage. You can then apply the motion data (X, Y movements) to a Null Object and link (parent) your text or image to the Null Object. On playback, the text/image will follow the Null Object, which follows the tracking point.
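
If it helps to see the idea outside the application, here is a rough Python sketch of the same chain: find a feature in each frame, store its X, Y position as the "null" position, then place the text at a fixed offset from the null. This is only a toy illustration and not how After Effects implements its tracker; the names `find_feature`, `make_frame`, `TEMPLATE` and `TEXT_OFFSET`, the tiny synthetic frames, and the brute-force matching are all assumptions made up for the example.

```python
def find_feature(frame, template):
    """Return the (x, y) top-left corner where the template matches best.

    "Best" here means the smallest sum of squared differences, found by
    brute force over every possible position in the frame.
    """
    th, tw = len(template), len(template[0])
    fh, fw = len(frame), len(frame[0])
    best_score, best_pos = None, None
    for y in range(fh - th + 1):
        for x in range(fw - tw + 1):
            score = sum(
                (frame[y + j][x + i] - template[j][i]) ** 2
                for j in range(th)
                for i in range(tw)
            )
            if best_score is None or score < best_score:
                best_score, best_pos = score, (x, y)
    return best_pos


def make_frame(width, height, feature_x, feature_y):
    """Build a dark synthetic frame with one bright 2x2 high-contrast feature."""
    frame = [[0] * width for _ in range(height)]
    for j in range(2):
        for i in range(2):
            frame[feature_y + j][feature_x + i] = 255
    return frame


# The patch under the "crosshair": a bright 2x2 block (hypothetical data).
TEMPLATE = [[255, 255], [255, 255]]

# Synthetic footage in which the feature drifts across the frame.
feature_path = [(1, 1), (2, 1), (3, 2), (4, 3)]
frames = [make_frame(8, 6, x, y) for x, y in feature_path]

# Step 1: track the feature; these positions play the role of the null's keyframes.
null_positions = [find_feature(frame, TEMPLATE) for frame in frames]

# Step 2: the linked text keeps a constant offset from the null in every frame.
TEXT_OFFSET = (2, -1)
text_positions = [
    (nx + TEXT_OFFSET[0], ny + TEXT_OFFSET[1]) for nx, ny in null_positions
]

for frame_index, (null_pos, text_pos) in enumerate(
    zip(null_positions, text_positions)
):
    print(frame_index, null_pos, text_pos)
```

The point of going through the null rather than moving the text directly is that the offset lives in one place: change `TEXT_OFFSET` and the text shifts relative to the tracked feature without touching the tracking data.
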
If you can get your hands on After Effects, just give it a go. It's actually very easy, and the motion tracking technique can be used in many different ways (pixellating faces, adding graphic overlays, etc.).

