This is called "compositing." You'll need either editing software that supports it or an external application such as Photoshop or After Effects.
In Sony Vegas, for example, I would do this as a Picture-in-Picture (P-in-P). Download the YouTube material, create a P-in-P, and place it on the cue card. Then use keyframes to move the YouTube material as needed so it stays inside the cue card as the card moves.
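If it helps to picture what the keyframes are doing, here is a rough sketch in Python. It is not Vegas's API or how Vegas works internally, just an illustration that assumes simple linear interpolation between keyframes. The function name `interpolate_keyframes` and the pixel coordinates are made up for the example.

```python
from bisect import bisect_right


def interpolate_keyframes(keyframes, t):
    """Return the (x, y) overlay position at time t.

    keyframes: list of (time_seconds, x, y) tuples sorted by time.
    Assumes linear motion between neighbouring keyframes (an assumption
    for this sketch; real editors also offer smoothed curves).
    """
    times = [k[0] for k in keyframes]
    # Before the first or after the last keyframe, hold that position.
    if t <= times[0]:
        return keyframes[0][1:]
    if t >= times[-1]:
        return keyframes[-1][1:]
    i = bisect_right(times, t)  # index of the first keyframe later than t
    t0, x0, y0 = keyframes[i - 1]
    t1, x1, y1 = keyframes[i]
    f = (t - t0) / (t1 - t0)  # fraction of the way from keyframe i-1 to i
    return (x0 + f * (x1 - x0), y0 + f * (y1 - y0))


# Hypothetical example: the cue card drifts from (100, 200) to (300, 400)
# over two seconds, so the overlay gets one keyframe at each end.
card_path = [(0.0, 100, 200), (2.0, 300, 400)]
position_at_1s = interpolate_keyframes(card_path, 1.0)
```

You set a keyframe wherever the cue card changes direction or speed, and the editor fills in the positions in between. The more the card moves around, the more keyframes you'll need.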
The current (October 2012) issue of Videomaker Magazine has an article on compositing that may be useful for your project. You might also take a look at http://www.youtube.com/watch?v=t1hBuWowkUY, a good tutorial on Picture-in-Picture.