The annotation of speech in a video corpus