A semi-automatic indexing system based on embedded information in HTML documents