Duplicate and near-duplicate documents on the web: detection using fuzzy hashing techniques