A reproducibility protocol and dataset on the biomedical sentence similarity