Hints on the data for language modeling of synthetic languages with transformers