Carolina’s Methodology: building a large corpus with provenance and typology information