1 September 1998 - 31 August 2003

CGN

Corpus gesproken Nederlands (The Spoken Dutch Corpus)

The Spoken Dutch Corpus (CGN = Corpus gesproken Nederlands) project has resulted in a large corpus of contemporary Dutch as spoken in Flanders and in the Netherlands. The corpus contains about 1000 hours of speech, all of which has been annotated at several levels. The basic annotations, made for the entire corpus, are an orthographic transcription, part-of-speech tagging, and lemmatization.