Sunday, 23 May 2010

Corpora, thesauri (or thesauruses?)

Many thanks to Juliette for pointing out a new French textual corpus, the University of Leipzig Corpus français. Searchable corpora are a huge step forward for translators, allowing us to look beyond the simplistic word-meaning dictionary model and see how words really behave around other words. English corpora include the British National Corpus (BNC) and the Corpus of Contemporary American English. There's a useful Russian corpus here, and of course the multilingual Leeds internet corpora (including all MATS languages). The Translational English Corpus created at Manchester is probably research-focused rather than translator-focused, but might be of interest too.

Of course the other indispensable resources for translators are thesauri, otherwise known as thesauruses - see e.g. Interestingly, the BNC and the Leeds English corpus came down very strongly in favour of 'thesauri' over 'thesauruses', but the Corpus of Contemporary American English suggests that 'thesauruses' is gaining ground... Any opinions from readers?

