
Linguistic pitfalls

The pitfalls of corpus querying: What can you find in a corpus, and why are there errors in the texts?

Why search in KorpusDK?

The purpose of querying a corpus is to learn about actual language usage. A corpus is therefore not very useful as a spelling guide that shows what is right and what is wrong. Unlike dictionary entries, the search results have not been processed by human editors. On the contrary, the texts are displayed exactly as the authors wrote them, and that is precisely the point.

In two important respects, KorpusDK differs from the many web pages you can access via a search engine:

  • The texts have been selected and compiled to represent a broad range of text types and genres
  • Textual markup, combined with an advanced query function, enables you to carry out many kinds of linguistic investigation of the corpus material with a high degree of precision

Correctness

There is no guarantee that the language of a search result complies with the official Danish orthography as laid down in the official Danish spelling dictionary (Retskrivningsordbogen). KorpusDK contains authentic examples of Danish language usage. Some texts contain language that deviates from the norm set by the Danish Language Council, for example because the writer misunderstood an expression or simply hit the wrong key while typing. Overall, though, "correct" examples far outnumber "wrong" ones.

In some cases you will find unauthorized (non-standard) forms in the list of inflected forms. This is because the most common forms, whether authorized or not, were recorded during the tagging process. These forms are therefore included in the search results, and they can also be selected from the list of inflected forms in Extended search.

Finally, the tagging may be wrong in some cases. KorpusDK contains so many texts that marking them up manually is impossible. The markup is therefore produced automatically, and with the methods and tools available today, a certain number of markup errors must be accepted. If you get a peculiar result, it may be due to an error in the automatic tagging.

Linguists are often interested in examples of language usage that deviate from normal practice. Such a deviation may be an accidental error, but it could also be a sign that usage is changing.

[Figure: Concordance for the query "tabt bag_vogn": tabt bag af en vogn or tabt bag en vogn?]

In this example, it seems that the set phrase tabt bag af en vogn (literally "dropped off the back of a cart", used of someone who is slow-witted or out of touch) is being challenged by the competing variant tabt bag en vogn.

Whether you call variation of this kind an "erroneous" rather than an "innovative" or "creative" use of an existing phrase is a matter of taste. In practice, it is up to lexicographers to decide when a particular usage has become established enough to be recorded in a dictionary.