FAQ
- 1 What is a corpus?
- 2 What texts are in KorpusDK?
- 3 How much of a text is shown in KorpusDK?
- 4 Can you use KorpusDK as a guide to correct spelling?
- 5 Can you find all Danish words in KorpusDK?
- 6 Can you see what the words mean?
- 7 Are there examples of spoken language in KorpusDK?
- 8 What is a concordance?
- 9 Why am I not getting a search result?
- 10 Why can't I see more lines on the screen at a time?
- 11 Why does the query take so long?
- 12 Why is the result I get by choosing Match all inflected forms different from the one I get by manual selection of all inflected forms?
- 13 Why query with attributes?
- 14 How can I see the markup for a concordance?
- 15 What are collocations?
- 16 What are set phrases?
- 17 Why can't I find examples of a given set phrase?
General questions
What is a corpus?
A corpus is a large collection of authentic text excerpts which share the same format and have been supplied with specific types of information. Each word may carry information about its part of speech, and the corpus may record when and by whom each text was written. The text excerpts are usually selected according to explicit criteria to ensure that the corpus is representative of a particular period of time, a text type or a combination of several criteria. Corpora like Korpus 2000 and Korpus 90 consist of texts from particular periods, 1998-2002 and 1988-1992, respectively.
What texts are in KorpusDK?
KorpusDK consists of two subcorpora, Korpus 2000 and Korpus 90, and contains a wide selection of texts, both private texts such as diaries, personal letters and occasional songs, and public texts like novels, short stories, articles from newspapers and magazines etc. See the suppliers of texts to Korpus 2000 and to Korpus 90.
How much of a text is shown in KorpusDK?
When you make queries in KorpusDK, you can only see a relatively small excerpt of the larger text contained in the corpus. This means that you won't be able to read all of the text in which the search word appears. In a corpus, the focus is not on the text as a whole, but on the linguistic constructions used in smaller sections of the text, typically one or two sentences. In addition, KorpusDK shows only as much linguistic context as copyright regulations permit.
Can you use KorpusDK as a guide to correct spelling?
KorpusDK contains authentic examples of Danish language usage. This means there is no guarantee that the language used complies with the official Danish spelling dictionary (Retskrivningsordbogen). You will find spelling errors and other linguistic forms that deviate from the norm determined by the Danish Language Council.
Can you find all Danish words in KorpusDK?
No. Even though the 56 million words contained in KorpusDK is a fairly large number, some Danish words cannot be found there. The stock of words that you find depends entirely on the texts represented in the corpus.
Can you see what the words mean?
There are no explicit word definitions provided. If the meaning is not apparent from the linguistic context you can look up the word in the Dictionary of the Danish Language for a precise definition. At a later point it will also be possible to look up words in The Danish Dictionary from this site (probably by the end of 2008).
Are there examples of spoken language in KorpusDK?
All the texts in KorpusDK are examples of written language. Transcribed spoken language is not represented, and for that reason it is not possible to trace differences between spoken and written language in the corpus. Interviews from magazines and newspapers are not genuine examples of spoken language either, as false starts, self-corrections and hesitation signals are usually removed when the interview is written down.
Concordances
What is a concordance?
A concordance is a view that shows text examples of one or more search words. The text examples are displayed in a series of lines where the search word is highlighted and occurs in its authentic linguistic context. The lines can be arranged in various ways, for example sorted according to the right or left context. Read more about concordances.
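The idea can be illustrated with a minimal sketch. It is not KorpusDK's implementation: the function names, the toy text and the context width are all assumptions made for this example. It collects each occurrence of a search word with a few words of context and sorts the resulting lines by right or left context.

```python
def kwic(tokens, keyword, width=3):
    """Return (left, match, right) triples for each occurrence of keyword."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            lines.append((left, tok, right))
    return lines


def sort_by_right_context(lines):
    """Sort lines alphabetically by the words that follow the match."""
    return sorted(lines, key=lambda line: [w.lower() for w in line[2]])


def sort_by_left_context(lines):
    """Sort lines by the words before the match, nearest word first."""
    return sorted(lines, key=lambda line: [w.lower() for w in reversed(line[0])])


# Toy text invented for illustration (not taken from KorpusDK).
text = "hun tog tyren ved hornene og han tog bussen hjem mens de tog en pause"
tokens = text.split()

lines = sort_by_right_context(kwic(tokens, "tog"))
for left, match, right in lines:
    # Right-align the left context so the search word lines up in one column.
    print(f"{' '.join(left):>20}  [{match}]  {' '.join(right)}")
```

Sorting by context groups similar constructions next to each other, which is what makes recurring patterns around the search word easy to spot.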
Why am I not getting a search result?
There may be several reasons why an apparently acceptable query returns no result at all or an unexpected one. Often the cause is the automatic markup, which is not entirely flawless. For example, if you enter nærmest into the search box with All inflected forms selected, the search result contains only 3 examples. This is due to a markup error. If you instead enter nærmest.* and select Exact forms only, there are 5000 examples. If you click on + and select the attributes "lemma" and "pos", you can see that nærmest is interpreted as an inflected form of nær.
Why can't I see more lines on the screen at a time?
In the concordance view, a maximum of 50 lines is displayed at a time. You can see more pages by clicking Next page or Previous page or one of the page numbers in the middle of the page.
Why does the query take so long?
There may be several reasons why a query takes a long time:
- It may be due to heavy server traffic: if many users execute queries at the same time, it can cause congestion on our server.
- You may have searched for a common word occurring frequently in the corpus. This is especially slow if:
  - You have chosen to sort the concordance according to left or right context.
  - You have chosen to search for the last part of a word, either by selecting 'End of word' from the drop-down box on the standard search page or by inserting a regular expression in a query, for example .*agtig (i.e. all words ending in 'agtig').
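As a rough illustration of what the pattern .*agtig selects, the sketch below applies it to a small word list using Python's re module. The word list is invented, and whole-word matching is assumed here because the pattern is described as matching words ending in 'agtig'. KorpusDK's own query engine may evaluate the expression differently.

```python
import re

# Hypothetical word list, invented for illustration.
words = ["barnagtig", "agtig", "løgnagtig", "agte", "pragtfuld", "drengeagtige"]

pattern = re.compile(r".*agtig")

# fullmatch requires the whole word to match, so only words ending in
# 'agtig' are kept; 'drengeagtige' is rejected because of the final 'e'.
ending_in_agtig = [w for w in words if pattern.fullmatch(w)]
print(ending_in_agtig)
```

A pattern that starts with a wildcard gives no fixed beginning to look up, so in a simple implementation like this one every word has to be tested. That is one plausible reason such searches are slower.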
Why is the result I get by choosing Match all inflected forms different from the one I get by manual selection of all inflected forms?
If you select Match all inflected forms, the query targets the lemma assigned by the automatic tagging (a lemma is a word in its base form, standing for all its inflected forms). If you select all inflected forms from Select inflected forms, the query targets exactly the strings contained in the list.
The advantage of choosing Match all inflected forms is that the query is quicker. The disadvantage is that the automatic tagging may contain errors, so the search result is not quite as reliable.
The advantage of selecting from the menu is that you know exactly which forms are queried. The disadvantage is that the query is slower because it has to be reformulated as a complex query: "Search for form A or form B or form C ..."
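The difference between the two query types can be sketched as follows. The tagged tokens, the deliberately wrong lemma tag and both function names are assumptions made for this example, not KorpusDK data or code.

```python
import re

# Hypothetical tagged tokens as (word form, lemma) pairs. The second entry is
# deliberately mis-tagged to mimic an error in automatic tagging.
tagged = [
    ("huset", "hus"),
    ("huse", "huse"),   # wrong lemma: should be "hus"
    ("husene", "hus"),
    ("hus", "hus"),
    ("bilen", "bil"),
]


def match_lemma(tokens, lemma):
    """Lemma query: one comparison per token against the lemma tag."""
    return [form for form, tag in tokens if tag == lemma]


def match_forms(tokens, forms):
    """Form query: 'form A or form B or form C ...' as a regex alternation."""
    pattern = re.compile("|".join(re.escape(f) for f in forms))
    return [form for form, _ in tokens if pattern.fullmatch(form)]


by_lemma = match_lemma(tagged, "hus")
by_forms = match_forms(tagged, ["hus", "huset", "huse", "husene"])

print(by_lemma)
print(by_forms)
```

In this toy data the lemma query misses the mis-tagged form, while the explicit list finds it. The two results differ for the same reason the two search modes can differ.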
Why query with attributes?
Attributes are the pieces of information supplied for each individual word in the automatic tagging process. Attributes allow highly accurate searches for specific information. Read more about attributes.
How can I see the markup for a concordance?
On the search result page for a concordance search, click the + button at the far left of the settings panel. The panel will unfold and show further settings, allowing you to reduce the concordance or change the number of words shown next to the search word. Here you can also display the tags used in the textual markup by selecting the desired tag. For example, if you want to see the part of speech tag, select the attribute pos and click the button Change. See list of attributes.
Collocations
What are collocations?
Collocations are words that typically co-occur and often appear as set phrases such as lide afsavn, anlægge et skøn, et bragende bifald, den daglige dosis. Read more about collocations.
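A minimal sketch of the underlying idea is to count how often two words occur next to each other. The sentences below are invented, and raw pair frequency is a simplification chosen for this example; the way KorpusDK actually identifies collocations is not described here.

```python
from collections import Counter

# Toy sentences invented for illustration; not KorpusDK data.
sentences = [
    "de fik et bragende bifald",
    "et bragende bifald lød i salen",
    "han tog den daglige dosis",
    "et stort bifald",
]

pair_counts = Counter()
for sentence in sentences:
    tokens = sentence.split()
    # Pair each word with the word that immediately follows it.
    pair_counts.update(zip(tokens, tokens[1:]))

print(pair_counts.most_common(2))
```

Pairs that recur, such as bragende followed by bifald, stand out against pairs that occur only once.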
Set phrases
What are set phrases?
Set phrases are combinations of two or more words which do not combine freely with each other, but reflect a typical or idiomatic way of expression. They may be figurative expressions like tage tyren ved hornene or proverbs like mange bække små gør en stor å. The set phrases have been extracted from The Danish Dictionary. Read more about set phrases.
Why can't I find examples of a given set phrase?
A result without text examples may be caused by a query that is too precise and therefore does not match all the valid variations of the phrase.
- Try editing the search expression (use the edit button on the concordance result page) by removing any particle words, such as prepositions and pronouns.
