This paper describes the methods used in creating the following webcorpora and frequency dictionaries.
NOTE: the numbers cited in the paper are outdated. The current sizes of the individual corpora are listed in the tables below.

Frequency Dictionaries

Language      sorted by frequency   sorted alphabetically
catalan       X                     X
croatian      X                     X
czech         X                     X
danish        X                     X
dutch         X                     X
finnish       X                     X
indonesian    X                     X
lithuanian    X                     X
norwegian     X                     X
polish        X                     X
portuguese    X                     X
romanian      X                     X
serbian_sh    X                     X
serbian_sr    X                     X
slovak        X                     X
spanish       X                     X
swedish       X                     X

Webcorpora

Language      toks (M)   size (gzipped)
catalan       658        998M             X
croatian      1491       2.7G             X
czech         612        1.1G             X
danish        496        816M             X
dutch         1989       3.1G             X
finnish       846        2.1G             X
indonesian    310        539M             X
lithuanian    1405       2.8G             X
norwegian     1620       2.7G             X
polish        1426       2.6G             X
portuguese    963        1.9G             X
romanian      1067       2.2G             X
serbian.sh    2337       1.5G             X
serbian.sr    845        176M             X
slovak        862        2.1G             X
spanish       1397       2.7G             X
swedish       893        1.5G             X