Download the huwiki corpus

The huwiki corpus is the Hungarian part of hunNERwiki. The data is divided into four gzip-compressed TSV files.

The huwiki corpus

File                   Tokens     Size (compressed)
huwiki.1.ner.tsv.gz    7266903    41M
huwiki.2.ner.tsv.gz    4823538    27M
huwiki.3.ner.tsv.gz    3803409    22M
huwiki.4.ner.tsv.gz    3214747    18M
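
After downloading, the files can be read without unpacking them first. The following is a minimal Python sketch, not an official loader. It assumes that the files are UTF-8 encoded, that each non-empty line holds one token with tab-separated fields, and that blank lines separate sentences. None of this is stated on this page, so check it against the actual data. The names `iter_rows` and `count_tokens` are hypothetical helpers introduced for illustration.

```python
import gzip

# Token counts copied from the table above.
LISTED_TOKENS = {
    "huwiki.1.ner.tsv.gz": 7266903,
    "huwiki.2.ner.tsv.gz": 4823538,
    "huwiki.3.ner.tsv.gz": 3803409,
    "huwiki.4.ner.tsv.gz": 3214747,
}


def iter_rows(path):
    """Yield the tab-separated fields of every non-empty line.

    Assumption: UTF-8 text, one token per non-empty line.
    """
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            if line:  # skip blank lines (assumed sentence separators)
                yield line.split("\t")


def count_tokens(path):
    """Count non-empty lines, i.e. tokens under the assumption above."""
    return sum(1 for _ in iter_rows(path))
```

If the assumptions hold, `count_tokens("huwiki.1.ner.tsv.gz")` should match the corresponding entry in `LISTED_TOKENS`. A mismatch would suggest a truncated download or a different file layout (for example, header or comment lines).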