Webcorpus Creator
This project is a collection of scripts and programs for creating a webcorpus
from crawled data. The input is data extracted by the Wire crawler; the output
is a text file containing the raw text of each document, with separators between documents.
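To illustrate what consuming such an output file might look like, here is a minimal Python sketch that splits a separator-delimited text stream back into documents. The separator string `%%#DOC` and the function name `split_documents` are assumptions made for this example only; the actual separator format is described in the repository documentation and may differ.

```python
from typing import Iterable, Iterator, List

# Hypothetical separator prefix, chosen for illustration only.
# Check the repository documentation for the real format.
DOC_SEPARATOR = "%%#DOC"


def split_documents(
    lines: Iterable[str], separator: str = DOC_SEPARATOR
) -> Iterator[str]:
    """Yield the raw text of each document in a separator-delimited stream.

    A line starting with `separator` opens a new document. Any text that
    appears before the first separator is ignored.
    """
    current: List[str] = []
    seen_separator = False
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(separator):
            # Emit the document collected so far, then start a new one.
            if seen_separator:
                yield "\n".join(current)
            current = []
            seen_separator = True
        elif seen_separator:
            current.append(line)
    # Emit the last document, if any separator was seen at all.
    if seen_separator:
        yield "\n".join(current)
```

Because the function accepts any iterable of lines, an open file handle can be passed in directly, so a large corpus can be processed one document at a time without loading the whole file into memory.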
Download and documentation
The Webcorpus Creator is hosted on GitHub. Documentation is included in the downloadable repository.
Authors
Webcorpus Creator was written by Attila Zséder and Dániel Varga.
License
Webcorpus Creator is made available under the GNU Lesser General Public
License v3.0.
Reference
If you use the tool, please cite this paper.