Webcorpus Creator

This project is a collection of scripts and programs for creating a webcorpus from crawled data. The input is data extracted by the Wire crawler; the output is a text file containing raw text, with separators between documents.

Download and documentation

The Webcorpus Creator is hosted on GitHub. Documentation is included in the downloadable repository.

Authors

Webcorpus Creator was written by Attila Zséder and Dániel Varga.

License

Webcorpus Creator is made available under the GNU Lesser General Public License v3.0.

Reference

If you use the tool, please cite this paper.