Hi researchers,
I could use a little help understanding these dumps:
https://dumps.wikimedia.org/enwikisource/latest/
https://dumps.wikimedia.org/enwiki/20150901/
I'm trying to verify the claim that English Wikipedia (ENWP) is the world's
largest open text project. As a first step, I need to confirm that ENWP is
larger than English Wikisource. Which files should I be comparing?
Are there any other projects that could make a claim to be a larger open
text project than ENWP? Perhaps there's a library somewhere that has such a
huge volume of out-of-copyright materials that the combined bytes of
published text are larger than ENWP?
Thanks!
Pine