Hi Pine,

TL;DR: best to just say it's the largest encyclopedia ever. That should be safe.

Claims like this are hard to make because terms that seem concrete from afar tend to break down up close. For example: What do you mean by "largest"?

Largest in bytes? Words? Content "units" (articles vs. manuscripts in this case, I guess)? Contributors?
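
A minimal Python sketch of why the choice of metric matters, assuming the dumps are MediaWiki XML exports with <page> and <text> elements. The function name measure_dump and the two tiny sample documents are hypothetical, made up for illustration; they are not real dump data.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    # Strip any XML namespace prefix such as "{http://...}page".
    return tag.rsplit("}", 1)[-1]


def measure_dump(source):
    """Count pages, words, and UTF-8 text bytes in a MediaWiki-style XML export.

    `source` is a binary file-like object. Hypothetical helper for illustration.
    """
    pages = words = text_bytes = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "text":
            body = elem.text or ""
            words += len(body.split())
            text_bytes += len(body.encode("utf-8"))
        elif name == "page":
            pages += 1
            elem.clear()  # release the finished page's children
    return {"pages": pages, "words": words, "text_bytes": text_bytes}


# Made-up sample data: project A has more pages, project B has more text.
SAMPLE_A = b"""<mediawiki>
  <page><title>A</title><revision><text>one two three</text></revision></page>
  <page><title>B</title><revision><text>four</text></revision></page>
</mediawiki>"""

SAMPLE_B = b"""<mediawiki>
  <page><title>Long work</title><revision><text>one two three four five six</text></revision></page>
</mediawiki>"""

stats_a = measure_dump(io.BytesIO(SAMPLE_A))
stats_b = measure_dump(io.BytesIO(SAMPLE_B))
print(stats_a)
print(stats_b)
```

In this toy case A is "largest" by page count while B is "largest" by words and bytes, so the answer depends entirely on the unit. For a real compressed dump you would presumably pass something like bz2.open(path, "rb") as the source, and you would still have to decide whether to count redirects, talk pages, and other non-content namespaces.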

What do you mean by "open text project"? Is archive.org an open text project? It has 8.2 million books. How would you compare the two? Does 1 book = 1 article?

Having said all that, I'm curious how others have crafted, or would craft, a claim like this. My guess is that most of us who've written for an academic audience have settled for some variant of "largest encyclopedia" (you've got to put something in your Introduction paragraph, after all). What sayst?

J

On Tue, Sep 15, 2015 at 4:45 PM, Pine W <wiki.pine@gmail.com> wrote:
Hi researchers,

I could use a little help with understanding these dumps:

https://dumps.wikimedia.org/enwikisource/latest/

https://dumps.wikimedia.org/enwiki/20150901/

I'm trying to verify the claim that ENWP is the world's largest open text project, and to do that I need to verify that ENWP is larger than English Wikisource. Which files should I be comparing?

Are there any other projects that could make a claim to be a larger open text project than ENWP? Perhaps there's a library somewhere that has such a huge volume of out-of-copyright materials that the combined bytes of published text are larger than ENWP?

Thanks!

Pine


_______________________________________________
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l

--
Jonathan T. Morgan
Senior Design Researcher
Wikimedia Foundation