The main limitation is that MongoDB has only rudimentary support for
parallelism. I'm trying to design a system that various departments
can use as a data source, and the statistics on the Editor Trends page
show MongoDB maxed out for days dumping en.wiki. I'd like more headroom
to grow capacity, especially long-term.
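The bridge idea under discussion can be sketched roughly as follows. This is a minimal, hypothetical illustration, not anything built yet: it assumes a Cassandra-style data model (row key = page_id, column key = rev_id) and mocks the store with an in-memory dict so the sync logic runs without a cluster.

```python
from collections import defaultdict

class RevisionStore:
    """Stand-in for a Cassandra column family: row key = page_id,
    column key = rev_id, value = revision metadata."""
    def __init__(self):
        self.rows = defaultdict(dict)

    def upsert(self, page_id, rev_id, meta):
        # Writes are idempotent, so replaying part of the MediaWiki
        # revision stream after a failure is safe.
        self.rows[page_id][rev_id] = meta

    def history(self, page_id):
        # Revisions come back ordered by rev_id, analogous to a
        # Cassandra slice query over one row.
        return sorted(self.rows[page_id].items())

def sync(store, revision_stream):
    """Push a stream of MediaWiki revision records into the store."""
    for rev in revision_stream:
        store.upsert(rev["page_id"], rev["rev_id"],
                     {"user": rev["user"], "timestamp": rev["timestamp"]})

# Example: replaying an overlapping batch leaves one copy per revision.
batch = [
    {"page_id": 1, "rev_id": 10, "user": "A", "timestamp": "2011-02-13T15:32Z"},
    {"page_id": 1, "rev_id": 11, "user": "B", "timestamp": "2011-02-13T15:43Z"},
]
store = RevisionStore()
sync(store, batch)
sync(store, batch)  # replay is a no-op
```

Because writes are keyed by (page_id, rev_id), the bridge can resume or re-run without deduplication logic, which also makes it a clean source for regenerating XML dumps.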
On Sun, Feb 13, 2011 at 15:43, Steven Walling <steven.walling(a)gmail.com> wrote:
> On Sun, Feb 13, 2011 at 3:32 PM, David Strauss <david(a)davidstrauss.net> wrote:
>> Edit history in an accessible form -- create a queryable NoSQL form of
>> data dumps
>> I'd like to get this started ASAP. I think we can set up a bridge to
>> synchronize directly from MediaWiki to a tool like Cassandra. It will
>> provide a superior source for both XML dumps and analysis.
> See http://strategy.wikimedia.org/wiki/Editor_Trends_Study/Software for an
> already ongoing project very similar to this notion.
_______________________________________________
Wiki-research-l mailing list
Wiki-research-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
--
David Strauss
| david(a)davidstrauss.net
| +1 512 577 5827 [mobile]