Hi list,
so I need to set up a local instance of the dewiki- and enwiki-DB with all revisions.. :-D
I know it's rather a mammoth project so I was wondering if somebody could give me some pointers?
First of all, I would need to know what kind of hardware I should get. Is it possible/smart to have it all in two ginormous MySQL instances (one for each of the languages), or will I need to do sharding?
I don't need it to run smoothly. I only need to be able to query the database (and I know some of these queries can run for days).
I will probably have access to some rather powerful machines here at the university and I have also quite a few workstation-machines on which I could theoretically do the sharding.
Thanks in advance,
Andreas
PS: If it helps: I'm living in Berlin and I will gladly also just have a face-to-face meeting with anybody willing to share wisdom :)
On 03/05/2013 02:54 AM, Andreas Nüßlein wrote:
Hi list,
so I need to set up a local instance of the dewiki- and enwiki-DB with all revisions.. :-D
Just in case: http://meta.wikimedia.org/wiki/Mirroring_Wikimedia_project_XML_dumps
Also, you might want to ask / discuss at
https://lists.wikimedia.org/mailman/listinfo/offline-l
Good luck with this interesting project!
Hi,
You might also try the following mailing list:
- XML Data Dumps mailing list: https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l
Here is some info on importing XML dumps (not sure what tools work well, but the mailing list can probably help with that): http://meta.wikimedia.org/wiki/Data_dumps/Tools_for_importing
Also, Ariel Glenn recently announced two new tools for importing dumps on the XML list: http://lists.wikimedia.org/pipermail/xmldatadumps-l/2013-February/000701.htm...
Mariya
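For anyone following the import pointers above: before loading anything into MySQL, it can help to stream through a dump once and count pages and revisions, just to get a feel for the size of the job. Below is a minimal sketch in Python using only the standard library. It is not one of the tools linked above; the function names, the tiny sample document and the namespace URI in it are made up for illustration, and the element names (page, revision) are assumed to match the XML export format.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def count_pages_and_revisions(stream):
    """Stream through a MediaWiki-style XML export and count pages and revisions.

    Elements are cleared as soon as they are complete, so revision text is
    never kept in memory for longer than one revision at a time.
    """
    pages = 0
    revisions = 0
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local_name(elem.tag)
        if name == "revision":
            revisions += 1
            elem.clear()  # drop the revision text and child elements
        elif name == "page":
            pages += 1
            elem.clear()  # drop the (already emptied) revisions of this page
    return pages, revisions


# Tiny made-up sample; the namespace URI here is only illustrative.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.8/">
  <page>
    <title>Berlin</title>
    <revision><id>1</id><text>first</text></revision>
    <revision><id>2</id><text>second</text></revision>
  </page>
  <page>
    <title>Hamburg</title>
    <revision><id>3</id><text>only</text></revision>
  </page>
</mediawiki>"""

if __name__ == "__main__":
    print(count_pages_and_revisions(io.BytesIO(SAMPLE)))
```

For a real dump, the same function should accept a file object opened with bz2.open(path, "rb"), where path is wherever the downloaded file lives. Note that the cleared page elements still stay attached to the root element, so memory use grows slowly with the number of pages; for a full-history dump that would need a closer look.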
On Tue, Mar 5, 2013 at 4:15 PM, Quim Gil qgil@wikimedia.org wrote:
On 03/05/2013 02:54 AM, Andreas Nüßlein wrote:
Hi list,
so I need to set up a local instance of the dewiki- and enwiki-DB with all revisions.. :-D
Just in case: http://meta.wikimedia.org/wiki/Mirroring_Wikimedia_project_XML_dumps
Also, you might want to ask / discuss at
https://lists.wikimedia.org/mailman/listinfo/offline-l
Good luck with this interesting project!
--
Quim Gil
Technical Contributor Coordinator @ Wikimedia Foundation
http://www.mediawiki.org/wiki/User:Qgil
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Hey Quim, hey Maria,
thank you for your replies! I actually knew where to find the XML-dumps but that pointer about the new XML-import tools is really helpful.
So eventually, I was able to acquire an 8-core Xeon with 32 GB RAM and 6 TB of SAS storage to start my experiments on :) Let's see what this baby can do: http://i.imgur.com/J47GJ.gif
Thanks again,
Andreas
On Tue, Mar 5, 2013 at 3:33 PM, Maria Miteva mariya.miteva@gmail.com wrote:
Hi,
You might also try the following mailing list:
- XML Data Dumps mailing list: https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l
Here is some info on importing XML dumps (not sure what tools work well but probably the mailing list can help with that) http://meta.wikimedia.org/wiki/Data_dumps/Tools_for_importing
Also, Ariel Glenn recently announced two new tools for importing dumps on the XML list:
http://lists.wikimedia.org/pipermail/xmldatadumps-l/2013-February/000701.htm...
Mariya
Andreas Nüßlein wrote:
so I need to set up a local instance of the dewiki- and enwiki-DB with all revisions.. :-D
I know it's rather a mammoth project so I was wondering if somebody could give me some pointers?
First of all, I would need to know what kind of hardware I should get. Is it possible/smart to have it all in two ginormous MySQL instances (one for each of the languages), or will I need to do sharding?
I don't need it to run smoothly. I only need to be able to query the database (and I know some of these queries can run for days).
I will probably have access to some rather powerful machines here at the university and I have also quite a few workstation-machines on which I could theoretically do the sharding.
Ryan L. or Marc P.: I routed Andreas to this list (from #wikimedia-toolserver), as I figured these questions related to the work that you all have been doing for Wikimedia Labs. Or at least I figured you all probably had some kind of formula for hardware provisioning that might be reusable here. Any pointers would be great. :-)
MZMcBride