There is no 10 GB limit, but according to my recent discussion with the
archive.org team, who have been helping me optimize the storage, it is
the recommended bucket size if you want to split up the file.
My idea is to make smaller blocks that can be fetched quickly, so that
someone reading an article, for example, loads only the data needed to
display it. That data would be available via JSON(P) or XML/text from
a file.
We can host Wikipedia in read-only mode entirely on archive.org,
without a database server, by encoding the binary search trees as JSON
data that is also stored on archive.org; the clients can then perform
the searches themselves.
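A minimal sketch of the client-side search idea, not the actual fosm.org
code. It assumes each tree node is stored as its own small JSON document;
the node file names, the block ids and the `fetch` callback are all
hypothetical, and a plain dict stands in for the static files on
archive.org.

```python
import json


def build_tree(items, store, prefix="node"):
    """Build a balanced binary search tree from sorted (key, value) pairs.

    Every node is serialized as a separate JSON document in `store`, a dict
    that stands in for static files on a server (an assumption made for
    this sketch). Returns the id of the root node.
    """
    def build(lo, hi):
        if lo >= hi:
            return None
        mid = (lo + hi) // 2
        key, value = items[mid]
        node_id = f"{prefix}-{mid}.json"  # hypothetical file name
        store[node_id] = json.dumps({
            "key": key,
            "value": value,
            "left": build(lo, mid),
            "right": build(mid + 1, hi),
        })
        return node_id

    return build(0, len(items))


def lookup(key, root_id, fetch):
    """Walk the tree on the client, fetching one JSON node per step.

    Returns (value, number_of_nodes_fetched); value is None if not found.
    """
    node_id = root_id
    fetched = 0
    while node_id is not None:
        node = json.loads(fetch(node_id))
        fetched += 1
        if key == node["key"]:
            return node["value"], fetched
        node_id = node["left"] if key < node["key"] else node["right"]
    return None, fetched


# Hypothetical index: article title -> id of the data block holding it.
articles = sorted([
    ("Albania", "block-0003"),
    ("Berlin", "block-0009"),
    ("Kosovo", "block-0041"),
    ("Prishtina", "block-0077"),
    ("Zagreb", "block-0102"),
])
store = {}
root = build_tree(articles, store)
value, fetched = lookup("Zagreb", root, store.__getitem__)
```

In a real client, `fetch` would be an HTTP GET for the node file, so a
lookup costs roughly log2(n) small requests and needs no database server.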
That is my current research on fosm.org, and I hope it can apply to
Wikipedia as well.
mike
On Fri, May 18, 2012 at 9:41 AM, emijrp <emijrp(a)gmail.com> wrote:
There is no such 10GB limit,
http://archive.org/details/ARCHIVETEAM-YV-6360017-6399947 (238 GB example)
ArchiveTeam/WikiTeam is uploading some dumps to the Internet Archive. If you
want to join the effort, use the mailing list
https://groups.google.com/group/wikiteam-discuss to avoid wasting resources.
2012/5/18 Mike Dupont <jamesmikedupont(a)googlemail.com>
Hello People,
I have finished uploading my first set of the osm/fosm dataset (350 GB
unpacked) to archive.org:
http://osmopenlayers.blogspot.de/2012/05/upload-finished.html
We can do something similar with Wikipedia. The bucket size of
archive.org is 10 GB, so we need to split up the data in a way that is
useful. I have done this by putting each object on one line; each
file contains its full data records plus the parts that belong to the
previous and next blocks, so you can process the blocks almost
standalone.
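A minimal sketch of one way to read that splitting scheme, not the script
actually used for the upload. It assumes one record per line, counts
records rather than bytes, and uses hypothetical names (`split_into_blocks`,
`block_size`, `overlap`); a real split would cut at a byte budget instead.

```python
def split_into_blocks(lines, block_size, overlap):
    """Split one-record-per-line data into blocks with neighbour context.

    Each block carries its own records plus the last `overlap` records of
    the previous block and the first `overlap` records of the next one, so
    it can be processed almost on its own.
    """
    blocks = []
    for start in range(0, len(lines), block_size):
        end = min(start + block_size, len(lines))
        blocks.append({
            "prev": lines[max(0, start - overlap):start],
            "records": lines[start:end],
            "next": lines[end:end + overlap],
        })
    return blocks


# Seven toy records, three per block, one record of context on each side.
records = [f"object {i}" for i in range(7)]
blocks = split_into_blocks(records, block_size=3, overlap=1)
```

Concatenating the `records` field of every block gives back the original
data; the `prev` and `next` fields are redundant copies kept only for
context.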
mike
_______________________________________________
Wikimedia-l mailing list
Wikimedia-l(a)lists.wikimedia.org
Unsubscribe:
https://lists.wikimedia.org/mailman/listinfo/wikimedia-l
--
Emilio J. Rodríguez-Posada. E-mail: emijrp AT gmail DOT com
Pre-doctoral student at the University of Cádiz (Spain)
Projects: AVBOT <http://code.google.com/p/avbot/> |
StatMediaWiki<http://statmediawiki.forja.rediris.es>
| WikiEvidens <http://code.google.com/p/wikievidens/> |
WikiPapers<http://wikipapers.referata.com>
| WikiTeam <http://code.google.com/p/wikiteam/>
Personal website:
https://sites.google.com/site/emijrp/
--
James Michael DuPont
Member of Free Libre Open Source Software Kosova
http://flossk.org
Contributor FOSM, the CC-BY-SA map of the world
http://fosm.org
Mozilla Rep
https://reps.mozilla.org/u/h4ck3rm1k3