I hadn't added archive.org because it requires us to upload the files, and
the files cannot be deleted afterwards, so uploading the latest dumps
(updated every month or so) could be a waste of resources for them. I was
thinking of mirrors that run wget to slurp all the files from
download.wikimedia.org and then delete the previous ones the following month.
Of course, we can contact archive.org and ask them about the wget idea.
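For the "delete the previous ones" half of the wget idea, a mirror operator could run something like the rough Python sketch below after each fetch. It assumes each wiki has its own local directory containing one subdirectory per dump, named by date as YYYYMMDD; that layout, the function name prune_old_dumps and the keep parameter are my assumptions, not anything agreed on this list.

```python
import re
import shutil
from pathlib import Path

# Assumption: dump directories are named by date, e.g. 20101116.
DATE_DIR = re.compile(r"^\d{8}$")


def prune_old_dumps(wiki_dir, keep=1):
    """Delete all but the `keep` newest dated subdirectories of wiki_dir.

    Returns the names of the directories that were removed, oldest first.
    Entries that do not look like YYYYMMDD are never touched.
    """
    wiki_dir = Path(wiki_dir)
    dated = sorted(
        d.name for d in wiki_dir.iterdir()
        if d.is_dir() and DATE_DIR.match(d.name)
    )
    # YYYYMMDD names sort chronologically, so the newest are at the end.
    stale = dated[:-keep] if keep > 0 else dated
    for name in stale:
        shutil.rmtree(wiki_dir / name)
    return stale
```

Running it with keep=2 instead would leave the previous month in place until the new dump has been fetched and checked, at the cost of more disk space.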
2010/11/16 paolo massa <paolo(a)gnuband.org>
> I've added archive.org ;)
> On Tue, Nov 16, 2010 at 12:05 PM, emijrp <emijrp(a)gmail.com> wrote:
> > Hi all;
> > I have started a new page in meta: for coordinating the efforts in mirroring
> > Wikimedia project XML dumps. I asked iBiblio some days ago whether they were
> > interested in this, but they replied: "Unfortunately, we do not have the
> > resources to provide a mirror of wikipedia. Best of luck!"
> > I think that we must work on this, so all help is welcome. If you know
> > of universities, archives, etc. that could be interested in getting a copy of
> > the XML files, for backup or research purposes, please add them to the list
> > and we can send them a letter.
> > We are compiling all human knowledge! That deserves to be mirrored ad
> > nauseam!
> > Regards,
> > emijrp
> > 
> > _______________________________________________
> > Xmldatadumps-l mailing list
> > Xmldatadumps-l(a)lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l
> Paolo Massa
> Email: paolo AT gnuband DOT org
> Blog: http://gnuband.org
I have started a new page in meta: for coordinating the efforts in mirroring
Wikimedia project XML dumps. I asked iBiblio some days ago whether they were
interested in this, but they replied: "Unfortunately, we do not have the
resources to provide a mirror of wikipedia. Best of luck!"
I think that we must work on this, so all help is welcome. If you know
of universities, archives, etc. that could be interested in getting a copy of
the XML files, for backup or research purposes, please add them to the list
and we can send them a letter.
We are compiling all human knowledge! That deserves to be mirrored ad
nauseam!
I was thinking of the last 3-4 months, so July to October 2010.
On 2010-11-15, at 10:30 AM, emijrp wrote:
> Define recent.
> 2010/11/12 Diederik van Liere <dvanliere(a)gmail.com>
> As we all know, download.wikimedia.org is temporarily offline. Does
> somebody have a recent stub-meta-history.xml available (any language
> is okay)?
> Best regards,
I'm going to construct an ontology database from Wikipedia. What I
want to do is import the data dumps into a database and then extract knowledge
from the database. But I have found a problem. As you know, Chinese comes in
Simplified Chinese and Traditional Chinese. When I check the data in the
dumps, I find Simplified Chinese and Traditional Chinese mixed
together. I don't know how to convert Traditional Chinese to Simplified
Chinese. Is it possible to use the data dumps to construct my ontology?
The dump I downloaded is "zhwiki-20101014".
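One way to picture the conversion step is a character-by-character lookup table applied to the text before it goes into the database. The Python sketch below is only a toy: the four entries in TRAD_TO_SIMP and the function name to_simplified are made up for illustration, and a usable converter needs a table with thousands of characters plus phrase-level rules, since some Traditional characters map differently depending on the word they appear in. A dedicated conversion library such as OpenCC is probably a better starting point than a hand-made table.

```python
# Toy table: these four pairs are only for illustration. A real
# conversion needs a complete table and phrase-level rules.
TRAD_TO_SIMP = str.maketrans({
    "國": "国",
    "學": "学",
    "體": "体",
    "語": "语",
})


def to_simplified(text):
    """Replace each Traditional character found in the table.

    Characters that are not in the table (including characters that are
    the same in both scripts) are left unchanged.
    """
    return text.translate(TRAD_TO_SIMP)


print(to_simplified("國語"))
```

Because characters outside the table pass through untouched, running this over text that is already Simplified does no harm, which matters when both scripts are mixed in the same dump.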
We noticed a kernel panic message and stack trace in the logs on the
server that serves XML dumps. The web server that provides access to
these files is temporarily out of commission; we hope to have it back
online in 12 hours or less. Dumps themselves have been suspended while we
investigate. I hope to have an update on this tomorrow as well.
I have downloaded the "Database backup dumps" of the Chinese edition. There are
files in XML and SQL format. I want to have all the data in a database such as
MySQL. Can I load this data (especially the XML format) into a MySQL database
without using MediaWiki? If so, how?
Where can I find the format details of each dump? I have read the
contents of "zhwiki-20101014-pages-articles.xml", but Chinese has two
written forms: "Simplified Chinese" and "Traditional Chinese". Both forms are
mixed together in the file "zhwiki-20101014-pages-articles.xml". I don't know how
to get rid of this mixture.
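Reading the XML without MediaWiki can be sketched with Python's standard library alone. The example below streams through the file and yields one (title, text) pair per page, so the whole dump never has to fit in memory. The names local, iter_pages and SAMPLE are mine, and the tiny sample document, including its namespace URI, is only a stand-in for a real pages-articles file; the real schema has more elements than the three handled here.

```python
import io
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream):
    """Yield (title, text) for each <page> element in the stream."""
    title, text = None, None
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, None
            elem.clear()  # drop the finished page's subtree to save memory


# Illustrative stand-in for a dump file; not real dump content.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.4/">
  <page><title>A</title><revision><text>first</text></revision></page>
  <page><title>B</title><revision><text>second</text></revision></page>
</mediawiki>"""

for title, text in iter_pages(io.BytesIO(SAMPLE)):
    print(title, len(text))
```

For a real dump, the stream would be the opened (and decompressed) file, and each pair could be handed to whatever database client is in use, ideally in batches rather than one INSERT per page.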
And much love
Are there statistics about how many people download the dumps? Not only the
hits but also the completed downloads (is that possible?). If not, the wasted
bandwidth would be a good measure.
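To make the three measures concrete, here is a rough Python sketch of how they could be computed if the server logs give, for each request, the file path and the number of bytes actually sent. The input shape, the function name download_stats and the rule "completed means bytes sent is at least the file size" are all my assumptions; I don't know what the real logs on download.wikimedia.org contain.

```python
from collections import defaultdict


def download_stats(requests, file_sizes):
    """Summarise hits, completed downloads and wasted bytes per file.

    requests   -- iterable of (path, bytes_sent) pairs, one per request
    file_sizes -- dict mapping path -> full size of the file in bytes
    """
    stats = defaultdict(
        lambda: {"hits": 0, "completed": 0, "wasted_bytes": 0}
    )
    for path, sent in requests:
        size = file_sizes.get(path)
        if size is None:
            continue  # not a file we have a size for, skip it
        entry = stats[path]
        entry["hits"] += 1
        if sent >= size:
            entry["completed"] += 1
        else:
            # Partial transfer: count what was sent as wasted bandwidth.
            entry["wasted_bytes"] += sent
    return dict(stats)
```

One caveat: a download that is interrupted and later resumed shows up as several partial requests, so this simple rule would count it as wasted even though the user ended up with the whole file.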