On Tue, Jul 21, 2009 at 2:20 PM, Chengbin Zheng <chengbinzheng(a)gmail.com> wrote:
On Tue, Jul 21, 2009 at 1:49 PM, Chad <innocentkiller(a)gmail.com> wrote:
On Tue, Jul 21, 2009 at 1:42 PM, Tei <oscar.vives(a)gmail.com> wrote:
On Tue, Jul 21, 2009 at 7:17 PM, Chengbin Zheng <chengbinzheng(a)gmail.com> wrote:
...
>
> No, I know what parsing means. Even if it takes 2 days to parse them,
> wouldn't it be faster than to actually create a static HTML dump the
> traditional way?
>
> If it is not, then what is the difficulty of making static HTML dumps?
> It can't be bandwidth, storage, or speed.
Wikimedia works with limited resources: manpower, hardware, etc., etc.
Things get done when there are resources available, human and otherwise.
It's not only you; there are lots of people who want to download
Wikipedia (sometimes in a periodic fashion).
There is a log somewhere with the daily work of some Wikipedia admins:
http://wikitech.wikimedia.org/view/Server_admin_log
Some of the entries are even very funny, like:
02:11 b****: CPAN sux
01:47 d******: I FOUND HOW TO REVIVE APACHES
(names obscured to protect the innocent).
--
End of the message.
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Hehe, seeing as like there's only 10 different names on there, it's
pretty easy to figure out who B and D are ;-)
-Chad
I can't imagine needing to download Wikipedia often for personal use.
Given the amount of work (or should I say pain) involved in getting
Wikipedia working, umm, I don't want to do that often.
The only reason I'm doing it is that I want a copy of Wikipedia on the go.
Finding Wi-Fi hotspots is hard (especially in a subway, LOL). It can save me
time, as I can do research anytime I want, anywhere I want, for example in
the subway. I'm not downloading the current static HTML dump because:
1: It is very outdated.
2: It contains a LOT of useless information, hogging up half the space.
Space is a big priority, as the English Wikipedia is what, 300GB
uncompressed including the "junk". The next Archos PMP, releasing in
September, is said to have a 500GB hard drive, but I doubt it (though I
hope so), because I would need 500GB if I'm putting Wikipedia on it (my
videos are taking 220-ish GB already on my Archos 5). I'm seriously hoping
the next Archos supports NTFS (its compression feature cuts size by about
half). How hard is it to get Linux to support NTFS?
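As an aside, the "cuts size by about half" intuition is easy to sanity-check. NTFS uses its own LZ-based scheme, but any general-purpose compressor shows the same effect on repetitive markup. A rough illustration with Python's bz2 module and made-up sample text (not a measurement of the real dump):

```python
import bz2

# Hypothetical sample standing in for repetitive wiki HTML markup.
sample = ("<p>Wikipedia article text with repetitive markup</p>\n" * 1000).encode("utf-8")

compressed = bz2.compress(sample)
ratio = len(compressed) / len(sample)

print(f"original:   {len(sample)} bytes")
print(f"compressed: {len(compressed)} bytes")
print(f"ratio:      {ratio:.2%}")

# Highly repetitive text compresses to far less than half its size;
# real mixed content lands somewhere in between.
assert ratio < 0.5
```

Real HTML dumps won't compress this dramatically, but halving is plausible for text-heavy content.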
Why would you download Wikipedia? The Internet is so readily available, and
the online version has images.
I downloaded the static HTML dump for another language to do a MUCH, MUCH
smaller-scale test to see if it actually works. It works brilliantly. Even
the search function works!! I didn't expect that to work. How does the
search function work? I thought it would be like search in Windows, but
since everything online is in RAM, website searches are instantaneous. I'm
running this off a hard drive, and it is instantaneous as well.
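For what it's worth, an offline search over a static dump doesn't need the full MediaWiki stack: a prebuilt index of titles is enough for instant lookups, since scanning a small in-memory index is cheap regardless of where the HTML files live. A minimal sketch (the `pages` dict is a made-up stand-in for the dump's file list, not the dump's actual index format):

```python
# Minimal offline title search: build an index once, then each lookup
# is just a scan over short strings in memory, which is why it feels
# instantaneous even when the pages themselves sit on a hard drive.
pages = {
    "Subway": "Subway.html",
    "Subway (restaurant)": "Subway_(restaurant).html",
    "Rapid transit": "Rapid_transit.html",
}

def search(query, index):
    """Return file paths whose titles contain the query, case-insensitively."""
    q = query.lower()
    return [path for title, path in index.items() if q in title.lower()]

print(search("subway", pages))  # → ['Subway.html', 'Subway_(restaurant).html']
```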
BTW, does the pages-articles.xml.bz2 version of the XML dump include links
to images, even though the images themselves aren't there? I find those
pages take up a lot of space as well.
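On that note, if the dump does carry image links, they could in principle be stripped from the wikitext while processing it. A hedged sketch: the regex below only covers the simple `[[File:...]]` / `[[Image:...]]` form without nested links in captions, so a real cleanup would need a proper wikitext parser:

```python
import re

# Matches simple image links in wikitext. Captions that themselves
# contain [[...]] links would defeat this pattern, so this is only
# a sketch, not a complete solution.
IMAGE_LINK = re.compile(r"\[\[(?:File|Image):[^\[\]]*\]\]")

def strip_image_links(wikitext):
    """Remove simple image links from a wikitext string."""
    return IMAGE_LINK.sub("", wikitext)

text = "Intro. [[File:Example.jpg|thumb|caption]] More text [[Image:Foo.png]]."
print(strip_image_links(text))
```

For the full dump, the same filter could be applied line by line while streaming through `bz2.open()`, so the 300GB never has to be fully decompressed on disk.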