On Tue, Jul 21, 2009 at 2:20 PM, Chengbin Zheng <chengbinzheng(a)gmail.com> wrote:
On Tue, Jul 21, 2009 at 1:49 PM, Chad <innocentkiller(a)gmail.com> wrote:
On Tue, Jul 21, 2009 at 1:42 PM, Tei <oscar.vives(a)gmail.com> wrote:
On Tue, Jul 21, 2009 at 7:17 PM, Chengbin Zheng <chengbinzheng(a)gmail.com> wrote:
...
>
> No, I know what parsing means. Even if it takes 2 days to parse them,
> wouldn't it be faster than to actually create a static HTML dump the
> traditional way?
>
> If it is not, then what is the difficulty of making static HTML dumps?
> It can't be bandwidth, storage, or speed.
Wikimedia works with limited resources: manpower, hardware, etc., etc.
Things get done when there are resources available, human and otherwise.
It's not only you; there are lots of people who want to download
Wikipedia (sometimes in a periodic fashion).
There is a log somewhere with the daily work of some Wikipedia admins:
http://wikitech.wikimedia.org/view/Server_admin_log
Some of the entries are even very funny, like:
02:11 b****: CPAN sux
01:47 d******: I FOUND HOW TO REVIVE APACHES
(names obscured to protect the innocent).
--
End of the message.
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Hehe, seeing as like there's only 10 different names on there, it's
pretty easy to figure out who B and D are ;-)
-Chad
I can't imagine needing to download Wikipedia often for personal use.
Given the amount of work (or should I say pain) involved in getting
Wikipedia working, umm, I don't want to do that often.
The only reason I'm doing it is that I want a copy of Wikipedia on the go.
Finding Wi-Fi hotspots is hard (especially in a subway, LOL). It can save me
time, as I can do research anytime I want, anywhere I want, for example in
the subway. I'm not downloading the current static HTML dump because:
1: It is very outdated.
2: It contains a LOT of useless information, hogging up half the space.
Space is a big priority, as the English Wikipedia is what, 300GB
uncompressed including the "junk". The next Archos PMP, releasing in
September, is said to have a 500GB hard drive, but I doubt it (though I
hope so), because I would need 500GB if I'm putting Wikipedia on it (my
videos are taking 220-ish GB already on my Archos 5). I'm seriously hoping
the next Archos supports NTFS (its compression feature cuts size by about
half). How hard is it to get Linux to support NTFS?
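As an aside, the "cuts size by about half" intuition is easy to sanity-check. NTFS uses its own LZ-based scheme, but any general-purpose compressor shows the same effect on repetitive markup. A rough illustration with Python's bz2 module and made-up sample text (not a measurement of the real dump):

```python
import bz2

# Hypothetical sample standing in for repetitive wiki HTML markup.
sample = ("<p>Wikipedia article text with repetitive markup</p>\n" * 1000).encode("utf-8")

compressed = bz2.compress(sample)
ratio = len(compressed) / len(sample)

print(f"original:   {len(sample)} bytes")
print(f"compressed: {len(compressed)} bytes")
print(f"ratio:      {ratio:.2%}")

# Highly repetitive text compresses to far less than half its size;
# real mixed content lands somewhere in between.
assert ratio < 0.5
```

Real HTML dumps won't compress this dramatically, but halving is plausible for text-heavy content.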
Why would you download Wikipedia? The Internet is so readily available, and
the online version has images.
I downloaded the static HTML dump for another language to do a MUCH, MUCH
smaller-scale test to see if it actually works. It works brilliantly. Even
the search function works!! I didn't expect that to work. How does the
search function work? I thought it would be like search in Windows, but
since everything online is in RAM, website searches are instantaneous. I'm
running this off a hard drive, and it is instantaneous as well.
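For what it's worth, an offline search over a static dump doesn't need the full MediaWiki stack: a prebuilt index of titles is enough for instant lookups, since scanning a small in-memory index is cheap regardless of where the HTML files live. A minimal sketch (the `pages` dict is a made-up stand-in for the dump's file list, not the dump's actual index format):

```python
# Minimal offline title search: build an index once, then each lookup
# is just a scan over short strings in memory, which is why it feels
# instantaneous even when the pages themselves sit on a hard drive.
pages = {
    "Subway": "Subway.html",
    "Subway (restaurant)": "Subway_(restaurant).html",
    "Rapid transit": "Rapid_transit.html",
}

def search(query, index):
    """Return file paths whose titles contain the query, case-insensitively."""
    q = query.lower()
    return [path for title, path in index.items() if q in title.lower()]

print(search("subway", pages))  # → ['Subway.html', 'Subway_(restaurant).html']
```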
BTW, does the pages-articles.xml.bz2 version of the XML dump include links
to images, even though the images themselves aren't there? I find those
pages take up a lot of space as well.
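On that note, if the dump does carry image links, they could in principle be stripped from the wikitext while processing it. A hedged sketch: the regex below only covers the simple `[[File:...]]` / `[[Image:...]]` form without nested links in captions, so a real cleanup would need a proper wikitext parser:

```python
import re

# Matches simple image links in wikitext. Captions that themselves
# contain [[...]] links would defeat this pattern, so this is only
# a sketch, not a complete solution.
IMAGE_LINK = re.compile(r"\[\[(?:File|Image):[^\[\]]*\]\]")

def strip_image_links(wikitext):
    """Remove simple image links from a wikitext string."""
    return IMAGE_LINK.sub("", wikitext)

text = "Intro. [[File:Example.jpg|thumb|caption]] More text [[Image:Foo.png]]."
print(strip_image_links(text))
```

For the full dump, the same filter could be applied line by line while streaming through `bz2.open()`, so the 300GB never has to be fully decompressed on disk.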