Folks,
In conjunction with the Wizzy Digital Courier internet access project, http://wizzy.org.za/, I am considering installing Wikipedia.
Wizzy Digital Courier primarily enables Internet access for countries and communities that cannot afford daytime Internet access, due to telecom monopolies and per-minute local telephone charges.
In two pilot schools, I have installed Wikipedia on their intranet, with great success. After initial skepticism from teachers, it has become a very useful resource.
I install the MediaWiki tarball, and a weekly database snapshot.
Due to the time taken to download new snapshots, I only do these between semesters - it can take a week or so to download, using UUCP at cheap rates in the evenings and on weekends.
After feedback from teachers, I have two requests:
1. Graphics (you knew that was coming!)
To determine the feasibility of this, can you give me an idea of the size of a potential graphics archive? An incremental snapshot would almost be mandatory for this, I think.
2. A daily SQL update for the main page.
This will give it a newsy feel, instead of talking about Mandarin Chinese for two months :-) I know that some of the articles it refers to will not be available, but ... tough.
Thanks again for an invaluable resource that almost replaces the web at one fell swoop in places that cannot afford the web.
Cheers, Andy!
Andy Rabagliati wrote:
- Graphics (you knew that was coming!)
To determine the feasibility of this, can you give me an idea of the size of a potential graphics archive? An incremental snapshot would almost be mandatory for this, I think.
Well, the *German* images add up to ~1GB, so the English ones /might/ still fit on a DVD...
- A daily SQL update for the main page.
This will give it a newsy feel, instead of talking about Mandarin Chinese for two months :-) I know that some of the articles it refers to will not be available, but ... tough.
There are Wikipedia "offline readers" under construction - my "Waikiki", and a "WikiRover". Both use their own rendering software and an SQLite database, which opens the possibility of a daily sync with the Wikipedia database. Nothing definite so far.
Magnus
On Tue, 8 Jun 2004, Magnus Manske wrote:
Andy Rabagliati wrote:
- Graphics (you knew that was coming!)
To determine the feasibility of this, can you give me an idea of the size of a potential graphics archive? An incremental snapshot would almost be mandatory for this, I think.
Well, the *German* images add up to ~1GB, so the English ones /might/ still fit on a DVD...
They do. Images on en: add up to almost 2GB.
Alfio
On Tue, 8 Jun 2004 14:33:54 +0200, Andy Rabagliati andyr@wizzy.com wrote:
Folks,
In conjunction with the Wizzy Digital Courier internet access project, http://wizzy.org.za/, I am considering installing Wikipedia.
No, no, no, I think you're planning on installing MediaWiki, and setting it up with a *snapshot* of Wikipedia content. (let's play pedantic so no one gets confused, eh? :)
Hello Andy
Andy Rabagliati wrote:
- Graphics (you knew that was coming!)
To determine the feasibility of this, can you give me an idea of the size of a potential graphics archive? An incremental snapshot would almost be mandatory for this, I think.
We currently can't distribute the image archive because of copyright issues. Some images are made available under "fair use" or don't have any license information (so they might be (c)). The plan is to add a license field for images, which will allow us to make a 100% free image snapshot.
- A daily SQL update for the main page.
This will give it a newsy feel, instead of talking about Mandarin Chinese for two months :-) I know that some of the articles it refers to will not be available, but ... tough.
You can export the main page of en: by using http://en.wikipedia.org/wiki/Special:Export (you might need to be logged in). In the field, just request "Main Page". It will output an XML feed that can then be imported into your local wiki. For the import you can either use the Special:Import page (available to sysops only, and it might not be available in 1.2.x versions) or build a little parser that will read the XML and put it in the database.
cheers,
On Tue, 08 Jun 2004, Ashar Voultoiz wrote:
Hello Andy
Andy Rabagliati wrote:
- Graphics (you knew that was coming!)
We currently can't distribute the image archive because of copyright issues. Some images are made available under "fair use" or don't have any license information (so they might be (c)). The plan is to add a license field for images, which will allow us to make a 100% free image snapshot.
I have seen some of the discussion. I am just passing on the number one request!
- A daily SQL update for the main page.
You can export the main page of en: by using http://en.wikipedia.org/wiki/Special:Export (you might need to be logged in). In the field, just request "Main Page". It will output an XML feed that can then be imported into your local wiki. For the import you can either use the Special:Import page (available to sysops only, and it might not be available in 1.2.x versions) or build a little parser that will read the XML and put it in the database.
For this to work for me, it has to be fully automatable, over UUCP.
Is there a script that will take the XML export and incorporate it into the MySQL database so the MediaWiki front end will see it?
Then a cron job at my end pulls the XML and sends it to the school via UUCP, where it is fed on stdin to a script, and the wiki is magically updated every day.
Thanks for your solid suggestions.
Cheers, Andy!
Andy Rabagliati wrote: <snip xml export>
For this to work for me, it has to be fully automatable, over UUCP.
Is there a script that will take the XML export and incorporate it into the MySQL database so the MediaWiki front end will see it?
Then a cron job at my end pulls the XML and sends it to the school via UUCP, where it is fed on stdin to a script, and the wiki is magically updated every day.
Hello,
I just discovered that the Special:Export page accepts a page name as a parameter in the URL. The XML for en:Main_Page can be retrieved at http://en.wikipedia.org/wiki/Special:Export/Main_Page. That page's code doesn't change much anyway, so it would be better to fetch the subsections instead, like {{Did you know}} and {{In_the_news}}:
http://en.wikipedia.org/wiki/Special:Export/Template:Did_you_know http://en.wikipedia.org/wiki/Special:Export/Template:In_the_news
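(A minimal sketch of the fetch side, in Python, assuming the Special:Export URL form above: it pulls one export URL and digs the title and wikitext out of the XML. The namespace-stripping is defensive, since the export schema version has changed over time, and a real script may also need to set a User-Agent header.)

    # Sketch: fetch Special:Export XML for a page and extract title + wikitext.
    import urllib.request
    import xml.etree.ElementTree as ET

    EXPORT_URL = "http://en.wikipedia.org/wiki/Special:Export/"

    def local(tag):
        # Drop the XML namespace, if any, so schema version bumps don't break us.
        return tag.rsplit("}", 1)[-1]

    def fetch_export(page):
        with urllib.request.urlopen(EXPORT_URL + page.replace(" ", "_")) as f:
            return f.read()

    def parse_export(xml_bytes):
        pages = []
        for elem in ET.fromstring(xml_bytes).iter():
            if local(elem.tag) == "page":
                title, text = None, ""
                for child in elem.iter():
                    if local(child.tag) == "title":
                        title = child.text
                    elif local(child.tag) == "text":
                        text = child.text or ""
                pages.append((title, text))
        return pages

    for title, text in parse_export(fetch_export("Template:In_the_news")):
        print(title, "-", len(text), "bytes of wikitext")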
The Special:Import page isn't ready yet, although you can get its code through http://cvs.sourceforge.net/viewcvs.py/wikipedia/phase3/includes/SpecialImpor... and maybe help the developers get it working correctly :o) In your case, I think you should write a simple XML-parsing script that updates the database.
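(And a matching sketch of that "simple XML-parsing script" for the import side: it reads Special:Export XML on stdin - which suits Andy's UUCP pipeline - and overwrites the matching rows in the wiki's database. The `cur` table and columns are my assumption based on the MediaWiki 1.2-era schema; the connection settings and the namespace guess are purely illustrative, so check your own installation first.)

    # Sketch: read Special:Export XML on stdin and update MediaWiki's database.
    # Assumes the 1.2-era schema, where current page text lives in `cur`.
    import sys
    import xml.etree.ElementTree as ET
    import MySQLdb  # any MySQL DB-API module would do

    db = MySQLdb.connect(host="localhost", user="wikiuser",
                         passwd="secret", db="wikidb")  # illustrative settings

    def local(tag):
        return tag.rsplit("}", 1)[-1]  # strip the XML namespace, if any

    root = ET.fromstring(sys.stdin.buffer.read())
    c = db.cursor()
    for page in (e for e in root.iter() if local(e.tag) == "page"):
        title, text = None, ""
        for child in page.iter():
            if local(child.tag) == "title":
                title = child.text
            elif local(child.tag) == "text":
                text = child.text or ""
        if title is None:
            continue
        # Crude namespace guess - enough for Main Page and the two templates.
        prefix, _, rest = title.partition(":")
        ns, dbtitle = (10, rest) if prefix == "Template" else (0, title)
        c.execute("UPDATE cur SET cur_text = %s"
                  " WHERE cur_namespace = %s AND cur_title = %s",
                  (text, ns, dbtitle.replace(" ", "_")))
    db.commit()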
Andy Rabagliati wrote:
You can export the main page of en: by using http://en.wikipedia.org/wiki/Special:Export (you might need to be logged in). In the field, just request "Main Page". It will output an XML feed that can then be imported into your local wiki. For the import you can either use the Special:Import page (available to sysops only, and it might not be available in 1.2.x versions) or build a little parser that will read the XML and put it in the database.
For this to work for me, it has to be fully automatable, over UUCP.
Is there a script that will take the XML export and incorporate it into the MySQL database so the MediaWiki front end will see it?
Then a cron job at my end pulls the XML and sends it to the school via UUCP, where it is fed on stdin to a script, and the wiki is magically updated every day.
I am currently working on Window$ syncing software. It will be able to read the XML created by [[Special:Export]]. All I need on the Wikipedia end is a function that returns a plain-text list of all articles that have been edited since a given date. A single, simple SQL query on the Wikipedia server will do.
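(For concreteness, that query might look like the sketch below, run here through Python, with the 1.2-era assumption that `cur_timestamp` on the `cur` table records each page's last edit as YYYYMMDDHHMMSS; the connection details are illustrative.)

    # Sketch: list all articles edited since a given date, one title per line.
    import MySQLdb

    db = MySQLdb.connect(host="localhost", user="wikiuser",
                         passwd="secret", db="wikidb")  # illustrative settings
    c = db.cursor()
    c.execute("SELECT cur_title FROM cur"
              " WHERE cur_namespace = 0 AND cur_timestamp > %s"
              " ORDER BY cur_title",
              ("20040601000000",))  # everything edited since 1 June 2004
    for (title,) in c.fetchall():
        print(title)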
Magnus
From: "Andy Rabagliati" andyr@wizzy.com
Folks, [...] Due to the time taken to download new snapshots, I only do these between semesters - it can take a week or so to download, using UUCP at cheap rates in the evenings and on weekends.
- Graphics (you knew that was coming!)
Hi Andy, I work with three different school districts in SW New Hampshire, USA. We have the students using Wikipedia as a resource over different kinds of access. One school has an AOL dial-up account at 33.6k, another a frame relay connection, and one more is on local cable broadband. We've tried it all!
One outlying school cannot afford a live full-time connection, so we download whole websites and mirror them on the server when requested by teachers; the process runs at night. For this school I've brought a lot of information on CD-Rs, shared it out of the drive, and had the kids access the info. These are all just ways to make do until they get faster access in the area (which is very rural).
I was thinking that if you could use a backup sent to you on CD-R or DVD of whatever images or tarballs are needed, it would save you incredible amounts of download time, wouldn't it?
- A daily SQL update for the main page.
Even a weekend update would be quite recent, and something for the students and teachers to look forward to.
Thanks again for an invaluable resource that almost replaces the web at one fell swoop in places that cannot afford the web. Cheers, Andy!
I wholeheartedly agree, Andy. I feel that Wikipedia is an excellent resource for schools!
With regards, Jay B.
Jay Bowks wrote:
From: "Andy Rabagliati" andyr@wizzy.com
Folks, [...] Due to the time taken to download new snapshots, I only do these between semesters - it can take a week or so to download, using UUCP at cheap rates in evenings and weekends.
- Graphics (you knew that was coming !)
Hi Andy, I work with three different school districts in SW New Hampshire, USA. We have the students using Wikipedia as a resource with different access. One school has an AOL dial up account at 33.6k another a frame relay connection and one more on local cable broad band. We've tried it all!
One outlying school cannot afford a live fulltime connection and we download whole websites and mirror them on the server when requested by teachers, so the process is during nighttime. For this school I've brought a lot of information on CD-R's, shared it out of the drive and have the kids access the info. These are all just ways to make do, untill they get faster access in the area, (which is very rural).
I was thinking that if you could use a backup sent to you on CD-R or a DVD of whatever images or tarballs will be needed it would save you incredible amounts of download time, wouldn't it?
- A daily SQL update for the main page.
Even a weekend update would be quite recent and something to look forward to for the students and teachers.
Thanks again for an invaluable resource that almost replaces the web at one fell swoop in places that cannot afford the web. Cheers, Andy!
I wholeheartedly agree, Andy. I feel that Wikipedia is an excellent resource for schools!
Andy brings the problem down to earth. We can still find places with limited Internet access without having to look to Africa. The United States has some of the best colleges and universities in the world, but an OECD study a couple of years ago showed that education at the elementary and secondary levels is remarkable for its inconsistent quality from one district to another.
The poorer school districts need all the help they can get.
Ec
From: "Andy Rabagliati" andyr@wizzy.com
In conjunction with the Wizzy Digital Courier internet access project, http://wizzy.org.za/, I am considering installing Wikipedia.
Hmm, maybe they'll even consider a "Wiki Digital Courier", delivering the latest update weekly or daily, etc., but on CD-R, or on DVD for the larger wikis, as I mentioned in a previous response.
For very rural communities such as the ones that would use Wizzy, this would be a plus. Perhaps a DVD each semester and then CD-R updates to keep it in sync.
What do you think?
With regards, Jay B. [[User:ILVI]]
After feedback from teachers, I have two requests:
- Graphics (you knew that was coming!)
To determine the feasibility of this, can you give me an idea of the size of a potential graphics archive? An incremental snapshot would almost be mandatory for this, I think.
We've avoided doing this in the past for legal reasons, but I think with an appropriate disclaimer on the download page it shouldn't be a problem. To say it's worth navigating the legal hassles so that we can educate thousands of kids is in my opinion a vast understatement.
My suggestion for the text is:
Please note: these images are provided for use with a Wikipedia installation only, and may only be displayed within Wikipedia articles. Many images are not licensed under the GFDL and are only available under "fair use". Some images may be used freely; please check the associated image description pages. For more information see http://en.wikipedia.org/wiki/Fair_use
Since the images are already compressed, many with an LZW-type algorithm, compressing them further probably wouldn't gain much. I'll make an uncompressed tar file.
- A daily SQL update for the main page.
This will give it a newsy feel, instead of talking about Mandarin Chinese for two months :-) I know that some of the articles it refers to will not be available, but ... tough.
As Ashar Voultoiz pointed out, you can obtain the wikitext for any page using Special:Export. Perhaps someone could write a script to regularly download the wikitext and save it to a database.
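(A sketch of what that regular download could look like, assuming the Special:Export URL form from earlier in the thread; the page list and file naming are illustrative, and the dated files would then travel by UUCP to an import script like the one sketched above.)

    # Sketch: nightly cron job - save dated export XML for a few pages.
    import time
    import urllib.request

    EXPORT = "http://en.wikipedia.org/wiki/Special:Export/"
    PAGES = ["Main_Page", "Template:Did_you_know", "Template:In_the_news"]

    stamp = time.strftime("%Y%m%d")
    for page in PAGES:
        with urllib.request.urlopen(EXPORT + page) as f:
            data = f.read()
        name = "%s-%s.xml" % (stamp, page.replace(":", "_"))
        with open(name, "wb") as out:
            out.write(data)
        print("saved", name, len(data), "bytes")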
-- Tim Starling