Dirk Riehle wrote:
looking through the API:
http://en.wikipedia.org/w/api.php I can't find
any way to get at the actual page contents. Is this correct?
I'm not sure how to do this via the API (I believe there is a way,
though), but you can use action=raw on index.php, e.g. [1].
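As a sketch, building that action=raw URL for an arbitrary title looks like this in Python (the base URL and example titles here just mirror [1]; fetching the result would then be an ordinary HTTP GET, e.g. with urllib.request):

```python
from urllib.parse import urlencode

def raw_url(title, base="http://en.wikipedia.org/w/index.php"):
    """Return the index.php URL that serves a page's raw wikitext.

    The title is percent-encoded, so titles with special
    characters are handled correctly.
    """
    return base + "?" + urlencode({"action": "raw", "title": title})

print(raw_url("Main_Page"))
# http://en.wikipedia.org/w/index.php?action=raw&title=Main_Page
print(raw_url("C++"))
# http://en.wikipedia.org/w/index.php?action=raw&title=C%2B%2B
```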
I assume this is deliberate to avoid Wikipedia
mirroring or the like? I
remember discussions where WP mirroring was frowned upon. While I can
see that folks may not like it if someone uses Wikipedia to make money
through AdSense, I don't quite understand how you can prevent it (you
can't, given the GFDL, I think).
More importantly, I can think of many legitimate uses of Wikipedia where
someone wants to mirror it and enhance the functionality. I can envision
many users who would be better served by specific apps in front of
Wikipedia than by passive content added to WP the way bots do it. And
the WMF is unlikely to write all these apps, nor will it want to operate
them, I assume. How is that handled? Simply not allowed?
It is not permitted, as it puts stress on the database servers; you can
purchase a live feed though, see [2].
Finally, and that's why I'm sending this email
to this mailing list: how
does Powerset do this? Go to
powerset.com, search for something you
might find in Wikipedia, and see how it provides an up-to-date
(click-through) copy of the Wikipedia page. My hunch is that they
use a database dump for search and then screen-scrape, or is there a
better explanation?
One would assume they use a database dump, which contains the page text,
and simply parse the wikitext for information. When imported into a
wiki, the database dump can also be used for searching and other tasks
(nearly everything the API can do, and more), which they would be able
to do. It is also possible they are using a live feed to stay up
to date. Screen scraping is not necessary for this.
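To illustrate the dump-parsing idea, here is a minimal Python sketch that pulls (title, wikitext) pairs out of dump XML. The inline sample is a made-up stand-in for a real pages-articles dump; real dumps use the same <page>/<revision>/<text> structure (with an XML namespace in recent exports, which this sketch ignores for brevity):

```python
import xml.etree.ElementTree as ET

# Tiny stand-in for a real pages-articles dump file.
SAMPLE_DUMP = """<mediawiki>
  <page>
    <title>Example</title>
    <revision>
      <text>'''Example''' is a [[wikitext]] sample.</text>
    </revision>
  </page>
</mediawiki>"""

def pages(dump_xml):
    """Yield (title, wikitext) pairs from dump XML."""
    root = ET.fromstring(dump_xml)
    for page in root.iter("page"):
        title = page.findtext("title")
        text = page.findtext("revision/text")
        yield title, text

for title, text in pages(SAMPLE_DUMP):
    print(title, "->", text)
```

For a multi-gigabyte dump one would stream with ET.iterparse instead of loading the whole file, but the extraction logic stays the same.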
[1]
http://en.wikipedia.org/w/index.php?action=raw&title=Main_Page
[2]
http://meta.wikimedia.org/wiki/Wikimedia_update_feed_service