[Mediawiki-api] Mirroring Wikipedia through the API?

Roan Kattouw roan.kattouw at home.nl
Sun Jul 13 19:56:40 UTC 2008


Dirk Riehle wrote:
> Hi,
>
> looking through the API: http://en.wikipedia.org/w/api.php I can't find 
> any way to get at the actual page contents. Is this correct?
>   
As someone else said before me, you can get the unparsed wikitext through 
prop=revisions&rvprop=content (together with titles=Foo to name the page). 
If you want the page HTML without the sidebar and the rest of the skin, 
you can get it with

index.php?action=render&title=Foo
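
To make the two options above concrete, here is a minimal Python sketch that only builds the request URLs. The /w/index.php path, the titles and format=json parameters, and the helper names are my additions for illustration; treat them as assumptions rather than part of the original advice.

```python
from urllib.parse import urlencode

# Assumed entry points for English Wikipedia.
API = "http://en.wikipedia.org/w/api.php"
INDEX = "http://en.wikipedia.org/w/index.php"

def wikitext_url(title):
    """Build an API URL asking for the unparsed wikitext of one page."""
    params = [
        ("action", "query"),
        ("prop", "revisions"),
        ("rvprop", "content"),
        ("titles", title),    # assumption: the page is selected with titles=
        ("format", "json"),   # assumption: JSON output is wanted
    ]
    return API + "?" + urlencode(params)

def render_url(title):
    """Build an index.php URL that returns the page HTML without the skin."""
    return INDEX + "?" + urlencode([("action", "render"), ("title", title)])

print(wikitext_url("Foo"))
print(render_url("Foo"))
```

Fetching either URL (for example with urllib.request.urlopen) is left out so the sketch runs without network access.
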
> Finally, and that's why I'm sending this email to this mailing list: How 
> does Powerset do this: Go to powerset.com, search for something you 
> might find in Wikipedia, and see how it provides an up-to-date 
> (click-through) copy of the Wikipedia page. My hunch is that they 
> use a database dump for search and then screen-scrape, or is there a 
> better explanation?
You can actually search Wikipedia through the API:

http://en.wikipedia.org/w/api.php?action=query&list=search&srsearch=Foo&srwhat=text&srlimit=5
(gets a list of 5 pages containing 'Foo')

http://en.wikipedia.org/w/api.php?action=query&generator=search&gsrsearch=Foo&gsrwhat=text&gsrlimit=5&prop=revisions&rvprop=content
(gets the contents of those pages)
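
The same two search requests can be assembled in code, which avoids slips with the & separators when typing long query strings by hand. This is a rough Python sketch; the function names and the default limit of 5 are hypothetical, and only the query parameters come from the URLs above.

```python
from urllib.parse import urlencode

API = "http://en.wikipedia.org/w/api.php"

def search_url(term, limit=5):
    """list=search: returns the titles of pages whose text matches `term`."""
    params = [
        ("action", "query"),
        ("list", "search"),
        ("srsearch", term),
        ("srwhat", "text"),
        ("srlimit", limit),
    ]
    return API + "?" + urlencode(params)

def search_content_url(term, limit=5):
    """generator=search: feeds the matching pages into prop=revisions.

    Generator parameters carry a 'g' prefix (gsrsearch, gsrwhat, gsrlimit),
    while the prop=revisions parameters keep their normal names.
    """
    params = [
        ("action", "query"),
        ("generator", "search"),
        ("gsrsearch", term),
        ("gsrwhat", "text"),
        ("gsrlimit", limit),
        ("prop", "revisions"),
        ("rvprop", "content"),
    ]
    return API + "?" + urlencode(params)

print(search_url("Foo"))
print(search_content_url("Foo"))
```
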

You can also get search suggestions in the OpenSearch format with 
http://en.wikipedia.org/w/api.php?action=opensearch&search=Te
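
As a sketch of how a client might consume this, the Python below builds the opensearch URL and pulls the suggestion list out of a response body. It assumes the response follows the OpenSearch Suggestions layout, a JSON array whose first element is the query and whose second element is the list of completions. The helper names and the sample body are made up for illustration and are not real output from Wikipedia.

```python
import json
from urllib.parse import urlencode

API = "http://en.wikipedia.org/w/api.php"

def opensearch_url(prefix):
    """Build the URL that asks for title suggestions starting with `prefix`."""
    return API + "?" + urlencode([("action", "opensearch"), ("search", prefix)])

def parse_suggestions(body):
    """Return the list of suggested titles from an OpenSearch JSON body.

    Assumes the layout [query, [suggestion, ...], ...]; any further
    elements in the array are ignored.
    """
    data = json.loads(body)
    return data[1]

# Illustrative body only, not a real response.
sample = '["Te", ["Tea", "Tennis"]]'
print(opensearch_url("Te"))
print(parse_suggestions(sample))
```
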

Roan Kattouw (Catrope)


