Hi there,
I read in an article that KDE and the foundation are announcing a knowledge-integrated desktop using a web-service API.
This is just great! And the idea of a SOAP/WSDL API is even better.
What is the status of this API, and what are the plans for it?
I would really like the Wikipedia websites to be accessible through an API.
I don't know, and couldn't find out, if there's already a development team etc. on this one, but it would be a great feature, and one I would also like to contribute to.
So could anyone tell me the story of the API ;-)
Cheers,
Michiel -- M.P.A. (Michiel) van Hulst e: michiel@vanhulst.nu w: http://michiel.vanhulst.nu
On 1/26/06, Michiel van Hulst michiel@vanhulst.nu wrote:
So could anyone tell me the story of the API ;-)
Of course I may not have all the facts of the matter, but this is basically what happened: on 2005-06-23, Jimbo Wales, on behalf of Wikimedia/Wikipedia, announced that an API (most likely SOAP) would be written to access Wikimedia content. As far as I know the foundation did not actually follow up on this; somebody obviously has to write it, and they didn't hire anyone to do that (yet?). Perhaps they were hoping that someone would do it for free once it was announced, but that obviously hasn't happened. Long story short: lots of talk but not much of anything else.
I actually started writing a SOAP API that used standard MediaWiki functions at one point. It worked for some limited things, like getting article text, but I didn't finish it because other things came up.
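A rough sketch of what such a call might have looked like from the client side; since the half-finished API was never deployed, the endpoint URL and the getArticleText method name here are purely hypothetical:

# Hypothetical SOAP 1.1 call for fetching article text; the endpoint
# and method name are made up for illustration.
import urllib2

SOAP_ENDPOINT = 'http://en.wikipedia.org/soap'  # hypothetical

envelope = """<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <getArticleText xmlns="urn:mediawiki-soap">
      <title>Iceland</title>
    </getArticleText>
  </soap:Body>
</soap:Envelope>"""

request = urllib2.Request(SOAP_ENDPOINT, envelope, {
    'Content-Type': 'text/xml; charset=utf-8',
    'SOAPAction': 'urn:mediawiki-soap#getArticleText',  # hypothetical
})
print urllib2.urlopen(request).read()  # response envelope with the wikitext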
There are several issues with implementing a robot API like that. One is that a lot of our logic is still tied to our current XHTML output code, which would have to be split off into a backend and presentation frontends. Another is that it's inherently hard to write some simple things, like getting the first paragraph of an article (or a summary), because we don't store those things relationally; and UAs having to implement their own parser for our syntax isn't really practical due to its complexity.
See the page on meta[1] and bug 208[2].
1. http://meta.wikimedia.org/wiki/KDE_and_Wikipedia
2. http://bugzilla.wikipedia.org/show_bug.cgi?id=208
Ævar, Brion,
Thanks for your replies,
I think an API would be in everybody's interest. I haven't looked at the complete data structure, so I can't predict how much effort it would take to code a first version of an API.
The basic functionality I would personally like to see in a first version of the API is to request a page on subject X and receive the article as clean content.
I'm going to look into the data sets and try to get a feeling for what should be done. If this isn't too scary (it probably is), I would like to contribute some of my time to set up or continue coding the work already started, or at least have a look at it.
It would be great to have this API, but like all other things in life, you need to set your priorities ;-)
Cheers,
Michiel
-- M.P.A. (Michiel) van Hulst e: michiel@vanhulst.nu w: http://michiel.vanhulst.nu
Hi all
One question: is SOAP really the best choice for Wikipedia here? As far as I know SOAP, responses can't be cached by Squid, and you normally have to use a big library to make the request and handle the response. Just think about millions of requests hitting the Wikimedia servers over SOAP... ;)
Why not think about REST, or put simply, XML over HTTP? You could use all the existing mechanisms for creating a web page, but produce XML instead of the page itself. It's not really harder to use than SOAP, but you don't need any additional knowledge or tools to handle it. And I think caching is very important for Wikimedia; to Squid, XML over HTTP is the same as HTML over HTTP.
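For what it's worth, Special:Export already comes close to this: page content as XML over a plain HTTP GET, which Squid can cache like any other URL. A rough sketch; the URL pattern is the existing export interface, the rest is illustrative:

import urllib2
from xml.dom import minidom

# Plain HTTP GET against the existing Special:Export interface; to Squid
# this is just another cacheable URL.
url = 'http://en.wikipedia.org/wiki/Special:Export/Iceland'
doc = minidom.parse(urllib2.urlopen(url))

# The export XML wraps the raw wikitext of the latest revision in <text>.
text = doc.getElementsByTagName('text')[0]
wikitext = ''.join(node.data for node in text.childNodes
                   if node.nodeType == node.TEXT_NODE)
print wikitext[:200]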
Does anyone have a link to the discussions where SOAP was chosen? Thx :)
Hi
Oh, I was too stupid to read the complete message and missed the links under the text, sorry :) I will read them now.
On Thursday 26 January 2006 21:04, Ævar Arnfjörð Bjarmason wrote:
There are several issues with implementing a robot API like that. One is that a lot of our logic is still tied to our current XHTML output code, which would have to be split off into a backend and presentation frontends.
I have spent quite a lot of thought on a machine-friendly API for MediaWiki. Naturally, at first I focused on getting some SOAP, XML-RPC or REST API going. Then I ran into the problem mentioned above: I would not like to implement an API without a cleanly separated backend to build on, and creating one would mean a lot of work.
The next thing I did was write an API using simple screen-scraping, which works surprisingly well. You can generate regexps from [[Special:Allmessages]] output and take the namespaces from [[Special:Export]]. Having used this API for some time now, I have noticed that the forms MediaWiki uses and the format of its XHTML seldom change, and the XHTML tags carry most of the classes and ids you need to find the data you want inside pages.
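A rough sketch of the scraping approach, assuming the MonoBook skin's markup, where the article body sits in a div with id "bodyContent" followed by a div with class "printfooter"; other skins and versions may need different anchors:

import re
import urllib2

html = urllib2.urlopen('http://en.wikipedia.org/wiki/Iceland').read()

# Anchor on ids/classes the skin emits. 'bodyContent' and 'printfooter'
# are MonoBook conventions; a robust client would use a real HTML parser
# instead of a regexp.
match = re.search(r'<div id="bodyContent">(.*?)<div class="printfooter">',
                  html, re.DOTALL)
if match:
    print match.group(1)[:200]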
In a little brainstorming with [[meta:User:Duesentrieb]], the idea of a bot language came up, which would make a screen-scraping API quite stable. Seen from the "we need a properly separated backend layer, then we build an API on top of it" perspective this is ugly, and it does not go far enough for the client part of the API. Besides, Brion hates it, because it's not the right way to do it.
After that, I began to see MediaWiki as a somewhat convoluted server for an Ajax application I was building on top of it. Again, it's surprising how well such a thing works.
Then it came to me: we nearly have a usable REST-style API.
There are three parts to an API. The first is a path for data to get into the server. For this purpose I think the forms are usable, because they are stable enough; if the forms change, the API changes, yes, but when they are changed there is a reason for it, and in many cases that reason would make a change in the API necessary anyway. (A sketch of a form-driven client follows below.)
The second part is getting data out of the server: it has to be marshalled in some way the client can understand. The XHTML output our skins produce is not perfect, but it's possible to write a skin that generates well-structured XHTML with more ids and classes, so that the output can be used anywhere XML can be parsed and queried.
The third part is the application logic that works with the incoming data and produces the outgoing data. This part we get for free: it's already there, it's properly tested, and I would not want to mess with it.
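Here is the promised sketch of the first part, driving the standard edit form from a client. The wp* field names below are the ones the stock edit form uses as of this writing, but a robust client would GET the form first and echo back whatever fields it actually finds, including the edit token; the host is a placeholder:

import urllib
import urllib2

# POST to the same URL the edit form targets. Treat the wp* field names
# as scraped from the live form rather than hard-coded; they are whatever
# the running MediaWiki version emits.
url = 'http://wiki.example.org/index.php?title=Sandbox&action=submit'
data = urllib.urlencode({
    'wpTextbox1':  'New page text.',
    'wpSummary':   'testing the form-driven API idea',
    'wpSave':      'Save page',
    'wpEditToken': 'token-scraped-from-the-edit-form',  # placeholder
})
print urllib2.urlopen(urllib2.Request(url, data)).read()[:200]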
daniel
Michiel van Hulst wrote:
I read in an article that KDE and the foundation are announcing a knowledge-integrated desktop using a web-service API.
This is just great! And the idea of a SOAP/WSDL API is even better.
What is the status of this API, and what are the plans for it?
We chatted a bit in August and not much has happened since. A bunch of other things that are really annoying but have had more immediate priority (like making the servers work regularly ;) keep coming up.
-- brion vibber (brion @ pobox.com)