2009/1/30 Johannes Beigel johannes.beigel@pediapress.com:
On 29.01.2009, at 13:48, Brianna Laugher wrote:
On Wikimedia Commons a little bit of work has been done to this end: http://commons.wikimedia.org/wiki/Commons:Commons_API
We've been aware of this page and Magnus' implementation, and we think it looks really good!
The information is (AFAIK) scraped from the rendered XHTML of articles. This could be done in a less error-prone way (and more efficiently) if the data were stored in and accessed via a database in some way. Of course this would require some discussion, formal decisions and code changes. But as I stated in an earlier post: I think MediaWiki is so widely used by people who want to share and collaborate on free content that it's not too far-fetched to build some "license infrastructure" into the software itself.
I agree that it makes a lot of sense. But because it would be a big change, I fear that unless the lead developers show great enthusiasm for the idea, it will take a very long time to be accepted and completed. Whereas building an "add-on" tool can be faster to get to the point of functionality.
It may be a good idea to try to build the Commons API to mimic the MediaWiki API, on the assumption that in the future such information will be available there. Then people could use the Commons API for now and later switch to the MediaWiki API by changing only the API URL, with all their queries staying the same.
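Roughly what I have in mind, as a sketch (the toolserver URL and the "prop" name here are just illustrative assumptions, not the actual interfaces):

    # Rough sketch only: if the Commons API mimics the MediaWiki API's request
    # and response conventions, a client needs nothing more than a different
    # base URL once MediaWiki itself can serve the data. The interim endpoint
    # and the "prop" name below are assumptions for illustration.
    import json
    import urllib.parse
    import urllib.request

    COMMONS_API = "http://toolserver.org/~magnus/commonsapi.php"  # assumed interim tool
    MEDIAWIKI_API = "http://commons.wikimedia.org/w/api.php"      # eventual home

    def query(base_url, **params):
        params.setdefault("format", "json")
        url = base_url + "?" + urllib.parse.urlencode(params)
        return json.loads(urllib.request.urlopen(url).read().decode("utf-8"))

    # The very same call, pointed first at the interim tool, later at api.php:
    info = query(COMMONS_API, action="query", titles="File:Example.jpg",
                 prop="licenseinfo")  # hypothetical prop name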
How does that sound? Other ideas about how to approach it are welcome...
cheers Brianna
Brianna Laugher schrieb:
I agree that it makes a lot of sense. But because it would be a big change, I fear that unless the lead developers show great enthusiasm for the idea, it will take a very long time to be accepted and completed. Whereas building an "add-on" tool can be faster to get to the point of functionality.
Guys, before re-inventing several wheels, please look at what we already have.
Please have a look at http://commons.wikimedia.org/wiki/Commons:Tag_categories, which defines a way to make license tags machine readable. Using that scheme, it would be easy to build a script on the toolserver that delivers metadata in a machine-readable form. No need for screen scraping.
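As a rough sketch of what such a toolserver script could look like, using the category listing the standard API already provides (the category-to-license mapping below is only an example; the real one would follow the Tag_categories scheme):

    # Sketch: look up a file's categories through the MediaWiki API and match
    # them against known license categories. The mapping here is illustrative.
    import json
    import urllib.parse
    import urllib.request

    API = "http://commons.wikimedia.org/w/api.php"
    LICENSE_CATEGORIES = {  # example entries only
        "Category:CC-BY-SA-3.0": "Creative Commons Attribution-ShareAlike 3.0",
        "Category:GFDL": "GNU Free Documentation License",
    }

    def licenses_for(title):
        params = urllib.parse.urlencode({
            "action": "query",
            "titles": title,
            "prop": "categories",
            "cllimit": "max",
            "format": "json",
        })
        data = json.loads(urllib.request.urlopen(API + "?" + params).read().decode("utf-8"))
        page = next(iter(data["query"]["pages"].values()))
        cats = [c["title"] for c in page.get("categories", [])]
        return [LICENSE_CATEGORIES[c] for c in cats if c in LICENSE_CATEGORIES]

    print(licenses_for("File:Example.jpg"))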
Also, please consider http://www.mediawiki.org/wiki/Extension:RDF, which provides a way for MediaWiki to serve machine-readable metadata about anything and everything. It would be easy to integrate it into license tags. It has been around for years; all it needs is a little push from the community and some code review.
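Just to illustrate the kind of statements a license tag could then expose (the vocabulary and values here are made up for the example, not what the extension currently emits):

    # Illustration only: the sort of machine-readable triples a license tag
    # could publish as RDF. Vocabulary and values are assumptions for the example.
    from rdflib import Graph, Namespace, URIRef

    CC = Namespace("http://creativecommons.org/ns#")
    DC = Namespace("http://purl.org/dc/elements/1.1/")

    g = Graph()
    page = URIRef("http://commons.wikimedia.org/wiki/File:Example.jpg")
    g.add((page, CC["license"], URIRef("http://creativecommons.org/licenses/by-sa/3.0/")))
    g.add((page, DC["creator"], URIRef("http://commons.wikimedia.org/wiki/User:Example")))

    print(g.serialize(format="turtle"))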
-- daniel
On Fri, Jan 30, 2009 at 8:24 AM, Daniel Kinzler daniel@brightbyte.de wrote:
Brianna Laugher schrieb:
I agree that it makes a lot of sense. But because it would be a big change, I fear that unless the lead developers show great enthusiasm for the idea, it will take a very long time to be accepted and completed. Whereas building an "add-on" tool can be faster to get to the point of functionality.
Guys, before re-inventing several wheels, please look at what we already have.
Please have a look at http://commons.wikimedia.org/wiki/Commons:Tag_categories, which defines a way to make license tags machine readable. Using that scheme, it would be easy to build a script on the toolserver that delivers metadata in a machine-readable form. No need for screen scraping.
Yes, there is. Not for the license name (which I get using categories in my experimental API), but for things like the name of the author, etc. These are only available as either HTML tag IDs (which I use) or raw wikitext.
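For illustration, the scraping boils down to something like this (assuming the {{Information}} template marks the author cell with the id "fileinfotpl_aut" and that the value sits in the following table cell; the exact markup may change whenever the template does):

    # Sketch of the screen-scraping approach (fragile by nature, which is the point).
    # The HTML id and the surrounding table layout are assumptions about how the
    # Information template currently renders.
    import urllib.parse
    import urllib.request
    from bs4 import BeautifulSoup  # third-party: beautifulsoup4

    def scrape_author(title):
        url = "http://commons.wikimedia.org/wiki/" + urllib.parse.quote(title)
        html = urllib.request.urlopen(url).read()
        soup = BeautifulSoup(html, "html.parser")
        label_cell = soup.find(id="fileinfotpl_aut")
        if label_cell is None:
            return None
        value_cell = label_cell.find_next("td")
        return value_cell.get_text(" ", strip=True) if value_cell else None

    print(scrape_author("File:Example.jpg"))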
Magnus
On Fri, Jan 30, 2009 at 12:55 AM, Brianna Laugher brianna.laugher@gmail.com wrote:
2009/1/30 Johannes Beigel johannes.beigel@pediapress.com:
On 29.01.2009, at 13:48, Brianna Laugher wrote:
On Wikimedia Commons a little bit of work has been done to this end: http://commons.wikimedia.org/wiki/Commons:Commons_API
We've been aware of this page and Magnus' implementation, and we think it looks really good!
The information is (AFAIK) scraped from the rendered XHTML of articles. This could be done in a less error-prone way (and more efficiently) if the data were stored in and accessed via a database in some way. Of course this would require some discussion, formal decisions and code changes. But as I stated in an earlier post: I think MediaWiki is so widely used by people who want to share and collaborate on free content that it's not too far-fetched to build some "license infrastructure" into the software itself.
I agree that it makes a lot of sense. But because it would be a big change, I fear that unless the lead developers show great enthusiasm for the idea, it will take a very long time to be accepted and completed. Whereas building an "add-on" tool can be faster to get to the point of functionality.
It may be a good idea to try to build the Commons API to mimic the MediaWiki API, on the assumption that in the future such information will be available there. Then people could use the Commons API for now and later switch to the MediaWiki API by changing only the API URL, with all their queries staying the same.
There is a big conceptual difference between the two APIs, IMHO. The MediaWiki API can be used to query technically defined things: link lists, categories, template usage and so on. A Commons API (mine or someone else's) parses the content itself for data and relations that are not technically defined.
One way would be to add some kind of license metadata per page into the database. This is possible, but rather specific; also, it would likely mean creating a separate interface just for that.
The better way (IMHO) is to store all used "page:template:parameter:value" tuples in a wiki in a separate database table, which could be queried by the MediaWiki API. This has been suggested time and again by me and others. It would then be much easier for a third-party API to get the relevant data for a page. The functionality is part of Semantic Wikimedia, but would actually scale as a project on its own ;-)
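To make that concrete, a minimal sketch (the table and column names are made up; a real implementation would live in the MediaWiki schema and be filled at parse time):

    # Minimal sketch of the "page:template:parameter:value" idea, using SQLite
    # for illustration only. Table name, columns and the sample rows are made up.
    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("""
        CREATE TABLE template_params (
            page      TEXT,   -- e.g. 'File:Example.jpg'
            template  TEXT,   -- e.g. 'Information'
            param     TEXT,   -- e.g. 'author'
            value     TEXT    -- raw parameter value from the wikitext
        )
    """)
    db.executemany(
        "INSERT INTO template_params VALUES (?, ?, ?, ?)",
        [
            ("File:Example.jpg", "Information", "author", "[[User:Example|Example]]"),
            ("File:Example.jpg", "Information", "source", "own work"),
            ("File:Example.jpg", "Cc-by-sa-3.0", "1", ""),
        ],
    )

    # What a third-party API (or api.php itself) would then ask:
    rows = db.execute(
        "SELECT param, value FROM template_params WHERE page = ? AND template = ?",
        ("File:Example.jpg", "Information"),
    ).fetchall()
    print(rows)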
This approach would also allow for the integration of tools like TemplateTiger [1] directly into Wikipedia.
Magnus
[1] http://toolserver.org/~kolossos/templatetiger/tt-table4.php?template=Persond...
Magnus Manske schrieb:
The better way (IMHO) is to store all used "page:template:parameter:value" tuples in a wiki in a separate database table, which could be queried by the MediaWiki API. This has been suggested time and again by me and others. It would then be much easier for a third-party API to get the relevant data for a page. The functionality is part of Semantic Wikimedia, but would actually scale as a project on its own ;-)
Indeed. Here's my take on it: http://brightbyte.de/page/WikiData_light. I have proposed this as a project to the German chapter; maybe it'll actually be taken on...
No need for screen scraping.
Yes, there is. Not for the license name (which I get using categories in my experimental API), but for things like the name of the author, etc. These are only available as either HTML tag IDs (which I use) or raw wikitext.
Yes, you are right; I was only thinking of the meta-info about the licenses themselves. For authorship info, you'd need screen scraping -- or stored page:template:parameter:value tuples.
I REALLY want that. It would be extremely useful for a LOT of things. And it's not hard to do. CC-ing mediawiki-l.
-- daniel