Hi mediawiki developers,
We (Google) are trying to keep our internal mirror of wikidata.org up to date in real time, so that our indexing pipeline can get the latest interlanguage information as it changes.
I noticed wikidata.org is also powered by the standard MediaWiki software, so the standard query API for a specific revision works, e.g. a revision query: http://www.wikidata.org/w/api.php?action=query&prop=revisions&format... and recentchanges: http://wikidata.org/w/api.php?action=query&list=recentchanges&format...
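For illustration, a polling loop against these endpoints might look like the sketch below. This is a minimal, hypothetical example: the helper names are mine, continuation and error handling are omitted, and only standard MediaWiki action API parameters are used.

```python
import json
import urllib.parse
import urllib.request

API = "https://www.wikidata.org/w/api.php"

def build_url(**params):
    # Assemble an action API URL; format=json is added unless overridden.
    params.setdefault("format", "json")
    return API + "?" + urllib.parse.urlencode(sorted(params.items()))

def api_get(**params):
    with urllib.request.urlopen(build_url(**params)) as resp:
        return json.load(resp)

def poll_recent_changes(limit=10):
    # One polling step: list the most recent changes with their revision ids.
    data = api_get(action="query", list="recentchanges",
                   rcprop="title|ids|timestamp", rclimit=limit)
    return data["query"]["recentchanges"]

def fetch_revision(revid):
    # Fetch the content of a single revision by its id.
    data = api_get(action="query", prop="revisions",
                   revids=revid, rvprop="content")
    page = next(iter(data["query"]["pages"].values()))
    return page["revisions"][0]["*"]
```

In a real mirror you would remember the timestamp or rcid of the last change seen and pass it back via the rcstart/rccontinue parameters rather than re-reading the whole list.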
My questions are:
- Are the APIs above ("action=query&prop=revisions" and "action=query&list=recentchanges") the supported way to retrieve wikidata.org in real time?
- Is there any documentation of the JSON format in the response? It looks to me like "links" are interwiki/interlanguage links (or sitelinks in wikidata.org's terminology), but I would feel more comfortable if I saw some official documentation of this.
- There are some IDs like "dewiki", "enwiki", etc., which I guess correspond to the languages "de" and "en" respectively. But is there a reliable map from these *wiki IDs to language codes? Some even use a three-letter prefix, e.g. gotwiki, xmfwiki.
Thanks
Hi Jiang,
I don't work on the Wikidata project itself, so I will let one of the project members advise you on the best use of action=query&list=recentchanges. A year ago I created a simple real-time web visualization of changes in the Wikipedias [1]. It logs into the language-specific IRC chatrooms and listens for updates from the bots in there.
I pulled out the code that listens for updates and parses the IRC messages into a small Node library called wikichanges [2], which you could theoretically use to track wikidata changes without having to poll the API. However you would still need to talk to the API to get the content of the changes.
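The same IRC-listening idea can also be sketched without Node. The outline below is a rough Python sketch under stated assumptions: the nick, the #wikidata.wikipedia channel name, and the message framing are guesses based on how the irc.wikimedia.org feed behaves for other wikis; only the control-code stripping helper is straightforwardly portable.

```python
import re
import socket

# mIRC control codes used by the recentchanges bots: colours (\x03),
# bold (\x02), reset (\x0f), and similar formatting toggles.
CONTROL = re.compile(r"\x03\d{0,2}(?:,\d{1,2})?|[\x02\x0f\x16\x1d\x1f]")

def strip_controls(line):
    # Remove IRC formatting so a feed message can be parsed as plain text.
    return CONTROL.sub("", line)

def listen(channel="#wikidata.wikipedia", host="irc.wikimedia.org"):
    # Minimal IRC client: connect, join the feed channel, yield messages.
    # Nick and channel name here are assumptions, not confirmed values.
    sock = socket.create_connection((host, 6667))
    sock.sendall(b"NICK rc-mirror\r\nUSER rc-mirror 0 * :rc mirror\r\n")
    sock.sendall(("JOIN %s\r\n" % channel).encode())
    buf = b""
    while True:
        buf += sock.recv(4096)
        while b"\r\n" in buf:
            line, buf = buf.split(b"\r\n", 1)
            text = line.decode("utf-8", "replace")
            if text.startswith("PING"):
                sock.sendall(b"PONG :keepalive\r\n")
            elif "PRIVMSG" in text:
                yield strip_controls(text.split(" :", 1)[-1])
```

As Ed notes, the feed only tells you that something changed; you still need the API to fetch the actual revision content.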
I updated the example.js [3] to show how to just get updates for wikidata if you wanted to take a look. I got the idea for the project from Patrick Sinclair who wrote a bot for keeping BBC content up to date with Wikipedia.
//Ed
[1] http://wikistream.inkdroid.org/
[2] https://github.com/edsu/wikichanges
[3] https://github.com/edsu/wikichanges/blob/master/example.js
Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
Hi Jiang,
Not sure if this will answer your questions but here are the API autogenerated docs https://en.wikipedia.org/w/api.php.
Also, if you don't get the information you need here, you might want to consult the mediawiki-api mailing list (https://lists.wikimedia.org/mailman/listinfo/mediawiki-api) for more details on the API.
Mariya
On Mon, Mar 4, 2013 at 11:14 AM, Ed Summers ehs@pobox.com wrote:
On 4 March 2013 07:44, Jiang BIAN bianjiang@google.com wrote:
There are some ids like "dewiki", "enwiki" etc, which I guess can be interpreted to corresponding languages "de", "en" respectively. But is there a reliable map from these *wiki to the language code? And some are even using 3-letter prefix, e.g. gotwiki, xmfwiki.
I can't answer on the APIs, but the three-letter prefixes are also languages - got is Gothic, xmf Mingrelian, etc.
XXwiki or XXXwiki will always refer to the XX or XXX language Wikipedia using standard ISO 639-1 or 639-2 codes; there are a couple of exceptions, such as simplewiki, but anything with two or three characters should be reliable.
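Andrew's rule of thumb could be coded up as follows. This is illustrative only: the helper name is hypothetical, and the exception table shows a single example entry, not the full list of nonstandard codes.

```python
# Example only: map a Wikipedia database name to a language code by
# stripping the "wiki" suffix. Nonstandard wikis need a hand-kept
# exception table; the single entry here is illustrative.
EXCEPTIONS = {
    "simplewiki": "en",  # Simple English Wikipedia has no ISO code of its own
}

def dbname_to_lang(dbname):
    if dbname in EXCEPTIONS:
        return EXCEPTIONS[dbname]
    if dbname.endswith("wiki"):
        return dbname[:-len("wiki")]
    raise ValueError("not a *wiki database name: %r" % dbname)
```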
Hey,
There are some ids like "dewiki", "enwiki" etc, which I guess can be interpreted to corresponding languages "de", "en" respectively. But is there a reliable map from these *wiki to the language code? And some are even using 3-letter prefix, e.g. gotwiki, xmfwiki.
You cannot infer the language from the site identifier. "enwiki" is a site identifier. The software allows having multiple sites for the same language: for instance, you could have an entity that is also described on the English Wiktionary, or an entity described on a third-party website as well, such as a movie on IMDb. Unfortunately it looks like we are not yet providing an actual API for accessing this information.
Are the APIs above ("action=query&prop=revisions" and "action=query&list=recentchanges") the supported way to retrieve wikidata.org in realtime?
I suspect this is your best bet for now. We have a mechanism for change propagation to mirrors, though right now the only implementation on top of this that we have is WMF specific. Volunteers and third parties can however create their own implementation suitable for non-WMF use.
Cheers
-- Jeroen De Dauw http://www.bn2vs.com Don't panic. Don't be evil. --
There is a page on Meta-Wiki about the JSON format, although it is not up to date (phase 2). Maybe this is what you are looking for: https://meta.wikimedia.org/wiki/Wikidata/Data_model_in_JSON
Severin Wünsch
On Mon, Mar 4, 2013 at 6:25 PM, Jeroen De Dauw jeroendedauw@gmail.com wrote:
Thanks Jeroen for your confirmation.
In another mailing list, Yuri pointed me to the Wikidata API roadmap (https://www.mediawiki.org/wiki/Requests_for_comment/Wikidata_API). I don't know whether, with those changes, my current use ("action=query&prop=revisions" and "action=query&list=recentchanges") will still be supported.
Some more background: we are already using the same approach (polling recentchanges and querying the latest revisions) to keep our internal mirrors of *.wikipedia.org and *.wiktionary.org in sync. We expect to maintain wikidata.org in a similar (ideally the same) way. From my experiments, it works fine so far.
On Tue, Mar 5, 2013 at 1:25 AM, Jeroen De Dauw jeroendedauw@gmail.com wrote:
Hey,
There are some ids like "dewiki", "enwiki" etc, which I guess can be interpreted to corresponding languages "de", "en" respectively. But is there a reliable map from these *wiki to the language code? And some are even using 3-letter prefix, e.g. gotwiki, xmfwiki.
You cannot infer the language from the site identifier. "enwiki" is a site identifier. The software allows having multiple sites for the same language. For instance you could have an entity that is also described on the English Wikitionary. Or an entity described on a third party website as well, such as a movie on imdb. Unfortunately it looks like we are not yet providing an actual API for accessing this information.
@Yuri, do you have plans to implement an API that returns such a mapping?
Are the APIs above ("action=query&prop=revisions" and "action=query&list=recentchanges") the supported way to retrieve wikidata.org in realtime?
I suspect this is your best bet for now. We have a mechanism for change propagation to mirrors, though right now the only implementation on top of this that we have is WMF specific. Volunteers and third parties can however create their own implementation suitable for non-WMF use.
In another mailing list, Yuri pointed me to the Wikidata API roadmap (https://www.mediawiki.org/wiki/Requests_for_comment/Wikidata_API). I don't know whether, with those changes, my current use ("action=query&prop=revisions" and "action=query&list=recentchanges") will still be supported.
We don't plan to break stuff just for the sake of breaking it :) We will try to keep everything backwards compatible for some time, with the exception of any security or major performance issues we might discover.
Some more background:
We are already using the same approach (polling recentchanges and querying the latest revisions) to keep our internal mirrors of *.wikipedia.org and *.wiktionary.org in sync. We expect to maintain wikidata.org in a similar (ideally the same) way. From my experiments, it works fine so far.
To be discussed off the list.
On Tue, Mar 5, 2013 at 1:25 AM, Jeroen De Dauw jeroendedauw@gmail.com wrote:
Hey,
There are some ids like "dewiki", "enwiki" etc, which I guess can be interpreted to corresponding languages "de", "en" respectively. But is there a reliable map from these *wiki to the language code? And some are even using 3-letter prefix, e.g. gotwiki, xmfwiki.
You cannot infer the language from the site identifier. "enwiki" is a site identifier. The software allows having multiple sites for the same language. For instance you could have an entity that is also described on the English Wikitionary. Or an entity described on a third party website as well, such as a movie on imdb. Unfortunately it looks like we are not yet providing an actual API for accessing this information.
@Yuri, do you have plan to implement some API to return such mapping?
I personally wasn't planning on doing it just yet, but it should be done as part of our migration from the interwiki SQL table to the sites one (at least that's what I have been told). For now, if you run http://en.wikipedia.org/w/api.php?action=sitematrix the dbname seems to match the site most of the time, if not always.
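Yuri's action=sitematrix suggestion could be turned into a dbname-to-language lookup along these lines. This is a sketch under an assumption about the response shape: that the sitematrix JSON uses numeric keys for per-language groups, each with a "code" (the language) and a "site" list whose entries carry a "dbname".

```python
def dbname_language_map(sitematrix_response):
    # Walk an action=sitematrix payload and record dbname -> language code.
    mapping = {}
    for key, group in sitematrix_response["sitematrix"].items():
        if key in ("count", "specials"):
            continue  # "count" is an integer, "specials" has no language
        lang = group["code"]
        for site in group.get("site", []):
            mapping[site["dbname"]] = lang
    return mapping
```

Note that, as Jeroen pointed out earlier in the thread, this maps a site to the language of its content, not the other way around: several dbnames (dewiki, dewiktionary, ...) can share one language code.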
The same processes that keep *.wikipedia.org up to date *should* work with wikidata.org too -- after all, it is the same software underneath.
The API is not going to change in the near future. In the mid-term, I expect the Wikidata API to evolve together with the underlying MediaWiki API, but we will not devote resources to this in the coming month. Currently, the last month of the initial development has started, and we need to finish a few other things first.
And as Yuri says, the changes to the API are planned to be backward compatible, anyway.
So, in short: for real-time data, the API is the right approach, as described here: http://www.wikidata.org/w/api.php. For keeping a mirror, the methods used for Wikipedia should be applicable to Wikidata too.
Cheers, Denny
2013/3/5 Jiang BIAN bianjiang@google.com
Thanks for all your help!
On Tue, Mar 5, 2013 at 10:02 PM, P. Blissenbach publi@web.de wrote:
"Andrew Gray" andrew.gray@dunelm.org.uk wrote
XXwiki or XXXwiki will always refer to the XX or XXX language Wikipedia using standard ISO 639-1 or 639-2 codes; there are a couple of exceptions, such as simplewiki, but anything with two or three characters should be reliable.
Replace ISO 639-2 above with ISO 639-3. We are not using ISO 639-2 codes any more (unless they coincide with ISO 639-3 ones, which happens, but not always).
Exceptions from the ISO 639-1 or 639-3 rule can be found at:
http://meta.wikimedia.org/wiki/List_of_Wikipedias#Nonstandard_language_codes There are 6 of the 14 that could easily be avoided as of today, mostly since new language codes have been added to the ISO 639-3 list. A few more will likely or certainly follow, but 3 or 4 codes could remain that will not be as easily done away with.
Purodha