I am looking for a dataset with some specific characteristics, and revision count is one of them: articles with few revisions don't produce enough metrics for our algorithm, and articles with too many take a very long time to process (network effects). So it would save a lot of time if I didn't have to download large amounts of XML, compute the needed metrics, and make the selection locally.
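For now, the only client-side workaround seems to be paginating through the revision list and counting. A minimal sketch of that idea, assuming the standard action=query JSON response shape (`fetch_page` is a hypothetical wrapper around the actual HTTP request, not an existing library function):

```python
def count_revisions(fetch_page):
    """Count an article's revisions by paginating through the API.

    Sketch of the client-side workaround: the API offers no single field
    with the revision count, so we page through
    action=query&prop=revisions&rvprop=ids&rvlimit=max and sum the batches.
    `fetch_page` is a hypothetical callable that takes the rvcontinue
    token (None on the first call) and returns one decoded JSON response.
    """
    total = 0
    cont = None
    while True:
        data = fetch_page(cont)
        # Each response holds one or more pages; sum their revision batches.
        for page in data["query"]["pages"].values():
            total += len(page.get("revisions", []))
        cont = data.get("continue", {}).get("rvcontinue")
        if cont is None:
            return total
```

In real use, `fetch_page` would issue an HTTP GET against https://en.wikipedia.org/w/api.php with action=query, prop=revisions, rvprop=ids, rvlimit=max, format=json, titles=&lt;article&gt;, passing the rvcontinue token back when it is set.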
Anyway, I would suggest generating this metadata when a new revision is created: it is just one counter variable, and much easier to offer afterwards.
The main point I want to make is that this is central metadata of an article, like size, number of characters, creation date, URLs, page IDs, and the human-readable and computable titles of both the article and its talk page.
Another issue I am having: when you output a page in a query, the human-readable title is used as the variable identifying the article. The page ID or the computable title (I don't know what to call it; the one used in the URL, i.e. Barack_Obama rather than Barack Obama) would be a better key. For example, I ran into a problem creating files named after the current variable: with the HIV/AIDS page, Python looked for a folder "HIV" in which it wanted to create a file "AIDS". Addressing other APIs or services programmatically is also more direct with such a key. I use the API, for example, to select my data and then fetch it from the Export special page.
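To illustrate the kind of key I mean, here is a minimal sketch (the helper name and layout are my own illustration, and the numeric IDs below are made up): the stable page ID serves as the primary key, and the URL-style title is only appended for readability after sanitizing.

```python
import re

def page_key(page_id, canonical_title):
    """Build a filesystem-safe key for an article.

    The numeric page_id is stable and unambiguous; the URL-style title
    (e.g. "Barack_Obama") is appended only for human readability, with
    any character that could confuse a filesystem (like the "/" in
    "HIV/AIDS") replaced by "_".
    """
    safe_title = re.sub(r"[^A-Za-z0-9_-]", "_", canonical_title)
    return "%d_%s" % (page_id, safe_title)
```

With a hypothetical ID, `page_key(12345, "HIV/AIDS")` yields `"12345_HIV_AIDS"`, so no nested "HIV/" folder is ever created.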
Thanks for your answers!
Cheers, Stefan
On 2015-02-04 23:10, John wrote:
This type of data is very expensive to generate. If you can provide
some more context of what you are trying to do, I might be able to
provide some help.
On Wednesday, February 4, 2015, Stefan Kasberger
<mail@stefankasberger.at> wrote:
Hello,
I am trying to get the number of revisions for some articles, but I
can't find any query that offers this over the API. I only found this
answer on Stack Overflow:
http://stackoverflow.com/questions/7136343/wikipedia-api-how-to-get-the-num…
Is this still unsolved? It would save me a lot of time, and I think
this is one of the most important pieces of metadata about an article.
I will use it to download only articles with between 500 and 5,000
revisions, because fewer is useless for our research and more is too
expensive to compute.
thanks for your answer.
cheers, Stefan
--
*Stefan Kasberger*
*E* mail@stefankasberger.at
*W*
www.openscienceASAP.org
_______________________________________________
Mediawiki-api mailing list
Mediawiki-api@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-api