) than from the API.
--
Mark
On Wed, Feb 4, 2015 at 4:55 PM, Stefan Kasberger
<mail(a)stefankasberger.at> wrote:
I'm looking for a dataset with some specific characteristics. Revision count is one of them, because
articles with few revisions don't produce enough metrics for our algorithm, and ones with too
many take a very long time to process (network effects). So it would save a lot of time not to have to
download lots of XML, compute the needed metrics, and do the selection locally.
Anyway, I would suggest generating this metadata whenever a new revision is created: just
one counter variable, and much easier to offer afterwards.
The main point I want to make is: this is central metadata of an article, like size,
number of characters, creation date, URLs, page IDs, and the human-readable and computable
titles of both the article and the talk page.
Another point I'm having trouble with right now: when you output a page in a
query, you use the human-readable title as the variable for the article. The page ID or the
computable title (I don't know what to call it; the one used in the URL, i.e. Barack_Obama,
not Barack Obama) would be better to use as a key. For example, I just had a problem creating files
from the current variable with the HIV/AIDS page, where Python looked
for a folder HIV in which to create a file AIDS. Addressing other APIs
or services programmatically is also more direct with it. I use the API, for example, to select my
data and then fetch it from the Special:Export page.
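For illustration, a small helper along these lines sidesteps the HIV/AIDS problem. This is just a sketch (the function name and page-id value are made up, not anything from the API): prefer the numeric page ID as the key when you have it, and otherwise percent-encode the URL form of the title so "/" can't be read as a directory separator.

```python
# Hypothetical helper: derive a filesystem-safe key for an article.
# Titles like "HIV/AIDS" contain "/", which the OS treats as a path
# separator; the numeric page id (plain digits) avoids that entirely.
import urllib.parse

def safe_filename(title, page_id=None):
    """Return a filesystem-safe key for an article."""
    if page_id is not None:
        return str(page_id)  # page ids are stable and contain no separators
    # Fall back to the URL form of the title with "/" percent-encoded.
    return urllib.parse.quote(title.replace(" ", "_"), safe="")

print(safe_filename("HIV/AIDS"))         # HIV%2FAIDS
print(safe_filename("Barack Obama"))     # Barack_Obama
print(safe_filename("HIV/AIDS", 29422))  # 29422 (page id is illustrative)
```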
thanks for your answers!
cheers, stefan
On 2015-02-04 23:10, John wrote:
This type of data is very expensive to generate. If you can provide some more context on
what you are trying to do, I might be able to provide some help.
On Wednesday, February 4, 2015, Stefan Kasberger <mail(a)stefankasberger.at> wrote:
Hello,
I'm trying to get the number of revisions for some articles, but I can't find any
query that offers this via the API. I only found this answer on Stack Overflow:
http://stackoverflow.com/questions/7136343/wikipedia-api-how-to-get-the-num…
Is this still unsolved? It would save me a lot of time, and I think it is one of the most
important pieces of metadata about an article. I will use it to download only articles with between 500
and 5000 revisions, since fewer is useless for our research and more is too expensive to
compute.
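The 500-5000 filter can still be applied client-side without Special:Export: page through prop=revisions with rvlimit=max and count locally. A minimal sketch, assuming the standard action API and the newer "continue" continuation format; the function names and the early-stop cap are my own, not API features:

```python
# Sketch: count a page's revisions by paging through prop=revisions.
# The counting step is a pure function so it works on any sequence of
# decoded query responses (useful for testing without network access).
import json
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"  # adjust for other wikis

def count_revisions(responses):
    """Sum the revision entries over a sequence of decoded query responses."""
    total = 0
    for data in responses:
        for page in data["query"]["pages"].values():
            total += len(page.get("revisions", []))
    return total

def fetch_pages(title):
    """Yield one decoded JSON response per continuation step."""
    params = {"action": "query", "format": "json", "prop": "revisions",
              "rvprop": "ids", "rvlimit": "max", "titles": title,
              "continue": ""}  # opt in to the newer continuation style
    while True:
        url = API + "?" + urllib.parse.urlencode(params)
        with urllib.request.urlopen(url) as resp:
            data = json.load(resp)
        yield data
        if "continue" not in data:
            return
        params.update(data["continue"])

# e.g. n = count_revisions(fetch_pages("Barack_Obama"))
# then keep the article only if 500 <= n <= 5000
```

In practice you would also stop fetching once the count exceeds 5000, since anything past the cap is wasted bandwidth.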
thanks for your answer.
cheers, Stefan
--
Stefan Kasberger
E mail(a)stefankasberger.at
W www.openscienceASAP.org
_______________________________________________
Mediawiki-api mailing list
Mediawiki-api(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-api