Hi,
Since the MediaWiki API can return the size in bytes of a Wikipedia page's latest revision https://en.wikipedia.org/w/api.php?action=query&format=json&titles=barack%20obama&prop=revisions&rvprop=size, is it not possible to retrieve this information with a generator? (It's a genuine question; I'm not at all comfortable with this API.)
Ettore Rizza
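For reference, here is a minimal Python sketch of the revisions request linked above, assuming the default JSON layout in which pages are keyed by page ID. The helper names (build_size_query, extract_sizes) are hypothetical, and the sample response uses a made-up page ID and size purely for illustration; no network call is made.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_size_query(titles):
    """Build a URL asking for the byte size of each title's latest revision."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "revisions",
        "rvprop": "size",
        "titles": "|".join(titles),  # several titles are separated by "|"
    }
    return API_ENDPOINT + "?" + urlencode(params)


def extract_sizes(response):
    """Map page title -> size in bytes, skipping pages without revisions."""
    sizes = {}
    pages = response.get("query", {}).get("pages", {})
    for page in pages.values():
        revisions = page.get("revisions")
        if revisions:  # missing pages carry no "revisions" key
            sizes[page["title"]] = revisions[0]["size"]
    return sizes


# Illustrative response only: the page ID and size are invented values.
sample_response = {
    "query": {
        "pages": {
            "1": {
                "pageid": 1,
                "ns": 0,
                "title": "Barack Obama",
                "revisions": [{"size": 123456}],
            },
            "-1": {"ns": 0, "title": "No such page", "missing": ""},
        }
    }
}

url = build_size_query(["Barack Obama"])
sizes = extract_sizes(sample_response)
```

This only shows what the plain API call returns; whether the same property can be reached through a generator in wikibase:mwapi is exactly the open question.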
On Sat, 12 Jan 2019 at 14:41, Reem Al-Kashif reemalkashif@gmail.com wrote:
Right, I see what you mean. Thanks a lot!
On Sat, 12 Jan 2019 at 15:35, Lucas Werkmeister mail@lucaswerkmeister.de wrote:
Well, if you take just the MWAPI part of the query https://query.wikidata.org/#SELECT%20%3Ftitle%20WHERE%20%7B%0A%20%20SERVICE%20wikibase%3Amwapi%20%7B%0A%20%20%20%20bd%3AserviceParam%20wikibase%3Aendpoint%20%22en.wikipedia.org%22%3B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20wikibase%3Aapi%20%22Generator%22%3B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20mwapi%3Agenerator%20%22querypage%22%3B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20mwapi%3Agqppage%20%22Longpages%22%3B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20mwapi%3Agqplimit%20%22max%22.%0A%20%20%20%20%3Ftitle%20wikibase%3AapiOutput%20mwapi%3Atitle.%0A%20%20%7D%0A%7D, you’ll get exactly 10000 results, but most of them aren’t male authors (a lot of them seem to be lists of various kinds). And I think those 10000 results are all we can get from the API, so if we limit those to male authors afterwards, we only get a few results (about 100), and there’s no way to increase that number as far as I’m aware, because apparently we can’t get more than 10000 total pages from MWAPI.
Cheers, Lucas
On 12.01.19 13:57, Reem Al-Kashif wrote:
Thank you so much, Nicolas & Lucas!
@Lucas this helps a lot! At least I will get an idea about what I need until PetScan is sorted out. Would you elaborate a bit more on what you mean by "most of its results are linked to items we don’t care about"?
Best, Reem
On Sat, 12 Jan 2019 at 14:18, Lucas Werkmeister mail@lucaswerkmeister.de wrote:
You can’t directly query for the size as far as I know, but you can use the longpages query page generator to get a list of the longest enwiki pages, then filter the associated items for male authors. But this will only get you about a hundred results until the longpages list is exhausted (most of its results are linked to items we don’t care about), and it won’t get you the actual size (and therefore the order of results isn’t necessarily meaningful either, you just know they’re among the longest pages).
SELECT ?item ?titleEn WHERE {
  hint:Query hint:optimizer "None".
  SERVICE wikibase:mwapi {
    bd:serviceParam wikibase:endpoint "en.wikipedia.org";
                    wikibase:api "Generator";
                    mwapi:generator "querypage";
                    mwapi:gqppage "Longpages";
                    mwapi:gqplimit "max".
    ?title wikibase:apiOutput mwapi:title.
  }
  BIND(STRLANG(?title, "en") AS ?titleEn)
  ?sitelink schema:name ?titleEn;
            schema:isPartOf <https://en.wikipedia.org/>;
            schema:about ?item.
  ?item wdt:P31 wd:Q5;
        wdt:P106 wd:Q36180;
        wdt:P21 wd:Q6581097.
}
Try it!
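If the actual sizes are needed, one possible workaround outside the query service is to take the article titles from a query and check them against the MediaWiki API in batches. The Python sketch below covers only the batching and the 10k-byte filter; the helper names are hypothetical, the batch size of 50 assumes the usual per-request limit on titles for ordinary clients, and the sizes in the example are invented.

```python
def chunked(items, size=50):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def filter_by_min_size(sizes, min_bytes=10_000):
    """Return the titles whose size is at least `min_bytes`, sorted by name."""
    return sorted(title for title, n in sizes.items() if n >= min_bytes)


# Illustrative data only: these sizes are not real measurements.
example_sizes = {
    "Author A": 25_000,
    "Author B": 9_999,
    "Author C": 10_000,
}

batches = list(chunked([f"Title {i}" for i in range(120)]))
large_enough = filter_by_min_size(example_sizes)
```

Each batch would become one API request for the size of the latest revision, and the filter would then be applied to the combined results.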
Cheers, Lucas
On 12.01.19 12:56, Nicolas VIGNERON wrote:
Hi Reem,
If this page https://www.mediawiki.org/wiki/Wikidata_Query_Service/User_Manual/MWAPI is up to date, it does not seem possible to get the size of a Wikipedia article (but I must admit I don't use or know "wikibase:mwapi" much; maybe this has changed or will change).
Cheers, Nicolas
On Sat, 12 Jan 2019 at 12:16, Reem Al-Kashif reemalkashif@gmail.com wrote:
Hello!
Hope this finds you well. I put together a query https://query.wikidata.org/#SELECT%20%3Fitem%20%3FitemLabel%20%3FsitelinkEn%0A%0AWHERE%20%7B%0A%20%3Fitem%20wdt%3AP31%20wd%3AQ5.%0A%20%3Fitem%20wdt%3AP106%20wd%3AQ36180.%0A%20%3Fitem%20wdt%3AP21%20wd%3AQ6581097.%0A%20%3FsitelinkEn%20schema%3Aabout%20%3Fitem%3B%0A%20%20%09%09%09%20%20%20%20schema%3AisPartOf%20%3Chttps%3A%2F%2Fen.wikipedia.org%2F%3E.%0A%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22en%22.%20%7D%0A%20%20%7D to create a list of English Wikipedia articles about male writers. Is it possible to filter the results by size? For example, articles that are larger than or equal to 10k bytes?
I understand that this is better done by PetScan, but my PetScan query https://petscan.wmflabs.org/?language=en&project=wikipedia&depth=50&categories=Male%20writers&ns%5B0%5D=1&larger=10000&search_max_results=500&interface_language=en&&doit= refuses to cooperate for a reason I don't know yet.. :/
Thanks in advance.
Best, Reem
--
*Kind regards, Reem Al-Kashif*
_______________________________________________
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata