Hello!
Hope this finds you well. I put together a query https://query.wikidata.org/#SELECT%20%3Fitem%20%3FitemLabel%20%3FsitelinkEn%0A%0AWHERE%20%7B%0A%20%3Fitem%20wdt%3AP31%20wd%3AQ5.%0A%20%3Fitem%20wdt%3AP106%20wd%3AQ36180.%0A%20%3Fitem%20wdt%3AP21%20wd%3AQ6581097.%0A%20%3FsitelinkEn%20schema%3Aabout%20%3Fitem%3B%0A%20%20%09%09%09%20%20%20%20schema%3AisPartOf%20%3Chttps%3A%2F%2Fen.wikipedia.org%2F%3E.%0A%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22en%22.%20%7D%0A%20%20%7D to create a list of English Wikipedia articles about male writers. Is it possible to filter the results by size? For example, articles that are larger than or equal to 10k bytes?
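(For readability, the query behind that link decodes to the following; nothing here is new, it is simply the URL-encoded query written out:)

SELECT ?item ?itemLabel ?sitelinkEn WHERE {
  ?item wdt:P31 wd:Q5.
  ?item wdt:P106 wd:Q36180.
  ?item wdt:P21 wd:Q6581097.
  ?sitelinkEn schema:about ?item;
              schema:isPartOf <https://en.wikipedia.org/>.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}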
I understand that this is better done by PetScan, but my PetScan query https://petscan.wmflabs.org/?language=en&project=wikipedia&depth=50&categories=Male%20writers&ns%5B0%5D=1&larger=10000&search_max_results=500&interface_language=en&&doit= refuses to cooperate for a reason I don't know yet.. :/
Thanks in advance.
Best, Reem
Hi Reem,
If this page https://www.mediawiki.org/wiki/Wikidata_Query_Service/User_Manual/MWAPI is up to date, it does not seem possible to get the article size of a Wikipedia article (but I must admit I don't use or know "wikibase:mwapi" much; maybe that has changed or will change).
Cheers, Nicolas
You can’t directly query for the size as far as I know, but you can use the longpages query page generator to get a list of the longest enwiki pages, then filter the associated items for male authors. But this will only get you about a hundred results until the longpages list is exhausted (most of its results are linked to items we don’t care about), and it won’t get you the actual size (and therefore the order of results isn’t necessarily meaningful either, you just know they’re among the longest pages).
SELECT ?item ?titleEn WHERE {
  hint:Query hint:optimizer "None".
  SERVICE wikibase:mwapi {
    bd:serviceParam wikibase:endpoint "en.wikipedia.org";
                    wikibase:api "Generator";
                    mwapi:generator "querypage";
                    mwapi:gqppage "Longpages";
                    mwapi:gqplimit "max".
    ?title wikibase:apiOutput mwapi:title.
  }
  BIND(STRLANG(?title, "en") AS ?titleEn)
  ?sitelink schema:name ?titleEn;
            schema:isPartOf <https://en.wikipedia.org/>;
            schema:about ?item.
  ?item wdt:P31 wd:Q5;
        wdt:P106 wd:Q36180;
        wdt:P21 wd:Q6581097.
}
Try it! <https://query.wikidata.org/embed.html#SELECT %3Fitem %3FtitleEn WHERE {%0A%20 hint%3AQuery hint%3Aoptimizer "None".%0A%20 SERVICE wikibase%3Amwapi {%0A%20%20%20 bd%3AserviceParam wikibase%3Aendpoint "en.wikipedia.org"%3B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20 wikibase%3Aapi "Generator"%3B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20 mwapi%3Agenerator "querypage"%3B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20 mwapi%3Agqppage "Longpages"%3B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20 mwapi%3Agqplimit "max".%0A%20%20%20 %3Ftitle wikibase%3AapiOutput mwapi%3Atitle.%0A%20 }%0A%20 BIND(STRLANG(%3Ftitle%2C "en") AS %3FtitleEn)%0A%20 %3Fsitelink schema%3Aname %3FtitleEn%3B%0A%20%20%20%20%20%20%20%20%20%20%20 schema%3AisPartOf <https%3A%2F%2Fen.wikipedia.org%2F>%3B%0A%20%20%20%20%20%20%20%20%20%20%20 schema%3Aabout %3Fitem.%0A%20 %3Fitem wdt%3AP31 wd%3AQ5%3B%0A%20%20%20%20%20%20%20 wdt%3AP106 wd%3AQ36180%3B%0A%20%20%20%20%20%20%20 wdt%3AP21 wd%3AQ6581097.%0A}>
Cheers, Lucas
Thank you so much, Nicolas & Lucas!
@Lucas this helps a lot! At least I will get an idea about what I need until PetScan is sorted out. Would you elaborate a bit more on what you mean by "most of its results are linked to items we don’t care about"?
Best, Reem
Well, if you take just the MWAPI part of the query https://query.wikidata.org/#SELECT%20%3Ftitle%20WHERE%20%7B%0A%20%20SERVICE%20wikibase%3Amwapi%20%7B%0A%20%20%20%20bd%3AserviceParam%20wikibase%3Aendpoint%20%22en.wikipedia.org%22%3B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20wikibase%3Aapi%20%22Generator%22%3B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20mwapi%3Agenerator%20%22querypage%22%3B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20mwapi%3Agqppage%20%22Longpages%22%3B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20mwapi%3Agqplimit%20%22max%22.%0A%20%20%20%20%3Ftitle%20wikibase%3AapiOutput%20mwapi%3Atitle.%0A%20%20%7D%0A%7D, you’ll get exactly 10000 results, but most of them aren’t male authors (a lot of them seem to be lists of various kinds). And I think those 10000 results are all we can get from the API, so if we limit those to male authors afterwards, we only get a few results (about 100), and there’s no way to increase that number as far as I’m aware, because apparently we can’t get more than 10000 total pages from MWAPI.
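(For readability, the MWAPI-only query behind that link decodes to the following; again, this is just the linked query written out:)

SELECT ?title WHERE {
  SERVICE wikibase:mwapi {
    bd:serviceParam wikibase:endpoint "en.wikipedia.org";
                    wikibase:api "Generator";
                    mwapi:generator "querypage";
                    mwapi:gqppage "Longpages";
                    mwapi:gqplimit "max".
    ?title wikibase:apiOutput mwapi:title.
  }
}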
Cheers, Lucas
Right, I see what you mean. Thanks a lot!
Hi,
Since the MediaWiki API allows you to get the size in bytes of the last revision https://en.wikipedia.org/w/api.php?action=query&format=json&titles=barack%20obama&prop=revisions&rvprop=size of a Wikipedia page, is it not possible to retrieve this information with a generator? (It's a real question; I'm not at all comfortable with this API.)
Ettore Rizza
If I understand things right (I'm not really comfortable with this API either), not all of the API is implemented (yet?) in the Wikidata Query Service, only some basic functionality.
Cheers, ~nicolas
SELECT ?item ?titleEn
WITH {
  SELECT ?item WHERE {
    ?item wdt:P31 wd:Q5;
          wdt:P106 wd:Q36180;
          wdt:P21 wd:Q6581097;
          wikibase:sitelinks ?sitelinks.
  }
  # ORDER BY DESC(?sitelinks)
  LIMIT 50
} AS %maleAuthors
WHERE {
  INCLUDE %maleAuthors.
  hint:SubQuery hint:optimizer "None".
  ?article schema:about ?item;
           schema:isPartOf <https://en.wikipedia.org/>;
           schema:name ?titleEn.
  BIND(STR(?titleEn) AS ?title)
  SERVICE wikibase:mwapi {
    bd:serviceParam wikibase:api "Generator";
                    wikibase:endpoint "en.wikipedia.org";
                    mwapi:generator "allpages";
                    mwapi:gapfrom ?title;
                    mwapi:gapminsize "10000";
                    mwapi:gaplimit "1";
                    wikibase:limit 1 .
    ?item_ wikibase:apiOutputItem mwapi:item.
  }
  FILTER(?item = ?item_)
}
LIMIT 50
Conveniently, the allpages generator has a minimum size parameter (gapminsize) built in, so we don’t even need to get the size as a revision property and filter on it afterwards.
However, this requires one API call per item, so it doesn’t scale at all – this query with just 50 arbitrary author items already takes about half a minute. (The commented-out ORDER BY DESC(?sitelinks) is intended as a heuristic to find larger articles first, but all the top 50 authors by sitelinks have articles longer than 10000 bytes on enwiki, so in that case you might as well just skip the MWAPI part altogether.)
I don’t think this can work very well. Even if MWAPI were expanded so that we could directly feed 50 or even 500 titles to the query API (as the titles parameter, skipping generators altogether), that’s probably still too much of a bottleneck for this kind of query.
Wow, thank you! It would take me a whole month to write such a query. :-|
Ettore Rizza
Hi Reem!
Your problem seems to me more suitable for WM Quarry https://quarry.wmflabs.org/ (some docs https://meta.wikimedia.org/wiki/Research:Quarry) than for the WDQS. I've used it just once, some time ago, so I won't be able to help you much more, but in principle it shouldn't be complicated to find all articles on en.wiki in certain categories (males, writers) and then sort/filter them by article length.
Maybe it helps. Best regards, Jan
Hi Reem,
Going back to the original question, I think your PetScan query is timing out because it doesn't like the depth=50 requirement - because this is a messy category tree it probably starts generating a lot of results at that depth. Remember that categories about individual people often end up in these categories, and any one of those can open up a whole new set of categories - as a result, it would either generate huge numbers of articles, or possibly go into an infinite loop.
If you cut the depth down you can get a decent number of results and apply the size filter.
depth=3 gets ~68k results, of which ~16k are over 10kb in size.
https://petscan.wmflabs.org/?language=en&project=wikipedia&depth=3&a...
Andrew.