<div dir="ltr">Thanks for your answers, guys! :) I'll be looking forward to the improvements on GeoData.<div><br></div><div>Cheers,<br><div><br></div><div>Marc</div></div></div><div class="gmail_extra"><br><div class="gmail_quote">2015-03-02 23:52 GMT+01:00 <span dir="ltr"><<a href="mailto:labs-l-request@lists.wikimedia.org" target="_blank">labs-l-request@lists.wikimedia.org</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Send Labs-l mailing list submissions to<br>
<a href="mailto:labs-l@lists.wikimedia.org">labs-l@lists.wikimedia.org</a><br>
<br>
To subscribe or unsubscribe via the World Wide Web, visit<br>
<a href="https://lists.wikimedia.org/mailman/listinfo/labs-l" target="_blank">https://lists.wikimedia.org/mailman/listinfo/labs-l</a><br>
or, via email, send a message with subject or body 'help' to<br>
<a href="mailto:labs-l-request@lists.wikimedia.org">labs-l-request@lists.wikimedia.org</a><br>
<br>
You can reach the person managing the list at<br>
<a href="mailto:labs-l-owner@lists.wikimedia.org">labs-l-owner@lists.wikimedia.org</a><br>
<br>
When replying, please edit your Subject line so it is more specific<br>
than "Re: Contents of Labs-l digest..."<br>
<br>
<br>
Today's Topics:<br>
<br>
1. doubt on GeoData / how to obtain articles with coords<br>
(Marc Miquel)<br>
2. Re: [Analytics] doubt on GeoData / how to obtain articles<br>
with coords (Marc Miquel)<br>
3. Re: [Analytics] doubt on GeoData / how to obtain articles<br>
with coords (Gerard Meijssen)<br>
4. Re: [Analytics] doubt on GeoData / how to obtain articles<br>
with coords (Bryan White)<br>
<br>
<br>
----------------------------------------------------------------------<br>
<br>
Message: 1<br>
Date: Mon, 2 Mar 2015 23:33:18 +0100<br>
From: Marc Miquel <<a href="mailto:marcmiquel@gmail.com">marcmiquel@gmail.com</a>><br>
To: <a href="mailto:labs-l@lists.wikimedia.org">labs-l@lists.wikimedia.org</a>, <a href="mailto:analytics@lists.wikimedia.org">analytics@lists.wikimedia.org</a><br>
Subject: [Labs-l] doubt on GeoData / how to obtain articles with<br>
coords<br>
Message-ID:<br>
<CANSEGinkmuzN3nDmWmWm73AmiGTFyKGjDWeMinQ=<a href="mailto:17Sbv9xnXw@mail.gmail.com">17Sbv9xnXw@mail.gmail.com</a>><br>
Content-Type: text/plain; charset="utf-8"<br>
<br>
Hi guys,<br>
<br>
I am doing some research and I am struggling a bit to obtain geolocated<br>
articles in several languages. I was told that the best tool to obtain<br>
the geolocation of each article would be the GeoData API, but I see that<br>
I need to supply each article name, and I don't know if that is the best<br>
way.<br>
<br>
I am thinking, for instance, that for big Wikipedias like French or German I<br>
might need to make a million queries to get only those with coords. Also,<br>
I would like to obtain the region according to ISO 3166-2, which seems to be<br>
available there.<br>
<br>
My objective is to obtain different lists of articles related to countries<br>
and regions.<br>
<br>
I don't know if using Wikidata with Python would be a better option, but I<br>
see that the region isn't there. Maybe I could combine Wikidata with<br>
some other tool to give me the region.<br>
Could anyone help me?<br>
<br>
Thanks a lot.<br>
<br>
Marc Miquel<br>
-------------- next part --------------<br>
An HTML attachment was scrubbed...<br>
URL: <<a href="https://lists.wikimedia.org/pipermail/labs-l/attachments/20150302/9ed9822c/attachment-0001.html" target="_blank">https://lists.wikimedia.org/pipermail/labs-l/attachments/20150302/9ed9822c/attachment-0001.html</a>><br>
<br>
------------------------------<br>
<br>
Message: 2<br>
Date: Mon, 2 Mar 2015 23:42:44 +0100<br>
From: Marc Miquel <<a href="mailto:marcmiquel@gmail.com">marcmiquel@gmail.com</a>><br>
To: "A mailing list for the Analytics Team at WMF and everybody who<br>
has an interest in Wikipedia and analytics."<br>
<<a href="mailto:analytics@lists.wikimedia.org">analytics@lists.wikimedia.org</a>><br>
Cc: "<a href="mailto:labs-l@lists.wikimedia.org">labs-l@lists.wikimedia.org</a>" <<a href="mailto:labs-l@lists.wikimedia.org">labs-l@lists.wikimedia.org</a>><br>
Subject: Re: [Labs-l] [Analytics] doubt on GeoData / how to obtain<br>
articles with coords<br>
Message-ID:<br>
<CANSEGink9qxRNvzyqrUX4gncX6CJcttqM7eKAcxk=<a href="mailto:mzSSoiXKw@mail.gmail.com">mzSSoiXKw@mail.gmail.com</a>><br>
Content-Type: text/plain; charset="utf-8"<br>
<br>
Hi Max and Oliver,<br>
<br>
Thanks for your answers. The geo_tags table seems quite incomplete. I just<br>
checked some random articles, for instance on the Nepali Wikipedia: for its<br>
capital, Kathmandu, there are coords in the actual article, but they don't<br>
appear in geo_tags. So it doesn't seem to be an option.<br>
<br>
Marc<br>
<br>
2015-03-02 23:38 GMT+01:00 Oliver Keyes <<a href="mailto:okeyes@wikimedia.org">okeyes@wikimedia.org</a>>:<br>
<br>
> Max's idea is an improvement but still a lot of requests. We really need<br>
> to start generating these dumps :(.<br>
><br>
> Until the dumps are available, the fastest way to do it is probably Quarry<br>
> (<a href="http://quarry.wmflabs.org/" target="_blank">http://quarry.wmflabs.org/</a>), an open MySQL client for our public database<br>
> tables. So, you want the geo_tags table; getting all the coordinate sets on<br>
> the English-language Wikipedia would be something like:<br>
><br>
> SELECT * FROM enwiki_p.geo_tags;<br>
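A sketch of narrowing that query to one country, with page titles joined in — this assumes the replica's geo_tags columns gt_country (ISO 3166-1) and gt_region (ISO 3166-2 suffix) are populated, which varies by wiki:<br>

```sql
-- Sketch: primary coordinates plus ISO 3166 codes, joined to page titles.
SELECT p.page_title, g.gt_lat, g.gt_lon, g.gt_country, g.gt_region
FROM enwiki_p.geo_tags AS g
JOIN enwiki_p.page AS p ON p.page_id = g.gt_page_id
WHERE g.gt_primary = 1      -- only the article's main coordinate
  AND g.gt_country = 'FR';  -- e.g. France; gt_region holds the FR-* suffix
```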
><br>
> This should be available for all of our production wikis (SHOW DATABASES<br>
> is your friend): you want [project]_p rather than [project]. Hope that<br>
> helps!<br>
><br>
> On 2 March 2015 at 17:35, Max Semenik <<a href="mailto:maxsem.wiki@gmail.com">maxsem.wiki@gmail.com</a>> wrote:<br>
><br>
>> Use generators:<br>
>> api.php?action=query&generator=allpages&gapnamespace=0&prop=coordinates&gaplimit=max&colimit=max<br>
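A minimal Python sketch of paging through that generator query with the API's continue mechanism — the helper name extract_coords is my addition, and adding coprop=country|region to the parameters would also request ISO 3166 codes where GeoData has them:<br>

```python
# Sketch: page through all mainspace article coordinates via the GeoData API.
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API = "https://en.wikipedia.org/w/api.php"

def extract_coords(result):
    """Flatten one action=query response into (title, lat, lon) tuples."""
    rows = []
    for page in result.get("query", {}).get("pages", {}).values():
        for coord in page.get("coordinates", []):
            rows.append((page["title"], coord["lat"], coord["lon"]))
    return rows

def all_coords():
    params = {
        "action": "query", "format": "json",
        "generator": "allpages", "gapnamespace": 0, "gaplimit": "max",
        "prop": "coordinates", "colimit": "max",
    }
    while True:
        with urlopen(API + "?" + urlencode(params)) as resp:
            result = json.load(resp)
        yield from extract_coords(result)
        if "continue" not in result:       # no more batches
            break
        params.update(result["continue"])  # resume where the API stopped

if __name__ == "__main__":
    for title, lat, lon in all_coords():
        print(title, lat, lon)
```

In practice you would also want a descriptive User-Agent header and some throttling between requests.<br>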
>><br>
>> On Mon, Mar 2, 2015 at 2:33 PM, Marc Miquel <<a href="mailto:marcmiquel@gmail.com">marcmiquel@gmail.com</a>> wrote:<br>
>><br>
>>> Hi guys,<br>
>>><br>
>>> I am doing some research and I am struggling a bit to obtain geolocated<br>
>>> articles in several languages. I was told that the best tool to obtain<br>
>>> the geolocation of each article would be the GeoData API, but I see that<br>
>>> I need to supply each article name, and I don't know if that is the best<br>
>>> way.<br>
>>><br>
>>> I am thinking, for instance, that for big Wikipedias like French or German<br>
>>> I might need to make a million queries to get only those with coords.<br>
>>> Also, I would like to obtain the region according to ISO 3166-2, which seems<br>
>>> to be available there.<br>
>>><br>
>>> My objective is to obtain different lists of articles related to<br>
>>> countries and regions.<br>
>>><br>
>>> I don't know if using Wikidata with Python would be a better option, but<br>
>>> I see that the region isn't there. Maybe I could combine Wikidata with<br>
>>> some other tool to give me the region.<br>
>>> Could anyone help me?<br>
>>><br>
>>> Thanks a lot.<br>
>>><br>
>>> Marc Miquel<br>
>>><br>
>>> _______________________________________________<br>
>>> Analytics mailing list<br>
>>> <a href="mailto:Analytics@lists.wikimedia.org">Analytics@lists.wikimedia.org</a><br>
>>> <a href="https://lists.wikimedia.org/mailman/listinfo/analytics" target="_blank">https://lists.wikimedia.org/mailman/listinfo/analytics</a><br>
>>><br>
>>><br>
>><br>
>><br>
>> --<br>
>> Best regards,<br>
>> Max Semenik ([[User:MaxSem]])<br>
>><br>
>> _______________________________________________<br>
>> Analytics mailing list<br>
>> <a href="mailto:Analytics@lists.wikimedia.org">Analytics@lists.wikimedia.org</a><br>
>> <a href="https://lists.wikimedia.org/mailman/listinfo/analytics" target="_blank">https://lists.wikimedia.org/mailman/listinfo/analytics</a><br>
>><br>
>><br>
><br>
><br>
> --<br>
> Oliver Keyes<br>
> Research Analyst<br>
> Wikimedia Foundation<br>
><br>
> _______________________________________________<br>
> Analytics mailing list<br>
> <a href="mailto:Analytics@lists.wikimedia.org">Analytics@lists.wikimedia.org</a><br>
> <a href="https://lists.wikimedia.org/mailman/listinfo/analytics" target="_blank">https://lists.wikimedia.org/mailman/listinfo/analytics</a><br>
><br>
><br>
-------------- next part --------------<br>
An HTML attachment was scrubbed...<br>
URL: <<a href="https://lists.wikimedia.org/pipermail/labs-l/attachments/20150302/cf591d4a/attachment-0001.html" target="_blank">https://lists.wikimedia.org/pipermail/labs-l/attachments/20150302/cf591d4a/attachment-0001.html</a>><br>
<br>
------------------------------<br>
<br>
Message: 3<br>
Date: Mon, 2 Mar 2015 23:47:08 +0100<br>
From: Gerard Meijssen <<a href="mailto:gerard.meijssen@gmail.com">gerard.meijssen@gmail.com</a>><br>
To: Wikimedia Labs <<a href="mailto:labs-l@lists.wikimedia.org">labs-l@lists.wikimedia.org</a>><br>
Subject: Re: [Labs-l] [Analytics] doubt on GeoData / how to obtain<br>
articles with coords<br>
Message-ID:<br>
<CAO53wxV3+q_b8RXe-2z4D8tEZz6yCbLeamBxt9k=+<a href="mailto:Sy1bonSnA@mail.gmail.com">Sy1bonSnA@mail.gmail.com</a>><br>
Content-Type: text/plain; charset="utf-8"<br>
<br>
Hoi,<br>
What is the point? Harvest jobs have been run on many Wikipedias and the<br>
results ended up in Wikidata. Is this enough, or does the data need to be in<br>
the article text for each language as well?<br>
<br>
When you run a job querying Wikipedias, have the results end up in Wikidata<br>
as well. It allows people to stand on the shoulders of giants.<br>
Thanks,<br>
GerardM<br>
<br>
On 2 March 2015 at 23:42, Marc Miquel <<a href="mailto:marcmiquel@gmail.com">marcmiquel@gmail.com</a>> wrote:<br>
<br>
> Hi Max and Oliver,<br>
><br>
> Thanks for your answers. The geo_tags table seems quite incomplete. I just<br>
> checked some random articles, for instance on the Nepali Wikipedia: for its<br>
> capital, Kathmandu, there are coords in the actual article, but they don't<br>
> appear in geo_tags. So it doesn't seem to be an option.<br>
><br>
> Marc<br>
><br>
> 2015-03-02 23:38 GMT+01:00 Oliver Keyes <<a href="mailto:okeyes@wikimedia.org">okeyes@wikimedia.org</a>>:<br>
><br>
>> Max's idea is an improvement but still a lot of requests. We really need<br>
>> to start generating these dumps :(.<br>
>><br>
>> Until the dumps are available, the fastest way to do it is probably<br>
>> Quarry (<a href="http://quarry.wmflabs.org/" target="_blank">http://quarry.wmflabs.org/</a>), an open MySQL client for our public<br>
>> database tables. So, you want the geo_tags table; getting all the<br>
>> coordinate sets on the English-language Wikipedia would be something like:<br>
>><br>
>> SELECT * FROM enwiki_p.geo_tags;<br>
>><br>
>> This should be available for all of our production wikis (SHOW DATABASES<br>
>> is your friend): you want [project]_p rather than [project]. Hope that<br>
>> helps!<br>
>><br>
>> On 2 March 2015 at 17:35, Max Semenik <<a href="mailto:maxsem.wiki@gmail.com">maxsem.wiki@gmail.com</a>> wrote:<br>
>><br>
>>> Use generators:<br>
>>> api.php?action=query&generator=allpages&gapnamespace=0&prop=coordinates&gaplimit=max&colimit=max<br>
>>><br>
>>> On Mon, Mar 2, 2015 at 2:33 PM, Marc Miquel <<a href="mailto:marcmiquel@gmail.com">marcmiquel@gmail.com</a>><br>
>>> wrote:<br>
>>><br>
>>>> Hi guys,<br>
>>>><br>
>>>> I am doing some research and I am struggling a bit to obtain geolocated<br>
>>>> articles in several languages. I was told that the best tool to obtain<br>
>>>> the geolocation of each article would be the GeoData API, but I see that<br>
>>>> I need to supply each article name, and I don't know if that is the best<br>
>>>> way.<br>
>>>><br>
>>>> I am thinking, for instance, that for big Wikipedias like French or<br>
>>>> German I might need to make a million queries to get only those with<br>
>>>> coords. Also, I would like to obtain the region according to ISO 3166-2,<br>
>>>> which seems to be available there.<br>
>>>><br>
>>>> My objective is to obtain different lists of articles related to<br>
>>>> countries and regions.<br>
>>>><br>
>>>> I don't know if using Wikidata with Python would be a better option,<br>
>>>> but I see that the region isn't there. Maybe I could combine Wikidata<br>
>>>> with some other tool to give me the region.<br>
>>>> Could anyone help me?<br>
>>>><br>
>>>> Thanks a lot.<br>
>>>><br>
>>>> Marc Miquel<br>
>>>><br>
>>>> _______________________________________________<br>
>>>> Analytics mailing list<br>
>>>> <a href="mailto:Analytics@lists.wikimedia.org">Analytics@lists.wikimedia.org</a><br>
>>>> <a href="https://lists.wikimedia.org/mailman/listinfo/analytics" target="_blank">https://lists.wikimedia.org/mailman/listinfo/analytics</a><br>
>>>><br>
>>>><br>
>>><br>
>>><br>
>>> --<br>
>>> Best regards,<br>
>>> Max Semenik ([[User:MaxSem]])<br>
>>><br>
>>> _______________________________________________<br>
>>> Analytics mailing list<br>
>>> <a href="mailto:Analytics@lists.wikimedia.org">Analytics@lists.wikimedia.org</a><br>
>>> <a href="https://lists.wikimedia.org/mailman/listinfo/analytics" target="_blank">https://lists.wikimedia.org/mailman/listinfo/analytics</a><br>
>>><br>
>>><br>
>><br>
>><br>
>> --<br>
>> Oliver Keyes<br>
>> Research Analyst<br>
>> Wikimedia Foundation<br>
>><br>
>> _______________________________________________<br>
>> Analytics mailing list<br>
>> <a href="mailto:Analytics@lists.wikimedia.org">Analytics@lists.wikimedia.org</a><br>
>> <a href="https://lists.wikimedia.org/mailman/listinfo/analytics" target="_blank">https://lists.wikimedia.org/mailman/listinfo/analytics</a><br>
>><br>
>><br>
><br>
> _______________________________________________<br>
> Labs-l mailing list<br>
> <a href="mailto:Labs-l@lists.wikimedia.org">Labs-l@lists.wikimedia.org</a><br>
> <a href="https://lists.wikimedia.org/mailman/listinfo/labs-l" target="_blank">https://lists.wikimedia.org/mailman/listinfo/labs-l</a><br>
><br>
><br>
-------------- next part --------------<br>
An HTML attachment was scrubbed...<br>
URL: <<a href="https://lists.wikimedia.org/pipermail/labs-l/attachments/20150302/a97a6f4c/attachment-0001.html" target="_blank">https://lists.wikimedia.org/pipermail/labs-l/attachments/20150302/a97a6f4c/attachment-0001.html</a>><br>
<br>
------------------------------<br>
<br>
Message: 4<br>
Date: Mon, 2 Mar 2015 15:52:00 -0700<br>
From: Bryan White <<a href="mailto:bgwhite@gmail.com">bgwhite@gmail.com</a>><br>
To: Wikimedia Labs <<a href="mailto:labs-l@lists.wikimedia.org">labs-l@lists.wikimedia.org</a>><br>
Subject: Re: [Labs-l] [Analytics] doubt on GeoData / how to obtain<br>
articles with coords<br>
Message-ID:<br>
<CADtx0sducCh=<a href="mailto:bxO_QkL3xuXo-Ke2XgcykEbBun7rp8gyci8BgA@mail.gmail.com">bxO_QkL3xuXo-Ke2XgcykEbBun7rp8gyci8BgA@mail.gmail.com</a>><br>
Content-Type: text/plain; charset="utf-8"<br>
<br>
Marc,<br>
<br>
If anybody would know, it would be Kolossos. He is one of the people<br>
responsible for GeoHack, integration with OpenStreetMap, and other<br>
geographical referencing doohickeys.<br>
<br>
He is more active on the German site, see<br>
<a href="https://de.wikipedia.org/wiki/Benutzer:Kolossos" target="_blank">https://de.wikipedia.org/wiki/Benutzer:Kolossos</a>. His email link is there<br>
or you can search through this mailing list for it.<br>
<br>
Bryan<br>
<br>
On Mon, Mar 2, 2015 at 3:42 PM, Marc Miquel <<a href="mailto:marcmiquel@gmail.com">marcmiquel@gmail.com</a>> wrote:<br>
<br>
> Hi Max and Oliver,<br>
><br>
> Thanks for your answers. The geo_tags table seems quite incomplete. I just<br>
> checked some random articles, for instance on the Nepali Wikipedia: for its<br>
> capital, Kathmandu, there are coords in the actual article, but they don't<br>
> appear in geo_tags. So it doesn't seem to be an option.<br>
><br>
> Marc<br>
><br>
> 2015-03-02 23:38 GMT+01:00 Oliver Keyes <<a href="mailto:okeyes@wikimedia.org">okeyes@wikimedia.org</a>>:<br>
><br>
>> Max's idea is an improvement but still a lot of requests. We really need<br>
>> to start generating these dumps :(.<br>
>><br>
>> Until the dumps are available, the fastest way to do it is probably<br>
>> Quarry (<a href="http://quarry.wmflabs.org/" target="_blank">http://quarry.wmflabs.org/</a>), an open MySQL client for our public<br>
>> database tables. So, you want the geo_tags table; getting all the<br>
>> coordinate sets on the English-language Wikipedia would be something like:<br>
>><br>
>> SELECT * FROM enwiki_p.geo_tags;<br>
>><br>
>> This should be available for all of our production wikis (SHOW DATABASES<br>
>> is your friend): you want [project]_p rather than [project]. Hope that<br>
>> helps!<br>
>><br>
>> On 2 March 2015 at 17:35, Max Semenik <<a href="mailto:maxsem.wiki@gmail.com">maxsem.wiki@gmail.com</a>> wrote:<br>
>><br>
>>> Use generators:<br>
>>> api.php?action=query&generator=allpages&gapnamespace=0&prop=coordinates&gaplimit=max&colimit=max<br>
>>><br>
>>> On Mon, Mar 2, 2015 at 2:33 PM, Marc Miquel <<a href="mailto:marcmiquel@gmail.com">marcmiquel@gmail.com</a>><br>
>>> wrote:<br>
>>><br>
>>>> Hi guys,<br>
>>>><br>
>>>> I am doing some research and I am struggling a bit to obtain geolocated<br>
>>>> articles in several languages. I was told that the best tool to obtain<br>
>>>> the geolocation of each article would be the GeoData API, but I see that<br>
>>>> I need to supply each article name, and I don't know if that is the best<br>
>>>> way.<br>
>>>><br>
>>>> I am thinking, for instance, that for big Wikipedias like French or<br>
>>>> German I might need to make a million queries to get only those with<br>
>>>> coords. Also, I would like to obtain the region according to ISO 3166-2,<br>
>>>> which seems to be available there.<br>
>>>><br>
>>>> My objective is to obtain different lists of articles related to<br>
>>>> countries and regions.<br>
>>>><br>
>>>> I don't know if using Wikidata with Python would be a better option,<br>
>>>> but I see that the region isn't there. Maybe I could combine Wikidata<br>
>>>> with some other tool to give me the region.<br>
>>>> Could anyone help me?<br>
>>>><br>
>>>> Thanks a lot.<br>
>>>><br>
>>>> Marc Miquel<br>
>>>><br>
>>>> _______________________________________________<br>
>>>> Analytics mailing list<br>
>>>> <a href="mailto:Analytics@lists.wikimedia.org">Analytics@lists.wikimedia.org</a><br>
>>>> <a href="https://lists.wikimedia.org/mailman/listinfo/analytics" target="_blank">https://lists.wikimedia.org/mailman/listinfo/analytics</a><br>
>>>><br>
>>>><br>
>>><br>
>>><br>
>>> --<br>
>>> Best regards,<br>
>>> Max Semenik ([[User:MaxSem]])<br>
>>><br>
>>> _______________________________________________<br>
>>> Analytics mailing list<br>
>>> <a href="mailto:Analytics@lists.wikimedia.org">Analytics@lists.wikimedia.org</a><br>
>>> <a href="https://lists.wikimedia.org/mailman/listinfo/analytics" target="_blank">https://lists.wikimedia.org/mailman/listinfo/analytics</a><br>
>>><br>
>>><br>
>><br>
>><br>
>> --<br>
>> Oliver Keyes<br>
>> Research Analyst<br>
>> Wikimedia Foundation<br>
>><br>
>> _______________________________________________<br>
>> Analytics mailing list<br>
>> <a href="mailto:Analytics@lists.wikimedia.org">Analytics@lists.wikimedia.org</a><br>
>> <a href="https://lists.wikimedia.org/mailman/listinfo/analytics" target="_blank">https://lists.wikimedia.org/mailman/listinfo/analytics</a><br>
>><br>
>><br>
><br>
> _______________________________________________<br>
> Labs-l mailing list<br>
> <a href="mailto:Labs-l@lists.wikimedia.org">Labs-l@lists.wikimedia.org</a><br>
> <a href="https://lists.wikimedia.org/mailman/listinfo/labs-l" target="_blank">https://lists.wikimedia.org/mailman/listinfo/labs-l</a><br>
><br>
><br>
-------------- next part --------------<br>
An HTML attachment was scrubbed...<br>
URL: <<a href="https://lists.wikimedia.org/pipermail/labs-l/attachments/20150302/33159725/attachment.html" target="_blank">https://lists.wikimedia.org/pipermail/labs-l/attachments/20150302/33159725/attachment.html</a>><br>
<br>
------------------------------<br>
<br>
_______________________________________________<br>
Labs-l mailing list<br>
<a href="mailto:Labs-l@lists.wikimedia.org">Labs-l@lists.wikimedia.org</a><br>
<a href="https://lists.wikimedia.org/mailman/listinfo/labs-l" target="_blank">https://lists.wikimedia.org/mailman/listinfo/labs-l</a><br>
<br>
<br>
End of Labs-l Digest, Vol 39, Issue 1<br>
*************************************<br>
</blockquote></div><br></div>