[Labs-l] [Analytics] doubt on GeoData / how to obtain articles with coords
Gerard Meijssen
gerard.meijssen at gmail.com
Mon Mar 2 22:47:08 UTC 2015
Hoi,
What is the point? Harvest jobs have been run on many Wikipedias, and the
results ended up in Wikidata. Is this enough, or does the data need to be in
the article text for each language as well?
When you run a job querying the Wikipedias, have the results end up in
Wikidata as well. It allows people to stand on the shoulders of giants.
Thanks,
GerardM
On 2 March 2015 at 23:42, Marc Miquel <marcmiquel at gmail.com> wrote:
> Hi Max and Oliver,
>
> Thanks for your answers. The geo_tags table seems quite incomplete. I just
> checked some random articles on, for instance, the Nepali Wikipedia: for its
> capital, Kathmandu, there are coordinates in the actual article, but they
> don't appear in geo_tags. So it doesn't seem to be an option.
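A quick way to reproduce that spot check is to ask the GeoData API directly for one title. A minimal sketch in Python, assuming the standard api.php endpoint (the wiki host and title here are just examples):

```python
import urllib.parse

def coordinate_check_url(wiki_host, title):
    """Build a GeoData query URL that returns the stored
    coordinates (if any) for a single article title."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "coordinates",
        "titles": title,
    }
    return "https://%s/w/api.php?%s" % (wiki_host, urllib.parse.urlencode(params))

# e.g. spot-check Kathmandu on the Nepali Wikipedia:
url = coordinate_check_url("ne.wikipedia.org", "Kathmandu")
```

If the response carries no `coordinates` key for the page while the article visibly shows coordinates, the template on that wiki probably isn't feeding the GeoData extension, which would explain the gap in geo_tags.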
>
> Marc
>
> 2015-03-02 23:38 GMT+01:00 Oliver Keyes <okeyes at wikimedia.org>:
>
>> Max's idea is an improvement but still a lot of requests. We really need
>> to start generating these dumps :(.
>>
>> Until the dumps are available, the fastest way to do it is probably
>> Quarry (http://quarry.wmflabs.org/), an open MySQL client to our public
>> database tables. You want the geo_tags table; getting all the
>> coordinate sets on the English-language Wikipedia would be something like:
>>
>> SELECT * FROM enwiki_p.geo_tags;
>>
>> This should be available for all of our production wikis (SHOW DATABASES
>> is your friend): you want [project]_p rather than [project]. Hope that
>> helps!
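Once the query results are exported from Quarry (say, as gt_page_id, gt_country, gt_region tuples, following the GeoData schema), grouping them into per-country and per-region lists is a few lines of Python. A sketch with made-up sample rows:

```python
from collections import defaultdict

def group_by_region(rows):
    """Group (page_id, country, region) tuples into
    {(country, region): [page_ids]} buckets."""
    buckets = defaultdict(list)
    for page_id, country, region in rows:
        buckets[(country, region)].append(page_id)
    return dict(buckets)

# Hypothetical rows, shaped like the output of
#   SELECT gt_page_id, gt_country, gt_region FROM enwiki_p.geo_tags;
rows = [
    (100, "NP", "3"),    # a page tagged in Nepal
    (101, "FR", "IDF"),  # two pages tagged in Ile-de-France
    (102, "FR", "IDF"),
]
grouped = group_by_region(rows)
```

gt_country holds the ISO 3166-1 code and gt_region the ISO 3166-2 second part, which seems to be exactly the split Marc asked about; note both columns can be NULL when the tagging template didn't supply them.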
>>
>> On 2 March 2015 at 17:35, Max Semenik <maxsem.wiki at gmail.com> wrote:
>>
>>> Use generators:
>>> api.php?action=query&generator=allpages&gapnamespace=0&prop=coordinates&gaplimit=max&colimit=max
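That generator query has to be paged through with the API's continuation mechanism to cover a whole wiki. A rough Python sketch of the loop, using only the standard library (the host is an example; swap in the target wiki):

```python
import json
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"  # example host

def build_params(cont=None):
    """Base generator query, merged with any continuation tokens
    returned by the previous response."""
    params = {
        "action": "query",
        "format": "json",
        "generator": "allpages",
        "gapnamespace": 0,
        "gaplimit": "max",
        "prop": "coordinates",
        "colimit": "max",
    }
    if cont:
        params.update(cont)
    return params

def iter_coordinates():
    """Yield (title, lat, lon) for every page carrying coordinates."""
    cont = None
    while True:
        url = API + "?" + urllib.parse.urlencode(build_params(cont))
        with urllib.request.urlopen(url) as resp:
            data = json.load(resp)
        for page in data.get("query", {}).get("pages", {}).values():
            for coord in page.get("coordinates", []):
                yield page["title"], coord["lat"], coord["lon"]
        cont = data.get("continue")
        if not cont:
            break
```

Even with gaplimit=max this walks every page, with or without coordinates, so it is still a lot of requests on the larger wikis.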
>>>
>>> On Mon, Mar 2, 2015 at 2:33 PM, Marc Miquel <marcmiquel at gmail.com>
>>> wrote:
>>>
>>>> Hi guys,
>>>>
>>>> I am doing some research and I struggling a bit to obtain geolocalized
>>>> articles in several languages. They told me that the best tool to obtain
>>>> the geolocalization for each article would be GeoData API. But I see there
>>>> I need to introduce each article name and I don't know if it is the best
>>>> way.
>>>>
>>>> I am thinking, for instance, that for big Wikipedias like the French or
>>>> German one I might need to make a million queries to get only those with
>>>> coordinates... Also, I would like to obtain the region according to ISO
>>>> 3166-2, which seems to be available there.
>>>>
>>>> My objective is to obtain different lists of articles related to
>>>> countries and regions.
>>>>
>>>> I don't know if using Wikidata with Python would be a better option,
>>>> but I see that the region isn't there. Maybe I could combine Wikidata
>>>> with some other tool to give me the region.
>>>> Could anyone help me?
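If Wikidata turns out to be the better source, the coordinate itself is stored as the P625 ("coordinate location") claim in each entity's JSON, and the region could then come from a separate reverse-geocoding step. A sketch of pulling the coordinate out, with a made-up entity fragment shaped like wbgetentities output:

```python
def extract_coordinate(entity):
    """Return (lat, lon) from a Wikidata entity dict, or None
    if it carries no coordinate location (P625) claim."""
    for claim in entity.get("claims", {}).get("P625", []):
        value = claim.get("mainsnak", {}).get("datavalue", {}).get("value")
        if value:
            return value["latitude"], value["longitude"]
    return None

# Made-up fragment in the shape wbgetentities returns:
entity = {
    "claims": {
        "P625": [{
            "mainsnak": {
                "datavalue": {
                    "value": {"latitude": 27.71, "longitude": 85.32}
                }
            }
        }]
    }
}
```

Feeding the lat/lon pair to any reverse-geocoding service would then recover the ISO 3166-2 region that Wikidata itself doesn't store.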
>>>>
>>>> Thanks a lot.
>>>>
>>>> Marc Miquel
>>>>
>>>> _______________________________________________
>>>> Analytics mailing list
>>>> Analytics at lists.wikimedia.org
>>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>>>
>>>>
>>>
>>>
>>> --
>>> Best regards,
>>> Max Semenik ([[User:MaxSem]])
>>>
>>
>>
>> --
>> Oliver Keyes
>> Research Analyst
>> Wikimedia Foundation
>>
>
> _______________________________________________
> Labs-l mailing list
> Labs-l at lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/labs-l
>
>