[Labs-l] Labs-l Digest, Vol 39, Issue 1

Marc Miquel marcmiquel at gmail.com
Mon Mar 2 23:04:07 UTC 2015


Thanks for your answers, guys! :) I'll be looking forward to the
improvements to GeoData.

Cheers,

Marc

2015-03-02 23:52 GMT+01:00 <labs-l-request at lists.wikimedia.org>:

> Send Labs-l mailing list submissions to
>         labs-l at lists.wikimedia.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
>         https://lists.wikimedia.org/mailman/listinfo/labs-l
> or, via email, send a message with subject or body 'help' to
>         labs-l-request at lists.wikimedia.org
>
> You can reach the person managing the list at
>         labs-l-owner at lists.wikimedia.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Labs-l digest..."
>
>
> Today's Topics:
>
>    1. doubt on GeoData / how to obtain articles with coords
>       (Marc Miquel)
>    2. Re: [Analytics] doubt on GeoData / how to obtain articles
>       with coords (Marc Miquel)
>    3. Re: [Analytics] doubt on GeoData / how to obtain articles
>       with coords (Gerard Meijssen)
>    4. Re: [Analytics] doubt on GeoData / how to obtain articles
>       with coords (Bryan White)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon, 2 Mar 2015 23:33:18 +0100
> From: Marc Miquel <marcmiquel at gmail.com>
> To: labs-l at lists.wikimedia.org, analytics at lists.wikimedia.org
> Subject: [Labs-l] doubt on GeoData / how to obtain articles with
>         coords
> Message-ID:
>         <CANSEGinkmuzN3nDmWmWm73AmiGTFyKGjDWeMinQ=
> 17Sbv9xnXw at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hi guys,
>
> I am doing some research and struggling a bit to obtain geolocated
> articles in several languages. I was told that the best tool to obtain the
> geolocation of each article would be the GeoData API, but I see that I need
> to supply each article name there, and I don't know if that is the best way.
>
> I am thinking, for instance, that for big Wikipedias like French or German I
> might need to make a million queries to get only those with coords. Also, I
> would like to obtain the region according to ISO 3166-2, which seems to be
> available there.
>
> My objective is to obtain different lists of articles related to countries
> and regions.
>
> I don't know if using Wikidata with Python would be a better option, but I
> see that the region isn't there. Maybe I could combine Wikidata with some
> other tool to give me the region.
> Could anyone help me?
>
> Thanks a lot.
>
> Marc Miquel
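Marc's Wikidata idea can be sketched in Python: on Wikidata, coordinates live on items as property P625 ("coordinate location"), which can be read per item via the `wbgetclaims` API module. The following is a rough, untested sketch, not the thread's recommended method; the exact claim layout should be double-checked against real `wbgetclaims` output. It also matches Marc's observation: the coordinate claim itself carries no ISO 3166-2 region, so a region would have to come from elsewhere (e.g. reverse geocoding or the country property).

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

WDAPI = "https://www.wikidata.org/w/api.php"

def p625_coords(claims_response):
    """Read (lat, lon) from a wbgetclaims response, or None if no P625 claim."""
    claims = claims_response.get("claims", {}).get("P625", [])
    if not claims:
        return None
    value = claims[0]["mainsnak"]["datavalue"]["value"]
    return value["latitude"], value["longitude"]

def fetch_coords(item_id):
    """Query Wikidata for one item's coordinate-location claim (P625)."""
    params = {"action": "wbgetclaims", "format": "json",
              "entity": item_id, "property": "P625"}
    with urlopen(WDAPI + "?" + urlencode(params)) as resp:
        return p625_coords(json.load(resp))
```

This only answers the "where" half of Marc's question; mapping a coordinate pair back to an ISO 3166-2 region still needs a separate lookup.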
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL: <
> https://lists.wikimedia.org/pipermail/labs-l/attachments/20150302/9ed9822c/attachment-0001.html
> >
>
> ------------------------------
>
> Message: 2
> Date: Mon, 2 Mar 2015 23:42:44 +0100
> From: Marc Miquel <marcmiquel at gmail.com>
> To: "A mailing list for the Analytics Team at WMF and everybody who
>         has an  interest in Wikipedia and analytics."
>         <analytics at lists.wikimedia.org>
> Cc: "labs-l at lists.wikimedia.org" <labs-l at lists.wikimedia.org>
> Subject: Re: [Labs-l] [Analytics] doubt on GeoData / how to obtain
>         articles        with coords
> Message-ID:
>         <CANSEGink9qxRNvzyqrUX4gncX6CJcttqM7eKAcxk=
> mzSSoiXKw at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hi Max and Oliver,
>
> Thanks for your answers. The geo_tags table seems quite incomplete. I just
> checked some random articles in, for instance, the Nepali Wikipedia: for its
> capital Kathmandu there are coords in the actual article, but they don't
> appear in geo_tags. So it doesn't seem to be an option.
>
> Marc
> 2015-03-02 23:38 GMT+01:00 Oliver Keyes <okeyes at wikimedia.org>:
>
> > Max's idea is an improvement but still a lot of requests. We really need
> > to start generating these dumps :(.
> >
> > Until the dumps are available, the fastest way to do it is probably Quarry
> > (http://quarry.wmflabs.org/), an open MySQL client to our public database
> > tables. So, you want the geo_tags table; getting all the coordinate sets
> > on the English-language Wikipedia would be something like:
> >
> > SELECT * FROM enwiki_p.geo_tags;
> >
> > This should be available for all of our production wikis (SHOW DATABASES
> > is your friend): you want [project]_p rather than [project]. Hope that
> > helps!
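Oliver's Quarry query can be extended toward Marc's ISO 3166-2 question: the geo_tags table carries gt_country and gt_region columns alongside the coordinates, where the source templates populate them. A minimal Python sketch that builds one such query per language edition; the column list and the gt_primary flag are assumptions to verify against the live schema (SHOW COLUMNS is your friend):

```python
def geo_tags_query(project):
    """Build a Quarry-style query against one project's _p replica database.

    gt_primary = 1 keeps only each page's primary coordinate pair;
    gt_country / gt_region hold the ISO 3166 / 3166-2 codes, where populated.
    """
    return (
        "SELECT gt_page_id, gt_lat, gt_lon, gt_country, gt_region "
        "FROM {}_p.geo_tags WHERE gt_primary = 1".format(project)
    )

# One query per language edition, to paste into http://quarry.wmflabs.org/
for project in ("enwiki", "frwiki", "dewiki"):
    print(geo_tags_query(project))
```

Generating the SQL per project avoids hand-editing the database name for each of the dozens of wikis Marc wants to cover.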
> >
> > On 2 March 2015 at 17:35, Max Semenik <maxsem.wiki at gmail.com> wrote:
> >
> >> Use generators:
> >>
> >> api.php?action=query&generator=allpages&gapnamespace=0&prop=coordinates&gaplimit=max&colimit=max
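Max's one-line generator query can be scripted so that continuation is followed automatically, instead of Marc's feared one-request-per-article approach. A rough sketch using only the standard library and the standard MediaWiki API continuation protocol (merge the server's `continue` block back into the request); treat it as a starting point rather than a tested crawler:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API = "https://en.wikipedia.org/w/api.php"

def pages_with_coords(data):
    """Extract (title, coordinates) pairs from one API response."""
    pages = data.get("query", {}).get("pages", {})
    return [(p["title"], p["coordinates"])
            for p in pages.values() if "coordinates" in p]

def crawl():
    """Walk all main-namespace pages, yielding only those with coordinates."""
    params = {
        "action": "query", "format": "json",
        "generator": "allpages", "gapnamespace": 0, "gaplimit": "max",
        "prop": "coordinates", "colimit": "max",
    }
    while True:
        with urlopen(API + "?" + urlencode(params)) as resp:
            data = json.load(resp)
        for pair in pages_with_coords(data):
            yield pair
        if "continue" not in data:
            break
        params.update(data["continue"])  # resume from the server's cursor
```

Even batched at `gaplimit=max`, walking a large wiki this way still means thousands of requests, which is Oliver's point below about wanting dumps instead.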
> >>
> >> On Mon, Mar 2, 2015 at 2:33 PM, Marc Miquel <marcmiquel at gmail.com>
> wrote:
> >>
> >>> Hi guys,
> >>>
> >>> I am doing some research and struggling a bit to obtain geolocated
> >>> articles in several languages. I was told that the best tool to obtain
> >>> the geolocation of each article would be the GeoData API, but I see
> >>> that I need to supply each article name there, and I don't know if that
> >>> is the best way.
> >>>
> >>> I am thinking, for instance, that for big Wikipedias like French or
> >>> German I might need to make a million queries to get only those with
> >>> coords. Also, I would like to obtain the region according to ISO 3166-2,
> >>> which seems to be available there.
> >>>
> >>> My objective is to obtain different lists of articles related to
> >>> countries and regions.
> >>>
> >>> I don't know if using Wikidata with Python would be a better option,
> >>> but I see that the region isn't there. Maybe I could combine Wikidata
> >>> with some other tool to give me the region.
> >>> Could anyone help me?
> >>>
> >>> Thanks a lot.
> >>>
> >>> Marc Miquel
> >>>
> >>> _______________________________________________
> >>> Analytics mailing list
> >>> Analytics at lists.wikimedia.org
> >>> https://lists.wikimedia.org/mailman/listinfo/analytics
> >>>
> >>>
> >>
> >>
> >> --
> >> Best regards,
> >> Max Semenik ([[User:MaxSem]])
> >>
> >> _______________________________________________
> >> Analytics mailing list
> >> Analytics at lists.wikimedia.org
> >> https://lists.wikimedia.org/mailman/listinfo/analytics
> >>
> >>
> >
> >
> > --
> > Oliver Keyes
> > Research Analyst
> > Wikimedia Foundation
> >
> > _______________________________________________
> > Analytics mailing list
> > Analytics at lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/analytics
> >
> >
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL: <
> https://lists.wikimedia.org/pipermail/labs-l/attachments/20150302/cf591d4a/attachment-0001.html
> >
>
> ------------------------------
>
> Message: 3
> Date: Mon, 2 Mar 2015 23:47:08 +0100
> From: Gerard Meijssen <gerard.meijssen at gmail.com>
> To: Wikimedia Labs <labs-l at lists.wikimedia.org>
> Subject: Re: [Labs-l] [Analytics] doubt on GeoData / how to obtain
>         articles with coords
> Message-ID:
>         <CAO53wxV3+q_b8RXe-2z4D8tEZz6yCbLeamBxt9k=+
> Sy1bonSnA at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hoi,
> What is the point? Harvest jobs have been run on many Wikipedias, and the
> results ended up in Wikidata. Is this enough, or does the data need to be
> in the text of each language edition as well?
>
> When you run a job querying Wikipedias, have the results end up in Wikidata
> as well. It allows people to stand on the shoulders of giants.
> Thanks,
>       GerardM
>
> On 2 March 2015 at 23:42, Marc Miquel <marcmiquel at gmail.com> wrote:
>
> > Hi Max and Oliver,
> >
> > Thanks for your answers. The geo_tags table seems quite incomplete. I
> > just checked some random articles in, for instance, the Nepali Wikipedia:
> > for its capital Kathmandu there are coords in the actual article, but
> > they don't appear in geo_tags. So it doesn't seem to be an option.
> >
> > Marc
> >
> > 2015-03-02 23:38 GMT+01:00 Oliver Keyes <okeyes at wikimedia.org>:
> >
> >> Max's idea is an improvement but still a lot of requests. We really need
> >> to start generating these dumps :(.
> >>
> >> Until the dumps are available, the fastest way to do it is probably
> >> Quarry (http://quarry.wmflabs.org/), an open MySQL client to our public
> >> database tables. So, you want the geo_tags table; getting all the
> >> coordinate sets on the English-language Wikipedia would be something like:
> >>
> >> SELECT * FROM enwiki_p.geo_tags;
> >>
> >> This should be available for all of our production wikis (SHOW DATABASES
> >> is your friend): you want [project]_p rather than [project]. Hope that
> >> helps!
> >>
> >> On 2 March 2015 at 17:35, Max Semenik <maxsem.wiki at gmail.com> wrote:
> >>
> >>> Use generators:
> >>>
> >>> api.php?action=query&generator=allpages&gapnamespace=0&prop=coordinates&gaplimit=max&colimit=max
> >>>
> >>> On Mon, Mar 2, 2015 at 2:33 PM, Marc Miquel <marcmiquel at gmail.com>
> >>> wrote:
> >>>
> >>>> Hi guys,
> >>>>
> >>>> I am doing some research and struggling a bit to obtain geolocated
> >>>> articles in several languages. I was told that the best tool to obtain
> >>>> the geolocation of each article would be the GeoData API, but I see
> >>>> that I need to supply each article name there, and I don't know if
> >>>> that is the best way.
> >>>>
> >>>> I am thinking, for instance, that for big Wikipedias like French or
> >>>> German I might need to make a million queries to get only those with
> >>>> coords. Also, I would like to obtain the region according to ISO
> >>>> 3166-2, which seems to be available there.
> >>>>
> >>>> My objective is to obtain different lists of articles related to
> >>>> countries and regions.
> >>>>
> >>>> I don't know if using Wikidata with Python would be a better option,
> >>>> but I see that the region isn't there. Maybe I could combine Wikidata
> >>>> with some other tool to give me the region.
> >>>> Could anyone help me?
> >>>>
> >>>> Thanks a lot.
> >>>>
> >>>> Marc Miquel
> >>>>
> >>>> _______________________________________________
> >>>> Analytics mailing list
> >>>> Analytics at lists.wikimedia.org
> >>>> https://lists.wikimedia.org/mailman/listinfo/analytics
> >>>>
> >>>>
> >>>
> >>>
> >>> --
> >>> Best regards,
> >>> Max Semenik ([[User:MaxSem]])
> >>>
> >>> _______________________________________________
> >>> Analytics mailing list
> >>> Analytics at lists.wikimedia.org
> >>> https://lists.wikimedia.org/mailman/listinfo/analytics
> >>>
> >>>
> >>
> >>
> >> --
> >> Oliver Keyes
> >> Research Analyst
> >> Wikimedia Foundation
> >>
> >> _______________________________________________
> >> Analytics mailing list
> >> Analytics at lists.wikimedia.org
> >> https://lists.wikimedia.org/mailman/listinfo/analytics
> >>
> >>
> >
> > _______________________________________________
> > Labs-l mailing list
> > Labs-l at lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/labs-l
> >
> >
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL: <
> https://lists.wikimedia.org/pipermail/labs-l/attachments/20150302/a97a6f4c/attachment-0001.html
> >
>
> ------------------------------
>
> Message: 4
> Date: Mon, 2 Mar 2015 15:52:00 -0700
> From: Bryan White <bgwhite at gmail.com>
> To: Wikimedia Labs <labs-l at lists.wikimedia.org>
> Subject: Re: [Labs-l] [Analytics] doubt on GeoData / how to obtain
>         articles with coords
> Message-ID:
>         <CADtx0sducCh=
> bxO_QkL3xuXo-Ke2XgcykEbBun7rp8gyci8BgA at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Marc,
>
> If anybody would know, it would be Kolossos. He is one of the people
> responsible for GeoHack, integration with OpenStreetMap, and other
> geographical referencing doohickeys.
>
> He is more active on the German site; see
> https://de.wikipedia.org/wiki/Benutzer:Kolossos. His email link is there,
> or you can search through this mailing list for it.
>
> Bryan
>
> On Mon, Mar 2, 2015 at 3:42 PM, Marc Miquel <marcmiquel at gmail.com> wrote:
>
> > Hi Max and Oliver,
> >
> > Thanks for your answers. The geo_tags table seems quite incomplete. I
> > just checked some random articles in, for instance, the Nepali Wikipedia:
> > for its capital Kathmandu there are coords in the actual article, but
> > they don't appear in geo_tags. So it doesn't seem to be an option.
> >
> > Marc
> >
> > 2015-03-02 23:38 GMT+01:00 Oliver Keyes <okeyes at wikimedia.org>:
> >
> >> Max's idea is an improvement but still a lot of requests. We really need
> >> to start generating these dumps :(.
> >>
> >> Until the dumps are available, the fastest way to do it is probably
> >> Quarry (http://quarry.wmflabs.org/), an open MySQL client to our public
> >> database tables. So, you want the geo_tags table; getting all the
> >> coordinate sets on the English-language Wikipedia would be something like:
> >>
> >> SELECT * FROM enwiki_p.geo_tags;
> >>
> >> This should be available for all of our production wikis (SHOW DATABASES
> >> is your friend): you want [project]_p rather than [project]. Hope that
> >> helps!
> >>
> >> On 2 March 2015 at 17:35, Max Semenik <maxsem.wiki at gmail.com> wrote:
> >>
> >>> Use generators:
> >>>
> >>> api.php?action=query&generator=allpages&gapnamespace=0&prop=coordinates&gaplimit=max&colimit=max
> >>>
> >>> On Mon, Mar 2, 2015 at 2:33 PM, Marc Miquel <marcmiquel at gmail.com>
> >>> wrote:
> >>>
> >>>> Hi guys,
> >>>>
> >>>> I am doing some research and struggling a bit to obtain geolocated
> >>>> articles in several languages. I was told that the best tool to obtain
> >>>> the geolocation of each article would be the GeoData API, but I see
> >>>> that I need to supply each article name there, and I don't know if
> >>>> that is the best way.
> >>>>
> >>>> I am thinking, for instance, that for big Wikipedias like French or
> >>>> German I might need to make a million queries to get only those with
> >>>> coords. Also, I would like to obtain the region according to ISO
> >>>> 3166-2, which seems to be available there.
> >>>>
> >>>> My objective is to obtain different lists of articles related to
> >>>> countries and regions.
> >>>>
> >>>> I don't know if using Wikidata with Python would be a better option,
> >>>> but I see that the region isn't there. Maybe I could combine Wikidata
> >>>> with some other tool to give me the region.
> >>>> Could anyone help me?
> >>>>
> >>>> Thanks a lot.
> >>>>
> >>>> Marc Miquel
> >>>>
> >>>> _______________________________________________
> >>>> Analytics mailing list
> >>>> Analytics at lists.wikimedia.org
> >>>> https://lists.wikimedia.org/mailman/listinfo/analytics
> >>>>
> >>>>
> >>>
> >>>
> >>> --
> >>> Best regards,
> >>> Max Semenik ([[User:MaxSem]])
> >>>
> >>> _______________________________________________
> >>> Analytics mailing list
> >>> Analytics at lists.wikimedia.org
> >>> https://lists.wikimedia.org/mailman/listinfo/analytics
> >>>
> >>>
> >>
> >>
> >> --
> >> Oliver Keyes
> >> Research Analyst
> >> Wikimedia Foundation
> >>
> >> _______________________________________________
> >> Analytics mailing list
> >> Analytics at lists.wikimedia.org
> >> https://lists.wikimedia.org/mailman/listinfo/analytics
> >>
> >>
> >
> > _______________________________________________
> > Labs-l mailing list
> > Labs-l at lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/labs-l
> >
> >
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL: <
> https://lists.wikimedia.org/pipermail/labs-l/attachments/20150302/33159725/attachment.html
> >
>
> ------------------------------
>
> _______________________________________________
> Labs-l mailing list
> Labs-l at lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/labs-l
>
>
> End of Labs-l Digest, Vol 39, Issue 1
> *************************************
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.wikimedia.org/pipermail/labs-l/attachments/20150303/f8d49853/attachment-0001.html>


More information about the Labs-l mailing list