Hi guys,
I am doing some research and I am struggling a bit to obtain geolocated articles in several languages. I was told that the best tool for obtaining each article's geolocation would be the GeoData API. But as far as I can see, I would need to enter each article name, and I don't know whether that is the best way.
For big Wikipedias like French or German, for instance, I might need to make a million queries just to find the articles that have coordinates... Also, I would like to obtain the region according to ISO 3166-2, which seems to be available there.
My objective is to obtain different lists of articles related to countries and regions.
I don't know whether using Wikidata with Python would be a better option, but as far as I can see the region isn't there. Maybe I could combine Wikidata with some other tool that gives me the region. Could anyone help me?
Thanks a lot.
Marc Miquel
Use generators: api.php?action=query&generator=allpages&gapnamespace=0&prop=coordinates&gaplimit=max&colimit=max
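For anyone following along, here is a rough Python sketch of how the generator query above could be paged through. It is only an illustration under assumptions: the host in API_URL, the User-Agent string, and the helper names fetch_json and iter_pages_with_coordinates are made up for this example, and it relies on the API returning a "continue" object to merge into the next request.

```python
import json
import urllib.parse
import urllib.request

# Assumption: English Wikipedia; swap the host for other language editions.
API_URL = "https://en.wikipedia.org/w/api.php"

# Same parameters as the query string above, plus JSON output and continuation.
BASE_PARAMS = {
    "action": "query",
    "generator": "allpages",
    "gapnamespace": "0",
    "gaplimit": "max",
    "prop": "coordinates",
    "colimit": "max",
    "format": "json",
    "continue": "",
}


def fetch_json(params, api_url=API_URL):
    """Perform one API request and decode the JSON response."""
    url = api_url + "?" + urllib.parse.urlencode(params)
    # Hypothetical User-Agent; use one that identifies your own project.
    request = urllib.request.Request(
        url, headers={"User-Agent": "geo-research-sketch/0.1 (example)"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def iter_pages_with_coordinates(fetch=fetch_json):
    """Yield (title, coordinates) for every page in a batch that has coordinates."""
    params = dict(BASE_PARAMS)
    while True:
        data = fetch(params)
        for page in data.get("query", {}).get("pages", {}).values():
            if "coordinates" in page:
                yield page["title"], page["coordinates"]
        if "continue" not in data:
            break
        # Merge the continuation values into a fresh copy of the base parameters.
        params = dict(BASE_PARAMS)
        params.update(data["continue"])
```

A page can show up in more than one batch while the coordinates property is still being continued, so merging results by title afterwards is probably a good idea. The fetch argument is there only so the loop can be tried without touching the network.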
On Mon, Mar 2, 2015 at 2:33 PM, Marc Miquel marcmiquel@gmail.com wrote:
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics
Max's idea is an improvement, but it is still a lot of requests. We really need to start generating these dumps :(
Until the dumps are available, the fastest way to do it is probably Quarry (http://quarry.wmflabs.org/), an open MySQL client for our public database tables. You want the geo_tags table; getting all the coordinate sets on the English-language Wikipedia would be something like:
SELECT * FROM enwiki_p.geo_tags;
This should be available for all of our production wikis (SHOW DATABASES is your friend): you want [project]_p rather than [project]. Hope that helps!
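Since the same query has to be repeated per wiki, a small helper that builds the SQL text for each [project]_p database might save some typing. This is a sketch only: the function name geo_tags_query is invented here, and the column names (gt_page_id, gt_lat, gt_lon, gt_country, gt_region, gt_primary, and page_id, page_namespace, page_title on the page table) are assumptions about the schema that should be checked with DESCRIBE before relying on them.

```python
import re


def geo_tags_query(project, primary_only=True):
    """Build SQL joining geo_tags to page for one wiki, e.g. 'frwiki'."""
    # Only accept plain database names so nothing odd ends up in the SQL text.
    if not re.fullmatch(r"[a-z0-9_]+", project):
        raise ValueError(f"unexpected database name: {project!r}")
    db = f"{project}_p"  # public replicas use the [project]_p naming
    where = "WHERE p.page_namespace = 0"
    if primary_only:
        # Assumed column: marks the main coordinate of a page.
        where += " AND g.gt_primary = 1"
    return (
        "SELECT p.page_title, g.gt_lat, g.gt_lon, g.gt_country, g.gt_region "
        f"FROM {db}.geo_tags AS g "
        f"JOIN {db}.page AS p ON p.page_id = g.gt_page_id "
        f"{where};"
    )


if __name__ == "__main__":
    # Print one query per wiki, ready to paste into the SQL client.
    for wiki in ("enwiki", "frwiki", "dewiki"):
        print(geo_tags_query(wiki))
```

The join to the page table is there only to get titles next to the coordinates; drop it if page IDs are enough.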
On 2 March 2015 at 17:35, Max Semenik maxsem.wiki@gmail.com wrote:
Use generators: api.php?action=query&generator=allpages&gapnamespace=0&prop=coordinates&gaplimit=max&colimit=max
-- Best regards, Max Semenik ([[User:MaxSem]])
On Mon, Mar 2, 2015 at 2:38 PM, Oliver Keyes okeyes@wikimedia.org wrote:
We really need to start generating these dumps :(
https://phabricator.wikimedia.org/T53225
Hi Max and Oliver,
Thanks for your answers. The geo_tags table seems quite incomplete. I just checked some random articles: in the Nepali Wikipedia, for instance, the article on the capital, Kathmandu, has coordinates, but they don't appear in geo_tags. So it doesn't seem to be an option.
Marc
2015-03-02 23:38 GMT+01:00 Oliver Keyes okeyes@wikimedia.org:
-- Oliver Keyes Research Analyst Wikimedia Foundation
Not every Wikipedia has integrated its templates with GeoData; therefore, to get better results you want to either request coordinates from a major Wikipedia or use Wikidata (which has its own issues/fun stuff). Bug: https://phabricator.wikimedia.org/T35704
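On the Wikidata side, a minimal sketch of pulling the coordinates and per-wiki titles out of an entity's JSON could look like the following. It assumes the entity has already been downloaded as a Python dict, that coordinates sit under property P625 (coordinate location), and that titles are under "sitelinks"; the helper names are made up for this example.

```python
def coordinates_from_entity(entity):
    """Return (lat, lon) from the first P625 claim that has a value, else None."""
    for claim in entity.get("claims", {}).get("P625", []):
        snak = claim.get("mainsnak", {})
        # Claims can also be "novalue" or "somevalue"; skip those.
        if snak.get("snaktype") == "value":
            value = snak["datavalue"]["value"]
            return value["latitude"], value["longitude"]
    return None


def titles_by_wiki(entity, wikis):
    """Map each requested wiki (e.g. 'newiki') to the article title, if linked."""
    sitelinks = entity.get("sitelinks", {})
    return {wiki: sitelinks[wiki]["title"] for wiki in wikis if wiki in sitelinks}


if __name__ == "__main__":
    # Made-up sample entity, only to show the expected shape of the input.
    sample = {
        "claims": {"P625": [{"mainsnak": {
            "snaktype": "value",
            "datavalue": {"value": {"latitude": 27.7, "longitude": 85.3}},
        }}]},
        "sitelinks": {"enwiki": {"title": "Kathmandu"}},
    }
    print(coordinates_from_entity(sample), titles_by_wiki(sample, ["enwiki", "newiki"]))
```

Items without a P625 claim simply return None, which makes it easy to count how many geolocated articles of a given wiki are missing coordinates on Wikidata.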
On Mon, Mar 2, 2015 at 2:42 PM, Marc Miquel marcmiquel@gmail.com wrote:
Hi Marc,
Marc Miquel wrote on 2-3-2015 at 23:33:
My objective is to obtain different lists of articles related to countries and regions.
Can you elaborate more? What is your real goal? What is your user story? You could probably pull a lot from Wikidata.
Maarten
Hi Maarten,
My goal is to obtain a set of articles with coordinates for different Wikipedias, knowing their title, coordinates, and country/region. I am doing this because I want to study the characteristics of this specific group of articles. This is a research study I want to include in my PhD.
Wikidata is great in many ways, but I see that the coordinates property does not appear for all the articles that have coordinates. Besides, the country/region appears, but in a different format.
I would like to know to what extent geo_tags is reliable and which Wikipedias implement it (and where the information in that table comes from: GeoData templates or somewhere else). This would allow me to decide on the list of Wikipedias I want to study, based on how up to date the coordinates are in Wikidata or geo_tags. Of course, as many Wikipedias as possible would be best, from big to small ones.
Another option would be to detect links to GeoHack among the external links and capture their text to see the country/region, but are they available in the tables?
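To make the GeoHack idea concrete, here is a rough sketch of pulling the region out of such a link. It assumes the links have already been collected by some means, and that they carry a "params" query argument with underscore-separated tokens, one of which looks like "region:XX" or "region:XX-YY"; the function name region_from_geohack is made up for this example.

```python
from urllib.parse import parse_qs, urlsplit


def region_from_geohack(url):
    """Return (country, subdivision) from a GeoHack link, or None if no region.

    subdivision is the full code (e.g. 'US-CA') when one is given, else None.
    """
    query = parse_qs(urlsplit(url).query)
    params = query.get("params", [""])[0]
    # Assumed layout: tokens separated by "_", e.g. 27_42_N_85_20_E_region:NP_type:city
    for token in params.split("_"):
        if token.startswith("region:"):
            code = token[len("region:"):].upper()
            country, _, subdivision = code.partition("-")
            return country, (code if subdivision else None)
    return None


if __name__ == "__main__":
    link = (
        "https://tools.wmflabs.org/geohack/geohack.php"
        "?pagename=Kathmandu&params=27_42_N_85_20_E_region:NP_type:city"
    )
    print(region_from_geohack(link))
```

Links whose template did not set a region would return None, so this would only cover the subset of articles where editors filled that field in.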
Thanks so much.
Marc Miquel
2015-03-03 17:02 GMT+01:00 Maarten Dammers maarten@mdammers.nl: