This is super cool, thanks for sharing! Would you mind if I write it up for the Wikidata Query Service docs?

On Mon, Apr 20, 2015 at 3:50 PM, Markus Krötzsch <markus@semantic-mediawiki.org> wrote:
On 20.04.2015 23:47, Daniel Kinzler wrote:
Something seems to be wrong with the order, though. Munich (pop > 1m in all
statements) is listed way after Chemnitz (pop < 300k in all statements). Any
idea why?

Good catch. My query was too simple: it used one "random" population value per city instead of the biggest one. Here is a better query, which this time also shows the populations:

PREFIX : <http://www.wikidata.org/entity/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?city (MAX(?population) AS ?max_population)  ?citylabel ?mayorlabel WHERE {
 ?city :P31c/:P279c* :Q515 .  # find instances of subclasses of city
 ?city :P6s ?statement .      # with a P6 (head of government) statement
 ?statement :P6v ?mayor .     # ... that has the value ?mayor
 ?mayor :P21c :Q6581072 .     # ... where the ?mayor has P21 (sex or gender) female
 FILTER NOT EXISTS { ?statement :P582q ?x }  # ... but the statement has no P582 (end date) qualifier

 # Now select the population value of the ?city
 # (the number is reached through a chain of three properties)
 ?city :P1082s/:P1082v/<http://www.wikidata.org/ontology#numericValue> ?population .

 # Optionally, find English labels for city and mayor:
 OPTIONAL {
   ?city rdfs:label ?citylabel .
   FILTER ( LANG(?citylabel) = "en" )
 }
 OPTIONAL {
   ?mayor rdfs:label ?mayorlabel .
   FILTER ( LANG(?mayorlabel) = "en" )
 }
} GROUP BY ?city ?citylabel ?mayorlabel
ORDER BY DESC(?max_population) LIMIT 100
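
To make the effect of the GROUP BY concrete, here is a minimal Python sketch of what MAX(?population) with GROUP BY ?city ?citylabel ?mayorlabel, followed by ORDER BY DESC(?max_population), computes. It works on hypothetical in-memory rows (the city names and numbers are made up for illustration, not taken from Wikidata), and group_max_desc is a hypothetical helper, not part of any SPARQL engine.

```python
# Hypothetical rows produced by the WHERE clause before aggregation:
# (city, citylabel, mayorlabel, population). All values are made up.
rows = [
    ("CityA", "City A", "Mayor A", 1_200_000),
    ("CityA", "City A", "Mayor A", 1_350_000),
    ("CityB", "City B", "Mayor B", 240_000),
    ("CityB", "City B", "Mayor B", 260_000),
    ("CityC", "City C", "Mayor C", 500_000),
]


def group_max_desc(rows, limit=100):
    """Model GROUP BY ?city ?citylabel ?mayorlabel with MAX(?population),
    followed by ORDER BY DESC(?max_population) LIMIT limit."""
    groups = {}
    for city, citylabel, mayorlabel, population in rows:
        key = (city, citylabel, mayorlabel)
        # MAX aggregate: keep the largest population seen for this group.
        if key not in groups or population > groups[key]:
            groups[key] = population
    # ORDER BY DESC on the aggregated value, then apply LIMIT.
    ordered = sorted(groups.items(), key=lambda item: item[1], reverse=True)
    # Return rows in the SELECT order: ?city ?max_population ?citylabel ?mayorlabel
    return [(city, max_pop, citylabel, mayorlabel)
            for (city, citylabel, mayorlabel), max_pop in ordered[:limit]]


for row in group_max_desc(rows):
    print(row)
```

Each group contributes exactly one row, ranked by its largest population value, which is what the revised query asks the endpoint to do.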


Oh... maybe quantity values are sorted in alphanumeric order, because they are
decimal strings? They should be xsd:decimal...

They are.
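
The difference Daniel suspected can be illustrated with a small Python sketch. If population values were compared as plain strings, a descending sort would rank a small number with a large leading digit first. Typed numeric values compare by magnitude. The values below are hypothetical, and decimal.Decimal only stands in for xsd:decimal comparison semantics; it says nothing about how Virtuoso stores literals.

```python
from decimal import Decimal

# Hypothetical population values in their lexical (string) form.
lexical_values = ["1350000", "260000", "95000"]

# String comparison goes character by character, so in descending order
# "9..." ranks above "2...", which ranks above "1...".
as_strings = sorted(lexical_values, reverse=True)

# Numeric comparison, analogous to comparing xsd:decimal literals by value.
as_decimals = sorted(lexical_values, key=Decimal, reverse=True)

print(as_strings)
print(as_decimals)
```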

Markus



On 20.04.2015 at 22:18, Markus Krötzsch wrote:
Hi all,

For many years, Denny and I have been giving talks about why we need to improve
the data management in Wikipedia. To explain and motivate this, we have often
asked the simple question: "What are the world's largest cities with a female
mayor?" The information to answer this is clearly in Wikipedia, but it would be
painfully hard to get the result by reading articles.

I recently had the occasion to actually phrase this in SPARQL, so that an
answer can now, finally, be given. The query to run at

http://milenio.dcc.uchile.cl/sparql

is as follows (with some explaining comments inline):

PREFIX : <http://www.wikidata.org/entity/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?city ?citylabel ?mayorlabel WHERE {
  ?city :P31c/:P279c* :Q515 .  # find instances of subclasses of city
  ?city :P6s ?statement .      # with a P6 (head of government) statement
  ?statement :P6v ?mayor .     # ... that has the value ?mayor
  ?mayor :P21c :Q6581072 .     # ... where the ?mayor has P21 (sex or gender) female
  FILTER NOT EXISTS { ?statement :P582q ?x }  # ... but the statement has no P582 (end date) qualifier

  # Now select the population value of the ?city
  # (the number is reached through a chain of three properties)
  ?city :P1082s/:P1082v/<http://www.wikidata.org/ontology#numericValue> ?population .

  # Optionally, find English labels for city and mayor:
  OPTIONAL {
    ?city rdfs:label ?citylabel .
    FILTER ( LANG(?citylabel) = "en" )
  }
  OPTIONAL {
    ?mayor rdfs:label ?mayorlabel .
    FILTER ( LANG(?mayorlabel) = "en" )
  }
} ORDER BY DESC(?population) LIMIT 100

To see the results, just paste this into the box at
http://milenio.dcc.uchile.cl/sparql and press "Run query".

The query does not select the most recent population but relies on Virtuoso to
pick the biggest value for DESC sorting, and on the world to have (mostly)
cities with increasing population numbers over time. This is also the reason why
the population is not printed: projecting it would give you one row per
population value, so a city would show up more than once, even with DISTINCT.
Picking the current population will become easier once ranks are used more
widely to mark it.
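
A small Python sketch shows why printing the population defeats DISTINCT. DISTINCT only removes rows that are identical in every projected variable, so once ?population is projected, each distinct population value of a city yields its own row. The rows are hypothetical, and distinct_projection is a hypothetical helper that models DISTINCT semantics; it is not how Virtuoso implements it.

```python
# Hypothetical solution rows: (city, mayor, population). CityA has two
# different population statements (e.g. from different years).
solutions = [
    ("CityA", "Mayor A", 1_200_000),
    ("CityA", "Mayor A", 1_350_000),
    ("CityA", "Mayor A", 1_350_000),  # exact duplicate, removed by DISTINCT
    ("CityB", "Mayor B", 260_000),
]


def distinct_projection(rows, columns):
    """Project each row onto the given column indexes and drop exact
    duplicates, keeping first-seen order (a model of SELECT DISTINCT)."""
    seen = []
    for row in rows:
        projected = tuple(row[i] for i in columns)
        if projected not in seen:
            seen.append(projected)
    return seen


# Without ?population: one row per city/mayor pair.
print(distinct_projection(solutions, [0, 1]))
# With ?population: CityA appears once per distinct population value.
print(distinct_projection(solutions, [0, 1, 2]))
```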

There might also be some inaccuracies in cases where a past mayor does not have
an "end date" set in Wikidata (Madrid has a suspiciously large number of current
mayors ...), but a query can only ever be as good as its input data.
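
The effect of missing end dates can be sketched in Python. The FILTER NOT EXISTS clause keeps every head-of-government statement that lacks a P582 (end date) qualifier, so a past mayor whose end date was never entered is counted as current. The statements are hypothetical, and representing qualifiers as a dictionary is an assumption made purely for illustration.

```python
from collections import Counter

# Hypothetical P6 (head of government) statements, each with a city, a mayor,
# and a dict of qualifiers; the key "P582" stands for the end-date qualifier.
statements = [
    {"city": "CityA", "mayor": "Mayor 1", "qualifiers": {"P582": "2011"}},
    {"city": "CityA", "mayor": "Mayor 2", "qualifiers": {}},  # actual current mayor
    {"city": "CityA", "mayor": "Mayor 3", "qualifiers": {}},  # past mayor, end date missing
    {"city": "CityB", "mayor": "Mayor 4", "qualifiers": {}},
]


def current_statements(statements):
    """Keep statements without an end-date qualifier, mirroring
    FILTER NOT EXISTS { ?statement :P582q ?x }."""
    return [s for s in statements if "P582" not in s["qualifiers"]]


current = current_statements(statements)
mayors_per_city = Counter(s["city"] for s in current)
print(mayors_per_city)  # CityA is reported with two "current" mayors
```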

I hope this is inspiring to some of you. One could also look for the world's
youngest or oldest current mayors with similar queries, for example.

Cheers,

Markus


_______________________________________________
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l