On 4/21/15 6:05 PM, James Douglas wrote:
Here's the (nearly) equivalent query for the statements dump[1] loaded into Blazegraph:

PREFIX wd: <http://www.wikidata.org/entity/>
SELECT DISTINCT ?city ?citylabel ?mayorlabel WHERE {
  ?city      wd:P31s/wd:P31v wd:Q515     .      # find instances of city (Q515); subclasses are not traversed
  ?city      wd:P6s          ?statement  .      # with a P6 (head of government) statement
  ?statement wd:P6v          ?mayor      .      # ... that has the value ?mayor
  ?mayor     wd:P21s/wd:P21v wd:Q6581072 .      # ... where the ?mayor has P21 (sex or gender) female
  FILTER NOT EXISTS { ?statement wd:P582q ?x }  # ... but the statement has no P582 (end date) qualifier
  # Now select the population value of the ?city
  # (the number is reached through a chain of three properties)
  ?city wd:P1082s/wd:P1082v/<http://www.wikidata.org/ontology#numericValue> ?population .

  # Optionally, find English labels for city and mayor:
    ?city wd:P373s/wd:P373v ?citylabel .
    # FILTER ( LANG(?citylabel) = "en" )
    ?mayor wd:P373s/wd:P373v ?mayorlabel .
    # FILTER ( LANG(?mayorlabel) = "en" )

} ORDER BY DESC(?population) LIMIT 100

Free beer to anyone who can figure out how to use those language filters.  Would we need to also load property definitions[2]?

1. http://tools.wmflabs.org/wikidata-exports/rdf/exports/20150223/wikidata-statements.nt.gz
2. http://tools.wmflabs.org/wikidata-exports/rdf/exports/20150223/wikidata-properties.nt.gz

Please try to share SPARQL-related examples using SPARQL Query Results URLs that identify documents whose content is generated dynamically by SPARQL query processing. That makes them easier to tweak and diagnose, among other things.


1. http://bit.ly/wikidata-query-example-cities-with-female-mayor -- shortened query results url

BTW -- for language tags to work, the content would have to have been language-tagged using "xyz"@en prior to upload to the DBMS. Then, subject to any LANG filter, you would see values like "Madrid"@en in the output for the SELECT list.
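
To illustrate the point about language tags, here is a minimal, self-contained sketch. It does not touch the Wikidata dump: the three literals are made-up inline data supplied through VALUES, chosen only to show how LANG behaves on tagged and untagged literals. They are an assumption for illustration, not a claim about what the dump contains.

```sparql
# Illustrative inline data only; these literals are not taken from the dump.
SELECT ?label WHERE {
  VALUES ?label { "Madrid"@en "Madrid"@es "Madrid" }
  # LANG() returns "en", "es", and "" (empty string) respectively,
  # so only the literal tagged @en passes the filter.
  FILTER ( LANG(?label) = "en" )
}
```

The untagged "Madrid" is dropped because LANG returns an empty string for a literal without a language tag. If the loaded literals carry no tags at all, a filter like this would match nothing.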


On Tue, Apr 21, 2015 at 11:13 AM, Jeremy Baron <jeremy@tuxmachine.com> wrote:

On Tue, Apr 21, 2015 at 5:05 PM, Thad Guidry <thadguidry@gmail.com> wrote:
> We had US Census, World Bank, and UN Data as our primary data sources for any /statistics/ of a City/Town/Village.  Here's Houston - https://www.freebase.com/m/03l2n#/location/statistical_region

I don't understand where a lot of those numbers are from.

Also, maybe Houston is a bad example because the Census Bureau revised
numbers after the data was released.[0] Even some official Census
Bureau sites still report the old, pre-appeal number.[1]

There are multiple years that have duplicate, conflicting values after
clicking "65 values total »" at your link. At first I thought it
might be something like estimates base vs. estimate vs. decennial.
However, for 2010 and 2011 there's one value that matches the estimate
from [1] (source = [2]) and a larger value (source = [3]) that does
not match any other data I've seen. [2] and [3] both use the same
"Attribution URI" [4].

In any case, why take this from Freebase instead of importing directly
from Census Bureau data? It's available in bulk. The format isn't great
but isn't horrible either. (At least the 5-year ACS is inconsistent
about upper/lower case for two-letter state abbreviations. I also
think most humans would prefer something like a geoid as a key rather
than a dataset-specific key used to look up the geoid in a different
file. There are other quirks as well.)


[0] http://www.chron.com/news/houston-texas/houston/article/City-wins-census-appeal-count-adjusted-4087372.php
[1] http://factfinder.census.gov/bkmk/table/1.0/en/PEP/2013/PEPANNRES/1620000US4835000
[2] https://www.freebase.com/g/11x1k306j
[3] https://www.freebase.com/m/0jst35z
[4] http://www.census.gov/popest/about/terms.html

