Hi Thad,
"Assignment" can be done with CONSTRUCT, so something like this would work to split a name into two parts:

  PREFIX ex: <http://example.org#>
  CONSTRUCT {
    ?human ex:hasFirstName ?first .
    ?human ex:hasSecondName ?second .
  }
  WHERE {
    ?human wdt:P31 wd:Q5 ;
           rdfs:label ?label .
    BIND (STRBEFORE(?label, " ") AS ?first)
    BIND (STRAFTER(?label, " ") AS ?second)
    FILTER (lang(?label) = "en")
  }
Christopher Johnson
Scientific Associate
Universitätsbibliothek Leipzig
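Note that STRBEFORE and STRAFTER split at the first occurrence of the separator only, so the query yields exactly two parts per label, with everything after the first space ending up in ?second. The Python sketch below mirrors the SPARQL 1.1 behaviour of the two functions (an empty string when the separator is absent) to make that concrete. The function names are illustrative, and language-tag handling is ignored.

```python
def strbefore(s: str, sep: str) -> str:
    """Like SPARQL STRBEFORE: text before the first `sep`, or "" if absent."""
    i = s.find(sep)
    return s[:i] if i >= 0 else ""


def strafter(s: str, sep: str) -> str:
    """Like SPARQL STRAFTER: text after the first `sep`, or "" if absent."""
    i = s.find(sep)
    return s[i + len(sep):] if i >= 0 else ""


label = "Johann Sebastian Bach"
print(strbefore(label, " "))            # Johann
print(strafter(label, " "))             # Sebastian Bach
print(repr(strbefore("Madonna", " ")))  # '' (no space, so nothing to split)
```

A three-word label therefore yields "Johann" and "Sebastian Bach", not three parts.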
On 19 September 2017 at 14:00, wikidata-request@lists.wikimedia.org wrote:
Send Wikidata mailing list submissions to wikidata@lists.wikimedia.org
To subscribe or unsubscribe via the World Wide Web, visit https://lists.wikimedia.org/mailman/listinfo/wikidata or, via email, send a message with subject or body 'help' to wikidata-request@lists.wikimedia.org
You can reach the person managing the list at wikidata-owner@lists.wikimedia.org
When replying, please edit your Subject line so it is more specific than "Re: Contents of Wikidata digest..."
Today's Topics:
- Weekly Summary #278 (Léa Lacroix)
- How to split a label by whitespace in WDQS ? (Thad Guidry)
- Re: How to split a label by whitespace in WDQS ? (Marco Neumann)
- Re: How to split a label by whitespace in WDQS ? (Nicolas VIGNERON)
- Re: How to split a label by whitespace in WDQS ? (Lucas Werkmeister)
- Re: How to split a label by whitespace in WDQS ? (Thad Guidry)
- Categories in RDF/WDQS (Stas Malyshev)
Message: 1
Date: Mon, 18 Sep 2017 17:36:38 +0200
From: Léa Lacroix lea.lacroix@wikimedia.de
To: "Discussion list for the Wikidata project." wikidata@lists.wikimedia.org
Subject: [Wikidata] Weekly Summary #278
Message-ID: <CAERksTZPK-wMkwcr4hXBTA3TPTQcBPntN3dHFpZJ1798eM4oYQ@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
*Here's your quick overview of what has been happening around Wikidata over the last week.*
Events https://www.wikidata.org/wiki/Special:MyLanguage/Wikidata:Events / Press/Blogs https://www.wikidata.org/wiki/Special:MyLanguage/Wikidata:Press_coverage
- Upcoming: Wikidata Wahldaten Workshop 2017
https://www.wikidata.org/wiki/Wikidata:Events/Wikidata_Wahldaten_Workshop_2017 – 30 September 2017 in Vienna, Austria
- Upcoming: Wikimedia Research Showcase
https://meta.wikimedia.org/wiki/Wikimedia_Research/Showcase#September_2017
- Past: Wikidata workshop in Zurich
https://www.wikidata.org/wiki/Wikidata:Events/Wikidata_Zurich (the slides of the speakers are linked on the page)
- Past: GLAMhack Wikidata workshop in Lausanne (see the slides of the Query Service introduction https://docs.google.com/presentation/d/1hwUBbtP0TppAKrEpjtSjdOXePZ_7OIRNDWsAHzVk0NA/edit#slide=id.g1f4d0124c0_0_0 )
- Past: Wikidata workshop in Kolkata, India
https://www.wikidata.org/wiki/Wikidata:Events/Wikidata_workshop_Kolkata_2017
- Bridging real and fictional worlds in Wikidata, by Martin Poulter
https://medium.com/wiki-playtime/bridging-real-and-fictional-worlds-1af32ee65a26
- Weekend at the Museum (of Brittany)
https://www.lehir.net/weekend-at-the-museum-of-brittany/, by Envel Le Hir https://www.wikidata.org/wiki/User:Envlh
- Wiki Loves Monuments und Wikidata
http://archivalia.hypotheses.org/67371, by SW
- The French Connection at the Wikimania 2017 Hackathon
https://www.lehir.net/the-french-connection-at-the-wikimania-2017-hackathon/, by Envel Le Hir https://www.wikidata.org/wiki/User:Envlh
Other Noteworthy Stuff
- Wikidata ontology explorer
https://lucaswerkmeister.github.io/wikidata-ontology-explorer/: creates a tree of a class or property, shows common properties and statements
- Join the mysterious group of Wikidata:Flashmob
https://www.wikidata.org/wiki/Wikidata:Flashmob who improve labels, or summon them on an item
- A breaking change to the *wbcheckconstraints* API output format was
- Q40000000 https://www.wikidata.org/wiki/Q40000000 was created
- Improvements coming soon to Recent Changes
https://www.wikidata.org/wiki/Wikidata:Project_chat#Improvements_coming_soon_to_Recent_Changes
- Several new catalogs in Mix'n'Match
https://tools.wmflabs.org/mix-n-match/#.2Fcatalog.2F547 incl. Encyclopædia Britannica, National Gallery artists and ArtCyclopedia
Did you know?
- Newest properties
https://www.wikidata.org/wiki/Special:ListProperties: United Nations Treaty Series Registration Number https://www.wikidata.org/wiki/Property:P4231, Sefaria ID https://www.wikidata.org/wiki/Property:P4230, ICD-10-CM https://www.wikidata.org/wiki/Property:P4229, Encyclopedia of Australian Science ID https://www.wikidata.org/wiki/Property:P4228, Indonesian Small Islands Directory ID https://www.wikidata.org/wiki/Property:P4227, Cyworld ID https://www.wikidata.org/wiki/Property:P4226, IPA Braille https://www.wikidata.org/wiki/Property:P4225, category contains https://www.wikidata.org/wiki/Property:P4224, Enciclopedia Italiana ID https://www.wikidata.org/wiki/Property:P4223, United Nations Treaty Series Volume Number https://www.wikidata.org/wiki/Property:P4222, National Criminal Justice ID https://www.wikidata.org/wiki/Property:P4221, order of battle https://www.wikidata.org/wiki/Property:P4220, Tyrolean Art Cadastre inventory ID https://www.wikidata.org/wiki/Property:P4219, shelf life https://www.wikidata.org/wiki/Property:P4218, UK Electoral Commission ID https://www.wikidata.org/wiki/Property:P4217, LNB Pro A player ID https://www.wikidata.org/wiki/Property:P4216, nLab ID https://www.wikidata.org/wiki/Property:P4215, highest observed lifespan https://www.wikidata.org/wiki/Property:P4214, Unicode hex codepoint https://www.wikidata.org/wiki/Property:P4213, PACTOLS thesaurus ID https://www.wikidata.org/wiki/Property:P4212, Bashkir encyclopedia (Russian version) ID https://www.wikidata.org/wiki/Property:P4211, Bashkir encyclopedia (Bashkir version) ID https://www.wikidata.org/wiki/Property:P4210
- Query examples:
  - Algorithms and the problems they solve
    https://query.wikidata.org/#SELECT%20%3Falgorithm%20%3FalgorithmLabel%20%20%3FprobLabel%0A%7B%0A%09%3Falgorithm%20wdt%3AP2159%20%3Fprob%20.%0A%09SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22en%2Cen%22%20%20%7D%20%20%20%20%0A%7D (source https://twitter.com/WikiDigi/status/908717947937591298)
  - Swiss items with article in exactly one of German-, French-, Italian-, and Romansh-language Wikipedias
    https://query.wikidata.org/#%23%20map%20of%20Swiss%20items%20with%20article%20in%20exactly%20one%20of%20German-%2C%20French-%2C%20Italian-%2C%20and%20Romansh-language%20Wikipedias%0A%23defaultView%3AMap%0ASELECT%20%3Fitem%20%28SAMPLE%28%3Ftitle%29%20AS%20%3FitemLabel%29%20%28SAMPLE%28%3Flocation%29%20AS%20%3Flocation%29%20%28SAMPLE%28%3Flanguage%29%20AS%20%3Flayer%29%20WITH%20%7B%0A%20%20SELECT%20%2a%20WHERE%20%7B%0A%20%20%20%20wd%3AQ39%20p%3AP1332%2Fpsv%3AP1332%2Fwikibase%3AgeoLatitude%20%3Fn%3B%0A%20%20%20%20%20%20%20%20%20%20%20p%3AP1333%2Fpsv%3AP1333%2Fwikibase%3AgeoLatitude%20%3Fs%3B%0A%20%20%20%20%20%20%20%20%20%20%20p%3AP1334%2Fpsv%3AP1334%2Fwikibase%3AgeoLongitude%20%3Fe%3B%0A%20%20%20%20%20%20%20%20%20%20%20p%3AP1335%2Fpsv%3AP1335%2Fwikibase%3AgeoLongitude%20%3Fw.%0A%20%20%7D%0A%7D%20AS%20%25switzerlandBoundingBox%20WHERE%20%7B%0A%20%20VALUES%20%3Fwiki%20%7B%20%3Chttps%3A%2F%2Fde.wikipedia.org%2F%3E%20%3Chttps%3A%2F%2Ffr.wikipedia.org%2F%3E%20%3Chttps%3A%2F%2Fit.wikipedia.org%2F%3E%20%3Chttps%3A%2F%2Frm.wikipedia.org%2F%3E%20%7D%0A%20%20%3Fitem%20wdt%3AP17%20wd%3AQ39%3B%0A%20%20%20%20%20%20%20%20wdt%3AP625%20%3Flocation.%0A%20%20%3Farticle%20a%20schema%3AArticle%3B%0A%20%20%20%20%20%20%20%20%20%20%20schema%3Aabout%20%3Fitem%3B%0A%20%20%20%20%20%20%20%20%20%20%20schema%3AisPartOf%20%3Fwiki%3B%0A%20%20%20%20%20%20%20%20%20%20%20schema%3AinLanguage%20%3Flanguage%3B%0A%20%20%20%20%20%20%20%20%20%20%20schema%3Aname%20%3Ftitle.%0A%20%20%23%20filter%20out%20some%20stray%20results%20that%20have%20country%20Switzerland%20but%20coordinates%20outside%20it%20%28e.%E2%80%AFg.%20rivers%29%0A%20%20INCLUDE%20%25switzerlandBoundingBox.%0A%20%20BIND%28geof%3Alatitude%28%3Flocation%29%20AS%20%3Flat%29%0A%20%20BIND%28geof%3Alongitude%28%3Flocation%29%20AS%20%3Flon%29%0A%20%20FILTER%28%3Fs%20%3C%3D%20%3Flat%20%26%26%20%3Flat%20%3C%3D%20%3Fn%20%26%26%0A%20%20%20%20%20%20%20%20%20%3Fw%20%3C%3D%20%3Flon%20%26%26%20%3Flon%20%3C%3D%20%3Fe%29%0A%7D%0AGROUP%20BY%20%3Fitem%0AHAVING%28COUNT%28DISTINCT%20%3Fwiki%29%20%3D%201%29 (source https://twitter.com/WikidataFacts/status/908444126999441408)
  - Popular gender-neutral given names
    https://query.wikidata.org/#%23%20names%20that%20were%20used%20gender-neutrally%20among%20people%20on%20Wikidata%0ASELECT%20%3Fname%20%3FnameLabel%20%3Fwomen%20%3Ftotal%20%3Fratio%20%28ABS%28%3Fratio-0.5%29%20AS%20%3FdiffFrom5050%29%20WHERE%20%7B%0A%20%20%7B%0A%20%20%20%20SELECT%20%3Fname%20%28COUNT%28%2a%29%20AS%20%3Ftotal%29%20%28SUM%28%3Fwoman%29%20AS%20%3Fwomen%29%20%28SUM%28%3Fwoman%29%2FCOUNT%28%2a%29%20AS%20%3Fratio%29%20WHERE%20%7B%20%23%20should%20be%20%28%3Fwomen%2F%3Ftotal%20AS%20%3Fratio%29%20%E2%80%93%20see%20T172113%0A%20%20%20%20%20%20%3Fperson%20wdt%3AP31%20wd%3AQ5%3B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20wdt%3AP735%20%3Fname%3B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20wdt%3AP21%20%3FsexOrGender%3B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20wdt%3AP569%20%3Fdob.%0A%20%20%20%20%20%20BIND%28IF%28%3FsexOrGender%20IN%20%28wd%3AQ6581072%2C%20wd%3AQ1052281%29%2C%201%2C%200%29%20AS%20%3Fwoman%29%0A%20%20%20%20%7D%0A%20%20%20%20GROUP%20BY%20%3Fname%0A%20%20%20%20HAVING%28%3Ftotal%20%3E%3D%2010%20%26%26%200.4%20%3C%3D%20%3Fratio%20%26%26%20%3Fratio%20%3C%3D%200.6%29%20%23%20arbitrary%20limits%2C%20feel%20free%20to%20tweak%0A%20%20%7D%0A%20%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22%5BAUTO_LANGUAGE%5D%2Cen%22.%20%7D%0A%7D%0AORDER%20BY%20DESC%28%3Ftotal%29 (source https://twitter.com/WikidataFacts/status/907740126150873089)
  - Computer network protocols and their ports
    https://query.wikidata.org/#%0ASELECT%20%3Fitem%20%3FitemLabel%20%3FportLabel%0AWHERE%20%0A%7B%0A%20%20%3Fitem%20wdt%3AP31%20wd%3AQ15836568%20.%0A%20%20%3Fitem%20wdt%3AP1641%20%3Fport%20.%0A%20%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22%5BAUTO_LANGUAGE%5D%2Cen%22.%20%7D%0A%7D (source https://twitter.com/WikiDigi/status/907614182564036609)
  - Software developers by number of software titles
    https://query.wikidata.org/#%23defaultView%3ABubbleChart%0ASELECT%20%3Fdeveloper%20%3FdeveloperLabel%20%28COUNT%28%3Fsoftware%29%20AS%20%3Fcount%29%20WHERE%20%7B%0A%20%20%3Fsoftware%20%28p%3AP31%2Fps%3AP31%2Fwdt%3AP279%2a%29%20wd%3AQ7397.%0A%20%20%3Fsoftware%20wdt%3AP178%20%3Fdeveloper.%0A%20%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22en%22.%20%7D%0A%7D%0AGROUP%20BY%20%3Fdeveloper%20%3FdeveloperLabel%0AORDER%20BY%20DESC%28%3Fcount%29%0ALIMIT%20100 (source https://twitter.com/WikiDigi/status/907961753606045696)
  - Spacecraft and what they were named after
    https://query.wikidata.org/#SELECT%20DISTINCT%20%3Fobj%20%3FobjLabel%20%3FobjDescription%20%3Fnom%20%3FnomLabel%20%3FnomDescription%0AWHERE%20%7B%0A%7B%3Fobj%20wdt%3AP31%2Fwdt%3AP279%2a%20wd%3AQ40218%20%7D%20%23%20type%20of%20spacecraft%0AUNION%20%7B%20%3Fobj%20wdt%3AP31%2Fwdt%3AP279%2a%20wd%3AQ13226541%20%7D%20%23%20or%20spaceflight%20programme%0A%3Fobj%20wdt%3AP138%20%3Fnom%20%23named%20after%0ASERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22%5BAUTO_LANGUAGE%5D%2Cen%22%20%7D%0A%7D%20ORDER%20BY%20ASC%28%3FobjLabel%29 (source https://twitter.com/mlpoulter/status/908774078080802818)
- Newest WikiProjects
https://www.wikidata.org/wiki/Special:MyLanguage/Wikidata:WikiProjects: WikiProject property constraints https://www.wikidata.org/wiki/Wikidata:WikiProject_property_constraints
Development
- Worked more on the constraints gadget so that it also works for references and qualifiers
- Made progress on persistently storing edits for the new Lexeme entity type (next to items and properties)
- Worked on the RDF mapping for full URIs of external identifiers (phabricator:T121274 https://phabricator.wikimedia.org/T121274)
You can see all open tickets related to Wikidata here: https://phabricator.wikimedia.org/maniphest/query/4RotIcw5oINo/#R
Monthly Tasks
- Add labels, in your own language(s), for the new properties listed above.
- Comment on property proposals: all open proposals
https://www.wikidata.org/wiki/Wikidata:Property_proposal/Overview
- Suggested and open tasks!
https://www.wikidata.org/wiki/Wikidata:Contribute/Suggested_and_open_tasks
- Contribute to a Showcase item.
https://www.wikidata.org/wiki/Special:MyLanguage/Wikidata:Showcase_items
- Help translate or proofread the interface and documentation pages, in your own language!
https://www.wikidata.org/wiki/Special:LanguageStats
- Help merge identical items
https://www.wikidata.org/wiki/User:Pasleim/projectmerge across Wikimedia projects.
- Help write the next summary!
https://www.wikidata.org/wiki/Wikidata:Status_updates/Next
-- Léa Lacroix Project Manager Community Communication for Wikidata
Wikimedia Deutschland e.V. Tempelhofer Ufer 23-24 10963 Berlin www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Registered in the register of associations of the Amtsgericht (local court) Berlin-Charlottenburg under number 23855 Nz. Recognized as a non-profit organization by the Finanzamt für Körperschaften I Berlin (tax office), tax number 27/029/42207.
Thanks Christopher,
But I really am looking to split by whitespace, without knowing in advance how many tokens a label has. My example of human names was just a simplification; it could be anything, not just human names: any Wikidata QID. For "Castle of Saint Pée sur Nivelle", I would want 6 columns created automatically. Or, in JSON terms, an array of strings: ["Castle", "of", "Saint", "Pée", "sur", "Nivelle"]
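Outside the query service, this kind of split is a one-liner. A minimal Python sketch, assuming the label has already been fetched: `str.split()` with no argument splits on runs of whitespace and drops empty strings, and `json.dumps` renders the tokens as a JSON array. The helper name is hypothetical.

```python
import json


def tokenize_label(label: str) -> list[str]:
    """Split a label on runs of whitespace; no empty tokens are produced."""
    return label.split()


tokens = tokenize_label("Castle of Saint Pée sur Nivelle")
# ensure_ascii=False keeps "Pée" readable instead of escaping the accent
print(json.dumps(tokens, ensure_ascii=False))
# ["Castle", "of", "Saint", "Pée", "sur", "Nivelle"]
```

The number of tokens is simply `len(tokens)`, so nothing has to be known in advance.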
This has to do with a use case of pre-processing the label names for data ingestion into further analysis workflows. I was hoping that I could easily leverage a bit of horsepower for free from the WDQS for this (splitting label names), perhaps even by using the Label service itself to do the splitting.
The indexing service behind the scenes already stores much of this, and stores those tokens for each label. The problem is that we don't currently have a way to get the tokens of a label for any particular QID and its labels in various languages. And that's what I want to solve, either through SPARQL or an enhancement to the Label service or something else. If the answer is that I will have to resort to my own programmatic methods via the dump files then so be it, I guess, but I'd rather not have to put in the work for something that is done already behind the scenes.
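Short of processing the dumps, one client-side option is to fetch the labels through WDQS and split them afterwards. A sketch, assuming a response in the standard SPARQL 1.1 JSON results format from a query that selects `?item` and `?label` with no language filter; the query in the comment, the variable names, the helper and the sample response are all illustrative, not a captured WDQS response.

```python
# Hypothetical query whose results are parsed below, e.g.:
#   SELECT ?item ?label WHERE { VALUES ?item { wd:Q42 } ?item rdfs:label ?label }
# In the SPARQL 1.1 JSON results format, a language-tagged literal carries
# its tag under the "xml:lang" key of the binding.


def tokens_by_item_and_lang(results: dict) -> dict[tuple[str, str], list[str]]:
    """Map (item URI, language tag) -> whitespace tokens of that label."""
    out: dict[tuple[str, str], list[str]] = {}
    for row in results["results"]["bindings"]:
        item = row["item"]["value"]
        label = row["label"]
        lang = label.get("xml:lang", "")  # untagged literals have no xml:lang
        out[(item, lang)] = label["value"].split()
    return out


sample = {
    "head": {"vars": ["item", "label"]},
    "results": {"bindings": [
        {"item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q42"},
         "label": {"type": "literal", "xml:lang": "en", "value": "Douglas Adams"}},
    ]},
}
print(tokens_by_item_and_lang(sample))
# {('http://www.wikidata.org/entity/Q42', 'en'): ['Douglas', 'Adams']}
```

Keying by (item, language) keeps the labels of one QID in different languages apart.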
-Thad +ThadGuidry https://plus.google.com/+ThadGuidry
This is something you might want to bring up with the Blazegraph team. Jena, for example, provides the apf:strSplit property function in ARQ.
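A SPARQL result table has a fixed set of columns, so a split with an unknown number of tokens fits most naturally as one row per token rather than one column per token. As I understand the ARQ documentation, that is the shape apf:strSplit produces, since it binds its subject variable once per piece of the split string. The Python sketch below only emulates that row-per-token shape client-side. The helper name, the position column and the regular-expression separator are my own assumptions, not part of Jena's API.

```python
import re
from typing import Iterable, Iterator


def split_rows(rows: Iterable[tuple[str, str]],
               pattern: str = r"\s+") -> Iterator[tuple[str, int, str]]:
    """Yield one (item, position, token) row per token of each label.

    `pattern` is a regular expression used as the separator; empty pieces
    (e.g. from leading or trailing whitespace) are skipped.
    """
    for item, label in rows:
        pieces = (p for p in re.split(pattern, label) if p)
        for position, token in enumerate(pieces):
            yield item, position, token


for row in split_rows([("Q42", "Douglas Adams")]):
    print(row)
# ('Q42', 0, 'Douglas')
# ('Q42', 1, 'Adams')
```

Pivoting these rows into columns afterwards is straightforward once the maximum position is known.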
On Tue, Sep 19, 2017 at 2:39 PM, Thad Guidry thadguidry@gmail.com wrote:
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata