Hey folks,
we plan to drop the wb_entity_per_page table sometime soon [0], because
it is simply not required (we will most likely always have a programmatic
mapping from entity ID to page title) and, as it currently stands, it does
not support non-numeric entity IDs. Because of this, removing it is a
blocker for the Commons metadata work.
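For illustration, the kind of programmatic mapping meant here is
essentially just "namespace prefix + ID". A minimal sketch of my own
(assuming the standard entity namespaces on wikidata.org and ignoring
any future entity types):

# Minimal sketch of an entity-ID -> page-title mapping.
# Assumption: items live in the main namespace and properties under
# "Property:", as on wikidata.org today.
def entity_id_to_page_title(entity_id: str) -> str:
    if entity_id.startswith("Q"):
        return entity_id                # e.g. "Q42"
    if entity_id.startswith("P"):
        return "Property:" + entity_id  # e.g. "Property:P31"
    raise ValueError("unknown entity ID prefix: " + entity_id)

print(entity_id_to_page_title("Q42"))   # -> Q42
print(entity_id_to_page_title("P31"))   # -> Property:P31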
Is anybody using that table for their tools (on Tool Labs)? If so,
please tell us so that we can give you instructions and a longer grace
period to update your scripts.
Cheers,
Marius
[0]: https://phabricator.wikimedia.org/T95685
Hoi,
Jura1 created a wonderful list of people who died in Brazil in 2015 [1].
It is a page that updates regularly from Wikidata thanks to ListeriaBot.
There may well be a few more entries to come, because I am falling ever
further behind in my quest to register deaths in 2015.
I have copied his work and created a page for people who died in the
Netherlands in 2015 [2]. It is trivially easy to do, the result looks
great, and the same approach can be used for any country on any
Wikipedia.
The Dutch Wikipedia has indicated that it nowadays maintains important
metadata at Wikidata. I am really happy that we can showcase their work.
It is important work because, as someone reminded me at some stage, this
is part of what amounts to the policy on living people...
Thanks,
GerardM
[1] https://www.wikidata.org/wiki/User:Jura1/Recent_deaths_in_Brazil
[2]
https://www.wikidata.org/wiki/User:Jura1/Recent_deaths_in_the_Netherlands
We lack several maintenance scripts for the clients, that is,
human-readable special pages with reports on which pages lack special
treatment. In no particular order: we need some way to identify
unconnected pages in general (the present one does not work [1]); we
need some way to identify pages that are unconnected but have some
language links; we need to identify items that are used in some language
and lack labels (almost like [2], but on the client and for items that
are somehow connected to pages on the client); and we need to identify
items that lack specific claims while the client pages use a specific
template.
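To make the kind of report I mean concrete, here is a rough sketch of
one such check done by hand against the API (a hypothetical helper of my
own, not one of the missing special pages; it only assumes the standard
wbgetentities module on wikidata.org):

import requests

API = "https://www.wikidata.org/w/api.php"

def check_client_page(site: str, title: str, lang: str) -> str:
    """Report whether a client page is connected to an item and whether
    that item has a label in the client's language."""
    reply = requests.get(API, params={
        "action": "wbgetentities",
        "sites": site,      # e.g. "nowiki"
        "titles": title,
        "props": "labels",
        "format": "json",
    }).json()
    for entity_id, entity in reply.get("entities", {}).items():
        if "missing" in entity:
            return title + ": unconnected"
        if lang not in entity.get("labels", {}):
            return title + ": connected to " + entity_id + ", no " + lang + " label"
        return title + ": connected to " + entity_id
    return title + ": no reply"

print(check_client_page("nowiki", "Oslo", "nb"))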
There are probably more such maintenance pages; these are the most
urgent ones. Users are now starting to create categories to hack around
the missing maintenance pages, which creates a bunch of categories. [3]
On Norwegian Bokmål there are just a few scripts that use data from
Wikidata, yet the number of categories is already growing large.
For us at the "receiving end" this is a show stopper. We can't convince
the users that this is a positive addition to the pages without the
maintenance scripts, because without them we are more or less flying
blind when we try to fix errors. We can't click through random pages
hoping to stumble on something that is wrong; we must be able to search
for the errors and fix them.
This summer we (nowiki) have added about ten properties to the
infoboxes, some with scripts and some with the property parser function.
Most of my time has not gone into coding, and it has not gone into
fixing errors; it has gone into explaining to the community why Wikidata
is a good idea. At one point the changes were even reverted because
someone disagreed with what we had done. The whole thing basically
revolves around "my article got a Q-id in the infobox and I don't know
how to fix it". We know how to fix it, and I have explained it to the
editors at nowiki several times. They still don't get it, so we need
some way to fix it, and we don't have the maintenance scripts to do it.
Right now we don't need more wild ideas that will swamp development for
months and years to come; we need maintenance scripts, and we need them
now!
[1] https://no.wikipedia.org/wiki/Spesial:UnconnectedPages
[2] https://www.wikidata.org/wiki/Special:EntitiesWithoutLabel
[3] https://no.wikipedia.org/wiki/Spesial:Prefiksindeks/Kategori:Artikler_hvor
John Erling Blad
/jeblad
Hi, it's the first of July and I would like to introduce to you a
quarterly goal that the Engineering Community team has committed to:
Establish a framework to engage with data engineers and open data
organizations
https://phabricator.wikimedia.org/T101950
We are missing a community framework allowing Wikidata content and tech
contributors, data engineers, and open data organizations to collaborate
effectively. Imagine GLAM applied to data.
If all goes well, by the end of September we would like to have basic
documentation and community processes for open data engineers and
organizations willing to contribute to Wikidata, and ongoing projects with
one open data org.
If you are interested, get involved! We are looking for:
* Wikidata contributors with good institutional memory
* people who have been in touch with organizations willing to contribute
their open data
* developers willing to help improve our software and program the
missing pieces
* contributors familiar with the GLAM model(s) and with what has and has
not worked
This goal was created after some conversations with Lydia Pintscher
(Wikidata team) and Sylvia Ventura (Strategic Partnerships). Both are on
board, with Lydia making sure that this work fits what is technically
effective, and Sylvia checking our work against real open data
organizations willing to get involved.
This email effectively starts the bootstrapping of this project. I will
start creating subtasks under that goal based on your feedback and common
sense.
--
Quim Gil
Engineering Community Manager @ Wikimedia Foundation
http://www.mediawiki.org/wiki/User:Qgil
Hey folks :)
We'll be rolling out arbitrary access to a large number of wikis next
week and the week after. See
https://www.wikidata.org/wiki/Wikidata:Arbitrary_access for the full
list.
Cheers
Lydia
--
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata
Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das
Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.
To pick up on a few different comments from this thread:
@revi (Hong, Yongmin)
-- Yes, of course you are correct that it is categories rather than
galleries that are the important structure for finding and navigating
images on Commons.
But even if we were to change the status-quo and ban sitelinks from
Wikidata items to Commons galleries (which might not be 100% popular,
since we currently have 87,000 sitelinks to galleries, up by about 3000
in the last year) -- even if we were to ban sitelinks to galleries, this
would still leave the question of whether Commons categories should be
sitelinked to category items or to article items.
Currently there are 311,000 Commonscats sitelinked to category items on
Wikidata, and 207,000 sitelinked to article items on Wikidata. So the
status-quo is for Commonscats to be sitelinked to category items (and in
the past has been even more so).
The problem is that, the way the software works at the moment, you
can't have both. So if a Commonscat is sitelinked to an article item,
that precludes it from being sitelinked to a category item, and
vice-versa.
At the moment, the expectation is that a Commonscat will be sitelinked
to a category item, if possible. Of 323,825 Commonscats that can be
identified with a Wikidata category item, 311,000 are connected by
corresponding sitelinks. So if people are writing scripts or queries to
look for such relationships, they will most likely look for sitelinks.
On the other hand, of 884,439 Commonscats that can be identified with
article-like Wikidata items, only 207,494 (= 23.4%) are connected by
sitelinks -- even if this number has doubled in the last 12 months, the
expectation in the current status quo is that such a connection is more
likely *not* to be represented in a sitelink, by a three-to-one margin.
Instead, links between Commonscats and article-like Wikidata items are
currently overwhelmingly represented by the P373 property, which at the
moment records 807,776 (= 91.3%) of such identifications.
This is the property that the script
https://commons.wikimedia.org/wiki/User:Jheald/wdcat.js
looks up in order to display a link to Reasonator if there is a Wikidata
article-item for a Commonscat.
To get the best idea of whether there is a corresponding Wikidata item
and Wikipedia articles for a Commonscat, Commons users should therefore
use wdcat.js -- which is easily activated by adding a line to the
common.js file, such as at
https://commons.wikimedia.org/wiki/User:Jheald/common.js
-- because this will pick up four times more connections than currently
exist as sitelinks.
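For anyone scripting against this, here is a rough sketch (my own
illustration using the standard wbgetentities API, not the wdcat.js
logic itself) of reading both kinds of connection from an item, to show
how a P373 claim and a commonswiki sitelink differ:

import requests

API = "https://www.wikidata.org/w/api.php"

def commons_connections(item_id):
    """Return the P373 value and the commonswiki sitelink of an item, if any."""
    entity = requests.get(API, params={
        "action": "wbgetentities",
        "ids": item_id,
        "props": "claims|sitelinks",
        "format": "json",
    }).json()["entities"][item_id]

    p373 = None
    for claim in entity.get("claims", {}).get("P373", []):
        snak = claim["mainsnak"]
        if snak["snaktype"] == "value":
            p373 = snak["datavalue"]["value"]  # plain category name, e.g. "Berlin"
            break

    sitelink = entity.get("sitelinks", {}).get("commonswiki", {}).get("title")
    return p373, sitelink

print(commons_connections("Q64"))  # Q64 = Berlin; output will vary over time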
The important thing is to establish and preserve clear expectations
that software can build against.
At the moment, with the confusion of sitelinks of Commonscats to
articles and to categories, there is no guarantee that it is possible to
create a sitelink to an article. On the other hand, it should always be
possible to create a P373 connection. They are the connections that
systematically *can* be made, so it is important to make sure that they
systematically *are* made, to drag up the return rate from the current
91.3% to nearer 100%.
That would be helped by a policy that was absolutely systematic in
prescribing what should and what should not be sitelinked.
@ RomaineWiki:
You say that actively enforcing the longstanding Wikidata sitelink
policy of only sitelinking Commons categories to category-like items,
and Commons galleries to article-like items would be a plan to
"demolish the navigational structure of Commons".
But it wouldn't change any of the internal structures on Commons, and
would merely underline the current fact that even now only 23% of
Commonscats are linked to articles by sitelinks, compared to 91% by P373.
Isn't it better to get Commons users used to using (and improving) the
wdcat.js script, which uses the P373 property that can always be added,
rather than perpetuating the current muddle of Commonscat <-> article
sitelinks, which are so haphazard?
@ Steinsplitter
As I understand it, the long-term plans for a new Wikibase structure
specifically for Commons are currently no longer an immediate
development priority; but will presumably start to move forward again
sooner or later.
On the other hand, this discussion was specifically about sitelinks.
Here I believe what has driven the Wikidata side has been the desire to
have a rule that is simple and consistent and predictable, because that
is the foundation needed to develop queries and scripts and tools and
user-interfaces on top of.
Combining that desideratum with the technical restriction of only
allowing one sitelink to each item from each wiki and vice-versa, is
what has led to the recommended scheme of linking
Commons categories <-> category-like items
Commons galleries <-> article-like items
with property P373 to handle identification of Commons categories <->
article-like items.
This fulfils the requirements of simplicity, consistency and predictability.
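Expressed as data, the recommended scheme amounts to something like
this (just an illustrative sketch of the rule as I understand it, not
anything the software enforces):

# Recommended linking scheme (illustration only).
RECOMMENDED_LINKS = {
    ("Commons category", "category-like item"): "sitelink",
    ("Commons gallery",  "article-like item"):  "sitelink",
    ("Commons category", "article-like item"):  "P373 (Commons category) property",
}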
It's not ideal from a user-interface point of view (or a philosophical
point of view). But so long as the rule is applied consistently, the
limitations it leads to can be worked round with appropriate software
improvements -- eg in the first instance the wdcat.js script.
But to encourage people to develop and improve such software, it is
helpful for the above structure to be applied consistently.
In contrast, perpetuating inconsistency and muddle blurs what is
needed, and works against the stable, predictable basis needed to make
such software work.
All best,
James.
On 29/08/2015 14:39, Steinsplitter Wiki wrote:
> Wikidata needs to ask the Commons community before making Commons-related changes.
>
> It is very hard to understand what the Wikidata people want to do with Commons. Tons of text, hard to read. I don't understand what they want to do, but if this change affects Commons, then Commons community consensus is needed.
>
>
> Date: Fri, 28 Aug 2015 17:34:28 +0200
> From: romaine.wiki(a)gmail.com
> To: wikidata(a)lists.wikimedia.org; commons-l(a)lists.wikimedia.org
> Subject: Re: [Commons-l] [Wikidata] Trends in links from Wikidata items to Commons
>
> As I wrote before, that reasoning is too simple. If you only say that a zero belongs with a zero and a two belongs with a two, you describe only the type of page and ignore the subject of the page. That subject matters much more than the namespace number.
>
> Wikinews in particular is a bad example, as most categories on Commons do not have a one-to-one relationship with Wikinews pages.
> Articles on Wikipedia, however, mostly do have a one-to-one relationship with categories on Commons.
>
> Romaine
>
> 2015-08-28 17:09 GMT+02:00 Luca Martinelli <martinelliluca(a)gmail.com>:
> 2015-08-28 12:09 GMT+02:00 Romaine Wiki <romaine.wiki(a)gmail.com>:
>> And I agree completely with what Revi says:
>>> Wikidata ignores this Commons' fact by trying to enforce ridiculous rules
>>> like this.
>
> It's not such a ridiculous rule, if you think of the rationale behind
> it: if gallery = ns0 and category = ns2, linking ns0 <--> ns2 in the
> same item is IMHO not a rational thing to do (not even for Wikinews if
> you ask me, but I'm digressing).
>
> So the *practical* problem that we have to address is the list of
> links in the left column. We really don't have any possibility to
> exploit P373 in any way, not even with a .js, to fix this?
>
> L.
>
> _______________________________________________
> Wikidata mailing list
> Wikidata(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
> _______________________________________________
> Commons-l mailing list
> Commons-l(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/commons-l
Hey folks,
As you know, Wikidata is one of the 100 winners of the "Germany - Land
der Ideen" competition 2015. We now have the chance to win the Public’s
Choice Award as well! At the moment, Wikidata is ranked 15th. We need to
be among the top 10 to enter the next round and win the Public’s Choice
Award. Voting is open until 23 August.
The information about the competition and the projects is available in
English (https://www.land-der-ideen.de/en/projects-germany/landmarks-land-ideas/comp…
and https://www.land-der-ideen.de/en/projects-germany/landmarks-land-ideas/2015…),
but the voting process is only in German. It’s still pretty easy to
vote:
* Go to the Wikidata voting page at
https://www.land-der-ideen.de/ausgezeichnete-orte/preistraeger/wikidata
* Click the yellow button on the right ("Jetzt abstimmen").
* Type in your email address and tick the box to agree to the voting rules.
* You will receive a link via email. Click the link within 24 hours.
You can repeat this daily until 23 August. You have one vote every day.
Let's win this! :D
Cheers
Lydia
--
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata
Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das
Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.
Tell me if I am right or wrong about this.
If I am coining a URI for something that has an identifier in an
outside system, it is straightforward to append the identifier (possibly
modified a little) to a prefix, such as
http://dbpedia.org/resource/Stellarator
Then you can write
@prefix dbpedia: <http://dbpedia.org/resource/>
and then refer to the concept (in either Turtle or SPARQL) as
dbpedia:Stellarator.
I will take this one step further and say that, for pedagogical and
other coding situations, the extra length of prefix declarations is an
additional cognitive load on top of all the other cognitive loads of
dealing with the system, so in the name of concision you can do
something like
@base <http://dbpedia.org/resource/>
@prefix : <http://dbpedia.org/ontology/>
and then you can write :someProperty and <Stellarator>, and your
queries end up looking very simple.
The production for a QName cannot begin with a number so it is not correct
to write something like
dbpedia:100
or expect to have the full URI squashed to that. This kind of gotcha will
drive newbies nuts, and the realization of RDF as a universal solvent
requires squashing many of them.
Another example is
isbn:9971-5-0210-0
If you look at the @base declaration above, you see a way to get around
this, because with the base above you can write
<100> which works just fine in the dbpedia case.
I like what Wikidata did in using fairly dense sequential integers for
the IDs, so that a Wikidata resource URI looks like
http://www.wikidata.org/entity/Q4876286
whose local name is always a valid QName, so you can write
@base <http://www.wikidata.org/entity/>
@prefix wd: <http://www.wikidata.org/entity/>
and then you can write
wd:Q4876286
<Q4876286>
and it is all fine, because (i) Wikidata added the alpha prefix, (ii)
used it from the beginning, and (iii) made up a plausible explanation
for why it is that way. Freebase MIDs have the same property, so :BaseKB
has it too.
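To make that concrete, here is a small sketch (Python with rdflib,
purely for illustration; the ex:sameSpelling predicate is made up)
showing that the prefixed and the relative spellings resolve to the
same IRI under those declarations:

from rdflib import Graph, URIRef

ttl = """
@base <http://www.wikidata.org/entity/> .
@prefix wd: <http://www.wikidata.org/entity/> .
@prefix ex: <http://example.org/> .

wd:Q4876286 ex:sameSpelling <Q4876286> .
"""

g = Graph()
g.parse(data=ttl, format="turtle")

for s, p, o in g:
    # Both spellings end up as the same full IRI.
    assert s == o == URIRef("http://www.wikidata.org/entity/Q4876286")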
I think customers would expect to be able to give us
isbn:0884049582
and have it just work, but because a QName cannot begin with a number,
you can encode the URI like this:
http://isbn.example.com/I0884049582
and then write
isbn:I0884049582
<I0884049582>
which is not too bad. Note, however, that you cannot simply write
<0884049582>
and have it resolve to
http://isbn.example.com/I0884049582
because, at least with the Jena framework, the relative reference
resolves the same way whether you write
@base <http://isbn.example.com/I>
or
@base <http://isbn.example.com/>
so you can't choose a representation which supports that mode of
expression and a :-prefixed mode at the same time.
Now what bugs me is what to do in the case of something which "might or
might not be numeric". What internal prefix would be most acceptable to
end users?
--
Paul Houle
*Applying Schemas for Natural Language Processing, Distributed Systems,
Classification and Text Mining and Data Lakes*
(607) 539 6254 paul.houle on Skype ontology2(a)gmail.com
:BaseKB -- Query Freebase Data With SPARQL
http://basekb.com/gold/
Legal Entity Identifier Lookup
https://legalentityidentifier.info/lei/lookup/
Join our Data Lakes group on LinkedIn
https://www.linkedin.com/grp/home?gid=8267275
Hi everyone :)
We've finally done all the groundwork for unit support. I'd love for
you to give the first version a try on the test system here:
http://wikidata.beta.wmflabs.org/wiki/Q23950
There are a few known issues still, but since this is one of the things
holding back Wikidata I made the call to release now and work on the
remaining issues afterwards. What I know is still missing:
* We're showing the label of the unit's item. In the future we should
be showing the unit's symbol instead.
(https://phabricator.wikimedia.org/T77983)
* We can't convert between units yet - we only have the groundwork for
it so far. (https://phabricator.wikimedia.org/T77978)
* The items representing often-used units should be ranked higher in
the selector. (https://phabricator.wikimedia.org/T110673)
* When editing an existing value you see the URL of the unit's item.
This should be replaced by the label.
(https://phabricator.wikimedia.org/T110675)
* When viewing a diff of a unit change you see the URL of the unit's
item. This should be replaced by the label.
(https://phabricator.wikimedia.org/T108808)
* We need to think some more about the automatic edit summaries for
unit-related changes. (https://phabricator.wikimedia.org/T108807)
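For those who want to poke at it programmatically: as far as I can tell
from the data model, a quantity value with a unit looks roughly like
the sketch below in the JSON. The numbers are made up for illustration,
and the example assumes Q11573 is the item for the metre; the unit is
given as a full entity URI, or "1" for unitless quantities.

# Sketch of a quantity datavalue with a unit (illustrative values only).
quantity_value = {
    "amount": "+8848",
    "unit": "http://www.wikidata.org/entity/Q11573",  # assumed: metre
    "upperBound": "+8849",
    "lowerBound": "+8847",
}

# A plain number without a unit uses "1" as its unit:
plain_number = {"amount": "+42", "unit": "1"}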
If you find any bugs or if you are missing other absolutely critical
things please let me know here or file a ticket on
phabricator.wikimedia.org. If everything goes well we can get this on
Wikidata next Wednesday.
Cheers
Lydia
--
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata
Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das
Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.