Hey folks,
we plan to drop the wb_entity_per_page table sometime soon [0], because
it is just not required (we will likely always have a programmatic
mapping from entity id to page title) and, as it stands, it does not
support non-numeric entity ids. Because of that, removing it is a
blocker for the Commons metadata work.
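(For tool authors wondering what that programmatic mapping looks like in
practice: item pages live in the main namespace and property pages in the
Property namespace, so a table lookup can usually be replaced by something
like the small Python sketch below. The helper function is only an
illustration, not part of the planned change.)

# Hypothetical helper: derive a Wikidata page title from an entity id,
# replacing a lookup in the soon-to-be-dropped wb_entity_per_page table.
def entity_id_to_page_title(entity_id):
    """Map an entity id such as 'Q42' or 'P31' to its page title."""
    if entity_id.startswith("Q"):
        return entity_id                # items live in the main namespace
    if entity_id.startswith("P"):
        return "Property:" + entity_id  # properties live in the Property namespace
    raise ValueError("unsupported entity id: %r" % entity_id)

print(entity_id_to_page_title("Q42"))   # -> Q42
print(entity_id_to_page_title("P31"))   # -> Property:P31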
Is anybody using that table for their tools (on Tool Labs)? If so, please
tell us so that we can give you instructions and a longer grace period
to update your scripts.
Cheers,
Marius
[0]: https://phabricator.wikimedia.org/T95685
Hoi,
Jura1 created a wonderful list of people who died in Brazil in 2015 [1]. It
is a page that is updated regularly from Wikidata thanks to ListeriaBot.
Obviously, a few more people may be missing, because I am falling ever
further behind with my quest of registering deaths in 2015.
I have copied his work and created a page for people who died in the
Netherlands in 2015 [2]. It is trivially easy to do, and the result looks
great; the same approach can be used for any country on any Wikipedia.
The Dutch Wikipedia indicated that they nowadays maintain important
metadata at Wikidata. I am really happy that we can showcase their work. It
is important work because, as someone reminded me at some stage, this is
part of what amounts to the policy on living people...
Thanks,
GerardM
[1] https://www.wikidata.org/wiki/User:Jura1/Recent_deaths_in_Brazil
[2] https://www.wikidata.org/wiki/User:Jura1/Recent_deaths_in_the_Netherlands
Hello all,
The Photographers' Identities Catalog (PIC) is an ongoing project of
visualizing photo history through the lives of photographers and photo
studios. I have information on 115,000 photographers and studios as of
tonight. It is still under construction, but as I've almost completed an
initial indexing of the ~12,000 photographers in Wikidata, I thought I'd
share it with you. We (the New York Public Library) hope to launch it
officially in mid to late January. This represents about 12 years' worth of
my work researching NYPL's photography collection, censuses and
business directories, and scraping or indexing trusted websites, databases,
and published biographical dictionaries pertaining to photo history.
Again, please bear in mind that our programmer is still hard at work (and I
continue to refine and add to the data*), but we welcome your feedback,
questions, critiques, etc. To see the Wikidata photographers, select
Wikidata from the Source dropdown. Have fun!
*PIC*
<http://mgiraldo.github.io/pic/?address.AddressTypeID=*&address.CountryID=*&…>
Thanks,
David
*Tomorrow, for instance, I'll start mining Wikidata for birth & death
locations.
Hi all,
as you know, Tpt has been working as an intern this summer at Google. He
finished his work a few weeks ago and I am happy to announce today the
publication of all scripts and the resulting data he has been working on.
Additionally, we publish a few novel visualizations of the data in Wikidata
and Freebase. We are still working on the actual report summarizing the
effort and providing numbers on its effectiveness and progress. This will
take another few weeks.
First, thanks to Tpt for his amazing work! I had not expected to see such
rich results. He has exceeded my expectations by far, and produced much
more transferable data than I expected. Additionally, he also worked
on the primary sources tool directly and helped Marco Fossati to upload a
second, sports-related dataset (you can select that by clicking on the
gears icon next to the Freebase item link in the sidebar on Wikidata, when
you switch on the Primary Sources tool).
The scripts that were created and used can be found here:
https://github.com/google/freebase-wikidata-converter
All scripts are released under the Apache license v2.
The following data files are also released. All data is released under the
CC0 license; to make this explicit, a comment stating the copyright and the
license has been added to the start of each file. If any script dealing
with the files hiccups over that line, simply remove the first line.
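(As an illustration, here is a minimal Python sketch of such a guard; the
file name, the "#" comment marker and the tab-separated layout are my
assumptions, not documented in this announcement.)

import gzip

# Read one of the released dumps, skipping the leading copyright/license
# comment so downstream parsers only see data rows. The file name, the "#"
# comment marker and the tab-separated layout are assumptions for illustration.
with gzip.open("freebase-mapped-missing-statements.tsv.gz", "rt", encoding="utf-8") as f:
    for line in f:
        if line.startswith("#"):
            continue  # drop the license header mentioned above
        fields = line.rstrip("\n").split("\t")
        # ... process one statement row ...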
https://tools.wmflabs.org/wikidata-primary-sources/data/freebase-mapped-mis…
The actual missing statements, including URLs for sources, are in this
file. This was filtered against statements already existing in Wikidata,
and the statements are mapped to Wikidata IDs. This contains about 14.3M
statements (214MB gzipped, 831MB unzipped). These are created using the
mappings below in addition to the mappings already in Wikidata. The quality
of these statements is rather mixed.
Additional datasets that we know meet a higher quality bar have been
previously released and uploaded directly to Wikidata by Tpt, following
community consultation.
https://tools.wmflabs.org/wikidata-primary-sources/data/additional-mapping.…
Contains additional mappings between Freebase MIDs and Wikidata QIDs, which
are not available in Wikidata. These are mappings based on statistical
methods and single interwiki links. Unlike the first set of mappings we had
created and published previously (which required at least multiple interwiki
links), these mappings are expected to be of lower quality - sufficient
for a manual process, but probably not sufficient for an automatic upload.
This contains about 3.4M mappings (30 MB gzipped, 64MB unzipped).
https://tools.wmflabs.org/wikidata-primary-sources/data/freebase-new-labels…
This file includes labels and aliases that seem to be currently missing
from Wikidata items. The quality of these labels is undetermined. The file
contains about 860k labels in about 160 languages, with 33 languages having
more than 10k labels each (14MB gzipped, 32MB unzipped).
https://tools.wmflabs.org/wikidata-primary-sources/data/freebase-reviewed-m…
This is an interesting file as it includes a quality signal for the
statements in Freebase. What you will find here are ordered pairs of
Freebase mids and properties, each indicating that the given pair went
through a review process and is likely of higher quality on average.
This is only for those pairs that are missing from Wikidata. The file
includes about 1.4M pairs, and this can be used for importing part of the
data directly (6MB gzipped, 52MB unzipped).
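(To illustrate how this quality signal might be used, here is a rough Python
sketch; the file name and the assumption of a tab-separated layout with the
mid and property in the first two columns are mine, so treat it only as an
outline of the filtering idea.)

import gzip

# Load the reviewed (Freebase mid, property) pairs as a quality signal.
reviewed_pairs = set()
with gzip.open("freebase-reviewed-missing-statements.tsv.gz", "rt", encoding="utf-8") as f:
    for line in f:
        if line.startswith("#"):        # skip the license header line
            continue
        mid, prop = line.rstrip("\n").split("\t")[:2]
        reviewed_pairs.add((mid, prop))

def passed_review(mid, prop):
    """True if this (mid, property) pair went through Freebase's review process."""
    return (mid, prop) in reviewed_pairs

# Statements whose source pair passed review are better candidates for direct import.
print(passed_review("/m/012345", "/people/person/place_of_birth"))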
Now anyone can take the statements, analyse them, slice and dice them,
upload them, use them for their own tools and games, etc. They remain
available through the primary sources tool as well, which has already led
to several thousand new statements in the last few weeks.
Additionally, Tpt and I created in the last few days of his internship a
few visualizations of the current data in Wikidata and in Freebase.
First, the following is a visualization of the whole of Wikidata:
https://tools.wmflabs.org/wikidata-primary-sources/data/wikidata-color.png
The visualization needs a bit of explanation, I guess. The y-axis (up/down)
represents time, and the x-axis (left/right) represents space / geolocation.
The further down, the closer you are to the present; the further up, the
further you go into the past. Time is given on a non-linear scale - the 20th
century gets much more space than the 1st century. The x-axis represents
longitude, with the prime meridian in the center of the image.
Every item is placed at its longitude (averaged, if there are several) and at
the earliest point in time mentioned on the item. For items missing either
value, neighbouring items propagate their values to them (averaging, if
necessary). This is done repeatedly until all items are saturated with values.
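(The actual code is in the repository linked above; the following small
Python sketch only restates the placement and propagation rule in code form,
with made-up data structures.)

# Sketch of the placement rule described above (not the published script):
# each item sits at its averaged longitude (x) and earliest mentioned year (y);
# items missing a value repeatedly inherit the average of their neighbours.
def place_items(items, neighbours, rounds=10):
    """items: dict id -> {'lon': float or None, 'year': float or None}
    neighbours: dict id -> list of linked item ids."""
    for _ in range(rounds):
        for item_id, values in items.items():
            for key in ("lon", "year"):
                if values[key] is None:
                    known = [items[n][key] for n in neighbours.get(item_id, [])
                             if items[n][key] is not None]
                    if known:
                        values[key] = sum(known) / len(known)
    return items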
To understand that a bit better, the following image offers a
supporting grid: each horizontal line represents a century (back to
the first century), and each vertical line represents a meridian
(with London in the middle of the graph).
https://tools.wmflabs.org/wikidata-primary-sources/data/wikidata-grid-color…
The same visualizations have also been created for Freebase:
https://tools.wmflabs.org/wikidata-primary-sources/data/freebase-color.png
https://tools.wmflabs.org/wikidata-primary-sources/data/freebase-grid-color…
In order to compare the two graphs, we also overlaid them on each other.
I will leave the interpretation to you, but you can easily see the
strengths and weaknesses of both knowledge bases.
https://tools.wmflabs.org/wikidata-primary-sources/data/wikidata-red-freeba…
https://tools.wmflabs.org/wikidata-primary-sources/data/freebase-red-wikida…
The programs for creating the visualizations are all available in the
GitHub repository mentioned above (plenty of RAM is recommended to run them).
Enjoy the visualizations, the data and the scripts! Tpt and I are available
to answer questions. I hope this will help with understanding and analysing
some of the results of the work that we did this summer.
Cheers,
Denny
Hey folks :)
The "in other projects" sidebar is a feature that adds links to related
content on other projects to the sidebar. For example, on the Wikivoyage
article about Berlin it links to the respective content on Wikipedia,
Commons and so on. This works similarly to the language links that you
already have in the sidebar of an article. It is intended to make more
content available to our users and bring our projects closer together.
The "in other projects" sidebar has been around for many months now. It
has been a beta feature on all wikis and enabled by default on a handful. It
can't stay a beta feature forever, and we should decide on its fate
now. I am happy with the reactions we have been getting about it and
believe it adds significant value for our readers and editors on
smaller projects. I therefore want to enable it on all projects. This
will happen in January. The exact date is still being decided.
If for some reason your project decides it does not want this feature,
please let me know and we'll disable it for you.
The ticket for tracking this is https://phabricator.wikimedia.org/T103102
Cheers
Lydia
--
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata
Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Registered in the register of associations of the Berlin-Charlottenburg
district court under number 23855 Nz. Recognized as a charitable organization
by the Tax Office for Corporations I Berlin, tax number 27/681/51985.
We lack several maintenance scripts for the clients, that is, human-readable
special pages with reports on which pages lack special treatment. In no
particular order, we need some way to identify unconnected pages in general
(the present one does not work [1]), we need some way to identify pages that
are unconnected but have some language links, we need to identify items that
are used in some language but lack labels (almost like [2], but on the client
and for items that are somehow connected to pages on the client; a rough
sketch of that check follows below), and we need to identify items that lack
specific claims while the client pages use a specific template.
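(As a rough illustration of that third report - client pages whose connected
item has no local label - here is a Python sketch against the standard
MediaWiki and Wikibase web APIs; the wiki, language code and page titles are
placeholders, and a real special page would of course work from the database
instead.)

import requests

CLIENT_API = "https://no.wikipedia.org/w/api.php"   # client wiki (placeholder)
REPO_API = "https://www.wikidata.org/w/api.php"
LANG = "nb"                                          # local label language (placeholder)

def connected_item(title):
    """Return the Wikidata item id connected to a client page, or None."""
    r = requests.get(CLIENT_API, params={
        "action": "query", "prop": "pageprops", "ppprop": "wikibase_item",
        "titles": title, "format": "json"}).json()
    page = next(iter(r["query"]["pages"].values()))
    return page.get("pageprops", {}).get("wikibase_item")

def has_local_label(qid):
    """Return True if the item has a label in the local language."""
    r = requests.get(REPO_API, params={
        "action": "wbgetentities", "ids": qid, "props": "labels",
        "languages": LANG, "format": "json"}).json()
    return bool(r["entities"][qid].get("labels"))

for title in ["Oslo", "Bergen"]:                     # placeholder page titles
    qid = connected_item(title)
    if qid and not has_local_label(qid):
        print(title, "is connected to", qid, "which lacks a", LANG, "label")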
There are probably more such maintenance pages, but these are the most
urgent ones. Users have now started to create categories to hack around
the missing maintenance pages, which produces a pile of categories. [3]
At Norwegian Bokmål there are just a few scripts that use data
from Wikidata, yet the number of categories is already growing large.
For us at the "receiving end" this is a show stopper. We can't
convince the users that this is a positive addition to the pages
without the maintenance scripts, because without them we are more or
less flying blind when we try to fix errors. We can't prod random pages
hoping to find something that is wrong; we must be able to
search for the errors and fix them.
This summer we (nowiki) have added about ten (10) properties to the
infoboxes, some with scripts and some with the property parser
function. Most of my time I have not been coding, and I have not been
fixing errors; I have been trying to explain to the community why
Wikidata is a good idea. At one point the changes were even reverted
because someone disagreed with what we had done. The whole thing
basically revolves around "my article got a Q-id in the infobox and I
don't know how to fix it". We know how to fix it, and I have explained
that to the editors at nowiki several times. They still don't get it,
so we need some way to fix it, and we don't have the maintenance scripts
to do it.
Right now we don't need more wild ideas that will swamp development
for months and years to come; we need maintenance scripts,
and we need them now!
[1] https://no.wikipedia.org/wiki/Spesial:UnconnectedPages
[2] https://www.wikidata.org/wiki/Special:EntitiesWithoutLabel
[3] https://no.wikipedia.org/wiki/Spesial:Prefiksindeks/Kategori:Artikler_hvor
John Erling Blad
/jeblad
Hi, it's the first of July and I would like to introduce you to a quarterly goal
that the Engineering Community team has committed to:
Establish a framework to engage with data engineers and open data
organizations
https://phabricator.wikimedia.org/T101950
We are missing a community framework allowing Wikidata content and tech
contributors, data engineers, and open data organizations to collaborate
effectively. Imagine GLAM applied to data.
If all goes well, by the end of September we would like to have basic
documentation and community processes for open data engineers and
organizations willing to contribute to Wikidata, and ongoing projects with
one open data org.
If you are interested, get involved! We are looking for
* Wikidata contributors with good institutional memory
* people who have been in touch with organizations willing to contribute
their open data
* developers willing to help improve our software and program the missing
pieces
* contributors familiar with the GLAM model(s), and with what worked and
what didn't
This goal has been created after some conversations with Lydia Pintscher
(Wikidata team) and Sylvia Ventura (Strategic Partnerships). Both are on
board, Lydia assuring that this work fits into what is technically
effective, and Sylvia checking our work against real open data
organizations willing to get involved.
This email effectively starts the bootstrapping of this project. I will
start creating subtasks under that goal based on your feedback and common
sense.
--
Quim Gil
Engineering Community Manager @ Wikimedia Foundation
http://www.mediawiki.org/wiki/User:Qgil
Hey folks :)
I'll be doing another office hour to talk about all things Wikidata.
As usual I'll give an overview of the past 3 months and what's ahead.
It'll be in #wikimedia-office on Freenode. It'll be on January 21st at
17:00 UTC. For your timezone please see
https://www.timeanddate.com/worldclock/fixedtime.html?hour=17&min=00&sec=0&….
Cheers
Lydia
--
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata
Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Registered in the register of associations of the Berlin-Charlottenburg
district court under number 23855 Nz. Recognized as a charitable organization
by the Tax Office for Corporations I Berlin, tax number 27/681/51985.
Hi all,
I just discovered that there seems to be a limit on the length of
property values... some property values for compounds, however, are longer,
the InChI being a good example... 400 characters is not enough for some
compounds in Wikipedia, like teixobactin (Q18720369)....
This length is not defined by the property definition itself (InChI
(P234)), so I am wondering whether this maximum length is system-wide, or
whether there are options to vary it? A maximum length of 1024 would be
better, though it still would not allow InChI values for all compounds...
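(Just to put numbers on the problem, a trivial check like the Python sketch
below shows how a given InChI compares to a 400 or 1024 character cap; the
string here is a truncated placeholder, not teixobactin's actual InChI.)

# Compare InChI string lengths against hypothetical Wikibase value-length limits.
# The InChI below is a truncated placeholder, not the real one for teixobactin.
LIMITS = (400, 1024)
inchi = "InChI=1S/..."

for limit in LIMITS:
    verdict = "fits within" if len(inchi) <= limit else "exceeds"
    print("%d chars %s a %d-char limit" % (len(inchi), verdict, limit))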
Looking forward to hearing from you, and a happy new year,
Egon
--
E.L. Willighagen
Department of Bioinformatics - BiGCaT
Maastricht University (http://www.bigcat.unimaas.nl/)
Homepage: http://egonw.github.com/
LinkedIn: http://se.linkedin.com/in/egonw
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers
ORCID: 0000-0001-7542-0286
ImpactStory: https://impactstory.org/EgonWillighagen
Hey folks :)
You can now submit talk proposals for Wikimania at
https://wikimania2016.wikimedia.org/wiki/Submissions I'd love to see
many Wikidata submissions from you all. If you need help fleshing out
a proposal, don't know whether a certain topic is a good fit for a talk
there, or similar, I am happy to help. Just send me an email.
Please note: The submission deadline is apparently already on January 7th.
Cheers
Lydia
--
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata
Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Registered in the register of associations of the Berlin-Charlottenburg
district court under number 23855 Nz. Recognized as a charitable organization
by the Tax Office for Corporations I Berlin, tax number 27/681/51985.