Hello all,
We’re planning to add JSON-LD as a serialization format for Wikidata. This
will, for example, allow easier access to RDF data from JavaScript.
This is now deployed on https://wikidata.beta.wmflabs.org. Example:
https://wikidata.beta.wmflabs.org/wiki/Special:EntityData/Q64.jsonld
As with the other formats we already support (such as Turtle or RDF/XML),
content negotiation is used if the format is not indicated by a suffix like
.jsonld. The MIME type that can be used in the Accept header to request
JSON-LD output is application/ld+json.
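For testers, here is a minimal Python sketch of the two ways to request JSON-LD described above: the ".jsonld" suffix, or content negotiation with the Accept header set to application/ld+json. It uses only the standard library and builds the request without sending it. The base URL is the beta-cluster endpoint from this announcement; the helper name jsonld_request is our own, hypothetical choice.

```python
import urllib.request

# Beta-cluster endpoint from the announcement; switch to the wikidata.org
# equivalent once the format is enabled there.
ENTITY_DATA_BASE = "https://wikidata.beta.wmflabs.org/wiki/Special:EntityData/"
JSONLD_MIME = "application/ld+json"


def jsonld_request(entity_id: str, use_suffix: bool = False) -> urllib.request.Request:
    """Build (without sending) a request for an entity's JSON-LD data.

    use_suffix=True selects the format through the ".jsonld" suffix;
    otherwise the suffix is left off and the Accept header asks for
    JSON-LD through content negotiation.
    """
    if use_suffix:
        return urllib.request.Request(f"{ENTITY_DATA_BASE}{entity_id}.jsonld")
    return urllib.request.Request(
        f"{ENTITY_DATA_BASE}{entity_id}",
        headers={"Accept": JSONLD_MIME},
    )
```

Passing either request to urllib.request.urlopen() sends it; both forms should yield the same JSON-LD document for Q64.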
If you’re interested in this feature, please test it, and let us know if
you find any issues. The related ticket is
<https://phabricator.wikimedia.org/T207168>. If everything goes as planned,
we will enable it on wikidata.org on October 31st.
Cheers,
--
Léa Lacroix
Project Manager Community Communication for Wikidata
Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Registered in the register of associations of the Berlin-Charlottenburg
local court under number 23855 Nz. Recognized as a non-profit organization
by the Berlin Tax Office for Corporations I, tax number 27/029/42207.
Hello all,
This is an important announcement for all the tool builders and maintainers
who access Wikidata’s data by *directly querying the Labs database replicas*.
In May-June 2019, the Wikidata development team will drop the wb_terms
table from the database in favor of a new, optimized schema. Over the years,
this table has become too big, causing various issues.
This change requires tools that use wb_terms to be updated. Developers and
maintainers will need to *adapt their code* to the new schema before the
migration starts and switch to the new code when the migration starts.
The migration will start on *May 29th*. On May 15th, a test system will be
available for you to test your code.
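For maintainers wondering what adapting their code might look like, here is a minimal sketch in Python using an in-memory SQLite database. It assumes the new schema is the normalized term store made of the wbt_item_terms, wbt_term_in_lang, wbt_text_in_lang, wbt_text and wbt_type tables; these table and column names are our assumption and should be checked against the documentation in T221764 before you rely on them. The sample rows are invented for illustration, and item_terms and demo_connection are hypothetical helper names.

```python
import sqlite3

# ASSUMPTION: normalized term-store layout; verify names against T221764.
SCHEMA = """
CREATE TABLE wbt_type (wby_id INTEGER PRIMARY KEY, wby_name TEXT);
CREATE TABLE wbt_text (wbx_id INTEGER PRIMARY KEY, wbx_text TEXT);
CREATE TABLE wbt_text_in_lang (wbxl_id INTEGER PRIMARY KEY,
    wbxl_language TEXT, wbxl_text_id INTEGER);
CREATE TABLE wbt_term_in_lang (wbtl_id INTEGER PRIMARY KEY,
    wbtl_type_id INTEGER, wbtl_text_in_lang_id INTEGER);
CREATE TABLE wbt_item_terms (wbit_id INTEGER PRIMARY KEY,
    wbit_item_id INTEGER, wbit_term_in_lang_id INTEGER);
"""

# Roughly what a wb_terms-based tool did before:
#   SELECT term_text FROM wb_terms WHERE term_full_entity_id = 'Q64'
#     AND term_type = 'label' AND term_language = 'en';
# With the new tables, the same lookup walks a chain of joins:
TERMS_QUERY = """
SELECT wbx_text
FROM wbt_item_terms
JOIN wbt_term_in_lang ON wbit_term_in_lang_id = wbtl_id
JOIN wbt_type ON wbtl_type_id = wby_id
JOIN wbt_text_in_lang ON wbtl_text_in_lang_id = wbxl_id
JOIN wbt_text ON wbxl_text_id = wbx_id
WHERE wbit_item_id = ? AND wby_name = ? AND wbxl_language = ?
ORDER BY wbx_text
"""


def item_terms(conn: sqlite3.Connection, item_id: str,
               term_type: str, language: str) -> list[str]:
    """Return the terms of one type and language for an item such as "Q64"."""
    if not item_id.startswith("Q") or not item_id[1:].isdigit():
        raise ValueError(f"not an item ID: {item_id!r}")
    # The new tables key items by the numeric part only ("Q64" -> 64).
    numeric_id = int(item_id[1:])
    rows = conn.execute(TERMS_QUERY, (numeric_id, term_type, language))
    return [text for (text,) in rows]


def demo_connection() -> sqlite3.Connection:
    """In-memory database holding invented English and German labels of Q64."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executescript("""
        INSERT INTO wbt_type VALUES (1, 'label'), (2, 'description'), (3, 'alias');
        INSERT INTO wbt_text VALUES (1, 'Berlin');
        -- the same text row is shared by two languages
        INSERT INTO wbt_text_in_lang VALUES (1, 'en', 1), (2, 'de', 1);
        INSERT INTO wbt_term_in_lang VALUES (1, 1, 1), (2, 1, 2);
        INSERT INTO wbt_item_terms VALUES (1, 64, 1), (2, 64, 2);
    """)
    return conn
```

The deduplication (one text row shared by several languages and items) is what makes the new layout smaller, at the cost of more joins per lookup.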
Since the table is used by many external tools, we are setting up a
process to make sure that the change can be carried out together with the
developers and maintainers, without causing issues or breaking tools. Most
of the documentation and updates will take place on Phabricator:
- In this Phabricator task <https://phabricator.wikimedia.org/T221764>,
you can find a description of the changes and the process, and you can ask
for more details or for help in the comments. This is also where updates
will be announced if necessary.
- On the Tool Builders Migration board
<https://phabricator.wikimedia.org/tag/wb_terms_-_tool_builders_migration>
you will find all the details about the migration, how to update your
tool <https://phabricator.wikimedia.org/T221765>, and you can add your
own tasks.
- If you need to talk to the Wikidata developers or get more
specific help, we have set up two dedicated IRC meetings and a session at the
Wikimedia hackathon. You can find more information in this task
<https://phabricator.wikimedia.org/T221764>.
We are aware that this change will require you to make some significant changes
in your code, and we are willing to help you as much as our resources allow.
We hope that you will understand that this change is made to avoid
bigger issues in the near future.
Note that this change does not affect Wikibase instances outside of
Wikidata. A dedicated migration plan and announcement will follow.
We strongly encourage you not to wait until the last minute to make the changes
in your code. If you have any questions or issues, we will be happy to help.
In order to keep the discussions in one place, please ask questions or
raise issues directly in the Phabricator task and board.
Thanks for your understanding,
Cheers,
--
Léa Lacroix
Project Manager Community Communication for Wikidata
Hello all,
This change is relevant for everyone using the *wbeditentity* endpoint of
Wikidata’s API.
While working on editing the termbox on mobile, we discovered a bug in
the code of the wbeditentity endpoint: its behavior does not conform to
the implicit interpretation of the documentation
<https://phabricator.wikimedia.org/diffusion/EWBA/browse/master/docs/change-…>.
A request including {"aliases":{"en":[]}} should, according to the implicit
interpretation of its documentation, replace all English aliases with an
empty list, i.e. remove all of them. However, at the moment this action is
not actually performed: such a request leaves the aliases untouched.
We want to fix this bug because we need this request to work in order to
remove all aliases in the new mobile termbox as well. We are treating this
bug fix as a breaking change because the documentation was ambiguous, and
there may be some tools currently sending requests with empty alias arrays
when nothing needs to be touched, intentionally or not.
If you are maintaining a tool, please *inspect your tool’s usage of the
wbeditentity endpoint*, and make sure that no calls with empty alias arrays
are sent unless the intention is to remove these aliases.
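One way to audit this is to build the aliases part of the wbeditentity data in a single place and drop empty lists unless removal is intended. Below is a minimal Python sketch, assuming the per-language alias format {"language": ..., "value": ...} and assuming (as the documentation implies) that a per-language list replaces the existing aliases. The function aliases_data and its remove_all_in parameter are hypothetical names, not part of the API.

```python
import json


def aliases_data(aliases: dict[str, list[str]],
                 remove_all_in: frozenset[str] = frozenset()) -> str:
    """Build the JSON `data` value for a wbeditentity call that sets aliases.

    A language with an empty alias list is only sent when it is listed in
    `remove_all_in`, because after the fix an empty list removes every
    alias in that language.
    """
    out: dict[str, list[dict[str, str]]] = {}
    for language, values in aliases.items():
        if values:
            out[language] = [{"language": language, "value": v} for v in values]
        elif language in remove_all_in:
            out[language] = []  # explicit intent: remove all aliases in this language
        # otherwise leave the language out so its aliases stay untouched
    return json.dumps({"aliases": out} if out else {})
```

With this guard, an accidental {"en": []} simply disappears from the request, while an intentional removal has to be requested explicitly.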
According to our breaking change policy, this bug fix will first be
deployed on beta.wikidata.org on May 28th, then on wikidata.org on *June
12th*.
If you have any questions or issues, feel free to discuss them in the related
ticket <https://phabricator.wikimedia.org/T203337>.
Cheers,
--
Léa Lacroix
Project Manager Community Communication for Wikidata
Hello all,
After several months of development and testing together with the WikiProject
ShEx <https://www.wikidata.org/wiki/Wikidata:WikiProject_ShEx>, Shape
Expressions are about to be enabled on Wikidata.
*First of all, what are Shape Expressions?*
ShEx (Q29377880) <https://www.wikidata.org/wiki/Q29377880> is a concise,
formal modeling and validation language for RDF structures. Shape
Expressions can be used to define shapes within the RDF graph. In the case
of Wikidata, this would be sets of properties, qualifiers and references
that describe the domain being modeled.
See also:
- a short video about ShEx <https://www.youtube.com/watch?v=AR75KhEoRKg>
made by community members during the Wikimedia hackathon 2019
- introduction to ShEx <http://shex.io/shex-primer/>
- more details about the language <http://shex.io/shex-semantics/>
*What can it be used for?*
On Wikidata, the main goal of Shape Expressions would be to describe the
basic structure of an item. For example, for a human, we
probably want to have a date of birth, a place of birth, and many other
important statements. But we would also like to make sure that if a
statement with the property “children” exists, the value(s) of this
property should be humans as well. Schemas will describe in detail what is
expected in the structure of items, statements and values of these
statements.
Once Schemas are created for various types of items, it is possible to test
existing items against a Schema and highlight possible errors or missing
information. Validation tools can check whether subsets of the Wikidata
graph conform to a specific shape. Schemas will therefore be very useful in
helping editors improve data quality. We expect this to be especially
useful for WikiProjects, making it easier to discuss and ensure the
modeling of items in their domain. In the spirit of Wikidata not
restricting the world, Shape Expressions are a tool to highlight, not
prevent, errors.
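To make the idea of "highlight, not prevent" concrete, here is a toy Python check that mimics the two constraints from the human example above. It is not ShEx and not how the EntitySchema extension or shex-simple validate anything; it only illustrates the kind of rule a Schema expresses. The simplified item representation (property ID mapped to a list of values) and the lookup function are hypothetical; the IDs used are P31 (instance of), Q5 (human), P569 (date of birth), P19 (place of birth) and P40 (child).

```python
from typing import Callable

# Hypothetical, simplified item shape: property ID -> list of values.
Item = dict[str, list[str]]

# Statements we would expect on an item about a human.
EXPECTED_FOR_HUMANS = ("P569", "P19")


def check_human(item: Item, lookup: Callable[[str], Item]) -> list[str]:
    """Return warnings for a human item; the item is never rejected."""
    warnings = []
    for prop in EXPECTED_FOR_HUMANS:
        if not item.get(prop):
            warnings.append(f"missing {prop}")
    # Every value of "child" (P40) should itself be an instance of human (Q5).
    for child in item.get("P40", []):
        if "Q5" not in lookup(child).get("P31", []):
            warnings.append(f"P40 value {child} is not an instance of Q5")
    return warnings
```

A real Schema states the same expectations declaratively in ShExC, and a validator reports which items do not conform.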
On top of this, one could imagine other uses of Schemas in the future, for
example a tool that, when a new item is created, suggests the basic
structure for this item and helps add statements or values, somewhat like
the existing tool Cradle
<https://tools.wmflabs.org/wikidata-todo/cradle/#/>, which is currently not
based on ShEx.
*What is going to change on Wikidata?*
- A new extension will be added to Wikidata: EntitySchema
<https://www.mediawiki.org/wiki/Extension:EntitySchema>, defining the
Schema namespace and its behavior as well as special pages related to it.
- A new entity type, EntitySchema, will be enabled to store Shape
Expressions. Schemas will be identified with the letter E.
- The Schemas will have multilingual labels, descriptions and aliases
(quite similar to the termbox on Items), and a schema text written in a
syntax called ShEx Compact Syntax (ShExC)
<http://shex.io/shex-semantics/#shexc>. You can see an example here
<https://wikidata-shex.wmflabs.org/wiki/EntitySchema:E2>.
- The external tool shex-simple
<https://tools.wmflabs.org/shex-simple/wikidata/packages/shex-webapp/doc/she…>
is directly linked from the Schema pages in order to check entities of your
choice against the schema.
*When is this happening?*
Schemas will be enabled on test.wikidata.org on May 21st and on
wikidata.org on May 28th. After this release, they will be integrated into
the regular maintenance, just like the rest of Wikidata’s features.
*How can you help?*
- Before the release, you can try to edit or create Shape Expressions on
our test system <https://wikidata-shex.wmflabs.org/wiki/Main_Page>
- If you find any issues or would like a new feature, feel free to
create a new task on Phabricator with the tag shape-expressions
- Once Schemas are enabled, you can discuss them in your favorite
WikiProjects: for example, what types of items would you like to model?
- You can also get more information about how to create a Schema
<https://www.wikidata.org/wiki/Wikidata:WikiProject_ShEx/How_to_get_started%…>
*See also:*
- Main Phabricator board
<https://phabricator.wikimedia.org/tag/shape_expressions/>
- Technical documentation of the extension
<https://meta.wikimedia.org/wiki/Extension:EntitySchema>
- To enhance the interface, you can use this user script
<https://www.wikidata.org/wiki/User:Zvpunry/EntitySchemaHighlighter.js>
to highlight items and properties in the schema code and turn the IDs into
links.
If you have any questions, feel free to reach out to me.
Cheers,
--
Léa Lacroix
Project Manager Community Communication for Wikidata
Reminder: Technical Advice IRC meeting this week **Wednesday 3-4 pm UTC**
on #wikimedia-tech.
The Technical Advice IRC Meeting is a weekly support event for volunteer
developers. Every Wednesday, two full-time developers are available to help
you with all your questions about MediaWiki, gadgets, tools and more! This
can be anything from "how to get started" to "who would be the best
contact for X" to specific questions on your project.
If you already know what you would like to discuss or ask, please add your
topic to the next meeting:
https://www.mediawiki.org/wiki/Technical_Advice_IRC_Meeting
Hope to see you there!
Michi (for the Technical Advice IRC Meeting crew)
--
Michael F. Schönitzer
Wikimedia Deutschland e. V. | Tempelhofer Ufer 23-24 | 10963 Berlin
Tel. (030) 219 158 26-0
https://wikimedia.de
Our vision is a world in which all people can share in the knowledge of
humanity, use it and add to it. Help us achieve this!
https://spenden.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Registered in the register of associations of the Berlin-Charlottenburg
local court under number 23855 B. Recognized as a non-profit organization
by the Berlin Tax Office for Corporations I, tax number 27/029/42207.
Reminder: Technical Advice IRC meeting this week **Today 3-4 pm UTC** on
#wikimedia-tech.
Questions can be asked in English, Arabic and Spanish!
Today and next week we have the special topic "Preparing Wikidata Tools for
the wb_terms migration". If you have any tool that might be affected by
this change, feel free to join and ask!
Other topics are welcome too.
The Technical Advice IRC Meeting is a weekly support event for volunteer
developers. Every Wednesday, two full-time developers are available to help
you with all your questions about MediaWiki, gadgets, tools and more! This
can be anything from "how to get started" to "who would be the best
contact for X" to specific questions on your project.
If you already know what you would like to discuss or ask, please add your
topic to the next meeting:
https://www.mediawiki.org/wiki/Technical_Advice_IRC_Meeting
Hope to see you there!
Michi (for the Technical Advice IRC Meeting crew)
--
Michael F. Schönitzer
Reminder: Technical Advice IRC meeting this week **Wednesday 3-4 pm UTC**
on #wikimedia-tech.
Questions can be asked in English, German and Hungarian!
The Technical Advice IRC Meeting is a weekly support event for volunteer
developers. Every Wednesday, two full-time developers are available to help
you with all your questions about MediaWiki, gadgets, tools and more! This
can be anything from "how to get started" to "who would be the best
contact for X" to specific questions on your project.
If you already know what you would like to discuss or ask, please add your
topic to the next meeting:
https://www.mediawiki.org/wiki/Technical_Advice_IRC_Meeting
Hope to see you there!
Michi (for the Technical Advice IRC Meeting crew)
--
Michael F. Schönitzer
This is really a defective redesign: it reintroduces numeric IDs, which T114902 aims to remove. See also T179928. We should reconsider and instead introduce a new table linking unprefixed and prefixed entity IDs.
I also oppose any "partial migration for the first XXX items" process in T221765: it makes queries much more complicated. Please first fill all data into the new schema, then discontinue the old table.
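The objection to numeric IDs can be illustrated with a small Python sketch: once only the numeric part is stored, "Q64" and "P64" collapse to the same number, so the entity type must be tracked elsewhere to rebuild the prefixed ID. The helpers to_numeric and to_prefixed are hypothetical, and sub-entity IDs such as lexeme forms are deliberately not handled.

```python
import re

# Prefixed IDs such as "Q64" or "P64"; sub-entity IDs like "L1-F1"
# are deliberately rejected in this sketch.
_ENTITY_ID = re.compile(r"([A-Z])([1-9][0-9]*)")


def to_numeric(entity_id: str) -> tuple[str, int]:
    """Split a prefixed ID into (type letter, numeric part)."""
    match = _ENTITY_ID.fullmatch(entity_id)
    if match is None:
        raise ValueError(f"unsupported entity ID: {entity_id!r}")
    return match.group(1), int(match.group(2))


def to_prefixed(letter: str, numeric_id: int) -> str:
    """Rebuild a prefixed ID; the caller must already know the type letter."""
    return f"{letter}{numeric_id}"
```

A numeric column keeps only the second element of the tuple, which is why a separate mapping table (or a type column) is needed to get back to the prefixed form.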
Hi!
I started looking into how to produce RDF dump of MediaInfo entities,
and I've encountered some roadblocks that I am not sure how to get
around. I would like to hear suggestions on this, here, directly on
Phabricator, or on IRC:
1. https://phabricator.wikimedia.org/T222299
Basically, right now when we enumerate entities of certain types, we
just look at pages from the namespaces related to those entity types and
assume the page title can be parsed directly into an entity ID. However,
with slot entities like MediaInfo this is not the case. So we need a
generic service there that would take a page and a set of entity types,
and figure out:
a. Which of those entity types are "regular" entities with dedicated
page IDs and which ones live in slots
b. For the regular entities, do $this->entityIdParser->parse(
$row->page_title ) as before
c. For slot entities, check that the slot is present and, if so, produce
the entity ID specific to this slot. Preferably this is also done without
separate DB access (which may not be easy), since SqlEntityIdPager needs to
have good performance.
I am not sure whether there's an API that does that.
EntityByLinkedTitleLookup comes very close and even has a hook that does
the right thing, but it does DB access even for local IDs for Wikidata
(can be fixed) and does not support batching. Any other suggestions on how
the above can be done properly?
There's also the complication that pages and slot entities are no longer
one-to-one, so the fetch() operation can return not just $limit but anywhere
from 0 to (number of slots)*$limit entity IDs. Probably not a huge deal, but
it might need some careful handling.
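Steps a-c could look roughly like the following Python sketch. It is not Wikibase code: PageRow, REGULAR_TYPES, SLOT_TYPES and entity_ids_for_pages are hypothetical names, the namespace numbers are assumptions for illustration, and the slot roles are assumed to arrive with the page rows (e.g. from a join) so that no per-page DB access is needed. MediaInfo-style IDs are built as "M" plus the page ID, matching the M9103972 example below.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageRow:
    page_id: int
    namespace: int
    page_title: str
    # Slot roles present on the latest revision, assumed to be fetched in the
    # same query as the page rows (no extra DB round-trip).
    slot_roles: frozenset[str] = frozenset({"main"})


# HYPOTHETICAL configuration.
# Regular types: entity type -> namespace whose page titles are entity IDs.
REGULAR_TYPES = {"item": 0, "property": 120}
# Slot types: entity type -> (namespace, slot role, ID prefix + page ID).
SLOT_TYPES = {"mediainfo": (6, "mediainfo", "M")}


def entity_ids_for_pages(rows: list[PageRow], entity_types: list[str]) -> list[str]:
    ids = []
    for row in rows:
        for entity_type in entity_types:
            # (a) classify the entity type as regular or slot-based
            if entity_type in REGULAR_TYPES:
                # (b) regular entity: stand-in for entityIdParser->parse( page_title )
                if row.namespace == REGULAR_TYPES[entity_type]:
                    ids.append(row.page_title)
            elif entity_type in SLOT_TYPES:
                namespace, role, prefix = SLOT_TYPES[entity_type]
                # (c) slot entity: only emit an ID if the slot exists on the page
                if row.namespace == namespace and role in row.slot_roles:
                    ids.append(f"{prefix}{row.page_id}")
            else:
                raise ValueError(f"unknown entity type: {entity_type!r}")
    return ids
```

As noted above, the result length is no longer tied to the number of rows: it ranges from 0 up to len(rows) * len(entity_types), so a pager built on this has to track its position by page, not by returned ID count.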
2. https://phabricator.wikimedia.org/T222306
The entities in SDC are not local entities: e.g. if I look at
https://commons.wikimedia.org/wiki/Special:EntityData/M9103972.json, P180
and Q83043 do not come from Commons; they come from Wikidata. However,
they do not have prefixes, which means the RDF builder thinks they are
local and assigns them Commons-based namespaces, which is obviously
wrong, since they are Wikidata entities. While Commons has a bunch of
redirects set up, RDF identifies data by literal URL and has no idea
about redirects, so querying the data would be problematic if the Wikidata
dataset were combined with the Commons dataset. For example, it would make
it next to impossible to run federated queries between Wikidata and
Commons, as the two stores would use different URIs for Wikidata entities.
Additionally, the current RDF generation process assumes the wd: prefix
always belongs to the local wiki, so on Commons wd: is
<https://commons.wikimedia.org/entity/>, but on Wikidata it's of course
the Wikidata URL. This may be very confusing to people. If wd: means
different things in Commons and Wikidata, federated queries may be
confusing, as it would be unclear which wd: means what, and where. Ideally,
we'd not use the wd: prefix for Commons at all, but this goes against the
assumption hardcoded in RdfVocabulary that local wiki entities are wd:.
So again, I am not sure what's the best way to treat this situation,
since I am not sure how the federation model in SDC works: the code
suggests there should be some kind of prefix for entity IDs, but SDC
does not seem to use any.
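The URI problem can be shown with a small Python sketch that contrasts the current "everything unprefixed is local" behavior with a source-aware mapping. The mapping from ID letter to owning wiki is a hypothetical stand-in for whatever federation configuration SDC ends up using; the Commons base is the one quoted above, the Wikidata base is Wikidata's concept URI namespace, and "sdc" is a hypothetical prefix name.

```python
# HYPOTHETICAL mapping from ID letter to the concept-URI base of the wiki
# that owns the entity.
WIKIDATA_BASE = "http://www.wikidata.org/entity/"
COMMONS_BASE = "https://commons.wikimedia.org/entity/"
SOURCE_BY_LETTER = {"Q": WIKIDATA_BASE, "P": WIKIDATA_BASE, "M": COMMONS_BASE}


def local_uri(entity_id: str, local_base: str) -> str:
    """Current behavior as described: every unprefixed ID is treated as local."""
    return local_base + entity_id


def source_aware_uri(entity_id: str) -> str:
    """Use the base URI of the wiki the entity actually comes from."""
    base = SOURCE_BY_LETTER.get(entity_id[:1])
    if base is None:
        raise ValueError(f"no entity source configured for {entity_id!r}")
    return base + entity_id


def turtle_prefixes() -> str:
    """Distinct prefixes so that wd: means Wikidata in both stores."""
    return (f"@prefix wd: <{WIKIDATA_BASE}> .\n"
            f"@prefix sdc: <{COMMONS_BASE}> .")
```

With the local-only rule, Q83043 on Commons becomes https://commons.wikimedia.org/entity/Q83043, which never matches the Wikidata store's URI; the source-aware rule keeps one URI per entity across both datasets.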
Any suggestions about the above are welcome.
Thanks,
--
Stas Malyshev
smalyshev(a)wikimedia.org