Hello,
I am looking for the list of supported language codes for variations of
Chinese. So far in API responses I found these:
zh
zh-cn
zh-hans
zh-hant
zh-hk
zh-tw
zh-mo
zh-sg
"Configure" link in "more languages" section leads to this page:
https://www.wikidata.org/wiki/Help:Navigating_Wikidata/User_Options#Babel_e…
Which in turn refers to
https://meta.wikimedia.org/wiki/Table_of_Wikimedia_projects#Projects_per_la…
But apparently there are no such values as 'zh-cn', 'zh-hans', etc. there.
How can I get the COMPLETE list of Chinese language codes supported by
Wikidata (preferably with descriptions)?
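For reference, the generic MediaWiki siteinfo API can at least enumerate the
language codes MediaWiki knows about; a minimal sketch in Python (I am not
certain this is exactly the set of term languages Wikibase accepts, so treat
it as a starting point):

    import requests

    # Ask the MediaWiki API behind Wikidata for its language table and
    # keep only the Chinese variants.
    response = requests.get("https://www.wikidata.org/w/api.php", params={
        "action": "query",
        "meta": "siteinfo",
        "siprop": "languages",
        "format": "json",
        "formatversion": "2",
    })
    for lang in response.json()["query"]["languages"]:
        if lang["code"] == "zh" or lang["code"].startswith("zh-"):
            print(lang["code"], "-", lang["name"])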
Best regards,
Vlad
Hello all!
A number of concerns have been raised about the performance and
scaling of the Wikidata Query Service. We share those concerns and we are
doing our best to address them. Here is some info about what is going
on:
In an ideal world, WDQS should:
* scale in terms of data size
* scale in terms of number of edits
* have low update latency
* expose a SPARQL endpoint for queries
* allow anyone to run any queries on the public WDQS endpoint
* provide great query performance
* provide a high level of availability
Scaling graph databases is a "known hard problem", and we are reaching
a scale where there are no obvious easy solutions to address all the
above constraints. At this point, just "throwing hardware at the
problem" is not an option anymore. We need to go deeper into the
details and potentially make major changes to the current architecture.
Some scaling considerations are discussed in [1]. This is going to take
time.
Realistically, addressing all of the above constraints is unlikely to
ever happen. Some of the constraints are non-negotiable: if we can't
keep up with Wikidata in terms of data size or number of edits, it does
not make sense to address query performance. On some constraints, we
will probably need to compromise.
For example, the update process is asynchronous. It is by nature
expected to lag. In the best case, this lag is measured in minutes,
but can climb to hours occasionally. This is a case of prioritizing
stability and correctness (ingesting all edits) over update latency.
And while we can work to reduce the maximum latency, this will still
be an asynchronous process and needs to be considered as such.
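For transparency, anyone can observe this lag directly: a minimal sketch,
assuming the schema:dateModified triple that WDQS publishes for
<http://www.wikidata.org> as its last update time:

    from datetime import datetime, timezone

    import requests

    QUERY = ("SELECT ?updated WHERE "
             "{ <http://www.wikidata.org> schema:dateModified ?updated }")

    response = requests.get(
        "https://query.wikidata.org/sparql",
        params={"query": QUERY, "format": "json"},
        headers={"User-Agent": "wdqs-lag-check-example/0.1"},
    )
    value = response.json()["results"]["bindings"][0]["updated"]["value"]
    last_update = datetime.fromisoformat(value.replace("Z", "+00:00"))
    print("approximate update lag:", datetime.now(timezone.utc) - last_update)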
We currently have one Blazegraph expert working with us to address a
number of performance and stability issues. We
are planning to hire an additional engineer to help us support the
service in the long term. You can follow our current work in phabricator [2].
If anyone has experience with scaling large graph databases, please
reach out to us, we're always happy to share ideas!
Thanks all for your patience!
Guillaume
[1] https://wikitech.wikimedia.org/wiki/Wikidata_query_service/ScalingStrategy
[2] https://phabricator.wikimedia.org/project/view/1239/
--
Guillaume Lederrey
Engineering Manager, Search Platform
Wikimedia Foundation
UTC+2 / CEST
Hello all,
Two things:
1) On Thursday, June 13th at 18:00 UTC (11am Pacific), there will be open
office hours for those of you who would like to share your thoughts
on the event: topics you'd like to see discussed there, decisions you'd
like made, etc.
It will occur using Google Meet, at this url:
https://meet.google.com/exz-zxfy-nuj
If you can't make it to these office hours, don't fret! You can always
(continue to) share your thoughts on the Phabricator task:
https://phabricator.wikimedia.org/T220212
2) REMINDER: The deadline for participant/attendee nominations is Monday,
June 17th (this coming Monday). Remember, you can nominate others or
yourself, and you can fill out the form as many times as you have
nominations.
Form: https://forms.gle/CLeGFSMiEasJgEU27
FAQ: https://www.mediawiki.org/wiki/Wikimedia_Technical_Conference/2019/FAQ
This survey is conducted via a third-party service, which may make it
subject to additional terms. For more information on privacy and
data-handling, see this survey privacy statement:
https://foundation.wikimedia.org/wiki/Wikimedia_Technical_Conference_Survey…
On behalf of the Technical Conference Program Committee,
Greg
On Wed, May 29, 2019 at 04:39:37PM -0700, Greg Grossmeier wrote:
> Hello all,
>
> As you may have seen, the next Wikimedia Technical Conference[0] is
> coming up in November 2019.
>
> It will take place November 12-15th in Atlanta, GA (USA). As announced
> at the Hackathon and documented on-wiki[1] this year's event will
> focus on the topic of "Developer Productivity".
>
> Like last year, we are looking for diverse stakeholders, perspectives,
> and experiences that will help us to make informed decisions. We need
> people who can create and architect solutions, as well as those who
> will make funding and prioritization decisions for the projects.
>
> See the FAQ for (hopefully) any questions you have:
> <https://www.mediawiki.org/wiki/Wikimedia_Technical_Conference/2019/FAQ>
>
> Please fill out the survey using this link to nominate yourself or someone
> else to attend: <https://forms.gle/CLeGFSMiEasJgEU27>
>
> This survey is conducted via a third-party service, which may make it
> subject to additional terms. For more information on privacy and
> data-handling, see this survey privacy statement:
> <https://foundation.wikimedia.org/wiki/Wikimedia_Technical_Conference_Survey…>
>
> This nomination form will remain open between May 29 and June 17, 2019.
>
> If you have any questions, please post them on the event's talk page
> <https://www.mediawiki.org/wiki/Talk:Wikimedia_Technical_Conference/2019>.
>
> Thanks!
>
> Greg and the Technical Conference 2019 Program Committee
>
> [0] <https://www.mediawiki.org/wiki/Wikimedia_Technical_Conference/2019>
> [1] <https://www.mediawiki.org/wiki/Wikimedia_Technical_Conference/2019#Vision_S…>
>
> --
> Greg Grossmeier
> Release Team Manager
--
| Greg Grossmeier GPG: B2FA 27B1 F7EB D327 6B8E |
| Release Team Manager A18D 1138 8E47 FAC8 1C7D |
Sorry for cross-posting!
Reminder: Technical Advice IRC meeting this week **Wednesday 3-4 pm UTC**
on #wikimedia-tech.
Questions can be asked in English and Persian!
The Technical Advice IRC Meeting (TAIM) is a weekly support event for
volunteer developers. Every Wednesday, two full-time developers are
available to help you with all your questions about MediaWiki, gadgets,
tools and more! This can be anything from "how to get started" to "who
would be the best contact for X" to specific questions about your project.
If you already know what you would like to discuss or ask, please add your
topic to the next meeting:
https://www.mediawiki.org/wiki/Technical_Advice_IRC_Meeting
Hope to see you there!
--
Raz Shuty
Engineering Manager
Wikimedia Deutschland e. V. | Tempelhofer Ufer 23-24 | 10963 Berlin
Phone: +49 (0)30 219 158 26-0
https://wikimedia.de
Imagine a world in which every single human being can freely share in the
sum of all knowledge. That's our commitment.
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Registered in the register of associations of the Berlin-Charlottenburg
district court under number 23855 B. Recognized as a charitable organization
by the Tax Office for Corporations I Berlin, tax number 27/029/42207.
Hello all,
This change is relevant for everyone using the *wbeditentity* endpoint of
Wikidata’s API.
While working on editing the termbox on mobile, we discovered a bug in
our code for the wbeditentity endpoint: it does not conform to the
implicit interpretation of the documentation
<https://phabricator.wikimedia.org/diffusion/EWBA/browse/master/docs/change-…>.
According to that implicit interpretation, a request including
{"aliases":{"en":[]}} should replace all English aliases with an empty
list, i.e. remove them all. However, at the moment this action is not
actually performed: such a request leaves the aliases untouched.
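For illustration, such a call looks roughly like this (a sketch only: the
item ID points at the sandbox item and the token is a placeholder):

    import json

    import requests

    # A wbeditentity call whose payload contains an empty alias array for
    # "en". After the fix this removes all English aliases; today it is a
    # no-op.
    response = requests.post("https://www.wikidata.org/w/api.php", data={
        "action": "wbeditentity",
        "id": "Q4115189",  # the Wikidata sandbox item, as a placeholder
        "data": json.dumps({"aliases": {"en": []}}),
        "token": "<csrf token>",  # obtain via action=query&meta=tokens
        "format": "json",
    })
    print(response.json())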
We want to fix this bug because we need this request to work in order to
be able to remove all aliases in the new termbox on mobile as well. We are
treating this bug fix as a breaking change because the documentation was
ambiguous, and there may be some tools currently sending requests with
empty alias arrays, intentionally or not, when nothing needs to be changed.
If you are maintaining a tool, please *inspect your tool's usage of the
wbeditentity endpoint* and make sure that no calls with empty alias arrays
are sent unless the intention is to remove those aliases.
According to our breaking change policy, this bug fix will first be
deployed on beta.wikidata.org on May 28th, then on wikidata.org on *June
12th*.
If you have any questions or issues, feel free to discuss them in the
related ticket <https://phabricator.wikimedia.org/T203337>.
Cheers,
--
Léa Lacroix
Project Manager Community Communication for Wikidata
Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Registered in the register of associations of the Berlin-Charlottenburg
district court under number 23855 Nz. Recognized as a charitable organization
by the Tax Office for Corporations I Berlin, tax number 27/029/42207.
GerardM's post triggered my interest in posting to the mailing list. As you
might know, I am working on a functional quadstore, that is, a quadstore
that keeps old versions of the data around, like a wiki, but in a directed
acyclic graph. It only stores the differences between commits, and relies
on a snapshot of the latest version for fast reads. My ultimate goal is to
build some kind of portable knowledge base, something like Wikibase +
Blazegraph that you can spin up on a regular machine at the press of a
button.
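To make the idea concrete, here is a toy sketch (illustrative only, with
made-up names; the real implementation differs):

    from dataclasses import dataclass

    Quad = tuple  # (graph, subject, predicate, object)

    @dataclass
    class Commit:
        parents: list  # a merge commit has several parents
        added: set
        removed: set

    class FunctionalQuadstore:
        def __init__(self):
            self.commits = {}      # commit id -> Commit
            self.head = None
            self.snapshot = set()  # latest version, kept for fast reads

        def commit(self, added, removed):
            """Store only the difference; refresh the snapshot."""
            cid = len(self.commits)
            parents = [] if self.head is None else [self.head]
            self.commits[cid] = Commit(parents, set(added), set(removed))
            self.snapshot = (self.snapshot - set(removed)) | set(added)
            self.head = cid
            return cid

        def at(self, cid):
            """Time-traveling read: replay diffs from the root to cid."""
            chain = []
            while cid is not None:
                commit = self.commits[cid]
                chain.append(commit)
                cid = commit.parents[0] if commit.parents else None
            quads = set()
            for commit in reversed(chain):
                quads = (quads - commit.removed) | commit.added
            return quads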
Enough bragging about me. I won't reply to all the messages of the thread
one by one, but:
Here is what SHOULD BE possible:
- incremental dumps
- time-traveling queries
- full dumps
- federation of Wikibase instances, since the data is stored in a Git-like
history and git pull / git push equivalents are planned on the roadmap
- online editing of the quadstore
Access control lists are not designed yet; I expect this to be enforced
by the application layer.
I planned to start working on a data management system (something like
CKAN) with a search feature, but I would gladly work with Wikimedia instead.
Also, given that it is modeled after Git, one can build merge-request-like
features, for example for the massive imports that are currently crippled.
What I would need are query logs (read and write), possibly with timings,
to run benchmarks.
Maybe I should ask Wikimedia for funding?
FWIW, I got results 2 times faster than Blazegraph on a microbenchmark.
> Hoi,
> Wikidata grows like mad. This is something we all experience in the really
> bad response times we are suffering. It is so bad that people are being
> asked what kind of updates they are running, because it makes a difference
> in the lag times.
>
> Given that Wikidata is growing like a weed, it follows that there are two
> issues. Technical: what is the maximum the current approach supports, and
> how long will it last us? Fundamental: what funding is available to
> sustain Wikidata?
>
> For the financial guys, growth like Wikidata is experiencing is not
> something you can reliably forecast. As an organisation we have more money
> than we need to spend, so there is no credible reason to be stingy.
>
> For the technical guys: consider our growth and plan for at least one
> year ahead. When the impression exists that the current architecture will
> not scale beyond two years, start a project to future-proof Wikidata.
>
> It will grow and the situation will get worse before it gets better.
> Thanks,
> GerardM
>
> PS I know about the Phabricator tickets; they do not give the answers to
> the questions we need to address.
>
Dear Sir,
I thank you for your efforts. When dealing with biomedical taxonomic statements in Wikidata, we found similar deficiencies. I have already decided to write a paper about the biomedical taxonomy of Wikidata and how to adjust it. I would be honoured if you would be the first author of the work. You have already extracted the taxonomic statements, so you can easily filter the biomedical ones. This work has already been done for other taxonomies such as SNOMED-CT (https://scholar.google.ca/citations?user=UsG8QFwAAAAJ&hl=fr&oi=sra, https://scholar.google.ca/citations?user=c4LlYxsAAAAJ&hl=fr&oi=sra, https://scholar.google.ca/citations?user=jVLGHGQAAAAJ&hl=fr&oi=sra, https://scholar.google.ca/citations?user=fBAvwi4AAAAJ&hl=fr&oi=sra). I will be available online for further discussion if you agree to work with our team. This will be simple.
Yours Sincerely,
Houcemeddine Turki (he/him)
Medical Student, Faculty of Medicine of Sfax, University of Sfax, Tunisia
Undergraduate Researcher, UR12SP36
GLAM and Education Coordinator, Wikimedia TN User Group
Member, WikiResearch Tunisia
Member, Wiki Project Med
Member, WikiIndaba Steering Committee
Member, Wikimedia and Library User Group Steering Committee
Co-Founder, WikiLingua Maghreb
Founder, TunSci
____________________
+21629499418
-------- Original message --------
From: Gabriel Altay <gabriel.altay(a)gmail.com>
Date: 2019/06/15 23:05 (GMT+01:00)
To: Discussion list for the Wikidata project <wikidata(a)lists.wikimedia.org>
Subject: Re: [Wikidata] instance of, subclass of, oh my
Thanks Jan, I will pursue the badminton discussion on the talk page.
On Sat, Jun 15, 2019 at 5:49 PM Jan Ainali <jan(a)aina.li> wrote:
Hello Gabriel,
I agree with you about the badminton tournaments; that seems odd. There already appears to be a discussion about that on the talk page of the only participant in the badminton project: https://www.wikidata.org/wiki/User_talk:Florentyna#subclass_of:_badminton_t…
Perhaps it is best to continue the discussion there?
/Jan Ainali
http://ainali.com
Hello everyone,
I was playing around with a recent wikidata dump and extracted the items
that "looked" like classes based on the definition here,
https://www.wikidata.org/wiki/Wikidata:WikiProject_Ontology/Classes
Specifically, an item is a class-item if any of the following are true,
* the item is the value of a P31 ("instance of") statement
* the item has a P279 ("subclass of") statement (subclass)
* the item is the value of a P279 ("subclass of") statement (superclass)
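In case it is useful, this is roughly the pass I ran over the dump (a
simplified sketch, not my exact code; it assumes the standard dump layout
of one JSON entity per line inside a top-level array):

    import bz2
    import json

    class_items = set()

    with bz2.open("wikidata-20190603-all.json.bz2", "rt") as dump:
        for line in dump:
            line = line.strip().rstrip(",")
            if not line or line in ("[", "]"):
                continue
            entity = json.loads(line)
            claims = entity.get("claims", {})
            # the item has a P279 ("subclass of") statement
            if "P279" in claims:
                class_items.add(entity["id"])
            # the item is the value of a P31 or P279 statement
            for prop in ("P31", "P279"):
                for statement in claims.get(prop, []):
                    datavalue = statement["mainsnak"].get("datavalue")
                    if datavalue:
                        class_items.add(datavalue["value"]["id"])

    print(len(class_items), "class-items")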
Once I extracted all items that met these criteria (2,399,621 items
from wikidata-20190603-all.json.bz2), I started examining the results. One
of the things I found slightly surprising is that there are about 23k
badminton events that are classes because they have "subclass of
https://www.wikidata.org/wiki/Q13357858" statements. SPARQL query below.
https://query.wikidata.org/#SELECT%20%3Fitem%20%3FitemLabel%20%0AWHERE%20%0…
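In case that link gets mangled by mail clients, the query was essentially
of this shape (paraphrased here, run through Python):

    import requests

    # Items that are direct subclasses of Q13357858.
    QUERY = """
    SELECT ?item ?itemLabel WHERE {
      ?item wdt:P279 wd:Q13357858 .
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    }
    """

    response = requests.get(
        "https://query.wikidata.org/sparql",
        params={"query": QUERY, "format": "json"},
    )
    for row in response.json()["results"]["bindings"]:
        print(row["item"]["value"], row["itemLabel"]["value"])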
It also looks like there is a badminton project page,
https://www.wikidata.org/wiki/Category:WikiProject_Badminton
https://www.wikidata.org/wiki/Wikidata:WikiProject_Badminton/Subclass
I'd like to remove these statements as it seems that a particular instance
of a badminton tournament
https://www.wikidata.org/wiki/Q121940
is not a class.
It seems that this pattern is also in place for about 1,000,000 items which
are instances of gene (e.g. https://www.wikidata.org/wiki/Q40108).
I had a couple of questions for the mailing list,
1) Do folks know if there is an active group working on the Wikidata
ontology?
2) I've read a few messages about shape expressions. Would it be
worthwhile to set up a shape expression that prevents most items from
having both "instance of" and "subclass of" statements?
3) If these entries are generated by bots, what is the best way to get in
touch with the owner? Their user talk page?
I am probably missing a lot of information about what has been done so far
in the community, but I'm happy to read anything someone points me towards.
best,
-Gabriel