Hello all,
Our team is refactoring some of the code around change tags on Recent Changes.
This may impact people using the database replicas on Toolforge.
Currently, the tags are stored in the change_tag table, in the column ct_tag.
In the coming days, we will add a column ct_tag_id holding a unique identifier
for each tag, together with a new table, change_tag_def, that will store the
tag ID, the message, and further information such as how many times the tag
is used on the local wiki.
In the long term, we plan to drop the column ct_tag, since tags will then be
identified by ct_tag_id.
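For Toolforge users, the practical change is in how tag lookups are written. Here is a minimal before/after sketch using SQLite; the change_tag_def column names (ctd_id, ctd_name, ctd_count) are guesses based on this description, not the authoritative MediaWiki schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Simplified sketch of the tables described above (assumed column names).
cur.execute("CREATE TABLE change_tag (ct_rc_id INTEGER, ct_tag TEXT, ct_tag_id INTEGER)")
cur.execute("CREATE TABLE change_tag_def (ctd_id INTEGER PRIMARY KEY, ctd_name TEXT, ctd_count INTEGER)")

cur.execute("INSERT INTO change_tag_def VALUES (1, 'mobile edit', 2)")
cur.execute("INSERT INTO change_tag VALUES (100, 'mobile edit', 1)")
cur.execute("INSERT INTO change_tag VALUES (101, 'mobile edit', 1)")

# Old-style query, relying on the ct_tag text column (to be dropped):
old = cur.execute(
    "SELECT ct_rc_id FROM change_tag WHERE ct_tag = 'mobile edit' "
    "ORDER BY ct_rc_id"
).fetchall()

# New-style query, joining change_tag_def via ct_tag_id instead:
new = cur.execute(
    "SELECT ct.ct_rc_id FROM change_tag ct "
    "JOIN change_tag_def ctd ON ct.ct_tag_id = ctd.ctd_id "
    "WHERE ctd.ctd_name = 'mobile edit' "
    "ORDER BY ct.ct_rc_id"
).fetchall()

assert old == new  # both queries find recent-changes rows 100 and 101
```

Tools that filter on ct_tag directly are the ones that will need the join once the column is dropped.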
This change will happen on:
- French Wikipedia: Monday July 2nd
- All other wikis: from July 9th
If you run into any problem (trouble saving edits, slowdown of Recent
Changes…), please create a subtask of T185355
<https://phabricator.wikimedia.org/T185355> or contact Ladsgroup
<https://www.wikidata.org/wiki/User:Ladsgroup>.
Cheers,
--
Léa Lacroix
Project Manager Community Communication for Wikidata
Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter
der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das Finanzamt für
Körperschaften I Berlin, Steuernummer 27/029/42207.
*This change impacts people running bots and semi-automated tools to edit
Wikidata.*
Hello all,
Based on the previous discussions around the limitation set up to fix the
significant dispatch lag on clients, we have come up with a new solution
to try.
The database behind Wikidata is replicated to several other database
servers. At each edit, the changes are replicated to these other servers.
There is always a short lag, which is usually less than a second. If this
lag is too high, the other databases can’t synchronize correctly, which can
cause problems for reading and editing Wikidata, or reusing data on other
projects.
If the lag is too high on too many servers, the master database stops
accepting new edits. When the lag is close to the limit, the system
prioritizes human edits and rejects the edits from bots, sending back an
error. This limit is set by the maxlag option in the API.
People writing bots can set a maxlag value for their bot; the default
value is 5. This number is checked against two things: the replication
lag between the master database and its replicas, and the size of the
job queue.
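For bot authors, this usually comes down to passing the maxlag parameter with each API request and backing off when the API rejects the edit. A rough sketch (the helper names are ours; actually sending the request and the retry loop are left out):

```python
from urllib.parse import urlencode

def build_edit_request(params, maxlag=5):
    """Return a MediaWiki API request URL with a maxlag value attached.
    (Only the query string is built here; sending the request is omitted.)"""
    full = dict(params)
    full["maxlag"] = maxlag
    return "https://www.wikidata.org/w/api.php?" + urlencode(full)

def is_maxlag_error(response):
    """True if a parsed JSON API response is a maxlag rejection;
    a bot should wait a bit and retry in that case."""
    return response.get("error", {}).get("code") == "maxlag"

url = build_edit_request({"action": "edit", "format": "json"})
# A response shaped like this means: back off, then retry the edit.
rejected = {"error": {"code": "maxlag", "info": "Waiting for replicas..."}}
```

Raising the maxlag value makes a bot more tolerant of lag; lowering it makes the bot yield earlier to human editors.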
*On Tuesday, July 3rd, maxlag will also take the dispatch lag between
Wikidata and its clients (e.g. Wikipedias) into account.*
The dispatch lag is the latency between an edit on Wikidata and the moment
when it’s shown on clients. Its median value is around 2 minutes.
*If you’re running a bot with the standard configuration (maxlag=5) and the
median dispatch lag exceeds 300 seconds, your bot's edits won't be saved
and the API will return an error.*
If this change impacts your work too much, please let us know by leaving a
comment on this ticket <https://phabricator.wikimedia.org/T194950>.
This is also where you can ask any questions. You can also change your
configuration to increase the maxlag limit.
More information: Wikidata dispatch Grafana board
<https://grafana.wikimedia.org/dashboard/db/wikidata-dispatch?refresh=1m&org…>
Thanks for your constructive feedback,
--
Léa Lacroix
Project Manager Community Communication for Wikidata
Hi!
I am currently working on Lexeme fulltext search. One unclear point I
have encountered is how to display Lexemes as search results. I am
working on the assumption that we want to match both Lemmas and Forms
(please tell me if I'm wrong). Given a match, I plan to display a Lemma
match like this:
title (LN)
Synthetic description
e.g.
color/colour (L123)
English noun
Meaning, the first line with the link would be the standard lexeme link
generated by the Lexeme code (which also deals with multiple lemmas), and
the description line is the generated description of the Lexeme, just like
in completion search. The problem here, however, is that since the link is
generated by the Lexeme code, which has no idea about search, we cannot
properly highlight it. This can probably be solved with some trickery,
e.g. locating the search matches inside the generated string and
highlighting them, but first I'd like to make sure this is how it should look.
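The trickery mentioned above, locating the search matches inside the generated string after the fact, could look roughly like this (a sketch; the actual highlighting markup used by the search UI is an assumption):

```python
import re

def highlight(generated, matched_terms, open_tag="<em>", close_tag="</em>"):
    """Wrap each case-insensitive occurrence of a matched term in the
    display string produced by the Lexeme code, without touching the
    generation step itself."""
    if not matched_terms:
        return generated
    pattern = "|".join(re.escape(t) for t in matched_terms)
    return re.sub(
        f"({pattern})",
        lambda m: open_tag + m.group(1) + close_tag,
        generated,
        flags=re.IGNORECASE,
    )

print(highlight("color/colour (L123)", ["colour"]))
# -> color/<em>colour</em> (L123)
```

The obvious caveat is that post-hoc substring matching can highlight accidental occurrences that the search engine did not actually match.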
Displaying a Form (representation) match is trickier. I could display the
same as above here, but I feel this might be confusing.
Another option is to display the Form data, e.g. for "colors":
color/colour (L123)
colors: plural for color (L123): English noun
The description line features the matched Form's representation and a
synthetic description for this Form. Right now the matched part is not
highlighted, because it would otherwise always be highlighted, as it is
taken from the match itself; so I am not sure whether it should be or not.
So, does this display look like what we want to produce for Lexemes? Is
there anything that needs to be changed or improved? I would like to hear
some feedback.
Thanks,
--
Stas Malyshev
smalyshev(a)wikimedia.org
Hello all,
Wikidata's wb_terms database table is replicated on Toolforge, and people
can build scripts, tools, etc. using this data.
As we're considering phasing out this database table, we want to understand
what data is used and how, so we can offer some reasonable replacement(s)
to its users.
If you are the author of a tool using the wb_terms replica, or use the
replica in any other way, please provide some basic information on what you
are using and how, preferably by adding a comment to this Phabricator task
<https://phabricator.wikimedia.org/T197161>.
Example:
- Usage: I have created a tool that finds Wikidata items that have an
English label, but are missing a label in my native language
- What data I use: I query label data from wb_terms, and then process the
results in my tool to find the gaps.
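As an illustration of the kind of gap-finding query described in the example, here is a sketch using SQLite with a reduced version of wb_terms (the column names term_full_entity_id, term_language, term_type, term_text follow our reading of the replica; check the actual table definition on Toolforge):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Reduced sketch of wb_terms (assumed subset of columns).
cur.execute(
    "CREATE TABLE wb_terms (term_full_entity_id TEXT, "
    "term_language TEXT, term_type TEXT, term_text TEXT)"
)
rows = [
    ("Q1", "en", "label", "universe"),
    ("Q1", "fr", "label", "univers"),
    ("Q2", "en", "label", "Earth"),  # no French label: this is a "gap"
]
cur.executemany("INSERT INTO wb_terms VALUES (?, ?, ?, ?)", rows)

# Items with an English label but no label in the target language (fr here):
gaps = cur.execute("""
    SELECT en.term_full_entity_id
    FROM wb_terms en
    WHERE en.term_language = 'en' AND en.term_type = 'label'
      AND NOT EXISTS (
        SELECT 1 FROM wb_terms t
        WHERE t.term_full_entity_id = en.term_full_entity_id
          AND t.term_language = 'fr' AND t.term_type = 'label')
""").fetchall()
assert gaps == [("Q2",)]
```

If your tool runs queries of this shape, describing them on the task is exactly the kind of information that helps us design a replacement.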
We haven't set any hard deadline for this survey, but it would be great to
have an overview of the existing usage of the replica by the end of June
2018. If you have any related questions, do not hesitate to ask on
Phabricator.
Cheers,
--
Léa Lacroix
Project Manager Community Communication for Wikidata
Hello Wikicommunity,
What attributes are we seeking to output, just hyperlinks? Ideally, the
output should be in tabular format for further analysis.
~ Sikeyboi
1. OK: after an hour, I have given up trying to find good documentation on
querying data types with SPARQL, so I'm asking the experts for assistance.
Many properties have their data type locked to URL, as shown on this wiki
maintenance page:
https://www.wikidata.org/wiki/Category:Properties_with_url-datatype
However, I don't quite understand how to get at the data type itself in
SPARQL.
https://query.wikidata.org/#%23Subproperties%20of%20URL%20%28P2699%29%0ASEL…
2. Should equivalent property <https://www.wikidata.org/wiki/Property:P1628>
and equivalent class <https://www.wikidata.org/wiki/Property:P1709> both be
subproperties of URL? I have applied this assumption to both, but I wonder
whether I missed some other way of expressing that, or whether it's not
needed at all since the data type is already set; either way, I'm having
difficulty querying it, as described in 1.
I'm looking to query for subjects having a statement whose property has the
data type URL, filtered by contains("world") (don't ask why, hahaha).
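One way to get at a property's data type in SPARQL, if I understand the Wikibase RDF mapping correctly, is through the wikibase:propertyType predicate. The sketch below only builds the WDQS request without sending it; the query string is the interesting part:

```python
from urllib.parse import urlencode

# Properties whose data type is URL: in the Wikibase RDF mapping, each
# property entity carries a wikibase:propertyType statement, which points
# at wikibase:Url for URL-typed properties.
query = """
SELECT ?property ?propertyLabel WHERE {
  ?property wikibase:propertyType wikibase:Url .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""

# Build the WDQS request URL (not sent here).
url = "https://query.wikidata.org/sparql?" + urlencode(
    {"query": query, "format": "json"}
)
```

From there, filtering statement values with FILTER(CONTAINS(STR(?value), "world")) should narrow things down to the "world" case mentioned above.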
Thanks in advance for direction and help,
-Thad