Hi all,
The execution plan for this query [1] indicates that it will be entirely
based on existing indexes. Yet in practice, the query takes a long time to
run. Why is that?
Also, interestingly, the execution plan for this modified version [2] is
identical, but it runs almost instantaneously.
Lastly, I expect the latter query to return no results (because the NOT
EXISTS condition should not be met), but it does return a row. Why is that?
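The full queries are behind shortened links, so as a generic illustration
only (toy SQLite tables, not the real replica schema): a correlated NOT
EXISTS keeps every outer row for which the subquery finds no match, so a
single unmatched row is enough for the query to return a result.

```python
import sqlite3

# Toy stand-in (not the real replica schema): NOT EXISTS keeps every
# outer row for which the correlated subquery finds no match, so one
# unmatched row is enough for the query to return results.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page  (page_id INTEGER, page_title TEXT);
    CREATE TABLE links (from_id INTEGER);
    INSERT INTO page  VALUES (1, 'A'), (2, 'B');
    INSERT INTO links VALUES (1);
""")
rows = conn.execute("""
    SELECT p.page_title
    FROM page p
    WHERE NOT EXISTS (SELECT 1 FROM links l WHERE l.from_id = p.page_id)
""").fetchall()
print(rows)  # [('B',)] -- page B has no matching links row
```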
Thanks!
Huji
[1]
https://tools.wmflabs.org/sql-optimizer?use=enwiki_p&sql=use+fawiki_p%3B%0D…
[2]
https://tools.wmflabs.org/sql-optimizer?use=fawiki_p&sql=select%0D%0A++p1.p…
Cross-posting from the mediawiki-api-announce(a)lists.wikimedia.org list.
---------- Forwarded message ---------
From: Brad Jorsch (Anomie) <bjorsch(a)wikimedia.org>
Date: Fri, Jun 21, 2019 at 8:30 AM
Subject: [Mediawiki-api-announce] BREAKING CHANGE: Improved timestamp support
To: <mediawiki-api-announce(a)lists.wikimedia.org>
An upgrade to the timestamp library used by MediaWiki is resulting in
two changes to the handling of timestamp inputs to the action API.
There will be no change to timestamps output by the API.
All of these changes should be deployed to Wikimedia wikis with 1.34.0-wmf.10.
Historically MediaWiki has ignored timezones in supported formats that
include timestamps, treating them as if the timezone specified were
UTC. In the future, specified timezones will be honored (and converted
to UTC).
Historically some invalid formats were accepted, such as
"2019-05-22T12:00:00.....1257" or "Wed, 22 May 2019 12:00:00 A
potato". Due to improved validation, these will no longer be accepted.
Support for ISO 8601 and other formats has also been improved. See
https://www.mediawiki.org/wiki/Timestamp for details on the formats
that will be supported.
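As an illustration of the new behavior (a sketch, not MediaWiki's actual
implementation), an input timestamp carrying an explicit offset is now
converted to UTC rather than having the offset silently ignored:

```python
from datetime import datetime, timezone

# Sketch of the new behavior, not MediaWiki's actual code: an explicit
# timezone offset is honored and converted to UTC.
ts = datetime.fromisoformat("2019-05-22T12:00:00+02:00")
utc = ts.astimezone(timezone.utc)
print(utc.strftime("%Y-%m-%dT%H:%M:%SZ"))  # 2019-05-22T10:00:00Z
# Previously MediaWiki would have treated this input as 12:00:00 UTC.
```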
_______________________________________________
Mediawiki-api-announce mailing list
Mediawiki-api-announce(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-api-announce
_______________________________________________
Wikimedia Cloud Services announce mailing list
Cloud-announce(a)lists.wikimedia.org (formerly labs-announce(a)lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/cloud-announce
I filed a bug report on Toolforge:
https://phabricator.wikimedia.org/T226088
On Wed, Jun 19, 2019 at 9:29 AM <boghog(a)me.com> wrote:
> Thanks for your replies.
>
> I contacted the NIH to register the citation-template-filling tool and my
> e-mail address and added both as parameters to the query URL (i.e.,
> “&email=my@email” and “&tool=citation-filling-tool”) to the PubMedLite.pm
> module that the citation-template-filling tool calls. They also assured me that
> “tools.wmflabs.org” has not been blocked from accessing the NIH site.
>
> I have done a few more tests:
>
> curl '
> https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=12345
> '
> curl: (6) Could not resolve host: eutils.ncbi.nlm.nih.gov
>
> does not work on the tool server, but does work on my local workstation.
> The following
>
> curl -k '
> https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=12345
> '
> curl: (6) Could not resolve host: eutils.ncbi.nlm.nih.gov
>
> also does not work. But
>
> curl -k '
> https://130.14.29.110/entrez/eutils/efetch.fcgi?db=pubmed&id=12345'
>
> works.
>
> I am not sure what to do next. Is the problem on the “tools.wmflabs.org”
> or “eutils.ncbi.nlm.nih.gov” side? Also what precisely is the problem, so
> that I can accurately describe the issue to server administrators?
>
> Thanks,
>
> Boghog
>
> On Jun 14, 2019, at 11:07 PM, Magnus Manske via Cloud <
> cloud(a)lists.wikimedia.org> wrote:
>
> FWIW, I am running a service that queries eutils.
>
> They have a rate limitation, but I do stay under that, AFAICT. Maybe
> multiple bots from the same host, together, triggered a rate limiting
> mechanism?
>
> On Fri, Jun 14, 2019 at 5:56 PM Alex Monk <krenair(a)gmail.com> wrote:
>
>> I looked at `dig eutils.ncbi.nlm.nih.gov +trace` and at `strace -f dig
>> eutils.ncbi.nlm.nih.gov +trace` - it looks to me like NIH's nameservers
>> are not willing to serve labs?
>>
>>
>> On Fri, 14 Jun 2019 at 05:59, Konrad Koehler via Cloud <
>> cloud(a)lists.wikimedia.org> wrote:
>>
>>> The following tool has been running without problem for years:
>>>
>>> https://tools.wmflabs.org/citation-template-filling/cgi-bin/index.cgi
>>>
>>> Recently the tool has started generating the following error message:
>>>
>>> Can't call method "findnodes" on an undefined value at
>>> /data/project/citation-template-filling/perl/ActivePerl-5.26/site/lib/WWW/Search/PubMedLite.pm
>>> line 117.
>>>
>>> PubMedLite.pm in turn makes a request to
>>> “https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi” for data.
>>> For debugging, I queried the entrez server using an equivalent curl
>>> command, which generated the following error message:
>>>
>>> curl '
>>> https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=1234…
>>> '
>>>
>>> curl: (6) Could not resolve host: eutils.ncbi.nlm.nih.gov
>>>
>>> I also tried replacing “eutils.ncbi.nlm.nih.gov” with its IP address:
>>>
>>> curl 'https://130.14.29.110/entrez/eutils/efetch.fcgi?db=pubmed&id=12345
>>> '
>>>
>>> curl: (51) SSL: no alternative certificate subject name matches target
>>> host name '130.14.29.110'
>>>
>>> The same curl command runs without problem on my local workstation:
>>>
>>> curl '
>>> https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=1234…
>>> ’
>>>
>>> <!DOCTYPE PubmedArticleSet PUBLIC "-//NLM//DTD PubMedArticle, 1st
>>> January 2019//EN" "
>>> https://dtd.nlm.nih.gov/ncbi/pubmed/out/pubmed_190101.dtd"> …
>>>
>>> Anyone have an idea how to fix this? Thanks,
>>>
>>> Boghog
>>> _______________________________________________
>>> Wikimedia Cloud Services mailing list
>>> Cloud(a)lists.wikimedia.org (formerly labs-l(a)lists.wikimedia.org)
>>> https://lists.wikimedia.org/mailman/listinfo/cloud
Hello all,
Two things:
1) On Thursday, June 13th at 18:00 UTC (11am Pacific) there will be
open office hours for those of you who would like to share your thoughts
on the event: topics you'd like to see discussed there, decisions you'd
like made, etc.
It will occur using Google Meet, at this url:
https://meet.google.com/exz-zxfy-nuj
If you can't make it to these office hours, don't fret! You can always
(continue to) share your thoughts on the Phabricator task:
https://phabricator.wikimedia.org/T220212
2) REMINDER: The deadline for participant/attendee nominations is this
coming Monday, June 17th. Remember, you can nominate others or
yourself. And you can fill out the form as many times as you have
nominations.
Form: https://forms.gle/CLeGFSMiEasJgEU27
FAQ: https://www.mediawiki.org/wiki/Wikimedia_Technical_Conference/2019/FAQ
This survey is conducted via a third-party service, which may make it
subject to additional terms. For more information on privacy and
data-handling, see this survey privacy statement:
https://foundation.wikimedia.org/wiki/Wikimedia_Technical_Conference_Survey…
On behalf of the Technical Conference Program Committee,
Greg
On Wed, May 29, 2019 at 04:39:37PM -0700, Greg Grossmeier wrote:
> Hello all,
>
> As you may have seen, the next Wikimedia Technical Conference[0] is
> coming up in November 2019.
>
> It will take place November 12-15th in Atlanta, GA (USA). As announced
> at the Hackathon and documented on-wiki[1] this year's event will
> focus on the topic of "Developer Productivity".
>
> Like last year, we are looking for diverse stakeholders, perspectives,
> and experiences that will help us to make informed decisions. We need
> people who can create and architect solutions, as well as those who
> will make funding and prioritization decisions for the projects.
>
> See the FAQ for (hopefully) any questions you have:
> <https://www.mediawiki.org/wiki/Wikimedia_Technical_Conference/2019/FAQ>
>
> Please fill out the survey using this link to nominate yourself or someone
> else to attend: <https://forms.gle/CLeGFSMiEasJgEU27>
>
> This survey is conducted via a third-party service, which may make it
> subject to additional terms. For more information on privacy and
> data-handling, see this survey privacy statement:
> <https://foundation.wikimedia.org/wiki/Wikimedia_Technical_Conference_Survey…>
>
> This nomination form will remain open between May 29 and June 17, 2019.
>
> If you have any questions, please post them on the event's talk page
> <https://www.mediawiki.org/wiki/Talk:Wikimedia_Technical_Conference/2019>.
>
> Thanks!
>
> Greg and the Technical Conference 2019 Program Committee
>
> [0] <https://www.mediawiki.org/wiki/Wikimedia_Technical_Conference/2019>
> [1] <https://www.mediawiki.org/wiki/Wikimedia_Technical_Conference/2019#Vision_S…>
>
> --
> Greg Grossmeier
> Release Team Manager
--
| Greg Grossmeier GPG: B2FA 27B1 F7EB D327 6B8E |
| Release Team Manager A18D 1138 8E47 FAC8 1C7D |
The actor and comment views on the wiki replicas are slowed by the need
to make subqueries against 8 other tables in order to determine which
rows should and should not be visible on the replica service. With
recent changes to the replica view schema, this problem has become much
more visible.
The WMCS team has deployed a set of specialized views of these two
tables so that individual queries are only slowed by a single subquery
against a related target. For example, a query for an actor mentioned in
the log_actor field of the logging table could be made against
actor_logging, which only checks the visibility logic for the actor
table, not the 7 other tables unrelated to the query. On the flip side,
the actor_logging view only contains rows that are exposed in the
logging table.
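Conceptually, the specialized view behaves like the sketch below (using
simplified SQLite stand-in tables and column names, not the actual
wiki-replica view definitions):

```python
import sqlite3

# Simplified stand-in for the wiki-replica schema: actor_logging exposes
# only the actor rows referenced from the logging table, so the
# visibility check involves a single subquery instead of eight.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE actor   (actor_id INTEGER, actor_name TEXT);
    CREATE TABLE logging (log_actor INTEGER, log_action TEXT);
    INSERT INTO actor   VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO logging VALUES (1, 'delete');
    CREATE VIEW actor_logging AS
        SELECT a.* FROM actor a
        WHERE EXISTS (SELECT 1 FROM logging l WHERE l.log_actor = a.actor_id);
""")
print(db.execute("SELECT actor_name FROM actor_logging").fetchall())
# [('Alice',)] -- Bob has no logging row, so the view filters him out
```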
For more information, see: https://wikitech.wikimedia.org/wiki/News/Actor_storage_changes_on_the_Wiki_…
If other documentation about the Wiki Replicas on Wikitech needs
updating related to this change, we would like your help finding it!
Please let us know via IRC, a Phabricator task, email, or on-wiki if you
find things that need updating related to the actor and comment tables.
A Phabricator task is already open to update the related MediaWiki
documentation (https://phabricator.wikimedia.org/T225007), but it is
likely that there are bits around Wikitech to update as well.
Brooke Storm
Operations Engineer
Wikimedia Cloud Services
bstorm(a)wikimedia.org
IRC: bstorm_
Hello,
Some Toolforge SQL queries have much worse performance when updated for the
new comment table.
For example, see one previous SQL query
<https://tools.wmflabs.org/sql-optimizer?use=enwiki_p&sql=SELECT%0D%0A++++++…>
and its updated version
<https://tools.wmflabs.org/sql-optimizer?use=enwiki_p&sql=SELECT%0D%0A++++++…>
which just adds a join on the comment table. The change adds ten dependent
subqueries and increases Stalktoy load times from ≈15 seconds to ≈245
seconds, and the same change for another query was enough to have my tools
temporarily rate-limited <https://phabricator.wikimedia.org/T217853>.
I haven't found a way to update those queries efficiently. Is there an
optimisation I'm missing? Why does the comment view need subqueries on ten
other tables, and is there an alternate version without the subqueries for
cases where we just need to join by ID?
--
Jesse Plamondon-Willard (Pathoschild)
Hello,
What can be done to decrease the replication lag on
web.db.svc.eqiad.wmflabs? Tools relying on it are severely affected and
end up with outdated results.
Regards.
Hi all,
I have a query that used to do a GROUP BY on rev_user and now I have
migrated it to do so using rev_actor instead. The latest version looks like
this <https://pastebin.com/jWTzsiJY> and its performance is not good
<https://tools.wmflabs.org/sql-optimizer?use=fawiki_p&sql=SELECT%0D%0A++acto…>
at all. In fact, I left it running for fawiki for several hours and it did
not generate any results (it used to generate results in about 30-40
minutes). Also, the execution plan that is shown in the second link
involves some tables like filearchive, image and oldimage that are not part
of the query.
Is this a side effect of the revision_actor_temp table?
Also, what is that "key0" about, which is not associated with any tables?
Finally, any advice on how to make it more efficient is highly appreciated.
Thanks!
Huji
Similar to the earlier removal of text fields from the wiki replicas
for the comment storage refactor in MediaWiki, we are going to remove
the “user text” columns that are deprecated in the MediaWiki schema from
the views, to prepare for when they will actually be removed upstream.
The column drops are tracked and explained at
https://phabricator.wikimedia.org/T223406. The tables with names such as
<tablename>_compat will not see a difference in structure. The change is
scheduled for Monday, May 27th.
The fields that are dropping from the views are:
revision: rev_user and rev_user_text.
archive: ar_user and ar_user_text.
ipblocks: ipb_by and ipb_by_text.
image: img_user and img_user_text.
oldimage: oi_user and oi_user_text.
filearchive: fa_user and fa_user_text.
recentchanges: rc_user and rc_user_text.
logging: log_user and log_user_text.
Ideally, tools that connect to the replicas should gather the
information from the appropriate entries in the actor table instead;
again, this is similar to the change for the comment table. The data is
already there for you to start using. The alternative is to use the
related <tablename>_compat table, which won’t be changing in a
user-visible way at this time.
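A rough sketch of the intended pattern (simplified stand-in tables; on
the replicas the actor ID for revisions may live in revision_actor_temp
rather than directly on the revision row, so check the current view
definitions before copying this):

```python
import sqlite3

# Simplified sketch, not the exact replica views: instead of reading the
# dropped rev_user_text column, join the actor table on the actor ID.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE actor    (actor_id INTEGER, actor_name TEXT);
    CREATE TABLE revision (rev_id INTEGER, rev_actor INTEGER);
    INSERT INTO actor    VALUES (7, 'ExampleUser');
    INSERT INTO revision VALUES (100, 7);
""")
row = db.execute("""
    SELECT r.rev_id, a.actor_name
    FROM revision r
    JOIN actor a ON a.actor_id = r.rev_actor
""").fetchone()
print(row)  # (100, 'ExampleUser')
```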
Brooke Storm
Operations Engineer
Wikimedia Cloud Services
bstorm(a)wikimedia.org
IRC: bstorm_