Thanks for sharing.
This gives a nice analysis, going from data to insights. How do we drive
action from this report?
Do we plan to use this data to build better tools?
For example, we could document common pitfalls and how to avoid them, such
as searching for Library of Congress links with a regex search instead of
the external links query
(https://en.wikipedia.org/w/api.php?action=help&modules=query%2Bextlinks),
and similarly using iwlinks for interwiki links.
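To make the alternative concrete, here is a minimal sketch of how a tool might build these API requests instead of running a regex search. It only constructs the request URLs and does not send them. The helper names and the page titles in the usage note are hypothetical, and the parameters (elquery, ellimit, iwprefix, iwlimit) are the ones I understand the extlinks and iwlinks modules to accept, so please check them against the API help page before relying on this.

```python
from urllib.parse import urlencode

# Endpoint taken from the help link above; other wikis have their own api.php.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def extlinks_query_url(titles, url_fragment, limit=50):
    """Build a prop=extlinks request for the given pages (hypothetical helper).

    url_fragment is passed as elquery, a search string without the protocol,
    so only external links matching it are returned.
    """
    params = {
        "action": "query",
        "format": "json",
        "prop": "extlinks",
        "titles": "|".join(titles),
        "elquery": url_fragment,
        "ellimit": limit,
    }
    return API_ENDPOINT + "?" + urlencode(params)


def iwlinks_query_url(titles, prefix, limit=50):
    """Build a prop=iwlinks request for the given pages (hypothetical helper).

    prefix is passed as iwprefix, so only interwiki links with that prefix
    are returned.
    """
    params = {
        "action": "query",
        "format": "json",
        "prop": "iwlinks",
        "titles": "|".join(titles),
        "iwprefix": prefix,
        "iwlimit": limit,
    }
    return API_ENDPOINT + "?" + urlencode(params)
```

For instance, extlinks_query_url(["Abraham Lincoln"], "loc.gov") would ask for the loc.gov links on that one page, and iwlinks_query_url(["Abraham Lincoln"], "commons") would do the same for interwiki links with the commons prefix. Both queries read the link tables rather than scanning wikitext.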
Such guidance could even be actively pushed to tools (either by using the
User-Agent to contact the tool developers, or by adding warnings to the
API result).
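As a rough illustration of the User-Agent route, the sketch below pulls contact details out of a User-Agent string. It assumes tools follow the convention of putting a contact URL or email address in their User-Agent. The function name, the deliberately loose patterns, and the sample string are all hypothetical, and nothing here reflects how the logs are actually processed.

```python
import re

# Deliberately loose patterns; good enough for triage, not for validation.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
URL_RE = re.compile(r"https?://[^\s;()]+")


def contact_hints(user_agent):
    """Return any email addresses and URLs found in a User-Agent string."""
    return {
        "emails": EMAIL_RE.findall(user_agent),
        "urls": URL_RE.findall(user_agent),
    }


# Hypothetical User-Agent, made up for this example.
sample = "ExampleBot/1.0 (https://example.org/examplebot; examplebot@example.org)"
hints = contact_hints(sample)
```

A User-Agent with no contact details simply yields two empty lists, which is where warnings in the API result would be the only remaining channel.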
On Wed, May 30, 2018 at 11:51 PM, Trey Jones <tjones(a)wikimedia.org> wrote:
Hey everyone,
As part of T195491 <https://phabricator.wikimedia.org/T195491>, Erik has
been looking into the details of our regex processing and ways to handle
ridiculously long-running regex queries. He pulled all the regex queries
over the last 90 days to get a sense of what features people are using and
what impact certain changes he was considering would have on users. Turns
out there are a lot more users than I would have thought—which is good
news! And a lot of them look like bots.
He also made the mistake of pointing me to the data and highlighting a
common pattern—searches for interwiki links. I couldn't help myself—I
started digging around and found that the majority of the searches are looking
for those interwiki links, and the vast majority of regex searches fall
into three types—interwiki links, URLs, and Library of Congress collection
IDs.
Overall, there are 5,613,506 regexes total across all projects and all
languages, over a 90-day period. That comes out to ~62K/day—which is a lot
more than I'd expected, though I hadn't thought about bots using regexes.
Read more on MediaWiki:
<https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Survey_of_Regular_Expression_Searches>
—Trey
Trey Jones
Sr. Software Engineer, Search Platform
Wikimedia Foundation
_______________________________________________
Discovery mailing list
Discovery(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/discovery