Hi,
Erutuon and I are writing a Rust library[1] that
validates and normalizes MediaWiki page titles, initially for use in
bots and tools, but it has other potential use cases too.
There is some prior art for this: a mediawiki-title npm package[2] is
used by various Node.js services, and there's a mediawiki.Title
ResourceLoader module in core too. All of these reimplement MediaWiki's
title parsing, validation and normalization routines, basically
MediaWikiTitleCodec and what it calls.
One problem area is $wgLegalTitleChars. It's part of a PHP regex that
gets used to check if any invalid characters are present by inverting
it, basically [^{legalchars}]. It's also exposed via the meta=siteinfo
API, which suggests that it's useful for external callers, but in its
current form it's really not.
The current value is:
$wgLegalTitleChars = " %!\"$&'()*,\\-.\\/0-9:;=?@A-Z\\\\^_`a-z~\\x80-\\xFF+";
It's a mix of literal characters, some escaped and some not, and
various ranges. Specifically, it's tuned for PHP regexes: / is escaped
because it delimits PHP regexes (an escape that is unnecessary, and
therefore problematic, in Rust), and it uses byte syntax like \x80-\xFF,
while in JavaScript we want \u0080-\uFFFF. We'd also like to use the
Unicode escape class syntax in Rust, but for now we have worked around
it by using the byte class.
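To make the byte-class behaviour concrete, here is a rough Python sketch (not code from MediaWiki or from our library) that inverts the current value and runs it over the UTF-8 bytes of a title. The names LEGAL_TITLE_CHARS, ILLEGAL_BYTE and first_illegal_byte are made up for illustration, and the string is what the regex engine sees after PHP's own string unescaping.

```python
import re

# Hypothetical copy of the current value, as the regex engine sees it once
# PHP has processed the double-quoted string escapes.
LEGAL_TITLE_CHARS = rb""" %!"$&'()*,\-.\/0-9:;=?@A-Z\\^_`a-z~\x80-\xFF+"""

# Inverted class: matches any single byte that is NOT legal in a title.
ILLEGAL_BYTE = re.compile(rb"[^" + LEGAL_TITLE_CHARS + rb"]")


def first_illegal_byte(title: str):
    """Return the first illegal byte of the UTF-8 encoded title, or None."""
    match = ILLEGAL_BYTE.search(title.encode("utf-8"))
    return match.group(0) if match else None


if __name__ == "__main__":
    for title in ["Main Page", "Café", "Foo[bar]", "a#b"]:
        print(repr(title), first_illegal_byte(title))
```

Because the match runs on bytes, every byte of a multi-byte UTF-8 sequence falls into \x80-\xFF, which is why the byte class accepts any non-ASCII character without knowing anything about Unicode.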
In fact, we have functions that parse the byte class and turn it into a
unicode class, Title::convertByteClassToUnicodeClass() in PHP and a
JavaScript version too[3]. This seems entirely unnecessary to me, given
that we could just write this sequence in a more regex-neutral manner to
make it more portable:
$wgLegalTitleCharacters = [
    'characters' => " %!\"$&'()*,-./:;=?@\\\\^_`~",
    // +
    'plus' => true,
    // A-Z, a-z, 0-9
    'alphanumeric' => true,
    // \x80-\xFF
    'non-ascii' => true,
];
Characters are literal characters to run through preg_quote or
mw.util.escapeRegExp, and the ranges are to be specified in whatever
format the specific regex engine would like.
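As a minimal sketch of how a consumer might use such a structure, assuming the array shape proposed above: the Python below mirrors it in a dict, escapes the literal characters with the engine's own helper (re.escape here, in the role of preg_quote or mw.util.escapeRegExp), and writes the ranges in that engine's native syntax. The names LEGAL_TITLE_CHARACTERS and illegal_char_pattern are hypothetical, the key is spelled non_ascii for convenience, and the backslash is listed once as a literal character.

```python
import re

# Hypothetical Python mirror of the proposed $wgLegalTitleCharacters array.
LEGAL_TITLE_CHARACTERS = {
    # Literal characters only; no regex escaping is baked in.
    "characters": " %!\"$&'()*,-./:;=?@\\^_`~",
    "plus": True,
    "alphanumeric": True,
    "non_ascii": True,
}


def illegal_char_pattern(config):
    """Compile a regex matching any character that is NOT legal in a title."""
    # Escape each literal with the engine's own quoting helper.
    parts = [re.escape(ch) for ch in config["characters"]]
    if config.get("plus"):
        parts.append(re.escape("+"))
    if config.get("alphanumeric"):
        parts.append("A-Za-z0-9")
    if config.get("non_ascii"):
        # Python matches code points, so the range is expressed in code
        # points rather than as the byte range \x80-\xFF.
        parts.append("\\u0080-\\U0010FFFF")
    return re.compile("[^" + "".join(parts) + "]")


if __name__ == "__main__":
    pattern = illegal_char_pattern(LEGAL_TITLE_CHARACTERS)
    for title in ["Main Page", "Café", "C++", "Foo[bar]"]:
        print(repr(title), pattern.search(title) is None)
```

A JavaScript or Rust consumer would follow the same steps but emit its own range syntax, so no engine has to parse and rewrite another engine's character class.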
But this also opens up the question - is there a valid use case for
customizing $wgLegalTitleChars anymore? We already have a comment that
says "Don't change this unless you know what you're doing". There's also
one that says "In some rare cases you may wish to remove + for
compatibility with old links." - is that still a consideration today?
I would think that for easy importing/exporting across various MediaWiki
wikis we want the set of legal title characters to be rather static. And
if someone wants to ban some character from being used,
Extension:TitleBlacklist (bundled) provides a much less invasive way to
do so.
So to recap:
1. Can we get rid of the ability to customize legal title characters?
2. If #1 is no, any objections to the breaking change of swapping out
$wgLegalTitleChars (string) for $wgLegalTitleCharacters (array)?
Note that extensions can still read from the old global, but it
can no longer be overridden in LocalSettings.
The actual patch to review is
<https://gerrit.wikimedia.org/r/c/mediawiki/core/+/745386>.
[1] https://gitlab.com/mwbot-rs/mwbot/-/tree/master/mwtitle
[2] https://github.com/wikimedia/mediawiki-title/
[3] https://github.com/wikimedia/mediawiki-title/blob/master/lib/utils.js
Hello cloud-vps users!
It's time for our annual cleanup of unused projects and resources. Our
new developer advocate Komla Sapaty will be guiding this process; please
respond promptly to his emails and do your best to make him feel welcome!
Every year or so the Cloud Services team tries to identify and clean up
unused projects and VMs. We do this via an opt-in process: anyone can
mark a project as 'in use,' and that project will be preserved for
another year.
I've created a wiki page that lists all existing projects, here:
https://wikitech.wikimedia.org/wiki/News/Cloud_VPS_2021_Purge
If you are a VPS user, please visit that page and mark any projects that
you use as {{Used}}. Note that it's not necessary for you to be a
project admin to mark something -- if you know that you're currently
using a resource and want to keep using it, go ahead and mark it
accordingly. If you /are/ a project admin, please take a moment to mark
which VMs are or aren't used in your projects.
When February arrives, I will shut down and begin the process of
reclaiming resources from unused projects.
If you think you use a VPS project but aren't sure which, I encourage
you to poke around on https://tools.wmflabs.org/openstack-browser/ to
see what looks familiar. Worst case, just email
cloud(a)lists.wikimedia.org with a description of your use case and we'll
sort it out there.
Users who only use Toolforge are free to ignore this email and future
related messages.
Thank you!
-Andrew and the WMCS team
Hey, I am Sai Teja Anantha, an undergraduate student from IIT Kharagpur. I
am a newbie to open source contributions, and I am looking forward to
contributing to the Wikimedia Foundation.
I am looking for projects in this organization. Please let me know where I
can find the projects that are currently active.
Thanks in advance
[Apologies for cross-posting]
Hi all,
We're excited to announce the launch of the Wikimedia Research Fund
[1] with the goal of diversifying the network of Wikimedia researchers
globally and supporting the Wikimedia Movement in deeper understanding
of the projects, decision making, and building new
technologies.
The reason we're reaching out to you, as the developer community, is
that we believe one of the ways research can be impactful is through
its use in tools and code, and as developers, you know what research
could support you in your work. (Examples: OCR for specific
languages, better NLP models in your local language, other ML models
for specific tasks that you want your tools to offer, and more.)
The deadline to apply is January 3, 2022. We intend to give funds of
value USD 2k-50k.
More info at https://meta.wikimedia.org/wiki/Grants:Programs/Wikimedia_Research_%26_Tech…
.
Big thanks to Emily Lescak for all her behind-the-scenes work to make
the launch of the Research Fund possible, and to the Community
Resources team at the Wikimedia Foundation for giving us the
opportunity and the funds.
If you have questions, please reach out to us at
research_fund(a)wikimedia.org, meta [2], here, or in one of our upcoming
office hours [3]! :)
Best,
the Research Fund committee chairs
Benjamin Mako Hill (University of Washington)
Leila Zia (Wikimedia Foundation)
[1] https://meta.wikimedia.org/wiki/Grants:Programs/Wikimedia_Research_%26_Tech…
[2] https://meta.wikimedia.org/wiki/Grants_talk:Start
[3] https://meta.wikimedia.org/wiki/Grants:Programs/Wikimedia_Research_%26_Tech…
Hi all!
I know this notice is overdue, considering it's Wednesday, but I'm
cancelling the train for this week.
We are distracted by some internal changes, and I don't feel that we have
the capacity to devote attention to these internal changes and do the train
as well.
Sorry all :(
Tyler Cipriani (he/him)
Engineering Manager, Release Engineering
Wikimedia Foundation
Hello, Siddhi Bhanushali here. I am a third-year computer engineering
student. I have participated in 5+ open source programs and made quality
contributions. Some of these open source events are Hacktoberfest 2021,
GirlScript Winter of Contributing, lgmsoc, etc. I know HTML, CSS, JS,
Bootstrap, Python, EJS, Node, Express, SQL and MongoDB. Please guide
me on how to start contributing for GSoC 2022; I also need guidance on
proposal writing.
Hey, I am Shreyaans Jain, a 2nd-year student at IIT Kanpur, and I am looking
for some projects to contribute to for GSoC 2022. I am an open source
enthusiast and have taken part in many open source programs like
Hacktoberfest 2021 (recently completed) and SWOC2021. I like open source
because it's for everyone: anyone can learn here and contribute, and your
organization does the same by providing free education.
Kindly help me to get started. Hope to hear from you soon.
I will be very thankful to you.