(If you don’t work with links tables such as templatelinks, pagelinks and
so on, feel free to ignore this message)
TLDR: The schema of links tables (starting with templatelinks) will change
to have numeric id pointing to linktarget table instead of repeating
namespace and title.
The current schema and storage of most links tables are: page id (the
source), namespace id of the target link and title of the target. For
example, if a page with id of 1 uses Template:Foo, the row in the database
would be 1, 10, and Foo (the Template namespace has an id of 10).
Repeating the target’s title is not sustainable. For example, more than half
of the Wikimedia Commons database is just three links tables. The sheer size of
these tables makes a considerable portion of all queries slower, and makes
backups and dumps take longer and use much more space than needed due to
unnecessary duplication. In Wikimedia Commons, on average a title is
duplicated around 100 times for templatelinks and around 20 times for
pagelinks. The numbers for other wikis depend on the usage patterns.
Moving forward, these tables will be normalized, meaning a typical row will
hold a mapping of page id to linktarget id instead. Linktarget is a new table
deployed in production that contains immutable records of namespace id and
title string. The major differences between the page and linktarget tables are:
(1) linktarget values won’t change (unlike page records, which change when a
page is moved), and (2) linktarget values can point to non-existent pages
(= red links).
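
For tool authors, here is a rough sketch of what a lookup against the
normalized layout might look like. It uses an in-memory SQLite database purely
as a stand-in for the replicas. The column names (tl_from, tl_target_id,
lt_id, lt_namespace, lt_title) are not given in this message and are
assumptions based on the usual MediaWiki prefix conventions, so please check
the actual schema before relying on them.

```python
import sqlite3

NS_TEMPLATE = 10  # MediaWiki's numeric id for the Template namespace

# In-memory SQLite stands in for the real database; column names are assumed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE linktarget (
    lt_id        INTEGER PRIMARY KEY,
    lt_namespace INTEGER NOT NULL,
    lt_title     TEXT NOT NULL,
    UNIQUE (lt_namespace, lt_title)
);
CREATE TABLE templatelinks (
    tl_from      INTEGER NOT NULL,   -- source page id
    tl_target_id INTEGER NOT NULL,   -- points at linktarget.lt_id
    PRIMARY KEY (tl_from, tl_target_id)
);
""")

# Template:Foo is stored once, however many pages use it.
conn.execute("INSERT INTO linktarget VALUES (1, ?, 'Foo')", (NS_TEMPLATE,))
conn.executemany(
    "INSERT INTO templatelinks VALUES (?, ?)",
    [(1, 1), (2, 1), (3, 1)],
)


def pages_using(conn, namespace, title):
    """Return ids of pages that link to (namespace, title) via linktarget."""
    rows = conn.execute(
        """
        SELECT tl_from
        FROM templatelinks
        JOIN linktarget ON lt_id = tl_target_id
        WHERE lt_namespace = ? AND lt_title = ?
        ORDER BY tl_from
        """,
        (namespace, title),
    )
    return [row[0] for row in rows]


print(pages_using(conn, NS_TEMPLATE, "Foo"))
```

The practical change for existing queries is the extra join: the filter on
namespace and title moves from the links table to linktarget.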
The first table being done is templatelinks, then pagelinks, imagelinks and
categorylinks will follow. During the migration phase both values will be
accessible but we will turn off writing to the old columns once the values
are backfilled and switched to be read from the new schema. We will
announce any major changes beforehand but this is to let you know these
changes are coming.
While the normalization of all links tables will take several years to
finish, templatelinks will finish in the next few months and is the most
So if you:
… rely on the schema of these tables in cloud replicas, you will need to
change your tools.
… rely on dumps of these tables, you will need to change your scripts.
Currently, templatelinks writes to both the old and the new schema for new rows
in most wikis. This week we will start backfilling existing data into the new
schema, but it will take months to finish on large wikis.
You can keep track of the general long-term work in
https://phabricator.wikimedia.org/T300222 and the specific work for
templatelinks in https://phabricator.wikimedia.org/T299417. You can also
read more on the reasoning in https://phabricator.wikimedia.org/T222224.
*Amir Sarabadani (he/him)*
Staff Database Architect
Wikimedia Foundation <https://wikimediafoundation.org/>
Hi, this is a friendly reminder that we would love to hear from you
about your experience at last weekend's Hackathon.
Please fill in the survey by Sunday, May 29th and help us improve.
See the links below.
Thanks a lot in advance!
-------- Quoted Message --------
From: Haley Lepp
Date: Sun, 22 May 2022 11:13:39 -0700
On behalf of the 2022 Wikimedia Hackathon Committee, we would like to
thank you for coming to the Wikimedia Hackathon!
Please consider giving us feedback on the Hackathon and your
suggestions for improvement.
There are two ways to give feedback:
1. Fill out the Wikimedia Hackathon Survey. For more information on privacy
and data-handling, see the survey privacy statement. The survey will remain
open until May 29, 2022.
2. If you would like to share feedback but do not wish to take the
Qualtrics survey, you can leave feedback on
the Etherpad <https://etherpad.wikimedia.org/p/Wikimedia_Hackathon_2022_Feedback>.
Finally, check out
the badges <https://www.mediawiki.org/wiki/Wikimedia_Hackathon_2022/How_to#Joining_a_se…
> the committee made. You can put them on your userpages to show your
Thank you again for joining us! It was so much fun to meet everyone and
See you at the Wikimania Hackathon in August!
Haley, on behalf of the
2022 Wikimedia Hackathon Team
Andre Klapper (he/him) | Bugwrangler / Developer Advocate
Wikitech-ambassadors mailing list -- wikitech-ambassadors(a)lists.wikimedia.org
To unsubscribe send an email to wikitech-ambassadors-leave(a)lists.wikimedia.org
I want to set up a custom rsyslog config on my spi-tools VPS instance. I know what I want to end up with, but I'm trying to get it puppetized. It's not really clear what I need to do. Do I really need to set up my own standalone puppetmaster <https://wikitech.wikimedia.org/wiki/Help:Standalone_puppetmaster>? That seems like overkill.
I want to send data from a process running on toolforge to a VPS host. I tried the obvious:
On the VPS host (puppet-test.spi-tools.eqiad1.wikimedia.cloud):
> $ nc -4 -l -p 23001
On Toolforge:
> $ echo foo | nc -v -4 puppet-test.spi-tools.eqiad1.wikimedia.cloud 23001
> nc: connect to puppet-test.spi-tools.eqiad1.wikimedia.cloud port 23001 (tcp) failed: Connection timed out
I'm assuming I need to configure a security group in Horizon to allow ingress on that port. Is that correct?
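
In case it helps with debugging, here is a small sketch of the same check done from Python, which separates a timeout from a refused connection. The function name probe and the 5-second default are my own choices, not anything Toolforge-specific. A timeout often means packets are being dropped somewhere on the path (which would be consistent with a security group or firewall rule), while a refusal usually means the host was reached but nothing is listening on the port. Neither result is conclusive on its own.

```python
import socket


def probe(host, port, timeout=5.0):
    """Try one TCP connection and return 'open', 'refused', or 'timeout'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered with a reset: reachable, but no listener on the port.
        return "refused"
    except (socket.timeout, TimeoutError):
        # No answer within the timeout: packets are likely being dropped.
        return "timeout"


# Example call, run from the Toolforge side while nc is listening on the VPS:
#   probe("puppet-test.spi-tools.eqiad1.wikimedia.cloud", 23001)
```

Other failures, such as a DNS lookup error, are deliberately left to raise so they are not mistaken for a firewall problem.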