Wikitech-l September 2012

wikitech-l@lists.wikimedia.org

128 participants
114 discussions

MediaWiki 1.20 release candidate
by Mark A. Hershberger 28 Sep '12

28 Sep '12

Last week, I announced the MediaWiki 1.20 release candidate that I created on wikitech-l (http://lists.wikimedia.org/pipermail/wikitech-l/2012-September/063226.html shortened: http://hexm.de/lo). A few people have downloaded it and, so far, I don't know of any problem reports, but maybe that is because the people on wikitech-l are already using newer versions MediaWiki. Today I'd like to get more people looking at it, so I'm announcing it on mediawiki-l and mwusers.com. Download: http://mah.everybody.org/mediawiki-1.20.0beta0.tar.gz Release Notes: https://gerrit.wikimedia.org/r/gitweb?p=mediawiki/core.git;a=blob_plain;f=R… shortened: http://hexm.de/ln GPG signatures: http://mah.everybody.org/mediawiki-1.20.0beta0.tar.gz.sig Public key: http://mah.everybody.org/contact-info http://pgp.mit.edu:11371/pks/lookup?op=vindex&search=0x7956EE477F901A30 (If there there is anyone with a PGP key in the Philadelphia/New York City area who can sign my key, let me know. I have to get some more signatures.)

6 13

Indexing sha1 hashes in mysql
by Asher Feldman 27 Sep '12

27 Sep '12

As we've increased our use of sha1 hashes to identify unique content over the past year, I occasionally see changesets or discussions about indexing sha1's in mysql. When indexing a text field, it's generally beneficial to define the smallest index that still uniquely matches a high percentage of rows. Search and insert performance both benefit from the space savings. As a cryptographic hash function, sha1 has a very high degree of uniformity. We can estimate the percent of partial index look-ups that will match a unique result just by comparing the size of the table to the space covered by the index. sha1 hashes are 160bits, which mediawiki stores in mysql with base36 encoding. base36(2^160) == "twj4yidkw7a8pn4g709kzmfoaol3x8g". Looking at enwiki.revision.rev_sha1, the smallest current value is 000002xi72hkkhn1nvfdeffgp7e1w3s and the largest, twj4yi9tgesxysgyi41bz16jdkwroha. The number of combinations covered by indexing the top bits represented by the left-most 4 thru 10 characters: sha1_index(4) = 1395184 (twj4) sha1_index(5) = 50226658 (twj4y) sha1_index(6) = 1808159706 (twj4yi) sha1_index(7) = 65093749429 (twj4yid) sha1_index(8) = 2343374979464 (twj4yidk) sha1_index(9) = 84361499260736 (twj4yidkw) sha1_index(10) = 3037013973386503 (twj4yidkw7) percentage of unique matches in a table of 2B sha1's: sha1_index(7) = 96.92% sha1_index(8) = 99.91% sha1_index(9) = 99.997% sha1_index(10) = 99.9999% percentage of unique matches in a table of 10B sha1's: sha1_index(8) = 99.573% sha1_index(9) = 99.988% sha1_index(10) = 99.9996% Given current table sizes and growth rates, an 8 character index on a sha1 column should be sufficient for years for many cases (i.e. media files outside of commons, revisions on projects outside of the top 10), while a 10 character index still provides >99.99% coverage of 100 billion sha1's. Caveat: The likely but rare worst case for a partial index is that we may have tables with hundreds of rows containing the same sha1, perhaps revisions of a page that had a crazy revert war. A lookup for that specific sha1 will have to do secondary lookups for each match, as would lookups of any other sha1 that happens to collide within the index space. If the index is large enough to make the later case quite unlikely, prudent use of caching can address the first. <tl;dr> Where an index is desired on a mysql column of base36 encoded sha1 hashes, I recommend ADD INDEX (sha1column(10)). Shorter indexes will be sufficient in many cases, but this is still provides a >2/3 space savings while covering a huge (2^51.43) space. -Asher

8 11

Re: [Wikitech-l] Can we kill DBO_TRX? It seems evil!
by Greg Sabino Mullane 27 Sep '12

27 Sep '12

Marcin Cieslak <saper(a)saper.info> writes: > From the PostgreSQL side I see one problem with nesting - we are already > using savepoints to emulate MySQL's "INSERT IGNORE" and friends.\ > It might be difficult to abuse that feature for something more than this. > There is a class "SavepointPostgres" which is used for that. Well, that is more of a stop-gap measure anyway. Long-term, we need to abstract out insert ignore properly into something that does a select and conditional insert, or a delete + insert, rather than rely on throwing, catching, and discarding an error. In an ideal world, the MySQL code would do that too instead of relying on insert ignore. -- Greg Sabino Mullane greg(a)endpoint.com End Point Corporation PGP Key: 0x14964AC8

1 0

Announcement: Mark Holmquist joins Wikimedia as Features Engineer
by Terry Chay 26 Sep '12

26 Sep '12

Hello everyone, It’s with great pleasure that I’m announcing that Mark Holmquist has joined the Wikimedia Foundation as a Feature Engineer. Mark is a free software advocate and software engineer. Before joining us, he was getting his B.S. in Computer Science at the University of Redlands. You may have remembered him from back when he was a student doing the WMF org chart before he graduated in June of this year, or as a contractor working on UploadWizard and assorted bug zapping. :-) He has contributed to many free software projects in the past, including Minetest, Etherpad, and OpenHatch. Recently he has worked on the EtherEditor extension for MediaWiki[0]. BTW, Erik will be wanting to suck your 20% time into finishing out the outstanding bugs. ;-) On the side, he enjoys playing Bridge[1] and the occasional jaunt for Parkour[2]. He lives in the Mission in foggy San Francisco. He also really enjoy reading and discussing court cases, either oral arguments or decisions. He's subscribed to the Cornell Supreme Court bulletin [3], so bug him about the cases that are in the news. :-) His first official day was on September 17th, but I flaked out because of the all-staff and then decided to procrastinate some more on this announcement since you already know him from all his participation on IRC and wikitech-l. Until December, he will be working with Gabriel Wicke on the Parsoid, which is the rewrite of the Wikitext parser to work with the VisualEditor proejct (I assume James and Trevor have completed the necessary hazing on the VE Team and he has gotten his t-shirt). But more generally, he'll be on the Features team with Editor Engagement and assorted front-end technical debt here at the WMF. Please join me in welcoming Mark Holmquist to the Wikimedia Foundation. :-) Take care, Terry [0] http://www.mediawiki.org/wiki/Extension:EtherEditor [1] http://en.wikipedia.org/wiki/Contract_bridge [2] https://en.wikipedia.org/wiki/Parkour [3] http://www.law.cornell.edu/supct/cert/

6 5

GLAMwiki Toolset Project : Request for Comments - Technical Architecture
by dan entous 26 Sep '12

26 Sep '12

dear all, as some of you may already know, the GLAMwiki Toolset Project, http://outreach.wikimedia.org/wiki/GLAM/Toolset_project, is a collaboration between Wikimedia Nederland, Wikimedia UK, Wikimedia France and Europeana, with the goal of providing a set of tools to get materials from GLAM institutions onto Wikimedia Commons in a way that reuse can easily be tracked, and that Commons materials can easily be integrated back into the collection of the original GLAM or even other GLAMs. as part of our initial goal of creating a GLAM Upload System, we are looking to gather Wikimedia community input on the proposed architecture and technologies. if you have time and interest, please take a look and let us know your thoughts, http://outreach.wikimedia.org/wiki/GLAM/Toolset_project/Request_for_Comment…. with kind regards, dan

2 5

Question About wfUrlEncode
by Tyler Romeo 26 Sep '12

26 Sep '12

Hey, So I'm working on https://gerrit.wikimedia.org/r/22167 (Uri class), and it's failing a unit test. I know the reason it's failing the unit test is because of wfUrlencode() (the failure only started occurring when I moved wfUrlencode() over to the Uri class). However, I cannot figure out why it's failing, because the actual code hasn't really changed at all (in fact some of wfUrlencode is even copied/pasted). Maybe somebody can offer some insight? *--* *Tyler Romeo* Stevens Institute of Technology, Class of 2015 Major in Computer Science www.whizkidztech.com | tylerromeo(a)gmail.com

1 0

Page Curation launch on English Wikipedia
by Fabrice Florin 26 Sep '12

26 Sep '12

Hi folks, I am happy to announce that the Wikimedia Foundation has just launched Page Curation, a new suite of tools for reviewing articles on Wikipedia. Current page patrol tools like Special:NewPages and Twinkle can be hard to use quickly and accurately, and have led to frustration for some users. Page Curation aims to improve that page patrol experience by making it faster and easier to review new pages, using two integrated tools: the New Pages Feed and the Curation Toolbar. Read the Page Curation announcement on our blog: http://blog.wikimedia.org/2012/09/25/page-curation-launch/ To learn more, visit our introduction page: https://en.wikipedia.org/wiki/Wikipedia:Page_Curation/Introduction If you are an experienced editor, please give Page Curation a try: https://en.wikipedia.org/wiki/Special:NewPagesFeed We are also holding IRC office hours on Wednesday, September 26 at 4pm PT (23:00 UTC), during which we will be happy to answer any questions you may have. Please report any issues on our talk page or to our Community Liaison, Oliver Keyes <okeyes(a)wikimedia.org>. A number of patrollers have already started using Page Curation, and we hope that more curators will adopt this new toolkit over time. A 'release version' was deployed on the English Wikipedia on September 20, 2012, and we plan to make it available to other projects in coming weeks. This feature was created in close collaboration with editors. We would like to take this opportunity to thank all the community members who patiently guided our progress over the past few months. This includes folks like Athleek123, DGG, Dori, Fluffernutter, Logan, The Helpful One, Tom Morris, Utar and WereSpielChequers, to name but a few. We are deeply grateful for your generous contributions to this project! We designed Page Curation to offer a better experience, by making it easier for curators to review new pages and by providing more feedback to creators so they can improve Wikipedia together. We hope that you will find this new tool useful. Enjoy! Fabrice Florin Product Manager, Editor Engagement Team Wikimedia Foundation User:Fabrice Florin (WMF) https://en.wikipedia.org/wiki/Wikipedia:Editor_Engagement

1 0

Re: [Wikitech-l] Flagged Reviews default by quality levels
by Dovi Jacobs 26 Sep '12

26 Sep '12

>"When a page reaches X level of quality, that version becomes the default." >When creating all the interface message for editing, viewing, and history, >this is definitely not easy to get right and keep simple for new users. >Anyway, to be clear, you can make the "latest version" the default for all >pages and manually make the "latest reviewed" version the default on a >per-page basis already. You just can't use "quality versions" as the default version. Hi, and thanks for the info. On an immediate practical level I will inform them of this at the wiki, and we will unfortunately be forced to turn off the last approved version as default so as not to offput new users. In terms of the future, I respectfully disagree with you that there is any essential real-life problem for users if the extension were to be implemented in the way I described. Even for those who share your evaluation (which I don't), there should be no reason not to provide such basic functionality as at least an option. Furthermore, to manually make the "latest reviewed" version the default on a per-page basis is extremely cumbersome (even using bots) and not a reasonable solution to the problem. To conclude, I personally think the extension you have developed (I was only recently reminded that you are the primary developer) is already one of the most important developments for the future of Mediawiki (and often under appreciated as such). Please let me know if there is any positive way I can become involved in helping this functionality become part of the program.

3 2

Proposal to add an API/Developer/Developer Hub link to the footer of Wikimedia wikis
by MZMcBride 25 Sep '12

25 Sep '12

Hi. There's a fairly straightforward request to add an API/Developer/Developer Hub link to the footer of Wikimedia wikis, near the "Disclaimers", "Mobile view", etc. links: <https://bugzilla.wikimedia.org/show_bug.cgi?id=33464>. This should be fairly trivial to implement, but I wanted to check if there were any objections to such an idea. The basic rationale for including such a link is that other sites such as Twitter and Facebook and Wordnik have similar links. This kind of link encourages viewers to become involved in development for the project (e.g., Wikipedia), as opposed to the MediaWiki logo, which serves a somewhat different purpose/functionality. Assuming there aren't any objections to the idea, the only remaining question seems to be where to point such a link. Brandon suggested <https://www.mediawiki.org/wiki/Project:Help>. I'm wondering if there's an "API basics" FAQ or something similar on mediawiki.org that could be included in whatever landing page is chosen. Something that includes common questions about the API such as "What formats are available?", "Do I need an API key?" (which I think is a pretty major question to answer, given how many other APIs I've come across work), "How can I get started developing for Wikimedia wikis?", etc. Does anyone know if such a page exists already? MZMcBride

22 41

Re: [Wikitech-l] shortened links (was Re: MediaWiki 1.20 release candidate (and triage announcement))
by Jeremy Baron 25 Sep '12

25 Sep '12

On Sep 24, 2012 7:18 PM, "Mark A. Hershberger" <mah(a)everybody.org> wrote: > On 09/23/2012 06:33 PM, K. Peachey wrote: > > On Mon, Sep 24, 2012 at 4:03 AM, Mark A. Hershberger <mah(a)everybody.org> wrote: > >> On 09/23/2012 12:54 PM, Krinkle wrote: > >>> https://bugzilla.wikimedia.org/[...] > >> > >> Link shortened: http://hexm.de/lp > > > > There is no need to shorten urls in emails, Please don't. > > I've personally seen mail readers and MTAs that mangle long URLs. Since > this sort of mangling happens, I see a need for a way to make URLs > usable. I use a shortener for those URLS as a precautionary measure and > to help me communicate with others. I personally agree it's annoying and wish you didn't. But maybe there's mangling examples I've not seen. IMHO, it should usually be enough to either wrap in angle brackets or put the URL in a footnote on it's own line with just the footnote number. Anyway, I do certainly think we have bigger fish to fry. -Jeremy

2 1

← Newer
1
2
3
4
5
6
7
...
12
Older →

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

2004

2003

2002

Wikitech-l September 2012