I remember we discussed using asserts and decided they're a bad idea
for WMF-deployed code - yet I see
Warning: assert() [<a href='function.assert'>function.assert</a>]: Assertion failed in /usr/local/apache/common-local/php-1.22wmf12/extensions/WikibaseDataModel/DataModel/Claim/Claims.php on line 291
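A minimal Python sketch of the underlying point (the code in question is PHP, and these function names are made up for illustration): assertions can be disabled at runtime, so any validation they carry silently disappears in production, which is exactly why an explicit exception is safer in deployed code.

```python
def remove_claim(claims, guid):
    # Fragile: under `python -O`, assert statements are stripped
    # entirely, so this check vanishes in optimized/production runs.
    assert guid in claims, "unknown claim GUID"
    del claims[guid]

def remove_claim_checked(claims, guid):
    # Robust: an explicit exception survives any runtime configuration
    # and produces an actionable error instead of a silent warning.
    if guid not in claims:
        raise KeyError(f"unknown claim GUID: {guid}")
    del claims[guid]
```

The same reasoning applies to PHP's `assert()`, which can be disabled via configuration.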
Thoughts?
--
Best regards,
Max Semenik ([[User:MaxSem]])
Hey all,
Mozilla made an announcement yesterday about a new framework called Minion:
http://blog.mozilla.org/security/2013/07/30/introducing-minion/
https://github.com/mozilla/minion
It's an automated security testing framework for use in testing web
applications. I'm currently looking into how to use it. Would there be any
interest in setting up such a framework for automated security testing of
MediaWiki?
--
Tyler Romeo
Stevens Institute of Technology, Class of 2016
Major in Computer Science
www.whizkidztech.com | tylerromeo(a)gmail.com
Hallo,
I would like to announce the release of MediaWiki language extension
bundle 2013.07
* https://translatewiki.net/mleb/MediaWikiLanguageExtensionBundle-2013.07.tar…
* sha256sum: ca381ea1bc1f10c56df28353f91a25129c604ff11938b424833925e8716e2ff3
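A small sketch of how a downloaded tarball can be checked against the published sha256sum above (the filename below is illustrative, not the real bundle name):

```python
import hashlib

def sha256_of(path):
    # Stream the file in chunks so large tarballs need not fit in memory.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()

# Compare the result against the published checksum, e.g.:
# sha256_of("MediaWikiLanguageExtensionBundle-2013.07.tar.bz2") == "ca381ea1..."
```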
Quick links:
* Installation instructions are at https://www.mediawiki.org/wiki/MLEB
* Announcements of new releases will be posted to a mailing list:
https://lists.wikimedia.org/mailman/listinfo/mediawiki-i18n
* Report bugs to https://bugzilla.wikimedia.org
* Talk with us at #mediawiki-i18n @ freenode
Release notes for each extension are below.
Amir E. Aharoni
== Babel ==
Only localization updates.
== cldr ==
No changes.
== CleanChanges ==
Only localization updates.
== LocalisationUpdate ==
Only localization updates.
== Translate ==
=== Noteworthy changes ===
Groups are sorted alphabetically in the export tab of Special:Translate.
Support for Yandex Translate API v1.5.
Edit summaries for automated edits are written in the content language
(bug 52142).
== UniversalLanguageSelector ==
=== Noteworthy changes ===
The functions for web fonts loading were optimized to improve performance.
The internals of loading translated messages were changed from the
original jquery.i18n implementation to allow loading messages from
other domains (CORS).
Language code aliases are now used properly in the Common languages
section. This allows, for example, proper display of Tagalog for users
from the Philippines.
The variable $wgULSNoImeSelectors was added to disable IME on elements
by specifying jQuery selectors that match them.
The CSS class 'uls-settings-trigger' can be added to any element so
that clicking it will make the ULS appear. It is useful for
documentation and examples.
Web fonts are applied to the IME selector menu, too.
=== Fonts ===
Persian and Malayalam no longer have a default font.
Added fonts for Canadian Syllabics and Urdu (non-default).
Updated the UnifrakturMaguntia font.
=== Input methods ===
LRM and RLM were added to the Hebrew input methods and the redundant
he-kbd input method was removed.
Danda was removed from the Marathi phonetic input method.
A bug that prevented typing some characters in the Kannada, Tamil
and Marathi input methods was fixed.
The Slovak input method was fixed to match the standard Slovak keyboard.
The names of the Oriya input methods were updated.
--
Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
http://aharoni.wordpress.com
“We're living in pieces,
I want to live in peace.” – T. Moore
Hi,
after a month of work on my GSoC project Incremental Dumps [1], I now
have something worth sharing and talking about, though it's still far
from complete.
What the code can do now is to read a pages-history XML dump and create the
various kinds of dumps (pages/stub, current/history) in the new format from
that.
It can then convert a dump in the new format back to XML.
The XML output is almost the same as existing XML dumps, but there are some
differences [2].
The new format now has a detailed specification [3] (this describes
the current version; the format is still in flux and can change
daily).
If you want, you can also try running the code. [4]
It's not production-quality yet (e.g. it doesn't report errors properly),
but it should work.
Compilation instructions are in the README file.
Any comments or questions are welcome.
Petr Onderka
User:Svick
[1]: http://www.mediawiki.org/wiki/User:Svick/Incremental_dumps
[2]:
http://www.mediawiki.org/wiki/User:Svick/Incremental_dumps/File_format/XML_…
[3]:
http://www.mediawiki.org/wiki/User:Svick/Incremental_dumps/File_format/Spec…
[4]: https://github.com/wikimedia/operations-dumps-incremental/tree/gsoc
Today I'm noticing that if I visit someone else's user or talk
page (this is on en.wp), I see a little orange box saying
"Talk: you have new messages" even though I don't. Presumably
that user does, or something.
I may be mistaken, but I thought voting was enabled for the product
VisualEditor, and now it is not.
Could someone confirm this?
It is currently disabled for:
Commons App
Huggle
openZIM
Parsoid
(Spam)
Tool Labs tools
VisualEditor
Wiki Loves Monuments
WikiLoves Monuments Mobile
Wikimedia Labs
Wikipedia App
Voting provides a way to watch a bug without sending any bugmail. It
also lets people do a +1 without adding a comment.
Is voting disabled because of a perception that large numbers of
votes magically create new developers? If so, would it be re-enabled
if the voting interface were relabelled and better documented? E.g.:
https://bugzilla.wikimedia.org/show_bug.cgi?id=34490
--
John Vandenberg
On 2013-07-26 20:26, Amgine wrote:
> The request is to create a web-based text corpus[1] from which to
> derive frequencies and then compare with existing wiktionaries. Not
> a light undertaking, but one which has been proposed and implemented
> previously (e.g. Connel's Gutenberg project[2]).
>
> Generically speaking, someone would need to determine the
> appropriate size of the corpus sample, its temporal currency, and
> the method of creating and maintaining it. This isn't easy to do,
> and having no strictures results in unwieldy and mostly irrelevant
> products like Google's n-grams[3] (on the other hand, if someone can
> figure out how to filter n-grams usefully, it would mean we don't
> have to build our own.)
Actually, I think it would be interesting to have a trend history of
word usage over centuries (the current trend would also be
interesting, but probably harder to implement). Wikisource may be
used to achieve that.
>
> Amgine
>
> [1] https://en.wikipedia.org/wiki/Linguistic_corpus
> [2] https://en.wiktionary.org/wiki/User:Connel_MacKenzie/Gutenberg
> [3] http://storage.googleapis.com/books/ngrams/books/datasetsv2.html
>
>
> On 26/07/13 09:18, Lars Aronsson wrote:
>> On 07/23/2013 11:23 AM, Mathieu Stumpf wrote:
>>> Here is what I would like to do: generating reports which give,
>>> for a given language, a list of words which are used on the web,
>>> with a number evaluating their occurrences, but which are not in a
>>> given wiktionary.
>>>
>>> How would you recommend implementing that within the Wikimedia
>>> infrastructure?
>>
>> Some years back, I undertook to add entries for
>> Swedish words in the English Wiktionary. You can
>> follow my diary at http://en.wiktionary.org/wiki/User:LA2
>>
>> Among the things I did was to extract a list of all
>> Swedish words that already had entries. The best
>> way was to use CatScan to list entries in categories
>> for Swedish words. Even if there is a page called
>> "men", this doesn't mean the Swedish word "men"
>> has an entry, because it could be the English word
>> "men" that is in that page.
>>
>> Then I extracted all words from some known texts,
>> e.g. novels, the Bible, government reports, and the
>> Swedish Wikipedia, counting the number of
>> occurrences of each word. Case significance is
>> a bit tricky. There should not be an entry for
>> lower-case stockholm, so you can't just convert
>> everything to lower case. But if a sentence begins
>> with a capital letter, that word should not have
>> a capitalized entry. Another tricky issue is
>> abbreviations, which should keep the period,
>> for example "i.e." rather than "i" and "e". But
>> the period that ends a sentence should be removed.
>> When splitting a text into words, I decided to keep
>> all periods and initial capital letters, even if this
>> leads to some false words.
>>
>> When you have word frequency statistics for a text,
>> and a list of existing entries from Wiktionary, you
>> can compute the coverage, and I wrote a little
>> script for this. I found that English Wiktionary already
>> had Swedish entries covering 72% of the words in the
>> Bible, and when I started to add entries for the most
>> common of the missing words, I was able to increase
>> this to 87% in just a single month (September 2010).
>>
>> Many of the common words that were missing when
>> I started were adverbs such as "thereof", "herein",
>> which occur frequently in any text but are not very
>> exciting to write entries about. This statistics-based
>> approach gave me a reason to add those entries.
>>
>> It is interesting to contrast a given text to a given
>> dictionary in this way. The Swedish entries in the
>> English Wiktionary are a different dictionary from the
>> Swedish entries in the German or Danish Wiktionary.
>> The kinds of words found in the Bible are different
>> from those found in Wikipedia or in legal texts.
>> There is not a single, universal text corpus that we
>> can aim to cover. Google has released its ngram
>> dataset. I'm not sure if it covers Swedish, but even
>> if it does, it must differ from the corpus frequencies
>> published by the Swedish Academy.
>>
>> It is relatively easy to extract a list of existing entries
>> from Wiktionary. But preparing a given text corpus
>> for frequency and coverage analysis takes more
>> work.
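The frequency-and-coverage computation described above can be sketched roughly as follows (an illustration only; the tokenization is deliberately naive, and per the email, periods and initial capitals are kept even if that yields some false words):

```python
from collections import Counter

def word_frequencies(text):
    # Naive whitespace tokenization, keeping periods (so "i.e." stays
    # intact) and initial capitals as-is.
    return Counter(text.split())

def coverage(frequencies, entries):
    # Fraction of word *occurrences* (not distinct words) that are
    # covered by an existing set of dictionary entries.
    total = sum(frequencies.values())
    covered = sum(n for word, n in frequencies.items() if word in entries)
    return covered / total if total else 0.0
```

With such frequencies in hand, sorting the uncovered words by count gives exactly the "most common missing words" list used to raise coverage from 72% to 87%.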
>
>
> _______________________________________________
> Wiktionary-l mailing list
> Wiktionary-l(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wiktionary-l
--
Association Culture-Libre
http://www.culture-libre.org/
Hi wikitech-l!
The db replication of s5 and s6 stopped on the Toolserver. Merlissimo
searched for information and found that you stopped some of the
slaves that the Toolserver uses as masters.
Is there an ETA for when they will be back? Please provide some information!
Thanks and cheers, Silke
--
Silke Meyer
Internal IT Management and Project Management, Toolserver
Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin
Tel. (030) 219 158 260
http://wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V.
Registered in the register of associations of the Amtsgericht
Berlin-Charlottenburg under number 23855 B. Recognized as charitable
by the Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.