Hi everyone,
I recently set up a MediaWiki (http://server.bluewatersys.com/w90n740/)
and I need to extract the content from it and convert it into LaTeX
syntax for printed documentation. I have googled for a suitable OSS
solution, but nothing obvious turned up.
I would prefer a script written in Python, but any recommendations
would be very welcome.
Do you know of anything suitable?
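To give an idea of what I'm after, here is a rough sketch of my current
fallback plan in Python (the wikitext-to-LaTeX rules shown are only a
tiny illustrative subset, and the page title is just an example):

import re
import urllib.parse
import urllib.request

# adjust the path if index.php lives elsewhere on your install
WIKI = "http://server.bluewatersys.com/w90n740/index.php"

def fetch_wikitext(title):
    # action=raw returns the raw wikitext of a page
    url = "%s?%s" % (WIKI, urllib.parse.urlencode(
        {"title": title, "action": "raw"}))
    with urllib.request.urlopen(url) as f:
        return f.read().decode("utf-8")

def wikitext_to_latex(text):
    # order matters: ''' (bold) must be handled before '' (italics)
    text = re.sub(r"^== *([^=]+?) *==$", r"\\section{\1}", text, flags=re.M)
    text = re.sub(r"'''(.*?)'''", r"\\textbf{\1}", text)
    text = re.sub(r"''(.*?)''", r"\\emph{\1}", text)
    return text

print(wikitext_to_latex(fetch_wikitext("Main Page")))

Obviously a real converter would need to handle templates, tables,
links and so on, which is why I'm hoping something already exists.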
Kind Regards,
Hugo Vincent,
Bluewater Systems.
I've been putting placeholder images on a lot of articles on en:wp,
e.g. [[Image:Replace this image male.svg]], which goes to
[[Wikipedia:Fromowner]], which asks people to upload an image if they
own one.
I know it's inspired people to add free content images to articles in
several cases. What I'm interested in is numbers. So what I'd need is
a list of edits where one of the SVGs that redirects to
[[Wikipedia:Fromowner]] is replaced with an image. (Checking which of
those are actually free images can come next.)
Is there a tolerably easy way to get this info from a dump? Any
Wikipedia statistics fans who think this'd be easy?
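To make it concrete, I'm imagining something along these lines over a
pages-meta-history dump: flag any revision where a placeholder filename
present in the previous revision has disappeared. (Filename list
abbreviated, there are several of those SVGs, and this doesn't yet
check what replaced them.)

import xml.etree.ElementTree as ET

DUMP = "enwiki-pages-meta-history.xml"  # illustrative filename
NS = "{http://www.mediawiki.org/xml/export-0.3/}"  # varies by dump version
PLACEHOLDERS = ["Image:Replace this image male.svg",
                "Image:Replace this image female.svg"]  # partial list

title, prev_text, when = None, None, None
for event, elem in ET.iterparse(DUMP):
    if elem.tag == NS + "title":
        title, prev_text = elem.text, None  # a new page begins
    elif elem.tag == NS + "timestamp":
        when = elem.text  # revision timestamp, seen before its text
    elif elem.tag == NS + "text":
        text = elem.text or ""
        if prev_text is not None:
            for p in PLACEHOLDERS:
                if p in prev_text and p not in text:
                    print("%s  %s" % (when, title))
        prev_text = text
        elem.clear()  # keep memory bounded on a big dump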
(If the placeholders do work, then it'd also be useful convincing some
wikiprojects to encourage the things. Not that there's ownership of
articles on en:wp, of *course* ...)
- d.
Hi,
I have just seen two independent reports, in the last +/- 12 hours,
from people whose posts to foundation-l never made it to the list. The
emails do not show up in the moderation queue either (nor are these
two subscribers, or the entire list, moderated).
Are there any technical problems with the mail(inglist) server that
you are aware of?
Michael
--
Michael Bimmler
mbimmler(a)gmail.com
Hi all,
We now have English Wikipedia fully migrated to the new servers and new
search backend. We cannot fully migrate other wikis until we resolve some
hardware issues. In the meantime, here is an overview of the new features
now deployed on en.wiki:
1) Did you mean... - we now have search suggestions. Care has been taken
to provide suggestions that are context-sensitive, i.e. they work on
phrases, proper names, etc.
2) fuzzy and wildcard queries - a word can be made fuzzy by adding ~ to its
end, e.g. the query sarah~ thompson~ will match different spellings of, and
names similar to, sarah thompson. Wildcards can now be used as prefix or
suffix, e.g. *stan will match various countries in Central Asia.
3) prefix: - using this magic prefix, queries can be limited to pages
beginning with a certain prefix. E.g.
mwsuggest prefix:Wikipedia:Village Pump
will search all village pumps and archives for mwsuggest. This should be
especially useful for archive searching in concert with inputbox or
searchbox
4) intitle: - using this magic prefix, queries can be limited to titles only
5) generally improved quality of search results, via the use of related
articles (based on co-occurrence of links), anchor text, text abstracts,
proximity within articles, sections, redirects, improved stemming and such
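For anyone who wants to script against this, the new syntax should go
straight through the ordinary search API as well. A quick illustrative
sketch (untested):

import json
import urllib.parse
import urllib.request

API = "http://en.wikipedia.org/w/api.php"

def search(query, limit=10):
    # standard api.php search module; the query string is passed as-is
    params = urllib.parse.urlencode({
        "action": "query", "list": "search",
        "srsearch": query, "srlimit": limit, "format": "json"})
    req = urllib.request.Request("%s?%s" % (API, params),
                                 headers={"User-Agent": "search-demo/0.1"})
    with urllib.request.urlopen(req) as f:
        return [hit["title"] for hit in json.load(f)["query"]["search"]]

print(search("sarah~ thompson~"))                         # fuzzy terms
print(search("*stan"))                                    # suffix wildcard
print(search("mwsuggest prefix:Wikipedia:Village pump"))  # prefix: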
Cheers, Robert
hello,
i have written a new extension to embed music scores in MediaWiki pages:
https://secure.wikimedia.org/wikipedia/mediawiki/wiki/Extension:ABC
unlike the Lilypond extension, this uses a simple input language (ABC) that is
much easier to validate for security. ABC is mostly used to transcribe Irish
trad and other simple tunes, but it recently gained support for more advanced
features, e.g. multiple staves and lyrics. this is supported in the extension
using the 'abcm2ps' tool.
unlike the existing ABC extension (AbcMusic), it doesn't support opening
arbitrary files as ABC input (which is a potential security issue), and has
several additional features:
- The original ABC can be downloaded easily
- The score can be downloaded as PDF, PostScript, MIDI or Ogg Vorbis
- A media player can be embedded in the page to play the media file
i believe the ABC format is suitable for transcribing the majority of scores
currently on Wikimedia projects. although it can't handle all of them, it is
better than the current situation. plus, as ABC is simple, and existing ABC
scores are easily available, it's easier for novice users to contribute.
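for those who haven't seen ABC: the input is plain text, and rendering
is a single call to abcm2ps. roughly what the extension does behind the
scenes, as a simplified sketch (the real code also validates the input,
which is the point of using ABC):

import os
import subprocess
import tempfile

TUNE = """X:1
T:Example Jig
M:6/8
L:1/8
K:D
|: d2e f2g | a2g f2d | e2f g2a | b2a g2e :|
"""

# write the ABC source to a temp file, then render it to PostScript
with tempfile.NamedTemporaryFile("w", suffix=".abc", delete=False) as f:
    f.write(TUNE)
    src = f.name
subprocess.check_call(["abcm2ps", "-O", "score.ps", src])
os.unlink(src)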
i would be interested to hear people's thoughts on enabling this extension on
Wikimedia.
- river.
Can someone explain why Wikimedia Commons accepts uploads of
printable PDF documents (e.g. brochures) but not the editable
source version in Open Document Format (e.g. .ODT)? This seems to
violate the open source principle.
This should be an FAQ, but it isn't obvious from
http://commons.wikimedia.org/wiki/Commons:File_types
--
Lars Aronsson (lars(a)aronsson.se)
Aronsson Datateknik - http://aronsson.se
Special:Version displays SVN version numbers for extensions out of
$wgExtensionCredits, which seems to be done with $LastChangedRevision$
keywords in the extension's entry point file.
This produces massively incorrect numbers in many cases, since the entry
point file is relatively rarely changed in non-trivial extensions
consisting of multiple files. Updates to the body, class, i18n, and
other files are not reflected.
If we're running on a SVN checkout of the extension, we could check the
directory for its current revision much as we do for MediaWiki itself;
this would tell us for instance if an extension's subdirectory has been
updated separately from the core MediaWiki.
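Something like this, sketched in Python for brevity (the real check
would be PHP, like the existing core one):

import re
import subprocess

def directory_revision(path):
    # ask the svn client what revision the working copy is at;
    # returns None if this isn't a checkout at all
    try:
        out = subprocess.check_output(["svn", "info", path], text=True)
    except (OSError, subprocess.CalledProcessError):
        return None
    m = re.search(r"^Revision: (\d+)$", out, re.M)
    return int(m.group(1)) if m else None

print(directory_revision("extensions/ParserFunctions"))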
But if we aren't on a SVN checkout, or if individual files have been
updated to different versions, this may or may not tell us anything useful.
Anybody have a suggestion on how to best handle this?
-- brion
On thistle with DB=dewiki:
mysql> explain select * from recentchanges
left join tag_summary on ts_rc_id=rc_id
order by rc_timestamp desc limit 50\G
*************************** 1. row ***************************
table: recentchanges
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 1179921
Extra: Using temporary; Using filesort
*************************** 2. row ***************************
table: tag_summary
type: ALL
possible_keys: ts_rc_id
key: NULL
key_len: NULL
ref: NULL
rows: 4
Extra:
2 rows in set (0.00 sec)
Whenever you do a join with a limit, MySQL gets the query plan wrong.
It scans the small table and filesorts the large table. You have to
use FORCE INDEX on the small table to suppress the scan. We've seen
this many times. It's very difficult to detect during code review and
frequently crashes the site.
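For the query above, the workaround looks like this (ts_rc_id being the
index listed under possible_keys):

select * from recentchanges
left join tag_summary force index (ts_rc_id) on ts_rc_id=rc_id
order by rc_timestamp desc limit 50;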
Does anyone know a DBMS where joining with limits actually works?
Because I'm sick of this crap.
-- Tim Starling
> On 1/4/09 6:20 AM, yegg at alum.mit.edu wrote:
>> The current enwiki database dump (http://download.wikimedia.org/enwiki/20081008/
>> ) has been crawling along since 10/15/2008.
> The current dump system is not sustainable on very large wikis and
> is being replaced. You'll hear about it when we have the new one in
> place. :)
> -- brion
Following up on this thread: http://lists.wikimedia.org/pipermail/wikitech-l/2009-January/040841.html
Brion,
Can you offer any general timeline estimates (weeks, months, half a
year)? Are there any alternatives for retrieving the article data
beyond directly crawling the site? I know this is verboten, but we are
in dire need of this data and don't know of any alternatives. The
current estimate of end of year is too long for us to wait.
Unfortunately, Wikipedia is a favored source for students to plagiarize
from, which makes out-of-date content a real issue.
Is there any way to help this process along? We can donate disk
drives, developer time, ...? There is another possibility that we
could offer, but I would need to talk with someone at the Wikimedia
Foundation offline. Is there anyone I could contact?
Thanks for any information and/or direction you can give.
Christian