Hi everyone,
I recently set up a MediaWiki (http://server.bluewatersys.com/w90n740/)
and I need to extract the content from it and convert it into LaTeX
syntax for printed documentation. I have googled for a suitable OSS
solution, but nothing suitable turned up.
I would prefer a script written in Python, but any recommendations
would be very welcome.
Do you know of anything suitable?
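In case no ready-made tool turns up, here is a minimal regex-based Python sketch of the conversion as a starting point; it handles only headings, bold, and italics, and real wikitext with templates would need a proper parser:

```python
import re

def wikitext_to_latex(text):
    """Convert a tiny subset of wikitext to LaTeX (headings, bold, italics)."""
    # === Sub-heading === -> \subsection{...} (checked before == to avoid clashes)
    text = re.sub(r"^===\s*(.*?)\s*===\s*$", r"\\subsection{\1}", text, flags=re.M)
    # == Heading == -> \section{...}
    text = re.sub(r"^==\s*(.*?)\s*==\s*$", r"\\section{\1}", text, flags=re.M)
    # '''bold''' -> \textbf{...} (before '' so the triple quotes win)
    text = re.sub(r"'''(.*?)'''", r"\\textbf{\1}", text)
    # ''italic'' -> \textit{...}
    text = re.sub(r"''(.*?)''", r"\\textit{\1}", text)
    return text

print(wikitext_to_latex("== Intro ==\nThis is '''important''' and ''subtle''."))
```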
Kind Regards,
Hugo Vincent,
Bluewater Systems.
I've been putting placeholder images on a lot of articles on en:wp.
e.g. [[Image:Replace this image male.svg]], which goes to
[[Wikipedia:Fromowner]], which asks people to upload an image if they
own one.
I know it's inspired people to add free content images to articles in
several cases. What I'm interested in is numbers. So what I'd need is
a list of edits where one of the SVGs that redirects to
[[Wikipedia:Fromowner]] is replaced with an image. (Checking which of
those are actually free images can come next.)
Is there a tolerably easy way to get this info from a dump? Any
Wikipedia statistics fans who think this'd be easy?
(If the placeholders do work, then it'd also be useful for convincing some
wikiprojects to encourage them. Not that there's ownership of
articles on en:wp, of *course* ...)
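As a starting point for mining a dump, here is a hedged Python sketch; the two placeholder filenames are assumptions and would need to be replaced with the full list of SVGs that redirect to [[Wikipedia:Fromowner]]:

```python
import re

# Assumed placeholder filenames -- substitute the real list of redirecting SVGs.
PLACEHOLDERS = {"Replace this image male.svg", "Replace this image female.svg"}
IMAGE_RE = re.compile(r"\[\[Image:([^|\]]+)", re.IGNORECASE)

def images_in(text):
    """Return the set of image filenames linked in a revision's wikitext."""
    return {name.strip() for name in IMAGE_RE.findall(text)}

def replacement_edits(revisions):
    """revisions: list of (rev_id, wikitext) pairs in chronological order.
    Yields (rev_id, new_images) where a placeholder disappeared and a new
    image appeared in the same edit."""
    for (_, prev_text), (cur_id, cur_text) in zip(revisions, revisions[1:]):
        before, after = images_in(prev_text), images_in(cur_text)
        removed_placeholders = (before - after) & PLACEHOLDERS
        added_images = after - before
        if removed_placeholders and added_images:
            yield cur_id, sorted(added_images)
```

Fed with per-page revision texts from a pages-meta-history dump, this would produce the list of candidate edits to check for genuinely free images.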
- d.
Hi all,
We now have the English Wikipedia fully migrated to the new servers and new
search backend. We cannot fully migrate other wikis until we resolve some
hardware issues. In the meantime, here is an overview of the new features
now deployed on en.wiki:
1) Did you mean... - we now have search suggestions. Care has been taken to
provide suggestions that are context-sensitive, i.e. on phrases, proper
names, etc.
2) fuzzy and wildcard queries - a word can be made fuzzy by adding ~ to its
end, e.g. the query sarah~ thompson~ will give all different spellings of,
and names similar to, sarah thompson. Wildcards can now be prefix and
suffix, e.g. *stan will give various countries in Central Asia.
3) prefix: - using this magic prefix, queries can be limited to pages
beginning with a certain prefix. E.g.
mwsuggest prefix:Wikipedia:Village Pump
will search all village pumps and archives for mwsuggest. This should be
especially useful for archive searching in concert with inputbox or
searchbox.
4) intitle: - using this magic prefix, queries can be limited to titles only.
5) generally improved quality of search results via usage of related
articles (based on co-occurrence of links), anchor text, text abstracts,
proximity within articles, sections, redirects, improved stemming, and so on.
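For scripting against the new backend, the same query syntax passes straight through the MediaWiki API's search module; below is a small sketch that only builds the request URL (the endpoint and parameter names are the standard Action API ones, so verify them against the target wiki before relying on this):

```python
import urllib.parse

API = "https://en.wikipedia.org/w/api.php"  # assumed endpoint

def build_search_url(query, limit=5):
    """Build a fulltext-search API request. Fuzzy (~), wildcard (*stan),
    prefix: and intitle: syntax all travel inside the srsearch parameter."""
    params = urllib.parse.urlencode({
        "action": "query",
        "list": "search",
        "srsearch": query,
        "srlimit": limit,
        "format": "json",
    })
    return f"{API}?{params}"

print(build_search_url("sarah~ thompson~"))
```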
Cheers, Robert
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
hello,
i have written a new extension to embed music scores in MediaWiki pages:
https://secure.wikimedia.org/wikipedia/mediawiki/wiki/Extension:ABC
unlike the Lilypond extension, this uses a simple input language (ABC) that is
much easier to validate for security. ABC is mostly used to transcribe Irish
trad and other simple tunes, but it recently gained support for more advanced
features, e.g. multiple staves and lyrics. this is supported in the extension
using the 'abcm2ps' tool.
unlike the existing ABC extension (AbcMusic), it doesn't support opening
arbitrary files as ABC input (which is a potential security issue), and has
several additional features:
- - The original ABC can be downloaded easily
- - The score can be downloaded as PDF, PostScript, MIDI or Ogg Vorbis
- - A media player can be embedded in the page to play the media file
i believe the ABC format is suitable for transcribing the majority of scores
currently on Wikimedia projects. although it can't handle all of them, it is
better than the current situation. plus, as ABC is simple, and existing ABC
scores are easily available, it's easier for novice users to contribute.
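As an illustration of why a constrained input language is easier to vet than arbitrary file input, here is a toy Python sanity check for ABC; the field list and the allowed note-body characters are deliberate simplifications for the sketch, not the extension's actual validator:

```python
import re

# Simplified whitelist of ABC information fields (X: index, T: title,
# M: meter, L: note length, K: key); real ABC defines many more.
FIELD_RE = re.compile(r"^[XTMLK]:.*$")
# Deliberately restrictive character class for tune-body lines -- an
# assumption for illustration, not the full ABC grammar.
BODY_RE = re.compile(r"^[A-Ga-gz0-9,'/\s|:\[\]()<>~^_=.-]*$")

def looks_like_safe_abc(text):
    """Accept only lines that are known header fields or plain tune bodies."""
    for line in text.splitlines():
        if not (FIELD_RE.match(line) or BODY_RE.match(line)):
            return False
    return True
```

A directive like `%%abc-include /etc/passwd` fails both patterns, which is exactly the class of input a server-side extension wants to reject.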
i would be interested to hear people's thoughts on enabling this extension on
Wikimedia.
- river.
-----BEGIN PGP SIGNATURE-----
iD8DBQFJBwL+IXd7fCuc5vIRAqG6AJ9RxKTGjJ7ywdZoesrTJWrMPtBYrACgjgDX
lIY552ilDFaVG1mLzqW1F/Y=
=7Tda
-----END PGP SIGNATURE-----
Can someone explain why Wikimedia Commons accepts uploads of
printable PDF documents (e.g. brochures) but not the editable
source version in Open Document Format (e.g. .odt)? This seems to
violate the open source principle.
This should be an FAQ, but it isn't obvious from
http://commons.wikimedia.org/wiki/Commons:File_types
--
Lars Aronsson (lars(a)aronsson.se)
Aronsson Datateknik - http://aronsson.se
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Special:Version displays SVN version numbers for extensions out of
$wgExtensionCredits, which seems to be done with $LastChangedRevision$
keywords in the extension's entry point file.
This produces massively incorrect numbers in many cases, since the entry
point file is relatively rarely changed in non-trivial extensions
consisting of multiple files. Updates to the body, class, i18n, and
other files are not reflected.
If we're running on a SVN checkout of the extension, we could check the
directory for its current revision much as we do for MediaWiki itself;
this would tell us for instance if an extension's subdirectory has been
updated separately from the core MediaWiki.
But if we aren't on a SVN checkout, or if individual files have been
updated to different versions, this may or may not tell us anything useful.
Anybody have a suggestion on how to best handle this?
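One possible heuristic, sketched in Python (not what MediaWiki currently does): scan every file of the extension for expanded $LastChangedRevision$ keywords and report the highest value found, which at least reflects updates to class and i18n files, not just the entry point:

```python
import os
import re

REV_RE = re.compile(r"\$LastChangedRevision:\s*(\d+)\s*\$")

def max_keyword_revision(directory):
    """Walk `directory`, collect every expanded $LastChangedRevision$
    keyword, and return the highest revision number, or None if absent."""
    best = None
    for root, _dirs, files in os.walk(directory):
        for name in files:
            try:
                with open(os.path.join(root, name), "r", errors="ignore") as fh:
                    for match in REV_RE.finditer(fh.read()):
                        rev = int(match.group(1))
                        if best is None or rev > best:
                            best = rev
            except OSError:
                continue  # unreadable file; skip it
    return best
```

This still fails the mixed-version case (files updated to different revisions individually), but it is strictly less wrong than trusting the entry-point file alone.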
- -- brion
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iEYEARECAAYFAklb6VcACgkQwRnhpk1wk44MNACg2c0ztpocjHfsb5l+KxSu8e+I
wXgAoMSrjFeTPzEnMY4904bxXZv+DiYf
=GNqG
-----END PGP SIGNATURE-----
On Wed, 03 Dec 2008 16:48:39 +0100, Roan Kattouw <roan.kattouw(a)home.nl> wrote:
>
> We had a pretty lengthy discussion about this before the summer, and the
> consensus seemed to be that a fulltext-based approach looked most
> viable. I actually wrote an extension that does that, and promised to
> release it soon; that was quite a few months ago, and I never got around
> to it. I'll release it properly when I have time, which will hopefully
> be before Christmas :D
>
> The code needs some tweaking and refactoring, though. It's pretty
> tightly integrated with the article text search (both functions in one
> form) and has all kinds of weird features, because the guy who paid me
> to write it wanted them. It also doesn't support three-letter word
> searching (which core does these days, using a prefix hack), which is
> pretty bad since categories with short titles (or stopword titles) won't
> be found either.
>
> Roan Kattouw (Catrope)
>
>
Hey Roan, does your code use a new table for the category search (with a
fulltext index), and do you have the hooks for maintaining that table? Do
you display the results on a new search results page, or did you hack
the existing one? Basically, I'm thinking that even if your stuff isn't
ready for prime time, you may have already done a lot of the heavy
lifting... can we get our hands on it?
Thanks!
Aerik
--
http://eventfeed.org - An Initiative Promoting Syndication of Events
http://www.wikidweb.com - the Wiki Directory of the Web
http://tagthis.info - Hosted Tagging for your website!
On 10 Oct 2008, at 21:22, Erik Moeller wrote:
> 2008/10/10 Derbeth <derbeth(a)wp.pl>:
>> I wonder about the legal aspects. In my opinion, when you create a
>> ready-to-print version, you have to attach the text of the GFDL
>> license to it - directly, not as a link. Like it is done in
>> http://en.wikibooks.org/wiki/Image:LaTeX.pdf.
As Erik wrote: This is already implemented (either a title of an
article or a URL to some license text can be set in
LocalSettings.php), but it's currently not configured.
>> Secondly, the current version of the tool commits plagiarism, because
>> it does not mention image authors and does not provide any means (like
>> making images clickable) to check who these authors are.
>
> Ouch, thanks for pointing that out. Tricky to do this automatically
> since it's all wiki-text with templates, but we'll investigate a
> solution here.
We'd highly appreciate input from the community regarding this topic!
The printed books from PediaPress contain a list of figures where the
license of each image is listed, together with the URL to the image
description page. As some kind of "hotfix" this solution could be
implemented in the PDF export of the Collection extension, too. But
this doesn't really solve the problem.
We think it's more of a technical/software thing, so I cross-posted
(and set Reply-To) to Wikitech-l.
In our opinion, license management/handling must be a core feature of
MediaWiki, because the software is explicitly developed for the
collaborative distribution of free content. Licenses of the contained
articles and images should not be represented via some agreed-upon
convention but via structured (and machine-readable) information,
available for each relevant object in the wiki.
Some information that would be desired:
- Full (official) name of the license(s).
- Whether the full text of the license has to be included or a
reference is sufficient.
- Reference to the full text of the license(s) (in some rigidly
defined format like wikitext).
- Whether attribution is required. If so: The list of required
attributions.
So, basically all the information that's required to check whether it's
possible to take some part of a MediaWiki site and use it somewhere else,
and all the information that has to be included in that other place.
This information could be made accessible via MediaWiki API, but
ideally it's contained in the wikitext and/or XHTML, too.
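As a concrete strawman for the structured information proposed above, here is a hypothetical Python record; every field name is invented for illustration, and nothing like this exists in MediaWiki core:

```python
from dataclasses import dataclass, field

@dataclass
class LicenseInfo:
    """Hypothetical machine-readable license record for a wiki object."""
    name: str                # full official name of the license
    full_text_required: bool # must the full text travel with any reuse?
    text_reference: str      # where the full text lives (URL or wiki page)
    attributions: list = field(default_factory=list)  # required attributions

    def reuse_requirements(self):
        """Summarize what a re-user must do to comply."""
        parts = [f"License: {self.name}"]
        parts.append("include full license text" if self.full_text_required
                     else f"reference license text at {self.text_reference}")
        if self.attributions:
            parts.append("attribute: " + ", ".join(self.attributions))
        return "; ".join(parts)
```

Exposed per page and per image (e.g. via the API), a record like this would let tools like the Collection extension assemble license sections mechanically instead of scraping template conventions.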
All this could be handled via microformats, even inside of templates,
but the main point is that any kind of new technique has to be
enforced, ideally via the MediaWiki software itself: in the Commons wikis
there are some conventions that can be used programmatically by people and
companies like us (although we have to work with hacks and workarounds),
but oftentimes, in wikis with smaller communities, this information doesn't
exist at all.
-- Johannes Beigel
[Breaking this thread off...]
On 12/28/08 1:32 AM, Niklas Laxström wrote:
> The anchors of non-latin headers are already (latin) gibberish:
> #.D0.A4.D0.B8.D0.BB.D1.8C.D0.BC.D0.BE.D0.B3.D1.80.D0.B0.D1.84.D0.B8.D1.8F
>
> It doesn't seem reasonable to think that people could create anchors
> in their head from text, except in special cases.
If we're going to stick with strict ASCII-limited anchors, it might be
worth considering making them more legible, say with transliteration to
ASCII Latin chars. :P
On the other hand, XHTML *doesn't* actually limit us this way!
The XHTML 1.0 recommendation of restriction to [A-Za-z][A-Za-z0-9:_.-]*
is for compatibility with HTML 4.0, which defines:
ID and NAME tokens must begin with a letter ([A-Za-z]) and may be
followed by any number of letters, digits ([0-9]), hyphens ("-"),
underscores ("_"), colons (":"), and periods (".").
XHTML specifies ID and NMTOKEN types here, which are *not* restricted
to ASCII, but rather allow a large number of scripts:
http://www.w3.org/TR/2000/WD-xml-2e-20000814#NT-NameChar
http://www.w3.org/TR/2000/WD-xml-2e-20000814#NT-Letter
http://www.w3.org/TR/2000/WD-xml-2e-20000814#NT-Digit
http://www.w3.org/TR/2000/WD-xml-2e-20000814#NT-Extender
If there are no major browser compatibility problems, I would probably
recommend we roll back the nasty old .XX encoding for HTML 4
compatibility, in which case we could quite legally produce something
direct, such as:
http://ru.wikipedia.org/wiki/Уплисцихе#Уплисцихе_в_средневековье
which URL-encodes out to:
http://ru.wikipedia.org/wiki/%D0%A3%D0%BF%D0%BB%D0%B8%D1%81%D1%86%D0%B8%D1%…
(which can be nicely displayed as pretty Unicode in the URL bar of
modern browsers)
as opposed to the current:
http://ru.wikipedia.org/wiki/%D0%A3%D0%BF%D0%BB%D0%B8%D1%81%D1%86%D0%B8%D1%…
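The old dot-escaped anchors can be reproduced (approximately) by percent-encoding the UTF-8 bytes and swapping '%' for '.'; the sketch below is a simplification of MediaWiki's actual escaping code, shown next to the proposed direct form:

```python
import urllib.parse

def legacy_anchor(heading):
    """Approximate MediaWiki's old HTML-4-safe anchors: underscore the
    spaces, percent-encode the UTF-8 bytes, then turn '%' into '.'.
    A simplification of the real escaping, for illustration only."""
    underscored = heading.replace(" ", "_")
    return urllib.parse.quote(underscored, safe="").replace("%", ".")

def direct_anchor(heading):
    """The proposed direct form: keep the Unicode text; browsers
    percent-encode it only on the wire."""
    return heading.replace(" ", "_")

print(legacy_anchor("Фильмография"))
# → .D0.A4.D0.B8.D0.BB.D1.8C.D0.BC.D0.BE.D0.B3.D1.80.D0.B0.D1.84.D0.B8.D1.8F
```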
-- brion
This is a proposal to change how the MediaWiki software displays
subpages. This isn't really an issue on Wikipedia, because
subpages in the main namespace are disabled there, but using subpages at
Wikisource is the standard way of dividing up works, leaving only a
table of contents at the root article.
The problem is that this results in page titles like this:
United States Code/Title 35/Chapter 14/Section 151
or even
Nicene and Post-Nicene Fathers: Series II/Volume I/Constantine/The
Life of Constantine/Book II/Chapter 23
which IMHO looks more like a file system than a user-friendly
website. I would suggest
United States Code » Title 35 » Chapter 14 » Section 151
or my own favourite
United States Code » Title 35 » Chapter 14 »
Section 151
(With "Section 151" in bigger font.)
This would effectively involve moving the subpages div above the
title and changing the title from the entire path to just the subpage
name.
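The proposed display amounts to splitting the path once, sketched here in Python (purely illustrative, not existing MediaWiki behaviour):

```python
def breadcrumb_title(full_title, separator=" » "):
    """Split a subpage path into a breadcrumb trail and a short page title.
    A sketch of the proposed display, not how MediaWiki renders titles."""
    parts = full_title.split("/")
    return separator.join(parts[:-1]), parts[-1]

trail, title = breadcrumb_title("United States Code/Title 35/Chapter 14/Section 151")
print(trail)   # United States Code » Title 35 » Chapter 14
print(title)   # Section 151
```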
See:
http://en.wikisource.org/wiki/Wikisource:Scriptorium#Subpage_formatting
for the discussion I started on Wikisource and further down the same
page
http://en.wikisource.org/wiki/Wikisource:Scriptorium#Subpage_formatting:_some_more_examples
for some formatted versions of the above examples
and
http://en.wikisource.org/wiki/Nicene_and_Post-Nicene_Fathers:_Series_II/Volume_I/Constantine/The_Life_of_Constantine/Book_II/Chapter_23
and
http://en.wikisource.org/wiki/Treaty_on_European_Union/Protocol_on_the_convergence_criteria_referred_to_in_Article_109j_of_the_Treaty_establishing_the_European_Community
for examples of how ugly the current setup can be.
Michael