> How about using another algorithm that does this already?
Thanks to those who answered my mail. The thing is that I really
really want my program to work natively with the dump files from
Wikipedia, both because it would take me days to recompress 1.5 GB
files and because I would have no capacity to host the results. I am
not just making this program to generate one offline CD for one
language, but for it to be a tool that works with all dump files
(right now I have the dump files of 9 smaller languages in one
directory, all the interwiki links work, etc. - it's just slow)...
Therefore, if the people running the dumps would consider changing the
format to something that is easier to access randomly, I would be all
for it. Admittedly, I don't know the pros and cons that made them
choose 7zip in the first place. However, I don't even know where to
start a discussion with them, and I imagine such a decision would take
a long time to implement. Thus I figured trying to tweak 7zip would
probably be a much faster way. :)
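To illustrate the kind of format I have in mind (this is just a rough
sketch of a hypothetical block-compressed layout, not something the
current dumps support): if each article, or small group of articles,
were compressed as its own independent bzip2 block, and a separate
index mapped titles to byte offsets, a reader could seek straight to
the right block and decompress only that, instead of the whole 1.5 GB
file. Something along these lines:

import bz2

def write_blocks(articles, dump_path, index_path):
    # articles: iterable of (title, text) pairs. Each article becomes its
    # own bzip2 stream; the index records where each stream starts.
    # (Titles with tabs or newlines are ignored for this sketch.)
    index = {}
    with open(dump_path, "wb") as dump:
        for title, text in articles:
            index[title] = dump.tell()
            dump.write(bz2.compress(text.encode("utf-8")))
    with open(index_path, "w", encoding="utf-8") as idx:
        for title, offset in index.items():
            idx.write("%s\t%d\n" % (title, offset))

def read_article(title, dump_path, index_path):
    # Look up the offset, seek to it, and decompress just that one
    # bzip2 stream; BZ2Decompressor stops at the end of the stream.
    offsets = {}
    with open(index_path, encoding="utf-8") as idx:
        for line in idx:
            name, off = line.rstrip("\n").split("\t")
            offsets[name] = int(off)
    decomp = bz2.BZ2Decompressor()
    chunks = []
    with open(dump_path, "rb") as dump:
        dump.seek(offsets[title])
        while not decomp.eof:
            data = dump.read(64 * 1024)
            if not data:
                break
            chunks.append(decomp.decompress(data))
    return b"".join(chunks).decode("utf-8")

Grouping, say, a hundred articles per block would keep the index small
while still making lookups cheap; the point is just that the
compression boundaries have to be known up front.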
If you have any pointers on getting in touch with the dump people, or
if some of them are hanging around here, I'd love to have a discussion
with them, both about the dump format itself and about some other
technical details of how they prepare the material. (I also tried
pointing out on a talk page about a week ago that the static download
page says the dumps for December are currently in progress and links
to November, even though according to the log all the December dumps
are done, and you can download them if you type in the URL manually.
That was a week ago, so apparently that talk page wasn't a good way of
getting in touch with them.)
(I would also love to have several HTML dumps - one with only the
article pages. Currently there is only one, and it includes all pages,
even the image detail pages.) Let me say, though, that they've done an
awesome job, and there are some really neat decisions in how the
static dumps are put together.
Thanks a lot
Stian