> If anyone is interested, I have a rudimentary Perl script that is
> capable of reading the downloadable SQL dump and outputting all the
> articles as separate files in a number of alphabetical directories.
> It's not very fast, but it works.
> What's missing from the script: wikimarkup -> HTML conversion,
Mr David A. Wheeler,
Have you seen my Perl script for converting the SQL dump to a
TomeRaider database? You might find useful code there.
It renders all pages in HTML, checks all hyperlinks, and unlinks half a
million orphaned ones. It edits the wiki code to remove redundant tags,
fixes some badly coded HTML tables, and adds stats and a
language-specific introduction. It replaces HTML tags with extended
ASCII (which saves a lot of space). It resolves redirects, making
hyperlinks point directly to the proper article. It removes tables that
contained only an image (plus possibly a single footer text).
In fact, I think the script could be extended to generate separate HTML
pages in a few hours, Plucker specifics not taken into account.
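For example, the redirect-resolution step might look roughly like this.
A minimal sketch in Python (the actual script is Perl, and all names
here are illustrative, not taken from WikiToTome.pl):

import re

REDIRECT_RE = re.compile(r'#REDIRECT\s*\[\[([^\]|#]+)', re.IGNORECASE)

def build_redirect_map(articles):
    # articles: dict mapping title -> raw wikitext
    redirects = {}
    for title, text in articles.items():
        m = REDIRECT_RE.match(text)
        if m:
            redirects[title] = m.group(1).strip()
    return redirects

def resolve(title, redirects, max_hops=5):
    # Follow redirect chains, guarding against loops.
    seen = set()
    while title in redirects and title not in seen and max_hops > 0:
        seen.add(title)
        title = redirects[title]
        max_hops -= 1
    return title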
Script: http://members.chello.nl/epzachte/Wikipedia/WikiToTome.pl
More info: http://members.chello.nl/epzachte/Wikipedia
Erik Zachte
Modified Files:
includes/SearchEngine.php
languages/Language.php
languages/LanguageDe.php
Modified the search to display a link for creating a new page when 'Go'
finds no article. To enable this, you have to modify "nogomatch" as
demonstrated in LanguageDe.php. The "showingresultnum" texts are
modified; the limit is removed now.
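To illustrate the idea (a conceptual Python sketch, not the actual PHP
Language classes; the message wording below is invented): each language
class overrides entries in the base message table, and "nogomatch" gets
a $1 placeholder for the queried title so it can offer a creation link.

# Conceptual sketch only; MediaWiki does this in PHP via the Language /
# LanguageDe classes. The message wording here is invented.
base_messages = {
    "nogomatch": "No page with this exact title exists.",
}
de_messages = {
    # Overridden to offer a link for creating the missing page; $1 is
    # replaced with the queried title. (German: "No article '$1'
    # exists. You can create it here: [[$1]].")
    "nogomatch": 'Es existiert kein Artikel "$1". '
                 'Du kannst ihn hier neu anlegen: [[$1]].',
}

def get_message(key, lang):
    table = de_messages if lang == "de" else {}
    return table.get(key, base_messages[key])

print(get_message("nogomatch", "de").replace("$1", "Nase"))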
Brion, take a look and see whether it's now more to your liking. I
still left $3 in, in case anyone insists on displaying the $limit value
($1). It's all tested, at home of course, using "Nase" (nose) as the
query.
If you think it's OK, please install it on the German Wikipedia;
LanguageDe.php contains further changes independent of the
SearchEngine.
Thank you in advance.
--
Smurf
smurf(a)AdamAnt.mud.de
------------------------- Anthill inside! ---------------------------
Just another DB-Error:
cur_id:          5144
cur_namespace:   0
cur_title:       Tacitus_(Kaiser)
                 (http://de.wikipedia.org/wiki/Tacitus_%28Kaiser%29)
cur_text:        #REDIRECT [[Marcus Claudius Tacitus]]
cur_is_redirect: 0
It should be "cur_is_redirect = 1", or did I miss something again?
How can this be solved (other than with "UPDATE cur SET
cur_is_redirect=1 WHERE cur_id = 5144")?
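A hedged sketch of a batch fix in Python (connection parameters are
placeholders, and I'm assuming direct database access via the
MySQL-python bindings): find every row whose text starts with #REDIRECT
but whose flag is unset, then set the flag in one pass.

import MySQLdb  # assumes the MySQL-python bindings are installed

db = MySQLdb.connect(host="localhost", user="wikiuser",
                     passwd="secret", db="wikidb")
c = db.cursor()
# List the mis-flagged rows first, for review.
c.execute("SELECT cur_id, cur_title FROM cur "
          "WHERE cur_text LIKE '#REDIRECT%' AND cur_is_redirect = 0")
for cur_id, cur_title in c.fetchall():
    print(cur_id, cur_title)
# Then fix them all at once instead of one UPDATE per cur_id.
c.execute("UPDATE cur SET cur_is_redirect = 1 "
          "WHERE cur_text LIKE '#REDIRECT%' AND cur_is_redirect = 0")
db.commit()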
--
Smurf
smurf(a)AdamAnt.mud.de
------------------------- Anthill inside! ---------------------------
On Monday, 19 May 2003 at 23:24, Thomas Corell wrote:
> Can someone check whether the English translation of
> ''showingresultsnum'' is proper English? I tried, but that doesn't
> mean much ;)
"Showing below <b>$3</b> results using the respective limit of <b>$1</b>
starting with #<b>$2</b>."
Whoa... :) I'd prefer something simpler:
"Showing up to <b>$1</b> results starting with #<b>$2</b>."
Or better yet, just give the actual number of results and let the chunk
size be shown by the "next X" / "prev X" links.
-- brion vibber (brion @ pobox.com)
> On Tuesday, 20 May 2003 at 03:53, Alfio Puglisi wrote:
> > I just subscribed (I'm the Wikipedia user At18) to ask about the
> > automatic HTML dump function. ...
>
> > If anyone is interested, I have a rudimentary Perl script that is
> > capable of reading the downloadable SQL dump and outputting all the
> > articles as separate files in a number of alphabetical directories.
> > It's not very fast, but it works.
That's great!
> > What's missing from the script: wikimarkup -> HTML conversion,
You should be able to call the existing PHP code that generates
HTML to do this.
A tool that generated the entire Wikipedia, in static HTML format,
would make it trivial to generate the "Plucker" format
for Palm PDAs. Plucker is an offline web browser for Palm PDAs;
it's open source software/Free Software (OSS/FS) released under the
GPL.
It can handle HTML as well as PNG, GIF, JPEG, plain text, and a few
others; HTML is usually rendered as you'd expect (hypertext, italics,
bold, font size changes, lists, and indenting all work).
It'd be very nice if the Wikipedia were available in Plucker format;
that would mean that an OSS/FS reader could be used to view the text
on a Palm PDA.
Plucker is available at: "http://www.plkr.org".
I have a Palm, and it is the MOST important program I use by far.
One minor problem is that Plucker doesn't have an index
facility. That could be solved by creating HTML pages that link to
sorted articles, e.g., "Master Index" could list "A, B, C...";
clicking on "A" would reach "Index A" which would list "AA, AB, AC...".
Then, modify the static version of the main page so you could quickly
jump to the master index.
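A minimal sketch of that two-level index in Python, assuming the static
dump is already a mapping from article titles to HTML filenames (the
file names here are made up):

from collections import defaultdict

def write_indexes(titles_to_files):
    # Group article titles by their first letter.
    by_letter = defaultdict(list)
    for title, fname in sorted(titles_to_files.items()):
        by_letter[title[0].upper()].append((title, fname))

    # Master index: one link per letter.
    with open("index_master.html", "w") as f:
        f.write("<html><body><h1>Master Index</h1>\n")
        for letter in sorted(by_letter):
            f.write(f'<a href="index_{letter}.html">{letter}</a>\n')
        f.write("</body></html>\n")

    # One page per letter, linking to the sorted articles.
    for letter, entries in by_letter.items():
        with open(f"index_{letter}.html", "w") as f:
            f.write(f"<html><body><h1>Index {letter}</h1>\n")
            for title, fname in entries:
                f.write(f'<a href="{fname}">{title}</a><br>\n')
            f.write("</body></html>\n")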
Internally, Plucker will break long pages (>32K) into multiple pages
with front and back links, but that'll be automatic and won't affect
anything.
I don't know of an automatic way to download the Wikipedia
images (which, in my mind, is a serious problem). Hopefully there
will soon be a way to download the images other than trawling.
However, for a Palm you'd have to drop the images in general anyway,
so for that particular use it wouldn't matter.
Just for kicks, I've added some preliminary, experimental support for
gzip encoding of pages that have been saved in the file cache. If
$wgUseGzip is not enabled in LocalSettings, it shouldn't have any
effect; if it is, it'll make compressed copies of cached files and then
serve them if the client claims to accept gzip.
At present this only affects file-cachable pages: so plain current page
views by not-logged-in users. Compression is only done when generating
the cached file, so it oughtn't to drain CPU resources too much. My
informal testing shows the gzipping takes about 2-3 ms, which is much
shorter than most of the page generation steps. (Though it will eat up
some additional disk space, as both uncompressed and compressed copies
are kept on disk.)
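To sketch the mechanism (hedged; the real code is PHP inside
MediaWiki's file cache, so this Python is purely illustrative):
compress once when the cache file is written, and pick the .gz twin at
serve time only if the client advertises gzip support.

import gzip, os

def save_cached(path, html):
    # Write the plain copy, then a compressed twin. Compression happens
    # once here, at generation time, not on every request.
    with open(path, "w") as f:
        f.write(html)
    with gzip.open(path + ".gz", "wt") as f:
        f.write(html)

def serve_cached(path, accept_encoding):
    # Serve the .gz copy only if the client claims gzip support.
    if "gzip" in accept_encoding and os.path.exists(path + ".gz"):
        return open(path + ".gz", "rb").read(), "gzip"
    return open(path, "rb").read(), None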
I'd appreciate some testing with various user agents to see if things
are working. If you receive a compressed page, there'll be a comment at
the end of the page like <!-- Cached/compressed [timestamp] -->
A few notes:
This needs zlib support compiled into PHP to work. I've done this on
Larousse.
An on-the-fly compression filter could also be turned on for dynamic
pages and logged-in users, but I haven't done this yet. Compression
support could be a user-selectable option, so those with problem
browsers could turn it off, or those with slow modems could turn it on
where it's off by default. :)
The purpose of all this is of course to save bandwidth; there are two
ends of this, the server and the client:
Jimbo has pooh-poohed concerns about our bandwidth usage; certainly the
server has a nice fat pipe to the internet and isn't in danger of
choking, and whatever Bomis's overall bandwidth usage, Jimbo hasn't
complained that we're crowding out his legitimate business. :) But
still, we're looking at 5-20 *gigabytes* *per day*. A fair chunk of
that is probably images and archive dumps, but a lot is text.
On the client end: schmucks with dial-up may appreciate a little
compression. :)
I've also fixed what seems to be a conflict between the page cache and
client-side caching.
There are some race conditions remaining as far as making sure that two
loads of the same page don't overwrite each other's work or read
another's page partway through, and adding a gzipped second file
perhaps complicates this a bit... and there are still some cases where
caches aren't invalidated properly.
-- brion vibber (brion @ pobox.com)
Hello,
I just subscribed (I'm the Wikipedia user At18) to ask about the
automatic HTML dump function. I see from the database page that it's
"in development".
If anyone is interested, I have a rudimentary Perl script that is
capable of reading the downloadable SQL dump and outputting all the
articles as separate files in a number of alphabetical directories.
It's not very fast, but it works.
What's missing from the script: wikimarkup -> HTML conversion, some
intelligence to autodetect redirects, dealing with images, and so on. I
don't know if someone is in charge of this function. If so, I can post
the script. Otherwise, I can develop it further myself, given some
directions.
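A rough Python sketch of the splitting step, assuming the articles have
already been pulled out of the SQL dump as (title, text) pairs (the
directory scheme and names are illustrative; the real script is Perl):

import os

def dump_articles(articles, outdir="dump"):
    for title, text in articles:
        # One subdirectory per leading letter; everything else goes
        # under "_".
        letter = title[0].upper() if title[:1].isalpha() else "_"
        subdir = os.path.join(outdir, letter)
        os.makedirs(subdir, exist_ok=True)
        safe = title.replace("/", "_")  # keep titles filesystem-safe
        with open(os.path.join(subdir, safe + ".txt"), "w") as f:
            f.write(text)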
Alfio
Hi,
I am currently working on a small Python library that allows reading
and writing articles. Among its features is (semi-)automatic detection
of copyright violations.
When an article is written by the script, I noticed that the IP address
of my machine shows up in the log. I would prefer my username to show
up, so that everyone can contact me easily. Hence my question: what
parameters do I have to send to /w/wiki.phtml in order to appear with
my username?
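My best guess so far, sketched in Python (hedged: the "wp..." field
names are read off the login form and may be wrong), is that the edit
has to carry the session cookies from a prior login:

import urllib.request, urllib.parse, http.cookiejar

# Keep cookies across requests so the login session survives.
jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(
    urllib.request.HTTPCookieProcessor(jar))

# Guessed form fields for the login page; they may well differ.
login = urllib.parse.urlencode({
    "wpName": "MyUserName",
    "wpPassword": "secret",
    "wpLoginattempt": "Log in",
}).encode()
opener.open("http://www.wikipedia.org/w/wiki.phtml"
            "?title=Special:Userlogin&action=submit", login)

# Any subsequent edit POST made through `opener` now carries the
# session cookies, so it should be attributed to the account.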
Any help with this would be very much appreciated,
thanks in advance,
Marco
P.S. I'm not sure this is the right list to ask: could someone please
obscure the email addresses in the list archives? The problem is that
when replying, many people quote the full email address, which then
shows up in the archive; see
http://mail.wikipedia.org/pipermail/wikipedia-l/2003-April/009828.html
for an example. I already have enough spam in my inbox :-(
> Ok, cool!
> It's not urgent... anyhow, the misspelling page doesn't work well on
> the French Wikipedia ;o)
> It doesn't check accents, even though accent errors (along with
> double consonants) are the most common misspellings in French.
> Can you add the misspelling function to "class Language" so that we
> can make a different version for "class LanguageFr"?
> Regards,
>
> Aoineko
Sorry, I just think there is perhaps an easier way.
I don't know exactly what this is used for:
"linktrail" => "/^([a-zàâçéèêîôû]+)(.*)\$/sD",
But perhaps we can use a different filter (is there one?) for the
misspelling page.
For the search page -> we want to find pages even if we forget an
accent.
For the misspelling page -> we want to find pages with accent
misspellings.
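For illustration, here is my understanding of the linktrail pattern as
a Python sketch (hedged; the real code is PHP): letters matching the
pattern right after a closing ]] are absorbed into the displayed link
text, so "[[jour]]s" renders as one link reading "jours".

import re

LINKTRAIL = re.compile(r'^([a-zàâçéèêîôû]+)(.*)$', re.S)

def absorb_trail(link_text, after):
    # Pull leading trail letters off the text that follows a link.
    m = LINKTRAIL.match(after)
    if m:
        return link_text + m.group(1), m.group(2)
    return link_text, after

print(absorb_trail("jour", "s restants"))  # -> ('jours', ' restants')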
Another question:
I have developer status on CVS. If I finally succeed in connecting to
CVS from my office, I will be able to make my own updates. Whom should
I ask before doing any update? This list? Where can I find the "todo
list"? I'm a video-game programmer, so I'm more efficient at hard-core
coding than at database programming.
Regards,
Aoineko
I've changed the string for the ISBN replacement page from "WIKI-ISBN"
to "MAGICNUMBER" to avoid triggering the "ISBN .." parser itself on
that page. Otherwise you get into trouble when you try something like
"ISBN WIKI-ISBN".
Regards,
Erik