Hi,
a new parser for internal links [[...]] and ''quotes''
has been committed to cvs HEAD.
Using the new parser, image thumbnail captions can
have links via [[Image:bla.jpg|thumb|A big [[bla]] I've photographed]].
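Conceptually, nested links can be handled by resolving the innermost [[...]]
first. A toy sketch of that idea in Python (this is only an illustration of
the nesting behaviour, not the actual committed parser code):

    import re

    # Toy sketch of the nesting idea only -- not the committed parser.
    # Innermost [[...]] links are resolved first, so a thumbnail caption
    # like [[Image:bla.jpg|thumb|A big [[bla]] I've photographed]] gets
    # its inner link rendered before the outer image link is handled.
    INNER_LINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")

    def render_links(text, make_link):
        # Repeat until no [[...]] remains; the pattern cannot match
        # across another bracket, so innermost links always match first.
        while True:
            new = INNER_LINK.sub(
                lambda m: make_link(m.group(1), m.group(2) or m.group(1)),
                text)
            if new == text:
                return new
            text = new

    # Example, rendering to plain anchors (the real parser of course
    # treats Image: links specially):
    print(render_links(
        "[[Image:bla.jpg|thumb|A big [[bla]] I've photographed]]",
        lambda target, label: '<a href="/wiki/%s">%s</a>' % (target, label)))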
The code has broken the prefixed links used by ar: (Al[[razi]]-style
links). I'll have to fix this tomorrow.
A test page is available at
http://jeluf.mormo.org/testwiki/wiki.phtml?title=London
Regards,
JeLuF
It is wrong to support old browsers, because it makes Wikipedia maintenance more crufty and
difficult.
There are three main cases where someone has an old browser.
They are in a first-world environment and:
1) are completely clueless, or
2) are part of the digital divide (e.g. old, and hence somewhat clueless);
or
3) they are in the third world.
The majority of people in the first world are not those we need to care about whatsoever, as it's
very easy for them to seek out help (remember, Linux is free: get a live CD, boot it, run
Wikipedia...).
Third world however, is the most important area to focus on.
Now, the ONLY reason these people are running an old browser is that they are running Windows.
Thus, we should tell them to run Linux. It's the ONLY viable solution for running an up-to-date OS
on legacy hardware.
Now, if they have an internet connection, I don't care where they are, they are within physical
distance of someone with more of a brain than they have (namely the ISP staff). One of
these people will be able to install Linux.
You see, these third-world users who are not able to figure out how to run something other
than NS3/IE 2.0 were somehow able to figure out how to get a Windows box. Thus, we know that
they aren't incapable of doing... things.
Thus, if they breathe and are poor, we advise them to run Linux. It's better for us, and it will
domino into being better for them, and then us... and so forth.
This really isn't worth discussion: rectify quietly and efficiently.
hey all,
I wanted to run my own wiki, and have a couple of questions that I would greatly
appreciate having answered (or getting pointers in the right direction).
I want to make a wiki, but I want to enforce some constraints on it.
First, I want the wiki to be in a generic hierarchy such that the hierarchy
follows certain administrator-defined rules, i.e.:
a) it's a hierarchy with three levels;
b) linking between levels is limited to certain places inside the web page.
Ultimately, what I'm looking for is an interface between wiki and CVS - I'd
like the pages to be gathered from mysql and dumped into source control in
corresponding directories (this would fit my application very well).
And, ultimately to go the other way, to take stuff out of CVS and put it
onto the wiki.
I wouldn't mind writing the glue code to do this.. but I'd rather not write
an entire new wiki in order to do it.. ;-(
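Roughly the kind of export glue I have in mind, assuming a MediaWiki-style
'cur' table; the connection details are placeholders, and the table/column
names would need adjusting for whatever wiki engine is actually used:

    import os
    import MySQLdb  # the MySQL-python / mysqlclient module

    # Placeholder connection details; cur_* names assume a
    # MediaWiki-style schema.
    conn = MySQLdb.connect(host="localhost", user="wikiuser",
                           passwd="secret", db="wikidb")
    db = conn.cursor()
    db.execute("SELECT cur_namespace, cur_title, cur_text FROM cur")

    for namespace, title, text in db.fetchall():
        # Map the page title onto a directory path so the checkout
        # mirrors the intended hierarchy ("Level1/Level2/Page" -> dirs).
        path = os.path.join("wiki-export", str(namespace), *title.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path + ".wiki", "w") as f:
            f.write(text)

    # The resulting tree could then be committed with ordinary
    # "cvs import" / "cvs add" + "cvs commit"; for the reverse direction,
    # a similar script would read the files back and UPDATE cur_text.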
In other words, I'm hoping that this isn't a wheel that I have to reinvent. I'm
really interested in using a wiki as a collaborative tool, but I'm not sure if it
fits the more rigorous model that I have in mind. If it doesn't, what do
people suggest as alternatives?
Thanks much,
Ed
(
ps - what's the difference between wiki and its various clones, particularly
PhpWiki? Is there a wiki coded in Perl?
)
Jimmy Wales wrote:
> To compare Wikipedia to Columbia Encyclopedia...
> http://www.encyclopedia.com/
> has the full text of Columbia.
>
> There are pages for alphabetic browsing.
> http://www.encyclopedia.com/browse/browse-Aa.asp
>
> From these pages, it should be possible to get a list of all their
> article titles.
>
> These could be matched up against Wikipedia article titles.
Well, matching them up doesn't prove very easy. For example, what they
call "Abdül Aziz" is on Wikipedia called "Abd-ul-Aziz".
I have used the following heuristics to match up articles:
- redirects (obviously)
- names in the other order ("Thomas Jefferson" rather than
"Jefferson, Thomas")
- middle names deleted
The latter two are already somewhat error-prone (though I haven't
spotted such an error yet).
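In code, the idea is roughly the following (a simplified sketch, not the
exact script I ran):

    # Generate candidate variants of each encyclopedia.com title and look
    # them up against the set of Wikipedia titles, following redirects
    # where one exists.
    def title_variants(title):
        variants = [title]
        if ", " in title:
            # "Jefferson, Thomas" -> "Thomas Jefferson"
            last, first = title.split(", ", 1)
            variants.append("%s %s" % (first, last))
        for v in list(variants):
            parts = v.split()
            if len(parts) > 2:
                # drop middle names: "James Knox Polk" -> "James Polk"
                variants.append("%s %s" % (parts[0], parts[-1]))
        return variants

    def find_match(title, wikipedia_titles, redirects):
        for v in title_variants(title):
            target = redirects.get(v, v)   # follow a redirect if any
            if target in wikipedia_titles:
                return target
        return None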
With just these, I was able to match up 24003 article titles from
encyclopedia.com with articles on Wikipedia. 25101 other article titles
did not yield Wikipedia equivalents (although many of them have one;
e.g. Aziz as mentioned above). A number of other titles (silly me forgot
to output their number) led to the same Wikipedia article; for example,
"Aachen" and "Aix-la-Chapelle" were listed seperately on theirs, but of
course they're the same thing.
The 24003 Wikipedia articles I could match up amount to 79979774 bytes
(almost 80 MB).
However, unfortunately I also had to find that some of them are
disambiguation pages; for example, where encyclopedia.com has a one and
only "Adalbert, Saint", Wikipedia's [[Saint Adalbert]] disambiguates to
[[Adalbert of Prague]] and [[Adalbert of Magdeburg]].
So, clearly, this isn't quite as easy. But anyway. Here is the complete
report:
(**WARNING!** 5.3 MB file! Very slow server! Better let it
download, have dinner, and then view locally!)
http://lionking.org/~timwi/t/wikipedia/comparison.html
Greetings,
Timwi
P.S.: More fun projects? ;-)
Timwi is, of course, completely accurate in all his statements. If I
were a college professor, I would award him A+ and try to get him a job
as a teaching fellow! :-)
My worst error was describing n log n as "log n" -- back to algebra
class for Uncle Ed!
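Just to make the distinction concrete (numbers picked arbitrarily):

    import math

    # With an arbitrary n: log n is the cost of one binary-search lookup,
    # n log n is the cost of sorting the whole list in the first place.
    n = 50000
    print(math.log(n, 2))      # ~15.6
    print(n * math.log(n, 2))  # ~780,000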
Anyway, since there are lots of sharp minds ready to pounce on any bugs,
why don't we start taking an organized look at the database structure?
I'm clearly no expert on /devising/ sort algorithms, but I'm fairly good
at recognizing whether someone has come up with a good idea.
Who wants to work on database structure with me -- or, at least, is
willing to let me sit in and watch?!
Ed Poor, aka Uncle Ed
I really wish I hadn't "upgraded" to this recommended version. But I
won't go into all the issues here.
One question: the status of links (broken vs. existing) NO LONGER UPDATES on
any wiki pages. Does anyone know why this is? I don't ever plan on
implementing memcache (the lack of real documentation as to how I can do
it being the primary reason)... is this an issue for future wikipedia
releases? Is memcache (or lack thereof) ruining the Wiki's ability to
update broken/not broken links?
I have memcache turned off, and the MediaWiki namespace turned off as well.
Thanks for answers.
ciaran
Hi,
this isn't strictly on-topic, because it's not specific to Wikipedia,
but you can probably help me with this.
1) When I load the downloaded SQL dump into MySQL, does it matter if I
have already created the indices for the table, or is this detrimental?
2) If the answer to that is "It is detrimental", then: How do I remove
those indexes? Apparently even if I delete the entire database and
re-create it with just the 'cur' table, magically the indexes are still
there.
Thanks,
Timwi
Jimmy Wales wrote:
> From these pages, it should be possible to get a list of all their
> article titles.
>
> These could be matched up against Wikipedia article titles.
>
> Then we could ask the hypothetical: suppose Wikipedia just snagged the
> same 55,000 topics as Columbia? How big would the resulting text be?
I'm taking it!
Just today I've downloaded the en.wikipedia.org database dump. I don't
have a very fast machine, so it took some time to decompress, and it's
still busy importing it into the DB. Does anyone know approximately how
long that takes? (Since it doesn't show any progress meter or anything)
But once that is done, the Perl script will be easy.
Timwi
Well, I actually snagged it some 1.5 years ago already.
It is a 50 MB TomeRaider file on my Pocket PC :)
Alas, it is not public domain, so I did not publish it.
For comparison, the en: TomeRaider file, most recent edition (Dec), is 185 MB (TR uses
internal compression).
The Columbia download was 1 GB, but that was mostly HTML.
Comparing on an article-by-article basis will be different.
Titles and organization of topics will differ.
Erik Zachte
To compare Wikipedia to Columbia Encyclopedia...
http://www.encyclopedia.com/
has the full text of Columbia.
There are pages for alphabetic browsing.
http://www.encyclopedia.com/browse/browse-Aa.asp
From these pages, it should be possible to get a list of all their
article titles.
These could be matched up against Wikipedia article titles.
Then we could ask the hypothetical: suppose Wikipedia just snagged the
same 55,000 topics as Columbia? How big would the resulting text be?
If the answer is in the ballpark of 6,500,000 words -- i.e. the same
size as Columbia -- then we have an obvious strategy. If, as I would
imagine, the answer is that we're bigger, then we can start digging
into how many of our longer articles would have to be edited down in
order to hit the same "ballpark".
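Once the titles are matched up, the size check itself is trivial; something
along these lines (a sketch, assuming the matched titles and article texts
have already been extracted from a dump):

    # Sketch only: 'articles' maps a Wikipedia title to its wikitext, and
    # 'matched_titles' lists the titles that have a Columbia counterpart.
    def estimate_size(matched_titles, articles):
        texts = [articles[t] for t in matched_titles if t in articles]
        total_bytes = sum(len(t) for t in texts)
        total_words = sum(len(t.split()) for t in texts)
        return total_bytes, total_words

    # Compare total_words against Columbia's ~6,500,000 words to see how
    # much editing down would be needed to hit the same ballpark.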
Note that we don't have an answer from a publisher as to how big we
can be. The guy I talked to expressed a desire to be "as big as
possible" but I warned him that that's a limitation that's going to
come from their end, not ours, because we're already bigger than
Britannica, so our issue is how to get *small enough*, not how to
produce *enough*.
--Jimbo