Wikitech-l October 2003

wikitech-l@lists.wikimedia.org

79 participants
171 discussions

Wikipedia parser specs
by Poor, Edmund W 22 Oct '03

22 Oct '03

Does anyone have a requirements document for the Wikipedia parser? If not, will those programmers who have already begun work on such a parser, like Magnus and Frithjof, please send me any scraps of documentation you have? I would like to assemble these into a wiki grammar or something like that. So we can help each other with parser development. I guess the list of "stupid parser tricks" would start with bracket notation for links: [http://www.edpoor.com] is a link to my outdated, static website [http://www.edpoor.com/images/Ae-inAndDog.jpg girl with dog] is an annotated link [[Iraq]] links to the Wikipedia article on Iraq [[Iraq|Rummyland]] links to Iraq but is shown as "Rummyland" (a Doonesbury reference, okay? ;-) Etc. Along with parsing rules for the rendering of text, is the problem of fetching and posting files. That is, coordinating each user's off-line stash (cache?) with the database. Note that some users might not want the entire encyclopedia, but perhaps only those articles they're working on. Or articles one click away? Ed Poor

9 8

RE: [Wikitech-l] Eureka! I've got it (was: Pages with hundreds ofinternational links slow?)
by Poor, Edmund W 21 Oct '03

21 Oct '03

The kerfuggle touched my what? I dunno :-) I was under the impression that EVERY TIME a user requests a page, the software has to double-check each internal link for the presence or absence of the linked page. My ancient Greek cry of jubilation only applies, if this is the case... Ed

2 1

RE: [Wikitech-l] visited links and empty links in red (Ed)
by Anthere 21 Oct '03

21 Oct '03

From: "Poor, Edmund W" <Edmund.W.Poor(a)abc.com> Subject: RE: [Wikitech-l] visited links and empty links in red Anthere, I have never been to Cologne, and I'm sorry you're feeling blue. ;-) Edmond LePauvre Me neither Edmond. I am right now fuming, so kinda red actually. So you can figure me out with a blue skin with red patches on the cheeks. The colors messing together might turn black eventually :-) __________________________________ Do you Yahoo!? The New Yahoo! Shopping - with improved product search http://shopping.yahoo.com

1 0

latexwiki
by Wouter Vanden Hove 21 Oct '03

21 Oct '03

Anybody here knows about latexwiki? http://latexwiki.rootnode.com/wiki/LatexWiki Latex-support for Wikipedia could be very valuable for the Wikibooks about math, physics,.. Wouter Vanden Hove www.open-education.org

2 1

Pages with hundreds of international links slow?
by Rob Hooft 21 Oct '03

21 Oct '03

As a result of a robot run on the nl: wikipedia, I am now left with 25840 missing or incorrect links in other wikipedias. I wanted to put these missing links on my user pages to let people help get them in. I have done this on a smaller scale before, but this time it took me more than 1 minute to upload each of 5 segments of the list to the en: wikipedia. Even retrieving them feels like it is bringing the server down to its knees. Did something in that aspect of the software change that slowed it down dramatically?? The only thing I can imagine that is "unique" to these pages is the number of international links. See the 5 pages at: http://en.wikipedia.org/wiki/User_talk:Rob_Hooft (but only try that if you want to investigate, because it takes about a minute to generate each page from the database!) Should I take these offline again awaiting? Rob -- Rob W.W. Hooft || rob(a)hooft.net || http://www.hooft.net/people/rob/

2 2

RE: [Wikitech-l] visited links and empty links in red
by Poor, Edmund W 21 Oct '03

21 Oct '03

Anthere, I have never been to Cologne, and I'm sorry you're feeling blue. ;-) Edmond LePauvre

1 0

wfMsg() ready, is_array() is slow!
by Tim Starling 21 Oct '03

21 Oct '03

I hereby declare that $wgDatabaseMessages can be safely set to true. As long as memcached is enabled. I rolled all the messages into one memcached entry. At first it didn't seem to make much difference. I was confused as to why it seemed to be taking much longer to load from a cache than from Language::getMessage, even though they were both doing the same thing. The reason, when I eventually discovered it, was quite surprising: is_array() is quite fast when it returns false, but painfully slow when it returns true. Like, a millisecond. I rearranged my code so that it doesn't use that function. -- Tim Starling.

1 0

C++ parser : progress report
by Magnus Manske 21 Oct '03

21 Oct '03

My C++ parser is now a working offline browser. This is achieved by converting a mysql dump of the cur table into an sqlite database once, then using apache/php as a frontend and a php file calling the compiled C++ executable, which renders HTML on-the-fly. Browse using your favourite browser :-) Why bother with this, if a static HTML version would do the same? * Proof-of-principle * sqlite databases can be changed, allowing for edits * Encapsuled database object, can be easily switched to use mysql or any other database instead of sqlite for use on a website * Full-text search (which I'll implement next) Some things that bug me: * Is there an easy way to call an executable directly from apache, without the need for a PHP script in-between? * Is there a no-setup-needed web browser? Some .exe that I can start on a windows machine, then fire up the web browser, and view/edit my local wikipedia copy? Magnus

3 2

visited links and empty links in red
by Anthere 21 Oct '03

21 Oct '03

A couple of days ago, I noticed the visited links color changed. It is now red, of a color that is quite similar to the missing links red. Is this due to my browser and if so, how can I change that ? If it is software related, why was it changed ? Where was it discussed ? Can it be reverted please ? Or changed to another color ? For poor sighted people, the two reds are confusing; I had to switch to the feature ? for empty links again. (I am in cologne blue) __________________________________ Do you Yahoo!? The New Yahoo! Shopping - with improved product search http://shopping.yahoo.com

2 1

FW: "Auto Extraction from WWW"
by Poor, Edmund W 21 Oct '03

21 Oct '03

Could someone take over this question? I don't know what to tell these guys. Thanks. Ed Poor -----Original Message----- From: Don Rameez [mailto:rameezdon@hotmail.com] Sent: Saturday, October 18, 2003 3:00 PM To: Poor, Edmund W Subject: RE: "Auto Extraction from WWW" Dear Edmund; thanx for ur reply i have installed MySql on Windows & i did try to open the SQL dump but somehow was unable to do so could u plz guide me regarding the same (i m a novice as far as Mysql is concerned) hope to hear from u soon regards Don Rameez - -------Original Message------- From: Poor, Edmund W <mailto:Edmund.W.Poor@abc.com> Date: Wednesday, October 15, 2003 01:06:19 AM To: Don Rameez <mailto:rameezdon@hotmail.com> Subject: RE: "Auto Extraction from WWW" I don't know how to convert SQL tables into plain TEXT. Why not use a SELECT statement? Like: SELECT cur_text FROM cur Ed Poor -----Original Message----- From: Don Rameez [mailto:rameezdon@hotmail.com] Sent: Sunday, October 12, 2003 9:33 PM To: Poor, Edmund W Subject: RE: "Auto Extraction from WWW" Dear Edmund, thanx for acknowledging soon & answering to my queries i appreciate ur concern for the Knowledge base i would like to ask u one more query .... Q) We now have the SQL dump, apart from MySQL is there any possibility that we can access the data in some other format ( say plain TEXT ....) regards Don Rameez -------Original Message------- From: Poor, Edmund W <mailto:Edmund.W.Poor@abc.com> Date: Tuesday, October 14, 2003 11:27:37 AM To: Don Rameez <mailto:rameezdon@hotmail.com> Subject: RE: "Auto Extraction from WWW" Your questions are best put to our senior developers, but I'll give you some preliminary answers. 1. All our articles are stored as plain English text. There is a bit of markup used for links. 2. We are not encouraging direct server-to-server links. Rather, we invite users to edit articles via the web interface. 3. You can get a SQL dump, if you want the entire database. It's much less than one GB in size, and could possibly fit on one CD (we are planning to publish a CD eventually). The difference between our project and yours is that we are a non-encoded encyclopedia. We just have a collection of articles. You are trying to "encode" knowledge, which is Very Difficult. Many attempts have been made in the past; I can't think of a single success, but I can think of half a dozen spectacular failures. It's harder than it looks! I applaud the attempt, but this task involves artificial intelligence (AI), and AI has not progressed beyond the so-called "expert system" or "neural net". These are the toys of AI and have not produced reliable, comprehensive results. What do you really hope to accomplish, in the next 5 to 10 years? Sincerely, Ed Poor Developer & Sysop Wikipedia -----Original Message----- From: Don Rameez [mailto:rameezdon@hotmail.com] Sent: Saturday, October 11, 2003 11:35 PM To: JeLuF(a)gmx.de; ts4294967296(a)hotmail.com; maveric149(a)yahoo.com; Poor, Edmund W; wikitech-l(a)Wikipedia.org; JeLuF(a)gmx.de; ts4294967296(a)hotmail.com; maveric149(a)yahoo.com; Poor, Edmund W; wikitech-l(a)Wikipedia.org Cc: nagarjun(a)hbcse.tifr.res.in Subject: "Auto Extraction from WWW" Dear Sir, We are a group of 3 students currently pursuing our B.E - IT (Bachelor of Engg. Information Technology)from the Mumbai University, INDIA. As of now we are working on a project titled " AUTO EXTRACTION OF CONTENTS FROM THE WORLD WIDE WEB" as a part of our BE project, in the renowned institute of HBCSE-TIFR ( Homi Bhabha Center for Science Education - Tata Institute of Fundamental Research) under the guidance of Scientist Dr.Nagarjuna.G. Our project is based on : OS - GNU/LINUX Language - Python Server - Zope Application - GNOWSYS GNOWSYS, Gnowledge Networking and Organizing System, is a web application for developing and maintaining semantic web content developed in Python and works as an installed product in Zope. Our project involves automatically extracting data from the (WWW) World Wide Web) & use GNOWSYS for handling this vast amount of data. This will not only help us store data in the Gnowledge base in form of meaningful relationships but also see its handling of huge amount of data. The URL for our site is http://www.gnowledge.org With this regards we could think no one but Wikipedia, which in itself is a phenomenon. We would be glad if u could answer to few of our queries : 1] What is the format in which the data is stored in Wikipedia ??? 2] Apart from http or ftp are there any other specific protocols that are in use, which will be required to communicate to the Wikipedia Server ??? 3] How can we utilize the SQL dump ??? We hope you will answer our queries at the earliest With warm regards Thanking You [ Rameez Don , Jaymin Darbari, Ulhas Dhuri ] ____________________________________________________ <http://www.incredimail.com/redir.asp?ad_id=309&lang=9> IncrediMail - Email has finally evolved - Click Here <http://www.incredimail.com/redir.asp?ad_id=309&lang=9>

4 5

← Newer
1
2
3
4
5
6
7
8
9
...
18
Older →

Jump to page:

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

2004

2003

2002

Wikitech-l October 2003