Dear Sir,
We are a group of three students currently pursuing our B.E. in IT
(Bachelor of Engineering, Information Technology) at Mumbai
University, India.
At present we are working on a project titled "AUTO EXTRACTION
OF CONTENTS FROM THE WORLD WIDE WEB" as part of our B.E.
project, at the renowned institute HBCSE-TIFR
(Homi Bhabha Centre for Science Education - Tata Institute of
Fundamental Research), under the guidance of the scientist
Dr. Nagarjuna G.
Our project is based on
OS - GNU/LINUX
Language - Python
Server - Zope
Application - GNOWSYS
GNOWSYS (Gnowledge Networking and Organizing System) is a web
application for developing and maintaining semantic web content.
It is developed in Python and works as an installed product in Zope.
Our project involves automatically extracting data from the
World Wide Web (WWW) and using GNOWSYS to handle this vast
amount of data. This will not only help us store data in the
Gnowledge base in the form of meaningful relationships but also
test its handling of a huge amount of data.
The URL for our site is "http://www.gnowledge.org"
In this regard we could think of no one but Wikipedia, which is
a phenomenon in itself.
We would be glad if you could answer a few of our queries:
1] In what format is the data stored in Wikipedia?
2] Apart from HTTP and FTP, are there any other specific protocols
in use that are required to communicate with the Wikipedia server?
3] How can we utilize the SQL dump?
We hope you will answer our queries at your earliest convenience.
With warm regards,
Thank you,
[ Rameez Don , Jaymin Darbari, Ulhas Dhuri ]
The last discussions on the lists about how to represent pronunciations
on Wikipedia didn't end with any definitive conclusions.
One of the proposed suggestions was to provide a way to input IPA in an
easy-to-use format and then have it automagically converted into
Unicode IPA. As a first step I wrote a Lex analyzer to convert X-SAMPA
to Unicode IPA (which results in C code that I compiled to an
executable). This will enable editors to enter pronunciations in the
ugly-but-easy-to-type X-SAMPA format and have them appear in IPA using
the Unicode IPA extensions.
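The core of such a converter is a simple longest-match-first substitution table. Here is a minimal sketch of the idea in Python rather than Lex; the table covers only a handful of X-SAMPA symbols for illustration, not the full inventory my analyzer handles:

```python
# Minimal X-SAMPA -> Unicode IPA substitution table (illustrative subset).
XSAMPA_TO_IPA = {
    "A": "\u0251",    # open back unrounded vowel
    "{": "\u00e6",    # near-open front unrounded vowel
    "@": "\u0259",    # schwa
    "E": "\u025b",    # open-mid front unrounded vowel
    "I": "\u026a",    # near-close front unrounded vowel
    "U": "\u028a",    # near-close back rounded vowel
    "T": "\u03b8",    # voiceless dental fricative
    "D": "\u00f0",    # voiced dental fricative
    "S": "\u0283",    # voiceless postalveolar fricative
    "Z": "\u0292",    # voiced postalveolar fricative
    "N": "\u014b",    # velar nasal
    "r\\": "\u0279",  # alveolar approximant (a two-character symbol)
}

def xsampa_to_ipa(text: str) -> str:
    """Convert an X-SAMPA string to Unicode IPA, longest match first."""
    # Sort symbols by length so "r\\" is tried before "r".
    symbols = sorted(XSAMPA_TO_IPA, key=len, reverse=True)
    out, i = [], 0
    while i < len(text):
        for sym in symbols:
            if text.startswith(sym, i):
                out.append(XSAMPA_TO_IPA[sym])
                i += len(sym)
                break
        else:
            out.append(text[i])  # pass through unmapped characters
            i += 1
    return "".join(out)
```

A Lex analyzer does the same thing, but with the longest-match behavior built into the generated scanner rather than hand-rolled.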
I also dove into the Wikipedia code and patched OutputPage.php to
support <IPA> </IPA> tags that surround SAMPA and output it as <SPAN
class="IPA"> </SPAN>, with the Unicode IPA HTML entities inside the SPAN
tag. Finally, I patched style/wikistandard.css to have a .IPA style that
explicitly sets the font-family to a list of fonts that are known to
contain the IPA Unicode extensions. This is necessary because some
browsers (notably Windows IE) don't display IPA Unicode characters,
even if they're installed, unless the currently active font has those
characters.
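The CSS side of that fix would look something like the following sketch; the exact font list in my wikistandard.css patch may differ, but these are fonts commonly known to include the IPA Unicode block:

```css
/* Hypothetical .IPA rule: prefer fonts known to contain IPA characters */
.IPA {
    font-family: "Lucida Sans Unicode", "Arial Unicode MS",
                 Gentium, "Doulos SIL", sans-serif;
}
```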
I envision the use of <IPA> tags on Wikipedia to be fairly limited, in
that IPA/SAMPA will only appear on pages discussing pronunciations, but
I think it will make a good starting point, perhaps, for representing
pronunciations on Wiktionary.
I can attach the diffs for OutputPage.php and style/wikistandard.css to
a future message (if I can ever get the sourceforge cvs server to
respond), but what should I do with my .lex file and Makefile for
building the parser? Should I post them too? I would guess most list
readers would not be pleased by my spamming them with such tediums. I
understand the hesitation developers have to handing out CVS access to
people whose code they have never seen, so email me privately if you
want me to send you what I've done.
Thanks!
- David [[User:Nohat]]
Note: for most purposes, X-SAMPA is backwards compatible with SAMPA for
various languages, but the Lex analyzer can be modified to support
something like <IPA lang="French"> for the language-specific SAMPA
encodings, if that is deemed desirable.
Jens Frank wrote:
>A RAID 0+1 configuration, with data striped over
>two disks on one adapter and mirrored to the
>other adapter, is more reliable and probably a little
>bit faster.
That's what my partner said when I told him about the proposed database server
(he has to deal with RAID issues at work sometimes).
-- Daniel Mayer (aka mav)
As some of you might remember, I tried to write a wiki(pedia) offline
reader/editor a while ago. After some trouble with the GUI, and because
someone rightly pointed out that my parser wouldn't do Unicode, I
stopped working on it.
Lately, I had several people asking me about a stand-alone wiki(pedia)
parser. The latest related request was on the mailing list yesterday
("Java code...").
Now, I have created a (partly) working parser, written in C++. It is
based on a string object I also wrote, natively using 16-bit chars and
thus supporting unicode. (A function to actually import/export mysql
unicode remains to be written, though).
Together with this, I started rewriting the common wikipedia functions
(skins, languages, user management, etc.). That part is still at its
beginning, but it can already render the "Wikipedia:How to edit a page"
in a half-complete standard skin. This includes a function to parse the
"LanguageXX.php" files, so no need to rewrite all of that. (Import is a
little slow, though, so that won't be a permanent solution; it rather
screams "conversion").
As an example, the whole thing comes as a command line tool, which can
render a wiki-style text (from file or pipe) into HTML (no skin or
standard skin, by parameter).
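To give a flavor of what such a renderer does, here is a toy version in Python (not my C++ code) that handles only bold, italics, and internal links; a real parser covers far more of the syntax, but the shape is similar:

```python
import re

def render_wikitext(text: str) -> str:
    """Toy wiki-to-HTML renderer: bold, italics, and [[internal links]]."""
    # '''bold''' must be handled before ''italic'' so triple quotes win.
    text = re.sub(r"'''(.+?)'''", r"<b>\1</b>", text)
    text = re.sub(r"''(.+?)''", r"<i>\1</i>", text)

    # [[Page]] and [[Page|label]] internal links.
    def link(m: "re.Match[str]") -> str:
        target, _, label = m.group(1).partition("|")
        href = target.replace(" ", "_")
        return f'<a href="/wiki/{href}">{label or target}</a>'

    return re.sub(r"\[\[(.+?)\]\]", link, text)
```

For example, `render_wikitext("'''bold''' and [[Main Page|home]]")` produces `<b>bold</b> and <a href="/wiki/Main_Page">home</a>`.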
I have committed the sources (hereby under the GPL) to the Wikipedia
CVS, in a new module called "Waikiki" (seemed like the natural extension
of wiki to me;-)
This could be the basis for several wiki(pedia) software projects:
* Offline editor
* Reader, to be distributed on CD/DVD
* Wiki(pedia) apache module
So, fire up those compilers and go to work! ;-)
Magnus
>> I don't think it's wise to have four parsers doing actually the same. <<
On the server side, speed is a desired feature, so C or C++ makes sense as a preferred language. They also have the advantage of being very widely understood. Broad understanding probably favors C or simple C++ (simple C++ meaning without really fancy use of classes and inheritance). Hopefully either of those would allow a lot of people to get involved in optimizing and reusing the code.
Hello,
it looks like many people have done almost the same thing: I've also
begun to write a parser. Mine is written in Python.
It's far from complete, but I don't want to spend my time
writing useless code, so I am asking how we want to continue:
I don't think it's wise to have four parsers all doing the same thing.
Of course, everybody can do what he/she wants, but I personally
would rather work on one program than do duplicated, useless work.
As Magnus Manske already pointed out, such a parser could be the base of
several desired tools. We could write a library that could be used by
these applications.
The difference between my program and all the others, AFAIK, is
how the wiki data actually gets loaded.
As someone suggested on meta.wikipedia.org, I have written
a file called 'raw.php' that retrieves the raw wiki data.
In my opinion that's quite useful for offline-editing applications
and such things.
You can get my parser, along with my adapted 'Article.php' and the file
'raw.php', here: www.fms-engel.de/buildHTML.tar.bz2
(Sorry for not providing a patch; when I find time, I'll do it.)
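A client for a raw.php-style endpoint can be sketched in a few lines of Python. Note that the parameter name 'title' and the URL shape here are assumptions for illustration, not the actual interface of my script:

```python
import urllib.parse
import urllib.request

def build_raw_url(base_url: str, title: str) -> str:
    """Build the request URL; the 'title' parameter name is an assumption."""
    return f"{base_url}/raw.php?{urllib.parse.urlencode({'title': title})}"

def fetch_raw(base_url: str, title: str) -> str:
    """Fetch the raw wikitext of an article, e.g. for offline editing."""
    with urllib.request.urlopen(build_raw_url(base_url, title)) as resp:
        return resp.read().decode("utf-8")
```

An offline editor would call `fetch_raw` to load an article, let the user edit the wikitext locally, and post it back later.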
I don't want to start a language-flamewar, but I probably prefer
Magnus Manske's version. C++ is quite fast and there are several
GUI libraries one can use for each platform to write nice GUIs.
I did not take a look at his program, but I guess it's much more mature
than, for example, mine.
I would really like to discuss this topic, as I think a parser, and the
possibilities that result from having one, is one of the most wanted
features, at least for me :)
There is probably a better solution than having four parsers that all
do the same thing...
Regards,
Frithjof
Would someone please look through the server logs and see whether the
suspicion about User:Mediator is correct? If it's banned user
"EntmoonOfTrolls", then it looks like Jimbo wants some quick action.
Ed Poor
-----Original Message-----
From: Jimmy Wales [mailto:jwales@bomis.com]
Sent: Monday, October 13, 2003 4:49 PM
To: lazolla(a)hotmail.com; English Wikipedia
Subject: Re: [WikiEN-l] EofT/142.177.etc/24 back as User:Mediator
Well, let's make it a priority to figure out if it is him, and I think
it's time for me to start pursuing some legal means here.
> And if anyone can confirm that User:Mediator is not 142.177.etc., I'll
> submit my apology and buy the next two rounds of drinks.
By all means, let's be reasonably sure first, but I think we have to be
very firm in this case.
--Jimbo
_______________________________________________
WikiEN-l mailing list
WikiEN-l(a)Wikipedia.org
http://mail.wikipedia.org/mailman/listinfo/wikien-l
Most Wikimedia mailing lists can now be read via gmane.org, a
mailing-list-to-Usenet service.
http://news.gmane.org/?match=wikipedia (old hierarchy, to be moved to the
new Wikimedia hierarchy)
http://news.gmane.org/?match=wikimedia
I have noticed that there are now several new lists that are not (yet) listed.
Lists, with proposed names and descriptions:
1. MediaWiki-l
http://mail.wikipedia.org/mailman/listinfo/mediawiki-l
gmane.org.wikimedia.mediawiki
"Announcements about the MediaWiki software and site admin list"
! post address is @wikimedia.org
2. Wikibugs-l
http://mail.wikipedia.org/mailman/listinfo/wikibugs-l
gmane.org.wikimedia.mediawikibugs
or
gmane.org.wikimedia.wikibugs
"Monitor updates to the MediaWiki bug tracker on SourceForge"
-> "Wikibugs-l" is the name of the list, but it is about bugs in the
MediaWiki software, so "Mediawikibugs" sounds clearer.
3. Wikilegal-l
http://mail.wikipedia.org/mailman/listinfo/wikilegal-l
gmane.org.wikimedia.wikilegal
"Discussion list about the legal matters of the Wikimedia projects"
! post address is @wikimedia.org
Is it ok if those lists are also listed at gmane.org? Comments?
There is not yet a:
* WikiquoteEN-l
* WiktionaryEN-l
I do not know whether those would be useful.
--
Contact: walter AT wikipedia.be
Want to write an article too? WikipediaNL, the free GNU/FDL encyclopedia
http://www.wikipedia.be
Tarquin, putting some specs on-line at Meta sounds like a good idea. And
per Alfio's suggestion, maybe we could start by copying "Editing help",
then massage it from a "tips for users" format into a "specs for
programmers" format.
Ed Poor