Major problem: the z-with-dot letter (ż) seems to be mangled in links.
Links from http://local_copy_of_wikipedia/wiki.phtml?title=Wy%C5%BCsze_uczelnie_w_Pols…
to subpages are broken.
They work on the Polish UseMod Wikipedia, so it must be a problem with the conversion script.
Maybe there are some mistakes in the latin2 -> utf8 table?
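A quick way to see what a correct table entry should produce, sketched in Python (the conversion script itself is not Python; this just shows the expected bytes for ż):

```python
from urllib.parse import quote

# "ż" (z with dot above) is one byte, 0xBF, in ISO-8859-2 (latin2),
# but two bytes in UTF-8.
z_dot = "\u017c"
latin2_byte = z_dot.encode("iso-8859-2")   # b'\xbf'
utf8_bytes = z_dot.encode("utf-8")         # b'\xc5\xbc'

# A correct conversion yields the %C5%BC seen in the working URL above;
# a wrong table entry would produce a different escape and a dead link.
escaped = quote(utf8_bytes)
print(latin2_byte, utf8_bytes, escaped)
```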
To Ward and to Wikipedia's tech list,
On http://c2.com/cgi/wiki?WikiHistory it is claimed that this (c2.com)
is the first Wiki website, founded in 1994, as a supplement to the
Portland Pattern Repository. The next dates on the page say that
RecentVisitors and PeopleIndex were added in 1994, NotSoRecentChanges
in 1995, ThreadMode, ThreadModeConsideredHarmful, EditCopy and
WikiCategories in 1996.
However, the Internet Archive's Wayback Machine shows no sign of any
Wiki in November 1996, when the c2.com website was first archived.
The Portland Pattern Repository is there, but it consists of plain
HTML files. The Wiki Wiki Web does show up in the second archiving, in
December 1997. See http://web.archive.org/web/*/http://c2.com
So where was the Wiki hiding during 1994, 1995, and 1996?
--
Lars Aronsson (lars(a)aronsson.se)
Aronsson Datateknik
Teknikringen 1e, SE-583 30 Linuxköping, Sweden
tel +46-70-7891609
http://aronsson.se/ http://elektrosmog.nu/ http://susning.nu/
The conversion script now includes old revisions of articles. I've tried
it on the Esperanto and Polish databases which I had handy, so far it
seems to work. Please test on other languages if you dare.
It shouldn't be too hard to modify this code to extract just the old
versions from the old English wikipedia and drop them into the current
database there, as well.
Notes:
* Since user accounts are not transferred, there is no numerical user ID
to put in the old_user field. Currently this makes the wiki treat the
user name in old_user_text as an IP address: it tries to mask the last
three digits and does not link to user pages in the history lists. The
digit masking is definitely wrong; not making the links, however, is
arguably correct behavior.
* The most recent revision still has its user, comment, and timestamp
wiped and replaced with "conversion script", "automatic conversion", and
time of conversion. Would it not be nicer to keep the previous user,
comment, and timestamp, as is done with the older revisions?
* We might, however, still want to add a note that conversion took
place, so it's an obvious cutoff in the history list.
* Do we want to run fixLinks() on the old page versions? (This changes
/subpage links into Page:subpage links.) Right now I do so to preserve
link functionality, but this may not be appropriate, as it changes the
content of previous versions slightly. The purpose of keeping old
versions is to see what changed, so we might prefer to have the
unchanged (and no longer working) /subpage links. Comments?
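The subpage rewrite in question can be pictured with this rough Python sketch (fix_links here is a hypothetical stand-in for the script's actual fixLinks(), and the exact link syntax it handles is my guess):

```python
import re

def fix_links(text, page):
    # Hypothetical stand-in for fixLinks(): turn [[/subpage]] links
    # into [[Page:subpage]] links so they keep working after import.
    return re.sub(r"\[\[/([^\]|]+)", r"[[%s:\1" % page, text)

old = "See [[/Historio]] and [[/Gramatiko]]."
print(fix_links(old, "Esperanto"))
```

Running this over an old revision changes its stored text slightly, which is exactly the trade-off described above.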
-- brion vibber (brion @ pobox.com)
Everyone on these lists who can speak Polish - please check this translation
and whether internationalization works in general. Not all strings have been translated yet.
It uses UTF-8.
I think that's the right way.
If you think otherwise please say why.
You can "easily" set up a local PHP wikipedia by:
$ cvs -d:pserver:anonymous@cvs.wikipedia.sourceforge.net:/cvsroot/wikipedia login
<press enter for an empty password>
$ mkdir wikipedia
$ cd wikipedia
$ cvs -d:pserver:anonymous@cvs.wikipedia.sourceforge.net:/cvsroot/wikipedia -z3 co .
$ cd phpwiki/fpw
* edit wikiLocalSettings.php to something like:
<?
$wikiLanguage = "pl";
$wikiThisDBserver = "localhost"; # the host running mysqld
$wikiSQLServer = "wikipedia"; # the mysql database used for Wikipedia
$wikiThisDBuser = "taw"; # the mysql user
$wikiThisDBpassword = "xxx"; # mysql password
$wikiCurrentServer = "http://localhost";
$THESCRIPT = "/~taw/wikipedia/wiki.phtml";
$wikiArticleSource = "$wikiCurrentServer$THESCRIPT?title=$1";
?>
* copy and gunzip wikiTextPl.php from my mail
$ ln -s ~/src/wikipedia/phpwiki/fpw/ ~/public_html/wiki
* set up mod_php4 in apache if not already done, by Webmin or otherwise
* set up mysql permissions if not already done, by Webmin or otherwise
* follow the instructions in the README file to set up the mysql database
* now try it out and enjoy your local installation of wikipedia
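As a sanity check of the config above, here is how the $wikiArticleSource pattern presumably expands a title (sketched in Python rather than PHP; the underscore handling and %-escaping are my assumptions about what the wiki code does):

```python
from urllib.parse import quote

article_source = "http://localhost/~taw/wikipedia/wiki.phtml?title=$1"

def article_url(title):
    # Assumed expansion: spaces become underscores, then the title is
    # %-escaped (as UTF-8) and substituted for $1.
    return article_source.replace("$1", quote(title.replace(" ", "_")))

print(article_url("Wyższe uczelnie w Polsce"))
```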
Issues:
* importing Polish Rozeta/UseMod tree.
* RewriteRules
* translate all strings that have not yet been translated
* fix any wrong translations
I would like to see the Polish wikipedia moved to the new software as soon as possible.
I just checked in a new diff engine from phpwiki 1.3.3 (GPL). It can
display diffs line by line and highlight the changed words. I also
added a diff display to the edit conflict screen.
I'm very impressed with the phpwiki project. They are also working on
internationalization issues; I think we could do far worse than
check out what they have come up with.
Axel
On dim, 2002-03-10 at 09:54, Jimmy Wales wrote:
> pl.wikipedia.com
>
> How about I set this up on Monday? Do I need a new conversion script?
I'll need to tweak the conversion script to convert the Latin-2 database
to UTF-8, and drop in a quick fix to make searching work on non-ASCII
words. Shouldn't be hard, I'll drop a note when it's checked in.
The full set of encoding-related changes to the code still needs to be
determined and made, but if we start it off using UTF-8, the database
should be ready.
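The per-field conversion itself is simple; a minimal Python sketch of what the tweaked script needs to do to each text column (the real script is not Python, and the column-by-column plumbing is omitted):

```python
def latin2_to_utf8(raw: bytes) -> bytes:
    # Re-encode one Latin-2 (ISO-8859-2) byte string as UTF-8.
    return raw.decode("iso-8859-2").encode("utf-8")

print(latin2_to_utf8(b"Wy\xbfsze uczelnie"))  # 0xbf (ż) becomes 0xc5 0xbc
```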
(Also, we may need to rebuild the search index later, as that part of
the code may still change. But then this is the case if we change the
mysql indexing anyway.)
> I believe that this is best:
>
> 1. I set up pl2.wikipedia.com as the "beta test" Polish wikipedia,
> and import the existing data from pl.wikipedia.com using a conversion
> script.
>
> 2. We run this for one week, inviting as many Polish speakers as we
> can find to test it, incorporating bug fixes all week as quickly as
> they come in.
>
> 3. The following Monday, I make the conversion "for real".
Sounds good to me. Can we commit to such a schedule for Esperanto as
well? We've only put it off so far because we were going to change the
database internals, but since we seem to have decided on UTF-8 it's not
a problem to convert ASAP and make the changes later.
> I'd like to follow up with Spanish and Esperanto as quickly as possible.
FYI, I currently have a beta running the converted Esperanto wiki at
http://leukas.dyndns.org/wiki/ . A more official server wouldn't hurt,
though, as this is limited to the reliability of my DSL line.
-- brion vibber (brion @ pobox.com)
>Still, what if there are legitimate
>2 letter searches, like for "Ur" as someone else pointed out?
The only way to make mysql index shorter words is by recompiling and
setting MIN_WORD_LEN to a different number. They claim that changing
this variable from the current value of 4 to 0 will enlarge the index
by a factor of 20. So maybe we should go with 2. After that, the
indexes have to be rebuilt.
Axel
> > The English wikipedia isn't just for English
> monolinguals, is it?
>
> Is this the new politically correct term for
> Americans? :-)
Sí, tienes razón.
Ja, du hast recht.
Jes, vi pravas.
(Yes, you're right, in Spanish, German, and Esperanto.)
Chuck :)
=====
Come to my homepage! Venu al mia hejmpagxo!
http://amuzulo.babil.komputilo.org/
====
Come to the free (gratis and libre) Esperanto online
encyclopedia! http://eo.wikipedia.com/
On Sun, Mar 10, 2002 at 03:11:02PM +0000, João Mário Miranda wrote:
> Tomasz Wegrzanowski wrote:
> > 3) There are problems with interwiki links. If a Polish article title contains
> > non-ascii characters, I can't easily link to it using [[pl:Name]].
>
> I have the same problem with the Portuguese wikipedia. I can't
> link to [[pt:português]].
You can link using %-escapes (which you can copy from the URL).
This method sucks, but it is the only way until we all adopt UTF-8.
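Which escape you have to paste depends on the target wiki's encoding, which is exactly why this is painful. A small Python illustration (Python only for demonstration; the wikis themselves are PHP):

```python
from urllib.parse import quote

# The same title needs different %-escapes on a Latin-1 wiki
# than on a UTF-8 wiki, so copying from the URL is the safe bet.
print(quote("português", encoding="iso-8859-1"))  # one byte for ê
print(quote("português", encoding="utf-8"))       # two bytes for ê
```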
I strongly recommend setting all non-English Wikipedias to UTF-8
on transition to PHP script, regardless of whatever encoding is
currently most popular for your languages.
I also think that English Wikipedia should move to UTF-8.
If I search for "van eyck" (with the space, but without the quotes),
I get an ugly error message. Searching for "eyck" actually solves my
particular problem for the moment, but the error message is still
ugly. Is it really such a crime to want to search for a string that
includes a space? Perhaps this could be explained somewhat
more politely?
Also, if I search for "van\ eyck", the error message comes up and the
input box contains "van\\ eyck". Maybe there is a risk of a buffer
overrun or something more exotic here? I think the input string
should be escaped before being used in a regexp search.
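A minimal sketch of that escaping idea in Python (the wiki is PHP, so this only illustrates the principle, not the actual fix):

```python
import re

def safe_search(query, text):
    # Escape the user's query so regex metacharacters (spaces in odd
    # combinations, backslashes as in "van\ eyck") are taken literally
    # instead of blowing up the regexp engine.
    return re.search(re.escape(query), text) is not None

print(safe_search("van eyck", "the painter van eyck"))   # found
print(safe_search("van\\ eyck", "the painter van eyck")) # not found, but no error
```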
--
Lars Aronsson
<lars(a)aronsson.se>
tel +46-70-7891609
http://aronsson.se/ http://elektrosmog.nu/ http://susning.nu/