Jimbo, the tricks that you mention (weighting titles more than text,
weighting rare words more than common ones, stopwords, etc.) are already
used by MySQL's fulltext index; I don't think it would make sense to
reinvent the wheel, especially since we now have real-time searching
and boolean searches, both of which are kind of nice. I am also pretty
sure that the MySQL index is reasonably fast, being written in C.
They explain a bit about it at
http://www.mysql.com/doc/F/u/Fulltext_Search.html
Regarding the three-letter limit: since we already parse
the search string for AND, OR and NOT anyway, it should be pretty easy
to remove short words from the search string, run the query without them,
and then report the results with a warning like
The short word "the" was ignored.
Axel
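The short-word filtering described above could look roughly like the sketch below. It is in Python only for illustration (the real code is PHP); the function name strip_short_words, the default threshold of 4 and the way dangling operators are cleaned up are all assumptions, not what the search code actually does.

```python
OPERATORS = {"AND", "OR", "NOT"}

def strip_short_words(query, min_len=4):
    """Remove words shorter than min_len (an assumed threshold) from a
    boolean query and return (cleaned_query, warnings)."""
    kept, ignored = [], []
    for token in query.split():
        if token in OPERATORS:
            kept.append(token)
        elif len(token) < min_len:
            ignored.append(token)
            # Drop operators that were waiting for this word as an operand.
            while kept and kept[-1] in OPERATORS:
                kept.pop()
        else:
            kept.append(token)
    # A removed first word can leave a binary operator at the front.
    while kept and kept[0] in ("AND", "OR"):
        kept.pop(0)
    # Nothing may follow a trailing operator, so drop it as well.
    while kept and kept[-1] in OPERATORS:
        kept.pop()
    warnings = ['The short word "%s" was ignored.' % w for w in ignored]
    return " ".join(kept), warnings
```

The cleaned string would go to the fulltext query and the warnings would be printed above the results. Dropping the operator next to a removed word is just one possible policy.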
This feature should be easy to do.
Unfortunately my PHP knowledge is limited, so I think
it will be better if I just ask for it instead of trying to do it myself :)
Using Japanese characters in non-Japanese Wikipedias is currently hard.
One has to write them as &#xHEXCODE; or &#DECIMALCODE;
I think that it would be much better if the parser were able to parse
fake kana &entities; (at least basic kana; full kanji would be much more work)
and convert them to numeric codes.
So one could write &hiragana_wa; or &katakana_chi;
This isn't likely to conflict with anything.
Kana Unicode table (in "English") is on http://pl.wikipedia.com/wiki.cgi?Kana
Entities that would be needed:
* Full hiragana ぁ to ゔ
* Full katakana ァ to ヺ
* Prolongation mark ー
Proposed names:
* &hiragana_x; &hiragana_smallx;
* &katakana_x; &katakana_smallx;
* &kana_long;
I also think that it might be a good idea to extend it to other writing systems
in the future.
Is it possible?
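To make the proposal concrete, here is a rough sketch of the conversion, written in Python for illustration only (the wiki itself is PHP). The function name expand_kana_entities and the small romanization table are assumptions; the sketch relies on the Unicode character names, which spell some syllables differently (TI for chi, TU for tsu, and so on).

```python
import re
import unicodedata

# Unicode character names use SI/TI/TU/HU/ZI where the proposed entity
# names would more naturally say shi/chi/tsu/fu/ji (assumed aliases).
ALIASES = {"shi": "SI", "chi": "TI", "tsu": "TU", "fu": "HU", "ji": "ZI"}

ENTITY_RE = re.compile(r"&(hiragana|katakana)_(small)?([a-z]+);|&kana_long;")

def expand_kana_entities(text):
    """Replace &hiragana_x;, &katakana_smallx; and &kana_long; with
    numeric character references; unknown names are left untouched."""
    def repl(match):
        if match.group(0) == "&kana_long;":
            char = "\u30fc"  # prolongation mark
        else:
            script, small, syllable = match.groups()
            name = ALIASES.get(syllable, syllable.upper())
            full_name = "%s LETTER %s%s" % (
                script.upper(), "SMALL " if small else "", name)
            try:
                char = unicodedata.lookup(full_name)
            except KeyError:
                return match.group(0)  # not a kana we know about
        return "&#x%X;" % ord(char)
    return ENTITY_RE.sub(repl, text)
```

With this sketch, &hiragana_wa; becomes &#x308F; and ordinary entities such as &amp; pass through unchanged.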
I have found some problems with this script.
1) The print footer ("This article is from Wikipedia (URL), the free online
encyclopedia. You can find this article at URL") is in wikiPage.php
and it should be in wikiText*.php
That is, it should be internationalizable.
2) The same word is used for names of languages and names of non-English
Wikipedias. That's wrong for Polish because "language" is masculine
but "Wikipedia" is feminine, so the word should be different
("polski" and "polska").
3) There are problems with interwiki links. If a Polish article title contains
non-ASCII characters, I can't easily link to it using [[pl:Name]].
The "right" solution is of course using UTF-8 for all Wikipedias.
We will have to do it some day, as not doing it causes
more problems than it saves. But that's not an urgent issue.
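One way to see the interwiki problem: the same title turns into different bytes, and so a different URL, depending on which charset the wiki uses. The sketch below is Python for illustration only; interwiki_url is a hypothetical helper, the URL layout is copied from the pl.wikipedia.com/wiki.cgi?Kana link above, and the sample title is just an example.

```python
from urllib.parse import quote

def interwiki_url(lang, title, encoding):
    """Build a wiki.cgi URL for `title`, percent-encoding it in the
    charset the target wiki is assumed to use."""
    return "http://%s.wikipedia.com/wiki.cgi?%s" % (
        lang, quote(title, encoding=encoding))

# The same Polish title under two charsets.
latin2_url = interwiki_url("pl", "Łódź", "iso-8859-2")
utf8_url = interwiki_url("pl", "Łódź", "utf-8")
```

A page written in Latin-1 cannot contain "Ł" at all, so the linking wiki and the target wiki have to agree on how the title is spelled in bytes. With UTF-8 everywhere there is only one spelling.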
As someone has previously pointed out, short word searches (such as
Ur and Oz) are important. There shouldn't be a letter minimum, if
at all possible.
We should probably use Google as a good guide for usability standards.
Google lets you search for single letters, for example.
I understand there may be technical difficulties in reducing the minimum,
but we should consider it a flaw if we have a four-letter minimum,
or even a three-letter or two-letter minimum.
I've cc'ed wikipedia-l because this is a policy discussion as much as a
technical discussion.
--tc
So, where do we stand on the issue of international upgrades?
I'd like to get back to these quickly, if possible. Starting with Esperanto, and then
Polish. And then probably Spanish, although of course we'll now need to co-ordinate with
the forpas forked group, so that we minimize the extent of the forkage in the hopes of
bringing things back together soon.
--Jimbo
I tried this script and I have a few things to say.
First, I seriously propose using UTF-8 instead of Latin-2 for the Polish Wikipedia.
Browsers shouldn't have any problems with it. At least Mozilla doesn't have any.
And it will be more "standard", so for example it will be possible to cut&paste kanji
into it (I tried and it works; that's the main reason for this proposal).
On the other hand, I don't know how people without the right
fonts would react to kanji in the text box.
I think I see the right way.
Simpler way:
markup input ::= what the user sends after clicking 'submit'
markup output ::= what the user gets after clicking 'edit this page'
On wiki markup output:
    if (user_option[unicode_fonts] != "yes_i_have_all_fonts_installed")
        s/(unicode_char)/numeric[$1]/;
On wiki markup input (options-independent, some people may have only fonts but no input installed):
    s/(numeric)/unicode_char[$1]/;
Even better way:
On wiki markup output:
    if (user_option[unicode_fonts] != "yes_i_have_all_fonts_installed")
        s/(unicode_char)/special_entity[$1]/;
On wiki markup input (options-independent, some people may have only fonts but no input installed):
    s/(special_entity)/unicode_char[$1]/;
It would also make sense to generate PNGs of kanji instead of relying on fonts if
(user_option[unicode_fonts] != "yes_i_have_all_fonts_installed"),
but that's an issue for the future.
It might also make some sense to split user_option[unicode_fonts] into more than one option,
like Japanese, Chinese, Arabic, Cyrillic, Hebrew, etc.
What do you think about this?
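The "simpler way" above can be sketched as follows. This is Python for illustration only; the function names and the has_unicode_fonts flag (standing in for user_option[unicode_fonts]) are assumptions. Only characters above the ASCII range are converted, so that references like &#60; keep their meaning in the markup.

```python
import re

NUMERIC_RE = re.compile(r"&#([xX][0-9a-fA-F]+|[0-9]+);")

def to_numeric_entities(text):
    """Markup output: replace every non-ASCII character with &#DECIMAL;."""
    return "".join(
        c if ord(c) < 128 else "&#%d;" % ord(c) for c in text)

def from_numeric_entities(text):
    """Markup input: turn numeric references back into characters.
    ASCII and out-of-range references are left alone."""
    def repl(match):
        body = match.group(1)
        code = int(body[1:], 16) if body[0] in "xX" else int(body)
        if code < 128 or code > 0x10FFFF:
            return match.group(0)
        return chr(code)
    return NUMERIC_RE.sub(repl, text)

def edit_box_text(stored_text, has_unicode_fonts):
    """What the user gets after clicking 'edit this page'."""
    if has_unicode_fonts:
        return stored_text
    return to_numeric_entities(stored_text)
```

On submit, from_numeric_entities would run for everybody regardless of options, as proposed above. The "even better way" would swap the numeric references for named entities but keep the same two-way shape.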
----
Second, someone should create wikiTextPl.php ;)
There is a bit of code there, but most of the translations can probably be taken
from the Rozeta UseMod script.
On Fri, Mar 08, 2002 at 04:03:02PM -0800, lcrocker(a)nupedia.com wrote:
>
>
> >Hmmm. Now I think that some general method would be more useful:
> >&katakana_a; &kanji_b; &hebrew_c; or &cyrilic_d;
>
> If and when the W3C ever /standardizes/ these as HTML named
> entity references, we might use them. Until then, I think it's
> better to be able to point to an officially sanctioned doc and
> say "we support these", and let people complain to the
> standards body.
W3C standardizes HTML, and I'm talking about Wiki markup.
Wiki named entities will be converted into HTML numeric entities
on output, so they have nothing to do with HTML.
I checked in a fix for the bug that caused a #REDIRECT to a
non-existent page to result in an edit conflict. It turns out that
unset() called inside a function doesn't destroy a global variable,
only the function's local reference to it.
Axel
I checked in two minor things:
* special_contributions.php had a bug where old contributions were
ignored. I reformulated it as a single SQL query which lists all pages
that a user has made non-minor edits on, excluding log and talk
namespaces. I hope that this is the intended behavior.
* I added a link to [[wikipedia:Searching]] to all search error
messages.
Axel
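For readers who have not seen the code, a query of the kind described for special_contributions.php might look like the sketch below. It uses Python with an in-memory-friendly sqlite3 interface purely for illustration; the table edits, its columns editor, title and minor, and the 'Talk:' and 'Log:' title prefixes are all hypothetical and are not the actual Wikipedia schema or the query that was checked in.

```python
import sqlite3

def pages_with_major_edits(conn, editor):
    """List the distinct pages on which `editor` made at least one
    non-minor edit, skipping the (assumed) talk and log namespaces."""
    rows = conn.execute(
        """
        SELECT DISTINCT title
        FROM edits
        WHERE editor = ?
          AND minor = 0
          AND title NOT LIKE 'Talk:%'
          AND title NOT LIKE 'Log:%'
        ORDER BY title
        """,
        (editor,),
    )
    return [row[0] for row in rows]
```

Doing it in one query means old and current contributions are treated alike, as long as they live in (or are joined into) the same result set.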
Hello everyone,
Please allow me to introduce myself: my name is Leonardo and I'm a
contributor to the Spanish Wikipedia.
I joined this list a few days ago because I want to contribute to the wiki
software, especially to the Spanish encyclopedia's implementation of it.
So, without any further formalities, here's a list of things I wanted to
ask you guys.
* How are u doing? :)
* Ok, I understand the English Wikipedia made a transition to a new
software written in PHP. Do I have to wait too long before I see the
transition to the new software on the international wikipedias?
* Can I get the new software sources? Where?
* About the handling of bugs and feature requests, I think Bugzilla is a
great idea. I, for one, would rely on it heavily to make contributions
to the project.
So, I guess what I'm trying to ask you is: is it a good idea if I work
on the old (Perl) wiki software, or should I wait for the PHP wiki?
Bear in mind that I'm mostly interested in the improvement of the
Spanish Wikipedia (although, if I can contribute to other projects in
the process, and eventually make the world a better place, I won't get
mad :)
Personally, I have a couple of ideas I'd like to implement on the wiki
software:
(a) A new, easier(?) way to create tables inside wiki pages. I
understand that some time ago, somebody wanted to do something like this
with vertical bars (`|'):
http://www.wikipedia.com/wiki/How_does_one_edit_a_page/Old+version
(near the bottom of the page)
I wouldn't implement this feature in the same way (I don't even
understand it completely!), but basically I want to create an
alternative method to include tables in the encyclopedia without HTML.
(b) Ok, I know this may sound like pure evil, but I think that Wikipedia
could use some javascript on the editing pages (action=edit).
And before you start throwing sharpened things in my direction, please
let me expand my argument.
Wikipedia is a great tool, and it's certainly very easy to edit pages
and contribute to the project, but a plain <textarea> box is an arguably
limited widget. Now, I'm not saying this is a critical issue. It works,
and it works pretty darn good, but maybe we could take advantage of some
tools that would improve the editing experience (wow, I sound like a
marketroid :-).
Granted, this may be about just pure eye-candy stuff, and it may piss
off many javascript haters. But please remember that (1) it could be
implemented carefully, so it doesn't break anything on any browser, and
remain standards-compliant, and (2) it could be an optional feature, so
in case your head aches every time you see a <script> tag, you could
just turn it off.
Well, so what can Javascript do for us on the editing pages? I can think
of the following uses right now:
- A collection of buttons that allow you to insert special symbols
easily (like a character map, right there beside your editing box)
- A collection of buttons that insert special tags quickly. For example,
if I press a button labeled "strike out", I get the "<strike></strike>"
tags inserted in my text right away.
- A "Preview" area, updated immediately every time I type a character.
I know these are just lame examples, but I guess you get the idea.
----
So, in conclusion, I'm just another guy who wants to contribute to
the Wikipedia Project. Hopefully I'll manage to start coding a lot more
than talking real soon... :)
Thanks,
Leonardo Boshell