Jan Hidders wrote:
>On Monday 16 August 2004 19:35, Magnus Manske wrote:
>> So, what you'd need is an EBNF representation or something?
Mind if I jump in and suggest a substitution for the "or something"
alternative? :) Check out "parsing expression grammar" on Wikipedia (and in
more detail on the external links that article leads to). Although at the
moment you probably won't find a parser generator that'll generate the PHP
code you want from it, if the primary goal here is formal specification then
that's not such an issue - and in any case, unlike (LA)LR CFGs, parsing
expression grammars tend to be very easy to convert manually into working
parsers.
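To make that concrete, here's how a toy PEG rule for bold wikitext might translate, almost mechanically, into a PHP parsing function. This is a hypothetical sketch, not a proposal for the actual grammar:

  function parseBold( $text, &$pos ) {
      // Bold <- "'''" (!"'''" .)* "'''"
      // Try the opening marker; on failure the caller simply tries
      // the next alternative (PEG's ordered choice).
      if ( substr( $text, $pos, 3 ) !== "'''" ) {
          return false;
      }
      // (!"'''" .)* -- consume everything up to the closing marker.
      $end = strpos( $text, "'''", $pos + 3 );
      if ( $end === false ) {
          return false;
      }
      $content = substr( $text, $pos + 3, $end - ( $pos + 3 ) );
      $pos = $end + 3;  // consume the closing marker
      return $content;
  }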
>Yes, that is all I ask. A precise formal grammar in whatever notation you
>like, but preferably in the input format of bison. If only because that
>would document exactly what your parser does. Note there are some
>requirements:
>- it should accept all possible input strings
>- it should ideally do all the necessary parsing i.e. from the parse tree we
>should be able to generate the output with a single tree-walk, and
>- it should be unambiguous (and even LALR(1)) or have an explicit conflict
>resolution rule
The first and third requirements above are likely to be _very_ difficult to
achieve at the same time in a CFG paradigm, because of the limited lookahead
and extremely rudimentary disambiguation facilities of LR-class parser
generators. LR parser generators are designed for languages that were
designed for LR parser generators; they tend to be difficult or impossible to
use for more freeform languages such as wikitext without making some serious
compromises or horrible hacks. Not to discourage you from trying, though; I
just want to point out an alternative that you might consider when the
shift-reduce and reduce-reduce conflicts start becoming unbearable. :)
One caveat, though: I'm not exactly unbiased, since I wrote much of the
aforementioned stuff on parsing expression grammars. :)
OK, I'll shut up now.
Cheers,
Bryan
I've just now subscribed to this list, so please stop me if this has already been discussed... After hearing about the storm heading toward the Wikimedia servers, I thought to myself: wouldn't it be a good idea to have backup servers elsewhere in the world? Especially since Florida is highly susceptible to this sort of weather at this time of year... I'm not sure our funding or budget would allow for distributing our servers, but my guess is that there is more than one Wikipedia person who wouldn't mind setting up a server of theirs somewhere as a backup. Even if it's just a read-only cached backup, something is better than nothing...
--
Michael Becker
Hello,
I have some ideas (certainly grabbed from this list, though) to enhance
disambiguation pages. Currently the software doesn't really know whether a
page is a disambiguation page, and that prevents me from implementing new
features. The best way seems to be to record in the database what kind of
article a page is. I came up with two solutions:
1/ add a 'cur_is_disambiguation' field, just like we have 'cur_is_redirect';
2/ add a 'cur_kind' field with a new table 'kind' that lists the different
possible article kinds (null, 'redirect', 'disambiguation').
I would personally go for #2, as it will help us implement new features
later if needed (like adding 'stub' or 'quality' article kinds).
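For concreteness, a rough SQL sketch of what #2 might look like (all names
are illustrative only, not a final schema):

  CREATE TABLE kind (
    kind_id   TINYINT UNSIGNED NOT NULL PRIMARY KEY,
    kind_name VARCHAR(32) NOT NULL  -- 'redirect', 'disambiguation', later maybe 'stub', ...
  );

  ALTER TABLE cur
    ADD COLUMN cur_kind TINYINT UNSIGNED DEFAULT NULL;
    -- NULL = ordinary article, otherwise a kind.kind_id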
--
Ashar Voultoiz
>Gerard Meijssen wrote:
>> some time ago to move to UTF-8. At the time it was not such a good idea
>> as UTF-8 does take more room.
>
>More room? UTF-8 does not use more memory, if that's what you mean. HTML
>entities (like Ӓ) use 5 to 7 bytes, while a character in UTF-8
>uses at most 4 bytes.
That wasn't the problem: to convert nl.wikipedia, we had to uncompress the old table, which, once uncompressed, takes much more space.
Shaihulud
Ask whenever you want; I prefer the weekend, when I have more spare time :)
Shaihulud
>
>I have been informed by Walter that nl:wikipedia had decided quite
>some time ago to move to UTF-8. At the time it was not such a good idea
>as UTF-8 does take more room.
>
>However, as many other wikipedia like de: and es: have made the
>conversion, it is a good idea to ask for the nl:wikipedia to be
>converted. We have asked if there are people on nl: that have opposing
>views, there were none.
>
>As nl:wikipedia is a very active environment, some prior notice would be
>appreciated.
>
>Thanks,
> GerardM
Hello,
I am investigating embedding a wiki engine into another web application. As
a Wikipedia user I think MediaWiki is one of the best engines, and I would
like to integrate it. But I would like your developer point of view: do you
think it is a good idea to embed MediaWiki as the wiki engine, or should I
look for another tool?
Thanks
Regards,
Manuel
Brion Vibber wrote:
> * Split a couple "?>" instances in string literals -- these hose some syntax highlighters
I think you should add a comment to explain this. Otherwise, in the future,
people will be likely to undo it without thinking anything of it.
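Something like this, for instance (an illustration of the pattern, not a
quote from the actual commit):

  // This '?' . '>' is split deliberately: some syntax highlighters
  // treat a literal "?>" inside a string as the end of the PHP block.
  // Please don't "simplify" it back into one literal!
  $closeTag = '?' . '>';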
Timwi
> Date: Tue, 17 Aug 2004 01:10:02 +0100
> Rowan Collins wrote:
>
> On Fri, 13 Aug 2004, Jens Ropers wrote (to WikiEN-L):
>
>> Can I suggest we implement a "cover all" redirect, whereby ALL
>> [[Wikipedia:Some stuff here]] kind of links can alternatively be
>> written [[WP:Some stuff here]]?
>
> [snip]
>
>> Not all "Wikipedia:(...)" pages have "own" their redirects (meaning
>> redirects such as WP:SB), but it still seems like a good idea to
>> ALWAYS
>> allow the "Wikipedia:" bit to be abbreviated as "WP:" -- so that if
>> I'm
>> writing e.g. [[WP:Dispute resolution]], this would automatically
>> redirect/be expanded to [[Wikipedia:Dispute resolution]]. Again, I'm
>> asking whether we can do the WP: to Wikipedia: redirecting/expanding
>> stuff on a GENERIC basis. (I know it /could/ be done on an individual
>> article basis, but that's not my point.
>
> And this is even more useful for namespaces with longer names, of
> course: Wikipedia_talk: is just a pain to type!
>
>> It would help, be really convenient, and it would not get in the way of
>> anything. Pages can still have their own actual redirects such as [[WP:SB]].
>
> Of course, if it were a real aliasing system such as I am imagining, a
> page "called" [[WP:SB]] would actually be called [[Wikipedia:SB]], but
> I don't see that that would matter much - we'd just have to move all
> pages called WP:xxx to Wikipedia:xxx before activating the alias (with
> a bot, or a database script).
We don't actually have to move anything -- it's just a matter of
precedence during the URL resolution process: if the WP: to Wikipedia:
conversion ONLY kicked in AFTER all other options have been tried (but
before the "Wikipedia doesn't have such a page yet - do you want to
create it?" screen got resorted to), then it would just work.
Pity I can't code.
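In rough pseudo-PHP, that precedence might look like this (pageExists() is
a made-up helper here, not MediaWiki's real API):

  function resolveTitle( $title ) {
      // 1. An exact match always wins, so explicit redirects like
      //    [[WP:SB]] keep working unchanged.
      if ( pageExists( $title ) ) {
          return $title;
      }
      // 2. Only then try the generic alias expansion WP: -> Wikipedia:
      if ( strncasecmp( $title, 'WP:', 3 ) == 0 ) {
          $expanded = 'Wikipedia:' . substr( $title, 3 );
          if ( pageExists( $expanded ) ) {
              return $expanded;
          }
      }
      // 3. Only now fall through to the "create this page?" screen.
      return null;
  }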
Thanks and regards,
Jens Ropers
There are two types of IT techs: The ones who watch soap operas and the
ones who watch progress bars.
http://www.ropersonline.com/elmo/#108681741955837683
Hello,
I created lists of cur table entries which have the same namespace and title.
They were probably created by clicking "Save page" more than once on a newly
created article.
These duplicates lead to bogus entries in maintenance lists, e.g. orphaned
articles or short pages.
I uploaded three lists:
* http://en.wikipedia.org/wiki/User:SirJective/Double_entries
* http://de.wikipedia.org/wiki/Benutzer:SirJective/Doppeleintr%C3%A4ge
* http://nl.wikipedia.org/wiki/Gebruiker:SirJective/Double_entries
Every language dump I have downloaded so far (hu, fr, ja) has double entries.
I used a query like this on a local dump:
-- Self-join cur against itself to find pages that share a namespace
-- and title but have different cur_ids (i.e. duplicate rows).
SELECT DISTINCT
    c1.cur_namespace, c1.cur_title, c1.cur_id, c1.cur_timestamp
FROM
    cur c1,
    cur c2 USE INDEX (name_title_timestamp)
WHERE
    c1.cur_namespace = c2.cur_namespace
    AND c1.cur_title = c2.cur_title
    AND c1.cur_id <> c2.cur_id
ORDER BY c1.cur_namespace, c1.cur_title, c1.cur_timestamp
LIMIT 1000;
At de-WP, Echoray and Peterlustig found that these entries can be removed by
deleting and undeleting them.
Are there any objections to handling these entries this way? If not: Is some
sysop willing to go through the lists and del/undel the entries?
Christian aka SirJective
Steinar H. Gunderson has been working on a C++ diff engine that can be
plugged into MediaWiki without too much fuss. Current 1.4 CVS has
drop-in support for it thanks to JeLuF.
My very unscientific test with a couple of ~134kb revisions of the
Village Pump with some number of paragraphs changed:
Plain page view: 2798ms
Classic diff: 4856ms (2798ms + 2058ms)
Diff with wikidiff: 3342ms (2798ms + 544ms)
Man, those long pages _drag_. :) Turning on memcached & the parser cache
brings down the page view time to 1501 ms, but doesn't significantly
alter the diff views, which go through the full render cycle.
[Current CVS head, no special parser caching options enabled, MySQL
4.0.20, PHP 5.0.1, Athlon XP 2400+. Times are median of 10 successive
hits run with 'ab' on the local host.]
This seems to be roughly a 4x improvement in the actual diffing speed
(2058ms vs 544ms) for a typical long-page, small-change case.
The code for the differ is in the 'extensions' module in CVS, under the
'wikidiff' subdirectory. It should compile on Linux; I've added a more
or less sensible Makefile. It requires SWIG (http://www.swig.org/).
Once php_wikidiff.so is installed, set $wgUseExternalDiffEngine = true;
and _magic_ happens!
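That is, roughly (the extension line is the standard php.ini mechanism for
loading a compiled module; the path is illustrative):

  ; php.ini -- load the compiled diff module
  extension=php_wikidiff.so

  // LocalSettings.php
  $wgUseExternalDiffEngine = true;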
-- brion vibber (brion @ pobox.com)