[actually the subject was "A Modest Proposal on grammar and parsers" but I
wanted to merge it with the other thread]
On 11/14/07, Brion Vibber <brion(a)wikimedia.org> wrote:
>
>
> While that likely would mean changing some corner-case behavior (as
> noted above, the existing parser doesn't always do what's desired), it
> would not be a different *syntax* from the human perspective.
This is a very fine, but very important line you're drawing.
I had been assuming that not supporting the "some ''text[[foo|blah ''blah]]"
case would count as changing the "syntax".
But that syntax is not salient, it is not important, it is arguably broken,
and changing it is, "from the human perspective", not necessarily even a
change.
Similarly, you can embed <gallery> tags in the middle of a sentence. Is this
desirable? Is this a mistake? Does this form part of the actual "syntax" we
want to support?
Is it ok if I change the goal of the EBNF project from:
1) To produce a grammar that precisely matches the parser as it currently
behaves.
to
2) To produce a grammar that is indistinguishable from the current parser as
it is normally used.
In other words: let's record the syntax as it exists in people's minds (and
their existing work), rather than the behavior of the actual parser.
Steve
Dear All,
I am a PhD student at the University of Copenhagen, studying Distributed
Cognition in the Wiki(Pedia) Article. I need to get hold of some data, but I
was told (and have seen) that the English dump has been broken for a long
time.
So, here are a few questions for the beloved tech-people:
- how does one get a dump of en.wikipedia?
- is there a place with historical dumps? Every year or so?
- how difficult would it be to make an easy way to download one article's
*whole* history? (does it necessarily have to be 100 edits at a time, even
if I only need one article, once?)
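To make the last question concrete, this is roughly the kind of script I
would like to be able to run (I am only guessing at api.php's prop=revisions
parameters and its rvstartid continuation, so please correct me if they are
named differently):

<?php
// Rough sketch: page through one article's full revision history via api.php.
// The article title is just an example; parameter names are my best guess.
$api   = 'http://en.wikipedia.org/w/api.php';
$title = 'Distributed_cognition';
$next  = null;
$revs  = array();
do {
    $url = $api . '?action=query&prop=revisions&format=php'
         . '&titles=' . urlencode( $title )
         . '&rvlimit=50&rvprop=ids|timestamp|user|content'
         . ( $next ? '&rvstartid=' . $next : '' );
    $data = unserialize( file_get_contents( $url ) );
    $page = array_shift( $data['query']['pages'] );
    $revs = array_merge( $revs, $page['revisions'] );
    // query-continue (if present) says where the next batch starts.
    $next = isset( $data['query-continue']['revisions']['rvstartid'] )
          ? $data['query-continue']['revisions']['rvstartid']
          : null;
} while ( $next );
echo count( $revs ), " revisions fetched\n";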
Big thanks in advance,
rut jesus
On Nov 14, 2007 6:31 PM, Anthony DiPierro <dipierro(a)gmail.com> wrote:
> On Nov 14, 2007 5:48 PM, George Herbert <george.herbert(a)gmail.com> wrote:
> > I am not a Javascript guy, so I apologize in advance if this is a dumb
> > question, but... Is it possible to make {{USERNAME}} some javascript which
> > expands it on the client side, so the server just provides that JS to the
> > browser and lets you figure it out? That would be the same JS code for
> > everyone, so the underlying parsed article would stay in memcached
> > unchanged...
> >
> Yes, but:
>
> 1) that wouldn't work for logical constructs, and
> 2) it wouldn't work for people who don't have javascript enabled.
>
> As it wouldn't work for logical constructs, might as well just make a parse
>
Hit send prematurely:
As it wouldn't work for logical constructs, might as well just make a
single pass substitution of the cached parsed article.
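A minimal sketch of what I mean by that (the function name and escaping are
just illustrative, not MediaWiki's actual code):

<?php
// Sketch: take the parsed HTML fetched from the parser cache and do one
// literal substitution of the {{USERNAME}} placeholder before sending it
// out. The cached copy itself stays untouched and shared between users.
function substituteUserName( $cachedHtml, $userName ) {
    // Escape the name so it is safe to embed in HTML.
    $safeName = htmlspecialchars( $userName, ENT_QUOTES );
    // One pass over the cached text, replacing every placeholder.
    return str_replace( '{{USERNAME}}', $safeName, $cachedHtml );
}
// e.g. substituteUserName( '<p>Hello, {{USERNAME}}!</p>', 'George' )
//      returns '<p>Hello, George!</p>'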
The includes/ directory contains about 200 files. It looks like a dumping
ground where it's hard to find anything. I propose moving some files into
subdirectories (if it won't break anything):
* includes/specialpages/ (SpecialPage.php and all special pages)
* includes/output/ (or another name; everything obviously related to
output: HttpFunctions.php, OutputHandler.php, OutputPage.php,
WebRequest.php, WebResponse.php, etc.)
* includes/database/ (or db/; not sure whether there are enough related files
to justify it)
For the API folder (now about 50 files; there will be more after merging
apiedit):
* includes/api/formats (for ApiFormat*)
* includes/api/edit (for write API)
* includes/api/query/{list,prop,meta} (includes/api/query for
ApiQueryBase, subdirectories for API queries)
I hope this will help make MediaWiki's file layout cleaner.
--VasilievVV
I'm making a quick summary of all the steps the parser currently goes
through, partly to get familiar with it. For each one I'll then attach some
BNF, and then work out a way of merging the BNFs.
I'm not sure if there's a systematic approach to this. Basic problem:
1. Parser translates X into Y by applying rule A, expressible in BNF.
2. Parser translates Y into Z by applying rule B, expressible in BNF.
What rule captures both of these in one step? Is it always possible? Is
there a general algorithm for the merge?
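A toy illustration of that two-pass problem, with made-up rules that are
nothing like the real parser's:

<?php
// Two made-up passes, purely to illustrate why merging the BNFs is hard.
$x = "{{hello}} world";
// Rule A (X -> Y): expand a fake "template" into wikitext.
$y = str_replace( '{{hello}}', "'''hi'''", $x );        // '''hi''' world
// Rule B (Y -> Z): turn bold wikitext markup into HTML.
$z = preg_replace( "/'''(.*?)'''/", '<b>$1</b>', $y );  // <b>hi</b> world
echo $z, "\n";
// A single grammar for X -> Z has to describe markup that only exists
// after rule A has run, i.e. the composition of A and B; the two BNF
// fragments can't simply be concatenated.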
There are currently 13 distinct major steps, of which one ("Internal parse")
has 14 distinct substeps (not counting hooks). That's also not counting the
preprocessor, so no templates...
Is this a good way to progress towards a complete grammar? The existing
approach seemed to be to simply start from scratch, using a combination of
intuition, testing and examining the code. Any comments?
Perhaps at the least we could start compressing some of these layers. Does
anyone know of two layers that would be impossible to merge? Presumably the
preprocessor has to remain separate at the very least.
Steve
On 11/14/07, MinuteElectron <minuteelectron(a)googlemail.com> wrote:
>
> It is common knowledge among those who have followed MediaWiki development
> discussions for a while that changing wikitext syntax is considered A Bad
> Thing (tm) and will most likely get any commits reverted.
>
>
The parsing and rendering of all sorts of weird cases in wikitext syntax is
going to change with the implementation of a new parser not based on inline
pattern transformation. You can ask for the changes to be small. You can
demand that your favourite pathological corner case be treated correctly.
But there will be changes. They're not optional. The only way to build a
parser with exactly the same behaviour in every case as the current one is
to copy it line by line.
See http://www.usemod.com/cgi-bin/mb.pl?ConsumeParseRenderVsMatchTransform
So if this is genuinely the position of the developers - that *any* change
to the syntax, no matter how insignificant, will not be allowed - then there
will never be a new parser. Earlier discussion with Nick Jenkins led to a
more fruitful compromise, though.
So: is that compromise acceptable? If 99% of actual pages render correctly,
is that good enough to start rolling out the parser?
Steve
Hi,
I am looking for ways to limit the search area, so that it is possible to
search within subareas of the content.
Is there another way to achieve this than assigning articles to namespaces?
Sometimes it's a bit impractical to assign articles to only one namespace...
Maybe there is an extension for this?
greets...magggus
Hello
I have found the following problem in Google as well, but alas no solution.
When trying to compile texvc on Mac OS X,
I get the following error message:
ld: undefined symbols -sprintf@LDBLStub
which looks like a linking problem.
Can anybody help?
Thanks and regards
Uwe Brauer
I created bug 11902, but since I haven't had any replies, I'm reposting
here. Some feedback, please?
http://bugzilla.wikipedia.org/show_bug.cgi?id=11902
--
I think I want to have a go at implementing this, but want to check that it's
a good idea and stands a chance of being enabled at Wikipedia.
Motivation: there are lots of links to disambiguation pages, and they're not
at all visible. This is bad for navigating, bad for reusing content, and bad
for any kind of analysis that involves links.
- Define a magic word like __AMBIGUOUSPAGE__, which would be included in
{{disambig}} on Wikipedia.
- Flag pages containing this magic word (schema change, I believe?)
- When a link pointing to a flagged ambiguous page is rendered, it is
rendered differently, e.g. with a new CSS class (sketched below), or perhaps
via a predefined but editable template, which would allow an image to be
displayed next to such a link.
Desirable? Feasible? Would be enabled at Wikipedia?
Issues for discussion:
- Should it be generalised at all to something like __BADLINKTARGET__?
- Should there be a way of controlling which namespaces it applies in?
Perhaps it's ok to link to a disambiguation page from a talk namespace...
- Should there be a way to suppress it for an individual link? If so, what
would that syntax look like? Perhaps a link to any page named
"... (disambiguation)" (configurable) would be taken to be deliberate, and
to link deliberately to a page not named that way, you would go through such
a redirect?
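To make the rendering idea (third bullet above) concrete, a rough sketch in
plain PHP - not tied to the actual Linker code, and the class name is made
up:

<?php
// Sketch of the rendering step only: if the link target is flagged as
// ambiguous (however that flag ends up being stored), emit an extra CSS
// class so a skin or user stylesheet can make the link stand out.
function makeLink( $target, $text, $isAmbiguous ) {
    $href  = '/wiki/' . urlencode( str_replace( ' ', '_', $target ) );
    $class = $isAmbiguous ? ' class="ambiguous-link"' : '';
    return '<a href="' . $href . '"' . $class . '>'
         . htmlspecialchars( $text ) . '</a>';
}
// e.g. makeLink( 'Mercury', 'Mercury', true ) gives
//   <a href="/wiki/Mercury" class="ambiguous-link">Mercury</a>
// and a user style like  a.ambiguous-link { background: #fdd; }  would
// then highlight such links.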
--
Thanks,
Steve
Hi,
I have been using apache rewriting in my .htaccess file in order to use
pretty URLs, e.g. www.mydomain.com/wiki/Page
Here is the code I was using:
RewriteEngine on
RewriteRule ^[^:]*\.(php|src|jpg|jpeg|png|gif|bmp|css|js|inc|phtml|pl|ico|html|shtml)$ - [L,NC]
RewriteRule ^index.php?title - [L]
RewriteRule ^(.*)$ index.php?title=$1 [L,QSA]
Due to the way mod_rewrite works, pages with ampersands in them were being
truncated, so if I tried to access the page "Terms & Conditions" then I would
actually be directed to the page "Terms".
I found a fix for this on MW.org, which was to add a rule rewriting &
symbols as %26, so they are escaped again. My .htaccess file now looks
like this (the fourth line has been added):
RewriteEngine on
RewriteRule ^[^:]*\.(php|src|jpg|jpeg|png|gif|bmp|css|js|inc|phtml|pl|ico|html|shtml)$ - [L,NC]
RewriteRule ^index.php?title - [L]
RewriteRule ^(.*)\&(.*)$ $1\%26$2
RewriteRule ^(.*)$ index.php?title=$1 [L,QSA]
This successfully solves the above problem. However, if I try to go to a
sub-page of a page with an ampersand, the sub-page part of the page name
is doubled. E.g.:
T&C/page1 becomes T&C/page1/page1
T&C/page1/page2 becomes T&C/page1/page2/page1/page2
This only applies to pages with ampersands - not to other pages. I have
checked the value of $_GET at the top of index.php and the title parameter
already includes the doubled text, so it is clearly a problem with the
rewrite rule. However I can't figure out what the problem is.
Has anyone else experienced this problem? I am asking here before taking it
to the Apache forums as I suspect it is something that other MediaWiki users
will have encountered and perhaps a fix is already known.
I am using Apache 1.3.37 and MW 1.6.10
- Mark Clements (HappyDog)