[actually the subject was "A Modest Proposal on grammar and parsers" but I
wanted to merge it with the other thread]
On 11/14/07, Brion Vibber <brion(a)wikimedia.org> wrote:
>
>
> While that likely would mean changing some corner-case behavior (as
> noted above, the existing parser doesn't always do what's desired), it
> would not be a different *syntax* from the human perspective.
This is a very fine, but very important line you're drawing.
I had been assuming that not supporting the "some ''text[[foo|blah ''blah]]"
case would count as changing the "syntax".
But that syntax is not salient, it is not important, it is arguably broken,
and changing it is, "from the human perspective", not necessarily even a
change.
Similarly, you can embed <gallery> tags in the middle of a sentence. Is this
desirable? Is this a mistake? Does this form part of the actual "syntax" we
want to support?
Is it ok if I change the goal of the EBNF project from:
1) To produce a grammar that precisely matches the parser as it currently
behaves.
to
2) To produce a grammar that is indistinguishable from the current parser as
it is normally used.
In other words: let's record the syntax as it exists in people's minds (and
their existing work), rather than the behavior of the actual parser.
Steve
Dear All,
I am a PhD student at the University of Copenhagen, studying Distributed
Cognition in the Wiki(Pedia) Article. I need to get hold of some data, but I
was told (and have seen) that the English dump has been broken for a long
time.
So, here are a few questions for the beloved tech-people:
- how does one get a dump of en.wikipedia?
- is there a place with historical dumps? Every year or so?
- how difficult would it be to make an easy way to download one article's
*whole* history? (does it necessarily have to be 100 edits at a time, even
if I only need one article, once?)
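To make the last question concrete, this is roughly the kind of script I
would like to be able to run (I am only guessing at api.php's prop=revisions
parameters and its rvstartid continuation, so please correct me if they are
named differently):

<?php
// Rough sketch: page through one article's full revision history via api.php.
// The article title is just an example; parameter names are my best guess.
$api   = 'http://en.wikipedia.org/w/api.php';
$title = 'Distributed_cognition';
$next  = null;
$revs  = array();
do {
    $url = $api . '?action=query&prop=revisions&format=php'
         . '&titles=' . urlencode( $title )
         . '&rvlimit=50&rvprop=ids|timestamp|user|content'
         . ( $next ? '&rvstartid=' . $next : '' );
    $data = unserialize( file_get_contents( $url ) );
    $page = array_shift( $data['query']['pages'] );
    $revs = array_merge( $revs, $page['revisions'] );
    // query-continue (if present) says where the next batch starts.
    $next = isset( $data['query-continue']['revisions']['rvstartid'] )
          ? $data['query-continue']['revisions']['rvstartid']
          : null;
} while ( $next );
echo count( $revs ), " revisions fetched\n";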
Big thanks in advance,
rut jesus
On Nov 14, 2007 6:31 PM, Anthony DiPierro <dipierro(a)gmail.com> wrote:
> On Nov 14, 2007 5:48 PM, George Herbert <george.herbert(a)gmail.com> wrote:
> > I am not a Javascript guy, so I apologize in advance if this is a dumb
> > question, but... Is it possible to make {{USERNAME}} some javascript which
> > expands it on the client side, so the server just provides that JS to the
> > browser and lets you figure it out? That would be the same JS code for
> > everyone, so the underlying parsed article would stay in memcached
> > unchanged...
> >
> Yes, but:
>
> 1) that wouldn't work for logical constructs, and
> 2) it wouldn't work for people who don't have javascript enabled.
>
> As it wouldn't work for logical constructs, might as well just make a parse
>
Hit send prematurely:
As it wouldn't work for logical constructs, might as well just make a
single pass substitution of the cached parsed article.
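A minimal sketch of what I mean by that (the function name and escaping are
just illustrative, not MediaWiki's actual code):

<?php
// Sketch: take the parsed HTML fetched from the parser cache and do one
// literal substitution of the {{USERNAME}} placeholder before sending it
// out. The cached copy itself stays untouched and shared between users.
function substituteUserName( $cachedHtml, $userName ) {
    // Escape the name so it is safe to embed in HTML.
    $safeName = htmlspecialchars( $userName, ENT_QUOTES );
    // One pass over the cached text, replacing every placeholder.
    return str_replace( '{{USERNAME}}', $safeName, $cachedHtml );
}
// e.g. substituteUserName( '<p>Hello, {{USERNAME}}!</p>', 'George' )
//      returns '<p>Hello, George!</p>'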
The includes/ directory contains about 200 files. It looks like a dumping
ground where it's hard to find anything. I propose moving some files into
subdirectories (if it won't break anything):
* includes/specialpages/ (SpecialPage.php and all special pages)
* includes/output/ (or another name; everything obviously related to
output: HttpFunctions.php, OutputHandler.php, OutputPage.php,
WebRequest.php, WebResponse.php, etc.)
* includes/database/ (or db/; not sure whether there are enough related files
to justify it)
For the API folder (now about 50 files; there will be more after merging
apiedit):
* includes/api/formats (for ApiFormat*)
* includes/api/edit (for write API)
* includes/api/query/{list,prop,meta} (includes/api/query for
ApiQueryBase, subdirectories for API queries)
I hope this will help make MediaWiki's file layout cleaner.
--VasilievVV
I'm making a quick summary of all the steps the parser currently goes
through, partly to get familiar with it. For each one I'll then attach some
BNF, and then work out a way of merging the BNFs.
I'm not sure if there's a systematic approach to this. Basic problem:
1. Parser translates X into Y by applying rule A, expressible in BNF.
2. Parser translates Y into Z by applying rule B, expressible in BNF.
What rule captures both of these in one step? Is it always possible? Is
there a general algorithm for the merge?
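A toy illustration of that two-pass problem, with made-up rules that are
nothing like the real parser's:

<?php
// Two made-up passes, purely to illustrate why merging the BNFs is hard.
$x = "{{hello}} world";
// Rule A (X -> Y): expand a fake "template" into wikitext.
$y = str_replace( '{{hello}}', "'''hi'''", $x );        // '''hi''' world
// Rule B (Y -> Z): turn bold wikitext markup into HTML.
$z = preg_replace( "/'''(.*?)'''/", '<b>$1</b>', $y );  // <b>hi</b> world
echo $z, "\n";
// A single grammar for X -> Z has to describe markup that only exists
// after rule A has run, i.e. the composition of A and B; the two BNF
// fragments can't simply be concatenated.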
There are currently 13 distinct major steps, of which one ("Internal parse")
has 14 distinct substeps (not counting hooks). That's also not counting the
preprocessor, so no templates...
Is this a good way to progress towards a complete grammar? The existing
approach seemed to be to simply start from scratch, using a combination of
intuition, testing and examining the code. Any comments?
Perhaps at the least we could start compressing some of these layers. Does
anyone know of two layers that would be impossible to merge? Presumably the
preprocessor has to remain separate at the very least.
Steve
On 11/14/07, MinuteElectron <minuteelectron(a)googlemail.com> wrote:
>
> It is common knowledge among those who have followed MediaWiki development
> discussions for a while that changing wikitext syntax is considered A Bad
> Thing (tm) and will most likely get any commits reverted.
>
>
The parsing and rendering of all sorts of weird cases in wikitext syntax is
going to change with the implementation of a new parser not based on inline
pattern transformation. You can ask for the changes to be small. You can
demand that your favourite pathological corner case be treated correctly.
But there will be changes. They're not optional. The only way to build a
parser with exactly the same behaviour in every case as the current one is
to copy it line by line.
See http://www.usemod.com/cgi-bin/mb.pl?ConsumeParseRenderVsMatchTransform
So if this is genuinely the position of the developers - that *any* change
to the syntax, no matter how insignificant, will not be allowed - then there
will never be a new parser. Earlier discussion with Nick Jenkins led to a
more fruitful compromise, though.
So: is that compromise acceptable? If 99% of actual pages render correctly,
is that good enough to start rolling out the parser?
Steve
Hi,
I am looking for ways to limit the search area, so that it is possible to
search within subareas of the content.
Is there another way to achieve this than assigning articles to namespaces?
Sometimes it's a bit impractical to assign articles to only one namespace...
Maybe there is an extension for this?
greets...magggus
Hello
I have found the following problem in Google as well, but alas no solution.
When trying to compile texvc on Mac OS X,
I get the following error message:
ld: undefined symbols -sprintf@LDBLStub
which looks like a linking problem.
Can anybody help?
Thanks and regards
Uwe Brauer
I created bug 11902, but since I haven't had any replies, I'm reposting
here. Some feedback, please?
http://bugzilla.wikipedia.org/show_bug.cgi?id=11902
--
I think I want to have a go at implementing this, but want to check that it's
a good idea and stands a chance of being enabled at Wikipedia.
Motivation: there are lots of links to disambiguation pages, and they're not
at all visible. This is bad for navigating, bad for reusing content, and bad
for any kind of analysis that involves links.
- Define a magic word like __AMBIGUOUSPAGE__, which would be included in
{{disambig}} on Wikipedia.
- Flag pages containing this magic word (schema change, I believe?)
- When a link pointing to a flagged ambiguous page is rendered, it is
rendered differently, e.g. with a new CSS class (sketched below), or perhaps
via a predefined but editable template, which would allow an image to be
displayed next to such a link.
Desirable? Feasible? Would be enabled at Wikipedia?
Issues for discussion:
- Should it be generalised at all to something like __BADLINKTARGET__?
- Should there be a way of controlling which namespaces it applies in?
Perhaps it's ok to link to a disambiguation page from a talk namespace...
- Should there be a way to suppress it for an individual link? If so, what
would that syntax look like? Perhaps a link to any page named
"... (disambiguation)" (configurable) would be taken to be deliberate, and
to link deliberately to a page not named that way, you would go through such
a redirect?
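To make the rendering idea (third bullet above) concrete, a rough sketch in
plain PHP - not tied to the actual Linker code, and the class name is made
up:

<?php
// Sketch of the rendering step only: if the link target is flagged as
// ambiguous (however that flag ends up being stored), emit an extra CSS
// class so a skin or user stylesheet can make the link stand out.
function makeLink( $target, $text, $isAmbiguous ) {
    $href  = '/wiki/' . urlencode( str_replace( ' ', '_', $target ) );
    $class = $isAmbiguous ? ' class="ambiguous-link"' : '';
    return '<a href="' . $href . '"' . $class . '>'
         . htmlspecialchars( $text ) . '</a>';
}
// e.g. makeLink( 'Mercury', 'Mercury', true ) gives
//   <a href="/wiki/Mercury" class="ambiguous-link">Mercury</a>
// and a user style like  a.ambiguous-link { background: #fdd; }  would
// then highlight such links.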
--
Thanks,
Steve
Hi,
I have been using apache rewriting in my .htaccess file in order to use
pretty URLs, e.g. www.mydomain.com/wiki/Page
Here is the code I was using:
RewriteEngine on
RewriteRule ^[^:]*\.(php|src|jpg|jpeg|png|gif|bmp|css|js|inc|phtml|pl|ico|html|shtml)$ - [L,NC]
RewriteRule ^index.php?title - [L]
RewriteRule ^(.*)$ index.php?title=$1 [L,QSA]
Due to the way mod_rewrite works, pages with ampersands in them were being
truncated, so if I tried to access the page "Terms & Conditions" then I would
actually be directed to the page "Terms".
I found a fix for this on MW.org, which was to add a rule rewriting &
symbols as %26, so they are escaped again. My .htaccess file now looks
like this (the fourth line has been added):
RewriteEngine on
RewriteRule ^[^:]*\.(php|src|jpg|jpeg|png|gif|bmp|css|js|inc|phtml|pl|ico|html|shtml)$ - [L,NC]
RewriteRule ^index.php?title - [L]
RewriteRule ^(.*)\&(.*)$ $1\%26$2
RewriteRule ^(.*)$ index.php?title=$1 [L,QSA]
This successfully solves the above problem. However, if I try to go to a
sub-page of a page with an ampersand, the sub-page part of the page name
is doubled. E.g.:
T&C/page1 becomes T&C/page1/page1
T&C/page1/page2 becomes T&C/page1/page2/page1/page2
This only applies to pages with ampersands - not to other pages. I have
checked the value of $_GET at the top of index.php and the title parameter
already includes the doubled text, so it is clearly a problem with the
rewrite rule. However I can't figure out what the problem is.
Has anyone else experienced this problem? I am asking here before taking it
to the Apache forums as I suspect it is something that other MediaWiki users
will have encountered and perhaps a fix is already known.
I am using Apache 1.3.37 and MW 1.6.10
- Mark Clements (HappyDog)