On Thu, Aug 17, 2006 at 12:21:13AM -0400, Eric Astor wrote:
> Let's see here. Please consider this an incomplete, unreliable list, meant
> solely as an indication of the basic problems encountered when attempting to
> formalize MediaWiki's wikitext... And I'm no expert on parsing, except in
> that I've spent a large part of the summer constructing parsers for
> essentially unparseable languages. Basic point, though, is that MediaWiki
> wikitext is INCREDIBLY context-sensitive.
>
> Single case that shows something interesting:
> '''hi''hello'''hi'''hello''hi'''
>
> Try running it through MediaWiki, and what do you get?
> <b>hi<i>hello</i></b><i>hi<b>hello</b></i><b>hi</b>
>
> In other words, you've discovered that the current syntax supports improper
> nesting of markup, in a rather unique fashion. I don't know of any way to
> duplicate this in any significantly formal system, although I believe a
> multiple-pass parser *might* be capable of handling it. In fact, some sort
> of multiple-pass parser (the MediaWiki parser) obviously can.
I suspect that the "proper" parsing of that particular combination is
undefined, and therefore you can do anything you like.
That's one of the points I was suggesting.
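For illustration, a minimal Python sketch (emphatically not MediaWiki's
actual quote-balancing pass) that just tokenizes the apostrophe runs --
any parser starts from a token stream like this and then has to decide
how, or whether, to nest the tags:

    import re

    # Minimal sketch only: split a line on ''''' / ''' / '' runs.
    # This is NOT the MediaWiki algorithm; it just makes the ambiguity
    # in Eric's example visible as a token stream.
    def tokenize_quotes(line):
        return [t for t in re.split(r"('{5}|'{3}|'{2})", line) if t]

    print(tokenize_quotes("'''hi''hello'''hi'''hello''hi'''"))
    # ["'''", 'hi', "''", 'hello', "'''", 'hi', "'''", 'hello', "''", 'hi', "'''"]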
> Also, templates need to be transcluded before most of the parsing can take
> place, since in the current system, the text may leave some
> syntactically-significant constructs incomplete, finishing them in the
> transclusion stage...
And of course, there's extensions, but I gather they're responsible for
calling the parser themselves, which seemed to make sense.
> In summary, for most definitions of formal, it is impossible to write formal
> grammars for most significant subsets of current MediaWiki syntax. I had
> significant success with a regex-based grammar specification (using Martel),
> backed by a VERY general backend capable of back-tracking and other clever
> tricks (mxTextTools) - but the recursive structure is virtually impossible
> to handle in a regex-based framework.
>
> - Eric Astor
>
> P.S. As indicated above, I honestly feel that the difficulties aren't
> insurmountable - if you're willing to build an appropriate parsing
> framework, which will be semi-formal at best.
>
> P.P.S. When possible, in my *copious* free time (</sarcasm>), I'm hoping to
> take another frontend to mxTextTools (SimpleParse, to be specific), modify
> it sufficiently to support all the necessary features, and then build
> something capable of parsing the current MediaWiki syntax (although I might
> have to drop support for improper nesting). I've no idea if or when this
> might happen, but I'm considering it a long-term goal if the current
> situation doesn't improve.
I don't know that I think that the spec has to be something you can
feed to Bison, certainly. But it has to be unambiguously parseable,
with as many corner cases defined as you can manage, at least by
humans, before it's worth trying anything more complicated.
And it's going to *have* to be done sooner or later. I haven't ever
even looked at the parser code, and just from people talking about it, I
can tell that there will come a time when it's just too tense to work
on anymore.
Hopefully it will get replaced before then.
On Wed, Aug 16, 2006 at 11:26:22PM -0400, Ivan Krstić wrote:
> Jay R. Ashworth wrote:
> > I don't know how useful it will be to have wikitext specified strictly,
> > and I don't think we'll be able to tell until we see how far off we
> > are, and what might need to be tweaked.
>
> This was discussed at hacking days. Brion's pronouncement is that the
> current syntax will admit essentially no backwards-incompatible changes.
My point was more based on taking advantage of the
implementation-defined and -dependent portions of the current 'spec';
things like specifying binding and precedence rules for cases like
Eric's first example, above.
It's unfortunate that formalization went on the table so late, but it
gets done for a reason: being an outgrowth of an engineering
construct, if you need it and you don't do it, then you Just Can't do
whatever it was that made you decide you needed it.
Wasn't someone from SoC working on this?
Did we ever get a final status report from the SoC work? (It's done
now, isn't it?)
And let's be quite clear: *Brion* (and Tim) will admit no
backwards-incompatible changes, not the syntax. The syntax is an
inanimate non-object.
(I'm not trying to be combative, there, just honest.)
Cheers,
-- jra
--
Jay R. Ashworth jra(a)baylink.com
Designer Baylink RFC 2100
Ashworth & Associates The Things I Think '87 e24
St Petersburg FL USA http://baylink.pitas.com +1 727 647 1274
The Internet: We paved paradise, and put up a snarking lot.
Hi, buddies. Thanks for your information on the grammar.
I'm not an expert on parsers; I just translated some MediaWiki PHP code
into Java. I have not fully tested the parser, but it seems to
work normally on several articles I copied from Wikipedia. Only a
subset of the MediaWiki markup is supported so far, including headings,
horizontal rules, internal links, lists, quotes and tables. The parser
might still be very buggy.
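Not the Java code itself, but here is a rough Python sketch of the kind
of line-level rules involved for two of those constructs (headings and
horizontal rules); the regexes are illustrative assumptions, not
MediaWiki's exact grammar:

    import re

    # Illustrative line-level rules only; not the actual Java port and
    # not MediaWiki's exact grammar.
    HEADING = re.compile(r'^(={1,6})\s*(.*?)\s*\1\s*$')
    HR = re.compile(r'^-{4,}\s*$')

    def render_line(line):
        if HR.match(line):                       # ---- becomes a horizontal rule
            return '<hr />'
        m = HEADING.match(line)
        if m:                                    # == Title == becomes <h2>Title</h2>
            level = len(m.group(1))
            return '<h%d>%s</h%d>' % (level, m.group(2), level)
        return line                              # everything else passes through

    print(render_line('== Heading =='))          # <h2>Heading</h2>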
I think 100% compatibility with MediaWiki is very
difficult, and in fact does not make much sense. Even if only a subset
of the grammar is supported, it would still be usable and useful in
most cases.
As for interoperability, I am using the Mwapi from MwJed,
and it works very well.
When I download the dump
http://download.wikimedia.org/svwiki/20060808/svwiki-20060808-templatelinks…
and uncompress it, I find violations of UTF-8 (apparently
remainders of ISO-8859-1) in these records:
0xf6 in (141524,10,'F\xf6rfattarstub'),
0xf6 in (154217,10,'Geografistub-Gr\xf6nland'),
0xe4 in (147111,10,'Japanskt_L\xe4n'),
0xe4 in (145703,10,'Motorv\xe4gar_i_Sverige'),
0xc4 in (125122,10,'RA\xc4'),
0xe5 in (146360,10,'Sk\xe5despelarstub'),
0xf6 and 0xd6 in (160822,10,'S\xf6dra_\xd6sterbotten'),
0xe4 in (145703,10,'TrafikplatsLandsv\xe4g'),
I could still import this SQL dump into mysql (4.0), but when I
open the SQL dump file in GNU Emacs (22.0.50) it doesn't go into
Unicode mode as it does for a clean UTF-8 file.
I've found no errors in some other files I've looked at.
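In case it's useful, a minimal Python 3 sketch of how such bytes can be
located; the filename is assumed from the URL above (after
uncompressing):

    # Minimal sketch: report dump lines that are not valid UTF-8.
    # Filename assumed from the URL above; Python 3.
    with open('svwiki-20060808-templatelinks.sql', 'rb') as f:
        for lineno, raw in enumerate(f, 1):
            try:
                raw.decode('utf-8')
            except UnicodeDecodeError as e:
                print('line %d: byte 0x%02x at offset %d is not UTF-8'
                      % (lineno, raw[e.start], e.start))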
This is a total of 9 violations in 8 records referring to 7
different pages (page ID 145703 appears twice) out of 330,000
records, so no real reason for panic. These seven page IDs are
not present in the page.sql dump, so these are apparently stale link
records that should have been removed from the database. If I run the
inner join:
select page_namespace, page_title, tl_namespace, tl_title
from page, templatelinks where page_id = tl_from;
the result is clean UTF-8. But the result is 2076 rows shorter
than the templatelinks table:
select count(*)
from page, templatelinks where page_id=tl_from;
328349
select count(*)
from templatelinks;
330425
--
Lars Aronsson (lars(a)aronsson.se)
Aronsson Datateknik - http://aronsson.se
On Tue, Aug 15, 2006 at 10:44:39AM +1000, Nick Jenkins wrote:
> As you can see from this edit :
> http://wikiwyg.wikia.com/index.php?title=Testpage&diff=136009&oldid=136007#Neapolitan_double_quotes:_dsomething
> (which was typed in wikiwyg mode), the '' gets converted to italics
> upon saving, not rendered as
> a literal '' (i.e. what you see in the wikiwyg mode - two quotes - is
> not what you get in the rendered HTML output after saving - italics in
> the headline).
Ok, now here's a completely different issue:
What should Wikiwyg *do* if you hand it something that looks like
wikitext?
My intuition is that it should *not* treat it as wikitext, and this is
the corner case that demonstrates why, but I can see arguments on both sides.
Discuss.
Cheers,
-- jra
--
Jay R. Ashworth jra(a)baylink.com
Designer Baylink RFC 2100
Ashworth & Associates The Things I Think '87 e24
St Petersburg FL USA http://baylink.pitas.com +1 727 647 1274
The Internet: We paved paradise, and put up a snarking lot.
Hi all,
Two questions: First, do any decent editing tools exist, specialised for
editing wikis, and in particular MediaWiki wikis? Such a thing would
have to be capable of browsing, but let you do editing in some kind of
more sophisticated, enhanced way - whatever that is.
Secondly, if such a thing doesn't exist (possible, since I haven't
heard of one), are there any real obstacles to it happening? Why has
all the discussion of WYSIWYG wiki editing been focused on server-side
implementations? With the exception of querying the database, why
can't all this be implemented locally, allowing a possibly richer user
experience by using native Windows (for example :)) calls rather than
the limitations of JavaScript?
Has no one tried?
Steve
Hello Wikitech,
I am curious about the degree (if any) to which Wikimedia experiences
DoS attacks on its servers. Mainly I'm curious about:
(a) whether attacks happen; and
(b) the character of the attacks themselves (application-level? SYN
flood? ICMP flood?).
Is this mailing list the correct forum in which to ask this question? If
not, should I email noc(a)wikimedia.org? I am a graduate student doing
research on DoS attacks and would be extremely grateful for any
information or help.
Many thanks in advance.
-Mike Walfish
We used to use SORBS to blacklist open proxies, but that's pretty
dodgy (requiring a $50 donation to remove an IP). Now we don't use
anything, which means that admins have to manually block thousands of
IPs if some spammer or vandal starts attacking from open proxies. See
bug 6988: http://bugs.wikimedia.org/show_bug.cgi?id=6988. An admin
from kuwiki just came on #mediawiki asking how to block *all*
anonymous users due to the severity of the onslaught (see
http://ku.wikipedia.org/w/index.php?title=Taybet:Recentchanges&hideliu=1).
Similar issues on a lesser scale occur on many projects, which is why
we have the open-proxy-blocking policy in the first place.
So, after some Googling, I found
http://www.declude.com/Articles.asp?ID=97, a list of various DNSBLs.
One promising one appears to be AHBL: see
http://www.ahbl.org/services.php. They offer various DNSBLs, but the
two of interest to us are probably their Tor and IRC lists (the latter
blocks open proxies and otherwise infected computers). Of course,
these need to be subjected to scrutiny before we actually use them,
and a whitelist (per-project? on Meta?) would be a good idea as well
in case we're convinced there's a false positive.
What should detected proxies be prevented from doing? Editing
anonymously, obviously, and creating accounts, at least to begin with.
Registered editing could eventually be prohibited if known good users
are whitelistable per-project somehow (by username, not by IP).
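For reference, the lookup itself is trivial; here's a minimal Python
sketch. The zone name is an assumption -- use whatever zones AHBL
actually publishes:

    import socket

    # Minimal DNSBL check: reverse the IPv4 octets, append the blocklist
    # zone, and see whether an A record exists.  The zone name here is an
    # assumption -- substitute AHBL's real zone(s).
    def is_listed(ip, zone='dnsbl.ahbl.org'):
        query = '.'.join(reversed(ip.split('.'))) + '.' + zone
        try:
            socket.gethostbyname(query)    # any answer means "listed"
            return True
        except socket.gaierror:            # NXDOMAIN means "not listed"
            return False

    print(is_listed('127.0.0.2'))          # most DNSBLs list this test address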
An automated run of parserTests.php showed the following failures:
Running test TODO: Table security: embedded pipes (http://mail.wikipedia.org/pipermail/wikitech-l/2006-April/034637.html)... FAILED!
Running test TODO: Link containing double-single-quotes '' (bug 4598)... FAILED!
Running test TODO: Template with thumb image (with link in description)... FAILED!
Running test TODO: message transform: <noinclude> in transcluded template (bug 4926)... FAILED!
Running test TODO: message transform: <onlyinclude> in transcluded template (bug 4926)... FAILED!
Running test BUG 1887, part 2: A <math> with a thumbnail- math enabled... FAILED!
Running test TODO: HTML bullet list, unclosed tags (bug 5497)... FAILED!
Running test TODO: HTML ordered list, unclosed tags (bug 5497)... FAILED!
Running test TODO: HTML nested bullet list, open tags (bug 5497)... FAILED!
Running test TODO: HTML nested ordered list, open tags (bug 5497)... FAILED!
Running test TODO: Parsing optional HTML elements (Bug 6171)... FAILED!
Running test TODO: Inline HTML vs wiki block nesting... FAILED!
Running test TODO: Mixing markup for italics and bold... FAILED!
Running test TODO: 5 quotes, code coverage +1 line... FAILED!
Running test TODO: HTML Hex character encoding.... FAILED!
Running test TODO: dt/dd/dl test... FAILED!
Passed 413 of 429 tests (96.27%) FAILED!
Do I correctly remember that Wikimedia projects do not keep log files and
statistics and stuff (other than what can be gleaned from the
database itself), to reduce server load? I think I remember somebody saying
that even log files are not kept... or that could have been some other
reality.
I found
http://en.wikipedia.org/wiki/Wikipedia:Statistics
and
http://stats.wikimedia.org/EN/ChartsWikipediaEN.htm
And a few other things, but nothing that looks like it would answer some of
the questions being asked.
So - hitting an external log with the standard client-issued image /
JavaScript counter thingy would get some of that. More data could be
gleaned by combining that with the info available from the database.
Has this already been discussed/beaten to death? Is it a dumb idea?
Anybody got a server and bandwidth to take a gazillion hits to crunch some
additional statistics? :-)
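Something like this is all the collector would need to be -- a minimal
Python sketch, everything here hypothetical, that logs each hit and
answers with an empty 204 so pages can embed it as an image or script
beacon:

    import time
    from wsgiref.simple_server import make_server

    # Hypothetical minimal hit collector: append one line per request to
    # a log file and return 204 No Content so the "image" costs nothing.
    def app(environ, start_response):
        with open('hits.log', 'a') as log:
            log.write('%f %s %s\n' % (time.time(),
                                      environ.get('REMOTE_ADDR', '-'),
                                      environ.get('PATH_INFO', '/')))
        start_response('204 No Content', [])
        return [b'']

    if __name__ == '__main__':
        make_server('', 8000, app).serve_forever()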