SOS...SOS...SOS...HELP...Slovakia-Slovensko, thank you for the e-mail, but I don't know what it says because I don't speak your language. Please write in Slovak or Czech...
______________________________________________________________
From: wikitech-l-request@lists.wikimedia.org To: wikitech-l@lists.wikimedia.org Date: 28.12.2008 04:16 Subject: Wikitech-l Digest, Vol 65, Issue 34
Send Wikitech-l mailing list submissions to wikitech-l@lists.wikimedia.org
To subscribe or unsubscribe via the World Wide Web, visit https://lists.wikimedia.org/mailman/listinfo/wikitech-l or, via email, send a message with subject or body 'help' to wikitech-l-request@lists.wikimedia.org
You can reach the person managing the list at wikitech-l-owner@lists.wikimedia.org
When replying, please edit your Subject line so it is more specific than "Re: Contents of Wikitech-l digest..."
Today's Topics:
1. Data center move in Amsterdam: expect some downtime (Mark Bergsma)
2. Re: IBM DB2 patch for MediaWiki (Jesús Quiroga)
3. Re: Anchors haven't id attribute (Danny B.)
4. Re: Anchors haven't id attribute (Brion Vibber)
5. Re: IBM DB2 patch for MediaWiki (Aryeh Gregor)
6. Re: Anchors haven't id attribute (Aryeh Gregor)
7. Re: Anchors haven't id attribute (Danny B.)
8. Re: Anchors haven't id attribute (Aryeh Gregor)
Message: 1 Date: Fri, 26 Dec 2008 22:05:17 +0100 From: Mark Bergsma mark@wikimedia.org Subject: [Wikitech-l] Data center move in Amsterdam: expect some downtime To: Wikimedia developers wikitech-l@lists.wikimedia.org, Wikimedia Foundation Mailing List foundation-l@lists.wikimedia.org Message-ID: 4955470D.10503@wikimedia.org Content-Type: text/plain; charset=ISO-8859-1
In the days leading up to New Year's, we will be moving our servers and other equipment at the Amsterdam data center location to a new data center. Unfortunately, this may cause some downtime and hiccups for certain websites and services, although we will try to keep this to a minimum.
On Sunday the 28th, between 09:00 and 11:00 UTC, we will migrate our network in Amsterdam to new equipment. All services located there will be unreachable for a brief period. However, traffic for the main wikis will be rerouted to the Florida cluster and should remain unaffected.
In the days after that, we will move the servers themselves. Some services, such as the mailing list server, the Subversion server and the toolserver cluster, will be down for a number of hours while the equipment is being moved. Traffic for the wikis should again remain largely unaffected.
We hope to have the entire migration finished before we enter the last few hours of 2008... and start 2009 with a clean sheet. Happy Holidays everyone!
-- Mark Bergsma mark@wikimedia.org System & Network Administrator, Wikimedia Foundation
Message: 2 Date: Sat, 27 Dec 2008 07:23:00 +0100 From: Jesús Quiroga jquiroga@pobox.com Subject: Re: [Wikitech-l] IBM DB2 patch for MediaWiki To: Wikimedia developers wikitech-l@lists.wikimedia.org Message-ID: 4955C9C4.9080509@pobox.com Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Hello.
After a few days of pondering the issues, I would like to explain what I suggested in my previous message, in more detail and (hopefully) more clearly.
What I'm about to say is pretty abstract, so it's difficult to convey the right meaning. Please forgive me if I say something you already know, or just nonsense :-)
Jesús Quiroga wrote:
I believe a better solution is to design a domain-specific language, an idea not very different from your first one. This DSL would model the interaction between the application and the DB as it is now, and would be designed to evolve. That's it.
The problem I discuss is how to best access the data store from an application. I believe the right answer is different for each project, but it's not difficult to evaluate the alternatives, one by one, in a given context. I think it is worthwhile to do that in the context of MediaWiki.
I will refer to wiki modules and databases as if they were 'hosts' connected to a 'network', to highlight the role of languages in the operation of the system at runtime.
The first way to access the data store is the 'direct' one:
[polyglot wiki] <--- mysDataL ---> [mysql]
[polyglot wiki] <--- posDataL ---> [postgresql]
[polyglot wiki] <--- db2DataL ---> [db2]
Here, the polyglot wiki module talks to every database using the proper languages. 'mysDataL' means 'the data language understood by MySQL', 'posDataL' means 'the data language understood by PostgreSQL', etc.
The polyglot wiki promises to learn several languages and to speak them correctly forever, so if a new database comes along or any of their data languages evolves, the polyglot wiki is forced to adapt, potentially at great cost. Besides, any change to the database schema can trigger lots of updates to the wiki code, which can be very costly too.
The advantages of this way are well known: it is fast, no need to do design, easy to understand. The drawbacks are apparently few, but devastating: verbose and complex code in multiple places in the wiki module, very costly to maintain, even more costly to evolve. All changes cost a lot, in time and effort.
The second way to access the data store that is usually considered is the 'indirect' one:
[wiki] <--- wikiDataL ---> [polyglot translator]
[polyglot translator] <--- mysDataL ---> [mysql]
[polyglot translator] <--- posDataL ---> [postgresql]
[polyglot translator] <--- db2DataL ---> [db2]
Here, wikiDataL means 'some relational data definition and manipulation language suitable for use by the wiki'.
The polyglot translator promises to learn wikiDataL and the other dialects and to evolve with them, so it has all the problems the wiki had in the direct way, but now the cost is lower because a lot of complexity is 'hidden' inside the translator and can't reach the wiki. As a result, wiki code is not updated as much, and it's much cleaner and less verbose.
The advantages of this way are: wiki module code is simpler, cost of evolution is reduced. The drawbacks are apparently many: it's slower, design is needed, harder to understand, a new language (wikiDataL), translator can be very complex. However, the need to reduce the cost to achieve change is usually so great that these inconveniences are minor in comparison.
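To make the 'indirect' arrangement concrete, here is a minimal Python sketch, loosely in the spirit of the Database/DatabaseMySql split mentioned later in this digest. The class and method names (`Backend`, `select`, `limit_clause`) are hypothetical, not MediaWiki's API, and the `?` placeholder style is an assumption (real drivers differ). The wiki speaks one generic, relational language (tables, columns, conditions), and each subclass supplies only the dialect differences, here the row-limiting clause:

```python
from abc import ABC, abstractmethod


class Backend(ABC):
    """Hypothetical generic query layer: callers describe tables and columns,
    and each subclass contributes only the dialect-specific SQL fragments."""

    def select(self, table, columns, where=None, limit=None):
        where = where or {}
        sql = f"SELECT {', '.join(columns)} FROM {table}"
        if where:
            # Values are returned separately as bound parameters, never inlined.
            sql += " WHERE " + " AND ".join(f"{col} = ?" for col in where)
        if limit is not None:
            sql += self.limit_clause(int(limit))
        return sql, list(where.values())

    @abstractmethod
    def limit_clause(self, n):
        """Return this dialect's row-limiting clause."""


class MySqlBackend(Backend):
    def limit_clause(self, n):
        return f" LIMIT {n}"


class PostgresBackend(Backend):
    def limit_clause(self, n):
        return f" LIMIT {n}"


class Db2Backend(Backend):
    def limit_clause(self, n):
        # DB2 uses the SQL-standard FETCH FIRST form rather than LIMIT.
        return f" FETCH FIRST {n} ROWS ONLY"


for backend in (MySqlBackend(), PostgresBackend(), Db2Backend()):
    print(backend.select("page", ["page_id"], {"page_title": "Main_Page"}, limit=1))
```

Adding a backend means adding one subclass; calling code does not change. That is the reduced cost of change described above, paid for with the extra translator layer.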
Now the interesting bit begins. A third possible way to access the data store, the 'interpreted' one:
[wiki] <--- wikiNeedL ---> [polyglot interpreter]
[polyglot interpreter] <--- mysDataL ---> [mysql]
[polyglot interpreter] <--- posDataL ---> [postgresql]
[polyglot interpreter] <--- db2DataL ---> [db2]
Here, wikiNeedL means 'some language adequate for the wiki to express its data access needs and nothing else'.
wikiNeedL is the domain-specific language I wrote about in my previous message.
The differences between wikiDataL and wikiNeedL are mainly these:
- wikiNeedL would contain just enough wiki concepts to express the wiki's needs, so it's effectively confined to that domain. wikiDataL belongs to the relational data model domain, which is quite different.
- In general, wikiNeedL would have different semantics from the dialects understood by the databases, so the translation step becomes more like interpretation than just syntactic transformation. wikiDataL usually has the same semantics as the dialects.
- wikiNeedL would contain just enough concepts to satisfy current needs, and would be open to extension. wikiDataL aims to be general-purpose and to fulfill current and future needs.
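A wikiNeedL statement, by contrast, names a wiki concept rather than tables and columns. In this hypothetical Python sketch, `LatestRevisionId` and `PageExists` are invented "needs", the schema names are illustrative only, and the interpreter alone decides how each need maps onto a given backend's SQL:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatestRevisionId:
    """Hypothetical wikiNeedL statement: 'the newest revision of this page'."""
    page_title: str


@dataclass(frozen=True)
class PageExists:
    """Hypothetical wikiNeedL statement: 'does this page exist?'"""
    page_title: str


KNOWN_BACKENDS = {"mysql", "postgres", "db2"}


def _first_row_only(backend):
    # Only the interpreter knows each dialect's row-limiting syntax.
    return " FETCH FIRST 1 ROWS ONLY" if backend == "db2" else " LIMIT 1"


def interpret(need, backend):
    """Turn a wiki-level need into (sql, params) for one backend."""
    if backend not in KNOWN_BACKENDS:
        raise ValueError(f"unknown backend: {backend}")
    if isinstance(need, LatestRevisionId):
        sql = ("SELECT rev_id FROM revision JOIN page ON rev_page = page_id"
               " WHERE page_title = ? ORDER BY rev_timestamp DESC")
        return sql + _first_row_only(backend), [need.page_title]
    if isinstance(need, PageExists):
        sql = "SELECT 1 FROM page WHERE page_title = ?"
        return sql + _first_row_only(backend), [need.page_title]
    # The language is deliberately small; new needs are added as they arise.
    raise NotImplementedError(f"no interpretation for {type(need).__name__}")


print(interpret(LatestRevisionId("Main_Page"), "db2"))
```

The wiki never sees a table name here. Whether that extra layer pays for itself is exactly what the rest of the thread debates.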
The main reason to consider the 'interpreted' way is, of course, that it helps reduce even more the cost to achieve change.
So that's what I was talking about. I will say more about the differences between the indirect and the interpreted ways in a future message.
Thanks for your attention.
Message: 3 Date: Sat, 27 Dec 2008 13:05:53 +0100 (CET) From: Danny B. Wikipedia.Danny.B@email.cz Subject: Re: [Wikitech-l] Anchors haven't id attribute To: Wikimedia developers wikitech-l@lists.wikimedia.org Message-ID: 18263.21683-30277-135341947-1230379553@email.cz Content-Type: text/plain; charset="iso-8859-2"
------------ Original message ------------ From: Brion Vibber brion@wikimedia.org Subject: Re: [Wikitech-l] Anchors haven't id attribute Date: 26.12.2008 06:30:00
On 12/25/08 4:32 AM, Danny B. wrote:
I have reverted both revisions in r45021 and r45022 because they caused massive invalidity of pages.
Given that we've been outputting these as "id" attributes for the last few years already (as output by Tidy), I have reverted your revert in r45044 pending further discussion.
-- brion
Well, the id was added _only_ to those tags where the name was transferable to id, i.e. where it started with an ASCII letter. _Never_ to those that did not conform to this rule (the regexp mentioned in my previous post). This is easily provable by either running an older revision of MediaWiki or testing in Tidy directly:
Take this code excerpt (wrap it in a minimal XHTML document) and run it through Tidy:
<a name="X"></a><h2> <span class="mw-headline"> X </span></h2>
<a name="1X"></a><h2> <span class="mw-headline"> 1X </span></h2>
<a name=".C3.81X"></a><h2> <span class="mw-headline"> ÁX </span></h2>
<a name="-X"></a><h2> <span class="mw-headline"> -X </span></h2>
The result will be:
<a name="X" id="X"></a><h2><span class="mw-headline">X</span></h2>
<a name="1X"></a><h2><span class="mw-headline">1X</span></h2>
<a name=".C3.81X"></a><h2><span class="mw-headline">ÁX</span></h2>
<a name="-X"></a><h2><span class="mw-headline">-X</span></h2>
Now let me repeat how "id" is defined:
1: XHTML is a reformulation of HTML 4 as an XML 1.0 application.
2: That means it takes every single definition from HTML 4 and keeps it unless it is overridden in XHTML.
3: id and name have been defined in HTML 4 as /[A-Za-z][A-Za-z0-9:_.-]*/ [1] [2]
4: name has been redefined as NMTOKEN [2] [3]
5: id has never been redefined, so it keeps the definition from point 3 above.
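The rule Danny describes (copy name to id only when the name matches the HTML 4 production in point 3) can be checked mechanically. This Python sketch imitates the described behavior on the four anchors from the example above. The function names are hypothetical, and this is not Tidy's actual code:

```python
import re

# HTML 4 ID/NAME token production, as quoted in the thread.
HTML4_ID = re.compile(r"[A-Za-z][A-Za-z0-9:_.-]*")


def is_valid_html4_id(value):
    return HTML4_ID.fullmatch(value) is not None


def add_id_like_tidy(name):
    """Return <a> attributes, copying name to id only when it is a valid HTML 4 id."""
    if is_valid_html4_id(name):
        return f'name="{name}" id="{name}"'
    return f'name="{name}"'


for name in ["X", "1X", ".C3.81X", "-X"]:
    print(f"<a {add_id_like_tidy(name)}></a>")
```

Only `X` gains an id. That matches the Tidy output quoted above: names starting with a digit, a dot (percent-style encoding of non-ASCII) or a hyphen are left without one.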
This is how id in XHTML has always been handled since XHTML came out. I also think that something as important as the handling of id would have been fixed in the validator over so many years if it weren't correct.
So currently, all wikis in non-Latin scripts are totally invalid according to the W3C validator. Large parts of wikis with non-ASCII characters are invalid as well. That makes it very hard to find other, real mistakes in the code when there are meaningless errors on every other page. :-(
Also, one last thing: I think the current rendering with controversial ids brought more negatives (such as greatly reducing our ability to find the really invalid parts of the code) than positives. It was working correctly before, so what benefit did it actually bring? On the other hand, it brought this controversy.
I grant that I (and the majority of people around the world, the validator, Tidy and many other tools) _may_ be wrong in interpreting the definition of id. But unless the authoritative tools, such as the validator or Tidy, are fixed on this issue, so that it can be proved that we render the page correctly, we should not render it that way. As I mentioned above, it was working correctly before, so there is no need to force the new rendering, since it does not correct any mistake or malfunction.
[1] http://www.w3.org/TR/html401/types.html#type-name
[2] http://www.w3.org/TR/xhtml1/#C_8
[3] http://www.w3.org/TR/2000/WD-xml-2e-20000814#NT-Nmtoken
Kind regards
Danny B.
Message: 4 Date: Sat, 27 Dec 2008 12:14:33 -0800 From: Brion Vibber brion@wikimedia.org Subject: Re: [Wikitech-l] Anchors haven't id attribute To: Wikimedia developers wikitech-l@lists.wikimedia.org Message-ID: 49568CA9.6090104@wikimedia.org Content-Type: text/plain; charset=ISO-8859-2; format=flowed
[snip]
Maybe we should just fix the normalization function the way we'd already planned to, so that it'll work right the way we'd already planned to?
-- brion
Message: 5 Date: Sat, 27 Dec 2008 18:25:10 -0500 From: "Aryeh Gregor" Simetrical+wikilist@gmail.com Subject: Re: [Wikitech-l] IBM DB2 patch for MediaWiki To: "Wikimedia developers" wikitech-l@lists.wikimedia.org Message-ID: 7c2a12e20812271525g3055d1ffr855bc071028262b@mail.gmail.com Content-Type: text/plain; charset=UTF-8
On Sat, Dec 27, 2008 at 1:23 AM, Jesús Quiroga jquiroga@pobox.com wrote:
The second way to access the data store that is usually considered is the 'indirect' one:
[wiki] <--- wikiDataL ---> [polyglot translator]
[polyglot translator] <--- mysDataL ---> [mysql]
[polyglot translator] <--- posDataL ---> [postgresql]
[polyglot translator] <--- db2DataL ---> [db2]
Here, wikiDataL means 'some relational data definition and manipulation language suitable for use by the wiki'.
This is what we currently use, and I don't think we're going to seriously consider changing it without some very compelling arguments being presented. Incremental improvements to our current way of doing things (cutting back on raw queries, moving MySQL-specific stuff from Database to DatabaseMySql, defining more clearly what Database methods mean and avoiding undefined behavior) seem entirely sufficient to allow support for any number of additional database backends.
The differences between wikiDataL and wikiNeedL are mainly these:
- wikiNeedL would contain just enough wiki concepts to express the wiki's needs, so it's effectively confined to that domain. wikiDataL belongs to the relational data model domain, which is quite different.
- In general, wikiNeedL would have different semantics from the dialects understood by the databases, so the translation step becomes more like interpretation than just syntactic transformation. wikiDataL usually has the same semantics as the dialects.
- wikiNeedL would contain just enough concepts to satisfy current needs, and would be open to extension. wikiDataL aims to be general-purpose and to fulfill current and future needs.
In practice, wikiNeedL would be drastically more complicated, if I understand you correctly. Its basic semantic units would be things like articles, users, revisions, etc., instead of rows, columns, and tables. We *have* a wikiNeedL, in fact: it's called "calling the appropriate Article method" or whatever. Most code doesn't have to manually do queries. Further abstraction of the database queries would be possible, but I question its usefulness.
Message: 6 Date: Sat, 27 Dec 2008 19:06:24 -0500 From: "Aryeh Gregor" Simetrical+wikilist@gmail.com Subject: Re: [Wikitech-l] Anchors haven't id attribute To: "Wikimedia developers" wikitech-l@lists.wikimedia.org Message-ID: 7c2a12e20812271606u6b188edj22a6579803ccd43d@mail.gmail.com Content-Type: text/plain; charset=UTF-8
On Sat, Dec 27, 2008 at 3:14 PM, Brion Vibber brion@wikimedia.org wrote:
[snip]
Maybe we should just fix the normalization function the way we'd already planned to, so that it'll work right the way we'd already planned to?
Done in r45109. I notice, by the way, that HTML5 allows any string not containing whitespace for ids... yet another case where it clearly wins the "don't gratuitously cause pain to developers" contest.
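For comparison with the HTML 4 production quoted earlier in the thread, here is a Python sketch of the HTML5 rule as summarized here: the id must be non-empty and contain no whitespace. "Whitespace" is assumed to mean HTML's ASCII whitespace (tab, LF, FF, CR, space). The helper names are hypothetical:

```python
import re

HTML4_ID = re.compile(r"[A-Za-z][A-Za-z0-9:_.-]*")
# Assumed HTML5 whitespace set: tab, line feed, form feed, carriage return, space.
HTML5_WHITESPACE = re.compile(r"[\t\n\f\r ]")


def valid_html4_id(value):
    return HTML4_ID.fullmatch(value) is not None


def valid_html5_id(value):
    return value != "" and HTML5_WHITESPACE.search(value) is None


for candidate in ["X", "1X", ".C3.81X", "ÁX", "-X", "two words"]:
    print(repr(candidate), valid_html4_id(candidate), valid_html5_id(candidate))
```

Every anchor from Danny's Tidy example is acceptable under the HTML5 rule, and even a raw non-ASCII heading is. That is why no `x` prefix or letter-first rule would be needed there.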
Message: 7 Date: Sun, 28 Dec 2008 03:02:26 +0100 (CET) From: Danny B. Wikipedia.Danny.B@email.cz Subject: Re: [Wikitech-l] Anchors haven't id attribute To: Wikimedia developers wikitech-l@lists.wikimedia.org Message-ID: 18278.21698-2886-1817746719-1230429746@email.cz Content-Type: text/plain; charset="iso-8859-2"
------------ Original message ------------ From: Aryeh Gregor Simetrical+wikilist@gmail.com Subject: Re: [Wikitech-l] Anchors haven't id attribute Date: 28.12.2008 01:07:08
On Sat, Dec 27, 2008 at 3:14 PM, Brion Vibber brion@wikimedia.org wrote:
[snip]
Maybe we should just fix the normalization function the way we'd already planned to, so that it'll work right the way we'd already planned to?
Done in r45109. I notice, by the way, that HTML5 allows any string not containing whitespace for ids... yet another case where it clearly wins the "don't gratuitously cause pain to developers" contest.
*sigh*
Why do we have to hunt for some other solution when we have fully working, fully valid and fully intuitive one?
OK, let's summarize the three versions we have:
Terms used:
- old version - the for-many-years used version until r44896
- mid version - r44896 way
- new version - r45109 way
Old version was used for many years. It was fully valid: ids were present only where they could be copied from name AND complied with the regexp mentioned in previous posts. This was done automatically by Tidy. And it was fully intuitive: you just wrote [[#Foo]] and it linked to the section named Foo. Or you added #Foo to the URL in the address bar and got to the proper section as well. And it worked properly.
The mid version brought the "feature" that all name attributes were duplicated into ids. That caused massive invalidity of pages, especially non-Latin and non-ASCII ones. However, creating anchor links was still intuitive.
The new version prepends x to all anchors to solve the problem introduced by the mid version: the massive invalidity of pages. So it solved one problem (which would not have needed solving had we kept the old version) but brought at least two other major ones. The first major problem is that this change breaks millions of existing links to sections: links on wiki pages, links on external sites, links in people's bookmarks, in emails, forum threads, etc. Well, OK, let's discount all the external stuff, since we have no influence over it, but that still leaves millions of links on our own wikis that no longer work since r45109. The other major problem is that from now on anchor links are no longer intuitive: we are pushing people to constantly think about prepending x when creating anchor links. No more simple copy-pasting of the headline. As a side effect, we are adding unnecessary work for people on non-Latin wikis by forcing them to switch to a Latin keyboard, or to click on edittools or whatever, just to get the one "x" character into the edit box to create the anchor link.
So let me summarize in points:
- First we did not have any problem at all.
- Second we had one problem.
- Third we "solved" the problem but created at least two new.
I am pretty scared what's coming next... :-/
One last question: what is the benefit of either the mid or the new version over the old one? What new functionality or feature does it bring, or which existing bug does it fix?
Kind regards
Danny B.
Message: 8 Date: Sat, 27 Dec 2008 22:15:24 -0500 From: "Aryeh Gregor" Simetrical+wikilist@gmail.com Subject: Re: [Wikitech-l] Anchors haven't id attribute To: "Wikimedia developers" wikitech-l@lists.wikimedia.org Message-ID: 7c2a12e20812271915gf2bb722gd33f461fb180b946@mail.gmail.com Content-Type: text/plain; charset=UTF-8
2008/12/27 Danny B. Wikipedia.Danny.B@email.cz:
*sigh*
Why do we have to hunt for some other solution when we have fully working, fully valid and fully intuitive one?
Because:
(1) Our previous behavior arguably violated the XHTML 1 specification by allowing name attributes to begin with nonletters. Please don't ignore this argument because you think it's wrong. I think you're wrong on this issue too, but I don't just ignore your opinion when discussing what the software that we *both* develop should do. Note "arguably" in the first sentence here -- your opinion counts as much as mine.
(2) It's not arguable at all that the XHTML 1 specification strongly recommends that <a> elements with a name attribute also have an id attribute. In fact, section 4.10 states: "In order to ensure that XHTML 1.0 documents are well-structured XML documents, XHTML 1.0 documents MUST use the id attribute when defining fragment identifiers on the elements listed above [including <a>]."
I'm not saying these reasons outweigh the reasons against, but those are the reasons it was done. In particular, I don't think I've seen an argument from you against (2).
Old version was used for many years. It was fully valid
Could you *please* stop pretending that a debate doesn't even exist here? It's obnoxious and uncivil, and you keep on doing it.
The first major problem is that this change breaks millions of existing links to sections: links on wiki pages, links on external sites, links in people's bookmarks, in emails, forum threads, etc. Well, OK, let's discount all the external stuff, since we have no influence over it, but that still leaves millions of links on our own wikis that no longer work since r45109.
First of all, all auto-generated internal links (in TOCs) will automatically switch to the new format. Second of all, it should be one extra line of code to fix up all manually-created internal links as well, so that the x is automatically added as part of the encoding process. (I didn't find where this needed to be done at a quick glance.) So we're only talking about external links here.
This is a one-time cost and I don't think it's a big problem -- at worst, a few users will end up on the wrong part of the page. It should be pointed out that this will affect *all* section links on non-Latin wikis (since they get encoded to begin with dots and then need to start with a letter), but again, only as a one-time cost, and only external links (links from external sites or links using external link syntax), and it will still get viewers to almost the right place.
The other major problem is that from now on anchor links are no longer intuitive: we are pushing people to constantly think about prepending x when creating anchor links. No more simple copy-pasting of the headline. As a side effect, we are adding unnecessary work for people on non-Latin wikis by forcing them to switch to a Latin keyboard, or to click on edittools or whatever, just to get the one "x" character into the edit box to create the anchor link.
Again, not an issue if internal links are fixed to work correctly. I didn't think about that aspect, but it should be very simple to fix (I'd do it now except I'm going to bed).
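The proposed fix works if heading ids and internal link targets pass through the same encoder, because then editors never type the x themselves. Here is a minimal Python sketch under several assumptions: UTF-8 bytes are dot-encoded as in the `.C3.81X` example earlier in the thread, spaces become underscores, and an `x` is prepended only when the result does not start with an ASCII letter. `escape_anchor` and `section_href` are hypothetical names, not the functions changed in r45109:

```python
import re
from urllib.parse import quote


def escape_anchor(text):
    """Hypothetical dot-encoding normalizer for section anchors (a sketch)."""
    # Percent-encode every byte outside [A-Za-z0-9_.-~], then swap '%' for '.'.
    encoded = quote(text.replace(" ", "_"), safe="").replace("%", ".")
    # Assumed rule: ids must start with an ASCII letter, so prefix 'x' if not.
    if not re.match(r"[A-Za-z]", encoded):
        encoded = "x" + encoded
    return encoded


def section_href(target):
    # A link like [[#ÁX]] is encoded exactly like the heading "ÁX",
    # so the fragment and the id always agree.
    return "#" + escape_anchor(target)


for heading in ["X", "1X", "ÁX", "-X", "See also"]:
    print(heading, "->", escape_anchor(heading), section_href(heading))
```

With this arrangement, only links that bypass the encoder, such as external links and old bookmarks, see the changed ids. That is the one-time cost discussed here.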
It seems to me that there are only weak reasons in favor (following recommended best practice with no practical effect) and only weak reasons against (small one-time transition cost -- unless you're correct that there will be longer-term costs, in which case please clarify why you think this). Normally I would say that standards compliance by itself (as opposed to standards compliance that brings concrete benefit) is worth small one-time costs, although not large enough one-time costs and probably not even fairly small recurring costs. So as it stands, without further arguments, I'd still be weakly in favor of keeping the current state of trunk, of course with the fix for anchors on internal links.
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
End of Wikitech-l Digest, Vol 65, Issue 34