Sorry, I forgot the subject line; reposting the previous message:
I would like to extend the syntax of the <ref> tag (Cite extension) in order to deal with footnotes that are spread across several transcluded pages. Since the Cite extension is widely used, I guess I had better ask here first.
Here is an illustration of the problem: http://en.wikisource.org/wiki/Page:Robert_the_Bruce_and_the_struggle_for_Sco...
At the bottom of the scan you can see the second half of a footnote. That footnote begins on the previous page: http://en.wikisource.org/wiki/Page:Robert_the_Bruce_and_the_struggle_for_Sco...
Wikisourcers currently have no clean way to deal with these cases. I have written a patch for this (the code is here: http://dpaste.org/QOMH/ ). This patch extends the "ref" syntax by adding a "follow" parameter, like this:
<ref follow="foo">bar</ref>
After two pages are transcluded, the wikitext passed to the parser will look like this:
blah blah blah blah blah blah<ref name="note1">beginning of note 1</ref> blah blah blah blah blah blah blah blah blah<ref follow="note1">end of note</ref> blah blah blah
This wikitext is rendered as a single footnote, located in the text at the position of the parent <ref>. If the parent <ref> is not found (as is the case when you render only the second page), then the text inside the tag is rendered at the beginning of the list of references, with no number and no link.
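To make the intended behaviour concrete, here is a minimal Python sketch of the merging rule. It is not the patch itself (which lives inside Cite); the function name `resolve_refs`, the tuple representation, and joining the two halves with a single space are all assumptions made for illustration.

```python
def resolve_refs(refs):
    """Merge <ref follow="..."> continuations into their parent footnote.

    `refs` is a list of (name, follow, text) tuples in document order
    (hypothetical representation). Returns (orphans, notes):
    `orphans` are continuation texts whose parent was not found,
    `notes` are the texts of the numbered footnotes.
    """
    notes = []          # numbered footnotes, in order of appearance
    index_by_name = {}  # ref name -> position in `notes`
    orphans = []        # continuations with no parent in this rendering
    for name, follow, text in refs:
        if follow is not None:
            if follow in index_by_name:
                # Parent found: append to it, no new footnote number.
                notes[index_by_name[follow]] += " " + text
            else:
                # Parent missing: shown unnumbered before the list.
                orphans.append(text)
            continue
        if name is not None:
            index_by_name[name] = len(notes)
        notes.append(text)
    return orphans, notes


# Both pages transcluded: one footnote.
both_pages = [("note1", None, "beginning of note 1"),
              (None, "note1", "end of note")]
print(resolve_refs(both_pages))
# ([], ['beginning of note 1 end of note'])

# Second page rendered alone: the parent is missing.
second_page_only = [(None, "note1", "end of note")]
print(resolve_refs(second_page_only))
# (['end of note'], [])
```

The sketch ignores reuse of a named <ref> and groups; it only shows the two cases described above.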
Does this make sense?
Thomas
On Wed, Aug 11, 2010 at 6:05 PM, thomasV1@gmx.de wrote:
> I guess you also mean "see multiple scans at once". This would be another workaround, but it would not deter contributors from trying to leave the text where it was in the first place, and where we think it belongs.
Why do you think it belongs split across separate pages, instead of in one place, when it's logically one unit? And why do you think there'd be any big added risk that people won't obey Wikisource conventions in transcription? To the contrary: if you add a new magic <ref> attribute, *nobody* will be able to figure out the right way to do it unless they're told, because this will be the only place in any wiki anywhere where that attribute is actually used. If you do a multi-page approach, then at least proofreaders don't have to remember anything extra on a technical level.
> We have been dealing with this problem for several years now, and all the solutions that we have found have drawbacks. I do not think that we can solve this without extending the tool that manages references.
You've made a reasonable case that *some* software change is needed. However, I think you've got the wrong one. Trying to add this weird special-case feature to Cite, which is totally useless unless you're using ProofreadPage in the particular way Wikisource is using it, loses major points for inelegance, complexity, and mixing extensions together. If the use-case can be adequately addressed by just having ProofreadPage display multiple scans and edit boxes on one page, that would be a much simpler and more intuitive solution. Not only that, but you could also stop using magic templates to split words across pages and things like that, so it would be considerably easier to use.
Aryeh Gregor wrote:
> On Wed, Aug 11, 2010 at 6:05 PM, thomasV1@gmx.de wrote:
>> I guess you also mean "see multiple scans at once". This would be another workaround, but it would not deter contributors from trying to leave the text where it was in the first place, and where we think it belongs.
> Why do you think it belongs split across separate pages, instead of in one place, when it's logically one unit?
I am not sure I really understand this question; you said earlier that you agree with the idea of splitting books into physical pages. When we do this, the logical organization of the book (chapters, etc.) is achieved through transclusion.
> And why do you think there'd be any big added risk that people won't obey Wikisource conventions in transcription?
It is not a risk, it is a fact. I try to promote the use of <ref> instead of templates, because it is the only way for the software to know that a footnote is a footnote. However, proofreaders are reluctant to move the text of footnotes to a different page. They do whatever they can to keep the text in front of the scan.
Here are a few examples of what they do.
1. They often do something like this: http://fr.wikisource.org/wiki/Page:Sima_qian_chavannes_memoires_historiques_... In this example, the user did not use <ref> at all. The page uses a combination of labeled sections and templates. The page is transcluded here, in the corresponding chapter: http://fr.wikisource.org/wiki/M%C3%A9moires_historiques/Introduction/Chapitr... Because of the sections, the user is unable to use the <pages/> tag; he wrote a bunch of section transclusions manually. This user tried to do it in a consistent way, by adding two sections to all pages, regardless of whether they actually contain footnotes or not. Other users do the same thing, but less tidily.
2. Another example is here, on the English Wikisource: http://en.wikisource.org/wiki/Page:Geology_and_Mineralogy_considered_with_re... In this case, the user decided to replicate the text in two places: on the second page of the footnote, and on the page where the footnote begins: http://en.wikisource.org/w/index.php?title=Page:Geology_and_Mineralogy_consi...
3. Here is a third example, on the Italian Wikisource: http://it.wikisource.org/wiki/Pagina:Manzoni.djvu/22 This page uses a template and labeled sections. However, this template fails when the footnote spans more than 2 pages.
I can give you many more examples; the point is that proofreaders do not want to move the text of a footnote away from the page where it is printed.
To fully appreciate the situation, you have to know that:
1. When a page is created, it comes preloaded with OCR text. Thus, the text of the footnote is initially on the page where the footnote is written. Moving it to the previous page involves some extra work. See here: http://fr.wikisource.org/w/index.php?title=Page:S%C3%A9vign%C3%A9_-_Lettres,...
2. Footnotes are not small. Footnotes can be very long. Footnotes can be spread over more than 2 pages. In some books, footnotes can account for more than half of the total text. Moving around 50% of the text of a book is a real pain. Please have a look at this example, which is 3 pages long: http://en.wikisource.org/wiki/Page:Geology_and_Mineralogy_considered_with_re... While I agree that hyphenated words can easily be moved to the previous or next page, it is not practical to do so with footnotes.
3. Even if we had a way to move the text of footnotes without pain, I do not think that this convention would be adopted. Some users absolutely want to preserve the page-by-page structure of the book, even if it involves using LST and the complicated constructs that you saw.
> To the contrary: if you add a new magic <ref> attribute, *nobody* will be able to figure out the right way to do it unless they're told, because this will be the only place in any wiki anywhere where that attribute is actually used.
"nobody" outside of Wikisource? I can understand that this feature is of no direct relevance to Wikipedia, but is this a reason to reject it? It sounds a bit like "WMF==Wikipedia". Note that the proposed feature does not interfere with the way Cite currently works, so Wikipedia users do not need to figure out how it works.
Also, for Wikisource users, compare the complexity of the solutions that have been deployed in the above examples to the complexity of writing <ref follow="foo"></ref> around the text of a reference. Which one do you find more complex?
> If you do a multi-page approach, then at least proofreaders don't have to remember anything extra on a technical level.
No, but they would need to move around big chunks of text. And when they want to proofread it, they would need to search for the page it was moved from (which is not always the previous one). In addition, the examples I provided above demonstrate that proofreaders do whatever they can to keep the text on the page where it comes from, even if it means insanely complicated work; I do not think that adding the capability to display multiple scans in front of the text will change anything about that.
> You've made a reasonable case that *some* software change is needed. However, I think you've got the wrong one. Trying to add this weird special-case feature to Cite, which is totally useless unless you're using ProofreadPage in the particular way Wikisource is using it, loses major points for inelegance, complexity, and mixing extensions together. If the use-case can be adequately addressed by just having ProofreadPage display multiple scans and edit boxes on one page, that would be a much simpler and more intuitive solution. Not only that, but you could also stop using magic templates to split words across pages and things like that, so it would be considerably easier to use.
A few years ago, I introduced the <pages/> tag, which replaces manual transclusions. This command adds a new line between all transcluded pages. Before that, with manual transclusions, it was possible to transclude 2 pages containing a hyphenated word next to each other, not separated by a newline, so that the word was not split. This was a bad practice, at least from a programmer's point of view, because the information needed for text rendering was not embedded in the pages; it is much better if the target page (where text is transcluded) does not need to know anything about the content of individual pages. This is why I recommended using <pages/>, and keeping hyphenated words unsplit, on a single page.
However, users did not like that solution; they wanted to reproduce exactly what is on the scan. In order to adapt to the new <pages/> tag, they designed those magic templates for hyphenated words. Using these templates is in no way mandatory. However, it turned out that users prefer to use them, because they make it possible to display exactly what is on the scan. And after some time, I too became convinced that these templates are a good thing, in part because they create a convention that is unambiguous. This is just another fact that illustrates how we wikisourcers are: we want to be able to display the text in a way that faithfully reproduces the scan. And this is why I doubt that adding a possibility to display multiple scans in front of the text will change anything about the way footnotes are handled.
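For readers who have not seen those templates, the idea can be sketched roughly as follows in Python. This is only an illustration: the `SplitWord` class, the `render` and `pages_tag` functions, and the way a page is modelled as a list of items are all hypothetical, and the real templates are wikitext, not code.

```python
from dataclasses import dataclass


@dataclass
class SplitWord:
    """Hypothetical stand-in for a 'hyphenated word' template."""
    fragment: str   # what is printed on this page, e.g. "exam-"
    whole: str      # the complete word, e.g. "example"
    is_start: bool  # True on the page where the word begins


def render(page, transcluded):
    """Render one page, either on its own or inside a transclusion."""
    out = []
    for item in page:
        if isinstance(item, SplitWord):
            if not transcluded:
                out.append(item.fragment)  # page view: match the scan
            elif item.is_start:
                out.append(item.whole)     # transcluded: whole word, once
            # the closing fragment renders as nothing when transcluded
        else:
            out.append(item)
    return " ".join(out)


def pages_tag(pages):
    # As described above, <pages/> puts a newline between pages, so the
    # target page needs no knowledge of what each page contains.
    return "\n".join(render(p, transcluded=True) for p in pages)


page1 = ["a simple", SplitWord("exam-", "example", True)]
page2 = [SplitWord("ple", "example", False), "of a split word"]

print(render(page1, transcluded=False))  # a simple exam-
print(render(page2, transcluded=False))  # ple of a split word
print(pages_tag([page1, page2]))
# a simple example
# of a split word
```

The point is the same one that motivates "follow": each page carries what it needs to render correctly, both alone and when transcluded.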
Thomas
On Fri, Aug 13, 2010 at 9:13 AM, ThomasV thomasV1@gmx.de wrote:
> To fully appreciate the situation, you have to know that:
> - When a page is created, it comes preloaded with OCR text. Thus, the text of the footnote is initially on the page where the footnote is written. Moving it to the previous page involves some extra work. See here: http://fr.wikisource.org/w/index.php?title=Page:S%C3%A9vign%C3%A9_-_Lettres,...
> - Footnotes are not small. Footnotes can be very long. Footnotes can be spread over more than 2 pages. In some books, footnotes can account for more than half of the total text. Moving around 50% of the text of a book is a real pain. Please have a look at this example, which is 3 pages long: http://en.wikisource.org/wiki/Page:Geology_and_Mineralogy_considered_with_re... While I agree that hyphenated words can easily be moved to the previous or next page, it is not practical to do so with footnotes.
> - Even if we had a way to move the text of footnotes without pain, I do not think that this convention would be adopted. Some users absolutely want to preserve the page-by-page structure of the book, even if it involves using LST and the complicated constructs that you saw.
> . . .
> A few years ago, I introduced the <pages/> tag, which replaces manual transclusions. This command adds a new line between all transcluded pages. Before that, with manual transclusions, it was possible to transclude 2 pages containing a hyphenated word next to each other, not separated by a newline, so that the word was not split. This was a bad practice, at least from a programmer's point of view, because the information needed for text rendering was not embedded in the pages; it is much better if the target page (where text is transcluded) does not need to know anything about the content of individual pages. This is why I recommended using <pages/>, and keeping hyphenated words unsplit, on a single page.
> However, users did not like that solution; they wanted to reproduce exactly what is on the scan. In order to adapt to the new <pages/> tag, they designed those magic templates for hyphenated words. Using these templates is in no way mandatory. However, it turned out that users prefer to use them, because they make it possible to display exactly what is on the scan. And after some time, I too became convinced that these templates are a good thing, in part because they create a convention that is unambiguous. This is just another fact that illustrates how we wikisourcers are: we want to be able to display the text in a way that faithfully reproduces the scan. And this is why I doubt that adding a possibility to display multiple scans in front of the text will change anything about the way footnotes are handled.
This is pretty convincing, but not totally. Maybe the reason that people are so adamant about keeping content on the same page is just because they have to click links back and forth to view the other pages, which is disruptive. If they could more easily see the next and previous page, and maybe a few pages beyond that if necessary, don't you think it would be much more attractive to not split things up?
As far as I can tell, there's not any way at all to go to the next or previous page from the current edit interface. You'd have to open a separate tab and switch back and forth. What if instead, the existing scrolling box could be scrolled to display the next and previous several pages as well as the current one, with some clear indication of which page is current so people don't get confused? Wouldn't that be convenient for other purposes as well, just to easily get context on what you're transcribing?
But this still doesn't help with the fact that the original OCR text is on a different page, I guess. I suspect there's a better solution here somewhere, but that might take more work, and be subtler (UI design issues). You've convinced me that your approach is pretty reasonable given the circumstances, although I wish it didn't have to intrude into unrelated extensions.
> "nobody" outside of Wikisource? I can understand that this feature is of no direct relevance to Wikipedia, but is this a reason to reject it? It sounds a bit like "WMF==Wikipedia".
No, I'm thinking of all the typical MediaWiki users whose wikis I see all the time, and basically none of them use MediaWiki for anything like page transcription. MediaWiki is almost always used for a regular old wiki or CMS. Things like book transcription or translation are very much abnormal uses of MediaWiki. The only reason Wikisource or TranslateWiki use MediaWiki extensions at all instead of dedicated software is probably because they were associated with MediaWiki/Wikimedia to begin with.
So it would be nice if their code could be kept confined to their respective extensions, so it doesn't confuse the large majority of developers who aren't familiar with these particular narrow use-cases. But if it's not possible, then oh well.
On 13 August 2010 19:10, Aryeh Gregor Simetrical+wikilist@gmail.com wrote:
> This is pretty convincing, but not totally. Maybe the reason that people are so adamant about keeping content on the same page is just because they have to click links back and forth to view the other pages, which is disruptive. If they could more easily see the next and previous page, and maybe a few pages beyond that if necessary, don't you think it would be much more attractive to not split things up?
Evidently not, as you are ignoring that text fidelity and preserving pages are considered important.
(Most obnoxious possible responses to a bug report: "Oh, what you want is just stupid, you don't *really* want that, you want this other thing instead which is nothing like what you asked for.")
- d.
On Fri, Aug 13, 2010 at 4:57 PM, David Gerard dgerard@gmail.com wrote:
> Evidently not, as you are ignoring that text fidelity and preserving pages are considered important.
Because they aren't. The goal is transcribing the pages easily and accurately, and having one wiki page in exact correspondence with each text page is only one means to that end, which has both advantages and disadvantages. It's very easy to get so wrapped up in a solution that you don't realize that an entirely different approach is better. This is particularly so when the current approach was developed by a community that largely had to work with the software they were given, while we're software developers and have the option of improving the software.
> (Most obnoxious possible responses to a bug report: "Oh, what you want is just stupid, you don't *really* want that, you want this other thing instead which is nothing like what you asked for.")
Most annoying possible type of bug report: "Please fix the half of my problem that makes no sense." http://blogs.msdn.com/b/oldnewthing/archive/2006/03/23/558887.aspx Bug reports need to state what the ultimate problem is that they're trying to solve, not explain intermediate goals that make little sense by themselves. Once the ultimate goal is stated, the best response is sometimes to suggest adopting an entirely different means of solving the problem. The developer is often in a good position to say "This weird special thing you're asking for is unreasonable, but if I implemented this other feature that other people would also find useful, wouldn't that solve your problem too, if you took a step back and thought about what you really want this for?"
In this case I don't see a particularly better solution than the one proposed, offhand. But in general, implementing features just because people request them, without demanding concrete use-cases and considering what the best way to solve them is from the perspective of the software, is a recipe for writing software that's a mess of special-case hacks. We have enough of that in MediaWiki already.
On 15 August 2010 17:41, Aryeh Gregor Simetrical+wikilist@gmail.com wrote:
> In this case I don't see a particularly better solution than the one proposed, offhand. But in general, implementing features just because people request them, without demanding concrete use-cases and considering what the best way to solve them is from the perspective of the software, is a recipe for writing software that's a mess of special-case hacks. We have enough of that in MediaWiki already.
However, you do seem to be requiring a nonexistent perfect proposal over one that would bring the present software to mere adequacy.
- d.
On Sun, Aug 15, 2010 at 12:47 PM, David Gerard dgerard@gmail.com wrote:
> However, you do seem to be requiring a nonexistent perfect proposal over one that would bring the present software to mere adequacy.
You may have missed the part where I said "You've convinced me that your approach is pretty reasonable given the circumstances, although I wish it didn't have to intrude into unrelated extensions." Understandable, it was a long post and I should have emphasized the conclusion more. (It didn't help that I changed my mind on the issue twice while writing the post. Tends to obscure your point when that happens.)
Thanks for the clarification. I understand your initial concern.
Another limitation of Cite that would be worth fixing is that it is not possible to change the numbering style.
If you use the <references/> tag inside an element where the CSS list-style-type property is set to roman numbering, then you get roman-numbered references in the references list, and links in the text that are decimal-numbered.
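The mismatch can be illustrated with a small Python sketch. It only mimics the two numbering schemes side by side; the helper `to_roman` is written here for illustration and is not part of Cite or MediaWiki.

```python
def to_roman(n):
    """Convert a positive integer to a lower-case roman numeral."""
    numerals = [(1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
                (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
                (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i")]
    out = []
    for value, symbol in numerals:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)


for n in (1, 4, 9):
    list_marker = to_roman(n)  # what the roman-styled list displays
    link_label = str(n)        # what the in-text link displays
    print(f"[{link_label}] points at note {list_marker}.")
# [1] points at note i.
# [4] points at note iv.
# [9] points at note ix.
```

A fix would need the in-text label to be generated with the same scheme as the list marker.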
Thomas
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
If we have a wish list going for Cite, two that I have noted on the talk page of Extension:Cite are
* cannot subst: from within <ref>
* cannot perform the pipe trick from within <ref>
Regards, Andrew
On 16 Aug 2010 at 12:22, ThomasV wrote:
> Thanks for the clarification. I understand your initial concern.
> Another limitation of Cite that would be worth fixing is that it is not possible to change the numbering style.
> If you use the <references/> tag inside an element where the CSS list-style-type property is set to roman numbering, then you get roman-numbered references in the references list, and links in the text that are decimal-numbered.
> Thomas
On Mon, Aug 16, 2010 at 6:40 AM, Billinghurst billinghurst@gmail.com wrote:
> If we have a wish list going for Cite, two that I have noted on the talk page of Extension:Cite are
> - cannot subst: from within <ref>
> - cannot perform the pipe trick from within <ref>
https://bugzilla.wikimedia.org/show_bug.cgi?id=2700
You can work around it with {{#tag:ref}}.
ThomasV wrote:
> Thanks for the clarification. I understand your initial concern.
> Another limitation of Cite that would be worth fixing is that it is not possible to change the numbering style.
> If you use the <references/> tag inside an element where the CSS list-style-type property is set to roman numbering, then you get roman-numbered references in the references list, and links in the text that are decimal-numbered.
> Thomas
In fact, you can. I implemented that exactly three months ago in r66749 (of course it hasn't been deployed yet...). Look at bug 22265 for the feature. You can now set the items that will produce the list numbering for each group.