Hi all,
I have built a LocalWiki, and now I want its data to stay consistent with
Wikipedia, so one of my tasks is to fetch the recent updates from
Wikipedia.
I get the URLs by parsing the RSS feed
(http://zh.wikipedia.org/w/index.php?title=Special:%E6%9C%80%E8%BF%91%E6%9B%…)
and then get the full HTML content of the edit box by opening each URL,
clicking 'edit this page', and parsing the resulting page.
(eg:
http://zh.wikipedia.org/w/index.php?title=%E8%B2%A1%E7%A5%9E%E5%88%B0_(%E9%…
and its edit interface is
http://zh.wikipedia.org/w/index.php?title=%E8%B2%A1%E7%A5%9E%E5%88%B0_(%E9%…
. However, I have run into two problems during this work.
First, sometimes I cannot open a URL taken from the RSS feed, and I don't
know why. Is it because I visit the site too frequently and my IP address
has been blocked, or is the network simply too slow? If it is the former,
how often may I fetch a page from Wikipedia? Is there a timeout?
Second, as mentioned above, I want to download the full HTML of the
edit-box content from Wikipedia. Sometimes this works, but other times I
can only download part of it. What could be the reason?
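For what it's worth, the crawl loop described above can be sketched as follows. MediaWiki can also return a page's raw wikitext directly via action=raw, which avoids scraping the edit form and may sidestep the partial-download problem. The endpoint, titles, and delay below are illustrative assumptions, not tested values:

```python
import time
import urllib.parse
import urllib.request

# Assumed endpoint; any MediaWiki index.php works the same way.
API = "https://zh.wikipedia.org/w/index.php"

def raw_url(title):
    """Build a URL that returns the page's raw wikitext via action=raw."""
    return API + "?" + urllib.parse.urlencode({"title": title, "action": "raw"})

def fetch_all(titles, delay=5.0):
    """Fetch the raw wikitext of each title, pausing between requests
    so the crawler stays well clear of any server-side throttle."""
    pages = {}
    for title in titles:
        with urllib.request.urlopen(raw_url(title)) as resp:
            pages[title] = resp.read().decode("utf-8")
        time.sleep(delay)  # be polite: one request every few seconds
    return pages
```

If rendered HTML is needed instead, the api.php interface (action=parse) is another option; either way, pacing the requests is the usual cure for intermittent blocks.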
Thanks
vanessa
Hi all,
I was curious about a vandalistic edit[1]: the logged-out vandal, who uses a
US-based home broadband ISP[2][3], has made only one edit: the vandalistic edit
I mentioned. The edit was made two days ago. I reverted it, then tried using
Soxred93's useful Range Contributions tool[4] to see if any of the 255 IP
addresses closest to the vandal's IP had ever made any other edits. Nope.[5]
In fact, not even any of the closest 131072 have done so.[6] But when I
expanded my search to the closest 262144, I found lots of edits over the past
few weeks, made by a variety of IPs. I looked at the first seven. One was
vandalism: an edit[7] to [[Patrick Stump]]. Someone else has since reverted
it. It was made by another user from the same ISP.[8] I am just curious:
A) Did I go too far when I did all the research I described above? Do you
yourself often use the Range Contributions tool[4] for looking at vandals' ISPs'
contributions?
B) What do you think are the chances that the same person made both the
first[1] and the second[7] vandalistic edits? The IP addresses' binary
representations are quite different.
C) Why did no anti-vandalism software automatically revert either edit?
D) When I look at the history[9] of [[Patrick Stump]], I see that there were
fourteen edits between 06:51 and 07:03, most of them vandalism. Yet the
vandalistic edits come from a variety of IP addresses and usernames, and the
IP addresses differ widely from one another. Why is this?
E) When comparing two vandals' edits in other situations, is there any quick
way for editors to find out both IPs' hostnames, User-Agents, Accept-Charset
strings, Accept-Language strings, screen resolutions, and/or IP geolocation
results? I do very little vandalism removal, so I myself am not sure.
F) Which netblocks do the most vandalism and the least useful editing? Which
cities? Which entire countries? Should those netblocks, cities, and countries
be forced to log in before editing?
G) Wouldn't it be cool if some web browsers or ISPs would tell Wikipedia what a
contributor's PPPoE username was whenever the contributor made an edit?
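Regarding question B: the "closest N addresses" figures map onto CIDR prefixes (131072 = 2^17 addresses is a /15; 262144 = 2^18 addresses is a /14), so one quick way to compare two IPs is to check which prefix length they share. A small sketch using Python's standard ipaddress module, with the two addresses from footnotes [2] and [8]:

```python
import ipaddress

a = ipaddress.ip_address("174.105.248.31")   # IP behind the first edit [2]
b = ipaddress.ip_address("174.106.99.246")   # IP behind the Patrick Stump edit [8]

def same_block(x, y, prefix):
    """True if both addresses fall inside the same /prefix network."""
    net = ipaddress.ip_network(f"{x}/{prefix}", strict=False)
    return y in net

print(same_block(a, b, 15))  # False: the 131072-address (/15) blocks differ
print(same_block(a, b, 14))  # True: both sit in one 262144-address (/14) block
```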
If you reply to only one of A), B), C), D), E), F), or G) then please use a
different subject line than I used. And add a "(was: ...)" tag at the end of
the subject line. That way, it'll be easier for others to follow just the parts
of the discussion that they want to follow.
Kind regards,
--[[User:Unforgettableid]]
[1] http://en.wikipedia.org/w/index.php?title=Fetus_in_fetu&diff=prev&oldid=339…
[2] http://toolserver.org/~chm/whois.php?ip=174.105.248.31
[3] http://en.wikipedia.org/wiki/Road_Runner_High_Speed_Online
[4] http://toolserver.org/~soxred93/rangecontribs/
[5] http://toolserver.org/~soxred93/rangecontribs/index.php?type=range&ips=174.…
[6] http://toolserver.org/~soxred93/rangecontribs/index.php?type=range&ips=174.…
[7] http://en.wikipedia.org/w/index.php?title=Patrick_Stump&diff=prev&oldid=339…
[8] http://toolserver.org/~chm/whois.php?ip=174.106.99.246
[9] http://en.wikipedia.org/w/index.php?title=Patrick_Stump&action=history
Hi all,
I added an extension to my local wiki to import data. It uses
insertNewArticle() or updateArticle() to add articles, but sometimes it
fails. For example:
$title = 'mywiki';
$content = '{{Infobox_housi...}} This is my wiki';
$wgTitle = Title::newFromText( $title );
$wgArticle = new Article( $wgTitle );
$wgArticle->insertNewArticle( $content, '', false, false );
In practice I read $title and $content from an XML file.
The error shown is:
Preprocessor_DOM::preprocessToObj generated invalid XML
Backtrace:
#0 D:\wamp\www\mediawiki\includes\parser\Parser.php(2579):
Preprocessor_DOM->preprocessToObj('<div style="bor...', 1)
#1 D:\wamp\www\mediawiki\includes\parser\Parser.php(3008):
Parser->preprocessToDom('<div style="bor...', 1)
#2 D:\wamp\www\mediawiki\includes\parser\Parser.php(2880):
Parser->getTemplateDom(Object(Title))
#3 D:\wamp\www\mediawiki\includes\parser\Preprocessor_DOM.php(959):
Parser->braceSubstitution(Array, Object(PPFrame_DOM))
#4 D:\wamp\www\mediawiki\includes\parser\Parser.php(2632):
PPFrame_DOM->expand(Object(PPNode_DOM), 0)
#5 D:\wamp\www\mediawiki\includes\parser\Parser.php(875):
Parser->replaceVariables('{{Infobox_housi...')
#6 D:\wamp\www\mediawiki\includes\parser\Parser.php(327):
Parser->internalParse('{{Infobox_housi...')
#7 D:\wamp\www\mediawiki\includes\Article.php(2955):
Parser->parse('{{Infobox_housi...', Object(Title), Object(ParserOptions),
true, true, NULL)
#8 D:\wamp\www\mediawiki\includes\Article.php(1665):
Article->prepareTextForEdit('{{Infobox_housi...')
#9 D:\wamp\www\mediawiki\includes\Article.php(1541):
Article->doEdit('{{Infobox_housi...', '', 98)
#10 D:\wamp\www\mediawiki\extensions\update\update_body.php(65):
Article->myUpdateArticle('{{Infobox_housi...', '', false, false)
#11 D:\wamp\www\mediawiki\includes\SpecialPage.php(559):
Update->execute(NULL)
#12 D:\wamp\www\mediawiki\includes\Wiki.php(229):
SpecialPage::executePath(Object(Title))
#13 D:\wamp\www\mediawiki\includes\Wiki.php(59):
MediaWiki->initializeSpecialCases(Object(Title), Object(OutputPage),
Object(WebRequest))
#14 D:\wamp\www\mediawiki\index.php(116):
MediaWiki->initialize(Object(Title), NULL, Object(OutputPage), Object(User),
Object(WebRequest))
#15 {main}
Why does this happen, and how can I solve it?
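A frequent cause of "Preprocessor_DOM::preprocessToObj generated invalid XML" is input text containing characters that are illegal in XML 1.0, such as stray control characters carried over from the source XML file. As a sketch (shown in Python purely for illustration; the equivalent filtering could be done in the PHP extension before calling insertNewArticle()), the offending characters can be stripped first:

```python
import re

# XML 1.0 forbids most C0 control characters; only tab, LF, and CR
# are allowed below U+0020. Everything outside these ranges is illegal.
_XML_ILLEGAL = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

def strip_invalid_xml_chars(text):
    """Remove characters that a DOM-based XML preprocessor cannot accept."""
    return _XML_ILLEGAL.sub("", text)

strip_invalid_xml_chars("{{Infobox}}\x0bThis is my wiki")
# -> "{{Infobox}}This is my wiki" (the vertical tab is dropped)
```

It is also worth verifying that the text read from the XML file is valid UTF-8 before it reaches the parser.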
Thanks
vanessa
We are going to move image serving from ms1 (plus a few other
miscellaneous things that also live on that host) to the new much bigger
ms7 starting at 1 pm EST, 9 pm UTC.
We expect 30 to 60 minutes of down time (= no uploads of media files).
Depending on how things go, reads of pages may be spotty as well during
this time.
If you have any questions or comments, I'm in the #wikimedia-tech IRC channel.
Ariel Glenn
Hi all,
I have built a local wiki, and I want to set the recent changes limits to
500|1000|5000|10000.
I changed $wgRCLinkLimits = array( 50, 100, 250, 500 );
to $wgRCLinkLimits = array( 500, 1000, 5000, 10000 ); and set
'rclimit' => 10000.
Is this right? Or is there something more to do?
Thanks
vanessa
Hello everyone!
I wonder what the schema change approval process is at present. Is it
still "ask Brion or Tim", or is something else used now?
--vvv
Hi,
For a private wiki, I had the request to add groups of pages to the white-list.
Contributors will regularly add (and possibly delete) pages in those groups. So
manually editing $wgWhitelistRead appears to be a maintenance nightmare.
So, is there a way to add a regexp or a namespace (or any other "collection"
of pages) to $wgWhitelistRead?
If not (as I suspect), is there a hook I could use to patch the white-list check?
Thanks in advance for your answers,
Sylvain Leroux
--
sylvain(a)chicoree.fr
http://www.chicoree.fr
Aryeh Gregor wrote:
> RDFa is a way to embed data in HTML more robustly than with attributes
> like class and title, which are reserved for author use or have
> existing functionality. It allows you to specify an external
> vocabulary that adds some semantics to your page that HTML is not
> capable of expressing by itself.
More to the point, it allows an RDF graph to be overlaid onto an XHTML document so that the XHTML document and the RDF graph can share some strings. The XHTML data model isn't extended per se. Instead, a separate RDF graph can be extracted.
> Both RDFa+HTML and Microdata are Working Drafts at the W3C right now
It's true that both HTML+RDFa and Microdata have been published in Working Drafts at the W3C. However, Microdata has never been through a Working Group Decision to publish as a First Public Working Draft while HTML+RDFa has. Microdata was added to a Working Draft after FPWD and there has since been a Working Group decision to take Microdata out of that spec.
It is reasonable to expect that soon HTML+RDFa and Microdata could be in the same stage Process-wise, but it's inaccurate to portray them as being at the same stage Process-wise right now.
> I should note that currently Google and a couple of others support
> RDFa but not Microdata.
See http://lists.w3.org/Archives/Public/public-rdf-in-xhtml-tf/2009Sep/0126.html (search for the word "deviate").
Manu Sporny wrote:
> The general points that you made were riddled with technical
> inaccuracies, bad advice, and if implemented by the MediaWiki community,
> would have resulted in semantic data that would have been ambiguous at
> best and erroneous at worst.
With that introduction, I think it's fair to evaluate your message for inaccuracies or relevant omissions as well.
> The above could be marked up in RDFa, with pre-defined vocabs, like so:
It should be noted that the concept of "pre-defined vocabs" is neither in the HTML+RDFa draft nor in the RDFa in XHTML spec from the XHTML2 WG.
> <p about="EmeryMolyneux-terrestrialglobe-1592-20061127.jpg"
>    typeof="dctype:StillImage">
>   <span property="dc:title">Emery Molyneux Terrestrial Globe</span>
>   by <a rel="cc:attributionUrl" href="http://example.org/bob/"
>      property="cc:attributionName">Bob Smith</a>
>   is licensed under a <a rel="license"
>      href="http://creativecommons.org/licenses/by-sa/3.0/us/">Creative
>   Commons Attribution-Share Alike 3.0 United States License</a>.</p>
Hiding the CURIE declarations is a common pattern when advocating RDFa: it makes RDFa appear tidier than it is. To write this in RDFa in XHTML (the RDFa spec you say is safe to use for deployment), one would need to declare the CURIE prefixes:
<p xmlns:dctype="http://purl.org/dc/dcmitype/"
   about="EmeryMolyneux-terrestrialglobe-1592-20061127.jpg"
   typeof="dctype:StillImage">
  <span xmlns:dc="http://purl.org/dc/elements/1.1/"
        property="dc:title">Emery Molyneux Terrestrial Globe</span>
  by <a xmlns:cc="http://creativecommons.org/ns#" rel="cc:attributionUrl"
     href="http://example.org/bob/"
     property="cc:attributionName">Bob Smith</a>
  is licensed under a <a rel="license"
     href="http://creativecommons.org/licenses/by-sa/3.0/us/">Creative
  Commons Attribution-Share Alike 3.0 United States License</a>.</p>
Philip Jägenstedt already covered other points about the examples.
> However - XHTML1+RDFa is a published W3C Recommendation and it is safe to use it for deployment.
RDFa in XHTML has indeed been published as a Recommendation jointly by the Semantic Web Deployment Working Group and the XHTML2 Working Group. However, you fail to mention that even though the document mentions "HTML" in its first sentence, all the normative matter concerns strictly XHTML and the document has gone through the W3C Process as a specification that applies to XML.
MediaWiki serves its pages as text/html, so they get processed as HTML; it would therefore be inappropriate to rely on a spec that has been reviewed as an XML spec.
I think it's misleading to promote text/html deployment of specs whose normative matter has been written and reviewed for XML. The most egregious example of this is that the XHTML2 WG has written the normative matter of XHTML 1.x specs for XML but then published a Working Group Note (Notes can be pretty much anything and don't go through the W3C Recommendation track Process) that gives advice on deployment as text/html (http://www.w3.org/TR/xhtml-media-types/).
Furthermore, the ease of getting a spec to REC at the W3C depends on how many people are interested in the spec. The more people are interested in a spec, the more review comments there are. The flip side is that when there's *less* interest in a spec, it's easier to get it to Recommendation due to fewer comments raised. Thus, progress along the REC track isn't a commensurable indicator of technical merit or technical maturity across different specs and WGs.
Also, when assessing the "safe" deployability of RDFa in XHTML, it's relevant to consider that
1) RDFa in XHTML was knowingly (see http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2008-August/015913.html) progressed on the Recommendation track without resolving how RDFa works with HTML first.
2) RDFa 1.1 is in the works, and the changes being considered make RDFa 1.0 look like a beta release. (Which is understandable, since a good part of the technical review of RDFa has occurred after RDFa in XHTML was rushed to REC.)
--
Henri Sivonen
hsivonen(a)iki.fi
http://hsivonen.iki.fi/
Why do we hide Special:UnwatchedPages from regular users? Unwatched
pages are something that people should know about so they can be sure
to watch them. If no one is actively watching a page, it's more
likely that vandalism will stick around. Yes, vandals and trolls
could abuse the info, but they could abuse all sorts of other features
too, and that's not a reason to deny them to legitimate users. If
there is any such threat, then that will just encourage legitimate
users to watch the pages, thereby removing them from the list.
So I suggest we set $wgGroupPermissions['*']['unwatchedpages'] = true;
in DefaultSettings.php. Or maybe 'user' instead of '*', if people
prefer. Does anyone object?