Hi all,
I have built a LocalWiki, and now I want its data to stay consistent with
Wikipedia, so one of my tasks is to fetch the recent updates from
Wikipedia.
I get the URLs by parsing the RSS feed
(http://zh.wikipedia.org/w/index.php?title=Special:%E6%9C%80%E8%BF%91%E6%9B%…)
and then get the full HTML content of the edit box by opening each URL,
clicking 'edit this page', and parsing the resulting page.
(eg:
http://zh.wikipedia.org/w/index.php?title=%E8%B2%A1%E7%A5%9E%E5%88%B0_(%E9%…
and its edit interface is
http://zh.wikipedia.org/w/index.php?title=%E8%B2%A1%E7%A5%9E%E5%88%B0_(%E9%…
. However, I have run into two problems during this work.
First, sometimes I cannot open a URL taken from the RSS feed, and I don't
know why. Is it because I visit the site too frequently and my IP address
has been blocked, or is the network simply too slow? If it is the former,
how often may I fetch a page from Wikipedia? Is there a timeout?
Second, as mentioned above, I want to download the full HTML of the
edit-box content from Wikipedia. Sometimes this works, but other times I
can only download part of it. What could be the reason?
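For what it's worth, the crawl loop described above can be sketched as follows. MediaWiki can also return a page's raw wikitext directly via action=raw, which avoids scraping the edit form and may sidestep the partial-download problem. The endpoint, titles, and delay below are illustrative assumptions, not tested values:

```python
import time
import urllib.parse
import urllib.request

# Assumed endpoint; any MediaWiki index.php works the same way.
API = "https://zh.wikipedia.org/w/index.php"

def raw_url(title):
    """Build a URL that returns the page's raw wikitext via action=raw."""
    return API + "?" + urllib.parse.urlencode({"title": title, "action": "raw"})

def fetch_all(titles, delay=5.0):
    """Fetch the raw wikitext of each title, pausing between requests
    so the crawler stays well clear of any server-side throttle."""
    pages = {}
    for title in titles:
        with urllib.request.urlopen(raw_url(title)) as resp:
            pages[title] = resp.read().decode("utf-8")
        time.sleep(delay)  # be polite: one request every few seconds
    return pages
```

If rendered HTML is needed instead, the api.php interface (action=parse) is another option; either way, pacing the requests is the usual cure for intermittent blocks.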
Thanks
vanessa
Hi all,
I was curious about a vandalistic edit[1]: the logged-out vandal, who uses a
US-based home broadband ISP[2][3], has made only one edit: the vandalistic edit
I mentioned. The edit was made two days ago. I reverted it, then tried using
Soxred93's useful Range Contributions tool[4] to see if any of the 255 IP
addresses closest to the vandal's IP had ever made any other edits. Nope.[5]
In fact, not even any of the closest 131072 have done so.[6] But when I
expanded my search to the closest 262144, I found lots of edits over the past
few weeks, made by a variety of IPs. I looked at the first seven. One was
vandalism: an edit[7] to [[Patrick Stump]]. Someone else has since reverted
it. It was made by another user from the same ISP.[8] I am just curious:
A) Did I go too far when I did all the research I described above? Do you
yourself often use the Range Contributions tool[4] for looking at vandals' ISPs'
contributions?
B) What do you think are the chances that the same person made both the
first[1] and the second[7] vandalistic edits? The IP addresses' binary
representations are quite different.
C) Why did no anti-vandalism software automatically revert either edit?
D) When I look at the history[9] of [[Patrick Stump]], I see that there were
fourteen edits between 06:51 and 07:03, most of them vandalism. Yet the
vandalistic edits come from a variety of IP addresses and usernames, and the
IP addresses differ widely from one another. Why is this?
E) When comparing two vandals' edits in other situations, is there any quick
way for editors to find out both IPs' hostnames, User-Agents, Accept-Charset
strings, Accept-Language strings, screen resolutions, and/or IP geolocation
results? I do very little vandalism removal, so I myself am not sure.
F) Which netblocks do the most vandalism and the least useful editing? Which
cities? Which entire countries? Should those netblocks, cities, and countries
be forced to log in before editing?
G) Wouldn't it be cool if some web browsers or ISPs would tell Wikipedia what a
contributor's PPPoE username was whenever the contributor made an edit?
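Regarding question B: the "closest N addresses" figures map onto CIDR prefixes (131072 = 2^17 addresses is a /15; 262144 = 2^18 addresses is a /14), so one quick way to compare two IPs is to check which prefix length they share. A small sketch using Python's standard ipaddress module, with the two addresses from footnotes [2] and [8]:

```python
import ipaddress

a = ipaddress.ip_address("174.105.248.31")   # IP behind the first edit [2]
b = ipaddress.ip_address("174.106.99.246")   # IP behind the Patrick Stump edit [8]

def same_block(x, y, prefix):
    """True if both addresses fall inside the same /prefix network."""
    net = ipaddress.ip_network(f"{x}/{prefix}", strict=False)
    return y in net

print(same_block(a, b, 15))  # False: the 131072-address (/15) blocks differ
print(same_block(a, b, 14))  # True: both sit in one 262144-address (/14) block
```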
If you reply to only one of A), B), C), D), E), F), or G) then please use a
different subject line than I used. And add a "(was: ...)" tag at the end of
the subject line. That way, it'll be easier for others to follow just the parts
of the discussion that they want to follow.
Kind regards,
--[[User:Unforgettableid]]
[1] http://en.wikipedia.org/w/index.php?title=Fetus_in_fetu&diff=prev&oldid=339…
[2] http://toolserver.org/~chm/whois.php?ip=174.105.248.31
[3] http://en.wikipedia.org/wiki/Road_Runner_High_Speed_Online
[4] http://toolserver.org/~soxred93/rangecontribs/
[5] http://toolserver.org/~soxred93/rangecontribs/index.php?type=range&ips=174.…
[6] http://toolserver.org/~soxred93/rangecontribs/index.php?type=range&ips=174.…
[7] http://en.wikipedia.org/w/index.php?title=Patrick_Stump&diff=prev&oldid=339…
[8] http://toolserver.org/~chm/whois.php?ip=174.106.99.246
[9] http://en.wikipedia.org/w/index.php?title=Patrick_Stump&action=history
Hi all,
I added an extension to my local wiki to import data. It uses
insertNewArticle() or updateArticle() to add articles, but sometimes it
fails. For example:
$title = 'mywiki';
$content = '{{Infobox_housi...}} This is my wiki';
$wgTitle = Title::newFromText( $title );
$wgArticle = new Article( $wgTitle );
$wgArticle->insertNewArticle( $content, '', false, false );
In practice I read $title and $content from an XML file.
The error shown is:
Preprocessor_DOM::preprocessToObj generated invalid XML
Backtrace:
#0 D:\wamp\www\mediawiki\includes\parser\Parser.php(2579):
Preprocessor_DOM->preprocessToObj('<div style="bor...', 1)
#1 D:\wamp\www\mediawiki\includes\parser\Parser.php(3008):
Parser->preprocessToDom('<div style="bor...', 1)
#2 D:\wamp\www\mediawiki\includes\parser\Parser.php(2880):
Parser->getTemplateDom(Object(Title))
#3 D:\wamp\www\mediawiki\includes\parser\Preprocessor_DOM.php(959):
Parser->braceSubstitution(Array, Object(PPFrame_DOM))
#4 D:\wamp\www\mediawiki\includes\parser\Parser.php(2632):
PPFrame_DOM->expand(Object(PPNode_DOM), 0)
#5 D:\wamp\www\mediawiki\includes\parser\Parser.php(875):
Parser->replaceVariables('{{Infobox_housi...')
#6 D:\wamp\www\mediawiki\includes\parser\Parser.php(327):
Parser->internalParse('{{Infobox_housi...')
#7 D:\wamp\www\mediawiki\includes\Article.php(2955):
Parser->parse('{{Infobox_housi...', Object(Title), Object(ParserOptions),
true, true, NULL)
#8 D:\wamp\www\mediawiki\includes\Article.php(1665):
Article->prepareTextForEdit('{{Infobox_housi...')
#9 D:\wamp\www\mediawiki\includes\Article.php(1541):
Article->doEdit('{{Infobox_housi...', '', 98)
#10 D:\wamp\www\mediawiki\extensions\update\update_body.php(65):
Article->myUpdateArticle('{{Infobox_housi...', '', false, false)
#11 D:\wamp\www\mediawiki\includes\SpecialPage.php(559):
Update->execute(NULL)
#12 D:\wamp\www\mediawiki\includes\Wiki.php(229):
SpecialPage::executePath(Object(Title))
#13 D:\wamp\www\mediawiki\includes\Wiki.php(59):
MediaWiki->initializeSpecialCases(Object(Title), Object(OutputPage),
Object(WebRequest))
#14 D:\wamp\www\mediawiki\index.php(116):
MediaWiki->initialize(Object(Title), NULL, Object(OutputPage), Object(User),
Object(WebRequest))
#15 {main}
Why does this happen, and how can I solve it?
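A frequent cause of "Preprocessor_DOM::preprocessToObj generated invalid XML" is input text containing characters that are illegal in XML 1.0, such as stray control characters carried over from the source XML file. As a sketch (shown in Python purely for illustration; the equivalent filtering could be done in the PHP extension before calling insertNewArticle()), the offending characters can be stripped first:

```python
import re

# XML 1.0 forbids most C0 control characters; only tab, LF, and CR
# are allowed below U+0020. Everything outside these ranges is illegal.
_XML_ILLEGAL = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

def strip_invalid_xml_chars(text):
    """Remove characters that a DOM-based XML preprocessor cannot accept."""
    return _XML_ILLEGAL.sub("", text)

strip_invalid_xml_chars("{{Infobox}}\x0bThis is my wiki")
# -> "{{Infobox}}This is my wiki" (the vertical tab is dropped)
```

It is also worth verifying that the text read from the XML file is valid UTF-8 before it reaches the parser.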
Thanks
vanessa
We are going to move image serving from ms1 (plus a few other
miscellaneous things that also live on that host) to the new much bigger
ms7 starting at 1 pm EST, 9 pm UTC.
We expect 30 to 60 minutes of down time (= no uploads of media files).
Depending on how things go, reads of pages may be spotty as well during
this time.
If you have any questions or comments, I'm in the #wikimedia-tech IRC channel.
Ariel Glenn
Hi all,
I have built a local wiki, and I want to set the recent changes limits to
500|1000|5000|10000.
I changed $wgRCLinkLimits = array( 50, 100, 250, 500 );
to $wgRCLinkLimits = array( 500, 1000, 5000, 10000 ); and set
'rclimit' => 10000.
Is this right? Or is there something more to do?
Thanks
vanessa
Hello everyone!
I wonder what the schema change approval process is at present. Is it
still "ask Brion or Tim", or is something else used now?
--vvv
Hi,
For a private wiki, I had the request to add groups of pages to the white-list.
Contributors will regularly add (and possibly delete) pages in those groups. So
manually editing $wgWhitelistRead appears to be a maintenance nightmare.
So, is there a way to add a regexp or a namespace (or any other "collection"
of pages) to $wgWhitelistRead?
If not (as I suspect), is there a hook I could use to patch the white-list check?
Thanks in advance for your answers,
Sylvain Leroux
--
sylvain(a)chicoree.fr
http://www.chicoree.fr
Aryeh Gregor wrote:
> RDFa is a way to embed data in HTML more robustly than with attributes
> like class and title, which are reserved for author use or have
> existing functionality. It allows you to specify an external
> vocabulary that adds some semantics to your page that HTML is not
> capable of expressing by itself.
More to the point, it allows an RDF graph to be overlaid onto an XHTML document so that the XHTML document and the RDF graph can share some strings. The XHTML data model isn't extended per se. Instead, a separate RDF graph can be extracted.
> Both RDFa+HTML and Microdata are Working Drafts at the W3C right now
It's true that both HTML+RDFa and Microdata have been published in Working Drafts at the W3C. However, Microdata has never been through a Working Group Decision to publish as a First Public Working Draft while HTML+RDFa has. Microdata was added to a Working Draft after FPWD and there has since been a Working Group decision to take Microdata out of that spec.
It is reasonable to expect that soon HTML+RDFa and Microdata could be in the same stage Process-wise, but it's inaccurate to portray them as being at the same stage Process-wise right now.
> I should note that currently Google and a couple of others support
> RDFa but not Microdata.
See http://lists.w3.org/Archives/Public/public-rdf-in-xhtml-tf/2009Sep/0126.html (search for the word "deviate").
Manu Sporny wrote:
> The general points that you made were riddled with technical
> inaccuracies, bad advice, and if implemented by the MediaWiki community,
> would have resulted in semantic data that would have been ambiguous at
> best and erroneous at worst.
With that introduction, I think it's fair to evaluate your message for inaccuracies or relevant omissions as well.
> The above could be marked up in RDFa, with pre-defined vocabs, like so:
It should be noted that the concept of "pre-defined vocabs" is neither in the HTML+RDFa draft nor in the RDFa in XHTML spec from the XHTML2 WG.
> <p about="EmeryMolyneux-terrestrialglobe-1592-20061127.jpg"
>    typeof="dctype:StillImage">
>   <span property="dc:title">Emery Molyneux Terrestrial Globe</span>
>   by <a rel="cc:attributionUrl" href="http://example.org/bob/"
>      property="cc:attributionName">Bob Smith</a>
>   is licensed under a <a rel="license"
>      href="http://creativecommons.org/licenses/by-sa/3.0/us/">Creative
>   Commons Attribution-Share Alike 3.0 United States License</a>.</p>
Hiding the CURIE declarations is a common pattern when advocating RDFa: it makes RDFa appear tidier than it is. To write this in RDFa in XHTML (the RDFa spec you say is safe to use for deployment), one would need to declare the CURIE prefixes:
<p xmlns:dctype="http://purl.org/dc/dcmitype/"
   about="EmeryMolyneux-terrestrialglobe-1592-20061127.jpg"
   typeof="dctype:StillImage">
  <span xmlns:dc="http://purl.org/dc/elements/1.1/"
        property="dc:title">Emery Molyneux Terrestrial Globe</span>
  by <a xmlns:cc="http://creativecommons.org/ns#" rel="cc:attributionUrl"
     href="http://example.org/bob/"
     property="cc:attributionName">Bob Smith</a>
  is licensed under a <a rel="license"
     href="http://creativecommons.org/licenses/by-sa/3.0/us/">Creative
  Commons Attribution-Share Alike 3.0 United States License</a>.</p>
Philip Jägenstedt already covered other points about the examples.
> However - XHTML1+RDFa is a published W3C Recommendation and it is safe to use it for deployment.
RDFa in XHTML has indeed been published as a Recommendation jointly by the Semantic Web Deployment Working Group and the XHTML2 Working Group. However, you fail to mention that even though the document mentions "HTML" in its first sentence, all the normative matter concerns strictly XHTML and the document has gone through the W3C Process as a specification that applies to XML.
MediaWiki serves its pages as text/html, so they get processed as HTML; it would therefore be inappropriate to rely on a spec that has been reviewed as an XML spec.
I think it's misleading to promote text/html deployment of specs whose normative matter has been written and reviewed for XML. The most egregious example of this is that the XHTML2 WG has written the normative matter of XHTML 1.x specs for XML but then published a Working Group Note (Notes can be pretty much anything and don't go through the W3C Recommendation track Process) that gives advice on deployment as text/html (http://www.w3.org/TR/xhtml-media-types/).
Furthermore, the ease of getting a spec to REC at the W3C depends on how many people are interested in the spec. The more people are interested in a spec, the more review comments there are. The flip side is that when there's *less* interest in a spec, it's easier to get it to Recommendation due to fewer comments raised. Thus, progress along the REC track isn't a commensurable indicator of technical merit or technical maturity across different specs and WGs.
Also, when assessing the "safe" deployability of RDFa in XHTML, it's relevant to consider that
1) RDFa in XHTML was knowingly (see http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2008-August/015913.html) progressed on the Recommendation track without resolving how RDFa works with HTML first.
2) RDFa 1.1 is in the works, and the changes being considered make RDFa 1.0 look like a beta release. (Which is understandable, since a good part of the technical review of RDFa has occurred after RDFa in XHTML was rushed to REC.)
--
Henri Sivonen
hsivonen(a)iki.fi
http://hsivonen.iki.fi/
Why do we hide Special:UnwatchedPages from regular users? Unwatched
pages are something that people should know about so they can be sure
to watch them. If no one is actively watching a page, it's more
likely that vandalism will stick around. Yes, vandals and trolls
could abuse the info, but they could abuse all sorts of other features
too, and that's not a reason to deny them to legitimate users. If
there is any such threat, then that will just encourage legitimate
users to watch the pages, thereby removing them from the list.
So I suggest we set $wgGroupPermissions['*']['unwatchedpages'] = true;
in DefaultSettings.php. Or maybe 'user' instead of '*', if people
prefer. Does anyone object?