I've been asked to bring this up here as a result of the discussion on
http://en.wikipedia.org/wiki/Wikipedia:Featured_picture_candidates/Image:St…
where a video was nominated for featured status, but several people
objected because they could not play Ogg Theora. We currently do not
allow other video formats.
I would suggest implementing the following policy on all Wikimedia wikis:
"It is allowed to upload files in patent-encumbered formats like MP3 or
the MPEG-4 codecs only provided that a version in a non-encumbered
format is also uploaded. Files which are only provided in
patent-encumbered formats should be deleted."
Thoughts, comments, objections? Ideally, the conversion could be done
automatically, but a soft policy might do the trick for now.
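As a rough sketch of the automatic route, and assuming a command-line
transcoder such as ffmpeg2theora were installed on the servers, the
conversion step could be little more than shelling out from the upload
code; the function below is purely illustrative, not a worked-out design:
======================================
// Purely illustrative: transcode an uploaded file to Ogg Theora by
// shelling out. Assumes the ffmpeg2theora command-line tool is
// installed; hooking this into the actual upload code is left out.
function makeFreeFormatCopy($sourceFile, $destFile) {
    $cmd = "ffmpeg2theora " . escapeshellarg($sourceFile) .
           " -o " . escapeshellarg($destFile);
    exec($cmd, $output, $status);
    // A zero exit status means the conversion succeeded.
    return ($status === 0);
}
======================================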
Thanks,
Erik
> I want to be able to mirror an article from our wiki onto a separate
> website, leaving out the navigation links, tabs and other material
> that isn't specific to my article. Is there a technique for doing this?
Not that I am aware of, but (depending on how badly you want this
behaviour), and assuming you always want the latest "live" version,
you could have the separate website call Special:Export (e.g.
http://en.wikipedia.org/wiki/Special:Export/test ), parse the
XML to get the raw wiki text, remove any links, and then
display it in the way that you want.
e.g. in PHP you could do everything above, apart from removing the
links and displaying the result, like so:
======================================
function getWebArticleText($title) {
    // Build the URL for the raw XML export of the article.
    $url = "http://en.wikipedia.org/wiki/Special:Export/" . $title;
    // Initialise the cURL resource.
    $ch = curl_init();
    // Set the URL to pull.
    curl_setopt($ch, CURLOPT_URL, $url);
    // Return the pulled web text into a variable rather than printing it.
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    // Retrieve the web text.
    $str = curl_exec($ch);
    // If we encountered an error, then log it and exit.
    if (curl_error($ch)) {
        trigger_error("Curl error #: " . curl_errno($ch) . " - " . curl_error($ch));
        print "Curl error #: " . curl_errno($ch) . " - " . curl_error($ch) . " - exiting.\n";
        exit();
    }
    // Close the cURL resource.
    curl_close($ch);
    // We only want the article wiki text found in the [[Special:Export]] XML output.
    xml_parse_into_struct(xml_parser_create('UTF-8'), $str, $val, $ind);
    // If we got valid data...
    if (isset($ind['TEXT'][0])) {
        $text = $val[$ind['TEXT'][0]]['value'];
    }
    // ...otherwise we got invalid data (most likely the article was deleted,
    // or possibly Wikipedia is malfunctioning).
    else {
        $text = "";
    }
    // Return the article's wiki text.
    return $text;
}
======================================
Then you'll need some regexes to remove the links, and you'll have to
work out how to display it (given that it's wiki text, and you
presumably want HTML).
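For instance, the link removal could start out as something like this
(untested, and it only handles ordinary [[...]] links and bracketed
external links, not images, categories or templates):
======================================
function stripWikiLinks($text) {
    // [[Target|label]] -> label, and [[Target]] -> Target
    $text = preg_replace('/\[\[(?:[^|\]]*\|)?([^\]]*)\]\]/', '$1', $text);
    // [http://example.com label] -> label
    $text = preg_replace('/\[(?:https?|ftp):\/\/[^\s\]]+\s+([^\]]*)\]/', '$1', $text);
    // [http://example.com] with no label -> removed entirely
    $text = preg_replace('/\[(?:https?|ftp):\/\/[^\s\]]+\]/', '', $text);
    return $text;
}
======================================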
All the best,
Nick.
An automated run of parserTests.php showed the following failures:
Running test BUG 361: URL within URL, not bracketed... FAILED!
Running test External links: invalid character... FAILED!
Running test Bug 2702: Mismatched <i> and <a> tags are invalid... FAILED!
Running test A table with no data.... FAILED!
Running test A table with nothing but a caption... FAILED!
Running test Link containing "#<" and "#>" % as a hex sequences... FAILED!
Running test Magic links: PMID incorrectly converts space to underscore... FAILED!
Running test Template with thumb image (wiht link in description)... FAILED!
Running test Link to image page... FAILED!
Running test BUG 1887: A ISBN with a thumbnail... FAILED!
Running test BUG 1887: A <math> with a thumbnail... FAILED!
Running test BUG 561: {{/Subpage}}... FAILED!
Running test Simple category... FAILED!
Running test Section headings with TOC... FAILED!
Running test Media link with nasty text... FAILED!
Running test Bug 2095: link with pipe and three closing brackets... FAILED!
Running test Sanitizer: Validating the contents of the id attribute (bug 4515)... FAILED!
Passed 264 of 281 tests (93.95%) FAILED!
If I want to restrict read access to certain namespaces (Template,
Special pages, MediaWiki), i.e. everything other than the main and talk
namespaces, where is the best single entry point to do this with a
regular expression? I am aware of the hidden page patch and of the
extension that limits access by page or group, but I am looking for a
very simple solution to a very specific problem: if a user is not a
sysop, I want to limit their access to the main and talk namespaces.
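To make that concrete, something along these lines is roughly the shape
of what I am after (though via a hook rather than a regex; this is only
a guess on my part, untested, and it assumes the userCan hook is
consulted for the 'read' action):
======================================
// Untested sketch: deny 'read' outside the main and talk namespaces to
// anyone who is not in the sysop group. Assumes the 'userCan' hook is
// consulted for read access.
$wgHooks['userCan'][] = 'restrictReadToMainAndTalk';

function restrictReadToMainAndTalk($title, $user, $action, &$result) {
    if ($action != 'read') {
        return true; // only interested in read access
    }
    if (in_array('sysop', $user->getGroups())) {
        return true; // sysops may read everything
    }
    if (in_array($title->getNamespace(), array(NS_MAIN, NS_TALK))) {
        return true; // everyone may read main and talk pages
    }
    $result = false; // deny read access...
    return false;    // ...and stop further hook processing
}
======================================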
Thanks,
Jamil
> How can I disable the link on the pictures?
Discussed at http://bugzilla.wikimedia.org/show_bug.cgi?id=539#c11
You can vote for that bug if you like, or you can dig through the code
to modify the bit that makes links on images.
All the best,
Nick.
> We definitely care! The parser should never produce invalid HTML output; if it
> does, that's a bug.
Well, if you're serious about this, I'd like to propose a
mutually-beneficial swap: something small that will help you, in
exchange for something small that will help me.
What I would like is a fix for
http://bugzilla.wikimedia.org/show_bug.cgi?id=3693 to be checked into
the tree (and in particular, to be made available on the English
Wikipedia). This is a small bug/feature request:
* It already has a patch attached.
* The patch is one line long.
* The patch restores previously existing (albeit unintentional) behaviour.
* The previous behaviour was useful, and was used (by me at least).
* I am not aware of any security problems introduced by restoring this
behaviour.
What I am offering in return is a bit of testing software that is:
* Written in PHP.
* About 200 lines long.
* Able to find invalid HTML output bugs in the Parser code.
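To give a flavour of it without attaching the whole thing yet, the heart
of the check is roughly as below. This is a cut-down sketch rather than
the actual tool; the part that generates the random wiki text inputs and
drives the parser is left out:
======================================
// Cut-down sketch of the validity check: given a fragment of XHTML
// output from the parser, report any mismatched or unclosed tags.
function findTagMismatches($html) {
    $problems = array();
    $stack = array();
    // Tags that are legitimately empty in XHTML.
    $selfClosing = array('br', 'hr', 'img', 'input', 'meta', 'link');
    preg_match_all('/<(\/?)([a-z0-9]+)[^>]*?(\/?)>/i', $html, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        $isClosing = ($m[1] == '/');
        $name = strtolower($m[2]);
        $isEmpty = ($m[3] == '/') || in_array($name, $selfClosing);
        if (!$isClosing && !$isEmpty) {
            array_push($stack, $name);
        } elseif ($isClosing) {
            if (count($stack) == 0) {
                $problems[] = "stray closing </$name>";
            } else {
                $expected = array_pop($stack);
                if ($expected != $name) {
                    $problems[] = "expected </$expected>, found </$name>";
                }
            }
        }
    }
    foreach ($stack as $unclosed) {
        $problems[] = "unclosed <$unclosed>";
    }
    return $problems;
}
======================================
Run against the output in Example 1 below, for instance, it reports that
</p> was found where </s> was expected.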
To demonstrate that I am serious and that I am not wasting your time,
I have included below five small examples, all found by the test
software, of wiki text that produces invalid XHTML output.
The output was generated using MediaWiki 1.5.6 (which I believe is the
latest tarball you can get), *without* tidy enabled, on PHP 4.1.2 and
MySQL 3.23. The results were also reproduced on PHP 4.4.2 and MySQL
5.0.18. Any browser behaviour I describe was observed in Firefox
1.5.0.1.
* Example 1)
Wiki Text input:
=====================================
<s>a
{|
=====================================
Current invalid XHTML output:
=====================================
<p><s>a
</p>
<table>
</table>
=====================================
Suggested valid XHTML output:
=====================================
<p><s>a</s></p>
=====================================
Result: Invalid XHTML, and because the <s> tag is never closed, rendering
strikes through all following content on the page: footers, headers,
basically everything.
View output: http://nickj.org/index.php?title=MediaWiki/Parser1
If you want to see a fun variant of this, which hides almost all of
the text on the page, then check this out:
=====================================
<font style="visibility:hidden">a
{|
=====================================
This has the potential to seriously confuse less technical people running
small wikis, especially if applied to the main page.
View output: http://nickj.org/index.php?title=MediaWiki/Parser1-hidden
* Example 2)
Wiki Text input:
=====================================
a<center>
c
=====================================
Current invalid XHTML output:
=====================================
<p>a<center>
c</center>
</p>
=====================================
Suggested valid XHTML output:
=====================================
<p>a</p>
<center>c</center>
<br />
=====================================
Result: Renders fine, but page is invalid XHTML.
View output: http://nickj.org/index.php?title=MediaWiki/Parser2
* Example 3)
Wiki Text input:
=====================================
<code>b
;
=====================================
Current invalid XHTML output:
=====================================
<p><code>b
</p>
<dl><dt></code>
=====================================
Suggested valid XHTML output:
=====================================
<p><code>b</code></p>
=====================================
Result: Renders fine, but page is invalid XHTML.
View output: http://nickj.org/index.php?title=MediaWiki/Parser3
* Example 4)
Wiki Text input:
=====================================
;<div>a
=====================================
Current invalid XHTML output:
=====================================
<dl><dt><div>a</div>
</dt></dl>
=====================================
Suggested valid XHTML output:
=====================================
<dl>
<dd>
<div>a</div>
</dd>
</dl>
=====================================
Result: Renders fine, but page is invalid XHTML.
View output: http://nickj.org/index.php?title=MediaWiki/Parser4
* Example 5)
Wiki Text input:
=====================================
;<div><blockquote>
=====================================
Current invalid XHTML output:
=====================================
<dl><dt><div><blockquote></blockquote>
</dt></dl>
</div>
=====================================
Suggested valid XHTML output:
=====================================
<!-- Nothing. -->
=====================================
Result: Invalid XHTML, and the page appears a bit stuffed up: the page's
text font is several sizes smaller than usual, and the
article/discussion/edit/history links are shifted upwards and to the
left.
View output: http://nickj.org/index.php?title=MediaWiki/Parser5
All the best,
Nick.
The latest edition of wikizine said:
> [No Tinyurl] On Meta there is a global
> blacklist of domains. URLs of domains listed
> there cannot be posted on any Wikimedia wiki.
> This is an anti-spam function of Wikimedia.
> The widely used domain "tinyurl.com" is now
> added to the blacklist. Tinyurl.com is a free
> redirect service to make a very long url very
> short. This service has been abused by
> spammers to link sites already on the blacklist.
Is there any evidence that this service has been abused by spammers?
Since tinyurl links wouldn't increase the page rank of a spammer's
site, I don't see why they'd bother using it, especially since most
wiki spam is now hidden from humans in invisible div tags, meaning the
only aim can be to influence search results.
The Spam Blacklist is shared by a lot of non-Wikimedia sites, so we
need to be sure the URLs listed there are really spam, and not
something which is going to cause problems for valid links on other
wikis.
Angela.
It's not a bug. The pages exist, and are thus eligible for indexing.
Negotiate a solution if you wish, but please don't falsely label the
issue.
Rob Church
On 13/02/06, homey2005(a)sympatico.ca <homey2005(a)sympatico.ca> wrote:
> The problem is that it is a bug, and it adds Wikipedia-generated garbage to search engines, which I would think dilutes the value of Wikipedia entries on Google.
>
> Also, it partly defeats the purpose of protecting a deleted page, in that it encourages vandals to force deletion protection because, at least, they'll have some way of embarrassing a non-notable person they target, particularly if there are uncharitable comments on the linked-to AFD page.
>
> >
> > From: The Cunctator <cunctator(a)gmail.com>
> > Date: 2006/02/12 Sun PM 01:04:46 EST
> > To: Wikimedia developers <wikitech-l(a)wikimedia.org>
> > Subject: Re: [Wikitech-l] Protected deleted pages and google
> >
> > On 2/11/06, homey2005(a)sympatico.ca <homey2005(a)sympatico.ca> wrote:
> > > Can something be done to prevent "protected deleted" pages from being indexed by google? See, for instance, http://www.google.ca/search?hl=en&q=shane+ruttle&btnG=Google+Search&meta=
> > >
> > Oh my god! Someone might learn about Shane Ruttle!
> >
> > I fail to see what the problem is here.
We appear to have suffered a complete site outage because "zwinger" is down.
What does this machine do, and why is there not a backup for it?
Are there any tasks which can *not* have backup machines?
--
Phil
[[en:User:Phil Boswell]]