Hi,
Our setup is: MediaWiki 1.9.2 (r2132), PHP 5.2.2 (cgi-fcgi), MySQL 5.0.15-nt, running on IIS.
Our users sometimes author in MS Word and then copy and paste their article text into the MediaWiki edit box. Word often changes some characters automatically as you type, such as turning a hyphen into an em dash or a straight single quote into a curly one. These generally display fine in MediaWiki; however, we have to export the contents of the wiki to be imported into a final environment for client approval. Unfortunately, I don't have the authority to change the rather odd process. Since I think the users will continue to paste things in from Word, does anyone know of an extension or tool that would "clean" the text input into the wiki? I was unable to find anything like this in the Extensions pages.
- Courtney Christensen
I don't know of any existing extension that handles this, but creating one should be relatively straightforward. Here's a stub:

    $wgHooks['ArticleSave'][] = 'removeBadCharsOnSave';

    function removeBadCharsOnSave( &$article, &$user, &$text, &$summary ) {
        // do something to $text
        return true;
    }

Not sure exactly what to do to $text. PHP's native utf8_decode() may do the trick, or you may have to use preg_replace(), or something else. :/
-- Jim R. Wilson (jimbojw)
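One way the preg_replace() idea could look, as a minimal sketch only. It assumes the pasted text arrives as UTF-8 (MediaWiki's usual encoding), and the function name replaceSmartCharsRegex is hypothetical, not part of MediaWiki. Note that utf8_decode() converts to ISO-8859-1, so characters with no ISO-8859-1 equivalent, such as curly quotes, would come out as "?" rather than as plain ASCII.

```php
<?php
// Hypothetical helper (not a MediaWiki function): normalise Word-style
// punctuation in UTF-8 text. The /u modifier makes PCRE treat the
// pattern and subject as UTF-8, so \x{2019} matches a code point.
function replaceSmartCharsRegex( $text ) {
    $patterns = array(
        '/[\x{2018}\x{2019}]/u', // curly single quotes
        '/[\x{201C}\x{201D}]/u', // curly double quotes
        '/[\x{2013}\x{2014}]/u', // en dash and em dash
    );
    $replacements = array( "'", '"', '-' );
    // preg_replace() returns null if $text is not valid UTF-8.
    return preg_replace( $patterns, $replacements, $text );
}
```

Inside the stub above, the body would then be something like `$text = replaceSmartCharsRegex( $text );`, ideally with a null check before assigning.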
MediaWiki-l mailing list MediaWiki-l@lists.wikimedia.org http://lists.wikimedia.org/mailman/listinfo/mediawiki-l
Looks like the method described here is probably what you want:
http://shiflett.org/blog/2005/oct/convert-smart-quotes-with-php
So:

    $wgHooks['ArticleSave'][] = 'removeBadCharsOnSave';

    function removeBadCharsOnSave( &$article, &$user, &$text, &$summary ) {
        // Windows-1252 bytes: curly single quotes (145, 146),
        // curly double quotes (147, 148), em dash (151)
        $search = array( chr(145), chr(146), chr(147), chr(148), chr(151) );
        $replace = array( "'", "'", '"', '"', '-' );
        $text = str_replace( $search, $replace, $text );
        return true;
    }

Disclaimer: I have not tested any of this code, YMMV
-- Jim
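A note on encodings, with a sketch. chr(145) through chr(151) are single bytes from the Windows-1252 code page. If the wiki text is UTF-8, the same characters are three-byte sequences, and a lone byte such as 0x91 can also occur inside other multi-byte characters, so replacing it on its own may damage unrelated text. A UTF-8 variant might look like the following; the function name replaceSmartCharsUtf8 is hypothetical, and whether a given installation needs this variant depends on how its text is actually encoded.

```php
<?php
// Hypothetical helper: the same replacement as above, but matching the
// UTF-8 byte sequences instead of the Windows-1252 single bytes.
function replaceSmartCharsUtf8( $text ) {
    $search = array(
        "\xE2\x80\x98", // U+2018 left single quote
        "\xE2\x80\x99", // U+2019 right single quote
        "\xE2\x80\x9C", // U+201C left double quote
        "\xE2\x80\x9D", // U+201D right double quote
        "\xE2\x80\x93", // U+2013 en dash
        "\xE2\x80\x94", // U+2014 em dash
    );
    $replace = array( "'", "'", '"', '"', '-', '-' );
    return str_replace( $search, $replace, $text );
}
```

Because str_replace() works on whole byte sequences here, other multi-byte characters in the text are left untouched.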
That looks like exactly what I need. Thank you, Jim, and I applaud your mad search skills. I'd never heard them called "smart quotes". If anyone cares to know the outcome, I'll let you know once I've tested.
-Courtney
It works; the code you posted was perfect. It seems to change the characters to their HTML special character equivalents, which I wasn't expecting, but which is infinitely better than before.
Thanks again! -Courtney
-----Original Message----- From: mediawiki-l-bounces@lists.wikimedia.org [mailto:mediawiki-l-bounces@lists.wikimedia.org] On Behalf Of Christensen, Courtney Sent: Tuesday, September 25, 2007 3:14 PM To: MediaWiki announcements and site admin list Subject: Re: [Mediawiki-l] users copy and paste "bad" special characters from MS Word
That's interesting - and unexpected. :(
I wonder why they'd be htmlspecialchar'd? Maybe running the result through html_entity_decode() might help. Not sure why that'd be necessary though.
Anyway, glad to help!
-- Jim
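For reference, a small sketch of what html_entity_decode() does with entities such as &rsquo; or &#8217;. It is an illustration only: the thread does not establish where the entities come from, and the wrapper name decodeEntitiesUtf8 is hypothetical.

```php
<?php
// Hypothetical wrapper: decode named and numeric HTML entities into
// UTF-8 characters. ENT_QUOTES also decodes &quot; and &#039;.
function decodeEntitiesUtf8( $text ) {
    return html_entity_decode( $text, ENT_QUOTES, 'UTF-8' );
}
```

Decoding turns the entities back into the curly characters themselves, so it would normally run before a replacement step like the one in the hook, not after it.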