Sven Hartrumpf wrote:
Hi.
A question about dumpHTML.inc:
function getFriendlyName( $name ) {
	global $wgLang;
	# Replace illegal characters for Windows paths with underscores
	$friendlyName = strtr( $name, '/\\*?"<>|~', '_________' );
	# Work out lower case form. We assume we're on a system with case-insensitive
	# filenames, so unless the case is of a special form, we have to disambiguate
	if ( function_exists( 'mb_strtolower' ) ) {
		$lowerCase = $wgLang->ucfirst( mb_strtolower( $name ) );
	} else {
		$lowerCase = ucfirst( strtolower( $name ) );
	}
	# Make it mostly unique
	if ( $lowerCase != $friendlyName ) {
		$friendlyName .= '_' . substr(md5( $name ), 0, 4);
	}
	# Handle colon specially by replacing it with tilde
	# Thus we reduce the number of paths with hashes appended
	$friendlyName = str_replace( ':', '~', $friendlyName );
	return $friendlyName;
}
How can this last str_replace "reduce the number of paths with hashes
appended"?
The hash is appended _before_ this line.
Sorry for my ignorance.
Sven
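I think it reduces the number of paths with hashes appended compared
with handling ':' in the initial strtr call.
The check compares $friendlyName against $lowerCase, and $lowerCase is
computed from the raw $name, so it still contains every ':'.
':' is a very common character in MediaWiki titles because of namespaces,
and if it were replaced before the check, lots and lots of pages would
fail the $lowerCase != $friendlyName test and get a hash appended.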
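A minimal sketch comparing the two orderings, assuming ASCII titles so that plain ucfirst(strtolower()) can stand in for $wgLang->ucfirst(mb_strtolower()). The function name friendlyName() and its $colonInStrtr flag are hypothetical, not part of dumpHTML.inc. With the original order, a title whose only special character is ':' passes the check without a hash; moving ':' into the strtr call makes the same title pick up a hash suffix.

```php
<?php
// Hypothetical, simplified version of getFriendlyName() for illustration.
// Assumes ASCII titles: ucfirst(strtolower()) replaces $wgLang->ucfirst(mb_strtolower()).
function friendlyName(string $name, bool $colonInStrtr): string {
    $from = '/\\*?"<>|~';   // 9 characters: / \ * ? " < > | ~
    $to   = '_________';    // 9 underscores
    if ($colonInStrtr) {
        // Hypothetical variant: map ':' to '~' inside the same strtr call.
        $from .= ':';
        $to   .= '~';
    }
    $friendly  = strtr($name, $from, $to);
    // $lowerCase is built from the raw $name, so it still contains any ':'.
    $lowerCase = ucfirst(strtolower($name));
    if ($lowerCase !== $friendly) {
        $friendly .= '_' . substr(md5($name), 0, 4);
    }
    if (!$colonInStrtr) {
        // Original order: colons are swapped only after the hash check.
        $friendly = str_replace(':', '~', $friendly);
    }
    return $friendly;
}

echo friendlyName('Ratio 3:4', false), "\n"; // Ratio 3~4
echo friendlyName('Ratio 3:4', true), "\n";  // Ratio 3~4_ followed by 4 hex digits
```

'Ratio 3:4' is just an arbitrary title that already matches its own $lowerCase form, so the colon is the only thing that can trigger the hash.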