CMS MADE SIMPLE FORGE

CMS Made Simple Core

 

[#2391] munge_string_to_url doesn't eliminate all bad characters

Created By: Rimas Kudelis (rimas)
Date Submitted: Fri Jul 04 04:45:29 -0400 2008

Assigned To: Ted Kulp (wishy)
Version: None
CMSMS Version: None
Severity: None
Resolution: Won't Fix
State: Closed
Summary:
munge_string_to_url doesn't eliminate all bad characters
Detailed Description:
Lines 1071 and 1072 in lib/misc.functions.php (cmsms v. 1.3.1) look like this:

include(dirname(__FILE__) . '/replacement.php');
$alias = str_replace($toreplace, $replacement, $alias);

They replace certain characters that aren't allowed in URLs with their ASCII
equivalents. However, this mechanism doesn't take into account all possible
characters (in fact, it only checks a small subset of Unicode). For example, it
ignores curly quotes, and if these happen to be used in News titles, you get an
invalid RSS feed as a result.

My suggestion is to use iconv() here, like this:

$alias = iconv('UTF-8', 'ASCII//TRANSLIT', $alias);

It effectively replaces all accented Latin characters and symbols like curly
quotes and longer dashes with their ASCII equivalents.

However, it seems that iconv() doesn't know how to transliterate Cyrillic, Greek
and, most probably, other non-Latin characters, and it stops processing the
string as soon as the first such character occurs. Thus, str_replace and
replacement.php are probably still needed, but they should focus on non-Latin
characters.
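
A rough sketch of how the two steps might be combined is below. It is only an
illustration, not the actual munge_string_to_url code: the function name
sketch_munge_alias and the tiny $nonLatinMap array are hypothetical stand-ins
(the latter for a non-Latin-only replacement.php), and the final cleanup step is
my own assumption. What iconv() does with characters it cannot transliterate
depends on the PHP version and the underlying iconv implementation, so the
sketch treats a failed conversion as "keep the input" and strips whatever is
left.

```php
<?php
// Hypothetical stand-in for a replacement.php that covers only non-Latin
// characters. A real table would be far larger.
$nonLatinMap = array(
    'П' => 'P', 'р' => 'r', 'и' => 'i', 'в' => 'v', 'е' => 'e', 'т' => 't',
);

// Sketch only: name, signature and the cleanup step are assumptions.
function sketch_munge_alias($alias, array $nonLatinMap)
{
    // 1. Replace the non-Latin characters we have an explicit mapping for,
    //    so that iconv() does not have to deal with them.
    $alias = strtr($alias, $nonLatinMap);

    // 2. Let iconv() transliterate accented Latin letters, curly quotes,
    //    long dashes and similar. Depending on the platform it may return
    //    false for characters it cannot handle; in that case keep the input.
    $converted = @iconv('UTF-8', 'ASCII//TRANSLIT', $alias);
    if ($converted !== false) {
        $alias = $converted;
    }

    // 3. Assumed cleanup: lowercase, then collapse every run of bytes that
    //    is not a-z or 0-9 into a single dash, so the result is always ASCII.
    $alias = strtolower($alias);
    $alias = preg_replace('/[^a-z0-9]+/', '-', $alias);

    return trim($alias, '-');
}
```

Called as sketch_munge_alias($title, $nonLatinMap), this should always return
a string made up only of a-z, 0-9 and dashes, even when iconv() gives up on
part of the input.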


History

Comments
Date: 2011-01-07 13:46
Posted By: Ronny Krijt (ronnyk)

This will be fixed when UTF-8 URLs are handled.
      
Updates

Updated: 2011-01-07 13:46
resolution_id: 6 => 8
severity_id: => 12
state: Open => Closed