mm wrote:
Is there a place where I can read about the relation between cur_title and the last part of the URL? I am writing a script and can't find some rows in cur using a query like: SELECT cur_id, cur_title FROM cur WHERE cur_title BETWEEN 'title1' AND 'title1' for some title1 from a URL like:
Ok, several things:

1) URLs may be URL-encoded. Titles stored in the database are not.

2) Unless a title actually has a backslash in it (like ') it will not have a backslash in it. Apostrophes are just plain apostrophes. (But remember to encode your SQL queries -- SQL is *not* text, it's a protocol.)

3) page_namespace and page_title are always used in a pair, never page_title alone. (It's cur_namespace and cur_title, and old_namespace and old_title, in 1.4 and earlier.) page_namespace is an enumerated field; see Defines.php for the list of key numbers, and the various Language*.php files for the localized names of each.

4) If you're looking at an old SQL dump for the English Wikipedia or a handful of other languages, the data is encoded in ISO 8859-1 (or in places the nonstandard superset Windows-1252). In current databases all data is encoded in UTF-8. So if using an old database as a data source you may need to convert.
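To make the points above concrete, here is a rough Python sketch of going from the last part of a URL to a lookup in cur. Several things in it are assumptions and not taken from MediaWiki itself: the helper names are made up, NAMESPACES is only a small illustrative subset of the real list in Defines.php and Language*.php, titles are assumed to be stored with underscores instead of spaces, and sqlite3 stands in for whatever database driver your script actually uses.

```python
import sqlite3
from urllib.parse import unquote

# Illustrative subset only; the real key numbers are in Defines.php and the
# localized prefixes are in the Language*.php files.
NAMESPACES = {"Talk": 1, "User": 2, "User_talk": 3}


def url_part_to_db_key(url_part):
    """Turn the last part of a URL into a (namespace, title) pair."""
    text = unquote(url_part)        # undo URL-encoding: %27 becomes '
    text = text.replace(" ", "_")   # assumption: stored titles use underscores
    prefix, sep, rest = text.partition(":")
    if sep and prefix in NAMESPACES:
        return NAMESPACES[prefix], rest
    return 0, text                  # no known prefix: main namespace


def find_page(conn, url_part):
    """Look up a row by namespace AND title, never by title alone."""
    namespace, title = url_part_to_db_key(url_part)
    # Placeholders let the driver do the escaping, so an apostrophe in the
    # title is passed as data and never pasted into the SQL text.
    row = conn.execute(
        "SELECT cur_id, cur_title FROM cur "
        "WHERE cur_namespace = ? AND cur_title = ?",
        (namespace, title),
    )
    return row.fetchone()


def legacy_to_text(raw):
    """Decode bytes from an old ISO 8859-1 / Windows-1252 dump."""
    try:
        return raw.decode("cp1252")
    except UnicodeDecodeError:
        # A few byte values are undefined in Windows-1252; fall back.
        return raw.decode("latin-1")
```

With this, find_page(conn, "Rock_%27n%27_Roll") searches for the title Rock_'n'_Roll in namespace 0, and legacy_to_text(raw).encode("utf-8") gives you UTF-8 bytes from a value read out of an old dump.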
See Title::getLocalUrl() etc if you want a closer look at how the URLs are generated.
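For the opposite direction, a very rough approximation of building the URL from a title might look like the following. This is not a port of Title::getLocalUrl(): the function name, the "/wiki/" path, and the exact set of characters left unencoded are all assumptions for illustration, and the real code has more rules.

```python
from urllib.parse import quote


def approximate_local_url(namespace_prefix, title, base_path="/wiki/"):
    """Build a URL path from a namespace prefix and a title (sketch only)."""
    key = title.replace(" ", "_")   # assumption: spaces become underscores
    full = f"{namespace_prefix}:{key}" if namespace_prefix else key
    # Percent-encode as UTF-8, leaving "/" and ":" readable.
    return base_path + quote(full, safe="/:")
```

Running unquote() on the last part of such a URL gives back the prefix and title, which is why a script has to decode before comparing against the database.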
-- brion vibber (brion @ pobox.com)