Hi! Please help me.
Hungarian dates are written as yyyy. mm. dd. or yyyy. <monthname> dd., without leading zeros. In running text we use the month names, so I replace numbered months with named months and remove leading zeros from day numbers. The line in fixes.py for January is:
(ur'(\d{1,4}(?:]])?). ?01. ?(\d\d?)', ur'\1. január \2'),
This works; no problem up to this point.
The rule is that the day number must be followed by a dot, unless it is followed by a hyphen and a suffix. The first level of enhancement is to add the dot where it is missing.
- if there is a dot there, don't remove it anyway (a hyphen is often used erroneously, and I don't want to make a bigger problem)
- if there is no dot, but the day is followed by a hyphen, don't put a dot
- if there is anything but a dot or a hyphen after the day number, put a dot after the number
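One possible way to express these three rules without the conditional syntax is a pair of ordered rules using negative lookahead. This is only a sketch under some assumptions that are not in the original line: it is written for Python 3 (plain r'' strings instead of ur''), the dots are escaped so that they match a literal dot only, an optional leading zero of the day is dropped with 0?, and the names JANUARY_RULES and fix_january are made up for illustration.

```python
import re

# Assumptions (not from the original fixes.py line): dots are escaped,
# "0?" swallows a leading zero of the day, names are illustrative only.
JANUARY_RULES = [
    # Day NOT followed by a digit, a dot or a hyphen: add the missing dot.
    (r'(\d{1,4}(?:\]\])?)\. ?01\. ?0?(\d\d?)(?![\d.-])', r'\1. január \2.'),
    # Remaining cases (a dot or a hyphen follows): leave what follows untouched.
    (r'(\d{1,4}(?:\]\])?)\. ?01\. ?0?(\d\d?)(?!\d)', r'\1. január \2'),
]


def fix_january(text):
    """Apply the rules in order; the order matters."""
    for pattern, replacement in JANUARY_RULES:
        text = re.sub(pattern, replacement, text)
    return text


print(fix_january('2011. 01. 05 volt'))   # dot is added
print(fix_january('2011. 01. 05-én'))     # hyphen follows: no dot added
print(fix_january('2011. 01. 15.'))       # existing dot is kept
```

The \d inside the lookahead keeps the regex from backtracking to a one-digit day such as the "1" in "15". Because the first rule already turns "01" into "január", the second rule only sees the dates that the first one skipped.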
I experimented with the (?(id/name)yes-pattern|no-pattern) syntax (http://docs.python.org/py3k/library/re.html), but without any useful result. Can you help me? There will be further levels once this task is solved, because users are very creative in making errors. Further problems are:
- A hyphen is often used instead of an en dash to describe an interval between two dates. In this case a dot and a space are required between the day number and the en dash. I don't want to correct this in this session or this fix if it is too difficult, but the dot should not be removed in this case (whether or not they write a space).
- Sometimes a hyphen with no dot is correct, but there is an extra space that should be removed. This can be recognized by listing a limited set of suffixes after the hyphen in the regex.
- Sometimes a word follows the day but the space is omitted and should be supplied.
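The last two points could perhaps be handled by follow-up rules that run after the month name has been inserted. The sketch below is a rough illustration only: the suffix list is a small sample and certainly incomplete, and the names SUFFIXES, CLEANUP_RULES and cleanup_january are hypothetical. It is written for Python 3, where \b and \W are Unicode-aware for str patterns. The interval case is deliberately left alone, since a hyphen followed by a number does not match any suffix.

```python
import re

# Illustrative, incomplete sample of date suffixes (an assumption).
SUFFIXES = r'(?:jén|án|én|jei|ei|ai|ig|től|tól|i)'

CLEANUP_RULES = [
    # Remove stray spaces around the hyphen before a known suffix:
    # "január 5 -én" or "január 5- én" becomes "január 5-én".
    (r'(január \d\d?) ?- ?(' + SUFFIXES + r')\b', r'\1-\2'),
    # Supply the missing space when a letter directly follows the dot:
    # "január 5.volt" becomes "január 5. volt".
    (r'(január \d\d?\.)(?=[^\W\d_])', r'\1 '),
]


def cleanup_january(text):
    """Apply the follow-up rules in order."""
    for pattern, replacement in CLEANUP_RULES:
        text = re.sub(pattern, replacement, text)
    return text


print(cleanup_january('január 5 -én'))
print(cleanup_january('január 5.volt'))
print(cleanup_january('január 5 - 10'))  # interval: left unchanged
```

Limiting the first rule to a fixed suffix list is what keeps it from touching a hyphen that stands for an en dash.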