Hi! Please help me.
Hungarian dates are in the form yyyy. mm. dd., or yyyy. <monthname> dd.,
without leading zeros.
In running text we use the month names, so I replace numbered months
with named months and remove leading zeros from day numbers.
The line in fixes.py is, for January:
(ur'(\d{1,4}(?:\]\])?)\. ?01\. ?(\d\d?)', ur'\1. január \2'),
This is OK, no problem up to this point.
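For anyone who wants to try the rule outside the bot, here is a small sketch in Python 3 syntax (the ur'' prefix exists only in Python 2). The names JANUARY and apply_fix are made up for this illustration and are not part of fixes.py. As far as I can tell, this rule leaves the day digits and anything after them untouched.

```python
import re

# Python 3 spelling of the January rule quoted above.
# JANUARY and apply_fix are hypothetical names used only in this sketch.
JANUARY = (r'(\d{1,4}(?:\]\])?)\. ?01\. ?(\d\d?)', r'\1. január \2')

def apply_fix(text, fix=JANUARY):
    """Apply one (pattern, replacement) pair, the tuple shape used above."""
    pattern, replacement = fix
    return re.sub(pattern, replacement, text)

print(apply_fix('2011.01.15'))        # 2011. január 15
print(apply_fix('[[2011]]. 01. 5'))   # [[2011]]. január 5
print(apply_fix('2011. 01. 05.'))     # 2011. január 05.  (zero and dot kept)
```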
The rule is that the day number has to be followed by a dot, unless it is
followed by a hyphen and a suffix.
The first level of enhancement is to add a dot where necessary:
- if there is already a dot, never remove it (a hyphen is often used
erroneously, and I don't want to make the problem bigger)
- if there is no dot, but the day is followed by a hyphen, don't add a
dot
- if the day number is followed by anything other than a dot or a hyphen,
add a dot after the number
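One possible way to express these three cases, sketched in Python 3 syntax, is a second rule with a negative lookahead that runs after the month rule. This is only a sketch under assumptions: the names MONTH_RULE, DOT_RULE and fix_date are invented here, and the end of the text is treated as "anything but a dot or a hyphen", which may or may not be what is wanted. The lookahead also excludes digits so that the regex cannot backtrack and split "15" into "1." and "5".

```python
import re

# Hypothetical two-step sketch; none of these names exist in fixes.py.
MONTH_RULE = (r'(\d{1,4}(?:\]\])?)\. ?01\. ?(\d\d?)', r'\1. január \2')

# Append a dot only when the day is NOT followed by a digit, a dot or a
# hyphen. An existing dot is never touched, and "5-én" stays as it is.
DOT_RULE = (r'(\d{1,4}(?:\]\])?\. január \d\d?)(?![\d.\-])', r'\1.')

def fix_date(text):
    """Run the month rule first, then the dot rule."""
    for pattern, replacement in (MONTH_RULE, DOT_RULE):
        text = re.sub(pattern, replacement, text)
    return text

print(fix_date('2011. 01. 5.'))      # 2011. január 5.     (dot kept)
print(fix_date('2011. 01. 5-én'))    # 2011. január 5-én   (no dot added)
print(fix_date('2011. 01. 5 után'))  # 2011. január 5. után (dot added)
```

Splitting the work into two rules avoids needing a conditional replacement, since the (?(id/name)yes-pattern|no-pattern) construct selects what to match rather than what to substitute.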
I experimented with the (?(id/name)yes-pattern|no-pattern) syntax
(http://docs.python.org/py3k/library/re.html), but without any useful result.
Can you help me? There will be further levels once this task is solved,
because users are very creative in making errors.
Further problems are:
- A hyphen is often used instead of an en dash when describing an interval
between two dates. In this case a dot and a space are required between the
day number and the en dash. I don't want to correct this in this session or
this fix if it is too difficult, but the dot should not be removed in this
case (whether or not they wrote a space).
- Sometimes a hyphen with no dot is correct, but there is an extra space
that should be removed. This can be recognized by listing a limited set of
suffixes after the hyphen in the regex.
- Sometimes a word follows the day but the space is omitted and should
be supplied.
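The last two points might be sketched roughly as below, again in Python 3 syntax and only for January. Everything here is an assumption made for illustration: the suffix list is a guess and surely incomplete, the names SUFFIXES, SUFFIX_SPACE_RULE, MISSING_SPACE_RULE and tidy are invented, and the missing-space rule only handles the case where the dot is already present. The interval case is deliberately left alone.

```python
import re

# Assumed, incomplete suffix list; extend it to whatever really occurs.
SUFFIXES = r'(?:án|én|ig|től|tól)'

# Remove stray spaces around a hyphen that introduces a known suffix.
# The trailing \b keeps a longer word from matching by its first letters.
SUFFIX_SPACE_RULE = (r'(január \d\d?) ?- ?(' + SUFFIXES + r')\b', r'\1-\2')

# Supply the missing space when a letter directly follows "day + dot".
MISSING_SPACE_RULE = (r'(január \d\d?\.)(?=[^\W\d_])', r'\1 ')

def tidy(text):
    """Apply both hypothetical clean-up rules in order."""
    for pattern, replacement in (SUFFIX_SPACE_RULE, MISSING_SPACE_RULE):
        text = re.sub(pattern, replacement, text)
    return text

print(tidy('2011. január 5 -én'))    # 2011. január 5-én
print(tidy('2011. január 5.után'))   # 2011. január 5. után
print(tidy('2011. január 5 - 10.'))  # unchanged: no listed suffix follows
```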
--
Bináris