Hi, dear colleagues!
We need some technical help.
I've created a template, *Шаблон:WLM-рядок* (http://uk.wikipedia.org/wiki/%D0%A8%D0%B0%D0%B1%D0%BB%D0%BE%D0%BD:WLM-%D1%80%D1%8F%D0%B4%D0%BE%D0%BA), nearly the same as the Polish *Wikiprojekt:Wiki Lubi Zabytki/wykazy/wiersz*, and now I have 2 open questions about continuing the work.
1. How can I automatically transform a list from standard wikitable form into templated form? For example, here http://uk.wikipedia.org/wiki/%D0%92%D1%96%D0%BA%D1%96%D0%BF%D0%B5%D0%B4%D1%96%D1%8F:%D0%92%D1%96%D0%BA%D1%96_%D0%BB%D1%8E%D0%B1%D0%B8%D1%82%D1%8C_%D0%BF%D0%B0%D0%BC%27%D1%8F%D1%82%D0%BA%D0%B8/%D0%92%D1%96%D0%BD%D0%BD%D0%B8%D1%86%D1%8C%D0%BA%D0%B0_%D0%BE%D0%B1%D0%BB%D0%B0%D1%81%D1%82%D1%8C/%D0%96%D0%BC%D0%B5%D1%80%D0%B8%D0%BD%D0%BA%D0%B0 is a table where I've templated only the first 2 rows; the others are not done yet. Of course the whole list must be templated, but doing it by hand is too tedious.
2. How can I automatically make more complex transformations when a list has more parameters? For example, here http://uk.wikipedia.org/wiki/%D0%9F%D0%B0%D0%BC%27%D1%8F%D1%82%D0%BA%D0%B8_%D1%96%D1%81%D1%82%D0%BE%D1%80%D1%96%D1%97_%D0%91%D0%B0%D1%80%D1%81%D1%8C%D0%BA%D0%BE%D0%B3%D0%BE_%D1%80%D0%B0%D0%B9%D0%BE%D0%BD%D1%83 we have 10 parameters; only 4 of them are needed for WLM lists, and 3 are missing. So this list also needs transformation, and doing it by hand is also too tedious.
The problem is that hundreds of tables need such transformations. For Vinnica district alone (http://uk.wikipedia.org/wiki/%D0%9A%D0%B0%D1%82%D0%B5%D0%B3%D0%BE%D1%80%D1%96%D1%8F:%D0%9F%D0%B5%D1%80%D0%B5%D0%BB%D1%96%D0%BA%D0%B8_%D0%BF%D0%B0%D0%BC%27%D1%8F%D1%82%D0%BE%D0%BA_%D0%92%D1%96%D0%BD%D0%BD%D0%B8%D1%86%D1%8C%D0%BA%D0%BE%D1%97_%D0%BE%D0%B1%D0%BB%D0%B0%D1%81%D1%82%D1%96) we have 27 lists of historical monuments, 28 of architectural monuments, 30 of archaeological monuments, 1 large list of memorials, and additionally 5 lists of monuments in cities. All of them have to be transformed, so I need a script that can do it.
-- Regards, Andrij
Hi Андрій, just a few remarks from best practice (I'm sure one of our technical people will offer a more detailed approach here...).
On 28.06.12 10:04, Андрій Бондаренко wrote:
- How can I automatically transform a list from standard wikitable form into templated form? For example, here http://uk.wikipedia.org/wiki/%D0%92%D1%96%D0%BA%D1%96%D0%BF%D0%B5%D0%B4%D1%96%D1%8F:%D0%92%D1%96%D0%BA%D1%96_%D0%BB%D1%8E%D0%B1%D0%B8%D1%82%D1%8C_%D0%BF%D0%B0%D0%BC%27%D1%8F%D1%82%D0%BA%D0%B8/%D0%92%D1%96%D0%BD%D0%BD%D0%B8%D1%86%D1%8C%D0%BA%D0%B0_%D0%BE%D0%B1%D0%BB%D0%B0%D1%81%D1%82%D1%8C/%D0%96%D0%BC%D0%B5%D1%80%D0%B8%D0%BD%D0%BA%D0%B0 is a table where I've templated only the first 2 rows; the others are not done yet. Of course the whole list must be templated, but doing it by hand is too tedious.
I've seen a bot convert tables with the help of regular expressions (if the tables have a reasonably clear structure), but it's much easier to convert from CSV (Calc/Excel) files or from a database. Did you edit the lists a lot after putting them into wiki tables?
- How can I automatically make more complex transformations when a list has more parameters? For example, here http://uk.wikipedia.org/wiki/%D0%9F%D0%B0%D0%BC%27%D1%8F%D1%82%D0%BA%D0%B8_%D1%96%D1%81%D1%82%D0%BE%D1%80%D1%96%D1%97_%D0%91%D0%B0%D1%80%D1%81%D1%8C%D0%BA%D0%BE%D0%B3%D0%BE_%D1%80%D0%B0%D0%B9%D0%BE%D0%BD%D1%83 we have 10 parameters; only 4 of them are needed for WLM lists, and 3 are missing. So this list also needs transformation, and doing it by hand is also too tedious.
You should definitely add the most common of your Ukrainian parameters to your template: parameters can be left empty if not needed, or filled in later. However, it's much easier to put parameters in at the beginning than to add them later.
I'll be in Ukraine on vacation from mid-July; if I can be of any help (other than taking images of remote monuments ;-)), just tell me.
Regards, Elke
Hi, Elke
On 2012/6/29, elya <ew_wp@web.de> wrote:
I've seen a bot convert tables with the help of regular expressions (if the tables have a reasonably clear structure), but it's much easier to convert from CSV (Calc/Excel) files or from a database. Did you edit the lists a lot after putting them into wiki tables?
Some of them yes, some of them not yet. It's much easier to work with lists in wikitable form; they look clearer.
You should definitely add the most common of your Ukrainian parameters to your template: parameters can be left empty if not needed, or filled in later. However, it's much easier to put parameters in at the beginning than to add them later.
Of course. Our parameters are as follows:
| ID = (special ID for WLM)
| назва = (name of the object)
| рік = (year of its creation)
| нас_пункт = (city or village)
| адреса = (address)
| координати = (coordinates)
| охоронний номер = (official number)
| тип = (type of monument)
| фото = (photo)
| галерея = (link to Commons gallery)
The difficulty is that the lists we receive from our government have different structures. They don't have parameters such as coordinates or photo, but sometimes have additional ones, such as the author of a monument, its height, the protected area and so on. A list-to-template transformation should make standardizing the lists easy.
I think the script should add the parameter names like this:
- "|- |" is transformed to "}} {{WLM-рядок|ID ="
- the first " || " is transformed to "|назва ="
- the second " || " is transformed to "|рік ="
- the third " || " is transformed to "|нас_пункт ="
and so on...
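The positional rule above (first cell to ID, second to назва, and so on) can be sketched in Python. This is a minimal sketch under stated assumptions, not a tested bot: it assumes each data row is a single line starting with "|", cells are separated by "||", and the column order matches the parameter list given earlier. Rows spread over several lines, cells with their own attributes (style="..." | value) and columns beyond the parameter list are not handled; the function names are made up for illustration.

```python
# Sketch: turn the data rows of a simple wikitable into {{WLM-рядок}} calls.
# Assumptions: each data row is one line starting with "|", cells are separated
# by "||", and the cell order matches PARAMS (the parameter list above).

PARAMS = ["ID", "назва", "рік", "нас_пункт", "адреса",
          "координати", "охоронний номер", "тип", "фото", "галерея"]


def row_to_template(cells, params=PARAMS, template="WLM-рядок"):
    """Pair cells with parameter names by position; missing cells stay empty.
    Cells beyond the end of params are silently dropped."""
    lines = ["{{" + template]
    for i, name in enumerate(params):
        value = cells[i].strip() if i < len(cells) else ""
        lines.append(f"|{name} = {value}")
    lines.append("}}")
    return "\n".join(lines)


def convert_table(wikitext, params=PARAMS):
    """Return one template call per data row. Lines starting with "{|" or "!"
    (table start, headers), "|-" (row separator), "|}" and "|+" are skipped."""
    calls = []
    for raw in wikitext.splitlines():
        line = raw.strip()
        if line.startswith("|") and not line.startswith(("|-", "|}", "|+")):
            calls.append(row_to_template(line[1:].split("||"), params))
    return "\n".join(calls)


if __name__ == "__main__":
    # Made-up sample row, for illustration only.
    sample = "{| class=\"wikitable\"\n! ID !! назва !! рік\n|-\n| 05-101 || Церква || 1900\n|}"
    print(convert_table(sample))
```

Only the rows are converted; the table header ("{|" and "!" lines) and the closing "|}" still need adjusting by hand, and the output should be checked before it is saved to the wiki.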
I'll be in Ukraine on vacation from mid-July; if I can be of any help (other than taking images of remote monuments ;-)), just tell me.
OK, welcome! If you come after Wikimania, we could organize a wikimeetup.
-- Андрій Бондаренко
Hi Andrey, I don't have the time to help you with your specific problem, but I may be able to give you some hints. For example, your template has an interwiki link to the "Rijksmonument" template on the English Wikipedia, but this *should* be an interwiki link to the English translation of your Ukrainian table row template. To help you, I can show you what I did for the Belgian Wallonia lists. Belgium is trilingual (Dutch/German/French), so English seemed a logical place to put the lists, where no one would object, since almost none of the lists are in all three languages; they are either in Dutch/French, Dutch/German or German/French. I made a Wallonia set of templates and started translating the Dutch lists with Excel. I am nowhere near done, so this is a work in progress, but at least you can follow what is happening on all language Wikipedias with the same data.
By the way, this is manual work: there is no easy way to update and synchronize lists across all Wikipedias.
Here is a link to the English header template: http://en.wikipedia.org/wiki/Template:Table_header_Wallonia
Here is a link to the English row template: http://en.wikipedia.org/wiki/Template:Table_row_Wallonia
Unlike you, I had a structured list to begin with from the Dutch Wikipedia, and all I needed to do was add the English translation. With an Excel macro, I first dumped the Dutch list into Excel and created an extra field for the English translation in the description parameter of the table row template with this line:
    Selection.Replace What:="descr_nl", Replacement:="descr_en=house|descr_nl"
I then translated the other parameter names as follows:
    Selection.Replace What:="adres", Replacement:="address"
    Selection.Replace What:="bouwjaar", Replacement:="date"
    Selection.Replace What:="deelgemeente", Replacement:="town"
    Selection.Replace What:="gemeente", Replacement:="section"
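For readers who prefer Python to Excel, the same parameter renaming can be sketched with Python's re module. This is an illustration, not Jane's macro: the mapping is copied from the replacements above, and the function names are hypothetical. Matching the exact parameter name between "|" and "=" means "gemeente" can never be replaced inside "deelgemeente", whereas a plain text replacement only works because deelgemeente is handled first in the list above.

```python
import re

# Mapping copied from the Excel replacements above (Dutch -> English names).
RENAMES = {
    "adres": "address",
    "bouwjaar": "date",
    "deelgemeente": "town",
    "gemeente": "section",
}

# "|", optional spaces, the parameter name, optional spaces, "=".
PARAM_NAME = re.compile(r"(\|\s*)([^=|{}\n]+?)(\s*=)")


def rename_params(wikitext, renames=RENAMES):
    """Rename template parameters by exact name, independent of replacement order."""
    def swap(match):
        name = match.group(2)
        return match.group(1) + renames.get(name, name) + match.group(3)
    return PARAM_NAME.sub(swap, wikitext)


def add_english_description(wikitext, default="house"):
    """Same idea as the first Excel replacement: put descr_en=<default> before descr_nl."""
    return wikitext.replace("|descr_nl", f"|descr_en={default}|descr_nl")
```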
Since most of the monuments are houses, the default "house" is already in there, but to get the complete Dutch descriptions translated, I first extracted the Dutch descriptions, numbered them, and pasted them, one numbered line each, into Google Translate. The result was fairly usable but needs editing line by line.
Then, when I was done with the translations, I did a replacement line by line with this formula:
    =REPLACE(A5;65;5;VLOOKUP(B5;'rows nl-en'!A:B;2;FALSE))
Sorry if this is too technical; I know there are a lot of Excel users out there who can't read this because their localized version uses different function names, but at least it gives the gist. I am sure you can easily do this with a Python script as well.
Good luck,
Jane
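A rough Python counterpart of that formula, as a sketch: REPLACE(text; 65; 5; ...) overwrites 5 characters starting at the 65th, and VLOOKUP(...; FALSE) is an exact-match lookup, modelled here with a dict standing in for the 'rows nl-en' sheet. The assumption that those 5 characters are the default "house" is inferred from the message, not stated in it; if the offset varies from row to row, replacing the placeholder text itself (the hypothetical fill_description below) is safer than a fixed position.

```python
# Python counterparts of  =REPLACE(A5;65;5;VLOOKUP(B5;'rows nl-en'!A:B;2;FALSE))
# The translations dict stands in for the two-column 'rows nl-en' sheet.


def excel_replace(text, start, num_chars, new_text):
    """Like Excel REPLACE(): start is 1-based; num_chars characters are replaced."""
    i = start - 1
    return text[:i] + new_text + text[i + num_chars:]


def translate_row(row_text, key, translations, start=65, num_chars=5):
    """Fixed-offset replacement with an exact-match lookup.
    A missing key raises KeyError, where Excel would show #N/A."""
    return excel_replace(row_text, start, num_chars, translations[key])


def fill_description(row_text, translation, placeholder="descr_en=house"):
    """Position-independent variant: replace the first default placeholder."""
    return row_text.replace(placeholder, "descr_en=" + translation, 1)
```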
2012/6/29 Андрій Бондаренко a1@wikimediaukraine.org.ua
Wiki Loves Monuments mailing list WikiLovesMonuments@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikilovesmonuments http://www.wikilovesmonuments.eu
Hi, Jane
Thank you for your reply, but my issue is not about translation. Translation is not relevant to us at all: we work with Ukrainian-language lists only. The issue is converting wikitables into template format.
-- Regards, Andrij
Andrij, I was afraid you would say that! My point was to show how you can merge fields within one row of a list. This assumes, of course, that your list is structured line by line! The idea is that you first need to define all required fields (and potentially required fields) in the row template, and then you can merge two or more fields into one table row parameter. Similarly, you can add fields to the wikitable or leave them out by adjusting the table header fields.
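Merging fields can be written down as a column map: each row template parameter lists the source columns that feed it. A hypothetical sketch in Python; the map below (name and author merged into назва, height dropped) is invented for illustration and would have to be written separately for each differently structured government list.

```python
# Hypothetical column map for one source list: template parameter -> indices
# of the source columns (0-based) that feed it. Several indices are merged into
# one parameter; columns that appear nowhere (e.g. height) are dropped.
COLUMN_MAP = {
    "ID": [0],
    "назва": [1, 4],  # e.g. name + author (illustrative only)
    "рік": [2],
    "адреса": [3],
}


def map_row(cells, column_map=COLUMN_MAP, sep=", "):
    """Return {parameter: value}, joining the non-empty merged cells with sep."""
    params = {}
    for name, indices in column_map.items():
        values = [cells[i].strip() for i in indices
                  if i < len(cells) and cells[i].strip()]
        params[name] = sep.join(values)
    return params
```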
If your lists are just 100% prose, then of course it all remains quite a lot of work for one person, and you will need the assistance of LOTS of volunteers.
best, Jane
2012/6/29 Андрій Бондаренко bondareandre@gmail.com
Hi, Jane.
The idea is that you first need to define all required fields (and potentially required fields) in the row template, and then you can merge two or more fields into one table row parameter.
Of course you are right. That's exactly why I wrote this letter *after* the row template was created and the required fields were defined :))
Now the only question is how to create a program that will change the " || " markers to "|*parameter name* = ".
For example:
Now we have:
|-
| a1 || b1 || c1 || d1
|-
| a2 || b2 || c2 || d2
After conversion we must have:
{{WLM-рядок| parameter name1 = a1 | parameter name2 = b1 | parameter name3 = c1 | parameter name4 = d1}}
{{WLM-рядок| parameter name1 = a2 | parameter name2 = b2 | parameter name3 = c2 | parameter name4 = d2}}
Automating this conversion is what my request is about.
-- Regards, Andrij
Andrij, again, I am a heavy Excel user, so I can only tell you how to do this in Excel. To do what you require, I would first dump the list into Notepad and then reimport it into Excel with the delimiter "|". This gives you a column for each field (including extra empty columns for the double "||", but that doesn't matter).
Next I would replace the text "|" in the appropriate columns with the "|*parameter name* = " text, and when I am done, dump it back into Notepad for further editing of the headers, footers, etc.
Once you have done one list, you can create an Excel macro to do this for all other lists.
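The same "do one list, then repeat for all the others" idea can be sketched as a Python loop over files. This assumes, hypothetically, that each list's wikitext has been saved as a UTF-8 .txt file in one folder and that all lists in that folder share one column order; the names are illustrative, and uploading the results back to the wiki is not covered. Unlike an import with the single-character delimiter "|", it splits on "||", so empty cells keep their column and piped links survive.

```python
from pathlib import Path


def split_row(line):
    """Split one wikitable data line into cells.

    Splitting on "||" (rather than on every "|") keeps empty cells in their
    column and leaves piped links such as [[Київ|місто]] intact."""
    return [cell.strip() for cell in line.lstrip("|").split("||")]


def convert_all(in_dir, out_dir, params, template="WLM-рядок"):
    """Apply the same conversion to every saved list (*.txt, UTF-8) in in_dir
    and write the result under the same file name in out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for path in sorted(Path(in_dir).glob("*.txt")):
        calls = []
        for line in path.read_text(encoding="utf-8").splitlines():
            # Data rows only: skip row separators, table end and captions.
            if line.startswith("|") and not line.startswith(("|-", "|}", "|+")):
                cells = split_row(line)
                pairs = [f"|{name} = {cells[i] if i < len(cells) else ''}"
                         for i, name in enumerate(params)]
                calls.append("\n".join(["{{" + template, *pairs, "}}"]))
        (out / path.name).write_text("\n".join(calls) + "\n", encoding="utf-8")
```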
Jane
2012/6/30 Андрій Бондаренко bondareandre@gmail.com
Good morning Jane,
On 01.07.12 10:35, Jane Darnell wrote:
Again, I am a heavy Excel user, so I can only tell you how to do this in Excel. To do what you require, I would first dump the list into Notepad and then reimport it into Excel with the delimiter "|". This gives you a column for each field (including extra empty columns for the double "||", but that doesn't matter).
Next I would replace the text "|" in the appropriate columns with the "|/parameter name/ = " text, and when I am done, dump it back into Notepad for further editing of the headers, footers, etc.
Once you have done one list, you can create an Excel macro to do this for all other lists.
Did you create one? I'd like to see it and learn; I did many of the above steps the same way last year, manually...
Elke
Hi everybody,
We have found quite a simple solution on this page: http://ua.wikimedia.org/wiki/%D0%9E%D0%B1%D0%B3%D0%BE%D0%B2%D0%BE%D1%80%D0%B...
Thanks, everybody, for your replies. Maarten's idea was especially interesting (though it doesn't work, I believe it should).
-- Andrij
Andrij, Great news! It never occurred to me to use AWB for this, so I will try it out soon. Thanks for the feedback.
Elke, my Excel template is for translating the descriptions of Belgian Wallonia WLM lists. I will try to pare it down to one simple table example, such as this one (non-WLM, but the same idea): http://en.wikipedia.org/wiki/Netherlandish_Proverbs When I have it done, I will send it to you by mail, and we'll see whether you can follow my Excel tabs!
So the basic steps are:
1) Convert the target list to numbered Excel lines.
2) Using the Excel template (template table rows with an extra column for the line numbers), extract the description text to be translated into an extra tab with numbered lines.
3) Paste the numbered lines into Google Translate.
4) Copy the result into the extraction page, line up the numbers, and check the translations.
5) Add, not replace, the translated descriptions line by line into the table rows (the original description from the cultural heritage agency is needed for proper attribution).
6) Adjust the headers/footers.
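Steps 2, 4 and 5 can also be sketched in Python instead of Excel tabs. A minimal sketch, assuming each row is one {{...}} template call with a descr_nl parameter whose text contains no "|" (so piped links in descriptions would be cut off), that the rows do not yet contain descr_en, and that the translator keeps the "N. " prefixes; the function names are made up.

```python
import re

# Assumed row format: {{... |descr_nl=<text> ...}} with no "|" inside the text.
DESCR_NL = re.compile(r"\|\s*descr_nl\s*=\s*([^|}\n]*)")


def extract_descriptions(rows):
    """Step 2: one numbered line per row, ready to paste into a translator."""
    lines = []
    for n, row in enumerate(rows, start=1):
        match = DESCR_NL.search(row)
        text = match.group(1).strip() if match else ""
        lines.append(f"{n}. {text}")
    return lines


def merge_translations(rows, translated_lines):
    """Steps 4-5: line the translations up by number and add descr_en in front
    of descr_nl; the original Dutch text is kept for attribution."""
    by_number = {}
    for line in translated_lines:
        number, _, text = line.partition(". ")
        if number.strip().isdigit():
            by_number[int(number)] = text.strip()
    merged = []
    for n, row in enumerate(rows, start=1):
        if n in by_number:
            row = row.replace("|descr_nl", f"|descr_en={by_number[n]}|descr_nl", 1)
        merged.append(row)
    return merged
```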
Jane
2012/7/2 Андрій Бондаренко bondareandre@gmail.com
Hi Andrij,
On 2-7-2012 9:41, Андрій Бондаренко wrote:
Hi everybody,
We have found quite a simple solution on this page: http://ua.wikimedia.org/wiki/%D0%9E%D0%B1%D0%B3%D0%BE%D0%B2%D0%BE%D1%80%D0%B...
Thanks, everybody, for your replies. Maarten's idea was especially interesting (though it doesn't work, I believe it should).
That's not a solution: unnamed parameters won't work. This is just a workaround. Please do it properly this time.
Maarten
On 28.06.12 10:04, Андрій Бондаренко wrote:
We need some technical help.
One more thing:
Last year a WLM fellow recommended this tool for cleaning up "messy" data:
http://code.google.com/p/google-refine/
It could help too, though I haven't tried it personally.
regards elke
It seems interesting, but how could it solve the main issue, the wikitable-to-template conversion?
2012/6/29 elya ew_wp@web.de
-- Andrij
Hi Andrij,
On 29-6-2012 22:59, Андрій Бондаренко wrote:
It seems interesting, but how could it solve the main issue, the wikitable-to-template conversion?
You could use pywikipedia. This line should be a good basis:
replace.py -lang:uk -regex "\|-[\s\r\n]+\|(.+)\|\|(.+)\|\|(.+)\|\|(.+)\|\|(.+)\|\|\|\|(.+)\|\|(.+)\|\|(.+)" "{{Templatename\n|p1=\1\n|p2=\2\n|p3=\3\n|p4=\4\n|p5=\5\n|p6=\6\n|p7=\7\n|p8=\8\n}}" -page:%D0%92%D1%96%D0%BA%D1%96%D0%BF%D0%B5%D0%B4%D1%96%D1%8F:%D0%92%D1%96%D0%BA%D1%96_%D0%BB%D1%8E%D0%B1%D0%B8%D1%82%D1%8C_%D0%BF%D0%B0%D0%BC%27%D1%8F%D1%82%D0%BA%D0%B8/%D0%92%D1%96%D0%BD%D0%BD%D0%B8%D1%86%D1%8C%D0%BA%D0%B0_%D0%BE%D0%B1%D0%BB%D0%B0%D1%81%D1%82%D1%8C/%D0%96%D0%BC%D0%B5%D1%80%D0%B8%D0%BD%D0%BA%D0%B0
The "\|-[\s\r\n]+\|(.+)\|\|..." pattern will match the row and its fields; the pipes are escaped as "\|" because a bare "|" in a regular expression means "or". "{{Templatename\n|p1=\1\n|p2=\2\n|p3=\3\n|p4=\4\n|p5=\5\n|p6=\6\n|p7=\7\n|p8=\8\n}}" can be modified to use your template with the right parameters.
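Since replace.py is based on Python's re module, a pattern like this can be tried locally before running the bot. A small sketch, assuming rows of the form "|-", newline, "| cell || cell || ..." with no "||" inside a cell; it builds the pattern for any number of fields instead of typing it by hand, and uses non-greedy groups plus \s* so that captured values come without surrounding spaces. Rows with fewer cells are left unchanged; in rows with more cells, the last parameter receives everything that is left. The helper names are invented for this example.

```python
import re


def row_pattern(n_fields):
    """Regex for one wikitable row: "|-", whitespace/newlines, "|", then
    n_fields cells separated by "||". Every pipe is escaped as \\|, because a
    bare | in a regular expression means "or"."""
    sep = r"\s*\|\|\s*"
    # Non-greedy groups stop at the first "||"; the last group runs to end of line.
    cells = sep.join([r"(.+?)"] * (n_fields - 1) + [r"(.+)"])
    return r"\|-\s+\|\s*" + cells


def row_replacement(template, params):
    """Replacement string: one named parameter per captured group (\\g<n>)."""
    body = "".join(f"\n|{name}=\\g<{i}>" for i, name in enumerate(params, start=1))
    return "{{" + template + body + "\n}}"


if __name__ == "__main__":
    page = "{|\n|-\n| a1 || b1 || c1\n|-\n| a2 || b2 || c2\n|}"
    print(re.sub(row_pattern(3), row_replacement("WLM-рядок", ["ID", "назва", "рік"]), page))
```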
You should probably not include WLM in the template name. WLM is just a project in September; your heritage project should outlast it! Something like "Пам'ятки_рядок", and similar for the header. You seem to have multiple sources. How exactly is everything structured? We might have to create multiple lists. We did this in Denmark, for example: they have 2 registers, so we created two sets of templates and lists to be able to handle that properly.
You should have a chat with Elke, she understands the language much better than I do ;-)
Maarten