Maarten,
I applied some [reusable] fixes to the dump, which are visible in the tool, along with a character encoding problem that I was too lazy to fix and will fix if this approach goes ahead.
However, only Vicenç Riullop replied, and he seems to lack the technical ability [that was my understanding]; he was trying to bring more technical people into this. I can't carry this on by myself, as we're getting busier with the WLM kickoff.
In the meantime, is there any method in the API to export sublists in Wiki format? Or maybe converting to CSV to use a tool I saw somewhere...?
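For the CSV route, a minimal sketch of what such a conversion could look like, assuming the first CSV row holds the column names and that a plain wikitable is acceptable (the real monument lists may use per-row templates instead, which this does not attempt). The function name `csv_to_wikitable` is hypothetical, not part of any existing tool:

```python
import csv
import io


def csv_to_wikitable(csv_text, css_class="wikitable sortable"):
    """Convert CSV text (first row = header) into MediaWiki table markup.

    Assumption: cell values contain no '|' characters; those would need
    escaping (for example with {{!}}) before pasting into a wiki page.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ""
    header, body = rows[0], rows[1:]
    lines = ['{| class="%s"' % css_class]
    lines.append("! " + " !! ".join(header))
    for row in body:
        lines.append("|-")  # row separator
        lines.append("| " + " || ".join(row))
    lines.append("|}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(csv_to_wikitable("id,name\n1,Torre de Belém\n"))
```

The output can be pasted into a list page as-is; mapping columns onto a row template would be a small change to the loop body.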
-NT
On 28-08-2011 16:43, Maarten Dammers wrote:
Hi Nuno,
On 25-8-2011 11:41, Nuno Tavares wrote:
On 25-08-2011 09:28, Maarten Dammers wrote:
Of course I'm getting it from the wiki lists. :-)
The basis is at http://commons.wikimedia.org/wiki/Commons:Wiki_Loves_Monuments_2011/Structur...
The overview of the database is at http://commons.wikimedia.org/wiki/Commons:Wiki_Loves_Monuments_2011/Monument...
The source is at https://svn.toolserver.org/svnroot/p_erfgoed/erfgoedbot/
On the input side I could really use some help with:
- Adding more sources to the database
- Tweaking the current sources
- Adding some converters (for example for different coordinate formats)
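As an illustration of the kind of converter meant in the last item, here is a minimal sketch that turns a degrees/minutes/seconds string into decimal degrees. It assumes inputs shaped like `38°41'30"N`; the function name `dms_to_decimal` and the accepted input format are assumptions for the example, not something taken from the erfgoedbot source:

```python
import re

# Degrees, minutes, seconds (optionally fractional) and a hemisphere letter,
# separated by arbitrary non-digit characters such as °, ' and ".
_DMS = re.compile(
    r"^\s*(\d+)\D+(\d+)\D+(\d+(?:\.\d+)?)\D*?([NSEW])\s*$",
    re.IGNORECASE,
)


def dms_to_decimal(text):
    """Convert a DMS coordinate such as 38°41'30"N to decimal degrees."""
    match = _DMS.match(text)
    if match is None:
        raise ValueError("unrecognised coordinate: %r" % (text,))
    degrees, minutes, seconds, hemisphere = match.groups()
    value = int(degrees) + int(minutes) / 60 + float(seconds) / 3600
    # South and west are negative by convention.
    if hemisphere.upper() in ("S", "W"):
        value = -value
    return value


if __name__ == "__main__":
    print(dms_to_decimal("38°41'30\"N"), dms_to_decimal("9°12'54\"W"))
```

Strings that do not match raise `ValueError`, so a harvester could log them for manual correction rather than store a wrong coordinate.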
*Sigh* Had you used Perl, I could have expanded the tool considerably. It would take me ages to do it in Python...
Only the harvest part is Python. The output (API) is all PHP.
What I was about to propose would be something like this flow:
- grab your database dump
- use data quality tools to apply corrections (it's far easier for me)
- load the result into our tools/plist/ so it can be exported in wiki format
- paste the exported lists into the Wikipedia lists
And then your bot would grab everything fine, I hope.
It will. Did you manage to do this?
On the output side I really need help:
- Need some clean up functions to be implemented
- Option to filter on monuments with or without images
- KML output needs to be prettified
- (more)
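To make the image filter concrete, a minimal sketch of the logic only, written in Python even though the API itself is PHP. It assumes each monument is a dict and that a field named `image` holds the file name (empty or missing when there is no picture); both the field name and the function name are assumptions for the example:

```python
def filter_by_image(monuments, has_image):
    """Return monuments with an image (has_image=True) or without (False).

    Assumption: the 'image' field is a string, and None, '' or
    whitespace-only values all mean "no image".
    """
    return [
        m for m in monuments
        if bool((m.get("image") or "").strip()) == has_image
    ]


if __name__ == "__main__":
    sample = [
        {"id": "1", "image": "Torre de Belem.jpg"},
        {"id": "2", "image": ""},
        {"id": "3"},
    ]
    print([m["id"] for m in filter_by_image(sample, has_image=False)])
```

In the PHP API the same idea would most likely become a condition on the query rather than a filter over fetched rows.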
I'm updating the database now. I'm afraid the weekly run failed because of some Toolserver SQL issues. I will schedule it to run every day now. Did you add the coordinates for Portugal, Nuno?
We are in the process of copying the database into the wiki lists (step 4 above), so you should grab them in the following days.
Anyway, I just converted our table to monuments_all (you probably noticed in the last index2.php listing). I can do the same for monuments_pt if you want an updated version quickly, as this process will take some time to finish...
I changed the bot to run every night so you have a fresh copy every morning.
Maarten
Wiki Loves Monuments mailing list WikiLovesMonuments@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikilovesmonuments http://www.wikilovesmonuments.eu