On Sun, Sep 9, 2012 at 6:34 PM, Roberto Flores f.roberto.isc@gmail.com wrote:
I have developed an offline Wikipedia, Wikibooks, Wiktionary, etc. app for the iPhone, which does a somewhat decent job at interpreting the wiki markup into HTML. However, there are too many templates for me to program (not to mention, it's a moving target). Without converting these templates, many articles are simply unreadable and useless.
Templates are dumped just like all other pages are. Have you found them in the dumps? Which dump are you looking at right now?
Could you please provide HTML dumps (I mean, with the templates pre-processed into HTML, everything else the same as now) every 3 or 4 months?
3 or 4 month frequency seems unlikely to be useful to many people. Otherwise no comment.
Or alternatively, could you make the template API available so I could import it in my program?
How would this template API function? What does import mean?
-Jeremy
Allow me to reply to each point:
(By the way, my offline app is called WikiGear Offline: http://itunes.apple.com/us/app/wikigear-offline/id453614487?mt=8)
Templates are dumped just like all other pages are...
Yes, but that's only a text description of what the template does. Code must be written to actually process them into HTML. There are tens of thousands of them, and some I can't even program myself (e.g., Wiktionary's conjugation templates). If they were already pre-processed into HTML inside the articles' contents, that would solve all of my problems.
What purpose would the dump serve? You don't want to keep the full dump on the device.
I made an indexing program that selects only content articles (namespaces included) and compresses it all to a reasonable size (e.g., about 7 GB for the English Wikipedia).
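For readers following along, the selection step described above could look roughly like the sketch below. It is not the code of the app discussed in this thread. It assumes the pages-articles XML dump carries a <ns> element per page and that namespace 0 holds the content articles; the function names iter_content_pages and compress_page are hypothetical, and zlib stands in for whatever compression the app really uses.

```python
import io
import zlib
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace-uri}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_content_pages(xml_stream, wanted_namespaces=frozenset({"0"})):
    """Yield (title, wikitext) for every page whose <ns> is wanted.

    Assumption: each <page> has <title> and <ns> children and keeps its
    wikitext in a <text> element inside <revision>.
    """
    for _event, elem in ET.iterparse(xml_stream, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title = ns = text = None
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "ns":
                ns = child.text
            elif name == "text":
                text = child.text
        if title is not None and ns in wanted_namespaces:
            yield title, text or ""
        # Drop the page's children so a large dump does not pile up in memory.
        elem.clear()


def compress_page(wikitext):
    """Compress one page's wikitext (zlib is only a placeholder choice)."""
    return zlib.compress(wikitext.encode("utf-8"))
```

Passing an open dump file (or a decompressing stream wrapped around it) as xml_stream streams the pages one at a time. Template pages live in namespace 10, so a filter like this drops them, which is exactly why the wikitext that remains still needs something to expand the templates.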
How would this template API function? What does import mean?
By this I mean a set of functions, written in some programming language, to which I could send wiki markup containing a template and receive HTML to display.
Wikipedia does this whenever a page is requested, but I don't know the exact mechanism through which it's performed. Maybe you just need to make that code publicly available, and I'll try to make it work with my application somehow.
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Take a look at http://en.wikipedia.org/w/api.php?action=parse; it is exactly what you are looking for. Also, a 7 GB app is something you want to state CLEARLY, as eating up that much device space and download bandwidth is probably a problem for most users.
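As a rough illustration of the suggestion above, here is a minimal sketch of sending wikitext to action=parse and reading back the rendered HTML. The helper names (build_parse_request, extract_html, render_wikitext), the https endpoint, and the User-Agent string are assumptions made for this example; the action, format, prop, title and text parameters are those of the MediaWiki parse module.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Assumption: the https form of the endpoint mentioned in the message above.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_parse_request(wikitext, title="Sandbox"):
    """Return (url, body) for a POST asking action=parse to render wikitext."""
    params = {
        "action": "parse",
        "format": "json",
        "prop": "text",    # only the rendered HTML is requested
        "title": title,    # page context used by magic words such as {{PAGENAME}}
        "text": wikitext,  # the markup (templates included) to expand
    }
    return API_ENDPOINT, urlencode(params).encode("utf-8")


def extract_html(response_body):
    """Pull the HTML string out of a JSON response from action=parse."""
    data = json.loads(response_body)
    if "error" in data:
        raise RuntimeError(data["error"].get("info", "unknown API error"))
    text = data["parse"]["text"]
    # The default JSON layout wraps the HTML as {"*": "..."}; with
    # formatversion=2 it is a plain string. Accept both.
    return text["*"] if isinstance(text, dict) else text


def render_wikitext(wikitext, title="Sandbox"):
    """Perform the network round trip (needs connectivity)."""
    url, body = build_parse_request(wikitext, title)
    request = urllib.request.Request(
        url,
        data=body,
        # Hypothetical identifier; replace with real contact details.
        headers={"User-Agent": "offline-wiki-sketch/0.1 (example only)"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return extract_html(response.read().decode("utf-8"))
```

A call such as render_wikitext("{{convert|5|km|mi}}") would return the HTML that the wiki itself produces for that template. Doing this for every article in a dump means one request per page, so it suits spot checks or small wikis far better than a full English Wikipedia conversion.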
On Sun, Sep 9, 2012 at 7:07 PM, Roberto Flores f.roberto.isc@gmail.com wrote:
How would this template API function? What does import mean?
By this I mean a set of functions, written in some programming language, to which I could send wiki markup containing a template and receive HTML to display.
Wikipedia does this whenever a page is requested, but I don't know the exact mechanism through which it's performed. Maybe you just need to make that code publicly available, and I'll try to make it work with my application somehow.
See https://gerrit.wikimedia.org/r/gitweb?p=operations%2Fmediawiki-config.git (master branch) for the current configuration (including which extensions are enabled or not for a specific wiki) and https://gerrit.wikimedia.org/r/gitweb?p=mediawiki/core.git;a=shortlog;h=refs... (wmf/1.20wmf10 branch) for the core code and extensions (extensions are in submodules) with versions of each (repo) that are currently deployed. That branch name changes about every 2 weeks.
-Jeremy
Shouldn't you be using ZIM, and aren't dumpHTML and siblings The Right Way to do it? See also http://openzim.org/Build_your_ZIM_file
Nemo
I’m all for continuing the HTML wiki dumps that were once done; was 2007 the last? Why were these discontinued? They would be more useful than the so-called “XML”.
There is no complete solution for processing dumps; the XML is most certainly not XML in its lowest form, and it IS DEFINITELY a moving target!
Regards,
From: Roberto Flores Sent: Sunday, September 09, 2012 8:07 PM To: Wikimedia developers Cc: Wikipedia Xmldatadumps-l Subject: Re: [Xmldatadumps-l] [Wikitech-l] HTML wikipedia dumps: Could you please provide them, or make public the code for interpreting templates?
Xmldatadumps-l mailing list Xmldatadumps-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l
I also think the HTML dumps would be super useful!
Cheers
Pablo
On Sep 17, 2012 8:05 PM, "James L" james_leaver@hotmail.com wrote:
I’m all for continuing the HTML wiki dumps that were once done; *was 2007 the last*? Why were these discontinued? They would be more useful than the so-called “XML”.
There is no complete solution for processing dumps; the XML is most certainly not XML in its lowest form, and it IS DEFINITELY a moving target!
Regards,