Is there a utility that will dump an article or a list of articles to HTML or PDF at the command line from a locked-down wiki (i.e., one that requires login to view)? The pages in question are mostly self-contained and make heavy use of tables and other formatting.
I found http://en.wikipedia.org/wiki/Wikipedia:Database_download and http://meta.wikimedia.org/wiki/Data_dumps, but they seem geared toward dumping the whole wiki rather than individual pages.
Any pointers?
Dan
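One way to script this against a login-protected MediaWiki is plain HTTP: log in once with curl so the session cookie is stored, fetch each page with that cookie, and hand the saved HTML to an HTML-to-PDF converter such as htmldoc. A rough sketch, assuming a hypothetical wiki at wiki.example.com and MediaWiki's usual Special:Userlogin form fields (wpName/wpPassword; newer releases also expect a login token):

    # Log in and store the session cookie in cookies.txt.
    # Form field names are MediaWiki's usual wpName/wpPassword;
    # check your version's login form before relying on them.
    curl -c cookies.txt \
        -d 'wpName=SomeUser' -d 'wpPassword=secret' \
        'http://wiki.example.com/index.php?title=Special:Userlogin&action=submitlogin&type=login'

    # Fetch a page using the stored cookie.
    curl -b cookies.txt \
        'http://wiki.example.com/index.php?title=Some_Page' > Some_Page.html

    # Convert the saved HTML to PDF.
    htmldoc --webpage -f Some_Page.pdf Some_Page.html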
On 12/05/06, Dan Davis <hokie99cpe+wiki@gmail.com> wrote:
> Is there a utility that will dump an article or a list of articles to HTML or PDF at the command line from a locked-down wiki (i.e., one that requires login to view)? The pages in question are mostly self-contained and make heavy use of tables and other formatting.
> I found http://en.wikipedia.org/wiki/Wikipedia:Database_download and http://meta.wikimedia.org/wiki/Data_dumps, but they seem geared toward dumping the whole wiki rather than individual pages.
The dumpHTML maintenance script lets you specify start and end page identifiers, which can be used to dump a single page or a range of pages with consecutive IDs.
Rob Church
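For reference, a typical invocation might look like the following; the exact option letters vary between MediaWiki releases, so check the script's own help output (-d here is the destination directory, -s and -e the start and end page IDs):

    # Dump pages with IDs 100 through 120 as static HTML into ./dump.
    # Run from the wiki's installation directory; options differ by
    # release, so verify with: php maintenance/dumpHTML.php --help
    php maintenance/dumpHTML.php -d ./dump -s 100 -e 120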
On 5/12/06, Rob Church <robchur@gmail.com> wrote:
> On 12/05/06, Dan Davis <hokie99cpe+wiki@gmail.com> wrote:
> > Is there a utility that will dump an article or a list of articles to HTML or PDF at the command line from a locked-down wiki (i.e., one that requires login to view)? The pages in question are mostly self-contained and make heavy use of tables and other formatting.
> The dumpHTML maintenance script lets you specify start and end page identifiers, which can be used to dump a single page or a range of pages with consecutive IDs.
No way to do this by title, only by page ID? And can it work with pages that require login? The output is giving me a page saying I must log in to view it.
Is it possible to dump the article text only, without headers, footers, and sidebars, but with the normal formatting parsed correctly? I want an automated process that does wonderful things with the article text, minus the navigational features, etc.
Dan
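Both follow-ups can be handled by fetching through index.php rather than the dump script: pages are selected by title, the login cookie from a scripted login is honoured, and action=render returns just the parsed article body with no skin header, footer, or sidebar. A sketch with the same placeholder wiki and cookie file as above:

    # Body-only HTML, selected by title; action=render strips the
    # skin chrome (header, footer, sidebar) from the output.
    curl -b cookies.txt \
        'http://wiki.example.com/index.php?title=Some_Page&action=render' > Some_Page_body.html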
Hi,
On Tuesday 16 May 2006 14:59, Dan Davis wrote:
> On 5/12/06, Rob Church <robchur@gmail.com> wrote:
> > On 12/05/06, Dan Davis <hokie99cpe+wiki@gmail.com> wrote:
> > > Is there a utility that will dump an article or a list of articles to HTML or PDF at the command line from a locked-down wiki (i.e., one that requires login to view)? The pages in question are mostly self-contained and make heavy use of tables and other formatting.
> > The dumpHTML maintenance script lets you specify start and end page identifiers, which can be used to dump a single page or a range of pages with consecutive IDs.
> No way to do this by title, only by page ID? And can it work with pages that require login? The output is giving me a page saying I must log in to view it.
> Is it possible to dump the article text only, without headers, footers, and sidebars, but with the normal formatting parsed correctly? I want an automated process that does wonderful things with the article text, minus the navigational features, etc.
Have you looked at wiki2xml? See
http://bloodgate.com/wiki/index.php?title=Special:Wiki2XML
for an example.
Best wishes,
Tels
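If you would rather run the conversion yourself, the raw, unparsed wikitext of a page can be pulled with action=raw and fed to wiki2xml or any other converter; a sketch using the same placeholder wiki and cookie file:

    # Raw wikitext for one page, suitable as converter input.
    curl -b cookies.txt \
        'http://wiki.example.com/index.php?title=Some_Page&action=raw' > Some_Page.wiki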