Hi,
can someone come up with a Perl-like regular expression for wiki tables? I just want to throw away all wiki tables in a given string (e.g. a page).
I haven't been successful so far.
Thanks a lot, matsch
On the wiki-text side or the html side?
If you told us the problem there might even be a better solution.
-----Original Message----- From: mediawiki-l-bounces@lists.wikimedia.org [mailto:mediawiki-l-bounces@lists.wikimedia.org] On Behalf Of Matthias Korn Sent: Tuesday, August 04, 2009 4:42 AM To: mediawiki-l@lists.wikimedia.org Subject: [Mediawiki-l] REGEXP for wiki tables
Hi Courtney,
thanks for your reply!
There is actually no problem. I was trying to feed the new pages feed into another application and wanted to get rid of the tables in there, because they were not needed. So I am talking about wiki-syntax tables.
I have now settled on "{[^}]+}", which is far from perfect because it also (partially) matches templates. But that's OK.
The ultimate goal was to run new pages through a twitterfeed and into our twitter.. ;-) Twitterfeed only grabs the title and the first couple of words (until the 140 character limit obviously). Currently I am using Yahoo Pipes to clean up the mess that is wiki syntax... ;-)
Another idea, which comes to my mind just now, would be to have the new pages show the rendered html page rather than the wiki source. Possible? Is there a switch?
Thanks, matsch
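To see why the "{[^}]+}" pattern only partly works, here is a minimal Python sketch. Python's `re` module is used only for illustration (the thread itself is about Perl-like regexes in Yahoo Pipes), and the sample page text is made up.

```python
import re

# The pattern quoted above, with the braces escaped for clarity
# (Python also accepts the unescaped form here).
NAIVE = re.compile(r"\{[^}]+\}")

# Hypothetical wikitext: one template call followed by one table.
sample = 'Intro {{Infobox|name=X}} text\n{| class="wikitable"\n| a || b\n|}\nEnd'

# The table is removed, but the template is also hit and a stray "}" is left behind.
stripped = NAIVE.sub("", sample)
print(repr(stripped))
```

The table disappears because it happens to contain no "}" before its closing "|}", while the template call loses everything except its final brace.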
(In reply to the message from "Christensen, Courtney" ChristensenC@BATTELLE.ORG of Wed, 05 Aug 2009 08:56:27 -0400.)
There are a few ways to do this, I think. Did you know that there is an RSS feed available for your wiki pages? I bet you could pass the RSS feed for the recent changes page to twitterfeed or something like that.
You could definitely write a new extension that made a cleaned RSS feed of new pages on your wiki available. (If one hasn't already been written.) That would be useful to anyone who wanted to subscribe with an aggregator and not just follow you on twitter.
Good luck, Courtney
Hi Courtney,
I was already using the New Pages RSS feed: http://wiki.rockinchina.com/index.php?title=Special:NewPages&feed=rss&am...
But I am looking into http://www.mediawiki.org/wiki/Extension:News, which we already use but without generating RSS feeds. Maybe this is a better idea...
Thanks, matsch
(In reply to the message from "Christensen, Courtney" ChristensenC@BATTELLE.ORG of Wed, 05 Aug 2009 11:29:21 -0400.)
MediaWiki-l mailing list MediaWiki-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-l
On 2009-08-05 Matthias Korn wrote:
I have now settled with "{[^}]+}" which is far from perfect, because it also matches (partially) on templates. But that's ok.
s/\{\|.+?\|\}//gs would do a lot better, since wikitables start with "{|" and end with "|}", and it would not touch templates.
/BP
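A rough Python equivalent of that substitution, as a sketch only. It assumes the braces and pipes are escaped (an unescaped "|" would be read as alternation), uses `re.DOTALL` in place of Perl's /s modifier, and runs on a made-up page; `strip_tables` is a hypothetical helper name.

```python
import re

# Lazy match from "{|" to the nearest "|}"; DOTALL lets "." span newlines.
TABLE = re.compile(r"\{\|.+?\|\}", re.DOTALL)


def strip_tables(wikitext: str) -> str:
    """Remove every {| ... |} span from the given wikitext."""
    return TABLE.sub("", wikitext)


# Hypothetical wikitext: a template call followed by a table.
page = 'Intro {{Infobox|name=X}}\n{| class="wikitable"\n|-\n| a || b\n|}\nOutro'
print(repr(strip_tables(page)))
```

Here the template call survives untouched, because "{{Infobox|" never contains the two-character sequence "{|".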
Replying to Benct Philip Jonsson:
But that would sometimes grab more than just one table. -Courtney
Not with the lazy quantifier +?: /.+?\|\}/ means "as few characters as possible up to a "|}"", so the match stops at the first "|}", and at most one table can take part in each match. Contrast /.+\|\}/ without the ?, which matches everything up to the last "|}" in the string or file. (To reduce clutter I am leaving out the /s modifier, which is needed for .+? or .+ to match spans that include newlines.)
/BP
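The lazy versus greedy difference is easy to check on a string with two tables. This is a small Python sketch with made-up input; `re.DOTALL` stands in for the /s modifier.

```python
import re

# Hypothetical wikitext with two separate tables and text between them.
text = "A\n{|\n| one\n|}\nB\n{|\n| two\n|}\nC"

# Lazy: each match ends at the nearest "|}", so "B" between the tables survives.
lazy = re.sub(r"\{\|.+?\|\}", "", text, flags=re.DOTALL)

# Greedy: one match runs from the first "{|" to the last "|}", swallowing "B".
greedy = re.sub(r"\{\|.+\|\}", "", text, flags=re.DOTALL)

# Without DOTALL, "." cannot cross the newlines inside a table, so nothing matches.
no_dotall = re.sub(r"\{\|.+?\|\}", "", text)

print(repr(lazy), repr(greedy), repr(no_dotall))
```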
I believe that making the match for } non-greedy as below would match just the one table, but I've not tried it.
s/\{\|.+?\|?\}//gs
/Sam
-----Original Message----- From: mediawiki-l-bounces@lists.wikimedia.org [mailto:mediawiki-l-bounces@lists.wikimedia.org] On Behalf Of Christensen, Courtney Sent: 06 August 2009 21:05 To: MediaWiki announcements and site admin list Subject: Re: [Mediawiki-l] REGEXP for wiki tables
This email was sent to you by Thomson Reuters, the global news and information company. Any views expressed in this message are those of the individual sender, except where the sender specifically states them to be the views of Thomson Reuters.
Replying to Sam.Sexton@thomsonreuters.com:
I take it you mean s/\{\|.+?\|\}?//gs. That would not work, because it would make the match stop at the first "|" inside the table, which would probably be part of a caption marker "|+", a row marker "|-", or a cell marker "|".
The substitution s/\{\|.+?\|\}//gs (and no other, AFAICT) will delete all tables. To be foolproof against parser function and template syntax like "{{{1|}}}", it should probably be amended to
s/(?<!\{)\{\|.+?\|\}(?!\})//gs
which makes sure that the opening brace isn't preceded, and the closing brace isn't followed, by another brace. Incidentally, and correctly, any /\{\{+\|/ or /\|\}\}+/ *inside* a table will be included in the match and deleted.
/BP
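A Python sketch of the three variants discussed above, run on made-up wikitext. It assumes escaped braces and pipes and uses `re.DOTALL` for /s; the names `PLAIN`, `GUARDED` and `OPTIONAL_BRACE` are only labels for this example.

```python
import re

PLAIN = re.compile(r"\{\|.+?\|\}", re.DOTALL)
# Lookbehind/lookahead: "{|" must not follow "{", and "|}" must not precede "}".
GUARDED = re.compile(r"(?<!\{)\{\|.+?\|\}(?!\})", re.DOTALL)
# Variant with an optional closing brace: stops at the first "|" after "{|".
OPTIONAL_BRACE = re.compile(r"\{\|.+?\|\}?", re.DOTALL)

# Hypothetical table whose first cell holds a parameter with an empty default.
page = 'Top\n{| class="wikitable"\n| {{{1|}}} || b\n|}\nBottom'

# PLAIN ends too early, at the "|}" inside "{{{1|}}}".
plain_result = PLAIN.sub("", page)
# GUARDED skips that "|}" because a "}" follows it, and removes the whole table.
guarded_result = GUARDED.sub("", page)

# OPTIONAL_BRACE cuts a simple table off at its first row marker.
simple = "{| class=x\n|-\n| a\n|}"
truncated = OPTIONAL_BRACE.sub("", simple)

print(repr(plain_result), repr(guarded_result), repr(truncated))
```

On this input only the guarded pattern removes the table cleanly; the other two leave table debris behind.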
Hi,
I don't have anything to add to the REGEXP discussion in itself, but does the OP also have to take into account that the table-start and table-end can be provided by templates, as can the table contents?
In that case, the initial page wiki text only contains template calls, but after template expansion it will have table syntax.
Hi,
thanks for all the suggestions. I settled on BP's last solution. It seems to work fine. Templates are not an issue for me atm.
Best, matsch
(In reply to the message from Jean-Marc van Leerdam j.m.van.leerdam@gmail.com of Fri, 7 Aug 2009 14:00:13 +0200.)
mediawiki-l@lists.wikimedia.org