-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
Hello all!
Just wanted to ask the short question: Do we have a tool able to convert RSS2HTML like e.g. [1] (which needs [2]) installed on the toolserver? Does someone operate such a tool?
Currently I am using [3] to convert RSS2HTML for dewiki. I think it would be a good thing to have a solution that is not dependent on external tools.
[1] http://scott.yang.id.au/2005/05/feed2html/ [2] http://feedparser.org/ [3] http://rss.bloople.net/
Greetings DrTrigon
Dr. Trigon (2011-09-04 22:36):
Hello all!
Just wanted to ask the short question: Do we have a tool able to convert RSS2HTML like e.g. [1] (which needs [2]) installed on the toolserver? Does someone operate such a tool?
Currently I am using [3] to convert RSS2HTML for dewiki. I think it would be a good thing to have a solution that is not dependent on external tools.
You can use XSLT or simply any DOM library to transform RSS to HTML. Depends on what you want to do, but any RSS is a very simple XML document.
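As a rough illustration of the "any DOM library" route, here is a minimal Python 3 sketch using only the standard library. The sample feed, the function name `rss_to_html` and the chosen HTML layout are assumptions made for this example, not part of any existing toolserver tool, and a feed from an untrusted source would need further hardening of the XML parsing.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical sample feed; a real tool would fetch this from a vetted URL.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>First post</title>
      <link>http://example.org/first</link>
    </item>
    <item>
      <title>Second &amp; last post</title>
      <link>http://example.org/second</link>
    </item>
  </channel>
</rss>"""


def rss_to_html(rss_text):
    """Render the items of an RSS 2.0 document as an HTML list."""
    channel = ET.fromstring(rss_text).find("channel")
    heading = html.escape(channel.findtext("title", default=""))
    lines = ["<h1>%s</h1>" % heading, "<ul>"]
    for item in channel.findall("item"):
        # Escape everything taken from the feed before it goes into HTML.
        title = html.escape(item.findtext("title", default=""))
        link = html.escape(item.findtext("link", default=""), quote=True)
        lines.append('<li><a href="%s">%s</a></li>' % (link, title))
    lines.append("</ul>")
    return "\n".join(lines)


print(rss_to_html(SAMPLE_RSS))
```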
Regards, Nux.
Hello all!
Just wanted to ask the short question: Do we have a tool able to convert RSS2HTML like e.g. [1] (which needs [2]) installed on the toolserver? Does someone operate such a tool?
Currently I am using [3] to convert RSS2HTML for dewiki. I think it would be a good thing to have a solution that is not dependent on external tools.
You can use XSLT or simply any DOM library to transform RSS to HTML. Depends on what you want to do, but any RSS is a very simple XML document.
Thanks for your reply!
In fact the question is not "how to do it" but "is someone doing it on the TS already"?
(XSLT sounds interesting and was new to me - but I would prefer Python since this is something I know already... ;)
Greetings
Hello TS users!
To close the topic [1], I finally decided to follow the hints given by Maciej Jaros and Merlissimo and created (since it seems nobody did this already - please correct me if I am wrong)
"XSaLT: XSL/XSLT Simple and Lightweight Tool" [2]
It is a very, very, very simple Python CGI script that takes a URL (pointing to any XML source document) and an XSLT stylesheet. Both are passed to lxml to transform the XML into a destination document. Any XSLT stylesheet you might need can be added if you send me a mail.
A first example is "rss2html.xslt" which converts RSS feeds to HTML content, as can be seen in the example [3] (it is specialized to this feed and may give worse results on others).
[1] http://lists.wikimedia.org/pipermail/toolserver-l/2011-September/004375.html [2] https://wiki.toolserver.org/view/~drtrigon/cgi-bin/xsalt.py [3] http://toolserver.org/~drtrigon/cgi-bin/xsalt.py?url=http%3A%2F%2Fblog.wikim...
Thanks for all your help and hints! Greetings DrTrigon
(anonymous) wrote:
To close the topic [1], I finally decided to follow the hints given by Maciej Jaros and Merlissimo and created (since it seems nobody did this already - please correct me if I am wrong)
"XSaLT: XSL/XSLT Simple and Lightweight Tool" [2]
Which is a very, very, very simple python cgi script that takes an url (pointing to any XML source document) and an XSLT stylesheet. Both are passed to lxml to transform the XML to a destination document. Any XSLT stylesheet you might need can be added if you send me a mail. [...]
Please consider that very, very, very simple scripts typically have very, very, very bad security protections :-). In this case, all files on the toolserver can be checked for existence; if they are XML files and the attacker can deposit an XSLT file somewhere on the toolserver, they can be read; and accesses to external URLs can be triggered.
Tim
Hello, At Sunday 11 September 2011 20:49:25 DaB. wrote:
all files on the toolserver can be checked for existence, if they are XML files
disabled for this reason.
@drtrigon: Please fix your script BEFORE you put it back in action.
Sincerely, DaB.
On 11.09.2011 20:50, DaB. wrote:
Hello, At Sunday 11 September 2011 20:49:25 DaB. wrote:
all files on the toolserver can be checked for existence, if they are XML files
disabled for this reason.
@drtrigon: Please fix your script BEFORE you put it back in action.
Sorry for the inconveniences I caused here!
What is exactly the critical point you are mentioning? Do I understand you right and would inserting the code
    import os
    allowed = [item for item in os.listdir('.') if '.xslt' in item]
    if xslt not in allowed:
        # return some neutral/blank message (hiding all sensitive data)
which just allows access to "my" 'xslt' files in 'cgi-bin' satisfy those needs in security? Or do you have something else in mind? (disabling debug info, moving 'xslt' files to another directory, or even more restrictive, ...?)
Thanks for your feedback and greetings DrTrigon
Dr. Trigon wrote:
Sorry for the inconveniences I caused here!
What is exactly the critical point you are mentioning? Do I understand you right and would inserting the code
    import os
    allowed = [item for item in os.listdir('.') if '.xslt' in item]
    if xslt not in allowed:
        # return some neutral/blank message (hiding all sensitive data)
which just allows access to "my" 'xslt' files in 'cgi-bin' satisfy those needs in security? Or do you have something else in mind? (disabling debug info, moving 'xslt' files to another directory, or even more restrictive, ...?)
Thanks for your feedback and greetings DrTrigon
I would check that xslt is only composed by alphanumeric characters* and do something like "/home/drtrigon/xslt/" + xslt + ".xslt" (this ensures there's no ../ and doesn't contain \0)
Also, I'm not sure if urllib.open() works with file:// urls, but I'd verify it's a http or https url .
On 11 September 2011 22:59, Platonides platonides@gmail.com wrote:
Also, I'm not sure if urllib.open() works with file:// urls, but I'd verify it's a http or https url .
It even works without. For urllib2, you do need to use file:// urls.
valhallasw@dorthonion:~$ python
Python 2.6.5 (r265:79063, Apr 16 2010, 13:09:56)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib, urllib2
>>> urllib.urlopen('/etc/passwd').readlines()
['root:x:0:0:root:/root:/bin/bash\n', (...) ,'usbmux:x:109:46:usbmux daemon,,,:/home/usbmux:/bin/false\n']
>>> urllib2.urlopen('file:///etc/passwd').readlines()
['root:x:0:0:root:/root:/bin/bash\n', (...)
Of course, it all boils back to the old motto 'never trust user input' - and be sure standard libraries are not more general than you think...
(and this is something that might have bitten more of us, including me :-))
Best, Merlijn
It even works without. For urllib2, you do need to use file:// urls.
valhallasw@dorthonion:~$ python
Python 2.6.5 (r265:79063, Apr 16 2010, 13:09:56)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib, urllib2
>>> urllib.urlopen('/etc/passwd').readlines()
['root:x:0:0:root:/root:/bin/bash\n', (...) ,'usbmux:x:109:46:usbmux daemon,,,:/home/usbmux:/bin/false\n']
>>> urllib2.urlopen('file:///etc/passwd').readlines()
['root:x:0:0:root:/root:/bin/bash\n', (...)
That's nice - never used it - of course in this case it's a pity... ;)
btw.: ...is the pywikipedia framework's 'getUrl' safe in this sense?
Thanks for all hints so far!
On 11.09.2011 22:59, Platonides wrote:
Dr. Trigon wrote:
    import os
    allowed = [item for item in os.listdir('.') if '.xslt' in item]
    if xslt not in allowed:
        # return some neutral/blank message (hiding all sensitive data)
I would check that xslt is only composed by alphanumeric characters* and do something like "/home/drtrigon/xslt/" + xslt + ".xslt" (this ensures there's no ../ and doesn't contain \0)
Sorry that answer confuses me; "check that xslt is only composed by alphanumeric characters" is just a second (more paranoid) check to be very sure? Since only xslt from my path are allowed, I would have to put them into this directory and do check them then... The other thing is the content of this xslt will be passed to 'etree.XML' like:
    from lxml import etree
    doc = etree.parse(f)
    xslt_root = etree.XML( open(xslt).read() )
so why should there be a problem if the xslt would contain binary data (which in fact they would not since I have to upload them... ;)
Also, I'm not sure if urllib.open() works with file:// urls, but I'd verify it's a http or https url .
On 11.09.2011 23:29, Merlijn van Deen wrote:
On 11 September 2011 22:59, Platonides <platonides@gmail.com> wrote:
Also, I'm not sure if urllib.open() works with file:// urls, but I'd verify it's a http or https url .
It even works without. For urllib2, you do need to use file:// urls.
valhallasw@dorthonion:~$ python
Python 2.6.5 (r265:79063, Apr 16 2010, 13:09:56)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib, urllib2
>>> urllib.urlopen('/etc/passwd').readlines()
['root:x:0:0:root:/root:/bin/bash\n', (...) ,'usbmux:x:109:46:usbmux daemon,,,:/home/usbmux:/bin/false\n']
>>> urllib2.urlopen('file:///etc/passwd').readlines()
['root:x:0:0:root:/root:/bin/bash\n', (...)
What would be the best / most safe verification? Check for "http" in the beginning of the string? Or is there a good way to prevent urllib from allowing such accesses?
Of course, it all boils back to the old motto 'never trust user input' - and be sure standard libraries are not more general than you think...
I would never ever trust my own input at all... ;)) And I can only cite DNA here: "To summarize the summary of the summary: 'People are a problem'"... ;)))
And to be quite honest, the fact of having (python) standard libraries that are more general than I (could ever) think, is one of those things that amaze me every time... :)
(and this is something that might have bitten more of us, including me :-))
(makes me somehow happy not to be the only one... ;)
Greetings
On 12.09.2011 13:33, Dr. Trigon wrote:
    from lxml import etree
    doc = etree.parse(f)
    xslt_root = etree.XML( open(xslt).read() )
so why should there be a problem if the xslt would contain binary data (which in fact they would not since I have to upload them... ;)
I think he referred to "the path of the xslt file", which should be protected against containing \0 and other special characters.
What would be the best / most safe verification? Check for "http" in the beginning of the string?
Or prepending http:// if the input doesn't start with http://
Peter
Hello, At Monday 12 September 2011 13:41:09 DaB. wrote:
Sorry that answer confuses me; "check that xslt is only composed by alphanumeric characters" is just a second (more paranoid) check to be very sure?
to prevent something like
"../../dab/text.xml" as a parameter, which would result in
"/home/drtrigon/xslt/../../dab/text.xml", which would resolve to
"/home/dab/text.xml"
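The traversal described here can be reproduced without touching the file system. The following is a minimal Python 3 sketch; the helper name `resolve_inside` is hypothetical, and the check is purely textual (it does not follow symlinks), so it is an illustration of the problem rather than a complete defence.

```python
import posixpath

BASE_DIR = "/home/drtrigon/xslt"  # directory taken from the example above


def resolve_inside(base_dir, user_value):
    """Return the normalized path, or None if it would leave base_dir."""
    candidate = posixpath.normpath(posixpath.join(base_dir, user_value))
    # commonpath() compares whole path components, not string prefixes.
    if posixpath.commonpath([base_dir, candidate]) != base_dir:
        return None
    return candidate


# Plain concatenation followed by normalization escapes the directory.
print(posixpath.normpath(BASE_DIR + "/" + "../../dab/text.xml"))
# The guarded helper rejects the same value and accepts a plain file name.
print(resolve_inside(BASE_DIR, "../../dab/text.xml"))
print(resolve_inside(BASE_DIR, "rss2html.xslt"))
```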
Sincerely, DaB.
On 12.09.2011 13:43, DaB. wrote:
to prevent something like
"../../dab/text.xml" as a parameter, which would result in
"/home/drtrigon/xslt/../../dab/text.xml", which would resolve to
"/home/dab/text.xml"
Yes I assumed something similar, BUT python 'open' does not accept "/home/drtrigon/xslt/../../dab/text.xml" as path, it returns an "IOError: [Errno 2] No such file or directory: ..."
My idea was just to create a list of all files I allow (in fact all '.xslt' files in the same directory as the script) and check the given parameter against it. Consider the list ["atom2html.xslt", "rss2html.xslt"]: if I now do an "xslt in ["atom2html.xslt", "rss2html.xslt"]", I would have caught all the possible cases with any combination of "../..", binary "\0" and so on... or am I missing something here...?!? ;)
Thanks for all your patience!
Hello, At Monday 12 September 2011 14:21:05 DaB. wrote:
"/home/drtrigon/xslt/../../dab/text.xml" as path, it returns an "IOError: [Errno 2] No such file or directory: ..."
which is true. There is no text.xml-file in my home. It was just an example.
Sincerely, DaB.
On 12.09.2011 14:21, DaB. wrote:
Hello, At Monday 12 September 2011 14:21:05 DaB. wrote:
"/home/drtrigon/xslt/../../dab/text.xml" as path, it returns an "IOError: [Errno 2] No such file or directory: ..."
which is true. There is no text.xml-file in my home. It was just an example.
Sorry, maybe I should have pointed out that this was just your example. Of course I tried it with an existing and accessible file in my home. (In fact I tried it on my local computer, but giving you that path as an example would not help, since you don't know my local file system tree... ;))
So this "IOError: [Errno 2] No such file or directory: ..." was NOT triggered because of a non-existent file, BUT because the syntax was not accepted. I do not want to state that there is no possibility to cheat this way, but the obvious one suggested does not work in Python (I used Python 2.7.1 (r271:86832, Apr 12 2011, 16:15:16) [GCC 4.6.0 20110331 (Red Hat 4.6.0-2)] on linux2)...
On 12 September 2011 16:00, Dr. Trigon dr.trigon@surfeu.ch wrote:
So this "IOError: [Errno 2] No such file or directory: ..." was NOT triggered because of a non-existent file, BUT because the syntax was not accepted. I do not want to state that there is no possibility to cheat this way, but the obvious one suggested does not work in Python
Interesting theory, but not true:
valhallasw@nightshade:~$ cat > test.file
blah
valhallasw@nightshade:~$ python
Python 2.7.1 (r271:86832, Jan 4 2011, 13:57:14)
[GCC 4.5.2] on sunos5
Type "help", "copyright", "credits" or "license" for more information.
>>> open("/home/valhallasw/src/../test.file").readlines()
['blah\n']
Please always double-check these things in security-related issues.
Best, Merlijn
On 12.09.2011 16:43, Merlijn van Deen wrote:
On 12 September 2011 16:00, Dr. Trigon <dr.trigon@surfeu.ch> wrote:
So this "IOError: [Errno 2] No such file or directory: ..." was NOT triggered because of a non-existent file, BUT because the syntax was not accepted. I do not want to state that there is no possibility to cheat this way, but the obvious one suggested does not work in Python
Interesting theory, but not true:
valhallasw@nightshade:~$ cat > test.file
blah
valhallasw@nightshade:~$ python
Python 2.7.1 (r271:86832, Jan 4 2011, 13:57:14)
[GCC 4.5.2] on sunos5
Type "help", "copyright", "credits" or "license" for more information.
>>> open("/home/valhallasw/src/../test.file").readlines()
['blah\n']
It may sound curious, but I'm happy to read this, since it is what I also expected. But honestly I tried this - and did it again right now. The result: you are right - but me too! Strange, isn't it... In fact I was fooled because instead of the 'src' directory you use in your example, I used a symlink, and '..' then does not lead back to the directory I came from... (my mistake)
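The symlink effect described here can be reproduced in a throwaway directory. This is a minimal Python 3 sketch for a POSIX system; the directory layout and all file names are assumptions made up for the demonstration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    root = os.path.realpath(tmp)
    # Hypothetical layout: root/test.file, plus root/src as a symlink
    # pointing to root/elsewhere/target.
    with open(os.path.join(root, "test.file"), "w") as handle:
        handle.write("blah\n")
    target = os.path.join(root, "elsewhere", "target")
    os.makedirs(target)
    os.symlink(target, os.path.join(root, "src"))

    path = os.path.join(root, "src", "..", "test.file")
    textual = os.path.normpath(path)      # removes "src/.." as plain text
    resolved = os.path.realpath(path)     # follows the symlink before ".."
    textual_exists = os.path.exists(textual)
    actual_exists = os.path.exists(path)  # what open(path) would look for

print("text-only path exists:", textual_exists)
print("path as the OS resolves it exists:", actual_exists)
```

With a symlink in the path, '..' is applied to the symlink's target, so the same string can name a different (here non-existent) file than a purely textual reading suggests. This is one reason a failed open() is weak evidence that traversal is impossible.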
Please always double-check these things in security-related issues.
That is exactly the reason why I wrote this mail and bothered you... ;) Thus; thanks a lot for correcting me!
Greetings
btw.: ...is the pywikipedia framework's 'getUrl' safe in this sense?
Just for information: no it is not! The following works:
print site.getUrl("file:///etc/passwd", no_hostname = True)
(this could be an issue for other homebrew bots that blindly count on the framework... maybe... ;)
I would check that xslt is only composed by alphanumeric characters* and do something like "/home/drtrigon/xslt/" + xslt + ".xslt" (this ensures there's no ../ and doesn't contain \0)
I considered this solution, since it sounded to be very easy. BUT the check for alphanum does exclude all files with '-' or '_'. Thus I decided to use my proposal. As far as I can see this does protect from '../' and '\0' in the path of the xslt file also - but please correct me if I am wrong here (and you have a scenario where this breaks down).
Also, I'm not sure if urllib.open() works with file:// urls, but I'd verify it's a http or https url .
Or prepending http:// if the input doesn't start with http://
Looking at the first 4 bytes of the string does not involve any Python- or implementation-specific parts.
Obvious solutions are better than magical ones.
So I implemented a list and check the first chars from url string against this list in order to be sure nothing bad goes on here.
The full code (for python-gurus) is given here:
    ########################################
    # security
    # check url not to point to a local file on the server, e.g. 'file://'
    s1 = False
    for item in ['http://', 'https://']:
        s1 = s1 or (url[:len(item)] == item)
    # check xslt does point to allowed local files on the server (the
    # '.xslt' in same directory as script) and not any other, e.g. '../'
    import os
    allowed = [item for item in os.listdir('.') if '.xslt' in item]
    s2 = (xslt in allowed)
    secure = s1 and s2
    ########################################
If secure is False, the default start page is displayed, as if nothing happened (which is actually the case).
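For comparison, here is a Python 3 sketch of the same two checks that parses the URL instead of comparing string prefixes and matches the '.xslt' suffix exactly. The function names are hypothetical, and the sketch deliberately does not cover requests to hosts that should not be reachable, which would need a separate check.

```python
import os
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}


def is_remote_url(url):
    """Accept only absolute http(s) URLs that name a host."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False
    return parts.scheme in ALLOWED_SCHEMES and bool(parts.netloc)


def allowed_stylesheets(directory="."):
    """List the file names in directory that end in '.xslt'."""
    return sorted(name for name in os.listdir(directory) if name.endswith(".xslt"))


def is_secure(url, xslt, directory="."):
    """Mirror the s1/s2 checks: remote URL and stylesheet from the listing."""
    return is_remote_url(url) and xslt in allowed_stylesheets(directory)
```

Because os.listdir() returns bare file names, a value containing '/' or '\0' can never match an entry, which is why the membership test also rejects '../' tricks. Using endswith() only tightens the filter so that a name such as 'notes.xslt.bak' is not accepted.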
Can somebody (maybe DaB) confirm whether this is OK? Or is it still too weak?
Thanks a lot for all your help, hints and participation!! Greetings to all! DrTrigon
Dr. Trigon wrote:
I would check that xslt is only composed by alphanumeric characters* and do something like "/home/drtrigon/xslt/" + xslt + ".xslt" (this ensures there's no ../ and doesn't contain \0)
I considered this solution, since it sounded to be very easy. BUT the check for alphanum does exclude all files with '-' or '_'. Thus I decided to use my proposal.
Heh, you could have added - and _ to the list of allowed characters (that's why I pointed out *what* I wanted to protect from).
As far as I can see this does protect from '../' and '\0' in the path of the xslt file also - but please correct me if I am wrong here (and you have a scenario where this breaks down).
Spelling out the list of allowed values is always safer, but it is bothersome (I see you listed the folder instead).
On 14.09.2011 00:08, Platonides wrote:
Heh, you could have added - and _ to the list of allowed characters (that's why I pointed out *what* I wanted to protect from).
Because you mentioned alphanumeric I thought of using "str.isalnum()", but there no additional chars can be added (as far as I know). Thus I would have had to consider regex - but then I was lazy enough to use my first idea... ;) But of course you are right! :)
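The regex variant mentioned here might look like the following Python 3 sketch. The directory comes from Platonides' suggestion; the names `SAFE_NAME` and `stylesheet_path` are hypothetical.

```python
import re

XSLT_DIR = "/home/drtrigon/xslt/"  # path from Platonides' suggestion
# ASCII letters, digits, '_' and '-' only; no '.', '/' or '\0' can pass.
SAFE_NAME = re.compile(r"[A-Za-z0-9_-]+")


def stylesheet_path(name):
    """Return the stylesheet path for a safe name, or None otherwise."""
    if not SAFE_NAME.fullmatch(name):
        return None
    return XSLT_DIR + name + ".xslt"


print(stylesheet_path("rss2html"))
print(stylesheet_path("../../dab/text"))
```

Using fullmatch() instead of match() matters: the whole value has to consist of allowed characters, not just its beginning.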
Spelling out the list of allowed values is always safer, but it is bothersome (I see you listed the folder instead).
"is always safer" is good news to me, and because "it is bothersome" I chose a way somewhere in between... :) (or maybe because - as mentioned - it was my first idea... ;))
Greetings and thanks for the feedback - was very helpful!
toolserver-l@lists.wikimedia.org