Hi! I would like to discuss an idea.
In MediaWiki it is not very convenient to do computing using wiki syntax. We have to use several extensions like Variables, Arrays, ParserFunctions and others. If there is a lot of computation, such as processing data received from Semantic MediaWiki, page construction becomes unacceptably slow. To work around this, yet another extension has to be written (e.g. Semantic Maps, which displays data from SMW on maps). There end up being a lot of these extensions, they do not work well with each other, and they are time-consuming to maintain.
I know about the extension Scribunto, but I think this problem can be solved in another, more natural way. I suggest using PHP code in wiki pages, in the same way it is used in HTML files. In this case, extensions can be unified. For example, get data from DynamicPageList, process it if necessary, and pass it to other extensions for display, such as Semantic Result Formats. This would give users more freedom for creativity.
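To make this concrete, here is a rough, purely illustrative sketch of the kind of code a user could write on a page under this proposal. The $pages array below is invented and merely stands in for data that an extension such as DynamicPageList might return; only standard PHP functions process it, and echo pushes wikitext back into the page.

    <?php
    // Illustrative sketch only: $pages stands in for data that an extension
    // such as DynamicPageList might return.
    $pages = array(
        array( 'title' => 'Lua',       'views' => 120 ),
        array( 'title' => 'PHP',       'views' => 340 ),
        array( 'title' => 'MediaWiki', 'views' => 95 ),
    );

    // Sort by view count, descending, with plain PHP.
    usort( $pages, function ( $a, $b ) {
        return $b['views'] - $a['views'];
    } );

    // Emit a wikitext bullet list.
    foreach ( $pages as $page ) {
        echo '* [[' . $page['title'] . ']]: ' . $page['views'] . " views\n";
    }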
In order to execute PHP code safely, I decided to try to build a controlled environment. I wrote it in pure PHP; it is lightweight and could be included in core in the future. It can be viewed as the extension Foxway. The first version is in the master branch. It gives an idea of what is possible in principle, and there is even something like a debugger. It does not work very quickly, so I decided to try to fix that in the develop branch. There I created two classes, Compiler and Runtime.
The first one processes PHP source code and converts it into a set of instructions that the Runtime class can execute very quickly. I took part of the code from the PHPUnit tests to check the performance. On my computer, pure PHP executes it on average in 0.0025 seconds and the Runtime class in 0.05 seconds. That is 20 times slower, but there is an opportunity to get even better results. I do not count the time spent in the Compiler class, because it only needs to run once, when a wiki page is saved. The data returned by this class can be serialized and stored in the database. Also, if all dynamic data is handled as PHP code, wiki markup can be converted to HTML on save and stored in the database. Thus, when a wiki page is requested from the server, it does not have to be rebuilt every time (I know about the cache): take the already prepared data (for Runtime and the HTML) and enjoy. A cache is certainly still necessary, but only for pages with dynamic data, and the lifetime of objects in it can be greatly reduced, since performance will be higher.
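Roughly, the intended flow looks like the following sketch. The compile() and run() method names are only placeholders for illustration; the actual Compiler and Runtime classes in the Foxway repository may expose a different API.

    <?php
    // Sketch only: compile() and run() are assumed method names.

    // On page save: turn the user's PHP source into instructions once,
    // serialize them and store the blob in the database with the page.
    $source       = '$sum = 2 + 3; echo $sum;';
    $instructions = Compiler::compile( $source );
    $blob         = serialize( $instructions );
    // ... store $blob in the database ...

    // On page view: no parsing and no compiling, just load and execute.
    $instructions = unserialize( $blob );
    echo Runtime::run( $instructions );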
I also have other ideas associated with the features this implementation provides. I have already made some steps in this direction and I think all of this is realistic and useful. I am not saying that Foxway is ready for use. It shows that this idea can work, and can work fast enough. It needs to be rewritten to make it easier to maintain, and I believe it can work even faster.
I did not invent anything new. We all use HTML + PHP. Wiki markup replaces difficult HTML and provides security, but what can replace the scripting language?
I would like to know your opinion: is this really useful, or am I wasting my time?
Best wishes. Pavel Astakhov (pastakhov).
How does this compare to the PECL runkit extension? Also have you benchmarked it against Scribunto? Because Scribunto does kind of the same exact thing except just with a different programming language (and Scribunto uses a native interpreter rather than one written in PHP).
--
Tyler Romeo
Stevens Institute of Technology, Class of 2016
Major in Computer Science
On Mon, Jan 13, 2014 at 3:58 AM, Pavel Astakhov pastakhov@yandex.ru wrote:
On Mon, 13 Jan 2014 12:59:18, Pavel Astakhov (pastakhov@yandex.ru) wrote:
Hi, Pavel! I implemented something similar before Scribunto was stable enough and deployed: http://svn.wikimedia.org/viewvc/mediawiki/trunk/extensions/QPoll/interpretat...
However, instead of creating an interpreter, I just used PHP's built-in tokenizer via token_get_all() to sanitize the code (to disallow some operators and calls), then eval()'ed it when the security check passed.
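A minimal sketch of that approach, assuming a hand-picked blacklist: the function names below are just an example, not the list QPoll actually uses, and evalIfSafe() is a made-up helper name for illustration.

    <?php
    // Tokenize the snippet with PHP's built-in tokenizer, reject anything on
    // the blacklist, and only then eval() it.
    function evalIfSafe( $code ) {
        $forbidden = array( 'exec', 'system', 'shell_exec', 'passthru' );
        foreach ( token_get_all( '<?php ' . $code ) as $token ) {
            if ( is_array( $token ) && $token[0] === T_STRING
                && in_array( strtolower( $token[1] ), $forbidden, true )
            ) {
                return false; // security check failed, refuse to run
            }
            if ( $token === '`' ) {
                return false; // backtick shell execution is also disallowed
            }
        }
        return eval( $code ); // check passed, execute the snippet
    }

    var_dump( evalIfSafe( 'return 2 + 3;' ) );        // int(5)
    var_dump( evalIfSafe( 'return system("ls");' ) ); // bool(false)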
I should probably convert everything to use Scribunto instead, because many people say that Lua has a more secure VM and is a better language in general, and its VM is also one of the fastest (only the JVM is faster). And Wikimedia needs Scribunto because they had high CPU load while executing some large templates on their servers.
However, it is of course a bit sad that PHP runkit is so outdated, abandoned and non-mainstream.
Dmitriy
From: Dmitriy Sintsov <questpc@rambler.ru>
Subject: Re: Is Foxway a right way?
Date: 2014-01-13 10:51:46 GMT
I implemented something similar before Scribunto was stable enough and deployed
My idea is different
... many people say that Lua ... better language in general ...
This is a very controversial statement. I use the PHP interpreter because MediaWiki is written in PHP, and PHP is more powerful.
From: Tyler Romeo <tylerromeo@gmail.com>
Subject: Re: Is Foxway a right way?
Date: 2014-01-13 09:37:46 GMT
How does this compare to the PECL runkit extension? Also have you benchmarked it against Scribunto? Because Scribunto does kind of the same exact thing except just with a different programming language (and Scribunto uses a native interpreter rather than one written in PHP).
I do not propose to improve what is already there. I am sure there is nothing faster than Lua, and it is the best choice.
Why is there a need for Lua? Because building an HTML page from wiki markup without Lua takes a long time. Why? Because wiki pages use a lot of function calls that work together very slowly. So be it. Can't we just cache everything? No: pages change frequently, and the cache is not unlimited.
I propose to discuss a new principle for building HTML pages from wiki markup:
1. Separate wiki markup from the functions, just as HTML is separated from PHP code. In that case, only the results of these functions need to be cached.
2. Let the functions work quickly. I have checked that this is possible.
I'm not trying to build a page quickly, I'm trying to do it very efficiently. That is my idea. Efficient use of resources gives a bigger win in speed.
On Mon, 13 Jan 2014 16:12:29, Pavel Astakhov (pastakhov@yandex.ru) wrote:
I implemented something similar before Scribunto was stable enough and deployed
My idea is different
I know. But interpreting is probably too slow for high-load wikis.
... many people say that Lua ... better language in general ...
This is a very controversial statement. I use the PHP interpreter because MediaWiki is written in PHP, and PHP is more powerful.
The Lua VM is not stack-based; its RAM usage can be easily controlled. The people behind Scribunto (Tim Starling and Victor Vasiliev) are better programmers than me; if they chose Lua, then it is really worth something. Also, Lua is used as a scripting language in a huge number of applications (games, scientific software), while PHP is not. The Lua VM was also a bit faster than both PHP and Python a year or so ago.
I do not propose to improve what is already there. I am sure there is nothing faster than Lua, and it is the best choice. Why is there a need for Lua? Because building an HTML page from wiki markup without Lua takes a long time. Why? Because wiki pages use a lot of function calls that work together very slowly. So be it. Can't we just cache everything? No: pages change frequently, and the cache is not unlimited.
I guess they have enough logged-in, non-anonymous users that not everything can always be cached. Also, I remember bad things could happen when the source of a template changed and lots of pages had to be regenerated; Scribunto probably reduced CPU load a lot in such cases. Do not forget they are not a small or medium-size wiki but a really huge one.
I propose to discuss a new principle for building HTML pages from wiki markup:
1. Separate wiki markup from the functions, just as HTML is separated from PHP code. In that case, only the results of these functions need to be cached.
2. Let the functions work quickly. I have checked that this is possible.
I'm not trying to build a page quickly, I'm trying to do it very efficiently. That is my idea. Efficient use of resources gives a bigger win in speed.
Maybe you could propose your extension for Google Summer of Code, or another similar experimental program; they announce such projects regularly. Dmitriy
I implemented something similar before Scribunto was stable enough and deployed
My idea is different
I know. But interpreting is probably too slow for high-load wikis.
Do not let the Foxway extension deceive you. It is a cheap intermediary between PHP functions and extensions written in PHP. It understands and executes commands as PHP code (nothing more than that). This lets you get data from one extension, process it with pure PHP functions and pass it on to another extension. Yes, there is some additional overhead, but it is small.
When you can (albeit slowly) orchestrate fast, strong extensions and use powerful functions in an arbitrary way, you can do anything you like and it will be quick. You do not pay much, but you get the ability to manage all of this easily.
... Lua VM also was a bit faster than both PHP and Python some year ago.
I know this, but if we rewrite MediaWiki in Lua, it will not work faster.
On 13.01.2014 18:17, Pavel Astakhov wrote:
Do not let the Foxway extension deceive you. It is a cheap intermediary between PHP functions and extensions written in PHP. It understands and executes commands as PHP code (nothing more than that). This lets you get data from one extension, process it with pure PHP functions and pass it on to another extension. Yes, there is some additional overhead, but it is small.
When you can (albeit slowly) orchestrate fast, strong extensions and use powerful functions in an arbitrary way, you can do anything you like and it will be quick. You do not pay much, but you get the ability to manage all of this easily.
Imagine such pages created by Wikipedia users. They could slow down, bring down or exploit the wiki by calling extensions directly. And English Wikipedia is one of the top sites in the world.
... Lua VM also was a bit faster than both PHP and Python some year ago.
I know this, but if we rewrite MediaWiki in Lua, it will not work faster.
Maybe a bit faster, but their point is different: they run Lua in a controlled, almost isolated environment (users cannot hack the wiki), while it is much harder to achieve that with PHP without interpretation. And PHP interpretation is probably too slow for the high-load, huge English Wikipedia. They are also a non-profit, so they probably cannot buy as many servers as, say, Facebook or Google. So reducing CPU load is important to them. Dmitriy
Hi, I have written an example of how my idea works [1].
I also have a benchmark against Lua [2]. Surprisingly, the performance of my solution is higher than Lua's. Maybe it is not a very good comparison, since that is not how it is meant to be used; if it is used for its intended purpose, the performance will be even higher.
[1] https://meta.wikimedia.org/wiki/Grants:IEG/Magic_expression#How_Magic_expres... [2] https://meta.wikimedia.org/wiki/Grants:IEG/Magic_expression#Compared_to_LUA_...
On 13.01.2014 22:37, Pavel Astakhov wrote:
On 13.01.2014 22:04, Dmitriy Sintsov wrote:
... exploit the wiki via calling extensions directly. ...
Extensions are never called directly. Never.
I see that my idea is hard to understand. I will try to explain everything in detail with examples.
Thank you for the discussion.
On 2014-01-13 6:17 AM, Pavel Astakhov wrote:
... Lua VM also was a bit faster than both PHP and Python some year ago.
I know this, but if we rewrite MediaWiki in Lua, it will not work faster.
Actually, not to nitpick... ok, no, yeah, I'm going to nitpick. Rewriting MediaWiki in any well-accepted programming language besides PHP would have an extremely good chance of making it faster (well, perhaps with the exception of our parser). I probably wouldn't pick Lua specifically as a target for rewriting; I'd probably pick Python or Node.js, bonus points if you use a flavor of Python that works async like gevent, Twisted, Tornado, etc.
We've spent a pile of development time on autoloading, message caching, etc., but no matter what we do there is ultimately a cost to re-doing the exact same setup of the environment over and over on every page view, even if the step of compiling a PHP script is optimized. And frankly this may even hold true with HipHop. PHP is the ONLY programming language I am aware of that inescapably forces you to be stuck with this. Every other language has some server that will handle requests with a callback instead of globals and a full CGI-style isolated script execution, and will re-use the same pre-initialized environment over and over. And some will even do this async, allowing the same pre-initialized environment to not only be re-used but also to handle multiple requests simultaneously.
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://danielfriesen.name/]
13.01.2014 23:06, Daniel Friesen wrote:
On 2014-01-13 6:17 AM, Pavel Astakhov wrote:
... Lua VM also was a bit faster than both PHP and Python some year ago.
I know this, but if we rewrite MediaWiki in Lua, it will not work faster.
Actually, not to nitpick... ok, no, yeah I'm going to nitpick. ...
I do not know. That goes far beyond the topic, beyond my understanding, and beyond what I can check. I am in favor of all programs being written in assembly language, but nobody listens to me. :-) It's like a holy war; I'm sorry.
On Mon, Jan 13, 2014 at 12:06 PM, Daniel Friesen <daniel@nadir-seen-fire.com> wrote:
Actually, not to nitpick... ok, no, yeah, I'm going to nitpick. Rewriting MediaWiki in any well-accepted programming language besides PHP would have an extremely good chance of making it faster (well, perhaps with the exception of our parser). I probably wouldn't pick Lua specifically as a target for rewriting; I'd probably pick Python or Node.js, bonus points if you use a flavor of Python that works async like gevent, Twisted, Tornado, etc.
This is not true. *Maybe* Python would be a good language to implement MediaWiki in, but recommending something like Node.js for a website that needs to scale as big as Wikipedia will just not work (not to mention it's a terrible decision to voluntarily program in JavaScript).
PHP's stateless per-request design is done on purpose, and it is not worse than a global state design; it's just different. PHP replaces global state with things like job queues and caches, and it tends to work pretty well. If anything it makes development easier because, not surprisingly, managing global state is incredibly difficult.
--
Tyler Romeo
Stevens Institute of Technology, Class of 2016
Major in Computer Science
On 13.01.2014 14:58, Pavel Astakhov wrote:
Do not consider Foxway a PHP interpreter. Consider it a faster and more powerful alternative to "Magic words".
Why is it faster?
1) Foxway compiles the source code ("Magic words" does not). It does not matter whether the source is PHP, Lua or wiki markup; it could be anything. What matters is that the compiled code is executed very quickly (only 20 times slower than pure PHP).
2) Foxway itself does not touch the data being operated on; it can be anything. (In "Magic words", all data exists only as strings.) This is safe, because Foxway can only pass the data along. All responsibility lies with the extensions that process the data. All built-in functions that process data are themselves built-in extensions and can be disabled or replaced by others; they were integrated only for convenience. Foxway uses the function "echo" to push data to the HTML page, either a pre-sanitized string or data that another extension supplies under its own responsibility (just like "Magic words").
Why is it more powerful?
1) All extensions can be unified. They will do one thing and do it well. Such extensions will be easier to maintain and develop. We all get a lot of freedom to choose what to use and how. This is the Unix philosophy.
2) A huge number of functions are already there; just use them.
3) It should look like a scripting language, and it does. "Magic words" look very confusing and become inconvenient as soon as you try to do more with them (see the sketch below).
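To illustrate point 3: where a template today needs a parser-function call with nested template parameters, such as {{#ifexpr: {{{a|0}}} + {{{b|0}}} > 100 | large | small }}, a script-style syntax could read roughly like the hypothetical sketch below. $a and $b stand in for the template parameters; this is only an illustration, not Foxway's actual syntax.

    <?php
    // Hypothetical script-style rewrite of the parser-function call above.
    $a = 60;
    $b = 70;
    echo ( $a + $b > 100 ) ? 'large' : 'small'; // prints "large"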
Why does Foxway understand PHP syntax and not another, such as Lua or Python? It could indeed understand any syntax, but I chose PHP for several reasons:
1) MediaWiki is written in PHP, and PHP provides lexical analysis of PHP source code and token generation.
2) PHP has many useful functions, and there is no better way to offer them than to behave like a PHP interpreter.
3) PHP is popular.
4) I like PHP :-)
How does this help to make MediaWiki faster?
1) "Magic words" are a bottleneck. I propose to replace them with the much faster Foxway.
2) Page inclusion is a bottleneck. We have started to use a huge number of templates. Many of them are used so often that they cause a huge load on the servers when they are modified. I suggest using the power of Foxway to reduce the number of templates and make them less heavily used.
What do you think about this? -- Pavel Astakhov