It appears that the precision setting for PHP may be set inconsistently across the WMF server pool.
PHP's internal floating point format is the same as a C double giving ~15 digits of precision. By default, current versions of PHP display 14 digits when rendering numbers for output. This is controlled by a "precision" setting in php.ini that defaults to 14.
Operations like {{#expr:1.11111111111222}} on Wikipedia usually display 14 digits, but every so often the parser will uniformly truncate all calls to #expr: to only 12 digits (i.e. for the example given, you get "1.11111111111" without the "22" at the end). The easiest way to understand this would be if a few of the servers have "precision=12" set in php.ini.
By performing tests like: {{#expr:1.11111111111222-1.11111111111}} it appears that in all cases the server is aware of those extra digits and able to perform operations involving them, but it simply chooses not to display them if it leads to too high a displayed precision. The fact that the servers all seem to operate on the digits correctly would suggest that this is a variation in software rather than some more fundamental variation in hardware, and supports my theory that this is caused by inconsistent configuration settings.
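The distinction between stored and displayed precision can be sketched in a few lines. This is a Python analogue, not PHP: the `render` helper is hypothetical, and it assumes that formatting to N significant digits behaves like PHP's "precision" setting for this value.

```python
# Python analogue of the behaviour described above (not PHP itself).
# Assumption: rendering with N significant digits mimics PHP's "precision" setting.

def render(x: float, precision: int) -> str:
    """Format x using at most `precision` significant digits."""
    return format(x, f".{precision}g")

value = 1.11111111111222

print(render(value, 14))  # 1.1111111111122
print(render(value, 12))  # 1.11111111111

# The stored double is identical in both cases; only the display differs.
# Removing the leading digits makes the "hidden" trailing digits visible again,
# even when rendering with only 12 significant digits.
difference = value - 1.11111111111
print(render(difference, 12))  # roughly 2.22e-12
```

Under that assumption, a server configured for 12 digits and one configured for 14 hold the same number internally and differ only when rendering it.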
Would someone be willing to check whether some of the servers are set to precision=12 and others to precision=14? I'm not sure if it is useful, but I made a point of capturing the "Served by" comment a couple times when I saw truncation to 12 digits. This included srv112 and srv176.
If there is variation in this PHP setting across the servers, then I think one or the other setting should be adopted universally unless there is some good reason not to. Since getting 14 digits seems to happen most of the time right now, that would seem to be the natural choice.
-Robert Rohde
On Feb 11, 2009, at 7:05 AM, Robert Rohde wrote:
It appears that the precision setting for PHP may be set inconsistently across the WMF server pool.
hi, are you still doing your school homework assignments with wikitext? get a calculator, already!
-- Domas Mituzas -- http://dammit.lt/ -- [[user:midom]]
The requested check of WMF’s server pool settings is so the {val} template can return consistent performance at 14 digits when delimiting high-precision numerical equivalencies and universal constants, which are today often expressed to 13 digits or more in physics and science.
See http://en.wikipedia.org/wiki/User:Greg_L/Val_microsandbox
About 80% of the time, {val} works to 14 digits. But about 20% of the time, only the 12-digit value doesn’t return an error message. Thus, as it currently stands, {val} can only be recommended for use up to 12 digits. Already, in just one en.Wiki article, [[Kilogram]], there is one 13-digit value that can’t use {val}.
[[User:Greg L]]
_______________________________________________ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
On Wed, Feb 11, 2009 at 7:25 AM, Domas Mituzas midom.lists@gmail.com wrote:
On Feb 11, 2009, at 7:05 AM, Robert Rohde wrote:
It appears that the precision setting for PHP may be set inconsistently across the WMF server pool.
hi, are you still doing your school homework assignments with wikitext? get a calculator, already!
Yes Domas, haha, because no one would ever want to write about math or high precision scientific measurements in an encyclopedia.
PHP has its limits. We know that. But the fundamental issue is that there are real applications whose output is changing depending on which server happens to do the parsing. For example, 80% of the time one might see the correct 13-digit expression for the number of atoms in a kilogram (formatted via template according to NIST standards for scientific notation) and 20% of the time the page gets a big red error message.
If the parser always failed, people would know they have to avoid the formatting templates and format the number by hand. However, when something fails only intermittently depending on which server happened to get the job, that is a problem that is a lot harder for average users to diagnose. I would generally think the server admins would want all of the servers to generate identical behavior, if at all practical.
Am I wrong in thinking that the server admins should care when different machines produce different output from the same code? In this case, the behavior suggests it may be as simple as ensuring that the servers have the same php.ini precision settings.
-Robert Rohde
On Wed, Feb 11, 2009 at 12:29 PM, Robert Rohde rarohde@gmail.com wrote:
Yes Domas, haha, because no one would ever want to write about math or high precision scientific measurements in an encyclopedia.
Holy crud! You don't use floating point for this! If you need deterministic behaviour and high accuracy you need to confine yourself to integer mathematics.
Sure, *Write about* high precision scientific measurements in Wikipedia, but don't use Wikipedia to *make them*.
[snip]
Am I wrong in thinking that the server admins should care when different machines produce different output from the same code? In this case, the behavior suggests it may be as simple as ensuring that the servers have the same php.ini precision settings.
Is there any reason to think that this is related to a PHP setting rather than being a result of differences in compiler decisions with respect to moving variables on and off the x87 stack and into memory, or the use of SSE? Or some libc difference in how the FPU rounding mode is set?
At 12 digits you are beyond the expected precision of single precision floating point, and not far from what you get with doubles. On x86 the delivered precision can vary wildly depending on the precise sequence of calculations and register spills. For code compiled without -ffast-math the former should be stable for a single piece of code, but the latter is anyone's guess.
On Wed, Feb 11, 2009 at 9:58 AM, Gregory Maxwell gmaxwell@gmail.com wrote:
On Wed, Feb 11, 2009 at 12:29 PM, Robert Rohde rarohde@gmail.com wrote:
Yes Domas, haha, because no one would ever want to write about math or high precision scientific measurements in an encyclopedia.
Holy crud! You don't use floating point for this! If you need deterministic behaviour and high accuracy you need to confine yourself to integer mathematics.
Actually PHP's maxint craps out at 10 digits. That's not currently the issue though. <snip>
Am I wrong in thinking that the server admins should care when different machines produce different output from the same code? In this case, the behavior suggests it may be as simple as ensuring that the servers have the same php.ini precision settings.
Is there any reason to think that this is related to a PHP setting rather than being a result of differences in compiler decisions with respect to moving variables on and off the x87 stack and into memory, or the use of SSE? Or some libc difference in how the FPU rounding mode is set?
<snip>
Yes, there is. Testing with various types of expressions supports the view that PHP is aware of the extra digits (out to the double limit of ~15 digits) and able to consistently perform math with them regardless of which server is processing the request. For example, if you perform operations that reduce the precision by removing the highest-order digits, then the lowest digits (that had previously been hidden) will reappear. So it appears the internal representations are the same and it is merely a matter of how the information is being output. Since the number of digits to output is governed by the PHP setting, that would seem to be the most likely cause.
-Robert Rohde
Hello!
Yes Domas, haha, because no one would ever want to write about math or high precision scientific measurements in an encyclopedia.
Since when do you need floating point math in a presentation language when you're writing about floating point math? There are plenty of topics that our encyclopedia covers that don't have tools (like physics/biology simulators) inside wikitext. Damnit, we write about all these people and we don't even have tools that would evaluate their genome and provide probabilistic matching and evaluation of article text based on that.
Am I wrong in thinking that the server admins should care when different machines produce different output from the same code?
This is exactly my point, you don't give a single valid reason for us to care, except some random "oh no, 12 numbers, oh no 14 numbers". Why should we care?
Ok Domas, you’re just not getting it here. Try going to this example page…
http://en.wikipedia.org/wiki/User:Greg_L/val_failure_example
Do you see all those numeric equivalencies in high-precision scientific notation? Those types of numbers appear in science articles all over Wikipedia. Do you see the ones that are generating error codes? Suppose they didn’t do that when you wrote the article but it randomly happens to others that view the page.
Robert is saying that if there is a software setting to make the servers perform the same way, then we should make them the same. Why? Because, imagine that editors use {val} and check (twice, or thrice) to make sure all looks good. But then 20% of the time, purely randomly based upon the server, a couple of numbers with 13 digits will generate error codes like those shown on that page. That’s what’s happening here. Roughly 80% of the time, 13- and 14-digit numbers are rendered properly. But 20% of the time, they aren’t.
Why not just advise editors that they should limit {val} to 12 digits? The issue is that 12 digits is just barely cutting it. Ten years ago, there were few numbers in physics that were measured to this precision. But they are becoming increasingly common, particularly anything related to time or length.
That’s why editors should care. If you don’t, that’s fine. But please don’t suggest that others shouldn’t care either.
Greg L
On Feb 11, 2009, at 1:13 PM, Domas Mituzas wrote:
Hello!
Yes Domas, haha, because no one would ever want to write about math or high precision scientific measurements in an encyclopedia.
Since when do you need floating point math in a presentation language when you're writing about floating point math? There are plenty of topics that our encyclopedia covers that don't have tools (like physics/biology simulators) inside wikitext. Damnit, we write about all these people and we don't even have tools that would evaluate their genome and provide probabilistic matching and evaluation of article text based on that.
Am I wrong in thinking that the server admins should care when different machines produce different output from the same code?
This is exactly my point, you don't give a single valid reason for us to care, except some random "oh no, 12 numbers, oh no 14 numbers". Why should we care?
On 2/10/09 9:05 PM, Robert Rohde wrote:
Would someone be willing to check whether some of the servers are set to precision=12 and others to precision=14? I'm not sure if it is useful, but I made a point of capturing the "Served by" comment a couple times when I saw truncation to 12 digits. This included srv112 and srv176.
Since page renderings are cached, that doesn't necessarily mean it's the server that rendered your page...
If there is variation in this PHP setting across the servers, then I think one or the other setting should be adopted universally unless there is some good reason not to. Since getting 14 digits seems to happen most of the time right now, that would seem to be the natural choice.
'precision = 12' is set, based on the defaults from PHP's own php.ini-dist, in the php.ini used on our remaining Fedora-based servers.
Our newer Ubuntu-based servers have no 'precision' setting in their php.ini files, so the default value will be used... PHP's internal default being 14. Yay for PHP configuration consistency! :)
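The two configurations described here can be modelled with a small sketch. It is written in Python purely for illustration: the `effective_precision` helper and the sample ini strings are hypothetical, and the fallback of 14 is taken from the statement above about PHP's internal default.

```python
import configparser

# Assumption (from the thread): PHP falls back to 14 when php.ini has no "precision" line.
PHP_INTERNAL_DEFAULT_PRECISION = 14

def effective_precision(ini_text: str) -> int:
    """Return the precision a server would use, given the text of its php.ini."""
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(ini_text)
    return parser.getint("PHP", "precision", fallback=PHP_INTERNAL_DEFAULT_PRECISION)

# Hypothetical stand-ins for the two kinds of server described above.
fedora_ini = "[PHP]\nprecision = 12\n"
ubuntu_ini = "[PHP]\n; no precision line here\n"

print(effective_precision(fedora_ini))  # 12
print(effective_precision(ubuntu_ini))  # 14
```

Either adding an explicit setting everywhere or removing it everywhere would make the two results agree.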
Additionally I've found that 5 servers recently moved from the search group to the application servers group were running PHP 5.1 still. I've taken these out of service pending upgrades.
-- brion
On Wed, Feb 11, 2009 at 10:47 AM, Brion Vibber brion@wikimedia.org wrote: <snip>
'precision = 12' is set, based on the defaults from PHP's own php.ini-dist, in the php.ini used on our remaining Fedora-based servers.
Our newer Ubuntu-based servers have no 'precision' setting in their php.ini files, so the default value will be used... PHP's internal default being 14. Yay for PHP configuration consistency! :)
<snip>
Thanks Brion.
It is nice to know I had the right understanding of the problem. :-)
Any thoughts on whether one or the other setting might be applied as a standard across all of the servers? Based on your comment about Fedora, should I understand that those servers may be replaced / upgraded eventually anyway?
-Robert Rohde
On 2/11/09 11:15 AM, Robert Rohde wrote:
Thanks Brion.
It is nice to know I had the right understanding of the problem. :-)
Any thoughts on whether one or the other setting might be applied as a standard across all of the servers? Based on your comment about Fedora, should I understand that those servers may be replaced / upgraded eventually anyway?
All the remaining Fedora boxes will be reinstalled with our current, consistent Ubuntu installation over the next few weeks -- we just need to ensure that all text external storage has been successfully migrated out of the app server cluster before we wipe them. :)
In the meantime I'm not in a real rush to tweak and redeploy config files on the old setup for a cosmetic issue of this sort; but yes, it'll go away in time.
-- brion
On Wed, Feb 11, 2009 at 3:45 PM, Brion Vibber brion@wikimedia.org wrote:
All the remaining Fedora boxes will be reinstalled with our current, consistent Ubuntu installation over the next few weeks -- we just need to ensure that all text external storage has been successfully migrated out of the app server cluster before we wipe them. :)
In the meantime I'm not in a real rush to tweak and redeploy config files on the old setup for a cosmetic issue of this sort; but yes, it'll go away in time.
That's great, and thanks for your attention.
-Robert Rohde
Ditto. Thanks Brion.
I agree with your assessment that there is no imperative whatsoever to do software fixes for something that will go away on its own in a few weeks. I will just advise editors that {val} may LOOK like it can do 13 and 14-digit numbers, but it can’t do them consistently and they should limit {val} to 12-digit significands. That will suffice for the vast majority of uses for {val}.
When the servers have all been upgraded to Ubuntu (I have a programmer friend with a triple-boot Mac with Ubuntu), then we can change the advice to editors about {val} being able to reliably handle 14 digits.
If you can remember, can you put a Post-it Note on your last Fedora box to e-mail me when you take it off-line?
[[User:Greg L]]
On Feb 11, 2009, at 4:05 PM, Robert Rohde wrote:
On Wed, Feb 11, 2009 at 3:45 PM, Brion Vibber brion@wikimedia.org wrote:
All the remaining Fedora boxes will be reinstalled with our current, consistent Ubuntu installation over the next few weeks -- we just need to ensure that all text external storage has been successfully migrated out of the app server cluster before we wipe them. :)
In the meantime I'm not in a real rush to tweak and redeploy config files on the old setup for a cosmetic issue of this sort; but yes, it'll go away in time.
That's great, and thanks for your attention.
-Robert Rohde
Greg L wrote:
If you can remember, can you put a Post-it Note on your last Fedora box to e-mail me when you take it off-line?
[[User:Greg L]]
You can watch bug 17452 yourself: https://bugzilla.wikimedia.org/show_bug.cgi?id=17452
On 2/10/09 10:47 AM, Brion Vibber wrote: <snip> Since page renderings are cached, that doesn't necessarily mean it's the server that rendered your page...
Brion, you can use ?action=purge or go to edit mode and hit “Show preview” to work around caching. Doing so demonstrates that about 20% of the time, the request is processed by a server that has some sort of setting that is different from the others.
In the final analysis, no real “math” is performed on these significands; all that is happening is delimiters (either a comma or a <span> gap) are being applied every three characters until there are only two, three, or four characters left. I am still rather surprised that there aren’t some simple, bug-free string functions that could tackle this task. This would allow significands of unlimited length.
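The grouping rule described above is easy to express on plain strings. The following Python sketch is only an illustration of that rule, not the actual {val} template; the function names and the space separator are assumptions.

```python
def group_digits(digits: str, sep: str = " ") -> str:
    """Split digits into groups of three, leaving a final group of two to four."""
    groups = []
    rest = digits
    # Peel off three characters at a time while more than four remain,
    # so the last group is never a single stranded digit.
    while len(rest) > 4:
        groups.append(rest[:3])
        rest = rest[3:]
    groups.append(rest)
    return sep.join(groups)

def delimit_fraction(number: str, sep: str = " ") -> str:
    """Apply the grouping to the digits after the decimal point of a numeric string."""
    whole, dot, fraction = number.partition(".")
    if not dot:
        return whole
    return whole + dot + group_digits(fraction, sep)

print(delimit_fraction("1.11111111111222"))  # 1.111 111 111 112 22
print(delimit_fraction("6.02214179"))        # 6.022 141 79
```

Because the input is never converted to a floating-point number, the length of the significand is limited only by the length of the string.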
[[User:Greg L]], greg_l_at_wikipedia@comcast.net
On Wed, Feb 11, 2009 at 2:17 PM, greg_l_at_wikipedia greg_l_at_wikipedia@comcast.net wrote: [snip]
In the final analysis, no real "math" is performed on these significands; all that is happening is delimiters (either a comma or a <span> gap)
Division is very much math and has significant implications on precision.
You might think that by dividing only in powers of 10 you are avoiding precision problems, but you would be wrong because the computer isn't doing math in base 10.
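A small Python sketch illustrates the point about base 10. It assumes IEEE 754 doubles, the same format the thread attributes to PHP; the variable names are only illustrative.

```python
from decimal import Decimal

# Division by a power of 10 in binary floating point.
binary_tenth = 1 / 10

# The same division carried out in base-10 arithmetic is exactly 0.1.
exact_tenth = Decimal(1) / Decimal(10)

# Decimal(float) converts the stored double exactly, exposing the rounding error.
stored = Decimal(binary_tenth)
print(stored)                 # a long expansion slightly above 0.1
print(stored == exact_tenth)  # False
print(0.1 + 0.2 == 0.3)       # False
```

One tenth has no finite binary expansion, so the stored value is only the nearest representable double, and digit extraction by repeated division inherits that error.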
are being applied every three characters until there are only two, three, or four characters left. I am still rather surprised that there aren't some simple, bug-free string functions that could tackle this task. This would allow significands of unlimited length.
There exist plenty of such string functions. Every proper programming language provides them. The parser functions, however, do not.
Gregory, others:
Division is very much math and has significant implications on precision.
Uhm… yeah, division is math; that much is not lost on an R&D engineer. What I mean by “in the final analysis” (what {val} *does* when delimiting numbers) is that no math services are performed for the editor/user, as there are with conversion templates. This is all about inserting span gaps every third character. What the template user does is input the value, and {val} inserts spaces. This entire {val}-like function can be handled with string functions acting on the significand as if its digits were just simple characters: numeric, Latin, Roman, Martian; it doesn’t matter.
There exist plenty of such string functions. Every proper programming language provides them. The parser functions, however, do not.
Bingo.
[[User:Greg L]]
On Feb 11, 2009, at 11:43 AM, Gregory Maxwell wrote:
On Wed, Feb 11, 2009 at 2:17 PM, greg_l_at_wikipedia greg_l_at_wikipedia@comcast.net wrote: [snip]
In the final analysis, no real "math" is performed on these significands; all that is happening is delimiters (either a comma or a <span> gap)
Division is very much math and has significant implications on precision.
You might think that by dividing only in powers of 10 you are avoiding precision problems, but you would be wrong because the computer isn't doing math in base 10.
are being applied every three characters until there are only two, three, or four characters left. I am still rather surprised that there aren't some simple, bug-free string functions that could tackle this task. This would allow significands of unlimited length.
There exist plenty of such string functions. Every proper programming language provides them. The parser functions, however, do not.