Okay. Thanks for making the extra effort.
-Robert
On Thu, Oct 29, 2015 at 6:05 AM, MZMcBride <z(a)mzmcbride.com> wrote:
Robert Rohde wrote:
Which, after substituting "display:none;" I think translates directly to
the regex search:
insource:/style[ ]*=[ ]*\"display:[ ]*none;[ ]*\"/i
That gives me 487 articles.
Almost, but not quite. You actually want this:
insource:/style[ ]*=[ ]*\"display:[ ]*none;?[ ]*\"/i
With the semicolon being made optional, the search results increase from
487 to 2,487 currently on the English Wikipedia. The normalization script
(<https://phabricator.wikimedia.org/P2229>) made the trailing semicolon
consistent, in addition to lowercasing and trying to account for strange
spacing. For whatever reason, "display: none;" is often written without
the trailing semicolon in main namespace pages on the English Wikipedia.
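To illustrate the effect of the optional semicolon, here is a minimal sketch using Python's `re` module. The two compiled patterns are Python approximations of the two insource: searches above (the on-wiki search uses a different regex dialect), and the sample fragments are invented for illustration, not taken from any article.

```python
import re

# Python approximations of the two search patterns quoted above.
# STRICT requires the trailing semicolon; OPTIONAL makes it optional (;?).
STRICT = re.compile(r'style[ ]*=[ ]*"display:[ ]*none;[ ]*"', re.IGNORECASE)
OPTIONAL = re.compile(r'style[ ]*=[ ]*"display:[ ]*none;?[ ]*"', re.IGNORECASE)

# Hypothetical wikitext fragments, made up for this example.
samples = [
    '<span style="display:none;">x</span>',    # with trailing semicolon
    '<span style="display: none">x</span>',    # no trailing semicolon
    '<span style = "display:none ">x</span>',  # odd spacing, no semicolon
]

strict_hits = [bool(STRICT.search(s)) for s in samples]
optional_hits = [bool(OPTIONAL.search(s)) for s in samples]

print(strict_hits)    # [True, False, False]
print(optional_hits)  # [True, True, True]
```

Only the first fragment satisfies the strict pattern, while all three satisfy the pattern with `;?`, which is consistent with the jump in search results once the semicolon is made optional.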
I was worried that I may have made a major coding mistake, so I re-ran my
script using this pattern:
pattern = r'style[ ]*=[ ]*"[ ]*display[ ]*:[ ]*none[ ]*;?[ ]*"'
The results are available here: <https://phabricator.wikimedia.org/P2255>.
Sixteen articles have over 1,000 instances of "display: none;" each! The
total is 142,176 instances of "display: none;" (normalized) in 2,507 main
namespace pages on the English Wikipedia, as of about 2015-10-02.
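For readers who want to reproduce this kind of tally, the following is a rough sketch of counting instances per page and in total. It is not the script from P2229 or P2255: the `count_instances` helper, the `pages` dictionary, and the case-insensitive flag are assumptions made for illustration; only the pattern string is taken from the message above.

```python
import re
from collections import Counter

# Pattern string as quoted above; IGNORECASE is an assumption standing in
# for the lowercasing that the normalization step is described as doing.
PATTERN = re.compile(
    r'style[ ]*=[ ]*"[ ]*display[ ]*:[ ]*none[ ]*;?[ ]*"',
    re.IGNORECASE,
)


def count_instances(pages):
    """Return a Counter mapping page title -> number of pattern matches.

    `pages` is a hypothetical dict of {title: wikitext}. Pages with no
    matches are left out of the result.
    """
    counts = Counter()
    for title, text in pages.items():
        n = len(PATTERN.findall(text))
        if n:
            counts[title] = n
    return counts


# Invented sample data; a real run would read page text from an XML dump.
pages = {
    "Example A": '<td style="display:none;">1</td><td style="display: none">2</td>',
    "Example B": '<span style=" display : none ; ">x</span>',
    "Example C": "no hidden content here",
}

counts = count_instances(pages)
total_instances = sum(counts.values())
total_pages = len(counts)

print(total_instances, "instances in", total_pages, "pages")
print(counts.most_common(1))
```

Keeping per-page counts rather than a single total makes it easy to spot outliers, such as pages with an unusually large number of instances.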
I am happy to agree that searching the XML should be better than the local
search tool, but I still find these numbers hard to reconcile.
After re-reviewing the code and re-running the script to focus on
"display: none;" specifically, there's strong evidence to suggest that the
numbers are accurate, if a bit surprising in some cases. :-)
MZMcBride
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l