Hi all,
If you use Hive on stat1002/1004, you may have seen a deprecation
warning when launching the hive client, saying that it is being replaced
by Beeline. The Beeline shell has always been available to use, but it
required supplying a database connection string every time, which was
pretty annoying. We now have a wrapper
<https://github.com/wikimedia/operations-puppet/blob/production/modules/role…>
script
set up to make this easier. The old Hive CLI will continue to exist, but we
encourage moving over to Beeline. You can use it by logging into the
stat1002/1004 boxes as usual, and launching `beeline`.
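For context, here is roughly what the difference looks like on the command line. This is only a sketch: the JDBC connection string below is a made-up placeholder, not the actual string the wrapper supplies.

```shell
# Before the wrapper: Beeline needed a full JDBC connection string
# (the hostname here is illustrative only):
beeline -u "jdbc:hive2://an-analytics-host.example:10000/default" -e "SHOW DATABASES;"

# With the wrapper script on stat1002/1004, an interactive session is just:
beeline

# and one-off queries can still be run non-interactively with -e or -f:
beeline -e "SHOW DATABASES;"
beeline -f my_query.hql
```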
There is some documentation on this here:
https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Beeline.
If you run into any issues using this interface, please ping us on the
Analytics list or #wikimedia-analytics or file a bug on Phabricator
<http://phabricator.wikimedia.org/tag/analytics>.
(If you are wondering stat1004 whaaat - there should be an announcement
coming up about it soon!)
Best,
--Madhu :)
We’re glad to announce the release of an aggregate clickstream dataset extracted from English Wikipedia
http://dx.doi.org/10.6084/m9.figshare.1305770 <http://dx.doi.org/10.6084/m9.figshare.1305770>
This dataset contains counts of (referer, article) pairs aggregated from the HTTP request logs of English Wikipedia. This snapshot captures 22 million (referer, article) pairs from a total of 4 billion requests collected during the month of January 2015.
This data can be used for various purposes:
• determining the most frequent links people click on for a given article
• determining the most common links people followed to an article
• determining how much of the total traffic to an article clicked on a link in that article
• generating a Markov chain over English Wikipedia
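As a sketch of that last use case: turning the dataset's (referer, article, count) rows into Markov transition probabilities is just a per-referer normalization. The rows below are made-up illustrative values in the dataset's shape, not real clickstream data:

```python
from collections import defaultdict

# Hypothetical rows in the clickstream shape: (referer, article, count).
# The real dataset also uses special referer labels such as "other-google"
# for external traffic; the counts here are invented for illustration.
rows = [
    ("London", "Hyde_Park", 30),
    ("London", "River_Thames", 10),
    ("other-google", "London", 500),
]

def transition_probabilities(rows):
    """Build a Markov transition table from (referer, article, count) rows:
    P(article | referer) = count / total clicks out of that referer."""
    totals = defaultdict(int)
    for referer, _, count in rows:
        totals[referer] += count
    probs = defaultdict(dict)
    for referer, article, count in rows:
        probs[referer][article] = count / totals[referer]
    return probs

probs = transition_probabilities(rows)
# probs["London"]["Hyde_Park"] == 0.75
```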
We created a page on Meta for feedback and discussion about this release: https://meta.wikimedia.org/wiki/Research_talk:Wikipedia_clickstream <https://meta.wikimedia.org/wiki/Research_talk:Wikipedia_clickstream>
Ellery and Dario
(including analytics@ public list)
Rafael:
As I think we have mentioned before, please be so kind as to e-mail analytics@
rather than individual people.
>Again, we are exclusively looking for the absolute number of Wikipedia
updates per year per county.
I think you mean "edits" to Wikipedia. If so, we are currently working on a
project that aims to deliver an estimate (likely an interval) of edits
happening in a country. We do not have an ETA for such a deliverable, but the
major project that you can follow is this one:
https://phabricator.wikimedia.org/T130256
You can ping us again by the end of next quarter (April 2017), by which time
we can probably give you more specific information.
Thanks,
Nuria
On Wed, Nov 2, 2016 at 12:53 PM, Rafael Escalona Reynoso <re32(a)cornell.edu>
wrote:
> Dear Dan,
>
> I hope you are doing fine and that you remember me. I am the lead
> researcher at The Global Innovation Index (GII). I contacted you last year
> searching for data on Wikipedia uploads per country. I believe that this
> request got assigned a task number at some point. Here is what I know:
>
>
>
>
>
> Can you please let me know of this request’s status?
>
>
>
> Also, if any legal issues seem to be obstructing the compilation of this
> data, can you please refer us to someone from your legal department to
> explore the possibility of tailoring a contract/confidentiality agreement
> between the GII and Wikimedia?
>
>
>
> Again, we are exclusively looking for the absolute number of Wikipedia
> updates per year per county. These used to be available via Wikimedia here:
>
> https://stats.wikimedia.org/wikimedia/squids/
>
>
>
> Hope to hear from you soon.
>
>
>
> Sincerely,
>
> Rafael Escalona Reynoso, PhD, MPA.
>
> Lead Researcher at The Global Innovation Index
>
> Samuel Curtis Johnson Graduate School of Management
>
> 207 Sage Hall
> Cornell University
> Ithaca, NY 14853-6201
>
> Phone: +1 (607) 262-0983
>
> Email: re32(a)cornell.edu
>
> http://www.johnson.cornell.edu
>
> On Mon, Apr 4, 2016 at 10:58 PM, Dan Andreescu <dandreescu(a)wikimedia.org>
> wrote:
>
> Yes, once this data is properly anonymized, it should continue to be
> released in the same shape. We just have to make sure it's properly safe
> first.
>
>
>
> *From: *Rafael Escalona Reynoso
>
> *Sent: *Monday, April 4, 2016 19:21
>
> *To: *Dan Andreescu
>
> *Cc: *Jordan Litner; sacha.wunschvincent(a)wipo.int
>
> *Subject: *RE: On Wikipedia edits archive per county.
>
>
>
> Dan,
>
> Thank you for the update. This is kind of what we were expecting. I have a
> follow-up question: Would the data be collected in the same fashion for
> subsequent years (2016, 2017, etc.)? Or will this be a single time
> exercise? Do let me know whenever you can.
>
>
>
> Best,
>
>
>
>
>
> Rafael Escalona Reynoso, PhD, MPA.
>
> Lead Researcher at The Global Innovation Index
>
> Samuel Curtis Johnson Graduate School of Management
>
> 207 Sage Hall
>
> Cornell University
>
> Ithaca, NY 14853-6201
>
>
>
> Phone 1: +1 (607) 262-0983
>
> Phone 2: +1 (607) 255-9245
>
> Email: re32(a)cornell.edu
>
> http://www.johnson.cornell.edu
>
>
>
>
>
>
> See www.globalinnovationindex.org
>
>
>
>
>
>
>
> *From:* Dan Andreescu [mailto:dandreescu@wikimedia.org]
> *Sent:* Monday, April 04, 2016 6:13 PM
>
> *To:* Rafael Escalona Reynoso
> *Cc:* Jordan Litner; sacha.wunschvincent(a)wipo.int
> *Subject:* Re: On Wikipedia edits archive per county.
>
>
>
> Hey Rafael,
>
>
>
> We haven't been able to prioritize this work yet. It's been moved here:
>
>
>
> https://phabricator.wikimedia.org/T131280
>
>
>
> It has two stakeholders but no resources to get it done due to privacy
> issues. So we won't be able to get the 2015 data cleaned up before your
> deadline. But we're meeting about it again tomorrow and we will still do
> it so you can have this data for either next year's report or an amendment.
>
>
>
> On Mon, Apr 4, 2016 at 11:49 AM, Rafael Escalona Reynoso <re32(a)cornell.edu>
> wrote:
>
> Dear Dan,
>
> Hope you are doing fine. Just a quick note to follow up on the Wikipedia
> data. When do you think this data will be available? We are about to close
> the model and would very much like to have 2015 data included. Let me know.
>
>
>
> Best,
>
>
>
> Rafael Escalona Reynoso, PhD, MPA.
>
> Lead Researcher at The Global Innovation Index
>
> Samuel Curtis Johnson Graduate School of Management
>
> 207 Sage Hall
>
> Cornell University
>
> Ithaca, NY 14853-6201
>
>
>
> Phone 1: +1 (607) 262-0983
>
> Phone 2: +1 (607) 255-9245
>
> Email: re32(a)cornell.edu
>
> http://www.johnson.cornell.edu
>
>
>
>
>
>
> See www.globalinnovationindex.org
>
>
>
>
>
>
>
> *From:* Dan Andreescu [mailto:dandreescu@wikimedia.org]
> *Sent:* Thursday, February 18, 2016 8:19 PM
>
>
> *To:* Rafael Escalona Reynoso
> *Cc:* Jordan Litner; sacha.wunschvincent(a)wipo.int
> *Subject:* Re: On Wikipedia edits archive per county.
>
>
>
> That's perfect, I added it to the request: https://phabricator.wikimedia.org/T127409
>
> On Thursday, February 18, 2016, Rafael Escalona Reynoso <re32(a)cornell.edu>
> wrote:
>
> Dan,
>
> These were my thoughts exactly. Let me then elaborate on the value of the
> report, the index and why we feel that Wikipedia data is essential.
>
>
>
> The report is co-published by Cornell University, INSEAD, and the World
> Intellectual Property Organization (WIPO, a specialized agency of the
> United Nations), with the collaboration of three Knowledge Partners: the
> Confederation of Indian Industry, du, and A.T. Kearney and IMP³rove –
> European Innovation Management Academy. Now in its ninth edition, the
> report has established itself as a premier reference among innovation
> metrics and as a tool to facilitate public-private dialogue and
> evidence-based policymaking.
>
>
>
> The Global Innovation Index (GII) is a ranking of 141 economies in terms
> of their innovation capabilities and results. A total of 79 metrics in the
> form of data-based indicators are at its core. These rich metrics can be
> used —on the level of the index, the sub-indices, or as individual
> variables—to monitor performance over time and to benchmark developments
> against their peers. These can also help study country profiles over time,
> and to identify their relative strengths and weaknesses from the rich and
> unique GII dataset.
>
>
>
> Each year the GII results are presented within the framework of a
> top-level international event:
>
> • 2013 Geneva, Switzerland at the Opening Session of
> the United Nations Economic and Social Council (ECOSOC) High-Level Segment,
> organized by WIPO;
>
> • 2014 Sydney, Australia in the context of the B20/G20
> preparations; and
>
> • 2015 London, United Kingdom before the Minister of
> Innovation and Industry.
>
>
>
> This year the launch is scheduled for the summer in Beijing, China
> preceding the preparations for the 2016 G20 summit.
>
>
>
> Recognizing the need for a broad horizontal vision of innovation
> applicable to developed and emerging economies alike, the GII includes
> indicators that go beyond the traditional measures such as expenditure in
> research and development. That said, an area that is of great relevance and
> limited to the GII is that of creative outputs. Within it, *Wikipedia
> monthly page edits (per million population 15-69 y/o)* is a key metric.
> This indicator, along with others that measure the number of generic
> top-level and country-code top-level domains and video uploads in YouTube,
> helps capture what we define as online creativity.
>
>
>
> Lastly, we believe that the GII can be an important vehicle to signal that
> Wikipedia is a critical lever to innovation and a factor contributing to a
> new understanding of the digital information landscape and innovation
> globally.
>
>
>
> Based on all the above, we would like to request that our petition to
> collect data on Wikipedia monthly page edits per country, reported
> quarterly per year be given priority within your tasks.
>
>
>
> Sincerely,
>
>
>
>
>
> Rafael Escalona Reynoso, PhD, MPA.
>
> Lead Researcher at The Global Innovation Index
>
> Samuel Curtis Johnson Graduate School of Management
>
> 207 Sage Hall
>
> Cornell University
>
> Ithaca, NY 14853-6201
>
>
>
> Phone 1: +1 (607) 262-0983
>
> Phone 2: +1 (607) 255-9245
>
> Email: re32(a)cornell.edu
>
> http://www.johnson.cornell.edu
>
>
>
>
>
>
> See www.globalinnovationindex.org
>
>
>
>
>
>
>
> *From:* Dan Andreescu [mailto:dandreescu@wikimedia.org
> <dandreescu(a)wikimedia.org>]
> *Sent:* Thursday, February 18, 2016 11:17 AM
> *To:* Rafael Escalona Reynoso
> *Cc:* Jordan Litner; sacha.wunschvincent(a)wipo.int
> *Subject:* Re: On Wikipedia edits archive per county.
>
>
>
> Where the request is coming from, with all due respect, does not matter.
> We aim to be neutral in how we make knowledge available (namely, we try to
> make it available to everyone, for free).
>
>
>
> But, we have to prioritize somehow, and that process definitely takes into
> consideration the value our work has to the world. So, if you tell me more
> about what this data could help you accomplish, we could use that to argue
> that prioritizing your request might save lives, serve the mission of open
> knowledge, etc.
>
>
>
> But to answer your other question directly, yes, a letter to Jimmy Wales
> would not have any effect on this priority process and might be seen by the
> community we serve as an attempt to circumvent our planning process.
>
>
>
> On Thu, Feb 18, 2016 at 10:54 AM, Rafael Escalona Reynoso <
> re32(a)cornell.edu> wrote:
>
> Dan,
>
> Let me share with you the following thought. I just had a call with the
> Dean at the business school here at Cornell, who is the creator of the
> Global Innovation Index (and my direct boss). I explained the situation
> with the Wikipedia uploads data and how methodological changes are now
> making it impossible for us to collect it in the fashion that we were used
> to. He mentioned that he is an acquaintance of Jimmy Wales and offered to
> send him a letter explaining what we need and the importance of the
> indicator for our index. My notion here is that the issue has more to do
> with a shortage of labor and quite a large backlog than with where the
> request is coming from. Also, I do not want the letter to come across as an
> imposition or to give the wrong message. Based on the above, would this
> letter help prioritize the collection of this data?
>
>
>
> Let me know what you think.
>
>
>
> Best,
>
>
>
> Rafael Escalona Reynoso, PhD, MPA.
>
> Lead Researcher at The Global Innovation Index
>
> Samuel Curtis Johnson Graduate School of Management
>
> 207 Sage Hall
>
> Cornell University
>
> Ithaca, NY 14853-6201
>
>
>
> Phone 1: +1 (607) 262-0983
>
> Phone 2: +1 (607) 255-9245
>
> Email: re32(a)cornell.edu
>
> http://www.johnson.cornell.edu
>
>
>
>
>
>
> See www.globalinnovationindex.org
>
>
>
>
>
> *From:* Dan Andreescu [mailto:dandreescu@wikimedia.org
> <dandreescu(a)wikimedia.org>]
> *Sent:* Wednesday, February 17, 2016 2:57 PM
> *To:* Rafael Escalona Reynoso
> *Cc:* Jordan Litner; sacha.wunschvincent(a)wipo.int
> *Subject:* Re: On Wikipedia edits archive per county.
>
>
>
> As much as I love to help a fellow Cornellian, we are too small of a team
> to create one-off solutions like that. We either publish it for everyone
> or no-one. But even if we did that, we'd still have a lot of work to check
> whether cross-referencing that data with other data wouldn't hurt privacy.
>
>
>
> What would help is if you filed a task in Phabricator and tagged it with
> the "Analytics" project, and described very precisely what data you need,
> at what time granularity, and what you need it for. We'll use that as
> proof that we need to prioritize the work sooner rather than later.
>
>
>
> On Wed, Feb 17, 2016 at 2:45 PM, Rafael Escalona Reynoso <re32(a)cornell.edu>
> wrote:
>
> Dan,
>
> One last thing. We also report scaled data from Google on YouTube uploads
> and, as you mention, they have to protect privacy. However, we prepare for
> them an Excel sheet where they simply need to upload the totals for each
> country (which we never get to see) and they report back to us exclusively
> the normalized scores (0-100) and rankings for all countries we request
> information for. Using this procedure it becomes impossible to
> reverse-engineer the raw values used to obtain these totals and – again –
> we never get to see the actual data. Is there a chance that we could
> establish a similar type of arrangement with Wikimedia? Let me know.
>
>
>
> Best,
>
>
>
> Rafael.
>
>
>
> *From:* Dan Andreescu [mailto:dandreescu@wikimedia.org
> <dandreescu(a)wikimedia.org>]
> *Sent:* Wednesday, February 17, 2016 2:23 PM
> *To:* Rafael Escalona Reynoso
> *Subject:* Re: On Wikipedia edits archive per county.
>
>
>
> Thank you for this, again. Sorry to pester you again, but do you
> know of any other data (from a source different from Google) where online
> activity could be measured? Any leads would be quite appreciated.
>
>
>
> mmm, not geolocated that I know of, and it's unlikely that you'd find
> that. Because either
>
>
>
> * an organization is for-profit, in which case they would sell that data
>
> * an organization is non-profit, in which case they'd likely need to
> protect their users and bump up against the same *hard* problems we did
>
>
>
> But I could be wrong, good luck in your search and do report back if you
> find any, and especially if you find approaches that help to protect
> privacy.
>
>
>
>
>
>
>
>
>
>
>
Hi all,
the webrequest and pageview_hourly tables on Hive contain the very
useful user_agent_map field, which stores the following data extracted
from the raw user agent (still available as a separate field):
device_family, browser_family, browser_major, os_family, os_major,
os_minor and wmf_app_version. (The Analytics Engineering team has
built a dashboard that uses this data and last month published a
popular blog post about it.) I understand it is mainly based on the
ua-parser library (http://www.uaparser.org/).
In contrast, the event capsule in our EventLogging tables only
contains the raw, unparsed user agent.
* Does anyone on this list have experience in parsing user agents in
EventLogging data for the purpose of detecting browser family, version
etc, and would like to share advice on how to do this most
efficiently? (In the past, I have written some expressions in MySQL to
extract the app version number for the Wikipedia apps. But it seems a
bit of a pain to do that for classifying browsers in general. One
option would be to export the data and use the Python version of
ua-parser, however doing it directly in MySQL would fit better into
existing workflows.)
* Assuming it is technically possible to add such a pre-parsed
user_agent_map field to the EventLogging tables, would other analysts
be interested in using it too?
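For what it's worth, here is a rough sketch of the regex route in Python. The patterns below are simplified assumptions of mine, nowhere near as thorough as ua-parser, but they illustrate why the ordering matters (Chrome user agents contain "Safari", and Edge user agents contain "Chrome"):

```python
import re

# A rough, regex-based classification of a few major browser families.
# Order matters: more specific families must be checked first, because
# Chrome UAs also contain "Safari" and Edge UAs also contain "Chrome".
BROWSER_PATTERNS = [
    ("IE", re.compile(r"MSIE (\d+)|Trident/.*rv:(\d+)")),
    ("Edge", re.compile(r"Edge/(\d+)")),
    ("Firefox", re.compile(r"Firefox/(\d+)")),
    ("Chrome", re.compile(r"Chrome/(\d+)")),
    ("Safari", re.compile(r"Version/(\d+).*Safari/")),
]

def classify(ua):
    """Return (browser_family, major_version) for a raw user agent string."""
    for family, pattern in BROWSER_PATTERNS:
        m = pattern.search(ua)
        if m:
            # Take whichever capture group actually matched.
            version = next(g for g in m.groups() if g is not None)
            return (family, int(version))
    return ("Other", None)

ua = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
      "(KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36")
# classify(ua) -> ("Chrome", 54)
```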
This came up recently with the Reading web team, for the purpose of
investigating whether certain issues are caused by certain browsers
only. But I imagine it has arisen in other places as well.
--
Tilman Bayer
Senior Analyst
Wikimedia Foundation
IRC (Freenode): HaeB
Dear Analytics Mailing List,
Recently while querying pageviews of various pages, I discovered that
the page whose title is a single hyphen character (i.e. with the title
"-", with URL <https://en.wikipedia.org/wiki/->, which redirects to
<https://en.wikipedia.org/wiki/Hyphen-minus>) receives an unusually high
number of pageviews under the Pageview API. Taking October 2015 as an
example, the page received 5.4 million pageviews during that month
according to the API:
<
https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/en.wikipedi…
>.
However, according to stats.grok.se (which was still operational in the
same month), the page received only 1209 pageviews:
<http://stats.grok.se/en/201510/->.
Looking at the tabulation of pageviews on Wikipedia Views, the increase
in pageviews for this page coincides with the change to the Pageview
API in July 2015:
<
http://wikipediaviews.org/displayviewsformultiplemonths.php?page=-&allmonth…
>.
As I understand it, page titles must be URL-encoded in the query,
but "-" is its own URL-encoding.
I looked at the API documentation but did not see this behavior listed,
so I am wondering where these numbers are coming from.
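The encoding point is easy to confirm with Python's standard library: "-" is an unreserved character, so percent-encoding leaves it unchanged and the API call really does ask for the page titled "-".

```python
from urllib.parse import quote

# "-" is unreserved per RFC 3986, so percent-encoding leaves it alone:
assert quote("-") == "-"

# Titles containing reserved characters, by contrast, do change:
assert quote("C++", safe="") == "C%2B%2B"
```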
Best regards,
Issa
Roland,
We have no internal information in this regard. I think there is work
that has been done on this topic by WMF researchers (Wikipedia's traffic
referrals from Google), and you are likely to get better pointers by asking on
the list than by pinging people individually. Please be so kind as to post your
question to the analytics@ public e-mail list (analytics(a)lists.wikimedia.org).
Please bear in mind that questions such as this one might not have easy and
fast answers, so you might get information that is technical in nature.
Thanks,
Nuria
On Mon, Nov 28, 2016 at 6:41 AM, Erik Zachte <erikzachte(a)infodisiac.com>
wrote:
> Hey Nuria,
>
>
>
> Roland Eisenbrand (see below for introduction) would like to know more
> about how Wikipedia's traffic from Google changed (or not) in recent years
> as a result of new Google services.
>
> Can you please answer or relay?
>
>
>
> Thanks!
>
>
>
> Erik
>
>
>
> *From:* Roland Eisenbrand [mailto:re@onlinemarketingrockstars.de]
> *Sent:* Monday, November 28, 2016 15:33
> *To:* Erik Zachte
> *Subject:* Re: Wikimedia Stats
>
>
>
> Hello Erik,
>
>
>
> thank you very much for the reply! I’m very interested in how Wikipedia’s
> Google traffic has developed. There were some reports in 2015 that Google
> is bringing fewer users to Wikipedia. Some people attributed this
> development to things like Google Instant Answers and the Google Knowledge
> Graph. I haven’t yet found a statement from Wikipedia on whether all of this
> is true, so I would love to speak to someone from you guys.
>
>
>
> Best,
>
>
>
> Roland
>
>
>
> Am 28.11.2016 um 15:30 schrieb Erik Zachte <erikzachte(a)infodisiac.com>:
>
>
>
> Hello Roland,
>
>
>
> Sorry for delay, I was away for a few days.
>
> Can you tell me a bit about what kind of questions you have?
>
> Depending on that I might relay your question to Nuria Ruiz Head of WMF
> Analytics Team.
>
>
>
> I am still involved on some traffic reports, but mostly on traffic by
> region, and traffic by wiki.
>
> Nuria and her colleagues do supply those and other data and may have a
> broader understanding of recent trends in general.
>
>
>
> Best regards,
>
> Erik Zachte
>
>
>
>
>
>
>
>
>
> *From:* Roland Eisenbrand [mailto:re@onlinemarketingrockstars.de
> <re(a)onlinemarketingrockstars.de>]
> *Sent:* Friday, November 25, 2016 9:46
> *To:* ericzachte(a)infodisiac.com
> *Subject:* Wikimedia Stats
>
>
>
> Hello Eric,
>
>
>
> I’m a German journalist writing for a large German online marketing blog.
> I was wondering if you would be willing to talk to me on the phone or via
> Skype about the development of Wikipedia’s traffic in the last two years.
>
>
>
> Best,
>
>
>
> Roland
>
> ________________
>
> Roland Eisenbrand
>
> Head of Content
>
>
>
>
>
>
>
>
> Ramp 106 GmbH
>
> Lagerstraße 36
>
> 20357 Hamburg
>
>
>
> re(a)onlinemarketingrockstars.de
>
> +49 40 20 93 10 869
>
> www.xing.to/eisenbrand
>
> www.omr.io
>
>
>
> Geschäftsführer: Christian Müller l Tobias Schlottke l Philipp Westermeyer
>
> Registergericht Hamburg, Nr: HRB113109
>
>
>
>
>
>
>
>
>
> ________________
>
> Roland Eisenbrand
>
> Head of Content
>
>
>
>
>
>
>
> Ramp 106 GmbH
>
> Lagerstraße 36
>
> 20357 Hamburg
>
>
>
> re(a)onlinemarketingrockstars.de
>
> +49 40 20 93 10 869
>
> www.xing.to/eisenbrand
>
> www.omr.io
>
>
>
> Geschäftsführer: Christian Müller l Tobias Schlottke l Philipp Westermeyer
>
> Registergericht Hamburg, Nr: HRB113109
>
>
>
>
>
>
Please comment on whether to approve the "Conflict of interest" section
of the draft Code of conduct for technical spaces. This section did not
reach consensus earlier, but changes were made seeking to address the
previous concerns.
You can find the section at
https://www.mediawiki.org/w/index.php?title=Code_of_Conduct/Draft&oldid=229…
You can comment at
https://www.mediawiki.org/wiki/Talk:Code_of_Conduct/Draft#Finalize_new_vers…
.
A position and brief comment is fine.
You can also send private feedback to conduct-discussion(a)wikimedia.org .
I really appreciate your participation.
Thanks again,
Matt Flaschen
Hi all,
Has anyone tried to find the frequency of acronyms used in AfD queues? Any
information about the deletion queue in any language is welcome, thanks.
This came up during a discussion about "encyclopedia worthiness" and how to
explain this concept to newbies.
Jane
OCG contains a "plaintext" backend which generates quite nice plain-text
versions of WP articles. Try clicking "create a book" in the enwiki
sidebar, "start book creator", go to some article, click "add this page to
your book" in the header then "show book", then change the format in the
drop down to "Word processor (plain text)" and click "download".
You can also take the "download as PDF" link, something like
https://en.wikipedia.org/w/index.php?title=Special:Book&bookcmd=render_arti…
and replace the 'writer=rdf2latex' part at the end with 'writer=rdf2text',
like:
https://en.wikipedia.org/w/index.php?title=Special:Book&bookcmd=render_arti…
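The writer swap can also be done mechanically. Here is a small standard-library Python sketch; the URL below is a shortened, illustrative version of the real render_article link, which carries more parameters:

```python
from urllib.parse import urlsplit, parse_qs, urlencode, urlunsplit

def swap_writer(url, new_writer):
    """Replace the 'writer' query parameter (e.g. rdf2latex -> rdf2text)."""
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    params["writer"] = [new_writer]
    query = urlencode(params, doseq=True)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query,
                       parts.fragment))

# Illustrative URL only; the actual render_article links are longer.
pdf_url = ("https://en.wikipedia.org/w/index.php?title=Special:Book"
           "&bookcmd=render_article&writer=rdf2latex")
text_url = swap_writer(pdf_url, "rdf2text")
# text_url now requests the plain-text writer instead of the PDF one.
```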
These tools can be used from the command-line, as described at
https://github.com/wikimedia/mediawiki-extensions-Collection-OfflineContent…
I hope that helps!
--scott
On Fri, Nov 18, 2016 at 3:15 AM, Reem Al-Kashif <reemalkashif(a)gmail.com>
wrote:
> Hi Scott,
>
> Thank you so much for your reply and offer to help with Parsoid. I used
> DizzyLogic as an easy parser to get Wikipedia articles' content stripped
> off the wiki markup. The results were in plain text files. I used it to
> parse the whole English and Arabic Wikipedia dumps back in January. It was
> easy to use because my coding knowledge is limited.
> I read the link you kindly provided about Parsoid and I think it can help
> me with parsing. However, I'm not sure how to start on testing this.
>
> Thank you :)
>
> Best,
> Reem
>
> On 11 November 2016 at 19:55, C. Scott Ananian <cananian(a)wikimedia.org>
> wrote:
>
>> It was removed from that article recently (19 Oct 2016:
>> https://www.mediawiki.org/w/index.php?title=Alternative_parsers&type=revision&diff=2265815&oldid=2247632) with the following
>> comment:
>>
>> "That link has been dead for over a year now as per this stackoverflow
>> comment: http://stackoverflow.com/questions/13546254/whats-a-fast-way-to-parse-a-wikipedia-xml-dump-for-article-content-and-populate"
>>
>> If you'd like to explain what you would have used DizzyLogic for, I'd
>> love to help you figure out how to use Parsoid to accomplish your goals.
>> It's an officially-supported WMF parser which has much better correctness
>> than any 'alternative' parser out there, implements a friendly API similar
>> to mwparserfromhell (see https://doc.wikimedia.org/Parsoid/master/#!/guide/jsapi), and has a well-documented AST (
>> https://www.mediawiki.org/wiki/Specs/HTML/1.2.1) which can be directly
>> fetched via the REST api (cf https://en.wikipedia.org/api/ ). I believe
>> dumps have also been planned, but I'm not sure what the current status is.
>> --scott
>>
>>
>> On Fri, Nov 11, 2016 at 7:57 AM, Reem Al-Kashif <reemalkashif(a)gmail.com>
>> wrote:
>>
>>> Hi Pine,
>>>
>>> Thank you for your reply. It is an alternative parser. I believe I first
>>> saw on MediaWiki (here
>>> <http://t.sidekickopen68.com/e1t/c/5/f18dQhb0S7lC8dDMPbW2n0x6l2B9nMJW7t5XZs7…>
>>> ).
>>>
>>> Best,
>>> Reem
>>>
>>> On 11 November 2016 at 09:47, Pine W <wiki.pine(a)gmail.com> wrote:
>>>
>>>> Was this something on Labs? If so, it might have been purged during one
>>>> of the Labs cleanups.
>>>>
>>>> Pine
>>>>
>>>>
>>>> On Tue, Nov 8, 2016 at 2:33 PM, Reem Al-Kashif <reemalkashif(a)gmail.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I'm just wondering if anybody knows what happened to DizzyLogic wiki
>>>>> parser? The website and program vanished. I used it in January 2016 so I
>>>>> know it was there at this time.
>>>>>
>>>>> Best,
>>>>> Reem
>>>>>
>>>>> --
>>>>>
>>>>> *Kind regards, Reem Al-Kashif*
>>>>>
>>>>> _______________________________________________
>>>>> Analytics mailing list
>>>>> Analytics(a)lists.wikimedia.org
>>>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>>
>>> *Kind regards, Reem Al-Kashif*
>>>
>>>
>>>
>>
>>
>> --
>> (http://cscott.net)
>>
>>
>>
>
>
> --
>
> *Kind regards, Reem Al-Kashif*
>
--
(http://cscott.net)