Please join for the following tech talk:
*Tech Talk:* New readership data: Some things we've been learning
recently about how Wikipedia is read
*Presenter:* Tilman Bayer
*Date:* March 18th, 2016
*Time:* 18:00 UTC
<http://www.timeanddate.com/worldclock/fixedtime.html?msg=Tech+Talk%3A+New+r…>
Link to live YouTube stream <http://www.youtube.com/watch?v=Qo4XIzCJZVs>
*IRC channel for questions/discussion:* #wikimedia-office
*Summary:* This talk will highlight various recent insights and new sources
of data on how readers read Wikipedia, going beyond the familiar pageview
numbers (that tell us which topics are popular and how overall traffic is
developing, but not e.g. which parts of articles are being read). While we
are still only beginning to understand some of these aspects, we now know
more than a year or two ago. The presentation is centered around data
analysis done by the Reading team, but will also include findings by other
WMF teams and by external researchers.
Hi all!
Over the last couple of months, I have worked on introducing a dependency
injection mechanism into MediaWiki core (don't fear, no auto-wiring). My
proposal is described in detail at <https://phabricator.wikimedia.org/T124792>
(yea, TL;DR - just read the top and search the rest if you have a question).
Before we discuss this again on IRC at the RFC meeting on Wednesday (March 23,
2pm PST / 22:00 CEST due to daylight confusion), I would like to invite you to
review the proposal as well as the patches that are up on gerrit. In particular,
any feedback would be appreciated on:
* Introduce top level service locator
<https://gerrit.wikimedia.org/r/#/c/264403/29>.
* Allow reset of global services <https://gerrit.wikimedia.org/r/#/c/270020/>
* WIP: Make storage layer services injectable.
<https://gerrit.wikimedia.org/r/#/c/267692/>
Perhaps also have a look at the documentation included in the change, in
particular the migration part:
<https://gerrit.wikimedia.org/r/#/c/264403/29/docs/injection.txt>
Before commenting on design choices on gerrit, please have a look at T124792 and
see whether I have written something about the issue in question there. I would
like to focus conceptual discussion on the RFC ticket on phabricator, rather
than on gerrit. On gerrit, we can talk about the implementation.
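To make the idea concrete: the proposal centers on a top-level service container from which code pulls fully wired services instead of reaching into global state. Here is a minimal, language-neutral sketch of that pattern in Python (the actual patches are PHP; the service names below are hypothetical, for illustration only):

```python
class ServiceContainer:
    """Minimal service locator: services are defined by factory
    callables and instantiated lazily, at most once each."""

    def __init__(self):
        self._factories = {}
        self._instances = {}

    def define(self, name, factory):
        """Register a factory; the factory receives the container so a
        service can declare its own dependencies explicitly."""
        self._factories[name] = factory

    def get(self, name):
        # Lazy instantiation: the factory runs on first access only.
        if name not in self._instances:
            self._instances[name] = self._factories[name](self)
        return self._instances[name]

    def reset(self):
        # Drop cached instances, e.g. between test cases
        # (cf. the "Allow reset of global services" patch).
        self._instances.clear()


services = ServiceContainer()
# Hypothetical service names, not the real MediaWiki ones:
services.define("LoadBalancer", lambda c: object())
services.define("RevisionStore", lambda c: {"db": c.get("LoadBalancer")})
```

Note there is no auto-wiring here: every dependency is spelled out in its factory, which is the spirit of the proposal.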
I very much want this to move forward. Perhaps we can even get the first bits of
this merged at the hackathon. So, criticize away!
Thanks for your help!
-- daniel
PS: phabricator event page (still blank, we'll fix that soon):
<https://phabricator.wikimedia.org/E66/27>
Hello all,
Google opened proposal submissions for GSoC 2016 a few hours ago.
Interested and eligible candidates should submit their proposals at
http://g.co/gsoc before the deadline of Friday, March 25 at 19:00 UTC.
Wikimedia evaluates your proposal's Phabricator task, but you are required
to have a copy of it in the GSoC portal too, to make sure it gets a
slot (if eligible). By March 25th, every application should have *2
mentors* connected with it, and should have a proposal copy in Phabricator
as well as the GSoC portal. Please mention the Phabricator task details in
your proposal, for convenience. If you are planning to apply, you should be
looking at
Life_of_a_successful_project#Coming_up_with_a_proposal
<https://www.mediawiki.org/wiki/Outreach_programs/Life_of_a_successful_proje…>
As of today, we have *8* projects featured for this round (strong idea + 2
mentors connected), and *13* projects missing one of the two mentors.
Interested in mentoring? See
https://phabricator.wikimedia.org/tag/possible-tech-projects/ and add
yourself as one.
The Outreachy round for May - August 2016 is open, with a deadline of *March
22, 2016*. Eligible applicants are advised to apply for *both* GSoC and
Outreachy, so that the project can still make it in case we miss a GSoC
slot for a strong applicant.
Thinking of motivating someone in your area to take part? Find
flyers and presentations here
<https://developers.google.com/open-source/gsoc/resources/media#logos_and_ar…>
for the GSoC 2016 round!
Thanks,
Tony Thomas <https://www.mediawiki.org/wiki/User:01tonythomas>
Home <http://www.thomastony.me> | Blog <http://blog.thomastony.me> |
ThinkFOSS <http://www.thinkfoss.com>
On Wed, Mar 23, 2016 at 1:06 PM, Federico Leva (Nemo) <nemowiki(a)gmail.com>
wrote:
> Dan Andreescu, 23/03/2016 15:58:
>
>>
>> *Clean-up:* Analytics data on dumps was crammed into /other with
>> unrelated datasets. We made a new page to receive current and future
>> datasets [3] and linked to it from /other and /. Please let us know if
>> anything there looks confusing or opaque and I'll be happy to clarify.
>>
>
> I assume the old URLs will redirect to the new ones, right?
>
Good question, we didn't change any old URLs actually, so if you're trying
to get to other/pagecounts-ez, other/pagecounts-raw and all that, they're
all still there, just linked to from /analytics. We did it this way
because we figured people had scripts that depended on those URLs. We
thought about moving and symlinking, but it's unlikely that we'll ever be
able to delete the other/** location anyway.
So mainly we just have a new page where we can do a better job of focusing
on the analytics datasets.
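For anyone with scripts pointed at those files, a hedged sketch of reading them: to my understanding each hourly dump file is plain text with one space-separated record per line, roughly `project page_title view_count byte_size` (please double-check against the current format docs before relying on this; the sample lines below are made up):

```python
from collections import namedtuple

PageviewRecord = namedtuple("PageviewRecord", "project title views size")

def parse_line(line):
    """Parse one line of an hourly pagecounts/pageviews file.
    Assumed format: 'project page_title view_count byte_size'
    (byte_size is reportedly always 0 in the newer pageviews files).
    """
    project, title, views, size = line.strip().split(" ")
    return PageviewRecord(project, title, int(views), int(size))

# Made-up sample lines, for illustration only.
sample = [
    "en Main_Page 1234 0",
    "de.m Berlin 56 0",
]
records = [parse_line(l) for l in sample]
```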
cc-ing our friends in research and wikitech (sorry I forgot initially)
> We're happy to announce a few improvements to Analytics data releases on
> dumps.wikimedia.org:
>
> * We are releasing a new dataset, an estimate of Unique Devices accessing
> our projects [1]
> * We are officially making available a better Pageviews dataset [2]
> * We are deprecating two older pageview statistics datasets
> * We moved Analytics data from /other to /analytics [3]
>
> Details follow:
>
>
> *Unique Devices:* Since 2009, the Wikimedia Foundation used comScore to
> report data about unique web visitors. In January 2016, however, we
> decided to stop reporting comScore numbers [4] because of certain
> limitations in the methodology; these limitations translated into
> misreported mobile usage. We are now ready to replace comScore numbers with
> the Unique Devices dataset [5][1]. While unique devices do not equal
> unique visitors, they are a good proxy for that metric, meaning that a major
> increase in the number of unique devices is likely to come from an increase
> in distinct users. We understand that counting uniques raises fairly big
> privacy concerns, so we use a very privacy-conscious way to count unique
> devices: it does not rely on any cookie by which your browsing history
> could be tracked [6].
>
> We invite you to explore this new dataset and hope it’s helpful for the
> Wikimedia community in better understanding our projects. This data can
> help measure the reach of Wikimedia projects on the web.
>
> *Pageviews:* This [2] is the best quality data available for counting the
> number of pageviews our projects receive at the article and project level.
> We've upgraded from pagecounts-raw to pagecounts-all-sites, and now to
> pageviews, in order to filter out more spider traffic and measure something
> closer to what we think is a real user viewing content. A short history
> might be useful:
>
> * pagecounts-raw: was originally maintained by Domas Mituzas and taken
> over by the analytics team. It was and still is the most used dataset,
> though it has some major problems. It does not count access to the mobile
> site, it does not filter out spider or bot traffic, and it suffers from
> unknown loss due to logging infrastructure limitations.
> * pagecounts-all-sites: uses the same pageview definition as
> pagecounts-raw, and so also does not filter out spider or bot traffic. But
> it does include access to mobile and zero sites, and is built on a more
> reliable logging infrastructure.
> * pagecounts-ez: is derived from the best data available at the time.
> So until December 2015, it was based on pagecounts-raw and
> pagecounts-all-sites, and now it's based on pageviews. This dataset is
> great because it compresses very large files without losing any
> information, still providing hourly page and project level statistics.
>
> So the new dataset, pageviews, is what's behind our pageview API and is
> now available in static files for bulk download back to May 2015. But the
> multiple ways to download pageview data are confusing for consumers, so
> we're keeping only pageviews and pagecounts-ez and deprecating the other
> two. If you'd like to read more about the current pageview definition,
> details are on the research page [7].
>
> *Deprecating:* We are deprecating the pagecounts-raw and
> pagecounts-all-sites datasets in May 2016 (discussion here:
> https://phabricator.wikimedia.org/T130656 ). This data suffers from many
> artifacts, lack of mobile data, and/or infrastructure problems, and so is
> not comparable to the new way we track pageviews. It will remain here
> because we have historical data that may be useful, but it will not be
> maintained or updated beyond May 2016.
>
> *Clean-up:* Analytics data on dumps was crammed into /other with
> unrelated datasets. We made a new page to receive current and future
> datasets [3] and linked to it from /other and /. Please let us know if
> anything there looks confusing or opaque and I'll be happy to clarify.
>
>
> [1] http://dumps.wikimedia.org/other/unique_devices
> [2] http://dumps.wikimedia.org/other/pageviews
> [3] http://dumps.wikimedia.org/analytics/
> [4] https://meta.wikimedia.org/wiki/ComScore/Announcement
> [5] https://meta.wikimedia.org/wiki/Research:Unique_Devices
> [6]
> https://meta.wikimedia.org/wiki/Research:Unique_Devices#How_do_we_count_uni…
> [7] https://meta.wikimedia.org/wiki/Research:Page_view
>
Hi wikitech-l,
After the discussion in analytics-l [1][2] and Phabricator [3], the
Analytics team added a small amendment [4] to Wikimedia's user-agent policy
[5] with the intention of improving the quality of WMF's pageview
statistics.
The amendment asks Wikimedia bot/framework maintainers to optionally add
the word *bot* (case insensitive) to their user-agents. With that, the
analytical jobs that process request data into pageview statistics will be
better able to identify traffic generated by bots, and thus to isolate
traffic originating from humans (the corresponding code is already in
production [6]). The convention is optional, because modifications to the
user-agent can be a breaking change.
Targets of this convention are bots/frameworks that can generate Wikimedia
pageviews [7] against Wikimedia sites and/or the API and are not for in-situ
human consumption. Not targets: bots/frameworks used to assist in-situ human
consumption, and bots/frameworks that are otherwise well known and
recognizable, like WordPress, Scrapy, etc. Note that many editing bots also
generate pageviews; for example, when a bot copies content from one page to
another, the source page is requested and a corresponding pageview is
generated.
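As a rough illustration of the convention: a user-agent containing the word "bot", in any casing, would be picked up by this kind of classification. A sketch of such a check (this is *not* the actual refinery code [6], and the tool name in the example UA is hypothetical):

```python
import re

# Case-insensitive check for "bot" anywhere in the user-agent,
# per the amended policy's optional convention.
_BOT_RE = re.compile(r"bot", re.IGNORECASE)

def looks_like_bot(user_agent):
    """Return True if the UA would be classified as bot traffic
    under a simple substring interpretation of the convention."""
    return bool(_BOT_RE.search(user_agent))

# A hypothetical compliant user-agent for a maintenance script,
# including a contact URL/email as the user-agent policy [5] asks:
ua = "MyWikiSyncBot/1.0 (https://example.org/mywikisync; ops@example.org)"
```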
Cheers!
[1] https://lists.wikimedia.org/pipermail/analytics/2016-January/004858.html
[2]
https://lists.wikimedia.org/pipermail/analytics/2016-February/004882.html
[3] https://phabricator.wikimedia.org/T108599
[4]
https://meta.wikimedia.org/w/index.php?title=User-Agent_policy&type=revisio…
[5] https://meta.wikimedia.org/wiki/User-Agent_policy
[6]
https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery…
[7] https://meta.wikimedia.org/wiki/Research:Page_view
--
*Marcel Ruiz Forns*
Analytics Developer
Wikimedia Foundation
Hello,
Here is the Discovery department's weekly status update.
* The completion suggester left beta and is now the default
search-as-you-type for all wikis (except Wikidata).
** http://blog.wikimedia.org/2016/03/17/completion-suggester-find-what-you-nee…
* Last week we enabled the Kartographer extension for Wikivoyage sites,
allowing users to add maps to wiki pages without any additional WMF Labs
tools or JavaScript tricks.
** A demo of Kartographer and VisualEditor integration can be found here:
http://vem3.wmflabs.org/wiki/Main_Page
This is our second week summarizing our work in this way and our first week
sharing it with wikitech-l. Feedback and suggestions are welcome.
Read the full update at the following link.
https://www.mediawiki.org/wiki/Discovery/Status_updates/2016_03_18
--
Yours,
Chris Koerner
Community Liaison - Discovery
Wikimedia Foundation
Hi Linxuan,
Thank you for your question:
>... What does the "reputation score" in the description refer to?
I've asked Priyanka to reply with her current design, but here is some
of the advice I gave her:
"Each reviewer needs, at a minimum, data indicating the number and
proportion of reviewers who have agreed with them. However, the third level
of tie-breaking review introduces an extra bit for each disagreement which
determines whether agreement or disagreement should be counted in their
favor. So, even if a given reviewer only agrees with 50% of the other
reviewers, the determination of the tie breaker in each case of
disagreement controls whether their reputation score ranges from 0% to
100%. (As too does the agreement proportion, which is unlikely to be
exactly 50%.)
"Do you want the reviewers to know their agreement ratios and reputation
scores? How might their behavior change if they are and aren't told those?
Could there ever be a case when you might want to withhold them? Would
there ever be a benefit from distorting them? How about displaying them as
a range instead of distorting or withholding them? That last possibility
seems superior to me. You might want to do that when you are unsure that
the precision of the mathematical values is near the accuracy of the
knowledge they represent. Do you want to be able to tell each reviewer the
responses which have contributed to defects in their reputation scores,
i.e., do you want them to know which disagreements were tie-broken against
their favor?"
Her reply at the time was:
"In case of two reviewers agreeing, we add a +1 to the reputation. In case
of disagreement, we seek the opinion of a 3rd reviewer. If A says Yes, B
says No and C says Yes to an edit, A and C will have an agreement ratio of
50% and reputation of 100%, whereas B will have an agreement ratio of 0
and reputation 0%? This would of course change as more edits are reviewed
by them."
I believe that is still an accurate description of the current design.
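Reading the two descriptions together, the example numbers fall out of a simple pairwise rule: an agreement always counts in a reviewer's favor, while a disagreement counts in their favor only if the tie-breaking majority sided with them. A sketch of that interpretation (my own reconstruction of the design, not Priyanka's actual code):

```python
from collections import Counter

def score_reviewers(verdicts):
    """verdicts: dict mapping reviewer -> verdict for one edit.
    Returns {reviewer: (agreement_ratio, reputation)}, both expressed
    as fractions of that reviewer's pairwise comparisons."""
    # Tie-breaking via the majority verdict among all reviewers.
    majority = Counter(verdicts.values()).most_common(1)[0][0]
    scores = {}
    for reviewer, verdict in verdicts.items():
        others = [v for r, v in verdicts.items() if r != reviewer]
        agreements = sum(1 for v in others if v == verdict)
        # A disagreement counts toward reputation only when the
        # tie-break matched this reviewer's verdict.
        favorable = agreements + sum(
            1 for v in others if v != verdict and verdict == majority)
        scores[reviewer] = (agreements / len(others),
                            favorable / len(others))
    return scores

# The example from the thread: A says Yes, B says No, C says Yes.
scores = score_reviewers({"A": "yes", "B": "no", "C": "yes"})
```

Under this rule A and C get an agreement ratio of 50% with reputation 100%, and B gets 0% on both, matching the example above.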
Finally, I regret that the GSoC program doesn't allow more than one student
per project.
Best regards,
Jim Salsman
Hi, I'm Li Linxuan, a second-year student from Peking University, China. I'm familiar with C/C++ and have experience using Python. I have also participated in some projects, including making games and game bots.
I am interested in the "Accuracy Review" project, but there is a note saying the estimated time for a senior contributor is 3 weeks. Other projects in the ideas list have similar estimated times. So should we complete more than one project during the three-month internship, or just one?
Thank you.
Sincerely,
Li Linxuan