Cross-posting. Please reply directly to Dan or use the wikitech-l list if
responding.
-Adam
---------- Forwarded message ----------
From: Dan Garry <dgarry(a)wikimedia.org>
Date: Mon, Jun 8, 2015 at 3:54 PM
Subject: [Wikitech-l] Feedback requested on our search APIs
To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
Do you use our search API? If so, I'd like to hear from you!
The Discovery Department
<https://wikimediafoundation.org/wiki/Staff_and_contractors#Discovery> at
the Wikimedia Foundation is tasked with building a path of discovery to
relevant and trusted knowledge. In line with that, one of our primary
responsibilities is to ensure that our search APIs are stable, fast, and
easy to use. We'd love to hear from the people who are using our APIs, so
we can learn what you love about them, what frustrates you, and what we can
do to improve them for you.
I'd prefer that you keep the comments focused on the API itself rather than
the relevance of the results it returns; I plan to start a separate thread
about result relevance, since they're separate topics.
If you have some feedback, please reply in this thread or reach out to me
privately.
Thanks!
Dan
--
Dan Garry
Product Manager, Discovery
Wikimedia Foundation
Mavericks has something similar:
I wonder what will happen if one of our breaking API changes hits one of
these queries... They are probably running a caching proxy service to avoid
any of those problems.
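For what it's worth, if they wanted a plain-text, lead-section-only payload
from us, our TextExtracts API produces roughly that shape. A hedged sketch
(parameter choices are illustrative; no claim this is what Kindle actually
calls):

# Illustrative sketch: fetch a stripped-down, first-section-only preview
# like the Kindle card in the quoted thread, via the TextExtracts API
# (prop=extracts). "Fish" is just an example title.
import json
from urllib.parse import urlencode
from urllib.request import urlopen

params = urlencode({
    "action": "query",
    "format": "json",
    "prop": "extracts",
    "exintro": 1,       # lead section only
    "explaintext": 1,   # plain text: links, refs, and markup stripped
    "titles": "Fish",
})
with urlopen("https://en.wikipedia.org/w/api.php?" + params) as resp:
    data = json.load(resp)

page = next(iter(data["query"]["pages"].values()))
print(page["extract"][:300])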
On Fri, Jun 5, 2015 at 7:48 PM, Corey Floyd <cfloyd(a)wikimedia.org> wrote:
> @Jon you can also 3-finger click to get that popover - or if you have a
> fancy new Force Touch trackpad you can just push really hard.
>
> On Fri, Jun 5, 2015 at 1:34 PM, Brian Gerstle <bgerstle(a)wikimedia.org>
> wrote:
>
>> Our users also *really* want popovers (we have a 1-star review on our
>> current version in the US App Store complaining that we don't have link
>> preview yet).
>>
>> On Fri, Jun 5, 2015 at 1:32 PM, Jon Katz <jkatz(a)wikimedia.org> wrote:
>>
>>> I love this feature and it has changed how I read. Do we know of any
>>> browser extensions that do the same? Yosemite has a native spotlight
>>> built in that works in any browser (I'm using Chrome), but it is hard to
>>> discover (command-ctrl-d).
>>>
>>> Meta screenshot:
>>> [image: Inline image 2]
>>>
>>>
>>>
>>> On Fri, Jun 5, 2015 at 9:49 AM, Luis Villa <lvilla(a)wikimedia.org> wrote:
>>>
>>>> FWIW, they are also doing basically the same thing in the e-ink
>>>> hardware Kindles.
>>>>
>>>> On Fri, Jun 5, 2015 at 8:25 AM, Dmitry Brant <dbrant(a)wikimedia.org>
>>>> wrote:
>>>>
>>>>> +mobile-l
>>>>>
>>>>>
>>>>> On Fri, Jun 5, 2015 at 11:23 AM, Adam Baso <abaso(a)wikimedia.org>
>>>>> wrote:
>>>>>
>>>>>> Okay to move this to mobile-l?
>>>>>>
>>>>>>
>>>>>> On Friday, June 5, 2015, Brian Gerstle <bgerstle(a)wikimedia.org>
>>>>>> wrote:
>>>>>>
>>>>>>> While they strip out links/citations, they do preserve text
>>>>>>> formatting (italics & bold).
>>>>>>>
>>>>>>> On Fri, Jun 5, 2015 at 10:39 AM, Bernd Sitzmann <bernd(a)wikimedia.org
>>>>>>> > wrote:
>>>>>>>
>>>>>>>> Nice find. I also like being able to swipe those cards left/right
>>>>>>>> between different information sources. Looks like depending on the selected
>>>>>>>> words it's: Dictionary, Wikipedia, Translation
>>>>>>>>
>>>>>>>> On Thu, Jun 4, 2015 at 10:45 PM, Dmitry Brant <dbrant(a)wikimedia.org
>>>>>>>> > wrote:
>>>>>>>>
>>>>>>>>> I was using the Kindle app on the plane today, and I noticed a few
>>>>>>>>> interesting things, including this:
>>>>>>>>> device-2015-06-04-225651.png
>>>>>>>>> <https://docs.google.com/a/wikimedia.org/file/d/0BzcksMsMNpY1SzA3bHY4WF9hM1U…>
>>>>>>>>> When highlighting a word or phrase, the user is presented with a
>>>>>>>>> definition of the word from Wikipedia. The content is presented in a native
>>>>>>>>> component, with only the first section of text shown (all links,
>>>>>>>>> references, infoboxes, etc. are stripped out). (I wonder what API they're
>>>>>>>>> using?)
>>>>>>>>>
>>>>>>>>> It looks very similar to the link preview prototypes we've been
>>>>>>>>> developing in our apps, and it's very telling that the Kindle app has such
>>>>>>>>> a feature, since it helps emphasize the usefulness of this feature in any
>>>>>>>>> kind of "reader" app. Perhaps, in addition to link previews, we may also
>>>>>>>>> want to think about allowing users to highlight words and show definitions
>>>>>>>>> (from Wiktionary?), pronunciations, translations, etc...
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> p.s. I was able to get the Kindle app to crash by clicking a link
>>>>>>>>> inside one of the Wikipedia "previews" that wasn't stripped out correctly.
>>>>>>>>> In other words, no app is safe from the edge cases of wikitext!
>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> EN Wikipedia user page:
>>>>>>> https://en.wikipedia.org/wiki/User:Brian.gerstle
>>>>>>> IRC: bgerstle
>>>>>>>
>>>>>>
>>>>
>>>>
>>>> --
>>>> Luis Villa
>>>> Sr. Director of Community Engagement
>>>> Wikimedia Foundation
>>>> *Working towards a world in which every single human being can freely
>>>> share in the sum of all knowledge.*
>>>>
>>
>>
>> --
>> EN Wikipedia user page: https://en.wikipedia.org/wiki/User:Brian.gerstle
>> IRC: bgerstle
>>
>
>
> --
> Corey Floyd
> Software Engineer
> Mobile Apps / iOS
> Wikimedia Foundation
>
Hi team -
There was a question yesterday if we had caching headers in the APIs
consumed by the apps.
I did a quick cURL on a couple of URLs for Android and iOS, and it seems
like requests are served cold from the Varnish cache (as with many API
responses), even if there may be some memcached/Redis hot storage at the
web server for the underlying content.
Does anyone know if there's a safe approach to implementing caching headers
(e.g., Cache-Control, Expires, ETag, etc.) to optimize for hot cache hits
on the API endpoints used by the apps and experimental webapps? In
particular, would we benefit from doing so on action=mobileview? Is there a
way to canonicalize the request parameters to increase the odds of a cache
hit regardless of whether iOS or Android made the first hit (either
client-side, as in the sketch below the cURL output, or perhaps at the edge
with some VCL)? What about other API endpoints?
-Adam
$ curl -s -D - "http://en.m.wikipedia.org/w/api.php?action=mobileview&format=json&page=Fish…" -o /dev/null | grep -v GeoIP
HTTP/1.1 200 OK
Server: Apache
X-Powered-By: HHVM/3.6.1
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
Vary: Accept-Encoding,X-Forwarded-Proto,Cookie
Content-Type: application/json; charset=utf-8
X-Varnish: 2886917462, 1761146385
Via: 1.1 varnish, 1.1 varnish
Transfer-Encoding: chunked
Date: Wed, 03 Jun 2015 13:20:08 GMT
Age: 0
Connection: keep-alive
X-Cache: cp1046 miss (0), cp1046 frontend miss (0)
Set-Cookie: WMF-Last-Access=03-Jun-2015;Path=/;HttpOnly;Expires=Sun, 05 Jul 2015 12:00:00 GMT
Cache-Control: private, s-maxage=0, max-age=0, must-revalidate
$ curl -s -D - "https://en.m.wikipedia.org/w/api.php?action=mobileview&format=json&noheadin…" -o /dev/null | grep -v GeoIP
HTTP/1.1 200 OK
Server: nginx/1.6.2
Date: Wed, 03 Jun 2015 13:22:23 GMT
Content-Type: application/json; charset=utf-8
Transfer-Encoding: chunked
Connection: keep-alive
X-Powered-By: HHVM/3.6.1
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
Vary: Accept-Encoding,X-Forwarded-Proto,Cookie
X-Varnish: 2583358918, 1834266080
Via: 1.1 varnish, 1.1 varnish
Age: 0
X-Cache: cp1047 miss (0), cp1060 frontend miss (0)
X-Analytics: https=1
Set-Cookie: WMF-Last-Access=03-Jun-2015;Path=/;HttpOnly;Expires=Sun, 05 Jul 2015 12:00:00 GMT
Cache-Control: private, s-maxage=0, max-age=0, must-revalidate
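To illustrate the client-side option mentioned above, here is a rough
sketch of parameter canonicalization (the helper is hypothetical, not
existing app code): sort the parameters before building the URL, so both
apps generate byte-identical requests and therefore the same cache key.

# Hypothetical sketch: canonicalize API request parameters so that iOS and
# Android build byte-identical URLs and therefore share a cache object.
from urllib.parse import urlencode

def canonical_api_url(host, params):
    # Sorting the keys makes parameter order irrelevant, and urlencode
    # gives consistent percent-encoding across clients.
    ordered = sorted((k, str(v)) for k, v in params.items())
    return "https://%s/w/api.php?%s" % (host, urlencode(ordered))

# The same mobileview request, with parameters in different orders:
ios_params = {"action": "mobileview", "format": "json", "page": "Fish"}
android_params = {"page": "Fish", "format": "json", "action": "mobileview"}

assert (canonical_api_url("en.m.wikipedia.org", ios_params)
        == canonical_api_url("en.m.wikipedia.org", android_params))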
Migrating thread to mobile-l.
On Thu, Jun 4, 2015 at 11:23 AM, Tilman Bayer <tbayer(a)wikimedia.org> wrote:
> See https://phabricator.wikimedia.org/T66930 and the blocking tasks there
> for the previous conversation on Twitter cards.
>
> On Thu, Jun 4, 2015 at 7:52 AM, Brian Gerstle <bgerstle(a)wikimedia.org>
> wrote:
>
>> Indeed, you're right:
>>
>> <head>
>>
>> ...
>>
>>
>>
>> <meta name="twitter:card" content="summary">
>>> <meta name="twitter:site" content="@NSHipster">
>>> <meta name="twitter:creator" content="@mattt">
>>>
>>> ...
>>
>>
>> Seems a bit inefficient to denormalize page content into the <meta> tags:
>>
>> <meta name="description" content="Reflection in Swift is a limited
>>> affair, providing read-only access to a subset of type metadata. While far
>>> from the rich array of run-time hackery familiar to seasoned Objective-C
>>> developers, Swift's tools enable the immediate feedback and sense of
>>> exploration offered by Xcode Playgrounds. This week, we'll reflect on
>>> reflection in Swift, its mirror types, and `MirrorType`, the protocol that
>>> binds them together.">
>>
>>
>> and
>>
>> <meta name="twitter:description" content="Reflection in Swift is a
>>> limited affair, providing read-only access to a subset of type metadata.
>>> While far from the rich array of run-time hackery familiar to seasoned
>>> Objective-C developers, Swift's tools enable the immediate feedback and
>>> sense of exploration offered by Xcode Playgrounds. This week, we'll reflect
>>> on reflection in Swift, its mirror types, and `MirrorType`, the protocol
>>> that binds them together.">
>>
>>
>> However, it does seem like more validation for the annotated HTML approach.
>>
>>
>> On Thu, Jun 4, 2015 at 10:47 AM, Adam Baso <abaso(a)wikimedia.org> wrote:
>>
>>> http://nshipster.com/mirrortype/ seems to be using Twitter cards. To a
>>> point raised in the brainstorming Etherpad and in Corey's message (I
>>> think it was yesterday), support for de facto metadata and deep linking
>>> would be rad (sounds like a good opportunity to have a consistent
>>> service for images, dare I say).
>>>
>>> -Adam
>>>
>>> On Thu, Jun 4, 2015 at 7:29 AM, Brian Gerstle <bgerstle(a)wikimedia.org>
>>> wrote:
>>>
>>>> Yay alliteration! Check out this informative use of HTML extraction in
>>>> the field:
>>>>
>>>> [image: Inline image 1]
>>>>
>>>> Not sure what "conventions" there are when structuring HTML to make it
>>>> "scraper friendly," but Twitter seems to be grabbing & restyling the h1 & p
>>>> tags.
>>>>
>>>
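As an aside on the scraping angle above, pulling those tags out is
straightforward; a stdlib-only sketch (the parser class and printed keys
are just for illustration, roughly what a card-rendering scraper has to do):

# Illustrative sketch: collect <meta name="twitter:*"> tags from a page's
# <head>, like the ones quoted in the thread above.
from html.parser import HTMLParser
from urllib.request import urlopen

class TwitterCardParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.cards = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = attrs.get("name") or ""
        if name.startswith("twitter:"):
            self.cards[name] = attrs.get("content", "")

with urlopen("http://nshipster.com/mirrortype/") as resp:
    html = resp.read().decode("utf-8", errors="replace")

parser = TwitterCardParser()
parser.feed(html)
print(parser.cards.get("twitter:card"))         # "summary", per the thread
print(parser.cards.get("twitter:description"))  # the denormalized summary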
(removing internal list, we don't need it)
Team,
Summary of message:
* Reading web engineers should take on Reading-oriented api.php work and,
if they like and as appropriate, Node.js work
* App engineers who come forward to work on Reading-oriented api.php or
Node.js work are appreciated
* Reading Infra (Bryan/Brad/Gergo) is mandatory for Q4 code review of
API/Core; for Q1, please engage Reading Infra for up-front consulting and
code review for API/Core
Now, the full message:
On another thread there was some discussion around the transfer of API
features from Max to Reading web engineers. Concurrently, we've been
cataloging pent-up demand for api.php and Node.js API/Services-style work.
https://www.mediawiki.org/wiki/Wikimedia_Reading/Mobile_needs_from_MediaWik…
Bryan, Toby, and I met yesterday to discuss the engagement model for
Reading web/apps and Reading Infrastructure. The outcome of that discussion
was carried into the standing Reading API/Services tri-weekly meeting
afterward, where there was basic agreement on an approach going forward.
Last evening, I discussed this with Jon Katz and Kristen as well. And the
Reading web/app leads met this morning to socialize further. Here's where
the thinking is:
* Generally, Reading web engineers will need to take on Reading-oriented
api.php work or, if they think it's the right fit for a given problem,
Node.js service work, or both. App engineers should come forward if they're
interested in fuller-stack development - for example, Bernd is already
writing Node.js code, and it sounds like Brian may be interested in pairing
to tackle Text Extracts api.php work related to Share a Fact.
* Q4: Reading Infrastructure (Bryan, Brad, Gergo) is available for code
review of API and MediaWiki core-related patches. Otherwise, it is focused
on a core set of problems slated for Q4 (also, Gergo is half time on web
engineering for this quarter, although he'll taper off).
* Q1 and beyond: Reading Infrastructure available to Reading engineers for
** Up front consulting on API and MediaWiki core-related work. It's
critical to avoid missteps up front.
** Code review of API and MediaWiki core-related patches
** Approximately one api.php task implementation per Reading Infrastructure
resource per quarter for the web/apps area. In the API/Services meeting we
talked about perhaps something like the pageimages- and
createaccount-related Phabricator tasks as good candidates for Q1. The
purpose of this work is to demonstrate good examples of how work can be
implemented, incrementally increase velocity, and provide an opportunity
for web/app engineers to pair up with their Reading Infrastructure peers.
This model mirrors the model of the Services team (not part of Reading) for
consulting, code review, and software architecture scaffolding. As the
Services team focuses much of its time on broadly available software
architecture, so too does the Reading Infrastructure team plan to focus
most of its work there.
Practically speaking, it's hard to think at all levels of the stack. But we
have an extremely talented team, and an engineer can deepen their
experience tier by tier over time.
In Reading web engineering there's a level of familiarity and expertise
with PHP - and where there's room for improvement, I'm confident
professional growth goals can close the gap. As with HTML, CSS, and
JavaScript, in the Wikimedia development environment PHP - and the
MediaWiki-oriented way of coding it - is critical to delivering high
quality user experiences. We fully support and encourage pursuit of
training / dedicated study in this area.
With respect to api.php and extension code hooked into it: a really key
part of writing this sort of code is understanding the boilerplate and
scaffolding pieces. Bryan's team is able to point people to existing good
examples and a number of web engineers are comfortable in this area as well.
As for growing more application server (PHP) skills in the MediaWiki
environment outside of api.php specifically (i.e., Core and other thornier
pieces of the codebase), Sam has offered to pair with anyone who wants his
help.
Thinking ahead, there is the more systemic question of the staffing mix
dedicated to application layer services in Reading. The current plan, which
I have to note could be subject to change (budgeting is in flight, and we
have backfill and new resource requests in there), is to acquire talent for
web engineering in application layer services for the Reading experience.
While it's good for all web engineers (and some app engineers) to be able
to code to api.php or Node.js, there's value in having someone really
focused on this middle tier / middleware in the Reading web/app area, too.
Max Semenik, who has deep expertise in this area, is now focused on
geo/Search & Discovery work, and we should be thinking about staffing for
this sort of skill.
In a related matter, I've heard a number of people talking about a more
convergence-based approach to software development in Reading. For example,
implement a feature in the API portion of the MobileFrontend extension,
then leverage it in the apps and desktop web. This is the place we should
be for some features (for new stuff, maybe most), and part of getting there
requires small steps:
* Picking a few simple problems to solve and working across web & apps (&
maybe infra)
* Cataloging features by channel in some sort of matrix, and delivering
intentionally
We now have a weekly Reading web/apps engineering leads meeting, where we
can make incremental progress on the first part: picking a few simple
problems to solve.
For the cataloging piece, I'm working to figure out an approach (ideally
with a SPOC). The more intentional we are about which features get rolled
out where and in which order, plus think carefully about convergence versus
doubled effort (and to be sure there will be cases for both), the greater
our success will be delivering the most beautiful and channel relevant
experiences for our readers.
-Adam
Cross-posting. See the follow-up discussion in the analytics list web
archive.
---------- Forwarded message ----------
From: *Dan Andreescu* <dandreescu(a)wikimedia.org>
Date: Friday, June 5, 2015
Subject: [Analytics] Pageview API Status update
To: Analytics List <analytics(a)lists.wikimedia.org>
I just posted a comment on the famous task:
https://phabricator.wikimedia.org/T44259#1341010 :)
Here it is for those who would rather discuss on this list:
We have finished analyzing the intermediate hourly aggregate with all the
columns that we think are interesting. The data is too large to query and
anonymize in real time. We'd rather get an API out faster than deal with
that problem, so we decided to produce smaller "cubes" [1] of data for
specific purposes. We have two cubes in mind and I'll explain those here.
For each cube, we're aiming to have:
* Direct access to a postgresql database in labs with the data
* API access through RESTBase
* Mondrian / Saiku access in labs for dimensional analysis
* Data will be pre-aggregated so that any single data point has k-anonymity
(we have not determined a good k yet)
* Higher level aggregations will be pre-computed so they use all data
And, the cubes are:
**stats.grok.se Cube: basic pageview data**
Hourly resolution. Will serve the same purpose as stats.grok.se has served
for so many years. The dimensions available will be:
* project - 'Project name from the request's host name'
* dialect - 'Dialect from the request's path (not set if present in the
project name)'
* page_title - 'Page title from the request's path and query'
* access_method - 'Method used to access the pages, can be desktop, mobile
web, or mobile app'
* is_zero - 'accessed through a zero provider'
* agent_type - 'Agent accessing the pages, can be spider or user'
* referer_class - 'Can be internal, external or unknown'
**Geo Cube: geo-coded pageview data**
Daily resolution. Will allow researchers to track the flu, breaking news,
etc. Dimensions will be:
* project - 'Project name from the request's hostname'
* page_title - 'Page title from the request's path and query'
* country_code - 'Country ISO code of the accessing agents (computed using
MaxMind GeoIP database)'
* province - 'State / Province of the accessing agents (computed using
MaxMind GeoIP database)'
* city - 'Metro area of the accessing agents (computed using MaxMind GeoIP
database)'
So, if anyone wants another cube, **now** is the time to speak up. We'll
probably add cubes later, but it may be a while.
[1] OLAP cubes: https://en.wikipedia.org/wiki/OLAP_cube
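To make the k-anonymity point concrete, here is a rough sketch of the
pre-aggregation step (dimension names follow the stats.grok.se cube; the
threshold K is a placeholder, since we have not determined a good k yet):

# Rough sketch of k-anonymous pre-aggregation: roll hourly rows up along
# the cube's dimensions and suppress any cell with fewer than K underlying
# requests. K is a placeholder; the real value is not decided yet.
from collections import Counter

K = 100  # placeholder threshold

def build_cube(rows, dims=("project", "page_title", "access_method")):
    cells = Counter()
    for row in rows:
        key = tuple(row[d] for d in dims)
        cells[key] += row["views"]
    # Drop cells too small to publish without identifying readers.
    return {key: views for key, views in cells.items() if views >= K}

hourly_rows = [
    {"project": "en.wikipedia", "page_title": "Influenza",
     "access_method": "mobile web", "views": 250},
    {"project": "en.wikipedia", "page_title": "Obscure_page",
     "access_method": "desktop", "views": 3},
]
print(build_cube(hourly_rows))  # only the high-count cell survives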
Hi People Interested in Gather,
Given the reorg and the traffic being driven to beta, we need to revisit
Gather's Q4 goals:
TLDR: Continue working on Gather to finish MVP features (end of the month,
max); don't push to stable unless we see 2x the number of logged-in edits
on beta (or >10x the current state).
Before the serious stuff:
Top 5 best collections since Monday:
https://en.m.wikipedia.org/wiki/Special:Gather/by/Johnrenfro
https://en.m.wikipedia.org/wiki/Special:Gather/id/3454/Philadelphia_watch
https://en.m.wikipedia.org/wiki/Special:Gather/id/3401/engineering
https://en.m.wikipedia.org/wiki/Special:Gather/id/3532/Mental_Health
https://en.m.wikipedia.org/wiki/Special:Gather/id/3132/Libertarianismo
Okay, now the goals. Please feel free to comment in email or in this Google
doc
<https://docs.google.com/a/wikimedia.org/document/d/1DK1pa3PEpIbiON0Bj6BrhGA…>
Thanks,
Jon
Given the reorg and an improvement made to beta, we need to revisit
Gather's goals.
Pushing to Stable
Given that we can now test Gather numbers in beta, the original goal of
launching on stable to test adoption is no longer valid. It is expensive
in terms of future maintenance and commitments to launch features to
stable, so we only want to do so after we have proven success, if possible.
Target numbers
Originally, the benchmarks for success on stable that were agreed on were
low (10k creators a month on stable, and 1k shares). Given the current
beta numbers, it looks like we will blow the first number out of the
water. Share has been deprioritized given current usage patterns.
However, given that we now have 4 engineers in charge of the entire web,
the standards for what we work on have to be more rigorous. We cannot
allocate multiple engineers to an experimental product unless it shows
promise of impacting greater numbers.
1. Goal
New goals:
  1. Round out the Gather hypothesis (criteria below)
  2. By end of June, know whether or not we want to push Gather to stable
2. Next Eng+PM steps (to round out the hypothesis):
  1. Improve onboarding (a few tasks)
  2. Surface collections publicly (this is a big missing feature)
  3. Qualitative and quantitative research
3. Criteria for passing to stable:
We don't have a great way to measure success of Gather based on usage by a
proportion of users or logged-in users. However, we can compare to
something similar, like edits. In terms of pure value to WP, we can
consider a collection to be like a low-value edit.
- There are 2x as many 'good' collections made as there are total
  logged-in edits per month (see the back-of-the-envelope check after
  these criteria)
  - In May there were 2,180 edits by logged-in users (2,694 logged out).
  - At current rates this suggests ~4,500 collections per month is our
    target.
  - 'Good' here means >1 collection; it's not a perfect definition, but
    it's a strong proxy.
  - Our current rate of 'good' collections is roughly 250 a month, so we
    will need to increase the number by almost 20x.
  - It might be worth exploring the % of those 2,180 edits that are
    reverted and adjusting down accordingly.
OR
- Views of collections, or pageviews where a collection is the referrer,
  exceed .5% of total pageviews
  - Currently, the views of collections are minimal.
  - If very few people create collections but they drive a significant
    boost in page views, say .5% of total PVs (not an end goal, but also
    not bad for an MVP introducing a new use case), then the feature is a
    success.
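A quick back-of-the-envelope check of those numbers (a sketch using only
the figures quoted in this email; the ~4,500 presumably rounds up for
growth):

# Back-of-the-envelope check of the stable-launch target, using the
# figures quoted above.
logged_in_edits_may = 2180      # logged-in edits in May
current_good_collections = 250  # rough current monthly rate

target = 2 * logged_in_edits_may
print(target)                                       # 4360, i.e. the ~4,500 target
print(round(target / current_good_collections, 1))  # 17.4, i.e. "almost 20x"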
4. What if we don't pass to stable:
Let's burn that bridge when we get to it :) Seriously though, until we get
some qualitative data back from our readers, we will not be able to make
important calls on the feature as is. There are a few great alternatives I
can think of right off the bat:
- Keep the code in beta and work on Gather opportunistically or as
  qualitative data dictates
  - One example might be to make collections private by default and
    launch as a bookmarking tool for readers (the primary current use case)
- Promote as a beta feature on desktop
- Use the codebase as the start of multiple watchlists (some good work
  started here by JRobson)
  - The codebase is a fairly generic list table with some basic and
    interesting features built in that could be used for a number of
    other ends
5. Validation and why we chose these criteria:
Qualitative questions to answer:
- Why aren't more people using Gather (correctly)?
- Why aren't people returning to use Gather more?
Success metric questions to answer (that we can't already):
- What % of logged-in users use Gather?
- What % of users who visit >1 page use Gather?
- What is our denominator?
Status:
- Measure logins and the signup funnel directed from Gather (what is the %
  success rate?): this is not instrumented
- Measure % of sessions with more than 1 pageview: working on this
- Measure % of sessions with logged-in users: cannot get this
What is our baseline? Logged in edits is probably the best thing to
measure against.