Thanks for thinking this through! Responses inline.
On Tue, Oct 13, 2015 at 8:36 AM, Brian Gerstle <bgerstle@wikimedia.org> wrote:
Great experiment!  A couple questions/comments:
  1. The % clickthrough per category shows SF Landmarks at 120%. Is that correct, and if so, what does it mean?
It means that for every 10 times the SF Landmarks list was visited from a tag, there were 12 clicks on to other articles (this is possible because a user can visit 2 or more articles from the page by clicking on the 'back' button in their browser)
  2. As a big believer in the power of categories as a driver for engagement, I would love to see more variations of this experiment w/ different placements, in a feed, different categories, add'n of portals, as a FTUE, etc. (likely to have a great deal of overlap w/ cascade D: deep dive educational experience)
Me too!  It is very hard to learn anything from a first stab like this.
  3. Also loved the win/needs-improvement breakdown at the end
Thanks!
 

On Tue, Oct 13, 2015 at 11:23 AM, Jon Katz <jkatz@wikimedia.org> wrote:
Thanks, Joaquin!

On Tue, Oct 13, 2015 at 4:32 AM, Joaquin Oltra Hernandez <jhernandez@wikimedia.org> wrote:
Thanks a lot for the detailed report Jon.

I've parsed it and posted it to https://www.mediawiki.org/wiki/Reading/Web/Projects/Categories_Browse so that we can keep it more accessible than the mailing list archive.

Any help with formatting or text corrections would be appreciated.


On Sun, Oct 11, 2015 at 8:32 PM, Jon Katz <jkatz@wikimedia.org> wrote:

Hi Team,

I just wanted to update you on the results of something we internally referred to as the 'browse' prototype.  
TLDR: as implemented the mobile 'browse by category' test did not drive significant engagement.  In fact, as implemented, it seemed inferior to blue links.  However, we started with a very rough and low-impact prototype, so a few tweaks would give us more definitive results.


Questions/comments welcome!
Best,

J


Browse Prototype Results





Intro
Process
Results
  • Blue links in general
  • Category tags
Conclusion and Next Steps
  • Process
  • Do people want to browse by categories?



Intro

As outlined in this doc, the concept is a tag that allows readers to navigate WP via categories that are meaningful and populated in order of 'significance' (as determined by user input).  The hypothesis:

  • users will want to navigate by category if there are fewer, more meaningful categories per page and those category pages showed the most ‘notable’ members first.

Again, see the full doc to understand the premise.


Process

The first step was to validate: do users want to navigate via category?  So we built a very lightweight prototype on mobile web for English Wikipedia (stable, not beta) using hardcoded config variables, covering the following categories (~4000 pages).  We did not look into sub-categories, with one exception (see T94732 for details).  There was also an error: 2 of the categories did not have tags implemented (struck through, below).


Category                                  Pagecount
NBA All Stars                                   400
American Politicians                            818
Object-Oriented Programming Languages           164
European States                                  24
American Female Pop Singers                     326
American drama television series               1048
Modern Painters                                 983
Landmarks in San Francisco, California          270




Here is how it appeared on the Alcatraz page



When the user clicked the tag, they were taken to a gather-like collection based on manually estimated relevance

(sorry cropped shot)





The category pages were designed to show the pages most relevant to the broadest audience (as deemed by me) first. Here is the ordering: https://docs.google.com/spreadsheets/d/12xLXQsH1zcg6E8lDuSonumZNdBvfaBuHOS1a1TCASK4/edit#gid=0


This was intended to lie in contrast with our current category pages, which are alphabetical and not really intended for human browsing: https://en.wikipedia.org/wiki/Category:American_male_film_actors



We primarily measured a few things:


  • when a tag was seen by a user

  • when a tag was clicked on by a user

  • when a page in the new ‘category view’ was clicked on by a user
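
To make the measurement concrete, here is a minimal sketch of how those three events could be rolled up into rates. The event names (`tag_impression`, `tag_click`, `category_item_click`) and the sample log are hypothetical and are not the prototype's actual logging schema; the sketch also assumes that each tag click leads to exactly one category-page visit.

```python
from collections import Counter

# Hypothetical event log: one entry per logged event (illustration only).
events = [
    "tag_impression", "tag_impression", "tag_impression", "tag_impression",
    "tag_click",
    "category_item_click", "category_item_click",
]


def funnel_rates(events):
    """Return (tag click-through rate, category-page click-through rate)."""
    counts = Counter(events)
    impressions = counts["tag_impression"]
    tag_clicks = counts["tag_click"]
    item_clicks = counts["category_item_click"]
    # Tag click-through: tag clicks per time the tag appeared on screen.
    tag_ctr = tag_clicks / impressions if impressions else 0.0
    # Category-page click-through: article clicks per category-page visit
    # (assumes one category-page visit per tag click).
    page_ctr = item_clicks / tag_clicks if tag_clicks else 0.0
    return tag_ctr, page_ctr


tag_ctr, page_ctr = funnel_rates(events)
print(f"tag CTR: {tag_ctr:.0%}, category page CTR: {page_ctr:.0%}")
```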


As a side effort, I looked to see if overall referrals from pages with tags went up--this was a timed intervention rather than an a/b test and given the click-thru on the tags, the impact would have been negligible anyway.  This was confirmed by some very noisy results.   



Results


Blue links in general

One benefit of the side study mentioned in the previous paragraph is that I was able to generate a table for the pages in question, from before we started the test, showing the ratio of pageviews referred by a page to that page's total pageviews (an estimate of how many links were opened from that page).  Though it covers just 0-1 GMT on 6/29/15, now that we have the pageview hourly table, a more robust analysis can tell us how categories differ in this regard:



Category                                          links clicked    #pvs   clicks/pvs
Category:20th-centuryAmericanpoliticians                    761    1243          61%
Category:Americandramatelevisionseries                     5981    8844          68%
Category:Americanfemalepopsingers                          2502    4280          58%
Category:LandmarksinSanFrancisco,                           104     287          36%
Category:Modernpainters                                     136     369          37%
Category:NationalBasketballAssociationAll-Stars            1908    3341          57%
Category:Object-orientedprogramminglanguages                 48     181          27%
Category:WesternEurope                                      657    1221          54%
Grand Total                                               12099   19766          50%



You can see here that for pages in the category ‘Landmarks in San Francisco’, if there are 10 pageviews, 3.6 clicks to other pages are generated on average.
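
For anyone who wants to re-derive the clicks/pvs column, here is a small sketch using the per-category numbers from the table above (category names are shortened for readability). It also computes the overall rate two ways, since the two can differ: pooled (total clicks over total pageviews) and as an unweighted mean of the per-category rates. Which of these the Grand Total row was meant to show is not stated in the report, so treat the comparison as an observation from the arithmetic only.

```python
# (links clicked, pageviews) per category, copied from the table above.
rows = {
    "20th-century American politicians": (761, 1243),
    "American drama television series": (5981, 8844),
    "American female pop singers": (2502, 4280),
    "Landmarks in San Francisco": (104, 287),
    "Modern painters": (136, 369),
    "NBA All-Stars": (1908, 3341),
    "Object-oriented programming languages": (48, 181),
    "Western Europe": (657, 1221),
}

# Per-category rate: links clicked divided by pageviews.
rates = {name: clicks / pvs for name, (clicks, pvs) in rows.items()}

# Pooled rate: all clicks over all pageviews (large categories dominate).
total_clicks = sum(clicks for clicks, _ in rows.values())
total_pvs = sum(pvs for _, pvs in rows.values())
pooled = total_clicks / total_pvs

# Unweighted mean: every category counts equally, regardless of traffic.
unweighted = sum(rates.values()) / len(rates)

for name, rate in rates.items():
    print(f"{name}: {rate:.0%} ({rate * 10:.1f} clicks per 10 pageviews)")
print(f"pooled: {pooled:.0%}, unweighted mean: {unweighted:.0%}")
```

With these inputs the unweighted mean comes out at about 50% and the pooled rate at about 61%, so the 50% in the Grand Total row lines up with the unweighted mean. The per-category link counts sum to 12097, slightly below the 12099 shown in that row.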


I do not have the original queries for this handy, but can dig them up if you’re really interested.


Category tags

Full data and queries here: https://docs.google.com/a/wikimedia.org/spreadsheets/d/1vD3DopxGyeh9FQsuTQDMo6f5y43Yoy5gnJQqKn9hEQg/edit?usp=sharing


The tags themselves generated an average click-through rate of 0.18%.  Given that the overall click-through rate on the pages is estimated above at ~50%, this single tag is not driving anything significant.  Furthermore, Leila and Bob’s paper suggests that this is performing no better than a mid-article click. Given that mobile web sections are collapsed, I would need to understand more about their method to know just how to interpret their results against our mobile-web-only implementation.  Also, our click-through rate used the number of times the tag appeared on screen as the denominator, whereas their research looked at overall pageviews.
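
The denominator point matters when comparing the two numbers. A tag at the bottom of the page can only appear on screen during a pageview, so impressions can never exceed pageviews, and the same clicks produce a rate per impression that is at least as high as the rate per pageview. A rough sketch, where all counts are made up for illustration and are not from the experiment:

```python
def ctr(clicks, denominator):
    """Click-through rate as a fraction of the chosen denominator."""
    return clicks / denominator


# Hypothetical counts, for illustration only.
pageviews = 10_000    # every load of a tagged page
impressions = 2_500   # loads where the tag actually appeared on screen
tag_clicks = 5

per_impression = ctr(tag_clicks, impressions)  # denominator used in this report
per_pageview = ctr(tag_clicks, pageviews)      # denominator based on overall pageviews

print(f"per impression: {per_impression:.2%}")
print(f"per pageview:   {per_pageview:.2%}")
```

Under these assumed counts, the per-pageview rate is a quarter of the per-impression rate, so a like-for-like comparison would need both rates expressed over the same denominator.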



This being noted, the tag was implemented to be as obscure as possible to establish a baseline.  Furthermore, any feature like this would probably be different in the following ways:

  • each page would be in 1-4 tag groups (as opposed to just 1)

  • each page would be tagged, creating the expectation on the part of the user that this was something to look for

  • presumably the categories could be implemented as a menu item as opposed to being buried at the bottom of the page (and competing with features like read more).

  • Using the learnings from ‘read more’, tags with images or buttons would likely fare much better.


The following graph shows:

  • number of impressions on the right axis

  • click-thru-rate on the left-axis.  



When you look at click-through rates on the ‘category’ pages themselves, you see that they average 41% (chart below), meaning that for every 10 times a user visited a category page, there were 4.1 clicks to one of the listed pages as a result.



Here is the same broken up by category:


Each ‘category’ page here had at least 400 visits, and you can see that the interest seems to vary dramatically across categories.  It is worth noting that the top three categories here are the ones with the fewest entities.  Each list, however, was capped at ~50 articles, so it is unclear what might be causing this effect, if it is real.
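
As noted in the replies at the top of this thread, the per-category rate can exceed 100% (SF Landmarks shows 120%) because a single visit can produce several clicks when the user comes back to the list with the browser's back button. A small sketch with made-up per-visit click counts shows how clicks per visit differs from the share of visits that produced at least one click:

```python
# Hypothetical: article clicks made during each of 10 visits to one category page.
clicks_per_visit = [0, 3, 1, 2, 0, 2, 1, 1, 0, 2]

visits = len(clicks_per_visit)
total_clicks = sum(clicks_per_visit)

# Clicks per visit: the kind of rate described in this report; can exceed 100%.
clicks_rate = total_clicks / visits

# Share of visits with at least one click: can never exceed 100%.
engaged_share = sum(1 for c in clicks_per_visit if c > 0) / visits

print(f"clicks per visit: {clicks_rate:.0%}")
print(f"visits with a click: {engaged_share:.0%}")
```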


As mentioned above, the average article page has an overall click rate of 50%, so these category pages did not reach the click-through rate of a typical article page.  However, each category page had summaries of the listed pages, so it could be that users were getting value beyond what a blue link would provide.  A live-user test of Gather collections, from which this format was borrowed, suggested that the format used up too much vertical space on each article and was hard to flip through.  Shortening the amount of text or image space might be something to try to make the page more useful.



Conclusion and Next Steps

Process

  • This was the first time I am aware of that we ran a live prototype and learned something without building a scalable solution. Win

  • Developer time was estimated at 1 FTE for 2 weeks (by pheudx), but the chronological time for pushing to stable took a quarter. Room for improvement

  • The time to analysis was almost 2 quarters, due to a lack of data analysis support (I ran the initial analysis within 2 weeks of launch, during paternity leave, but was unable to go back and get it ready to distribute for 3 months).  Room for improvement--possibly solved by additional Data Analyst.  


This experiment was not designed to answer questions definitively in one round, but with the understanding that multiple iterations would allow us to fully answer our questions.


The long turn-around time, particularly around analysis and communication, meant that tweaking a variable to test the conclusions or the new questions that arose (see below) will involve a whole lot more work and effort than if we had been able to explore modifications within a few weeks of the initial launch.



Do people want to browse by categories?

Category tags at the bottom of the mobile web page in a dull gray background that lead to manually curated categories are not a killer feature :)


I would be reluctant to say that this means users are not interested in browsing by category, however.  For instance, it is likely that

  • users did not notice the tag, even if it appeared on screen

  • users are accustomed to our current category tags on desktop and not interested in that experience

  • users who did like the tag were unlikely to find another page that had it--there was no feedback mechanism by which the improved category page would drive additional tag interactions

  • the browse experience created was not ideal




If we decide to pursue what is currently termed “cascade c: update ux”, I would like to proceed with more tests in this arena, by altering the appearance and position of the tags, and by improving the flow of the ‘category’ pages.  If we choose a different strategy, hopefully other teams can build off of what was learned here.






_______________________________________________
reading-wmf mailing list
reading-wmf@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/reading-wmf



_______________________________________________
Mobile-l mailing list
Mobile-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mobile-l







--
EN Wikipedia user page: https://en.wikipedia.org/wiki/User:Brian.gerstle
IRC: bgerstle