Discovery

discovery@lists.wikimedia.org

1 participants
755 discussions

mediawiki -> kafka -> hadoop -> hive data pipeline up and running
by Erik Bernhardson 03 Nov '15

I've been too busy to move this forward in the last few weeks, but I finally found some time to deploy what we had been working on. This pipeline is now up, running, and queryable from Hive. It's sampling at 1:1000 right now, as I didn't want a flood of errors if it went wrong, but based on the success so far I will be dropping the sampling so it captures everything our old logs did. For the time being we will continue logging CirrusSearchRequests and CirrusSearchUserTesting to fluorine (and rsyncing to stat1002 for processing), but that can be turned off once we move any existing data processing over to Hive. Very exciting! There are still a few minor things to figure out: my first edition of the table in Hive doesn't handle the external partitioning right, but I will fix that soon enough.

BART jam, gonna miss the meeting EOM
by Max Semenik 02 Nov '15

FYI: Discovery time spent on maintenance
by Kevin Smith 31 Oct '15

There is an initiative within the WMF to figure out how much time/effort teams spend on "new functionality" vs. "maintenance". As a pilot project, I have been tracking that in our Discovery Cirrus project[1] for a couple months. As shown on this graph[2], we have been spending somewhere between 25% and 50% of our time on "maintenance".

Note that this should not be considered at all scientific. For starters, there are several glaring issues with this graph:

- Because we are not doing point estimation, this graph is based on task counts, not actual effort.
- Data around Oct 1 is missing/funky due to the offsite.
- The bars are pure percentages, so 50% of 2 tasks completed would look the same as 50% of 40 tasks completed. That 100% bar, in particular, is misleading because I believe it is based on a single task being resolved that week.
- The counts are based on my snap decision for each task, whether to add the #worktype-new-functionality or the #worktype-maintenance tag.

Still, it's a higher fraction than I would have guessed. Is it worth my time (or someone else's) to continue to track this data?

[1] https://phabricator.wikimedia.org/tag/search-and-discovery-cirrus-sprint/
[2] http://phlogiston.wmflabs.org/discir_maint_count_frac.png

Kevin Smith
Agile Coach, Wikimedia Foundation
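To illustrate the caveat about pure percentages, here is a minimal Python sketch of a count-based maintenance fraction. The function name and the weekly counts are hypothetical, chosen only to mirror the "50% of 2 tasks" versus "50% of 40 tasks" point; they are not data from the graph.

```python
def maintenance_fraction(maintenance: int, new_functionality: int) -> float:
    """Share of resolved tasks tagged as maintenance, based on task counts only."""
    total = maintenance + new_functionality
    if total == 0:
        raise ValueError("no tagged tasks resolved in this period")
    return maintenance / total


# Hypothetical weekly counts: (maintenance tasks, new-functionality tasks).
weeks = {
    "2 tasks resolved": (1, 1),
    "40 tasks resolved": (20, 20),
    "single task resolved": (1, 0),
}

for label, (maint, new) in weeks.items():
    fraction = maintenance_fraction(maint, new)
    print(f"{label}: {fraction:.0%} maintenance out of {maint + new} task(s)")
```

The first two weeks produce the same bar height even though one covers twenty times as many tasks, and a week with a single resolved maintenance task shows as 100%. Reporting the task total next to each percentage is one way to make that visible.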

New dashboard: search engine traffic to Wikipedia
by Oliver Keyes 28 Oct '15

Hey all,

We're pleased to announce the provisioning and release of a new dashboard. This one contains something a bit different; it breaks down our pageviews and shows how external search engines influence the traffic that hits our wikis. Both a simple count of search-referred pageviews versus other pageviews, and a breakdown of how much traffic is coming from which specific search engines, are included.

You can see it at http://discovery.wmflabs.org/external/ - hope it's useful!

--
Oliver Keyes
Count Logula
Wikimedia Foundation

An Analysis of Google's "Rich Answers"
by Trey Jones 26 Oct '15

Greetings all, This weekend I stumbled across this interesting bit of research (done by a Search Engine Optimization consultant) analyzing the increase in "rich answers" provided by Google. Rich answers are where Google tries to provide a full or partial answer to a question without requiring a click to another website. The end of the article is concerned with SEO, and the effect different kinds of rich answers have on website traffic (e.g., partial answers lead people to your site, full answers don't), but the bulk of the article is a breakdown of the kinds of rich answers Google provides. The most surprising to me is that they license song lyrics in order to provide them (without attribution). Not surprisingly, Wikipedia comes up several times in screenshots. Whether you care about SEO or not, it's a nice survey of the kind of rich answers Google provides: https://www.stonetemple.com/the-growth-of-rich-answers-in-googles-search-re… —Trey Trey Jones Software Engineer, Discovery Wikimedia Foundation

Mobile web and mobile app schemas broken
by Oliver Keyes 26 Oct '15

The mobile web and mobile app search schemas are currently broken. I'm contacting the apps and web teams to get this worked out, but I thought I'd let people know. -- Oliver Keyes Count Logula Wikimedia Foundation

FYI: Internal team restructuring
by Kevin Smith 23 Oct '15

This week, the Discovery Department reconfigured its internal teams, to better align with our quarterly goals[1]. This new arrangement is considered experimental and temporary, although we expect it to last through the end of this quarter. We believe this change will improve our focus on our goals, reduce context-switching by individuals, reduce the total amount of time spent in meetings, and generally improve communication.

The new internal sub-teams are:

"Language Search", focused on our goal of "Improve language support for search". The people on this team include David, Erik, Stas, and Trey. For now, this team will continue to track its work on the Cirrus board[2].

"Portal", focused on our goal of "Make www.wikipedia.org a portal for exploring open content on Wikimedia sites". The people on this team include Jan, Julien, Max, and Moiz. For now, this team will continue to track its work on the UX board[3].

"Maps", which will continue to gather user feedback on our newly-deployed service, as well as doing maintenance and minor enhancements. For now, this will just be Yuri, who is also splitting his time with supporting Zero and Graphs. Maps work will continue to be tracked on the Maps board[4].

As a side note, Wikidata Query Service (WDQS) is in a similar position to maps this quarter, and thus will only receive a fraction of Stas's attention. Work will continue to be tracked on the WDQS board[5].

Product Manager Dan will work most closely with the Language Search team. He will help the other teams as needed, but our intent is that they should largely be self-sufficient from a product standpoint, for this quarter. All the teams will continue to use a Kanban process with a weekly cadence.

The Analysis folks will continue to support the entire department. To ensure coordination, an analyst will attend the planning meetings and standups of each of the two big new sub-teams. Mikhail will work with the Language Search team, and Oliver will work with the Portal team. All analysis work will continue to be tracked on the Analysis board[6].

People from external departments (TPG, Ops, Community Liaisons) will interact with whichever sub-team(s) make sense at the time. And of course everyone in the department will be available to help out other sub-teams as needed.

After each departmental retrospective, we will evaluate the new structure, and will consider changes. Our next retro is 2015-11-02. We expect the structure to change next quarter, when we have a new set of goals to support.

Questions and comments are welcome.

[1] https://www.mediawiki.org/wiki/Wikimedia_Engineering/2015-16_Q2_Goals#Disco…
[2] https://phabricator.wikimedia.org/tag/discovery-cirrus-sprint/
[3] https://phabricator.wikimedia.org/tag/discovery-ux-sprint/
[4] https://phabricator.wikimedia.org/tag/discovery-maps-sprint/
[5] https://phabricator.wikimedia.org/tag/discovery-wikidata-query-service-spri…
[6] https://phabricator.wikimedia.org/tag/discovery-analysis-sprint/

Kevin Smith
Agile Coach, Wikimedia Foundation

Common terms A/B test extended for ~one more week
by Erik Bernhardson 20 Oct '15

After reviewing a week's worth of data for the common terms A/B test, we have decided that we have not collected enough information. The initial sampling was:

- 1:1000 users chosen to participate in the test
- Those users split into 6 buckets, giving each bucket a 1:6000 sampling
- This has collected ~100 events per bucket, much less in the "strict" bucket

We are increasing the main sampling by 5x, to 1:200. This will give each bucket a 1:1200 sampling of users.

The reason these collect so little data is that quite a few queries don't meet the minimum requirements to be affected by the tests. The "aggressive recall" test requires at least 3 words in the query, and the "strict" test requires at least 6 words in the query.

Erik B.
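The sampling arithmetic above can be checked with a short Python sketch. It assumes sampled users are split evenly across the six buckets; the function name is illustrative and not taken from the test's actual code.

```python
from fractions import Fraction


def per_bucket_rate(overall_one_in: int, buckets: int) -> Fraction:
    """Per-bucket sampling rate when sampled users are split evenly across buckets."""
    return Fraction(1, overall_one_in) / buckets


initial = per_bucket_rate(1000, 6)   # original 1:1000 sampling
extended = per_bucket_rate(200, 6)   # increased 1:200 sampling

print(f"initial per-bucket sampling:  1:{initial.denominator}")
print(f"extended per-bucket sampling: 1:{extended.denominator}")
print(f"increase: {extended / initial}x")
```

This reproduces the 1:6000 and 1:1200 per-bucket figures and the 5x increase. All else being equal, a fivefold rate would collect roughly five times as many events over the same period, though buckets with a word-count minimum would still see fewer qualifying queries.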

Fleet management.
by John Ljungqvist 19 Oct '15

Newsletter with the latest news. Trouble viewing it? View it in your browser. http://mx.nordtrack.de/mailwizz/index.php/campaigns/nh918q3p591fa/track-url…

NORDTRACK FLEET MANAGEMENT – OVERVIEW OF THE VEHICLE FLEET 1832 kr
-------------------------
FLEET MANAGEMENT – OVERVIEW OF THE COMPANY'S VEHICLES

With NORDTRACK MINI http://mx.nordtrack.de/mailwizz/index.php/campaigns/nh918q3p591fa/track-url… installed in your vehicles, you can see in real time where they are and in which direction they are travelling. It becomes easier to plan and follow up on routes, and it helps improve customer service. As soon as a vehicle starts and drives off, NordTrack LivePro automatically begins recording the vehicle's positions and sending them to a server. These are presented on a map and in table form in your own user account at NordTrack.se http://mx.nordtrack.de/mailwizz/index.php/campaigns/nh918q3p591fa/track-url…. The web-based interface lets you follow the vehicles from any computer or tablet.
-------------------------
THEFT PROTECTION - WHEN YOU OWN SOMETHING YOU WANT TO KEEP AN EXTRA EYE ON

With a NordTrack alarm, a so-called GPS tracker, you can keep track of your property live via your mobile phone or PC. You receive an alert on your mobile if the alarm is moved, or if someone tries to tamper with the vehicle or boat where it is installed.

_FOR 1832 KR / NO MONTHLY FEES, YOU HAVE FULL CONTROL OF YOUR PROPERTY_
-------------------------
NordTrack LivePro Fleet is a simple and affordable fleet management product that meets the basic needs of haulage firms and transport-intensive companies that want an overview of their vehicles.

PRICE 1832 :- NO RECURRING FEES

NORDTRACK IS ALSO LOOKING FOR RESELLERS!
Contact us by email at gps(a)nordtrack.se or call 013-9913935

_LOG IN BELOW WITH ACCOUNT "DIA" PASSWORD 123456_ _TO SEE SOME OF OUR VEHICLES LIVE_
-------------------------
Track live http://mx.nordtrack.de/mailwizz/index.php/campaigns/nh918q3p591fa/track-url…
-------------------------
Contact us today http://mx.nordtrack.de/mailwizz/index.php/campaigns/nh918q3p591fa/track-url…

NORDINFO SVERIGE FILIAL
Address: Kungsbergsgatan 2 A 583 22 Linköping
Phone: 013 - 9913935
Email: gps(a)nordtrack.se

http://mx.nordtrack.de/mailwizz/index.php/lists/vc1127mn1g8f8/unsubscribe/b…, NordTrack Kungsbergsgatan 2 Linköping 58224 Sweden
-------------------------
http://mx.nordtrack.de/mailwizz/index.php/campaigns/nh918q3p591fa/track-url… http://mx.nordtrack.de/mailwizz/index.php/campaigns/nh918q3p591fa/track-url… http://mx.nordtrack.de/mailwizz/index.php/campaigns/nh918q3p591fa/track-url…

Fwd: Wikimedia Foundation quarterly reviews for July-September 2015
by Tilman Bayer 18 Oct '15

Forwarding regarding the Discovery team's quarterly review documentation

---------- Forwarded message ----------
From: Tilman Bayer <tbayer(a)wikimedia.org>
Date: Fri, Oct 16, 2015 at 10:15 PM
Subject: Wikimedia Foundation quarterly reviews for July-September 2015
To: Wikimedia Mailing List <wikimedia-l(a)lists.wikimedia.org>
Cc: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>

Greetings everyone,

the Wikimedia Foundation's quarterly reviews of teams' work in the past quarter (July-September, Q1 of the 2015-16 fiscal year) took place last week. Minutes and slides for those meetings are now available:

Community Engagement:
https://meta.wikimedia.org/wiki/WMF_Metrics_and_activities_meetings/Quarter…

Discovery:
https://meta.wikimedia.org/wiki/WMF_Metrics_and_activities_meetings/Quarter…

Reading and Advancement (with Fundraising Tech):
https://meta.wikimedia.org/wiki/WMF_Metrics_and_activities_meetings/Quarter…

Editing (comprising the Collaboration, Language Engineering, Multimedia, Parsing, and VisualEditor teams):
https://meta.wikimedia.org/wiki/WMF_Metrics_and_activities_meetings/Quarter…

Infrastructure (comprising the Analytics, Release Engineering, Services, TechOps, and Labs teams) and CTO (comprising the Design Research, Research & Data, Performance, and Security teams):
https://meta.wikimedia.org/wiki/WMF_Metrics_and_activities_meetings/Quarter…

Legal, Talent & Culture (HR), Communications, Finance & Administration & Office IT, and Team Practices:
https://meta.wikimedia.org/wiki/WMF_Metrics_and_activities_meetings/Quarter…

As usual, much of this information will also be available in consolidated form as part of the general WMF quarterly report for Q1, which is planned to be published on October 19.

See https://meta.wikimedia.org/wiki/WMF_Metrics_and_activities_meetings/Quarter… for some general background about the Foundation's quarterly review process.

--
Tilman Bayer
Senior Analyst
Wikimedia Foundation
IRC (Freenode): HaeB

--
Tilman Bayer
Senior Analyst
Wikimedia Foundation
IRC (Freenode): HaeB
