Hi Everyone,
I've done further analysis on the ~1400 zero-results non-DOI query corpus,
looking at the effects of perfect (or at least human-level) language
detection, and the effects of running all queries against many wikis.
In summary:
> More than 85% of failed queries to enwiki are in English, or are not in
> any particular language. Only about 35% of non-English queries in some
> language (<4.5% of zero-results queries), if funneled to the right
> language wiki, get any results.
>
> The types of queries most likely to get results from the non-enwikis are
> names and queries in English. There are lots of English words in
> non-English wikis (enough that they can do decent spelling correction!),
> and the idiosyncrasies of language processing on other wikis allow certain
> classes of typos in names and English words to match, or the typos happen
> to exist uncorrected in the non-enwiki.
>
> Perhaps a better approach to handling non-English queries is
> user-specified alternate languages.
More details:
https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Cross_Language_Wiki_…
—Trey
Trey Jones
Software Engineer, Discovery
Wikimedia Foundation
Hey all,
We currently have a data outage on our dashboards - they display, but
we're missing the last few days of data.
The good news is that we know exactly what happened here; as part of
our work to (amusingly enough) make the data pipeline here more robust
and standardised, we switched all of our data retrieval scripts over
to a new project and repository (previously they'd lived in the repo
for the dashboard they referred to, which doesn't scale). A bug in the
shell script that tied them all together meant none of them ran - and
of course we switched everything over immediately before a long
weekend. Doh ;p.
The original bug has a patchset awaiting review, and as soon as
it's +2d we're going to begin backfilling the datasets. You can follow
our progress on that at https://phabricator.wikimedia.org/T111749
Thanks,
--
Oliver Keyes
Count Logula
Wikimedia Foundation
I've written up my analysis of the ElasticSearch language detection plugin
that Erik recently enabled:
https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Language_Detection_E…
The short version is that it really likes Romanian (and Italian, and has a
bit of a thing for French), and precision on English is great, but recall
is poor (probably because of all the typos and other crap that goes to
enwiki but is still technically "English"). Chinese and Arabic are good.
I think we could do better, and we should evaluate (a) other language
detectors and (b) the effect of a good language detector on zero results
rate (i.e., simulate sending queries to the right place and see how much of
a difference it makes).
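The simulation in (b) could be sketched roughly as follows. This is a toy illustration with made-up data, not the actual analysis: each record stands for one zero-results query, labeled with its (assumed correctly detected) language and whether the corresponding language wiki would have returned results for it.

```python
# Hypothetical zero-results queries: detected language plus whether the
# matching language wiki would have returned results (all values invented).
zero_results_queries = [
    {"lang": "en", "target_wiki_has_results": False},
    {"lang": "de", "target_wiki_has_results": True},
    {"lang": "fr", "target_wiki_has_results": False},
    {"lang": "de", "target_wiki_has_results": True},
]

def rescued_fraction(queries):
    """Fraction of zero-results queries a perfect language router would rescue.

    English queries already went to the right wiki, so only non-English
    queries whose target wiki has results count as rescued.
    """
    rescued = sum(1 for q in queries
                  if q["lang"] != "en" and q["target_wiki_has_results"])
    return rescued / len(queries)

print(rescued_fraction(zero_results_queries))  # 0.5 on this toy data
```

Swapping in a real detector instead of the gold-standard language labels would then show how much detector errors eat into that rescued fraction.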
Moderately pretty pictures included.
—Trey
Trey Jones
Software Engineer, Discovery
Wikimedia Foundation
Cross-posting from wikidata-l.
---------- Forwarded message ----------
From: Dan Garry <dgarry(a)wikimedia.org>
Date: 7 September 2015 at 15:29
Subject: Announcing the release of the Wikidata Query Service
To: wikidata-l(a)lists.wikimedia.org
The Discovery Department at the Wikimedia Foundation is pleased to announce
the release of the Wikidata Query Service
<https://www.mediawiki.org/wiki/Wikidata_query_service>! You can find the
interface for the service at https://query.wikidata.org.
The Wikidata Query Service is designed to let users run queries on the data
contained in Wikidata. The service uses SPARQL
<https://en.wikipedia.org/wiki/SPARQL> as the query language. You can see
some example queries in the user manual
<https://www.mediawiki.org/wiki/Wikidata_query_service/User_Manual>.
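For a flavor of what such queries look like (this is a generic illustration, not one of the manual's examples), a SPARQL query listing items that are instances of house cat might be:

```sparql
# List up to 10 items that are instances of (P31) house cat (Q146),
# with English labels supplied by the label service.
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q146 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 10
```

You can paste a query like this directly into the interface at https://query.wikidata.org.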
Right now, the service is still in beta. This means that our goal
<https://www.mediawiki.org/wiki/Wikimedia_Engineering/2015-16_Q2_Goals#Wikid…>
is to monitor the service's usage and collect feedback about what people
think should come next. To do that, we've created the Wikidata Query Service
dashboard <https://searchdata.wmflabs.org/wdqs/> to track usage of the
service, and we're in the process
<https://phabricator.wikimedia.org/T111403> of setting up a feedback
mechanism for users of the service. Once we've monitored the usage of the
service for a while and gathered user feedback, we'll decide on what's next
for development of the service.
If you have any feedback, suggestions, or comments, please do send an email
to the Discovery Department's public mailing list,
wikimedia-search(a)lists.wikimedia.org.
Thanks,
Dan
--
Dan Garry
Lead Product Manager, Discovery
Wikimedia Foundation
Hi all,
If you've been to http://searchdata.wmflabs.org/ recently, you may have
noticed that we have a new dashboard (and a work-in-progress facelift).
Introducing… The Wikidata Query Service dashboard:
http://searchdata.wmflabs.org/wdqs/ ! Yay! Hopefully this will help the
WDQS team as they continue their work on that awesome project.
As with the Search Metrics dashboard
<http://searchdata.wmflabs.org/metrics/>, we welcome constructive criticism
and feature suggestions with an open mind.
One suggestion that I'm going to look into is finding out how many people
who visited the homepage ended up submitting a query. We also have failure
stats, so those will be showing up in the near future.
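That homepage-to-query number is essentially a session-level conversion rate. As a rough sketch (with invented session data and event names, not our actual schema), it could be computed like this:

```python
# Hypothetical per-session event logs; event names are made up.
sessions = {
    "s1": ["homepage", "query"],
    "s2": ["homepage"],
    "s3": ["query"],  # deep link straight to a query, never saw the homepage
}

def homepage_conversion_rate(sessions):
    """Of sessions that visited the homepage, the fraction that also queried."""
    visited = [events for events in sessions.values() if "homepage" in events]
    converted = [events for events in visited if "query" in events]
    return len(converted) / len(visited)

print(homepage_conversion_rate(sessions))  # 0.5 on this toy data
```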
Thank you,
Mikhail // Junior Swifty
--
*Mikhail Popov* // Data Analyst, The Swifties, Discovery
<https://www.mediawiki.org/wiki/Wikimedia_Discovery>
https://wikimediafoundation.org/
*Imagine a world in which every single human being can freely share in the
sum of all knowledge. That's our commitment.* Donate
<https://donate.wikimedia.org/>.
A few of us met this morning, to ensure that we have a plan for everyone in
the department to be productive on Gerrit Cleanup Day (Wednesday
2015-09-23). We think most folks are accounted for, and came up with ideas
for others.
I added Gerrit Cleanup Day as an upcoming event on our wiki page[1], and
created a page with the proposed plan[2] that came out of this morning's
meeting.
Action items prior to the day (mostly listing them here for my own
convenience):
- Erik will coordinate with the developers to help them be productive
- Kevin will ask Quim to try to get David paired up with someone in his
timezone (maybe Trey also)
- Kevin will talk to Oliver, who can guide Mikhail
- Kevin will get a gerrit account, to be able to +1/-1
- Kevin will organize some kind of kickoff meeting the morning of the
big day
- Kevin will check with Moiz
- Kevin will check with Wes to see what he is planning
[1] https://www.mediawiki.org/wiki/Wikimedia_Discovery#Upcoming_events
[2]
https://www.mediawiki.org/wiki/Discovery_plans_for_gerrit_cleanup_day_2015
Kevin Smith
Agile Coach, Wikimedia Foundation
I understand that we are shifting to a "minimum 2-week" cadence, but I'm
not sure exactly what that means. Reading Mikhail's email, it sounds like
we plan to run each test for one week, and then have one week "off" to
analyze those results and to prepare for the following test. Is that true?
Regardless of those details, would it be helpful to have a "recipe" for
each test? To know that on Day T-7, we would be thinking about X, and by
Day T-4, we had better have Y in place. And then to expect Z by day T+8.
Basically, to document all the little steps that might be necessary or
optional before, during, and after a test.
If that seems helpful, I can create a phab task to create and populate a
wiki page with that kind of information. Obviously the population of that
page would have to be a group effort, with input from product, engineering,
analysis, and possibly others.
Kevin Smith
Agile Coach, Wikimedia Foundation