The Search Platform Team
<https://www.mediawiki.org/wiki/Wikimedia_Search_Platform> usually holds
office hours the first Wednesday of each month. Come talk to us about
anything related to Wikimedia search, Wikidata Query Service, Wikimedia
Commons Query Service, etc.!
Feel free to add your items to the Etherpad Agenda for the next meeting.
Details for our next meeting:
Date: Wednesday, February 3rd, 2021
Time: 16:00-17:00 GMT / 08:00-09:00 PST / 11:00-12:00 EST / 17:00-18:00 CET
Etherpad: https://etherpad.wikimedia.org/p/Search_Platform_Office_Hours
Google Meet link: https://meet.google.com/vyc-jvgq-dww
Join by phone in the US: +1 786-701-6904 PIN: 262 122 849#
Hope to talk to you Wednesday!
—Trey
Trey Jones
Sr. Computational Linguist, Search Platform
Wikimedia Foundation
UTC-5 / EST
Dear Wikimedia search platform team,
I'm cross-posting this from Stack Overflow since it's a bit niche:
https://stackoverflow.com/questions/65303450/how-to-authenticate-to-wikimed…
I hope this is okay.
I am trying to use the Wikimedia Commons Query Service[1] programmatically
using Python, but am having trouble authenticating via OAuth 1. I
understand the service is subject to change, but am mostly trying to
prototype things knowing they will have to be reworked later.
Please find enclosed my self-contained Python example, which does not work
as expected. The expected behaviour is that a result set is returned, but
instead an HTML response of the login page is returned. You can get the
dependencies with `pip install --user sparqlwrapper oauthlib certifi`. The
script should then be given the path to a text file containing the pasted
output given after applying for an owner-only token[2], e.g.:
```
Consumer token
deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
Consumer secret
deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
Access token
deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
Access secret
deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
```
[1] https://wcqs-beta.wmflabs.org/ ;
https://diff.wikimedia.org/2020/10/29/sparql-in-the-shadow-of-structured-da…
[2] https://www.mediawiki.org/wiki/OAuth/Owner-only_consumers
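As an aside on the credentials file layout shown above, here is a minimal sketch of reading it by label rather than by line position, assuming the four labels appear exactly as in the sample. The helper name `read_owner_only_credentials` and the `LABELS` table are hypothetical; the dictionary keys are chosen to match the keyword arguments of `oauthlib.oauth1.Client`.

```python
import io

# Map each label in the pasted token output to an oauthlib Client keyword.
# (Hypothetical table; labels are taken from the sample file above.)
LABELS = {
    "Consumer token": "client_key",
    "Consumer secret": "client_secret",
    "Access token": "resource_owner_key",
    "Access secret": "resource_owner_secret",
}


def read_owner_only_credentials(lines):
    """Return a dict of credentials from alternating label/value lines."""
    creds = {}
    pending = None  # keyword still waiting for its value line
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines between entries
        if pending is None:
            if line not in LABELS:
                raise ValueError(f"unexpected label: {line!r}")
            pending = LABELS[line]
        else:
            creds[pending] = line
            pending = None
    missing = set(LABELS.values()) - set(creds)
    if pending is not None or missing:
        raise ValueError("credentials file is incomplete")
    return creds


if __name__ == "__main__":
    sample = io.StringIO(
        "Consumer token\naaaa\nConsumer secret\nbbbb\n"
        "Access token\ncccc\nAccess secret\ndddd\n"
    )
    print(read_owner_only_credentials(sample))
```

The resulting dictionary could then be passed on as `Client(**creds)`, which fails loudly on a malformed file instead of silently shifting values.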
```python
import sys
from SPARQLWrapper import JSON, SPARQLWrapper
import certifi
from SPARQLWrapper import Wrapper
from functools import partial
from oauthlib.oauth1 import Client

ENDPOINT = "https://wcqs-beta.wmflabs.org/sparql"
QUERY = """
SELECT ?file WHERE {
?file wdt:P180 wd:Q42 .
}
"""


def monkeypatch_sparqlwrapper():
    # Deal with old system certificates
    if not hasattr(Wrapper.urlopener, "monkeypatched"):
        Wrapper.urlopener = partial(Wrapper.urlopener,
                                    cafile=certifi.where())
        setattr(Wrapper.urlopener, "monkeypatched", True)


def oauth_client(auth_file):
    # Read credential from file
    creds = []
    for idx, line in enumerate(auth_file):
        if idx % 2 == 0:
            continue
        creds.append(line.strip())
    return Client(*creds)


class OAuth1SPARQLWrapper(SPARQLWrapper):
    # OAuth sign SPARQL requests
    def __init__(self, *args, **kwargs):
        self.client = kwargs.pop("client")
        super().__init__(*args, **kwargs)

    def _createRequest(self):
        request = super()._createRequest()
        uri = request.get_full_url()
        method = request.get_method()
        body = request.data
        headers = request.headers
        new_uri, new_headers, new_body = self.client.sign(uri, method,
                                                          body, headers)
        request.full_url = new_uri
        request.headers = new_headers
        request.data = new_body
        print("Sending request")
        print("Url", request.full_url)
        print("Headers", request.headers)
        print("Data", request.data)
        return request


monkeypatch_sparqlwrapper()
client = oauth_client(open(sys.argv[1]))
sparql = OAuth1SPARQLWrapper(ENDPOINT, client=client)
sparql.setQuery(QUERY)
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
print("Results")
print(results)
```
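To make the failure described above easier to spot while debugging, here is a minimal sketch that guesses whether a response is JSON results or an HTML page before any parsing is attempted. The function name `classify_response` and its return labels are hypothetical. The check is only a heuristic on the `Content-Type` header and the first bytes of the body, and is not specific to WCQS.

```python
def classify_response(content_type, body):
    """Guess whether an HTTP response is JSON results or an HTML page.

    content_type: value of the Content-Type header (may be None).
    body: raw response bytes.
    Returns "json", "html", or "unknown".
    """
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = (content_type or "").split(";")[0].strip().lower()
    if media_type in ("application/sparql-results+json", "application/json"):
        return "json"
    if media_type in ("text/html", "application/xhtml+xml"):
        return "html"
    # Header missing or unhelpful: fall back to sniffing the body start.
    head = body.lstrip()[:15].lower()
    if head.startswith((b"<!doctype html", b"<html")):
        return "html"
    if head.startswith((b"{", b"[")):
        return "json"
    return "unknown"


if __name__ == "__main__":
    print(classify_response("text/html; charset=UTF-8", b"<!DOCTYPE html>"))
```

A result of `"html"` where `"json"` was expected would suggest that the request was treated as unauthenticated and answered with the login page.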
Best regards,
Frankie
The Search Platform Team
<https://www.mediawiki.org/wiki/Wikimedia_Search_Platform> usually holds
office hours the first Wednesday of each month—though this month we're
going to be a week later.
Come talk to us about anything related to Wikimedia search, Wikidata Query
Service, Wikimedia Commons Query Service, etc.!
Feel free to add your items to the Etherpad Agenda for the next meeting.
Details for our next meeting:
Date: Wednesday, January 13th, 2021
Time: 16:00-17:00 GMT / 08:00-09:00 PST / 11:00-12:00 EST / 17:00-18:00 CET
Etherpad: https://etherpad.wikimedia.org/p/Search_Platform_Office_Hours
Google Meet link: https://meet.google.com/vyc-jvgq-dww
Join by phone in the US: +1 786-701-6904 PIN: 262 122 849#
Hope to talk to you tomorrow!
—Trey
Trey Jones
Sr. Computational Linguist, Search Platform
Wikimedia Foundation
UTC-5 / EST
Hi everyone,
The Search Platform Team
<https://www.mediawiki.org/wiki/Wikimedia_Search_Platform> usually holds
office hours the first Wednesday of each month. Come talk to us about
anything related to Wikimedia search, Wikidata Query Service, Wikimedia
Commons Query Service, etc.!
Feel free to add your items to the Etherpad Agenda for the next meeting.
Details for our next meeting:
Date: Wednesday, December 2nd, 2020
Time: 16:00-17:00 GMT / 08:00-09:00 PST / 11:00-12:00 EST / 17:00-18:00 CET
Etherpad: https://etherpad.wikimedia.org/p/Search_Platform_Office_Hours
Google Meet link: https://meet.google.com/vyc-jvgq-dww
Join by phone in the US: +1 786-701-6904 PIN: 262 122 849#
Hope to talk to you in a week!
—Trey
Trey Jones
Sr. Computational Linguist, Search Platform
Wikimedia Foundation
UTC-5 / EST
The Search Platform Team
<https://www.mediawiki.org/wiki/Wikimedia_Search_Platform> usually holds
office hours the first Wednesday of each month. Come talk to us about
anything related to Wikimedia search, Wikidata Query Service, Wikimedia
Commons Query Service, etc.!
Feel free to add your items to the Etherpad Agenda for the next meeting.
Details for our next meeting:
Date: Wednesday, November 4th, 2020
Time: 16:00-17:00 GMT / 08:00-09:00 PST / 11:00-12:00 EST / 17:00-18:00 CET
Etherpad: https://etherpad.wikimedia.org/p/Search_Platform_Office_Hours
Google Meet link: https://meet.google.com/vyc-jvgq-dww
Join by phone in the US: +1 786-701-6904 PIN: 262 122 849#
Hope to talk to you in a week!
—Trey
Trey Jones
Sr. Computational Linguist, Search Platform
Wikimedia Foundation
UTC-4 / EDT
Hi everyone,
The Search Platform Team
<https://www.mediawiki.org/wiki/Wikimedia_Search_Platform> usually holds
office hours the first Wednesday of each month. Come talk to us about
anything related to Wikimedia search, Wikidata Query Service, Wikimedia
Commons Query Service, etc.!
Feel free to add your items to the Etherpad Agenda for the next meeting.
Details for our next meeting:
Date: Wednesday, October 7th, 2020
Time: 15:00-16:00 GMT / 08:00-09:00 PDT / 11:00-12:00 EDT / 17:00-18:00 CEST
Etherpad: https://etherpad.wikimedia.org/p/Search_Platform_Office_Hours
Google Meet link: https://meet.google.com/vyc-jvgq-dww
Join by phone in the US: +1 786-701-6904 PIN: 262 122 849#
Hope to talk to you in a week!
—Trey
Trey Jones
Sr. Computational Linguist, Search Platform
Wikimedia Foundation
UTC-4 / EDT
The Search Platform Team
<https://www.mediawiki.org/wiki/Wikimedia_Search_Platform> usually holds
office hours the first Wednesday of each month. Come talk to us about
anything related to Wikimedia search, Wikidata Query Service, Wikimedia
Commons Query Service, etc.!
Feel free to add your items to the Etherpad Agenda for the next meeting.
Details for our next meeting:
Date: Wednesday, September 2nd, 2020
Time: 15:00-16:00 GMT / 08:00-09:00 PDT / 11:00-12:00 EDT / 17:00-18:00 CEST
Etherpad: https://etherpad.wikimedia.org/p/Search_Platform_Office_Hours
Google Meet link: https://meet.google.com/vyc-jvgq-dww
Join by phone in the US: +1 786-701-6904 PIN: 262 122 849#
Hope to talk to you in a week!
—Trey
Trey Jones
Sr. Computational Linguist, Search Platform
Wikimedia Foundation
UTC-4 / EDT
David Chan from the Editing Team has written up a nice summary
<https://meta.wikimedia.org/wiki/User:DChan_(WMF)/Forms_of_writing_used_in_C…>
of what's going on in the Chinese Wikipedias, discussing the languages,
writing systems, and tools used to deal with them in the eight Wikipedias
people might refer to as "Chinese" (seven modern languages plus Classical
Chinese).
Good stuff!
—Trey
Trey Jones
Sr. Computational Linguist, Search Platform
Wikimedia Foundation
UTC-4 / EDT
The Search Platform Team
<https://www.mediawiki.org/wiki/Wikimedia_Search_Platform> usually holds
office hours the first Wednesday of each month. Come talk to us about
anything related to Wikimedia search!
Feel free to add your items to the Etherpad Agenda for the next meeting.
Details for our next meeting:
Date: Wednesday, August 5th, 2020
Time: 15:00-16:00 GMT / 08:00-09:00 PDT / 11:00-12:00 EDT / 17:00-18:00 CEST
Etherpad: https://etherpad.wikimedia.org/p/Search_Platform_Office_Hours
Google Meet link: https://meet.google.com/vyc-jvgq-dww
Join by phone in the US: +1 786-701-6904 PIN: 262 122 849#
Hope to talk to you in a week!
—Trey
Trey Jones
Sr. Computational Linguist, Search Platform
Wikimedia Foundation
UTC-4 / EDT
Hi everyone,
Last week I attended and presented at the virtual Celtic Knot Conference
<https://meta.wikimedia.org/wiki/Celtic_Knot_Conference_2020>. There were
plenty of interesting talks, some live, some pre-recorded, all now
available on YouTube; links are available on the “Main/Live program
<https://meta.wikimedia.org/wiki/Celtic_Knot_Conference_2020/Live_program>”
page, and the “Videos pool
<https://meta.wikimedia.org/wiki/Celtic_Knot_Conference_2020/Videos_pool>”
page.
I wanted to point out some of the presentations and other things that might be
interesting:
- You can see a demo <https://www.youtube.com/watch?v=WIeJ_0aqgPg> of
what the Growth Team has been up to with their newcomer task work that our
team has been supporting.
- There’s a workshop-like demo of the Lexeme project on Wikidata
<https://www.youtube.com/watch?v=oDM5QJAJzNc>, which still has a long
way to go, but already has a *lot* of data.
- There’s also a Lexeme-related tool on Toolforge called Ordia
<https://ordia.toolforge.org/>, which has all sorts of nifty
capabilities. A nice one is looking to see how many lexemes each
language has <https://ordia.toolforge.org/language/>.
- I had not previously heard of Wikidata Bridge
<https://www.mediawiki.org/wiki/Wikidata_Bridge>, which aims to allow
people to edit Wikidata from infoboxes!
- A recent article from *Java Magazine* lists the 25 greatest Java apps
ever written
<https://blogs.oracle.com/javamagazine/the-top-25-greatest-java-apps-ever-wr…>,
and #6 is “Wikipedia Search”, even though the Java bit is mostly
Elasticsearch and the “Wikipedia” part is mostly PHP. Still, it’s nice to
be appreciated.
- Amir has some nice ideas about how to make the Wikimedia Incubator
better <https://www.youtube.com/watch?v=DdyzrDzD0qg>. One positive side
effect of his proposal might be better search on new wikis.
I don’t particularly recommend my talk
<https://www.youtube.com/watch?v=Pi3-w9ne3zg> since it is a short version
of the same old overview of the basic kinds of text processing we can do
for search—unless you want to see a few more examples in Irish (I don’t try
to *pronounce* any of the Irish words, though, so it isn’t as entertaining
as it could have been).
I already got a line on some Breton stop words, and I’m going to look into
what we are doing for Breton as a 10% project.
—Trey
Trey Jones
Sr. Computational Linguist, Search Platform
Wikimedia Foundation
UTC-4 / EDT