Erik,
Should interactive web, internet of things, or offline services
relying on Foundation encyclopedia CC-BY-SA content be required to
attribute authorship by specifying the revision date from which the
transcluded content is derived?
On Thu, Oct 12, 2017 at 7:01 AM, Erik Moeller <eloquence(a)gmail.com> wrote:
On Tue, Oct 10, 2017 at 7:31 AM, Andreas Kolbe
<jayen466(a)gmail.com> wrote:
> Wikidata has its own problems in that regard that have triggered
> ongoing discussions and concerns on the English Wikipedia.[1]
Tensions between different communities with overlapping but
non-identical objectives are unavoidable. Repository projects like
Wikidata and Wikimedia Commons provide huge payoff: they dramatically
reduce duplication of effort, enable small language communities to
benefit from the work done internationally, and can tackle a more
expansive scope than the immediate needs of existing projects. A few
examples include:
- Wiki Loves Monuments, recognized as the world's largest photo competition
- Partnerships with countless galleries, libraries, archives, and museums
- Wikidata initiatives like mySociety's "Everypolitician" project or Gene
Wiki
This is not without its costs, however. Differing policies, levels of
maturity, and social expectations will always fuel some level of
conflict, and the repository approach creates huge usability
challenges. The latter is also true for internal wiki features like
templates, which shift information out of the article space,
disempowering users who no longer understand how the whole is
constructed from its parts.
I would call these usability and "legibility" issues the single
biggest challenge in the development of Wikidata, Structured Data for
Commons, and other repository functionality. Much related work has
already been done or is ticketed in Phabricator, such as the effective
propagation of changes into watchlists, article histories, and
notifications. Much more will need to follow.
With regard to the issue of citations, it's worth noting that it's
already possible to _conditionally_ load data from Wikidata, excluding
information that is unsourced or only sourced circularly (i.e. to
Wikipedia itself). [1] Template invocations can also override values
provided by Wikidata, for example, if there is a source, but it is not
considered reliable by the standards of a specific project.
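The filtering that Module:WikidataIB's "onlysourced" parameter performs can be sketched as follows. This is purely illustrative Python (the real module is Lua, and the dictionaries below are simplified versions of Wikidata's claim JSON): a claim is kept only if it carries at least one reference whose "stated in" (P248) value is not itself a Wikipedia edition.

```python
# Simplified sketch: keep only claims with at least one non-circular
# reference. Q328 and Q48183 are the Wikidata items for the English and
# German Wikipedias; the claim structure is a pared-down version of the
# real Wikibase JSON.
WIKIPEDIA_EDITIONS = {"Q328", "Q48183"}

def is_circular(reference):
    """True if the reference merely points back to a Wikipedia edition."""
    for snak in reference.get("snaks", {}).get("P248", []):
        if snak["datavalue"]["value"]["id"] in WIKIPEDIA_EDITIONS:
            return True
    return False

def only_sourced(claims):
    """Drop claims that are unsourced or sourced only circularly."""
    return [c for c in claims
            if any(not is_circular(r) for r in c.get("references", []))]
```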
> If a digital voice assistant propagates a Wikimedia mistake without
> telling users where it got its information from, then there is not
> even a feedback form. Editability is of no help at all if people
> can't find the source.
I'm in favor of always indicating at least provenance (something like
"Here's a quote from Wikipedia:"), even for short excerpts, and I
certainly think WMF and chapters can advocate for this practice.
However, where short excerpts are concerned, it's not at all clear
that there is a _legal_ issue here, and that full compliance with all
requirements of the license is a reasonable "ask".
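One way a re-user (a web page or a voice assistant) might indicate provenance is a short generated attribution line. The wording and the optional revision-date field below are our own suggestion, not something the license mandates:

```python
# Hypothetical helper: build a provenance line for a Wikipedia excerpt.
# Including the revision date is a recommendation, not a requirement.
def provenance_line(title, rev_date=None):
    line = f'Here\'s a quote from the Wikipedia article "{title}"'
    if rev_date:
        line += f" (revision of {rev_date})"
    return line + ". Text is available under CC BY-SA."
```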
Bing's search result page manages a decent compromise, I think: it
shows excerpts from Wikipedia clearly labeled as such, and it links to
the CC-BY-SA license if you expand the excerpt, e.g.:
https://www.bing.com/search?q=france
I know that over the years, many efforts have been undertaken to
document best practices for re-use, ranging from local
community-created pages to chapter guides and tools like the
"Lizenzhinweisgenerator". I don't know which of these is currently
best, but if no adequate one exists, it might be a good idea to
develop a new, comprehensive guide that takes into account voice
applications, tabular data, and so on.
Such a guide would ideally not just be written from a license
compliance perspective, but also include recommendations, e.g., on how
to best indicate provenance, distinguishing "here's what you must do"
from "here's what we recommend".
>> Wikidata will often provide a shallow first level of information
>> about a subject, while other linked sources provide deeper
>> information. The more structured the information, the easier it
>> becomes to validate in an automatic fashion that, for example, the
>> subset of country population time series data represented in
>> Wikidata is an accurate representation of the source material. Even
>> when a large source dataset is mirrored by Wikimedia (for
>> low-latency visualization, say), you can hash it, digitally sign
>> it, and restrict modifiability of copies.
> Interesting, though I'm not aware of that being done at present.
At present, Wikidata allows users to model constraints on internal
data validity. These constraints are used for regularly generated
database reports as well as on-demand lookup via
https://www.wikidata.org/wiki/Special:ConstraintReport . This kicks
in, for example, if you enter an implausible number in a population
field,
or mark a country as female.
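The flavor of these internal-validity constraints can be sketched like this. P1082 ("population") and P21 ("sex or gender") are real Wikidata property IDs, but the rules themselves are invented stand-ins for the on-wiki constraint system:

```python
# Illustrative constraint table: each property maps to a validity rule.
# These rules are simplified examples, not the actual Wikidata constraints.
CONSTRAINTS = {
    "P1082": lambda v: isinstance(v, int) and 0 < v <= 2_000_000_000,
    "P21":   lambda v: False,  # "sex or gender" never applies to a country
}

def violations_for_country(statements):
    """Return the property IDs whose values break a constraint."""
    return [prop for prop, value in statements.items()
            if prop in CONSTRAINTS and not CONSTRAINTS[prop](value)]
```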
There is a project underway to also validate against external sources; see:
https://www.mediawiki.org/wiki/Wikibase_Quality_Extensions#Special_Page_Cro…
Wikidata still tends to deal with relatively small amounts of data; a
highly annotated item like Germany (Q183), for example, comes in at
under 1MB in uncompressed JSON form. Time series data like GDP is
often included only for a single point in time, or for a subset of the
available data. The relatively new "Data:" namespace on Commons exists
to store raw datasets; this is only used to a very limited extent so
far, but there are some examples of how such data can be visualized,
e.g.:
https://en.wikipedia.org/wiki/Template:Graph:Population_history
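The hash-and-verify idea quoted earlier could be sketched as follows. A canonical JSON serialization gives every faithful mirror of a dataset the same digest, so a copy can be checked against the source; this is purely illustrative, and as noted, no such mechanism is in use today:

```python
# Sketch: digest a tabular dataset via canonical JSON so that any mirror
# can be compared against the source. Illustrative only.
import hashlib
import json

def dataset_digest(rows):
    canonical = json.dumps(rows, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```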
Giving volunteers more powerful tools to select and visualize data
while automating much of the effort of maintaining data integrity
seems like an achievable and strategic goal, and as these examples
show, some building blocks for this are already in place.
>> But the proprietary knowledge graphs are valuable to users in ways
>> that the previous generation of search engines was not. Interacting
>> with a device like you would with a human being ("Alexa/Google/Siri,
>> is yarrow edible?") makes knowledge more accessible and usable,
>> including to people who have difficulty reading long texts, or who
>> are not literate at all. In this sense I don't think WMF should ever
>> find itself in the position to argue _against_ inclusion of
>> information from Wikimedia projects in these applications.
> There is a distinct likelihood that they will make reading Wikipedia
> articles progressively obsolete, just like the availability of
> Googling has dissuaded many people from sitting down and reading a
> book.
There is an important distinction between "lookup" and "learning"; the
former is a transactional activity ("Is this country part of the Euro
zone?") and the latter an immersive one ("How did the EU come
about?"). Where we now get instant answers from home assistants or
search engines, we may have previously skimmed, or performed our own
highly optimized search in the local knowledge repository called a
"bookshelf".
In other words, even if some instant answers lead to a drop in
Wikipedia views, it would be unreasonable to assume that those views
were "reads" rather than "skims". When you're on a purely
transactional journey, you appreciate almost anything that shortens
it.
I don't think Wikimedia should fight the gravity of a user's
intentions out of its own pedagogical motives. Rather, it should make
both lookup and learning as appealing as possible. Doing well in the
"lookup" category is important to avoid handing too much control off
to gatekeepers, and being good in the "learning" category holds the
greatest promise for lasting positive impact.
As for the larger social issue, at least in the US, the youngest (most
googley) generation is the one that reads the most books, and
income/education are very strong predictors of whether people do or
not:
http://www.pewresearch.org/fact-tank/2015/10/19/slightly-fewer-americans-ar…
>> The applications themselves are not the problem; the centralized
>> gatekeeper control is. Knowledge as an open service (and network) is
>> actually the solution to that root problem. It's how we weaken and
>> perhaps even break the control of the gatekeepers. Your critique
>> seems to boil down to "Let's ask Google for more crumbs". In spite
>> of all your anti-corporate social justice rhetoric, that seems to be
>> the path to developing a one-sided dependency relationship.
> I considered that, but in the end felt that given the extent to which
> Google profited from volunteers' work, it wasn't an unfair ask.
While I think your proposal to ask Google to share access to resources
it already has digitized or licensed is worth considering, I would
suggest being very careful about the long term implications of any
such agreements. Having a single corporation control volunteers'
access to proprietary resources means that such access can also be
used as leverage down the road, or abruptly be taken away for other
reasons.
I think it would be more interesting to spin off the existing
"Wikipedia Library" into its own international organization (or home
it with an existing one), tasked with giving free knowledge
contributors (potentially including contributors to other free
knowledge projects like OSM) access to proprietary resources, and
pursuing public and private funding of its own. Developing the many
relationships required may take longer, but it is more sustainable in
the long run. Moreover, it
has the potential to lead to powerful collaborations with existing
public/nonprofit digitization and preservation efforts.
> Publicise the fact that Google and others profit from volunteer work,
> and give very little back. The world could do with more articles like
> this:
> https://www.washingtonpost.com/news/the-intersect/wp/2015/07/22/you-dont-kn…
I have plenty of criticisms of Facebook, but the fact that users don't
get paid for posting selfies isn't one of them. My thoughts on how the
free culture movement (not limited to Wikipedia) should interface with
the for-profit sector are as follows, FWIW:
1) Demand appropriate levels of taxation on private profits, [2]
sufficient investments in public education and cultural institutions,
and "open licensing" requirements on government contracts with private
corporations.
2) Require compliance with free licenses, first gently, then more
firmly. This is a game of diminishing returns, and it's most useful to
go after the most blatant and problematic cases. As noted above, "fair
use" limits should be understood and taken into consideration.
3) Encourage corporations to be "good citizens" of the free culture
world, whether it's through indicating provenance beyond what's
legally required, or by contributing directly (open source
development, knowledge/data donations, in-kind goods/services,
financial contributions). The payoff for them is goodwill and a
thriving (i.e. also profitable) open Internet that more people in more
places use for more things.
4) Build community-driven, open, nonprofit alternatives to
out-of-control corporate quasi-monopolies. As far as proprietary
knowledge graphs are concerned, I will reiterate: open data is the
solution, not the problem.
Cheers,
Erik
[1] See the getValue function in
https://en.wikipedia.org/wiki/Module:WikidataIB , specifically its
"onlysourced" parameter. The module also adds a convenient "Edit this
on Wikidata" link to each claim included from there.
[2] As far as Wikimedia organizations are concerned, specific tax
policy will likely always be out of scope of political advocacy, but
the other points need not be.
_______________________________________________
Wikimedia-l mailing list, guidelines at:
https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and
https://meta.wikimedia.org/wiki/Wikimedia-l
New messages to: Wikimedia-l(a)lists.wikimedia.org
Unsubscribe:
https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,
<mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe>