Hi all;
I'm thinking about notability in Wikidata and how it may conflict with
current Wikipedia policies and community conceptions. Will Wikidata allow
the creation of entities for small villages, asteroids, galaxies, stars,
species, etc., that are not allowed today at Wikipedia? Including those
that don't have an article in any Wikipedia?
I will be happy if so.
Regards,
emijrp
Hi all;
I have read that every fact for every entity must include a reference. How
is Wikidata going to deal with dead links? I hope we can work on this by
developing an archivist bot to archive links into WebCitation or the
Internet Archive. This is an old problem in all Wikipedias, and it is
rarely addressed correctly (the only example I know of is the French
Wikipedia, which uses Wikiwix.com to archive references and external links).
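As a sketch of what one step of such an archivist bot might look like (the Save Page Now endpoint is the Internet Archive's documented way to request a snapshot of a URL; the function names here are invented for illustration):

```python
# Sketch of one step of a hypothetical archivist bot: ask the Internet
# Archive's "Save Page Now" endpoint to snapshot a reference URL.
# Function names are invented; error handling is minimal on purpose.
import urllib.request

SAVE_ENDPOINT = "https://web.archive.org/save/"

def archive_request_url(reference_url: str) -> str:
    """Build the Save Page Now URL for a given reference."""
    return SAVE_ENDPOINT + reference_url

def archive(reference_url: str) -> int:
    """Request a snapshot; returns the HTTP status code."""
    with urllib.request.urlopen(archive_request_url(reference_url)) as resp:
        return resp.status

print(archive_request_url("http://example.org/source"))
# -> https://web.archive.org/save/http://example.org/source
```

A real bot would of course also record the resulting snapshot URL next to the reference, so the dead link can later be swapped for the archived copy.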
Regards,
emijrp
On Sun, Apr 1, 2012 at 1:58 PM, JFC Morfin <jefsey(a)jefsey.com> wrote:
> Dear Lydia,
>
>> Hmmm I have to confess I don't understand this completely and
>> therefore can't give you an answer. Could you give me an example of
>> what you are talking about and how you see Wikidata fitting in?
>
>
> Hmmmmmmm :-) I am somewhat at a loss here. Let's start from some
> basics, then.
>
> 1) is there a concise consensual definition of what Wikidata is
> expected to achieve?
http://meta.wikimedia.org/wiki/Wikidata right at the top
> 2) Has this Wikidata project some Charter or terms of reference (TOR),
> or dedicated web site, you could give the URL? http://wikidata.org is
> redirected to http://en.wikipedia.org/wiki/Main_Page
No.
> 3) Is there a list of the needs of wikimedia projects that wikidata is
> to address ?
Some are mentioned on http://meta.wikimedia.org/wiki/Wikidata/FAQ I believe
> 4) What is/are going to be the language(s) of Wikidata?
the languages of the wikipedias
> 5) What is the Wikidata's architectural framework? WDE (whole digital
> ecosystem)? Human communications? The Internet services? Users
> applications? Wikimedia?
>
> 6) What are the normative choices, strategic objectives and
> constraints imposed by the sponsors and the WMF?
This is a really broad question ;-) If you have more concrete
questions I can try to find answers.
> 7) How is the Wikidata project to liaise with its potential users'
> representatives (depending on the response to (5))?
You mean the initial development team? Through me.
Cheers
Lydia
--
Lydia Pintscher - http://about.me/lydia.pintscher
Community Communications for Wikidata
Wikimedia Deutschland e.V.
Eisenacher Straße 2
10777 Berlin
www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das
Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.
Oren Bochman wrote:
> Each project has its own criteria of notability, since it requires
> its own noise filter.
As far as I understand, WikiData is about any item, not only about things
that have an article in en.Wikipedia or any other Wikipedia. The question
"What items may be included in WikiData?" should be added to the WikiData
FAQ. I can think of a very broad definition: every thinkable thing or
object can be included as an item with its own page in WikiData if the
thing is the topic of an article in Wikipedia or in any other Wikimedia
project (e.g. Wiktionary or Wikisource), or if the thing is referenced in
any of these articles. This includes, for example:
- normal Wikipedia articles in any language
- things that are included as items in a list-of-X article
- words (Wiktionary)
- digital objects (Commons)
- single pages from digitized books (Wikisource)
- publications used as references
Managing the vast number of items will only be possible with backlinks;
for instance, you can get the list of Wikidata items linked to or used in
some selected Wikipedia article, and you can get a list of items not used
in any article for cleanup.
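The cleanup query described above could be sketched like this, assuming item usage were tracked as a simple backlink index (all names here are invented for illustration):

```python
# Sketch: find items that no article references, given a hypothetical
# backlink index mapping each item ID to the set of articles using it.

def items_without_backlinks(backlinks):
    """Return item IDs whose backlink set is empty, sorted for review."""
    return sorted(item for item, articles in backlinks.items() if not articles)

backlinks = {
    "Q1": {"en:Universe", "de:Universum"},
    "Q2": set(),          # no article uses this item -> cleanup candidate
    "Q3": {"en:Earth"},
}
print(items_without_backlinks(backlinks))  # -> ['Q2']
```

The real system would need the index maintained incrementally as articles are edited, but the cleanup list itself reduces to exactly this kind of emptiness check.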
Jakob
--
Verbundzentrale des GBV (VZG)
Digitale Bibliothek - Jakob Voß
Platz der Goettinger Sieben 1
37073 Goettingen - Germany
+49 (0)551 39-10242
http://www.gbv.de
jakob.voss(a)gbv.de
Some initial ideas on the statement. I realize that this is not a
priority in the first phase, but perhaps a place could be created on
the wiki to collect some thoughts like those below?
http://meta.wikimedia.org/wiki/Wikidata/Notes/Data_model#The_Metamodel explains:
Statement = (StatementID, Property, Value, Qualifier*, Reference*)
The StatementID is a unique identifier for the given statement,
used only internally and for export.
A Property is defined on a property page. This definition includes a type.
The structure of the Value is given by the type of the property.
Simple types could demand an EntityID or a number. More complex types
could demand dates, date ranges, numbers with units, a geocoordinate,
a geo shape, etc.
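The tuple above could be read as a data structure roughly like the following. This is only an illustrative reading of the documented metamodel, not Wikidata's actual implementation:

```python
# Illustrative sketch of the metamodel's Statement tuple:
#   Statement = (StatementID, Property, Value, Qualifier*, Reference*)
# Field names and types are assumptions made for this sketch.
from dataclasses import dataclass, field
from typing import Any, List, Tuple

@dataclass
class Statement:
    statement_id: str                  # unique; internal and export use only
    property: str                      # defined on a property page, carries a type
    value: Any                         # structure dictated by the property's type
    qualifiers: List[Tuple[str, Any]] = field(default_factory=list)
    references: List[str] = field(default_factory=list)

s = Statement("S1", "population", 3500000,
              references=["http://example.org/source"])
```

The `value: Any` field is where the type system described above would hook in: a simple type constrains it to an EntityID or a number, while complex types would demand dates, ranges, units, or geo shapes.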
----
1. Will Property include information on
observation/recording/measurement methodology?
2. What happens if the main entity has variants or parts for which
values are recorded?
An example is car models, where typically the revisions sold under the
same name are subsumed in one Wikipedia article. Example:
http://en.wikipedia.org/wiki/Renault_Kangoo
with different length / weight in the "subclass" info boxes for
first/second generation.
Cars also easily serve as an example for variable parts, see in
http://en.wikipedia.org/wiki/%C5%A0koda_Roomster the list of engine
specifications. The engines are not Wikipedia entities in their own
right.
3. Values may be well-defined RDF resources, but not available in Wikipedia.
In my work, many statements I would like to express in a future
Wikidata are not allowed as Wikipedia articles at all. You can express
"Wowereit" is_mayor_of "Berlin, Germany"
but not
"Plantago lanceolata" has_leaf_shape "lanceolate"
because
http://en.wikipedia.org/wiki/Lanceolate
is a redirect to
http://en.wikipedia.org/wiki/Leaf_shape
I personally would love illustrated definitions of things people want
to learn about to be allowed on Wikipedia, but the argument is
generally that Wikipedia is not Wiktionary.
I believe Wikidata should, right from the start, be defined to allow
references to Wiktionary as well as Wikipedia. And while we are at
it, references to Commons as well (semantic image annotation...).
This would change
Wikipedialink = (Title, LanguageId, Badge?)
to
Link = (Project, LanguageId, Title, Badge?)
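The proposed signature change could be sketched as follows (a minimal illustration of the two record shapes from the text, not an implementation):

```python
# Sketch of the proposed generalization from a Wikipedia-only link record
#   Wikipedialink = (Title, LanguageId, Badge?)
# to a project-aware one
#   Link = (Project, LanguageId, Title, Badge?)
# Field names follow the text above; this is illustrative only.
from collections import namedtuple

Wikipedialink = namedtuple("Wikipedialink", ["title", "language_id", "badge"])
Link = namedtuple("Link", ["project", "language_id", "title", "badge"])

# With the extra Project field, a Wiktionary entry or a Commons file
# becomes linkable just like a Wikipedia article:
leaf_shape = Link("wiktionary", "en", "lanceolate", None)
article = Link("wikipedia", "en", "Leaf shape", None)
```

One extra field is enough to make the "Plantago lanceolata has_leaf_shape lanceolate" statement expressible, since the value can now point at the Wiktionary definition instead of requiring a Wikipedia article.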
---
Gregor
Binaris, Oren,
At 13:43 02/04/2012, Oren Bochman wrote:
>>>WikiData's mandate appears to be to improve en.Wikipedia (and then
>>>the other WMF projects). If WikiData cannot serve the needs of
>>>en.wikipedia, it will be of little consequence to the other
>>>projects, since it will need a community to maintain it.
>
>>This is new information for me. Does enwiki really have priority
>>over the other Wikipedias? :-O I would be unhappy with that.
>
>My point is a pragmatic one, not a political one. Note that
>Wikidata is planned to add language prototypes in the third part of
>the timeline - that is my reference.
>
>If you read my note you will notice I propose to take a realistic
>view of the communities. It would be a mistake to forget
>that en.wikipedia has the bulk of the information, the largest
>community, and the source of the most difficult policies, i.e.
>the greatest challenges.
>
>However, getting data into Wikidata is only half of the problem.
>Getting it out to the ar...zu Wikipedias without causing chaos --
>that is what we are here to figure out.
We have good experience with that type of problem from the "layer
below": languages (data are to be expressed in a language). This is
why I raised the point with Lydia and helped her by asking for a more
organized road map when she said she did not fully understand.
This experience has been with governance (countries and languages),
with character sets (ISO 10646 and Unicode), with langtags (RFC 5646)
and filtering (RFC 4647), and with non-ASCII domain names and mail
addresses. In these cases we met the interlinguistic (languages),
inter-readability (scripts), interconnectivity (infrastructure) and
interoperability (protocols) issues which made the Internet so far.
The Internet+ layer set is new, in use, and clear, but by far not
stabilized yet (I quoted the IETF drafts on this). It concerns
fringe-to-fringe interintelligibility. (The next upper network
stratum will be the Intersem, the semiotic Internet, the Internet of
thoughts, which is my main interest with auxiliary intelligence --
not for today!)
Wikimedia follows the same strata advancement, but on top of the web
application. This means that three fundamental questions are raised
by the Wikidata project:
1. is it technically to be a web or a network+ application (this has
no impact on the content, but it does impact the architectural and
referential issues and the future of the Intersem)?
2. how is it to be interintelligible: by being internationalized (the
leading culture hosts the others) or multilingualized (every language
is treated equally)?
3. how is it going to support (actually, not to block further
innovation in) intercomprehensibility, i.e. to be able to address the
Intersem stratum's "capital of Israel" issue?
Point 1: deciding who (humanists or technicalists) are to be the
leaders in the Wikidata specifications.
Point 2: deciding if we address the logical needs of en.wiki and
then propagate the solutions to the other languages, or if we are
actually pragmatic and acknowledge that the need is for a
multilingual system. In that case, we first try to address this fully
new problem among a limited set of wikis (let's say the three leading
ones [amenable to ASCII, but with diacritics]), and then with a
non-roman one.
Point 3: understanding and accepting the complexity that Wikidata
has to face, and trying not to over-complexify its future with
today's "great" solutions and habits.
Examples:
1. Network-related issues may become a first nightmare, as many URIs
will become IRIs (different scripts) and the digital convergence will
need to use DNS CLASSes, at least between different technologies.
This means that en.wiki will have to support URIs and IRIs
indifferently; English contributors have no Chinese keyboards.
CLASSes are not even supported yet in the way a domain name is
published. The more communication layers, the more complexity, the
less security, and the less room and power in mobile devices.
2. The ISO and IETF experience shows that the
multicultural/multilinguistic issue is already very complex to
address architecturally and technically (it took ten years, and it is
by far not digested yet), due to people's intuitive agreement with
stereotypes. The network-side solution to this complexity is
"subsidiarity" (i.e. the current Wikipedia solution). Wikidata
therefore has to be conceived as a service to subsidiarity (the
Wikimedia and Internet+ ones).
3. Then there is the third problem no one has addressed yet, except
ISO 3166: variance. Two identical particulars (effects, names, data,
etc.) may be different; e.g. there are many ways to compute and
present the same date. Are the results to be stored in Wikidata in
all these ways every day, with bridges built between them? Or are
they to be stored as a single datum with the formulas to compute
them? Then how can we be sure some parameters have not changed
(e.g. the death of the Emperor) and the computation was not tampered
with? Variance is everywhere (actually, variance is most probably
life). ISO 3166 has no variance, because it is the sovereign
reference: the list of states and the languages of their laws
(however, Palestine is in it already, and Taiwan is there). ISO
documents are in French, English and possibly Russian. ISO 3166-1
states which are the normative languages in every country by
reference to ISO 639 (the list of language names). ISO 3166 defines
the ccTLDs and is used in langtags to document languages and
cultures. ISO 10646 (supported by Unicode) contains the coded
character tables for scripts. At the binary layer it is full of
variants (the same glyphs being supported by different code points).
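As a small, standard-library-only illustration of the date-variance point, the same datum already admits several equally valid renderings (the formats shown are just examples):

```python
# The same date can be presented in many valid ways; a data store must
# either keep all renderings, or keep one datum plus conversion rules.
from datetime import date

d = date(2012, 4, 1)
renderings = [
    d.isoformat(),            # '2012-04-01' (ISO 8601)
    d.strftime("%d/%m/%Y"),   # '01/04/2012' (day-first convention)
    d.strftime("%m/%d/%Y"),   # '04/01/2012' (month-first convention)
    str(d.toordinal()),       # proleptic Gregorian day count
]
print(renderings)
```

Calendar systems that depend on external facts (regnal years, for instance) add exactly the tampering and staleness questions raised above, since the conversion rule itself can change.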
Best
jfc
Markus Krötzsch wrote:
> This is a valid point. It is intended to address this as follows:
>
> * Wikidata items (our "content pages") will be in *exact*
> correspondence to (zero or more) Wikipedia articles in different
> languages.
>
> * Differences in scope will lead to different Wikidata items.
I suppose that this will lead either to fuzzy scopes, because there
are many subtle differences in scope between articles in different
Wikipedia languages, or to an inflation of poorly connected, rather
similar Wikidata entities, because every Wikipedia language requires
its own Wikidata entities.
> * Relationships such as "broader" or "narrower" can be expressed
> as relations between these items, if desired.
In any case, you must admit that there is no simple 1-to-1
relationship between articles in different Wikipedias. This fact
should be acknowledged in the design of Wikidata. To me, the current
solution looks like you just ignore the problem at the basic layer
and hope that it gets magically solved by the community ;-)
> The advantage of this is that the possible relationships are not
> system-defined but can be selected and modified by the community.
This is not an advantage per se but a design decision. You always
need to define some constraints in the system and leave some other
possibilities open. If you say "relationships _such as_ 'broader' or
'narrower'", you do not acknowledge any basic kinds of relationships
between Wikipedia articles in different languages. There may be
arguments for doing so, but I don't see them at the moment.
Jakob
P.S.: Could you please all try to use the terms from the Wikidata
glossary? For instance, there is no such thing as a "Wikidata item",
but an "Entity" or a "Page".
--
Verbundzentrale des GBV (VZG)
Digitale Bibliothek - Jakob Voß
Platz der Goettinger Sieben 1
37073 Goettingen - Germany
+49 (0)551 39-10242
http://www.gbv.de
jakob.voss(a)gbv.de
Original Message:
-----------------
From: JFC Morfin jefsey(a)jefsey.com
Date: Mon, 02 Apr 2012 02:37:05 +0200
To: wikidata-l(a)lists.wikimedia.org
Subject: Re: [Wikidata-l] Archiving references for facts?
At 00:46 02/04/2012, John Erling Blad wrote:
>US users can use archive.org (http://archive.org), but foreigners
>might get into trouble. Remember that Wikipedia/-data isn't US
>only. Both (?) Archive.org and WebCitation remove content on
>request; some tricks will be needed to automate it for catalogues
>and sites. Check it out.
Another issue to be considered along that line of thinking is
copyright protection by new/coming laws that permit lawyers to
preventively block a site (e.g. the Cyberdefense proposition in the
USA). Wikidata should permit a permanent legal operation to attend to
a lawyer's demand in one country without blocking users from other
countries. This would mean modularity, i.e. a conflict with the
concept of a central source.
Another issue is that "truth is not always good to say". In language
versions, not telling the full truth is acceptable; not for a central
reference system like Wikidata, with probable retaliation against
its credibility.
I will pick a well-known example. Many interests (including
governments) wish to hide from people the way the DNS is really
designed. http://en.wikipedia.org/wiki/Domain_Name_System does not
hide that the DNS uses CLASSes, and it correctly states that "Each
class is an independent name space with potentially different
delegations of DNS zones". But it only states: "The CLASS of a record
is set to IN (for Internet) for common DNS records involving Internet
hostnames, servers, or IP addresses. In addition, the classes Chaos
(CH) and Hesiod (HS) exist."
This actually hides from Wikipedia's readers that the DNS counts
65,536 CLASSes, and that a CLASS actually means a fully independent
root file, meaning that the Internet could run perfectly well with
65,536 ICANN/NTIA-like set-ups, and incidentally without root file
systems. This is not something that Wikidata, acting as a unique
world source on the DNS, could hide. The only response of those who
want a status quo in people's beliefs about the DNS would be to fight
the credibility of Wikidata, in spite of decades-old URLs to the DNS
RFCs.
Such campaigns would multiply with all the truths that are not good
to say and that Wikidata will have to collect. We should technically
be prepared to oppose such campaigns by having a very easy system to
use in order to confirm our sources: the actual words of the
concerned texts, not only the URL to a document containing them.
jfc
Thanks for the link, Helder! You can keep up on the current status of
Kevin's project at
https://www.mediawiki.org/wiki/User:Kevin_Brown/ArchiveLinks/status
--
Sumana Harihareswara
Volunteer Development Coordinator
Wikimedia Foundation
> Maybe this GSoC project (from 2011) will be relevant:
> http://www.mediawiki.org/wiki/User:Kevin_Brown/ArchiveLinks/Design
>
> Best regards,
> Helder
>
> On Sun, Apr 1, 2012 at 06:31, emijrp <emijrp at gmail.com> wrote:
>> Hi all;
>>
>> I have read that every fact for every entity must include a reference. How
>> is Wikidata going to deal with dead links? I hope we can work on this by
>> developing an archivist bot to archive links into WebCitation or the
>> Internet Archive. This is an old problem in all Wikipedias, and it is
>> rarely addressed correctly (the only example I know of is the French
>> Wikipedia, which uses Wikiwix.com to archive references and external links).
>>
>> Regards,
>> emijrp