Hello all,
As you know, the Wikimedia hackathon https://www.mediawiki.org/wiki/Wikimedia_Hackathon_2017 will take place on May 19-21 in Vienna.
During this event, we will organize a documentation sprint to help volunteers improve the user-level documentation for Wikidata.
During these 3 days, you'll be able to join (IRL or remotely, at any moment) to work on improving and translating the help pages. We suggest a focus on these 3 topics:
- Beginner documentation about Wikidata https://phabricator.wikimedia.org/T159216
- Lua for Wikimedians https://phabricator.wikimedia.org/T159217
- Wikibase installation https://phabricator.wikimedia.org/T159218
We will also organize some workshops (translation tools, illustration...) to help you build better documentation.
You will find all the information on the related page https://www.mediawiki.org/wiki/Wikimedia_Hackathon_2017/Wikidata_documentation_sprint. If you're interested, feel free to add yourself to the attendees list.
We're now building a list of simple tasks that volunteers could work on during the event. If you have any ideas, such as parts of the documentation that really should be improved, feel free to add them on the talk page https://www.mediawiki.org/wiki/Talk:Wikimedia_Hackathon_2017/Wikidata_documentation_sprint, or, if you feel comfortable with Phabricator, create subtasks of the tasks listed above.
Thanks a lot, and maybe see you there!
You might want to consider streamlined, beginner-level documentation for learning SPARQL (including Wikidata specifics).
Most census/demographic statisticians, economists, and business people come from the relational-database world (SQL and spreadsheets), so making the hop over to RDF / linked data involves quite a learning curve. They get frustrated because they are highly productive with SQL, yet they flounder on the SPARQL side without graduated help.
SPARQL syntax is not entirely "intuitive" for SQL users. A really good tutorial that starts at ground zero and takes a user up to a minimal level of independent SPARQL proficiency would be great. If users can fairly quickly write simple queries and get useful results, they are better motivated to go further and learn more nuances.
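For example, a ground-zero tutorial might open with something like the following query, runnable at the Wikidata Query Service (https://query.wikidata.org). This is only a minimal sketch: the SELECT clause keeps SQL users oriented, while the WHERE clause introduces triple patterns in place of table joins.

  # Countries and their capitals: a first SPARQL query.
  # Each line in WHERE is a triple pattern (subject predicate object);
  # ?country and ?capital behave like columns in a SQL result set.
  SELECT ?countryLabel ?capitalLabel
  WHERE {
    ?country wdt:P31 wd:Q6256 .   # ?country is an instance of (P31) country (Q6256)
    ?country wdt:P36 ?capital .   # ?country has capital (P36) ?capital
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
  }
  LIMIT 10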
The label service is also not totally beginner-friendly (for data users). The earlier thread "[Wikidata] Label gaps on Wikidata" illustrates some struggles that pop up immediately for many new users. General SPARQL textbooks have zero coverage of Wikidata specifics.
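To make the label-service hurdle concrete, here is a small sketch (again runnable at https://query.wikidata.org): without the SERVICE line, the query returns only opaque Q-identifiers, and no general SPARQL textbook prepares a newcomer for this Wikidata-specific clause.

  SELECT ?person ?personLabel
  WHERE {
    ?person wdt:P106 wd:Q82955 .   # occupation (P106): politician (Q82955)
    # Wikidata-specific label service: binds ?personLabel to a human-readable
    # name. Omit it and the results show only identifiers such as wd:Q76.
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
  }
  LIMIT 5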
Perhaps high-quality documentation already exists? It would be great to have at least a syllabus (learn this first, then move on to this, and so on). It might also be good to have common / high-value "use-case" scenarios with pointers to the documentation/tutorials that cover them. The existing example queries are very helpful, but many are complex. For training purposes we need a graduated set of examples designed to teach, step by step, how to construct queries.
Rick
On 3/2/17 11:48 AM, Rick Labs wrote:
Perhaps high-quality documentation already exists? It would be great to have at least a syllabus (learn this first, then move on to this, and so on). It might also be good to have common / high-value "use-case" scenarios with pointers to the documentation/tutorials that cover them. The existing example queries are very helpful, but many are complex. For training purposes we need a graduated set of examples designed to teach, step by step, how to construct queries.
The trouble here isn't really SQL to SPARQL, etc. In my experience, it has more to do with understanding what data is and the nature of data representation. Having arrived at that conclusion over the years, I published a presentation titled "Understanding Data" as an aid in this area [1].
SQL and SPARQL aren't very good starting points, because the literature associated with both assumes some fundamental understanding of the nature of the data (relations) against which they operate.
If one starts the journey by understanding data representation, combined with clarity about RDF as a language, my hope is that folks reach a point where creating RDF statements always includes the following (so SPARQL-compliant servers don't need to inject label-lookup workarounds into query solutions):
1. Addition of annotation relations, especially the likes of rdfs:label, skos:prefLabel, skos:altLabel, schema:name, foaf:name, rdfs:comment, schema:description, etc.
2. Addition (where possible) of relations such as foaf:depiction, schema:image, etc.
Adhering to the above leads to RDF statement collections that are easier to read, without the confusing nature of the term "graph" getting in the way. At the end of the day, RDF is simply an abstract language for creating structured data using a variety of notations (RDF-Turtle, RDF-NTriples, JSON-LD, RDF-XML, etc.). It isn't a format, but sadly that's how it is still perceived by most circa 2017 (even though the initial RDF definition snafu on this front occurred around 2000).
SPARQL is a query language for operating on data represented as a collection of RDF statements grouped by statement predicate, as opposed to SQL, which is oriented toward data represented as records grouped by table.
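To make points 1 and 2 above concrete, here is a minimal sketch as a SPARQL INSERT DATA block (ex:ACME and its URIs are hypothetical, purely for illustration): data published this way carries its own labels, so a consuming query never needs a label-injection workaround.

  PREFIX rdfs:   <http://www.w3.org/2000/01/rdf-schema#>
  PREFIX skos:   <http://www.w3.org/2004/02/skos/core#>
  PREFIX schema: <http://schema.org/>
  PREFIX foaf:   <http://xmlns.com/foaf/0.1/>
  PREFIX ex:     <http://example.org/>

  INSERT DATA {
    ex:ACME a schema:Corporation ;
      rdfs:label     "ACME Corporation"@en ;             # point 1: annotation relations
      skos:prefLabel "ACME Corporation"@en ;
      skos:altLabel  "ACME"@en ;
      rdfs:comment   "A fictitious example company."@en ;
      foaf:depiction <http://example.org/acme-hq.jpg> ;  # point 2: depiction/image
      schema:image   <http://example.org/acme-logo.png> .
  }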
Links:
[1] https://www.slideshare.net/kidehen/understanding-29894555 -- Understanding Data
[2] http://www.openlinksw.com/data/turtle/general/GlossaryOfTerms.ttl -- Glossary that might also help with terminology
[3] https://www.quora.com/What-is-the-Semantic-Web/answer/Kingsley-Uyi-Idehen
Kingsley,
I wanted to thank you very much for your valuable post! It's a great introduction to making the transition from a table/Excel/spreadsheet view of data over to, as you say, "a collection of RDF statements grouped by statement predicate".
Those of us working on the Company Data project typically come from that table-oriented background. Having a "learning path" laid out for transitioning to the SPARQL world is very helpful.
I'm very fuzzy on basic "inheritance" here at Wikidata.
For example: Company -> Financial Statements -> Income Statement for 2016 Q4 -> total revenue -> some number.
- Total revenue needs the time period attached to it (here, start and end dates for the quarter); other figures need point-in-time measurement, e.g. as of 12/31/2016 (see the query sketch below).
- Total revenue needs an associated currency attached to it.
- The income statement for 2016 Q4 needs a specific accounting standard attached to it (for example US GAAP 2017 or IFRS 2016; more at https://www.sec.gov/info/edgar/edgartaxonomies.shtml, and more outside the U.S.). The accounting standard followed in preparing the numbers must be very specific to help with concordance across different standards (especially across countries).
- The company needs a "dominant" or "default" industry code attached to it. Wikidata might best go with the 56 industries classified according to the International Standard Industrial Classification revision 4 (ISIC Rev. 4). This is the set used by the World Input-Output tables http://www.wiod.org/home, which take data from all 28 EU countries and 15 other major countries in the world and transform it to be comparable across these industries. It's the broadest "nearly global" coverage I can find. It would also be advisable to accommodate multiple industry assignments per entity/establishment, each recording the standard and year that were followed, applied from a specifically enumerated list. For example, in North America data will often be available according to the most current and highly granular 2017 NAICS system https://www.census.gov/eos/www/naics/, and there are concordances between versions; see https://www.census.gov/eos/www/naics/concordances/concordances.html and https://unstats.un.org/unsd/cr/registry/isic-4.asp. Looking toward a future where large amounts of company data are machine-imported, it would be best to preserve the original, most detailed industry codes available (such as the 6-digit NAICS code) together with the standard and year associated with each assigned code. Given the year and the detail, the concordances can later be used to machine-add other codes as needed. Granular users are then accommodated, and people doing cross-country / global analysis (at the 56-industry level) are also accommodated.
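For what it's worth, Wikidata's statement model already supports attaching this kind of context through qualifiers. Below is a sketch of how one might read total revenue together with its time qualifier at the query service. It assumes P2139 (total revenue), P585 (point in time), and Q4830453 (business); please verify the property and item IDs before relying on them. Currency is carried by the quantity value itself, as its unit.

  SELECT ?companyLabel ?revenue ?pointInTime
  WHERE {
    ?company wdt:P31 wd:Q4830453 .           # instance of (P31) business (Q4830453)
    ?company p:P2139 ?stmt .                 # full statement node for total revenue (P2139)
    ?stmt ps:P2139 ?revenue .                # the statement's main quantity value
    OPTIONAL { ?stmt pq:P585 ?pointInTime }  # qualifier: point in time (P585)
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
  }
  LIMIT 10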
When I look at the above challenge, I think of your prescription for making RDF collections easier to read:
1. Addition of annotation relations, especially the likes of rdfs:label, skos:prefLabel, skos:altLabel, schema:name, foaf:name, rdfs:comment, schema:description, etc.
2. Addition (where possible) of relations such as foaf:depiction, schema:image, etc.
Adhering to the above leads to RDF statement collections that are easier to read, without the confusing nature of the term "graph" getting in the way. At the end of the day, RDF is simply an abstract language for creating structured data using a variety of notations (RDF-Turtle, RDF-NTriples, JSON-LD, RDF-XML, etc.). It isn't a format, but sadly that's how it is still perceived by most circa 2017 (even though the initial RDF definition snafu on this front occurred around 2000).
And I can't help but be intensely curious: what happened in that initial RDF definition snafu around 2000?
Rick
On 3/15/17 12:15 AM, Rick Labs wrote:
And I can't help but be intensely curious: what happened in that initial RDF definition snafu around 2000?
Creating and perpetuating the misconception that RDF/XML == RDF. That was compounded by a Layer Cake diagram that actually depicted the misconception that RDF was built atop XML.
Today folks still get distracted by JSON-LD vs. RDF-Turtle vs. RDF-XML vs. RDFa vs. Microdata notations for constructing RDF language sentences/statements. Net effect: unleashing the real power behind a Semantic Web continues to hit unnecessary hiccups.
Hello Wikidatans,
I have been very vocal about the lack of documentation -- and the critical need for documentation -- regarding Wikidata.
I haven't had a chance to watch the recent intro-to-Wikidata video that Asaf did here https://www.youtube.com/watch?v=eVrAx3AmUvA, but I think the approach is, in essence and concept, pretty much inadequate as well.
Newbies need something simpler than this, something that isn't a talking head talking at them.
I'm not super thrilled that this is a crowd-sourced effort. I think this should be created by Wikidata and should be as graphic and simple as possible. It should be a somewhat professionally produced output, similar to the basic wiki intros I use at editathons:
- 5 Pillars - https://en.wikipedia.org/wiki/Wikipedia:Five_pillars
- Intro to Wiki (self-paced, takes 10 minutes per track) - https://en.wikipedia.org/wiki/Help:Introduction
- Wiki tutorial with very helpful videos - https://en.wikipedia.org/wiki/Wikipedia:Tutorial
As I try to say to myself, Keep It Simple, Stupid.... :-) I'm saying that about myself, not anyone else.... But yeah.
The biggest problem:
This approach and output would require a different, less technical take on Wikidata, which has been a consistent issue -- at least for me -- with Wikidata's approaches and with the willingness of the large majority of participants to care about end-user issues, especially folks like me coming from English Wikipedia.
I think it's important to question who the audience for these materials is going to be.
Is it for Wikipedia editors (again apologies, I'm assuming English Wikipedia)?
Is it for newbies attending editathons and learning how to edit?
Is it for tech folks who can handle much higher-level database and data-manipulation concepts?
I don't know. I guess I am not a beginner when it comes to Wikidata, but I still often feel inadequate and frustrated with the interface and with just knowing stuff. And I use Wikidata ALL THE TIME.
So my 2 cents on this.
I wish I could participate in person at the event -- although I might have a contrarian perspective, as usual.
I would also like to help more actively on this, as it is so important to me personally.
I would be happy to help if I can in any way.
Erika

Please excuse incoherence; not a lot of sleep lately.

Erika Herzog
Wikipedia User:BrillLyle https://en.wikipedia.org/wiki/User:BrillLyle
Hi Lea,
just a quick question: when will we find out whether we can actually participate? I have already registered but haven't heard anything yet.
Thanks a lot for a quick reply! All best, Heidi
On 2 March 2017 at 20:48, Adelheid Heftberger adelheidh@gmail.com wrote:
Hi Lea,
just a quick question: when will we find out whether we can actually participate? I have already registered but haven't heard anything yet.
Hello, I don't know. I'll ask the organisation team :)
Perhaps of interest: https://www.wikidata.org/wiki/Wikidata:WikiProject_Welcome, particularly the content that is placed on user talk pages in the welcome templates. (:
Erika, perhaps you could collaborate remotely on those templates and the pages to which they link?
Pine