*CFP: Semantic Web Journal - Special Issue on Quality Management of
Semantic Web Assets (Data, Services and Systems):*
http://www.semantic-web-journal.net/blog/call-papers-special-issue-quality-…
Submission guidelines
*Deadline (only 1 month left): October 31, 2015*
Submissions shall be made through the Semantic Web journal website at
http://www.semantic-web-journal.net
<http://www.semantic-web-journal.net/>. Prospective authors must take
notice of the submission guidelines posted at
http://www.semantic-web-journal.net/authors
<http://www.semantic-web-journal.net/authors>. Note that you need to
request an account on the website for submitting a paper. Please
indicate in the cover letter that it is for the Special Issue on Quality
Management of Semantic Web Assets (Data, Services and Systems).
Submissions are possible in the following categories: full research
papers, application reports, reports on tools and systems, and case
studies. While there is no upper limit, paper length must be justified
by content.
Guest editors
* Amrapali Zaveri, University of Leipzig, AKSW Group, Germany
* Dimitris Kontokostas, University of Leipzig, AKSW Group, Germany
* Sebastian Hellmann, University of Leipzig, AKSW Group, Germany
* Jürgen Umbrich, Vienna University of Economics and Business, Austria
*Overview and Topics*
The standardization and adoption of Semantic Web technologies has
resulted in a variety of assets, including an unprecedented volume of
data being semantically enriched and systems and services, which consume
or publish this data. Although gathering, processing and publishing data
is a step towards further adoption of the Semantic Web, quality does not
yet play a central role in these assets (e.g., data lifecycle,
system/service development).
Quality management essentially refers to the activities and tasks
involved in guaranteeing a certain level of consistency and meeting the
quality requirements for the assets. In general, quality management consists of
the following four phases and components: (i) quality planning, (ii)
quality control, (iii) quality assurance and (iv) quality improvement.
The quality planning phase in the Semantic Web typically involves the
design of procedures, strategies and policies to support the management
of the assets. The quality control and assurance components aim
primarily at preventing errors and at meeting quality requirements
pertaining to the Semantic Web standards. A core part of both
components is quality assessment methods, which provide the necessary
input for the control and assurance tasks.
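To make the notion of a quality assessment method concrete (this example is not part of the call), a minimal completeness metric over a set of triples might look like the sketch below; the metric choice, the triples and the property URI usage are invented for illustration:

```python
# Toy quality assessment: the fraction of distinct subjects that carry an
# rdfs:label, a simple completeness metric for Linked Data. The triples
# below are invented for illustration.

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

def label_completeness(triples):
    """Fraction of distinct subjects with at least one rdfs:label."""
    subjects = {s for s, _, _ in triples}
    labelled = {s for s, p, _ in triples if p == RDFS_LABEL}
    return len(labelled & subjects) / len(subjects) if subjects else 1.0

triples = [
    ("ex:Berlin", RDFS_LABEL, "Berlin"),
    ("ex:Berlin", "ex:population", "3500000"),
    ("ex:Leipzig", "ex:population", "550000"),  # no label -> lowers the score
]

score = label_completeness(triples)
print(score)  # 2 subjects, 1 labelled -> 0.5
```

A real assessment pipeline would compute many such metrics (consistency, accuracy, timeliness, ...) and feed the scores to the control and assurance tasks described above.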
Quality assessment of Semantic Web Assets (data, services and systems),
in particular, presents new challenges that have not been addressed in
other research areas. Thus, adopting existing approaches for data
quality assessment is not a straightforward solution. These challenges
are related to the openness of the Semantic Web, the diversity of the
information and the unbounded, dynamic set of autonomous data sources,
publishers and consumers (legal and software agents). Additionally,
detecting the quality of available data sources and making the
information explicit is yet another challenge. Moreover, noise in one
data set, or missing links between different data sets, propagates
throughout the Web of Data, and imposes great challenges on the data
value chain.
In the case of systems and services, different implementations follow the
specifications for RDF and SPARQL to varying extents, or even propose
and offer new, non-standardized extensions. This causes strong
incompatibilities between systems, e.g., between the SPARQL features
used by query engines and the features supported by RDF stores. This
potential heterogeneity and incompatibility poses several challenges for
quality assessment in and for such systems and services.
Finally, quality improvement methods are used to further enhance the
value of the Semantic Web Assets. One important step to improve the
quality of data is identifying the root cause of the problem and then
designing corresponding data improvement solutions. These solutions
select the most effective and efficient strategies and related set of
techniques and tools to improve quality. Quality improvement for
products and services entails understanding and improving operational
processes and establishing valid and reliable service performance measures.
This Special Issue is addressed to those members of the community
interested in providing novel methodologies or frameworks for managing,
assessing, monitoring, maintaining and improving the quality of
Semantic Web data, services and systems, and in introducing tools and
user interfaces which can effectively assist in this management.
Topics of Interest
We welcome original high quality submissions on (but are not restricted
to) the following topics:
* Methodologies and frameworks to plan, control, assure or improve the
quality of Semantic Web Assets
* Quality exploration and analysis interfaces
* Quality monitoring
* Developing, deploying and managing quality service ecosystems
* Assessing the quality evolution of Semantic Web Assets
* Large-scale quality assessment of structured datasets
* Crowdsourcing data quality assessment
* Quality assessment leveraging background knowledge
* Use-case driven quality management
* Evaluation of trustworthiness of data
* Web Data and LOD quality benchmarks
* Data Quality improvement methods and frameworks, e.g., linkage,
alignment, cleaning, enrichment, correctness
* Service/system quality improvement methods and frameworks
* Managing sustainability issues in services
* Guarantee of service (availability, performance)
* Systems for transparent management of open data
You may have heard about the in-progress work on the Code of Conduct for
Wikimedia technical spaces
(https://www.mediawiki.org/wiki/Code_of_conduct_for_technical_spaces/Draft).
It is currently in draft form, and we are in the process of finalizing
the intro, "Principles", "Expected behavior" and "Unacceptable behavior"
sections.
An earlier version of these sections (except for "Expected behavior")
reached consensus.
However, there is now a new draft, and you can weigh in on whether to
use it instead:
https://www.mediawiki.org/wiki/Talk:Code_of_conduct_for_technical_spaces/Dr…
I will continue to ask for your feedback as we discuss the remaining
sections later.
Thanks,
Matt Flaschen
Hi Research community (and especially Wikimedia analytics),
Are there any up-to-date & relatively pretty visualizations of the
current mobile pageview data -- e.g., a comparison chart between desktop &
mobile for global traffic for Wikipedia and/or all projects?
(Stats.wikimedia.org just has desktop, afaik). I know Oliver & Toby
presented such a thing in May 2014, but I don't know if there's a
current version.
Thanks in advance! I am trying to put a presentation together, looking
for the latest numbers and ideally a graph I can use.
Phoebe
--
* I use this address for lists; send personal messages to phoebe.ayers
<at> gmail.com *
Hi All,
I've been working on graphs to visualize the entire edit activity of a
wiki for some time now. I'm documenting all of it at
https://meta.wikimedia.org/wiki/Research:Editor_Behaviour_Analysis_%26_Grap…
The graphs can be viewed at
https://cosmiclattes.github.io/wikigraphs/data/wikis.html. Currently only
graphs for 'en' have been put up; I'll add graphs for the other wikis soon.
Methodology
- The editors are split into groups based on the month in which they
made their first edit.
- The active edit sessions (value, percentage, etc.) for the groups are
then plotted as stacked bars or as a matrix. I've used the canonical
definition of an active edit session. The values are within ±0.1% of the
values on https://stats.wikimedia.org/
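For readers curious about the cohort construction, a minimal sketch of the grouping step is below. The edit records are invented for illustration, and the 5-edits-per-month activity threshold is my assumption about the "canonical" definition mentioned above; the real pipeline works from the dumps:

```python
# Sketch of the cohort grouping described above: editors are bucketed by
# the month of their first edit, then active editors are counted per
# (cohort, month) -- these counts are what the stacked bars display.
from collections import defaultdict

edits = [  # (editor, "YYYY-MM", edit_count_that_month) -- invented data
    ("alice", "2006-01", 12), ("alice", "2006-02", 7),
    ("bob",   "2006-01", 2),  ("bob",   "2006-02", 9),
    ("carol", "2006-02", 6),
]

ACTIVE_THRESHOLD = 5  # assumed "active editor" cut-off: >= 5 edits/month

# 1. cohort = month of the editor's first edit
first_month = {}
for editor, month, _ in sorted(edits, key=lambda e: e[1]):
    first_month.setdefault(editor, month)

# 2. count active editors per (cohort, month)
active = defaultdict(int)
for editor, month, count in edits:
    if count >= ACTIVE_THRESHOLD:
        active[(first_month[editor], month)] += 1

print(dict(active))
```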
Selector
- There is a selector on each graph that lets you filter the data in the
graph. On moving the cursor to the left end of the selector you will get a
resize cursor. The selection can then be moved or redrawn.
- In graphs 1,2 the selector filters by percentage.
- In graphs 3,4,5 the selector filters by the age of the cohort.
Preliminary Finding
- Longevity of editors fell drastically starting January 2006 and has
since stabilized at January 2007 levels.
https://meta.wikimedia.org/wiki/Research:Editor_Behaviour_Analysis_%26_Grap…
I would love to hear what you think of the graphs and any ideas you
might have for me.
Jeph
[Begging pardon if you read this multiple times]
Hello everyone,
I would like to draw your attention to the StrepHit IEG proposal, which
is now in its final form:
https://meta.wikimedia.org/wiki/Grants:IEG/StrepHit:_Wikidata_Statements_Va…
To cut a long story short, StrepHit is a Natural Language Processing
pipeline that understands human language, extracts structured data from
raw text and produces Wikidata statements with reference URLs.
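For readers unfamiliar with what such a pipeline produces, here is a deliberately simplified caricature of the extraction step. The regex pattern, the property name, the sentence and the reference URL are all invented for illustration; the real StrepHit pipeline is far more sophisticated than a single pattern match:

```python
# Toy illustration of the StrepHit idea: turn a sourced sentence into a
# (subject, property, value, reference) statement. Pattern and property
# mapping are invented; this is not StrepHit's actual method.
import re

PATTERN = re.compile(r"^(?P<subject>[\w\s.]+?) played for (?P<team>[\w\s.]+?)\.$")

def extract_statement(sentence, source_url):
    """Return a (subject, property, value, reference) tuple or None."""
    match = PATTERN.match(sentence)
    if not match:
        return None
    return (match.group("subject"), "member of sports team",
            match.group("team"), source_url)

stmt = extract_statement(
    "Diego Maradona played for Napoli.",
    "http://example.org/bio/maradona",  # hypothetical reference URL
)
print(stmt)
```

The key property StrepHit promises is visible even in this sketch: every produced statement carries the URL of the source it was extracted from.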
I have already received support and feedback, but your voice is vital
and it can be heard on the project page in multiple
ways. If you:
1. like the idea, please click on the *endorse* blue button;
2. want to get involved, please click on the *join* blue button;
3. want to share your thoughts, please click on the *give feedback* link.
Looking forward to your updates.
Cheers!
--
Marco Fossati
http://about.me/marco.fossati
Twitter: @hjfocs
Skype: hell_j
Hi everybody,
We’re preparing for the September 2015 research newsletter and looking for contributors. Please take a look at: https://etherpad.wikimedia.org/p/WRN201509 and add your name next to any paper you are interested in covering. Our target publication date is Wednesday September 30 UTC. As usual, short notes and one-paragraph reviews are most welcome.
Highlights from this month:
Editorial Bias in Crowd-Sourced Political Information
Disease identification and concept mapping using Wikipedia
Recognizing Biographical Sections in Wikipedia
The Descent of Pluto: Interactive dynamics, specialization and reciprocity of roles in a Wikipedia debate
How will your workload look like in 6 years? Analyzing Wikimedia's workload
Gender imbalance and Wikipedia
“A Spousal Relation Begins with a Deletion of engage and Ends with an Addition of divorce": Learning State Changing Verbs from Wikipedia Revision History
How much is Wikipedia Lagging Behind News?
Measuring the Effectiveness of Wikipedia Articles: How Does Open Content Succeed?
Wikipedia entries on fiction and non-propositional knowledge representation
Students' use of Wikipedia as an academic resource — Patterns of use and perceptions of usefulness
If you have any question about the format or process feel free to get in touch off-list.
Masssly, Tilman Bayer and Dario Taraborelli
[1] http://meta.wikimedia.org/wiki/Research:Newsletter
cross-posting as this might be of interest to people on this list
> Begin forwarded message:
>
> From: Marco Fossati <hell.j.fox(a)gmail.com>
> Subject: [Wikidata] SrepHit IEG proposal: call for support (was Re: [ANNOUNCEMENT] first StrepHit dataset for the primary sources tool)
> Date: September 21, 2015 at 3:32:25 AM PDT
> To: wikidata(a)lists.wikimedia.org
> Reply-To: "Discussion list for the Wikidata project." <wikidata(a)lists.wikimedia.org>
>
> Dear all,
>
> The StrepHit IEG proposal is now pretty much complete:
> https://meta.wikimedia.org/wiki/Grants:IEG/StrepHit:_Wikidata_Statements_Va…
>
> We have already received support and feedback, but you are the most relevant community and the project needs your specific help.
>
> Your voice is vital and it can be heard on the project page in multiple ways. If you:
> 1. like the idea, please click on the *endorse* blue button;
> 2. want to get involved, please click on the *join* blue button;
> 3. share your thoughts, please click on the *give feedback* link.
>
> Looking forward to your updates.
> Cheers!
>
> On 9/9/15 11:39, Marco Fossati wrote:
>> Hi Markus, everyone,
>>
>> The project proposal is currently in active development.
>> I would like to focus now on the dissemination of the idea and the
>> engagement of the Wikidata community.
>> Hence, I would love to gather feedback on the following question:
>>
>> Does StrepHit sounds interesting and useful for you?
>>
>> It would be great if you could report your thoughts on the project talk
>> page:
>> https://meta.wikimedia.org/wiki/Grants_talk:IEG/StrepHit:_Wikidata_Statemen…
>>
>>
>> Cheers!
>>
>> On 9/8/15 2:02 PM, wikidata-request(a)lists.wikimedia.org wrote:
>>> Date: Mon, 07 Sep 2015 16:47:16 +0200
>>> From: Markus Krötzsch<markus(a)semantic-mediawiki.org>
>>> To: "Discussion list for the Wikidata project."
>>> <wikidata(a)lists.wikimedia.org>
>>> Subject: Re: [Wikidata] [ANNOUNCEMENT] first StrepHit dataset for the
>>> primary sources tool
>>> Message-ID:<55EDA374.2090901(a)semantic-mediawiki.org>
>>> Content-Type: text/plain; charset=utf-8; format=flowed
>>>
>>> Dear Marco,
>>>
>>> Sounds interesting, but the project page still has a lot of gaps. Will
>>> you notify us again when you are done? It is a bit tricky to endorse a
>>> proposal that is not finished yet;-)
>>>
>>> Markus
>>>
>>> On 04.09.2015 17:01, Marco Fossati wrote:
>>>> >[Begging pardon if you have already read this in the Wikidata
>>>> project chat]
>>>> >
>>>> >Hi everyone,
>>>> >
>>>> >As Wikidatans, we all know how much data quality matters.
>>>> >We all know what high quality stands for: statements need to be
>>>> >validated via references to external, non-wiki, sources.
>>>> >
>>>> >That's why the primary sources tool is being developed:
>>>> >https://www.wikidata.org/wiki/Wikidata:Primary_sources_tool
>>>> >And that's why I am preparing the StrepHit IEG proposal:
>>>> >https://meta.wikimedia.org/wiki/Grants:IEG/StrepHit:_Wikidata_Statements_Va…
>>>>
>>>> >
>>>> >
>>>> >StrepHit (pronounced "strep hit", means "Statement? repherence it!") is
>>>> >a Natural Language Processing pipeline that understands human language,
>>>> >extracts structured data from raw text and produces Wikidata statements
>>>> >with reference URLs.
>>>> >
>>>> >As a demonstration to support the IEG proposal, you can find the
>>>> >**FBK-strephit-soccer** dataset uploaded to the primary sources tool
>>>> >backend.
>>>> >It's a small dataset serving the soccer domain use case.
>>>> >Please follow the instructions on the project page to activate it and
>>>> >start playing with the data.
>>>> >
>>>> >What is the biggest difference that sets StrepHit datasets apart from
>>>> >the currently uploaded ones?
>>>> >At least one reference URL is always guaranteed for each statement.
>>>> >This means that if StrepHit finds some new statement that was not there
>>>> >in Wikidata before, it will always propose its external references.
>>>> >We do not want to manually reject all the new statements with no
>>>> >reference, right?
>>>> >
>>>> >If you like the idea, please endorse the StrepHit IEG proposal!
>>
>
> --
> Marco Fossati
> http://about.me/marco.fossati
> Twitter: @hjfocs
> Skype: hell_j
>
> _______________________________________________
> Wikidata mailing list
> Wikidata(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
Dario Taraborelli Head of Research, Wikimedia Foundation
wikimediafoundation.org <http://wikimediafoundation.org/> • nitens.org <http://nitens.org/> • @readermeter <http://twitter.com/readermeter>
Hello, lists.
You may have heard of projects <https://www.mediawiki.org/wiki/Wikimedia_Research#Highlights> such as revision scoring or article recommendations. We’re now looking for a full-stack software engineer to join Wikimedia Research and support and scale up these and similar projects.
Job description below, please help us find the best possible candidates.
Dario
Software Engineer - Research <http://grnh.se/b12qur>
Summary
Help us create a world in which every single human being can freely share in the sum of all knowledge.
We are a team of scientists and UX researchers at the Wikimedia Foundation using data to understand and empower millions of users – readers, contributors, and donors – who interact with Wikipedia and its sister projects on a daily basis. We turn research questions into publicly shared knowledge, we design and test new technology, we produce data-driven insights to support product and engineering decisions and we publish research informing the organization’s and the movement’s strategy. We are strongly committed to principles of transparency, privacy and collaboration, we use free and open source technology and we collaborate with researchers in the industry and academia. As a full member of the Wikimedia Research department, you will help us build and scale the infrastructure our team needs for research and experimentation, implementing new technology and data-intensive applications.
Description
Collaborate with researchers to expose algorithms and machine learning systems through APIs and web applications
Design, develop, test, and deploy new features, improvements and upgrades to the infrastructure that supports research and powers data-intensive applications.
Support our data science team in optimizing computationally intensive data processing.
Support our UX research capacity by growing, expanding and maintaining our user testing platform and instrumentation stack.
Work in coordination with other infrastructure teams such as Services and Analytics Engineering as well as Product teams to grow and scale research-driven services and applications.
Requirements
Real world experience writing applications using both scripting (e.g. Python, Javascript, PHP) and compiled languages (e.g. Java, Scala, C, C#)
Experience with MySQL/Postgres or similar database technology
Experience developing APIs for data retrieval
Understanding of basic statistical concepts
BS, MS, or PhD in Computer Science, Mathematics, or equivalent work experience
Pluses
Experience with high-traffic web architectures and operations
Production experience with Hadoop and ecosystem technology (Pig, Hive, streaming)
Experience with web UI design (Javascript, HTML, CSS)
Familiarity with scientific computing libraries in Python and R
Experience working with volunteers
Big ups if you are a contributor to Wikipedia or other open collaboration projects
Show us your stuff! Please provide us with information you feel would be useful to us in gaining a better understanding of your technical background and accomplishments. Links to GitHub, your technical blogs, publications, personal projects, etc. are exceptionally useful. We especially appreciate pointers to your best contributions to open source projects.
About the Wikimedia Foundation
The Wikimedia Foundation is the non-profit organization that operates Wikipedia, the free encyclopedia. Wikipedia and the other projects operated by the Wikimedia Foundation receive more than 431 million unique visitors per month, making them the 5th most popular web property worldwide. Available in more than 287 languages, Wikipedia contains more than 32 million articles contributed by a global volunteer community of more than 100,000 people. Based in San Francisco, California, the Wikimedia Foundation is an audited, 501(c)(3) charity that is funded primarily through donations and grants. The Wikimedia Foundation was created in 2003 to manage the operation of Wikipedia and its sister projects. It currently employs over 208 staff members. Wikimedia is supported by local chapter organizations in 40 countries or regions.
The Wikimedia Foundation offers competitive benefits. Fully paid medical, dental, and vision coverage for employees and their eligible families (yes, fully paid premiums!). A Wellness Program which provides reimbursement for mind, body and soul activities such as fitness memberships, massages, cooking classes and much more. 401(k) retirement plan with matched contributions of 4% of annual salary.
More Information
http://wikimediafoundation.org <http://wikimediafoundation.org/>
http://blog.wikimedia.org <http://blog.wikimedia.org/>
http://wikimediafoundation.org/wiki/Vision <http://wikimediafoundation.org/wiki/Vision>
About Wikimedia Research
https://www.mediawiki.org/wiki/Wikimedia_Research <https://www.mediawiki.org/wiki/Wikimedia_Research>
Examples of code
https://github.com/wiki-ai/revscoring <https://github.com/wiki-ai/revscoring>
https://github.com/wiki-ai/ores <https://github.com/wiki-ai/ores>
https://github.com/halfak/MediaWiki-Utilities <https://github.com/halfak/MediaWiki-Utilities>
https://github.com/halfak/mwstreaming <https://github.com/halfak/mwstreaming>
Dario Taraborelli Head of Research, Wikimedia Foundation
wikimediafoundation.org <http://wikimediafoundation.org/> • nitens.org <http://nitens.org/> • @readermeter <http://twitter.com/readermeter>
Hi researchers,
I could use a little help with understanding these dumps:
https://dumps.wikimedia.org/enwikisource/latest/
https://dumps.wikimedia.org/enwiki/20150901/
I'm trying to verify the claim that ENWP is the world's largest open text
project, and to do that I need to verify that ENWP is larger than English
Wikisource. Which files should I be comparing?
Are there any other projects that could make a claim to be a larger open
text project than ENWP? Perhaps there's a library somewhere that has such a
huge volume of out-of-copyright materials that the combined bytes of
published text are larger than ENWP?
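Not an authoritative answer, but one practical approach: compare the total bytes of wikitext in each project's pages-articles dump (the `pages-articles.xml.bz2` file in each directory), rather than compressed file sizes, which depend on how well each corpus compresses. A sketch of the counting step, run here on a tiny inline XML fragment standing in for a real dump:

```python
# Sum the bytes of wikitext in a MediaWiki XML export. On a real dump you
# would pass bz2.open("enwiki-...-pages-articles.xml.bz2", "rb") instead;
# here an inline fragment keeps the sketch self-contained.
import io
import xml.etree.ElementTree as ET

def total_text_bytes(xml_stream):
    """Total UTF-8 size of all <text> elements, streamed with iterparse."""
    total = 0
    for _, elem in ET.iterparse(xml_stream, events=("end",)):
        if elem.tag.endswith("text") and elem.text:
            total += len(elem.text.encode("utf-8"))
        elem.clear()  # keep memory flat on multi-GB dumps
    return total

sample = io.BytesIO(b"""<mediawiki>
  <page><revision><text>Hello wiki</text></revision></page>
  <page><revision><text>More text</text></revision></page>
</mediawiki>""")

print(total_text_bytes(sample))  # 10 + 9 = 19 bytes
```

Running the same function over the enwiki and enwikisource pages-articles dumps would give directly comparable text-volume figures (note this counts all namespaces unless you also filter on the `<ns>` element).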
Thanks!
Pine