Gift season! We have launched structured data on Commons, available from all wikis.
TL;DR: One data store. Use everywhere. Upload table data to Commons, with localization, and use it to create wiki tables and lists, or use it directly in graphs. Works for GeoJSON maps too. Data must be licensed as CC0. Try the per-state GDP map demo, US Map state highlight (https://en.wikipedia.org/wiki/Template:Graph:US_Map_state_highlight), and select multiple years. More demos at the bottom.
Data can now be stored as *.tab and *.map pages in the Data namespace on Commons. That data may contain localization, so a table cell can hold text in multiple languages. The data is accessible from any wiki by Lua scripts, Graphs, and Maps.
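To make the localization idea concrete, here is a minimal sketch of what a .tab page body might look like and how a consumer could pick one language out of a localized cell. The top-level keys (`license`, `description`, `schema.fields`, `data`) and the `localized` field type follow my reading of Help:Tabular_Data and should be checked there; the column names, the numbers, and the helper functions `localize` and `rows` are hypothetical.

```python
import json

# Illustrative .tab page body. Key names follow Help:Tabular_Data as I
# understand it; the columns and numbers are made up for this sketch.
page = {
    "license": "CC0-1.0",
    "description": {"en": "Example monthly data", "fr": "Données mensuelles d'exemple"},
    "schema": {
        "fields": [
            {"name": "month", "type": "localized", "title": {"en": "Month"}},
            {"name": "value", "type": "number", "title": {"en": "Value"}},
        ]
    },
    "data": [
        [{"en": "January", "fr": "janvier"}, 3.9],
        [{"en": "February", "fr": "février"}, 5.3],
    ],
}


def localize(cell, lang, fallback="en"):
    """Pick one language from a localized cell; other cells pass through."""
    if isinstance(cell, dict):
        return cell.get(lang, cell.get(fallback))
    return cell


def rows(page, lang):
    """Yield each data row as a dict keyed by field name, in language `lang`."""
    names = [field["name"] for field in page["schema"]["fields"]]
    for row in page["data"]:
        yield {name: localize(cell, lang) for name, cell in zip(names, row)}


# The JSON text that would be saved as the page content.
body = json.dumps(page, ensure_ascii=False, indent=2)

print(list(rows(page, "fr")))
# [{'month': 'janvier', 'value': 3.9}, {'month': 'février', 'value': 5.3}]
```

Asking for a language that a cell lacks (say `"de"`) falls back to English here; the real fallback chain used on-wiki may differ.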
Lua lets you generate wiki tables by filtering, converting, mixing, and formatting the raw data. It can also generate lists, or any other wiki markup.
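On a wiki this step would live in a Lua module; I believe JsonConfig exposes the data to Lua through `mw.ext.data.get`, but check the help page. The sketch below only mirrors the idea in Python: it filters rows, formats cells, and emits standard wikitable markup or a bulleted list. The function names and the sample rows are hypothetical.

```python
def to_wikitable(titles, data, keep=lambda row: True, fmt=str):
    """Render rows as MediaWiki table markup.

    titles: column header strings
    data:   list of rows (lists of already-localized values)
    keep:   predicate used to filter rows
    fmt:    per-cell formatter, e.g. for rounding or unit conversion
    """
    lines = ['{| class="wikitable"', "! " + " !! ".join(titles)]
    for row in data:
        if keep(row):
            lines.append("|-")
            lines.append("| " + " || ".join(fmt(v) for v in row))
    lines.append("|}")
    return "\n".join(lines)


def to_list(items, fmt=str):
    """Render values as a bulleted wikitext list."""
    return "\n".join("* " + fmt(item) for item in items)


sample = [["January", 3.9], ["July", 29.4]]
print(to_wikitable(["Month", "High"], sample, keep=lambda r: r[1] > 10))
# {| class="wikitable"
# ! Month !! High
# |-
# | July || 29.4
# |}
```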
Graphs can use both .tab and .map data directly to visualize it and let users interact with it. The GDP demo above uses a map from Commons and colors each region with values from a data table.
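The coloring in the GDP demo amounts to a join: each map region is matched by its id to a value from the table, and that value is scaled to a color. The real demo does this inside a Vega graph spec, which is not reproduced here; this is only a Python sketch of the same logic, with a hypothetical linear color ramp and hypothetical helper names.

```python
def lerp_color(t, low=(255, 255, 204), high=(128, 0, 38)):
    """Blend two RGB colors; t=0 gives `low`, t=1 gives `high`."""
    r, g, b = (round(a + (c - a) * t) for a, c in zip(low, high))
    return f"#{r:02x}{g:02x}{b:02x}"


def color_regions(features, values):
    """Map each GeoJSON feature id to a fill color scaled by its table value.

    features: GeoJSON features with an "id" (e.g. a state code)
    values:   {id: number} taken from one column of a .tab page
    Features with no value in the table get None (left uncolored).
    """
    known = [values[f["id"]] for f in features if f["id"] in values]
    if not known:
        return {f["id"]: None for f in features}
    lo, hi = min(known), max(known)
    span = (hi - lo) or 1  # avoid dividing by zero when all values are equal
    return {
        f["id"]: lerp_color((values[f["id"]] - lo) / span) if f["id"] in values else None
        for f in features
    }
```

With values `{"NY": 1.0, "CA": 3.0}`, NY gets the low color `#ffffcc`, CA the high color `#800026`, and a region missing from the table stays uncolored.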
Kartographer (<maplink>/<mapframe>) can use .map data as an extra layer on top of the base map. This way we can show, for example, an endangered species' habitat.
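A .map page wraps ordinary GeoJSON. The sketch below assembles one, assuming the top-level keys (`license`, `description`, `latitude`, `longitude`, `zoom`, `data`) match Help:Map_Data; please verify them there. Two details come from the GeoJSON spec (RFC 7946) rather than from this thread: positions are `[longitude, latitude]`, and a polygon ring must end where it starts. The coordinates are a placeholder square, not real habitat data, and the helper names are hypothetical.

```python
import json


def polygon_feature(ring, title):
    """GeoJSON Polygon feature; positions are [longitude, latitude] (RFC 7946).

    The ring is closed automatically, since GeoJSON requires the first and
    last positions of a linear ring to be identical.
    """
    if ring[0] != ring[-1]:
        ring = ring + [ring[0]]
    return {
        "type": "Feature",
        "properties": {"title": title},
        "geometry": {"type": "Polygon", "coordinates": [ring]},
    }


def map_page(features, description, latitude, longitude, zoom):
    """Assemble a .map page body (key names as I understand Help:Map_Data)."""
    return {
        "license": "CC0-1.0",
        "description": {"en": description},
        "latitude": latitude,
        "longitude": longitude,
        "zoom": zoom,
        "data": {"type": "FeatureCollection", "features": features},
    }


# Placeholder square, NOT real habitat data.
square = [[-106.7, 35.7], [-106.5, 35.7], [-106.5, 35.9], [-106.7, 35.9]]
page = map_page([polygon_feature(square, "Example area")], "Example habitat layer", 35.8, -106.6, 9)
body = json.dumps(page, indent=2)
```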
== Demo ==
* Raw data example: https://commons.wikimedia.org/wiki/Data:Weather/New_York_City.tab
* Interactive weather data: https://en.wikipedia.org/wiki/Template:Graph:Weather_monthly_history
* Same data in the Weather template: https://en.wikipedia.org/wiki/User:Yurik/WeatherDemo
* Interactive GDP map: https://en.wikipedia.org/wiki/Template:Graph:US_Map_state_highlight
* Endangered Jemez Mountains salamander habitat: https://en.wikipedia.org/wiki/Jemez_Mountains_salamander#/maplink/0
* Population history: https://en.wikipedia.org/wiki/Template:Graph:Population_history
* Line chart: https://en.wikipedia.org/wiki/Template:Graph:Lines
== Getting started ==
* Try creating a page at Data:Sandbox/<user>.tab on Commons. Don't forget the .tab extension, or it won't work.
* Try using some data with the Line chart graph template.
A thorough guide is needed; help is welcome!
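Once a sandbox page exists, its JSON can be read back from outside the wiki. This sketch uses MediaWiki's standard `action=raw` on index.php to fetch the page text; the helper names and the User-Agent string are placeholders, and it refuses titles without the .tab suffix, echoing the warning above. Only `fetch_tabular` needs network access.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

INDEX = "https://commons.wikimedia.org/w/index.php"


def sandbox_title(user):
    """Personal sandbox title; without the .tab suffix it is not tabular data."""
    return f"Data:Sandbox/{user}.tab"


def raw_url(title):
    """URL of the page's raw content via MediaWiki's standard action=raw."""
    return f"{INDEX}?title={quote(title.replace(' ', '_'), safe=':/')}&action=raw"


def fetch_tabular(title):
    """Download and parse a .tab page. Requires network access."""
    if not title.endswith(".tab"):
        raise ValueError(f"{title!r} does not end in .tab")
    # Wikimedia asks clients to send a descriptive User-Agent; this one is a placeholder.
    req = Request(raw_url(title), headers={"User-Agent": "tabular-data-sketch/0.1 (example)"})
    with urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `fetch_tabular("Data:Weather/New_York_City.tab")` would return the demo's raw weather data as a Python dict.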
== Documentation links ==
* Tabular help: https://www.mediawiki.org/wiki/Help:Tabular_Data
* Map help: https://www.mediawiki.org/wiki/Help:Map_Data
If you find a bug, create a Phabricator ticket with the #tabular-data tag, or comment on the documentation talk pages.
== FAQ ==
* Relation to Wikidata: Wikidata is about "facts" (small pieces of information). Structured data is about "blobs": large amounts of data, like historical weather or the outline of the state of New York.
== TODOs ==
* Add a nice "table editor"; editing JSON by hand is cruel. T134618
* "What links here" should track data usage across wikis. This will also allow quicker auto-refresh of pages. T153966
* Support data redirects. T153598
* Mega epic: support external data feeds.
On Thu, Dec 22, 2016 at 2:30 PM, Yuri Astrakhan yastrakhan@wikimedia.org wrote:
Gift season! We have launched structured data on Commons, available from all wikis.
I was momentarily excited, then I read a little farther and discovered this isn't about https://commons.wikimedia.org/wiki/Commons:Structured_data.
-- Brad Jorsch (Anomie) Senior Software Engineer Wikimedia Foundation
Yes, there seems to have been a bit of a naming collision. Tabular data and map data have been jointly known as structured data, but there is also the Structured Data project, which IMO should be called the Structured Metadata project :) Naming suggestions are welcome!
P.S. Brad, I'm sorry tabular and map data did not excite you :(
_______________________________________________ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Same here, I think it needs a better name...
What about calling it datasets or structured datasets?
Cheers, Micru
Micru, thanks, I think Datasets sounds like a good name too!
Anyway, this is great news! I hope that it gets adopted by the community. Congratulations, Yuri!
I was going to suggest a Wikidata property, but I see that the data type for datasets is not there yet: https://phabricator.wikimedia.org/T151334
Hi Yuri,
Seems very interesting. Am I wrong in thinking this could help create a multilingual glossary as drafted in https://phabricator.wikimedia.org/T150263#2860014 ?
Hi Mathieu, yes, I think you can totally build this glossary as a dataset. Just remember that each string can be no longer than 400 characters, and the total size must be under 2 MB.
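Before saving a large glossary it can help to check these two limits locally. This is a rough sketch, not the server's validation: it assumes the 400-character limit applies to every string inside the `data` rows (each language of a localized cell counted separately), counts characters as Python code points, and treats "2 MB" as 2 × 1024 × 1024 bytes of UTF-8 JSON. All three are assumptions, as are the helper names.

```python
import json

MAX_CHARS = 400              # per-string limit stated in this thread
MAX_BYTES = 2 * 1024 * 1024  # "under 2 MB"; the exact byte count is an assumption


def strings_in(value):
    """Yield every string nested anywhere in a JSON-like value."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from strings_in(item)
    elif isinstance(value, list):
        for item in value:
            yield from strings_in(item)


def limit_problems(page):
    """List limit violations in a .tab page body (empty list means OK)."""
    problems = [
        f"{len(s)}-character string: {s[:40]}..."
        for s in strings_in(page["data"])
        if len(s) > MAX_CHARS
    ]
    size = len(json.dumps(page, ensure_ascii=False).encode("utf-8"))
    if size >= MAX_BYTES:
        problems.append(f"page is {size} bytes; the limit is {MAX_BYTES}")
    return problems
```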
Thank you, Yuri. Is there a rationale behind these limits? I understand the size limit as a performance concern, and 2 MB already seems very large for the intended glossaries. But 400 characters might be a problem for some definitions, I guess, especially since translations can need varying lengths.
The 400-character limit keeps datasets in sync with Wikidata, which has the same limitation. The limit exists to encourage storing "values" rather than full strings (sentences). It also discourages storing wiki markup.
On 28/12/2016 at 23:08, Yuri Astrakhan wrote:
The 400-character limit is to be in sync with Wikidata, which has the same limitation. The origin of this limit is to encourage storage of "values" rather than full strings (sentences).
Well, that's probably not the best constraint for a glossary, then. To my mind, a 400-character limit regardless of language is rather surprising. Surely you can say much more with 400 ideograms than with, well, whichever language happens to have the longest average sentence length (any idea?). Also, at least for some translation pairs, translations tend to be longer than the original[1].
[1] http://www.sid.ir/en/VEWSSID/J_pdf/53001320130303.pdf
Also, it discourages storage of wiki markup.
What about disallowing it explicitly? You could even enforce that with a quick parse that prevents saving, or simply show a reminder when such a string is detected, to avoid blocking users in legitimate corner cases.
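A "reminder rather than block" check could be as simple as a pattern scan. The sketch below is a hypothetical heuristic, not an existing MediaWiki or JsonConfig feature: it flags cells containing common wikitext constructs (links, templates, bold/italic quotes, `<ref>` tags, list or heading markers at a line start) and returns them for a warning, leaving the decision to the editor. It will miss some markup and can flag legitimate text, which is why it warns instead of rejecting.

```python
import re

# Hypothetical heuristic, not an existing MediaWiki check: patterns that
# usually indicate wikitext rather than a plain value.
WIKI_MARKUP = re.compile(
    r"\[\[|\]\]"      # internal links
    r"|\{\{|\}\}"     # template calls
    r"|''"            # italic/bold quotes ('' or ''')
    r"|<ref\b"        # footnotes
    r"|^[*#:;=]",     # list, indent, or heading marker at line start
    re.MULTILINE,
)


def markup_warnings(cells):
    """Return the cells that look like wiki markup, so the editor can be
    reminded rather than blocked (legitimate corner cases still save)."""
    return [cell for cell in cells if WIKI_MARKUP.search(cell)]
```

A single apostrophe, as in "l'eau", is not flagged; only doubled quotes are.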
As I find this a very interesting topic, I searched a bit more.
The W3C article https://www.w3.org/International/articles/article-text-size.en quotes http://www-01.ibm.com/software/globalization/guidelines/a3.html
According to it, for English source strings over 70 characters, you might expect translations to average 130% of the source length. So, by an admittedly very loose inference, a 400-character limit for all languages is equivalent to a 307-character limit for English. Would a 307-character limit for English seem OK there?