As I find this a very interesting topic, I searched a bit more.
I found https://www.w3.org/International/articles/article-text-size.en, which quotes http://www-01.ibm.com/software/globalization/guidelines/a3.html.
According to it, for English source strings longer than 70 characters you can expect an average expansion of 130%. So, by an admittedly very loose inference, a 400-character limit for all languages is equivalent to a 307-character limit for English (400 / 1.3 ≈ 307). Would you say a 307-character limit would be OK there?
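The estimate above simply divides the limit by the expansion factor. Here is a minimal sketch, assuming the 130% figure is read as a multiplicative factor of 1.3 on English source length (the same reading used in the estimate). The function name is hypothetical:

```python
import math


def english_equivalent_limit(limit_chars: int, expansion_factor: float) -> int:
    """Largest English source length whose expected translation fits in limit_chars.

    Hypothetical helper: assumes translated length ~= English length * expansion_factor,
    taken as a rough average rather than a guarantee for any single string.
    """
    if expansion_factor <= 0:
        raise ValueError("expansion_factor must be positive")
    # Round down so the expected translated length stays within the limit.
    return math.floor(limit_chars / expansion_factor)


print(english_equivalent_limit(400, 1.3))  # 307
```

Since the factor is an average, individual translations can still exceed the limit. The result is a planning figure, not a safe bound.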
On 29/12/2016 at 12:11, mathieu stumpf guntz wrote:
On 28/12/2016 at 23:08, Yuri Astrakhan wrote:
The 400 char limit is meant to stay in sync with Wikidata, which has the same limitation. The origin of this limit is to encourage storage of "values" rather than full strings (sentences).
Well, that's probably not the best constraint for a glossary, then. To my mind, a 400-char limit regardless of the language is rather surprising. Surely you can say much more with 400 ideograms than with, well, whichever language happens to have the longest average sentence length (any idea?). Also, at least for some translation pairs, translations tend to be longer than the original[1].
[1] http://www.sid.ir/en/VEWSSID/J_pdf/53001320130303.pdf
Also, it discourages storage of wiki markup.
What about disallowing it explicitly? You could even enforce that with a quick parse that prevents saving, or simply show a warning when such a string is detected, to avoid blocking users in legitimate corner cases.
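The "block or just warn" idea could look roughly like this. This is a minimal sketch, not anything the Data namespace actually implements: the pattern list, the function names and the `strict` switch are all hypothetical, and a reliable check would need a real wikitext parser rather than regular expressions.

```python
import re

# Hypothetical heuristics for a few common wikitext constructs.
# They only flag *likely* markup; they are not a wikitext parser.
WIKI_MARKUP_PATTERNS = {
    "internal link": re.compile(r"\[\[.+?\]\]"),
    "template": re.compile(r"\{\{.+?\}\}"),
    "bold/italic": re.compile(r"''"),
    "heading": re.compile(r"^=+[^=]+=+\s*$", re.MULTILINE),
    "ref tag": re.compile(r"<ref[\s>/]", re.IGNORECASE),
}


def detect_wiki_markup(value: str) -> list[str]:
    """Return the names of the markup patterns found in value (empty if none)."""
    return [name for name, pattern in WIKI_MARKUP_PATTERNS.items() if pattern.search(value)]


def check_cell(value: str, strict: bool = False) -> str:
    """Return 'ok', 'warn' or 'reject' for a cell value.

    strict=True blocks saving; strict=False only warns, so legitimate corner
    cases (e.g. a glossary entry that is *about* wiki syntax) are not blocked.
    """
    if not detect_wiki_markup(value):
        return "ok"
    return "reject" if strict else "warn"


print(check_cell("See [[Main Page]]"))  # warn
```

Defaulting to a warning keeps false positives cheap, which matters because these regexes will misfire on some ordinary text.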
On Wed, Dec 28, 2016, 16:45 mathieu stumpf guntz <psychoslave@culture-libre.org> wrote:
Thank you, Yuri. Is there a rationale behind these limits? I understand a limit driven by performance concerns, and 2 MB already seems very large for the intended glossaries. But 400 chars might be a problem for some definitions, I guess, especially since translations can need varying lengths.
On 25/12/2016 at 17:03, Yuri Astrakhan wrote:
Hi Mathieu, yes, I think you can totally build this glossary as a dataset. Just remember that each string can be no longer than 400 chars, and the total size must be under 2 MB.
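A glossary could be checked against the two limits from this reply before uploading. This is a minimal sketch under stated assumptions: string length is counted in Python code points, total size is measured as the UTF-8 bytes of the serialized JSON, "2 MB" is taken as 2 × 1024 × 1024 bytes, and the page is assumed to keep its rows under a `"data"` key. How MediaWiki measures these limits may differ. The function names are hypothetical.

```python
import json

# Limits as stated in the thread; how they are measured is an assumption here.
MAX_STRING_CHARS = 400
MAX_TOTAL_BYTES = 2 * 1024 * 1024  # assumed meaning of "2 MB"


def iter_strings(value):
    """Yield every string inside nested lists/dicts (e.g. localized cells)."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, list):
        for item in value:
            yield from iter_strings(item)
    elif isinstance(value, dict):
        for item in value.values():
            yield from iter_strings(item)


def validate_dataset(page: dict) -> list[str]:
    """Return human-readable problems; an empty list means within both limits."""
    problems = []
    for s in iter_strings(page.get("data", [])):
        if len(s) > MAX_STRING_CHARS:
            problems.append(f"string of {len(s)} chars exceeds {MAX_STRING_CHARS}: {s[:30]!r}...")
    size = len(json.dumps(page, ensure_ascii=False).encode("utf-8"))
    if size > MAX_TOTAL_BYTES:
        problems.append(f"total size {size} bytes exceeds {MAX_TOTAL_BYTES}")
    return problems


glossary = {"data": [["term", {"en": "short definition", "fr": "définition courte"}]]}
print(validate_dataset(glossary))  # []
```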
On Sun, Dec 25, 2016, 10:45 mathieu stumpf guntz <psychoslave@culture-libre.org> wrote:
Hi Yuri,
Seems very interesting. Am I wrong in thinking this could help to create a multilingual glossary as drafted in https://phabricator.wikimedia.org/T150263#2860014 ?
On 22/12/2016 at 20:30, Yuri Astrakhan wrote:
Gift season! We have launched structured data on Commons, available from all wikis.
TL;DR: One data store. Use everywhere. Upload table data to Commons, with localization, and use it to create wiki tables or lists, or use it directly in graphs. Works for GeoJSON maps too. Must be licensed as CC0. Try this per-state GDP map demo, and select multiple years. More demos at the bottom.
US Map state highlight https://en.wikipedia.org/wiki/Template:Graph:US_Map_state_highlight
Data can now be stored as *.tab and *.map pages in the Data namespace on Commons. That data may contain localization, so a table cell can be in multiple languages. And that data is accessible from any wiki, by Lua scripts, Graphs, and Maps.
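Choosing which language of a localized cell to display could work roughly as follows. This is a minimal sketch in Python rather than Lua, purely for illustration. It assumes a localized cell is a mapping from language codes to strings (e.g. `{"en": "Snow", "fr": "Neige"}`) while a plain cell is used as-is. The fallback order is an illustrative choice, not MediaWiki's own language-fallback chain, and `resolve_localized` is a hypothetical name.

```python
def resolve_localized(cell, lang: str, fallbacks=("en",)):
    """Pick a display value for a cell that may be localized.

    Assumption: a localized cell is a dict of language code -> string,
    and any non-dict cell (number, plain string) is returned unchanged.
    """
    if not isinstance(cell, dict):
        return cell
    for code in (lang, *fallbacks):
        if code in cell:
            return cell[code]
    # Last resort: any available translation, chosen in a stable order.
    return cell[sorted(cell)[0]] if cell else None


print(resolve_localized({"en": "Snow", "fr": "Neige"}, "fr"))  # Neige
print(resolve_localized({"en": "Snow", "fr": "Neige"}, "de"))  # Snow
```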
Lua lets you generate wiki tables from the data by filtering, converting, mixing, and formatting the raw data. Lua also lets you generate lists, or any other wiki markup.
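The filter-and-format step for turning rows into a wiki table might look like this. It is a minimal sketch, written in Python rather than Lua purely for illustration; it is not the API of any actual Lua module. The output uses standard MediaWiki table syntax (`{|`, `!`, `|-`, `|`, `|}`). The function name, its parameters and the toy rows are hypothetical.

```python
def to_wikitable(fields, rows, keep=lambda row: True, fmt=str):
    """Render rows as MediaWiki table markup.

    fields: column titles; rows: lists of cell values;
    keep: row filter; fmt: per-cell formatter (hypothetical parameters).
    """
    lines = ['{| class="wikitable"', "! " + " !! ".join(fields)]
    for row in rows:
        if not keep(row):
            continue  # filtering step: drop rows the caller doesn't want
        lines.append("|-")
        lines.append("| " + " || ".join(fmt(cell) for cell in row))
    lines.append("|}")
    return "\n".join(lines)


# Toy rows for illustration only; keep rows whose value exceeds 1.
markup = to_wikitable(
    ["Name", "Value"],
    [["a", 1.5], ["b", 0.5]],
    keep=lambda row: row[1] > 1,
    fmt=lambda c: f"{c:.1f}" if isinstance(c, float) else str(c),
)
print(markup)
```

A list could be produced the same way by emitting `* item` lines instead of table rows.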
Graphs can use both .tab and .map directly to visualize the data and let users interact with it. The GDP demo above uses a map from Commons and colors each segment based on values from a data table.
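Coloring map segments from a data table amounts to joining each GeoJSON feature to a table value by a shared key and mapping that value onto a color scale. This is a minimal sketch of that idea only; the actual demo is a Graph template, not this code. The join key `"id"`, the output property name `"fill"`, the linear scale and the default colors are all hypothetical choices.

```python
def lerp_color(c0: str, c1: str, t: float) -> str:
    """Linearly interpolate between two '#rrggbb' colors, with t in [0, 1]."""
    a = [int(c0[i:i + 2], 16) for i in (1, 3, 5)]
    b = [int(c1[i:i + 2], 16) for i in (1, 3, 5)]
    return "#" + "".join(f"{round(x + (y - x) * t):02x}" for x, y in zip(a, b))


def color_features(geojson: dict, values: dict, key: str = "id",
                   low: str = "#ffffff", high: str = "#08306b") -> dict:
    """Attach a hypothetical 'fill' property to each feature that has a table value.

    Features are matched to table values via feature["properties"][key];
    features without a matching value are left uncolored.
    """
    lo, hi = min(values.values()), max(values.values())
    span = (hi - lo) or 1  # avoid division by zero when all values are equal
    for feature in geojson["features"]:
        value = values.get(feature["properties"].get(key))
        if value is not None:
            feature["properties"]["fill"] = lerp_color(low, high, (value - lo) / span)
    return geojson
```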
Kartographer (<maplink>/<mapframe>) can use the .map data as an extra layer on top of the base map. This way we can show an endangered species' habitat.
== Demo ==
- Raw data example: https://commons.wikimedia.org/wiki/Data:Weather/New_York_City.tab
- Interactive weather data: https://en.wikipedia.org/wiki/Template:Graph:Weather_monthly_history
- Same data in the Weather template: https://en.wikipedia.org/wiki/User:Yurik/WeatherDemo
- Interactive GDP map: https://en.wikipedia.org/wiki/Template:Graph:US_Map_state_highlight
- Endangered Jemez Mountains salamander habitat: https://en.wikipedia.org/wiki/Jemez_Mountains_salamander#/maplink/0
- Population history: https://en.wikipedia.org/wiki/Template:Graph:Population_history
== Getting started ==
- Try creating a page at Data:Sandbox/<user>.tab on Commons. Don't forget the .tab extension, or it won't work.
- Try using some data with the Line chart graph template
A thorough guide is needed, help is welcome!
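To get started, you could generate the content of a first sandbox page with a few lines of code. This is a minimal sketch. The JSON layout (`license`, `description`, `schema.fields`, `data`) reflects our understanding of the tabular format described on Help:Tabular_Data, so treat that help page as authoritative. `CC0-1.0` is assumed to be the license identifier matching the CC0 requirement above. The function names are hypothetical.

```python
import json


def sandbox_title(user: str) -> str:
    """Title of a personal sandbox page; the .tab suffix is required."""
    return f"Data:Sandbox/{user}.tab"


def minimal_tab_page(description: str, fields, rows) -> str:
    """Serialize a small table in the assumed .tab JSON layout.

    fields: (name, type) pairs such as ("year", "number");
    rows: lists of cell values in the same order as fields.
    """
    page = {
        "license": "CC0-1.0",  # assumed identifier; data must be CC0
        "description": {"en": description},
        "schema": {"fields": [{"name": n, "type": t} for n, t in fields]},
        "data": rows,
    }
    return json.dumps(page, indent=2, ensure_ascii=False)


print(sandbox_title("Example"))
print(minimal_tab_page("Sandbox test", [("year", "number"), ("value", "number")], [[2016, 1]]))
```

The printed JSON is what you would paste into the new page on Commons.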
== Documentation links ==
- Tabular help https://www.mediawiki.org/wiki/Help:Tabular_Data
- Map help https://www.mediawiki.org/wiki/Help:Map_Data
If you find a bug, create a Phabricator ticket with the #tabular-data tag, or comment on the documentation talk pages.
== FAQ ==
- Relation to Wikidata: Wikidata is about "facts" (small pieces of information). Structured data is about "blobs": large amounts of data like the historical weather or the outline of the state of New York.
== TODOs ==
- Add a nice "table editor" - editing JSON by hand is cruel. T134618
- "What links here" should track data usage across wikis. This will also allow quicker auto-refresh of the pages. T153966
- Support data redirects. T153598
- Mega epic: Support external data feeds.
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l