Can you figure out what a good limit would be for these two use cases? I.e. what would support 99%, 99.9%, and 100%?
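
For what it's worth, here is a minimal sketch of how such limits could be computed, assuming the SMILES or InChI strings have already been exported into a plain Python list. The function names are made up for illustration and are not part of any Wikidata tooling. It uses the nearest-rank method: the smallest length that covers at least the requested fraction of the strings.

```python
import math


def length_limit_for_coverage(lengths, coverage):
    """Return the smallest limit such that at least `coverage`
    (a fraction in (0, 1]) of the given lengths fit within it."""
    if not lengths:
        raise ValueError("need at least one length")
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")
    ordered = sorted(lengths)
    # Nearest-rank percentile; round() guards against float noise
    # such as 0.999 * n landing a hair above an integer.
    rank = math.ceil(round(coverage * len(ordered), 9))
    return ordered[rank - 1]


def coverage_table(strings, targets=(0.99, 0.999, 1.0)):
    """Map each target coverage to the string length limit it requires."""
    lengths = [len(s) for s in strings]
    return {t: length_limit_for_coverage(lengths, t) for t in targets}
```

Calling `coverage_table(strings)` on the real exported identifiers would give the three limits asked about above; the 100% entry is simply the length of the longest string in the data set.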


On Sun, Sep 18, 2016, 12:27 Egon Willighagen <egon.willighagen@gmail.com> wrote:
Hi all,

sorry for joining the party late...

On Tue, Sep 13, 2016 at 11:39 AM, Sebastian Burgstaller
<sebastian.burgstaller@gmail.com> wrote:
> I think this topic might have been discussed many months ago. For
> certain data types in the chemical compound space (P233 canonical
> SMILES, P2017 isomeric SMILES, and P234 InChI) a higher character
> limit than 400 would be really helpful: 1500 to 2000 characters,
> although I sense that this might cause problems with SPARQL. Are
> there any plans on implementing this? In general, for quality
> assurance, many string property types would profit from a fixed
> maximum string length.

400 characters is not a lot for chemicals... InChIs can indeed be a
lot larger. 2k would allow us to capture a lot more chemicals. BTW,
this also applies to the canonical SMILES, which doesn't have an
upper bound either. Tannic acid (Q427956) is an example (which,
looking at the InChIKey, came up when running the bot :) ). From
working with ChEMBL as RDF I know it has InChIs of length > 1024,
which was the max length in Virtuoso... I think it's important for
biology and chemistry to increase the limit.

Egon

--
E.L. Willighagen
Department of Bioinformatics - BiGCaT
Maastricht University (http://www.bigcat.unimaas.nl/)
Homepage: http://egonw.github.com/
LinkedIn: http://se.linkedin.com/in/egonw
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers
ORCID: 0000-0001-7542-0286
ImpactStory: https://impactstory.org/EgonWillighagen

_______________________________________________
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata