Can you figure out what a good limit would be for these two use cases?
I.e., what limit would support 99%, 99.9%, and 100% of the values?
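
For what it's worth, here is a minimal sketch of how such limits could be worked out, assuming the existing InChI or SMILES strings have already been exported into a plain Python list. The function name and the sample strings are hypothetical, made up for illustration; nothing here queries Wikidata or ChEMBL. It uses the nearest-rank percentile of the string lengths, so the result is the smallest limit that still fits the requested share of values.

```python
import math
from fractions import Fraction


def length_limit_for_coverage(strings, coverage):
    """Return the smallest length limit that fits at least `coverage`
    (a fraction in (0, 1]) of the given strings, using the nearest-rank
    percentile of their lengths."""
    if not strings:
        raise ValueError("need at least one string")
    # Go through str() so that 0.99 becomes exactly 99/100 and the
    # rank is not shifted by binary floating-point rounding.
    share = Fraction(str(coverage))
    if not 0 < share <= 1:
        raise ValueError("coverage must be in (0, 1]")
    lengths = sorted(len(s) for s in strings)
    rank = math.ceil(share * len(lengths))  # 1-based nearest rank
    return lengths[rank - 1]


if __name__ == "__main__":
    # Placeholder data only: three short SMILES-like strings, not a real export.
    sample = ["CCO", "c1ccccc1", "CC(=O)OC1=CC=CC=C1C(=O)O"]
    for coverage in (0.99, 0.999, 1.0):
        print(coverage, length_limit_for_coverage(sample, coverage))
```

With a real export in place of the placeholder list, the three printed numbers would be the candidate limits for 99%, 99.9%, and 100% coverage.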
On Sun, Sep 18, 2016, 12:27 Egon Willighagen <egon.willighagen(a)gmail.com>
wrote:
Hi all,
sorry for joining the party late...
On Tue, Sep 13, 2016 at 11:39 AM, Sebastian Burgstaller
<sebastian.burgstaller(a)gmail.com> wrote:
I think this topic might have been discussed many months ago. For
certain data types in the chemical compound space (P233, canonical
SMILES; P2017, isomeric SMILES; and P234, InChI), a character limit
higher than 400 would be really helpful: 1500 to 2000 characters,
although I sense that this might cause problems with SPARQL. Are there
any plans to implement this? In general, for quality assurance, many
string property types would benefit from a fixed maximum string length.
400 characters is not a lot for chemicals... InChIs can indeed be a lot
larger. 2k would allow us to capture a lot more chemicals. BTW, this
also applies to the canonical SMILES, which doesn't have an upper bound
either. Tannic acid (Q427956) is an example (which, looking at the
InChIKey, came up when running the bot :). From working with ChEMBL as
RDF I know it has InChIs of length > 1024, which was the max length in
Virtuoso... I think it's important for biology and chemistry to
increase the limit.
Egon
--
E.L. Willighagen
Department of Bioinformatics - BiGCaT
Maastricht University (http://www.bigcat.unimaas.nl/)
Homepage: http://egonw.github.com/
LinkedIn: http://se.linkedin.com/in/egonw
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers
ORCID: 0000-0001-7542-0286
ImpactStory: https://impactstory.org/EgonWillighagen
_______________________________________________
Wikidata mailing list
Wikidata(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata