Probably a silly question, but ... did you all consider creating a datatype for molecule representation? This seems to be a use case very similar to mathematical formulas. Essentially we're not dealing with a raw string but with a representation of a molecular formula, with its own encoding ...

Changing the limit seems to be a poor workaround compared to a dedicated datatype. Nobody seems to have found a relevant use case, and it seems to me that we're essentially abusing strings to store blobs ...

2016-10-08 11:33 GMT+02:00 Egon Willighagen <>:

On Sat, Oct 8, 2016 at 11:28 AM, Lydia Pintscher <> wrote:
On Sat, Oct 8, 2016 at 11:23 AM, Egon Willighagen
<> wrote:
> Ah, those numbers are for ...

External identifier then. Cool. And for string, Sebastian's initial email
says 1500 to 2000. Is this still a good number after this discussion?

Yes, that would cover more than 99.9% of all InChIs in PubChem. (See Sebastian's reply earlier in this thread.)
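
As a rough sketch of how such a coverage figure could be checked for a candidate limit: the function name `coverage` and the sample list below are hypothetical illustrations, not the actual PubChem analysis, and the last entry is an artificial over-long string rather than a valid InChI.

```python
def coverage(strings, limit):
    """Return the fraction of strings whose length is at most `limit`."""
    strings = list(strings)
    if not strings:
        return 0.0
    return sum(len(s) <= limit for s in strings) / len(strings)


# Toy sample for illustration only (NOT PubChem data).
sample = [
    "InChI=1S/CH4/h1H4",                  # methane
    "InChI=1S/H2O/h1H2",                  # water
    "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3",  # ethanol
    "InChI=1S/" + "C" * 2500,             # artificial over-long string, not a valid InChI
]

for limit in (1500, 2000):
    print(limit, coverage(sample, limit))
```

Run over a real dump of InChIs, the same loop would show how much a limit of 1500 versus 2000 actually changes the share of values that fit.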


E.L. Willighagen
Department of Bioinformatics - BiGCaT
Maastricht University (
ORCID: 0000-0001-7542-0286

Wikidata mailing list