On Tue, 31 May 2022 at 14:59, Boris Doubrov boris.doubrov@duallab.com wrote:
As you correctly mention, XML Schema validation cannot be used in case of XMP due to ambiguities in the serialization. This is why veraPDF uses a fork of Adobe XMP library for low-level XMP parsing: https://github.com/veraPDF/veraPDF-library/tree/integration/xmp-core
Thank you, Boris, for your answer. I had somehow gathered that veraPDF wasn't using XML schema based validation, so thanks for the confirmation.

There may be some news here: ISO 16684-2:2014[1] suggests a way to normalize the XMP packet so that there are no ambiguities. I implemented this[2] myself: the implementation is not yet 100% finished, but it's close. The same standard also publishes a reference RelaxNG schema for a simple XMP packet, which is free to use/modify/redistribute. I published it in a gist[3], with the licensing terms as found in the original document, together with a simple packet that validates against it.

I'm not suggesting that veraPDF should change its approach to XMP validation. The point is that if someone takes the reference RelaxNG schema and adds all the missing PDF properties, then, together with the XMP normalization algorithm, an XML schema validation strategy becomes possible. At some point I could do it, but it's not among my priorities yet. I leave the info here in case it is useful to somebody, with a plea to release any modifications to the schema openly if progress is made.
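To make the ambiguity concrete, here is a minimal sketch in Python (standard library only). It shows the same simple property written once as an attribute of rdf:Description and once as a child element, and a toy normalization step that rewrites the attribute form into the element form so both packets have the same shape. This is only an illustration of the general idea: it is not the algorithm from ISO 16684-2 and not the pdfmm code, and the names normalize and properties as well as the "ExampleProducer" value are made up for the example.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PDF_NS = "http://ns.adobe.com/pdf/1.3/"

# The same property, serialized as an attribute of rdf:Description ...
ATTR_FORM = (
    f'<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:pdf="{PDF_NS}">'
    '<rdf:Description rdf:about="" pdf:Producer="ExampleProducer"/>'
    "</rdf:RDF>"
)

# ... and as a child element of rdf:Description.
ELEM_FORM = (
    f'<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:pdf="{PDF_NS}">'
    '<rdf:Description rdf:about="">'
    "<pdf:Producer>ExampleProducer</pdf:Producer>"
    "</rdf:Description>"
    "</rdf:RDF>"
)


def normalize(xml_text):
    """Toy normalization (hypothetical helper): move namespaced, non-RDF
    attributes of rdf:Description into child elements."""
    root = ET.fromstring(xml_text)
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        for name in list(desc.attrib):
            # Keep rdf:about and other rdf:* attributes, and anything unqualified.
            if not name.startswith("{") or name.startswith(f"{{{RDF_NS}}}"):
                continue
            value = desc.attrib.pop(name)
            ET.SubElement(desc, name).text = value
    return root


def properties(root):
    """Collect (qualified name, text) pairs of all rdf:Description children."""
    return sorted(
        (child.tag, child.text)
        for desc in root.iter(f"{{{RDF_NS}}}Description")
        for child in desc
    )


if __name__ == "__main__":
    same = properties(normalize(ATTR_FORM)) == properties(normalize(ELEM_FORM))
    print("same properties after normalization:", same)
```

Once every packet is brought to a single form like this, a grammar only has to describe that one form, which is what would make schema-based validation workable.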
Regards, Francesco
[1] https://www.iso.org/standard/57422.html
[2] https://github.com/pdfmm/pdfmm/blob/04a3c589dd2a5d919f171f47ce527423e2907e7e...
[3] https://gist.github.com/ceztko/aeefff37cbb728753fe314d21715b624