Hello veraPDF users/devs,
I have a question about the validation of XMP packets in PDF documents: how does veraPDF actually do it? Can you point me to the places in the code where this is performed?
Let me elaborate on the question and outline a possible approach suggested by the ISO standards. I have access to both ISO 16684-1:2019 [1] and ISO 16684-2:2014 [2]. The first standard describes the XMP data model and properties in general terms and explains the many alternative notations that are accepted (which makes packets difficult to validate). The second suggests a method for normalizing packets into a unique representation of the information they store (this method can mostly be implemented without knowing anything about the actual XMP data model or properties). It then supplies some sample RELAX NG schemas that validate unspecified XMP demo packets.
These schemas clearly do not amount to a full schema for validating the XMP packets of PDF documents, since all PDF-specific properties are missing, as well as some generic ones. If a full schema for XMP packets in PDF documents were publicly available, validating those packets would be quite simple, as the normalization algorithm is not very difficult to implement. I asked for it in [3], but I doubt Adobe will release it. It may not exist at all, since the ISO 16684-2:2014 recommendations may have been developed independently of any actual Adobe product.
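To make the notation problem concrete, here is a minimal Python sketch. It is an illustration only, not code from either standard or from veraPDF. It shows one of the alternative notations: a simple property written either as an attribute of rdf:Description or as a child element. The producer value and the helper name `simple_properties` are made up for the example, and only simple top-level properties are handled.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XML_NS = "http://www.w3.org/XML/1998/namespace"
PDF = "http://ns.adobe.com/pdf/1.3/"

# The property written as an attribute of rdf:Description (shorthand).
AS_ATTRIBUTE = f"""
<rdf:RDF xmlns:rdf="{RDF}" xmlns:pdf="{PDF}">
  <rdf:Description rdf:about="" pdf:Producer="ExampleWriter 1.0"/>
</rdf:RDF>"""

# The same property written as a child element.
AS_ELEMENT = f"""
<rdf:RDF xmlns:rdf="{RDF}" xmlns:pdf="{PDF}">
  <rdf:Description rdf:about="">
    <pdf:Producer>ExampleWriter 1.0</pdf:Producer>
  </rdf:Description>
</rdf:RDF>"""


def simple_properties(packet):
    """Collect simple top-level properties, whichever notation was used."""
    props = {}
    root = ET.fromstring(packet)
    for desc in root.findall(f"{{{RDF}}}Description"):
        # Shorthand: attributes outside the rdf: and xml: namespaces are properties.
        for name, value in desc.attrib.items():
            if name.startswith((f"{{{RDF}}}", f"{{{XML_NS}}}")):
                continue
            props[name] = value
        # Element form: only leaf elements with text are handled in this sketch.
        for child in desc:
            if len(child) == 0 and child.text is not None:
                props[child.tag] = child.text
    return props


# The XML trees differ, but the extracted properties are identical.
print(simple_properties(AS_ATTRIBUTE) == simple_properties(AS_ELEMENT))
```

A grammar that describes only one of the two layouts would reject the other, even though both packets carry the same information.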
Thank you in advance for any insight.
Regards, Francesco
[1] https://www.iso.org/standard/75163.html
[2] https://www.iso.org/standard/57422.html
[3] https://github.com/adobe/xmp-docs/issues/20
Hi Francesco,
As you correctly mention, XML Schema validation cannot be used in the case of XMP due to ambiguities in the serialization. This is why veraPDF uses a fork of the Adobe XMP library for low-level XMP parsing: https://github.com/veraPDF/veraPDF-library/tree/integration/xmp-core
All further XMP validation rules are based on the PDF/A requirements as well as the predefined schemas as defined in the Adobe XMP 2004 specification for PDF/A-1, the Adobe XMP 2005 specification for PDF/A-2 and PDF/A-3, and ISO 16684-1 for PDF/A-4.
The detailed list of these rules can be found in https://github.com/veraPDF/veraPDF-validation-profiles/wiki under the numbers that match the Metadata sections in the corresponding PDF/A specifications:
6.7.* for PDF/A-1 and PDF/A-4
6.6.* for PDF/A-2 and PDF/A-3
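As a rough illustration of rule-based checking against predefined schemas, the following Python sketch looks up each top-level property of a parsed packet in a table of known names. It is not veraPDF code and does not use the Adobe XMP library. The `PREDEFINED` table is a tiny, hypothetical excerpt, the `ex:` namespace and the function names are invented for the example, and real rules also cover value types and extension schemas, which are ignored here.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Hypothetical, heavily abridged table: namespace URI -> known property names.
PREDEFINED = {
    "http://purl.org/dc/elements/1.1/": {"title", "creator", "description"},
    "http://ns.adobe.com/pdf/1.3/": {"Keywords", "PDFVersion", "Producer"},
    "http://ns.adobe.com/xap/1.0/": {"CreateDate", "ModifyDate", "CreatorTool"},
}

SAMPLE = f"""
<rdf:RDF xmlns:rdf="{RDF}"
         xmlns:pdf="http://ns.adobe.com/pdf/1.3/"
         xmlns:xmp="http://ns.adobe.com/xap/1.0/"
         xmlns:ex="http://example.com/ns/">
  <rdf:Description rdf:about="" pdf:Producer="ExampleWriter 1.0">
    <xmp:CreatorTool>ExampleApp</xmp:CreatorTool>
    <ex:Invented>not predefined</ex:Invented>
  </rdf:Description>
</rdf:RDF>"""


def split_name(qname):
    """Split an ElementTree '{namespace}local' name into its two parts."""
    if qname.startswith("{"):
        namespace, local = qname[1:].split("}", 1)
        return namespace, local
    return "", qname


def unknown_properties(packet):
    """Return sorted (namespace, name) pairs that are missing from PREDEFINED."""
    root = ET.fromstring(packet)
    unknown = []
    # iter() also visits the root itself, so a bare rdf:RDF root works too.
    for rdf in root.iter(f"{{{RDF}}}RDF"):
        for desc in rdf.findall(f"{{{RDF}}}Description"):
            # Properties may appear as attributes or as child elements.
            names = list(desc.attrib) + [child.tag for child in desc]
            for qname in names:
                namespace, local = split_name(qname)
                if namespace in (RDF, XML_NS):
                    continue  # rdf:about, xml:lang and similar are not properties
                if local not in PREDEFINED.get(namespace, set()):
                    unknown.append((namespace, local))
    return sorted(unknown)


print(unknown_properties(SAMPLE))
```

Because the check runs on property names rather than on the XML layout, it gives the same answer for either notation of a property.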
Best regards, Boris
-----Original Message-----
From: Users users-bounces@lists.verapdf.org On Behalf Of Francesco Pretto
Sent: Thursday, May 26, 2022 1:47 PM
To: users@lists.verapdf.org
Subject: [veraPDF-users] How veraPDF performs XMP packets validation?
On Tue, 31 May 2022 at 14:59, Boris Doubrov boris.doubrov@duallab.com wrote:
As you correctly mention, XML Schema validation cannot be used in the case of XMP due to ambiguities in the serialization. This is why veraPDF uses a fork of the Adobe XMP library for low-level XMP parsing: https://github.com/veraPDF/veraPDF-library/tree/integration/xmp-core
Thank you for your answer, Boris. I had somehow gathered that veraPDF wasn't using XML-schema-based validation, so thanks for the confirmation. There may be some news here: ISO 16684-2:2014 [1] suggests a way to normalize the XMP packet so that there are no ambiguities. This is something I implemented [2] myself: the implementation is not 100% finished yet, but it's close. The same standard also publishes a reference RELAX NG schema for a simple XMP packet, which is free to use, modify and redistribute. I published it in a gist [3], with the licensing terms found in the original document, together with a simple packet that validates against it.
I'm not suggesting that veraPDF should change its approach to XMP validation. The point is that if someone takes the reference RELAX NG schema and adds all the missing PDF properties, then, together with the XMP normalization algorithm, a schema-based validation strategy becomes possible. I could do it at some point, but it's not among my priorities yet. I'm leaving the info here in case it is useful to somebody, with a plea to release any modifications to the schema openly if progress is made.
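The idea of normalizing first and validating afterwards can be sketched in a few lines of Python. This is neither the algorithm from ISO 16684-2 nor the implementation linked in [2]. It applies two illustrative rewriting rules chosen for the example (property attributes become child elements, and all top-level rdf:Description elements are merged into one with sorted properties) and assumes simple properties and an empty rdf:about. The function name `normalize` and the sample values are invented.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XML_NS = "http://www.w3.org/XML/1998/namespace"
DESC = f"{{{RDF}}}Description"
NAMESPACES = (f'xmlns:rdf="{RDF}" xmlns:pdf="http://ns.adobe.com/pdf/1.3/" '
              'xmlns:xmp="http://ns.adobe.com/xap/1.0/"')

# Two rdf:Description elements, properties in attribute shorthand.
PACKET_A = f"""
<rdf:RDF {NAMESPACES}>
  <rdf:Description rdf:about="" pdf:Producer="ExampleWriter 1.0"/>
  <rdf:Description rdf:about="" xmp:CreatorTool="ExampleApp"/>
</rdf:RDF>"""

# One rdf:Description, properties as elements, in a different order.
PACKET_B = f"""
<rdf:RDF {NAMESPACES}>
  <rdf:Description rdf:about="">
    <xmp:CreatorTool>ExampleApp</xmp:CreatorTool>
    <pdf:Producer>ExampleWriter 1.0</pdf:Producer>
  </rdf:Description>
</rdf:RDF>"""


def normalize(packet):
    """Rewrite a packet into one fixed layout (illustrative rules only)."""
    root = ET.fromstring(packet)
    props = []
    for desc in root.findall(DESC):
        # Rule 1: property attributes become child elements.
        for name, value in desc.attrib.items():
            if name.startswith((f"{{{RDF}}}", f"{{{XML_NS}}}")):
                continue
            elem = ET.Element(name)
            elem.text = value
            props.append(elem)
        # Existing child elements are carried over; nested content is untouched.
        for child in desc:
            child.tail = None  # drop indentation between properties
            props.append(child)
    # Rule 2: a single rdf:Description with properties sorted by qualified name.
    props.sort(key=lambda e: e.tag)
    merged = ET.Element(DESC, {f"{{{RDF}}}about": ""})  # assumes rdf:about=""
    merged.extend(props)
    out = ET.Element(f"{{{RDF}}}RDF")
    out.append(merged)
    return ET.tostring(out, encoding="unicode")


# After normalization both packets serialize to the same string.
print(normalize(PACKET_A) == normalize(PACKET_B))
```

Once every packet has the same layout, a grammar such as a RELAX NG schema only needs to describe that one layout, which is what makes the schema-based strategy workable.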
Regards, Francesco
[1] https://www.iso.org/standard/57422.html
[2] https://github.com/pdfmm/pdfmm/blob/04a3c589dd2a5d919f171f47ce527423e2907e7e...
[3] https://gist.github.com/ceztko/aeefff37cbb728753fe314d21715b624