Dear users of veraPDF,
We are figuring out how to make good, future-proof PDFs from our day-to-day documents, which are mostly Word documents (DOC and DOCX). For the moment, we are using Word 2016 for Windows, where you can choose to save as PDF/A (Save As > PDF > Options > Conform with PDF/A). When I validate the resulting PDF using veraPDF with the PDF/A flavor set to auto-detect, it passes (hooray for Microsoft). Looking at the validation report, I see that it is validated as PDF/A-3A.
This is where it gets interesting. The test document was a simple text document (like most of the documents we create), possibly supplemented with some images. So I would like my document to conform to PDF/A-2A, as there should not be any embedded files in such a simple PDF. Word offers no option to target a specific flavor. When I validate the document with veraPDF using the PDF/A-2A flavor, I get only one error:
Rule: Specification: ISO 19005-2:2011, Clause: 6.6.4, Test number: 2 (https://github.com/veraPDF/veraPDF-validation-profiles/wiki/PDFA-Parts-2-and-3-rules#rule-664-2)
Description: The value of pdfaid:part shall be the part number of ISO 19005 to which the file conforms.
Status: Failed (1 occurrence)
So I guess this means that my document declares itself as PDF/A-3A rather than PDF/A-2A, which makes this a metadata issue. My main question: can I safely ignore this message, or can I change the metadata of my document to make it conform to PDF/A-2A?
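The failed rule compares the part number declared in the XMP metadata with the flavor being checked. As a rough illustration of that idea (not veraPDF's actual implementation), the sketch below reads `pdfaid:part` and `pdfaid:conformance` from an XMP packet using only the Python standard library and tests them against a target flavor such as "2A". The function names are hypothetical; the namespace URIs are the standard RDF and PDF/A identification ones.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

# Namespace URIs used in XMP for RDF and for PDF/A identification.
PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def read_pdfa_identification(xmp: str) -> Tuple[Optional[str], Optional[str]]:
    """Return the (part, conformance) values declared in an XMP packet.

    XMP allows a property to be written either as a child element of
    rdf:Description or as an attribute on it, so both forms are checked.
    Missing values are returned as None.
    """
    root = ET.fromstring(xmp)
    found = {"part": None, "conformance": None}
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        for name in found:
            key = f"{{{PDFAID_NS}}}{name}"
            value = desc.get(key)  # attribute form: pdfaid:part="3"
            if value is None:
                child = desc.find(key)  # element form: <pdfaid:part>3</pdfaid:part>
                if child is not None and child.text:
                    value = child.text
            if value is not None:
                found[name] = value.strip()
    return found["part"], found["conformance"]


def matches_flavour(xmp: str, flavour: str) -> bool:
    """True if the declared part and conformance level spell the target flavor, e.g. "2A"."""
    part, conformance = read_pdfa_identification(xmp)
    if part is None or conformance is None:
        return False
    return (part + conformance).upper() == flavour.upper()
```

Fed the packet found in the Word-generated file, `read_pdfa_identification` would report part "3" and level "A", so `matches_flavour(xmp, "2A")` is False, which is the same mismatch the validator flags.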
I also went a step further to test this: I opened the PDF document in Notepad++ and I think I found the metadata part in question (using the veraPDF wiki as a guide). Near the bottom of the file there is an RDF section that reads:
<pdfaid:part>3</pdfaid:part><pdfaid:conformance>A</pdfaid:conformance>
If I change the 3 to a 2 and save the file, the document validates as PDF/A-2A!
So, two main questions:
1. Is this safe?
2. Is there a less dirty way to change the flavor of my PDF document in the metadata?
Thanks for any help. The files I used and the test results are attached.
Kind regards, Stijn De Groof
Koninklijk Instituut voor het Kunstpatrimonium (KIK-IRPA) Jubelpark 1, 1000 Brussel, T: +32 2 73 96 779
Dear Stijn,
The PDF/A-2 and PDF/A-3 standards (ISO 19005-2:2011 and ISO 19005-3:2012) are identical except for the requirements on embedded files: PDF/A-3 allows files of any format to be embedded (under some extra restrictions), whereas PDF/A-2 allows only embedded files that themselves conform to PDF/A-1 or PDF/A-2.
As far as I know, PDF/A files generated by MS Word indeed contain no attachments, unless you install a third-party plug-in that alters the standard MS Word ‘Save As PDF’ functionality.
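Since the embedded-file rules are the only difference between the two parts, whether a PDF/A-3 file could equally be labelled PDF/A-2 comes down to its attachments. ISO 19005-2 does permit embedded files, but only ones that themselves conform to PDF/A-1 or PDF/A-2; PDF/A-3 additionally permits any other format. The hypothetical helper below encodes just that rule, assuming the rest of the document already satisfies the requirements the two parts share.

```python
from typing import Iterable, Optional


def lowest_claimable_part(embedded_file_parts: Iterable[Optional[int]]) -> int:
    """Return 2 if a PDF/A-3 file's attachments would also be allowed in PDF/A-2, else 3.

    Each item describes one embedded file: the PDF/A part it conforms to
    (1, 2 or 3), or None if it is not a PDF/A file at all (e.g. a DOCX or XML).
    Assumes everything else in the document already satisfies the rules
    shared by PDF/A-2 and PDF/A-3.
    """
    # PDF/A-2 only admits embedded files conforming to PDF/A-1 or PDF/A-2;
    # PDF/A-3 additionally admits files of any other format.
    if all(part in (1, 2) for part in embedded_file_parts):
        return 2
    return 3
```

For a Word document with no attachments the list is empty, so the helper returns 2, matching the conclusion that such files can be declared PDF/A-2.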
And yes, your analysis of the metadata issue and of how to fix it in Notepad++ is fully correct. I agree this method smells of a hack, but it looks safe enough to me for this particular kind of PDF document; part of why it works is that swapping one digit for another leaves the byte length unchanged, so the stream lengths and cross-reference offsets in the file stay valid. It will not work in general, though, for documents not generated by MS Word: locating the metadata inside a PDF is not always straightforward and generally requires parsing low-level PDF syntax.
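The Notepad++ edit can be scripted with the same safeguards. The sketch below is a hypothetical helper, not a veraPDF feature: it assumes an uncompressed XMP packet that uses the `pdfaid` prefix in element form (as in the Word output described above), refuses to act unless it finds exactly one `<pdfaid:part>3</pdfaid:part>`, and makes only a same-length, one-byte substitution. The `/EmbeddedFile` check is a crude heuristic that cannot see inside compressed object streams.

```python
import re
from pathlib import Path

# Element form as written in the XMP packet, e.g. <pdfaid:part>3</pdfaid:part>.
# Assumes the "pdfaid" prefix and an uncompressed metadata stream;
# the attribute form (pdfaid:part="3") is deliberately not handled.
PART3_ELEMENT = re.compile(rb"<pdfaid:part>\s*3\s*</pdfaid:part>")


def relabel_part3_as_part2(data: bytes) -> bytes:
    """Return a copy of raw PDF bytes with pdfaid:part changed from 3 to 2.

    Only a single-byte, same-length substitution is made, so stream /Length
    values and cross-reference offsets stay valid. Refuses to act unless
    exactly one matching element is found.
    """
    matches = list(PART3_ELEMENT.finditer(data))
    if len(matches) != 1:
        raise ValueError(f"expected one <pdfaid:part>3</pdfaid:part>, found {len(matches)}")
    # Crude heuristic: a visible /EmbeddedFile key suggests attachments.
    if b"/EmbeddedFile" in data:
        raise ValueError("file appears to contain embedded files; leaving it unchanged")
    m = matches[0]
    patched = m.group(0).replace(b"3", b"2")  # the match contains exactly one "3"
    return data[: m.start()] + patched + data[m.end():]


def relabel_file(src: str, dst: str) -> None:
    """Write a relabelled copy to dst, leaving the original file untouched."""
    Path(dst).write_bytes(relabel_part3_as_part2(Path(src).read_bytes()))
```

Writing to a separate output file keeps the original intact; the relabelled copy should still be re-validated with veraPDF using the PDF/A-2A flavor, since this script checks nothing beyond the identification bytes.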
veraPDF does include Metadata fixer functionality that does exactly the job you need, but in a PDF-compliant way. It checks that there are no errors other than:
* metadata identification (as in your case)
* metadata not synchronized between the so-called PDF Info dictionary and the XMP metadata package
and, if so, tries to fix them by updating the document metadata. However, the current version of the veraPDF Metadata fixer is not very reliable. We are going to revise it for the next veraPDF release (~February 2018), and we'll then use your use case as one of the test scenarios.
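The second kind of fix concerns keeping the Info dictionary and the XMP package consistent. The sketch below illustrates the idea with plain dictionaries; it is not the veraPDF fixer's code, and the function names are hypothetical. The key-to-property mapping follows the usual Info/XMP equivalences; as simplifying assumptions, dates are left out and `dc:title`/`dc:creator` are treated as plain strings, although in real XMP they are a language alternative and an ordered array.

```python
from typing import Dict, List

# Info dictionary key -> XMP property holding the equivalent value.
# Dates (CreationDate/ModDate vs xmp:CreateDate/xmp:ModifyDate) are left out
# because comparing them needs date-format conversion.
INFO_TO_XMP = {
    "Title": "dc:title",
    "Author": "dc:creator",
    "Subject": "dc:description",
    "Keywords": "pdf:Keywords",
    "Creator": "xmp:CreatorTool",
    "Producer": "pdf:Producer",
}


def unsynchronized_entries(info: Dict[str, str], xmp: Dict[str, str]) -> List[str]:
    """Return the Info keys whose values are missing from, or differ in, the XMP properties."""
    return [
        key
        for key, prop in INFO_TO_XMP.items()
        if key in info and info[key] != xmp.get(prop)
    ]


def synchronize(info: Dict[str, str], xmp: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the XMP properties updated from the Info dictionary.

    Copying Info -> XMP is just one possible fix direction, chosen for this sketch.
    """
    fixed = dict(xmp)
    for key in unsynchronized_entries(info, xmp):
        fixed[INFO_TO_XMP[key]] = info[key]
    return fixed
```

Only mismatched entries are touched and the input is not modified, so the same check can be rerun on the result to confirm nothing is left out of sync.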
Best regards, Boris
From: Users [mailto:users-bounces@lists.verapdf.org] On Behalf Of Stijn De Groof Sent: Monday, December 4, 2017 5:38 PM To: users@lists.verapdf.org Subject: [Users] Make and validate PDF/A-2A based on Word document
Dear Boris,
Thank you very much for your advice. I look forward to the next release. :)
Kind regards, Stijn
From: Boris Doubrov [mailto:boris.doubrov@duallab.com] Sent: Wednesday, December 6, 2017 14:09 To: Stijn De Groof stijn.degroof@kikirpa.be; users@lists.verapdf.org Subject: RE: Make and validate PDF/A-2A based on Word document
users@lists.openpreservation.org