Introduction:
This document describes the functionality of JetTrac PDFExtract, a module used to extract XML data from a PDF form.
Please note that any files edited while setting up JetTrac PDFExtract should be edited using a plain-text editor such as Notepad or Notepad++. Do not use Microsoft Word or WordPad, as these programs add formatting that will interfere with the program reading the files.
Technical Support:
If you need assistance in installing and configuring JetTrac PDFExtract™, call Pro Technology Automation, Inc. at 805-527-1248 or email us at support@protechinc.com. Please note that the JetTrac PDFExtract™ license fee does not cover configuration services and technical support so there may be an additional charge. Please ensure you read these instructions carefully before calling for technical support.
How to run JetTrac PDFExtract:
To run the program, you will need to have the following files located in the same folder:
- JTPDFExtract.exe
- ExitHandler.dll
- DebenuPDFLibraryDLL1016.dll
Executable files and dll files should not be edited for any reason.
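The requirement above can be checked before a job is set up. The following is a minimal Python sketch, not part of the product: the function name missing_files is hypothetical, and the folder you pass in is whatever folder you installed the module into.

```python
from pathlib import Path

# File names taken from the list above.
REQUIRED_FILES = (
    "JTPDFExtract.exe",
    "ExitHandler.dll",
    "DebenuPDFLibraryDLL1016.dll",
)


def missing_files(folder):
    """Return the required file names that are not present in `folder`."""
    base = Path(folder)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]
```

An empty list means all three files sit in the same folder. Any names returned are the ones still missing.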
Default Functionality:
JetTrac PDFExtract takes an input PDF form and extracts the field data in XML format. When extracting this data, PDFExtract can also save copies of the fillable PDF, the flattened PDF, or specific contents within the PDF. The specific functionality is covered in the section on setting up the configuration .ini file.
Job Step Configuration Window in JobConfig for JetTrac PDFExtract:
When setting up a job step that uses JTPDFExtract in JobConfig, the only lines of the config you need to set are the first and the last two: the Config file, the Input PDF, and the Output XML. If the Input PDF is left as an asterisk (*), the input will be whichever file in the data folder triggered the job to run. This works well if it is the first step, but if the input is the output of a previous job step, you need to specify the fully qualified path of the input PDF you want to extract data from. All other fields should be filled with the fully qualified path to the intended file, e.g. C:\JetTrac\…\input.pdf
In JobConfig, file paths entered should not be in quotes.
As with all modules, there is a dropdown to select whether or not the job should stop completely if this step fails. If you select Yes, any error will terminate the job process and write what happened to the log file. If you select No, the job will try to continue anyway. However, if any steps further down the line rely on the output of a step that is set not to stop on error, they may not work properly.
For more specifics, go to the JetTrac Field Service JobConfig page.
JetTrac PDFExtract Command Line:
The command line for JetTrac PDFExtract is as follows:
"C:\JTPDFExtract.exe" "C:\Input.pdf" "C:\Output.xml" "C:\Config.ini" "JTPDFExtract.log"
In order, the arguments reference the executable for the module, the PDF from which data will be extracted, the output file for the extracted XML data, the configuration file used to set up the module's specific functionality, and the log file.
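If the module is launched from a script rather than from JobConfig, the argument order described above can be sketched in Python as follows. This is an illustration only: the function names build_command and run_extract are hypothetical, and this document does not state what exit codes the module returns, so the sketch simply hands the exit code back without interpreting it.

```python
import subprocess


def build_command(exe, input_pdf, output_xml, config_ini, log_file):
    """Return the arguments in the order used by the command line above."""
    return [exe, input_pdf, output_xml, config_ini, log_file]


def run_extract(exe, input_pdf, output_xml, config_ini, log_file):
    """Run the module once and return its exit code."""
    # Passing a list lets subprocess quote paths that contain spaces.
    # Assumption: the meaning of the exit code is not documented here,
    # so the caller should also check the log file.
    completed = subprocess.run(
        build_command(exe, input_pdf, output_xml, config_ini, log_file)
    )
    return completed.returncode
```

Building the argument list in one place keeps the order (executable, input PDF, output XML, configuration file, log file) from drifting between scripts.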
Setting Up the Configuration .ini:
The following is a sample configuration .ini file used when running JetTrac PDFExtract:
Mode=EXTRACTACRO
JobFieldName=Job_Name
PdfCopy=Y|"C:\PDFCopy.pdf"
PdfName=N
SoundFile=N
XmlFile=Y|"C:\XMLFile.xml"
JpgFile=N
IdxFile=N
FlatFile=Y|"C:\FlatFile.pdf"
IdxFormat=XML
PdfNameFile=N
PdfFileNameField=
FieldsToDeleteInFlattenedPdf=Field1|Field2|Field3
This configuration .ini file contains keys that set up specific functionality within the module. Some of these keys are retained only for backwards compatibility with earlier versions of the module and are not used, but they must still be included within the file.
- Mode: This line should be set as EXTRACTACRO if extracting data from an AcroForm PDF, or EXTRACT if extracting from an XFA PDF.
- JobFieldName: This references the Job Name field located within the XML. Leave it set to Job_Name.
- PdfCopy: Set this line to “N” if you do not wish to keep a copy of the PDF that you are extracting the data from. If you do wish to keep a copy, set the line to “Y” followed by a pipe “|” and then the full file path where you wish to save the copy of the PDF.
- PdfName: Will not be used. Keep this set as “N”.
- SoundFile: Will not be used. Keep this set as “N”.
- XmlFile: Set this line to “N” if you do not wish to keep a copy of the XML data you are extracting. If you do wish to keep a copy, set the line to “Y” followed by a pipe “|” and then the full file path where you wish to save it.
- JpgFile: Will not be used. Keep this set as “N”.
- IdxFile: Will not be used. Keep this set as “N”.
- FlatFile: Set this line to “N” if you do not wish to keep a flattened copy of the PDF you are extracting data from. If you do wish to keep a copy, set the line to “Y” followed by a pipe “|” and then the full file path where you wish to save it.
- IdxFormat: Will always be set to “XML”.
- PdfNameFile: Will not be used. Keep this set as “N”.
- PdfFileNameField: Will not be used. Keep this blank.
- FieldsToDeleteInFlattenedPdf: This key value allows you to list multiple fields that you wish to delete if you are saving a flattened version of the PDF. The list is separated by pipes “|” and must exactly match field names within the PDF that you are extracting data from.
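To make the value formats above concrete, the following Python sketch reads a configuration file's text and splits the N and Y|path values. It is a helper for checking your own file, not a description of how the module parses it. The function names are hypothetical, and treating the flag as case-insensitive is an assumption.

```python
def read_config(text):
    """Parse KEY=VALUE lines into a dict, skipping lines without '='."""
    settings = {}
    for line in text.splitlines():
        if "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def parse_toggle(value):
    """Split a value such as N or Y|"path" into (enabled, path)."""
    flag, _, path = value.strip().partition("|")
    # Assumption: the flag is compared case-insensitively.
    enabled = flag.strip().upper() == "Y"
    # Remove surrounding straight or curly double quotes from the path.
    path = path.strip().strip('"\u201c\u201d')
    return enabled, (path if enabled and path else None)


def fields_to_delete(settings):
    """Return the pipe-separated field names as a list."""
    raw = settings.get("FieldsToDeleteInFlattenedPdf", "")
    return [name for name in raw.split("|") if name]
```

Running parse_toggle over PdfCopy, XmlFile, and FlatFile shows at a glance which copies a given configuration will save and where.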
Additional Notes:
While JetTrac PDFExtract is primarily used to extract XML data from an input PDF, the module is also used to create flattened copies of PDFs. At the end of a job, or set of jobs, you will sometimes wish to create a non-fillable, flat copy of your PDF. This can be accomplished by running the PDF through JetTrac PDFExtract with FlatFile set to “Y” in the configuration .ini file, followed by a pipe “|” and the file path where you wish to save the flattened copy.
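A configuration for this flatten-only use can be generated with a short Python sketch such as the one below. The function name flatten_only_config is hypothetical. The key order and the quoting of the path follow the sample configuration earlier in this document. Leaving FieldsToDeleteInFlattenedPdf empty when no fields are listed is an assumption that this document does not confirm.

```python
def flatten_only_config(flat_pdf_path, fields_to_delete=(), mode="EXTRACTACRO"):
    """Return .ini text that saves only a flattened copy of the PDF."""
    lines = [
        f"Mode={mode}",  # EXTRACTACRO for AcroForm, EXTRACT for XFA
        "JobFieldName=Job_Name",
        "PdfCopy=N",
        "PdfName=N",
        "SoundFile=N",
        "XmlFile=N",
        "JpgFile=N",
        "IdxFile=N",
        f'FlatFile=Y|"{flat_pdf_path}"',
        "IdxFormat=XML",
        "PdfNameFile=N",
        "PdfFileNameField=",
        # Assumption: an empty value is accepted when nothing is deleted.
        "FieldsToDeleteInFlattenedPdf=" + "|".join(fields_to_delete),
    ]
    return "\n".join(lines) + "\n"
```

Write the returned text to a file with a plain-text tool, then pass that file as the configuration file for the job step or command line.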