Excel

Adobe Acrobat PDF file import

Ease of Use

Intermediate

Version tested with

2000 

Submitted by:

Felix Atagong

Description:

This is a quick and dirty way to parse text-based pdf files into Excel using a freeware command line utility called pdftotext.exe... 

Discussion:

 

Code:

instructions for use

			

Sub OpenPDF() '-------------------------------------------' ' You need to create a bat file first with one single line of text ' pdftotext.exe -layout YourPage.pdf ' DOWNLOAD LINK: http://www.foolabs.com/xpdf/download.html '-------------------------------------------' ' these lines look for a pdf file in your My Documents folder Set WshShell = CreateObject("WScript.Shell") ChDir (WshShell.SpecialFolders("MyDocuments")) PageName = Application.GetOpenFilename("YourPage, *.pdf", , "YourPage") ' if no file is picked the macro ends If PageName = "False" Then Exit Sub End If ' copies and renames the pdf file to the pdf2txt folder FileCopy PageName, "C:\pdf2txt\YourPage.pdf" ChDir ("C:\pdf2txt") ' THE BATFILE CONTAINS ONLY 1 LINE: ' pdftotext.exe -layout YourPage.pdf TestValue = Shell("YourPage.bat", 1) ' because the bat file runs for 1 or 2 seconds (in my case) ' I let the Excel macro wait 5 seconds before doing anything else ' there are more ingenious ways for VBA to wait for the end of an ' application, but this suits me fine... Application.Wait (Now + TimeValue("0:00:05")) ChDir "C:\pdf2txt" PageName = "C:\pdf2txt\YourPage.txt" ' the following reads the text that has been generated Call ReadTextFile ' insert your text parsing - text to columns - ingenious vba stuff hereafter... End Sub Sub ReadTextFile() Dim FileNum As Integer Dim r As Integer Dim wb As Workbook Dim Data As String r = 1 FileNum = FreeFile Set wb = Workbooks.Add Open PageName For Input As #FileNum Do While Not EOF(FileNum) Line Input #FileNum, Data ActiveSheet.Cells(r, 1) = Data r = r + 1 Loop Close #FileNum End Sub

How to use:

  1. First of all download the freeware pdf utilities at http://www.foolabs.com/xpdf/download.html.
  2. Although there are several programs in the download the only one you need to keep is pdftotext.exe. This utility will extract text from pdf files, so scanned pdf documents (containing only images) will result in an error.
  3. Copy pdftotext.exe in a folder called C:\pdf2txt.
  4. Open Notepad and type the following: pdftotext.exe -layout YourPage.pdf
  5. Save the bat-file containing the above line in the pdf2txt folder as Yourpage.bat (not YourPage.bat.txt or YourPage.txt).
 

Test the code:

 

Sample File:

No Attachment 

Approved by Jacob Hilderbrand


This entry has been viewed 563 times.

Please read our Legal Information and Privacy Policy
Copyright @2004 - 2020 VBA Express