
Thread: Read Image Pdf Attachment Using Modi Ocr Then Extract Certain Text & Use As Filename

  1. #1
    VBAX Regular
    Joined
    Sep 2012
    Posts
    8
    Location


    Hi, wondering if anyone can help me?

    I'm a newbie when it comes to programming. After extensive research on the subject I've come up with a few bits of code, and I was wondering if someone out there could help me get this to work!

    Basically, what I want to do is set up an Outlook rule that saves PDF attachments as they come into my inbox, but performs OCR on them before saving, as they are image PDFs. Then I want to save the now text-readable PDF, using the words found within the file between "Purchase Order:" and "Job Number:" as the filename.

    I realise MODI does not support PDF, as it's a Microsoft add-on, but I can open PDFs manually in MODI, so I'm guessing this can be automated as well?

    I've used this code to successfully save image PDF files to a folder based on an Outlook rule:

    [VBA]Public Sub saveattachtoDisk(itm As Outlook.MailItem)
        Dim objAtt As Outlook.Attachment
        Dim saveFolder As String
        saveFolder = "c:\temp\"   ' already ends with a backslash
        For Each objAtt In itm.Attachments
            objAtt.SaveAsFile saveFolder & objAtt.DisplayName
        Next objAtt
    End Sub[/VBA]

    This next snippet of code is supposed to run OCR using MODI:

    [VBA]Function GetOCRText(TheFile As String) As String
        Dim MyDoc As Object     ' MODI.Document (late bound)
        Dim MyLayout As Object  ' MODI.Layout
        Dim TheWord As Object   ' MODI.Word
        Dim Result As String

        If TheFile = "" Then Exit Function
        On Error GoTo PROC_ERR

        Set MyDoc = CreateObject("MODI.Document")
        MyDoc.Create TheFile
        MyDoc.Images(0).OCR
        Set MyLayout = MyDoc.Images(0).Layout
        ' Join every recognised word on the first page with spaces
        For Each TheWord In MyLayout.Words
            Result = Result & " " & TheWord.Text
        Next TheWord
        GetOCRText = Result & vbCrLf & vbCrLf
        MyDoc.Close False

    PROC_ERR:
        ' Reached on success and on error; the function returns "" if OCR failed
        Set MyLayout = Nothing
        Set MyDoc = Nothing
    End Function[/VBA]

    Please could someone point me in the right direction or give me some code to work with.

    Many Thanks
    Last edited by Aussiebear; 10-10-2012 at 02:22 AM. Reason: Added the correct tags to the supplied code

  2. #2
    VBAX Regular
    Joined
    Sep 2012
    Posts
    8
    Location
    Just a quick note to add:

    I'm on Windows XP using Outlook 2007 and the Microsoft Office Document Imaging 12.0 Type Library.

  3. #3
    VBAX Regular
    Joined
    Sep 2012
    Posts
    8
    Location
    Hi, is there anyone out there that can help me with this?

    Many Thanks

  4. #4
    Quote Originally Posted by nickj
    Hi, wondering if anyone can help me?
    Why have you posted the same question in another thread in this forum?

    Basically, what I want to do is set up an Outlook rule that saves PDF attachments as they come into my inbox, but performs OCR on them before saving, as they are image PDFs. Then I want to save the now text-readable PDF, using the words found within the file between "Purchase Order:" and "Job Number:" as the filename.
    Try the following code to OCR the file and extract the file name as you describe. I think you'll need to write code which saves the .pdf file attachment to a temporary file/folder before running the code below on it, because it does the OCR on a local file. The code uses early binding, so you must set a reference to the MODI library in your VBA project.
    [VBA]
    Sub Test()
        Dim purchaseOrderFileName As String
        ' The function only returns the extracted text; it does not rename the file
        purchaseOrderFileName = Get_Purchase_Order("c:\folder1\folder2\attachment.pdf")
    End Sub

    Function Get_Purchase_Order(fileName As String) As String

        Dim MDoc As MODI.Document
        Dim MLayout As MODI.Layout
        Dim MWord As MODI.Word
        Dim OCRtext As String
        Dim p1 As Long, p2 As Long

        Set MDoc = New MODI.Document

        ' OCR the first page of the file
        MDoc.Create fileName
        MDoc.Images(0).OCR

        ' Join every recognised word with spaces
        Set MLayout = MDoc.Images(0).Layout
        OCRtext = ""
        For Each MWord In MLayout.Words
            OCRtext = OCRtext & " " & MWord.Text
        Next
        MDoc.Close False

        Get_Purchase_Order = ""

        ' Return the text between the two markers, without surrounding spaces
        p1 = InStr(OCRtext, "Purchase Order:")
        If p1 > 0 Then
            p1 = p1 + Len("Purchase Order:")
            p2 = InStr(p1, OCRtext, "Job Number:")
            If p2 > 0 Then Get_Purchase_Order = Trim(Mid(OCRtext, p1, p2 - p1))
        End If

        Set MLayout = Nothing
        Set MDoc = Nothing

    End Function
    [/VBA]
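
    Since the extracted text is meant to become a filename, it may help to clean it first, because OCR output can contain characters Windows rejects in file names. The following is only a sketch under that assumption; Clean_File_Name is a hypothetical helper that does not appear anywhere in this thread, and replacing bad characters with an underscore is an arbitrary choice.

    [VBA]
    ' Hypothetical helper (not from the thread): makes OCR text safe to use
    ' as a Windows file name by trimming it and replacing illegal characters.
    Function Clean_File_Name(ByVal rawText As String) As String
        Dim badChars As Variant
        Dim cleaned As String
        Dim i As Long

        cleaned = Trim$(rawText)

        ' Characters Windows does not allow in file names, plus line breaks and tabs
        badChars = Array("\", "/", ":", "*", "?", """", "<", ">", "|", vbCr, vbLf, vbTab)
        For i = LBound(badChars) To UBound(badChars)
            cleaned = Replace(cleaned, badChars(i), "_")
        Next i

        ' Collapse runs of spaces left behind by the OCR word join
        Do While InStr(cleaned, "  ") > 0
            cleaned = Replace(cleaned, "  ", " ")
        Loop

        Clean_File_Name = cleaned
    End Function
    [/VBA]

    It could be called on the function's result, for example Clean_File_Name(Get_Purchase_Order("c:\folder1\folder2\attachment.pdf")).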

  5. #5
    VBAX Regular
    Joined
    Sep 2012
    Posts
    8
    Location
    Hi Crocus Crow,

    Thank you for the snippet of code. To answer your question, I reposted because I didn't get any response on this thread, so I decided to post it in a thread that was relevant.

    I have tried to run your code but I get an error: Run-time error '-959967229 (c6c81003)': file is empty or corrupted

    I then tried changing the file extension in line 3 of your code to .tif, and then I started getting the error:

    Run-time error '-959966950 (c6c8111a)': IO error

    Finally, when you say that the code uses early binding, does that mean I need to set a reference to the MODI library in the code, or is selecting the reference in the Tools menu in VBA enough?

    Many Thanks

    Nick

  6. #6
    Quote Originally Posted by nickj
    I have tried to run your code but I get an error: Run-time error '-959967229 (c6c81003)': file is empty or corrupted
    Does the file OCR successfully when you do it manually in MODI (the MS Office Document Imaging application)? If it does, then the code should also work and OCR the text successfully.

    I then tried changing the file extension in line 3 of your code to .tif, and then I started getting the error:

    Run-time error '-959966950 (c6c8111a)': IO error
    The code should work with .tif and .jpg files, amongst others. As above, try the file manually in MODI.

    Finally, when you say that the code uses early binding, does that mean I need to set a reference to the MODI library in the code, or is selecting the reference in the Tools menu in VBA enough?
    Early binding means the code uses named MODI object data types (instead of the generic VBA Object type) as in the following lines:
    [vba]
    Dim MDoc As MODI.Document
    Dim MLayout As MODI.Layout
    Dim MWord As MODI.Word
    Set MDoc = New MODI.Document
    [/vba]
    Therefore you must set a reference to the library in the VBA editor via the Tools - References menu; otherwise VBA won't recognise the MODI data types and will give an error. Ticking the library there is enough; no extra code is needed.
    Last edited by SamT; 03-03-2016 at 09:48 AM. Reason: Removed outdated link

  7. #7
    VBAX Regular
    Joined
    Sep 2012
    Posts
    8
    Location
    Hi Crocus Crow, thank you so much for your feedback! I didn't receive an email saying you had replied to my post, so I thought it had not been responded to!

    I have since seen your post, thank you. I have managed to get rid of the errors by using a .tif extension. The only problem now is that the code runs without errors but does not change the filename of attachment.tif; it remains the same! Why would this be?

    Please be advised that I want to run this code every time the following snippet runs. How would I achieve this? Here is the snippet:

    [VBA]Public Sub saveattachtoDisk(itm As Outlook.MailItem)
        Dim objAtt As Outlook.Attachment
        Dim saveFolder As String
        saveFolder = "c:\temp\"   ' already ends with a backslash
        For Each objAtt In itm.Attachments
            objAtt.SaveAsFile saveFolder & objAtt.DisplayName
        Next objAtt
    End Sub
    [/VBA]

  8. #8
    VBAX Master
    Joined
    Jul 2006
    Location
    Belgium
    Posts
    1,289
    Location
    This code will do what you want. There is no error checking for duplicate names!
    You could use the Dir function to count the number of files and add a sequential number to the filename.
    Note that it will save all attachments, including pictures used in a signature.
    [VBA]Public Sub saveattachtoDisk(itm As Outlook.MailItem)
        'attachment
        Dim objAtt As Outlook.Attachment
        'number of attachments
        Dim Attcount As Long
        'savefolder (already ends with a backslash)
        Dim saveFolder As String
        saveFolder = "c:\temp\"
        'if no attachments, skip
        If itm.Attachments.Count <> 0 Then
            'loop through attachments
            For Attcount = 1 To itm.Attachments.Count
                Set objAtt = itm.Attachments.Item(Attcount)
                objAtt.SaveAsFile saveFolder & objAtt.DisplayName
                Set objAtt = Nothing
            Next Attcount
        End If
    End Sub[/VBA]
    Charlize
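
    The Dir suggestion in this post could look roughly like the sketch below. Unique_File_Path is a hypothetical helper, not code from the thread, and the "_1", "_2" suffix scheme is just one possible convention. It assumes the path contains no wildcard characters (* or ?), since Dir would treat those as a pattern.

    [VBA]
    ' Hypothetical helper: returns fullPath unchanged if no such file exists,
    ' otherwise inserts _1, _2, ... before the extension until the name is free.
    Function Unique_File_Path(ByVal fullPath As String) As String
        Dim slashPos As Long, dotPos As Long
        Dim stem As String, ext As String
        Dim candidate As String
        Dim n As Long

        If Len(fullPath) = 0 Then Exit Function

        ' Split "c:\temp\name.pdf" into "c:\temp\name" and ".pdf"
        slashPos = InStrRev(fullPath, "\")
        dotPos = InStrRev(fullPath, ".")
        If dotPos > slashPos Then
            stem = Left$(fullPath, dotPos - 1)
            ext = Mid$(fullPath, dotPos)
        Else
            stem = fullPath
            ext = ""
        End If

        ' Dir returns an empty string when no file matches the path
        candidate = fullPath
        n = 0
        Do While Len(Dir(candidate)) > 0
            n = n + 1
            candidate = stem & "_" & n & ext
        Loop

        Unique_File_Path = candidate
    End Function
    [/VBA]

    In the loop above, the save line would then become objAtt.SaveAsFile Unique_File_Path(saveFolder & objAtt.DisplayName).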

  9. #9
    VBAX Regular
    Joined
    Sep 2012
    Posts
    8
    Location
    Hey Charlize, thank you so much for your feedback. I want to use this piece of code:

    [VBA]Public Sub saveattachtoDisk(itm As Outlook.MailItem)
        'attachment
        Dim objAtt As Outlook.Attachment
        'number of attachments
        Dim Attcount As Long
        'savefolder (already ends with a backslash)
        Dim saveFolder As String
        saveFolder = "c:\temp\"
        'if no attachments, skip
        If itm.Attachments.Count <> 0 Then
            'loop through attachments
            For Attcount = 1 To itm.Attachments.Count
                Set objAtt = itm.Attachments.Item(Attcount)
                objAtt.SaveAsFile saveFolder & objAtt.DisplayName
                Set objAtt = Nothing
            Next Attcount
        End If
    End Sub
    [/VBA]

    With this piece of code, so they work together:

    [VBA]
    Sub Test()
        Dim purchaseOrderFileName As String
        purchaseOrderFileName = Get_Purchase_Order("c:\folder1\folder2\attachment.pdf")
    End Sub

    Function Get_Purchase_Order(fileName As String) As String

        Dim MDoc As MODI.Document
        Dim MLayout As MODI.Layout
        Dim MWord As MODI.Word
        Dim OCRtext As String
        Dim p1 As Long, p2 As Long

        Set MDoc = New MODI.Document

        MDoc.Create fileName
        MDoc.Images(0).OCR

        Set MLayout = MDoc.Images(0).Layout
        OCRtext = ""
        For Each MWord In MLayout.Words
            OCRtext = OCRtext & " " & MWord.Text
        Next
        MDoc.Close False

        Get_Purchase_Order = ""

        ' Return the text between the two markers, without surrounding spaces
        p1 = InStr(OCRtext, "Purchase Order:")
        If p1 > 0 Then
            p1 = p1 + Len("Purchase Order:")
            p2 = InStr(p1, OCRtext, "Job Number:")
            If p2 > 0 Then Get_Purchase_Order = Trim(Mid(OCRtext, p1, p2 - p1))
        End If

        Set MLayout = Nothing
        Set MDoc = Nothing

    End Function
    [/VBA]

    How can I get these two pieces of code to work with one another? I'm a programming newbie, so you're probably laughing at me right now :P

    In a nutshell, what I'm trying to achieve is this: save a PDF attachment from Outlook into a folder, run OCR on the saved image PDF using MODI (Microsoft Office Document Imaging), and then save the now text-readable PDF with a file name taken from the text found in the file between the words "Purchase Order:" and "Job Number:". The macro would be called from within Outlook by a mail rule that runs a script. If you need a template of the image PDF file, I can provide it.

    Any help or pointers would be great.

    Many Thanks

    Nick
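
    One way the two pieces might be wired together is sketched below. It is a sketch under several assumptions, not a tested solution: Get_Purchase_Order from post #4 is in the same VBA project, the MODI reference is set, c:\temp\ exists, and MODI can actually read the saved file (the thread only reports success with a .tif file). Build_Renamed_Path and SaveAndRenameAttachments are hypothetical names. The sketch also shows why the filename did not change earlier: Get_Purchase_Order only returns text, so the rename has to be done separately with the VBA Name statement. It renames the saved file only; it does not produce a text-readable PDF.

    [VBA]
    ' Hypothetical helper: same folder and extension as savedPath, new base name.
    Function Build_Renamed_Path(ByVal savedPath As String, ByVal newBaseName As String) As String
        Dim slashPos As Long, dotPos As Long
        Dim folderPart As String, ext As String

        slashPos = InStrRev(savedPath, "\")
        dotPos = InStrRev(savedPath, ".")
        folderPart = Left$(savedPath, slashPos)      ' includes the trailing backslash
        If dotPos > slashPos Then ext = Mid$(savedPath, dotPos)
        Build_Renamed_Path = folderPart & Trim$(newBaseName) & ext
    End Function

    ' Rule script: save each attachment, OCR it, then rename the saved file.
    Public Sub SaveAndRenameAttachments(itm As Outlook.MailItem)
        Const SAVE_FOLDER As String = "c:\temp\"     ' assumed to exist
        Dim objAtt As Outlook.Attachment
        Dim savedPath As String, poText As String, newPath As String

        For Each objAtt In itm.Attachments
            ' FileName is the attachment's real file name
            savedPath = SAVE_FOLDER & objAtt.FileName
            objAtt.SaveAsFile savedPath

            ' Get_Purchase_Order is the function from post #4; it raises a
            ' run-time error if MODI cannot read the file
            poText = Trim$(Get_Purchase_Order(savedPath))
            If Len(poText) > 0 Then
                newPath = Build_Renamed_Path(savedPath, poText)
                ' Name fails if newPath contains characters Windows does not
                ' allow in file names; skip the rename if the target exists
                If Len(Dir(newPath)) = 0 Then Name savedPath As newPath
            End If
        Next objAtt
    End Sub
    [/VBA]

    The Outlook rule would then point at SaveAndRenameAttachments instead of saveattachtoDisk. Keeping the path logic in its own function makes it easy to check in the Immediate window without needing a mail item.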

  10. #10
    VBAX Newbie
    Joined
    Sep 2015
    Posts
    3
    Location
    Hi, Nick.
    Thanks for sharing that code. I will check it later and send you feedback soon.
    Best Regards,
    Pan

    I am testing PDF extraction SDKs to extract text from PDF files. Any ideas?


    Next Tomorrow is Another Day.


  11. #11
    VBAX Sage SamT's Avatar
    Joined
    Oct 2006
    Location
    Near Columbia
    Posts
    7,814
    Location
    Why have you posted the same question in another thread in this forum?
    Please post a link here to that thread so I can delete that post
    I expect the student to do their homework and find all the errrors I leeve in.


    Please take the time to read the Forum FAQ
