Results 1 to 19 of 19

Thread: Extracting the URLs from a tweet

  1. #1
    VBAX Regular
    Joined
    Oct 2014
    Posts
    95
    Location

    Extracting the URLs from a tweet

    I have cobbled together code which works but it's extremely slow and the PC sounds as if it wants to die when it runs it.

    The objective is copy and paste, from a tweet hyperlink, the http URL and the pic URL to other columns.

    For example, take a random tweet: "Secrets for crafting the perfect headline http://buff.ly/2dFmHJF pic.twitter.com/u8cE252Gur"

    We would end up with the existing hyperlink in one column, http://buff.ly/2dFmHJF in another column, and "pic.twitter.com/u8cE252Gur" in a third column.

    In addition to being slow and clumsy, my code also cannot discriminate between "pic.twitter" and "picture" or "Olympics", which means that a lot of the results are garbage.

    Also the 1:300000 ranges should actually dynamically restrict the range to the last used row.

    An added complication is that some tweets contain more than 1 URL. I had no idea how to tackle that when I did this code.

    So I'd be really grateful for help in streamlining this.

    '*************EXTRACT URLS FROM THE TWEET***********************************
        
    'It seems we need to add an additional space before http and pic for the URLs to be captured intact
    
    Dim Lrow As Long
    Dim rngC As Range
    
    With ActiveSheet
        Lrow = .Cells(Rows.Count, 1).End(xlUp).Row
        For Each rngC In .Range("A1:A" & Lrow)
            rngC.Replace what:="http", replacement:=" http"
        Next
    End With
    
    With ActiveSheet
        Lrow = .Cells(Rows.Count, 1).End(xlUp).Row
        For Each rngC In .Range("A1:A" & Lrow)
            rngC.Replace what:="pic", replacement:=" pic"
        Next
    End With
    
         
    'This extracts the URLs for http and pic addresses and places them in rows M and N. It then replaces the formulas with values.
          
    Range("M1:M300000").FormulaR1C1 = "=RIGHT(C[-12],LEN(C[-12])-FIND("" http"",C[-12]))"
    
    With Range("m1").CurrentRegion
            .Cells.Copy
            .Cells.PasteSpecial xlPasteValues
            .Cells(1).Select
    End With
    Application.CutCopyMode = False
          
    Range("N1:N300000").FormulaR1C1 = "=LEFT(C[-1],FIND("" "",C[-1]&"" "")-1)"
        
    With Range("n1").CurrentRegion
            .Cells.Copy
            .Cells.PasteSpecial xlPasteValues
            .Cells(1).Select
    End With
    Application.CutCopyMode = False
            
    Range("O1:O300000").FormulaR1C1 = "=RIGHT(C[-14],LEN(C[-14])-FIND("" pic"",C[-14]))"
    
    With Range("O1").CurrentRegion
            .Cells.Copy
            .Cells.PasteSpecial xlPasteValues
            .Cells(1).Select
    End With
    Application.CutCopyMode = False
              
    'This strips errors left over from the formulas
              
    Cells.Replace "#VALUE!", "", xlWhole
    
       'This reinserts the extra spaces put in before http and pic.
            
       Range("A1:A300000").Replace what:=" http", _
                                replacement:="http", _
                                LookAt:=xlPart, _
                                SearchOrder:=xlByRows, _
                                MatchCase:=False, _
                                SearchFormat:=False, _
                               ReplaceFormat:=False
      
       Range("A1:A300000").Replace what:=" pic", _
                                replacement:="pic", _
                                LookAt:=xlPart, _
                                SearchOrder:=xlByRows, _
                                MatchCase:=False, _
                                SearchFormat:=False, _
                               ReplaceFormat:=False

  2. #2
    VBAX Sage SamT's Avatar
    Joined
    Oct 2006
    Location
    Near Columbia
    Posts
    7,710
    Location
    VBA Rules:
    All http URLs start with "http://" and end with... the next space? "pic.twi"? What?

    All Image URLs start with "pic.twitter.com/" and end with... what? The next Space? The next non-AlphaNumeric Character?
    I expect the student to do their homework and find all the errrors I leeve in.


    Please take the time to read the Forum FAQ

  3. #3
    VBAX Regular
    Joined
    Oct 2014
    Posts
    95
    Location
    Thanks Sam. Yes, http URLs start with "http://" or https://* and usually end with the next space.

    Image URLs start with "pic.twitter.com/" and usually end with the next space.

    Except often they are at the end of the string (so no space or character afterwards).

  4. #4
    snb
    Guest
    Please post a sample of 20 elements.
    You should use arrays to do the splitting: avoid worksheet interaction.

  5. #5
    VBAX Regular
    Joined
    Oct 2014
    Posts
    95
    Location

    Sample attached

    Quote Originally Posted by snb View Post
    Please post a sample of 20 elements.
    You should use arrays to do the splitting: avoid worksheet interaction.
    A sample workbook attached (see sheet1).

    Arrays sounds a great approach.

    Tweet URLs.xlsx

  6. #6
    snb
    Guest
    In your file:
    In the codemodule of sheet1:

    Sub M_snb()
       sn = Cells(1).CurrentRegion.Resize(, 3)
       For j = 1 To UBound(sn)
          If InStr(sn(j, 1), "http://") Then sn(j, 2) = "http://" & Split(Split(Replace(sn(j, 1), Chr(160), " "), "http://")(1))(0)
          If InStr(sn(j, 1), "pic.twitter.") Then sn(j, 3) = "pic.twitter." & Split(sn(j, 1), "pic.twitter.")(1)
       Next
       
       Cells(30, 1).Resize(UBound(sn), 3) = sn
    End Sub
    or
    Sub M_snb()
       Columns(1).Replace Chr(160), " "
       Columns(1).Replace "pic.", " pic."
       
       sn = Cells(1).CurrentRegion.Resize(, 3)
       For j = 1 To UBound(sn)
          If InStr(sn(j, 1), "http://") Then sn(j, 2) = Filter(Split(sn(j, 1)), "http://")(0)
          If InStr(sn(j, 1), "pic.twitter.") Then sn(j, 3) = Filter(Split(sn(j, 1)), "pic.twitter.")(0)
       Next
       
       Cells(30, 1).Resize(UBound(sn), 3) = sn
    End Sub
    Last edited by snb; 01-20-2017 at 08:46 AM.

  7. #7
    VBAX Regular
    Joined
    Oct 2014
    Posts
    95
    Location
    Thank you. I am getting Compile error - variable not defined at line:
       sn = Cells(1).CurrentRegion.Resize(, 3)
    .

    How could that be fixed?

    Thanks.

  8. #8
    VBAX Sage SamT's Avatar
    Joined
    Oct 2006
    Location
    Near Columbia
    Posts
    7,710
    Location
    Dim sn As Variant
    Dim j as Long
    Dim sn
    Dim j
    Dim sn, j
    I expect the student to do their homework and find all the errrors I leeve in.


    Please take the time to read the Forum FAQ

  9. #9
    VBAX Regular
    Joined
    Oct 2014
    Posts
    95
    Location
    Thanks, Sam - that's terrific.

    How would the code need to change, so that instead of splitting the "http*" and "pic*), they were cut from the original text.

    So the result would be:

    #AmyGoodman of #DemocracyNow is facing riot charges in ND for a #FacebookLive post in Sept. http://www.prnewsonline.com/democracy-now-facebook-live pic.twitter.com/E8xWQ1mnc3
    #antibullyingweek: We’ve joined forces w/ @ditchthelabel. Our interactive infographic looks at hate speech online http://bit.ly/2fRo6uN pic.twitter.com/2wPMjxfSn3
    #Apple Anticipation: Buzz Building for Tomorrow’s Event—But Who's Buying? http://buff.ly/2bWCFtL pic.twitter.com/Qy7Ig8fg2X
    #Apple Anticipation: Buzz Building for Tuesday's Event—But Who's Buying? http://ln.is/bulldogreporter.com/TyJh7
    #Apple maintains position as most relevant brand via @BulldogReporter: http://ht.ly/WAIo305YBsw
    #Apple responds to diversity criticism: "We had a Canadian" onstage at iPhone 7 event @MelanieHannah for @mic http://ow.ly/j8aJ3049dIC
    #Are We Social Fakes? Only 11% of Millennials Believe People Are Authentic on Social @Cynny … #PR #Publicrelations http://ln.is/bulldogreporter.com/P5yXX
    #ArtificialIntelligence-powered malware is coming - and it's going to be terrifying #AI #cybersecurity #IoT http://read.bi/2cYOZym pic.twitter.com/Hd2qTbYOK3
    #Asia the best placed to drive e-commerce innovation @Accenture's Simon Eaves, @CampaignAsia http://ow.ly/gwOb30429w1 pic.twitter.com/RTfneM3MJ6
    #AskanAnalyst: See how @FR314 unleashes the power of Brandwatch Audiences - http://brnw.ch/2dorGzv pic.twitter.com/RtAum9R9FY
    #AskAnExpert: @G_Price tells us what it takes to uncover real social insights - http://brnw.ch/2d0HuDK pic.twitter.com/FW4f9on37z
    #AskAnExpert: @Kit_Smith has essential advice on how brands can expertly position themselves - http://brnw.ch/2cNd89c pic.twitter.com/27l87sIBG7
    #AskAnExpert: @NathalieNahai looks at the psychology that drives social media behavior - http://brnw.ch/2d73Ln3 pic.twitter.com/iQFapbkzMn
    #AskAnExpert: Brand loyalty, we all want it but how do we get it? Use these tips to build a solid foundation - http://brnw.ch/2e10ovN pic.twitter.com/u78zPzzOvY
    #AskAnExpert: Disconnects in data can be an effective way of identifying insights - http://brnw.ch/2d0HuDK pic.twitter.com/I0uedq2YgY
    #AskAnExpert: How to guide your customers' perceptions of your brand with a brand positioning statement - http://brnw.ch/2cNd89c pic.twitter.com/4T69cmRcXq
    #Australia first developed print market in world to kill audits–advertisers say nothing Tim Burrowes, @Mumbrellanews http://mumbrel.la/2i2UgqC
    #Australia hasn't had a recession in 25 years--what the rest of the world can learn @wef http://wef.ch/2cSW3I1 pic.twitter.com/7s77EwMWX3
    #AutumnStatement – Business should communicate its 'red lines' to Government now: @MarkHenryGlover @NewingtonComms http://bit.ly/2fIF9y4 pic.twitter.com/HBuMHTuDfD
    #AWCIC16 Cardiff Model for Reducing Violence using A&E Data link Via Prof Shepherd http://bit.ly/2gkrSPq pic.twitter.com/HOZDrMg3tY
    #B2B Marketers See Customer Experience As An Exciting Opportunity. So What Are Their #CX Priorities? http://bit.ly/2fwXkqU
    #Baidu launches medical #chatbot to help Chinese doctors diagnose patients @jjvincent for @verge http://ow.ly/3BOR305cjvE
    #BehindTheHeadlines with @LauraBQuigley VP of #Comms @CMAphysicians: | #branding #PR #Media http://bit.ly/2fz3KcP pic.twitter.com/G73GNQgOhv
    #BehindtheHeadlines With Nathan Friedman (@nathanf99) http://bit.ly/2ffjxNL pic.twitter.com/L743dIaGYV

  10. #10
    snb
    Guest
    Keep it simple: remove 'Option Explicit'.
    Sub M_snb() 
      Columns(1).Replace Chr(160), " " 
      Columns(1).Replace "pic.", " pic." 
         
      sn = Cells(1).CurrentRegion.Resize(, 3) 
      For j = 1 To UBound(sn)
        sn(j, 1)= trim(Split(sn(j, 1), "http://")(0))
        sn(j, 2) = Filter(Split(sn(j, 1)), "http://")(0) 
        If InStr(sn(j, 1), "pic.twitter.") Then sn(j, 3) = Filter(Split(sn(j, 1)), "pic.twitter.")(0) 
      Next 
         
      Cells(30, 1).Resize(UBound(sn), 3) = sn 
    End Sub

  11. #11
    snb
    Guest
    see #10

  12. #12
    VBAX Regular
    Joined
    Oct 2014
    Posts
    95
    Location
    Thanks snb, but when using the code in comment #10, I am getting an error: "Run time error 9 Subscript out of range", and the debug points to this line:
    sn(j, 2) = Filter(Split(sn(j, 1)), "http://")(0)
    .

  13. #13
    snb
    Guest
    In that case you'll have to adapt the code.

  14. #14
    VBAX Regular
    Joined
    Oct 2014
    Posts
    95
    Location
    I have tried to reply to my own question in comment #9 (How would the code need to change, so that instead of splitting the "http*" and "pic*), they were cut from the original text) with this code, adapted from http://stackoverflow.com/questions/2...other-cell-vba, but I've got the ranges wrong (and maybe other things beside). Grateful for any help.

    Option Explicit
    
    Sub M_snb()
    
    Dim sn As Variant
    Dim j As Long
    
        Columns(1).Replace Chr(160), " "
        Columns(1).Replace "pic.", " pic."
         
        sn = Cells(1).CurrentRegion.Resize(, 3)
        For j = 1 To UBound(sn)
            If InStr(sn(j, 1), "http") Then sn(j, 2) = Filter(Split(sn(j, 1)), "http")(0)
            If InStr(sn(j, 1), "pic.twitter.") Then sn(j, 3) = Filter(Split(sn(j, 1)), "pic.twitter.")(0)
        Next
         
        Cells(30, 1).Resize(UBound(sn), 3) = sn
    
    sn = Cells(1).CurrentRegion.Resize(, 3)
    For j = 1 To UBound(sn)
    
    Dim rLongString As Range
    Dim rShortString As Range
    Dim I As Long
    Dim LastRow As Long
    
    Set rLongString = Range("A30:A" & LastRow)
    Set rShortString = Range("B30:C" & LastRow)
    
    For I = 1 To rLongString.Count
        rLongString(I).Replace what:=rShortString(I), replacement:="", lookat:=xlPart
    Next I
    Next
    
    End Sub

  15. #15
    snb
    Guest
    Sub M_snb() 
        Columns(1).Replace Chr(160), " " 
        Columns(1).Replace "pic.", " pic." 
         
        sn = Cells(1).CurrentRegion.Resize(, 3) 
        For j = 1 To UBound(sn) 
            if instr(sn(j,1),"http://") then
               sn(j, 1)= trim(Split(sn(j, 1), "http://")(0)) 
               sn(j, 2) = Filter(Split(sn(j, 1)), "http://")(0) 
            end if
            If InStr(sn(j, 1), "pic.twitter.") Then sn(j, 3) = Filter(Split(sn(j, 1)), "pic.twitter.")(0) 
        Next 
         
        Cells(30, 1).Resize(UBound(sn), 3) = sn 
    End Sub

  16. #16
    VBAX Regular
    Joined
    Oct 2014
    Posts
    95
    Location
    Quote Originally Posted by snb View Post
    Sub M_snb() 
        Columns(1).Replace Chr(160), " " 
        Columns(1).Replace "pic.", " pic." 
         
        sn = Cells(1).CurrentRegion.Resize(, 3) 
        For j = 1 To UBound(sn) 
            if instr(sn(j,1),"http://") then
               sn(j, 1)= trim(Split(sn(j, 1), "http://")(0)) 
               sn(j, 2) = Filter(Split(sn(j, 1)), "http://")(0) 
            end if
            If InStr(sn(j, 1), "pic.twitter.") Then sn(j, 3) = Filter(Split(sn(j, 1)), "pic.twitter.")(0) 
        Next 
         
        Cells(30, 1).Resize(UBound(sn), 3) = sn 
    End Sub
    Thanks snb. I am getting Runtime error 9: Subscript out of range at this line:

    sn(j, 2) = Filter(Split(sn(j, 1)), "http://")(0)

    Please could you advise a fix?

  17. #17
    snb
    Guest
    Not with the data in the file you posted.

  18. #18
    VBAX Regular
    Joined
    Oct 2014
    Posts
    95
    Location
    Thanks snb. My error message won't go away. I've tried to source a cure, but it seems to be one of those errors with lots of possible causes.

    I am on Excel 2010. Don't know if that makes a difference.

  19. #19
    snb
    Guest
    It has only got to do with the content of your data, not with the VBA code.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •