>be me. >trying to scrape a semi-large website, not a F500 business or anything but decently sized

>be me
>trying to scrape a semi-large website, not a F500 business or anything but decently sized
>need to send roughly 25000 get requests in total
>they keep rate limiting and IP banning me every ~600 requests
>legally there's nothing they can do because all of this is public with no login
>still annoying bc it takes 30 mins to change VPN and get everything going again
>they're also ready as soon as my crawler restarts so I usually have to wait until midnight to start scraping again
>realize that they're probably going to implement anti-bot measures soon because they clearly know what i'm doing and are actively trying to stop me, but haven't gotten around to automating it yet
>as a last ditch effort, start researching company more and more to see if there's a way around this, maybe someone has had the issue before
>end up on company's website, the lead software dude's headshot and contact is under the "about us" section for some reason
>google the guy's name
>see an obituary pop up, wtf
>mfw his teenage daughter had died in a car accident a week or so prior
>see funeral date posted, company i'm trying to scrape is involved
>wait for funeral to happen
>start scraping
>...
>3 hours of scraping and I haven't been IP banned
>mfw everyone had the day off and was probably mourning at the funeral
>mfw i scraped tens of thousands of listings from their site over the next 20 hours
>even went overboard and started scraping shit I didn't think was possible/needed bc of time constraints
>they ended up adding a more robust login system as well as captchas not even a week later but by then it was too late, i already had everything

never give up bros

  1. 1 month ago
    Anonymous

    incredibly based. bumping so reddit trannies can repost this.

    • 1 month ago
      Anonymous

      Put me in the screencap

  2. 1 month ago
    Anonymous

    Even if this is a LARP, there's something fun about web scraping and botting

  3. 1 month ago
    Anonymous

    well done anon

  4. 1 month ago
    Anonymous

    What were you scraping anon?

    • 1 month ago
      Anonymous

      i won't say what it is specifically for obvious reasons, but it's a website that sells things online, auction style. so the idea is we are ripping information about the items sold and will use it for a project, similar to an aggregator but more advanced.

      • 1 month ago
        Anonymous

        Oh, so Bidspotter?

      • 1 month ago
        Anonymous

        I don't exactly understand the purpose, wouldn't the information be instantly outdated? Or is that not relevant?

  5. 1 month ago
    Anonymous

    >scrape detection bot is disabled when employees aren't in the office
    ???

    • 1 month ago
      Anonymous

      >tfw company "scrape detection bot" is just a sysadmin with tmux open on one screen

  6. 1 month ago
    Anonymous
  7. 1 month ago
    Anonymous

    >mfw his teenage daughter had died in a car accident a week or so prior
    God killed his daughter for trying to hinder your efforts.

  8. 1 month ago
    Anonymous

    Apart from OP being an absolute homosexual as usual, you could just use residential proxies for that.

    • 1 month ago
      Anonymous

      Not OP, but recommend me a provider. Everything I've seen looks sketchy af.

      • 1 month ago
        Anonymous

        Same

      • 1 month ago
        Anonymous

        roundproxy
        lemonclub

  9. 1 month ago
    Anonymous

    King

  10. 1 month ago
    Anonymous

    Noice one

  11. 1 month ago
    Anonymous

    Nothing wrong with a good scrape.
    Recommend headless selenium.
    Recommend "Developer Tools" to see if an API call comes up. Worst case use the selenium-wire library to sniff the API/JSON call internally, if they have the API guarded by cookies and tokens and shit. That can make parsing much easier.
    Worst case use a proxy rotator library and you don't need to be rebooting VPNs
    Beyond user agent and cookie frickery, also note browser fingerprinting is a thing so ideally start modulating that too.

    I was working at a very seedy pro-scraping startup and my scraping got so aggressive and sophisticated (I went for the Gibson computer) that they had to let me go :/

    • 1 month ago
      Anonymous

      What's a good way to protect your web API from scraping? You already mentioned some things, anything else you would find difficult? Maybe some time-based shit? In the end it's all security through obfuscation I guess, as the official client still needs to be able to request it
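
On the "time-based" idea above: one common option is a per-client token bucket, which lets a client burst a few requests and then holds it to a steady rate. What follows is only a minimal sketch under stated assumptions, not anything the site in the story is known to use. The names `TokenBucket` and `RateLimiter`, the `client_id` key (an IP address or API key, say) and the default numbers are all hypothetical. As the question already notes, this slows a scraper down rather than stopping it, since a client that rotates identities gets a fresh bucket each time.

```python
import time


class TokenBucket:
    """Holds up to `capacity` tokens, refilled at `rate` tokens per second."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity   # start full so a new client can burst
        self.updated = None      # timestamp of the previous request, if any

    def allow(self, now):
        # Refill in proportion to the time elapsed, capped at capacity.
        if self.updated is not None:
            elapsed = max(0.0, now - self.updated)
            self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        # Each request costs one token; refuse when the bucket is empty.
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class RateLimiter:
    """Keeps one bucket per client (hypothetical key: IP address or API key)."""

    def __init__(self, rate=1.0, capacity=5.0):
        self.rate = rate
        self.capacity = capacity
        self.buckets = {}

    def allow(self, client_id, now=None):
        # `now` can be injected for testing; otherwise use a monotonic clock.
        if now is None:
            now = time.monotonic()
        if client_id not in self.buckets:
            self.buckets[client_id] = TokenBucket(self.rate, self.capacity)
        return self.buckets[client_id].allow(now)
```

A request handler would call `limiter.allow(client_id)` and answer with HTTP 429 when it returns `False`. The state here lives in one process's memory, so a deployment with several servers would need a shared store instead; that part is left out of the sketch.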

  12. 1 month ago
    Anonymous

    we do a little scraping

  13. 1 month ago
    Anonymous

    homosexual

  14. 1 month ago
    Sneedy Pie

    I stopped scrolling because I saw a cool combo of two characters I like - Pepe the Frog and Coolface.
    I read the post and I enjoyed the story.
    Thanks for sharing, OP. Glad you got your goods in the end.

  15. 1 month ago
    Anonymous

    scraping status: scraped beyond the grave

  16. 1 month ago
    Anonymous

    You're a bad person.

    • 1 month ago
      Anonymous

      Sysadmins/security engineers shouldn't leave their data public-facing if they don't want it scraped/looked at
      This is an easy fix on the company's side that takes an hour at most to implement

      • 1 month ago
        Sneedy Pie

        t. architect of that particular websight

  17. 1 month ago
    Anonymous

    I've given up on you. I'm sorry for your mother and father. I wouldn't be able to talk to mine if I did something ghoulish like that.

  18. 1 month ago
    Anonymous

    cringe

  19. 1 month ago
    Anonymous

    properly evil

  20. 1 month ago
    Anonymous

    Use the proper APIs Black person

  21. 1 month ago
    Anonymous

    Dude how did it go from netscape and IE having a "save website for offline viewing" button to scraping being the greatest sin? Also what the frick is google doing if not scraping every fricking website known to man?

  22. 1 month ago
    Anonymous

    homosexual

  23. 1 month ago
    Anonymous
