I'm programming a web scraper to get into making network requests, but I'm having trouble getting the text in the death notes. The text I'm looking to scrape is "piedie", for example. It doesn't show up in the page source, nor does it show in the requests in the network tab. However, it appears in the HTML when I inspect element, which I believe means it's DOM content. How would I scrape this?

  1. 2 months ago
    Anonymous

    Here's the website btw. Every tutorial I look up on web scraping keeps saying there's some magical separate site or API request they're finding, but I don't see it. The only thing that gets loaded in the network requests is "viewReport.php", which doesn't have the text.
    >https://blankmediagames.com/Trial/viewReport.php?id=3195768

    • 2 months ago
      Anonymous

      The text on the notes is base64 encoded in the "data-info" html attribute of the notes

      What is this homosexual gibberish in the first place? I hope you're doing all this to dox troons.

      • 2 months ago
        Anonymous

        It's from a game called "Town of Salem", a social deduction game. There's a trial report system: if two people report the same person, it goes to this website where people can publicly view it and "jurors" can vote on whether the report is guilty or innocent (very meta). After a certain amount of time it gets closed without judgement and archived. I want to scrape all of the reports as a sort of web scraping experiment and also to compare some data en masse.

        • 2 months ago
          Anonymous

          Now the question is: would I get noticed if I make approximately over 812,000 requests? That's how many I need to make to get the remaining reports. And could I do this all using "requests" for Python and not "selenium"? I don't want to have to use a headless browser since I think it'll take up too much RAM.

          • 2 months ago
            Anonymous

            You can do it very easily just using requests. Just do something like this:

            import base64

            import requests
            from bs4 import BeautifulSoup

            s = requests.Session()
            resp = s.get("https://blankmediagames.com/Trial/viewReport.php?id=3195768")
            # name the parser explicitly so bs4 doesn't warn and guess
            soup = BeautifulSoup(resp.text, "html.parser")
            # each note element carries its text base64-encoded in data-info
            notes = [
                base64.b64decode(elem.attrs["data-info"]).decode()
                for elem in soup.find_all(class_="note")
            ]
            print(notes)

            I generally wait 1 sec between each request when scraping to not get blocked. That would take roughly 10 days with 812k requests though, so up to you
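
            A sketch of what the full loop might look like with that delay baked in. The `extract_notes` helper, the error handling, and the generator shape are my additions; the `.note` / `data-info` structure is assumed from the page as described in this thread:

            ```python
            import base64
            import time

            import requests
            from bs4 import BeautifulSoup

            BASE = "https://blankmediagames.com/Trial/viewReport.php"

            def extract_notes(html):
                # each .note element carries its text base64-encoded in data-info
                soup = BeautifulSoup(html, "html.parser")
                return [
                    base64.b64decode(elem.attrs["data-info"]).decode()
                    for elem in soup.find_all(class_="note")
                ]

            def scrape_range(start, stop, delay=1.0):
                with requests.Session() as s:
                    for report_id in range(start, stop):
                        try:
                            resp = s.get(BASE, params={"id": report_id}, timeout=10)
                            resp.raise_for_status()
                            yield report_id, extract_notes(resp.text)
                        except requests.RequestException:
                            yield report_id, None  # missing/archived report, skip it
                        time.sleep(delay)  # be polite: one request per second
            ```

            Since it's a generator, you can write results out as they arrive instead of holding 800k+ reports in memory.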

          • 2 months ago
            Anonymous

            they only care if you're downloading massive amounts of data, it's not about the number of requests. it's gonna catch their attention if one IP is downloading 100gb+ of data, and i doubt you're going to be doing that with this website

          • 2 months ago
            Anonymous

            nobody checks logs

          • 2 months ago
            Anonymous

            >812,000 requests
            Holy hell. Have you considered just... asking them to send you the data?

          • 2 months ago
            Anonymous

            they will never do that freely. either they will charge you, or claim that it goes against the privacy agreement

          • 2 months ago
            Anonymous

            how did you get that number? are you saying that there are 812K reports?

          • 2 months ago
            Anonymous

            Allow me to explain, I obviously did the smartest thing here in this situation. For every report there's an ID, so if you go to https://blankmediagames.com/Trial/viewReport.php?id=1 it shows the very first report. However every report after the 5th and before (I think) the 3195764th is archived, partly due to them being old but mostly because of a data breach (classic Josh). So from that number up to the latest report number, which is 4015140, I have 819,376 reports that I can scrape. I have to confess that I'm not sure about this number (3195764) because so far the way I've been going about this is changing the url id in increments of 100s or 10s to find which reports are available. I haven't automated this yet, give me a chance.

            >812,000 requests
            >Holy hell. Have you considered just... asking them to send you the data?

            I have but I think they'd say no and begin checking logs...
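
            The arithmetic above, sanity-checked (taking 3195764 as the last archived ID and 4015140 as the latest):

            ```python
            # IDs run from the one after the last archived report up to the latest
            last_archived = 3195764
            latest = 4015140
            print(latest - last_archived)  # -> 819376
            ```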

          • 2 months ago
            Anonymous

            >most of them is because of a data breach
            *archived; information about when and who was reported can be viewed by DMing their Discord bot, but it can't be seen by visiting the link directly.

      • 2 months ago
        Anonymous

        meds, now

  2. 2 months ago
    Anonymous

    The text on the notes is base64 encoded in the "data-info" html attribute of the notes
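
    For example, decoding one by hand in Python (the encoded value here is made up to match OP's "piedie"; real values come from the note elements on the page):

    ```python
    import base64

    # hypothetical data-info value: base64 of "piedie"
    encoded = "cGllZGll"
    print(base64.b64decode(encoded).decode())  # -> piedie
    ```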

    • 2 months ago
      Anonymous

      Oh damn, thx anon. Can other information be encoded in other ways in there as well, or is it just base64?

      • 2 months ago
        Anonymous

        No idea, I just took a quick look at the html of the site and noticed the base64 encoding on each of the notes and attempted to decode it. Taking a quick look, there doesn't seem to be anything other than the notes that is base64 encoded at least

  3. 2 months ago
    Anonymous

    okay, I'd say pay for a VPN that allows multiple devices, set up some VMs on your pc, set different locations on each one and run that python stuff they posted earlier, pick a decent wait time, and switch IPs every now and then. I guess it depends how fast you need to get hold of the data. I have done this in the past to get around file sharing site limitations, just an idea

    • 2 months ago
      Anonymous

      what the heck, why would you do that instead of using http proxies?

      • 2 months ago
        Anonymous

        >http proxies
        That's a thing? I'm googling this, I'm learning something new every day on here.

        • 2 months ago
          Anonymous

          yep, it lets the script itself use multiple IPs instead of having to route your whole computer through a VPN
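
          in `requests` a proxy is just a dict, set per-request or on a whole session (the address below is a placeholder, substitute a real HTTP proxy):

          ```python
          import requests

          # placeholder proxy address, not a real endpoint
          proxies = {
              "http": "http://127.0.0.1:8080",
              "https": "http://127.0.0.1:8080",
          }

          s = requests.Session()
          s.proxies.update(proxies)  # everything on this session now goes through the proxy

          # or one-off, without a session:
          # requests.get(url, proxies=proxies)
          ```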
