I'm programming a web scraper to get into making network requests, but I'm having trouble getting the text in the death notes. The text I'm looking to scrape is "piedie", for example. It doesn't show up in the page source, nor does it show in the requests in the network tab. However, it appears in the HTML when I inspect element, which I believe means it's DOM content. How would I scrape this?

  1. 2 months ago
    Anonymous

    Here's the website btw. Every tutorial I look up on web scraping keeps saying there's some magical separate site or API request they're finding, but I don't see it. The only thing that gets loaded in the network requests is "viewReport.php", which doesn't have the text.
    >https://blankmediagames.com/Trial/viewReport.php?id=3195768

    • 2 months ago
      Anonymous

      The text on the notes is base64 encoded in the "data-info" html attribute of the notes

      What is this homosexual gibberish in the first place? I hope you're doing all this to dox troons.

      • 2 months ago
        Anonymous

        It's from a game called "Town of Salem", a social deduction game. There's a trial report system: if two people report the same person, it goes to this website where people can publicly view it and "jurors" can vote on whether the report is guilty or innocent (very meta). After a certain amount of time it gets closed without judgement and archived. I want to scrape all of the reports as a sort of web scraping experiment and also to compare some data en masse.

        • 2 months ago
          Anonymous

          Now the question is: would I get noticed if I make approximately over 812,000 requests? That's how many I need to make to get the remaining reports. And could I do this all using "requests" for Python and not "selenium"? I don't want to have to use a headless browser since I think it'll take up too much RAM.

          • 2 months ago
            Anonymous

            You can do it very easily just using requests. Just do something like this:

            import base64

            import requests
            from bs4 import BeautifulSoup

            s = requests.Session()
            resp = s.get("https://blankmediagames.com/Trial/viewReport.php?id=3195768")
            # name the parser explicitly so bs4 doesn't warn and guess
            soup = BeautifulSoup(resp.text, "html.parser")
            # each note element carries its text base64-encoded in data-info
            notes = [
                base64.b64decode(elem.attrs["data-info"]).decode()
                for elem in soup.find_all(class_="note")
            ]
            print(notes)

            I generally wait 1 sec between each request when scraping to not get blocked. That would take roughly 10 days with 812k requests though, so up to you
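
            A sketch of what the full loop might look like with that delay baked in. The `extract_notes` helper, the error handling, and the generator shape are my additions; the `.note` / `data-info` structure is assumed from the page as described in this thread:

            ```python
            import base64
            import time

            import requests
            from bs4 import BeautifulSoup

            BASE = "https://blankmediagames.com/Trial/viewReport.php"

            def extract_notes(html):
                # each .note element carries its text base64-encoded in data-info
                soup = BeautifulSoup(html, "html.parser")
                return [
                    base64.b64decode(elem.attrs["data-info"]).decode()
                    for elem in soup.find_all(class_="note")
                ]

            def scrape_range(start, stop, delay=1.0):
                with requests.Session() as s:
                    for report_id in range(start, stop):
                        try:
                            resp = s.get(BASE, params={"id": report_id}, timeout=10)
                            resp.raise_for_status()
                            yield report_id, extract_notes(resp.text)
                        except requests.RequestException:
                            yield report_id, None  # missing/archived report, skip it
                        time.sleep(delay)  # be polite: one request per second
            ```

            Since it's a generator, you can write results out as they arrive instead of holding 800k+ reports in memory.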

          • 2 months ago
            Anonymous

            they only care if you're downloading massive amounts of data, it's not about the number of requests. it's gonna catch their attention if one IP is downloading 100gb+ of data, and i doubt you're going to be doing that with this website

          • 2 months ago
            Anonymous

            nobody checks logs

          • 2 months ago
            Anonymous

            >812,000 requests
            Holy hell. Have you considered just... asking them to send you the data?

          • 2 months ago
            Anonymous

            they will never do that freely. either they will charge you, or claim that it goes against the privacy agreement

          • 2 months ago
            Anonymous

            how did you get that number? are you saying that there are 812K reports?

          • 2 months ago
            Anonymous

            Allow me to explain, I obviously did the smartest thing here in this situation. For every report there's an ID, so if you go to https://blankmediagames.com/Trial/viewReport.php?id=1 it shows the very first report. However every report after the 5th and before (I think) the 3195764th is archived, partly due to them being old but mostly because of a data breach (classic Josh). So from that number up to the latest report number, which is 4015140, I have 819,376 reports that I can scrape. I have to confess that I'm not sure about this number (3195764) because so far the way I've been going about this is changing the url id in increments of 100s or 10s to find which reports are available. I haven't automated this yet, give me a chance.

            >812,000 requests
            >Holy hell. Have you considered just... asking them to send you the data?

            I have but I think they'd say no and begin checking logs...
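
            The arithmetic above, sanity-checked (taking 3195764 as the last archived ID and 4015140 as the latest):

            ```python
            # IDs run from the one after the last archived report up to the latest
            last_archived = 3195764
            latest = 4015140
            print(latest - last_archived)  # -> 819376
            ```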

          • 2 months ago
            Anonymous

            >most of them is because of a data breach
            *archived; information about when and who was reported can be viewed by DMing their Discord bot, but it can't be seen by visiting the link directly.

      • 2 months ago
        Anonymous

        meds, now

  2. 2 months ago
    Anonymous

    The text on the notes is base64 encoded in the "data-info" html attribute of the notes
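
    For example, decoding one by hand in Python (the encoded value here is made up to match OP's "piedie"; real values come from the note elements on the page):

    ```python
    import base64

    # hypothetical data-info value: base64 of "piedie"
    encoded = "cGllZGll"
    print(base64.b64decode(encoded).decode())  # -> piedie
    ```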

    • 2 months ago
      Anonymous

      Oh damn, thx anon. Can other information be encoded in other ways in there as well, or is it just base64?

      • 2 months ago
        Anonymous

        No idea, I just took a quick look at the html of the site and noticed the base64 encoding on each of the notes and attempted to decode it. Taking a quick look, there doesn't seem to be anything other than the notes that is base64 encoded at least

  3. 2 months ago
    Anonymous

    okay, I'd say pay for a VPN that allows multiple devices, set up some VMs on your pc, set different locations on each one and run that python stuff they posted earlier, pick a decent wait time, and switch IPs every now and then. I guess it depends how fast you need to get hold of the data. I have done this in the past to get around file sharing site limitations, just an idea

    • 2 months ago
      Anonymous

      what the heck, why would you do that instead of using http proxies?

      • 2 months ago
        Anonymous

        >http proxies
        That's a thing? I'm googling this, I'm learning something new every day on here.

        • 2 months ago
          Anonymous

          yep, it lets the script itself use multiple IPs instead of having to route your whole computer through a VPN
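
          in `requests` a proxy is just a dict, set per-request or on a whole session (the address below is a placeholder, substitute a real HTTP proxy):

          ```python
          import requests

          # placeholder proxy address, not a real endpoint
          proxies = {
              "http": "http://127.0.0.1:8080",
              "https": "http://127.0.0.1:8080",
          }

          s = requests.Session()
          s.proxies.update(proxies)  # everything on this session now goes through the proxy

          # or one-off, without a session:
          # requests.get(url, proxies=proxies)
          ```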
