Are there any recommended lists for robots.txt?

Are there any recommended lists for robots.txt? I only want to allow robots that benefit me or the public (e.g. search engines, universities, etc.) but block everything that crawls only for its own benefit (e.g. ChatGPT, archive websites).
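
To make it concrete, this is roughly the file I have in mind. The blocked user agents are just the AI/data-mining crawlers whose published names I know of (GPTBot, ChatGPT-User, CCBot, Bytespider, Google-Extended); the list goes stale fast, so treat it as a sketch rather than a definitive block list:

    # AI / data-mining crawlers: give them nothing
    User-agent: GPTBot
    User-agent: ChatGPT-User
    User-agent: CCBot
    User-agent: Bytespider
    User-agent: Google-Extended
    Disallow: /

    # everyone else (ordinary search engines etc.) may crawl everything
    User-agent: *
    Disallow: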

  1. 2 weeks ago
    Anonymous

    we're archivists b***h we don't give a FRICK about robots.txt.

    • 2 weeks ago
      Anonymous

      Well, obviously I can't enforce it, but most corporate bots follow the instructions. I don't want companies like ChatGPT to crawl my site; they can go frick themselves.

      • 2 weeks ago
        Anonymous

        >most corporate bots follow the instructions

      • 2 weeks ago
        Anonymous

        >most corporate bots follow the instructions

      • 2 weeks ago
        Anonymous

        >most corporate bots follow the instructions
        you can't seriously believe this, anon?

        the only bots following robots.txt are the ones that you actually want to index your site
        the others don't give a shit and spoof their user-agent in the first place

      • 2 weeks ago
        Anonymous

        >most corporate bots follow the instructions

  2. 2 weeks ago
    Anonymous

    >robots.txt
    heh

  3. 2 weeks ago
    Anonymous

    Disallow: *
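
    Strictly speaking that needs a group header, and "*" as a path is only treated as a wildcard by some parsers; if the intent is to block everything, the classic form is:

        User-agent: *
        Disallow: /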

  4. 2 weeks ago
    Anonymous

    >YEA.. UHM AH AHELLO DEAREST INDEXER BOTTERINOS..!!
    >PLS DO !!NOT!! INDEX THESES SPECIFIC RESOURCES FROM MY WEBPAGE!!
    >ITS HECKIN PRIVATERINO!! ... SO JUST IGNORE NOTHINNG TO SEE HERE!!
    This is what IQfyacas actually believe.

    • 2 weeks ago
      Anonymous

      You fricking Black person learn how to read. I don't want to suck corporate wiener and give them free money if it doesn't benefit me. I don't care about you cringe script kiddies, cloudflare will take care of them.

      • 2 weeks ago
        Anonymous

        Holy frick you are moronic.

  5. 2 weeks ago
    Anonymous
    • 2 weeks ago
      Anonymous

      you should have edited it to wink at the end

  6. 2 weeks ago
    Anonymous

    i hear chinese crawlers in particular will literally rape your site. just a heads up.

    • 2 weeks ago
      Anonymous

      Yeah. Bytedance doesn't give a frick about your robots.txt, and they damn near DDoS your site.
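
      If it keeps announcing itself as Bytespider in the user agent, the practical move is to refuse it at the server instead of asking nicely in robots.txt. A minimal nginx sketch, assuming the UA string stays recognizable:

          # inside the server { } block
          if ($http_user_agent ~* "Bytespider") {
              return 403;   # refuse ByteDance's crawler outright
          }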

      • 2 weeks ago
        Anonymous

        >bytedance
        What a shitty name. It sounds like the startup names me and my friend would have come up with when we were 16 and thought we were geniuses.

      • 2 weeks ago
        Anonymous

        i don't remember if it was bytedance but i remember it was some chinese shit. i've been told it was partially my fault for not protecting my site enough / setting it up correctly and that MAYBE played a role. but holy fricking shit. i had to pull the plug for a bit.

        • 2 weeks ago
          Anonymous

          did you really have to pull the plug?

      • 2 weeks ago
        Anonymous

        How do you know it's bytedance?

    • 2 weeks ago
      Anonymous

      Yes they do. My site is literally 99% traffic from chinese bots.

      • 2 weeks ago
        Anonymous

        Just block SYN packets with a TTL higher than 128. That takes out most phones, which also takes out most bot farms. Also disable IPv6, then block any SYN packets with an MSS other than 1460. That also eliminates some VPN users.
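
        Taking those numbers at face value, a rough iptables sketch of the idea (it will also lock out legitimate users behind those network paths, so it's a tradeoff, not a recommendation):

            # drop TCP SYNs arriving with a remaining TTL above 128
            iptables -A INPUT -p tcp --syn -m ttl --ttl-gt 128 -j DROP
            # drop TCP SYNs whose MSS is anything other than 1460
            iptables -A INPUT -p tcp --syn -m tcpmss ! --mss 1460 -j DROP
            # turn IPv6 off so nothing bypasses the v4 rules
            sysctl -w net.ipv6.conf.all.disable_ipv6=1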

        • 2 weeks ago
          Anonymous

          It's just a small site on a cheap web host; I can't change anything. I'm not complaining as long as the site still works, it's just sad seeing that 99% of the traffic comes from bots. No wonder big sites all use Cloudflare these days.

          • 2 weeks ago
            Anonymous

            Cloudflare, AFAIK, does not have controls to do what I suggest, whereas this can be done on any cheap little VM. They take the more expensive approach of trying to really know who is a bot and who is not. They get it wrong a lot. I take the fascist approach of just blocking phones and have no regrets.

  7. 2 weeks ago
    Anonymous

    No, but lots of crawlers kindly put their names in the user agent. Maybe you can dynamically serve a robots.txt depending on that.
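
    If the server is nginx, a map on the user agent can hand back different files for /robots.txt. A rough sketch; the bot names and file names are placeholders:

        map $http_user_agent $robots_variant {
            default                                      allow;
            "~*(GPTBot|ChatGPT-User|CCBot|Bytespider)"   deny;
        }

        server {
            root /var/www/html;   # assumes robots-allow.txt and robots-deny.txt live here

            location = /robots.txt {
                # self-identified crawlers get the deny file, everyone else the allow file
                rewrite ^ /robots-$robots_variant.txt last;
            }
        }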

    • 2 weeks ago
      Anonymous

      Didn't someone already do that and make a list that I can use?

  8. 2 weeks ago
    Anonymous

    Ah, so you're looking to host your site on the sickdarknet.

  9. 2 weeks ago
    Anonymous

    it's the first thing i look at when i want to download super secret hacker stuff

  10. 2 weeks ago
    Anonymous

    Any equivalent .txt file I can add to stop black people using my site?

    • 2 weeks ago
      Anonymous

      Blackbots.txt

    • 2 weeks ago
      Anonymous

      IP block Africa and the United States

  11. 2 weeks ago
    Anonymous

    I completely block morons without H2. Don't care, not my problem
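
    In nginx terms that can be as blunt as checking the negotiated protocol, assuming http2 is enabled on the listen directive:

        # inside the server { } block: reject anything that didn't speak HTTP/2
        if ($server_protocol != "HTTP/2.0") {
            return 403;
        }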

  12. 2 weeks ago
    Anonymous

    the fact that you are advanced enough in your quest to host a publicly available web service to worry about it being crawled, yet don't seem to understand that the only way to prevent crawling is to not make your website publicly available, is disheartening and frankly saddening.
