Are there some recommended lists for robots.txt?

Are there some recommended lists for robots.txt? I only want to allow robots that benefit me or the public (e.g. search engines, universities, etc.) but block everything that only crawls for its own benefit (e.g. ChatGPT, archive websites).
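
For illustration, here is a minimal sketch of the allowlist idea in the question: generate a robots.txt that allows a few named crawlers and disallows everyone else, then check it with Python's standard `urllib.robotparser`. The names in `ALLOWED_BOTS` are assumptions, not a vetted list; each crawler's own documentation is the place to confirm its user-agent token. As the replies point out, this only affects crawlers that choose to obey robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical allowlist: these tokens are assumptions, verify each one
# against the crawler operator's documentation before relying on it.
ALLOWED_BOTS = ["Googlebot", "Bingbot", "DuckDuckBot"]


def build_robots_txt(allowed):
    """Allow the listed crawlers everywhere and disallow all others."""
    lines = []
    for bot in allowed:
        lines += [f"User-agent: {bot}", "Allow: /", ""]
    # Catch-all group for every crawler not named above.
    lines += ["User-agent: *", "Disallow: /", ""]
    return "\n".join(lines)


def can_fetch(robots_txt, user_agent, url="https://example.com/page"):
    """Ask the standard-library parser whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    text = build_robots_txt(ALLOWED_BOTS)
    print(text)
    print("Googlebot allowed:", can_fetch(text, "Googlebot"))
    print("GPTBot allowed:", can_fetch(text, "GPTBot"))
```

The default-deny group means new, unknown crawlers are excluded without having to chase a blocklist, at the cost of also excluding well-behaved bots nobody thought to list.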

  1. 1 month ago
    Anonymous

    we're archivists b***h we don't give a FRICK about robots.txt.

    • 1 month ago
      Anonymous

      Well, obviously I can't enforce it, but most corporate bots follow the instructions. I don't want companies like ChatGPT to crawl my site; they can go frick themselves.

      • 1 month ago
        Anonymous

        >most corporate bots follow the instructions
        you can't seriously believe this anon?

        the only bots following robots.txt are the ones that you actually want to index your site
        the others don't give a shit and spoof their user-agent in the first place
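
Since the user-agent header can be spoofed, one common way to check that a request claiming to be a search crawler really is one is forward-confirmed reverse DNS: resolve the IP to a hostname, check the hostname's domain, then resolve the hostname back and confirm it includes the original IP. The sketch below assumes the hostname suffixes shown (an assumption to confirm against the crawler operator's documentation) and takes the resolver functions as parameters so it can be exercised without network access. `socket.gethostbyname_ex` covers IPv4 only.

```python
import socket

# Assumed hostname suffixes for one search crawler; confirm them against
# the operator's own documentation before using this for real decisions.
CRAWLER_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_crawler(ip, suffixes=CRAWLER_SUFFIXES,
                        reverse=socket.gethostbyaddr,
                        forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS check for a client IP address."""
    try:
        hostname = reverse(ip)[0]          # IP -> hostname
    except OSError:
        return False
    if not hostname.endswith(suffixes):    # hostname must be in an expected domain
        return False
    try:
        addresses = forward(hostname)[2]   # hostname -> list of IPv4 addresses
    except OSError:
        return False
    return ip in addresses                 # must map back to the same IP
```

A request whose user-agent claims to be a known crawler but fails this check can be treated as spoofed.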

  2. 1 month ago
    Anonymous

    >robots.txt
    heh

  3. 1 month ago
    Anonymous

    User-agent: *
    Disallow: /

  4. 1 month ago
    Anonymous

    >YEA.. UHM AH AHELLO DEAREST INDEXER BOTTERINOS..!!
    >PLS DO !!NOT!! INDEX THESES SPECIFIC RESOURCES FROM MY WEBPAGE!!
    >ITS HECKIN PRIVATERINO!! ... SO JUST IGNORE NOTHINNG TO SEE HERE!!
    This is what IQfyaca's actually believe.

    • 1 month ago
      Anonymous

      You fricking Black person learn how to read. I don't want to suck corporate wiener and give them free money if it doesn't benefit me. I don't care about you cringe script kiddies, cloudflare will take care of them.

      • 1 month ago
        Anonymous

        Holy frick you are moronic.

  5. 1 month ago
    Anonymous
    • 1 month ago
      Anonymous

      you should have edited it to wink at the end

  6. 1 month ago
    Anonymous

    i hear chinese crawlers in particular will literally rape your site. just a heads up.

    • 1 month ago
      Anonymous

      Yeah. Bytedance don't give a frick about your robots.txt, and they damn near DDoS your site.

      • 1 month ago
        Anonymous

        >bytedance
        What a shitty name. It sounds like the startup names me and my friend would come up with when we were 16 and thought we had genius.

      • 1 month ago
        Anonymous

        i don't remember if it was bytedance but i remember it was some chinese shit. i've been told it was partially my fault for not protecting my site enough / setting it up correctly and that MAYBE played a role. but holy fricking shit. i had to pull the plug for a bit.

        • 1 month ago
          Anonymous

          did you really have to pull the plug?

      • 1 month ago
        Anonymous

        How do you know it's bytedance?

    • 1 month ago
      Anonymous

      Yes they do. My site is literally 99% traffic from chinese bots.

      • 1 month ago
        Anonymous

        Just block SYN packets with a TTL higher than 128. That takes out most phones, which also takes out most bot farms. Also disable IPv6, then block any SYN packets with an MSS other than 1460. That also eliminates some VPN users.

        • 1 month ago
          Anonymous

          It's just some small site on a cheap web host, so I can't change anything. I'm not complaining as long as the site is still working; it's just sad seeing that 99% of the traffic comes from bots. No wonder big sites are all using Cloudflare these days.

          • 1 month ago
            Anonymous

            Cloudflare AFAIK does not have controls to do what I suggest whereas this can be done on any little cheap VM. They take the more expensive approach by trying to really know who is a bot and who is not. They get it wrong a lot. I take the fascist approach of just blocking phones and have no regrets.
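
The SYN filtering rule suggested in this subthread can be modeled as a small decision function. This is only a sketch of the logic with the thresholds copied from the post as stated (TTL cutoff 128, MSS exactly 1460); it is not validated against real traffic, and an actual deployment would express the rule in the host firewall rather than in Python. The function and parameter names are hypothetical. For reference, common initial TTLs are 64 (Linux, Android, iOS, macOS) and 128 (Windows), and the observed TTL is the initial value minus the hop count, so it is worth checking your own logs before trusting any cutoff.

```python
def syn_allowed(ttl, mss, max_ttl=128, required_mss=1460):
    """Return True if a SYN with this observed TTL and MSS passes the rule.

    Thresholds default to the values quoted in the thread; they are the
    poster's choices, not verified recommendations.
    """
    if ttl > max_ttl:          # drop SYNs arriving with a TTL above the cutoff
        return False
    if mss != required_mss:    # drop SYNs whose MSS option is not the expected value
        return False
    return True


if __name__ == "__main__":
    print(syn_allowed(ttl=115, mss=1460))  # passes both checks
    print(syn_allowed(ttl=240, mss=1460))  # fails the TTL check
    print(syn_allowed(ttl=115, mss=1380))  # fails the MSS check
```

Keeping the thresholds as parameters makes it easy to replay logged TTL and MSS values and see how many legitimate visitors a given cutoff would also block.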

  7. 1 month ago
    Anonymous

    No, but lots of crawlers kindly and needfully put their names in the user agent. Maybe you can dynamically serve a robots.txt depending on that.

    • 1 month ago
      Anonymous

      Didn't someone already do that and make a list that I can use?
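
A rough sketch of the "dynamic robots.txt" idea from this subthread: pick the response body based on the requesting user-agent header. The tokens in `ALLOWED_TOKENS` and the function name are hypothetical, and this only helps against crawlers that identify themselves honestly and then obey what they are served.

```python
# Hypothetical allowlist of lowercase substrings to look for in the
# User-Agent header; these are assumptions, not a maintained list.
ALLOWED_TOKENS = ("googlebot", "bingbot")

ALLOW_ALL = "User-agent: *\nAllow: /\n"
DENY_ALL = "User-agent: *\nDisallow: /\n"


def robots_txt_for(user_agent):
    """Return the robots.txt body to serve for this User-Agent header."""
    ua = (user_agent or "").lower()
    if any(token in ua for token in ALLOWED_TOKENS):
        return ALLOW_ALL
    return DENY_ALL  # unknown or missing user agents get the restrictive file


if __name__ == "__main__":
    print(robots_txt_for("Mozilla/5.0 (compatible; Googlebot/2.1)"))
    print(robots_txt_for("GPTBot/1.0"))
```

In a real site this function would sit behind whatever handler serves `/robots.txt`; a static file with per-crawler groups achieves much the same result without any server-side logic.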

  8. 1 month ago
    Anonymous

    Ah, so you're looking to host your site on the sickdarknet.

  9. 1 month ago
    Anonymous

    it's the first thing i look at for when i want to download super secret hacker stuff

  10. 1 month ago
    Anonymous

    Any equivalent .txt file I can add to stop black people using my site?

    • 1 month ago
      Anonymous

      Blackbots.txt

    • 1 month ago
      Anonymous

      IP block Africa and the United States

  11. 1 month ago
    Anonymous

    I completely block morons without H2. Don't care, not my problem

  12. 1 month ago
    Anonymous

    the fact you are advanced enough in your quest to host a publicly available web service to worry about it being crawled yet don't seem to understand that the only way to prevent it is to not have your website be publicly available is disheartening and frankly saddening.
