Why can't cniles understand that using null terminated strings is a bad idea, because the computation must start at the beginning of the string a...

Posted on April 29, 2024 by Anon

Why can't cniles understand that using null terminated strings is a bad idea? The computation must start at the beginning of the string and examine each character, in order, until it reaches the null character. This is highly inefficient compared to an explicit length field representation, which allows the computation to be done in constant time by using a memory reference in ILOC. Why are cniles so bad at programming language design in general?
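
For reference, a minimal sketch of the two ways of getting a length that the post compares. The `lstring` struct and both function names are made up for illustration; `scan_length` just mirrors what `strlen` has to do.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical length-carrying string; name and layout are assumptions. */
struct lstring {
    size_t len;        /* number of bytes in data, not counting any terminator */
    const char *data;
};

/* O(n): walk the bytes until the terminating NUL, the way strlen must. */
size_t scan_length(const char *s)
{
    assert(s != NULL);
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

/* O(1): a single read of the stored length field. */
size_t stored_length(struct lstring s)
{
    return s.len;
}
```

The scan costs one comparison per byte; the stored length costs one load no matter how long the string is.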

3 weeks ago

Reply

sage

Why is it the same rust shill posting these threads every day
- 3 weeks ago
  
  Reply
  
  Anonymous
  
  It's because they don't actually write programs in rust, they just complain all day long about people not using rust.
  Once you realize that the people that want you to believe men who cut their dicks off are women, and rust programmers are one and the same sort of people, it will all start making sense.
3 weeks ago

Reply

Anonymous

>use pascal strings
>length prefix gets corrupted because muh cosmic rays
- 3 weeks ago
  
  Reply
  
  Anonymous
  
  >null termination gets corrupted because muh cosmic rays
- 3 weeks ago
  
  Reply
  
  Anonymous
  
  >null termination gets corrupted because muh cosmic rays
  
  >null terminated string corruption
  any character needs to be set precisely to 0
  0 has to be set to any other value
  
  >pascal type string corruption
  length value has to be set to any other value
  
  technically 2nd is more resistant to corruption
  - 3 weeks ago
    
    Reply
    
    Anonymous
    
    There’s nothing preventing you from double or triple terminating c strings
    
    Can’t do the same in pascal
    - 3 weeks ago
      
      Reply
      
      Anonymous
      
      There's also nothing preventing you from making redundant copies of your datums. There's also nothing preventing you from doing both length-prefixed and null-terminated strings if you're that paranoid about muh bitrot.
- 3 weeks ago
  
  Reply
  
  Anonymous
  
  Pascal strings are just as bad as null terminated to be honest and have many of the same issues like not allowing constant-time slicing.
  The length should be next to the pointer, not behind it.
  
  name 1 practical algorithm that becomes constant time when given the length of the string (not "replace 10th character with "b").
  
  >Determine the length
  >Make a subslice
  
  I agree that null terminated strings should not be the default string type.
  Default encoding should be UTF8.
  "character" should not be a byte, but an actual character, which would mean a 32bit value due to UTF8mb4.
  
  Strings should be first class citizens, not something magicked from a byte array, which should be its own thing.
  
  Null terminated strings should be a conscious decision. Only used when you know what you're doing.
  
  Of course native support to UTF8 would introduce some pains since a Character could be anything from 1 to 4 bytes.
  
  And since we're at it, adding some QoL like namespaces would be nice.
  
  Unicode code points need at most 21 bits, not 32, and this is actually very useful because Rust will use the unused values for tags if it can.
  Option<char> has the same size as char in Rust because None is stored as a bit pattern that is not a valid char.
3 weeks ago

Reply

Anonymous

name 1 practical algorithm that becomes constant time when given the length of the string (not "replace 10th character with "b").
- 3 weeks ago
  
  Reply
  
  Anonymous
  
  Determine the length of a string.
3 weeks ago

Reply

Anonymous

forget that! you can't even slice a string in c without a heap allocation due to null termination
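
A rough sketch of what allocation-free slicing looks like once the length travels with the pointer. `str_view`, `view_from_cstr` and `view_slice` are hypothetical names, not from any particular library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical non-owning view: a pointer into someone else's buffer plus a length. */
struct str_view {
    const char *ptr;
    size_t len;
};

/* Wrap a NUL-terminated string; strlen scans it once here. */
struct str_view view_from_cstr(const char *s)
{
    assert(s != NULL);
    struct str_view v = { s, strlen(s) };
    return v;
}

/* Take the sub-range [start, start + count) without copying or allocating.
   The result is generally NOT NUL-terminated. */
struct str_view view_slice(struct str_view v, size_t start, size_t count)
{
    assert(start <= v.len && count <= v.len - start);
    struct str_view out = { v.ptr + start, count };
    return out;
}
```

A view like this can still be printed with `printf("%.*s", (int)w.len, w.ptr)`, but it cannot be handed to anything that expects a terminator, and it is only valid while the underlying buffer is alive.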
- 3 weeks ago
  
  Reply
  
  Anonymous
  
  i'm trans btw
- 3 weeks ago
  
  Reply
  
  Anonymous
  
  >I have piccy wiccies of naked children on my desktop, btw. UwU
- 3 weeks ago
  
  Reply
  
  Anonymous
  
  unless of course you use a stack allocation instead??
  how does any language let you slice strings without allocating unless they use an extra layer of indirection for the buffer?
3 weeks ago

Reply

Anonymous

>computation must start at the beginning of the string
false. it can start at any point

>examine each character
1 check is faster than decrementing a length counter AND checking if remaining length is 0

>allows for the computation to be done in constant time
both are constant time.
3 weeks ago

Reply

Anonymous

>Why can't cniles understand that using null terminated strings is a bad idea
Anon we solved this problem 40 years ago with std::string. Try to keep up.
- 3 weeks ago
  
  Reply
  
  Anonymous
  
  Internally null terminated, no zero copy slice. You can get a zero copy view and then you can sometimes have null terminated and non null terminated string for extra fun.
  - 3 weeks ago
    
    Reply
    
    Anonymous
    
    >Internally null terminated
    For compatibility purposes. Length operation for std::string is constant time assuming you don't cast down to a const char *.
- 3 weeks ago
  
  Reply
  
  Anonymous
  
  >can't split it
- 3 weeks ago
  
  Reply
  
  Anonymous
  
  std::string is the most useless string class ever created. It's so terrible that every C++ programmer ends up making their own.
3 weeks ago

Reply

Anonymous

I agree that null terminated strings should not be the default string type.
Default encoding should be UTF8.
"character" should not be a byte, but an actual character, which would mean a 32bit value due to UTF8mb4.

Strings should be first class citizens, not something magicked from a byte array, which should be its own thing.

Null terminated strings should be a conscious decision. Only used when you know what you're doing.

Of course native support to UTF8 would introduce some pains since a Character could be anything from 1 to 4 bytes.

And since we're at it, adding some QoL like namespaces would be nice.
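
To illustrate the "1 to 4 bytes" point, a small sketch of reading UTF-8 sequence lengths from the lead byte. Both function names are made up, and the counter assumes the input is already well-formed UTF-8 (it does not validate continuation bytes).

```c
#include <assert.h>
#include <stddef.h>

/* Length in bytes of the UTF-8 sequence that starts with `lead`,
   or 0 if `lead` cannot start a sequence (continuation or invalid byte). */
size_t utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80) return 1;                    /* 0xxxxxxx: ASCII */
    if (lead >= 0xC2 && lead <= 0xDF) return 2;   /* 110xxxxx, minus overlong C0/C1 */
    if (lead >= 0xE0 && lead <= 0xEF) return 3;   /* 1110xxxx */
    if (lead >= 0xF0 && lead <= 0xF4) return 4;   /* 11110xxx, up to U+10FFFF */
    return 0;
}

/* Count code points in a byte range by counting every byte that is not
   a continuation byte (10xxxxxx). Assumes well-formed input. */
size_t utf8_count(const char *s, size_t len)
{
    assert(s != NULL || len == 0);
    size_t count = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char b = (unsigned char)s[i];
        if ((b & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

This counts code points, not what a user would call characters; a single on-screen character can span several code points.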
- 3 weeks ago
  
  Reply
  
  Anonymous
  
  >"character" should not be a byte, but an actual character, which would mean a 32bit value due to UTF8mb4.
  That would be a code point. Also, code points don't always map 1:1 to characters, so it ends up being kind of fricked anyways.
- 3 weeks ago
  
  Reply
  
  Anonymous
  
  >"character" should not be a byte, but an actual character, which would mean a 32bit value due to UTF8mb4.
  That would be really dumb. It would waste lots of space, and a code point is not a character anyway. Where are people getting this misconception? Is it from Rust?
  - 3 weeks ago
    
    Reply
    
    Anonymous
    
    If you're desperate enough that you need an array of unknown characters in utf8, then they must be fat enough to fit the biggest codepoints.
    
    I'm not proposing to store strings as arrays of codepoints though. I want a real string type native to the language.
    - 3 weeks ago
      
      Reply
      
      Anonymous
      
      Again, a code point is not a character. There seems to be this misconception among Rust users and other midwits that you can process Unicode text character-by-character like ASCII, except looping over code points instead of bytes. This is not the case. A given code point can be a small part of a character on screen (characters can be composed from 10+ code points). It may be a right-to-left mark, a zero-width space, or another special thing. Having a code point type, or an array of code points, is actually pretty useless. What you generally want is one of these two:
      1) If you're in an embedded environment or otherwise need your code to be very simple and fast, just assume ASCII and refuse to support anything else.
      2) If you want correctness and have the computational power for it, you should treat text as an opaque stream of data that you just hand to some Unicode-aware library like the ICU.
      - 3 weeks ago
        
        Anonymous
        
        I use ASCII everywhere, just swap font bitmap for locales
      - 3 weeks ago
        
        Anonymous
        
        >uhm ackshually
        dumb moron, you are wrong. read the Unicode standard before spouting nonsense. Specifically section "2.4 Code Points and Characters".
      - 3 weeks ago
        
        Anonymous
        
        I'm not sure what you thought that was supposed to prove. Right on the first page of that section, you have an example of a single character on screen being represented by two code points. That just supports my point that it's useless in most programs to have a data type for a code point or to have an array of code points.
      - 3 weeks ago
        
        Anonymous
        
        holy shit, try reading the text that describes the figure, moron.
        >In other instances, an abstract character may be represented by a sequence of two (or more) other encoded characters.
        Also try reading other parts of the standard. Like section "3.4 Characters and Encoding".
        >An abstract character does not necessarily correspond to what a user thinks of as a “character” and should not be confused with a grapheme.
- 3 weeks ago
  
  Reply
  
  Anonymous
  
  >UTF8mb4
  you really thought you were going to sound smart by throwing around random mysql specific terminology. pathetic lmao
- 3 weeks ago
  
  Reply
  
  Anonymous
  
  >Default encoding should be UTF8.
  >Strings should be first class citizens
  And you should kys asap.
3 weeks ago

Reply

Anonymous

>Why can't cniles... (nocoder rust troon bullshit)
3 weeks ago

Reply

Anonymous

Because it's fricking FAST.
3 weeks ago

Reply

Anonymous

>Why can't cniles understand
I do acknowledge it. I consider nul-terminated strings to be one of the four major design mistakes in C (the other three being errno, global locales, and strict aliasing). It would be way better to handle all strings as (char *begin, char *end) or (char *str, size_t len) - either by passing two arguments manually, or by passing/returning a two-element struct by value. The ABI for function arguments is good enough these days that you can pass 6 of them in registers. And even when they hit memory, they still tend to be fast.
That said, solving the problem isn't as easy as just switching to non-nul-terminated strings, perhaps with the help of a small C library. Think about the OS compatibility. Even if you use better strings in your programs, POSIX APIs and the WinAPI expect nul-terminated strings. Which leaves you with two bad options: either you wrap each OS API function that takes a string in something that copies the string to a new buffer with an extra nul byte, incurring a performance penalty and extra allocations, or you put a protective nul byte at the end of each of your strings, ruining the ability to take zero-copy substrings and creating an extra invariant to be maintained.
So I dislike nul-terminated strings but I still use them because that's what's closest to the API of my OS. If you want everyone to switch to non-nul-terminated strings, then get off IQfy and start lobbying Linux, Windows, and BSD devs to create a second version of every string-using function in their API, one that would use (begin, end) or (begin, length) instead of terminating nuls.
>Why are cniles so bad at programming language design in general?
Man, cut Ritchie some slack. Nobody can see the future, and C got the ability to take and return structs by value after it got strings. Besides, it wasn't obvious that nul-terminated strings would be inferior. Hindsight is 20/20.
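
A sketch of the first "bad option" described above: wrapping an API that wants a terminating nul by copying the (pointer, length) string into a temporary buffer. `str_view`, `view_to_cstr` and `fopen_view` are hypothetical names, and `fopen` stands in here for any OS call that takes a path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical (pointer, length) string passed by value; not NUL-terminated. */
struct str_view {
    const char *ptr;
    size_t len;
};

/* Copy the view into a fresh heap buffer with one extra byte for the NUL.
   Returns NULL on allocation failure; the caller frees the result. */
char *view_to_cstr(struct str_view v)
{
    assert(v.ptr != NULL || v.len == 0);
    char *buf = malloc(v.len + 1);
    if (buf == NULL)
        return NULL;
    if (v.len > 0)
        memcpy(buf, v.ptr, v.len);
    buf[v.len] = '\0';
    return buf;
}

/* Wrapper around an API that insists on a NUL-terminated path. */
FILE *fopen_view(struct str_view path, const char *mode)
{
    char *tmp = view_to_cstr(path);
    if (tmp == NULL)
        return NULL;
    FILE *f = fopen(tmp, mode);
    free(tmp);
    return f;
}
```

Every call through the wrapper pays for one allocation and one copy, which is exactly the penalty the post is complaining about.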
- 3 weeks ago
  
  Reply
  
  Anonymous
  
  Nice post, have a (You)
3 weeks ago

Reply

Anonymous

>Why can't cniles understand that using null terminated strings is a bad idea,
Where are you getting this idea, anon? Everyone understands that it was a bad idea.
3 weeks ago

Reply

Anonymous

>xer hasn't written their own String struct
typedef struct {
    unsigned len;
    char *data;
} String;
- 3 weeks ago
  
  Reply
  
  Anonymous
  
  Can't use it with libraries or syscalls. If you're going to use a string library anyway just for the code you control, use sds
  - 3 weeks ago
    
    Reply
    
    Anonymous
    
    Why do you think syscalls need null terminated strings?
    - 3 weeks ago
      
      Reply
      
      Anonymous
      
      man 2 open
      Anything that uses paths in an OS that was written in C takes a null terminated string.
      - 3 weeks ago
        
        Anonymous
        
        And this means you can't use length prefixed strings in your own code because...?
      - 3 weeks ago
        
        Anonymous
        
        Can't use it with libraries or syscalls. If you're going to use a string library anyway just for the code you control, use sds
        
        Reading comprehension
        >If you're going to use a string library anyway just for the code you control, use sds
      - 3 weeks ago
        
        Anonymous
        
        You have no point to make.
      - 3 weeks ago
        
        Anonymous
        
        >Anything that uses paths in an OS that was written in C takes a null terminated string.
        AND?
        Filenames and by extension file paths can't contain a nul character
        https://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap03.html#tag_03_169
      - 3 weeks ago
        
        Anonymous
        
        >Filenames and by extension file paths can't contain a nul character
        Yeah that's how null term strings work.
      - 3 weeks ago
        
        Anonymous
        
        That's because null strings were used to implement them.
        POSIX filenames can literally be any arbitrary sequence of bytes except they can't contain a slash or a null.
        The former is obviously because it's the path separator, the latter because they're null strings.
        
        Would've been fine to let them contain slashes though. If anything, it's more inane that they can contain newlines. Would've made more sense to only allow them to contain printable characters.
        >Zero width spaces are allowed in filenames
        My face.
      - 3 weeks ago
        
        Anonymous
        
        >Would've made more sense to only allow them to contain printable characters.
        POSIX encourages implementations to do that
      - 3 weeks ago
        
        Anonymous
        
        Yeah, I've so much shell code that essentially assumes no filename will contain a newline and I doubt they will, but I still wish it were possible to say mount filesystems with an option that would make it reject any attempt at creating a filename that contains a newline or something just to be sure.
  - 3 weeks ago
    
    Reply
    
    Anonymous
    
    >sds
    I didn't know about that lib but by reading the doc, it's an impressive one.
    The idea of adding a header before the actual pointer returned by the string creation is ingenious.
    - 3 weeks ago
      
      Reply
      
      Anonymous
      
      isn't that how most malloc impls work?
3 weeks ago

Reply

Anonymous

There is nothing preventing you from using length-prefixed strings in C. In fact, many libraries are available.
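
One possible layout for such a library, sketched under assumptions: the length lives in a header placed just before the pointer the caller gets, and the bytes stay NUL-terminated so they still work with ordinary C functions. This is the general idea credited to sds elsewhere in the thread, not its actual API; every `hstr_*` name here is made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header stored immediately before the character data. */
struct hstr_header {
    size_t len;
    char buf[];   /* C99 flexible array member: the bytes the caller sees */
};

/* Allocate header + bytes + NUL and return a pointer to the bytes. */
char *hstr_new(const char *init, size_t len)
{
    assert(init != NULL || len == 0);
    struct hstr_header *h = malloc(sizeof *h + len + 1);
    if (h == NULL)
        return NULL;
    h->len = len;
    if (len > 0)
        memcpy(h->buf, init, len);
    h->buf[len] = '\0';
    return h->buf;
}

/* Step back from the data pointer to the header and read the length: O(1). */
size_t hstr_len(const char *s)
{
    const struct hstr_header *h =
        (const struct hstr_header *)(s - offsetof(struct hstr_header, buf));
    return h->len;
}

/* Free through the header, since that is what malloc returned. */
void hstr_free(char *s)
{
    if (s != NULL)
        free(s - offsetof(struct hstr_header, buf));
}
```

The returned pointer can go straight into `puts` or `open`, while `hstr_len` never scans. The catch is that only pointers that came from `hstr_new` may be passed to `hstr_len` or `hstr_free`.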
3 weeks ago

Reply

Anonymous

How large should this prefix field be?
Where should it be located?
3 weeks ago

Reply

Anonymous

but even if you know the length of the array you will have to go one by one and keep count. null terminated sounds more efficient
3 weeks ago

Reply

Anonymous

>cniles
if you cant speak without using idiotic terms then your opinion is probably meaningless
3 weeks ago

Reply

Anonymous

>computation must start at the beginning of the string and examine each character, in order, until it reaches the null character
That's not how that works at all, no. Also real programs don't do much if any string operations.
- 3 weeks ago
  
  Reply
  
  Anonymous
  
  >That's not how that works at all
  Yes it is, nocoder.
3 weeks ago

Reply

Anonymous

Getting rid of null terminated strings is almost never worth it. Recently I converted one of my applications to pointer/length, and then realized the code was much worse and rolled back the changes.
3 weeks ago

Reply

Anonymous

skill issue
struct string {
    size_t size;
    char *s;
};

3 weeks ago

Reply

Anonymous

None of this matters.
- 3 weeks ago
  
  Reply
  
  Anonymous
  
  Welcome to Electron HQ, you are hired
3 weeks ago

Reply

Anonymous

>just learned what null terminated strings are thread
people just use pointer + length when it matters in real world code
- 3 weeks ago
  
  Reply
  
  Anonymous
  
  >pointer
  *segfaults*
  - 3 weeks ago
    
    Reply
    
    Anonymous
    
    wow you're pretty bad at programming
3 weeks ago

Reply

Anonymous

So don't use null terminated strings. C does not force it on you. A "string" is just an array of bytes. Make a struct and do your own string.
>but muh standard library!!!
Don't use it or convert to/from on the rare occasion that you do.
- 3 weeks ago
  
  Reply
  
  Anonymous
  
  In this day and age where ram is cheap, allocating an extra byte for null termination so you can have backwards compatibility with syscalls and the standard library is not a big deal. Just do both. Keep a reference to length and null terminate.
3 weeks ago

Reply

Anonymous

they are bad but C is still better than everything else out there
3 weeks ago

Reply

Anonymous

Mr. God himself used null-terminated strings when intelligently designing the genetic code. If it's good enough for the Big Man, it's good enough for little ol' me.
- 3 weeks ago
  
  Reply
  
  Anonymous
  
  is that why DNA is 90% junk code?
  - 3 weeks ago
    
    Reply
    
    Anonymous
    
    >Rustroon cosplays as a geneticist
- 3 weeks ago
  
  Reply
  
  Anonymous
  
  It's honestly actually bizarre that cells actually function and have for so long.
  They're such a bizarre analog nanomachine inside that doesn't work well at all but somehow creaks along; there are hundreds of transcription errors with every cell division.
  
  It's really quite amazing how that Ruth Goldberg machine can actually work.
  - 3 weeks ago
    
    Reply
    
    Anonymous
    
    >Ruth Goldberg
    Rube Goldberg, you transmonger
  - 3 weeks ago
    
    Reply
    
    Anonymous
    
    cells are thermodynamic machines, not logic machines, so a few information processing mishaps is not the end of the world as long as the stable state is maintained
- 3 weeks ago
  
  Reply
  
  Anonymous
  
  >God was a Cking
  Rustgays will never have this
3 weeks ago

Reply

Anonymous

it ends up not mattering for utf8 because to parse utf8 you need to scan each letter one by one, checking if the letter is null is just as fast as checking if the cursor reached the end.
most functions related to performance in C already take size as a separate parameter (like reading / writing a file).
it turns out that there aren't many situations where you can use the size of the string for optimizations (because of utf8 / formatting) except when you are just copying the memory (reading / writing a file), or comparing strings (strcmp is for comparing null terminated strings, convenient because you only have 2 parameters, if the strings are the same size you can use memcmp).
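
A small sketch of the comparison case mentioned above, assuming both strings come with their lengths. `bytes_equal` is a made-up name; `memcmp` is the standard library call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Equality for two (pointer, length) strings. */
bool bytes_equal(const char *a, size_t a_len, const char *b, size_t b_len)
{
    assert((a != NULL || a_len == 0) && (b != NULL || b_len == 0));
    if (a_len != b_len)
        return false;   /* different lengths: no bytes examined at all */
    if (a_len == 0)
        return true;    /* avoid handing a NULL pointer to memcmp */
    return memcmp(a, b, a_len) == 0;
}
```

With known lengths, unequal-length strings are rejected without touching the data, and embedded zero bytes are compared like any other byte, which `strcmp` cannot do.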
3 weeks ago

Reply

Anonymous

>muh nullterm strings are slow
haha vpcmpeqb and vptest go burrrr
3 weeks ago

Reply

Anonymous

If you handle arbitrary data, there is no way to know the length of an array ahead of time.
- 3 weeks ago
  
  Reply
  
  Anonymous
  
  What? Why not?
3 weeks ago

Reply

Anonymous

I wonder why rustroon always treat their own language like it's some kind of E.coli
Does it remind them of their axe wound?
Is their goal to infect everything and dies out?
