Why can't cniles understand that using null terminated strings is a bad idea? Computing the length must start at the beginning of the string and examine each character, in order, until it reaches the null character. This is highly inefficient compared to an explicit length field representation, which allows the computation to be done in constant time by using a memory reference in ILOC. Why are cniles so bad at programming language design in general?
Why is it the same rust shill posting these threads every day
It's because they don't actually write programs in rust, they just complain all day long about people not using rust.
Once you realize that the people that want you to believe men who cut their dicks off are women, and rust programmers are one and the same sort of people, it will all start making sense.
>use pascal strings
>length prefix gets corrupted because muh cosmic rays
>null termination gets corrupted because muh cosmic rays
>null terminated string corruption
any character needs to be set precisely to 0
0 has to be set to any other value
>pascal type string corruption
length value has to be set to any other value
technically 2nd is more resistant to corruption
There’s nothing preventing you from double or triple terminating c strings
Can’t do the same in pascal
There's also nothing preventing you from making redundant copies of your datums. There's also nothing preventing you from doing both length-prefixed and null-terminated strings if you're that paranoid about muh bitrot.
Pascal strings are just as bad as null terminated to be honest and have many of the same issues like not allowing constant-time slicing.
The length should be next to the pointer, not behind it.
>Determine the length
>Make a subslice
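A minimal sketch of what "length next to the pointer" buys you, assuming a hypothetical `struct strview` (the type and function names are made up for illustration, not from any library). Taking a subslice is just pointer arithmetic, so it is constant time and copies nothing:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "fat pointer" view: it borrows the bytes, it does not own them. */
struct strview {
    const char *ptr;
    size_t len;
};

/* Constant-time subslice [start, end): adjusts pointer and length, no copy. */
struct strview strview_slice(struct strview s, size_t start, size_t end)
{
    assert(start <= end && end <= s.len);
    struct strview out = { s.ptr + start, end - start };
    return out;
}
```

The view is only valid while the underlying buffer is alive, and the slice is generally not nul-terminated, so it cannot be handed directly to functions that expect a C string.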
Unicode codepoints are 21 bits, not 32, and this is actually very useful because Rust will use those other bits for tags if it can.
Option<char> has the same size as char in Rust because it uses one of the invalid char values (anything above 0x10FFFF) as the tag for None.
name 1 practical algorithm that becomes constant time when given the length of the string (not "replace 10th character with 'b'").
Determine the length of a string.
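For the length case specifically, the difference can be sketched like this. `struct lenstr` and both function names are hypothetical, just for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical length-carrying string. */
struct lenstr {
    size_t len;        /* number of bytes in data, not counting any terminator */
    const char *data;
};

/* O(n): walks the bytes until it finds the terminating 0, like strlen does. */
size_t cstr_length(const char *s)
{
    assert(s != NULL);
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

/* O(1): a single field read, independent of how long the string is. */
size_t lenstr_length(const struct lenstr *s)
{
    assert(s != NULL);
    return s->len;
}
```

Real strlen implementations scan several bytes per step, but the work still grows with the length of the string.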
forget that! you can't even slice a string in c without a heap allocation due to null termination
i'm trans btw
>I have piccy wiccies of naked children on my desktop, btw. UwU
unless of course you use a stack allocation instead??
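A rough sketch of the stack-buffer route, assuming the caller knows the slice bounds are inside the source string (the function name is made up). It avoids the heap, but it still copies, because the slice needs its own terminator:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies n bytes of s starting at offset start into dst and nul-terminates.
   Returns 0 on success, -1 if dst is too small.
   Assumption: the caller guarantees start + n <= strlen(s). */
int cstr_slice_into(char *dst, size_t dst_size,
                    const char *s, size_t start, size_t n)
{
    assert(dst != NULL && s != NULL);
    if (n >= dst_size)          /* need room for n bytes plus the terminator */
        return -1;
    memcpy(dst, s + start, n);
    dst[n] = '\0';
    return 0;
}
```

With a local `char buf[8];`, calling `cstr_slice_into(buf, sizeof buf, "hello world", 6, 5)` leaves "world" in buf without touching malloc.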
how does any language let you slice strings without allocating unless they use an extra layer of indirection for the buffer?
>computation must start at the beginning of the string
false. it can start at any point
>examine each character
1 check is faster than decrementing a length counter AND checking if remaining length is 0
>allows for the computation to be done in constant time
both are constant time.
>Why can't cniles understand that using null terminated strings is a bad idea
Anon we solved this problem 40 years ago with std::string. Try to keep up.
Internally null terminated, no zero copy slice. You can get a zero copy view and then you can sometimes have null terminated and non null terminated string for extra fun.
>Internally null terminated
For compatibility purposes. Length operation for std::string is constant time assuming you don't cast down to a const char *.
>can't split it
std::string is the most useless string class ever created. It's so terrible that every C++ programmer ends up making their own.
I agree that null terminated strings should not be the default string type.
Default encoding should be UTF8.
"character" should not be a byte, but an actual character, which would mean a 32bit value due to UTF8mb4.
Strings should be first class citizens, not something magicked from a byte array, which should be its own thing.
Null terminated strings should be a conscious decision. Only used when you know what you're doing.
Of course native support to UTF8 would introduce some pains since a Character could be anything from 1 to 4 bytes.
And since we're at it, adding some QoL like namespaces would be nice.
>"character" should not be a byte, but an actual character, which would mean a 32bit value due to UTF8mb4.
That would be a code point. Also, code points don't always map 1:1 to characters, so it ends up being kind of fricked anyways.
>"character" should not be a byte, but an actual character, which would mean a 32bit value due to UTF8mb4.
That would be really dumb. It would waste lots of space, and a code point is not a character anyway. Where are people getting this misconception? Is it from Rust?
If you're desperate enough that you need an array of unknown characters in utf8, then they must be fat enough to fit the biggest codepoints.
I'm not proposing to store strings as arrays of codepoints though. I want a real string type native to the language.
Again, a code point is not a character. There seems to be this misconception among Rust users and other midwits that you can process Unicode text character-by-character like ASCII, except looping over code points instead of bytes. This is not the case. A given code point can be a small part of a character on screen (characters can be composed from 10+ code points). It may be a right-to-left mark, a zero-width space, or another special thing. Having a code point type, or an array of code points, is actually pretty useless. What you generally want is one of these two:
1) If you're in an embedded environment or otherwise need your code to be very simple and fast, just assume ASCII and refuse to support anything else.
2) If you want correctness and have the computational power for it, you should treat text as an opaque stream of data that you just hand to some Unicode-aware library like the ICU.
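To make the bytes vs. code points vs. characters distinction concrete, here is a small sketch that counts code points in UTF-8 by skipping continuation bytes. The function name is hypothetical, and it assumes the input is already well-formed UTF-8 (no validation):

```c
#include <assert.h>
#include <stddef.h>

/* Counts code points in a well-formed UTF-8 buffer by counting every byte
   that is NOT a continuation byte (bit pattern 10xxxxxx). No validation. */
size_t utf8_count_code_points(const char *s, size_t len)
{
    assert(s != NULL || len == 0);
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (((unsigned char)s[i] & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

For "e" followed by U+0301 (combining acute accent, bytes 0x65 0xCC 0x81) this returns 2 for a 3-byte string that renders as one character on screen. None of the three numbers agree, which is why grapheme handling is usually left to a Unicode-aware library.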
I use ASCII everywhere, just swap font bitmap for locales
>uhm ackshually
dumb moron, you are wrong. read the Unicode standard before spouting nonsense. Specifically section "2.4 Code Points and Characters".
I'm not sure what you thought that was supposed to prove. Right on the first page of that section, you have an example of a single character on screen being represented by two code points. That just supports my point that it's useless in most programs to have a data type for a code point or to have an array of code points.
holy shit, try reading the text that describes the figure, moron.
>In other instances, an abstract character may be represented by a sequence of two (or more) other encoded characters.
Also try reading other parts of the standard. Like section "3.4 Characters and Encoding".
>An abstract character does not necessarily correspond to what a user thinks of as a “character” and should not be confused with a grapheme.
>UTF8mb4
you really thought you were going to sound smart by throwing around random mysql specific terminology. pathetic lmao
>Default encoding should be UTF8.
>Strings should be first class citizens
And you should kys asap.
>Why can't cniles... (nocoder rust troon bullshit)
Because it's fricking FAST.
>Why can't cniles understand
I do acknowledge it. I consider nul-terminated strings to be one of the four major design mistakes in C (the other three being errno, global locales, and strict aliasing). It would be way better to handle all strings as (char *begin, char *end) or (char *str, size_t len) - either by passing two arguments manually, or by passing/returning a two-element struct by value. The ABI for function arguments is good enough these days that you can pass 6 of them in registers. And even when they hit memory, they still tend to be fast.
That said, solving the problem isn't as easy as just switching to non-nul-terminated strings, perhaps with the help of a small C library. Think about the OS compatibility. Even if you use better strings in your programs, POSIX APIs and the WinAPI expect nul-terminated strings. Which leaves you with two bad options: either you wrap each OS API function that takes a string in something that copies the string to a new buffer with an extra nul byte, incurring a performance penalty and extra allocations, or you put a protective nul byte at the end of each of your strings, ruining the ability to take zero-copy substrings and creating an extra invariant to be maintained.
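The first option (copy into a nul-terminated buffer before every OS call) could look roughly like this. `to_cstr` is a made-up helper, not part of any real API; the result would then be passed to something like open() and freed afterwards:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Returns a malloc'd, nul-terminated copy of the len bytes at ptr.
   Returns NULL on allocation failure, or if the bytes contain an embedded
   nul, which a nul-terminated API could not represent. Caller must free(). */
char *to_cstr(const char *ptr, size_t len)
{
    assert(ptr != NULL || len == 0);
    if (len == SIZE_MAX)                         /* len + 1 would overflow */
        return NULL;
    if (len > 0 && memchr(ptr, '\0', len) != NULL)
        return NULL;
    char *out = malloc(len + 1);
    if (out == NULL)
        return NULL;
    if (len > 0)
        memcpy(out, ptr, len);
    out[len] = '\0';
    return out;
}
```

That is one allocation and one copy per call, which is exactly the penalty described above. A fixed stack buffer works for short strings, but then you need a fallback for long ones.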
So I dislike nul-terminated strings but I still use them because that's what's closest to the API of my OS. If you want everyone to switch to non-nul-terminated strings, then get off IQfy and start lobbying Linux, Windows, and BSD devs to create a second version of every string-using function in their API, one that would use (begin, end) or (begin, length) instead of terminating nuls.
>Why are cniles so bad at programming language design in general?
Man, cut Ritchie some slack. Nobody can see the future, and C got the ability to take and return structs by value after it got strings. Besides, it wasn't obvious that nul-terminated strings would be inferior. Hindsight is 20/20.
Nice post, have a (You)
>Why can't cniles understand that using null terminated strings is a bad idea,
Where are you getting this idea, anon? Everyone understands that it was a bad idea.
>xer hasn't written their own String struct
typedef struct {
unsigned len;
char* data;
} String;
Can't use it with libraries or syscalls. If you're going to use a string library anyway just for the code you control, use sds
Why do you think syscalls need null terminated strings?
man 2 open
Anything that uses paths in an OS that was written in C takes a null terminated string.
And this means you can't use length prefixed strings in your own code because...?
Reading comprehension
>If you're going to use a string library anyway just for the code you control, use sds
You have no point to make.
>Anything that uses paths in an OS that was written in C takes a null terminated string.
AND?
Filenames and by extension file paths can't contain a nul character
https://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap03.html#tag_03_169
>Filenames and by extension file paths can't contain a nul character
Yeah that's how null term strings work.
That's because null strings were used to implement them.
POSIX filenames can literally be any arbitrary sequence of bytes except they can't contain a slash or a null.
The former is obviously because it's the path separator, the latter because they're null strings.
Would've been fine to let them contain slashes though. If anything, it's more inane that they can contain newlines. Would've made more sense to only allow them to contain printable characters.
>Zero width spaces are allowed in filenames
My face.
>Would've made more sense to only allow them to contain printable characters.
POSIX encourages implementations to do that
Yeah, I've written so much shell code that essentially assumes no filename will contain a newline, and I doubt any will, but I still wish it were possible to, say, mount filesystems with an option that rejects any attempt at creating a filename that contains a newline, just to be sure.
>sds
I didn't know about that lib but by reading the doc, it's an impressive one.
The idea of adding a header before the actual pointer returned by the string creation is ingenious.
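A stripped-down sketch of the header-before-the-pointer idea. This is not the actual sds code or its API; `struct hdr` and the `hstr_*` names are invented here to show the trick:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header stored immediately before the character data. */
struct hdr {
    size_t len;
};

/* Allocates header + bytes + terminator and returns a pointer to the bytes,
   so the result can be passed to anything that takes a plain char *. */
char *hstr_new(const char *init, size_t len)
{
    assert(init != NULL || len == 0);
    struct hdr *h = malloc(sizeof *h + len + 1);
    if (h == NULL)
        return NULL;
    h->len = len;
    char *data = (char *)(h + 1);
    if (len > 0)
        memcpy(data, init, len);
    data[len] = '\0';
    return data;
}

/* O(1) length: step back from the data pointer to the header. */
size_t hstr_len(const char *s)
{
    assert(s != NULL);
    return ((const struct hdr *)s - 1)->len;
}

/* Must free the original allocation, which starts at the header. */
void hstr_free(char *s)
{
    if (s != NULL)
        free((struct hdr *)s - 1);
}
```

The catch is that `hstr_len` and `hstr_free` are only valid on pointers that came from `hstr_new`. Passing an ordinary char * or a pointer into the middle of the string reads garbage.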
isn't that how most malloc impls work?
There is nothing preventing you from using length-prefixed strings in C. In fact, many libraries are available.
How large should this prefix field be?
Where should it be located?
but even if you know the length of the array you will have to go one by one and keep count. null terminated sounds more efficient
>cniles
if you can't speak without using idiotic terms then your opinion is probably meaningless
>computation must start at the beginning of the string and examine each character, in order, until it reaches the null character
That's not how that works at all, no. Also real programs don't do much if any string operations.
>That's not how that works at all
Yes it is, nocoder.
Getting rid of null terminated strings is almost never worth it. Recently I converted one of my applications to pointer/length, and then realized the code was much worse and rolled back the changes.
skill issue
struct string {
size_t size;
char * s;
};
None of this matters.
Welcome to Electron HQ, you are hired
>just learned what null terminated strings are thread
people just use pointer + length when it matters in real world code
>pointer
*segfaults*
wow you're pretty bad at programming
So don't use null terminated strings. C does not force it on you. A "string" is just an array of bytes. Make a struct and do your own string.
>but muh standard library!!!
Don't use it or convert to/from on the rare occasion that you do.
In this day and age where ram is cheap, allocating an extra byte for null termination so you can have backwards compatibility with syscalls and the standard library is not a big deal. Just do both. Keep a reference to length and null terminate.
they are bad but C is still better than everything else out there
Mr. God himself used null-terminated strings when intelligently designing the genetic code. If it's good enough for the Big Man, it's good enough for little ol' me.
is that why DNA is 90% junk code?
>Rustroon cosplays as a geneticist
It's honestly actually bizarre that cells actually function and have for so long.
They're such a bizarre analog nanomachine inside that doesn't work well at all but somehow skreaks along, like that there are hundreds of transcription errors with every cell division.
It's really quite amazing how that Ruth Goldberg machine can actually work.
>Ruth Goldberg
Rube Goldberg, you transmonger
cells are thermodynamic machines, not logic machines, so a few information processing mishaps is not the end of the world as long as the stable state is maintained
>God was a Cking
Rustgays will never have this
it ends up not mattering for utf8 because to parse utf8 you need to scan each letter one by one, checking if the letter is null is just as fast as checking if the cursor reached the end.
most functions related to performance in C already take size as a separate parameter (like reading / writing a file).
it turns out that there aren't many situations where you can use the size of the string for optimizations (because of utf8 / formatting) except when you are just copying the memory (reading / writing a file), or comparing strings (strcmp is for comparing null terminated strings, convenient because you only have 2 parameters, if the strings are the same size you can use memcmp).
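The comparison case can be sketched as follows (the function name is made up). With both lengths known, a mismatch is rejected without reading any bytes, and equal lengths go straight to memcmp:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Equality test for two length-carrying byte strings. */
bool bytes_equal(const char *a, size_t a_len, const char *b, size_t b_len)
{
    assert((a != NULL || a_len == 0) && (b != NULL || b_len == 0));
    if (a_len != b_len)         /* different lengths: cannot be equal */
        return false;
    return a_len == 0 || memcmp(a, b, a_len) == 0;
}
```

This only answers equal or not equal. For ordering, as strcmp gives you, you would compare the common prefix and then break ties on length.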
>muh nullterm strings are slow
haha vpcmpeqb and vptest go burrrr
If you handle arbitrary data, there is no way to know the length of an array ahead of time.
What? Why not?
I wonder why rustroon always treat their own language like it's some kind of E.coli
Does it remind them of their axe wound?
Is their goal to infect everything and dies out?