Like the video? Check out these C programming playlists with hundreds of videos! 🙂 C Programming Examples: ua-cam.com/play/PLA1FTfKBAEX6dPcQitk_7uL3OwDdjMn90.html C Programming Tutorials: ua-cam.com/play/PLA1FTfKBAEX4hblYoH6mnq0zsie2w6Wif.html
Is it perhaps that char s1[] = "abcdef" results in the string being on the stack, while char *s2 = "abcdef" has it defined in program memory space or as a const in static data (depending on the compiler?), and in both of those cases it is unmodifiable? Also, s1++ won't compile even with a cast, but you can write char *p = s1; p++; and then use p as if it had been declared originally as char *. There's also: char s1[] = "hello"; foobar(void) { } where s1 isn't on the stack but in static storage (not the heap, which only holds malloc'd memory). It would be interesting to check whether doing char *s2 = "Hello"; would place the string in RAM or in const space. I've just tried this and looked at what my compiler does with it, and in the s1 case, as expected, I get: s1 char [6] 0x20000548, where we can see that s1 is an array stored at address 0x20000548 (RAM) that has a length of six, and s2 char * 0x10002710 "Hello", which is a char pointer pointing to a literal stored at 0x10002710 (ROM). So you can see we do know where it's stored; the only thing we don't know without checking is whether it's a const in RAM or a const in ROM, both of which should be treated as non-modifiable. BTW, because s2 is declared immediately after s1, the map file tells me that it is in RAM location 0x20000550
32-bit Systems: On a 32-bit architecture, the address space is limited to 2^32 addresses, which means that the maximum addressable memory is 4 GB (2^32 bytes). To represent any memory address in this space, you need 4 bytes (32 bits). Therefore, the size of a pointer on a 32-bit system is typically 4 bytes. 64-bit Systems: On a 64-bit architecture, the address space is significantly larger, allowing for 2^64 addresses, which translates to a theoretical maximum of 16 exabytes of addressable memory (2^64 bytes). To represent any memory address in this vast space, you need 8 bytes (64 bits). Therefore, the size of a pointer on a 64-bit system is typically 8 bytes.
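The literal that char *s points to is usually placed in a read-only section (.rodata, or .text on some toolchains), while char s[] is allocated in the .data section (or on the stack for a local), so if you try to modify the literal through the pointer, like s[0] = 'x', it will usually cause a segfault (SIGSEGV) since you're writing to a read-only section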
The Value of the pointer is stored in .text not the literal itself, .text would have a pointer value pointing to a region in memory (that region is not decided by your code rather by the os ) , his explanation is correct
I like how old school tutorials in C programming explained it "A string literal, similar to an integer literal, has its memory automatically managed by the compiler for you! With an integer, i.e. a fixed size piece of data, the compiler can pretty easily manage it. But strings are a variable-byte beast which the compiler tames by tossing into a chunk of memory, and giving you a pointer to it. This form points to wherever that string was placed. Typically, that place is in a land faraway from the rest of your program’s memory - read-only memory - for reasons related to performance & safety."
A little bit more in-depth explanation:
- In C, string literals (aka double-quoted strings), if not used as initializers for char arrays, are statically allocated. That means they live in the .text / .rodata section, and under most operating systems that's read-only (aka no self-modifying code for you ;).
- char s1[] and char *s2 are different types. s1 can be used in functions expecting a char* parameter because of some evil compiler backwards compatibility magic which automatically converts any function(s1) -> function(&s1[0])
- Both s1 and s2 in the example are stack allocated, but while s1 is an array of chars, so all its values are also stored on the stack, s2 is a char pointer, and only the pointer variable is stack allocated; the value it points to isn't.
It's also quite interesting to look at the assembly your compiler generates and learn from that ;)
It's not compatibility magic, it's called pointer decay (array-to-pointer conversion). You can't pass a whole array by value in C; instead you pass a pointer to the first element and, if the function needs it, the size of the array.
I suspected it was a pointer to globally static memory, makes sense it's read-only, otherwise every call to the function modifying that memory would get/produce different results. Thanks for the clarification.
Excellent explanation! I just went through a string section in a C course I am taking and completely misunderstood the content. Even though I was able to stumble through a couple projects, I really did not understand what a "string literal" was and how it was different from a character array defined as char[]="some string". This cleared it up perfectly. You have a rare gift in instruction for the C language and I feel lucky to have found Portfolio Courses. Judging by the other comments, I hope you will continue this great work.
Great video. It's been a couple decades since I've done any C programming and I've forgotten some details like this since my programming focus has shifted considerably to functional programming, Python, and C++.
Good video, but I'd like to also add that char msg[] = "hello"; is just shorthand for char msg[] = {'h', 'e', 'l', 'l', 'o', '\0'}; There are no such things as strings in C. Only char arrays or character arrays. And the address of the first character of the char array is fixed, it behaves like a constant pointer: &msg[0] and msg mean the same thing in most expressions. And msg and msg + 0 are also the same. You can't change the base address of the array, it is stuck in memory. It makes sense, let's say you had this: char s[] = "car"; then decided to change it to s = "abcdefghijklmnopqrstuvwxyz123456789134566778889abcdefghijklmnopqrstuvwxyz"; The compiler would have to find a large amount of memory for this new change in size. And there might not be enough memory for this new sudden change in size. Therefore an array's starting address being fixed is a good safeguard. Therefore the base address of any array in C is fixed, it is a constant. This applies to all arrays in C, for example: int a[] = {1, 2, 3}; int b[] = {4, 5, 6}; you can't do this: a = b; // same as a = &b[0]; because you are changing the starting or base address of the integer array called a. The address of a is fixed, and will remain fixed for the entire program. But I can do this: int *p = a; // int *p = &a[0]; then later I can change the int pointer to point to array b: p = b; because p is a pointer variable that can point to any variable or any array as long as it has the same data type as the variable or array that it is pointing or referring to. Pointer variables can point to any variable of the same type. Again, as I said earlier, char arrays or C-strings are character arrays only, there is no such thing as a string in C, they are not String objects like Java, or String objects like C++ or C#.
Wow thanks so much for sharing all of this Rick! For anyone curious, some of this gets covered in other tutorials videos in the C programming tutorial playlist on this channel that cover strings, pointers, and arrays. And I get what you're saying regarding "C not really having strings", and I think you mostly clarified it at the end of your own comment. But just to be clear, it's inaccurate to say that "there are no such things as strings in C". In C there is no string *type* like some languages such as Java, but the language definitely has strings. In C the term "string" means a sequence of characters in memory ending with a null terminator, whether it's on the stack, the heap, or wherever the compiler being used stores a string literal. The official C standards make reference to strings and that's the term to use when referring to them: www.open-std.org/jtc1/sc22/wg14/www/projects#9899.
...partly incorrect... Char *text="abcdef" is a pointer to a memory adress containing a string(a sequence of ints, ending with a string end integer Char Text[]="abcdef" is a pointer to a string object, still in a speciffic memory adress ofc, with the difference it also has an table with functions that can be performed on that memory adress... eg it is a STRING_OBJECT , (a table saying this object at this adress is a string and the listed functions can be performed on it, u can add or overwrite functions that can be performed on objects, usually a bad ide overwrite them, just extend if needed ) ...u can point at any memory adress and read it as a string..but its not a string object... eg the compiler has no ide what function is allowed to run on that memory adress... ex. Char Text[]="abcdef" print Text , (will print abcdef) Char *ptr_text=&Text *ptr_text=8 print ptr_text (will output the first charater after the string as a sequence of character until it reaches a end string int (out of bounds)... it will otput som part of memory as characters.
@@Patrik6920 wdym by "Char Text[]="abcdef" is a pointer to a string object, ..., with the difference it also has an table with functions that can be performed on that memory adress... eg it is a STRING_OBJECT"? There is no such "string object" in ANSI C standard library. char text[] declares a char array, that's what it is. It's no way a "STRING_OBJECT" in C, neither a way to declare a "std::string" object in c++. The "table of functions" is even more ridiculous, C is not OOP native and there is no data structure having bound functions (methods). Explicitly adding function pointers to data structure is another story but it's neither something you can do with declaring only "char text[]". Everything in your original comment below the above line quote is hard to understand or difficult to follow. I'm guessing you are either mixing C with some other languages or regarding some custom/3rd-party string libraries as standard C.
Not an accurate statement. A C string is a null terminated array of characters. A true char array is just a sequence of bytes, it is not null terminated and will not work with routines that expect the terminating null unless you just get lucky and have put a null byte in the array. Like other arrays a true char array must have the length metadata tracked separate from the contained data.
@@atussentinel when u create a string object thers functions that can be used on it, those functions is described (it isent some random data), when u compile a piece of code, and u try to use charachter functions on an int for example the compiler is going to realise it.. ...whats associatet with what int, char, float is described in a function table with that data type structure...
In case of embedded devices, and so I suppose in desktop application case also: char *s = "abcdef"; "abcdef" goes together with other constant variables to RO (Read Only) section of memory, that is a part of FLASH (CODE + RO). s pointer goes together with other non-constant global/static variables to RW (Read/Write) section of memory, that is a part of RAM (RW + ZI (Zero-Initialized)) (technically it placed in FLASH, but loads to RAM during initialization). char s[] = "abcdef"; "abcdef" goes to RW section ("s[]" goes nowhere, it kinda does not exist unlike "*s", that is specifically variable type to store address pointing to another location, because all address calls to "s[]" are stored in code commands and cannot be modified and it's the reason why you can't reassign "s[]" to another string later in the code). That's why if you want to save RAM space when using a lot of string arrays those are not gonna be modified, you must declare them "const char s[]" so their values could be placed in RO section (in this case it may look the same as "char *s", but there are still differences in compilation and how it processed by different functions like sizeof(), because array will have data type "char []" not a pointer type). This explains why you get that access error trying to modify value that is placed in RO section.
Which compiler or flags are you using for this? I'm not a professional C programmer, but I figured your first example would require a const declaration? I'm very likely missing something major, hoping you can help me :)
I agree with all but the s[] part. It is a variable that refers to the starting address of the array and it should be observable in a map file with the associated size to it. Furthermore, there's a difference between initialization (compile-time) vs assignment (run-time). You can't re-initialize a variable but you can reassign a data at each element in s[] and ultimately change out the whole string even sizes during run-time.
One is a string, initialised with abcdef and the other is a pointer to part of the program itself, 6 bytes representing abcdef. When I was writing embedded process control software a million years ago, the first version would be 6 bytes of RAM and the second would be a pointer in RAM pointing at a string of 6 bytes in ROM.
yeah that's what I also think so, that contents of s1 allocated in program stack while s2 allocated in .text (which is read only and cannot be modified without something like VirtualProtect() in windows)
I’m still learning C and taking an Embedded systems class. I wrote a char * array when transmitting over UART because google said to, while my professor said to traverse over the array instead. When I asked him to help debug my code he was blown away that I used the more efficient method, and so was I lmao
Yep, depends on your compiler. I've lived with this over the years I've been working with C on embedded systems. It's one reason why fixing your projects code development to a particular version of your compiler tool chain, AND the optimisations you use, is very important.
It also depends on the execution environment. A segmentation fault requires a memory protection unit. When working with a flat physical memory, typical in embedded systems, the two approaches will behave identically.
The compiler may or may not allow you to modify it according to the C standard. In the C standard it's something called "undefined behaviour", there are other undefined behaviours. The compiler could produce code that allows you to modify the string literal, and that would still be a valid C program.
@@PortfolioCourses The language definition says that the behaviour is undefined in order that implementations may put the string in protected memory if such a thing exists on the target platform. The compiler typically needs to do nothing to either allow you to or prevent you from accessing the memory and as @okaro6595 says it is normally the operating system/hardware that blocks the access. It is also true that DOS compilers used to allow this sort of access back in the day since no protection mechanism existed in the hardware. So you may find that these things make sense to you once you understand the reasons that the standard declares the behaviour as undefined.
@@dibblethwaite I appreciate and understand what you're saying, my only point is that because the language definition says that, compiler makers can ultimately do what they like, e.g. they could even put the data for string literals on the stack. Now of course practically, why would they? But again my only point is to state the fact that the language standard lists the behaviour as undefined, so we can't assume. Again though I do appreciate the discussion on underlying reasons "why".
I have watched many videos teaching what pointer is and how to use, but only this one can make me truly understand how to use pointer in a few minutes. This must be a good day for me, as I have learnt thing from a genius.
You should be able to define in the linker in which segment the literal is stored and what attributes it has. In GCC, __attribute__((section(".rwdata"))) would work as well.
2nd declares a pointer and makes it point at the string literal "abcd" (which in C++ has type const char[5], not char *, so one most likely gets a compiler warning about mixing both; in C the type is plain char[5], but writing to it is still undefined behaviour). And it segfaults while executing.
As far as I have experimented, it gave a segfault. The teacher explained that in assembly it is placed in the .text section (which only has read permission) and the pointer just gets assigned the beginning address of that string. Though, I haven't tried it on other architectures than x86.
The definition of undefined means it is stupid to use even if it appears to work. Way back in the era of 8086 and 8088, there were compilers where this worked - sort of. This was before x86 had much in the way of process security. The gotcha is that that string literal would change for EVERY instance of the original string. I think it was an early Microsoft compiler, but it could have been some other compiler.
@@johnmcleodvii Well, it actually makes sense that when you modify one of the instances of a string constant all of them will modify the same way because it is usually the same string constant being referenced by all of its instances in order to save memory.
The s2 remains stored in the binary of the program while s1 is stored as a copy on "the stack". Some operating systems will allow you to change bytes between executable code, like cc65-built code for Atari 8 Bit, Apple II and C64, while other, more secure operating systems will deny access to it. The same will happen in any other programming language (look up "self modifying code"). This may be one reason why the C standard is not specially forbidding modifying s2, but operating systems do.
I did C and later C++ for over a decade and was pretty close to giving up programming. Writing business software, I was so frustrated by the tedium of C and C++. Keeping .h files in sync with .cpp files, managing pointers, and my biggest frustration: dealing with strings. It's just so tedious in C and C++. Fortunately, right around that time, C# showed up and strings were as easy to work with as any other data type. Memory management became a distant memory and no more header files. I don't miss that stuff at all.
Try Rust too (no GC), it’s kinda cool and deals with tedious and dangerous memory management (also multithreading) issues by design. Of course it has a “dark side” (unsafe code) but at least tries to deal with such stuff in a clear way. My problem with C/C++ is that it allows shooting yourself easily in the foot (as Stroustrup said) and even proud of it… Idk why it’s OK to be openly dangerous after decades, we are not in the 1980s anymore.
0:38 Note: a char array initialized from a string literal is not guaranteed to have a closing null character! If you write char s[6] = "abcdef"; there will not be a null terminator.
When I'm building code with a new toolchain I am always looking at where that tool is putting my string literals (through the map file) just to make sure I understand what its assumptions are. And these assumptions vary with optimisation options
Theoretically if this pointer to a string literal would be in a function that gets called multiple times the pointer could always be the same. In the output the string literal itself would probably be defined before the function as a "constant" and the pointer would always be to that constant.
This is why I always write my pointer decls as "char* s2" rather than "char *s2". The "pointerness" is part of the type, not the variable. And writing it my way makes it more obvious.
I've usually done it as char *s2 because that's the standard code bases I've worked on have used, but I completely get what you're saying, char* s2 does make it clearer that it's "a pointer to a char" in that sense.
Using stars in UA-cam comments messed up what I was saying and then when I fixed it, it still bolded it. I'll have to read up on how to actually do a star in a comment. :-)
I think c declarations are supposed to mimic how they are used. That is why. I don’t like it either. Most modern language designers seem to prefer the old Pascal way of declaring. I like that much more.
char str[] is actually a char array of sizeof("abcd") bytes allocated in stack memory, and it can be modified at runtime. char *str is a pointer to a constant memory area, which obviously can't be modified at any time.
It's more about the OS and modern security features than the compiler, I should think. Run the same code on an old enough system or even most modern microcontrollers and it might well let you change a literal. The compiler could technically put string literals somewhere else if they wanted to but that'd increase memory usage of your program in some cases. Generally, the text segment (which includes executable code and literals) get marked read only in a way that's either enforced by the OS or the CPU depending on the system. That allows those segments to be shared between processes running the same executable on systems with virtual memory and memory protection. It also prevents certain types of security exploits. Coupled with Execute Disable (XD) on the writable segments of the process's virtual memory and it becomes easier to crash a program with bad code but much harder for attackers to exploit.
Thanks for leaving this comment, it's a well put version of a point a few others have made. I agree with what you're saying overall, a string literal being placed somewhere like the stack for example would fall under the "nasal demons" category of "highly unusual.... but legal" C compiler behaviour: www.catb.org/jargon/html/N/nasal-demons.html.
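Thank heavens for malloc:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void)
{
    const char s1[] = "abcdef";
    /* sizeof(s1) already counts the terminator, so no extra factor is needed */
    char *s2 = malloc(sizeof(s1));
    if (s2 == NULL) return 1;
    strcpy(s2, s1);
    s2[0] = 'X';   /* fine: this is our own writable copy */
    printf("s2: %s\n", s2);
    free(s2);
    return 0;
}
Lesson: Only put constants on the stack, else you'll run into the brick wall of undefined behaviour.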
Thank you for taking the time to make these very informative videos. For someone coming from a higher-level language, you get right to the point explaining things that you don't find in, say C# or Python. I have one question about this particular topic: When you "reassign" a const char* to a new string literal (I guess, "point it to a new one" would be more accurate), what happens to the old character data in the first string? As in your example: const char *s2 = "abcdef"; //then later, you reassign s2 to: s2 = "new string"; Does the "abcdef" just lay around somewhere, or does the compiler remove it from memory? Thanks!
Great question! In this example, it must "lay around somewhere in memory". :-) If we had saved the memory address of the string (more specifically, the first character of the string) into a different pointer variable, we could then access that same string literal, when we set s2 to something else. So the compiler and C won't remove it from memory. Now, as a side note, I do wonder if a compiler would be smart enough to remove something like this from memory: const char *s2 = "abc"; s2 = "def"; Because in this case, *maybe* the compiler could tell "at compile time" that the string literal "abc" will never be used. So *maybe* a compiler could remove the string literal "abc" from memory when the program is compiled, knowing it will never be used. But I'm not sure if a compiler could do that, or if it would do that, I'm just speculating for fun. :-) Maybe I need to test this out and see what happens hahaha...
So yeah... I did a little test right now. :-) And at least with gcc on my MacOS machine, there is no optimization like that... the string literal will stay in the compiled executable even if it is never used at all.
@@PortfolioCourses Wow, thanks for such a detailed answer! So that's interesting. I'm still very new to C so it makes me wonder why you would ever bother using strings in this manner. The only scenario I can think of would be to display something like on-screen instructions, field labels, or menu items. Stuff that will probably get used more than once, but not likely to change during the program.
@@herohera3497 I'm not in school, if that's what you mean, but yes. I'm just learning C for no particular reason. I've always found it interesting and challenging and wanted to have a more thorough understanding of how it works.
This is a very interesting video. Help me with something. In your example, s2 is a string literal with a memory location. I'm wondering if it's on the stack or on the heap. If it's on the heap, then I should be able to free it. However, if I try to free it, the compiler outputs that I'm trying to free an unallocated object. I then created a simple function (char * test_function()) that creates and returns a string literal: char *s = "some text" then return s. In main, I call this function (char *s = test_function()) and then try to free s. It compiles and I can print the return value, but free causes a core dump with invalid pointer when I run it. So is it bad practice to create char pointers using this string literal approach? If they are not allocated but if I can treat them like they are allocated, where are they and is there risk of memory leaks doing this?
Would have been interesting to run the code through the GodBolt compiler explorer, I think you’d spot some additional changes in the code generated by the compiler(s).
For the size of numbers we are dealing with in these simple examples %d will work fine. If we're going to use size_t values for larger numbers, like allocating massive blocks of memory for example, we would want to use %zu (though even with %zu, there are portability issues you can run into with some older compilers...). For anyone curious about learning more check out these tutorials: size_t type explained: ua-cam.com/video/nBJuP_un20M/v-deo.html sizeof operator: ua-cam.com/video/2wNc15926X4/v-deo.html
Very nice video! It is interesting that using const char *s2 = "abc" to get a compilation error when trying to do s2[0] = 'X'; does not prevent you from assigning a completely new string literal to s2, like s2 = "xyz"; which leads to neither a compilation error nor a runtime error. So it is a const that allows re-assignment. So a const char * is re-assignable while a const int is not, why?
Hey appreciate the content man , what editor do you use and i would like to know how you effortlessly jump out of autocompleted brackets. Do you use arrow keys or is there a keyboard shortcut?
Great video explanation! Let me know if I understood rightly. I can't change the string attributed to a pointer. But, if I attributed the pointer to a declared string I am able to change the "pointer"? Like: char *ptr = str (declared before).
In x86 you often got string literals placed in CODE segment not in DATA. On microcontrollers you almost certainly got string literals placed in ROM. So there're pretty obvious reasons why you can't modify them.
for example: char* ptr="Hello"; ptr="Bye"; what does happen to "Hello"? Is the memory where it was stored freed? Or do I get memory leak someway if I continue to assign different values to ptr? Thank you.
Assuming the compiler doesn't optimize it away, "Hello" will be somewhere in memory. We wouldn't really call it a leak because it's not dynamically allocated memory that we are expected to free(). But it would sort of be like a memory leak in the sense that we have data in memory we aren't using. :-)
Can you expand this explanation to include memory leaks? If I declare a string with a pointer but then change what the pointer points to, is this a memory leak? Is this why we use const?
I have looked a lot and still cannot find the answer to my question: I know that `char x[5];` (and other variations of the array of characters as a string) will be stored on the stack. Thus, it will be freed at the end of the function. No memory issues to deal with in the code. A pointer to a string, `char* x = "asht"` will go where? Will `"asht"` be freed along with the stack in this pointer case? I read in one place that using a pointer to a string literal makes the string literal accessible to other functions, something like a `static`. Is that the case?
4:29 if it gives a runtime error as opposed to being stopped by the compiler, then doesn't that mean that the compiler does allow it? It just gives a seg fault when u run it...
The compiler allows the code to compile because it conforms to the C language standard. But the compiler doesn't allow us to modify the string literal at runtime... theoretically, the compiler could support this and compile the code in such a way that we are allowed to modify the string literal, because the C language standard only says that the behaviour is "undefined".
@@PortfolioCourses It's not the compiler throwing the error, it's basically the OS. IIRC, protected mode prohibits self modifying code, and if it's in the code segment, it's code, even if it is just data. I think real mode used to allow it. I don't think modern processors even support real mode. Self modifying code was common on 8-bit machines.
@@mikechappell4156 That's true and a great point to make. Though what I said in the reply was that the compiler "doesn't allow us to modify the string literal at runtime", and that's also true. The compiler compiles the code in such a way that we can't modify the string literal. But the compiler could compile the code in such a way that we could modify the string literal, and it would still be compliant with the C standard because the behaviour is undefined.
Can u explain me this: suppose a function that returns a char*. When I declare in main a char* variable to store the result of that function, shouldn't it return a memory direction, since it returns a pointer to a char?. When I do this, it just returns a string, or is this because of how printf() shows it on screen?. Sorry if it's a stupid question, I'm new to C.
char *s2 ="abcd" ; // gets `warning: ISO C++ forbids converting a string constant to ‘char*’ [-Wwrite-strings]` char const *s2 ="abcd" ; // does not get a warning, using g++ 13.2 -std=c++17 And therein lies the true answer.
I'm watching and following along. You declare that the following creates a character array. char s1[]="abcdef"; As per your description, it appears that you are forcing the string to break up into an array filled with individual characters. An array of 6 individual characters: ("a","b","c","d","e","f") I may be misunderstanding this, but that is how it comes across. Is this not what is happening?
I'm assuming all string literals used in the source code are stored globally. When you use a pointer to a string literal, it gives you a pointer to that global memory, and since next time the current function gets called the same memory pointer will be used, the memory must remain constant, otherwise the next call to the function will produce different results. Using a character array copies that memory to the stack, so you can modify your copy as much as you want.
If the code compiles, hasn't the compiler just allowed you to write the memory location? If I'm not mistaken, it's the runtime that gives you the error.
Because it’s undefined behaviour, a compiler creator can do whatever they like. They could have it produce code that allows you to write or not write to the string literal. The error is going to occur at runtime, and in practice the compiler designer is going to almost certainly just let the architecture dictate the behaviour. But because it’s undefined behaviour in the C standard, it’s ultimately up to the compiler, and we shouldn’t make an assumption about what will happen (unless we are ok with our program being less portable).
@@PortfolioCourses Maybe I just misunderstood what you're trying to say at 4:03. The compiler literally just compiled your code and you were allowed to run it. Then on runtime you got the error when you tried to write to memory location s2[0]. And it's not that the compiler (in that specific situation) could not see that you're trying to modify a string literal.
Yeah, the compiler did compile the code, but the C standard says it’s undefined behaviour to write to string literals, so other compilers could (more theoretically speaking) handle writing to string literals as they like, and they would still be valid standard compliant C compilers. That’s my only point really. Virtually all resources online describe undefined behaviours as things that are up the compiler because that’s a simple and accurate way to put it. There’s definitely more to the story as to why something is made undefined behaviour to begin with and how compilers will handle things in practice.
I want to get into C as an alternative to rust, but I've never broken through the basics yet. Do things that *read* strings implicitly deal with the differences between pointer and literal strings? Rust burned me with all that by having more string/string adjacent variants than I can count on two hands, and explicitly requiring the exact variant every time.
It's going to depend on the function that deals with them. Some functions will read one char (1 byte) at a time until they reach a '\0' (null) character, and some will read a specified number of bytes that you have to supply. Be very careful with both. For example, let's say you overwrite the final character (the null terminator) of your char array. It's still a valid array, but a function expecting a '\0' character will keep reading OR WRITING directly into memory until it hits a '\0' if you pass it the starting address of your array.
I don't know... it may be a bit confusing because "string" is not a type in C, so much as a concept. But like it or not we have "strings" in C and need to refer to them as strings at some point. I think the key thing is to understand that a string is a sequence of characters with a null terminator as the last character, and that we have different ways of storing strings in C.
I'm pretty new to C++, but I always get confused with pointers and addresses. Can't a pointer only point to an address? How can it point to a string? Normally I have to add a & before everything.
It can be "pointed to" the first character in the string, i.e. the pointer variable can store the memory address of the first character in a string. And at that point we can say "it points to a string".
When you re-assign the pointer to a new string literal, how can you be sure that the previous one was deallocated? Or it goes to some kind of pool and never gets deallocated?
String literals don’t really get deallocated. Where they are in memory ultimately depends on the compiler, but the compiled program has segments like the data segment where we may expect to see the string literals stored: en.m.wikipedia.org/wiki/Object_file
Great question Avetik! :-) Modifying a string literal has “undefined behaviour” officially, this means it *could* work but it’s definitely not guaranteed to work. So it’s “normal” but it’s not something that you should count on always working.
Great question! :-) When we pass a pointer to a char to printf and we use %s as the placeholder, printf will just keep printing characters from that position in memory onwards until it encounters a null terminator character.
Sorry I'm not sure what you're asking Maksim. The code in this video is available here: github.com/portfoliocourses/c-example-code/blob/main/char_array_vs_string_literal_pointer.c. :-)
Is it perhaps that s1[] = "abcdef" results in the string being on the stack and s2* = "abcdef" has it defined in program memory space or as a const in heap space (depending on the compiler?) and being const in both cases is unmodifiable.
Also you can do a cast of s1[] by saying (char *)s1++ and do the same as if it was declared originally as char * s1.
There's also:
char s1[] = "hello";
void foobar(void)
{
}
where s1 isn't on the stack, instead on the heap.
It would be interesting to check whether doing:
char *s2 = "Hello";
would place the string in heap space or const space.
I've just tried this and looked at what my compiler does with it and in the s1 case as expected I get:
s1 char [6] 0x20000548
where we can see that s1 is an array pointer stored at address 0x20000548 (RAM) that has a length six and
s2 char * 0x10002710 "Hello" is a char pointer pointing to a literal stored at 0x10002710 (ROM).
So you can see we do know where it's stored, the only thing we don't know without checking is whether it's a const in RAM (heap space) or const in ROM both of which are non modifiable.
BTW, because s2 is declared immediately after s1 the map file tells me that it is in RAM location 0x20000550
32-bit Systems:
On a 32-bit architecture, the address space is limited to 2^32 addresses, which means that the maximum addressable memory is 4 GB (2^32 bytes).
To represent any memory address in this space, you need 4 bytes (32 bits). Therefore, the size of a pointer on a 32-bit system is typically 4 bytes.
64-bit Systems:
On a 64-bit architecture, the address space is significantly larger, allowing for 2^64 addresses, which translates to a theoretical maximum of 16 exabytes of addressable memory (2^64 bytes).
To represent any memory address in this vast space, you need 8 bytes (64 bits). Therefore, the size of a pointer on a 64-bit system is typically 8 bytes.
char *s is precompiled in .text while char s[] is allocated in .data section
so if you try to modify *s like s[0] = xxx, it will usually cause a SIGSEGV (segmentation fault), since you're modifying the .text section, which is read-only
that makes things clearer. thanks!
Isn't it always copied to the stack when a function is called? Unless main is an exception. I guess I have to open up godbolt
The Value of the pointer is stored in .text not the literal itself, .text would have a pointer value pointing to a region in memory (that region is not decided by your code rather by the os ) , his explanation is correct
Makes sense to me
I like how old school tutorials in C programming explained it "A string literal, similar to an integer literal, has it’s memory automatically managed by the compiler for you! With an integer, i.e. a fixed size piece of data, the compiler can pretty easily manage it. But strings are a variable-byte beast which the compiler tames by tossing into a chunk of memory, and giving you a pointer to it.
This form points to wherever that string was placed. Typically, that place is in a land faraway from the rest of your program’s memory - read-only memory - for reasons related to performance & safety."
Portfolio courses is without a doubt, the best c programming / computer science teacher !!
Thank you for the very kind words Mefref. 😀
I totally agree with you, it is the best C courses✨
Thanks Mohammad! :-)
I second that
A little bit more in depth explanation:
- In C, string literals (aka double quoted strings), if not used as initializers for char arrays, are statically allocated. That means they live in the .text / .rodata section, and under most operating systems that's read-only (aka no self modifying code for you ;).
- char s1[] and char *s2 are different types, s1 can be used in functions expecting a char* parameter because of some evil compiler backwards compatibility magic which automatically converts any function(s1) -> function(&s1[0])
- Both s1 and s2 in the example is stack allocated, but while s1 is an array of chars all values are also stored on the stack, while s2 is a char pointer and only the pointer variable is stack allocated the value it points to isnt.
Its also quite interesting to look at the assembly your compiler generates and learn from that ;)
youtube ate my * characters, but i tink you get my point ;)
Exactly that's what i just commented above in the super popular, pedantic and yet wrong first comment
@@psyience3213 i missed it ;)
It's not compatibility magic, it's called pointer decay (array-to-pointer conversion). You can't pass a whole array by value in C; instead you pass a pointer to the first element and, if needed, the size of the array.
I suspected it was a pointer to globally static memory. Makes sense it's read-only, otherwise every call to the function modifying that memory would get/produce different results. Thanks for the clarification.
Excellent explanation! I just went through a string section in a C course I am taking and completely misunderstood the content. Even though I was able to stumble through a couple projects, I really did not understand what a "string literal" was and how it was different from a character array defined as char[]="some string". This cleared it up perfectly. You have a rare gift in instruction for the C language and I feel lucky to have found Porfolio Courses. Judging by the other comments, I hope you will continue this great work.
Great video. It's been a couple decades since I've done any C programming and I've forgotten some details like this since my programming focus has shifted considerably to functional programming, Python, and C++.
I'm glad you enjoyed it! :-)
Extremely useful. Please make the same video with more examples and corner cases. You are very good teacher. You deserved to be paid.
Very good video, thanks to your calm, soothing voice and the low pace in which you explain everything step-by-step!
I'm glad you enjoyed it! :-) And thank you very much for the positive feedback.
Good video, but I'd also like to add that
char msg[] = "hello";
is just shorthand for
char msg[] = {'h', 'e', 'l', 'l', 'o', '\0'};
There are no such things as strings in C, only char arrays (character arrays). The address of the first character of the char array is fixed; it acts like a constant pointer, and &msg[0] and msg mean the same thing. msg and msg + 0 are also the same. You can't change the base address of the array; it is stuck in memory. It makes sense. Let's say you had this:
char s[] = "car";
then decided to change it to
s = "abcdefghijklmnopqrstuvwxyz123456789134566778889abcdefghijklmnopqrstuvwxyz";
The compiler would have to find a large amount of memory for this sudden change in size, and there might not be enough memory for it. Therefore an array's starting address being fixed is a good safeguard: the base address of any array in C is fixed, it is a constant.
This applies to all arrays in C, for example:
int a[] = {1, 2, 3};
int b[] = {4, 5, 6};
You can't do this:
a = b; // same as a = &b[0];
because you would be changing the starting or base address of the integer array called a. The address of a is fixed, and will remain fixed for the entire program.
But I can do this:
int *p = a; // same as int *p = &a[0];
then later I can change the int pointer to point to array b:
p = b; // because p is a pointer variable
A pointer variable can point to any variable or any array, as long as it has the same data type as the variable or array it is pointing or referring to. Again, as I said earlier, char arrays or C-strings are character arrays only; there is no such thing as a string in C. They are not String objects like in Java, C++ or C#.
Wow thanks so much for sharing all of this Rick! For anyone curious, some of this gets covered in other tutorials videos in the C programming tutorial playlist on this channel that cover strings, pointers, and arrays. And I get what you're saying regarding "C not really having strings", and I think you mostly clarified it at the end of your own comment. But just to be clear, it's inaccurate to say that "there are no such things as strings in C". In C there is no string *type* like some languages such as Java, but the language definitely has strings. In C the term "string" means a sequence of characters in memory ending with a null terminator, whether it's on the stack, the heap, or wherever the compiler being used stores a string literal. The official C standards make reference to strings and that's the term to use when referring to them: www.open-std.org/jtc1/sc22/wg14/www/projects#9899.
...partly incorrect...
Char *text="abcdef" is a pointer to a memory adress containing a string(a sequence of ints, ending with a string end integer
Char Text[]="abcdef" is a pointer to a string object, still in a speciffic memory adress ofc, with the difference it also has an table with functions that can be performed on that memory adress... eg it is a STRING_OBJECT , (a table saying this object at this adress is a string and the listed functions can be performed on it, u can add or overwrite functions that can be performed on objects, usually a bad ide overwrite them, just extend if needed )
...u can point at any memory adress and read it as a string..but its not a string object... eg the compiler has no ide what function is allowed to run on that memory adress...
ex.
Char Text[]="abcdef"
print Text , (will print abcdef)
Char *ptr_text=&Text
*ptr_text=8
print ptr_text (will output the first charater after the string as a sequence of character until it reaches a end string int (out of bounds)... it will otput som part of memory as characters.
@@Patrik6920 wdym by "Char Text[]="abcdef" is a pointer to a string object, ..., with the difference it also has an table with functions that can be performed on that memory adress... eg it is a STRING_OBJECT"?
There is no such "string object" in ANSI C standard library. char text[] declares a char array, that's what it is. It's no way a "STRING_OBJECT" in C, neither a way to declare a "std::string" object in c++. The "table of functions" is even more ridiculous, C is not OOP native and there is no data structure having bound functions (methods). Explicitly adding function pointers to data structure is another story but it's neither something you can do with declaring only "char text[]".
Everything in your original comment below the above line quote is hard to understand or difficult to follow. I'm guessing you are either mixing C with some other languages or regarding some custom/3rd-party string libraries as standard C.
Not an accurate statement. A C string is a null terminated array of characters.
A true char array is just a sequence of bytes, it is not null terminated and will not work with routines that expect the terminating null unless you just get lucky and have put a null byte in the array. Like other arrays a true char array must have the length metadata tracked separate from the contained data.
@@atussentinel when u create a string object thers functions that can be used on it, those functions is described (it isent some random data), when u compile a piece of code, and u try to use charachter functions on an int for example the compiler is going to realise it..
...whats associatet with what int, char, float is described in a function table with that data type structure...
Amazing video! Great explanation. I especially love the table at the end.
In case of embedded devices, and so I suppose in desktop application case also:
char *s = "abcdef";
"abcdef" goes together with other constant variables to RO (Read Only) section of memory, that is a part of FLASH (CODE + RO).
s pointer goes together with other non-constant global/static variables to RW (Read/Write) section of memory, that is a part of RAM (RW + ZI (Zero-Initialized)) (technically it placed in FLASH, but loads to RAM during initialization).
char s[] = "abcdef";
"abcdef" goes to RW section ("s[]" goes nowhere, it kinda does not exist unlike "*s", that is specifically variable type to store address pointing to another location, because all address calls to "s[]" are stored in code commands and cannot be modified and it's the reason why you can't reassign "s[]" to another string later in the code).
That's why if you want to save RAM space when using a lot of string arrays those are not gonna be modified, you must declare them "const char s[]" so their values could be placed in RO section (in this case it may look the same as "char *s", but there are still differences in compilation and how it processed by different functions like sizeof(), because array will have data type "char []" not a pointer type).
This explains why you get that access error trying to modify value that is placed in RO section.
Which compiler or flags are you using for this? I'm not a professional C programmer, but I figured your first example would require a const declaration? I'm very likely missing something major, hoping you can help me :)
I agree with all but the s[] part. It is a variable that refers to the starting address of the array and it should be observable in a map file with the associated size to it. Furthermore, there's a difference between initialization (compile-time) vs assignment (run-time). You can't re-initialize a variable but you can reassign a data at each element in s[] and ultimately change out the whole string even sizes during run-time.
One is a string, initialised with abcdef and the other is a pointer to part of the program itself, 6 bytes representing abcdef. When I was writing embedded process control software a million years ago, the first version would be 6 bytes of RAM and the second would be a pointer in RAM pointing at a string of 6 bytes in ROM.
yeah that's what I also think so, that contents of s1 allocated in program stack while s2 allocated in .text (which is read only and cannot be modified without something like VirtualProtect() in windows)
I’m still learning C and taking an Embedded systems class.
I wrote a char * array when transmitting over UART because google said to, while my professor said to traverse over the array instead.
When I asked him to help debug my code he was blown away that I used the more efficient method, and so was I lmao
This subscription reminder is the best I have seen so far. Totally non-obtrusive and non-annoying. I love it. *Klicks subscribe*
Yep, depends on your compiler. I've lived with this over the years I've been working with C on embedded systems. It's one reason why fixing your projects code development to a particular version of your compiler tool chain, AND the optimisations you use, is very important.
It also depends on the execution environment. A segmentation fault requires a memory protection unit. When working with a flat physical memory, typical in embedded systems, the two approaches will behave identically.
Literally had the exact same question a week ago in my HPC-course! Many thanks 🙏🙏🙏
You’re very welcome Grabriel! :-)
Once again, a clear explanation on a tricky point. Thanks 🙏
You're welcome, I'm glad you found the explanation clear! :-)
Don't you mean a tricky point...er?
I'll see myself out
The compiler allows you to modify it. It is the runtime protections that prevent it. If you tried it in DOS then it would likely work.
The compiler may or may not allow you to modify it according to the C standard. In the C standard it's something called "undefined behaviour", there are other undefined behaviours. The compiler could produce code that allows you to modify the string literal, and that would still be a valid C program.
@@PortfolioCourses The language definition says that the behaviour is undefined in order that implementations may put the string in protected memory if such a thing exists on the target platform. The compiler typically needs to do nothing to either allow you to or prevent you from accessing the memory and as @okaro6595 says it is normally the operating system/hardware that blocks the access. It is also true that DOS compilers used to allow this sort of access back in the day since no protection mechanism existed in the hardware. So you may find that these things make sense to you once you understand the reasons that the standard declares the behaviour as undefined.
@@dibblethwaite I appreciate and understand what you're saying, my only point is that because the language definition says that, compiler makers can ultimately do what they like, e.g. they could even put the data for string literals on the stack. Now of course practically, why would they? But again my only point is to state the fact that the language standard lists the behaviour as undefined, so we can't assume. Again though I do appreciate the discussion on underlying reasons "why".
I have watched many videos teaching what pointer is and how to use, but only this one can make me truly understand how to use pointer in a few minutes.
This must be a good day for me, as I have learnt thing from a genius.
That’s awesome this video was helpful for you and thank you so much for the kind words! :-)
You should be able to define in the linker which segment the literal is stored in and what attributes it has. In GCC, __attribute__((section(".rwdata"))) would work as well.
Let's not make the code compiler dependent.
🥳Wow!! Thanks a lot sir!! This is massive for me..
You’re welcome! :-)
Hello sir, can I link this video on my channel to refer my viewers here for better understanding?😊
The pointer in c[] is constant, but in *c it is a variable that can change value. But everybody know that.
The C++ compiler will forbid declaring char * s1 = "test"; because it is a conversion from const char * to char *, which is forbidden.
I wish i could have watched this video back in 1998.
Really well-explained!
The 2nd declares a pointer and makes it point at a string constant ("abcd"), which is of type "const char *", not char * (one most likely gets a compiler warning about mixing both). And it segfaults while executing.
Thanks man, that's very clear now!
Thanks bro. It helped me to clear confusion
You’re welcome, I’m glad it helped clear confusion up for you! :-)
char* s2 = "hello", attempting to change string literal is undefined behavior so you don't know what will happen in run time.
Yep! :-)
As far as I have experimented, it gave a segfault. The teacher explained that in assembly it is placed in the .text section (which only has read permission) and the pointer just gets assigned the beginning address of that string. Though, I haven't tried it on other architectures than x86.
The definition of undefined means it is stupid to use even if it appears to work.
Way back in the era of 8086 and 8088, there were compilers where this worked - sort of. This was before x86 had much in the way of process security. The gotcha is that that string literal would change for EVERY instance of the original string. I think it was an early Microsoft compiler, but it could have been some other compiler.
@@johnmcleodvii Well, it actually makes sense that when you modify one of the instances of a string constant all of them will modify the same way because it is usually the same string constant being referenced by all of its instances in order to save memory.
I thought it would even give me a compiler error
The s2 remains stored in the binary of the program while s1 is stored as a copy on "the stack". Some operating systems will allow you to change bytes between executable code, like cc65-built code for Atari 8 Bit, Apple II and C64, while other, more secure operating systems will deny access to it. The same will happen in any other programming language (look up "self modifying code"). This may be one reason why the C standard is not specially forbidding modifying s2, but operating systems do.
I did C and later C++ for over a decade and was pretty close to giving up programming. Writing business software, I was so frustrated by the tedium of C and C++. Keeping .h files in sync with .cpp files, managing pointers, and my biggest frustration: dealing with strings. It's just so tedious in C and C++. Fortunately, right around that time, C# showed up and strings were as easy to work with as any other data type. Memory management became a distant memory and no more header files.
I don't miss that stuff at all.
Try Rust too (no GC), it’s kinda cool and deals with tedious and dangerous memory management (also multithreading) issues by design. Of course it has a “dark side” (unsafe code) but at least tries to deal with such stuff in a clear way. My problem with C/C++ is that it allows shooting yourself easily in the foot (as Stroustrup said) and even proud of it… Idk why it’s OK to be openly dangerous after decades, we are not in the 1980s anymore.
0:38 Note: a string literal is not guaranteed to have a closing null character!
If you write
char s[6] = "abcdef";
there will not be a null terminator.
"abcdef" has null terminator, it is just not assigned in because of lack of space
When I'm building code with a new toolchain I am always looking at where that tool is putting my string literals (through the map file) just to make sure I understand what it's assumptions are. And these assumptions vary with optimisation options
Great tutorial! Thank you so much for taking the time to make it.
You’re welcome, I’m happy that you enjoyed it! :-)
Theoretically if this pointer to a string literal would be in a function that gets called multiple times the pointer could always be the same. In the output the string literal itself would probably be defined before the function as a "constant" and the pointer would always be to that constant.
great channel i'm planning to see your whole c++ course later 👍
Thank you, I’m glad to hear that you’re enjoying the channel! :-)
Awesome stuff. Learned something new. Thank you
You're welcome Amani, I'm glad to hear you learned something new! :-D
This is why I always write my pointer decls as "char* s2" rather than "char *s2". The "pointerness" is part of the type, not the variable. And writing it my way makes it more obvious.
I've usually done it as char *s2 because that's the standard code bases I've worked on have used, but I completely get what you're saying, char* s2 does make it clearer that it's "a pointer to a char" in that sense.
Using stars in UA-cam comments messed up what I was saying and then when I fixed it, it still bolded it. I'll have to read up on how to actually do a star in a comment. :-)
I think c declarations are supposed to mimic how they are used. That is why. I don’t like it either. Most modern language designers seem to prefer the old Pascal way of declaring. I like that much more.
char str[] is actually a char array of sizeof("abcd") bytes allocated in stack memory, and it can be modified at runtime.
char *str is a pointer to a constant memory area, which obviously can't be modified at any time.
Very clear and good information! Thanks !!
It's more about the OS and modern security features than the compiler, I should think. Run the same code on an old enough system or even most modern microcontrollers and it might well let you change a literal. The compiler could technically put string literals somewhere else if they wanted to but that'd increase memory usage of your program in some cases. Generally, the text segment (which includes executable code and literals) get marked read only in a way that's either enforced by the OS or the CPU depending on the system. That allows those segments to be shared between processes running the same executable on systems with virtual memory and memory protection. It also prevents certain types of security exploits. Coupled with Execute Disable (XD) on the writable segments of the process's virtual memory and it becomes easier to crash a program with bad code but much harder for attackers to exploit.
Thanks for leaving this comment, it's a well put version of a point a few others have made. I agree with what you're saying overall, a string literal being placed somewhere like the stack for example would fall under the "nasal demons" category of "highly unusual.... but legal" C compiler behaviour: www.catb.org/jargon/html/N/nasal-demons.html.
The pointer variant reminds me of a linked list. Where you don’t know where it’s allocated except for the first node.
Thank heavens for malloc:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main() {
    const char s1[] = "abcdef";
    char *s2 = malloc(sizeof(s1)); /* 7 bytes: six chars plus '\0' */
    if (s2 == NULL) return 1;
    strcpy(s2, s1);
    s2[0] = 'X';
    printf("s2: %s\n", s2);
    free(s2);
    return 0;
}
Lesson: Only put constants on the stack, else you'll run into the brick-wall of not specified behaviour.
More concise example, without anything except the string literal on the stack:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main() {
    char *s1 = malloc(sizeof "abcdef"); /* 7 bytes: six chars plus '\0' */
    if (s1 == NULL) return 1;
    strcpy(s1, "abcdef");
    s1[0] = 'X';
    printf("s1: %.6s\n", s1);
    free(s1);
    return 0;
}
Remember, kids: malloc means free(dom).
Thank you for taking the time to make these very informative videos. For someone coming from a higher-level language, you get right to the point explaining things that you don't find in, say C# or Python. I have one question about this particular topic:
When you "reassign" a const char* to a new string literal (I guess, "point it to a new one" would be more accurate), what happens to the old character data in the first string? As in your example:
const char *s2 = "abcdef"; //then later, you reassign s2 to:
s2 = "new string";
Does the "abcdef" just lay around somewhere, or does the compiler remove it from memory?
Thanks!
Great question! In this example, it must "lay around somewhere in memory". :-) If we had saved the memory address of the string (more specifically, the first character of the string) into a different pointer variable, we could then access that same string literal, when we set s2 to something else. So the compiler and C won't remove it from memory.
Now, as a side note, I do wonder if a compiler would be smart enough to remove something like this from memory:
const char *s2 = "abc";
s2 = "def";
Because in this case, *maybe* the compiler could tell "at compile time" that the string literal "abc" will never be used. So *maybe* a compiler could remove the string literal "abc" from memory when the program is compiled, knowing it will never be used. But I'm not sure if a compiler could do that, or if it would do that, I'm just speculating for fun. :-) Maybe I need to test this out and see what happens hahaha...
So yeah... I did a little test right now. :-) And at least with gcc on my MacOS machine, there is no optimization like that... the string literal will stay in the compiled executable even if it is never used at all.
@@PortfolioCourses Wow, thanks for such a detailed answer! So that's interesting. I'm still very new to C so it makes me wonder why you would ever bother using strings in this manner. The only scenario I can think of would be to display something like on-screen instructions, field labels, or menu items. Stuff that will probably get used more than once, but not likely to change during the program.
@@mkd1964 hii u there...r u still studying
@@herohera3497 I'm not in school, if that's what you mean, but yes. I'm just learning C for no particular reason. I've always found it interesting and challenging and wanted to have a more thorough understanding of how it works.
Good clarification
This is a very interesting video. Help me with something. In your example, s2 is a string literal with a memory location. I'm wondering if it's on the stack or on the heap. If it's on the heap, then I should be able to free it. However, if I try to free it, the compiler outputs that I'm trying to free an unallocated object. I then created a simple function (char * test_function()) that creates and returns a string literal: char *s = "some text" then return s. In main, I call this function (char *s = test_function()) and then try to free s. It compiles and I can print the return value, but free causes a core dump with invalid pointer when I run it. So is it bad practice to create char pointers using this string literal approach? If they are not allocated but if I can treat them like they are allocated, where are they and is there risk of memory leaks doing this?
i know this but they you describ, just perfect!👍
Why when u printed the s2 ,it gave you an abcdef instead of the address of index 0 ,since s2 is a pointer?
Great question! Because I used %s, printf output the string, if I had used %p it would have output the address stored in the pointer instead. :-)
@@PortfolioCourses Ohh I wasn't paying attention,u are right!
Cool glad it’s sorted out for you. :-)
Would have been interesting to run the code through the GodBolt compiler explorer, I think you’d spot some additional changes in the code generated by the compiler(s).
you should use zu format specifier for sizeof as it returns size_t type
For the size of numbers we are dealing with in these simple examples %d will work fine. If we're going to use size_t values for larger numbers, like allocating massive blocks of memory for example, we would want to use %zu (though even with %zu, there are portability issues you can run into with some older compilers...). For anyone curious about learning more check out these tutorials:
size_t type explained: ua-cam.com/video/nBJuP_un20M/v-deo.html
sizeof operator: ua-cam.com/video/2wNc15926X4/v-deo.html
Very nice video! It is interesting that using const char *s2 = "abc" to get a compilation error when trying to do s2[0] = 'X'; does not prevent you from assigning a completely new string literal to s2, like s2 = "xyz";, which leads to neither a compilation error nor a runtime error. So it is a const that allows re-assignment. So a const char * is re-assignable while a const int is not. Why?
Hey, appreciate the content man. What editor do you use? I would also like to know how you effortlessly jump out of autocompleted brackets. Do you use arrow keys or is there a keyboard shortcut?
In this video I'm using Xcode on a Mac. I just use the arrow keys, nothing special in this video. :-)
Very good explanation, Thank you
You’re very welcome Chetan! :-)
MSVC used to allow modifying string literals but then disallowed it. The road to compiler completion can be a long one.
Thank you for sharing that. :-)
Just as a fun fact, I'm almost sure that if in C++ you pass a constant string to a char*, the compiler optimizes and allocates for you.
Best C tutorial!!!
Thank you very much for the positive feedback Eden! :-)
Brilliant explanation thank you so much😘
You’re very welcome, I’m glad you enjoyed it! :-)
Great video explanation!
Let me know if I understood correctly. I can't change the string assigned to a pointer. But if I assign the pointer to a previously declared string, am I able to change it through the pointer? Like: char *ptr = str (declared before).
On x86, string literals are often placed in the CODE segment rather than in DATA. On microcontrollers, string literals are almost certainly placed in ROM.
So there are pretty obvious reasons why you can't modify them.
for example:
char* ptr="Hello";
ptr="Bye";
What happens to "Hello"? Is the memory where it was stored freed? Or do I get a memory leak in some way if I continue to assign different values to ptr? Thank you.
Assuming the compiler doesn't optimize it away, "Hello" will be somewhere in memory. We wouldn't really call it a leak because it's not dynamically allocated memory that we are expected to free(). But it would sort of be like a memory leak in the sense that we have data in memory we aren't using. :-)
Thank you for asking that Question. It has helped me to understand something in his answer.
Can you expand this explanation to include memory leaks? If I declare a string with a pointer but then change what the pointer points to, is this a memory leak? Is this why we use const?
This is so helpful thank you honestly
You’re very welcome Mahmoud! :-)
great vid, but I think "new string" should be 11 long including the terminator?
Thank you for making this video sir🙂
You're very welcome! :-)
The compiler I used 25 years ago wouldn't complain about s2[0].
I have looked a lot and still cannot find the answer to my question:
I know that `char x[5];` (and other variations of the array of characters as a string) will be stored on the stack. Thus, it will be freed at the end of the function. No memory issues to deal with in the code. A pointer to a string, `char* x = "asht"` will go where? Will `"asht"` be freed along with the stack in this pointer case? I read in one place that using a pointer to a string literal makes the string literal accessible to other functions, something like a `static`. Is that the case?
You could use a disassembler.
Which playlist has these types of videos, where you discuss these kinds of generic problems?
4:29 If it gives a runtime error as opposed to being stopped by the compiler, then doesn't that mean that the compiler does allow it? It just gives a seg fault when you run it...
The compiler allows the code to compile because it conforms to the C language standard. But the compiler doesn't allow us to modify the string literal at runtime... theoretically, the compiler could support this and compile the code in such a way that we are allowed to modify the string literal, because the C language standard only says that the behaviour is "undefined".
@@PortfolioCourses ahhh makes sense thank you
@@bsykesbeats You're welcome! 🙂
@@PortfolioCourses It's not the compiler throwing the error, it's basically the OS. IIRC, protected mode prohibits self modifying code, and if it's in the code segment, it's code, even if it is just data. I think real mode used to allow it. I don't think modern processors even support real mode. Self modifying code was common on 8-bit machines.
@@mikechappell4156 That's true and a great point to make. Though what I said in the reply was that the compiler "doesn't allow us to modify the string literal at runtime", and that's also true. The compiler compiles the code in such a way that we can't modify the string literal. But the compiler could compile the code in such a way that we could modify the string literal, and it would still be compliant with the C standard because the behaviour is undefined.
What editor or IDE do you use? It looks neat and very simplified, I want to give it a try too.
Xcode on MacOS
Ty I got my first divine because of you
You're welcome Marilyn! :-) Is a "divine" a type of grade, like getting an A? I've never heard the word used like that before. Congratulations! :-D
This somewhat reminds me of assembly.
Why bother with char* when I can use const char ? Or maybe we can't use the latter? I'll have to check
Awesome video, helped me a lot... New sub!
Thank you Almir, I’m glad the video helped you out, and welcome aboard! :-)
Good video. Subscribed. Cheers
Can you explain this to me: suppose a function returns a char*. When I declare a char* variable in main to store the result of that function, shouldn't it hold a memory address, since the function returns a pointer to a char? When I do this, it just gives me a string. Or is this because of how printf() shows it on screen? Sorry if it's a stupid question, I'm new to C.
Do all array-type variables use stack memory?
char *s2 ="abcd" ; // gets `warning: ISO C++ forbids converting a string constant to ‘char*’ [-Wwrite-strings]`
char const *s2 ="abcd" ; // does not get a warning, using g++ 13.2 -std=c++17
And therein lies the true answer.
Good video. White theme burning my eyes though 😁
So I get that they are different but why are they different? They seem like they should do exactly the same thing to me.
this was worth a sub
I'm glad you liked the video, and welcome aboard! 😀
Oh man. Where is the comparison of the assembly generated by the compiler? :(
I'm watching and following along. You declare that the following creates a character array.
char s1[]="abcdef";
As per your description, it appears that you are forcing the string to break up into an array filled with individual characters. An array of 6 individual characters: ("a","b","c","d","e","f")
I may be misunderstanding this, but that is how it comes across. Is this not what is happening?
I'm assuming all string literals used in the source code are stored globally. When you use a pointer to a string literal, it gives you a pointer to that global memory, and since next time the current function gets called the same memory pointer will be used, the memory must remain constant, otherwise the next call to the function will produce different results. Using a character array copies that memory to the stack, so you can modify your copy as much as you want.
It depends on the linker. Usually string literals will actually be in the read-only text segment with the assembly.
good video. very well done.
Why isn't the type of the string literal const in the first place? Why doesn't it make you cast away the const to assign it to a non-const pointer?
If the code compiles, hasn't the compiler just allowed you to write the memory location? If I'm not mistaken, it's the runtime that gives you the error.
Because it’s undefined behaviour, a compiler creator can do whatever they like. They could have it produce code that allows you to write or not write to the string literal. The error is going to occur at runtime, and in practice the compiler designer is going to almost certainly just let the architecture dictate the behaviour. But because it’s undefined behaviour in the C standard, it’s ultimately up to the compiler, and we shouldn’t make an assumption about what will happen (unless we are ok with our program being less portable).
@@PortfolioCourses Maybe I just misunderstood what you're trying to say at 4:03. The compiler literally just compiled your code and you were allowed to run it. Then on runtime you got the error when you tried to write to memory location s2[0]. And it's not that the compiler (in that specific situation) could not see that you're trying to modify a string literal.
Yeah, the compiler did compile the code, but the C standard says it’s undefined behaviour to write to string literals, so other compilers could (more theoretically speaking) handle writing to string literals as they like, and they would still be valid standard compliant C compilers. That’s my only point really. Virtually all resources online describe undefined behaviours as things that are up to the compiler because that’s a simple and accurate way to put it. There’s definitely more to the story as to why something is made undefined behaviour to begin with and how compilers will handle things in practice.
do you understand that light theme hurts my eyes?
Did you know you can also use s2 like a char array?
I didn't get how s1 is passed as a pointer to printf. It's passed as a value, no?
I want to get into C as an alternative to rust, but I've never broken through the basics yet. Do things that *read* strings implicitly deal with the differences between pointer and literal strings? Rust burned me with all that by having more string/string adjacent variants than I can count on two hands, and explicitly requiring the exact variant every time.
It's going to depend on the function that deals with them. Some functions will read one char (1 byte) at a time until they reach a '\0' (null) character, and some will read a specified number of bytes that you have to supply.
Be very careful of both.
For example, let's say you overwrite the final character of your char array (the terminator). It's still a valid array, but a function expecting a '\0' character will keep reading OR WRITING directly into memory until it hits a '\0' if you pass it the starting address of your array.
amazing my dude
Thank you Albin! :-)
Using the term string in C is confusing. Maybe using char array and const char array is clearer?
I don't know... it may be a bit confusing because "string" is not a type in C, so much as a concept. But like it or not we have "strings" in C and need to refer to them as strings at some point. I think the key thing is to understand that a string is a sequence of characters with a null terminator as the last character, and that we have different ways of storing strings in C.
you are the best dude
Aw thank you very much! :-)
Hi, what keyboard do you have? the sound is very satisfying.
I have an Apple keyboard, I think it's called a "magic" keyboard officially. 🙂
Perfectly explained, thank you!
You’re welcome! :-D
I'm pretty new to C++, but I always get confused with pointers and addresses. Can't a pointer only point to an address? How can it be pointed to a string? Normally I have to add a & before everything.
It can be "pointed to" the first character in the string, i.e. the pointer variable can store the memory address of the first character in a string. And at that point we can say "it points to a string".
When you re-assign the pointer to a new string literal, how can you be sure that the previous one was deallocated? Or it goes to some kind of pool and never gets deallocated?
String literals don’t really get deallocated. Where they are in memory ultimately depends on the compiler, but the compiled program has segments like the data segment where we may expect to see the string literals stored: en.m.wikipedia.org/wiki/Object_file
You can test this yourself with g++ in Linux or in WSL.
g++ -S input.cpp
It will generate an assembly-language file input.s
The example with char *str = "abcd"; str[0] = 'k'; compiles and changes the first character of the string with the Microsoft compiler. Is that normal?
Great question Avetik! :-) Modifying a string literal has “undefined behaviour” officially, this means it *could* work but it’s definitely not guaranteed to work. So it’s “normal” but it’s not something that you should count on always working.
thx mr canada
You’re welcome! :-)
How come s2 is printing the whole string instead of printing the first character, since it's only pointing to the first character?
Great question! :-) When we pass a pointer to a char to printf and we use %s as the placeholder, printf will just keep printing characters from that position in memory onwards until it encounters a null terminator character.
How does this program call? I need this!
Sorry I'm not sure what you're asking Maksim. The code in this video is available here: github.com/portfoliocourses/c-example-code/blob/main/char_array_vs_string_literal_pointer.c. :-)