This is by far the best tutorial I've found for amateur embedded developers. When you actually show the inner workings of the code at the assembly level, it makes so much sense. 10 years on, and it is still relevant. Thanks, Miro.
Fantastic! Each time I re-watch a lesson I find something new that I missed previously. For such concise lessons, they are literally packed with valuable information.
I'm glad that you like the videos. And thank you for noticing and appreciating the details. They are all intended for a deeper understanding of the subject. --MMS
Thank you Miro, for explaining so much in one video - and to just the right depth of knowledge. Until this video, I never really understood the inner workings of how pointers actually function within the ARM Cortex-M architecture. Superb? Absolutely!
I am sorry to say to the rest of the world that today I learnt the purpose of POINTERS. I am not from a programming background, because I am an electrical power engineer, and I only sometimes deal with programming tasks, so this course is really something CLASSIC.
Great to know subtle details from the lecture, like where Cortex-M stores global variables, load/store patterns in RISC, and Cortex-M0 vs. M4 in terms of support for unaligned accesses.
Thank you, Sir! You should publish this as a book, and sell it. Best ever tutorial I've seen, and I used to be a trainer for Nokia, travelling the world. If I could've trained folks like you do.... (0xDEADBEEF indeed ;) )
I recently got hired by a company for a mixed programming position. I had NO experience in embedded systems programming whatsoever and was struggling to catch up. This course single-handedly brought me up to speed with quality and clarity unrivaled by any other YouTube training course (and I have watched quite a few). If you have just found this course, keep going. You have come to the right place. The fact that Miro Samek has posted this course for free is the height of scholastic generosity. I have learned more from this course than from any computer-science college class.
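The operator '++' (pre-increment) and the operator '*' (de-reference) actually have the *same* precedence and group right-to-left, so ++*p_int already means ++(*p_int): it increments the pointed-to value. It is the postfix form that behaves differently: '++' (post-increment) has a higher precedence than '*', so *p_int++ means *(p_int++), which advances the pointer and only reads the old pointee. If the version without parentheses misbehaved, the code most likely differed in some other way, for example by using the postfix form. Please google for "C operator precedence" to find out exactly the precedence rules.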
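A small host-side sketch makes the grouping visible. The helper names are hypothetical, and the array stands in for the lesson's counter variable.

```c
#include <assert.h>

/* ++*p parses as ++(*p): prefix ++ and unary * share one precedence
   level and group right-to-left, so the POINTEE is incremented. */
void pre_increment_pointee(unsigned int *p) {
    ++*p;
}

/* *p++ parses as *(p++): postfix ++ binds tighter than unary *,
   so the POINTER advances and the old pointee is only read. */
unsigned int *read_then_advance(unsigned int *p, unsigned int *out) {
    *out = *p++;
    return p;
}
```

Calling pre_increment_pointee() on an element holding 20 leaves 21 in that element, while read_then_advance() leaves the data untouched and returns a pointer to the next element.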
Superb!!!! The best online Embedded Programming Course I have ever come across!!! Can we expect a full-fledged commercial embedded C project sometime later in the course? This would greatly help new embedded programmers seeking jobs :-) Thanks for your wonderful effort. Awaiting the great lectures....
I am a bit confused as to why your assembly instructions always end up substantially different from mine, even though I started with your Keil project. I thought it was because optimization was off (-O0), but -O1 didn't work when not dealing with the volatile keyword yet. In the previous lesson the instructions were different, and the loop assembly code was not in the order you presented it. In this lesson, I see three instructions (MOVW, MOVT, LDR) to load the counter value from memory instead of just one (LDR.N). I feel like I understand the assembly instructions in my compiled code as well, but I just don't get why it is different. What am I missing here?
You aren't "missing" anything. You just see firsthand how different the compilers can be (IAR vs. armclang in this case). The compilers apparently apply different optimization criteria. For example, the LDR/STR instructions are generally expensive (they take at least 2 CPU cycles but sometimes more). Therefore, it is sometimes faster to load constants into a register with the MOVW/MOVT pair of instructions that build the constant in two steps (each only 1 CPU cycle long). Getting the same constant (address of the "counter" variable in this case) via a PC-relative LDR might be more expensive in both CPU and even in memory (because you need to store the address in a 32-bit word too). --MMS
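The two-step MOVW/MOVT construction can be modeled on the host. This is a hedged sketch: movw() and movt() are hypothetical C stand-ins for the two instructions, and 0x20000000 is assumed to be the address of the counter variable.

```c
#include <assert.h>
#include <stdint.h>

/* Models MOVW Rd,#imm16: writes the low half-word, zeroes the top one. */
uint32_t movw(uint16_t imm16) {
    return (uint32_t)imm16;
}

/* Models MOVT Rd,#imm16: replaces the top half-word, keeps the low one. */
uint32_t movt(uint32_t rd, uint16_t imm16) {
    return (rd & 0x0000FFFFu) | ((uint32_t)imm16 << 16);
}
```

For example, movt(movw(0x0000), 0x2000) builds 0x20000000 in two single-cycle steps, with no literal word stored in code memory.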
@@StateMachineCOM Ah, that explains a lot! I am super happy with your videos by the way, and even happier with this quick reply. Thank you very much! :)
Iftikhar, The first '*' is missing in the definition of GPIO_PORTA_DATA_BITS_R, because it is a pointer. Later you access GPIO_PORTA_DATA_BITS_R as an array, like so: GPIO_PORTF_AHB_DATA_BITS_R[LED_RED] = LED_RED; Please refer to lesson 7, where I explain the close relationship between pointers and arrays. --MMS
That's a very interesting question. The DC32 "instruction" showing at the bottom of the screen at 3:12 is actually an assembler directive that generates a 32-bit data constant, i.e., a const data word mixed into the instruction stream. This word holds the address of the counter variable loaded into R0 by the instruction LDR.N R0,??main_2. This is the so-called PC-relative addressing mode, which I have not explained in any detail. The other view of the same instruction would be LDR.N R0,[PC,#offset], where the immediate #offset is the distance between the LDR.N instruction and the DC32 "instruction". This works because when the LDR.N is executed the PC value is known, for example, 0xa2. Interestingly also, the multiple LDR.N R0,??main_2 instructions shown in the disassembly window correspond to *different* offsets from the PC, so if you look carefully at the binary encodings, they are *different* (even though the disassembler shows the *same* LDR.N R0,??main_2 mnemonic). This is because the #offset from the PC to the DC32 "instruction" is different for each LDR.N instruction. I know it's a bit confusing, so please pause at 3:12 and study the screen carefully. --MMS
Thanks for another great video, Miro. I'm a little confused by the moral of the story where you showcase greater efficiency when using the pointer alias of the counter variable. Isn't it more efficient just because the pointer is a local variable whereas the counter is not? To put it another way, it's counter-intuitive to me that using the counter variable, whose address will never change, is turning into less efficient code which keeps reloading its address into R0, whereas using a pointer that is variable by nature is requiring only one load of the counter address. I would have hoped that the compiler would recognize that first case and load 0x20000000 into R0 only once?
Great Video. Thank you. I have a question. How did an absolute address 0x20000002 get interpreted as [PC, 0xC], and how did 0xDEADBEEF get placed at the address [PC, 0x10] by the compiler? This is at 11:03.
An address by itself (like 0x20000002) is just a constant, which is not "interpreted" as anything. Instead, the machine code "LDR R0,[PC,#0xC]" corresponds to the line of C code "p_int = 0x20000002". You can easily understand it when you realize that the "p_int" pointer variable is allocated to the R0 register. This R0 register is then loaded (LDR) with a constant located in the code memory, at the offset 0xC from the *current* PC (Program Counter). "Current PC" means the PC value when the "LDR R0,[PC,xxx]" instruction is evaluated. This is called "PC-relative" addressing (you can google for this term). The value 0xDEADBEEF is another constant, which is located at offset 0x10 from a different value of the PC when the instruction "LDR R1,[PC,#0x10]" is evaluated. The role of the compiler is to insert the constants into the code space, so that it can subsequently apply the PC-relative addressing to retrieve the constants. You can actually find these constants in the code memory (put the value of the PC in the memory view). --MMS
PC-relative addressing is used to implement intra-segment transfers of control (within the code segment itself in this scenario). In this mode, the effective address is obtained by adding a displacement to the PC.
At 07:46 there is a BLT.N instruction. When I subtract 7(f9) from ca (current PC) I get c3 but the loop goes back to c0, can you please explain why ? Thank you
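The Thumb branch arithmetic can be checked with a small sketch. Two details matter: the PC reads as the branch's own address plus 4, and the 8-bit immediate of a 16-bit conditional branch counts half-words, not bytes. The sketch assumes the BLT.N sits at 0xCA and encodes imm8 = 0xF9 (that is, -7), as the question suggests; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Target of a 16-bit conditional branch (B<cond>.N, Thumb encoding T1). */
uint32_t bcond_n_target(uint32_t insn_addr, uint8_t imm8) {
    /* Sign-extend the 8-bit field: 0x80..0xFF are negative values. */
    int32_t halfwords = (imm8 & 0x80u) ? (int32_t)imm8 - 256 : (int32_t)imm8;
    int32_t offset = halfwords * 2;             /* half-words -> bytes */
    return insn_addr + 4u + (uint32_t)offset;   /* Thumb PC = insn + 4 */
}
```

So the target is 0xCA + 4 - 7*2 = 0xC0, which is where the loop goes back to.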
I am having some issues running the code in the latest version of IAR Embedded Workbench for ARM (8.50.1). I followed all the instructions you gave in the previous video, but the counter variable is not a global variable, and I can't watch the pointer change like it does in your video.
Trying to understand how 0xDEADBEEF was stored: you gave the address 0x20000002, i.e. the fifth nibble (at 11:28) in row 20000000. So, assuming it is 4-byte aligned (from the register size), when you give address ..002 it should store from ..000, right? How did it load DEAD and BEEF at 2000000 and 20000006, respectively? Please help!
+Gaurav Minocha In the memory view you see 4-byte chunks of data that are also aligned at 4-byte boundaries. So, a chunk of data like 0xDEADBEEF that is not aligned at a 4-byte boundary appears separated into two halves: 0xDEAD and 0xBEEF. Perhaps it would also help if you look at the data with different grouping. The IAR memory view allows you to view the data also in 2-byte and 1-byte chunks. Specifically, in the 1-byte chunks you will see all available addresses, because each byte has its own address. I hope this helps.
+Quantum Leaps, LLC Final address-to-data mapping: 20000000->be, 01->ef, 02->(00152000) 06->de, 07->ad. The instruction tries to store 0xdeadbeef at 0x20000002, right? Now, as the processor is 4-byte aligned, it should store the first 2 bytes (0xdead) in the 4 bytes aligned at 0x20000000 and 0xbeef in the 4 bytes aligned at 0x20000004. But the opposite actually happened. I hope I made my point clear. Please help!
+Gaurav Minocha I forgot in which lesson I talked about the **endianness** of the CPU, but I certainly did talk about it. I also mentioned that the ARM Cortex-M CPU is **little endian**, which means that it stores multi-byte quantities like 0xDEADBEEF such that the least-significant byte is at the lowest address. Remembering that we write numbers left-to-right (they are called Arabic numerals after all!), 0xDEADBEEF will appear in memory as: 0xEF, 0xBE, 0xAD, 0xDE. To actually see this, please switch the memory view to 1-byte quantities.
+Quantum Leaps, LLC Thank you. I wasn't using the workbench, I was just going through your videos. Now I understand why you said to use the single-byte view. So, basically, in the first word the first byte, i.e. address 0x20000000, is 15. I was reading it the other way. I will set up the Eclipse GNU system soon :)
Thank you for this amazing explanation. At the end you mentioned that Cortex-M0 would have a problem with it. Why is that? And how does Cortex-M0 figure out the address alignment?
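The byte order can be reproduced on the host with a short sketch. Here an 8-byte array stands in for RAM starting at 0x20000000 (assumed to be zero-filled), and the store at offset 2 mimics writing 0xDEADBEEF to 0x20000002; the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stores a 32-bit value little-endian (least-significant byte first),
   the way a Cortex-M stores a word, starting at any byte offset. */
void store_le32(uint8_t *mem, size_t offset, uint32_t value) {
    for (int i = 0; i < 4; ++i) {
        mem[offset + (size_t)i] = (uint8_t)(value >> (8 * i));
    }
}

/* Reads 4 bytes at `offset` as a little-endian word, i.e. what a
   4-byte-grouped memory view shows for the word at that address. */
uint32_t load_le32(const uint8_t *mem, size_t offset) {
    return (uint32_t)mem[offset]
         | (uint32_t)mem[offset + 1] << 8
         | (uint32_t)mem[offset + 2] << 16
         | (uint32_t)mem[offset + 3] << 24;
}
```

After store_le32(ram, 2, 0xDEADBEEF), reading the two aligned words back gives 0xBEEF0000 and 0x0000DEAD, which is why a 4-byte memory view shows the value split into BEEF and DEAD halves.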
Cortex-M0 would have problems accessing misaligned data items. Specifically, 32-bit words must be aligned at addresses divisible by 4, and 16-bit half-words must be aligned at addresses divisible by 2. The bigger members of the Cortex-M family (M3/M4/M7) don't have such restrictions for ordinary loads and stores, but, if I remember correctly, accessing misaligned data takes an additional CPU cycle or two. --MMS
Great lesson overall! I just struggled with the 0xDEADBEEF example because I could not reproduce it in my Keil µVision IDE. I can see 0xDEADBEEF being loaded into register R0 through a MOVW and MOVT pair. After that, 0xDEADBEEF gets stored into memory at the p_int address, but my counter variable still remains at 21. Am I missing something here?
I just started this tutorial. I switched to Keil after lesson 2 so that I could use the STM dev board, and had the same issue writing 0xDEADBEEF to SRAM. The solution was writing to a 4-byte-aligned address (a word boundary), i.e. instead of writing to 0x20000002, try 0x20000004 or any other multiple of 4 in the SRAM memory block. I suspect this is because of different compiler settings between IAR and Keil🤔
By the way, Miro, why can't we write ++(*p_int) as ++*p_int? I did so, and the program produced nonsense and finished with the cycle counter at around 4000. What exactly happens when we don't put the parentheses?
When switching to Keil and the STM DK for lesson 3, I found I could no longer step through lines of code in main using the debugger. No matter which step button I used, it would run to the end of the program (the return 0; line), even with breakpoints set. The solution was to lower the optimization level: go to Options for Target => C/C++ tab => Optimization level.
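An alignment check boils down to testing the low address bits. This is a sketch of the rule stated above rather than a model of the Cortex-M0 hardware; the helper names are hypothetical, and the memcpy-based store is a general C technique for writing to possibly misaligned locations, not something taken from the lesson.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* An address is naturally aligned for an access of `size` bytes
   (size = 1, 2 or 4) when its low bits are zero: addr % size == 0. */
bool is_aligned(uint32_t addr, uint32_t size) {
    return (addr & (size - 1u)) == 0u;
}

/* Writes a 32-bit value to a possibly misaligned location; memcpy lets
   the compiler choose accesses that are safe for the target. */
void store_u32_unaligned(void *dst, uint32_t value) {
    memcpy(dst, &value, sizeof value);
}
```

With this rule, 0x20000002 is fine for a 16-bit half-word but not for a 32-bit word, whereas 0x20000004 is aligned for both.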
In all cases like "my project works differently than yours..." the only sane advice is to compare and see for yourself what's different. Specifically, you should: (1) download the project for this lesson from the companion web-page to this course at www.state-machine.com/video-course , (2) build and run that project on your computer and verify that it behaves as in the video, and (3) compare your project to the official one. A good free differencing tool is WinMerge (please google for it). The tool can compare whole directories. --MMS
Hi. First of all, I would like to say a big thank you for such a good course. I especially like your assembly explanations. There is something I can't figure out. Could you say something more about LDR.N, please? (Time: about 3:20) Your disassembly uses labels; mine shows something else. My disassembly says: 0xec: 0x4802 LDR.N R0, [PC, #0x8]. I assume we have to add the offset to the PC. However, if I add 0x8 to the current PC, I land at address 0xf4, where there is 0x2000 MOVS R0, #0. When I look closely, I can see the value 0x20000000 at address 0xf8. Is my calculation wrong?
+Smalera Michal The calculation of PC-based addresses is tricky, because it must also include the effect of the instruction pipeline. (The instruction pipeline was introduced in the previous lesson 2.) The point is that by the time the PC-based offset is evaluated, the PC itself has already been incremented in an earlier stage of the pipeline. Obviously, the compiler knows such things.
PC (Program Counter). The Program Counter is automatically incremented by the size of the instruction executed. This size is always 4 bytes in ARM state and 2 bytes in THUMB mode. When a branch instruction is being executed, the PC holds the destination address. During execution, PC stores the address of the current instruction plus 8 (two ARM instructions) in ARM state, and the current instruction plus 4 (two Thumb instructions) in Thumb(v1) state. This is different from x86 where PC always points to the next instruction to be executed. Source: azeria-labs.com/arm-data-types-and-registers-part-2/ Note, however, that Cortex-M processors execute only Thumb instructions (LDR.N is a 16-bit Thumb instruction), so in your case the Thumb rule applies: the PC reads as the address of the current instruction plus 4. For a PC-relative load, that value is additionally rounded down to a multiple of 4 before the offset is added: 0xEC + 4 = 0xF0 (already a multiple of 4), and 0xF0 + 0x8 = 0xF8, which is exactly where you found the constant 0x20000000. The Thumb state is indicated by the T bit in the EPSR, which is always 1 on Cortex-M; the N bit in the APSR is the Negative condition flag and does not indicate the instruction set.
Hi Miro! Please provide some clarification on the point that "the counter variable lives in R0" (what happens if there are more local variables than CPU registers?). Let me clarify what I understood: the program runs in privileged mode, so it uses the main stack, right? If so, the counter with local scope must be on the stack; making it global moves the variable off the stack but still into RAM (or maybe Flash). So how is a variable on the stack mapped to a CPU register?
There is NO requirement that a local variable is in RAM, and the C compiler is allowed to allocate such variables the way it sees fit. It turns out that keeping the 'counter' variable inside a CPU register is optimal because it avoids memory access. (Every memory access costs at least 2 CPU cycles, and you need an LDR instruction to read and an STR instruction to write back, which adds up to at least 4 additional cycles for incrementing 'counter'.) If there are more variables than registers, the compiler must start using the RAM, but again, there is no strict requirement that the local variables be in certain memory locations. The main point here is that the allocation and management of local variables is left to the compiler, and that's precisely why some compilers generate faster code than others. --MMS
In your code example you use the address 0x20000002U. I tried changing the address to 0x40000002, and it didn't work. I ran it using a simulator. Are pointers limited in what they can access, for protection?
The simulator applies address ranges that are typical for Cortex-M microcontrollers. Addresses starting from 0x20000000 are allocated to RAM. Addresses starting from 0x40000000 are typically used for peripherals (memory-mapped device registers), not for general-purpose RAM. Please see the Cortex-M memory model: developer.arm.com/documentation/dui0552/a/the-cortex-m3-processor/memory-model
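The same rule, written as a small C sketch (the function name is hypothetical; the addresses come from the question above):

```c
#include <assert.h>
#include <stdint.h>

/* Effective address of a Thumb PC-relative load, LDR Rt,[PC,#imm].
   On Cortex-M the PC reads as (instruction address + 4); for this
   literal load it is also rounded down to a multiple of 4. */
uint32_t ldr_literal_addr(uint32_t insn_addr, uint32_t imm) {
    uint32_t pc = (insn_addr + 4u) & ~3u;   /* Align(PC, 4) */
    return pc + imm;
}
```

ldr_literal_addr(0xEC, 0x8) yields 0xF8. A load at 0xEE with the same #0x8 would reach the same literal, which shows why the rounding step matters.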
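A rough classifier for the default Cortex-M memory map shows where the two addresses from the question fall. The region boundaries follow the ARMv7-M (Cortex-M3/M4) memory model linked above; the function name is hypothetical, and a real chip or simulator populates only parts of each region.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Maps an address to its region in the default Cortex-M memory map. */
const char *cortex_m_region(uint32_t addr) {
    if (addr < 0x20000000u) return "Code";
    if (addr < 0x40000000u) return "SRAM";
    if (addr < 0x60000000u) return "Peripheral";
    if (addr < 0xA0000000u) return "External RAM";
    if (addr < 0xE0000000u) return "External device";
    if (addr < 0xE0100000u) return "Private peripheral bus";
    return "Vendor-specific";
}
```

0x20000002 lands in "SRAM", while 0x40000002 lands in "Peripheral", where writes go to device registers (if any exist at that address) rather than to ordinary memory.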
So, a question: I'm using the Keil IDE/simulator, which as far as I know has a different compiler than the IAR IDE/simulator, which would of course result in different assembly code being generated. Something slightly strange I'm seeing, however, occurs when incrementing the counter variable, both directly and via the p_int pointer. This is what it looks like in the disassembly:

    0x000004DC DC05      BGT      0x000004EA
    0x000004DE E7FF      B        0x000004E0
         6:     ++(*p_int);
         7: }
         8:
    0x000004E0 9900      LDR      r1,[sp,#0x00]
    0x000004E2 6808      LDR      r0,[r1,#0x00]
    0x000004E4 3001      ADDS     r0,r0,#0x01
    0x000004E6 6008      STR      r0,[r1,#0x00]
    0x000004E8 E7F5      B        0x000004D6
    0x000004EA 2002      MOVS     r0,#0x02
    0x000004EC F2C20000  MOVT     r0,#0x2000

Now, when single-stepping with the debugger, we should reach that first load instruction into r1, then keep stepping through the instructions until we reach the branch that sends us back to the beginning of the loop at 0x000004D6. What's actually happening, however, is that when I press F11 to single-step through the instructions, it appears to stay at that first load instruction, yet the counter is indeed getting incremented, as I can see the value going up in the register window. A similar thing appeared to happen when I first tried stepping to see R0 get set to 0xDEADBEEF, but that part at least appears to be working now. I'm not quite sure why this is happening, except that maybe it's some sort of optimization by the debugger: it knows that the loop is just incrementing "counter" and therefore doesn't bother displaying the rest of the loop instructions.
This behavior of the KEIL uVision debugger is most likely caused by stepping one C-instruction at a time, whereas you apparently wish to step one assembly-instruction at a time. To do this, please click on the disassembly window first, and then single-step. I think that the F11 shortcut should work for this, but you might also try to click on the single-step button in the toolbar. --MMS
Can someone tell me what exactly IAR hides from us when we download our programs to the device? Where do our programs go? Is there a bootloader?
Hi Sam, I am unable to compile because of the error that pops up: ***Error stlink cannot locate mcu..... It works fine if I export some source files via the STM Cube software. I am confused about what I am doing wrong. Please advise.
The GCC compiler does not flag the line p_int=0x20000002; as an error; it only generates a warning. What is your compiler, and which is better: an error or a warning? Thanks.
That's true. The GCC compiler, even with the -Wall command-line option, gives only a warning (int-conversion). You can turn all such warnings into errors by giving the command-line option (-Werror). Alternatively, you can turn only specific warnings into errors (-Werror=int-conversion). I would recommend going as strict as possible (so convert warnings into errors as much as you can tolerate). --MMS
Why did the pointer make it faster?! As far as I can follow: in the code without the pointer, the memory address was loaded into register R0, exactly as in the code with the pointer. The p_int became an alias for register R0, so how can this make the program faster?
To access a variable in memory, the CPU needs the address of this variable in one of the registers. At the lowest levels of code optimization, the compiler loads this address from the code memory before each and every access to the variable. The pointer speeds this up because, being a local variable inside the main() function, it is allocated to a register. This means that the address sits in a register (R0 in this case) and does not need to be loaded and re-loaded into a register each time. At higher levels of optimization, the compiler generates more sensible code, and the code without the pointer is as fast as the code with the pointer. --MMS
Hello, I have a problem when I'm trying to download to the microcontroller. Please help! Wed May 21, 2014 22:59:51: Missing or malformed flash loader specification file: C:\Program Files\IAR Systems\Embedded Workbench 7.0\arm\config\flashloader\
Roman Matveev, I'm not sure why your system cannot find the flash loader. The flash loader for the TivaC LaunchPad is located in $TOOLKIT_DIR$\config\flashloader\TexasInstruments\FlashTC4_H6_o.board . The $TOOLKIT_DIR$ is the symbolic name of the directory where you have installed your IAR EWARM toolset. Please check if you have this file and how it looks. Mine looks as follows: $TOOLKIT_DIR$\config\flashloader\TexasInstruments\FlashTC4_H6.flash --MMS
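A minimal sketch of the explicit-cast form, assuming an unsigned int pointer named p_int as in the lesson (the wrapper function is hypothetical). The pointer is only formed here, never dereferenced, because the address is valid only on the target microcontroller.

```c
#include <assert.h>
#include <stdint.h>

unsigned int *make_ptr(void) {
    /* `p_int = 0x20000002U;` without a cast assigns an integer to a
       pointer, which GCC reports under -Wint-conversion. The explicit
       cast states the intent. Going through uintptr_t also avoids a
       size-mismatch warning when this is compiled on a 64-bit host. */
    unsigned int *p_int = (unsigned int *)(uintptr_t)0x20000002U;
    return p_int;
}
```

With a command line such as `gcc -Wall -Werror=int-conversion`, the uncast assignment is rejected while this version compiles cleanly.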
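The two variants being compared can be sketched in C as follows. This is an illustrative reconstruction under assumptions, not the lesson's exact code: the global is named counter, the loop limit is 21, and the function names are hypothetical. Both behave identically; only the generated code can differ.

```c
#include <assert.h>

int counter = 0;   /* global: lives at a fixed RAM address */

/* Variant 1: every access names the global. At low optimization the
   compiler may reload the address of `counter` before each access. */
void count_direct(void) {
    counter = 0;
    while (counter < 21) {
        ++counter;
    }
}

/* Variant 2: a local pointer holds the address, so the compiler can
   keep it in a register for the whole loop. */
void count_via_pointer(void) {
    int *p_int = &counter;
    *p_int = 0;
    while (*p_int < 21) {
        ++(*p_int);
    }
}
```

Comparing the assembly of the two functions, for example with `gcc -O0 -S`, is one way to observe the difference; the exact instructions depend on the compiler and target.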
Quantum Leaps, LLC Thank you for your help, Miro! First of all I need to make clear that I don't use TI - I use STM32F401C-DISCO demo board. However I don't think that it is so important in this case. So! I tried to locate the directory you've specified (C:\Program Files (x86)\IAR Systems\Embedded Workbench 6.5\arm\src\flashloader\ST\FlashSTM32F4xx in my case) but there is nothing like *.board file. There is one (FlashSTM32F4xxx.flash) which is also XML: $TOOLKIT_DIR$\config\flashloader\ST\FlashSTM32F4xxx.out 2 4 0x4000 1 0x10000 7 0x20000 0x08000000 $TOOLKIT_DIR$\config\flashloader\ST\FlashSTM32F4xxx.mac 1 But it is not really similar to the one you exposed in your reply. So do I need to find this (*.board) file anywhere?
The LDR instruction comes in two variants. The 32-bit variant (meaning that the instruction itself takes 32 bits) is denoted LDR. The 16-bit variant (meaning that the instruction itself takes only 16 bits) is denoted LDR.N. The compiler will generate the shorter LDR.N when certain restrictions on the registers used (and on the offset) are met. You can also google for "ldr vs ldr.n".
The [PC,#offset] addressing mode is the special "PC-relative" addressing (based on the Program Counter register). It is used for accessing constants in the program memory. Please note that the PC is *changing* as the instruction is executed; therefore, the immediate needs to take the instruction pipeline into account. The .N suffix stands for "narrow": LDR.N is the 16-bit encoding, which has a smaller offset range than the 32-bit (wide) LDR, but takes only half the code space. --MMS
Hello sir! I have downloaded this software (about 1.2 GB), but at the end the installation doesn't complete: a prompt says that a file is corrupted, and the installation stops. What should I do now?
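The narrow-encoding constraints can be sketched as a small check. This is a hedged illustration that considers only the literal (PC-relative) form of LDR.N; the function name is hypothetical and the addresses below are illustrative. It also shows why several LDR.N instructions that read the same DC32 constant encode different offsets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Tries to encode the 16-bit "narrow" LDR Rt,[PC,#imm] (LDR.N).
   Constraints of this encoding: Rt must be R0..R7, the literal must lie
   at or after Align(PC,4) = (insn_addr + 4) rounded down to 4, and the
   offset must be a multiple of 4 in the range 0..1020.
   On success, the offset is stored in *imm. */
bool ldr_n_literal_offset(unsigned rt, uint32_t insn_addr,
                          uint32_t literal_addr, uint32_t *imm) {
    uint32_t base = (insn_addr + 4u) & ~3u;
    if (rt > 7u || literal_addr < base) return false;
    uint32_t off = literal_addr - base;
    if ((off & 3u) != 0u || off > 1020u) return false;
    *imm = off;
    return true;
}
```

Two loads of the same constant at 0xF8, issued from 0xEC and from 0xE0, need offsets 0x8 and 0x14, so their binary encodings differ even if a disassembler prints the same label for both.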
Perhaps you should contact IAR to help you with the installation of their Embedded Workbench for ARM software. If you cannot resolve these issues, in lesson 19, this course switches the toolset to the free Eclipse-based Code Composer Studio (CCS). Please watch lesson 19 to see how to install CCS and use it for this course. --MMS
I'm having an issue with downloading to the Tiva-C LaunchPad. All of the projects used to work, but now I'm finding that only my first one works. I keep getting a vast number of verification errors in the debug log. Every message (there are 200+ of them) looks like this: Verify error at address 0x000000E6, target byte: 0xFF, byte in file: 0xF8. There's an error for virtually every address; the target byte is always 0xFF, but the byte in file is always different (presumably the code). How can I fix this? Thank you!
Please visit the companion web-page to that video course at www.state-machine.com/quickstart (I must have repeated this a few hundred times in almost every comment!) Among others, you'll find there the guide "Troubleshooting TivaC LaunchPad". In this guide, you will find a detailed description of how to unlock your TivaC LaunchPad board. Good luck! --MMS
I am kind of afraid of using pointers, because I am pretty sure I'll fuck up my whole PC by using them. Pointers incremented in for or while loops are dangerous...
Is the compiler intelligent? At 2:47, this horrible code is simply killing me: R0 already holds the variable in a register, BUT the code loads the pointer to this variable a second time and reloads R0 with THE SAME VALUE. This is horrible. After

    STR R0, [R1]      // our variable

why isn't there just "CMP R0, #21"??? Instead, there are these 2 unnecessary instructions before the compare:

    LDR R0, ??main_2
    LDR R0, [R0]
    CMP R0, #21
I agree, the constant re-loading of the addresses (that already *are* in the registers) is truly ugly. But this ugly code is generated only in the lowest levels of optimization. When you move to the higher levels, the code is much better. Unfortunately, the code is more difficult to debug, and often the debuggers cannot display the values of variables (because they have been "optimized away"). In other words, there is a tradeoff between code quality and ease of debugging. --MMS
Are you using the same software (IAR) for learning, or another one?
@@hiriharanvm5568 I use Keil, everything is pretty much the same with minor and intuitive changes
The best embedded tutorial I've found. You do our profession a great credit.
This is the best embedded systems programming course I have taken. Thank you Miro Samek.
This is awesome. I have literally heard thousands of explanations of what RISC is; this is the first time I see what it means in PRACTICE.
Thank you man, your style of teaching and effort is priceless
Thanks for your time and effort! It's very useful for everyone who wants to learn embedded! Great job, Mr. Miro Samek.
Amazing course on ARM programming! Thank you very much!!!!
Thank you very much, after that video I finally understood what pointers are. BTW this tutorial series is fantastic.
Dude, your lessons are amazing. You're really good. :)
very well and patiently explained. kudos
Thanks for the very useful tutorial.
Epic... great, no words. Keep making videos; you've won a subscriber.
Best channel EVER.
At 3:25, does anyone know what the DC32 instruction means and what it does?
This is excellent!
Man, you are doing a great job. Keep going!
Do you know a good beginner's book where all of the technical pieces are explained?
Thanks for another great video, Miro. I'm a little confused by the moral of the story where you showcase greater efficiency when using the pointer alias of the counter variable. Isn't it more efficient just because the pointer is a local variable whereas the counter is not?
To put it another way, it's counter-intuitive to me that using the counter variable, whose address will never change, turns into less efficient code that keeps reloading its address into R0, whereas using a pointer that is variable by nature requires only one load of the counter address. I would have hoped that the compiler would recognize the first case and load 0x20000000 into R0 only once.
Great video. Thank you. I have a question. How did an absolute address 0x20000002 get interpreted as [PC, 0xC], and how did 0xDEADBEEF get placed at the address [PC, 0x10] by the compiler? This is at 11:03.
An address by itself (like 0x20000002) is just a constant, which is not "interpreted" as anything. Instead, the machine code "LDR R0,[PC,#0xC]" corresponds to the line of C code "p_int = 0x20000002". You can easily understand it when you realize that the "p_int" pointer variable is allocated to the R0 register. This R0 register is then loaded (LDR) with a constant located in the code memory, at the offset 0xC from the *current* PC (Program Counter). "Current PC" means the PC value when the "LDR R0,[PC,xxx]" instruction is evaluated. This is called "PC-relative" addressing (you can google for this term). The value 0xDEADBEEF is another constant, which is located at offset 0x10 from a different value of the PC when the instruction "LDR R1,[PC,#0x10]" is evaluated. The role of the compiler is to insert the constants into the code space, so that it can subsequently apply PC-relative addressing to retrieve them. You can actually find these constants in the code memory (put the value of the PC in the memory view). --MMS
PC-relative addressing is used to implement intra-segment transfers of control (here, within the code segment itself). In this mode, the effective address is obtained by adding a displacement to the PC.
Congratulation. Very very clear.
At 07:46 there is a BLT.N instruction. When I subtract 7 (0xF9) from 0xCA (the current PC), I get 0xC3, but the loop goes back to 0xC0. Can you please explain why? Thank you
I am having some issues while running the code on the latest version of the IAR Embedded Workbench for ARM (8.50.1). I followed all the instructions you gave in the previous video, but the counter variable is not a global variable, and I can't watch the pointer change as it does in your video.
Trying to understand how 0xDEADBEEF was stored: you gave the address 0x20000002, i.e. the fifth nibble (at 11:28) in row 0x20000000. So, assuming it is 4-byte aligned (from the register size), when you give address ...002 it should store from ...000, right? How did it load DEAD and BEEF at 0x20000000 and 0x20000006, respectively? Please help!
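A short C sketch of how the target of a 16-bit conditional branch is computed, assuming the BLT.N sits at address 0xCA with encoding 0xDBF9 (0xF9 being the byte mentioned in the question). Two details explain the gap between 0xC3 and 0xC0: the 8-bit offset counts half-words (so 0xF9 = -7 means -14 bytes), and the PC reads as the instruction address plus 4. The helper name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: branch target of a 16-bit conditional branch
 * (B<cond>.N, Thumb encoding T1: 0b1101 cond:4 imm8:8) at instr_addr. */
uint32_t bcond_t1_target(uint32_t instr_addr, uint16_t hw)
{
    int32_t imm8 = (int32_t)(hw & 0xFFu);
    if (imm8 & 0x80) {
        imm8 -= 0x100;                 /* sign-extend the 8-bit field */
    }
    /* PC reads as instr_addr + 4; the offset is in half-words (x2 bytes).
     * Unsigned wrap-around handles negative offsets correctly. */
    return instr_addr + 4u + (uint32_t)(imm8 * 2);
}
```

With these values, 0xCA + 4 - 14 = 0xC0, which matches where the loop actually goes back to.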
+Gaurav Minocha In the memory view you see 4-byte chunks of data that are also aligned at 4-byte boundaries. So, a chunk of data like 0xDEADBEEF that is not aligned at a 4-byte boundary appears split into two halves: 0xDEAD and 0xBEEF. Perhaps it would also help to look at the data with a different grouping. The IAR memory view allows you to view the data also in 2-byte and 1-byte chunks. Specifically, with 1-byte chunks you will see all available addresses, because each byte has its own address. I hope this helps.
+Quantum Leaps, LLC Finally, the address-to-data mapping: 20000000->be, 01->ef, 02->(00152000), 06->de, 07->ad. The instruction tries to store 0xdeadbeef at 0x20000002, right? Now, as the processor is 4-byte aligned, it should store the first 2 bytes (0xdead) in the 4 bytes aligned at 0x20000000 and 0xbeef in the 4 bytes aligned at 0x20000004. But the opposite actually happened. I hope I made my point clear. Please help!
+Gaurav Minocha I forget in which lesson I talked about the **endianness** of the CPU, but I certainly did talk about it. I also mentioned that the ARM Cortex-M CPU is **little endian**, which means that it stores multi-byte quantities like 0xDEADBEEF such that the least-significant byte is at the lowest address. Remembering that we write numbers left-to-right (they are called Arabic numerals after all!), 0xDEADBEEF will appear in memory as: 0xEF, 0xBE, 0xAD, 0xDE. To actually see this, please switch the memory view to 1-byte quantities.
+Quantum Leaps, LLC Thank you. I wasn't using the workbench, I was just going through your videos. Now I understand why you said to look at the single-byte view. So, basically, in the first word the first byte (i.e., at address 0x20000000) is 15. I was reading it the other way. I will set up the Eclipse GNU system soon :)
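The little-endian layout can be reproduced on a PC with a plain byte array standing in for the SRAM at 0x20000000 (an assumption for illustration; the helper names are hypothetical). Storing 0xDEADBEEF at offset 2 models the lesson's store to 0x20000002, and reading back aligned 32-bit words models the 4-byte memory view.

```c
#include <stdint.h>

/* Writes a 32-bit value the way a little-endian CPU such as Cortex-M does:
 * least-significant byte at the lowest address. */
void store_le32(uint8_t *mem, uint32_t offset, uint32_t value)
{
    for (unsigned i = 0u; i < 4u; ++i) {
        mem[offset + i] = (uint8_t)(value >> (8u * i));
    }
}

/* Reads a 32-bit little-endian word starting at mem[offset]. */
uint32_t load_le32(const uint8_t *mem, uint32_t offset)
{
    return (uint32_t)mem[offset]
         | ((uint32_t)mem[offset + 1u] << 8)
         | ((uint32_t)mem[offset + 2u] << 16)
         | ((uint32_t)mem[offset + 3u] << 24);
}
```

Starting from zeroed memory, the bytes become 00 00 EF BE AD DE 00 00. In a 4-byte view this shows up as 0xBEEF0000 at 0x20000000 and 0x0000DEAD at 0x20000004, which matches the "split" discussed above.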
excellent sir!
Thank you for this amazing explanation. At the end you mentioned that Cortex-M0 would have a problem with it. Why is that? And how does the Cortex-M0 figure out the address alignment?
Cortex-M0 would have problems accessing misaligned data items. Specifically, 32-bit words must be aligned at addresses divisible by 4, and 16-bit half-words must be aligned at addresses divisible by 2; a misaligned word or half-word access on Cortex-M0 triggers a HardFault. The bigger members of the Cortex-M family (M3/M4/M7) don't have such restrictions, but if I remember correctly, misaligned data takes an additional CPU cycle or two to access. --MMS
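The alignment rule above boils down to a simple check. The helper below is a hypothetical sketch (not part of CMSIS): an address is suitable for a size-byte access when it is a multiple of size, and because size is a power of two this can be tested with a bit mask.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if addr is a multiple of size (size must be 1, 2 or 4).
 * On Cortex-M0, a word or half-word access failing this test faults. */
bool is_naturally_aligned(uint32_t addr, uint32_t size)
{
    return (addr & (size - 1u)) == 0u;  /* works because size is a power of 2 */
}
```

For example, 0x20000002 passes for a half-word but fails for a word, so writing 0xDEADBEEF there works on Cortex-M3/M4 but not on Cortex-M0.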
Great lesson overall! I just struggled with the deadbeef example because I could not reproduce it in my Keil uVision IDE. I can see deadbeef being written into register r0 through a MOVW and MOVT operation. After that, DEADBEEF gets stored into memory at the p_int address, but my counter variable still remains at 21. Am I missing something here?
I just started this tutorial. I switched to Keil after lesson 2 so that I could use the STM dev board, and I had the same issue writing DEADBEEF to SRAM. The solution was writing to the beginning of a 4-byte word, i.e. instead of writing to 0x20000002, try changing the address to 0x20000004 or any other multiple of 4 in the SRAM memory block. I suspect this is because of the different compiler settings between IAR and Keil🤔
By the way, Miro, why can't we type ++(*p_int) as ++*p_int? I did so, and the program produced nonsense and finished with the cycle counter at about 4000. What exactly happens when we don't put the parentheses?
When switching to Keil and the STM DK for lesson 3, I found I could no longer step through lines of code in main using debug. No matter which step button I used, it would run to the end of the program, to the return 0; line, even with breakpoints set. The solution was to lower the optimization level: go to Options for Target => C/C++ tab => Optimization level.
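In C, unary operators group right-to-left, so ++*p_int parses exactly as ++(*p_int). The thread does not say what went wrong in that experiment; one common cause (an assumption here) is writing the postfix form *p_int++, which parses as *(p_int++) and moves the pointer instead of incrementing the value. A small sketch with hypothetical function names:

```c
/* ++*p parses as ++(*p): both increment the int that p points to. */
int bump_prefix(int *p)        { return ++*p;   }
int bump_parenthesized(int *p) { return ++(*p); }

/* *p++ parses as *(p++): it reads *p, then advances the pointer itself.
 * The pointed-to value is left unchanged. */
int *advance_postfix(int *p)
{
    (void)*p++;
    return p;
}
```

If a loop used the postfix form on p_int, the pointer would walk through memory instead of counting, which could easily produce "nonsense".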
In all cases like "my project works differently than yours..." the only sane advice is to compare and see for yourself what's different. Specifically, you should: (1) download the project for this lesson from the companion web-page to this course at www.state-machine.com/video-course , (2) build and run that project on your computer and verify that it behaves as in the video, and (3) compare your project to the official one. A good free differencing tool is WinMerge (please google for it). The tool can compare whole directories. --MMS
Hi. First of all, I would like to say a big thank you for such a good course. I especially like your assembly explanations.
There is something I found I can't figure out. Would you say something more about LDR.N, please? (Time, about 3:20)
Your disassembly uses labels; mine shows something else.
My disassembly says:
0xec: 0x4802 LDR.N R0, [PC, #0x8]
I assume we have to add the offset to the PC. But if I add 0x8 to the current PC, I land at address 0xf4,
which holds 0x2000 MOVS R0, #0.
When I look closely, I can see the value 0x20000000 at address 0xf8. Is my calculation wrong?
+Smalera Michal The calculation of PC-based addresses is tricky, because it must also include the effect of the instruction pipeline. (The instruction pipeline was introduced in the previous lesson 2.) The point is that by the time the PC-based offset is evaluated, the PC itself has already been incremented in an earlier stage of the pipeline. Obviously, the compiler knows such things.
PC (Program Counter). The Program Counter is automatically incremented by the size of the instruction executed. This size is always 4 bytes in ARM state and 2 bytes in THUMB mode. When a branch instruction is being executed, the PC holds the destination address. During execution, PC stores the address of the current instruction plus 8 (two ARM instructions) in ARM state, and the current instruction plus 4 (two Thumb instructions) in Thumb(v1) state. This is different from x86 where PC always points to the next instruction to be executed.
Source: azeria-labs.com/arm-data-types-and-registers-part-2/
On Cortex-M you are always in Thumb state (Cortex-M does not support ARM state at all), and in Thumb state the PC reads as the address of the current instruction plus 4. For LDR (literal), that value is also rounded down to a multiple of 4. So you have to add 4 bytes to the PC and then add #0x8 (PC + 4 bytes + #0x8) to get to the address where the address of counter (0x20000000) is stored: 0xec + 4 + 0x8 = 0xf8.
The execution state is indicated by the T bit in the EPSR (Execution Program Status Register), which must always be 1 on Cortex-M. The N bit in the APSR is the Negative condition flag, not an indicator of Thumb mode.
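The arithmetic above can be checked with a short C sketch (the helper name is hypothetical). It assumes the Thumb LDR (literal) rule from the ARMv7-M architecture: the PC is read as the instruction address plus 4, rounded down to a multiple of 4, and then the immediate offset is added.

```c
#include <stdint.h>

/* Address of the literal read by a Thumb "LDR Rt,[PC,#imm]" at instr_addr:
 * PC reads as instr_addr + 4 and is word-aligned (Align(PC, 4)) before
 * the immediate offset is added. */
uint32_t ldr_literal_address(uint32_t instr_addr, uint32_t imm)
{
    uint32_t pc = instr_addr + 4u;
    return (pc & ~3u) + imm;
}
```

With the values from the question, 0xec + 4 = 0xf0, and 0xf0 + 0x8 = 0xf8, which is exactly where the 0x20000000 constant was found. The rounding matters for instructions at addresses like 0xee, which also resolve to 0xf8 with the same offset.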
Hi Miro!
Please provide some clarification on the point that "the counter variable lives in R0" (what happens if there are more local variables than CPU registers?). Let me clarify what I understood here: the program runs in privileged mode, thus it uses the main stack, right? If so, the counter with local scope must be on the stack, and making it global moves the variable off the stack but still within RAM (or maybe in Flash). So how is a variable on the stack mapped to a CPU register?
There is NO requirement that a local variable is in RAM, and the C compiler is allowed to allocate such variables the way it sees fit. It turns out that keeping the 'counter' variable inside a CPU register is optimal because it avoids memory access. (Every memory access costs at least 2 CPU cycles, and you need an LDR instruction to read and an STR instruction to write back, which adds up to at least 4 additional cycles for incrementing 'counter'.) If there are more variables than registers, the compiler must start using the RAM, but again, there is no strict requirement that the local variables be in certain memory locations. The main point here is that allocation and management of local variables is left to the compiler, and that's precisely why some compilers generate faster code than others. --MMS
In your code example you use the address 0x20000002U, I tried changing the address to 0x40000002 and it didn't work. I ran it using a simulator. Are pointers limited to what they can access for protection?
The simulator applies address ranges that are typical for Cortex-M microcontrollers. Addresses starting from 0x20000000 are allocated to SRAM. Addresses starting from 0x40000000 are used for on-chip peripherals, so what happens there depends on which peripherals the simulator models. Please see the Cortex-M memory model: developer.arm.com/documentation/dui0552/a/the-cortex-m3-processor/memory-model
So, a question: I'm using the Keil IDE/simulator, which, as far as I know, has a different compiler than the IAR IDE/simulator, which would, of course, result in different assembly code being generated.
Something slightly strange I'm seeing, however, occurs when incrementing the counter variable, both directly and via the p_int pointer. This is what it looks like in the disassembly:
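The fixed regions of the Cortex-M memory map (ARMv7-M) can be summarized in a few comparisons. This is a sketch with a hypothetical function name; which memories or peripherals actually exist inside each region is chip-specific, and a simulator may model only some of them.

```c
#include <stdint.h>

/* Rough classification of an address into the architectural Cortex-M
 * (ARMv7-M) memory regions. Region contents are vendor-specific. */
const char *cortex_m_region(uint32_t addr)
{
    if (addr < 0x20000000u) return "Code";            /* usually Flash */
    if (addr < 0x40000000u) return "SRAM";
    if (addr < 0x60000000u) return "Peripheral";      /* on-chip peripherals */
    if (addr < 0xA0000000u) return "External RAM";
    if (addr < 0xE0000000u) return "External device";
    return "System";  /* Private Peripheral Bus (NVIC, SysTick, ...) and vendor area */
}
```

By this map, 0x20000002 falls in SRAM, while 0x40000002 falls in the peripheral region, so writing to it only has a visible effect if the simulated chip has a peripheral register there.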
0x000004DC DC05 BGT 0x000004EA
0x000004DE E7FF B 0x000004E0
6: ++(*p_int);
7: }
8:
0x000004E0 9900 LDR r1,[sp,#0x00]
0x000004E2 6808 LDR r0,[r1,#0x00]
0x000004E4 3001 ADDS r0,r0,#0x01
0x000004E6 6008 STR r0,[r1,#0x00]
0x000004E8 E7F5 B 0x000004D6
0x000004EA 2002 MOVS r0,#0x02
0x000004EC F2C20000 MOVT r0,#0x2000
Now, when single-stepping with the debugger, we should reach that first load instruction into r1, and then we should be able to keep stepping through the instructions until we reach the branch that sends us back to the beginning of the loop at 0x000004D6. What actually happens, however, is that when I press F11 to single-step through the instructions, it appears to stay at that first load instruction, but the counter is indeed getting incremented, as I can see the value going up in the register window. Something similar appeared to happen when I first tried stepping to see R0 get set to 0xDEADBEEF, but that part at least appears to be working now. I'm not quite sure why this is happening, except that maybe it's some sort of optimization by the debugger: it knows that it's just incrementing "counter" and therefore doesn't feel the need to display the rest of the loop instructions.
This behavior of the KEIL uVision debugger is most likely caused by stepping one C-instruction at a time, whereas you apparently wish to step one assembly-instruction at a time. To do this, please click on the disassembly window first, and then single-step. I think that the F11 shortcut should work for this, but you might also try to click on the single-step button in the toolbar. --MMS
Can someone tell me what exactly IAR hides from us when we download our programs to the device? Where do our programs go? Is there a bootloader?
Hi Sam,
I am unable to compile due to the error that pops up: ***Error stlink cannot locate mcu..... It works fine if I export some source files via the STM Cube software. I am confused about what I am doing wrong. Please advise.
The GCC compiler does not flag the line p_int=0x20000002; as an error; it only generates a warning. What is your compiler, and which is better: giving an error or a warning?
thanks
That's true. The GCC compiler, even with the -Wall command-line option, gives only a warning (int-conversion). You can turn all such warnings into errors with the command-line option -Werror. Alternatively, you can turn only specific warnings into errors (-Werror=int-conversion). I would recommend going as strict as possible (so convert as many warnings into errors as you can tolerate). --MMS
Why did the pointer make it faster?!
As far as I can follow:
- in the code without the pointer, the memory address has been assigned to register R0,
exactly like what happened in the code with the pointer;
- p_int became an alias for register R0. How can this help make the program faster?
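A minimal sketch of the warning-free form, assuming p_int is declared as int * as in the lesson (the macro and function names are hypothetical). The explicit cast documents that the integer is meant as an address, so the code also compiles with -Werror=int-conversion.

```c
#include <stdint.h>

#define TEST_ADDR 0x20000002U   /* hypothetical name for the lesson's address */

/* The explicit (int *) cast avoids GCC's int-conversion warning,
 * which an implicit integer-to-pointer assignment would trigger. */
int *make_test_pointer(void)
{
    int *p_int = (int *)TEST_ADDR;
    return p_int;               /* only dereference this on the target MCU! */
}
```

Writing p_int = 0x20000002U; without the cast is what GCC reports under int-conversion; the cast keeps the intent visible while still letting you build with strict warning-to-error settings.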
To access a variable in memory, the CPU needs the address of this variable in one of its registers. At the lowest levels of code optimization, the compiler loads this address from the code memory before each and every access to the variable. The pointer speeds this up because, being a local variable inside the main() function, it is allocated to a register. This means that the address sits in a register (R0 in this case) and does not need to be loaded and re-loaded each time. At higher levels of optimization the compiler generates more sensible code, and the code without the pointer is as fast as the code with the pointer. --MMS
@@StateMachineCOM That clears up a lot! Thanks a lot. I asked this question on Stack Overflow and it got jumbled :D
Hello, I have a problem when I'm trying to download to the microcontroller. Please help!
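A host-runnable C sketch of the two loop styles compared in the lesson, assuming a global counter as in the video (the function names are hypothetical). Both compute the same result; the speed difference discussed above comes only from how often the compiler reloads &counter at low optimization levels, which this code cannot show by itself. Comparing the disassembly of the two functions at different optimization levels is the way to see it.

```c
int counter = 0;   /* global, as in the lesson */

/* Direct access: at low optimization, &counter may be reloaded
 * from a literal pool before every access. */
int run_via_global(void)
{
    counter = 0;
    while (counter < 21) {
        ++counter;
    }
    return counter;
}

/* Access through a local pointer: the compiler can keep &counter
 * in a register for the whole loop. */
int run_via_pointer(void)
{
    int *p_int = &counter;
    *p_int = 0;
    while (*p_int < 21) {
        ++(*p_int);
    }
    return *p_int;
}
```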
Wed May 21, 2014 22:59:51: Missing or malformed flash loader specification file: C:\Program Files\IAR Systems\Embedded Workbench 7.0\arm\config\flashloader\
numele meueste I have the same shit :(
Roman Matveev
I'm not sure why your system cannot find the flash loader. The flash loader for the TivaC LaunchPad is located in $TOOLKIT_DIR$\config\flashloader\TexasInstruments\FlashTC4_H6_o.board . The $TOOLKIT_DIR$ is the symbolic name of the directory where you have installed your IAR EWARM toolset. Please check if you have this file and how it looks. Mine looks as follows:
$TOOLKIT_DIR$\config\flashloader\TexasInstruments\FlashTC4_H6.flash
--MMS
Quantum Leaps, LLC Thank you for your help, Miro! First of all, I need to make clear that I don't use TI; I use the STM32F401C-DISCO demo board. However, I don't think that this is so important in this case.
So! I tried to locate the directory you've specified (C:\Program Files (x86)\IAR Systems\Embedded Workbench 6.5\arm\src\flashloader\ST\FlashSTM32F4xx in my case) but there is nothing like *.board file.
There is one (FlashSTM32F4xxx.flash) which is also XML:
$TOOLKIT_DIR$\config\flashloader\ST\FlashSTM32F4xxx.out
2
4 0x4000
1 0x10000
7 0x20000
0x08000000
$TOOLKIT_DIR$\config\flashloader\ST\FlashSTM32F4xxx.mac
1
But it is not really similar to the one you exposed in your reply.
So do I need to find this (*.board) file anywhere?
What is the difference between LDR and LDR.N?
Great video series! TNX!
The LDR instruction comes in two variants. The 32-bit variant (meaning that the instruction itself takes 32 bits) is denoted LDR (or LDR.W, "wide"). The 16-bit variant (meaning that the instruction itself takes only 16 bits) is denoted LDR.N ("narrow"). The compiler generates the shorter LDR.N when certain restrictions on the registers and offsets used are met. You can also google for "ldr vs ldr.n".
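How a Thumb-2 CPU (or a disassembler) tells a 16-bit instruction from a 32-bit one can be sketched in a few lines of C (the function name is hypothetical). Per the ARMv7-M encoding rules, a first half-word whose bits [15:11] are 0b11101, 0b11110 or 0b11111 starts a 32-bit instruction; anything else is a complete 16-bit instruction.

```c
#include <stdint.h>

/* Size in bytes of the Thumb instruction whose first half-word is hw. */
unsigned thumb_instr_size(uint16_t hw)
{
    unsigned top5 = (unsigned)(hw >> 11);
    return (top5 == 0x1Du || top5 == 0x1Eu || top5 == 0x1Fu) ? 4u : 2u;
}
```

For example, 0x4802 (LDR.N R0,[PC,#0x8]) and 0xE7FF (a 16-bit B) are 2 bytes, while 0xF8DF (the first half of a wide LDR with a PC-relative address) and 0xF2C2 (the first half of the MOVT in the Keil listing above) start 4-byte instructions.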
Nice
Thank you
thank you
0x104: 0x4802 LDR.N R0,[PC,#0x8]
How do I interpret this addressing mode?
What is the difference between LDR and LDR.N?
The [PC,#imm] addressing mode is the special "PC-relative" addressing (based on the Program Counter register). It is used for accessing constants in the program memory. Please note that the PC is *changing* as the instruction is executed, therefore the immediate needs to take into account the instruction pipeline. The LDR.N opcode is the "narrow" (16-bit) version, which has a smaller range than LDR, but has some other capabilities. --MMS
hello sir!!!
I have downloaded this software (about 1.2 GB), but at the end the installation doesn't complete: a prompt says that a file is corrupted and the installation stops.
What should I do now?
Perhaps you should contact IAR to help you with the installation of their Embedded Workbench for ARM software. If you cannot resolve these issues, in lesson 19, this course switches the toolset to the free Eclipse-based Code Composer Studio (CCS). Please watch lesson 19 to see how to install CCS and use it for this course.
--MMS
Quantum Leaps, LLC
thanks a lot sir
I'm having an issue with downloading to the Tiva-C LaunchPad. All of the projects used to work, but now only my first one works. I keep getting a vast number of verification errors in the debug log. Every message (there are 200+ of them) looks like this:
Verify error at address 0x000000E6, target byte: 0xFF, byte in file: 0xF8
There's an error for virtually every address; the target byte is always 0xFF, but the byte in the file is always different (presumably it's the code). How can I fix this?
Thank you!
Please visit the companion web-page to that video course at www.state-machine.com/quickstart (I must have repeated this a few hundred times in almost every comment!) Among others, you'll find there the guide "Troubleshooting TivaC LaunchPad". In this guide, you will find a detailed description of how to unlock your TivaC LaunchPad board. Good luck! --MMS
What is it that causes the N flag to reset to zero when the control flow is INSIDE the loop?
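A sketch of how the flags behave around such a loop. CMP Rn,#imm computes Rn - imm, discards the result and sets N, Z, C, V; BLT ("signed less than") branches when N != V. Inside the loop, a flag-setting increment such as the ADDS shown in the Keil listing above also updates N, and since the incremented counter is positive, N becomes 0 there. The struct and function names are hypothetical; the flag formulas follow the usual ARM subtraction rules.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool n, z, c, v; } Flags;

/* Flags set by "CMP Rn, #imm", i.e. by computing Rn - imm. */
Flags cmp_flags(int32_t rn, int32_t imm)
{
    uint32_t a = (uint32_t)rn, b = (uint32_t)imm;
    uint32_t r = a - b;
    Flags f;
    f.n = (r >> 31) != 0u;                    /* result negative */
    f.z = (r == 0u);                          /* result zero */
    f.c = (a >= b);                           /* no borrow (unsigned compare) */
    f.v = (((a ^ b) & (a ^ r)) >> 31) != 0u;  /* signed overflow */
    return f;
}

/* BLT is taken when N != V (signed less than). */
bool blt_taken(Flags f) { return f.n != f.v; }
```

With counter = 20, CMP against 21 gives N = 1 and V = 0, so BLT loops again; with counter = 21 the result is zero, N = 0, and the loop exits.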
I am kind of afraid of using pointers... because I am pretty sure I'll fuck up my whole PC by using them. Pointers incremented in for or while loops are dangerous...
I am Ranjithkumar
Is the compiler intelligent? At 2:47, this horrible code is simply killing me: R0 already holds the variable in a register, BUT the code loads the pointer to this variable a second time and reloads R0 with THE SAME VALUE. This is horrible.
STR R0, [R1] //our variable
Why isn't there just "CMP R0, #21" HERE, instead of these 2 unnecessary instructions below???
LDR R0, ??main_2
LDR R0, [R0]
CMP R0, #21
I agree, the constant re-loading of the addresses (that already *are* in the registers) is truly ugly. But this ugly code is generated only in the lowest levels of optimization. When you move to the higher levels, the code is much better. Unfortunately, the code is more difficult to debug, and often the debuggers cannot display the values of variables (because they have been "optimized away"). In other words, there is a tradeoff between code quality and ease of debugging. --MMS
Hi sir
2015 passed out batch
0xDEADBEEF
i didnt understand a shit🔥