Gary, thank you for this content. I’m a retired hobbyist who discovered the existence of FPGA boards between Christmas and New Year’s Eve. Rather daunting, but this story might help me a bit. I do know assembly, and the Pi Pico is rather accessible to me. I’ll be following your channel for sure, and I will also look into what you’ve produced so far. Thanks a lot, very helpful and motivating! 👍😊
Yet another excellent production. Please keep producing these types of educational videos. Good job, professor!
This kind of thing is amazing! It really saves main CPU clock cycles, rather than having to service high-frequency interrupts.
You've produced a great video demo, thanks.
Some NXP chips (I work with MPC5534 and MPC5777C) have a similar I/O coprocessor they call an eTPU. It concentrates more on timing. It has two independent timers, one of which can run at a dynamically variable speed. Each I/O (I think there are 32 per eTPU; the 5534 has one eTPU and the 5777C has three) has two capture/compare registers and can trigger interrupts.
You can code the ISRs in a weird dialect of C or a different weird dialect of C++, but you may have to pay for a compiler.
Each I/O can be assigned RAM, as well as global eTPU RAM. Any RAM declared as public can be accessed by the CPU(s) as well as the eTPU.
It's amazing what can be done with it, but it can be hard to ensure proper coherency when you start doing complex things.
The truly genius move by the Raspberry Pi Foundation: FPGA-like I/O features without complicated dev tools, and amazing flexibility for interfacing.
Great introduction. This is the most accessible explanation of PIOs I've seen.
Glad you think so!
Took me back to my 6502 days in the mid-'80s, thanks 🙂
The PIO is one of the things that make the Pico so great compared to other systems in this area, thanks for this video! I quickly hit a problem that probably could not have been solved properly on other systems, but after taking a serious look at the PIO yesterday, my judgement is that it is solvable with minimal effort compared to, for instance, an Arduino!
Many thanks for this video. I was trying to program the PIOs using Rust and could not get it to work. This really helped me sort out my thinking mistakes and get it working (the translation from Python to Rust was not so difficult).
I also wasted a lot of time on the difference between OUT and SET pin mappings - pity I didn’t watch this beforehand.
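For anyone else tripped up by the same thing, here is a minimal plain-Python model of how the two mappings differ, based on the RP2040 datasheet. It is not MicroPython's rp2 API, and the function names are hypothetical. SET writes a 5-bit immediate value, taken from the instruction itself, to up to 5 consecutive pins starting at set_base; OUT writes bits shifted out of the output shift register (OSR) to up to 32 consecutive pins starting at out_base. Wrap-around past GPIO 31 is deliberately not modeled.

```python
from __future__ import annotations


def _map_bits(data: int, base: int, count: int) -> dict[int, int]:
    """Bit 0 of data goes to GPIO `base`, bit 1 to base + 1, and so on."""
    if base < 0 or base + count > 32:
        # Wrap-around past GPIO 31 is not modeled in this sketch.
        raise ValueError("pin range must stay within GPIO 0-31 in this sketch")
    return {base + i: (data >> i) & 1 for i in range(count)}


def set_pins(value: int, set_base: int, set_count: int) -> dict[int, int]:
    """SET PINS: the value is a 5-bit immediate inside the instruction itself."""
    if not 0 <= value <= 31:
        raise ValueError("SET data is a 5-bit immediate (0-31)")
    if not 1 <= set_count <= 5:
        raise ValueError("the SET mapping covers at most 5 pins")
    return _map_bits(value, set_base, set_count)


def out_pins(data: int, out_base: int, out_count: int) -> dict[int, int]:
    """OUT PINS: the data is shifted out of the OSR, up to 32 bits at a time."""
    if not 1 <= out_count <= 32:
        raise ValueError("the OUT mapping covers at most 32 pins")
    return _map_bits(data, out_base, out_count)
```

For example, set_pins(0b101, 16, 3) drives GPIO16 and GPIO18 high and GPIO17 low, while out_pins(0xF0, 0, 8) puts the upper four bits of that byte on GPIO4 to GPIO7. Anything wider than 5 bits, or anything that changes at run time, has to go through OUT.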
The last minute of this video really saved me! Thank you.
Thank you Gary. You've made understandable what every other website and channel has carefully avoided. I'm going to have a play and see how it goes.
Thank you Gary for keeping us educated! We love your work!! Cheers
He explained PIO well in basically the first 40 seconds. Awesome.
Good video Gary, but with all the time delays in Python you kind of missed the point of the PIO for new users.
The key is that once started, the PIO routines run separately in the background and do not affect the performance of the main Pico processors so, for example, a PIO could be reading serial data with very accurate timing, while the main processors run complex mathematics on the data...
True, I thought I explained that. Sorry if it wasn't clear.
I think you missed the point, not the creator
Exactly what I needed to get my state machine working the way I wanted. Thanks!
You really do have a knack for explaining things, thanks very much.
Glad you think so!
If you can get ahold of one, the Teensy 4 and 4.1 have a souped-up version of the PIO called FlexIO. That would make a very interesting episode, and it can all be done within the Arduino IDE.
Great explanations and great capabilities on this chip, AND with MicroPython!
Exactly what I was looking for !!!!! Great work !!!! Keep up the good work 🙂
Do we really need any more channels besides yours and Explaining Computers?
Don't be silly. Ben Eater, Mitxela, BigClive?
Wonderful, interesting and easy to follow - thank you
Very well explained. Thanks
That's some seriously good bit-banging pico capabilities!
Thank you. Very good video on such a great feature.
Great tutorial!
One thing: jmp(dec_x, ...) does not decrement X before testing it, but *after*! So, if X is 1, the jump will occur (and X will be decremented).
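To make the post-decrement behaviour concrete, here is a small plain-Python model (the helper name is hypothetical, not part of any Pico library) of the common loop set(x, n) / label("loop") / ... / jmp(x_dec, "loop"). Because the RP2040 tests X before decrementing it, a loop started with X = n runs n + 1 times.

```python
def loop_passes(x_init: int) -> int:
    """Count passes through a loop that ends in jmp(x_dec, "loop").

    Models the RP2040 JMP condition "X non-zero, prior to decrement":
    the test looks at X *before* it is decremented, so a loop started
    with X = n runs n + 1 times. X's value after the final fall-through
    is not modeled here.
    """
    if not 0 <= x_init <= 31:
        raise ValueError("set(x, n) takes a 5-bit immediate (0-31)")
    x = x_init
    passes = 0
    while True:
        passes += 1          # the loop body (and the jmp) execute once more
        if x == 0:           # X was zero before the decrement: fall through
            return passes
        x -= 1               # X was non-zero: jump taken, X decremented
```

So loop_passes(1) is 2, as described above, and a program that loads 31 into X gets 32 passes. This is one reason the number 31 keeps turning up.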
Thanks for this Gary. Very educational!
Challenge: have a drink whenever you encounter 31
But back to getting serious, I like the incremental program examples. I don't have a Pico, though I was thinking of getting one, and this might be helpful.
Thanx for the explanation 👍
Async matters. Good stuff.
It seems that we could make a nice interface to a classic old-school CPU bus, like the C64's, maybe as an active CPU emulation or as a bus interface emulating an I/O chip or the SID chip.
I have to dig into that a bit more closely and explore the possibilities here.
Writing to the program counter is super powerful for emulating a CPU.
Thanks Gary! The inclusion of the PIO in the Pico is revolutionary. I'm hoping later iterations give us more than the 32-instruction memory that the four state machines in each PIO block share! While having these state machines is great, more instruction space would make this truly wonderful!
Not ragging on the Pico at all - but interfaces like these have been available on many MCUs for years. For example the Teensy 4 has it, many STM chips have it, etc. That said, I am glad to see it becoming more mainstream through the Pico, it's a powerful combo between MCUs and FPGAs that I've been anticipating for years.
Hopefully they will release another version of the Pico in a BGA package with a lot more IO pins. And native USB 2.0. Then it would be useful for something more sophisticated than hobby projects.
@@yum33333 I am familiar with the Teensy 4 and 4.1, but the 3x FlexIO interfaces (loosely PIO equivalents) on those are quite different in philosophy. The Pico's PIO instructions all execute within 1 cycle and have multiple functions going on at the same time in the same instruction. For example, writing out of the Output Shift Register (OUT instruction) can additionally shift the bits at the same time as well as adding delays (if desired). Each PIO instruction is highly efficient in both code size and execution, and can run at the same frequency as the ARM core itself. The PIO state machines effectively allow certain limited parallel operations to occur, often faster than even a dedicated ARM core itself could. On the other hand, Teensy's FlexIO is limited to 120MHz, which on a 600MHz Teensy 4.1 makes it less useful, as it is really only intended to offload deterministic bit-banging operations from the ARM core. It does not execute instructions faster than the ARM core itself. This difference in the Pico's PIO philosophy has made possible projects like a logic analyzer using the Pico, operating with 21 digital channels at up to 120 MHz capture speed (hackaday.com/2022/03/02/need-a-logic-analyzer-use-your-pico/). A 120MHz logic analyzer run on a humble device costing just $4 is quite an achievement and illustrative of the difference in PIO philosophy.
@@yum33333 Which STM chips have PIO functionality like this? Got an example part number? I came across something similar in the LPC4300 many years ago but not so far met it in STM.
@Mark Warburton "revolutionary"...? HAHAHA. Parallax Propeller, two-fucking-thousand-six. en.wikipedia.org/wiki/Parallax_Propeller
@@AttilaAsztalos That Parallax Propeller is an eight core MCU that is at least an order of magnitude more expensive, even now, compared to the RP2040. The reason is it has 8 fully functional cores that are more similar to the dual ARM cores in the Pico than to the PIO. The Pico's "revolutionary" approach is to offer multiple PIO state machines running at the same frequency as the ARM cores and taking up very little die space to achieve that, so keeping costs down.
A really insightful tutorial. Thank you, Gary. BTW, is the 8xPIO programming only available on the Pico?
On any board using the RP2040
The RP2040 is starting to sound like the old CDC Cyber 7x and 17x series "mainframes", with their single register PPUs used for all I/O!
This is very exciting
8:45 should be your opening lines
@16:40 Don't we need the last nop() to have a [30] delay to get a perfect 50% duty cycle? A jump takes one CPU clock, so the last nop should delay one cycle less.
I don't think a wrap is actually a jump. At the end of the code, the PC for the next instruction is set to the wrap target rather than PC+1.
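A quick way to settle this kind of question is to count cycles. The plain-Python sketch below assumes a hypothetical blink program modeled on the common MicroPython blink example, not necessarily the exact code shown at 16:40. Every instruction, including an explicit jmp, costs 1 cycle plus its [delay], while a .wrap() back to wrap_target() is not an instruction and costs nothing, as the RP2040 datasheet describes it. The ("set_high", d) / ("set_low", d) entries are shorthand for set(pins, 1) [d] / set(pins, 0) [d].

```python
from __future__ import annotations


def phase_cycles(program: list[tuple[str, int]]) -> tuple[int, int]:
    """Return (high_cycles, low_cycles) for one pass of a blink loop.

    Every instruction, including an explicit jmp, costs 1 + delay cycles.
    A .wrap() back to wrap_target() is not an instruction and costs nothing,
    so it does not appear in the list at all. Cycles are charged to the
    level the pin has while each instruction (and its delay) runs.
    """
    level = 0  # assume the previous pass of the loop ended in the low phase
    totals = {0: 0, 1: 0}
    for mnemonic, delay in program:
        if not 0 <= delay <= 31:
            raise ValueError("delay is at most 31 with no side-set configured")
        if mnemonic == "set_high":
            level = 1
        elif mnemonic == "set_low":
            level = 0
        totals[level] += 1 + delay
    return totals[1], totals[0]


# Closed with .wrap(): nothing extra at the end of the low phase.
WRAPPED = [("set_high", 31)] + [("nop", 31)] * 4 + [("set_low", 31)] + [("nop", 31)] * 4
# Closed with an explicit jmp back to the start instead.
JUMPED = WRAPPED + [("jmp", 0)]
# Same, with the last nop shortened to [30] to pay for the jmp.
JUMPED_TRIMMED = WRAPPED[:-1] + [("nop", 30), ("jmp", 0)]

if __name__ == "__main__":
    for name, prog in [("wrap", WRAPPED), ("jmp", JUMPED), ("jmp, nop [30]", JUMPED_TRIMMED)]:
        print(name, phase_cycles(prog))
```

With .wrap(), the two phases already come out equal at 160 cycles each, which is the point of the reply above. The [30] adjustment only matters if the loop is closed with a real jmp, which adds a cycle to the low phase.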
I think you confused the rx and tx queues in your bouncer program. You said "rx is output" and "tx is input" (at around 20:02). That startled me for a moment ;-)
Well of course rx and tx are relative to which side you are on. But they do have defined names, so sorry if I confused them.
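For reference, the RP2040 names are defined from the main processor's point of view. The TX FIFO carries data from the CPU to the state machine (sm.put() in MicroPython, pull() in PIO code), and the RX FIFO carries data from the state machine back to the CPU (push() in PIO code, sm.get() in MicroPython). The plain-Python class below is only a sketch of that data flow for a "bouncer" that pulls a word and pushes it straight back. The class and method names are hypothetical. Each FIFO is assumed to be 4 words deep, which is the RP2040 depth when the FIFOs are not joined. The blocking behaviour noted in the error messages is my understanding of MicroPython's put()/get(), not something this model reproduces.

```python
from collections import deque


class StateMachineFifos:
    """Hypothetical model of one state machine's TX and RX FIFOs.

    TX: main processor -> state machine (sm.put() in MicroPython, pull() in PIO code).
    RX: state machine -> main processor (push() in PIO code, sm.get() in MicroPython).
    """

    DEPTH = 4  # words per FIFO when TX and RX are not joined

    def __init__(self) -> None:
        self.tx = deque()
        self.rx = deque()

    # Main-processor side
    def cpu_put(self, word: int) -> None:
        if len(self.tx) >= self.DEPTH:
            raise OverflowError("TX FIFO full; a blocking sm.put() would wait here")
        self.tx.append(word & 0xFFFFFFFF)

    def cpu_get(self) -> int:
        if not self.rx:
            raise LookupError("RX FIFO empty; a blocking sm.get() would wait here")
        return self.rx.popleft()

    # State-machine side
    def sm_pull(self) -> int:
        if not self.tx:
            raise LookupError("TX FIFO empty; a blocking pull() would stall here")
        return self.tx.popleft()

    def sm_push(self, word: int) -> None:
        if len(self.rx) >= self.DEPTH:
            raise OverflowError("RX FIFO full; a blocking push() would stall here")
        self.rx.append(word)

    def bounce_once(self) -> None:
        """One pass of a 'bouncer': pull a word from TX, push it to RX."""
        self.sm_push(self.sm_pull())
```

Usage: after fifos.cpu_put(5) and fifos.bounce_once(), fifos.cpu_get() returns 5. The data goes out through TX and comes back through RX.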
@@GaryExplains Please tell us about your other channel, Speed Test G?
I'm thinking about patterns that allow for sending multiple buffers over different high-frequency (30MHz) SPI lines in parallel. You'd still have to feed individual words to each fifo in the main program, right? i.e. each PIO unit doesn't have access to RAM, so you'd need to keep feeding them.
Section 3.6.1 of the RP2040 datasheet shows you how to do full duplex SPI. Also section 3.2.7 covers Interactions Between State Machines, where it says, "State machines can not communicate data, but they can synchronise with one another by using the IRQ flags."
@@GaryExplains Oh, neat. Thanks for the reference.
@@trevorschrock8259 Also, it is possible to configure DMA to feed (empty) the FIFO and, again, offload the main processor.
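To put rough numbers on why DMA helps here, the plain-Python arithmetic below (the helper name is hypothetical) assumes that each SPI line shifts out one bit per bit clock and that each FIFO entry is a 32-bit word. It gives the word rate each state machine consumes and how long a full 4-word TX FIFO lasts before it runs dry.

```python
from __future__ import annotations


def fifo_budget(bit_rate_hz: float, word_bits: int = 32, fifo_depth: int = 4) -> tuple[float, float]:
    """Return (words_per_second, seconds_until_a_full_fifo_drains).

    Assumes one bit leaves per bit clock and every FIFO entry is word_bits wide.
    """
    words_per_second = bit_rate_hz / word_bits
    drain_time_s = fifo_depth * word_bits / bit_rate_hz
    return words_per_second, drain_time_s


if __name__ == "__main__":
    rate, drain = fifo_budget(30e6)
    print(f"{rate:,.0f} words/s per line, full FIFO drains in {drain * 1e6:.2f} us")
```

At 30 MHz, that works out to 937,500 words per second per line and roughly 4.3 microseconds of slack per full FIFO. Joining the TX and RX FIFOs into a single 8-deep TX FIFO only doubles that slack, so servicing several lines from the CPU by hand is tight, and DMA is the natural fit.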
Excellent! Thank you
Gary, I have a question: since the I/O processors (state machines) are independent of the main CPU, can you run a separate program on the CPU as long as it does not interfere with the I/O?
Yes
@@GaryExplains Great, Thanks for the quick response.
So 4 of them for a video signal generator?
When will you make a video on 'Speed Test G'?
Reading through the data sheet is a roller coaster:
"Ok only 9 instructions."
"holy crow each instruction can do like 5 things"
"ok I think I got these"
"oh INJECTING instructions from the main processor???"
"oh look there are 9 ways to trigger/dma/pull/irq things in and out"
"oh good 8 interrupts, 4 external, giant enum of configs..."
Cool video. I would like to ask whether it is possible to access these I/O controllers using ASM, like old-school PEEK and POKE, from MMBasic instead of Python?
I don't know if it is supported by mmbasic, I guess it would be best to ask the mmbasic people.
Yeah, blinking LEDs for your drone, like the red and green navigation lights. Programmed at a higher bit rate, it could be used for communication, although you'd better know the gate rate.
Can one state machine change multiple pins? I'm looking at the naming (pins vs. pin) and at set_base, which implies that you can use more pins starting from that value. That would be useful for some programs, like a UART loopback, or more complex protocols that require precise timing on multiple lines.
Yes, out_base defines the start pin, so if, say, GPIO16 is set as the base, higher-numbered pins can be used too. Also, look at the .side() option on instructions, which can be used to control additional pins.
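One trade-off worth knowing about with .side(): on the RP2040, side-set values and the [delay] share the same 5-bit field in each instruction. Every side-set pin you configure therefore takes a bit away from the maximum delay, and an optional side-set costs one more bit for its enable flag. The small plain-Python helper below (hypothetical name) just does that arithmetic.

```python
def max_delay(sideset_pins: int, optional: bool = False) -> int:
    """Largest [delay] usable when `sideset_pins` side-set pins are configured.

    The delay/side-set field is 5 bits wide. Side-set takes the top bits,
    plus one extra enable bit if side-set is optional; delay gets the rest.
    """
    if optional and sideset_pins == 0:
        raise ValueError("optional side-set needs at least one pin")
    bits_for_sideset = sideset_pins + (1 if optional else 0)
    if not 0 <= bits_for_sideset <= 5:
        raise ValueError("side-set (including the enable bit) can use at most 5 bits")
    return (1 << (5 - bits_for_sideset)) - 1
```

With no side-set you get the full [31]. With one side-set pin the limit drops to [15], and with two pins, or one optional pin, it drops to [7].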
The oscilloscope in the background image: is it from the Soviet C1 series?
Sadly I have no idea, it is just a stock photo.
Gary, why is your Speed Test G channel dead?
16:19 If the blinking time (sleep) is different, is it possible to stop the PIO in a state where the LED stays on?
That is a good question. Looking at the source code it seems that the activate() function just sets a bit in the CTRL register. The documentation says, "When disabled, a state machine will cease executing instructions" which implies that it stops on the next cycle (I guess). So I think that means that you can stop it with the LED on.
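If that reading of the datasheet is right, and the pin simply keeps the level the program last drove, then the question becomes which phase the program is in when you call active(0). The plain-Python sketch below uses a hypothetical helper. It assumes a loop that is high for high_cycles and then low for low_cycles, starting at the beginning of the high phase, and it ignores the time the CPU itself takes to reach the stop.

```python
def level_when_stopped(stop_after_s: float, sm_freq_hz: float,
                       high_cycles: int, low_cycles: int) -> int:
    """Pin level (1 = LED on) if the state machine is disabled after stop_after_s.

    Assumes the program starts at the beginning of its high phase and that
    the pin keeps its last driven value once the state machine stops.
    """
    cycles = int(stop_after_s * sm_freq_hz)
    position = cycles % (high_cycles + low_cycles)
    return 1 if position < high_cycles else 0
```

For a 160-cycle-high, 160-cycle-low loop at a state-machine frequency of 2000 Hz, stopping after 0.05 s lands in the high phase (LED on), while stopping after 0.1 s lands in the low phase.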
Hmmm.. A video, using the pico, to _play_ that _outro music_ , Yes? :)
I tried to use this, but there are only 4, not 8. I wanted a simple program that would monitor 8 inputs and switch on outputs until the inputs were triggered again. As they are not numbered sequentially, this is unfortunately not possible, so I went back to using uasyncio on the main core. If there were 8, I could have dedicated one to each input/output pair.
There are 8. Here is a direct quote from the tech specs: 8 × Programmable I/O (PIO) state machines for custom peripheral support.
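The eight are split across two PIO blocks of four state machines each. Each block has its own 32-instruction memory, shared by its four state machines. In MicroPython, rp2.StateMachine IDs 0 to 3 are on PIO0 and 4 to 7 are on PIO1. The plain-Python helper below (hypothetical name) just spells out that mapping.

```python
from __future__ import annotations

PIO_BLOCKS = 2
SMS_PER_BLOCK = 4
INSTRUCTIONS_PER_BLOCK = 32  # shared by the four state machines in a block


def locate(sm_id: int) -> tuple[int, int]:
    """Return (pio_block, index_within_block) for a global state machine id."""
    if not 0 <= sm_id < PIO_BLOCKS * SMS_PER_BLOCK:
        raise ValueError("RP2040 state machine ids run from 0 to 7")
    return divmod(sm_id, SMS_PER_BLOCK)
```

So one state machine per input/output pair is possible in principle. The caveat is that the programs loaded into each block have to fit in that block's 32 instructions between them, or the four state machines can share a single program.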
Bring back speed test G
That sounds a bit like the Parallax Propeller chip.
Hello, quick question: I was under the impression that the PIO could access multiple pins. Is this incorrect?
Yes, the PIO can access multiple pins.
Is there a sketch that uses the shift register, or one that lets users program it?
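The output shift register (OSR) is what the out() instruction works on. As a rough illustration, here is a plain-Python model of how out() takes bits from the OSR in each shift direction. The class is hypothetical and is not MicroPython's API. Shifting right takes the least-significant bits first; shifting left takes the most-significant bits first.

```python
class OutputShiftRegister:
    """Minimal model of a 32-bit PIO OSR feeding out() instructions."""

    def __init__(self, word: int, shift_right: bool = True) -> None:
        self.value = word & 0xFFFFFFFF
        self.shift_right = shift_right

    def out(self, bit_count: int) -> int:
        """Shift bit_count bits out of the OSR and return them."""
        if not 1 <= bit_count <= 32:
            raise ValueError("out() moves 1 to 32 bits")
        mask = (1 << bit_count) - 1
        if self.shift_right:
            data = self.value & mask                          # LSBs leave first
            self.value >>= bit_count
        else:
            data = (self.value >> (32 - bit_count)) & mask   # MSBs leave first
            self.value = (self.value << bit_count) & 0xFFFFFFFF
        return data
```

For example, shifting 0b1011 right one bit at a time yields 1, 1, 0, 1, and taking 8 bits at a time from 0x12345678 yields 0x78 and then 0x56.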
Is a state machine created when you define a function?
Dear me, the overly loud music is very distracting; it keeps making me reach over to turn it down. Anyway, thanks for the code.
ISR means Interrupt Service Routine on just about any processor with interrupts. Why the @#%% would they have something else called ISR in this thing??
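In PIO terms, ISR stands for input shift register, the counterpart of the output shift register. The in_() instruction shifts bits into it, and push() (or autopush) moves its contents into the RX FIFO. Interrupts are handled separately, through the PIO's IRQ flags. The plain-Python sketch below is a hypothetical class that shows how bits accumulate in each shift direction; autopush is modeled only as a simple bit-count threshold.

```python
class InputShiftRegister:
    """Minimal model of a 32-bit PIO ISR with optional autopush."""

    def __init__(self, shift_right: bool = False, push_threshold: int = 32) -> None:
        self.value = 0
        self.count = 0                  # bits shifted in since the last push
        self.shift_right = shift_right
        self.push_threshold = push_threshold
        self.rx_fifo = []               # words pushed towards the CPU

    def in_(self, data: int, bit_count: int, autopush: bool = False) -> None:
        if not 1 <= bit_count <= 32:
            raise ValueError("in_() moves 1 to 32 bits")
        data &= (1 << bit_count) - 1
        if self.shift_right:
            # New bits enter at the most-significant end.
            self.value = (self.value >> bit_count) | (data << (32 - bit_count))
        else:
            # New bits enter at the least-significant end.
            self.value = ((self.value << bit_count) | data) & 0xFFFFFFFF
        self.count = min(self.count + bit_count, 32)
        if autopush and self.count >= self.push_threshold:
            self.push()

    def push(self) -> None:
        """Move the ISR contents into the RX FIFO and clear the ISR."""
        self.rx_fifo.append(self.value)
        self.value = 0
        self.count = 0
```

Shifting left, the bits 1, 0, 1, 1 end up as 0b1011. With an 8-bit autopush threshold, a single in_(0xAB, 8, autopush=True) pushes 0xAB straight into the RX FIFO.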
We need to understand programmable I/O better, please.
if you squint, it's almost like you've got 8 lil 6502s :)
Wow, just like the computer game TIS-100.
The BeagleBone Black has two that can be programmed in C.
The Pico can be programmed in C as well. So with the Pico you get the option of Python or C and it costs just $4.
@@GaryExplains cool
@@GaryExplains So can the microcontrollers be programmed in C, or only the main processor?
The Cortex-M0+ processors (there are two of them) can be programmed in C or Python. The PIO have their own special assembly language as I explain in this video. That special language can be loaded into the PIOs from C or Python.
@@GaryExplains Two ARM processors plus another 8 micros. The 8 are assembly and the two are C or Python.
I saw the video on dual-core programming. Are there shared resources?
NOP - No Operation
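One detail worth adding: the PIO instruction set has no dedicated NOP opcode. The RP2040 datasheet lists nop as a pseudoinstruction that assembles to mov y, y, which copies Y onto itself and otherwise just takes its cycle, plus any [delay]. The plain-Python encoder below is a hypothetical helper covering only a subset of the MOV form, without side-set. It shows the resulting 16-bit word.

```python
# MOV field values from the RP2040 instruction encoding (subset only).
MOV_DEST = {"x": 0b001, "y": 0b010}
MOV_OP = {"none": 0b00, "invert": 0b01, "bit_reverse": 0b10}
MOV_SRC = {"x": 0b001, "y": 0b010, "null": 0b011}


def encode_mov(dest: str, src: str, op: str = "none", delay: int = 0) -> int:
    """Encode MOV dest, op src [delay] as a 16-bit PIO instruction (no side-set)."""
    if not 0 <= delay <= 31:
        raise ValueError("delay is 5 bits when no side-set is configured")
    return (0b101 << 13) | (delay << 8) | (MOV_DEST[dest] << 5) | (MOV_OP[op] << 3) | MOV_SRC[src]


def encode_nop(delay: int = 0) -> int:
    """nop is an alias for mov y, y."""
    return encode_mov("y", "y", delay=delay)
```

This gives 0xA042 for a plain nop() and 0xBF42 for nop() [31]. That is why a nop is useful as a carrier for a delay or a side-set, as in the blink examples.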
Mining with a pico?
🤣
Finally a good useful description of something I want to know, about real time deterministic fast I/O .. then you somewhat spoil it by muddling it with Python, yet another fashionable interpreter of the time but completely pointless alien speak to me.
You can use C instead; the principles are the same.
@@GaryExplains Yes, I know, I am about to do it. I was hoping the video would give me a head start, but I did not get that much useful from it. There's no real explanation of "pins"; I'm still not sure how that works. How does it compete with GPIO? Does it have some form of mask? How does it interact with ISRs? The PIO compiler in the SDK is not shown, as you went the Python route. I will go and read the datasheet now, but it is a bit of a shame, as the title of the video got my hopes up, and I came away feeling I had learnt little other than "The PIO has 8 programmable units".
Well, I am sorry to hear that, but of course you understand that I didn't contact you before I made the video and asked you what you wanted. I made a different video and plenty of people seem to have found it useful, sorry you didn't. All the information you need is in the datasheet which I link to in the description.
@@GaryExplains To be fair, your videos are well made with good presentation, but had you added the word "Python" to the title, I would have ignored it and saved myself some time.
There's more 'boilerplate' in the typical C version of things: the 'in-line assembler' in Python is more 'terse', so why not go ahead with it here, since the theme is the PIOs themselves (quite well treated, I think). There was basically no support from the main processor(s) in these examples, except a little poke and peek at the queue depth. To get properly into this, you would probably need interrupts, DMA and all that jazz, which would much more likely be done well in C, alas. Gary's series on a multi-tasking OS for the Pico has been quite hot on the low level, though.
still: 4 president!
sorry for caps
i DO NOT UNDERSTAND BUT i LIKE IT ANYWAY