OMG I just started learning ML and DL implementation on FPGAs and I found your channel. Greetings from Indonesia 🎉
Same here. Good luck, Muhammad
Hi, please give me your email
Awesome! This is very helpful and thank you, Prof. Dr. Marco Winzker!
You hit this one out of the park!! Well done, Thank you.
This video inspired me that it is really possible to implement AI scenarios to hardware, via FPGA. Keep the good work Sir, and thank you for sharing this valuable knowledge.
This is amazing and very helpful. Concepts are explained very well, keep up the great work!
This channel is a gold nugget! ,,, keep up the amazing work,, thank you 😊
very interesting and thanks for publishing this stuff
Just awesome ! Thanks for publishing this.
Awesome ! Thanks for publishing this.
Thanks for these great tutorial videos, Prof. Winzker. I want to ask a question: I am new to the NN world and now focused on research into NN hardware-acceleration structures on FPGA and ASIC. I read that the NN structure you utilized in this example is called a multilayer perceptron (MLP). However, I see tons of accelerators for convolutional neural network (CNN) structures, and CNNs are mentioned a lot for image classification. You also work on image processing and NN applications on images. Can you give some info about when to use an MLP versus a CNN, or can you refer me to a paper about which NN structures are utilized most in ML applications? Thanks a lot for everything again
This example uses a single hidden layer to show the principle of an NN. The task is designed so that it fits this architecture.
A CNN is helpful when you want to identify complex objects through a combination of several features. So you identify four legs, pointy ears, and whiskers in the front layers. Then a following layer of the CNN concludes that the image shows a cat.
A hardware accelerator uses the structure that I presented but normally needs complex scheduling. So there is a CPU that gives small pieces of the task to the accelerator hardware. The CPU collects results and again sends new tasks to the hardware.
I don't really have a paper that I can point to. Sorry.
@@marcowinzker3682 Thanks a lot. I will look for papers, and if I find a good one I will post it here
Awesome vid !
Thanks !
Excellent, thanks a lot
Loving this playlist! I just started working on FPGA-based reinforcement-learning implementations and started looking around on YouTube; this is a very good introduction. Are you going to talk about RL implementations? That would be amazing!
thank you. Perfect!
Hi, I wanted to ask something.
Since we will deploy the NN on the FPGA, shouldn't we convert the training images to a fixed-point representation so that the training and prediction data stay consistent?
Hi, the training images do not go into the FPGA. They are used to determine the parameters and the parameters go into the FPGA.
@@marcowinzker3682 Sorry, maybe we misunderstood each other.
Once the parameters have been calculated (i.e. defined on the basis of the input images), then at inference time, when the FPGA gets the image data and runs forward propagation, is it bad that we took the images as doubles during training while on the FPGA they will be processed as fixed point?
@@fc3fc354 There is research on doing machine-learning applications without high-precision arithmetic. Of course there are small differences between floating point and fixed point with 32-, 16-, or 8-bit precision. Often 8-bit precision gives good results with acceptable hardware effort.
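As a rough illustration of this point, the sketch below quantizes example weights and inputs to signed 8-bit fixed point and compares a neuron's multiply-accumulate result against the floating-point reference. The format (Q1.7), vector length, and values are illustrative assumptions, not taken from the lecture.

```python
import numpy as np

def to_fixed(x, frac_bits=7):
    """Round to signed 8-bit fixed point (Q1.7-style) with saturation."""
    scale = 1 << frac_bits
    return np.clip(np.round(x * scale), -128, 127).astype(np.int32)

rng = np.random.default_rng(0)
w = rng.uniform(-1, 1, size=16)   # example weights in [-1, 1)
x = rng.uniform(-1, 1, size=16)   # example inputs

# Floating-point reference result
ref = float(np.dot(w, x))

# 8-bit fixed-point version: integer multiply-accumulate, then rescale.
# Two Q1.7 factors give 14 fractional bits in the product.
qw, qx = to_fixed(w), to_fixed(x)
acc = int(np.dot(qw, qx))          # fits easily in a 32-bit accumulator
approx = acc / (1 << 14)

print(f"float: {ref:.4f}  fixed8: {approx:.4f}  error: {abs(ref - approx):.4f}")
```

The printed error is small compared to the result, which is why 8-bit arithmetic is often sufficient at inference time even when training used doubles.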
Thank you, sir, this is the video that I was looking for
Sir, do you know how to access the built-in FPGA block RAM and store the text file?
Please have a look at the video "Machine Learning on FPGAs: Sigmoid Function and Exercises".
The sigmoid function is implemented with a built-in block RAM.
ua-cam.com/video/dygzrBiDFnk/v-deo.html
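To give a feeling for the idea, here is a minimal sketch of a sigmoid lookup table of the kind a block RAM would store: the function values are precomputed once, and at runtime the input is only converted to a table address. The table size (256 entries) and input range are assumptions for illustration; the linked video shows the actual implementation.

```python
import math

ADDR_BITS = 8                      # 256-entry table, like a small block RAM
X_MIN, X_MAX = -8.0, 8.0           # sigmoid is nearly 0/1 outside this range

def build_sigmoid_lut():
    """Precompute sigmoid values for every table address."""
    n = 1 << ADDR_BITS
    step = (X_MAX - X_MIN) / n
    return [1.0 / (1.0 + math.exp(-(X_MIN + i * step))) for i in range(n)]

LUT = build_sigmoid_lut()

def sigmoid_lut(x):
    """Approximate sigmoid(x) by indexing the precomputed table."""
    n = 1 << ADDR_BITS
    idx = int((x - X_MIN) / (X_MAX - X_MIN) * n)
    idx = max(0, min(n - 1, idx))  # saturate, like clamping the RAM address
    return LUT[idx]

print(sigmoid_lut(0.0))  # close to 0.5
```

On the FPGA the same idea costs one block-RAM read per pixel instead of an exponential function.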
Great work. I have a question: if I want to run this on real hardware, how can we check the output, i.e. the new image modified by the algorithm?
The best approach for verification is simulation! And then you compare the output with another model, for example the Octave result. There is an example for this approach for the FIR filter in the video "FPGA FIR Filter: Verification with VHDL Testbench" and "FPGA FIR Filter: Self-Checking Testbench".
If you want to check the output on real hardware, you would check the result on a monitor. You can also take the result with a frame grabber and analyze it.
Our remote lab does this. You can click on the output image and store it on your computer. (However, the image is compressed with JPEG to save bandwidth, so the result is not bit-true.)
@@marcowinzker3682 Right, but how does your simulator get the output image from the FPGA? I am trying to build something similar physically.
@@samyakjain327 If you want something physical, you need an FPGA board with a video input and a video output port. Such a system is described in the video "Image Processing with Terasic FPGA-Boards"
Okay, thank you so much for the support
Thank you for video. Can you show how to implement spiking neural network in FPGA board?
Good idea. I am supported by student theses and will put this topic on my list.
@@marcowinzker3682 Thank you, Professor. Hope to see new videos on SNN soon.
Dear professor!
Your lecture is very impressive to me. I'm following your tutorial up to the step of loading the name filter for the Cyclone IV. Could you please tell me which "name filter" you used in your lab? Warm thanks, professor~
The Cyclone IV is an EP4CE22E22C7. You find this information in the constraints file (QSF).
And it is also in the FAQ: www.h-brs.de/de/emt/frequently-asked-questions-fpga-vision-remote-lab
@@marcowinzker3682 Thanks, professor, for your detailed reply.
However, I'm facing a problem when executing the bit file on your server: the output is "no signal". Could you please help me check your server? Thank you so much, professor~
@@tonydo29 Please check the FAQ, there is a "no signal" item.
In 95% of the cases this error is caused by missing pin assignments.
@@marcowinzker3682 Warm thanks, professor, for the information. I'll take a look and check it now~
I couldn't get the input image and image label in the previous video. Can you please provide details?
You can download the video from our homepage www.h-brs.de/de/fpga-vision-lab with the name "Video with road signs on motorway (CC BY)". Or you can make a screenshot from the remote lab. Then you have to generate the image-label file. The Octave scripts are intended (and have been tested) for image size 720p, i.e. 1280x720 pixels.
We do not provide the images. You have to generate them on your own.
Thank you for such great content, sir! I would like to try this project as part of an academic work. Could you please provide the testbench VHDL code for simulation? I am just a beginner. It would be of great help.
A complete testbench is available for the FIR filter experiment. If you are a beginner, I recommend that lecture. Then you can adapt that testbench to this design.
ua-cam.com/video/o3hk7xAY5-s/v-deo.html
ua-cam.com/video/wTIHEX7WWvg/v-deo.html
@@marcowinzker3682 Thank you sir... i will try
Were you able to write testbench code for this simulation? It would be really helpful for me, as I am also a beginner and want to try this as an engineering project, but I don't have enough time to try on my own.
Why do we need a control block for sync signals?
The video signal consists of two parts:
1) image information: red, green, blue
2) sync signals to indicate when a new frame or a new line begins
The control block copies the sync signals from input to output. Because the image information needs some clock cycles for processing, the sync signals need the same delay. So, the control block copies the sync signals with a delay of some clock cycles.
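This behavior can be modeled as a simple shift register; the sketch below does so in Python. The 3-cycle pipeline latency is an assumed example value, not the latency of the actual design.

```python
from collections import deque

PIPELINE_STAGES = 3  # assumed latency of the pixel-processing pipeline

def delay_syncs(sync_stream, stages=PIPELINE_STAGES):
    """Model a shift register that delays sync signals by `stages` cycles."""
    shift_reg = deque([0] * stages, maxlen=stages)
    out = []
    for s in sync_stream:
        out.append(shift_reg[0])   # oldest value leaves the register
        shift_reg.append(s)        # new value enters; maxlen drops the oldest
    return out

# A sync pulse at cycle 1 appears at the output at cycle 1 + PIPELINE_STAGES.
hsync_in = [0, 1, 0, 0, 0, 0]
hsync_out = delay_syncs(hsync_in)
print(hsync_out)  # [0, 0, 0, 0, 1, 0]
```

Because the image data passes through the same number of register stages, pixels and syncs arrive at the output aligned.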
@@marcowinzker3682 Thanks!!!!
Nice lecture. Sir, I want to implement the AlexNet CNN architecture on an FPGA. Is it doable or not? Please guide.
Basically yes, but this is a large project. You need a large FPGA and scheduling of the layers with subsampling. Use of a framework (like ua-cam.com/video/FFUyRQukGvM/v-deo.html) is recommended.
A good approach is an architecture with a CPU for scheduling of tasks and the FPGA fabric as an accelerator.
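A rough sketch of that CPU/accelerator split: the CPU cuts a large layer into small tiles, hands each tile to the accelerator, and accumulates the partial results. The "accelerator" here is only a stand-in function, and the tile size and data are illustrative assumptions.

```python
TILE = 4  # assumed tile size the accelerator can handle at once

def accelerator_mac(weights, inputs):
    """Stand-in for the FPGA fabric: multiply-accumulate on one small tile."""
    return sum(w * x for w, x in zip(weights, inputs))

def cpu_schedule(weights, inputs, tile=TILE):
    """CPU side: split the work into tiles and collect the partial results."""
    total = 0.0
    for i in range(0, len(weights), tile):
        total += accelerator_mac(weights[i:i + tile], inputs[i:i + tile])
    return total

w = [0.5, -1.0, 2.0, 0.25, 1.5, -0.5, 0.0, 3.0]
x = [1.0,  2.0, 0.5, 4.0, -1.0,  2.0, 9.0, 1.0]
print(cpu_schedule(w, x))  # same result as one big dot product
```

Real frameworks add double-buffering and data transfers on top of this loop, but the division of labor is the same.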
@@marcowinzker3682 Can I have your email so that I can send you my proposed design? I'm planning to process in parallel as soon as each byte enters the system.
@@marcowinzker3682 thanks for your response 👍
@@SW-ud1wt I am sorry, I can not give individual support.
@@marcowinzker3682 ok
How was the user interface prepared?
The remote lab uses WebLab-Deusto as the management software.
Could you disclose the vhdl codes ?
The code is available. See description of video.
Your code didn't look very pipelined
Are you joking? At minute 4:45 you see the pipeline stages. There is a lot of pipelining!
In the code you see it at minute 8:05 inside the neuron. You get a pipeline stage when you have a signal assignment ('<=') inside a clocked process.