*Summary: Running Computer Vision Models on NPUs*
What is an NPU? (0:37)
- NPUs are specialized silicon chips optimized for running neural network computations, especially matrix multiplications.
- Unlike CPUs and GPUs, they can't run general-purpose programs, focusing purely on neural network inference.
- Many different names exist for these chips, including LPU, TPU, VPU, etc., but they share the core idea of accelerating neural network calculations.
Why Use NPUs? (2:29)
- Main advantages: Reduced power consumption, lower device cost, potential for significant speedups compared to CPU/GPU for specific tasks.
- Main disadvantages: Increased development complexity, limited choice of neural network architectures, more intricate deployment and testing processes.
Challenges of working with NPUs:
- Diverse Ecosystem: (7:42) A vast landscape of vendors, frameworks, and boards makes finding a perfect solution difficult. Each vendor typically offers its own custom framework.
- Model Export and Compatibility: (10:09)
- Requires careful preparation, including specific patches and quantization, to adapt your model to the target NPU architecture.
- Non-maximum suppression (NMS) (18:59) often needs to be handled outside the NPU, requiring separate code or fallback mechanisms (a CPU-side NMS sketch follows after this list).
- Memory Limitations: (20:54)
- Limited memory size on NPUs restricts model size and complexity.
- Memory access speed and structure significantly impact performance.
- Preprocessing: (22:46) May need to be performed separately on the CPU, GPU, or a dedicated accelerator, depending on the NPU and its capabilities (a typical CPU-side pipeline is sketched after this list).
- Transformer Support: (23:58) Limited or non-existent on many NPUs, often requiring model adjustments or alternative convolutional architectures.
- Layer Support: (25:23)
- Advertised layer support can be misleading due to merged layers or limited functionalities.
- Always verify compatibility and performance for your specific model layers.
- Quantization: (27:33)
- Essential for many NPUs to reduce model size and accelerate inference.
- Can be complex and lead to accuracy degradation, requiring careful fine-tuning and evaluation (a post-training quantization sketch follows after this list).
- Benchmarks: (30:30)
- Often don't reflect real-world performance.
- Always test on your target hardware and specific model for accurate results (a timing harness sketch follows below).
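Since NMS often has to run off the NPU, here is a minimal CPU-side greedy NMS fallback in plain NumPy. The [x1, y1, x2, y2] box layout and the function name are illustrative assumptions for the sketch, not anything from the video:

```python
import numpy as np

def nms(boxes: np.ndarray, scores: np.ndarray, iou_threshold: float = 0.5) -> list:
    """Greedy NMS over [x1, y1, x2, y2] boxes; returns indices of kept boxes."""
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]  # highest-scoring box first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # Intersection of the current top box with all remaining boxes
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        # Drop remaining boxes that overlap the kept box too much
        order = order[1:][iou < iou_threshold]
    return keep
```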
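The CPU-side preprocessing mentioned above usually amounts to decode, resize, normalize, and layout conversion. A rough OpenCV sketch; the 640x640 input size and [0, 1] scaling are placeholder choices that depend on your model:

```python
import cv2
import numpy as np

def preprocess(path: str, size=(640, 640)) -> np.ndarray:
    """Decode and prepare one image as a 1x3xHxW float32 tensor."""
    img = cv2.imread(path)                     # BGR, uint8, HxWxC
    img = cv2.resize(img, size)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    img = img.astype(np.float32) / 255.0       # scale to [0, 1]
    return np.transpose(img, (2, 0, 1))[None]  # HWC -> NCHW, add batch dim
```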
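For the quantization step, most NPU vendors ship their own quantizer, so the sketch below uses ONNX Runtime's post-training static quantization purely to illustrate the workflow. The model path, the input name "images", and the random calibration tensors are placeholders; in practice you feed ~100+ real images, preprocessed exactly as at inference time:

```python
import numpy as np
from onnxruntime.quantization import CalibrationDataReader, QuantType, quantize_static

class ImageCalibrationReader(CalibrationDataReader):
    """Feeds sample inputs so the quantizer can estimate activation ranges."""
    def __init__(self, samples, input_name="images"):  # input_name is a placeholder
        self._it = iter([{input_name: s} for s in samples])

    def get_next(self):
        return next(self._it, None)

# Placeholder calibration data; replace with real preprocessed images.
calib = [np.random.rand(1, 3, 640, 640).astype(np.float32) for _ in range(8)]

quantize_static(
    "model.onnx",        # float32 source model (placeholder path)
    "model_int8.onnx",   # quantized output model
    ImageCalibrationReader(calib),
    activation_type=QuantType.QInt8,
    weight_type=QuantType.QInt8,
)
```

After quantizing, re-evaluate accuracy on a held-out set, since degradation is common and may require fine-tuning.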
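And since vendor benchmarks rarely transfer, a minimal latency harness run on the actual target board is more trustworthy; `run_inference` stands in for whatever call your NPU runtime exposes (a hypothetical name here):

```python
import time
import numpy as np

def benchmark(run_inference, sample, warmup=20, iters=200):
    """Prints median and tail latency for a single-input inference call."""
    for _ in range(warmup):  # let clocks, caches, and DMA settle
        run_inference(sample)
    times = []
    for _ in range(iters):
        t0 = time.perf_counter()
        run_inference(sample)
        times.append((time.perf_counter() - t0) * 1000.0)  # milliseconds
    times = np.sort(np.array(times))
    print(f"p50={times[iters // 2]:.2f} ms  p95={times[int(iters * 0.95)]:.2f} ms")
```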
Additional considerations:
- CPUs play a vital role in data transfer, image decoding, preprocessing, and fallback mechanisms, impacting overall performance (36:43).
- C++ is the dominant language for inference on most NPUs, while Python prevails in model training and export (38:45).
- Training on NPUs is possible but involves a separate class of processors and different considerations (39:51).
I used Gemini 1.5 Pro.
Good summary and useful for passers-by. However, the video contains some small remarks that carry a lot of useful information, so I still recommend watching the whole video.
Thank you.
good one!
awesome)
Thanks for the content.
Which SBC would you recommend for someone just starting with computer vision?
Depends on your budget.
The smooth experience is with Jetsons or Intel-based boards.
In the case of a low budget, I recommend some RockChip-based solutions.
Thanks mate, I will check out the RockChip!
Are you going to test the new Hailo GenAI M.2 board?
It's difficult to buy a single unit for home use, and none of my friends or colleagues are using it right now, so I have no chance to borrow one.
So, it's not in the plans. But if there is a chance, I will try.
But the next video will probably be about my experience of using Hailo in production (more about the framework and the Hailo-8).
Hello, I have been following your work for a long time. Please keep it up! Very interesting. Could you tell me whether you have ever deployed a neural network on an FPGA? If so, could you please share your experience?
Good afternoon, thank you!
A couple of times I wanted to test the Xilinx Kria, but every time I was talked out of it, being told it is complete junk.
In general, a stock FPGA doesn't map all that well onto network architectures, so the point of it isn't really clear to me...
@@AntonMaltsev Got it, thanks.
Your jump cuts make this confusing