Just bought your course!! Pretty cool to find someone talking/teaching NeRFs, since LLMs and Diffusion models stormed and got all the attention haha
Thank you so much! I am glad that you like the content, and I hope you will like the course. Great videos about NeRF will be released soon :)
Awesome. Please keep working in this field.
Thank you! I have a few upcoming videos related to NeRF, and will produce more if people are interested.
My apologies, I mean the argument "dataset" used in lines 11 to 36 in the test function. Does it take the dataset in the LLFF format from the pkl file? I don't get it, thanks!!
Hi, thank you for your question, and excuse me for the delayed answer. Yes, it is using the data from the pkl file, which I generated directly from the NeRF data to make things easier. It can be downloaded from the GitHub link.
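For readers wondering how such a pre-processed pkl file is typically consumed, here is a minimal, hypothetical sketch. The file name and the per-row layout (ray origin, ray direction, pixel colour) are assumptions for illustration, not a confirmed description of the file shared above.

    # Hypothetical loading code: the layout [origin(3), direction(3), rgb(3)]
    # per row is an assumption, not the confirmed format of the shared file.
    import pickle

    import numpy as np
    import torch

    with open("training_data.pkl", "rb") as f:        # hypothetical file name
        data = np.asarray(pickle.load(f), dtype=np.float32)

    dataset = torch.from_numpy(data)                   # assumed shape: [n_rays, 9]
    ray_origins    = dataset[:, 0:3]                   # assumed: ray origin per pixel
    ray_directions = dataset[:, 3:6]                   # assumed: normalised ray direction
    ground_truth   = dataset[:, 6:9]                   # assumed: RGB colour of the pixel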
Thank you very much! I learned the point of NeRF from your video.
Glad to hear it, thank you! :)
Does the code you've shared have the variable "dataset" defined? I don't see it. Is the output of the code a PNG file with the rendered image? Then is it possible to get a mesh? Thanks for your assistance.
Hi @businessplaza6212, thank you for your question. The GitHub code has several variables named "dataset" in different functions, so I am not sure I understand your first question; could you please rephrase it? Yes, the output is a rendered 2D image. It is possible to get a mesh, and I explain how to do it in my course. Otherwise, you may also be interested in this notebook from the initial NeRF paper: github.com/bmild/nerf/blob/master/extract_mesh.ipynb.
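For readers who just want the general idea behind the linked notebook: the usual recipe is to query the trained network's density on a regular 3D grid and run marching cubes on it. The sketch below assumes a model(points, directions) call that returns colour and density; your own network's interface, scene bounds and threshold may differ.

    # A rough sketch of grid-density + marching-cubes mesh extraction.
    # `model`, the scene bounds and the density threshold are assumptions.
    import numpy as np
    import torch
    from skimage import measure

    N = 128                                            # grid resolution
    t = np.linspace(-1.5, 1.5, N, dtype=np.float32)    # assumed scene bounds
    xyz = np.stack(np.meshgrid(t, t, t), axis=-1).reshape(-1, 3)

    sigmas = []
    with torch.no_grad():
        for chunk in torch.from_numpy(xyz).split(65536):
            dirs = torch.zeros_like(chunk)             # sigma does not depend on direction
            _, sigma = model(chunk, dirs)              # assumed (colour, density) output
            sigmas.append(sigma.squeeze(-1).cpu())
    sigma_grid = torch.cat(sigmas).reshape(N, N, N).numpy()

    # threshold to tune for your scene; returns vertices and triangle faces
    verts, faces, _, _ = measure.marching_cubes(sigma_grid, level=30.0)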
Thank you for your fast reply! Your work is great! I'm wondering about the "dataset" variable that you use in line 106. Where is it defined? Could you clarify, please? I will buy your course, as I'm working on a NeRF thesis for my MSc in ML. Did you combine the transforms JSON file from COLMAP into a pkl file?
Hi, I am so sorry I forgot to answer. Most questions are already answered in other comments. Do you still need clarifications?
I am planning to buy your course, but will I be able to generate the mesh from the capture!?
Hi, thank you for your question. Unfortunately, not in high quality. We discuss the ray marching algorithm and use it to extract a mesh from NeRF. However, the mesh is not high quality and does not have colours. If you want a coarse mesh, that is fine, but if you have high expectations for the quality of the mesh and need colours, then you would need more advanced algorithms than the ones used in the course.
How would you add the coarse and fine networks improvement?
Hi, thank you for your comment. I am planning to add a video about it. I hope I can release it in the near future
@@papersin100linesofcode I subscribed, thank you
Absolutely great video! It really helped clear up the paper, seeing things implemented so straightforwardly. I have a few questions. What type of GPU did you use to train this model? When creating the encoding, you initialize your out variable to have the position vector placed in it (making the output [batch, ((3 * 2) * embedding_pos_dim) + 3], adding that trailing +3). Was there a reason for doing that? I mean, adding it surely doesn't hurt. Batching the image creation is also a great idea for smaller GPUs. Thanks again for such a great video!
Hi, thank you for your great comment!
1) I should have used a P5000 or RTX5000.
2) I am not sure which line you are referring to?
I understand the 10*6 for the positional encoding, but why did you add 3 to it? Posencdim*6+3?
Hi, thank you for your question. This is because we concatenate the position to the positional encoding. This is not mentioned in the paper, but done in their implementation.
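For anyone else puzzled by the +3, here is a minimal sketch of the idea: the raw coordinate is kept and the sin/cos features are appended to it, giving 3 + 3*2*L output channels.

    # Minimal sketch: the raw input x is concatenated with its sin/cos
    # features, which is where the trailing "+3" in the output size comes from.
    import torch

    def positional_encoding(x, L):
        # x: [batch, 3] positions (or directions)
        out = [x]                                  # keep the raw coordinates -> the "+3"
        for j in range(L):
            out.append(torch.sin(2 ** j * x))
            out.append(torch.cos(2 ** j * x))
        return torch.cat(out, dim=-1)              # [batch, 3 + 3 * 2 * L]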
@@papersin100linesofcode Oh, I understand. Thanks a lot!
Hey, I have a small question: NeRF takes a 5D input, position and view direction. Is there a way to get the view direction from a rotation matrix (3x3)?
Hi, thank you for your question. Do you mean the camera-to-world matrix (c2w)? If so, yes, and actually the direction is already computed from it most of the time. The direction is computed from the camera, using the 3x3 rotation part of its c2w matrix.
@@papersin100linesofcode Yes, can you please tell me the formula that is used to get them?
@@aditya-bl5xh you may be interested in this script github.com/kwea123/nerf_pl/blob/master/datasets/ray_utils.py. I will soon make a video about it
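For completeness, here is a hedged sketch of the usual formula (the same idea as in the linked ray_utils.py): build per-pixel directions in the camera frame from the intrinsics, then rotate them into world space with the 3x3 rotation part of the c2w matrix. The -z forward convention assumed here is the common NeRF/OpenGL one; check what your data uses.

    # Sketch of ray directions from intrinsics + c2w (conventions may differ
    # in your data; this assumes the camera looks along -z in its own frame).
    import torch

    def get_rays(H, W, focal, c2w):
        i, j = torch.meshgrid(torch.arange(W, dtype=torch.float32),
                              torch.arange(H, dtype=torch.float32),
                              indexing="xy")                         # i: x pixel, j: y pixel
        dirs = torch.stack([(i - 0.5 * W) / focal,                   # camera-frame directions
                            -(j - 0.5 * H) / focal,
                            -torch.ones_like(i)], dim=-1)            # [H, W, 3]
        rays_d = dirs @ c2w[:3, :3].T                                # rotate into world space
        rays_o = c2w[:3, 3].expand(rays_d.shape)                     # camera centre for every pixel
        return rays_o, rays_d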
@@papersin100linesofcode thanks! Appreciated
This is so nice. Just bought your course
Thank you so much! You can download it here drive.google.com/drive/folders/18bwm-RiHETRCS5yD9G00seFIcrJHIvD-?usp=sharing. You will understand in the course how it was generated :)
A practical question: how do people figure out the viewing angle and position for a scene that's been captured without that dome of cameras? The dome of cameras makes it easy to know the exact viewing angle and position, but what about just a dude with one camera walking around the scene taking photos of it from arbitrary positions? How do you get theta and phi in practice?
Hi Jeffrey, thank you for your question. In practice, people use COLMAP (an open-source pipeline) to estimate the camera parameters.
The camera parameters can also be learned (have a look at my video about NeRF-- if you are interested)
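And once COLMAP (or a learned method) has given you a camera-to-world matrix, theta and phi are just the spherical coordinates of the camera's viewing direction. A small sketch, assuming the usual convention that the camera looks along -z in its own frame:

    # Viewing angles from an estimated camera-to-world matrix (sketch).
    import numpy as np

    def viewing_angles(c2w):
        d = -c2w[:3, 2]                            # forward axis in world coordinates
        d = d / np.linalg.norm(d)
        theta = np.arctan2(d[1], d[0])             # azimuth
        phi = np.arcsin(np.clip(d[2], -1.0, 1.0))  # elevation
        return theta, phi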
@@papersin100linesofcode thank you! do MIP-NeRF and Zip-NeRF also use COLMAP?
@@jeffreyalidochair MIP-NeRF and Zip-NeRF can be seen as algorithms that take as input pictures together with their camera parameters, which can be estimated in several ways. But yes, in the real data from those papers the camera parameters are specifically estimated with COLMAP.
Great video! Could you tell me how much time it took for the model to train approximately?
Thank you! About 24 hours
You skipped the coarse/fine logic from the paper. Were you able to get decent results without it?
Hi, thank you for your question. The results I show at the beginning of the video are without it. To me, these are decent results, although they would be better with the hierarchical volume sampling strategy. I think I will make a video about it in the near future :)
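Until that video exists, here is a hedged sketch of the missing piece: the coarse pass's weights along each ray are turned into a probability distribution, and extra "fine" samples are drawn where the weights are large (inverse-transform sampling, as in the paper). This is not the video's code, just the idea.

    # Sketch of hierarchical (coarse-to-fine) sampling via inverse-transform
    # sampling of the coarse weights.
    import torch

    def sample_fine(bins, weights, n_fine):
        # bins: [n_rays, n_coarse + 1] bin edges, weights: [n_rays, n_coarse]
        pdf = weights + 1e-5
        pdf = pdf / pdf.sum(dim=-1, keepdim=True)
        cdf = torch.cumsum(pdf, dim=-1)
        cdf = torch.cat([torch.zeros_like(cdf[:, :1]), cdf], dim=-1)   # [n_rays, n_coarse + 1]

        u = torch.rand(cdf.shape[0], n_fine, device=cdf.device)        # uniform samples in [0, 1)
        idx = torch.searchsorted(cdf, u, right=True).clamp(1, cdf.shape[-1] - 1)

        cdf_lo, cdf_hi = torch.gather(cdf, 1, idx - 1), torch.gather(cdf, 1, idx)
        bin_lo, bin_hi = torch.gather(bins, 1, idx - 1), torch.gather(bins, 1, idx)
        frac = (u - cdf_lo) / (cdf_hi - cdf_lo + 1e-8)
        return bin_lo + frac * (bin_hi - bin_lo)                        # [n_rays, n_fine] new depths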
Great video thank you very much!
I am glad you like it. Thank you for your comment!
Does it generate the sample in 16 epochs?
Thank you for your question. The model is trained for 16 epochs, and then it can be used for rendering.
@@papersin100linesofcode I have tried it; is it normal that it generates white images at the beginning? Also, why do you set the last delta to almost infinity? Besides, I think that using this makes the weight sum always 1, so the last regularization makes no sense... Correct me if I am wrong!
@LearningEnglish Do the images remain white with more training? The deltas are the distances to the following sample, so for the last sample the distance to the next one is, in theory, infinity. We take the exponential of the negative of delta, which does not lead to exploding values.
I hope this is clear. If not, do not hesitate to ask me questions.
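To make the delta discussion concrete, here is a small sketch of the weight computation. The last delta is a huge stand-in for infinity; since it only appears inside exp(-sigma * delta), nothing explodes, the corresponding transmittance term simply goes to zero.

    # Sketch of the volume-rendering weights; the 1e10 plays the role of the
    # "almost infinite" last delta mentioned above.
    import torch

    def render_weights(sigma, t_vals):
        # sigma: [n_rays, n_samples] densities, t_vals: [n_rays, n_samples] sample depths
        deltas = t_vals[:, 1:] - t_vals[:, :-1]
        deltas = torch.cat([deltas, 1e10 * torch.ones_like(deltas[:, :1])], dim=-1)
        alpha = 1.0 - torch.exp(-sigma * deltas)                       # per-sample opacity
        trans = torch.cumprod(torch.cat([torch.ones_like(alpha[:, :1]),
                                         1.0 - alpha + 1e-10], dim=-1), dim=-1)[:, :-1]
        return alpha * trans                                           # weights, summing to at most 1 per ray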
Can you explain a PyTorch implementation of Mip-NeRF or Zip-NeRF? The GitHub repos are very hard to understand.
Thank you for the suggestion! I will try to add them
@@papersin100linesofcode thanks!
Great video! Can I get the dataset?
Thank you for your comment! You should have access to the data now; excuse me for the delay. I have removed the authorization requirement, so anyone can access it directly from now on.
Can you share the link for the dataset?
Done