Luke Data Manager
Norway
Joined 9 Nov 2021
Welcome to the channel - guiding you through the FAIR data revolution!
Learn how to read and create CF-NetCDF files and Darwin Core Archives, as well as how publishing FAIR data can help you, the scientific community and beyond! Whether you are a data manager, a senior researcher or a student, this is the channel to help you work with FAIR data.
I am Luke Marsden and this is my personal channel. I work in data management and software development at the Norwegian Meteorological Institute, as well as holding an adjunct position at the University Centre in Svalbard. I am the lead data manager of the Nansen Legacy project, and am also involved in Arctic Passion and the flow of Copernicus satellite data from the European Space Agency through to the Norwegian portal.
I have a second YouTube channel for videos made for specific research institutions or projects:
www.youtube.com/@LukeDataManagementProjects
How to create a CF NetCDF file using R
In this video, you will learn how to create a NetCDF file that is fully compliant with both the Climate and Forecast (CF) Conventions and the Attribute Convention for Data Discovery (ACDD).
# Chapters
00:00 Introduction
01:22 Dimensions and coordinate variables
07:37 Time in NetCDF
14:46 1D data variables
16:50 2D data variables
22:02 3D data variables
25:09 Dataframe to 3D array
26:20 Irregular grids and instruments that move
27:05 Variable attributes
36:10 Global attributes
45:30 Checking your file is compliant with CF and ACDD
# How to cite this course
If you think this course contributed to the work you are doing, consider citing it in your list of references. Here is a recommended citation:
*Marsden, L.* (2024, May 31). NetCDF in R - from beginner to pro. Zenodo. doi.org/10.5281/zenodo.11400754
# All videos in this course
01: ua-cam.com/video/Xer1XBm3sns/v-deo.html
02: ua-cam.com/video/MXr3tp6Q1aA/v-deo.html
03: ua-cam.com/video/9-EDaRQ8Aps/v-deo.html
04: ua-cam.com/video/IZDygRjfMIg/v-deo.html
# The code
This tutorial series is accompanied by a Jupyter Book with code, explanations and more examples. You can find the relevant section here:
nordatanet.github.io/NetCDF_in_R_from_beginner_to_pro/04_creating_a_cfnetcdf_file.html
# Useful links
Time in NetCDF: www.unidata.ucar.edu/software/netcdf/time/recs.html
Climate and Forecast (CF) conventions: cfconventions.org/
CF standard names: cfconventions.org/Data/cf-standard-names/current/build/cf-standard-name-table.html
Attribute Convention for Data Discovery: wiki.esipfed.org/Attribute_Convention_for_Data_Discovery_1-3
Recommendations of Arctic Data Centre: adc.met.no/node/4
Options and descriptions for coverage_content_type: wiki.esipfed.org/ISO_19115_and_19115-2_CodeList_Dictionaries#MD_CoverageContentTypeCode
ISO 8601 for encoding timestamps: en.wikipedia.org/wiki/ISO_8601
NetCDF compliance checker: compliance.ioos.us/index.html
RNetCDF documentation: cran.r-project.org/web/packages/RNetCDF/RNetCDF.pdf
Repository where you can propose a new CF standard name (raise an issue): github.com/cf-convention/vocabularies/issues
Guidelines on construction of CF standard names: cfconventions.org/Data/cf-standard-names/docs/guidelines.html
UUID generator (for *id* attribute): www.uuidgenerator.net/
Views: 88
Videos
How to write data from a NetCDF file to a CSV or Excel file using R
235 views · 2 months ago
In this video, you will learn to write data from NetCDF files to a CSV or XLSX file that you can open in Excel or your favourite spreadsheet editor. # Chapters 00:00 Introduction 00:38 Variables with 1 dimension 07:22 Variables with 2 dimensions 18:27 Variables with 3 dimensions # How to cite this course If you think this course contributed to the work you are doing, consider citing it in your ...
How to plot data from a NetCDF file in R programming
660 views · 5 months ago
In this video, you will learn to quickly and easily make plots of data from a NetCDF file in R # Chapters 00:00 Introduction 00:50 1D data (depth profile or time series) 10:23 Plotting data on a map # How to cite this course If you think this course contributed to the work you are doing, consider citing it in your list of references. Here is a recommended citation: *Marsden, L.* (2024, May 31)....
How to open a NetCDF file in R programming
943 views · 6 months ago
In this video, you will learn how to open a NetCDF file in R, understand the contents, and extract the metadata and data. You will also learn about the Climate & Forecast (CF) conventions and the Attribute Convention for Data Discovery (ACDD). For a NetCDF file to be compliant with the FAIR principles, it should adhere to both these conventions. # Chapters 00:00 Introduction 00:58 Data used in ...
How to publish and cite code using GitHub and Zenodo
800 views · 6 months ago
Quick tutorial on how to make code citable using GitHub and Zenodo 00:00 Why code should be citable 00:20 Overview of method 00:37 Uploading code to GitHub 01:13 My GitHub repository 01:32 Publishing your repository with GitHub 02:05 Publishing a release of your repository 02:36 Editing your publication 03:18 Citing your publication 03:31 Including your citation in your GitHub repository I didn...
How to access and plot the NOAA surface temperature anomaly data in Python
513 views · 6 months ago
Tutorial on how to access and plot the NOAA surface temperature anomaly data in Python. I would like to thank the authors of the data for making them openly available and FAIR. If you want to use the data publicly, you should also give credit to the authors of the dataset by including the following recommended citation (change the access date at the end): H.-M. Zhang, B. Huang, J. H. Lawrimore,...
How to extract data from multiple NetCDF files in one Python script
1.1K views · 7 months ago
In this video, I will show you how to retrieve data from multiple NetCDF files and combine them into a pandas dataframe that you can save to a CSV or XLSX file. You will also learn how to loop through a THREDDS data server. # Chapters 00:00 Introduction 01:14 OPeNDAP 03:31 Looping through a THREDDS data server 08:09 Combining the data into a pandas dataframe 14:55 One column per depth profile # How ...
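A rough sketch of the general idea (the OPeNDAP URLs and the variable name below are placeholders, not the actual server or dataset used in the video):
```python
# Sketch: loop over several NetCDF files (local paths or OPeNDAP URLs),
# extract one variable from each and combine them into a single dataframe.
import pandas as pd
import xarray as xr

urls = [
    'https://example.com/thredds/dodsC/profile_001.nc',  # placeholder OPeNDAP URLs
    'https://example.com/thredds/dodsC/profile_002.nc',
]

frames = []
for url in urls:
    ds = xr.open_dataset(url)
    df = ds[['temperature']].to_dataframe().reset_index()  # 'temperature' is a placeholder name
    df['source'] = url
    frames.append(df)

combined = pd.concat(frames, ignore_index=True)
combined.to_csv('combined.csv', index=False)
```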
How to batch create NetCDF files in Python
410 views · 8 months ago
In this video, I will show you how to create lots of NetCDF files quickly and easily in a single Python script. # Chapters 00:00 Introduction 00:30 Creating a single NetCDF file 02:12 Setting up your data 03:01 How a for loop works 03:46 Iterating through multiple input data files 05:09 Batch creating NetCDF files 07:05 Different metadata for each file # How to cite this video If you think this...
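A minimal sketch of the batch-creation idea, assuming one CSV of depth and temperature values per output file (the folder, file names and column names here are invented for illustration):
```python
# Sketch: loop over a set of input CSV files and write one NetCDF file per input.
import glob
import pandas as pd
import xarray as xr

for csv_file in glob.glob('input_data/*.csv'):       # placeholder folder
    df = pd.read_csv(csv_file)
    ds = xr.Dataset(
        data_vars={'temperature': ('depth', df['temperature'].values)},
        coords={'depth': df['depth'].values},
    )
    # Metadata can be varied per file here, e.g. read from another table
    ds.attrs['title'] = f'Data from {csv_file}'
    ds.to_netcdf(csv_file.replace('.csv', '.nc'))
```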
How to open a NetCDF file
7K views · 8 months ago
In this video, we will open a NetCDF file in Python, understand what is inside, and learn to read the data and metadata. # Chapters 00:00 Introduction to the course 02:11 Installing and importing modules 03:41 Opening a NetCDF file with xarray 04:35 Introducing OPeNDAP 05:31 Understanding the contents of a NetCDF file 07:20 Global attributes (metadata describing the whole file) 09:47 Attribute ...
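A minimal sketch of that workflow, with a placeholder file name and variable name rather than the dataset used in the video:
```python
# Sketch: open a NetCDF file with xarray and inspect its contents.
import xarray as xr

ds = xr.open_dataset('dataset.nc')   # a local path or an OPeNDAP URL both work
print(ds)                            # dimensions, coordinates, variables, attributes
print(ds.attrs)                      # global attributes (metadata describing the whole file)
print(ds['temperature'].attrs)       # attributes of one variable ('temperature' is a placeholder)
data = ds['temperature'].values      # the data themselves as a numpy array
```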
How to plot data from a NetCDF file
3.3K views · 8 months ago
In this video, you will learn to quickly and easily make plots of data from a NetCDF file in Python # Chapters 00:00 Introduction 00:12 Installing and importing modules 00:50 1st dataset (1D depth profile) 09:08 2nd dataset (Map) # How to cite this video If you think this course contributed to the work you are doing, consider citing it in your list of references. Here is a recommended citation:...
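A minimal sketch of the 1D case, with placeholder file and variable names:
```python
# Sketch: quick plot of a depth profile stored in a NetCDF file.
import matplotlib.pyplot as plt
import xarray as xr

ds = xr.open_dataset('profile.nc')                    # placeholder file name
ds['temperature'].plot(y='depth', yincrease=False)    # depth on the y axis, increasing downwards
plt.show()
```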
How to write data from a NetCDF file to a CSV or XLSX/Excel file
1.7K views · 8 months ago
In this video, you will learn to export your data to a Pandas dataframe which you can write to a CSV or XLSX file. # Chapters 00:00 Introduction 00:23 Importing modules 00:47 Loading in 1st dataset (1D) 03:05 Data to pandas dataframe 04:21 Exporting to CSV or XLSX 05:44 Second dataset (multiple dimensions) 08:40 Extracting a subset of the data # How to cite this video If you think this course c...
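A minimal sketch of the idea (file names are placeholders; writing XLSX additionally needs an engine such as openpyxl installed):
```python
# Sketch: export NetCDF data to CSV or XLSX via a pandas dataframe.
import xarray as xr

ds = xr.open_dataset('profile.nc')
df = ds.to_dataframe().reset_index()   # one column per variable, one row per coordinate combination
df.to_csv('profile.csv', index=False)
df.to_excel('profile.xlsx', index=False)
```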
How to create a NetCDF file & CF and ACDD conventions | FAIR compliant
1.2K views · 8 months ago
In this video, you will learn how to create a NetCDF file that is fully compliant with both the Climate and Forecast (CF) Conventions and the Attribute Convention for Data Discovery (ACDD). # Chapters 00:00 Introduction 01:13 Initialising your xarray object 01:55 Adding coordinate variables (depth, latitude, longitude) 03:15 Adding time coordinate variable 09:46 Adding 1D data variable 11:03 Ad...
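A minimal sketch of the approach with xarray; all file, variable and attribute values below are invented for illustration and cover only a small subset of the CF and ACDD attributes discussed in the video:
```python
# Sketch: build an xarray Dataset with CF/ACDD-style attributes and write it to NetCDF.
import numpy as np
import xarray as xr

depth = np.array([0.0, 10.0, 20.0, 30.0])        # made-up coordinate values
temperature = np.array([5.1, 4.8, 4.2, 3.9])     # made-up data

ds = xr.Dataset(
    data_vars={'sea_water_temperature': ('depth', temperature)},
    coords={'depth': depth},
)
ds['depth'].attrs = {'standard_name': 'depth', 'units': 'm', 'positive': 'down'}
ds['sea_water_temperature'].attrs = {
    'standard_name': 'sea_water_temperature',
    'units': 'degree_Celsius',
    'coverage_content_type': 'physicalMeasurement',
}
ds.attrs = {
    'title': 'Example temperature depth profile',
    'Conventions': 'CF-1.8, ACDD-1.3',
}

ds.to_netcdf('example_profile.nc')
```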
How to structure a data collection of NetCDF files - Granularity
547 views · 8 months ago
In this video, you will learn about how to divide up your data. Should you create lots of smaller files (finer granularity) or put everything together in a single file (coarse granularity)? This video will show you why finer granularity is generally better! # Chapters 00:00 Introduction 00:29 A network of data for everyone 01:56 How to divide up your data 03:21 Parent-child publications 04:05 4...
Spreadsheet templates for Darwin Core - The Nansen Legacy template generator
127 views · 1 year ago
Creating a Darwin Core Archive can be tricky. Which cores and extensions should you include? Which columns are required in each of those? We have developed the Nansen Legacy template generator to help you. It creates spreadsheet templates with headers you can include in a Darwin Core Archive. www.nordatanet.no/aen/template-generator/config=Darwin Core The templates are in XLSX format that you c...
Excel templates for NetCDF files
373 views · 1 year ago
Creating CF compliant NetCDF files can be tricky. Which global attributes should you be including? Which standard names should you assign for your variables? I have developed a new template generator that will make it easier for you to create CF-NetCDF files. • You select column headers based on the full list of CF standard names • Descriptions for each term are displayed as notes each time you...
Excel templates for Darwin Core Archives - and publishing the finished archive!
370 views · 1 year ago
Excel templates for Darwin Core Archives - and publishing the finished archive!
Why you shouldn't publish data to Zenodo
1.3K views · 1 year ago
Why you shouldn't publish data to Zenodo
How to open and visualise a NetCDF file in Panoply and export the data to CSV or Excel (.nc/.nc4).
2.6K views · 1 year ago
How to open and visualise a NetCDF file in Panoply and export the data to CSV or Excel (.nc/.nc4).
How to create a NetCDF-CF file using Python xarray for beginners - multiple dimensions
2K views · 1 year ago
How to create a NetCDF-CF file using Python xarray for beginners - multiple dimensions
How to extract data from multiple NetCDF files in one R script
4.8K views · 2 years ago
How to extract data from multiple NetCDF files in one R script
How to create a NetCDF file using R for beginners: depth profile
1.3K views · 2 years ago
How to create a NetCDF file using R for beginners: depth profile
Darwin Core Archives and how to publish data to GBIF
1K views · 2 years ago
Darwin Core Archives and how to publish data to GBIF
How we record data on Arctic expeditions
255 views · 2 years ago
How we record data on Arctic expeditions
How to get data out of a NetCDF file using R: depth profile
4.3K views · 2 years ago
How to get data out of a NetCDF file using R: depth profile
How to create a NetCDF file using Python xarray for beginners - a depth profile
3.3K views · 2 years ago
How to create a NetCDF file using Python xarray for beginners - a depth profile
How to get data out of a NetCDF file using Python: depth profile
13K views · 2 years ago
How to get data out of a NetCDF file using Python: depth profile
Introducing NetCDF and the CF and ACDD conventions
3.3K views · 2 years ago
Introducing NetCDF and the CF and ACDD conventions
How do I find a good explanation of what 'anomalies' means in this context? I am trying to understand whether this is a deviation from a determined normal.
You can look at the definition of the standard name surface_temperature_anomaly here: cfconventions.org/Data/cf-standard-names/current/build/cf-standard-name-table.html
It reads: The surface called "surface" means the lower boundary of the atmosphere. "anomaly" means difference from climatology. The surface temperature is the (skin) temperature at the interface, not the bulk temperature of the medium above or below. It is strongly recommended that a variable with this standard name should have the attribute units_metadata="temperature: difference", meaning that it refers to temperature differences and implying that the origin of the temperature scale is irrelevant, because it is essential to know whether a temperature is on-scale or a difference in order to convert the units correctly (cf. cfconventions.org/cf-conventions/cf-conventions.html#temperature-units).
Climatology here refers to a 30-year period, 1961-1990, as described in the CF conventions document: cfconventions.org/Data/cf-conventions/cf-conventions-1.11/cf-conventions.html#climatological-statistics
Thanks for this, very helpful!
You're welcome!
Thank you, Luke. Very informative video!
Glad it was helpful!
Can you tell me how to read XML files, and how should I filter data if I download it from simple so as to get the points in ArcGIS or any other software?
Sorry, I only provide help with FAIR compliant data formats. XML files could potentially be FAIR if the contents and structure adhere to commonly used conventions. Otherwise they are just a container.
@LukeDataManager It would be helpful if you made a video on filtering data and how to remove errors, as I want the point data, basically the coordinates. If you can provide your email it would be really helpful.
@naureenfatima9715 sorry I don't have time for individual requests.
@LukeDataManager Okay, no problem.
I am a renegade. I always convert my netcdf files to .h5, disregard all the metadata and stitch it back together when I need to open it again B)
What an incredible and helpful tutorial, thanks a lot! The only thing I am wondering is how I can plot just a sub-area: for example, I provide two pairs of latitude and longitude that define a rectangular region and plot only that region.
Thanks, something like this:
ax.set_extent([-30, 60, 30, 75], crs=ccrs.PlateCarree())  # Longitude and latitude
You’ve been so helpful. I’ve been working with cf conventions and wmo standards for weeks. I have some variables and attributes that match the cf conventions examples exactly, yet it still tells me I am wrong. Thank you so much for this series. It has helped me greatly. I’m creating netcdf files from weather observation stations, roughly 50k, as time series at a point. The cf_role has been killing me lately for time series data.
Great to hear things like this, good luck with it! Nice job!
This is very technical and interesting. I am on a journey to learn how to present historical wave data graphically as a map overlay using WW3 or NORA10. Unfortunately I do not posses the tools or the skills to do this yet, but hopefully this series will help.
Hope so!
Thank you for this detailed video on how to read a NetCDF file. I'm using GSMaP precipitation data which is in .dat file format. Can you make a video on how to convert such files into a NetCDF file? Thank you!
Thanks. You might find my video on creating a netcdf file useful. You can find it in the same playlist and in the description to this video.
Thank you so much for these videos!! They have helped me a lot now I'm starting to work with NetCDF files at my new job.
Always great to hear that, thanks and good luck in your new job!
Thanks for the video! This is very helpful!
You're welcome!
Thank you so much Sir, your tutorial helps me a lot, it's so insightful! But I have a couple of questions: 1. Just to make sure, does the code from lines 1-18 work in a Jupyter notebook? 2. If I use a monthly climatology timeframe, does that mean I should average each month first before following the steps in this video? Thank you so much, have a nice day Sir!
Pretty much all Python code can work in a Jupyter notebook. Regarding your data, it really depends on what you are trying to achieve.
Thanks for the great introduction to working with netcdf files. I really enjoyed this series so far! Do you have any recommendations on what packages to install if I'm learning climate data analyses (e.g. xclim) and comparing the performance of forecast models (e.g. climpred)? Any video resources I should try studying from?
Thanks for the comment. Your question regarding data analysis is a bit outside of my area, but good luck!
Brilliant tutorials!
Thanks!
Hey, great video, thanks! I have a question: when selecting the date you picked a single one. What if I want a range, like from 1950 to 1960?
In this code:
anom <- var.get.nc(data, 'anom', start=c(NA, NA, 1, time_index), count=c(NA, NA, 1, 1))
count is the number of values. So you need to find the index of the 'end date', subtract the index of the 'start date' and add one to give you the number of elements. There is a new library in development in R that will allow you to do this by only entering the dates themselves: github.com/pvanlaake/ncdfCF So watch this space!
How do I fix the problem when it says no module 'xarray'?
Run `pip install xarray` in your terminal.
Hi Luke, how are you? I'm trying to read a .nc file from the MODIS Aqua satellite to get the chlorophyll concentration with Python. I'm following your steps but the file doesn't show the variables, it's empty. Do you have a video about this situation?
Very unusual. Where can I find the data?
@LukeDataManager You can download it from the Ocean Color website, a NASA page. This is the link where you can get a .nc file: oceancolor.gsfc.nasa.gov/cgi/browse.pl?sen=amod or, if you want, I can send you an email with the file attached.
Amazing video. Thank you! How do I fix this problem: AttributeError: partially initialized module 'xarray' has no attribute 'open_dataset' (most likely due to a circular import)?
You can fix the `AttributeError` by following these steps:
1. **Check for Circular Imports**: Ensure there are no circular dependencies in your imports.
2. **Correct Import Statement**:
```python
import xarray as xr

dataset = xr.open_dataset('your_dataset_file.nc')
```
3. **Avoid Module Name Conflicts**: Ensure your script is not named `xarray.py` or similar.
4. **Verify Environment**: Make sure you are using the environment where xarray is installed.
By the way, could you please provide an example of spreading the data variables across several dimensions? E.g. I have locations, times and variables whose values depend on both location and time.
This is in preparation. The code is here lhmarsden.github.io/NetCDF_in_R_from_beginner_to_pro/04_creating_a_cfnetcdf_file.html I will be making the video over the summer.
Hi! Could you please make a tutorial on how to plot water current maps in R?
Can you provide a link to the data?
@LukeDataManager It is using Copernicus Marine Service data products: velocity, SLA, wind etc. I am interested in making the plot using R. May I get your personal website or email for further contact, please?
@abdulkarim9055 I have limited time and funding for one-on-one tuition - these videos are only a small part of my job. I won't add my email address here or I will receive a lot of requests for help - I already receive quite a lot of these requests. I hope you understand. I think it is a better use of my time to provide tutorials that many people can use. But if you leave a link to the data here in the comments I will take it into consideration.
Thank you!!
You're welcome!
Norwegian?
Yes
Icelandic - but I have Norwegian relatives
@haraldurkarlsson1147 I am not Norwegian but I live and work in Norway
I see. But still, Norway is a great place to live and work in.
Can you explain the values that you entered for the temperature anomaly a little better? What are the values in the start and count vectors? What does the value 700 for time signify? Is it time slice number 700? Is it days since 1800? The documentation that comes with NetCDF is far from insightful. Thanks.
700 in this case is the 700th element in the list. Alone this tells you nothing about what the time is. The value of the 700th element along with the units can be used to calculate the time.
NA means all values. 1 means the first value (in start) or only one value (in count)
@LukeDataManager Somewhat confusing, since NA has a different meaning in R.
@haraldurkarlsson1147 NA stands for not available. I think there is some consistency with how it is used here. NA as in don't include a start or count for that dimension.
I have also worked with the ncdf4 package. It appears to be more complex, and moreover the files generated by the two packages (ncdf4 and RNetCDF) are different. Thus the functions in one do not work on the other. I find that a bit surprising and not in the spirit of the NetCDF project. Python seems to have a lot more programs for .nc files (even R does better on the maps).
Do you have an example of a netcdf file created with either RNetCDF or ncdf4 that you can't open and process with the other library? In my experience, the files created by either program are interoperable and usable by the other.
I have downloaded climate data twice - some time ago and yesterday. With the older file I used ncdf4 and with the more recent one RNetCDF. For the first I ran: gistemp <- nc_open(here("data", "climate_data", "gistemp1200_GHCNv4_ERSSTv5.nc")). In my Global Environment it is an ncdf4-type object. For the second one I used: glob_temp <- open.nc(here("data", "NOAAGlobalTemp_v6.nc")). This file is listed as a NetCDF object type in my Global Environment. When I attempt to use ncdf4 functions on the NetCDF object I get an error message and vice versa. The error message for ncdf4 trying to act on the NetCDF object is: Error in ncatt_get(glob_temp, "anom", "standard_name") : Error, first passed argument must be an object of class ncdf4
You lost me at “R”. Why not just use Avro, HDF or Proto?
Because many sections of the scientific community use R already. And why not use R?
CF-NetCDF files can be fully FAIR compliant and therefore a lot of important scientific datasets are published in CF-NetCDF. Many other file formats like HDF are good containers, but the data and metadata within must follow well-defined and commonly used conventions, and be structured in a standardised way, to be fully FAIR compliant. Some conventions are not mature enough, yet at least.
Very useful
Thanks
How do I resample gridded .nc data from a 1-degree to a 0.01-degree grid? Can you give me the code for multiple files?
You can use the `interp` method in xarray to interpolate the data from 1 degree to 0.01 degrees. Here's how you can do it:
```python
import numpy as np
import xarray as xr

# Let's imagine you have loaded your data into an xarray object called 'data'
data_interp = data.interp(
    lat=np.arange(float(data.lat.min()), float(data.lat.max()), 0.01),
    lon=np.arange(float(data.lon.min()), float(data.lon.max()), 0.01)
)
```
However, one problem is that you also need to interpolate the data between 359 degrees and 1 degree. Luckily for you, I have a function you can use for this. In the function below the data are sampled at 2.5 degree intervals and I am interpolating to 0.5 degrees. Method should be one of "linear", "nearest", "zero", "slinear", "quadratic", "cubic", "polynomial".
```python
def interpolate_data(ds, method):
    # Interpolate the 90-270 degree longitude band, then shift it onto a -180 to 180 axis
    ds_90_to_270 = ds.sel(lon=slice(87.5, 272.5))
    ds_90_to_270_interp = ds_90_to_270.interp(
        lat=np.arange(-90, 90, 0.5),
        lon=np.arange(87.5, 272.5, 0.5),
        method=method
    )
    ds_90_to_270_interp = ds_90_to_270_interp.sel(lon=slice(90, 270))
    ds_90_to_neg90 = ds_90_to_270_interp.assign_coords(
        lon=(ds_90_to_270_interp.lon + 180) % 360 - 180
    )

    # Combine the interpolated parts
    ds_0_to_90 = ds.sel(lon=slice(0, 92.5))
    ds_270_to_360 = ds.sel(lon=slice(267.5, 360))
    ds_combined = xr.concat([ds_0_to_90, ds_270_to_360], dim='lon')
    ds_neg90_to_90 = ds_combined.assign_coords(
        lon=(ds_combined.lon + 180) % 360 - 180
    )
    ds_neg90_to_90_interp = ds_neg90_to_90.interp(
        lat=np.arange(-90, 90, 0.5),
        lon=np.arange(-92.5, 92.5, 0.5),
        method=method
    )
    ds_neg90_to_90_interp = ds_neg90_to_90_interp.sel(lon=slice(-90, 90))

    interpolated_ds = xr.concat([ds_neg90_to_90_interp, ds_90_to_neg90], dim='lon')
    interpolated_ds = interpolated_ds.sortby('lon')
    return interpolated_ds
```
For multiple files, just include that (the execution of the function, I mean) within your for loop. Defining the function can go at the top of your code.
amazing video
Glad you think so!
Good job
Thanks!
Hi friend, would you give an example of how to manipulate the sequence when looping through a THREDDS data server? I mean opening and appending the datasets in a given order. Thank you very much.
This really depends on which order you want to open your files. You could manually create a list of filenames if you have a specific order in mind and do a for loop through that list.
OK, thank you very much
How do I compress NetCDF files using ZStandard? Or is it only possible using ZLib?
There has been some discussion on this over the past few years. github.com/Unidata/netcdf-c/issues/2173
@LukeDataManager Thank you sir for the reference. I have read that but I can't understand anything, since I did not major in computer science or anything related to it (I majored in environmental engineering). From what I've gathered, we can compress NetCDF files using nccopy, ncks (NCO), CDO, and nccompress. I need to find a better NetCDF compression method because I have very limited resources to store 10 years' worth of ERA5 data for my regional climate model. I would be thankful if you could give me some insight into that. Thank you so much.
I think zstandard is possible in NetCDF. For example, if you are using the netcdf4 library in Python, you can read about different compressions in the section 'Efficient compression of netCDF variables' on this page: unidata.github.io/netcdf4-python/
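For what it's worth, a rough sketch of what that could look like with the netCDF4-python library. Treat this as an assumption to verify against the documentation: the `compression='zstd'` option only exists in recent versions of the library and requires the Zstandard HDF5 filter plugin to be available, while `'zlib'` is the widely supported fallback.
```python
# Sketch (assumption to verify): writing a compressed variable with netCDF4-python.
import numpy as np
from netCDF4 import Dataset

with Dataset('compressed.nc', 'w', format='NETCDF4') as nc:
    nc.createDimension('time', 100)
    var = nc.createVariable('temperature', 'f4', ('time',),
                            compression='zstd',   # or compression='zlib' as a fallback
                            complevel=4)
    var[:] = np.random.rand(100).astype('f4')
```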
This video series is my first foray into learning about NetCDF, thank you so much for putting it together! My use case has me downloading precipitation data from PRISM over at Oregon State University (comes in .bil....) and doing some manipulations and saving it to NetCDF for input into another workflow. I read somewhere that PRISM used to offer their data in NetCDF, but no longer does. Any insights as to why that might be the case? Seems like, when done thoughtfully, NetCDF is a gold standard of a sort. It strikes me as odd that an institution would regress from it.
Thank you! I agree in principle with your comments regarding moving from NetCDF to .bil but I am not familiar with the dataset or providers so I'm reluctant to say too much.
@@LukeDataManager Fair enough! Thanks!
v6 of the data are here: www.ncei.noaa.gov/metadata/geoportal/rest/metadata/item/gov.noaa.ncdc:C01704/html
The link for the global surface temperature is giving an error. Is there a way to download it as a .nc file rather than using the URL?
Seems to me that NOAA are having some issue with their THREDDS server right now. Try here: www.ncei.noaa.gov/data/noaa-global-surface-temperature/v6/access/gridded/ And you should be able to download v6 of the data in a NetCDF file
@LukeDataManager When I reach this part:
```python
desired_date = '2022-12-01'
data_for_desired_date = xrds.sel(time=desired_date)
print(data_for_desired_date)
```
It gives: TypeError: get_loc() got an unexpected keyword argument 'method'. Did I forget to import some libraries?
It worked in Colab. I might have missed something on my local device.
Interesting video, mate!!! How can I do this for Chl-a in a specific area?
Is it a depth profile or geospatial data like the ones in this video? You can use xrds.sel to obtain a subset of an xarray object for your desired latitude and longitude.
docs.xarray.dev/en/latest/generated/xarray.Dataset.sel.html
Hi Luke! I have an urgent problem regarding data management of the ERA5 reanalysis dataset. Would you be able to help me with that? Rasmus
Sure, send me the details.
@LukeDataManager Thanks Luke. Do you have an email where I can reach you? Thanks for the great videos. Rasmus
You found me. I don't like to share my email in YouTube comments if possible. But I am on LinkedIn and respond to messages there.
Super useful, Luke!
Glad you think so!
I didn't include this in the video, but you can also create a CITATION.cff file in your git repository that includes some metadata. Zenodo will then use the CITATION.cff file to pre-fill the metadata when the release is published. So you can do: 1) add software to GitHub; 2) add a CITATION.cff file; 3) get a DOI with Zenodo; 4) edit the CITATION.cff file to add the DOI. Information on CITATION.cff files here: citation-file-format.github.io/
Thanks for the video. But how do I read separate .nc files on my computer?
Do you mean if you have multiple files on your computer? Then you need to open them one by one in a for loop for example. You might find this video useful: ua-cam.com/video/AbLRV5YUW2g/v-deo.html In it I extract data from multiple NetCDF files that are hosted on a THREDDS server. But you can adapt the code to loop through files on your computer instead of a THREDDS server. Just use the relative or absolute filepath to your netcdf files instead of the OPeNDAP URL.
@LukeDataManager Thank you. I will try
Hey, what are pressure and depth there, is it a dtype or what?
I am not entirely sure what you mean, and pressure is not provided in either of the datasets. When looking at the variables, you might see for example CHLOROPHYLL_A_TOTAL (DEPTH) float64 .... In this example, CHLOROPHYLL_A_TOTAL is the variable name, DEPTH is the name of the dimension, and float64 is the format of the values in the variable.
@LukeDataManager Ohh, yes, I was talking about (DEPTH), and I get it now that the data within the file are organized and indexed along this dimension. Thanks!
Glad you solved it!
Wow so easy
I'm glad you think so!
Hi, thank you for your video. Do you have any recommendations for where to host my research data instead, especially when it's a large amount of data, more than 50 gigabytes?
What type of data is it? And where are you based?
There are hundreds of data centres. Choosing one is hard. There are a few things to consider when making your choice.
1. Are you aware of any data access portals that might be relevant for your data? It is not practical for data users to look through all the data centres for relevant data. Data access portals aim to make all relevant data available in one place by providing a catalogue of access points to datasets in contributing data centres. A data access portal might focus on a certain region or a certain type of data. Contributing data centres most likely follow commonly used standards for metadata and access methods to make this possible. They are interoperable. Other data centres (like Zenodo) do not. If one data access portal can point to one data centre, other data access portals can also point to that data centre using the same method.
2. Can you speculatively find data in a data centre you are interested in? Can you search by coordinates, the time the data were collected, some keywords? If not, avoid that data centre.
3. Some data centres specialise in certain types of data. E.g. GBIF is a good place for biodiversity data and associated measurements, GenBank for genome data.
4. Can the data centre ensure your data will be preserved over decades and centuries? Some private companies or projects might lose funding. What then? What will happen to your data? Does the data centre have a plan? National data centres funded by national research funding are often considered quite secure.
This is a difficult question to answer, but if you let me know more about your data I can suggest a data centre. Pangaea is good and quite broad, and is harvested by many access portals. But there are likely other options for you too.
Great video Luke. Thanks😙
My pleasure!
Loved it! :)
Thanks!
Thanks again. Could you make a video on how to manipulate CMIP6 NetCDF data? No pressure. It will be helpful.
Thanks, that is an interesting suggestion. Are there any particular models within the CMIP6 data that you think I should focus on? Looks like there is a lot of different data in their THREDDS catalogue: ds.nccs.nasa.gov/thredds/catalog/AMES/NEX/GDDP-CMIP6/catalog.html
@LukeDataManager Thank you for your reply. I checked the link you shared. Could you make a video using precipitation from "MRI-ESM2-0" and visualizing its monthly and yearly precipitation? Please use, for example, the folder MRI-ESM2-0/ssp585/r1i1p1f1/pr/ in the THREDDS catalogue.
@user-asam2 I will consider making this. I have been thinking about making videos on specific, in-demand datasets for a while and this is a good example, so thanks! I might focus on several things in CMIP6 rather than being too specific, but hopefully it will be useful. And I will consider using the precipitation data you suggested as an example. Though I have quite a long list of tasks and this has to be done outside of work hours, so be patient :)
@LukeDataManager Thanks! Manipulating precipitation in CMIP6 is very helpful. Sure, I understand you have many tasks to do. No pressure. Thank you so much for such informative videos!
You're welcome!
Thank you for sharing this video. It helped me to understand the structure of a NetCDF file.
How can I do this for multiple data points with multiple dimensions? My dimensions are longitude, latitude, and time (year, month, day, hour [hour increments of 3]), and I want to extract two values from each longitude/latitude data point, for the entire year.
I'm not entirely sure what you mean but maybe you will find this helpful.
```python
import xarray as xr

# Load your dataset (replace 'filename.nc' with your actual file)
ds = xr.open_dataset('filename.nc')

# Suppose your latitude range is from 20 to 30 degrees, longitude range is from -10 to 10 degrees,
# and time range is from '2024-01-01' to '2024-01-31'

# Selecting latitude range from 20 to 30 degrees
lat_range = ds.sel(lat=slice(20, 30))

# Selecting longitude range from -10 to 10 degrees
lon_range = lat_range.sel(lon=slice(-10, 10))

# Selecting time range from '2024-01-01' to '2024-01-31'
time_range = lon_range.sel(time=slice('2024-01-01', '2024-01-31'))
```
@LukeDataManager I can do it individually, so scratch the multiple data points part. However, the year, month, time, and hour are each individual variables.
Interesting. I think I would be tempted to extract each variable as numpy arrays first and then combine them.
```python
import numpy as np

years = np.array([2024, 2024, 2024])
months = np.array([2, 2, 2])
days = np.array([28, 29, 29])
hours = np.array([10, 12, 15])

# Create datetime array
timestamps = np.array([
    np.datetime64(f'{y}-{m:02}-{d:02}T{h:02}')
    for y, m, d, h in zip(years, months, days, hours)
])
```
Then you can add that numpy array to your dataframe if you need to.
```python
df['timestamps'] = timestamps
```
Assuming you have one row per time. If not, create another dataframe of your year, month... first and then merge the dataframes together based on a shared column or list of columns.
And if you are ever creating a NetCDF file, don't encode time like that! There is a section in the next video in the series about how to encode time in CF-NetCDF.
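A rough sketch of that idea (the reference date and timestamps below are illustrative): CF-NetCDF stores time as numbers relative to a reference date declared in the time variable's `units` attribute.
```python
# Sketch: encode timestamps the CF way, as numeric offsets from a reference date.
import numpy as np

timestamps = np.array(['2024-02-28T10', '2024-02-29T12'], dtype='datetime64[s]')
reference = np.datetime64('1970-01-01T00:00:00')

days_since_epoch = (timestamps - reference) / np.timedelta64(1, 'D')
# These numbers would go into the 'time' variable, with attributes like:
#   units = "days since 1970-01-01T00:00:00+00:00"
#   calendar = "standard"
print(days_since_epoch)
```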
Your videos are great; with them I've started to understand NetCDF files. I encourage you to post more videos, and personally I'm waiting for your help.
Thanks, I am working on a series of videos entitled something like 'NetCDF files in Python: from beginner to pro'. Should be 7-8 videos, to be released next year
You answered all my questions! Very clear explanation. Thanks a lot Luke :D
You are welcome!