How can I do this for multiple data points with multiple dimensions? My dimensions are longitude, latitude, and time (year, month, day, and hour, in 3-hour increments), and I want to extract two values at each longitude/latitude point for the entire year.
I'm not entirely sure what you mean, but maybe you will find this helpful.
import xarray as xr
# Load your dataset (replace 'filename.nc' with your actual file)
ds = xr.open_dataset('filename.nc')
# Suppose your latitude range is 20 to 30 degrees, your longitude range is -10 to 10 degrees,
# and your time range is '2024-01-01' to '2024-01-31'.
# The coordinate names 'lat'/'lon' may be 'latitude'/'longitude' in your file.
# Select latitude from 20 to 30 degrees (if latitude is stored descending, e.g. 90 to -90, use slice(30, 20))
lat_range = ds.sel(lat=slice(20, 30))
# Select longitude from -10 to 10 degrees
lon_range = lat_range.sel(lon=slice(-10, 10))
# Select time from '2024-01-01' to '2024-01-31'
time_range = lon_range.sel(time=slice('2024-01-01', '2024-01-31'))
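Since the original question was about pulling two values at each lat/lon point for a whole year, here is a minimal sketch of that step. It builds a small synthetic 3-hourly dataset in memory instead of reading a file; the variable names `t2m` and `precip` and the chosen point are placeholders, not names from your file. `method="nearest"` snaps the requested coordinates to the closest grid point.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for your file: 3-hourly data for 2024 on a tiny grid.
# "t2m" and "precip" are placeholder names for the two values you want.
times = np.arange(np.datetime64("2024-01-01T00"), np.datetime64("2025-01-01T00"),
                  np.timedelta64(3, "h"))
lats = np.array([20.0, 25.0, 30.0])
lons = np.array([-10.0, 0.0, 10.0])
rng = np.random.default_rng(0)
shape = (times.size, lats.size, lons.size)
ds = xr.Dataset(
    {
        "t2m": (("time", "lat", "lon"), rng.normal(280.0, 5.0, shape)),
        "precip": (("time", "lat", "lon"), rng.gamma(1.0, 1.0, shape)),
    },
    coords={"time": times, "lat": lats, "lon": lons},
)

# Both variables at the grid point nearest to (24.5 N, 1.0 E), for the whole year
point = ds[["t2m", "precip"]].sel(lat=24.5, lon=1.0, method="nearest")
year = point.sel(time=slice("2024-01-01", "2024-12-31"))

# One row per 3-hourly time step, one column per variable
df = year.to_dataframe()
print(len(df))
```

To do several points at once, `.sel` also accepts `xr.DataArray` indexers that share a dimension (e.g. `dims="points"`), which selects point by point rather than forming a lat/lon grid.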
@LukeDataManager I can do it individually, so scratch the multiple-data-points part. However, the year, month, day, and hour are each stored as separate variables.
Interesting. I think I would be tempted to extract each variable as a NumPy array first and then combine them into timestamps.
import numpy as np
years = np.array([2024, 2024, 2024])
months = np.array([2, 2, 2])
days = np.array([28, 29, 29])
hours = np.array([10, 12, 15])
# Build one hour-precision datetime64 value per (year, month, day, hour) combination
timestamps = np.array([np.datetime64(f'{y}-{m:02}-{d:02}T{h:02}') for y, m, d, h in zip(years, months, days, hours)])
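As an alternative to the list comprehension above, NumPy can build the same timestamps without a Python loop by adding the components as timedeltas. This is a sketch using the same example values; it relies on NumPy promoting the unit at each step (year, then month, then day, then hour).

```python
import numpy as np

# Same example values as above: separate year/month/day/hour arrays
years = np.array([2024, 2024, 2024])
months = np.array([2, 2, 2])
days = np.array([28, 29, 29])
hours = np.array([10, 12, 15])

# Years are counted from the 1970 epoch; months and days are 1-based, so subtract 1.
# Each addition refines the unit: datetime64[Y] -> [M] -> [D] -> [h].
timestamps = (
    (years - 1970).astype("datetime64[Y]")
    + (months - 1).astype("timedelta64[M]")
    + (days - 1).astype("timedelta64[D]")
    + hours.astype("timedelta64[h]")
)
print(timestamps)
```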
Then you can add that NumPy array to your DataFrame if you need to:
df['timestamps'] = timestamps
This assumes you have one row per time step. If not, first create a separate DataFrame from your year, month, etc., and then merge the two DataFrames on a shared column or list of columns.
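A minimal pandas sketch of that merge approach, assuming the time components are columns named `year`, `month`, `day`, and `hour` (the table and its values are hypothetical). `pd.to_datetime` can assemble timestamps directly from a DataFrame with those column names.

```python
import pandas as pd

# Hypothetical main table: several rows (grid points) per time step,
# with the time split into separate year/month/day/hour columns.
data = pd.DataFrame({
    "lat": [20.0, 25.0, 20.0, 25.0],
    "lon": [0.0, 0.0, 0.0, 0.0],
    "year": [2024] * 4,
    "month": [2] * 4,
    "day": [29] * 4,
    "hour": [12, 12, 15, 15],
    "value": [1.5, 2.0, 0.7, 0.9],
})

key = ["year", "month", "day", "hour"]

# Separate table with one row per unique time step...
times = data[key].drop_duplicates().reset_index(drop=True)
# ...where pandas assembles the timestamp from the named columns.
times["timestamp"] = pd.to_datetime(times[key])

# Merge the timestamps back onto every matching row (left join keeps row order).
merged = data.merge(times, on=key, how="left")
print(merged[["lat", "lon", "timestamp", "value"]])
```

If every row already carries all four columns, `pd.to_datetime(data[key])` on the main table gives the same result without the merge.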
And if you are ever creating a NetCDF file, don't encode time like that! There is a section in the next video in the series about how to encode time in CF-NetCDF.
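For reference, a sketch of the CF-style alternative: a single numeric time variable whose `units` attribute reads "<unit> since <reference date>". The timestamps and the reference date here are illustrative; `xr.decode_cf` is used only to show that xarray interprets the encoding as intended.

```python
import numpy as np
import xarray as xr

# Hypothetical 3-hourly timestamps
stamps = np.array(["2024-02-28T21", "2024-02-29T00", "2024-02-29T03"], dtype="datetime64[h]")

# CF encoding: one numeric variable plus a "units" string (and optionally a calendar)
reference = np.datetime64("2024-01-01T00", "h")
hours_since = ((stamps - reference) / np.timedelta64(1, "h")).astype("int64")

raw = xr.Dataset(
    coords={
        "time": ("time", hours_since,
                 {"units": "hours since 2024-01-01 00:00:00", "calendar": "standard"}),
    }
)

# xarray decodes CF time when reading a file; decode_cf does the same in memory.
decoded = xr.decode_cf(raw)
print(hours_since)
print(decoded["time"].values)
```

When writing with xarray, a `datetime64` time coordinate is encoded like this automatically by `to_netcdf`, and you can pick the units yourself with `encoding={"time": {"units": "hours since 2024-01-01", "calendar": "standard"}}`.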