The project participant will learn (1) how scientific data are stored, (2) to use Python 3 to display metadata, (3) to read a netcdf file, and (4) to display this data as a function of latitude and longitude. Some additional information specific to this satellite measurement is presented at the end.
This script reads in a netcdf file and plots the data.
The file has been downloaded from Remote Sensing Systems (http://www.remss.com/) and contains sea surface salinity (SSS), as well as a few other data products. Below, we look at SSS and the fraction of pixels contaminated by land, zooming into a region known as the Gulf of Guinea. We then plot this as a function of geographic location. Another script (Lesson 2) will make an animation of these data in time.
Christian Buckingham & Eben Nyadjro
The outline of the lesson is as follows, where text shown in bold denotes where the project participant must send results to the Instructor.
First, we load a few important packages. If you do not have these, and if you are using your personal computer and Anaconda as your Python distribution, type conda install xarray at the terminal window (MacOS, Linux) or command line (Windows). The conda package manager should download it from the internet. If instead you have installed Python a different way, try pip install xarray.
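Before running the imports, it can help to check which packages are actually installed. The following is a minimal sketch using only the Python standard library; the helper name find_missing and the list REQUIRED are illustrative choices, not part of the original lesson.

```python
import importlib.util

# Packages imported in this lesson; adjust the list to match your own setup.
REQUIRED = ["numpy", "netCDF4", "xarray", "matplotlib", "scipy"]

def find_missing(packages):
    """Return the names in `packages` that cannot be found on this system."""
    return [name for name in packages if importlib.util.find_spec(name) is None]

missing = find_missing(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages were found.")
```

Any name reported as missing can then be installed with conda install or pip install, as described above.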
# If using Google Colab
# install dependencies.
!pip install netCDF4
## A few necessary packages.
import numpy as np
from netCDF4 import Dataset #Dataset, MFDataset
import xarray as xr
import matplotlib.pyplot as plt
import datetime as dt
import scipy.signal as signal
Before processing these data, we need to first obtain the dataset from the Internet.
Our dataset is developed by Remote Sensing Systems and is described in detail here (http://www.remss.com/missions/smap/salinity/). The present project is making use of sea surface salinity (SSS) so we will download five example netcdf (.nc) files containing these data. To accomplish this, do the following:
basedirin below). I placed mine in a directory entitled data/salinity.
These data are from the Soil Moisture Active Passive (SMAP) sensor. As the satellite orbits Earth, it samples specific locations along its path. This is referred to as the satellite swath. The sensor employed here is a passive sensor, meaning that it examines the electromagnetic radiation emitted from the Earth towards space without sending a signal toward the Earth. (This consumes less power on the satellite, as well.) What is really cool about this sensor is that it makes measurements in the "microwave" portion of the spectrum, meaning that the satellite can see through the clouds! These data are then combined (averaged) over 8 days and the result is the salinity measurement you will see. More information on the data used in our project can be obtained from the following video:
NetCDF (.nc) is a binary data format that contains not only data but information that describes the data. We refer to this latter type of information as metadata.
Below, we use strings to tell Python where to find the file, we display to the screen some of the metadata, and in the final portion, we plot the data itself. Let's first see what the metadata looks like.
Here, we use the xarray package only to display the contents of the netcdf file in a tidy manner. There are other ways of displaying this information but xarray does a nice job.
# Define the filename.
pname = "../../data/salinity/" # remember the slash on the end of the pathname
fname = "RSS_smap_SSS_L3_8day_running_2020_199_FNL_v04.0.nc"
infile = pname+fname
# Read the metadata of the file. This is equivalent to an "ncdump -h RSS_smap.nc" command at the terminal
# window or command line.
test = xr.open_dataset(infile,decode_times=False)
test.info() # info() prints the summary itself and returns None, so wrapping it in print() would also print "None"
The reason for passing decode_times=False is that xarray, by default, tries to interpret the time variable and convert it to dates. This can fail on a netcdf file that is not CF-compliant, that is, one whose time metadata does not follow the Climate and Forecast (CF) conventions. So, to keep the code general for any netcdf file, we set decode_times=False.
Note that the filename structure can be read as follows:
Now, we read in the netcdf file. This uses a function called Dataset in the netCDF4 package.
# Read in the data from the netcdf file.
# We use the netCDF4 package to read the netcdf file.
nc = Dataset(infile, "r")
etime = nc.variables["time"][:] # time in seconds since 2000/01/01 00:00 (check the "units" attribute of this variable)
lat = nc.variables["lat"][:] # latitude (degrees), values = [-90, 90]
lon = nc.variables["lon"][:] # longitude (degrees), values = [0, 360]
nobs = nc.variables["nobs"][:] # Number of observations for L3 average (unitless)
sss_smap = nc.variables["sss_smap"][:] # sea_surface_salinity (practical salinity units == unitless)
sss_ref = nc.variables["sss_ref"][:] # Reference sea surface salinity from HYCOM (practical salinity units == unitless)
gland = nc.variables["gland"][:] # average land fraction (weighted by antenna gain)
sst_ref = nc.variables["surtep"][:] # Ancillary sea surface temperature (from the Canadian Meteorological Centre, CMC), doi: 10.5067/GHCMC-4FM03
# Access the "data" portion of the variable, as python handles this as a masked array.
etime = etime.data
lat = lat.data
lon = lon.data
sss = sss_smap.data
sss_hycom = sss_ref.data
sst_cmc = sst_ref.data
land_fraction = gland.data
# Collapse data to one dimension.
#etime = np.squeeze(etime)
#lat = np.squeeze(lat)
#lon = np.squeeze(lon)
# Print some info about the variables.
print(etime.dtype) # print the data type
print(etime.shape) # print the shape
print(lat.dtype) # print the data type
print(lat.shape) # print the shape
print(lon.dtype) # print the data type
print(lon.shape) # print the shape
print(sss.dtype) # print the data type
print(sss.shape) # print the shape
print(sss_hycom.dtype) # print the data type
print(sss_hycom.shape) # print the shape
# Convert some variables to double precision (float64).
lat = np.double(lat)
lon = np.double(lon)
sss = np.double(sss)
nlat = len(lat)
nlon = len(lon)
sdata = sss.shape
# Handle time.
# This is trickier than it seems.
# Keeping track of time and converting between time units is one of the
# trickiest things in python (and in most computer languages). Thus,
# for now, we will simply convert to years since the reference time.
# Simple manner of handling time.
dtime = etime/86400 # convert seconds to days since the reference time
ytime = dtime/365.25 # convert from days to years (approximate; 365.25 days is the mean year length)
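If a calendar date is needed rather than a fractional year, the elapsed seconds can be added to the reference time. The sketch below uses only the standard datetime module. It assumes the reference epoch is 2000-01-01 00:00 UTC, which should be confirmed against the "units" attribute of the time variable; the function name seconds_to_datetime and the example value are hypothetical.

```python
import datetime as dt

# Assumption: the reference epoch is 2000-01-01 00:00 UTC.
# Confirm this against the "units" attribute of the time variable.
REFERENCE = dt.datetime(2000, 1, 1, tzinfo=dt.timezone.utc)

def seconds_to_datetime(seconds, reference=REFERENCE):
    """Convert elapsed seconds since `reference` into a datetime object."""
    return reference + dt.timedelta(seconds=float(seconds))

# Hypothetical value: exactly 200 days after the reference time.
example_seconds = 200 * 86400.0
print(seconds_to_datetime(example_seconds).strftime("%Y-%m-%d"))
```

With the variables of this lesson, the call would look like seconds_to_datetime(etime[0]), provided etime holds at least one value.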
Now that we have the data, plot it as a global map. (This takes a few minutes because of the density of data points.)
# Plot the salinity.
plt.pcolor(lon,lat,sss)
plt.clim(28,35)
plt.xlabel('Longitude (deg)')
plt.ylabel('Latitude (deg)')
plt.title('Sea Surface Salinity: '+fname) # here we need to insert a date inside the brackets
plt.grid()
plt.colorbar()
plt.show()
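The comment on the title line above notes that a date should be inserted. One possible way to obtain it is to parse the filename. The sketch below assumes that the two numeric fields following "running" are the year and the day of year; this reading is an assumption, and the lesson does not state which day of the 8-day window the number refers to. The function name date_from_filename is hypothetical.

```python
import datetime as dt

def date_from_filename(filename):
    """Extract a date from the filename.

    Assumption: the two fields after 'running' are the year and the
    day of year (1 = 1 January).
    """
    parts = filename.split("_")
    i = parts.index("running")
    year = int(parts[i + 1])
    day_of_year = int(parts[i + 2])
    return dt.date(year, 1, 1) + dt.timedelta(days=day_of_year - 1)

fname = "RSS_smap_SSS_L3_8day_running_2020_199_FNL_v04.0.nc"
label = date_from_filename(fname).isoformat()
print("Sea Surface Salinity: " + label)
```

The resulting string could then be passed to plt.title in place of the filename.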
We now want to subset the global map. We examine a region centred on the Gulf of Guinea (GoG). The challenge with this particular location is that the data is given to us from the prime meridian (longitude = 0 degrees) to 360 degrees. That is, the region of the GoG falls on the "seam" or edge of the data. (See how the GoG is both on the left-hand side of the plot and the right-hand side of the plot.) To handle this, we add two halves of data (below, referred to as blocks) to a single matrix, creating one large matrix. The latitudes are the same for both blocks.
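An alternative to joining two blocks is to re-centre the whole grid on the prime meridian first, so that the seam moves to the date line. The following is a minimal sketch on a tiny synthetic grid (the values are made up and are not SMAP data); the function name recenter_longitude is hypothetical, and this is not the method used in the lesson code.

```python
import numpy as np

def recenter_longitude(lon, field):
    """Map lon from [0, 360) to [-180, 180) and reorder the columns to match.

    `field` must have longitude as its second (column) dimension.
    """
    lon = np.asarray(lon, dtype=float)
    lon_wrapped = np.where(lon >= 180.0, lon - 360.0, lon)
    order = np.argsort(lon_wrapped)  # column order that sorts the new longitudes
    return lon_wrapped[order], np.asarray(field)[:, order]

# Tiny synthetic grid: one latitude row, four longitudes.
lon_demo = np.array([0.0, 90.0, 180.0, 270.0])
field_demo = np.array([[1.0, 2.0, 3.0, 4.0]])

lon_new, field_new = recenter_longitude(lon_demo, field_demo)
print(lon_new)
print(field_new)
```

After re-centring, a region such as the Gulf of Guinea can be selected with a single longitude mask, at the cost of reordering the full global array once.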
# This part is tricky because we need to obtain data across the seam.
# Subset for the region of interest.
latlim = np.array([-10.0,10.0])
lonlim = np.array([-20.0,15.0])
# latlim = np.array([4.25,6.25])
# lonlim = np.array([-1,1])
latlim = np.double(latlim)
lonlim = np.double(lonlim)
ilat1 = (lat >= latlim[0]) & (lat <= latlim[1])
ilon1 = ((lon-360.0) >= lonlim[0]) & ((lon-360.0) < 0.) # block west of the prime meridian
ilon2 = (lon >= 0) & (lon <= lonlim[1]) # block east of the prime meridian
ilat = ilat1
ilon = ilon1 | ilon2 # combined longitude mask (same length as lon)
lats = lat[ilat1]
lons1 = lon[ilon1] - 360
lons2 = lon[ilon2]
lons = np.concatenate((lons1,lons2), axis=0)
index1 = np.array(np.where(ilat))
index2 = np.array(np.where(ilon))
#print(index1)
#print(index2)
sss_block1 = sss[ilat,:]
sss_block1 = sss_block1[:,ilon1]
sss_block2 = sss[ilat,:]
sss_block2 = sss_block2[:,ilon2]
nlats = len(lats)
nlons = len(lons)
nlons1 = len(lons1)
nlons2 = len(lons2)
sss_block = np.zeros([nlats,nlons])
sss_block[0:nlats,0:nlons1] = sss_block1
sss_block[0:nlats,nlons1:nlons1+nlons2] = sss_block2
# Form a mask for the land.
# This mask uses the bad salinity values to identify land.
mask = np.zeros([nlats,nlons])
igood = (sss_block >= 10) # find good salinity values
mask[igood] = 1
inan = (sss_block < 10) # find bad values
mask[inan] = np.nan # not a number
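The same masking idea can be written in one step with np.where, which keeps good values and replaces everything else with NaN. This is a sketch on made-up numbers; the function name mask_invalid is hypothetical, and the threshold of 10 simply mirrors the one used above.

```python
import numpy as np

def mask_invalid(field, minimum=10.0):
    """Return a float copy of `field` with values below `minimum` set to NaN."""
    field = np.asarray(field, dtype=float)
    return np.where(field >= minimum, field, np.nan)

# Made-up values: two plausible salinities, one fill value, and one zero.
demo = np.array([[35.1, -9999.0],
                 [0.0, 33.7]])

masked = mask_invalid(demo)
print(masked)
print(np.nanmean(masked))  # the mean ignores the NaN entries
```

Functions such as np.nanmean skip the NaN entries, so statistics computed on the masked field are not distorted by fill values.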
We will now plot the sea surface salinity for a single file (one 8-day average). Fresh water (low salinity) is depicted as dark blue. Note that a lot of fresh water appears to emanate from the coastline, likely due to the Congo and Niger rivers. Also note that we multiply by the mask (nans where data is bad, ones where it is good) to distinguish between bad and good measurements. This also shows land as white, which is helpful.
# Plot the salinity.
plt.pcolor(lons,lats,sss_block*mask)
plt.clim(28,35)
plt.xlabel('Longitude (deg)')
plt.ylabel('Latitude (deg)')
plt.title('Sea Surface Salinity: '+fname) # here we need to insert a date inside the brackets
plt.grid()
plt.colorbar()
plt.show()
We can also save the picture above to a file. For example, using the above code, we would comment out the statement plt.show() and replace it with two extra lines of code. This saves the graphic to a file called SSS_map.png, which is located in the same directory as the Jupyter Notebook.
# Plot the salinity.
plt.pcolor(lons,lats,sss_block*mask)
plt.clim(28,35)
plt.xlabel('Longitude (deg)')
plt.ylabel('Latitude (deg)')
plt.title('Sea Surface Salinity: '+fname) # here we need to insert a date inside the brackets
plt.grid()
plt.colorbar()
#plt.show()
outfile = "SSS_map.png"
plt.savefig(outfile,format='png',dpi=200)
The following plot examines how the salinity values are contaminated by radiation from the land. This happens for microwave-derived salinity measurements because the sensor is passive: it records whatever microwave radiation reaches it, so emission from land, as well as radiation from satellite TV antennas and other technology, tends to bias the results. The creators of the netcdf file were aware of this problem and masked out values of salinity for which the gland variable exceeded a given threshold.
We don't need to understand the units of this variable (it is weighted by the antenna gain). For now, simply look at the colours: bright (yellow) colours indicate bias in the salinity measurements.
# Look at the fraction of pixels contaminated by land.
land_block1 = land_fraction[ilat,:]
land_block1 = land_block1[:,ilon1]
land_block2 = land_fraction[ilat,:]
land_block2 = land_block2[:,ilon2]
land_block = np.zeros([nlats,nlons]) # allocate space
land_block[0:nlats,0:nlons1] = land_block1
land_block[0:nlats,nlons1:nlons1+nlons2] = land_block2
# Land fraction threshold (in metadata).
land_fraction_threshold = 0.00800000037997961
iland = land_block > land_fraction_threshold # pixels that are land
#mask = np.ones([nlats,nlons])
#mask[iland] = np.nan # not a number
# Plot the land fraction.
plt.pcolor(lons,lats,(land_block*mask))
#plt.clim(0,1)
plt.xlabel('Longitude (deg)')
plt.ylabel('Latitude (deg)')
plt.title('Fraction of pixel contaminated by land')
plt.grid()
plt.colorbar()
plt.show()
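The commented-out lines in the code above hint at a second way to build a mask, directly from the land fraction. The sketch below shows that idea on made-up numbers and also reports what percentage of pixels is flagged. The function name apply_land_threshold is hypothetical, and the threshold is rounded from the value quoted above.

```python
import numpy as np

# Rounded from the threshold value quoted in the code above.
LAND_FRACTION_THRESHOLD = 0.008

def apply_land_threshold(field, land_fraction, threshold=LAND_FRACTION_THRESHOLD):
    """Set pixels whose land fraction exceeds `threshold` to NaN.

    Returns the masked copy of `field` and the percentage of flagged pixels.
    """
    field = np.array(field, dtype=float)  # copy, so the input is left untouched
    flagged = np.asarray(land_fraction) > threshold
    field[flagged] = np.nan
    percent_flagged = 100.0 * flagged.mean()
    return field, percent_flagged

# Made-up salinity and land-fraction values on a 2 x 2 grid.
sss_demo = np.array([[35.0, 34.0], [30.0, 28.0]])
land_demo = np.array([[0.0, 0.004], [0.02, 0.5]])

sss_masked, percent = apply_land_threshold(sss_demo, land_demo)
print(sss_masked)
print(percent)
```

Comparing this mask with the one derived from bad salinity values is one way to check whether the two criteria flag the same pixels.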
If I execute the above code with the latitude and longitude limits changed to be latlim = np.array([4.25,6.25]) and lonlim = np.array([-1.0,1.0]), I get the following result.
# Plot the land fraction.
plt.pcolor(lons,lats,(land_block*mask))
#plt.clim(0,1)
plt.xlabel('Longitude (deg)')
plt.ylabel('Latitude (deg)')
plt.title('Fraction of pixel contaminated by land')
plt.grid()
plt.colorbar()
plt.show()
This is actually not bad at all! The previous version of these data (version 3.0) had more contamination near the coast. The present version of these data (version 4.0) has only minimal contamination near the coast.
It is worth looking back on our lesson to summarize what we have learned.
The project participant (you) should, for example, be able to apply these steps to another data set. This can be tricky but, below, we encourage you to apply them to sea surface temperature (SST) in the Gulf of Guinea.
Try the following data file, for example. It is a sea surface temperature (SST) file created by RSS:
When finished, send a "png" file of the map of SST (zoom in on the Gulf of Guinea) to the instructor. It can be a little tricky, so a few tips are given in a separate document.
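# Plot the sea surface temperature.
# Here, sst_block is the SST subset that you build in this exercise, following the same steps used for sss_block above.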
plt.pcolor(lons,lats,sst_block*mask,cmap="coolwarm")
plt.xlabel('Longitude (deg)')
plt.ylabel('Latitude (deg)')
plt.title('Sea Surface Temperature: '+fname) # here we need to insert a date inside the brackets
plt.grid()
plt.colorbar()
outfile = "SST_map.png"
plt.savefig(outfile,format='png',dpi=200)
# Note that we had to get rid of the plt.show() command.
# Please ask Dr. Paige Martin why this is.