POLARIN’s ERDDAP querying: tabledap

POLARIN’s ERDDAP querying: tabledap#

How can we query oceanographic data in tabular form using tabledap?

This notebook can be run with Binder by following the link: polarin_erddap_querying_tabledap

This notebook will illustrate how to build queries and make requests to https://erddap.s4polarin.eu/erddap/index.html using Python.

Setup#

To begin we need to import the necessary libraries.

!pip install requests pandas
# these packages should be installed with the command above if running the code locally

import requests
import pandas as pd
import io
import warnings

warnings.filterwarnings("ignore")

Get a list of available tabledap datasets#

To check what datasets are available in the ERDDAP and get their URLs the first step is to make a request to https://erddap.s4polarin.eu/erddap/tabledap/allDatasets.html using the URL that will allow us to get the datasets’ ids and their URLs based on the data structure. After receiving the data it will be loaded into a pandas DataFrame.

datasets_url = 'https://erddap.s4polarin.eu/erddap/tabledap/allDatasets.csv?datasetID%2Ctabledap'

# request and load into DataFrame
datasets_resp = requests.get(datasets_url)
datasets_df = pd.read_csv(io.StringIO(datasets_resp.text), sep=',')

# drop rows where tabledap is NaN
datasets_df = datasets_df.dropna(subset=['tabledap'])

# add url column
datasets_df['url'] = datasets_df['tabledap']
cleaned_df = datasets_df.drop(columns=['tabledap'])

pd.set_option('display.max_colwidth', None)
cleaned_df = cleaned_df.reset_index(drop=True)
cleaned_df

	datasetID	url
0	allDatasets	https://erddap.s4polarin.eu/erddap/tabledap/allDatasets
1	cnr_iadc_119c_736f_bdbd	https://erddap.s4polarin.eu/erddap/tabledap/cnr_iadc_119c_736f_bdbd
2	cnr_iadc_a128_9f50_66ca	https://erddap.s4polarin.eu/erddap/tabledap/cnr_iadc_a128_9f50_66ca
3	cnr_iadc_7668_30ee_5237	https://erddap.s4polarin.eu/erddap/tabledap/cnr_iadc_7668_30ee_5237
4	ARICE_CCIN80_Arcticnet_0501a_BaffinBay	https://erddap.s4polarin.eu/erddap/tabledap/ARICE_CCIN80_Arcticnet_0501a_BaffinBay
...	...	...
314	cnr_iadc_aee4_8c82_1df6	https://erddap.s4polarin.eu/erddap/tabledap/cnr_iadc_aee4_8c82_1df6
315	cnr_antarcticdatacenter_a8e9_20db_433e	https://erddap.s4polarin.eu/erddap/tabledap/cnr_antarcticdatacenter_a8e9_20db_433e
316	cnr_iadc_8dd9_d4f8_1cf8	https://erddap.s4polarin.eu/erddap/tabledap/cnr_iadc_8dd9_d4f8_1cf8
317	ARICE_Amundsen_TSG_V1	https://erddap.s4polarin.eu/erddap/tabledap/ARICE_Amundsen_TSG_V1
318	cnr_iadc_25f1_464d_dfcc	https://erddap.s4polarin.eu/erddap/tabledap/cnr_iadc_25f1_464d_dfcc

319 rows × 2 columns

Using these URLs we will than be able to get their data.
In this example we will use the cnr_iadc_f5ff_4134_70dd dataset, with the URL: https://erddap.s4polarin.eu/erddap/tabledap/cnr_iadc_f5ff_4134_70dd

Get a list of variables for the dataset#

Now we can make a request to the dataset’s metadata, which will give us a list of all the available variables and their relative data type. These variables can be than used in the following requests.

dataset_url = 'https://erddap.s4polarin.eu/erddap/tabledap/cnr_iadc_f5ff_4134_70dd'

# building the full url for the metadata and making the request
metadata_url = dataset_url.replace('tabledap', 'info').replace('griddap', 'info') + '/index.csv'

metadata_resp = requests.get(metadata_url)
metadata_df = pd.read_csv(io.StringIO(metadata_resp.text), sep=',')
variables_df = metadata_df.loc[metadata_df['Row Type'].isin(['variable', 'dimension'])]
variables_df.reset_index(drop=True, inplace=True)
variables_df.drop(columns=['Row Type', 'Attribute Name', 'Value'], inplace=True)
variables_df

/tmp/ipython-input-1584371082.py:10: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  variables_df.drop(columns=['Row Type', 'Attribute Name', 'Value'], inplace=True)

	Variable Name	Data Type
0	time	double
1	latitude	float
2	longitude	float
3	depth	float
4	TEMP	float
5	TEMP_QC	byte
6	station_id	String

Get a list of platform codes#

We will then perform another request to check the station id for the selected dataset, which will be useful in the following queries to the ERDDAP.

platforms_query = '.csv?station_id&distinct()'

# The data format specified is 'csv' (in which the first row contains the column names and the second the units of measurment, which will be removed from the dataframe in these examples).
# Other possibilities are  'csv0' which will return only the data rows and 'csvp', which will return a csv with the column names (and their unit of measurment) as first row and data starting from the second.
# the additional parameter &distinct() will ensure we will get only unique rows

platform_resp = requests.get(dataset_url + platforms_query)
platforms_df = pd.read_csv(io.StringIO(platform_resp.text), sep=',')
platforms_df

	station_id
0	vws21g

Data gathering#

Following are three examples of data queries:

With station_id and time range#

When building the URL to get the data a platform code can be inserted in the query to get the data relative to the platform. In the following example the platform code ‘vws21g’ has been chosen and the variables are:

station_id
time
latitude
longitude
TEMP

The query will look like:

?PLATFORMCODE%2Ctime%2Clatitude%2Clongitude%2CTEMP&station_id=%22vws21g%22&time%3E=2023-04-29T09%3A45%3A00Z&time%3C=2024-04-29T09%3A45%3A00Z

It can be divided into two main parts:

?station_id%2Ctime%2Clatitude%2Clongitude%2CTEMP

Where ? indicates the start of query parametes and the rest is a list of variables we want as columns in the csv, separated by %2C, an encoded comma(,).
&dtation_id=%22vws21g%22&time%3E=2022-04-29T09%3A45%3A00Z&time%3C=2024-04-29T09%3A45%3A00Z

After the list of variables we can add filters, separated by &.

The platform code chosen is vws21g and it has to be inserted between encoded double quotes(“), represented by %22.

The syntax for the timerange is:

time%3E=2022-04-29T09%3A45%3A00Z&time%3C=2024-04-29T09%3A45%3A00Z

Here the other encoded characters are %3E (>), %3C (<) and %3A (:).

The time has to be passed as an ISO string, with the format YYYY-MM-DDThh:mm:ssZ.

platform_code = 'vws21g'

variables = '.csv?station_id%2Ctime%2Clatitude%2Clongitude%2CTEMP'
filters = f'&station_id=%22{platform_code}%22&time%3E=2023-04-29T00%3A00%3A00Z&time%3C=2024-04-29T00%3A00%3A00Z'

data_resp = requests.get(dataset_url + variables + filters)
data_df = pd.read_csv(io.StringIO(data_resp.text), sep=',')

data_df=data_df.sort_values(by=["time"])
data_df.reset_index(drop=True, inplace=True)
data_df = data_df.dropna(subset=['station_id'])
data_df

	station_id	time	latitude	longitude	TEMP
0	vws21g	2023-04-29T00:00:00Z	79.12711	11.724883	NaN
1	vws21g	2023-04-29T00:10:00Z	79.12711	11.724883	NaN
2	vws21g	2023-04-29T00:20:00Z	79.12711	11.724883	NaN
3	vws21g	2023-04-29T00:30:00Z	79.12711	11.724883	NaN
4	vws21g	2023-04-29T00:40:00Z	79.12711	11.724883	NaN
...	...	...	...	...	...
50955	vws21g	2024-04-28T23:15:07Z	79.12797	11.7237	0.206
50956	vws21g	2024-04-28T23:25:07Z	79.12797	11.7237	0.2033
50957	vws21g	2024-04-28T23:35:07Z	79.12797	11.7237	0.2071
50958	vws21g	2024-04-28T23:45:07Z	79.12797	11.7237	0.2061
50959	vws21g	2024-04-28T23:55:07Z	79.12797	11.7237	0.202

50960 rows × 5 columns

With coordinates range#

Another possibility when querying the data is to specify a range of coordinates. This can be done by inserting in the query filters the following:

latitude%3E=75&latitude%3C=80&longitude%3E=-50&longitude%3C=50

Effectively selecting platforms inside a square delimited by:

latitude equal or greater than 75 and equal or less than 80

and

longitude equal or greater than -50 and equal or less than 50.

coords_variables = '.csv?station_id%2Clatitude%2Clongitude'
coords_filter = '&latitude%3E=75&latitude%3C=80&longitude%3E=-50&longitude%3C=50&distinct()'

coords_data_resp = requests.get(dataset_url + coords_variables + coords_filter)
coords_data_df = pd.read_csv(io.StringIO(coords_data_resp.text), sep=',')

coords_data_df = coords_data_df.dropna(subset=['station_id'])

coords_data_df

	station_id	latitude	longitude
1	vws21g	79.12711	11.724883
2	vws21g	79.12797	11.7237

Additional resources#

For additional information about ERDDAP please visit:

https://erddap.s4polarin.eu/erddap/information.html

The webpages for the Python’s libraries that have been used in this notebook are:

REQUESTS: https://requests.readthedocs.io/en/latest/
PANDAS: https://pandas.pydata.org/
IO: https://docs.python.org/3/library/io.html

This work has received funding from the European Union Horizon Europe project Polar Research Infrastructure Network (POLARIN) under grant agreement No. 101130949 (https://doi.org/10.3030/101130949). This notebook makes use of data available in the European Marine Observation and Data Network (EMODnet, https://emodnet.ec.europa.eu).