stations

StationsN

class weatherDB.stations.StationsN[source]

Bases: StationsBase

A class to work with and download 10-minute precipitation data for several stations.

Public Methods:

update_richter_class([stids, do_mp])

Update the Richter exposition class.

richter_correct([stids])

Richter correct the filled data.

last_imp_corr([stids, do_mp])

Richter correct the filled data for the last imported period.

update([only_new])

Make a complete update of the stations.

Inherited from StationsBase

__init__()

download_meta()

Download the meta file(s) from the CDC server.

update_meta()

Update the meta table by comparing to the CDC server.

update_period_meta([stids])

Update the period in the meta table of the raw data.

get_meta_explanation([infos])

Get the explanations of the available meta fields.

get_meta([infos, stids, only_real])

Get the meta DataFrame from the database.

get_stations([only_real, stids])

Get a list with all the stations as Station-objects.

count_holes([stids])

Count holes in timeseries depending on their length.

update_raw([only_new, only_real, stids, ...])

Download all stations data from CDC and upload to database.

last_imp_quality_check([stids, do_mp])

Do the quality check of the last import.

last_imp_fillup([stids, do_mp])

Do the gap filling of the last import.

quality_check([period, only_real, stids, do_mp])

Quality check the raw data for a given period.

update_ma([stids, do_mp])

Update the multi annual values for the stations.

fillup([only_real, stids, do_mp])

Fill up the quality checked data with data from nearby stations to get complete timeseries.

update([only_new])

Make a complete update of the stations.

get_df(stids, **kwargs)

Get a DataFrame with the corresponding data.


count_holes(stids='all', **kwargs)

Count holes in timeseries depending on their length.

Parameters:
  • stids (string or list of int, optional) – The Stations to return. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.

  • kwargs (dict, optional) –

    The parameters supported by the StationBase.count_holes method, e.g.:

    weeks : list, optional

    A list of hole lengths to count. Every hole longer than the specified number of weeks is counted. The default is [2, 4, 8, 12, 16, 20, 24].

    kind : str

    The kind of timeseries to analyze. Should be one of [‘raw’, ‘qc’, ‘filled’]. For N, “corr” is also possible. Normally only “raw” and “qc” make sense, because the other timeseries should not have holes.

    period : TimestampPeriod or (tuple or list of datetime.datetime or None), optional

    The minimum and maximum Timestamp for which to analyze the timeseries. If None is given, the minimum or maximum possible Timestamp is taken. The default is (None, None).

    between_meta_period : bool, optional

    Only check within the respective period that is defined in the meta table. If “qc” is chosen as kind, then the “raw” meta period is taken. The default is True.

    crop_period : bool, optional

    Should the period be cropped to the maximum filled period? This will result in holes being ignored when they are at the beginning or at the end of the timeseries. If period = (None, None) is given, then this parameter is set to True. The default is False.

Returns:

A pandas DataFrame with station_id as index and one column per entry in weeks. Each value is the number of NA periods longer than the respective number of weeks.

Return type:

pandas.DataFrame

Raises:

ValueError – If the input parameters were not correct.
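The counting rule described for the weeks parameter can be illustrated with a small standalone sketch. It does not use weatherDB or a database: the function below is a hypothetical stand-in that works on plain Python lists, assumes equally spaced timestamps, and marks missing values with None. It counts a hole for a threshold only when the hole is strictly longer than that number of weeks, following the wording above.

```python
from datetime import datetime, timedelta
from itertools import groupby


def count_holes(timestamps, values, weeks=(2, 4, 8, 12, 16, 20, 24)):
    """Count runs of missing values longer than each threshold in `weeks`.

    Hypothetical stand-in for the documented method: `timestamps` must be
    sorted and equally spaced, `values` uses None for missing data.
    """
    if len(timestamps) != len(values):
        raise ValueError("timestamps and values must have the same length")
    step = timestamps[1] - timestamps[0] if len(timestamps) > 1 else timedelta(0)

    # Collect the duration of every consecutive run of missing values.
    hole_durations = []
    for is_missing, run in groupby(values, key=lambda v: v is None):
        if is_missing:
            hole_durations.append(step * len(list(run)))

    # A hole is counted for a threshold when it is strictly longer than it.
    return {
        w: sum(1 for d in hole_durations if d > timedelta(weeks=w))
        for w in weeks
    }


# Made-up daily series: a 21-day hole and a 35-day hole between valid data.
start = datetime(2020, 1, 1)
values = [1.0] * 10 + [None] * 21 + [1.0] * 10 + [None] * 35 + [1.0] * 10
timestamps = [start + timedelta(days=i) for i in range(len(values))]
holes = count_holes(timestamps, values, weeks=[2, 4, 8])
```

In this example both holes exceed two weeks, only the 35-day hole exceeds four weeks, and none exceeds eight weeks.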

download_meta()

Download the meta file(s) from the CDC server.

Returns:

The meta file from the CDC server. If there are several meta files on the server, they are joined together.

Return type:

geopandas.GeoDataFrame

fillup(only_real=False, stids='all', do_mp=False, **kwargs)

Fill up the quality checked data with data from nearby stations to get complete timeseries.

Parameters:
  • only_real (bool, optional) – Whether only real stations are computed or also virtual ones. True: only stations with own data are computed. The default is False.

  • stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.

  • do_mp (bool, optional) – Should the method be done in multiprocessing mode? If False, the methods will be called in threading mode. Multiprocessing needs more memory and a bit more startup time. Therefore it is only useful for methods with a lot of computational effort in the Python code. If most of the computation of a method is done in the PostgreSQL database, then threading is enough to speed the process up. The default is False.

  • kwargs (dict, optional) – The additional keyword arguments for the _run_method method

Raises:

ValueError – If the given stids (Station_IDs) are not all valid.
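The trade-off behind do_mp can be sketched with the standard library alone. The helper below is hypothetical and is not the library's _run_method; it only shows the general pattern of switching between a process pool (for CPU-heavy Python code) and a thread pool (when the work mostly waits on a database or network). The function square is a placeholder for per-station work.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def square(stid):
    """Placeholder for per-station work; a real method would query a database."""
    return stid * stid


def run_for_stations(func, stids, do_mp=False, max_workers=4):
    """Apply `func` to every station ID, using processes or threads.

    Hypothetical helper for illustration only.
    """
    executor_cls = ProcessPoolExecutor if do_mp else ThreadPoolExecutor
    with executor_cls(max_workers=max_workers) as executor:
        # map() returns the results in the same order as the input IDs.
        return list(executor.map(func, stids))


if __name__ == "__main__":
    print(run_for_stations(square, [1, 2, 3], do_mp=False))
```

With do_mp=True the function passed in must be picklable (defined at module level), which is one source of the extra startup cost mentioned above.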

get_df(stids, **kwargs)

Get a DataFrame with the corresponding data.

Parameters:
  • stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.

  • kwargs (optional keyword arguments) – These keyword arguments are passed to the get_df method of the station class. They can be period, agg_to, kinds.

Returns:

A DataFrame with the timeseries for the selected stations, kind(s) and the given period. If multiple columns are selected, the columns of this DataFrame are a MultiIndex with the station IDs as first level and the kind as second level.

Return type:

pandas.DataFrame

get_meta(infos=['station_id', 'filled_from', 'filled_until', 'geometry'], stids='all', only_real=True)

Get the meta DataFrame from the database.

Parameters:
  • infos (list or str, optional) – A list of information from the meta file to return. If “all”, then all possible columns are returned, but only one geometry column. The default is: [“station_id”, “filled_from”, “filled_until”, “geometry”]

  • only_real (bool, optional) – Whether only real stations are returned or also virtual ones. True: only stations with own data are returned. The default is True.

Returns:

The meta DataFrame.

Return type:

pandas.DataFrame or geopandas.GeoDataFrame

classmethod get_meta_explanation(infos='all')

Get the explanations of the available meta fields.

Parameters:

infos (list or string, optional) – The infos you wish to get an explanation for. If “all”, then explanations for all available fields are returned. The default is “all”.

Returns:

A pandas Series with the information names as index and the explanations as values.

Return type:

pandas.Series

get_stations(only_real=True, stids='all')

Get a list with all the stations as Station-objects.

Parameters:
  • only_real (bool, optional) – Whether only real stations are returned or also virtual ones. True: only stations with own data are returned. The default is True.

  • stids (string or list of int, optional) – The Stations to return. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.

Returns:

A list with the corresponding station objects.

Return type:

list of Station-objects

Raises:

ValueError – If the given stids (Station_IDs) are not all valid.
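Many methods in this class accept stids as either “all” or a list of station IDs and raise ValueError for invalid IDs. A minimal sketch of that convention follows. The helper resolve_stids and the set known_ids are hypothetical: known_ids stands in for the IDs found in the meta table, and the numbers are made up for the example.

```python
def resolve_stids(stids, known_ids):
    """Return the station IDs to work on, or raise ValueError.

    Hypothetical helper: `known_ids` stands in for the IDs in the meta table.
    """
    if isinstance(stids, str):
        if stids != "all":
            raise ValueError(f"Unknown stids keyword: {stids!r}")
        return sorted(known_ids)

    # Collect every requested ID that is not known, so the error names them all.
    invalid = [stid for stid in stids if stid not in known_ids]
    if invalid:
        raise ValueError(f"Invalid station IDs: {invalid}")
    return list(stids)


known = {1224, 1443, 2074}  # made-up station IDs
selected = resolve_stids([1443, 1224], known)
everything = resolve_stids("all", known)
```

Reporting all invalid IDs at once, rather than failing on the first, makes it easier to correct a long list in one pass.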

last_imp_corr(stids='all', do_mp=False, **kwargs)[source]

Richter correct the filled data for the last imported period.

Parameters:
  • stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.

  • do_mp (bool, optional) – Should the method be done in multiprocessing mode? If False, the methods will be called in threading mode. Multiprocessing needs more memory and a bit more startup time. Therefore it is only useful for methods with a lot of computational effort in the Python code. If most of the computation of a method is done in the PostgreSQL database, then threading is enough to speed the process up. The default is False.

  • kwargs (dict, optional) – The additional keyword arguments for the _run_method method

Raises:

ValueError – If the given stids (Station_IDs) are not all valid.

last_imp_fillup(stids='all', do_mp=False, **kwargs)

Do the gap filling of the last import.

Parameters:
  • do_mp (bool, optional) – Should the method be done in multiprocessing mode? If False, the methods will be called in threading mode. Multiprocessing needs more memory and a bit more startup time. Therefore it is only useful for methods with a lot of computational effort in the Python code. If most of the computation of a method is done in the PostgreSQL database, then threading is enough to speed the process up. The default is False.

  • stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.

  • kwargs (dict, optional) – The additional keyword arguments for the _run_method method

last_imp_quality_check(stids='all', do_mp=False, **kwargs)

Do the quality check of the last import.

Parameters:
  • do_mp (bool, optional) – Should the method be done in multiprocessing mode? If False, the methods will be called in threading mode. Multiprocessing needs more memory and a bit more startup time. Therefore it is only useful for methods with a lot of computational effort in the Python code. If most of the computation of a method is done in the PostgreSQL database, then threading is enough to speed the process up. The default is False.

  • stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.

  • kwargs (dict, optional) – The additional keyword arguments for the _run_method method

quality_check(period=(None, None), only_real=True, stids='all', do_mp=False, **kwargs)

Quality check the raw data for a given period.

Parameters:
  • period (tuple or list of datetime.datetime or None, optional) – The minimum and maximum Timestamp for which to get the timeseries. If None is given, the maximum or minimal possible Timestamp is taken. The default is (None, None).

  • stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.

  • do_mp (bool, optional) – Should the method be done in multiprocessing mode? If False, the methods will be called in threading mode. Multiprocessing needs more memory and a bit more startup time. Therefore it is only useful for methods with a lot of computational effort in the Python code. If most of the computation of a method is done in the PostgreSQL database, then threading is enough to speed the process up. The default is False.

  • kwargs (dict, optional) – The additional keyword arguments for the _run_method method
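The handling of None in the period parameter can be pictured as follows. This is only a sketch under assumptions: resolve_period is a hypothetical helper, and available_min and available_max stand in for the earliest and latest timestamps the database could provide. The dates are made up.

```python
from datetime import datetime


def resolve_period(period, available_min, available_max):
    """Replace None bounds in `period` with the widest available bounds.

    Hypothetical helper for illustration only.
    """
    start, end = period
    start = available_min if start is None else start
    end = available_max if end is None else end
    if start > end:
        raise ValueError("period start must not be after period end")
    return (start, end)


first = datetime(1995, 1, 1)  # made-up earliest available timestamp
last = datetime(2023, 12, 31)  # made-up latest available timestamp

whole = resolve_period((None, None), first, last)
open_end = resolve_period((datetime(2010, 1, 1), None), first, last)
```

With the default (None, None) the whole available range is used, while a single None leaves only that side open.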

richter_correct(stids='all', **kwargs)[source]

Richter correct the filled data.

Parameters:
  • stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.

  • kwargs (dict, optional) – The additional keyword arguments for the _run_method method

Raises:

ValueError – If the given stids (Station_IDs) are not all valid.

update(only_new=True, **kwargs)[source]

Make a complete update of the stations.

Runs update_raw, the quality check, the fillup and the Richter correction for the stations.

Parameters:

only_new (bool, optional) – Should only new values be computed? If False, the stations are updated for the whole possible period. If True, the stations are only updated for new values. The default is True.
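The order of steps in a complete update can be sketched with a stub object that records calls instead of touching a database. Both RecordingStations and full_update are hypothetical. In particular, the mapping of only_new=True to the last_imp_* methods is an assumption made for this illustration, based on the method names listed on this page, not a statement about the actual implementation.

```python
class RecordingStations:
    """Hypothetical stand-in that records calls instead of using a database."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        # Any attribute not found normally becomes a method that logs its name.
        def method(**kwargs):
            self.calls.append(name)

        return method


def full_update(stations, only_new=True):
    """Run download, quality check, fillup and Richter correction in order."""
    stations.update_raw(only_new=only_new)
    if only_new:
        # Assumption: only the last imported period needs to be processed.
        stations.last_imp_quality_check()
        stations.last_imp_fillup()
        stations.last_imp_corr()
    else:
        stations.quality_check()
        stations.fillup()
        stations.richter_correct()


incremental = RecordingStations()
full_update(incremental, only_new=True)

complete = RecordingStations()
full_update(complete, only_new=False)
```

Each step depends on the output of the previous one: the quality check reads raw data, the fillup reads quality-checked data, and the Richter correction reads filled data.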

update_ma(stids='all', do_mp=False, **kwargs)

Update the multi annual values for the stations.

Get a multi annual value from the corresponding raster and save to the multi annual table in the database.

Parameters:
  • stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.

  • do_mp (bool, optional) – Should the method be done in multiprocessing mode? If False, the methods will be called in threading mode. Multiprocessing needs more memory and a bit more startup time. Therefore it is only useful for methods with a lot of computational effort in the Python code. If most of the computation of a method is done in the PostgreSQL database, then threading is enough to speed the process up. The default is False.

  • kwargs (dict, optional) – The additional keyword arguments for the _run_method method

Raises:

ValueError – If the given stids (Station_IDs) are not all valid.

update_meta()

Update the meta table by comparing to the CDC server.

The “von_datum” and “bis_datum” fields are ignored because it is better to set them from the filled period of the stations in the database. The CDC period is often incorrect.

update_period_meta(stids='all')

Update the period in the meta table of the raw data.

Parameters:

stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.

Raises:

ValueError – If the given stids (Station_IDs) are not all valid.

update_raw(only_new=True, only_real=True, stids='all', remove_nas=True, do_mp=True, **kwargs)

Download all stations data from CDC and upload to database.

Parameters:
  • only_new (bool, optional) – Get only the files that are not yet in the database? If False, all the available files are loaded again. The default is True.

  • only_real (bool, optional) – Whether to download only real stations. True: only stations with a date in raw_from in meta are downloaded. The default is True.

  • stids (string or list of int, optional) – The Stations to return. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.

  • do_mp (bool, optional) – Should the method be done in multiprocessing mode? If False, the methods will be called in threading mode. Multiprocessing needs more memory and a bit more startup time. Therefore it is only useful for methods with a lot of computational effort in the Python code. If most of the computation of a method is done in the PostgreSQL database, then threading is enough to speed the process up. The default is True.

  • remove_nas (bool, optional) – Remove the NAs from the downloaded data before uploading it to the database. This has computational advantages. The default is True.

  • kwargs (dict, optional) – The additional keyword arguments for the _run_method method

Raises:

ValueError – If the given stids (Station_IDs) are not all valid.

update_richter_class(stids='all', do_mp=True, **kwargs)[source]

Update the Richter exposition class.

Get the value from the raster, compare it with the Richter categories and save it to the database.

Parameters:
  • stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.

  • kwargs (dict, optional) – The keyword arguments to be handed to the station.StationN.update_richter_class method.

Raises:

ValueError – If the given stids (Station_IDs) are not all valid.

StationsT

class weatherDB.stations.StationsT[source]

Bases: StationsTETBase

A class to work with and download temperature data for several stations.

Public Methods:

Inherited from StationsTETBase

fillup([only_real, stids])

Fill up the quality checked data with data from nearby stations to get complete timeseries.

Inherited from StationsBase

__init__()

download_meta()

Download the meta file(s) from the CDC server.

update_meta()

Update the meta table by comparing to the CDC server.

update_period_meta([stids])

Update the period in the meta table of the raw data.

get_meta_explanation([infos])

Get the explanations of the available meta fields.

get_meta([infos, stids, only_real])

Get the meta DataFrame from the database.

get_stations([only_real, stids])

Get a list with all the stations as Station-objects.

count_holes([stids])

Count holes in timeseries depending on their length.

update_raw([only_new, only_real, stids, ...])

Download all stations data from CDC and upload to database.

last_imp_quality_check([stids, do_mp])

Do the quality check of the last import.

last_imp_fillup([stids, do_mp])

Do the gap filling of the last import.

quality_check([period, only_real, stids, do_mp])

Quality check the raw data for a given period.

update_ma([stids, do_mp])

Update the multi annual values for the stations.

fillup([only_real, stids, do_mp])

Fill up the quality checked data with data from nearby stations to get complete timeseries.

update([only_new])

Make a complete update of the stations.

get_df(stids, **kwargs)

Get a DataFrame with the corresponding data.


count_holes(stids='all', **kwargs)

Count holes in timeseries depending on their length.

Parameters:
  • stids (string or list of int, optional) – The Stations to return. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.

  • kwargs (dict, optional) –

    The parameters supported by the StationBase.count_holes method, e.g.:

    weeks : list, optional

    A list of hole lengths to count. Every hole longer than the specified number of weeks is counted. The default is [2, 4, 8, 12, 16, 20, 24].

    kind : str

    The kind of timeseries to analyze. Should be one of [‘raw’, ‘qc’, ‘filled’]. For N, “corr” is also possible. Normally only “raw” and “qc” make sense, because the other timeseries should not have holes.

    period : TimestampPeriod or (tuple or list of datetime.datetime or None), optional

    The minimum and maximum Timestamp for which to analyze the timeseries. If None is given, the minimum or maximum possible Timestamp is taken. The default is (None, None).

    between_meta_period : bool, optional

    Only check within the respective period that is defined in the meta table. If “qc” is chosen as kind, then the “raw” meta period is taken. The default is True.

    crop_period : bool, optional

    Should the period be cropped to the maximum filled period? This will result in holes being ignored when they are at the beginning or at the end of the timeseries. If period = (None, None) is given, then this parameter is set to True. The default is False.

Returns:

A pandas DataFrame with station_id as index and one column per entry in weeks. Each value is the number of NA periods longer than the respective number of weeks.

Return type:

pandas.DataFrame

Raises:

ValueError – If the input parameters were not correct.

download_meta()

Download the meta file(s) from the CDC server.

Returns:

The meta file from the CDC server. If there are several meta files on the server, they are joined together.

Return type:

geopandas.GeoDataFrame

fillup(only_real=False, stids='all')

Fill up the quality checked data with data from nearby stations to get complete timeseries.

Parameters:
  • only_real (bool, optional) – Whether only real stations are computed or also virtual ones. True: only stations with own data are computed. The default is False.

  • stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.

  • do_mp (bool, optional) – Should the method be done in multiprocessing mode? If False, the methods will be called in threading mode. Multiprocessing needs more memory and a bit more startup time. Therefore it is only useful for methods with a lot of computational effort in the Python code. If most of the computation of a method is done in the PostgreSQL database, then threading is enough to speed the process up. The default is False.

  • kwargs (dict, optional) – The additional keyword arguments for the _run_method method

Raises:

ValueError – If the given stids (Station_IDs) are not all valid.

get_df(stids, **kwargs)

Get a DataFrame with the corresponding data.

Parameters:
  • stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.

  • kwargs (optional keyword arguments) – These keyword arguments are passed to the get_df method of the station class. They can be period, agg_to, kinds.

Returns:

A DataFrame with the timeseries for the selected stations, kind(s) and the given period. If multiple columns are selected, the columns of this DataFrame are a MultiIndex with the station IDs as first level and the kind as second level.

Return type:

pandas.DataFrame

get_meta(infos=['station_id', 'filled_from', 'filled_until', 'geometry'], stids='all', only_real=True)

Get the meta DataFrame from the database.

Parameters:
  • infos (list or str, optional) – A list of information from the meta file to return. If “all”, then all possible columns are returned, but only one geometry column. The default is: [“station_id”, “filled_from”, “filled_until”, “geometry”]

  • only_real (bool, optional) – Whether only real stations are returned or also virtual ones. True: only stations with own data are returned. The default is True.

Returns:

The meta DataFrame.

Return type:

pandas.DataFrame or geopandas.GeoDataFrame

classmethod get_meta_explanation(infos='all')

Get the explanations of the available meta fields.

Parameters:

infos (list or string, optional) – The infos you wish to get an explanation for. If “all”, then explanations for all available fields are returned. The default is “all”.

Returns:

A pandas Series with the information names as index and the explanations as values.

Return type:

pandas.Series

get_stations(only_real=True, stids='all')

Get a list with all the stations as Station-objects.

Parameters:
  • only_real (bool, optional) – Whether only real stations are returned or also virtual ones. True: only stations with own data are returned. The default is True.

  • stids (string or list of int, optional) – The Stations to return. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.

Returns:

A list with the corresponding station objects.

Return type:

list of Station-objects

Raises:

ValueError – If the given stids (Station_IDs) are not all valid.

last_imp_fillup(stids='all', do_mp=False, **kwargs)

Do the gap filling of the last import.

Parameters:
  • do_mp (bool, optional) – Should the method be done in multiprocessing mode? If False, the methods will be called in threading mode. Multiprocessing needs more memory and a bit more startup time. Therefore it is only useful for methods with a lot of computational effort in the Python code. If most of the computation of a method is done in the PostgreSQL database, then threading is enough to speed the process up. The default is False.

  • stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.

  • kwargs (dict, optional) – The additional keyword arguments for the _run_method method

last_imp_quality_check(stids='all', do_mp=False, **kwargs)

Do the quality check of the last import.

Parameters:
  • do_mp (bool, optional) – Should the method be done in multiprocessing mode? If False, the methods will be called in threading mode. Multiprocessing needs more memory and a bit more startup time. Therefore it is only useful for methods with a lot of computational effort in the Python code. If most of the computation of a method is done in the PostgreSQL database, then threading is enough to speed the process up. The default is False.

  • stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.

  • kwargs (dict, optional) – The additional keyword arguments for the _run_method method

quality_check(period=(None, None), only_real=True, stids='all', do_mp=False, **kwargs)

Quality check the raw data for a given period.

Parameters:
  • period (tuple or list of datetime.datetime or None, optional) – The minimum and maximum Timestamp for which to get the timeseries. If None is given, the maximum or minimal possible Timestamp is taken. The default is (None, None).

  • stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.

  • do_mp (bool, optional) – Should the method be done in multiprocessing mode? If False, the methods will be called in threading mode. Multiprocessing needs more memory and a bit more startup time. Therefore it is only useful for methods with a lot of computational effort in the Python code. If most of the computation of a method is done in the PostgreSQL database, then threading is enough to speed the process up. The default is False.

  • kwargs (dict, optional) – The additional keyword arguments for the _run_method method

update(only_new=True, **kwargs)

Make a complete update of the stations.

Runs update_raw, the quality check and the fillup for the stations.

Parameters:

only_new (bool, optional) – Should only new values be computed? If False, the stations are updated for the whole possible period. If True, the stations are only updated for new values. The default is True.

update_ma(stids='all', do_mp=False, **kwargs)

Update the multi annual values for the stations.

Get a multi annual value from the corresponding raster and save to the multi annual table in the database.

Parameters:
  • stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.

  • do_mp (bool, optional) – Should the method be done in multiprocessing mode? If False, the methods will be called in threading mode. Multiprocessing needs more memory and a bit more startup time. Therefore it is only useful for methods with a lot of computational effort in the Python code. If most of the computation of a method is done in the PostgreSQL database, then threading is enough to speed the process up. The default is False.

  • kwargs (dict, optional) – The additional keyword arguments for the _run_method method

Raises:

ValueError – If the given stids (Station_IDs) are not all valid.

update_meta()

Update the meta table by comparing to the CDC server.

The “von_datum” and “bis_datum” fields are ignored because it is better to set them from the filled period of the stations in the database. The CDC period is often incorrect.

update_period_meta(stids='all')

Update the period in the meta table of the raw data.

Parameters:

stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.

Raises:

ValueError – If the given stids (Station_IDs) are not all valid.

update_raw(only_new=True, only_real=True, stids='all', remove_nas=True, do_mp=True, **kwargs)

Download all stations data from CDC and upload to database.

Parameters:
  • only_new (bool, optional) – Get only the files that are not yet in the database? If False, all the available files are loaded again. The default is True.

  • only_real (bool, optional) – Whether to download only real stations. True: only stations with a date in raw_from in meta are downloaded. The default is True.

  • stids (string or list of int, optional) – The Stations to return. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.

  • do_mp (bool, optional) – Should the method be done in multiprocessing mode? If False, the methods will be called in threading mode. Multiprocessing needs more memory and a bit more startup time. Therefore it is only useful for methods with a lot of computational effort in the Python code. If most of the computation of a method is done in the PostgreSQL database, then threading is enough to speed the process up. The default is True.

  • remove_nas (bool, optional) – Remove the NAs from the downloaded data before uploading it to the database. This has computational advantages. The default is True.

  • kwargs (dict, optional) – The additional keyword arguments for the _run_method method

Raises:

ValueError – If the given stids (Station_IDs) are not all valid.

StationsET

class weatherDB.stations.StationsET[source]

Bases: StationsTETBase

A class to work with and download potential evapotranspiration (VPGB) data for several stations.

Public Methods:

Inherited from StationsTETBase

fillup([only_real, stids])

Fill up the quality checked data with data from nearby stations to get complete timeseries.

Inherited from StationsBase

__init__()

download_meta()

Download the meta file(s) from the CDC server.

update_meta()

Update the meta table by comparing to the CDC server.

update_period_meta([stids])

Update the period in the meta table of the raw data.

get_meta_explanation([infos])

Get the explanations of the available meta fields.

get_meta([infos, stids, only_real])

Get the meta DataFrame from the database.

get_stations([only_real, stids])

Get a list with all the stations as Station-objects.

count_holes([stids])

Count holes in timeseries depending on their length.

update_raw([only_new, only_real, stids, ...])

Download all stations data from CDC and upload to database.

last_imp_quality_check([stids, do_mp])

Do the quality check of the last import.

last_imp_fillup([stids, do_mp])

Do the gap filling of the last import.

quality_check([period, only_real, stids, do_mp])

Quality check the raw data for a given period.

update_ma([stids, do_mp])

Update the multi annual values for the stations.

fillup([only_real, stids, do_mp])

Fill up the quality checked data with data from nearby stations to get complete timeseries.

update([only_new])

Make a complete update of the stations.

get_df(stids, **kwargs)

Get a DataFrame with the corresponding data.


count_holes(stids='all', **kwargs)

Count holes in timeseries depending on their length.

Parameters:
  • stids (string or list of int, optional) – The Stations to return. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.

  • kwargs (dict, optional) –

    The parameters supported by the StationBase.count_holes method, e.g.:

    weeks : list, optional

    A list of hole lengths to count. Every hole longer than the specified number of weeks is counted. The default is [2, 4, 8, 12, 16, 20, 24].

    kind : str

    The kind of timeseries to analyze. Should be one of [‘raw’, ‘qc’, ‘filled’]. For N, “corr” is also possible. Normally only “raw” and “qc” make sense, because the other timeseries should not have holes.

    period : TimestampPeriod or (tuple or list of datetime.datetime or None), optional

    The minimum and maximum Timestamp for which to analyze the timeseries. If None is given, the minimum or maximum possible Timestamp is taken. The default is (None, None).

    between_meta_period : bool, optional

    Only check within the respective period that is defined in the meta table. If “qc” is chosen as kind, then the “raw” meta period is taken. The default is True.

    crop_period : bool, optional

    Should the period be cropped to the maximum filled period? This will result in holes being ignored when they are at the beginning or at the end of the timeseries. If period = (None, None) is given, then this parameter is set to True. The default is False.

Returns:

A pandas DataFrame with station_id as index and one column per entry in weeks. Each value is the number of NA periods longer than the respective number of weeks.

Return type:

pandas.DataFrame

Raises:

ValueError – If the input parameters were not correct.
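
The counting rule described above can be illustrated with a small sketch in plain Python. It is not the implementation used by weatherDB, which works on the database; the function name count_holes_sketch, the 10-minute step and the sample values are assumptions made for this example.

```python
from datetime import timedelta


def count_holes_sketch(values, step=timedelta(minutes=10), weeks=(2, 4, 8)):
    """Count runs of None that last longer than each threshold in `weeks`.

    `values` is a regularly spaced series (one entry per `step`);
    None marks a missing value.
    """
    # collect the duration of every consecutive run of missing values
    holes = []
    run = 0
    for value in values:
        if value is None:
            run += 1
        elif run:
            holes.append(run * step)
            run = 0
    if run:  # a hole that reaches the end of the series
        holes.append(run * step)

    # one count per threshold: holes strictly longer than n weeks
    return {n: sum(h > timedelta(weeks=n) for h in holes) for n in weeks}


per_week = 7 * 24 * 6  # number of 10-minute steps in one week
series = (
    [1.0] * 10
    + [None] * (3 * per_week)   # a 3-week hole
    + [0.5] * 10
    + [None] * (5 * per_week)   # a 5-week hole
    + [0.0]
)
counts = count_holes_sketch(series)
```

A hole is counted in every column whose threshold it exceeds, so the 5-week hole appears in both the 2-week and the 4-week column.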

download_meta()

Download the meta file(s) from the CDC server.

Returns:

The meta file from the CDC server. If there are several meta files on the server, they are joined together.

Return type:

geopandas.GeoDataFrame

fillup(only_real=False, stids='all', do_mp=False, **kwargs)

Fill up the quality checked data with data from nearby stations to get complete timeseries.

Parameters:
  • only_real (bool, optional) – Whether only real stations are computed or also virtual ones. True: only stations with own data are computed. The default is False.

  • stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.

  • do_mp (bool, optional) – Should the method run in multiprocessing mode? If False, the method is called in threading mode. Multiprocessing needs more memory and a bit more start-up time. Therefore it is only useful for methods with a lot of computation in the Python code. If most of the computation of a method is done in the PostgreSQL database, then threading is enough to speed the process up. The default is False.

  • kwargs (dict, optional) – The additional keyword arguments for the _run_method method

Raises:

ValueError – If the given stids (Station_IDs) are not all valid.
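
The trade-off behind do_mp can be sketched with the standard library. This is only an illustration of choosing between processes and threads; run_for_stations and fake_db_task are hypothetical names and do not correspond to the internal _run_method.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def run_for_stations(func, stids, do_mp=False, max_workers=4):
    """Apply `func` to every station ID, in processes or in threads."""
    # processes sidestep the GIL but cost memory and start-up time;
    # threads are enough when the work happens outside Python (e.g. in SQL)
    executor_cls = ProcessPoolExecutor if do_mp else ThreadPoolExecutor
    with executor_cls(max_workers=max_workers) as executor:
        return list(executor.map(func, stids))


def fake_db_task(stid):
    # hypothetical stand-in for a per-station method that waits on the database
    return stid * 2


results = run_for_stations(fake_db_task, [1, 2, 3], do_mp=False)
```

With do_mp=True the function passed in must be picklable (defined at module level), and on platforms that spawn new processes the call has to sit under an `if __name__ == "__main__":` guard.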

get_df(stids, **kwargs)

Get a DataFrame with the corresponding data.

Parameters:
  • stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.

  • kwargs (optional keyword arguments) – These keyword arguments are passed to the get_df method of the station class, e.g. period, agg_to, kinds.

Returns:

A DataFrame with the timeseries for the selected stations, kind(s) and the given period. If multiple columns are selected, the columns of this DataFrame form a MultiIndex with the station IDs as first level and the kind as second level.

Return type:

pandas.DataFrame

get_meta(infos=['station_id', 'filled_from', 'filled_until', 'geometry'], stids='all', only_real=True)

Get the meta Dataframe from the Database.

Parameters:
  • infos (list or str, optional) – A list of information from the meta file to return. If “all”, then all possible columns are returned, but only one geometry column. The default is [“station_id”, “filled_from”, “filled_until”, “geometry”].

  • only_real (bool, optional) – Whether only real stations are returned or also virtual ones. True: only stations with own data are returned. The default is True.

Returns:

The meta DataFrame.

Return type:

pandas.DataFrame or geopandas.GeoDataFrame

classmethod get_meta_explanation(infos='all')

Get the explanations of the available meta fields.

Parameters:

infos (list or string, optional) – The infos you wish to get an explanation for. If “all”, then the explanations for all available fields are returned. The default is “all”.

Returns:

A pandas Series with the information names as index and the explanations as values.

Return type:

pandas.Series

get_stations(only_real=True, stids='all')

Get a list with all the stations as Station-objects.

Parameters:
  • only_real (bool, optional) – Whether only real stations are returned or also virtual ones. True: only stations with own data are returned. The default is True.

  • stids (string or list of int, optional) – The Stations to return. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.

Returns:

A list with the corresponding station objects.

Return type:

list of Station-objects

Raises:

ValueError – If the given stids (Station_IDs) are not all valid.

last_imp_fillup(stids='all', do_mp=False, **kwargs)

Do the gap filling of the last import.

Parameters:
  • do_mp (bool, optional) – Should the method run in multiprocessing mode? If False, the method is called in threading mode. Multiprocessing needs more memory and a bit more start-up time. Therefore it is only useful for methods with a lot of computation in the Python code. If most of the computation of a method is done in the PostgreSQL database, then threading is enough to speed the process up. The default is False.

  • stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.

  • kwargs (dict, optional) – The additional keyword arguments for the _run_method method

last_imp_quality_check(stids='all', do_mp=False, **kwargs)

Do the quality check of the last import.

Parameters:
  • do_mp (bool, optional) – Should the method run in multiprocessing mode? If False, the method is called in threading mode. Multiprocessing needs more memory and a bit more start-up time. Therefore it is only useful for methods with a lot of computation in the Python code. If most of the computation of a method is done in the PostgreSQL database, then threading is enough to speed the process up. The default is False.

  • stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.

  • kwargs (dict, optional) – The additional keyword arguments for the _run_method method

quality_check(period=(None, None), only_real=True, stids='all', do_mp=False, **kwargs)

Quality check the raw data for a given period.

Parameters:
  • period (tuple or list of datetime.datetime or None, optional) – The minimum and maximum Timestamp for which to get the timeseries. If None is given, the minimum or maximum possible Timestamp is taken. The default is (None, None).

  • stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.

  • do_mp (bool, optional) – Should the method run in multiprocessing mode? If False, the method is called in threading mode. Multiprocessing needs more memory and a bit more start-up time. Therefore it is only useful for methods with a lot of computation in the Python code. If most of the computation of a method is done in the PostgreSQL database, then threading is enough to speed the process up. The default is False.

  • kwargs (dict, optional) – The additional keyword arguments for the _run_method method
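
How a period with None bounds falls back to the available range can be sketched as follows. The helper resolve_period and the dates are assumptions for illustration; weatherDB handles this internally through its TimestampPeriod type.

```python
from datetime import datetime


def resolve_period(period, available):
    """Replace None bounds in `period` with the bounds of `available`."""
    start, end = period
    avail_start, avail_end = available
    return (
        avail_start if start is None else start,
        avail_end if end is None else end,
    )


# made-up range of data that a station could have in the database
available = (datetime(1995, 1, 1), datetime(2023, 12, 31))

whole_range = resolve_period((None, None), available)
open_end = resolve_period((datetime(2010, 1, 1), None), available)
```

The default (None, None) therefore selects the whole available range, while a single None leaves that side open.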

update(only_new=True, **kwargs)

Make a complete update of the stations.

Runs update_raw, the quality check and the fillup for the stations.

Parameters:

only_new (bool, optional) – Should only new values be computed? If False, the stations are updated for the whole possible period. If True, the stations are only updated for new values. The default is True.
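
The order of the three steps can be sketched with a stand-in class that only records which method would run. That the last_imp_* variants are used when only_new is True is an assumption of this sketch, not something this page states; PipelineSketch is a hypothetical name.

```python
class PipelineSketch:
    """Hypothetical stand-in that only records which step would run."""

    def __init__(self):
        self.calls = []

    def update_raw(self, only_new=True):
        self.calls.append(f"update_raw(only_new={only_new})")

    def quality_check(self):
        self.calls.append("quality_check")

    def fillup(self):
        self.calls.append("fillup")

    def last_imp_quality_check(self):
        self.calls.append("last_imp_quality_check")

    def last_imp_fillup(self):
        self.calls.append("last_imp_fillup")

    def update(self, only_new=True):
        # step 1: download, step 2: quality check, step 3: gap filling
        self.update_raw(only_new=only_new)
        if only_new:
            # assumption: only the last import is processed again
            self.last_imp_quality_check()
            self.last_imp_fillup()
        else:
            self.quality_check()
            self.fillup()


incremental = PipelineSketch()
incremental.update(only_new=True)

complete = PipelineSketch()
complete.update(only_new=False)
```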

update_ma(stids='all', do_mp=False, **kwargs)

Update the multi annual values for the stations.

Get a multi annual value from the corresponding raster and save to the multi annual table in the database.

Parameters:
  • stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.

  • do_mp (bool, optional) – Should the method run in multiprocessing mode? If False, the method is called in threading mode. Multiprocessing needs more memory and a bit more start-up time. Therefore it is only useful for methods with a lot of computation in the Python code. If most of the computation of a method is done in the PostgreSQL database, then threading is enough to speed the process up. The default is False.

  • kwargs (dict, optional) – The additional keyword arguments for the _run_method method

Raises:

ValueError – If the given stids (Station_IDs) are not all valid.

update_meta()

Update the meta table by comparing to the CDC server.

The “von_datum” and “bis_datum” fields are ignored because it is better to set them from the filled period of the stations in the database. The CDC period is often not correct.

update_period_meta(stids='all')

Update the period in the meta table of the raw data.

Parameters:

stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.

Raises:

ValueError – If the given stids (Station_IDs) are not all valid.
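
The stids convention used throughout this class (either “all” or a list of station IDs, with a ValueError for unknown IDs) can be sketched like this. The helper check_stids and the station IDs are hypothetical; the real check runs against the meta table.

```python
def check_stids(stids, valid_stids):
    """Return the station IDs to work on, or raise ValueError."""
    if isinstance(stids, str):
        if stids == "all":
            return sorted(valid_stids)
        raise ValueError(f'stids must be "all" or a list of int, got {stids!r}')
    invalid = [stid for stid in stids if stid not in valid_stids]
    if invalid:
        raise ValueError(f"Invalid station IDs: {invalid}")
    return list(stids)


valid = {1224, 1443, 2074}  # hypothetical IDs present in the meta table

selected = check_stids([1443, 1224], valid)
everything = check_stids("all", valid)

try:
    check_stids([1443, 9999], valid)
except ValueError as err:
    message = str(err)
```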

update_raw(only_new=True, only_real=True, stids='all', remove_nas=True, do_mp=True, **kwargs)

Download all stations data from CDC and upload to database.

Parameters:
  • only_new (bool, optional) – Get only the files that are not yet in the database? If False, all the available files are loaded again. The default is True.

  • only_real (bool, optional) – Whether to try to download only real stations. True: only stations with a date in raw_from in meta are downloaded. The default is True.

  • stids (string or list of int, optional) – The Stations to download. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.

  • do_mp (bool, optional) – Should the method run in multiprocessing mode? If False, the method is called in threading mode. Multiprocessing needs more memory and a bit more start-up time. Therefore it is only useful for methods with a lot of computation in the Python code. If most of the computation of a method is done in the PostgreSQL database, then threading is enough to speed the process up. The default is True.

  • remove_nas (bool, optional) – Remove the NAs from the downloaded data before updating it to the database. This has computational advantages. The default is True.

  • kwargs (dict, optional) – The additional keyword arguments for the _run_method method

Raises:

ValueError – If the given stids (Station_IDs) are not all valid.

StationsND

class weatherDB.stations.StationsND[source]

Bases: StationsBase

A class to work with and download daily precipitation data for several stations.

The data of these stations are only downloaded to run some quality checks on the 10-minute data. Therefore no special quality check or Richter correction is done on this data. If you want daily precipitation data, it is better to use the 10-minute station class (StationN) and aggregate to daily values.
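
What aggregating 10-minute precipitation to daily values means can be sketched in plain Python. The function aggregate_to_daily and the sample values are made up for this illustration; in weatherDB the aggregation is requested through the agg_to argument of get_df, and this sketch ignores missing values.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def aggregate_to_daily(records):
    """Sum (timestamp, precipitation) pairs per calendar day."""
    daily = defaultdict(float)
    for timestamp, value in records:
        daily[timestamp.date()] += value
    return dict(daily)


start = datetime(2021, 7, 14)
# two days of made-up 10-minute values: 0.1 mm per step on day one, none on day two
records = [
    (start + timedelta(minutes=10 * i), 0.1 if i < 144 else 0.0)
    for i in range(288)
]
daily_sums = aggregate_to_daily(records)
```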

Public Methods:

Inherited from StationsBase

__init__()

download_meta()

Download the meta file(s) from the CDC server.

update_meta()

Update the meta table by comparing to the CDC server.

update_period_meta([stids])

Update the period in the meta table of the raw data.

get_meta_explanation([infos])

Get the explanations of the available meta fields.

get_meta([infos, stids, only_real])

Get the meta Dataframe from the Database.

get_stations([only_real, stids])

Get a list with all the stations as Station-objects.

count_holes([stids])

Count holes in timeseries depending on their length.

update_raw([only_new, only_real, stids, ...])

Download all stations data from CDC and upload to database.

last_imp_quality_check([stids, do_mp])

Do the quality check of the last import.

last_imp_fillup([stids, do_mp])

Do the gap filling of the last import.

quality_check([period, only_real, stids, do_mp])

Quality check the raw data for a given period.

update_ma([stids, do_mp])

Update the multi annual values for the stations.

fillup([only_real, stids, do_mp])

Fill up the quality checked data with data from nearby stations to get complete timeseries.

update([only_new])

Make a complete update of the stations.

get_df(stids, **kwargs)

Get a DataFrame with the corresponding data.


count_holes(stids='all', **kwargs)

Count holes in timeseries depending on their length.

Parameters:
  • stids (string or list of int, optional) – The Stations to return. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.

  • kwargs (dict, optional) –

    Keyword arguments supported by the StationBase.count_holes method, e.g.:

    weeks : list, optional

    A list of hole lengths to count. Every hole longer than the specified number of weeks is counted. The default is [2, 4, 8, 12, 16, 20, 24].

    kind : str

    The kind of the timeseries to analyze. Should be one of [‘raw’, ‘qc’, ‘filled’]. For N also “corr” is possible. Normally only “raw” and “qc” make sense, because the other timeseries should not have holes.

    period : TimestampPeriod or (tuple or list of datetime.datetime or None), optional

    The minimum and maximum Timestamp for which to analyze the timeseries. If None is given, the minimum or maximum possible Timestamp is taken. The default is (None, None).

    between_meta_period : bool, optional

    Only check within the respective period that is defined in the meta table. If “qc” is chosen as kind, then the “raw” meta period is taken. The default is True.

    crop_period : bool, optional

    Should the period get cropped to the maximum filled period? This will result in holes being ignored when they are at the end or at the beginning of the timeseries. If period = (None, None) is given, then this parameter is set to True. The default is False.

Returns:

A pandas DataFrame with station_id as index and one column per entry in weeks. The values in the table are the number of NA periods longer than the respective number of weeks.

Return type:

pandas.DataFrame

Raises:

ValueError – If the input parameters were not correct.

download_meta()

Download the meta file(s) from the CDC server.

Returns:

The meta file from the CDC server. If there are several meta files on the server, they are joined together.

Return type:

geopandas.GeoDataFrame

fillup(only_real=False, stids='all', do_mp=False, **kwargs)

Fill up the quality checked data with data from nearby stations to get complete timeseries.

Parameters:
  • only_real (bool, optional) – Whether only real stations are computed or also virtual ones. True: only stations with own data are computed. The default is False.

  • stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.

  • do_mp (bool, optional) – Should the method run in multiprocessing mode? If False, the method is called in threading mode. Multiprocessing needs more memory and a bit more start-up time. Therefore it is only useful for methods with a lot of computation in the Python code. If most of the computation of a method is done in the PostgreSQL database, then threading is enough to speed the process up. The default is False.

  • kwargs (dict, optional) – The additional keyword arguments for the _run_method method

Raises:

ValueError – If the given stids (Station_IDs) are not all valid.

get_df(stids, **kwargs)

Get a DataFrame with the corresponding data.

Parameters:
  • stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.

  • kwargs (optional keyword arguments) – These keyword arguments are passed to the get_df method of the station class, e.g. period, agg_to, kinds.

Returns:

A DataFrame with the timeseries for the selected stations, kind(s) and the given period. If multiple columns are selected, the columns of this DataFrame form a MultiIndex with the station IDs as first level and the kind as second level.

Return type:

pandas.DataFrame

get_meta(infos=['station_id', 'filled_from', 'filled_until', 'geometry'], stids='all', only_real=True)

Get the meta Dataframe from the Database.

Parameters:
  • infos (list or str, optional) – A list of information from the meta file to return. If “all”, then all possible columns are returned, but only one geometry column. The default is [“station_id”, “filled_from”, “filled_until”, “geometry”].

  • only_real (bool, optional) – Whether only real stations are returned or also virtual ones. True: only stations with own data are returned. The default is True.

Returns:

The meta DataFrame.

Return type:

pandas.DataFrame or geopandas.GeoDataFrame

classmethod get_meta_explanation(infos='all')

Get the explanations of the available meta fields.

Parameters:

infos (list or string, optional) – The infos you wish to get an explanation for. If “all”, then the explanations for all available fields are returned. The default is “all”.

Returns:

A pandas Series with the information names as index and the explanations as values.

Return type:

pandas.Series

get_stations(only_real=True, stids='all')

Get a list with all the stations as Station-objects.

Parameters:
  • only_real (bool, optional) – Whether only real stations are returned or also virtual ones. True: only stations with own data are returned. The default is True.

  • stids (string or list of int, optional) – The Stations to return. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.

Returns:

A list with the corresponding station objects.

Return type:

list of Station-objects

Raises:

ValueError – If the given stids (Station_IDs) are not all valid.

last_imp_fillup(stids='all', do_mp=False, **kwargs)

Do the gap filling of the last import.

Parameters:
  • do_mp (bool, optional) – Should the method run in multiprocessing mode? If False, the method is called in threading mode. Multiprocessing needs more memory and a bit more start-up time. Therefore it is only useful for methods with a lot of computation in the Python code. If most of the computation of a method is done in the PostgreSQL database, then threading is enough to speed the process up. The default is False.

  • stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.

  • kwargs (dict, optional) – The additional keyword arguments for the _run_method method

last_imp_quality_check(stids='all', do_mp=False, **kwargs)

Do the quality check of the last import.

Parameters:
  • do_mp (bool, optional) – Should the method run in multiprocessing mode? If False, the method is called in threading mode. Multiprocessing needs more memory and a bit more start-up time. Therefore it is only useful for methods with a lot of computation in the Python code. If most of the computation of a method is done in the PostgreSQL database, then threading is enough to speed the process up. The default is False.

  • stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.

  • kwargs (dict, optional) – The additional keyword arguments for the _run_method method

quality_check(period=(None, None), only_real=True, stids='all', do_mp=False, **kwargs)

Quality check the raw data for a given period.

Parameters:
  • period (tuple or list of datetime.datetime or None, optional) – The minimum and maximum Timestamp for which to get the timeseries. If None is given, the minimum or maximum possible Timestamp is taken. The default is (None, None).

  • stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.

  • do_mp (bool, optional) – Should the method run in multiprocessing mode? If False, the method is called in threading mode. Multiprocessing needs more memory and a bit more start-up time. Therefore it is only useful for methods with a lot of computation in the Python code. If most of the computation of a method is done in the PostgreSQL database, then threading is enough to speed the process up. The default is False.

  • kwargs (dict, optional) – The additional keyword arguments for the _run_method method

update(only_new=True, **kwargs)

Make a complete update of the stations.

Runs update_raw, the quality check and the fillup for the stations.

Parameters:

only_new (bool, optional) – Should only new values be computed? If False, the stations are updated for the whole possible period. If True, the stations are only updated for new values. The default is True.

update_ma(stids='all', do_mp=False, **kwargs)

Update the multi annual values for the stations.

Get a multi annual value from the corresponding raster and save to the multi annual table in the database.

Parameters:
  • stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.

  • do_mp (bool, optional) – Should the method run in multiprocessing mode? If False, the method is called in threading mode. Multiprocessing needs more memory and a bit more start-up time. Therefore it is only useful for methods with a lot of computation in the Python code. If most of the computation of a method is done in the PostgreSQL database, then threading is enough to speed the process up. The default is False.

  • kwargs (dict, optional) – The additional keyword arguments for the _run_method method

Raises:

ValueError – If the given stids (Station_IDs) are not all valid.

update_meta()

Update the meta table by comparing to the CDC server.

The “von_datum” and “bis_datum” fields are ignored because it is better to set them from the filled period of the stations in the database. The CDC period is often not correct.

update_period_meta(stids='all')

Update the period in the meta table of the raw data.

Parameters:

stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.

Raises:

ValueError – If the given stids (Station_IDs) are not all valid.

update_raw(only_new=True, only_real=True, stids='all', remove_nas=True, do_mp=True, **kwargs)

Download all stations data from CDC and upload to database.

Parameters:
  • only_new (bool, optional) – Get only the files that are not yet in the database? If False, all the available files are loaded again. The default is True.

  • only_real (bool, optional) – Whether to try to download only real stations. True: only stations with a date in raw_from in meta are downloaded. The default is True.

  • stids (string or list of int, optional) – The Stations to download. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.

  • do_mp (bool, optional) – Should the method run in multiprocessing mode? If False, the method is called in threading mode. Multiprocessing needs more memory and a bit more start-up time. Therefore it is only useful for methods with a lot of computation in the Python code. If most of the computation of a method is done in the PostgreSQL database, then threading is enough to speed the process up. The default is True.

  • remove_nas (bool, optional) – Remove the NAs from the downloaded data before updating it to the database. This has computational advantages. The default is True.

  • kwargs (dict, optional) – The additional keyword arguments for the _run_method method

Raises:

ValueError – If the given stids (Station_IDs) are not all valid.

GroupStations

class weatherDB.stations.GroupStations[source]

Bases: object

A class to group all possible parameters of all the stations.

Public Methods:

__init__()

get_valid_stids()

get_meta_explanation([infos])

Get the explanations of the available meta fields.

get_meta([paras, stids])

Get the meta Dataframe from the Database.

get_para_stations([paras])

Get a list with all the multi parameter stations as stations.Station{parameter}-objects.

get_group_stations([stids])

Get a list with all the stations as station.GroupStation-objects.

create_ts(dir[, period, kinds, stids, ...])

Download and create the weather tables as csv files.

create_roger_ts(dir[, period, stids, kind, ...])

Create the timeseries files for RoGeR as CSV.


create_roger_ts(dir, period=(None, None), stids='all', kind='best', r_r0=1, add_t_min=False, add_t_max=False, do_toolbox_format=False, **kwargs)[source]

Create the timeseries files for RoGeR as CSV.

This is only a wrapper function for create_ts with some standard settings.

Parameters:
  • dir (pathlib like object or zipfile.ZipFile) – The directory or ZipFile to store the timeseries in. If a ZipFile is given, a folder with the station ID is added to the file path.

  • period (TimestampPeriod like object, optional) – The period for which to get the timeseries. If (None, None) is entered, then the maximal possible period is computed. The default is (None, None)

  • stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.

  • kind (str) – The data kind to look for filled period. Must be a column in the timeseries DB. Must be one of “raw”, “qc”, “filled”, “adj”. If “best” is given, then depending on the parameter of the station the best kind is selected. For precipitation this is “corr” and for the other parameters it is “filled”. For precipitation, “qn” and “corr” are also valid.

  • r_r0 (int or float or None or pd.Series or list, optional) – Should the ET timeseries contain a column with R/R0? If None, then no column is added. If int or float, then an R/R0 column is appended with this number as constant value. If list of int or float, then the list should have the same length as the ET timeseries and is appended to the timeseries. If pd.Series, then the index should be a timestamp index. The series is then joined to the ET timeseries. The default is 1.

  • add_t_min (bool, optional) – Should the minimal temperature value get added? The default is False.

  • add_t_max (bool, optional) – Should the maximal temperature value get added? The default is False.

  • do_toolbox_format (bool, optional) – Should the timeseries be saved in the RoGeR toolbox format? (have a look at the RoGeR examples in https://github.com/Hydrology-IFH/roger) The default is False.

  • **kwargs – additional parameters for GroupStation.create_ts

Raises:

Warning – If there are NAs in the timeseries or the period got changed.
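
The r_r0 cases described above (None, a single number, a list) can be sketched on a list of row dictionaries. The helper add_r_r0, the column name “R/R0” and the sample rows are assumptions for this example; the pd.Series case, which joins on a timestamp index, is left out.

```python
def add_r_r0(rows, r_r0=1):
    """Return copies of `rows` (dicts), with an "R/R0" entry unless r_r0 is None."""
    if r_r0 is None:
        return [dict(row) for row in rows]  # no column is added
    if isinstance(r_r0, (int, float)):
        values = [r_r0] * len(rows)  # the same value in every row
    else:
        values = list(r_r0)
        if len(values) != len(rows):
            raise ValueError("r_r0 must have the same length as the ET timeseries")
    return [{**row, "R/R0": value} for row, value in zip(rows, values)]


# made-up daily ET values
et_rows = [{"date": "2021-07-14", "et": 3.1}, {"date": "2021-07-15", "et": 2.8}]

constant = add_r_r0(et_rows)             # default r_r0=1
per_day = add_r_r0(et_rows, [0.9, 1.1])  # one value per row
unchanged = add_r_r0(et_rows, None)      # no extra column
```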

create_ts(dir, period=(None, None), kinds='best', stids='all', agg_to='10 min', r_r0=None, split_date=False, nas_allowed=True, add_na_share=False, add_t_min=False, add_t_max=False, **kwargs)[source]

Download and create the weather tables as csv files.

Parameters:
  • dir (path-like object) – The directory where to save the tables. If the directory is a ZipFile, then the output will get zipped into this.

  • period (TimestampPeriod like object, optional) – The period for which to get the timeseries. If (None, None) is entered, then the maximal possible period is computed. The default is (None, None)

  • kinds (str or list of str) – The data kind to look for filled period. Must be a column in the timeseries DB. Must be one of “raw”, “qc”, “filled”, “adj”. If “best” is given, then depending on the parameter of the station the best kind is selected. For precipitation this is “corr” and for the other parameters it is “filled”. For precipitation, “qn” and “corr” are also valid.

  • stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.

  • agg_to (str, optional) – The aggregation level to which the timeseries should be aggregated. The minimum aggregation for temperature and ET is daily and for precipitation it is 10 minutes. If a smaller aggregation is selected, the minimum possible aggregation for the respective parameter is returned. So if 10 minutes is selected, then precipitation is returned in 10-minute steps and T and ET as daily values. The default is “10 min”.

  • r_r0 (int or float or None or pd.Series or list, optional) – Should the ET timeseries contain a column with R/R0? If None, then no column is added. If int or float, then an R/R0 column is appended with this number as constant value. If list of int or float, then the list should have the same length as the ET timeseries and is appended to the timeseries. If pd.Series, then the index should be a timestamp index. The series is then joined to the ET timeseries. The default is None.

  • split_date (bool, optional) – Should the timestamp get split into parts, so one column for year, one for month etc.? If False, the timestamp is saved in one column as string. The default is False.

  • nas_allowed (bool, optional) – Should NAs be allowed? If True, then the maximum possible period is returned, even if there are NAs in the timeseries. If False, then the minimal filled period is returned. The default is True.

  • add_na_share (bool, optional) – Should one or several columns be added to the DataFrame with the share of NAs in the data? This is especially important when the station data get aggregated, because the aggregation doesn’t make sense if there are a lot of NAs in the original data. If True, one column per requested kind is added with the respective share of NAs, if the aggregation step is not the smallest. The “kind”_na_share column is in percent. The default is False.

  • add_t_min (bool, optional) – Should the minimal temperature value get added? The default is False.

  • add_t_max (bool, optional) – Should the maximal temperature value get added? The default is False.

  • **kwargs – additional parameters for GroupStation.create_ts
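
The agg_to fallback described above, where a parameter is never returned at a finer resolution than it supports, can be sketched as follows. The dictionary keys and the helper effective_resolution are illustrative assumptions, not weatherDB identifiers.

```python
from datetime import timedelta

# minimum resolution per parameter as stated above; the keys are illustrative labels
MIN_RESOLUTION = {
    "n": timedelta(minutes=10),  # precipitation
    "t": timedelta(days=1),      # temperature
    "et": timedelta(days=1),     # evapotranspiration
}


def effective_resolution(para, requested):
    """Return the requested step, but never finer than the parameter allows."""
    return max(requested, MIN_RESOLUTION[para])


ten_min = timedelta(minutes=10)
resolved = {para: effective_resolution(para, ten_min) for para in MIN_RESOLUTION}
```

Requesting “10 min” thus yields 10-minute precipitation next to daily temperature and ET, while a coarser request applies to all three parameters alike.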

get_group_stations(stids='all', **kwargs)[source]

Get a list with all the stations as station.GroupStation-objects.

Parameters:
  • stids (string or list of int, optional) – The Stations to return. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.

  • **kwargs (optional) – The keyword arguments are handed to the creation of the single GroupStation objects. Can be e.g. “error_if_missing”.

Returns:

A list with the corresponding station objects.

Return type:

list of GroupStation-objects

Raises:

ValueError – If the given stids (Station_IDs) are not all valid.

get_meta(paras='all', stids='all', **kwargs)[source]

Get the meta Dataframe from the Database.

Parameters:
  • paras (list or str, optional) – The parameters for which to get the information. If “all” then all the available parameters are requested. The default is “all”.

  • stids (string or list of int, optional) – The Stations to return the meta information for. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.

  • **kwargs (dict, optional) – The keyword arguments are passed to the station.GroupStation().get_meta method. From there it is passed to the single station get_meta method. Can be e.g. “infos”

Returns:

dict of pandas.DataFrame or geopandas.GeoDataFrame, or pandas.DataFrame or geopandas.GeoDataFrame – The meta DataFrame. If several parameters are requested, a dict with one entry per parameter is returned.

Raises:
  • ValueError – If the given stids (Station_IDs) are not all valid.

  • ValueError – If the given paras are not all valid.

classmethod get_meta_explanation(infos='all')[source]

Get the explanations of the available meta fields.

Parameters:

infos (list or string, optional) – The infos you wish to get an explanation for. If “all”, then the explanations for all available fields are returned. The default is “all”.

Returns:

A pandas Series with the information names as index and the explanations as values.

Return type:

pandas.Series

get_para_stations(paras='all')[source]

Get a list with all the multi parameter stations as stations.Station{parameter}-objects.

Parameters:

paras (list or str, optional) – The parameters for which to get the objects. If “all” then all the available parameters are requested. The default is “all”.

Returns:

A list with the corresponding station objects.

Return type:

list of Station{parameter}-objects

Raises:

ValueError – If the given stids (Station_IDs) are not all valid.

get_valid_stids()[source]

StationsBase…

These are the base station classes on which the real station classes above depend. None of them works on its own, because the class variables are not yet set correctly.

class weatherDB.stations.StationsBase[source]

Bases: object

Public Methods:

__init__()

download_meta()

Download the meta file(s) from the CDC server.

update_meta()

Update the meta table by comparing to the CDC server.

update_period_meta([stids])

Update the period in the meta table of the raw data.

get_meta_explanation([infos])

Get the explanations of the available meta fields.

get_meta([infos, stids, only_real])

Get the meta Dataframe from the Database.

get_stations([only_real, stids])

Get a list with all the stations as Station-objects.

count_holes([stids])

Count holes in timeseries depending on their length.

update_raw([only_new, only_real, stids, ...])

Download all stations data from CDC and upload to database.

last_imp_quality_check([stids, do_mp])

Do the quality check of the last import.

last_imp_fillup([stids, do_mp])

Do the gap filling of the last import.

quality_check([period, only_real, stids, do_mp])

Quality check the raw data for a given period.

update_ma([stids, do_mp])

Update the multi annual values for the stations.

fillup([only_real, stids, do_mp])

Fill up the quality checked data with data from nearby stations to get complete timeseries.

update([only_new])

Make a complete update of the stations.

get_df(stids, **kwargs)

Get a DataFrame with the corresponding data.


count_holes(stids='all', **kwargs)[source]

Count holes in timeseries depending on their length.

Parameters:
  • stids (string or list of int, optional) – The Stations to return. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.

  • kwargs (dict, optional) –

    This is a list of parameters that are supported by the StationBase.count_holes method, e.g.:

    weeks : list, optional

    A list of hole lengths to count. Every hole longer than the specified number of weeks is counted. The default is [2, 4, 8, 12, 16, 20, 24].

    kind : str

    The kind of the timeseries to analyze. Should be one of [‘raw’, ‘qc’, ‘filled’]. For N, “corr” is also possible. Normally only “raw” and “qc” make sense, because the other timeseries should not have holes.

    period : TimestampPeriod or (tuple or list of datetime.datetime or None), optional

    The minimum and maximum Timestamp for which to analyze the timeseries. If None is given, the minimum or maximum possible Timestamp is taken. The default is (None, None).

    between_meta_period : bool, optional

    Only check within the respective period that is defined in the meta table. If “qc” is chosen as kind, then the “raw” meta period is taken. The default is True.

    crop_period : bool, optional

    Whether the period should be cropped to the maximum filled period. This results in holes being ignored when they are at the beginning or at the end of the timeseries. If period = (None, None) is given, then this parameter is set to True. The default is False.

Returns:

A pandas DataFrame with station_id as index and one column per entry in weeks. The values are the number of NA periods longer than the respective number of weeks.

Return type:

pandas.DataFrame

Raises:

ValueError – If the input parameters were not correct.
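
The counting rule described above can be illustrated without a database. This is a minimal sketch of the idea only, not the library's implementation: the function `count_gaps` is hypothetical, and it measures the distance between consecutive valid timestamps, which is a simplification of an NA period.

```python
from datetime import datetime, timedelta


def count_gaps(valid_times, weeks=(2, 4, 8, 12, 16, 20, 24)):
    """Count gaps between consecutive valid timestamps per week threshold."""
    times = sorted(valid_times)
    # Distance between each valid timestamp and the next one.
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    # A gap is counted for every threshold it exceeds.
    return {w: sum(gap > timedelta(weeks=w) for gap in gaps) for w in weeks}


# Hypothetical timestamps with valid values: gaps of 1, 30 and 121 days.
series = [
    datetime(2020, 1, 1),
    datetime(2020, 1, 2),
    datetime(2020, 2, 1),
    datetime(2020, 6, 1),
]
holes = count_gaps(series)
```

Here the 30-day gap is counted for the 2 and 4 week columns, and the 121-day gap for every column up to 16 weeks, so one long hole contributes to several columns.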

download_meta()[source]

Download the meta file(s) from the CDC server.

Returns:

The meta file from the CDC server. If there are several meta files on the server, they are joined together.

Return type:

geopandas.GeoDataFrame

fillup(only_real=False, stids='all', do_mp=False, **kwargs)[source]

Fill up the quality checked data with data from nearby stations to get complete timeseries.

Parameters:
  • only_real (bool, optional) – Whether only real stations are computed or also virtual ones. True: only stations with own data are computed. The default is False.

  • stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.

  • do_mp (bool, optional) – Should the method run in multiprocessing mode? If False, the methods are called in threading mode. Multiprocessing needs more memory and a bit more startup time. Therefore it is only useful for methods with a lot of computation in the Python code. If most of a method's computation is done in the PostgreSQL database, then threading is enough to speed the process up. The default is False.

  • kwargs (dict, optional) – The additional keyword arguments for the _run_method method

Raises:

ValueError – If the given stids (Station_IDs) are not all valid.
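
The trade-off behind do_mp can be sketched with the standard library. This is an illustration under assumptions, not how weatherDB dispatches its work internally: `run_for_stations` and `fake_fillup` are hypothetical names, and the worker count is arbitrary.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def run_for_stations(func, stids, do_mp=False, max_workers=4):
    """Apply func to every station ID, in processes or in threads."""
    # Processes avoid the GIL for Python-heavy work but cost memory and
    # startup time; threads suffice when the database does the heavy lifting.
    executor_cls = ProcessPoolExecutor if do_mp else ThreadPoolExecutor
    with executor_cls(max_workers=max_workers) as executor:
        # executor.map keeps the input order, so zip pairs IDs with results.
        return dict(zip(stids, executor.map(func, stids)))


def fake_fillup(stid):
    """Placeholder for a per-station task; must be top-level to be picklable."""
    return stid * 2


results = run_for_stations(fake_fillup, [1, 2, 3], do_mp=False)
```

When calling such a helper with do_mp=True from a script, the call usually needs to sit under an `if __name__ == "__main__":` guard, because some platforms start worker processes by re-importing the main module.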

get_df(stids, **kwargs)[source]

Get a DataFrame with the corresponding data.

Parameters:
  • stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.

  • kwargs (optional keyword arguments) – These keyword arguments are passed to the get_df method of the station class. Can be e.g. period, agg_to, kinds.

Returns:

A DataFrame with the timeseries for the selected stations, kind(s) and the given period. If multiple columns are selected, the columns of this DataFrame are a MultiIndex with the station IDs as first level and the kind as second level.

Return type:

pandas.DataFrame
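
A call that forwards period and kinds might look like the sketch below. It relies only on the get_df(stids, **kwargs) signature documented here. Passing period as a (start, end) tuple is an assumption borrowed from the quality_check signature, the station IDs are made up, and `StubStations` is a stand-in so that the sketch runs without a database.

```python
from datetime import datetime


def fetch_timeseries(stations, stids, start, end, kinds):
    """Request one DataFrame for several stations and kinds.

    `stations` is any object offering get_df(stids, **kwargs).
    Assumption: `period` is accepted as a (start, end) tuple.
    """
    return stations.get_df(stids=stids, period=(start, end), kinds=kinds)


class StubStations:
    """Stand-in that echoes its arguments; not part of weatherDB."""

    def get_df(self, stids, **kwargs):
        return {"stids": stids, **kwargs}


request = fetch_timeseries(
    StubStations(),
    stids=[1, 2],  # hypothetical station IDs
    start=datetime(2020, 1, 1),
    end=datetime(2020, 12, 31),
    kinds=["raw", "qc"],
)
```

With a real stations object, requesting two kinds for two stations would be expected to yield four columns, one per (station ID, kind) pair.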

get_meta(infos=['station_id', 'filled_from', 'filled_until', 'geometry'], stids='all', only_real=True)[source]

Get the meta Dataframe from the Database.

Parameters:
  • infos (list or str, optional) – A list of information from the meta file to return. If “all”, then all possible columns are returned, but only one geometry column. The default is: [“station_id”, “filled_from”, “filled_until”, “geometry”].

  • only_real (bool, optional) – Whether only real stations are returned or also virtual ones. True: only stations with own data are returned. The default is True.

Returns:

The meta DataFrame.

Return type:

pandas.DataFrame or geopandas.GeoDataFrame

classmethod get_meta_explanation(infos='all')[source]

Get the explanations of the available meta fields.

Parameters:

infos (list or string, optional) – The infos you wish to get an explanation for. If “all”, the explanations for all available infos are returned. The default is “all”.

Returns:

A pandas Series with the information names as index and the explanations as values.

Return type:

pd.Series

get_stations(only_real=True, stids='all')[source]

Get a list with all the stations as Station-objects.

Parameters:
  • only_real (bool, optional) – Whether only real stations are returned or also virtual ones. True: only stations with own data are returned. The default is True.

  • stids (string or list of int, optional) – The Stations to return. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.

Returns:

A list with the corresponding station objects.

Return type:

list of Station-objects

Raises:

ValueError – If the given stids (Station_IDs) are not all valid.

last_imp_fillup(stids='all', do_mp=False, **kwargs)[source]

Do the gap filling of the last import.

Parameters:
  • do_mp (bool, optional) – Should the method run in multiprocessing mode? If False, the methods are called in threading mode. Multiprocessing needs more memory and a bit more startup time. Therefore it is only useful for methods with a lot of computation in the Python code. If most of a method's computation is done in the PostgreSQL database, then threading is enough to speed the process up. The default is False.

  • stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.

  • kwargs (dict, optional) – The additional keyword arguments for the _run_method method

last_imp_quality_check(stids='all', do_mp=False, **kwargs)[source]

Do the quality check of the last import.

Parameters:
  • do_mp (bool, optional) – Should the method run in multiprocessing mode? If False, the methods are called in threading mode. Multiprocessing needs more memory and a bit more startup time. Therefore it is only useful for methods with a lot of computation in the Python code. If most of a method's computation is done in the PostgreSQL database, then threading is enough to speed the process up. The default is False.

  • stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.

  • kwargs (dict, optional) – The additional keyword arguments for the _run_method method

quality_check(period=(None, None), only_real=True, stids='all', do_mp=False, **kwargs)[source]

Quality check the raw data for a given period.

Parameters:
  • period (tuple or list of datetime.datetime or None, optional) – The minimum and maximum Timestamp for which to quality check the timeseries. If None is given, the minimum or maximum possible Timestamp is taken. The default is (None, None).

  • stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.

  • do_mp (bool, optional) – Should the method run in multiprocessing mode? If False, the methods are called in threading mode. Multiprocessing needs more memory and a bit more startup time. Therefore it is only useful for methods with a lot of computation in the Python code. If most of a method's computation is done in the PostgreSQL database, then threading is enough to speed the process up. The default is False.

  • kwargs (dict, optional) – The additional keyword arguments for the _run_method method
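
The handling of None in period can be pictured as follows. This is a minimal sketch of the described behaviour, not the library's code: `resolve_period` is a hypothetical helper and the data extent is invented.

```python
from datetime import datetime


def resolve_period(period, min_possible, max_possible):
    """Replace None bounds with the widest possible bounds."""
    start, end = period
    start = min_possible if start is None else start
    end = max_possible if end is None else end
    if start > end:
        raise ValueError("period start lies after period end")
    return start, end


# Hypothetical extent of the available raw data.
first, last = datetime(1990, 1, 1), datetime(2023, 12, 31)

full = resolve_period((None, None), first, last)
tail = resolve_period((datetime(2020, 1, 1), None), first, last)
```

The default (None, None) therefore covers the whole available range, while a single None leaves only that side open.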

update(only_new=True, **kwargs)[source]

Make a complete update of the stations.

Does the update_raw, quality check and fillup of the stations.

Parameters:

only_new (bool, optional) – Should only new values be computed? If False, the stations are updated for the whole possible period. If True, the stations are only updated for new values. The default is True.
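
The order of the three stages can be sketched with the public methods listed for this class. Note the assumption: switching to the last_imp_* variants when only_new is True is a guess at what "only new values" means, and the library's actual update method may differ. `full_update` and `RecordingStations` are hypothetical.

```python
def full_update(stations, only_new=True):
    """Run download, quality check and gap filling in that order."""
    stations.update_raw(only_new=only_new)
    if only_new:
        # Assumption: new values are handled via the last-import variants.
        stations.last_imp_quality_check()
        stations.last_imp_fillup()
    else:
        stations.quality_check()
        stations.fillup()


class RecordingStations:
    """Stand-in that records method names; not part of weatherDB."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        # Only reached for attributes that do not exist, i.e. the methods.
        def method(**kwargs):
            self.calls.append(name)
        return method


stub = RecordingStations()
full_update(stub, only_new=False)
```

The point of the sketch is the ordering: gap filling works on quality checked data, which in turn depends on the raw data being up to date.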

update_ma(stids='all', do_mp=False, **kwargs)[source]

Update the multi annual values for the stations.

Get a multi annual value from the corresponding raster and save to the multi annual table in the database.

Parameters:
  • stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.

  • do_mp (bool, optional) – Should the method run in multiprocessing mode? If False, the methods are called in threading mode. Multiprocessing needs more memory and a bit more startup time. Therefore it is only useful for methods with a lot of computation in the Python code. If most of a method's computation is done in the PostgreSQL database, then threading is enough to speed the process up. The default is False.

  • kwargs (dict, optional) – The additional keyword arguments for the _run_method method

Raises:

ValueError – If the given stids (Station_IDs) are not all valid.

update_meta()[source]

Update the meta table by comparing to the CDC server.

The “von_datum” and “bis_datum” fields are ignored, because it is better to set them from the filled period of the stations in the database. The CDC period is often not correct.

update_period_meta(stids='all')[source]

Update the period in the meta table of the raw data.

Parameters:

stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.

Raises:

ValueError – If the given stids (Station_IDs) are not all valid.

update_raw(only_new=True, only_real=True, stids='all', remove_nas=True, do_mp=True, **kwargs)[source]

Download all stations data from CDC and upload to database.

Parameters:
  • only_new (bool, optional) – Get only the files that are not yet in the database? If False, all available files are loaded again. The default is True.

  • only_real (bool, optional) – Whether to try downloading only real stations. True: only stations with a date in raw_from in meta are downloaded. The default is True.

  • stids (string or list of int, optional) – The Stations to return. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.

  • do_mp (bool, optional) – Should the method run in multiprocessing mode? If False, the methods are called in threading mode. Multiprocessing needs more memory and a bit more startup time. Therefore it is only useful for methods with a lot of computation in the Python code. If most of a method's computation is done in the PostgreSQL database, then threading is enough to speed the process up. The default is True.

  • remove_nas (bool, optional) – Remove the NAs from the downloaded data before uploading it to the database. This has computational advantages. The default is True.

  • kwargs (dict, optional) – The additional keyword arguments for the _run_method method

Raises:

ValueError – If the given stids (Station_IDs) are not all valid.

class weatherDB.stations.StationsTETBase[source]

Bases: StationsBase

Public Methods:

fillup([only_real, stids])

Fill up the quality checked data with data from nearby stations to get complete timeseries.

Inherited from StationsBase

__init__()

download_meta()

Download the meta file(s) from the CDC server.

update_meta()

Update the meta table by comparing to the CDC server.

update_period_meta([stids])

Update the period in the meta table of the raw data.

get_meta_explanation([infos])

Get the explanations of the available meta fields.

get_meta([infos, stids, only_real])

Get the meta Dataframe from the Database.

get_stations([only_real, stids])

Get a list with all the stations as Station-objects.

count_holes([stids])

Count holes in timeseries depending on their length.

update_raw([only_new, only_real, stids, ...])

Download all stations data from CDC and upload to database.

last_imp_quality_check([stids, do_mp])

Do the quality check of the last import.

last_imp_fillup([stids, do_mp])

Do the gap filling of the last import.

quality_check([period, only_real, stids, do_mp])

Quality check the raw data for a given period.

update_ma([stids, do_mp])

Update the multi annual values for the stations.

fillup([only_real, stids, do_mp])

Fill up the quality checked data with data from nearby stations to get complete timeseries.

update([only_new])

Make a complete update of the stations.

get_df(stids, **kwargs)

Get a DataFrame with the corresponding data.


count_holes(stids='all', **kwargs)

Count holes in timeseries depending on their length.

Parameters:
  • stids (string or list of int, optional) – The Stations to return. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.

  • kwargs (dict, optional) –

    This is a list of parameters that are supported by the StationBase.count_holes method, e.g.:

    weeks : list, optional

    A list of hole lengths to count. Every hole longer than the specified number of weeks is counted. The default is [2, 4, 8, 12, 16, 20, 24].

    kind : str

    The kind of the timeseries to analyze. Should be one of [‘raw’, ‘qc’, ‘filled’]. For N, “corr” is also possible. Normally only “raw” and “qc” make sense, because the other timeseries should not have holes.

    period : TimestampPeriod or (tuple or list of datetime.datetime or None), optional

    The minimum and maximum Timestamp for which to analyze the timeseries. If None is given, the minimum or maximum possible Timestamp is taken. The default is (None, None).

    between_meta_period : bool, optional

    Only check within the respective period that is defined in the meta table. If “qc” is chosen as kind, then the “raw” meta period is taken. The default is True.

    crop_period : bool, optional

    Whether the period should be cropped to the maximum filled period. This results in holes being ignored when they are at the beginning or at the end of the timeseries. If period = (None, None) is given, then this parameter is set to True. The default is False.

Returns:

A pandas DataFrame with station_id as index and one column per entry in weeks. The values are the number of NA periods longer than the respective number of weeks.

Return type:

pandas.DataFrame

Raises:

ValueError – If the input parameters were not correct.

download_meta()

Download the meta file(s) from the CDC server.

Returns:

The meta file from the CDC server. If there are several meta files on the server, they are joined together.

Return type:

geopandas.GeoDataFrame

fillup(only_real=False, stids='all')[source]

Fill up the quality checked data with data from nearby stations to get complete timeseries.

Parameters:
  • only_real (bool, optional) – Whether only real stations are computed or also virtual ones. True: only stations with own data are computed. The default is False.

  • stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.

  • do_mp (bool, optional) – Should the method run in multiprocessing mode? If False, the methods are called in threading mode. Multiprocessing needs more memory and a bit more startup time. Therefore it is only useful for methods with a lot of computation in the Python code. If most of a method's computation is done in the PostgreSQL database, then threading is enough to speed the process up. The default is False.

  • kwargs (dict, optional) – The additional keyword arguments for the _run_method method

Raises:

ValueError – If the given stids (Station_IDs) are not all valid.

get_df(stids, **kwargs)

Get a DataFrame with the corresponding data.

Parameters:
  • stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.

  • kwargs (optional keyword arguments) – These keyword arguments are passed to the get_df method of the station class. Can be e.g. period, agg_to, kinds.

Returns:

A DataFrame with the timeseries for the selected stations, kind(s) and the given period. If multiple columns are selected, the columns of this DataFrame are a MultiIndex with the station IDs as first level and the kind as second level.

Return type:

pandas.DataFrame

get_meta(infos=['station_id', 'filled_from', 'filled_until', 'geometry'], stids='all', only_real=True)

Get the meta Dataframe from the Database.

Parameters:
  • infos (list or str, optional) – A list of information from the meta file to return. If “all”, then all possible columns are returned, but only one geometry column. The default is: [“station_id”, “filled_from”, “filled_until”, “geometry”].

  • only_real (bool, optional) – Whether only real stations are returned or also virtual ones. True: only stations with own data are returned. The default is True.

Returns:

The meta DataFrame.

Return type:

pandas.DataFrame or geopandas.GeoDataFrame

classmethod get_meta_explanation(infos='all')

Get the explanations of the available meta fields.

Parameters:

infos (list or string, optional) – The infos you wish to get an explanation for. If “all”, the explanations for all available infos are returned. The default is “all”.

Returns:

A pandas Series with the information names as index and the explanations as values.

Return type:

pd.Series

get_stations(only_real=True, stids='all')

Get a list with all the stations as Station-objects.

Parameters:
  • only_real (bool, optional) – Whether only real stations are returned or also virtual ones. True: only stations with own data are returned. The default is True.

  • stids (string or list of int, optional) – The Stations to return. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.

Returns:

A list with the corresponding station objects.

Return type:

list of Station-objects

Raises:

ValueError – If the given stids (Station_IDs) are not all valid.

last_imp_fillup(stids='all', do_mp=False, **kwargs)

Do the gap filling of the last import.

Parameters:
  • do_mp (bool, optional) – Should the method run in multiprocessing mode? If False, the methods are called in threading mode. Multiprocessing needs more memory and a bit more startup time. Therefore it is only useful for methods with a lot of computation in the Python code. If most of a method's computation is done in the PostgreSQL database, then threading is enough to speed the process up. The default is False.

  • stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.

  • kwargs (dict, optional) – The additional keyword arguments for the _run_method method

last_imp_quality_check(stids='all', do_mp=False, **kwargs)

Do the quality check of the last import.

Parameters:
  • do_mp (bool, optional) – Should the method run in multiprocessing mode? If False, the methods are called in threading mode. Multiprocessing needs more memory and a bit more startup time. Therefore it is only useful for methods with a lot of computation in the Python code. If most of a method's computation is done in the PostgreSQL database, then threading is enough to speed the process up. The default is False.

  • stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.

  • kwargs (dict, optional) – The additional keyword arguments for the _run_method method

quality_check(period=(None, None), only_real=True, stids='all', do_mp=False, **kwargs)

Quality check the raw data for a given period.

Parameters:
  • period (tuple or list of datetime.datetime or None, optional) – The minimum and maximum Timestamp for which to quality check the timeseries. If None is given, the minimum or maximum possible Timestamp is taken. The default is (None, None).

  • stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.

  • do_mp (bool, optional) – Should the method run in multiprocessing mode? If False, the methods are called in threading mode. Multiprocessing needs more memory and a bit more startup time. Therefore it is only useful for methods with a lot of computation in the Python code. If most of a method's computation is done in the PostgreSQL database, then threading is enough to speed the process up. The default is False.

  • kwargs (dict, optional) – The additional keyword arguments for the _run_method method

update(only_new=True, **kwargs)

Make a complete update of the stations.

Does the update_raw, quality check and fillup of the stations.

Parameters:

only_new (bool, optional) – Should only new values be computed? If False, the stations are updated for the whole possible period. If True, the stations are only updated for new values. The default is True.

update_ma(stids='all', do_mp=False, **kwargs)

Update the multi annual values for the stations.

Get a multi annual value from the corresponding raster and save to the multi annual table in the database.

Parameters:
  • stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.

  • do_mp (bool, optional) – Should the method run in multiprocessing mode? If False, the methods are called in threading mode. Multiprocessing needs more memory and a bit more startup time. Therefore it is only useful for methods with a lot of computation in the Python code. If most of a method's computation is done in the PostgreSQL database, then threading is enough to speed the process up. The default is False.

  • kwargs (dict, optional) – The additional keyword arguments for the _run_method method

Raises:

ValueError – If the given stids (Station_IDs) are not all valid.

update_meta()

Update the meta table by comparing to the CDC server.

The “von_datum” and “bis_datum” fields are ignored, because it is better to set them from the filled period of the stations in the database. The CDC period is often not correct.

update_period_meta(stids='all')

Update the period in the meta table of the raw data.

Parameters:

stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.

Raises:

ValueError – If the given stids (Station_IDs) are not all valid.

update_raw(only_new=True, only_real=True, stids='all', remove_nas=True, do_mp=True, **kwargs)

Download all stations data from CDC and upload to database.

Parameters:
  • only_new (bool, optional) – Get only the files that are not yet in the database? If False, all available files are loaded again. The default is True.

  • only_real (bool, optional) – Whether to try downloading only real stations. True: only stations with a date in raw_from in meta are downloaded. The default is True.

  • stids (string or list of int, optional) – The Stations to return. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.

  • do_mp (bool, optional) – Should the method run in multiprocessing mode? If False, the methods are called in threading mode. Multiprocessing needs more memory and a bit more startup time. Therefore it is only useful for methods with a lot of computation in the Python code. If most of a method's computation is done in the PostgreSQL database, then threading is enough to speed the process up. The default is True.

  • remove_nas (bool, optional) – Remove the NAs from the downloaded data before uploading it to the database. This has computational advantages. The default is True.

  • kwargs (dict, optional) – The additional keyword arguments for the _run_method method

Raises:

ValueError – If the given stids (Station_IDs) are not all valid.