stations
StationsP
- class weatherDB.stations.StationsP[source]
Bases: StationsBase
A class to work with and download 10-minute precipitation data for several stations.
Public Methods:
update_richter_class([stids, do_mp]) – Update the Richter exposition class.
richter_correct([stids]) – Richter correct the filled data.
last_imp_corr([stids, do_mp]) – Richter correct the filled data for the last imported period.
update([only_new]) – Make a complete update of the stations.
Inherited from StationsBase:
__init__() – Download the meta file(s) from the CDC server.
update_meta([stids]) – Update the meta table by comparing to the CDC server.
update_period_meta([stids]) – Update the period in the meta table of the raw data.
get_meta_explanation([infos]) – Get the explanations of the available meta fields.
get_meta([infos, stids, only_real]) – Get the meta DataFrame from the Database.
get_stations([only_real, stids, ...]) – Get a list with all the stations as Station-objects.
get_quotient(kinds_num, kinds_denom[, ...]) – Get the quotient of multi-annual means of two different kinds or the timeseries and the multi annual raster value.
count_holes([stids]) – Count holes in timeseries depending on their length.
update_raw([only_new, only_real, stids, ...]) – Download all stations data from CDC and upload to database.
last_imp_quality_check([stids, do_mp]) – Do the quality check of the last import.
last_imp_fillup([stids, do_mp]) – Do the gap filling of the last import.
quality_check([period, only_real, stids, do_mp]) – Quality check the raw data for a given period.
update_ma_raster([stids, do_mp]) – Update the multi annual raster values for the stations.
update_ma_timeseries(kind[, stids, do_mp]) – Update the multi annual timeseries values for the stations.
fillup([only_real, stids, do_mp]) – Fill up the quality checked data with data from nearby stations to get complete timeseries.
update([only_new]) – Make a complete update of the stations.
get_df(stids, **kwargs) – Get a DataFrame with the corresponding data.
- update_richter_class(stids='all', do_mp=True, **kwargs)[source]
Update the Richter exposition class.
Get the value from the raster, compare it with the Richter categories and save the result to the database.
- Parameters:
stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.
**kwargs (dict, optional) – The keyword arguments to be handed to the station.StationP.update_richter_class and get_stations method.
- Raises:
ValueError – If the given stids (Station_IDs) are not all valid.
- richter_correct(stids='all', **kwargs)[source]
Richter correct the filled data.
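- Parameters:
stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.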
- Raises:
ValueError – If the given stids (Station_IDs) are not all valid.
- last_imp_corr(stids='all', do_mp=False, **kwargs)[source]
Richter correct the filled data for the last imported period.
- Parameters:
stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.
do_mp (bool, optional) – Whether to run the method in multiprocessing mode. If False, the methods are called in threading mode. Multiprocessing needs more memory and a bit more startup time, so it is only useful for methods that do a lot of computation in the Python code. If most of a method's computation is done in the PostgreSQL database, threading is enough to speed up the process. The default is False.
**kwargs (dict, optional) – The additional keyword arguments for the _run_method and get_stations methods.
- Raises:
ValueError – If the given stids (Station_IDs) are not all valid.
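The trade-off described for do_mp can be illustrated with the standard library alone. The following is a minimal sketch, not weatherDB's implementation: run_for_stations and work are hypothetical names, and switching between ThreadPoolExecutor and ProcessPoolExecutor is only an assumption about how such a flag could be wired up.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def work(stid):
    """Stand-in for a per-station task (hypothetical)."""
    return stid * 2


def run_for_stations(func, stids, do_mp=False, max_workers=4):
    """Apply func to every station ID, either in processes or in threads."""
    # Processes cost more memory and startup time, so they only pay off
    # when func is CPU-heavy Python code. Threads are cheap and are enough
    # when the heavy lifting happens elsewhere, e.g. in the database.
    executor_cls = ProcessPoolExecutor if do_mp else ThreadPoolExecutor
    with executor_cls(max_workers=max_workers) as executor:
        # executor.map returns the results in the order of stids
        return list(executor.map(func, stids))


if __name__ == "__main__":
    print(run_for_stations(work, [1, 2, 3]))
```

With do_mp=True the function passed in must be picklable (defined at module level), which is one more reason to keep threading as the default in a sketch like this.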
- update(only_new=True, **kwargs)[source]
Make a complete update of the stations.
Runs update_raw, the quality check, the fillup and the Richter correction for the stations.
- Parameters:
only_new (bool, optional) – Should only new values be computed? If False, the stations are updated for the whole possible period. If True, the stations are only updated for new values. The default is True.
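The order of the processing steps can be sketched as follows. This is an illustration under assumptions, not the library's code: full_update and RecordingStations are hypothetical names, and the branching on only_new (using the last_imp_* variants for new data) is inferred from the method names listed above rather than confirmed by the source.

```python
def full_update(stations, only_new=True, **kwargs):
    """Run the processing steps in the order the description lists them."""
    stations.update_raw(only_new=only_new, **kwargs)
    if only_new:
        # assumption: the last_imp_* variants only treat the last imported period
        stations.last_imp_quality_check(**kwargs)
        stations.last_imp_fillup(**kwargs)
        stations.last_imp_corr(**kwargs)
    else:
        # assumption: the full variants recompute the whole possible period
        stations.quality_check(**kwargs)
        stations.fillup(**kwargs)
        stations.richter_correct(**kwargs)


class RecordingStations:
    """Hypothetical stand-in that records the names of the called methods."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        # only reached for attributes that do not exist, i.e. the method names
        def method(**kwargs):
            self.calls.append(name)
        return method


if __name__ == "__main__":
    recorder = RecordingStations()
    full_update(recorder, only_new=False)
    print(recorder.calls)
```

The stand-in object makes it possible to check the call order without a database connection.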
- count_holes(stids='all', **kwargs)
Count holes in timeseries depending on their length.
- Parameters:
stids (string or list of int, optional) – The Stations to return. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.
**kwargs (dict, optional) –
The parameters supported by the StationBase.count_holes method. Furthermore, the kwargs are passed to the get_stations method. Possible values are:
- weeks (list, optional)
A list of hole lengths to count. Every hole longer than the specified number of weeks is counted. The default is [2, 4, 8, 12, 16, 20, 24].
- kind (str)
The kind of the timeseries to analyze. Should be one of [‘raw’, ‘qc’, ‘filled’]. For precipitation (N), “corr” is also possible. Normally only “raw” and “qc” make sense, because the other timeseries should not have holes.
- period (TimestampPeriod or (tuple or list of datetime.datetime or None), optional)
The minimum and maximum Timestamp for which to analyze the timeseries. If None is given, the minimum and maximum possible Timestamps are taken. The default is (None, None).
- between_meta_period (bool, optional)
Only check within the respective period that is defined in the meta table. If “qc” is chosen as kind, then the “raw” meta period is taken. The default is True.
- crop_period (bool, optional)
Whether to crop the period to the maximum filled period. This results in holes being ignored when they are at the beginning or at the end of the timeseries. If period = (None, None) is given, then this parameter is set to True. The default is False.
- Returns:
A pandas DataFrame with station_id as index and one column per value in weeks. The numbers in the table are the counts of NA periods longer than the respective number of weeks.
- Return type:
pandas.DataFrame
- Raises:
ValueError – If the input parameters were not correct.
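What "counting holes longer than a number of weeks" means can be shown for a single series. This is a minimal sketch in plain Python, not the library's database-backed implementation: count_holes_sketch is a hypothetical name, missing values are represented as None, and a fixed 10-minute step is assumed.

```python
from datetime import timedelta


def count_holes_sketch(values, weeks=(2, 4, 8, 12, 16, 20, 24),
                       step=timedelta(minutes=10)):
    """Count runs of missing values (None) longer than each duration in weeks."""
    # collect the length (in time steps) of every consecutive run of None
    run_lengths = []
    current = 0
    for value in values:
        if value is None:
            current += 1
        else:
            if current:
                run_lengths.append(current)
            current = 0
    if current:
        run_lengths.append(current)

    # a hole counts for a threshold only if it is strictly longer than it
    counts = {}
    for n_weeks in weeks:
        threshold = timedelta(weeks=n_weeks)
        counts[n_weeks] = sum(1 for n in run_lengths if n * step > threshold)
    return counts


if __name__ == "__main__":
    # 2017 missing 10-minute steps are just over two weeks (2016 steps)
    series = [1.0] + [None] * 2017 + [2.0] + [None] * 100 + [3.0]
    print(count_holes_sketch(series))
```

Each hole is counted once per threshold it exceeds, so a 5-week hole contributes to both the 2-week and the 4-week column.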
- download_meta()
Download the meta file(s) from the CDC server.
- Returns:
The meta file from the CDC server. If there are several meta files on the server, they are joined together.
- Return type:
geopandas.GeoDataFrame
- fillup(only_real=False, stids='all', do_mp=False, **kwargs)
Fill up the quality checked data with data from nearby stations to get complete timeseries.
- Parameters:
only_real (bool, optional) – Whether only real stations are computed or also virtual ones. True: only stations with own data are computed. The default is False.
stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.
do_mp (bool, optional) – Whether to run the method in multiprocessing mode. If False, the methods are called in threading mode. Multiprocessing needs more memory and a bit more startup time, so it is only useful for methods that do a lot of computation in the Python code. If most of a method's computation is done in the PostgreSQL database, threading is enough to speed up the process. The default is False.
**kwargs (dict, optional) – The additional keyword arguments for the _run_method and get_stations methods.
- Raises:
ValueError – If the given stids (Station_IDs) are not all valid.
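The idea of filling gaps with data from nearby stations can be sketched in its simplest form. This is an illustration only: fill_from_neighbours is a hypothetical helper, the "nearest station with a value wins" rule is an assumption, and any scaling or weighting the library may apply to neighbour values is not described in this section and not modeled here.

```python
def fill_from_neighbours(target, neighbours):
    """Fill None entries of target with the value of the nearest neighbour that has data.

    neighbours is a list of (distance_km, series) pairs; every series has
    the same length as target.
    """
    ordered = sorted(neighbours, key=lambda pair: pair[0])  # nearest first
    filled = list(target)
    for i, value in enumerate(filled):
        if value is not None:
            continue  # keep the station's own quality-checked value
        for _distance, series in ordered:
            if series[i] is not None:
                filled[i] = series[i]
                break
    return filled


if __name__ == "__main__":
    target = [1.0, None, None, 4.0]
    near = (5.0, [9.0, None, 3.0, 9.0])
    far = (20.0, [8.0, 2.0, 7.0, 8.0])
    print(fill_from_neighbours(target, [far, near]))
```

A gap stays None if no neighbour has a value at that time step, which is why a complete timeseries needs enough neighbouring stations.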
- get_df(stids, **kwargs)
Get a DataFrame with the corresponding data.
- Parameters:
stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.
**kwargs (optional keyword arguments) – Those keyword arguments are passed to the get_df function of the station class. Possible parameters are period, agg_to, kinds. Furthermore the kwargs are passed to the get_stations method.
- Returns:
A DataFrame with the timeseries for the selected stations, kind(s) and the given period. If multiple columns are selected, the columns of this DataFrame form a MultiIndex with the station IDs as first level and the kind as second level.
- Return type:
pandas.DataFrame
- get_meta(infos=['station_id', 'filled_from', 'filled_until', 'geometry'], stids='all', only_real=True)
Get the meta DataFrame from the Database.
- Parameters:
infos (list or str, optional) – A list of information from the meta file to return. If “all”, then all possible columns are returned, but only one geometry column. The default is [“station_id”, “filled_from”, “filled_until”, “geometry”].
only_real (bool, optional) – Whether only real stations are returned or also virtual ones. True: only stations with own data are returned. The default is True.
- Returns:
The meta DataFrame.
- Return type:
pandas.DataFrame or geopandas.GeoDataFrame
- classmethod get_meta_explanation(infos='all')
Get the explanations of the available meta fields.
- Parameters:
infos (list or string, optional) – The infos you wish to get an explanation for. If “all”, then explanations for all available fields are returned. The default is “all”.
- Returns:
A pandas Series with the information names as index and the explanations as values.
- Return type:
pd.Series
- get_quotient(kinds_num, kinds_denom, stids='all', return_as='df', **kwargs)
Get the quotient of multi-annual means of two different kinds or the timeseries and the multi annual raster value.
$quotient = \overline{ts}_{kind\_num} / \overline{ts}_{denom}$
- Parameters:
kinds_num (list of str or str) – The timeseries kinds of the numerators. Should be one of [‘raw’, ‘qc’, ‘filled’]. For precipitation also “corr” is possible.
kinds_denom (list of str or str) – The timeseries kinds of the denominator or the multi annual raster key. If the denominator is a multi annual raster key, then the result is the quotient of the timeseries and the raster value. Possible values are:
- for timeseries kinds: ‘raw’, ‘qc’, ‘filled’ or for precipitation also “corr”.
- for raster keys: ‘hyras’, ‘dwd’ or ‘regnie’, depending on your defined raster files.
stids (list of Integer) – The station IDs for which to compute the quotient.
return_as (str, optional) – The format of the return value. If “df” then a pandas DataFrame is returned. If “json” then a list with dictionaries is returned.
**kwargs (dict, optional) – The additional keyword arguments are passed to the get_stations method.
- Returns:
The quotient of the two timeseries as DataFrame or list of dictionaries (JSON) depending on the return_as parameter. The default is pd.DataFrame.
- Return type:
pandas.DataFrame or list of dict
- Raises:
ValueError – If the input parameters were not correct.
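The quotient formula can be worked through on a toy example. This is a sketch under assumptions, not the library's computation: the helper names are hypothetical, a series is represented as (year, value) pairs, and "multi-annual mean" is taken to be the mean of the annual sums.

```python
from statistics import mean


def multi_annual_mean(series):
    """Mean of the annual sums of (year, value) pairs, skipping missing values."""
    sums = {}
    for year, value in series:
        if value is not None:
            sums[year] = sums.get(year, 0.0) + value
    return mean(sums.values())


def quotient(ts_num, denom):
    """Quotient of multi-annual means.

    denom is either a second series (timeseries kind) or a number that
    stands for an already known multi annual raster value.
    """
    if isinstance(denom, (int, float)):
        denom_value = denom
    else:
        denom_value = multi_annual_mean(denom)
    return multi_annual_mean(ts_num) / denom_value


if __name__ == "__main__":
    filled = [(2020, 400.0), (2020, 400.0), (2021, 1000.0)]
    corr = [(2020, 900.0), (2021, 1080.0)]
    print(quotient(corr, filled))   # series / series
    print(quotient(filled, 1000.0)) # series / raster value
```

The two calls mirror the two cases in the parameter description: a timeseries kind as denominator, or a raster key whose value is a single multi annual number.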
- get_stations(only_real=True, stids='all', skip_missing_stids=False, **kwargs)
Get a list with all the stations as Station-objects.
- Parameters:
only_real (bool, optional) – Whether only real stations are returned or also virtual ones. True: only stations with own data are returned. The default is True.
stids (string or list of int, optional) – The Stations to return. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.
skip_missing_stids (bool, optional) – Should the method skip the missing stations from input stids? If False, then a ValueError is raised if a station is not found. The default is False.
**kwargs (dict, optional) – The additional keyword arguments aren’t used in this method.
- Returns:
A list with the corresponding station objects.
- Return type:
list of Station-objects
- Raises:
ValueError – If the given stids (Station_IDs) are not all valid.
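How stids and skip_missing_stids interact can be sketched without a database. select_stids and known_stids are hypothetical names used only for illustration; the actual lookup in the library runs against the meta table.

```python
def select_stids(known_stids, stids="all", skip_missing_stids=False):
    """Return the requested station IDs after checking them against known_stids."""
    if isinstance(stids, str):
        if stids != "all":
            raise ValueError(f"Unknown stids value: {stids!r}")
        return sorted(known_stids)
    missing = [stid for stid in stids if stid not in known_stids]
    if missing and not skip_missing_stids:
        # mirrors the documented behaviour: invalid IDs raise a ValueError
        raise ValueError(f"Station IDs not found: {missing}")
    # with skip_missing_stids=True the unknown IDs are silently dropped
    return [stid for stid in stids if stid in known_stids]


if __name__ == "__main__":
    known = {1, 2, 3}
    print(select_stids(known))
    print(select_stids(known, [1, 99], skip_missing_stids=True))
```

Raising by default makes typos in station IDs visible early; skipping is the opt-in behaviour for batch jobs that should continue with the valid IDs.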
- last_imp_fillup(stids='all', do_mp=False, **kwargs)
Do the gap filling of the last import.
- Parameters:
do_mp (bool, optional) – Whether to run the method in multiprocessing mode. If False, the methods are called in threading mode. Multiprocessing needs more memory and a bit more startup time, so it is only useful for methods that do a lot of computation in the Python code. If most of a method's computation is done in the PostgreSQL database, threading is enough to speed up the process. The default is False.
stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.
**kwargs (dict, optional) – The additional keyword arguments for the _run_method and get_stations method
- last_imp_quality_check(stids='all', do_mp=False, **kwargs)
Do the quality check of the last import.
- Parameters:
do_mp (bool, optional) – Whether to run the method in multiprocessing mode. If False, the methods are called in threading mode. Multiprocessing needs more memory and a bit more startup time, so it is only useful for methods that do a lot of computation in the Python code. If most of a method's computation is done in the PostgreSQL database, threading is enough to speed up the process. The default is False.
stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.
**kwargs (dict, optional) – The additional keyword arguments for the _run_method and get_stations method
- quality_check(period=(None, None), only_real=True, stids='all', do_mp=False, **kwargs)
Quality check the raw data for a given period.
- Parameters:
period (tuple or list of datetime.datetime or None, optional) – The minimum and maximum Timestamp for which to get the timeseries. If None is given, the maximum or minimal possible Timestamp is taken. The default is (None, None).
stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.
do_mp (bool, optional) – Whether to run the method in multiprocessing mode. If False, the methods are called in threading mode. Multiprocessing needs more memory and a bit more startup time, so it is only useful for methods that do a lot of computation in the Python code. If most of a method's computation is done in the PostgreSQL database, threading is enough to speed up the process. The default is False.
**kwargs (dict, optional) – The additional keyword arguments for the _run_method and get_stations methods.
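The handling of None in the period tuple can be sketched as below. resolve_period and the available bounds are hypothetical; the sketch only illustrates the documented rule that a None bound is replaced by the minimum or maximum possible Timestamp.

```python
from datetime import datetime


def resolve_period(period, available):
    """Replace None bounds of period with the bounds of the available data."""
    start, end = period
    avail_start, avail_end = available
    # a None start falls back to the earliest, a None end to the latest timestamp
    start = avail_start if start is None else start
    end = avail_end if end is None else end
    return start, end


if __name__ == "__main__":
    available = (datetime(2000, 1, 1), datetime(2023, 12, 31))
    print(resolve_period((None, None), available))
    print(resolve_period((datetime(2020, 1, 1), None), available))
```

The default (None, None) therefore selects the whole available period, while giving only one bound leaves the other end open.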
- update_ma_raster(stids='all', do_mp=False, **kwargs)
Update the multi annual raster values for the stations.
Get a multi annual value from the corresponding raster and save to the multi annual table in the database.
- Parameters:
stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.
do_mp (bool, optional) – Whether to run the method in multiprocessing mode. If False, the methods are called in threading mode. Multiprocessing needs more memory and a bit more startup time, so it is only useful for methods that do a lot of computation in the Python code. If most of a method's computation is done in the PostgreSQL database, threading is enough to speed up the process. The default is False.
**kwargs (dict, optional) – The additional keyword arguments for the _run_method and get_stations methods.
- Raises:
ValueError – If the given stids (Station_IDs) are not all valid.
- update_ma_timeseries(kind, stids='all', do_mp=False, **kwargs)
Update the multi annual timeseries values for the stations.
Get a multi annual value from the corresponding timeseries and save to the database.
- Parameters:
kind (str or list of str) – The timeseries data kind(s) for which to update the multi annual value. Must be a column in the timeseries DB. Must be one of “raw”, “qc”, “filled”. For precipitation, “corr” is also valid.
stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.
do_mp (bool, optional) – Whether to run the method in multiprocessing mode. If False, the methods are called in threading mode. Multiprocessing needs more memory and a bit more startup time, so it is only useful for methods that do a lot of computation in the Python code. If most of a method's computation is done in the PostgreSQL database, threading is enough to speed up the process. The default is False.
**kwargs (dict, optional) – The additional keyword arguments for the _run_method and get_stations methods.
- Raises:
ValueError – If the given stids (Station_IDs) are not all valid.
- update_meta(stids='all', **kwargs)
Update the meta table by comparing to the CDC server.
“von_datum” and “bis_datum” are ignored because it is better to set them from the filled period of the stations in the database. The CDC period is often not correct.
- update_period_meta(stids='all', **kwargs)
Update the period in the meta table of the raw data.
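- Parameters:
stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.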
- Raises:
ValueError – If the given stids (Station_IDs) are not all valid.
- update_raw(only_new=True, only_real=True, stids='all', remove_nas=True, do_mp=True, **kwargs)
Download all stations data from CDC and upload to database.
- Parameters:
only_new (bool, optional) – Get only the files that are not yet in the database? If False, all the available files are loaded again. The default is True.
only_real (bool, optional) – Whether to try downloading only real stations. True: only stations with a date in raw_from in meta are downloaded. The default is True.
stids (string or list of int, optional) – The Stations to return. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.
do_mp (bool, optional) – Whether to run the method in multiprocessing mode. If False, the methods are called in threading mode. Multiprocessing needs more memory and a bit more startup time, so it is only useful for methods that do a lot of computation in the Python code. If most of a method's computation is done in the PostgreSQL database, threading is enough to speed up the process. The default is True.
remove_nas (bool, optional) – Remove the NAs from the downloaded data before updating it to the database. This has computational advantages. The default is True.
**kwargs (dict, optional) – The additional keyword arguments for the _run_method and get_stations method
- Raises:
ValueError – If the given stids (Station_IDs) are not all valid.
StationsT
- class weatherDB.stations.StationsT[source]
Bases: StationsBaseTET
A class to work with and download temperature data for several stations.
Public Methods:
get_quotient(**kwargs) – Get the quotient of multi-annual means of two different kinds or the timeseries and the multi annual raster value.
Inherited from StationsBaseTET:
fillup([only_real, stids]) – Fill up the quality checked data with data from nearby stations to get complete timeseries.
Inherited from StationsBase:
__init__() – Download the meta file(s) from the CDC server.
update_meta([stids]) – Update the meta table by comparing to the CDC server.
update_period_meta([stids]) – Update the period in the meta table of the raw data.
get_meta_explanation([infos]) – Get the explanations of the available meta fields.
get_meta([infos, stids, only_real]) – Get the meta DataFrame from the Database.
get_stations([only_real, stids, ...]) – Get a list with all the stations as Station-objects.
get_quotient(kinds_num, kinds_denom[, ...]) – Get the quotient of multi-annual means of two different kinds or the timeseries and the multi annual raster value.
count_holes([stids]) – Count holes in timeseries depending on their length.
update_raw([only_new, only_real, stids, ...]) – Download all stations data from CDC and upload to database.
last_imp_quality_check([stids, do_mp]) – Do the quality check of the last import.
last_imp_fillup([stids, do_mp]) – Do the gap filling of the last import.
quality_check([period, only_real, stids, do_mp]) – Quality check the raw data for a given period.
update_ma_raster([stids, do_mp]) – Update the multi annual raster values for the stations.
update_ma_timeseries(kind[, stids, do_mp]) – Update the multi annual timeseries values for the stations.
fillup([only_real, stids, do_mp]) – Fill up the quality checked data with data from nearby stations to get complete timeseries.
update([only_new]) – Make a complete update of the stations.
get_df(stids, **kwargs) – Get a DataFrame with the corresponding data.
- get_quotient(**kwargs)[source]
Get the quotient of multi-annual means of two different kinds or the timeseries and the multi annual raster value.
$quotient = \overline{ts}_{kind\_num} / \overline{ts}_{denom}$
- Parameters:
kinds_num (list of str or str) – The timeseries kinds of the numerators. Should be one of [‘raw’, ‘qc’, ‘filled’]. For precipitation also “corr” is possible.
kinds_denom (list of str or str) – The timeseries kinds of the denominator or the multi annual raster key. If the denominator is a multi annual raster key, then the result is the quotient of the timeseries and the raster value. Possible values are:
- for timeseries kinds: ‘raw’, ‘qc’, ‘filled’ or for precipitation also “corr”.
- for raster keys: ‘hyras’, ‘dwd’ or ‘regnie’, depending on your defined raster files.
stids (list of Integer) – The station IDs for which to compute the quotient.
return_as (str, optional) – The format of the return value. If “df” then a pandas DataFrame is returned. If “json” then a list with dictionaries is returned.
**kwargs (dict, optional) – The additional keyword arguments are passed to the get_stations method.
- Returns:
The quotient of the two timeseries as DataFrame or list of dictionaries (JSON) depending on the return_as parameter. The default is pd.DataFrame.
- Return type:
pandas.DataFrame or list of dict
- Raises:
ValueError – If the input parameters were not correct.
- count_holes(stids='all', **kwargs)
Count holes in timeseries depending on their length.
- Parameters:
stids (string or list of int, optional) – The Stations to return. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.
**kwargs (dict, optional) –
The parameters supported by the StationBase.count_holes method. Furthermore, the kwargs are passed to the get_stations method. Possible values are:
- weeks (list, optional)
A list of hole lengths to count. Every hole longer than the specified number of weeks is counted. The default is [2, 4, 8, 12, 16, 20, 24].
- kind (str)
The kind of the timeseries to analyze. Should be one of [‘raw’, ‘qc’, ‘filled’]. For precipitation (N), “corr” is also possible. Normally only “raw” and “qc” make sense, because the other timeseries should not have holes.
- period (TimestampPeriod or (tuple or list of datetime.datetime or None), optional)
The minimum and maximum Timestamp for which to analyze the timeseries. If None is given, the minimum and maximum possible Timestamps are taken. The default is (None, None).
- between_meta_period (bool, optional)
Only check within the respective period that is defined in the meta table. If “qc” is chosen as kind, then the “raw” meta period is taken. The default is True.
- crop_period (bool, optional)
Whether to crop the period to the maximum filled period. This results in holes being ignored when they are at the beginning or at the end of the timeseries. If period = (None, None) is given, then this parameter is set to True. The default is False.
- Returns:
A pandas DataFrame with station_id as index and one column per value in weeks. The numbers in the table are the counts of NA periods longer than the respective number of weeks.
- Return type:
pandas.DataFrame
- Raises:
ValueError – If the input parameters were not correct.
- download_meta()
Download the meta file(s) from the CDC server.
- Returns:
The meta file from the CDC server. If there are several meta files on the server, they are joined together.
- Return type:
geopandas.GeoDataFrame
- fillup(only_real=False, stids='all', **kwargs)
Fill up the quality checked data with data from nearby stations to get complete timeseries.
- Parameters:
only_real (bool, optional) – Whether only real stations are computed or also virtual ones. True: only stations with own data are computed. The default is False.
stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.
do_mp (bool, optional) – Whether to run the method in multiprocessing mode. If False, the methods are called in threading mode. Multiprocessing needs more memory and a bit more startup time, so it is only useful for methods that do a lot of computation in the Python code. If most of a method's computation is done in the PostgreSQL database, threading is enough to speed up the process. The default is False.
**kwargs (dict, optional) – The additional keyword arguments for the _run_method and get_stations methods.
- Raises:
ValueError – If the given stids (Station_IDs) are not all valid.
- get_df(stids, **kwargs)
Get a DataFrame with the corresponding data.
- Parameters:
stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.
**kwargs (optional keyword arguments) – Those keyword arguments are passed to the get_df function of the station class. Possible parameters are period, agg_to, kinds. Furthermore the kwargs are passed to the get_stations method.
- Returns:
A DataFrame with the timeseries for the selected stations, kind(s) and the given period. If multiple columns are selected, the columns of this DataFrame form a MultiIndex with the station IDs as first level and the kind as second level.
- Return type:
pandas.DataFrame
- get_meta(infos=['station_id', 'filled_from', 'filled_until', 'geometry'], stids='all', only_real=True)
Get the meta DataFrame from the Database.
- Parameters:
infos (list or str, optional) – A list of information from the meta file to return. If “all”, then all possible columns are returned, but only one geometry column. The default is [“station_id”, “filled_from”, “filled_until”, “geometry”].
only_real (bool, optional) – Whether only real stations are returned or also virtual ones. True: only stations with own data are returned. The default is True.
- Returns:
The meta DataFrame.
- Return type:
pandas.DataFrame or geopandas.GeoDataFrame
- classmethod get_meta_explanation(infos='all')
Get the explanations of the available meta fields.
- Parameters:
infos (list or string, optional) – The infos you wish to get an explanation for. If “all”, then explanations for all available fields are returned. The default is “all”.
- Returns:
A pandas Series with the information names as index and the explanations as values.
- Return type:
pd.Series
- get_stations(only_real=True, stids='all', skip_missing_stids=False, **kwargs)
Get a list with all the stations as Station-objects.
- Parameters:
only_real (bool, optional) – Whether only real stations are returned or also virtual ones. True: only stations with own data are returned. The default is True.
stids (string or list of int, optional) – The Stations to return. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.
skip_missing_stids (bool, optional) – Should the method skip the missing stations from input stids? If False, then a ValueError is raised if a station is not found. The default is False.
**kwargs (dict, optional) – The additional keyword arguments aren’t used in this method.
- Returns:
A list with the corresponding station objects.
- Return type:
list of Station-objects
- Raises:
ValueError – If the given stids (Station_IDs) are not all valid.
- last_imp_fillup(stids='all', do_mp=False, **kwargs)
Do the gap filling of the last import.
- Parameters:
do_mp (bool, optional) – Whether to run the method in multiprocessing mode. If False, the methods are called in threading mode. Multiprocessing needs more memory and a bit more startup time, so it is only useful for methods that do a lot of computation in the Python code. If most of a method's computation is done in the PostgreSQL database, threading is enough to speed up the process. The default is False.
stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.
**kwargs (dict, optional) – The additional keyword arguments for the _run_method and get_stations method
- last_imp_quality_check(stids='all', do_mp=False, **kwargs)
Do the quality check of the last import.
- Parameters:
do_mp (bool, optional) – Whether to run the method in multiprocessing mode. If False, the methods are called in threading mode. Multiprocessing needs more memory and a bit more startup time, so it is only useful for methods that do a lot of computation in the Python code. If most of a method's computation is done in the PostgreSQL database, threading is enough to speed up the process. The default is False.
stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.
**kwargs (dict, optional) – The additional keyword arguments for the _run_method and get_stations method
- quality_check(period=(None, None), only_real=True, stids='all', do_mp=False, **kwargs)
Quality check the raw data for a given period.
- Parameters:
period (tuple or list of datetime.datetime or None, optional) – The minimum and maximum Timestamp for which to get the timeseries. If None is given, the maximum or minimal possible Timestamp is taken. The default is (None, None).
stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.
do_mp (bool, optional) – Whether to run the method in multiprocessing mode. If False, the methods are called in threading mode. Multiprocessing needs more memory and a bit more startup time, so it is only useful for methods that do a lot of computation in the Python code. If most of a method's computation is done in the PostgreSQL database, threading is enough to speed up the process. The default is False.
**kwargs (dict, optional) – The additional keyword arguments for the _run_method and get_stations methods.
- update(only_new=True, **kwargs)
Make a complete update of the stations.
Runs update_raw, the quality check and the fillup for the stations.
- Parameters:
only_new (bool, optional) – Should only new values be computed? If False, the stations are updated for the whole possible period. If True, the stations are only updated for new values. The default is True.
- update_ma_raster(stids='all', do_mp=False, **kwargs)
Update the multi annual raster values for the stations.
Get a multi annual value from the corresponding raster and save to the multi annual table in the database.
- Parameters:
stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.
do_mp (bool, optional) – Whether to run the method in multiprocessing mode. If False, the methods are called in threading mode. Multiprocessing needs more memory and a bit more startup time, so it is only useful for methods that do a lot of computation in the Python code. If most of a method's computation is done in the PostgreSQL database, threading is enough to speed up the process. The default is False.
**kwargs (dict, optional) – The additional keyword arguments for the _run_method and get_stations methods.
- Raises:
ValueError – If the given stids (Station_IDs) are not all valid.
- update_ma_timeseries(kind, stids='all', do_mp=False, **kwargs)
Update the multi annual timeseries values for the stations.
Get a multi annual value from the corresponding timeseries and save to the database.
- Parameters:
kind (str or list of str) – The timeseries data kind(s) for which to update the multi annual value. Must be a column in the timeseries DB. Must be one of “raw”, “qc”, “filled”. For precipitation, “corr” is also valid.
stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.
do_mp (bool, optional) – Should the method be run in multiprocessing mode? If False, the methods are called in threading mode. Multiprocessing needs more memory and a bit more startup time. Therefore it is only useful for methods with a lot of computation effort in the Python code. If most of the computation of a method is done in the PostgreSQL database, then threading is enough to speed the process up. The default is False.
**kwargs (dict, optional) – The additional keyword arguments for the _run_method and get_stations method
- Raises:
ValueError – If the given stids (Station_IDs) are not all valid.
- update_meta(stids='all', **kwargs)
Update the meta table by comparing to the CDC server.
The “von_datum” and “bis_datum” fields are ignored, because it is better to set them from the filled period of the stations in the database. The CDC period is often not correct.
- update_period_meta(stids='all', **kwargs)
Update the period in the meta table of the raw data.
- Parameters:
- Raises:
ValueError – If the given stids (Station_IDs) are not all valid.
- update_raw(only_new=True, only_real=True, stids='all', remove_nas=True, do_mp=True, **kwargs)
Download all stations data from CDC and upload to database.
- Parameters:
only_new (bool, optional) – Get only the files that are not yet in the database? If False, all the available files are loaded again. The default is True.
only_real (bool, optional) – Whether to try downloading only real stations. True: only stations with a date in raw_from in the meta table are downloaded. The default is True.
stids (string or list of int, optional) – The stations for which to download data. Can either be “all”, for all possible stations, or a list with the Station IDs. The default is “all”.
do_mp (bool, optional) – Should the method be run in multiprocessing mode? If False, the methods are called in threading mode. Multiprocessing needs more memory and a bit more startup time. Therefore it is only useful for methods with a lot of computation effort in the Python code. If most of the computation of a method is done in the PostgreSQL database, then threading is enough to speed the process up. The default is True.
remove_nas (bool, optional) – Remove the NAs from the downloaded data before uploading it to the database. This has computational advantages. The default is True.
**kwargs (dict, optional) – The additional keyword arguments for the _run_method and get_stations method
- Raises:
ValueError – If the given stids (Station_IDs) are not all valid.
StationsET
- class weatherDB.stations.StationsET[source]
Bases:
StationsBaseTET
A class to work with and download potential evapotranspiration (VPGB) data for several stations.
Public Methods:
Inherited from StationsBaseTET:
fillup([only_real, stids]) – Fill up the quality checked data with data from nearby stations to get complete timeseries.
Inherited from StationsBase:
__init__() – Download the meta file(s) from the CDC server.
update_meta([stids]) – Update the meta table by comparing to the CDC server.
update_period_meta([stids]) – Update the period in the meta table of the raw data.
get_meta_explanation([infos]) – Get the explanations of the available meta fields.
get_meta([infos, stids, only_real]) – Get the meta DataFrame from the database.
get_stations([only_real, stids, ...]) – Get a list with all the stations as Station-objects.
get_quotient(kinds_num, kinds_denom[, ...]) – Get the quotient of the multi-annual means of two different kinds, or of the timeseries and the multi annual raster value.
count_holes([stids]) – Count holes in timeseries depending on their length.
update_raw([only_new, only_real, stids, ...]) – Download all stations data from CDC and upload to database.
last_imp_quality_check([stids, do_mp]) – Do the quality check of the last import.
last_imp_fillup([stids, do_mp]) – Do the gap filling of the last import.
quality_check([period, only_real, stids, do_mp]) – Quality check the raw data for a given period.
update_ma_raster([stids, do_mp]) – Update the multi annual raster values for the stations.
update_ma_timeseries(kind[, stids, do_mp]) – Update the multi annual timeseries values for the stations.
fillup([only_real, stids, do_mp]) – Fill up the quality checked data with data from nearby stations to get complete timeseries.
update([only_new]) – Make a complete update of the stations.
get_df(stids, **kwargs) – Get a DataFrame with the corresponding data.
- count_holes(stids='all', **kwargs)
Count holes in timeseries depending on their length.
- Parameters:
stids (string or list of int, optional) – The Stations to return. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.
**kwargs (dict, optional) – The parameters supported by the StationBase.count_holes method. Furthermore, the kwargs are passed to the get_stations method. Possible values are:
- weeks (list, optional) – A list of hole lengths to count. Every hole longer than the specified number of weeks is counted. The default is [2, 4, 8, 12, 16, 20, 24].
- kind (str) – The kind of the timeseries to analyze. Should be one of [‘raw’, ‘qc’, ‘filled’]. For precipitation (N), “corr” is also possible. Normally only “raw” and “qc” make sense, because the other timeseries should not have holes.
- period (TimestampPeriod or (tuple or list of datetime.datetime or None), optional) – The minimum and maximum Timestamp for which to analyze the timeseries. If None is given, the minimum or maximum possible Timestamp is taken. The default is (None, None).
- between_meta_period (bool, optional) – Only check within the respective period that is defined in the meta table. If “qc” is chosen as kind, then the “raw” meta period is taken. The default is True.
- crop_period (bool, optional) – Should the period be cropped to the maximum filled period? This results in holes being ignored when they are at the beginning or at the end of the timeseries. If period = (None, None) is given, then this parameter is set to True. The default is False.
- Returns:
A pandas DataFrame with station_id as index and one column per week threshold. The values in the table are the number of NA periods longer than the respective number of weeks.
- Return type:
pandas.DataFrame
- Raises:
ValueError – If the input parameters were not correct.
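To make the counting rule concrete, here is a minimal pure-Python sketch that counts runs of missing values longer than each week threshold in one equally spaced series. It is an illustration only and not the weatherDB implementation; the function name and the list-based input are assumptions made for this example.

```python
from datetime import timedelta


def count_holes(values, step, weeks=(2, 4, 8, 12, 16, 20, 24)):
    """Count runs of missing values (None) longer than each week threshold.

    values : list of numbers or None, equally spaced in time
    step   : datetime.timedelta between two consecutive values
    """
    # Collect the duration of every consecutive run of None values.
    hole_durations = []
    run_length = 0
    for value in list(values) + [0]:  # the sentinel closes a trailing run
        if value is None:
            run_length += 1
        elif run_length:
            hole_durations.append(run_length * step)
            run_length = 0

    # One count per threshold: holes strictly longer than that many weeks.
    return {
        n_weeks: sum(d > timedelta(weeks=n_weeks) for d in hole_durations)
        for n_weeks in weeks
    }


# Daily series with one 20-day hole and one 3-day hole.
series = [1.0] * 5 + [None] * 20 + [2.0] * 5 + [None] * 3 + [0.5]
print(count_holes(series, step=timedelta(days=1), weeks=(2, 4)))
# prints {2: 1, 4: 0}
```

The 20-day hole is longer than 2 weeks but not longer than 4 weeks, so it is counted only in the first column.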
- download_meta()
Download the meta file(s) from the CDC server.
- Returns:
The meta file from the CDC server. If there are several meta files on the server, they are joined together.
- Return type:
geopandas.GeoDataFrame
- fillup(only_real=False, stids='all', **kwargs)
Fill up the quality checked data with data from nearby stations to get complete timeseries.
- Parameters:
only_real (bool, optional) – Whether only real stations are computed or also virtual ones. True: only stations with own data are computed. The default is False.
stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.
do_mp (bool, optional) – Should the method be run in multiprocessing mode? If False, the methods are called in threading mode. Multiprocessing needs more memory and a bit more startup time. Therefore it is only useful for methods with a lot of computation effort in the Python code. If most of the computation of a method is done in the PostgreSQL database, then threading is enough to speed the process up. The default is False.
**kwargs (dict, optional) – The additional keyword arguments for the _run_method and get_stations method
- Raises:
ValueError – If the given stids (Station_IDs) are not all valid.
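The general idea of filling gaps from nearby stations can be sketched as follows. This is a simplified illustration under stated assumptions, not the algorithm weatherDB uses: it assumes the neighbour series are already ordered from nearest to farthest and copies their values unchanged, without any scaling between stations. The function name fill_from_neighbours is hypothetical.

```python
def fill_from_neighbours(target, neighbours):
    """Fill None values in target with the first available neighbour value.

    target     : list of values or None for the station to fill
    neighbours : list of equally long lists, ordered from nearest to farthest
    """
    filled = list(target)
    for i, value in enumerate(filled):
        if value is not None:
            continue
        # Walk through the neighbours until one has a value at this step.
        for series in neighbours:
            if series[i] is not None:
                filled[i] = series[i]
                break
    return filled


qc = [0.0, None, 1.5, None]    # quality checked series with two gaps
near = [0.1, None, 1.4, 0.3]   # nearest neighbour, also has a gap
far = [0.0, 0.8, 1.6, 0.2]     # next neighbour
print(fill_from_neighbours(qc, [near, far]))
# prints [0.0, 0.8, 1.5, 0.3]
```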
- get_df(stids, **kwargs)
Get a DataFrame with the corresponding data.
- Parameters:
stids (string or list of int) – The stations for which to get the data. Can either be “all”, for all possible stations, or a list with the Station IDs.
**kwargs (optional keyword arguments) – These keyword arguments are passed to the get_df function of the station class. Possible parameters are period, agg_to, kinds. Furthermore, the kwargs are passed to the get_stations method.
- Returns:
A DataFrame with the timeseries for the selected stations, kind(s) and the given period. If multiple columns are selected, the columns of this DataFrame are a MultiIndex with the station IDs as first level and the kind as second level.
- Return type:
pandas.DataFrame
- get_meta(infos=['station_id', 'filled_from', 'filled_until', 'geometry'], stids='all', only_real=True)
Get the meta DataFrame from the database.
- Parameters:
infos (list or str, optional) – A list of information from the meta file to return. If “all”, then all possible columns are returned, but only one geometry column. The default is [“station_id”, “filled_from”, “filled_until”, “geometry”].
only_real (bool, optional) – Whether only real stations are returned or also virtual ones. True: only stations with own data are returned. The default is True.
- Returns:
The meta DataFrame.
- Return type:
pandas.DataFrame or geopandas.GeoDataFrame
- classmethod get_meta_explanation(infos='all')
Get the explanations of the available meta fields.
- Parameters:
infos (list or string, optional) – The infos you wish to get an explanation for. If “all”, then all the available information is returned. The default is “all”.
- Returns:
A pandas Series with the information names as index and the explanations as values.
- Return type:
pandas.Series
- get_quotient(kinds_num, kinds_denom, stids='all', return_as='df', **kwargs)
Get the quotient of the multi-annual means of two different kinds, or of the timeseries and the multi annual raster value.
$quotient = \overline{ts}_{kind\_num} / \overline{ts}_{denom}$
- Parameters:
kinds_num (list of str or str) – The timeseries kinds of the numerators. Should be one of [‘raw’, ‘qc’, ‘filled’]. For precipitation, “corr” is also possible.
kinds_denom (list of str or str) – The timeseries kinds of the denominator or the multi annual raster key. If the denominator is a multi annual raster key, then the result is the quotient of the timeseries and the raster value. Possible values are:
- for timeseries kinds: ‘raw’, ‘qc’, ‘filled’ or, for precipitation, also “corr”.
- for raster keys: ‘hyras’, ‘dwd’ or ‘regnie’, depending on your defined raster files.
stids (string or list of int, optional) – The stations for which to compute the quotient. Can either be “all”, for all possible stations, or a list with the Station IDs. The default is “all”.
return_as (str, optional) – The format of the return value. If “df”, a pandas DataFrame is returned. If “json”, a list of dictionaries is returned. The default is “df”.
**kwargs (dict, optional) – The additional keyword arguments are passed to the get_stations method.
- Returns:
The quotient of the two timeseries as DataFrame or list of dictionaries (JSON) depending on the return_as parameter. The default is pd.DataFrame.
- Return type:
pandas.DataFrame or list of dict
- Raises:
ValueError – If the input parameters were not correct.
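The quotient formula above can be illustrated with a few lines of Python. This is only a sketch of the arithmetic, assuming that each series is given as a list of annual values; the function name and the sample numbers are hypothetical and not taken from weatherDB.

```python
from statistics import fmean


def quotient(annual_num, annual_denom):
    """Quotient of the multi-annual means of two series of annual values."""
    return fmean(annual_num) / fmean(annual_denom)


# Hypothetical annual precipitation sums in mm for one station.
filled = [800.0, 900.0, 1000.0]
corr = [880.0, 990.0, 1100.0]
print(round(quotient(corr, filled), 3))
# prints 1.1
```

When the denominator is a raster key, the same arithmetic applies with the single multi annual raster value in place of the mean of the second series.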
- get_stations(only_real=True, stids='all', skip_missing_stids=False, **kwargs)
Get a list with all the stations as Station-objects.
- Parameters:
only_real (bool, optional) – Whether only real stations are returned or also virtual ones. True: only stations with own data are returned. The default is True.
stids (string or list of int, optional) – The Stations to return. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.
skip_missing_stids (bool, optional) – Should the method skip the missing stations from input stids? If False, then a ValueError is raised if a station is not found. The default is False.
**kwargs (dict, optional) – The additional keyword arguments aren’t used in this method.
- Returns:
A list with the corresponding station objects.
- Return type:
list of Station-objects
- Raises:
ValueError – If the given stids (Station_IDs) are not all valid.
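The interplay of stids and skip_missing_stids can be sketched like this. The helper select_stids and the station IDs are hypothetical and only illustrate the documented behaviour: unknown IDs raise a ValueError unless skipping is requested.

```python
def select_stids(available, stids="all", skip_missing_stids=False):
    """Return the requested station IDs that exist in available."""
    if isinstance(stids, str) and stids == "all":
        return sorted(available)
    missing = [stid for stid in stids if stid not in available]
    if missing and not skip_missing_stids:
        raise ValueError(f"Station IDs not found: {missing}")
    # Keep the order of the request and drop the unknown IDs.
    return [stid for stid in stids if stid in available]


known = {1224, 1443, 5906}  # hypothetical station IDs
print(select_stids(known, [1443, 42], skip_missing_stids=True))
# prints [1443]
```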
- last_imp_fillup(stids='all', do_mp=False, **kwargs)
Do the gap filling of the last import.
- Parameters:
do_mp (bool, optional) – Should the method be run in multiprocessing mode? If False, the methods are called in threading mode. Multiprocessing needs more memory and a bit more startup time. Therefore it is only useful for methods with a lot of computation effort in the Python code. If most of the computation of a method is done in the PostgreSQL database, then threading is enough to speed the process up. The default is False.
stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.
**kwargs (dict, optional) – The additional keyword arguments for the _run_method and get_stations method
- last_imp_quality_check(stids='all', do_mp=False, **kwargs)
Do the quality check of the last import.
- Parameters:
do_mp (bool, optional) – Should the method be run in multiprocessing mode? If False, the methods are called in threading mode. Multiprocessing needs more memory and a bit more startup time. Therefore it is only useful for methods with a lot of computation effort in the Python code. If most of the computation of a method is done in the PostgreSQL database, then threading is enough to speed the process up. The default is False.
stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.
**kwargs (dict, optional) – The additional keyword arguments for the _run_method and get_stations method
- quality_check(period=(None, None), only_real=True, stids='all', do_mp=False, **kwargs)
Quality check the raw data for a given period.
- Parameters:
period (tuple or list of datetime.datetime or None, optional) – The minimum and maximum Timestamp for which to get the timeseries. If None is given, the minimum or maximum possible Timestamp is taken. The default is (None, None).
stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.
do_mp (bool, optional) – Should the method be run in multiprocessing mode? If False, the methods are called in threading mode. Multiprocessing needs more memory and a bit more startup time. Therefore it is only useful for methods with a lot of computation effort in the Python code. If most of the computation of a method is done in the PostgreSQL database, then threading is enough to speed the process up. The default is False.
**kwargs (dict, optional) – The additional keyword arguments for the _run_method and get_stations method
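How a period with None entries expands to the possible limits can be sketched as follows. The function resolve_period and the limit timestamps are assumptions for illustration; weatherDB derives the actual limits from the database.

```python
from datetime import datetime


def resolve_period(period, min_possible, max_possible):
    """Replace None in a (start, end) period by the possible limits."""
    start, end = period
    start = min_possible if start is None else start
    end = max_possible if end is None else end
    if start > end:
        raise ValueError("The period start must not be after its end.")
    return start, end


first = datetime(1995, 1, 1)    # hypothetical earliest timestamp
last = datetime(2022, 12, 31)   # hypothetical latest timestamp
print(resolve_period((None, datetime(2000, 1, 1)), first, last))
```

With the default (None, None), the sketch returns the full range from first to last.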
- update(only_new=True, **kwargs)
Make a complete update of the stations.
Runs update_raw, the quality check and the fillup of the stations.
- Parameters:
only_new (bool, optional) – Should only new values be computed? If False, the stations are updated for the whole possible period. If True, the stations are only updated for new values. The default is True.
- update_ma_raster(stids='all', do_mp=False, **kwargs)
Update the multi annual raster values for the stations.
Get a multi annual value from the corresponding raster and save to the multi annual table in the database.
- Parameters:
stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.
do_mp (bool, optional) – Should the method be run in multiprocessing mode? If False, the methods are called in threading mode. Multiprocessing needs more memory and a bit more startup time. Therefore it is only useful for methods with a lot of computation effort in the Python code. If most of the computation of a method is done in the PostgreSQL database, then threading is enough to speed the process up. The default is False.
**kwargs (dict, optional) – The additional keyword arguments for the _run_method and get_stations method
- Raises:
ValueError – If the given stids (Station_IDs) are not all valid.
- update_ma_timeseries(kind, stids='all', do_mp=False, **kwargs)
Update the multi annual timeseries values for the stations.
Get a multi annual value from the corresponding timeseries and save to the database.
- Parameters:
kind (str or list of str) – The timeseries data kind(s) for which to update the multi annual value. Must be a column in the timeseries DB, i.e. one of “raw”, “qc”, “filled”. For precipitation, “corr” is also valid.
stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.
do_mp (bool, optional) – Should the method be run in multiprocessing mode? If False, the methods are called in threading mode. Multiprocessing needs more memory and a bit more startup time. Therefore it is only useful for methods with a lot of computation effort in the Python code. If most of the computation of a method is done in the PostgreSQL database, then threading is enough to speed the process up. The default is False.
**kwargs (dict, optional) – The additional keyword arguments for the _run_method and get_stations method
- Raises:
ValueError – If the given stids (Station_IDs) are not all valid.
- update_meta(stids='all', **kwargs)
Update the meta table by comparing to the CDC server.
The “von_datum” and “bis_datum” fields are ignored, because it is better to set them from the filled period of the stations in the database. The CDC period is often not correct.
- update_period_meta(stids='all', **kwargs)
Update the period in the meta table of the raw data.
- Parameters:
- Raises:
ValueError – If the given stids (Station_IDs) are not all valid.
- update_raw(only_new=True, only_real=True, stids='all', remove_nas=True, do_mp=True, **kwargs)
Download all stations data from CDC and upload to database.
- Parameters:
only_new (bool, optional) – Get only the files that are not yet in the database? If False, all the available files are loaded again. The default is True.
only_real (bool, optional) – Whether to try downloading only real stations. True: only stations with a date in raw_from in the meta table are downloaded. The default is True.
stids (string or list of int, optional) – The stations for which to download data. Can either be “all”, for all possible stations, or a list with the Station IDs. The default is “all”.
do_mp (bool, optional) – Should the method be run in multiprocessing mode? If False, the methods are called in threading mode. Multiprocessing needs more memory and a bit more startup time. Therefore it is only useful for methods with a lot of computation effort in the Python code. If most of the computation of a method is done in the PostgreSQL database, then threading is enough to speed the process up. The default is True.
remove_nas (bool, optional) – Remove the NAs from the downloaded data before uploading it to the database. This has computational advantages. The default is True.
**kwargs (dict, optional) – The additional keyword arguments for the _run_method and get_stations method
- Raises:
ValueError – If the given stids (Station_IDs) are not all valid.
StationsPD
- class weatherDB.stations.StationsPD[source]
Bases:
StationsBase
A class to work with and download daily precipitation data for several stations.
The data of these stations is only downloaded to do some quality checks on the 10 minutes data. Therefore no special quality check or Richter correction is done on this data. If you want daily precipitation data, better use the 10 minutes station class (StationP) and aggregate to daily values.
Public Methods:
Inherited from StationsBase:
__init__() – Download the meta file(s) from the CDC server.
update_meta([stids]) – Update the meta table by comparing to the CDC server.
update_period_meta([stids]) – Update the period in the meta table of the raw data.
get_meta_explanation([infos]) – Get the explanations of the available meta fields.
get_meta([infos, stids, only_real]) – Get the meta DataFrame from the database.
get_stations([only_real, stids, ...]) – Get a list with all the stations as Station-objects.
get_quotient(kinds_num, kinds_denom[, ...]) – Get the quotient of the multi-annual means of two different kinds, or of the timeseries and the multi annual raster value.
count_holes([stids]) – Count holes in timeseries depending on their length.
update_raw([only_new, only_real, stids, ...]) – Download all stations data from CDC and upload to database.
last_imp_quality_check([stids, do_mp]) – Do the quality check of the last import.
last_imp_fillup([stids, do_mp]) – Do the gap filling of the last import.
quality_check([period, only_real, stids, do_mp]) – Quality check the raw data for a given period.
update_ma_raster([stids, do_mp]) – Update the multi annual raster values for the stations.
update_ma_timeseries(kind[, stids, do_mp]) – Update the multi annual timeseries values for the stations.
fillup([only_real, stids, do_mp]) – Fill up the quality checked data with data from nearby stations to get complete timeseries.
update([only_new]) – Make a complete update of the stations.
get_df(stids, **kwargs) – Get a DataFrame with the corresponding data.
- count_holes(stids='all', **kwargs)
Count holes in timeseries depending on their length.
- Parameters:
stids (string or list of int, optional) – The Stations to return. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.
**kwargs (dict, optional) – The parameters supported by the StationBase.count_holes method. Furthermore, the kwargs are passed to the get_stations method. Possible values are:
- weeks (list, optional) – A list of hole lengths to count. Every hole longer than the specified number of weeks is counted. The default is [2, 4, 8, 12, 16, 20, 24].
- kind (str) – The kind of the timeseries to analyze. Should be one of [‘raw’, ‘qc’, ‘filled’]. For precipitation (N), “corr” is also possible. Normally only “raw” and “qc” make sense, because the other timeseries should not have holes.
- period (TimestampPeriod or (tuple or list of datetime.datetime or None), optional) – The minimum and maximum Timestamp for which to analyze the timeseries. If None is given, the minimum or maximum possible Timestamp is taken. The default is (None, None).
- between_meta_period (bool, optional) – Only check within the respective period that is defined in the meta table. If “qc” is chosen as kind, then the “raw” meta period is taken. The default is True.
- crop_period (bool, optional) – Should the period be cropped to the maximum filled period? This results in holes being ignored when they are at the beginning or at the end of the timeseries. If period = (None, None) is given, then this parameter is set to True. The default is False.
- Returns:
A pandas DataFrame with station_id as index and one column per week threshold. The values in the table are the number of NA periods longer than the respective number of weeks.
- Return type:
pandas.DataFrame
- Raises:
ValueError – If the input parameters were not correct.
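The effect of crop_period, ignoring holes at the beginning and at the end of a series, can be sketched in a few lines. The helper crop_to_filled is a hypothetical name and works on a plain list, which is an assumption made for this illustration only.

```python
def crop_to_filled(values):
    """Drop leading and trailing None values, keep holes in between."""
    filled_idx = [i for i, v in enumerate(values) if v is not None]
    if not filled_idx:
        return []
    return values[filled_idx[0]:filled_idx[-1] + 1]


series = [None, None, 1.2, None, 0.4, None]
print(crop_to_filled(series))
# prints [1.2, None, 0.4]
```

Only the inner hole survives the cropping, so only that hole would be counted afterwards.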
- download_meta()
Download the meta file(s) from the CDC server.
- Returns:
The meta file from the CDC server. If there are several meta files on the server, they are joined together.
- Return type:
geopandas.GeoDataFrame
- fillup(only_real=False, stids='all', do_mp=False, **kwargs)
Fill up the quality checked data with data from nearby stations to get complete timeseries.
- Parameters:
only_real (bool, optional) – Whether only real stations are computed or also virtual ones. True: only stations with own data are computed. The default is False.
stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.
do_mp (bool, optional) – Should the method be run in multiprocessing mode? If False, the methods are called in threading mode. Multiprocessing needs more memory and a bit more startup time. Therefore it is only useful for methods with a lot of computation effort in the Python code. If most of the computation of a method is done in the PostgreSQL database, then threading is enough to speed the process up. The default is False.
**kwargs (dict, optional) – The additional keyword arguments for the _run_method and get_stations method
- Raises:
ValueError – If the given stids (Station_IDs) are not all valid.
- get_df(stids, **kwargs)
Get a DataFrame with the corresponding data.
- Parameters:
stids (string or list of int) – The stations for which to get the data. Can either be “all”, for all possible stations, or a list with the Station IDs.
**kwargs (optional keyword arguments) – These keyword arguments are passed to the get_df function of the station class. Possible parameters are period, agg_to, kinds. Furthermore, the kwargs are passed to the get_stations method.
- Returns:
A DataFrame with the timeseries for the selected stations, kind(s) and the given period. If multiple columns are selected, the columns of this DataFrame are a MultiIndex with the station IDs as first level and the kind as second level.
- Return type:
pandas.DataFrame
- get_meta(infos=['station_id', 'filled_from', 'filled_until', 'geometry'], stids='all', only_real=True)
Get the meta DataFrame from the database.
- Parameters:
infos (list or str, optional) – A list of information from the meta file to return. If “all”, then all possible columns are returned, but only one geometry column. The default is [“station_id”, “filled_from”, “filled_until”, “geometry”].
only_real (bool, optional) – Whether only real stations are returned or also virtual ones. True: only stations with own data are returned. The default is True.
- Returns:
The meta DataFrame.
- Return type:
pandas.DataFrame or geopandas.GeoDataFrame
- classmethod get_meta_explanation(infos='all')
Get the explanations of the available meta fields.
- Parameters:
infos (list or string, optional) – The infos you wish to get an explanation for. If “all”, then all the available information is returned. The default is “all”.
- Returns:
A pandas Series with the information names as index and the explanations as values.
- Return type:
pandas.Series
- get_quotient(kinds_num, kinds_denom, stids='all', return_as='df', **kwargs)
Get the quotient of the multi-annual means of two different kinds, or of the timeseries and the multi annual raster value.
$quotient = \overline{ts}_{kind\_num} / \overline{ts}_{denom}$
- Parameters:
kinds_num (list of str or str) – The timeseries kinds of the numerators. Should be one of [‘raw’, ‘qc’, ‘filled’]. For precipitation, “corr” is also possible.
kinds_denom (list of str or str) – The timeseries kinds of the denominator or the multi annual raster key. If the denominator is a multi annual raster key, then the result is the quotient of the timeseries and the raster value. Possible values are:
- for timeseries kinds: ‘raw’, ‘qc’, ‘filled’ or, for precipitation, also “corr”.
- for raster keys: ‘hyras’, ‘dwd’ or ‘regnie’, depending on your defined raster files.
stids (string or list of int, optional) – The stations for which to compute the quotient. Can either be “all”, for all possible stations, or a list with the Station IDs. The default is “all”.
return_as (str, optional) – The format of the return value. If “df”, a pandas DataFrame is returned. If “json”, a list of dictionaries is returned. The default is “df”.
**kwargs (dict, optional) – The additional keyword arguments are passed to the get_stations method.
- Returns:
The quotient of the two timeseries as DataFrame or list of dictionaries (JSON) depending on the return_as parameter. The default is pd.DataFrame.
- Return type:
pandas.DataFrame or list of dict
- Raises:
ValueError – If the input parameters were not correct.
- get_stations(only_real=True, stids='all', skip_missing_stids=False, **kwargs)
Get a list with all the stations as Station-objects.
- Parameters:
only_real (bool, optional) – Whether only real stations are returned or also virtual ones. True: only stations with own data are returned. The default is True.
stids (string or list of int, optional) – The Stations to return. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.
skip_missing_stids (bool, optional) – Should the method skip the missing stations from input stids? If False, then a ValueError is raised if a station is not found. The default is False.
**kwargs (dict, optional) – The additional keyword arguments aren’t used in this method.
- Returns:
A list with the corresponding station objects.
- Return type:
list of Station-objects
- Raises:
ValueError – If the given stids (Station_IDs) are not all valid.
- last_imp_fillup(stids='all', do_mp=False, **kwargs)
Do the gap filling of the last import.
- Parameters:
do_mp (bool, optional) – Should the method be run in multiprocessing mode? If False, the methods are called in threading mode. Multiprocessing needs more memory and a bit more startup time. Therefore it is only useful for methods with a lot of computation effort in the Python code. If most of the computation of a method is done in the PostgreSQL database, then threading is enough to speed the process up. The default is False.
stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.
**kwargs (dict, optional) – The additional keyword arguments for the _run_method and get_stations method
- last_imp_quality_check(stids='all', do_mp=False, **kwargs)
Do the quality check of the last import.
- Parameters:
do_mp (bool, optional) – Should the method be run in multiprocessing mode? If False, the methods are called in threading mode. Multiprocessing needs more memory and a bit more startup time. Therefore it is only useful for methods with a lot of computation effort in the Python code. If most of the computation of a method is done in the PostgreSQL database, then threading is enough to speed the process up. The default is False.
stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.
**kwargs (dict, optional) – The additional keyword arguments for the _run_method and get_stations method
- quality_check(period=(None, None), only_real=True, stids='all', do_mp=False, **kwargs)
Quality check the raw data for a given period.
- Parameters:
period (tuple or list of datetime.datetime or None, optional) – The minimum and maximum Timestamp for which to get the timeseries. If None is given, the minimum or maximum possible Timestamp is taken. The default is (None, None).
stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.
do_mp (bool, optional) – Should the method be run in multiprocessing mode? If False, the methods are called in threading mode. Multiprocessing needs more memory and a bit more startup time. Therefore it is only useful for methods with a lot of computation effort in the Python code. If most of the computation of a method is done in the PostgreSQL database, then threading is enough to speed the process up. The default is False.
**kwargs (dict, optional) – The additional keyword arguments for the _run_method and get_stations method
- update(only_new=True, **kwargs)
Make a complete update of the stations.
Runs update_raw, the quality check and the fillup of the stations.
- Parameters:
only_new (bool, optional) – Should only new values be computed? If False, the stations are updated for the whole possible period. If True, the stations are only updated for new values. The default is True.
- update_ma_raster(stids='all', do_mp=False, **kwargs)
Update the multi annual raster values for the stations.
Get a multi annual value from the corresponding raster and save to the multi annual table in the database.
- Parameters:
stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.
do_mp (bool, optional) – Should the method be run in multiprocessing mode? If False, the methods are called in threading mode. Multiprocessing needs more memory and a bit more startup time. Therefore it is only useful for methods with a lot of computation effort in the Python code. If most of the computation of a method is done in the PostgreSQL database, then threading is enough to speed the process up. The default is False.
**kwargs (dict, optional) – The additional keyword arguments for the _run_method and get_stations method
- Raises:
ValueError – If the given stids (Station_IDs) are not all valid.
- update_ma_timeseries(kind, stids='all', do_mp=False, **kwargs)
Update the multi annual timeseries values for the stations.
Get a multi annual value from the corresponding timeseries and save to the database.
- Parameters:
kind (str or list of str) – The timeseries data kind whose multi annual value should be updated. Must be a column in the timeseries DB, i.e. one of “raw”, “qc”, “filled”. For precipitation, “corr” is also valid.
stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.
do_mp (bool, optional) – Whether to run the method in multiprocessing mode. If False, the methods are called in threading mode. Multiprocessing needs more memory and a bit more start-up time, so it is only useful for methods with a lot of computation in the Python code. If most of the computation of a method is done in the PostgreSQL database, threading is enough to speed up the process. The default is False.
**kwargs (dict, optional) – The additional keyword arguments for the _run_method and get_stations method
- Raises:
ValueError – If the given stids (Station_IDs) are not all valid.
- update_meta(stids='all', **kwargs)
Update the meta table by comparing to the CDC server.
The “von_datum” and “bis_datum” fields are ignored, because it is better to set them from the filled period of the stations in the database. The CDC period is often incorrect.
- update_period_meta(stids='all', **kwargs)
Update the period in the meta table of the raw data.
- Parameters:
- Raises:
ValueError – If the given stids (Station_IDs) are not all valid.
- update_raw(only_new=True, only_real=True, stids='all', remove_nas=True, do_mp=True, **kwargs)
Download all stations data from CDC and upload to database.
- Parameters:
only_new (bool, optional) – Get only the files that are not yet in the database? If False all the available files are loaded again. The default is True
only_real (bool, optional) – Whether only real stations are tried to download. True: only stations with a date in raw_from in meta are downloaded. The default is True.
stids (string or list of int, optional) – The Stations to return. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.
do_mp (bool, optional) – Whether to run the method in multiprocessing mode. If False, the methods are called in threading mode. Multiprocessing needs more memory and a bit more start-up time, so it is only useful for methods with a lot of computation in the Python code. If most of the computation of a method is done in the PostgreSQL database, threading is enough to speed up the process. The default is True.
remove_nas (bool, optional) – Remove the NAs from the downloaded data before updating it to the database. This has computational advantages. The default is True.
**kwargs (dict, optional) – The additional keyword arguments for the _run_method and get_stations method
- Raises:
ValueError – If the given stids (Station_IDs) are not all valid.
GroupStations
- weatherDB.stations.GroupStations
alias of the module weatherDB.stations.GroupStations
StationsBase…
These are the base station classes on which the station classes above depend. None of them works on its own, because the class variables are not yet set correctly.
- class weatherDB.stations.StationsBase.StationsBase[source]
Bases:
object
Public Methods:
__init__() – Download the meta file(s) from the CDC server.
update_meta([stids]) – Update the meta table by comparing to the CDC server.
update_period_meta([stids]) – Update the period in the meta table of the raw data.
get_meta_explanation([infos]) – Get the explanations of the available meta fields.
get_meta([infos, stids, only_real]) – Get the meta DataFrame from the database.
get_stations([only_real, stids, ...]) – Get a list with all the stations as Station objects.
get_quotient(kinds_num, kinds_denom[, ...]) – Get the quotient of the multi-annual means of two different kinds, or of a timeseries and the multi annual raster value.
count_holes([stids]) – Count holes in timeseries depending on their length.
update_raw([only_new, only_real, stids, ...]) – Download all stations data from CDC and upload to database.
last_imp_quality_check([stids, do_mp]) – Do the quality check of the last import.
last_imp_fillup([stids, do_mp]) – Do the gap filling of the last import.
quality_check([period, only_real, stids, do_mp]) – Quality check the raw data for a given period.
update_ma_raster([stids, do_mp]) – Update the multi annual raster values for the stations.
update_ma_timeseries(kind[, stids, do_mp]) – Update the multi annual timeseries values for the stations.
fillup([only_real, stids, do_mp]) – Fill up the quality checked data with data from nearby stations to get complete timeseries.
update([only_new]) – Make a complete update of the stations.
get_df(stids, **kwargs) – Get a DataFrame with the corresponding data.
- download_meta()[source]
Download the meta file(s) from the CDC server.
- Returns:
The meta file from the CDC server. If there are several meta files on the server, they are joined together.
- Return type:
geopandas.GeoDataFrame
- update_meta(stids='all', **kwargs)[source]
Update the meta table by comparing to the CDC server.
The “von_datum” and “bis_datum” fields are ignored, because it is better to set them from the filled period of the stations in the database. The CDC period is often incorrect.
- update_period_meta(stids='all', **kwargs)[source]
Update the period in the meta table of the raw data.
- Parameters:
- Raises:
ValueError – If the given stids (Station_IDs) are not all valid.
- classmethod get_meta_explanation(infos='all')[source]
Get the explanations of the available meta fields.
- Parameters:
infos (list or string, optional) – The infos you wish to get an explanation for. If “all”, all the available information is returned. The default is “all”.
- Returns:
A pandas Series with the information names as index and the explanations as values.
- Return type:
pd.Series
- get_meta(infos=['station_id', 'filled_from', 'filled_until', 'geometry'], stids='all', only_real=True)[source]
Get the meta Dataframe from the Database.
- Parameters:
infos (list or str, optional) – A list of information from the meta file to return. If “all”, then all possible columns are returned, but only one geometry column. The default is: [“station_id”, “filled_from”, “filled_until”, “geometry”].
only_real (bool, optional) – Whether only real stations are returned or also virtual ones. True: only stations with own data are returned. The default is True.
- Returns:
The meta DataFrame.
- Return type:
pandas.DataFrame or geopandas.GeoDataFrame
- get_stations(only_real=True, stids='all', skip_missing_stids=False, **kwargs)[source]
Get a list with all the stations as Station-objects.
- Parameters:
only_real (bool, optional) – Whether only real stations are returned or also virtual ones. True: only stations with own data are returned. The default is True.
stids (string or list of int, optional) – The Stations to return. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.
skip_missing_stids (bool, optional) – Should the method skip the missing stations from input stids? If False, then a ValueError is raised if a station is not found. The default is False.
**kwargs (dict, optional) – The additional keyword arguments aren’t used in this method.
- Returns:
A list with the corresponding station objects.
- Return type:
list of Station objects
- Raises:
ValueError – If the given stids (Station_IDs) are not all valid.
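The interplay of stids and skip_missing_stids described above can be illustrated with a small sketch. It is not the library code: select_station_ids and its available argument are hypothetical names, and the plain list of integers stands in for the station IDs that the meta table would provide.

```python
def select_station_ids(available, stids="all", skip_missing_stids=False):
    """Return the requested station IDs that exist in `available`.

    Hypothetical helper: `available` stands in for the IDs in the meta table.
    """
    if isinstance(stids, str):
        if stids != "all":
            raise ValueError(f"Unknown stids value: {stids!r}")
        return sorted(available)
    known = set(available)
    missing = [stid for stid in stids if stid not in known]
    # Without skip_missing_stids, any unknown ID is an error.
    if missing and not skip_missing_stids:
        raise ValueError(f"Invalid station IDs: {missing}")
    return [stid for stid in stids if stid in known]
```

With skip_missing_stids=True the unknown IDs are dropped silently, which is convenient for batch runs but can hide typos in the ID list.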
- get_quotient(kinds_num, kinds_denom, stids='all', return_as='df', **kwargs)[source]
Get the quotient of the multi-annual means of two different kinds, or of a timeseries and the multi annual raster value.
$quotient = \overline{ts}_{kind\_num} / \overline{ts}_{kind\_denom}$
- Parameters:
kinds_num (list of str or str) – The timeseries kinds of the numerators. Should be one of [‘raw’, ‘qc’, ‘filled’]. For precipitation also “corr” is possible.
kinds_denom (list of str or str) – The timeseries kinds of the denominator or the multi annual raster key. If the denominator is a multi annual raster key, then the result is the quotient of the timeseries and the raster value. Possible values are:
- for timeseries kinds: ‘raw’, ‘qc’, ‘filled’ or, for precipitation, also “corr”.
- for raster keys: ‘hyras’, ‘dwd’ or ‘regnie’, depending on your defined raster files.
stids (list of int) – The station IDs for which to compute the quotient.
return_as (str, optional) – The format of the return value. If “df” then a pandas DataFrame is returned. If “json” then a list with dictionaries is returned.
**kwargs (dict, optional) – The additional keyword arguments are passed to the get_stations method.
- Returns:
The quotient of the two timeseries as DataFrame or list of dictionaries (JSON) depending on the return_as parameter. The default is pd.DataFrame.
- Return type:
pandas.DataFrame or list of dict
- Raises:
ValueError – If the input parameters were not correct.
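As a rough illustration of the quotient formula above, the following sketch divides the mean of one series of yearly values by the mean of another. The function name quotient_of_means and the use of plain lists with None for missing years are assumptions made for this example; how the library aggregates its multi-annual means is not shown in this reference.

```python
from statistics import mean


def quotient_of_means(ts_num, ts_denom):
    """Divide the mean of `ts_num` by the mean of `ts_denom`, ignoring None.

    Hypothetical helper; both inputs are lists of yearly values.
    """
    num = [v for v in ts_num if v is not None]
    denom = [v for v in ts_denom if v is not None]
    if not num or not denom:
        return None  # no data on one side, so no quotient
    denom_mean = mean(denom)
    if denom_mean == 0:
        return None  # avoid a division by zero
    return mean(num) / denom_mean
```

For example, yearly sums of 1200 and 1000 against a raster value of 1000 give a quotient close to 1.1.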
- count_holes(stids='all', **kwargs)[source]
Count holes in timeseries depending on their length.
- Parameters:
stids (string or list of int, optional) – The Stations to return. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.
**kwargs (dict, optional) – The parameters supported by the StationBase.count_holes method. The kwargs are also passed to the get_stations method. Possible values are:
- weeks (list, optional) – A list of hole lengths to count. Every hole longer than the specified number of weeks is counted. The default is [2, 4, 8, 12, 16, 20, 24].
- kind (str) – The kind of the timeseries to analyze. Should be one of [‘raw’, ‘qc’, ‘filled’]. For precipitation (N), “corr” is also possible. Normally only “raw” and “qc” make sense, because the other timeseries should not have holes.
- period (TimestampPeriod or (tuple or list of datetime.datetime or None), optional) – The minimum and maximum Timestamp for which to analyze the timeseries. If None is given, the minimal or maximal possible Timestamp is taken. The default is (None, None).
- between_meta_period (bool, optional) – Only check within the respective period that is defined in the meta table. If “qc” is chosen as kind, then the “raw” meta period is taken. The default is True.
- crop_period (bool, optional) – Whether the period should be cropped to the maximum filled period. This results in holes being ignored when they are at the end or at the beginning of the timeseries. If period = (None, None) is given, then this parameter is set to True. The default is False.
- Returns:
A pandas DataFrame with station_id as index and one column per week threshold. The numbers in the table are the number of NA periods longer than the respective number of weeks.
- Return type:
pandas.DataFrame
- Raises:
ValueError – If the input parameters were not correct.
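The counting rule described above (every hole longer than a given number of weeks is counted) can be sketched in plain Python. This is only an illustration under stated assumptions: count_holes_sketch is a hypothetical name, the series is a list with None for missing values, and step is the fixed time step of the series. The library itself most likely does this work in the database.

```python
from datetime import timedelta


def count_holes_sketch(values, step, weeks=(2, 4, 8, 12, 16, 20, 24)):
    """Count runs of None that last longer than each threshold in weeks."""
    # Collect the length, in time steps, of every run of consecutive None values.
    runs, current = [], 0
    for value in values:
        if value is None:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    # A hole counts for a threshold when its duration exceeds that many weeks.
    return {
        w: sum(1 for run in runs if run * step > timedelta(weeks=w))
        for w in weeks
    }
```

A single long hole is counted once for every threshold it exceeds, so the counts never increase from the shortest to the longest threshold.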
- update_raw(only_new=True, only_real=True, stids='all', remove_nas=True, do_mp=True, **kwargs)[source]
Download all stations data from CDC and upload to database.
- Parameters:
only_new (bool, optional) – Get only the files that are not yet in the database? If False all the available files are loaded again. The default is True
only_real (bool, optional) – Whether only real stations are tried to download. True: only stations with a date in raw_from in meta are downloaded. The default is True.
stids (string or list of int, optional) – The Stations to return. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.
do_mp (bool, optional) – Whether to run the method in multiprocessing mode. If False, the methods are called in threading mode. Multiprocessing needs more memory and a bit more start-up time, so it is only useful for methods with a lot of computation in the Python code. If most of the computation of a method is done in the PostgreSQL database, threading is enough to speed up the process. The default is True.
remove_nas (bool, optional) – Remove the NAs from the downloaded data before updating it to the database. This has computational advantages. The default is True.
**kwargs (dict, optional) – The additional keyword arguments for the _run_method and get_stations method
- Raises:
ValueError – If the given stids (Station_IDs) are not all valid.
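The trade-off behind do_mp can be sketched with the standard library. The helper below is hypothetical and is not the internal _run_method; it only shows the general pattern of choosing a process pool for CPU-heavy Python work and a thread pool for work that mostly waits on the database or the network.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def run_for_stations(func, stids, do_mp=False, max_workers=4):
    """Apply `func` to every station ID, in processes if do_mp else in threads.

    Hypothetical helper illustrating the do_mp switch.
    """
    executor_cls = ProcessPoolExecutor if do_mp else ThreadPoolExecutor
    with executor_cls(max_workers=max_workers) as executor:
        # map() keeps the results in the same order as `stids`.
        return list(executor.map(func, stids))
```

With do_mp=True the function and its arguments must be picklable, and on platforms that spawn new processes the call should sit under an `if __name__ == "__main__":` guard. These are the extra costs that the parameter description refers to.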
- last_imp_quality_check(stids='all', do_mp=False, **kwargs)[source]
Do the quality check of the last import.
- Parameters:
do_mp (bool, optional) – Whether to run the method in multiprocessing mode. If False, the methods are called in threading mode. Multiprocessing needs more memory and a bit more start-up time, so it is only useful for methods with a lot of computation in the Python code. If most of the computation of a method is done in the PostgreSQL database, threading is enough to speed up the process. The default is False.
stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.
**kwargs (dict, optional) – The additional keyword arguments for the _run_method and get_stations method
- last_imp_fillup(stids='all', do_mp=False, **kwargs)[source]
Do the gap filling of the last import.
- Parameters:
do_mp (bool, optional) – Whether to run the method in multiprocessing mode. If False, the methods are called in threading mode. Multiprocessing needs more memory and a bit more start-up time, so it is only useful for methods with a lot of computation in the Python code. If most of the computation of a method is done in the PostgreSQL database, threading is enough to speed up the process. The default is False.
stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.
**kwargs (dict, optional) – The additional keyword arguments for the _run_method and get_stations method
- quality_check(period=(None, None), only_real=True, stids='all', do_mp=False, **kwargs)[source]
Quality check the raw data for a given period.
- Parameters:
period (tuple or list of datetime.datetime or None, optional) – The minimum and maximum Timestamp for which to get the timeseries. If None is given for a bound, the minimal or maximal possible Timestamp is taken. The default is (None, None).
stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.
do_mp (bool, optional) – Whether to run the method in multiprocessing mode. If False, the methods are called in threading mode. Multiprocessing needs more memory and a bit more start-up time, so it is only useful for methods with a lot of computation in the Python code. If most of the computation of a method is done in the PostgreSQL database, threading is enough to speed up the process. The default is False.
**kwargs (dict, optional) – The additional keyword arguments for the _run_method and get_stations method
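How a period with None bounds may be resolved can be shown with a short sketch. The function resolve_period and its available_from and available_until arguments are hypothetical. They stand in for the minimal and maximal Timestamp that the database would provide for a station.

```python
from datetime import datetime


def resolve_period(period, available_from, available_until):
    """Replace None bounds of `period` with the available limits.

    Hypothetical helper; the limits would normally come from the database.
    """
    start, end = period
    start = available_from if start is None else start
    end = available_until if end is None else end
    if start > end:
        raise ValueError("The period start lies after the period end.")
    return start, end
```

Called with (None, None), the sketch returns the full available range, which matches the documented default of checking the whole possible period.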
- update_ma_raster(stids='all', do_mp=False, **kwargs)[source]
Update the multi annual raster values for the stations.
Get a multi annual value from the corresponding raster and save to the multi annual table in the database.
- Parameters:
stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.
do_mp (bool, optional) – Whether to run the method in multiprocessing mode. If False, the methods are called in threading mode. Multiprocessing needs more memory and a bit more start-up time, so it is only useful for methods with a lot of computation in the Python code. If most of the computation of a method is done in the PostgreSQL database, threading is enough to speed up the process. The default is False.
**kwargs (dict, optional) – The additional keyword arguments for the _run_method and get_stations method
- Raises:
ValueError – If the given stids (Station_IDs) are not all valid.
- update_ma_timeseries(kind, stids='all', do_mp=False, **kwargs)[source]
Update the multi annual timeseries values for the stations.
Get a multi annual value from the corresponding timeseries and save to the database.
- Parameters:
kind (str or list of str) – The timeseries data kind whose multi annual value should be updated. Must be a column in the timeseries DB, i.e. one of “raw”, “qc”, “filled”. For precipitation, “corr” is also valid.
stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.
do_mp (bool, optional) – Whether to run the method in multiprocessing mode. If False, the methods are called in threading mode. Multiprocessing needs more memory and a bit more start-up time, so it is only useful for methods with a lot of computation in the Python code. If most of the computation of a method is done in the PostgreSQL database, threading is enough to speed up the process. The default is False.
**kwargs (dict, optional) – The additional keyword arguments for the _run_method and get_stations method
- Raises:
ValueError – If the given stids (Station_IDs) are not all valid.
- fillup(only_real=False, stids='all', do_mp=False, **kwargs)[source]
Fill up the quality checked data with data from nearby stations to get complete timeseries.
- Parameters:
only_real (bool, optional) – Whether only real stations are computed or also virtual ones. True: only stations with own data are computed. The default is False.
stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.
do_mp (bool, optional) – Whether to run the method in multiprocessing mode. If False, the methods are called in threading mode. Multiprocessing needs more memory and a bit more start-up time, so it is only useful for methods with a lot of computation in the Python code. If most of the computation of a method is done in the PostgreSQL database, threading is enough to speed up the process. The default is False.
**kwargs (dict, optional) – The additional keyword arguments for the _run_method and get_stations method
- Raises:
ValueError – If the given stids (Station_IDs) are not all valid.
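The idea of filling gaps with data from nearby stations can be illustrated with a deliberately simple sketch. This reference does not describe how the library selects or scales neighbour values, so everything below is an assumption: fill_gaps is a hypothetical name, the neighbours are given as lists ordered from nearest to farthest, and values are copied without any scaling.

```python
def fill_gaps(target, neighbours):
    """Fill None values in `target` from the nearest neighbour that has data.

    Hypothetical helper; `neighbours` is ordered from nearest to farthest
    and every series has the same length as `target`.
    """
    filled = []
    for i, value in enumerate(target):
        if value is None:
            # Take the value of the nearest neighbour that has data at this step.
            value = next(
                (series[i] for series in neighbours if series[i] is not None),
                None,
            )
        filled.append(value)
    return filled
```

A step stays None only when no neighbour has a value there. A real implementation would usually also adjust the borrowed values to the regime of the target station.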
- update(only_new=True, **kwargs)[source]
Make a complete update of the stations.
Does the update_raw, quality check and fillup of the stations.
- Parameters:
only_new (bool, optional) – Whether only new values should be computed. If False, the stations are updated for the whole possible period. If True, the stations are only updated for new values. The default is True.
- get_df(stids, **kwargs)[source]
Get a DataFrame with the corresponding data.
- Parameters:
stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.
**kwargs (optional keyword arguments) – Those keyword arguments are passed to the get_df function of the station class. Possible parameters are period, agg_to, kinds. Furthermore the kwargs are passed to the get_stations method.
- Returns:
A DataFrame with the timeseries for the selected stations, kind(s) and the given period. If multiple columns are selected, the columns of this DataFrame are a MultiIndex with the station IDs as first level and the kind as second level.
- Return type:
pandas.DataFrame
- class weatherDB.stations.StationsBaseTET.StationsBaseTET[source]
Bases:
StationsBase
Public Methods:
fillup([only_real, stids]) – Fill up the quality checked data with data from nearby stations to get complete timeseries.
Inherited from StationsBase
__init__() – Download the meta file(s) from the CDC server.
update_meta([stids]) – Update the meta table by comparing to the CDC server.
update_period_meta([stids]) – Update the period in the meta table of the raw data.
get_meta_explanation([infos]) – Get the explanations of the available meta fields.
get_meta([infos, stids, only_real]) – Get the meta DataFrame from the database.
get_stations([only_real, stids, ...]) – Get a list with all the stations as Station objects.
get_quotient(kinds_num, kinds_denom[, ...]) – Get the quotient of the multi-annual means of two different kinds, or of a timeseries and the multi annual raster value.
count_holes([stids]) – Count holes in timeseries depending on their length.
update_raw([only_new, only_real, stids, ...]) – Download all stations data from CDC and upload to database.
last_imp_quality_check([stids, do_mp]) – Do the quality check of the last import.
last_imp_fillup([stids, do_mp]) – Do the gap filling of the last import.
quality_check([period, only_real, stids, do_mp]) – Quality check the raw data for a given period.
update_ma_raster([stids, do_mp]) – Update the multi annual raster values for the stations.
update_ma_timeseries(kind[, stids, do_mp]) – Update the multi annual timeseries values for the stations.
fillup([only_real, stids, do_mp]) – Fill up the quality checked data with data from nearby stations to get complete timeseries.
update([only_new]) – Make a complete update of the stations.
get_df(stids, **kwargs) – Get a DataFrame with the corresponding data.
- fillup(only_real=False, stids='all', **kwargs)[source]
Fill up the quality checked data with data from nearby stations to get complete timeseries.
- Parameters:
only_real (bool, optional) – Whether only real stations are computed or also virtual ones. True: only stations with own data are computed. The default is False.
stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.
do_mp (bool, optional) – Whether to run the method in multiprocessing mode. If False, the methods are called in threading mode. Multiprocessing needs more memory and a bit more start-up time, so it is only useful for methods with a lot of computation in the Python code. If most of the computation of a method is done in the PostgreSQL database, threading is enough to speed up the process. The default is False.
**kwargs (dict, optional) – The additional keyword arguments for the _run_method and get_stations method
- Raises:
ValueError – If the given stids (Station_IDs) are not all valid.
- count_holes(stids='all', **kwargs)
Count holes in timeseries depending on their length.
- Parameters:
stids (string or list of int, optional) – The Stations to return. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.
**kwargs (dict, optional) – The parameters supported by the StationBase.count_holes method. The kwargs are also passed to the get_stations method. Possible values are:
- weeks (list, optional) – A list of hole lengths to count. Every hole longer than the specified number of weeks is counted. The default is [2, 4, 8, 12, 16, 20, 24].
- kind (str) – The kind of the timeseries to analyze. Should be one of [‘raw’, ‘qc’, ‘filled’]. For precipitation (N), “corr” is also possible. Normally only “raw” and “qc” make sense, because the other timeseries should not have holes.
- period (TimestampPeriod or (tuple or list of datetime.datetime or None), optional) – The minimum and maximum Timestamp for which to analyze the timeseries. If None is given, the minimal or maximal possible Timestamp is taken. The default is (None, None).
- between_meta_period (bool, optional) – Only check within the respective period that is defined in the meta table. If “qc” is chosen as kind, then the “raw” meta period is taken. The default is True.
- crop_period (bool, optional) – Whether the period should be cropped to the maximum filled period. This results in holes being ignored when they are at the end or at the beginning of the timeseries. If period = (None, None) is given, then this parameter is set to True. The default is False.
- Returns:
A pandas DataFrame with station_id as index and one column per week threshold. The numbers in the table are the number of NA periods longer than the respective number of weeks.
- Return type:
pandas.DataFrame
- Raises:
ValueError – If the input parameters were not correct.
- download_meta()
Download the meta file(s) from the CDC server.
- Returns:
The meta file from the CDC server. If there are several meta files on the server, they are joined together.
- Return type:
geopandas.GeoDataFrame
- get_df(stids, **kwargs)
Get a DataFrame with the corresponding data.
- Parameters:
stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.
**kwargs (optional keyword arguments) – Those keyword arguments are passed to the get_df function of the station class. Possible parameters are period, agg_to, kinds. Furthermore the kwargs are passed to the get_stations method.
- Returns:
A DataFrame with the timeseries for the selected stations, kind(s) and the given period. If multiple columns are selected, the columns of this DataFrame are a MultiIndex with the station IDs as first level and the kind as second level.
- Return type:
pandas.DataFrame
- get_meta(infos=['station_id', 'filled_from', 'filled_until', 'geometry'], stids='all', only_real=True)
Get the meta Dataframe from the Database.
- Parameters:
infos (list or str, optional) – A list of information from the meta file to return. If “all”, then all possible columns are returned, but only one geometry column. The default is: [“station_id”, “filled_from”, “filled_until”, “geometry”].
only_real (bool, optional) – Whether only real stations are returned or also virtual ones. True: only stations with own data are returned. The default is True.
- Returns:
The meta DataFrame.
- Return type:
pandas.DataFrame or geopandas.GeoDataFrame
- classmethod get_meta_explanation(infos='all')
Get the explanations of the available meta fields.
- Parameters:
infos (list or string, optional) – The infos you wish to get an explanation for. If “all”, all the available information is returned. The default is “all”.
- Returns:
A pandas Series with the information names as index and the explanations as values.
- Return type:
pd.Series
- get_quotient(kinds_num, kinds_denom, stids='all', return_as='df', **kwargs)
Get the quotient of the multi-annual means of two different kinds, or of a timeseries and the multi annual raster value.
$quotient = \overline{ts}_{kind\_num} / \overline{ts}_{kind\_denom}$
- Parameters:
kinds_num (list of str or str) – The timeseries kinds of the numerators. Should be one of [‘raw’, ‘qc’, ‘filled’]. For precipitation also “corr” is possible.
kinds_denom (list of str or str) – The timeseries kinds of the denominator or the multi annual raster key. If the denominator is a multi annual raster key, then the result is the quotient of the timeseries and the raster value. Possible values are:
- for timeseries kinds: ‘raw’, ‘qc’, ‘filled’ or, for precipitation, also “corr”.
- for raster keys: ‘hyras’, ‘dwd’ or ‘regnie’, depending on your defined raster files.
stids (list of int) – The station IDs for which to compute the quotient.
return_as (str, optional) – The format of the return value. If “df” then a pandas DataFrame is returned. If “json” then a list with dictionaries is returned.
**kwargs (dict, optional) – The additional keyword arguments are passed to the get_stations method.
- Returns:
The quotient of the two timeseries as DataFrame or list of dictionaries (JSON) depending on the return_as parameter. The default is pd.DataFrame.
- Return type:
pandas.DataFrame or list of dict
- Raises:
ValueError – If the input parameters were not correct.
- get_stations(only_real=True, stids='all', skip_missing_stids=False, **kwargs)
Get a list with all the stations as Station-objects.
- Parameters:
only_real (bool, optional) – Whether only real stations are returned or also virtual ones. True: only stations with own data are returned. The default is True.
stids (string or list of int, optional) – The Stations to return. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.
skip_missing_stids (bool, optional) – Should the method skip the missing stations from input stids? If False, then a ValueError is raised if a station is not found. The default is False.
**kwargs (dict, optional) – The additional keyword arguments aren’t used in this method.
- Returns:
A list with the corresponding station objects.
- Return type:
list of Station objects
- Raises:
ValueError – If the given stids (Station_IDs) are not all valid.
- last_imp_fillup(stids='all', do_mp=False, **kwargs)
Do the gap filling of the last import.
- Parameters:
do_mp (bool, optional) – Whether to run the method in multiprocessing mode. If False, the methods are called in threading mode. Multiprocessing needs more memory and a bit more start-up time, so it is only useful for methods with a lot of computation in the Python code. If most of the computation of a method is done in the PostgreSQL database, threading is enough to speed up the process. The default is False.
stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.
**kwargs (dict, optional) – The additional keyword arguments for the _run_method and get_stations method
- last_imp_quality_check(stids='all', do_mp=False, **kwargs)
Do the quality check of the last import.
- Parameters:
do_mp (bool, optional) – Whether to run the method in multiprocessing mode. If False, the methods are called in threading mode. Multiprocessing needs more memory and a bit more start-up time, so it is only useful for methods with a lot of computation in the Python code. If most of the computation of a method is done in the PostgreSQL database, threading is enough to speed up the process. The default is False.
stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.
**kwargs (dict, optional) – The additional keyword arguments for the _run_method and get_stations method
- quality_check(period=(None, None), only_real=True, stids='all', do_mp=False, **kwargs)
Quality check the raw data for a given period.
- Parameters:
period (tuple or list of datetime.datetime or None, optional) – The minimum and maximum Timestamp for which to get the timeseries. If None is given for a bound, the minimal or maximal possible Timestamp is taken. The default is (None, None).
stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.
do_mp (bool, optional) – Whether to run the method in multiprocessing mode. If False, the methods are called in threading mode. Multiprocessing needs more memory and a bit more start-up time, so it is only useful for methods with a lot of computation in the Python code. If most of the computation of a method is done in the PostgreSQL database, threading is enough to speed up the process. The default is False.
**kwargs (dict, optional) – The additional keyword arguments for the _run_method and get_stations method
- update(only_new=True, **kwargs)
Make a complete update of the stations.
Does the update_raw, quality check and fillup of the stations.
- Parameters:
only_new (bool, optional) – Whether only new values should be computed. If False, the stations are updated for the whole possible period. If True, the stations are only updated for new values. The default is True.
- update_ma_raster(stids='all', do_mp=False, **kwargs)
Update the multi annual raster values for the stations.
Get the multi annual value from the corresponding raster and save it to the multi annual table in the database.
- Parameters:
stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.
do_mp (bool, optional) – Should the method run in multiprocessing mode? If False, the methods are called in threading mode. Multiprocessing needs more memory and a bit more start-up time. It is therefore only useful for methods with a lot of computation in the Python code. If most of a method's computation is done in the PostgreSQL database, threading is enough to speed up the process. The default is False.
**kwargs (dict, optional) – The additional keyword arguments for the _run_method and get_stations methods.
- Raises:
ValueError – If the given stids (Station_IDs) are not all valid.
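The stids contract ("all" or a list of IDs, with a ValueError for unknown IDs) can be sketched as follows. check_stids and known_stids are hypothetical names. How the library itself looks up the valid IDs is not shown here; the sketch assumes a plain collection of known IDs.

```python
def check_stids(stids, known_stids):
    """Return the station IDs to process or raise ValueError.

    known_stids is a hypothetical collection of valid IDs,
    e.g. taken from the meta table.
    """
    if isinstance(stids, str):
        if stids != "all":
            raise ValueError('stids must be "all" or a list of station IDs')
        return sorted(known_stids)
    invalid = [stid for stid in stids if stid not in known_stids]
    if invalid:
        raise ValueError(f"Invalid station IDs: {invalid}")
    return list(stids)
```

Validating all IDs before any work starts means a single mistyped ID is reported immediately instead of failing midway through a long update.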
- update_ma_timeseries(kind, stids='all', do_mp=False, **kwargs)
Update the multi annual timeseries values for the stations.
Get the multi annual value from the corresponding timeseries and save it to the database.
- Parameters:
kind (str or list of str) – The timeseries data kind(s) whose multi annual value should be updated. Must be a column in the timeseries DB, i.e. one of “raw”, “qc”, “filled”. For precipitation, “corr” is also valid.
stids (string or list of int, optional) – The Stations for which to compute. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.
do_mp (bool, optional) – Should the method run in multiprocessing mode? If False, the methods are called in threading mode. Multiprocessing needs more memory and a bit more start-up time. It is therefore only useful for methods with a lot of computation in the Python code. If most of a method's computation is done in the PostgreSQL database, threading is enough to speed up the process. The default is False.
**kwargs (dict, optional) – The additional keyword arguments for the _run_method and get_stations methods.
- Raises:
ValueError – If the given stids (Station_IDs) are not all valid.
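The rule for kind (a single string or a list, limited to “raw”, “qc”, “filled”, plus “corr” for precipitation) can be sketched like this. normalize_kinds, BASE_KINDS and the is_precipitation flag are hypothetical, and raising a ValueError for an unknown kind is an assumption of the sketch, not documented behaviour.

```python
BASE_KINDS = ("raw", "qc", "filled")


def normalize_kinds(kind, is_precipitation=False):
    """Return kind as a list and check it against the allowed values."""
    kinds = [kind] if isinstance(kind, str) else list(kind)
    # "corr" (Richter corrected) only exists for precipitation
    allowed = BASE_KINDS + (("corr",) if is_precipitation else ())
    invalid = [k for k in kinds if k not in allowed]
    if invalid:
        raise ValueError(f"Unknown kind(s) {invalid}; allowed: {allowed}")
    return kinds
```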
- update_meta(stids='all', **kwargs)
Update the meta table by comparing to the CDC server.
The “von_datum” and “bis_datum” fields are ignored, because it is better to set them from the filled period of the stations in the database. The period given by the CDC is often incorrect.
- update_period_meta(stids='all', **kwargs)
Update the period in the meta table of the raw data.
- Parameters:
- Raises:
ValueError – If the given stids (Station_IDs) are not all valid.
- update_raw(only_new=True, only_real=True, stids='all', remove_nas=True, do_mp=True, **kwargs)
Download all stations data from CDC and upload to database.
- Parameters:
only_new (bool, optional) – Get only the files that are not yet in the database? If False, all the available files are loaded again. The default is True.
only_real (bool, optional) – Whether to try downloading only real stations. If True, only stations with a date in raw_from in the meta table are downloaded. The default is True.
stids (string or list of int, optional) – The Stations to return. Can either be “all”, for all possible stations or a list with the Station IDs. The default is “all”.
do_mp (bool, optional) – Should the method run in multiprocessing mode? If False, the methods are called in threading mode. Multiprocessing needs more memory and a bit more start-up time. It is therefore only useful for methods with a lot of computation in the Python code. If most of a method's computation is done in the PostgreSQL database, threading is enough to speed up the process. The default is True.
remove_nas (bool, optional) – Remove the NAs from the downloaded data before uploading it to the database. This has computational advantages. The default is True.
**kwargs (dict, optional) – The additional keyword arguments for the _run_method and get_stations methods.
- Raises:
ValueError – If the given stids (Station_IDs) are not all valid.
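What remove_nas amounts to can be shown on plain Python data. This is a sketch under assumptions: drop_missing is a hypothetical helper, and the rows are made-up (timestamp, value) pairs rather than the structure the library actually downloads.

```python
import math


def drop_missing(rows):
    """Keep only (timestamp, value) rows that carry a real value."""
    return [
        (ts, value)
        for ts, value in rows
        if value is not None and not math.isnan(value)
    ]


rows = [
    ("2020-01-01 00:00", 0.1),
    ("2020-01-01 00:10", None),
    ("2020-01-01 00:20", float("nan")),
]
cleaned = drop_missing(rows)
```

Dropping the empty rows before the upload means fewer rows have to be written to the database.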