station

StationN

class weatherDB.station.StationN(id, **kwargs)[source]

Bases: StationNBase

A class to work with and download 10-minute precipitation data for one station.

Create a Station object.

Parameters:
  • id (int) – The station's ID.

  • _skip_meta_check (bool, optional) – Whether to skip the check that the station is in the database meta table. Use this with care, because skipping the check can lead to problems. The option exists for performance reasons, as it makes the initialization faster. It is used by the stations classes, because they only initialize objects that are in the meta table. The default is False.

Raises:

NotImplementedError – _description_

Public Methods:

__init__(id, **kwargs)

Create a Station object.

update_horizon([skip_if_exist])

Update the horizon angle (Horizontabschirmung) in the meta table.

update_richter_class([skip_if_exist])

Update the richter class in the meta table.

richter_correct([period])

Do the richter correction on the filled data for the given period.

corr(*args, **kwargs)

last_imp_richter_correct([_last_imp_period])

Do the richter correction of the last import.

last_imp_corr([_last_imp_period])

A wrapper for last_imp_richter_correct().

fillup([period])

Fill up missing data with measurements from nearby stations.

get_corr(**kwargs)

get_qn(**kwargs)

get_richter_class([update_if_fails])

Get the richter class for this station.

get_horizon()

Get the value for the horizon angle.

Inherited from StationNBase

get_adj(**kwargs)

Get the adjusted timeseries.

Inherited from StationBase

__init__(id[, _skip_meta_check])

Create a Station object.

isin_db()

Check if Station is already in a timeseries table.

isin_meta()

Check if Station is already in the meta table.

isin_ma()

Check if Station is already in the multi annual table.

is_virtual()

Check if the station is a real station or only a virtual one.

is_real()

Check if the station is a real station or only a virtual one.

is_last_imp_done(kind)

Check whether the last import for the given kind has already been processed.

update_period_meta(kind)

Update the time period in the meta file.

update_ma([skip_if_exist, drop_when_error])

Update the multi annual values in the stations_raster_values table.

update_raw([only_new, ftp_file_list, remove_nas])

Download data from CDC and upload to database.

get_zipfiles([only_new, ftp_file_list])

Get the zipfiles on the CDC server with the raw data.

download_raw([only_new])

Download the timeseries from the CDC server.

quality_check([period])

Quality check the raw data for a given period.

fillup([period])

Fill up missing data with measurements from nearby stations.

last_imp_quality_check()

Do the quality check of the last import.

last_imp_qc()

last_imp_fillup([_last_imp_period])

Do the gap filling of the last import.

get_meta_explanation([infos])

Get the explanations of the available meta fields.

get_meta([infos])

Get Information from the meta table.

get_geom([format, crs])

Get the point geometry of the station.

get_geom_shp([crs])

Get the geometry of the station as a shapely Point object.

get_name()

count_holes([weeks, kind, period, ...])

Count holes in timeseries depending on their length.

get_period_meta(kind[, all])

Get a specific period from the meta information table.

get_filled_period(kind[, from_meta])

Get the min and max Timestamp for which there is data in the corresponding timeseries.

get_max_period(kinds[, nas_allowed])

Get the maximum available period for this station's timeseries.

get_last_imp_period([all])

Get the last imported Period for this Station.

get_neighboor_stids([n, only_real, p_elev, ...])

Get a list with Station Ids of the nearest neighbor stations.

get_multi_annual()

Get the multi annual value(s) for this station.

get_ma()

get_raster_value(raster)

get_coef(other_stid[, in_db_unit])

Get the regionalisation coefficients due to the height.

get_df(kinds[, period, agg_to, nas_allowed, ...])

Get a timeseries DataFrame from the database.

get_raw(**kwargs)

Get the raw timeseries.

get_qc(**kwargs)

Get the quality checked timeseries.

get_dist([period])

Get the timeseries with the information from which station the data got filled and the corresponding distance to this station.

get_filled([period, with_dist])

Get the filled timeseries.

get_adj(**kwargs)

Get the adjusted timeseries.

plot([period, kind, agg_to])

Plot the data of this station.


corr(*args, **kwargs)[source]

count_holes(weeks=[2, 4, 8, 12, 16, 20, 24], kind='qc', period=(None, None), between_meta_period=True, crop_period=False, **kwargs)

Count holes in timeseries depending on their length.

Parameters:
  • weeks (list, optional) – A list of hole lengths, in weeks, to count. Every hole longer than the specified number of weeks is counted. The default is [2, 4, 8, 12, 16, 20, 24].

  • kind (str) – The kind of the timeseries to analyze. Should be one of [‘raw’, ‘qc’, ‘filled’]. For N also “corr” is possible. Normally only “raw” and “qc” make sense, because the other timeseries should not have holes.

  • period (TimestampPeriod or (tuple or list of datetime.datetime or None), optional) – The minimum and maximum Timestamp for which to analyze the timeseries. If None is given, the maximum and minimal possible Timestamp is taken. The default is (None, None).

  • between_meta_period (bool, optional) – Only check between the respective period that is defined in the meta table. If “qc” is chosen as kind, then the “raw” meta period is taken. The default is True.

  • crop_period (bool, optional) – Whether to crop the period to the maximum filled period. This results in holes being ignored when they are at the beginning or at the end of the timeseries. If period = (None, None) is given, then this parameter is set to True. The default is False.

Returns:

A pandas DataFrame with station_id as index and one column per entry in weeks. The values are the number of NA periods longer than the respective number of weeks.

Return type:

pandas.DataFrame

Raises:

ValueError – If the input parameters were not correct.
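The counting rule described for count_holes can be illustrated without a database connection. The following is only a minimal sketch in plain Python: the helper count_holes_sketch is hypothetical and not part of weatherDB, missing values are represented as None, and "longer than" is assumed to mean strictly longer than the given number of weeks.

```python
from datetime import timedelta


def count_holes_sketch(values, step=timedelta(minutes=10), weeks=(2, 4, 8)):
    """Count runs of missing values (None) that are longer than each threshold.

    Hypothetical helper for illustration only; not part of weatherDB.
    """
    # Collect the length (in time steps) of every consecutive run of None.
    runs, current = [], 0
    for value in values:
        if value is None:
            current += 1
        else:
            if current:
                runs.append(current)
            current = 0
    if current:
        runs.append(current)

    # Assumption: a run counts if its duration strictly exceeds the threshold.
    return {
        w: sum(1 for n in runs if n * step > timedelta(weeks=w))
        for w in weeks
    }


# Invented daily series: a 20-day hole, one valid value, then a 30-day hole.
series = [1.0] + [None] * 20 + [2.0] + [None] * 30 + [3.0]
counts = count_holes_sketch(series, step=timedelta(days=1), weeks=(2, 4))
```

Both holes exceed two weeks, but only the 30-day hole exceeds four weeks, so the same hole can be counted in several columns.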

download_raw(only_new=False)

Download the timeseries from the CDC server.

This function only returns the timeseries; it does not update the database.

Parameters:

only_new (bool, optional) – Whether to get only the files that are not yet in the database. If False, all the available files are loaded again. The default is False.

Returns:

The timeseries as a DataFrame with a Timestamp index.

Return type:

pandas.DataFrame

fillup(period=(None, None), **kwargs)[source]

Fill up missing data with measurements from nearby stations.

Parameters:
  • period (util.TimestampPeriod or (tuple or list of datetime.datetime or None), optional) – The minimum and maximum Timestamp for which to gap fill the timeseries. If None is given, the maximum or minimal possible Timestamp is taken. The default is (None, None).

  • kwargs (dict, optional) – Additional arguments for the fillup function, e.g. p_elev to consider the elevation when selecting the nearest stations (only for T and ET).

get_adj(**kwargs)

Get the adjusted timeseries.

The timeseries gets adjusted to match the multi-annual value over the given period. So the yearly variability is kept and only the whole period is adjusted.

The basis for the adjusted timeseries is the filled data and not the Richter-corrected data, as the multi-annual (ma) values are also uncorrected values.

Returns:

The adjusted timeseries with the timestamp as index.

Return type:

pd.DataFrame
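The idea of adjusting a whole period while keeping the year-to-year variability can be sketched as follows. This is an illustration under assumptions, not the library's implementation: it assumes a single multiplicative factor applied to yearly precipitation sums, whereas the real method may, for example, treat the winter and summer half-years separately. The function name and the numbers are invented.

```python
def adjust_to_multi_annual(yearly_sums, multi_annual):
    """Scale yearly sums so that their mean equals the multi-annual value.

    Hypothetical helper for illustration only; not part of weatherDB.
    """
    mean_sum = sum(yearly_sums) / len(yearly_sums)
    # One factor for the whole period keeps the relative yearly variability.
    factor = multi_annual / mean_sum
    return [value * factor for value in yearly_sums]


# Invented yearly precipitation sums of the filled data, in mm.
filled = [700.0, 900.0, 800.0]
adjusted = adjust_to_multi_annual(filled, multi_annual=880.0)
```

After the adjustment the mean of the yearly sums equals the multi-annual value, while the ratio between any two years is unchanged.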

get_coef(other_stid, in_db_unit=False)

Get the regionalisation coefficients due to the height.

Those are the values from the dwd grid, HYRAS or REGNIE grids.

Parameters:
  • other_stid (int) – The Station Id of the other station from which to regionalise for the own station.

  • in_db_unit (bool, optional) – Should the coefficients be returned in the unit as stored in the database? This is only relevant for the temperature. The default is False.

Returns:

A list of coefficients. For T, ET and N-daily only the yearly coefficient is returned. For N the winter and summer half-yearly coefficients are returned as a tuple. None is returned if either the own or the other station's multi-annual value is not available.

Return type:

list of floats or None

get_corr(**kwargs)[source]

get_df(kinds, period=(None, None), agg_to=None, nas_allowed=True, add_na_share=False, db_unit=False, sql_add_where=None, **kwargs)

Get a timeseries DataFrame from the database.

Parameters:
  • kinds (str or list of str) – The data kinds to return. Must be a column in the timeseries DB. Must be one of “raw”, “qc”, “filled”, “adj”, “filled_by”, “filled_share”. For the precipitation also “qn” and “corr” are valid. If “filled_by” is given together with an aggregation step, “filled_by” is replaced by “filled_share”, which gives the share of filled values in the aggregation group in percent.

  • period (TimestampPeriod or (tuple or list of datetime.datetime or None), optional) – The minimum and maximum Timestamp for which to get the timeseries. If None is given, the maximum or minimal possible Timestamp is taken. The default is (None, None).

  • agg_to (str or None, optional) – Aggregate to a given timespan. If more than 20% of the values in an aggregation group are missing, the aggregated value is None. Can be anything smaller than the maximum timespan of the saved data. If a timeperiod smaller than the saved data is given, then the maximum possible timeperiod is returned. For T and ET it can be “month”, “year”. For N it can also be “hour”. If None, then the maximum timeperiod is taken. The default is None.

  • nas_allowed (bool, optional) – Should NAs be allowed? If True, then the maximum possible period is returned, even if there are NAs in the timeseries. If False, then the minimal filled period is returned. The default is True.

  • add_na_share (bool, optional) – Whether to add one or several columns to the DataFrame with the share of NAs in the data. This is especially important when the station's data gets aggregated, because the aggregation doesn’t make sense if there are a lot of NAs in the original data. If True, one column per requested kind is added with the respective share of NAs, if the aggregation step is not the smallest. The “kind”_na_share column is in percent. The default is False.

  • db_unit (bool, optional) – Whether the result should be in the database unit. If False, the values are converted to the normal unit, like mm or °C. The numbers are saved as integers in the database and were therefore multiplied by 10 or 100 to get an integer. The default is False.

  • sql_add_where (str or None, optional) – An additional SQL WHERE statement to filter the output, e.g. “EXTRACT(MONTH FROM timestamp) = 2”. The default is None.

Returns:

The timeseries DataFrame with a DatetimeIndex.

Return type:

pandas.DataFrame
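The 20% rule documented for agg_to can be made concrete with a small sketch. This is not the library's code (the aggregation runs in the database); the helper aggregate_group is hypothetical, missing values are represented as None, and a sum is assumed as the aggregation function, as would be natural for precipitation.

```python
def aggregate_group(values, max_na_share=0.2):
    """Aggregate one group, or return None if too many values are missing.

    Hypothetical helper for illustration only; not part of weatherDB.
    """
    n_missing = sum(1 for value in values if value is None)
    # More than 20% missing values: the aggregated value is None.
    if n_missing / len(values) > max_na_share:
        return None
    # Assumption: sum of the available values (e.g. precipitation in mm).
    return sum(value for value in values if value is not None)


mostly_filled = aggregate_group([1.0, None, 1.0, 1.0, 1.0, 1.0])  # 1 of 6 missing
too_sparse = aggregate_group([1.0, None, None, 1.0])  # 2 of 4 missing
```

With add_na_share=True the documented method additionally reports the share of NAs per kind, which lets the caller judge aggregated values that pass this threshold.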

get_dist(period=(None, None))

Get the timeseries with the information from which station the data got filled and the corresponding distance to this station.

Parameters:

period (TimestampPeriod or (tuple or list of datetime.datetime or None), optional) – The minimum and maximum Timestamp for which to get the timeseries. If None is given, the maximum or minimal possible Timestamp is taken. The default is (None, None).

Returns:

The timeseries for this station and the given period, with the station_id of the station from which the data got filled and the distance to it in meters.

Return type:

pd.DataFrame

get_filled(period=(None, None), with_dist=False, **kwargs)

Get the filled timeseries.

Either only the timeseries is returned or also the id of the station from which the station data got filled, together with the distance to this station in m.

Parameters:
  • period (TimestampPeriod or (tuple or list of datetime.datetime or None), optional) – The minimum and maximum Timestamp for which to get the timeseries. If None is given, the maximum or minimal possible Timestamp is taken. The default is (None, None).

  • with_dist (bool, optional) – Should the distance to the stations from which the timeseries got filled be added. The default is False.

Returns:

The filled timeseries for this station and the given period.

Return type:

pd.DataFrame

get_filled_period(kind, from_meta=False)

Get the min and max Timestamp for which there is data in the corresponding timeseries.

Computes the period from the timeseries or meta table.

Parameters:
  • kind (str) – The data kind to look for filled period. Must be a column in the timeseries DB. Must be one of “raw”, “qc”, “filled”, “adj”. If “best” is given, then depending on the parameter of the station the best kind is selected. For Precipitation this is “corr” and for the other this is “filled”. For the precipitation also “qn” and “corr” are valid.

  • from_meta (bool, optional) – Should the period be taken from the meta table? If True, this function is only a wrapper for .get_period_meta. If False, the period is computed from the timeseries. The default is False.

Raises:
  • NotImplementedError – If the given kind is not valid.

  • ValueError – If the given kind is not a string.

Returns:

A TimestampPeriod of the filled timeseries. (NaT, NaT) if the timeseries is all empty or not defined.

Return type:

util.TimestampPeriod

get_geom(format='EWKT', crs=None)

Get the point geometry of the station.

Parameters:
  • format (str or None, optional) – The format of the geometry to return. Needs to be a format that is understood by PostgreSQL, i.e. a corresponding ST_AsXXXXX function needs to exist. If None, then the binary representation is returned. The default is “EWKT”.

  • crs (str, int or None, optional) – If None, then the geometry is returned in WGS84 (EPSG:4326). If string, then it should be one of “WGS84” or “UTM”. If int, then it should be the EPSG code.

Returns:

string or bytes representation of the geometry, depending on the selected format.

Return type:

str or bytes

get_geom_shp(crs=None)

Get the geometry of the station as a shapely Point object.

Parameters:

crs (str, int or None, optional) – If None, then the geometry is returned in WGS84 (EPSG:4326). If string, then it should be one of “WGS84” or “UTM”. If int, then it should be the EPSG code.

Returns:

The location of the station as shapely Point.

Return type:

shapely.geometry.Point

get_horizon()[source]

Get the value for the horizon angle (Horizontabschirmung).

This value is defined by Richter (1995) as the mean horizon angle in the west direction: H’ = 0.15·H(S-SW) + 0.35·H(SW-W) + 0.35·H(W-NW) + 0.15·H(NW-N)

Returns:

The mean western horizon angle

Return type:

float or None
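The weighting in the Richter (1995) definition quoted above can be written out directly. The sketch below only evaluates that formula for four given sector angles; the function name and the input values are invented for illustration, and the way weatherDB derives the sector angles from its rasters is not shown here.

```python
def mean_western_horizon(h_s_sw, h_sw_w, h_w_nw, h_nw_n):
    """Weighted mean horizon angle in the west direction, after Richter (1995).

    Hypothetical helper for illustration only; not part of weatherDB.
    All angles are in degrees.
    """
    # The two sectors adjacent to west carry the larger weights.
    return 0.15 * h_s_sw + 0.35 * h_sw_w + 0.35 * h_w_nw + 0.15 * h_nw_n


# Invented sector angles: obstruction only in the SW-W and W-NW sectors.
horizon = mean_western_horizon(0.0, 20.0, 20.0, 0.0)
```

Because the weights sum to 1, a station with the same horizon angle in all four sectors gets exactly that angle back.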

get_last_imp_period(all=False)

Get the last imported Period for this Station.

Parameters:

all (bool, optional) – Should the maximum Timespan for all the last imports be returned. If False only the period for this station is returned. The default is False.

Returns:

(minimal datetime, maximal datetime)

Return type:

TimestampPeriod or tuple of datetime.datetime

get_ma()

get_max_period(kinds, nas_allowed=False, **kwargs)

Get the maximum available period for this station's timeseries.

If nas_allowed is True, then the maximum range of the timeseries is returned. Otherwise the minimal filled period is returned.

Parameters:
  • kinds (str or list of str) – The data kinds to consider. Must be a column in the timeseries DB. Must be one of “raw”, “qc”, “filled”, “adj”. For the precipitation also “qn” and “corr” are valid.

  • nas_allowed (bool, optional) – Should NAs be allowed? If True, then the maximum possible period is returned, even if there are NAs in the timeserie. If False, then the minimal filled period is returned. The default is False.

Returns:

The maximum Timestamp Period

Return type:

utils.TimestampPeriod
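One way to read the effect of nas_allowed when several kinds are requested is as a union versus an intersection of the filled periods of the individual kinds. The sketch below follows that reading, which is an assumption and not confirmed by this reference; it uses plain (start, end) tuples instead of TimestampPeriod, and the function name is invented.

```python
from datetime import datetime


def max_period_sketch(periods, nas_allowed=False):
    """Combine the filled periods of several kinds into one period.

    Hypothetical helper for illustration only; not part of weatherDB.
    """
    starts = [start for start, _ in periods]
    ends = [end for _, end in periods]
    if nas_allowed:
        # Maximum range: some kinds will contain NAs at the edges.
        return (min(starts), max(ends))
    # Minimal filled period: every kind has data throughout.
    return (max(starts), min(ends))


# Invented filled periods for two kinds of the same station.
periods = [
    (datetime(2000, 1, 1), datetime(2020, 1, 1)),
    (datetime(2005, 1, 1), datetime(2022, 1, 1)),
]
widest = max_period_sketch(periods, nas_allowed=True)
common = max_period_sketch(periods, nas_allowed=False)
```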

get_meta(infos='all')

Get Information from the meta table.

Parameters:

infos (list of str or str, optional) – A list of the information to get from the database. If “all”, then all the information is returned. The default is “all”.

Returns:

dict with the meta information. The first level has one entry per parameter. The second level has one entry per information, asked for. If only one information is asked for, then it is returned as single value and not as subdict.

Return type:

dict or int/string

classmethod get_meta_explanation(infos='all')

Get the explanations of the available meta fields.

Parameters:

infos (list or string, optional) – The infos you wish to get an explanation for. If “all” then all the available information get returned. The default is “all”

Returns:

a pandas Series with the information names as index and the explanation as values.

Return type:

pd.Series

get_multi_annual()

Get the multi annual value(s) for this station.

Returns:

The corresponding multi annual value. For T and ET the yearly value is returned. For N the winter and summer half-yearly sums are returned as a tuple. The returned unit is mm or °C.

Return type:

list or number

get_name()

get_neighboor_stids(n=5, only_real=True, p_elev=None, period=None, **kwargs)

Get a list with Station Ids of the nearest neighbor stations.

Parameters:
  • n (int, optional) – The number of stations to return. If None, then all the possible stations are returned. The default is 5.

  • only_real (bool, optional) – Should only real stations get considered? If False, virtual stations are also part of the result. The default is True.

  • p_elev (tuple of float or None, optional) – The parameters (P_1, P_2) to weight the height differences between stations. The elevation difference is considered with the formula from LARSIM (equation 3-18 & 3-19 from the LARSIM manual): $L_{gewichtet} = L_{horizontal} * (1 + (\frac{|\delta H|}{P_1})^{P_2})$ If None, then the height difference is not considered and only the nearest stations are returned. The default is None.

  • period (utils.TimestampPeriod or None, optional) – The period for which the nearest neighbors are returned. The neighbor station needs to have raw data for at least one half of the period. If None, then the availability of the data is not checked. The default is None.

Returns:

A list of station Ids in order of distance. The closest station is the first in the list.

Return type:

list of int
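The LARSIM weighting used for p_elev can be evaluated by hand to see how an elevation difference stretches the horizontal distance. The sketch below only implements the quoted formula; the function name and the candidate stations are invented, and the values (250, 1.5) are the p_elev default documented for the fillup method of the temperature station class.

```python
def weighted_distance(l_horizontal, delta_h, p_1, p_2):
    """LARSIM-style distance weighted by the elevation difference.

    Hypothetical helper for illustration only; not part of weatherDB.
    Distances and elevation differences are in meters.
    """
    return l_horizontal * (1 + (abs(delta_h) / p_1) ** p_2)


# Invented candidates: (station id, horizontal distance, elevation difference).
candidates = [(1, 1000.0, 250.0), (2, 1500.0, 0.0), (3, 3000.0, -50.0)]

# Rank the candidates by weighted distance instead of horizontal distance.
ranked = sorted(
    candidates,
    key=lambda c: weighted_distance(c[1], c[2], p_1=250.0, p_2=1.5),
)
nearest_ids = [station_id for station_id, _, _ in ranked]
```

Station 1 is the closest horizontally, but its 250 m elevation difference doubles its weighted distance to 2000 m, so station 2 at the same elevation ranks first.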

get_period_meta(kind, all=False)

Get a specific period from the meta information table.

This function returns the information from the meta table. In this table there are several periods saved, like the period of the last import.

Parameters:
  • kind (str) – The kind of period to return. Should be one of [‘filled’, ‘raw’, ‘last_imp’]. filled: the maximum filled period of the filled timeserie. raw: the maximum filled timeperiod of the raw data. last_imp: the maximum filled timeperiod of the last import.

  • all (bool, optional) – Should the maximum Timespan for all the filled periods be returned. If False only the period for this station is returned. The default is False.

Returns:

The TimestampPeriod of the station or of all the stations if all=True.

Return type:

TimestampPeriod

Raises:

ValueError – If a wrong kind is handed in.

get_qc(**kwargs)

Get the quality checked timeseries.

Parameters:

kwargs (dict, optional) – The keyword arguments get passed to the get_df function. Possible parameters are “period”, “agg_to” or “nas_allowed”.

Returns:

The quality checked timeseries for this station and the given period.

Return type:

pd.DataFrame

get_qn(**kwargs)[source]

get_raster_value(raster)

get_raw(**kwargs)

Get the raw timeseries.

Parameters:

kwargs (dict, optional) – The keyword arguments get passed to the get_df function. Possible parameters are “period”, “agg_to” or “nas_allowed”.

Returns:

The raw timeseries for this station and the given period.

Return type:

pd.DataFrame

get_richter_class(update_if_fails=True)[source]

Get the richter class for this station.

Provide the data from the meta table.

Parameters:

update_if_fails (bool, optional) – Whether to update the Richter class if no exposition class is found in the meta table. If False and no exposition class was found, None is returned. The default is True.

Returns:

The corresponding richter exposition class.

Return type:

string

get_zipfiles(only_new=True, ftp_file_list=None)

Get the zipfiles on the CDC server with the raw data.

Parameters:
  • only_new (bool, optional) – Get only the files that are not yet in the database? If False all the available files are loaded again. The default is True

  • ftp_file_list (list of (strings, datetime), optional) – A list of files on the FTP server together with their modification time. If None, then the list is fetched from the server. The default is None

Returns:

A DataFrame of zipfiles and the corresponding modification time on the CDC server to import.

Return type:

pandas.DataFrame or None

is_last_imp_done(kind)

Check whether the last import for the given kind has already been processed.

Parameters:

kind (str) – The data kind to look for filled period. Must be a column in the timeseries DB. Must be one of “raw”, “qc”, “filled”, “adj”, “best”. If “best” is given, then depending on the parameter of the station the best kind is selected. For Precipitation this is “corr” and for the other this is “filled”. For the precipitation also “qn” and “corr” are valid.

Returns:

True if the last import of the given kind has already been processed.

Return type:

bool

is_real()

Check if the station is a real station or only a virtual one.

Real means that the DWD is measuring here. Virtual means, that there are no measurements here, but the station got created to have timeseries for every parameter for every precipitation station.

Returns:

true if the station is real, false if it is virtual.

Return type:

bool

is_virtual()

Check if the station is a real station or only a virtual one.

Real means that the DWD is measuring here. Virtual means, that there are no measurements here, but the station got created to have timeseries for every parameter for every precipitation station.

Returns:

true if the station is virtual, false if it is real.

Return type:

bool

isin_db()

Check if Station is already in a timeseries table.

Returns:

True if Station has a table in DB, no matter if it is filled or not.

Return type:

bool

isin_ma()

Check if Station is already in the multi annual table.

Returns:

True if Station is in multi annual table.

Return type:

bool

isin_meta()

Check if Station is already in the meta table.

Returns:

True if Station is in meta table.

Return type:

bool

last_imp_corr(_last_imp_period=None)[source]

A wrapper for last_imp_richter_correct().

last_imp_fillup(_last_imp_period=None)

Do the gap filling of the last import.

last_imp_qc()

last_imp_quality_check()

Do the quality check of the last import.

last_imp_richter_correct(_last_imp_period=None)[source]

Do the richter correction of the last import.

Parameters:

_last_imp_period (_type_, optional) – The overall period of the last import. This is only for internal use by the stationsN method, to avoid computing the period over and over again. The default is None.

plot(period=(None, None), kind='filled', agg_to=None, **kwargs)

Plot the data of this station.

Parameters:
  • period (TimestampPeriod or (tuple or list of datetime.datetime or None), optional) – The minimum and maximum Timestamp for which to get the timeseries. If None is given, the maximum or minimal possible Timestamp is taken. The default is (None, None).

  • kind (str, optional) – The data kind to plot. Must be a column in the timeseries DB. Must be one of “raw”, “qc”, “filled”, “adj”. For the precipitation also “qn” and “corr” are valid. The default is “filled”.

  • agg_to (str or None, optional) – Aggregate to a given timespan. Can be anything smaller than the maximum timespan of the saved data. If a timeperiod smaller than the saved data is given, then the maximum possible timeperiod is returned. For T and ET it can be “month”, “year”. For N it can also be “hour”. If None, then the maximum timeperiod is taken. The default is None.

quality_check(period=(None, None), **kwargs)

Quality check the raw data for a given period.

Parameters:

period (util.TimestampPeriod or (tuple or list of datetime.datetime or None), optional) – The minimum and maximum Timestamp for which to get the timeseries. If None is given, the maximum or minimal possible Timestamp is taken. The default is (None, None).

richter_correct(period=(None, None), **kwargs)[source]

Do the richter correction on the filled data for the given period.

Parameters:

period (TimestampPeriod or (tuple or list of datetime.datetime or None), optional) – The minimum and maximum Timestamp for which to get the timeseries. If None is given, the maximum or minimal possible Timestamp is taken. The default is (None, None).

Raises:

Exception – If no richter class was found for this station.

update_horizon(skip_if_exist=True)[source]

Update the horizon angle (Horizontabschirmung) in the meta table.

Get new values from the raster and put in the table.

Parameters:

skip_if_exist (bool, optional) – Skip updating the value if there is already a value in the meta table. The default is True.

Returns:

The horizon angle in degrees (Horizontabschirmung).

Return type:

float

update_ma(skip_if_exist=True, drop_when_error=True)

Update the multi annual values in the stations_raster_values table.

Get new values from the raster and put in the table.

update_period_meta(kind)

Update the time period in the meta file.

Compute the filled period of a timeseries and save it in the meta table.

Parameters:

kind (str) – The data kind to look for filled period. Must be a column in the timeseries DB. Must be one of “raw”, “qc”, “filled”. If “best” is given, then depending on the parameter of the station the best kind is selected. For Precipitation this is “corr” and for the other this is “filled”. For the precipitation “corr” is also valid.

update_raw(only_new=True, ftp_file_list=None, remove_nas=True)

Download data from CDC and upload to database.

Parameters:
  • only_new (bool, optional) – Get only the files that are not yet in the database? If False all the available files are loaded again. The default is True

  • ftp_file_list (list of (strings, datetime), optional) – A list of files on the FTP server together with their modification time. If None, then the list is fetched from the server. The default is None

  • remove_nas (bool, optional) – Remove the NAs from the downloaded data before updating it to the database. This has computational advantages. The default is True.

Returns:

The raw Dataframe of the Stations data.

Return type:

pandas.DataFrame

update_richter_class(skip_if_exist=True)[source]

Update the richter class in the meta table.

Get new values from the raster and put in the table.

Parameters:

skip_if_exist (bool, optional) – Skip updating the value if there is already a value in the meta table. The default is True

Returns:

The richter class name.

Return type:

str

StationT

class weatherDB.station.StationT(id, **kwargs)[source]

Bases: StationTETBase

A class to work with and download temperature data for one station.

Create a Station object.

Parameters:
  • id (int) – The station's ID.

  • _skip_meta_check (bool, optional) – Whether to skip the check that the station is in the database meta table. Use this with care, because skipping the check can lead to problems. The option exists for performance reasons, as it makes the initialization faster. It is used by the stations classes, because they only initialize objects that are in the meta table. The default is False.

Raises:

NotImplementedError – _description_

Public Methods:

__init__(id, **kwargs)

Create a Station object.

get_multi_annual()

Get the multi annual value(s) for this station.

get_adj(**kwargs)

Get the adjusted timeseries.

Inherited from StationTETBase

get_neighboor_stids([p_elev])

Get the 5 nearest stations to this station.

fillup([p_elev])

Set the default P values.

get_adj(**kwargs)

Get the adjusted timeseries.

Inherited from StationCanVirtualBase

isin_meta_n()

Check if Station is in the precipitation meta table.

quality_check([period])

Quality check the raw data for a given period.

Inherited from StationBase

__init__(id[, _skip_meta_check])

Create a Station object.

isin_db()

Check if Station is already in a timeseries table.

isin_meta()

Check if Station is already in the meta table.

isin_ma()

Check if Station is already in the multi annual table.

is_virtual()

Check if the station is a real station or only a virtual one.

is_real()

Check if the station is a real station or only a virtual one.

is_last_imp_done(kind)

Check whether the last import for the given kind has already been processed.

update_period_meta(kind)

Update the time period in the meta file.

update_ma([skip_if_exist, drop_when_error])

Update the multi annual values in the stations_raster_values table.

update_raw([only_new, ftp_file_list, remove_nas])

Download data from CDC and upload to database.

get_zipfiles([only_new, ftp_file_list])

Get the zipfiles on the CDC server with the raw data.

download_raw([only_new])

Download the timeseries from the CDC server.

quality_check([period])

Quality check the raw data for a given period.

fillup([period])

Fill up missing data with measurements from nearby stations.

last_imp_quality_check()

Do the quality check of the last import.

last_imp_qc()

last_imp_fillup([_last_imp_period])

Do the gap filling of the last import.

get_meta_explanation([infos])

Get the explanations of the available meta fields.

get_meta([infos])

Get Information from the meta table.

get_geom([format, crs])

Get the point geometry of the station.

get_geom_shp([crs])

Get the geometry of the station as a shapely Point object.

get_name()

count_holes([weeks, kind, period, ...])

Count holes in timeseries depending on their length.

get_period_meta(kind[, all])

Get a specific period from the meta information table.

get_filled_period(kind[, from_meta])

Get the min and max Timestamp for which there is data in the corresponding timeseries.

get_max_period(kinds[, nas_allowed])

Get the maximum available period for this station's timeseries.

get_last_imp_period([all])

Get the last imported Period for this Station.

get_neighboor_stids([n, only_real, p_elev, ...])

Get a list with Station Ids of the nearest neighbor stations.

get_multi_annual()

Get the multi annual value(s) for this station.

get_ma()

get_raster_value(raster)

get_coef(other_stid[, in_db_unit])

Get the regionalisation coefficients due to the height.

get_df(kinds[, period, agg_to, nas_allowed, ...])

Get a timeseries DataFrame from the database.

get_raw(**kwargs)

Get the raw timeseries.

get_qc(**kwargs)

Get the quality checked timeseries.

get_dist([period])

Get the timeseries with the information from which station the data got filled and the corresponding distance to this station.

get_filled([period, with_dist])

Get the filled timeseries.

get_adj(**kwargs)

Get the adjusted timeseries.

plot([period, kind, agg_to])

Plot the data of this station.


count_holes(weeks=[2, 4, 8, 12, 16, 20, 24], kind='qc', period=(None, None), between_meta_period=True, crop_period=False, **kwargs)

Count holes in timeseries depending on their length.

Parameters:
  • weeks (list, optional) – A list of hole lengths, in weeks, to count. Every hole longer than the specified number of weeks is counted. The default is [2, 4, 8, 12, 16, 20, 24].

  • kind (str) – The kind of the timeseries to analyze. Should be one of [‘raw’, ‘qc’, ‘filled’]. For N also “corr” is possible. Normally only “raw” and “qc” make sense, because the other timeseries should not have holes.

  • period (TimestampPeriod or (tuple or list of datetime.datetime or None), optional) – The minimum and maximum Timestamp for which to analyze the timeseries. If None is given, the maximum and minimal possible Timestamp is taken. The default is (None, None).

  • between_meta_period (bool, optional) – Only check between the respective period that is defined in the meta table. If “qc” is chosen as kind, then the “raw” meta period is taken. The default is True.

  • crop_period (bool, optional) – Whether to crop the period to the maximum filled period. This results in holes being ignored when they are at the beginning or at the end of the timeseries. If period = (None, None) is given, then this parameter is set to True. The default is False.

Returns:

A pandas DataFrame with station_id as index and one column per entry in weeks. The values are the number of NA periods longer than the respective number of weeks.

Return type:

pandas.DataFrame

Raises:

ValueError – If the input parameters were not correct.
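The counting logic of count_holes can be illustrated with a minimal, standard-library sketch. It is not weatherDB's implementation, which works on the database. The function name count_holes_sketch and the list-of-tuples input are assumptions made for illustration. A hole is measured here from its first missing timestamp to the next valid one.

```python
from datetime import datetime, timedelta


def count_holes_sketch(series, weeks=(2, 4, 8, 12, 16, 20, 24)):
    """Count gaps longer than each threshold (hypothetical helper).

    series: list of (timestamp, value) pairs sorted by time,
            where a value of None marks a missing measurement.
    """
    hole_lengths = []
    hole_start = None
    last_ts = None
    for ts, value in series:
        if value is None:
            if hole_start is None:
                hole_start = ts  # a new hole begins
        elif hole_start is not None:
            # the hole ends at the next valid value
            hole_lengths.append(ts - hole_start)
            hole_start = None
        last_ts = ts
    if hole_start is not None:
        # the hole runs until the end of the series
        hole_lengths.append(last_ts - hole_start)
    return {
        w: sum(length > timedelta(weeks=w) for length in hole_lengths)
        for w in weeks
    }


start = datetime(2020, 1, 1)
# made-up daily series with one 20-day hole and one 3-day hole
series = [
    (start + timedelta(days=i), None if 10 <= i < 30 or 40 <= i < 43 else 1.0)
    for i in range(60)
]
counts = count_holes_sketch(series)
print(counts)
```

Only the 20-day hole exceeds the 2-week threshold, so it is counted once in the first column and not at all in the longer ones.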

download_raw(only_new=False)

Download the timeseries from the CDC server.

This function only returns the timeseries and does not update the database.

Parameters:

only_new (bool, optional) – Get only the files that are not yet in the database? If False all the available files are loaded again. The default is False.

Returns:

The Timeseries as a DataFrame with a Timestamp Index.

Return type:

pandas.DataFrame

fillup(p_elev=(250, 1.5), **kwargs)

Set the default P values. See _get_sql_near_median for more information.

get_adj(**kwargs)[source]

Get the adjusted timeseries.

The timeseries is adjusted to match the multi-annual value over the given period. So the yearly variability is kept and only the whole period is adjusted.

Returns:

The adjusted timeseries with the timestamp as index.

Return type:

pd.DataFrame
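As a rough illustration of the adjustment idea, the sketch below scales a set of yearly sums with a single factor so that their mean matches a multi-annual value. The multiplicative approach, the function name adjust_to_multi_annual and the numbers are assumptions for illustration only. The actual method may differ, for example by using an offset for temperature.

```python
def adjust_to_multi_annual(yearly_sums, multi_annual):
    """Scale yearly sums so that their mean equals the multi-annual value.

    Hypothetical helper: one factor is applied to the whole period,
    so the ratio between individual years stays unchanged.
    """
    mean_yearly = sum(yearly_sums.values()) / len(yearly_sums)
    factor = multi_annual / mean_yearly  # one factor for the whole period
    return {year: value * factor for year, value in yearly_sums.items()}


yearly = {2019: 800.0, 2020: 1000.0, 2021: 1200.0}  # made-up yearly sums in mm
adjusted = adjust_to_multi_annual(yearly, multi_annual=1100.0)
mean_adjusted = sum(adjusted.values()) / len(adjusted)
print(adjusted, mean_adjusted)
```

Because every year is multiplied by the same factor, a wet year stays as wet relative to a dry year as before; only the level of the whole period changes.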

get_coef(other_stid, in_db_unit=False)

Get the regionalisation coefficients due to the height.

These are the values from the DWD, HYRAS or REGNIE grids.

Parameters:
  • other_stid (int) – The station ID of the other station from which to regionalise to this station.

  • in_db_unit (bool, optional) – Should the coefficients be returned in the unit as stored in the database? This is only relevant for the temperature. The default is False.

Returns:

A list of coefficients. For T, ET and N-daily only the yearly coefficient is returned. For N the winter and summer half-yearly coefficients are returned as a tuple. None is returned if either this station's or the other station's multi-annual value is not available.

Return type:

list of floats or None
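The return behaviour described for get_coef can be sketched as follows. The sketch assumes that a coefficient is the ratio of this station's multi-annual value to the other station's value, which is plausible for precipitation and evapotranspiration but is not confirmed by this page. The function name and the numbers are hypothetical.

```python
def regionalisation_coef_sketch(own_ma, other_ma):
    """Return one coefficient per multi-annual value, or None.

    own_ma / other_ma: lists of multi-annual values, e.g. the
    (winter, summer) half-yearly sums for N. None means that the
    value is not available for that station.
    Assumption: the coefficient is the plain ratio own / other.
    """
    if own_ma is None or other_ma is None:
        return None
    return [own / other for own, other in zip(own_ma, other_ma)]


# made-up winter and summer half-yearly sums in mm
coefs = regionalisation_coef_sketch([450.0, 600.0], [300.0, 400.0])
missing = regionalisation_coef_sketch(None, [300.0, 400.0])
print(coefs, missing)
```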

get_df(kinds, period=(None, None), agg_to=None, nas_allowed=True, add_na_share=False, db_unit=False, sql_add_where=None, **kwargs)

Get a timeseries DataFrame from the database.

Parameters:
  • kinds (str or list of str) – The data kinds to get. Must be a column in the timeseries DB. Must be one of “raw”, “qc”, “filled”, “adj”, “filled_by”, “filled_share”. For the precipitation also “qn” and “corr” are valid. If “filled_by” is given together with an aggregation step, the “filled_by” is replaced by the “filled_share”. The “filled_share” gives the share of filled values in the aggregation group in percent.

  • period (TimestampPeriod or (tuple or list of datetime.datetime or None), optional) – The minimum and maximum Timestamp for which to get the timeseries. If None is given, the maximum or minimal possible Timestamp is taken. The default is (None, None).

  • agg_to (str or None, optional) – Aggregate to a given timespan. If more than 20% of the values in an aggregation group are missing, the aggregated value will be None. Can be anything smaller than the maximum timespan of the saved data. If a timeperiod smaller than the saved data is given, then the maximum possible timeperiod is returned. For T and ET it can be “month”, “year”. For N it can also be “hour”. If None, then the maximum timeperiod is taken. The default is None.

  • nas_allowed (bool, optional) – Should NAs be allowed? If True, then the maximum possible period is returned, even if there are NAs in the timeseries. If False, then the minimal filled period is returned. The default is True.

  • add_na_share (bool, optional) – Should one or several columns with the share of NAs in the data be added to the DataFrame? This is especially important when the station's data gets aggregated, because the aggregation doesn’t make sense if there are a lot of NAs in the original data. If True, one column per requested kind is added with the respective share of NAs, if the aggregation step is not the smallest. The “kind”_na_share column is in percent. The default is False.

  • db_unit (bool, optional) – Should the result be in the database unit? If False, the values are converted to the normal unit, like mm or °C. The numbers are saved as integers in the database and were therefore multiplied by 10 or 100 to get to an integer. The default is False.

  • sql_add_where (str or None, optional) – Additional SQL WHERE statement to filter the output, e.g. “EXTRACT(MONTH FROM timestamp) = 2”. The default is None.

Returns:

The timeseries DataFrame with a DatetimeIndex.

Return type:

pandas.DataFrame
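The 20% rule described for agg_to can be illustrated with a small standard-library sketch that aggregates daily values to monthly sums. It is not the library's SQL-based implementation. The function name aggregate_to_month, the max_na_share argument and the data are assumptions for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta


def aggregate_to_month(daily, max_na_share=0.2):
    """Sum daily values per month (hypothetical helper).

    A month becomes None if more than max_na_share of its
    values are missing (None).
    """
    groups = defaultdict(list)
    for day, value in daily:
        groups[(day.year, day.month)].append(value)
    result = {}
    for key, values in groups.items():
        na_share = sum(v is None for v in values) / len(values)
        if na_share > max_na_share:
            result[key] = None  # too many gaps to aggregate
        else:
            result[key] = sum(v for v in values if v is not None)
    return result


first = date(2021, 1, 1)
daily = []
for i in range(59):  # January and February 2021
    day = first + timedelta(days=i)
    # 3 missing days in January, 10 missing days in February
    gap = (day.month == 1 and day.day <= 3) or (day.month == 2 and day.day <= 10)
    daily.append((day, None if gap else 1.0))
monthly = aggregate_to_month(daily)
print(monthly)
```

January has about 10% missing values and is aggregated, while February has about 36% missing values and is returned as None.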

get_dist(period=(None, None))

Get the timeseries with the information from which station the data got filled and the corresponding distance to this station.

Parameters:

period (TimestampPeriod or (tuple or list of datetime.datetime or None), optional) – The minimum and maximum Timestamp for which to get the timeseries. If None is given, the maximum or minimal possible Timestamp is taken. The default is (None, None).

Returns:

The timeseries for this station and the given period, with the station_id from which the data got filled and the distance to that station in meters.

Return type:

pd.DataFrame

get_filled(period=(None, None), with_dist=False, **kwargs)

Get the filled timeseries.

Either only the timeseries is returned or also the ID of the station from which the station data got filled, together with the distance to this station in m.

Parameters:
  • period (TimestampPeriod or (tuple or list of datetime.datetime or None), optional) – The minimum and maximum Timestamp for which to get the timeseries. If None is given, the maximum or minimal possible Timestamp is taken. The default is (None, None).

  • with_dist (bool, optional) – Should the distance to the stations from which the timeseries got filled be added? The default is False.

Returns:

The filled timeseries for this station and the given period.

Return type:

pd.DataFrame

get_filled_period(kind, from_meta=False)

Get the min and max Timestamp for which there is data in the corresponding timeseries.

Computes the period from the timeseries or the meta table.

Parameters:
  • kind (str) – The data kind for which to look up the filled period. Must be a column in the timeseries DB. Must be one of “raw”, “qc”, “filled”, “adj”. If “best” is given, then depending on the parameter of the station the best kind is selected. For precipitation this is “corr” and for the others this is “filled”. For the precipitation also “qn” and “corr” are valid.

  • from_meta (bool, optional) – Should the period be taken from the meta table? If True, this function is only a wrapper for .get_period_meta. If False, the period is computed from the timeseries. The default is False.

Raises:
  • NotImplementedError – If the given kind is not valid.

  • ValueError – If the given kind is not a string.

Returns:

A TimestampPeriod of the filled timeseries. (NaT, NaT) if the timeseries is completely empty or not defined.

Return type:

util.TimestampPeriod

get_geom(format='EWKT', crs=None)

Get the point geometry of the station.

Parameters:
  • format (str or None, optional) – The format of the geometry to return. Needs to be a format that is understood by PostgreSQL, i.e. the corresponding ST_AsXXXXX function needs to exist in PostgreSQL. If None, then the binary representation is returned. The default is “EWKT”.

  • crs (str, int or None, optional) – If None, then the geometry is returned in WGS84 (EPSG:4326). If string, then it should be one of “WGS84” or “UTM”. If int, then it should be the EPSG code.

Returns:

string or bytes representation of the geometry, depending on the selected format.

Return type:

str or bytes
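With the default format, the returned string is expected to follow the EWKT layout that PostGIS produces, i.e. an SRID prefix followed by the WKT geometry. The sketch below shows how such a string could be split into its parts with the standard library only. The helper name parse_ewkt_point and the coordinates are made up for illustration.

```python
import re


def parse_ewkt_point(ewkt):
    """Split an EWKT point like 'SRID=4326;POINT(7.85 47.99)' into parts.

    Hypothetical helper; returns (srid, x, y).
    """
    match = re.fullmatch(
        r"SRID=(\d+);POINT\s*\(\s*(-?[\d.]+)\s+(-?[\d.]+)\s*\)",
        ewkt.strip(),
    )
    if match is None:
        raise ValueError(f"not an EWKT point: {ewkt!r}")
    srid, x, y = match.groups()
    return int(srid), float(x), float(y)


# made-up coordinates in WGS84 (EPSG:4326)
srid, lon, lat = parse_ewkt_point("SRID=4326;POINT(7.85 47.99)")
print(srid, lon, lat)
```

For anything beyond a quick look at the coordinates, get_geom_shp is the more convenient choice because it returns a ready-made shapely object.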

get_geom_shp(crs=None)

Get the geometry of the station as a shapely Point object.

Parameters:

crs (str, int or None, optional) – If None, then the geometry is returned in WGS84 (EPSG:4326). If string, then it should be one of “WGS84” or “UTM”. If int, then it should be the EPSG code.

Returns:

The location of the station as shapely Point.

Return type:

shapely.geometry.Point

get_last_imp_period(all=False)

Get the last imported Period for this Station.

Parameters:

all (bool, optional) – Should the maximum Timespan for all the last imports be returned. If False only the period for this station is returned. The default is False.

Returns:

(minimal datetime, maximal datetime)

Return type:

TimestampPeriod or tuple of datetime.datetime

get_ma()
get_max_period(kinds, nas_allowed=False, **kwargs)

Get the maximum available period for this stations timeseries.

If nas_allowed is True, then the maximum range of the timeseries is returned. Otherwise the minimal filled period is returned.

Parameters:
  • kinds (str or list of str) – The data kinds to consider. Must be a column in the timeseries DB. Must be one of “raw”, “qc”, “filled”, “adj”. For the precipitation also “qn” and “corr” are valid.

  • nas_allowed (bool, optional) – Should NAs be allowed? If True, then the maximum possible period is returned, even if there are NAs in the timeseries. If False, then the minimal filled period is returned. The default is False.

Returns:

The maximum Timestamp Period

Return type:

utils.TimestampPeriod
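One way to read the nas_allowed switch is as the difference between the union and the intersection of the filled periods of the requested kinds. The sketch below shows that reading with plain dates. It is an interpretation, not the library's code, and the function name max_period_sketch and the periods are hypothetical.

```python
from datetime import date


def max_period_sketch(periods, nas_allowed=False):
    """Combine the filled periods of several kinds (hypothetical helper).

    periods: dict mapping a kind to its (start, end) filled period.
    nas_allowed=True  -> widest range covered by any kind
    nas_allowed=False -> range covered by all kinds, so no NAs occur
    """
    starts = [start for start, _ in periods.values()]
    ends = [end for _, end in periods.values()]
    if nas_allowed:
        return min(starts), max(ends)
    return max(starts), min(ends)


# made-up filled periods of two kinds
periods = {
    "raw": (date(1990, 1, 1), date(2022, 12, 31)),
    "filled": (date(1995, 1, 1), date(2021, 12, 31)),
}
widest = max_period_sketch(periods, nas_allowed=True)
common = max_period_sketch(periods, nas_allowed=False)
print(widest, common)
```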

get_meta(infos='all')

Get Information from the meta table.

Parameters:

infos (list of str or str, optional) – A list of the information to get from the database. If “all”, then all the information is returned. The default is “all”.

Returns:

A dict with the meta information. The first level has one entry per parameter. The second level has one entry per requested piece of information. If only one piece of information is requested, then it is returned as a single value and not as a sub-dict.

Return type:

dict or int/string

classmethod get_meta_explanation(infos='all')

Get the explanations of the available meta fields.

Parameters:

infos (list or string, optional) – The infos you wish to get an explanation for. If “all”, then all the available explanations are returned. The default is “all”.

Returns:

A pandas Series with the information names as index and the explanations as values.

Return type:

pd.Series

get_multi_annual()[source]

Get the multi annual value(s) for this station.

Returns:

The corresponding multi annual value. For T and ET the yearly value is returned. For N the winter and summer half-yearly sums are returned as a tuple. The returned unit is mm or °C.

Return type:

list or number

get_name()
get_neighboor_stids(p_elev=(250, 1.5), **kwargs)

Get the 5 nearest stations to this station.

Parameters:

p_elev (tuple, optional) –

In Larsim those parameters are defined as $P_1 = 500$ and $P_2 = 1$. Stoelzle et al. (2016) found that $P_1 = 100$ and $P_2 = 4$ is better for Baden-Württemberg to consider the quick changes in topography. For all of Germany, those parameter values give too much weight to the elevation difference, which can result in getting neighbor stations from the border of the Czech Republic for the Feldberg station. Therefore the values $P_1 = 250$ and $P_2 = 1.5$ are used as default values. literature:

The default is (250, 1.5).

Returns:

A list with the station IDs of the nearest neighbor stations.

Return type:

list of int
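The effect of the two p_elev values can be made concrete with a small sketch. It assumes a LARSIM-style weighting in which the horizontal distance is inflated by the elevation difference, $L_{weighted} = L_{horizontal} \cdot (1 + (|\Delta H| / P_1)^{P_2})$. This formula is not given on this page and is an assumption here, as are the function name and the distances.

```python
def weighted_distance(horizontal_m, elev_diff_m, p_elev=(250, 1.5)):
    """Inflate a horizontal distance by the elevation difference.

    Assumed formula (LARSIM style, not taken from this page):
    L_weighted = L_horizontal * (1 + (|dH| / P1) ** P2)
    """
    p1, p2 = p_elev
    return horizontal_m * (1 + (abs(elev_diff_m) / p1) ** p2)


# a made-up neighbor 10 km away and 250 m higher
for label, p_elev in [
    ("Larsim", (500, 1)),
    ("Stoelzle et al. (2016)", (100, 4)),
    ("default", (250, 1.5)),
]:
    print(label, weighted_distance(10_000, 250, p_elev))
```

Under this assumption the same neighbor counts as 15 km away with the Larsim values, about 400 km away with the Baden-Württemberg values and 20 km away with the defaults, which shows why the steeper setting can push the search towards far-away stations at a similar elevation.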

get_period_meta(kind, all=False)

Get a specific period from the meta information table.

This function returns the information from the meta table. In this table there are several periods saved, like the period of the last import.

Parameters:
  • kind (str) – The kind of period to return. Should be one of [‘filled’, ‘raw’, ‘last_imp’]. filled: the maximum filled period of the filled timeseries. raw: the maximum filled timeperiod of the raw data. last_imp: the maximum filled timeperiod of the last import.

  • all (bool, optional) – Should the maximum Timespan for all the filled periods be returned. If False only the period for this station is returned. The default is False.

Returns:

The TimestampPeriod of the station or of all the stations if all=True.

Return type:

TimestampPeriod

Raises:

ValueError – If a wrong kind is handed in.

get_qc(**kwargs)

Get the quality checked timeseries.

Parameters:

kwargs (dict, optional) – The keyword arguments get passed to the get_df function. Possible parameters are “period”, “agg_to” or “nas_allowed”.

Returns:

The quality checked timeseries for this station and the given period.

Return type:

pd.DataFrame

get_raster_value(raster)
get_raw(**kwargs)

Get the raw timeseries.

Parameters:

kwargs (dict, optional) – The keyword arguments get passed to the get_df function. Possible parameters are “period”, “agg_to” or “nas_allowed”.

Returns:

The raw timeseries for this station and the given period.

Return type:

pd.DataFrame

get_zipfiles(only_new=True, ftp_file_list=None)

Get the zipfiles on the CDC server with the raw data.

Parameters:
  • only_new (bool, optional) – Get only the files that are not yet in the database? If False all the available files are loaded again. The default is True

  • ftp_file_list (list of (strings, datetime), optional) – A list of files on the FTP server together with their modification time. If None, then the list is fetched from the server. The default is None

Returns:

A DataFrame of zipfiles and the corresponding modification time on the CDC server to import.

Return type:

pandas.DataFrame or None
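The only_new switch can be pictured as a filter over the FTP file list that compares modification times with what was imported before. The sketch below is a guess at that logic for illustration. The imported bookkeeping dict, the function name select_zipfiles and the file names are hypothetical and not part of the documented API.

```python
from datetime import datetime


def select_zipfiles(ftp_file_list, imported, only_new=True):
    """Pick the zipfiles that still need to be imported (hypothetical helper).

    ftp_file_list: list of (file name, modification time) on the server
    imported: dict mapping a file name to the modification time
              that was already imported into the database
    """
    if not only_new:
        return list(ftp_file_list)  # reload everything
    return [
        (name, modtime)
        for name, modtime in ftp_file_list
        if name not in imported or imported[name] < modtime
    ]


# made-up file names and times
ftp_file_list = [
    ("a.zip", datetime(2023, 1, 1)),
    ("b.zip", datetime(2023, 2, 1)),
    ("c.zip", datetime(2023, 3, 1)),
]
imported = {"a.zip": datetime(2023, 1, 1), "b.zip": datetime(2023, 1, 15)}
new_files = select_zipfiles(ftp_file_list, imported)
all_files = select_zipfiles(ftp_file_list, imported, only_new=False)
print(new_files)
```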

is_last_imp_done(kind)

Has the last import for the given kind already been processed?

Parameters:

kind (str) – The data kind to look for filled period. Must be a column in the timeseries DB. Must be one of “raw”, “qc”, “filled”, “adj”, “best”. If “best” is given, then depending on the parameter of the station the best kind is selected. For Precipitation this is “corr” and for the other this is “filled”. For the precipitation also “qn” and “corr” are valid.

Returns:

True if the last import of the given kind is already treated.

Return type:

bool

is_real()

Check if the station is a real station or only a virtual one.

Real means that the DWD is measuring here. Virtual means that there are no measurements here, but the station got created to have timeseries for every parameter for every precipitation station.

Returns:

True if the station is real, False if it is virtual.

Return type:

bool

is_virtual()

Check if the station is a real station or only a virtual one.

Real means that the DWD is measuring here. Virtual means that there are no measurements here, but the station got created to have timeseries for every parameter for every precipitation station.

Returns:

True if the station is virtual, False if it is real.

Return type:

bool

isin_db()

Check if Station is already in a timeseries table.

Returns:

True if Station has a table in DB, no matter if it is filled or not.

Return type:

bool

isin_ma()

Check if Station is already in the multi annual table.

Returns:

True if Station is in multi annual table.

Return type:

bool

isin_meta()

Check if Station is already in the meta table.

Returns:

True if Station is in meta table.

Return type:

bool

isin_meta_n()

Check if Station is in the precipitation meta table.

Returns:

True if Station is in the precipitation meta table.

Return type:

bool

last_imp_fillup(_last_imp_period=None)

Do the gap filling of the last import.

last_imp_qc()
last_imp_quality_check()

Do the quality check of the last import.

plot(period=(None, None), kind='filled', agg_to=None, **kwargs)

Plot the data of this station.

Parameters:
  • period (TimestampPeriod or (tuple or list of datetime.datetime or None), optional) – The minimum and maximum Timestamp for which to get the timeseries. If None is given, the maximum or minimal possible Timestamp is taken. The default is (None, None).

  • kind (str, optional) – The data kind to plot. Must be a column in the timeseries DB. Must be one of “raw”, “qc”, “filled”, “adj”. For the precipitation also “qn” and “corr” are valid. The default is “filled”.

  • agg_to (str or None, optional) – Aggregate to a given timespan. Can be anything smaller than the maximum timespan of the saved data. If a timeperiod smaller than the saved data is given, then the maximum possible timeperiod is returned. For T and ET it can be “month”, “year”. For N it can also be “hour”. If None, then the maximum timeperiod is taken. The default is None.

quality_check(period=(None, None), **kwargs)

Quality check the raw data for a given period.

Parameters:

period (util.TimestampPeriod or (tuple or list of datetime.datetime or None), optional) – The minimum and maximum Timestamp for which to get the timeseries. If None is given, the maximum or minimal possible Timestamp is taken. The default is (None, None).

update_ma(skip_if_exist=True, drop_when_error=True)

Update the multi annual values in the stations_raster_values table.

Get new values from the raster and put in the table.

update_period_meta(kind)

Update the time period in the meta file.

Compute the filled period of a timeseries and save it in the meta table.

Parameters:

kind (str) – The data kind for which to look up the filled period. Must be a column in the timeseries DB. Must be one of “raw”, “qc”, “filled”. If “best” is given, then depending on the parameter of the station the best kind is selected. For precipitation this is “corr” and for the others this is “filled”. For the precipitation also “corr” is valid.

update_raw(only_new=True, ftp_file_list=None, remove_nas=True)

Download data from CDC and upload to database.

Parameters:
  • only_new (bool, optional) – Get only the files that are not yet in the database? If False all the available files are loaded again. The default is True

  • ftp_file_list (list of (strings, datetime), optional) – A list of files on the FTP server together with their modification time. If None, then the list is fetched from the server. The default is None

  • remove_nas (bool, optional) – Remove the NAs from the downloaded data before updating it to the database. This has computational advantages. The default is True.

Returns:

The raw DataFrame of the station's data.

Return type:

pandas.DataFrame

StationET

class weatherDB.station.StationET(id, **kwargs)[source]

Bases: StationTETBase

A class to work with and download potential Evapotranspiration (VPGB) data for one station.

Create a Station object.

Parameters:
  • id (int) – The station's ID.

  • _skip_meta_check (bool, optional) – Should the check whether the station is in the database meta file be skipped? Be careful when skipping this check, because it can lead to problems. The option exists for computational reasons, because it makes the initialization faster. It is used by the stations classes, because they only initialize objects that are in the meta table. The default is False.

Raises:

NotImplementedError – _description_

Public Methods:

__init__(id, **kwargs)

Create a Station object.

get_adj(**kwargs)

Get the adjusted timeseries.

Inherited from StationTETBase

get_neighboor_stids([p_elev])

Get the 5 nearest stations to this station.

fillup([p_elev])

Set the default P values.

get_adj(**kwargs)

Get the adjusted timeseries.

Inherited from StationCanVirtualBase

isin_meta_n()

Check if Station is in the precipitation meta table.

quality_check([period])

Quality check the raw data for a given period.

Inherited from StationBase

__init__(id[, _skip_meta_check])

Create a Station object.

isin_db()

Check if Station is already in a timeseries table.

isin_meta()

Check if Station is already in the meta table.

isin_ma()

Check if Station is already in the multi annual table.

is_virtual()

Check if the station is a real station or only a virtual one.

is_real()

Check if the station is a real station or only a virtual one.

is_last_imp_done(kind)

Has the last import for the given kind already been processed?

update_period_meta(kind)

Update the time period in the meta file.

update_ma([skip_if_exist, drop_when_error])

Update the multi annual values in the stations_raster_values table.

update_raw([only_new, ftp_file_list, remove_nas])

Download data from CDC and upload to database.

get_zipfiles([only_new, ftp_file_list])

Get the zipfiles on the CDC server with the raw data.

download_raw([only_new])

Download the timeseries from the CDC server.

quality_check([period])

Quality check the raw data for a given period.

fillup([period])

Fill up missing data with measurements from nearby stations.

last_imp_quality_check()

Do the quality check of the last import.

last_imp_qc()

last_imp_fillup([_last_imp_period])

Do the gap filling of the last import.

get_meta_explanation([infos])

Get the explanations of the available meta fields.

get_meta([infos])

Get Information from the meta table.

get_geom([format, crs])

Get the point geometry of the station.

get_geom_shp([crs])

Get the geometry of the station as a shapely Point object.

get_name()

count_holes([weeks, kind, period, ...])

Count holes in the timeseries depending on their length.

get_period_meta(kind[, all])

Get a specific period from the meta information table.

get_filled_period(kind[, from_meta])

Get the min and max Timestamp for which there is data in the corresponding timeseries.

get_max_period(kinds[, nas_allowed])

Get the maximum available period for this stations timeseries.

get_last_imp_period([all])

Get the last imported Period for this Station.

get_neighboor_stids([n, only_real, p_elev, ...])

Get a list with the station IDs of the nearest neighbor stations.

get_multi_annual()

Get the multi annual value(s) for this station.

get_ma()

get_raster_value(raster)

get_coef(other_stid[, in_db_unit])

Get the regionalisation coefficients due to the height.

get_df(kinds[, period, agg_to, nas_allowed, ...])

Get a timeseries DataFrame from the database.

get_raw(**kwargs)

Get the raw timeseries.

get_qc(**kwargs)

Get the quality checked timeseries.

get_dist([period])

Get the timeseries with the information from which station the data got filled and the corresponding distance to this station.

get_filled([period, with_dist])

Get the filled timeseries.

get_adj(**kwargs)

Get the adjusted timeseries.

plot([period, kind, agg_to])

Plot the data of this station.


count_holes(weeks=[2, 4, 8, 12, 16, 20, 24], kind='qc', period=(None, None), between_meta_period=True, crop_period=False, **kwargs)

Count holes in the timeseries depending on their length.

Parameters:
  • weeks (list, optional) – A list of hole lengths to count, in weeks. Every hole longer than the respective number of weeks is counted. The default is [2, 4, 8, 12, 16, 20, 24].

  • kind (str) – The kind of the timeseries to analyze. Should be one of [‘raw’, ‘qc’, ‘filled’]. For N also “corr” is possible. Normally only “raw” and “qc” make sense, because the other timeseries should not have holes.

  • period (TimestampPeriod or (tuple or list of datetime.datetime or None), optional) – The minimum and maximum Timestamp for which to analyze the timeseries. If None is given, the minimum or maximum possible Timestamp is taken. The default is (None, None).

  • between_meta_period (bool, optional) – Only check between the respective period that is defined in the meta table. If “qc” is chosen as kind, then the “raw” meta period is taken. The default is True.

  • crop_period (bool, optional) – Should the period get cropped to the maximum filled period? This will result in holes being ignored when they are at the end or at the beginning of the timeseries. If period = (None, None) is given, then this parameter is set to True. The default is False.

Returns:

A pandas DataFrame with station_id as index and one column per entry in weeks. Each value is the number of NA periods longer than the respective number of weeks.

Return type:

pandas.DataFrame

Raises:

ValueError – If the input parameters were not correct.
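The effect of crop_period can be pictured as trimming the missing values at both ends of a series before any holes are counted. The sketch below shows that trimming step on a plain list. The helper name crop_to_filled and the values are made up, and the library itself works on database periods rather than Python lists.

```python
def crop_to_filled(values):
    """Drop leading and trailing None values (hypothetical helper).

    Gaps inside the series are kept, so only holes at the
    beginning or at the end are ignored afterwards.
    """
    filled = [i for i, v in enumerate(values) if v is not None]
    if not filled:
        return []  # nothing but gaps
    return values[filled[0]:filled[-1] + 1]


# made-up series with gaps at both ends and one gap in the middle
cropped = crop_to_filled([None, None, 1.2, None, 0.4, None])
print(cropped)
```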

download_raw(only_new=False)

Download the timeseries from the CDC server.

This function only returns the timeseries and does not update the database.

Parameters:

only_new (bool, optional) – Get only the files that are not yet in the database? If False all the available files are loaded again. The default is False.

Returns:

The Timeseries as a DataFrame with a Timestamp Index.

Return type:

pandas.DataFrame

fillup(p_elev=(250, 1.5), **kwargs)

Set the default P values. See _get_sql_near_median for more information.

get_adj(**kwargs)[source]

Get the adjusted timeseries.

The timeseries is adjusted to match the multi-annual value over the given period. So the yearly variability is kept and only the whole period is adjusted.

Returns:

The adjusted timeseries with the timestamp as index.

Return type:

pd.DataFrame

get_coef(other_stid, in_db_unit=False)

Get the regionalisation coefficients due to the height.

These are the values from the DWD, HYRAS or REGNIE grids.

Parameters:
  • other_stid (int) – The station ID of the other station from which to regionalise to this station.

  • in_db_unit (bool, optional) – Should the coefficients be returned in the unit as stored in the database? This is only relevant for the temperature. The default is False.

Returns:

A list of coefficients. For T, ET and N-daily only the yearly coefficient is returned. For N the winter and summer half-yearly coefficients are returned as a tuple. None is returned if either this station's or the other station's multi-annual value is not available.

Return type:

list of floats or None

get_df(kinds, period=(None, None), agg_to=None, nas_allowed=True, add_na_share=False, db_unit=False, sql_add_where=None, **kwargs)

Get a timeseries DataFrame from the database.

Parameters:
  • kinds (str or list of str) – The data kinds to get. Must be a column in the timeseries DB. Must be one of “raw”, “qc”, “filled”, “adj”, “filled_by”, “filled_share”. For the precipitation also “qn” and “corr” are valid. If “filled_by” is given together with an aggregation step, the “filled_by” is replaced by the “filled_share”. The “filled_share” gives the share of filled values in the aggregation group in percent.

  • period (TimestampPeriod or (tuple or list of datetime.datetime or None), optional) – The minimum and maximum Timestamp for which to get the timeseries. If None is given, the maximum or minimal possible Timestamp is taken. The default is (None, None).

  • agg_to (str or None, optional) – Aggregate to a given timespan. If more than 20% of the values in an aggregation group are missing, the aggregated value will be None. Can be anything smaller than the maximum timespan of the saved data. If a timeperiod smaller than the saved data is given, then the maximum possible timeperiod is returned. For T and ET it can be “month”, “year”. For N it can also be “hour”. If None, then the maximum timeperiod is taken. The default is None.

  • nas_allowed (bool, optional) – Should NAs be allowed? If True, then the maximum possible period is returned, even if there are NAs in the timeseries. If False, then the minimal filled period is returned. The default is True.

  • add_na_share (bool, optional) – Should one or several columns with the share of NAs in the data be added to the DataFrame? This is especially important when the station's data gets aggregated, because the aggregation doesn’t make sense if there are a lot of NAs in the original data. If True, one column per requested kind is added with the respective share of NAs, if the aggregation step is not the smallest. The “kind”_na_share column is in percent. The default is False.

  • db_unit (bool, optional) – Should the result be in the database unit? If False, the values are converted to the normal unit, like mm or °C. The numbers are saved as integers in the database and were therefore multiplied by 10 or 100 to get to an integer. The default is False.

  • sql_add_where (str or None, optional) – Additional SQL WHERE statement to filter the output, e.g. “EXTRACT(MONTH FROM timestamp) = 2”. The default is None.

Returns:

The timeseries DataFrame with a DatetimeIndex.

Return type:

pandas.DataFrame
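The db_unit parameter reflects that values are stored as integers that were multiplied by 10 or 100. The sketch below shows the two conversions in isolation. The helper names and the decimals argument are assumptions for illustration, and which factor applies to which parameter is not stated on this page.

```python
def from_db_unit(db_values, decimals=1):
    """Convert stored integers back to the normal unit (hypothetical helper)."""
    factor = 10 ** decimals
    return [None if v is None else v / factor for v in db_values]


def to_db_unit(values, decimals=1):
    """Convert values in the normal unit to integers for storage."""
    factor = 10 ** decimals
    return [None if v is None else round(v * factor) for v in values]


stored = [123, None, 5]          # made-up integers as stored in the database
normal = from_db_unit(stored)    # values in the normal unit, e.g. mm
restored = to_db_unit(normal)    # back to the stored integers
print(normal, restored)
```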

get_dist(period=(None, None))

Get the timeseries with the information from which station the data got filled and the corresponding distance to this station.

Parameters:

period (TimestampPeriod or (tuple or list of datetime.datetime or None), optional) – The minimum and maximum Timestamp for which to get the timeseries. If None is given, the maximum or minimal possible Timestamp is taken. The default is (None, None).

Returns:

The timeseries for this station and the given period, with the station_id from which the data got filled and the distance to that station in meters.

Return type:

pd.DataFrame

get_filled(period=(None, None), with_dist=False, **kwargs)

Get the filled timeseries.

Either only the timeseries is returned or also the ID of the station from which the station data got filled, together with the distance to this station in m.

Parameters:
  • period (TimestampPeriod or (tuple or list of datetime.datetime or None), optional) – The minimum and maximum Timestamp for which to get the timeseries. If None is given, the maximum or minimal possible Timestamp is taken. The default is (None, None).

  • with_dist (bool, optional) – Should the distance to the stations from which the timeseries got filled be added? The default is False.

Returns:

The filled timeseries for this station and the given period.

Return type:

pd.DataFrame

get_filled_period(kind, from_meta=False)

Get the min and max Timestamp for which there is data in the corresponding timeseries.

Computes the period from the timeseries or the meta table.

Parameters:
  • kind (str) – The data kind for which to look up the filled period. Must be a column in the timeseries DB. Must be one of “raw”, “qc”, “filled”, “adj”. If “best” is given, then depending on the parameter of the station the best kind is selected. For precipitation this is “corr” and for the others this is “filled”. For the precipitation also “qn” and “corr” are valid.

  • from_meta (bool, optional) – Should the period be taken from the meta table? If True, this function is only a wrapper for .get_period_meta. If False, the period is computed from the timeseries. The default is False.

Raises:
  • NotImplementedError – If the given kind is not valid.

  • ValueError – If the given kind is not a string.

Returns:

A TimestampPeriod of the filled timeseries. (NaT, NaT) if the timeseries is completely empty or not defined.

Return type:

util.TimestampPeriod

get_geom(format='EWKT', crs=None)

Get the point geometry of the station.

Parameters:
  • format (str or None, optional) – The format of the geometry to return. Needs to be a format that is understood by PostgreSQL, i.e. the corresponding ST_AsXXXXX function needs to exist in PostgreSQL. If None, then the binary representation is returned. The default is “EWKT”.

  • crs (str, int or None, optional) – If None, then the geometry is returned in WGS84 (EPSG:4326). If string, then it should be one of “WGS84” or “UTM”. If int, then it should be the EPSG code.

Returns:

string or bytes representation of the geometry, depending on the selected format.

Return type:

str or bytes

get_geom_shp(crs=None)

Get the geometry of the station as a shapely Point object.

Parameters:

crs (str, int or None, optional) – If None, then the geometry is returned in WGS84 (EPSG:4326). If string, then it should be one of “WGS84” or “UTM”. If int, then it should be the EPSG code.

Returns:

The location of the station as shapely Point.

Return type:

shapely.geometry.Point

get_last_imp_period(all=False)

Get the last imported Period for this Station.

Parameters:

all (bool, optional) – Should the maximum Timespan for all the last imports be returned. If False only the period for this station is returned. The default is False.

Returns:

(minimal datetime, maximal datetime)

Return type:

TimestampPeriod or tuple of datetime.datetime

get_ma()
get_max_period(kinds, nas_allowed=False, **kwargs)

Get the maximum available period for this stations timeseries.

If nas_allowed is True, then the maximum range of the timeseries is returned. Otherwise the minimal filled period is returned.

Parameters:
  • kinds (str or list of str) – The data kinds to consider. Must be a column in the timeseries DB. Must be one of “raw”, “qc”, “filled”, “adj”. For the precipitation also “qn” and “corr” are valid.

  • nas_allowed (bool, optional) – Should NAs be allowed? If True, then the maximum possible period is returned, even if there are NAs in the timeseries. If False, then the minimal filled period is returned. The default is False.

Returns:

The maximum Timestamp Period

Return type:

utils.TimestampPeriod

get_meta(infos='all')

Get Information from the meta table.

Parameters:

infos (list of str or str, optional) – A list of the information to get from the database. If “all” then all the information are returned. The default is “all”.

Returns:

dict with the meta information. The first level has one entry per parameter. The second level has one entry per information, asked for. If only one information is asked for, then it is returned as single value and not as subdict.

Return type:

dict or int/string

classmethod get_meta_explanation(infos='all')

Get the explanations of the available meta fields.

Parameters:

infos (list or string, optional) – The infos you wish to get an explanation for. If “all” then all the available information get returned. The default is “all”

Returns:

a pandas Series with the information names as index and the explanation as values.

Return type:

pd.Series

get_multi_annual()

Get the multi annual value(s) for this station.

Returns:

The corresponding multi annual value. For T and ET the yearly value is returned. For N the winter and summer half-yearly sums are returned as a tuple. The returned unit is mm or °C.

Return type:

list or number

get_name()

get_neighboor_stids(p_elev=(250, 1.5), **kwargs)

Get the 5 nearest stations to this station.

Parameters:

p_elev (tuple, optional) –

In Larsim those parameters are defined as $P_1 = 500$ and $P_2 = 1$. Stoelzle et al. (2016) found that $P_1 = 100$ and $P_2 = 4$ work better for Baden-Württemberg, where the topography changes quickly. For all of Germany, those values give too much weight to the elevation difference, which can result in neighbor stations from the border of the Czech Republic being selected for the Feldberg station. Therefore $P_1 = 250$ and $P_2 = 1.5$ are used as default values.

The default is (250, 1.5).

Returns:

A list of the station IDs of the nearest neighbor stations.

Return type:

list of int
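
This page does not state how p_elev enters the neighbor search. The sketch below assumes a Larsim-style weighting d * (1 + (|dh| / P_1) ** P_2), where d is the horizontal distance and dh the elevation difference; both the formula and the helper name are assumptions, not the confirmed implementation. It compares the three parameter sets mentioned above for one made-up station pair.

```python
def elevation_weighted_distance(dist_m, elev_diff_m, p_elev=(250, 1.5)):
    """Assumed weighting: d * (1 + (|dh| / P_1) ** P_2). Not confirmed by this page."""
    p_1, p_2 = p_elev
    return dist_m * (1 + (abs(elev_diff_m) / p_1) ** p_2)


# Same horizontal distance (20 km) and elevation difference (400 m, made up),
# evaluated with the three parameter sets discussed above.
for label, p_elev in [("Larsim", (500, 1)),
                      ("Stoelzle et al. (2016)", (100, 4)),
                      ("default", (250, 1.5))]:
    weighted = elevation_weighted_distance(20_000, 400, p_elev)
    print(f"{label}: {weighted / 1000:.1f} km")
```

Under this assumption, a larger P_2 or a smaller P_1 penalises elevation differences more strongly, which matches the reasoning given for the default values.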

get_period_meta(kind, all=False)

Get a specific period from the meta information table.

This functions returns the information from the meta table. In this table there are several periods saved, like the period of the last import.

Parameters:
  • kind (str) – The kind of period to return. Should be one of [‘filled’, ‘raw’, ‘last_imp’]. filled: the maximum filled period of the filled timeserie. raw: the maximum filled timeperiod of the raw data. last_imp: the maximum filled timeperiod of the last import.

  • all (bool, optional) – Should the maximum Timespan for all the filled periods be returned. If False only the period for this station is returned. The default is False.

Returns:

The TimestampPeriod of the station or of all the stations if all=True.

Return type:

TimestampPeriod

Raises:

ValueError – If a wrong kind is handed in.

get_qc(**kwargs)

Get the quality checked timeserie.

Parameters:

kwargs (dict, optional) – The keyword arguments get passed to the get_df function. Possible parameters are “period”, “agg_to” or “nas_allowed”

Returns:

The quality checked timeserie for this station and the given period.

Return type:

pd.DataFrame

get_raster_value(raster)

get_raw(**kwargs)

Get the raw timeserie.

Parameters:

kwargs (dict, optional) – The keyword arguments get passed to the get_df function. Possible parameters are “period”, “agg_to” or “nas_allowed”

Returns:

The raw timeserie for this station and the given period.

Return type:

pd.DataFrame

get_zipfiles(only_new=True, ftp_file_list=None)

Get the zipfiles on the CDC server with the raw data.

Parameters:
  • only_new (bool, optional) – Get only the files that are not yet in the database? If False all the available files are loaded again. The default is True

  • ftp_file_list (list of (strings, datetime), optional) – A list of files on the FTP server together with their modification time. If None, then the list is fetched from the server. The default is None

Returns:

A DataFrame of zipfiles and the corresponding modification time on the CDC server to import.

Return type:

pandas.DataFrame or None

is_last_imp_done(kind)

Is the last import for the given kind already worked in?

Parameters:

kind (str) – The data kind to look for filled period. Must be a column in the timeseries DB. Must be one of “raw”, “qc”, “filled”, “adj”, “best”. If “best” is given, then depending on the parameter of the station the best kind is selected. For Precipitation this is “corr” and for the other this is “filled”. For the precipitation also “qn” and “corr” are valid.

Returns:

True if the last import of the given kind is already treated.

Return type:

bool

is_real()

Check if the station is a real station or only a virtual one.

Real means that the DWD is measuring here. Virtual means, that there are no measurements here, but the station got created to have timeseries for every parameter for every precipitation station.

Returns:

true if the station is real, false if it is virtual.

Return type:

bool

is_virtual()

Check if the station is a real station or only a virtual one.

Real means that the DWD is measuring here. Virtual means, that there are no measurements here, but the station got created to have timeseries for every parameter for every precipitation station.

Returns:

true if the station is virtual, false if it is real.

Return type:

bool

isin_db()

Check if Station is already in a timeseries table.

Returns:

True if Station has a table in DB, no matter if it is filled or not.

Return type:

bool

isin_ma()

Check if Station is already in the multi annual table.

Returns:

True if Station is in multi annual table.

Return type:

bool

isin_meta()

Check if Station is already in the meta table.

Returns:

True if Station is in meta table.

Return type:

bool

isin_meta_n()

Check if Station is in the precipitation meta table.

Returns:

True if Station is in the precipitation meta table.

Return type:

bool

last_imp_fillup(_last_imp_period=None)

Do the gap filling of the last import.

last_imp_qc()

last_imp_quality_check()

Do the quality check of the last import.

plot(period=(None, None), kind='filled', agg_to=None, **kwargs)

Plot the data of this station.

Parameters:
  • period (TimestampPeriod or (tuple or list of datetime.datetime or None), optional) – The minimum and maximum Timestamp for which to get the timeseries. If None is given, the maximum or minimal possible Timestamp is taken. The default is (None, None).

  • kind (str, optional) – The data kind to plot. Must be a column in the timeseries DB. Must be one of “raw”, “qc”, “filled”, “adj”. For the precipitation also “qn” and “corr” are valid. The default is “filled”.

  • agg_to (str or None, optional) – Aggregate to a given timespan. Can be anything smaller than the maximum timespan of the saved data. If a Timeperiod smaller than the saved data is given, then the maximum possible timeperiod is returned. For T and ET it can be “month”, “year”. For N it can also be “hour”. If None then the maximum timeperiod is taken. The default is None.

quality_check(period=(None, None), **kwargs)

Quality check the raw data for a given period.

Parameters:

period (util.TimestampPeriod or (tuple or list of datetime.datetime or None), optional) – The minimum and maximum Timestamp for which to get the timeseries. If None is given, the maximum or minimal possible Timestamp is taken. The default is (None, None).

update_ma(skip_if_exist=True, drop_when_error=True)

Update the multi annual values in the stations_raster_values table.

Get new values from the raster and put in the table.

update_period_meta(kind)

Update the time period in the meta file.

Compute the filled period of a time series and save it in the meta table.

Parameters:

kind (str) – The data kind to look for filled period. Must be a column in the timeseries DB. Must be one of “raw”, “qc”, “filled”. If “best” is given, then depending on the parameter of the station the best kind is selected. For Precipitation this is “corr” and for the other this is “filled”. For the precipitation also “corr” is valid.

update_raw(only_new=True, ftp_file_list=None, remove_nas=True)

Download data from CDC and upload to database.

Parameters:
  • only_new (bool, optional) – Get only the files that are not yet in the database? If False all the available files are loaded again. The default is True

  • ftp_file_list (list of (strings, datetime), optional) – A list of files on the FTP server together with their modification time. If None, then the list is fetched from the server. The default is None

  • remove_nas (bool, optional) – Remove the NAs from the downloaded data before updating it to the database. This has computational advantages. The default is True.

Returns:

The raw Dataframe of the Stations data.

Return type:

pandas.DataFrame

StationND

GroupStation

class weatherDB.station.GroupStation(id, error_if_missing=True, **kwargs)[source]

Bases: object

A class to group all possible parameters of one station.

So if you want to create the input files for a simulation, where you need T, ET and N, use this class to download the data for one station.

Public Methods:

__init__(id[, error_if_missing])

get_available_paras([short])

Get the possible parameters for this station.

get_filled_period([kinds, from_meta, join_how])

Get the combined filled period for all 3 stations.

get_df([period, kinds, paras, agg_to, ...])

Get a DataFrame with the corresponding data.

get_meta_explanation([infos])

Get the explanations of the available meta fields.

get_max_period(kinds[, nas_allowed])

Get the maximum available period for this stations timeseries.

get_meta([paras])

Get the meta information for every parameter of this station.

get_geom()

get_name()

create_roger_ts(dir[, period, kind, r_r0, ...])

Create the timeserie files for roger as csv.

create_ts(dir[, period, kinds, paras, ...])

Create the timeserie files as csv.


create_roger_ts(dir, period=(None, None), kind='best', r_r0=1, add_t_min=False, add_t_max=False, do_toolbox_format=False, **kwargs)[source]

Create the timeserie files for roger as csv.

This is only a wrapper function for create_ts with some standard settings.

Parameters:
  • dir (pathlib like object or zipfile.ZipFile) – The directory or Zipfile to store the timeseries in. If a zipfile is given, a folder with the station’s ID is added to the filepath.

  • period (TimestampPeriod like object, optional) – The period for which to get the timeseries. If (None, None) is entered, then the maximal possible period is computed. The default is (None, None)

  • kind (str) – The data kind to look for filled period. Must be a column in the timeseries DB. Must be one of “raw”, “qc”, “filled”, “adj”. If “best” is given, then depending on the parameter of the station the best kind is selected. For Precipitation this is “corr” and for the other this is “filled”. For the precipitation also “qn” and “corr” are valid.

  • r_r0 (int or float, list of int or float, pd.Series or None, optional) – Should the ET time series contain a column with R/R0? If None, no column is added. If int or float, an R/R0 column is appended with this number as constant value. If list of int or float, the list must have the same length as the ET time series and is appended to it. If pd.Series, the index should be a timestamp index; the series is then joined to the ET time series. The default is 1.

  • add_t_min (bool, optional) – Should the minimal temperature value get added? The default is False.

  • add_t_max (bool, optional) – Should the maximal temperature value get added? The default is False.

  • do_toolbox_format (bool, optional) – Should the timeseries be saved in the RoGeR toolbox format? (have a look at the RoGeR examples in https://github.com/Hydrology-IFH/roger) The default is False.

  • **kwargs – additional parameters for Station.get_df

Raises:

Warning – If there are NAs in the timeseries or the period got changed.
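
The different r_r0 options can be pictured with a small pandas sketch. The helper add_r_r0 and the column name “R/R0” are hypothetical; the sketch only mirrors the behaviour described for None, a number, a list and a pd.Series, and is not the library's code.

```python
import pandas as pd


def add_r_r0(et, r_r0):
    """Return a copy of the ET frame with an optional "R/R0" column.

    Hypothetical helper mirroring the documented r_r0 options.
    """
    out = et.copy()
    if r_r0 is None:
        return out                             # no column is added
    if isinstance(r_r0, pd.Series):
        return out.join(r_r0.rename("R/R0"))   # aligned on the timestamp index
    if isinstance(r_r0, (int, float)):
        out["R/R0"] = r_r0                     # constant value for every time step
        return out
    if len(r_r0) != len(out):
        raise ValueError("A list for r_r0 must have the same length as the ET time series.")
    out["R/R0"] = list(r_r0)
    return out


index = pd.date_range("2020-01-01", periods=3, freq="D")
et = pd.DataFrame({"ET": [0.4, 0.5, 0.3]}, index=index)  # made-up values

print(add_r_r0(et, 1))
print(add_r_r0(et, [1.0, 0.9, 0.8]))
print(add_r_r0(et, pd.Series([0.7], index=index[:1])))
```

In the last call the series covers only the first day, so the join leaves the remaining rows empty.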

create_ts(dir, period=(None, None), kinds='best', paras='all', agg_to='10 min', r_r0=None, split_date=False, nas_allowed=True, add_na_share=False, add_t_min=False, add_t_max=False, add_meta=True, file_names={}, col_names={}, keep_date_parts=False, **kwargs)[source]

Create the timeserie files as csv.

Parameters:
  • dir (pathlib like object or zipfile.ZipFile) – The directory or Zipfile to store the timeseries in. If a zipfile is given, a folder with the station’s ID is added to the filepath.

  • period (TimestampPeriod like object, optional) – The period for which to get the timeseries. If (None, None) is entered, then the maximal possible period is computed. The default is (None, None)

  • kinds (str or list of str) – The data kinds to look for filled period. Must be a column in the timeseries DB. Must be one of “raw”, “qc”, “filled”, “adj”, “filled_by”, “filled_share”, “best”. If “best” is given, then depending on the parameter of the station the best kind is selected. For precipitation this is “corr” and for the other this is “filled”. For the precipitation also “qn” and “corr” are valid. If only one kind is asked for, then the columns get renamed to only have the parameter name as column name.

  • paras (list of str or str, optional) – Give the parameters for which to get the meta information. Can be “n”, “t”, “et” or “all”. If “all”, then every available station parameter is returned. The default is “all”

  • agg_to (str, optional) – The aggregation level to which the time series get aggregated. The minimum aggregation for temperature and ET is daily and for precipitation it is 10 minutes. If a smaller aggregation is selected, the minimum possible aggregation for the respective parameter is returned. So if 10 minutes is selected, then precipitation is returned in 10 minutes and T and ET as daily. The default is “10 min”.

  • r_r0 (int or float or None or pd.Series or list, optional) – Should the ET time series contain a column with R/R0? If None, no column is added. If int or float, an R/R0 column is appended with this number as constant value. If list of int or float, the list must have the same length as the ET time series and is appended to it. If pd.Series, the index should be a timestamp index; the series is then joined to the ET time series. The default is None.

  • split_date (bool, optional) – Should the timestamp get split into parts, so one column for year, one for month etc.? If False the timestamp is saved in one column as string. The default is False.

  • nas_allowed (bool, optional) – Should NAs be allowed? If True, then the maximum possible period is returned, even if there are NAs in the timeserie. If False, then the minimal filled period is returned. The default is True.

  • add_na_share (bool, optional) – Should one or several columns be added to the Dataframe with the share of NAs in the data. This is especially important, when the stations data get aggregated, because the aggregation doesn’t make sense if there are a lot of NAs in the original data. If True, one column per asked kind is added with the respective share of NAs, if the aggregation step is not the smallest. The “kind”_na_share column is in percentage. The default is False.

  • add_t_min (bool, optional) – Should the minimal temperature value get added? The default is False.

  • add_t_max (bool, optional) – Should the maximal temperature value get added? The default is False.

  • add_meta (bool, optional) – Should station Meta information like name and Location (lat, long) be added to the file? The default is True.

  • file_names (dict, optional) – A dictionary with the file names for the different parameters, e.g. {“N”: “PREC.txt”, “T”: “TA.txt”, “ET”: “ET.txt”}. If an empty dictionary is given, then the standard names are used. The default is {}.

  • col_names (dict, optional) – A dictionary with the column names for the different parameters, e.g. {“N”: “PREC”, “T”: “TA”, “ET”: “ET”, “Jahr”: “YYYY”, “Monat”: “MM”, “Tag”: “DD”, “Stunde”: “HH”, “Minute”: “MN”}. If an empty dictionary is given, then the standard names are used. The default is {}.

  • keep_date_parts (bool, optional) – only used if split_date is True. Should the date parts that are not needed, e.g. hour value for daily timeseries, be kept? If False, then the columns that are not needed are dropped. The default is False.

  • **kwargs – additional parameters for Station.get_df

Raises:

Warning – If there are NAs in the timeseries and nas_allowed is False or the period got changed.
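
To make the interplay of split_date, keep_date_parts and col_names more concrete, here is a minimal pandas sketch. The helper split_timestamp and its daily flag are hypothetical simplifications; the German date-part names are the ones used in the col_names example above, and the real implementation may differ.

```python
import pandas as pd

# German default names of the date-part columns (taken from the col_names
# example above) mapped to the matching DatetimeIndex attribute.
DATE_PARTS = {"Jahr": "year", "Monat": "month", "Tag": "day",
              "Stunde": "hour", "Minute": "minute"}


def split_timestamp(df, daily=False, keep_date_parts=False, col_names=None):
    """Replace the DatetimeIndex by one column per date part (hypothetical helper)."""
    parts = pd.DataFrame(
        {name: getattr(df.index, attr).to_numpy() for name, attr in DATE_PARTS.items()},
        index=df.index,
    )
    if daily and not keep_date_parts:
        # Hour and minute carry no information in a daily time series.
        parts = parts.drop(columns=["Stunde", "Minute"])
    out = pd.concat([parts, df], axis=1).reset_index(drop=True)
    # Rename parameter and date-part columns; an empty mapping keeps the defaults.
    return out.rename(columns=col_names or {})


index = pd.date_range("2021-03-01", periods=2, freq="D")
t = pd.DataFrame({"T": [3.1, 4.5]}, index=index)  # made-up temperatures
result = split_timestamp(
    t, daily=True,
    col_names={"T": "TA", "Jahr": "YYYY", "Monat": "MM", "Tag": "DD"})
print(result)
```

With keep_date_parts=True the hour and minute columns would stay in the output even for daily data.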

get_available_paras(short=False)[source]

Get the possible parameters for this station.

Parameters:

short (bool, optional) – Should the short name of the parameters be returned? The default is False.

Returns:

A list of the parameter names (long names unless short is True) that are possible for this station to get.

Return type:

list of str

get_df(period=(None, None), kinds='best', paras='all', agg_to='day', nas_allowed=True, add_na_share=False, add_t_min=False, add_t_max=False, **kwargs)[source]

Get a DataFrame with the corresponding data.

Parameters:
  • period (TimestampPeriod or (tuple or list of datetime.datetime or None), optional) – The minimum and maximum Timestamp for which to get the timeseries. If None is given, the maximum or minimal possible Timestamp is taken. The default is (None, None).

  • kinds (str or list of str) – The data kind to look for filled period. Must be a column in the timeseries DB. Must be one of “raw”, “qc”, “filled”, “adj”, “filled_by”, “best”(“corr” for N and “filled” for T and ET). If “best” is given, then depending on the parameter of the station the best kind is selected. For Precipitation this is “corr” and for the other this is “filled”. For the precipitation also “qn” and “corr” are valid.

  • agg_to (str, optional) – The aggregation level to which the time series get aggregated. The minimum aggregation for temperature and ET is daily and for precipitation it is 10 minutes. If a smaller aggregation is selected, the minimum possible aggregation for the respective parameter is returned. So if 10 minutes is selected, then precipitation is returned in 10 minutes and T and ET as daily. The default is “day”.

  • nas_allowed (bool, optional) – Should NAs be allowed? If True, then the maximum possible period is returned, even if there are NAs in the timeserie. If False, then the minimal filled period is returned. The default is True.

  • paras (list of str or str, optional) – Give the parameters for which to get the meta information. Can be “n”, “t”, “et” or “all”. If “all”, then every available station parameter is returned. The default is “all”

  • add_na_share (bool, optional) – Should one or several columns be added to the Dataframe with the share of NAs in the data. This is especially important, when the stations data get aggregated, because the aggregation doesn’t make sense if there are a lot of NAs in the original data. If True, one column per asked kind is added with the respective share of NAs, if the aggregation step is not the smallest. The “kind”_na_share column is in percentage. The default is False.

  • add_t_min (bool, optional) – Should the minimal temperature value get added? The default is False.

  • add_t_max (bool, optional) – Should the maximal temperature value get added? The default is False.

Returns:

A DataFrame with the timeseries for this station and the given period.

Return type:

pd.DataFrame
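
The agg_to rule described above (each parameter falls back to its own finest resolution) can be sketched in a few lines. The names LEVELS, MIN_LEVEL and effective_agg are hypothetical, and only the aggregation levels mentioned on this page are listed.

```python
# Aggregation levels ordered from fine to coarse. Only the levels named on this
# page are listed; the real library may accept more.
LEVELS = ["10 min", "hour", "day", "month", "year"]
# Finest resolution stored per parameter, as described above.
MIN_LEVEL = {"n": "10 min", "t": "day", "et": "day"}


def effective_agg(para, agg_to):
    """Return the aggregation level actually used for one parameter."""
    requested = LEVELS.index(agg_to)
    finest = LEVELS.index(MIN_LEVEL[para])
    # A request finer than the stored resolution falls back to the stored one.
    return LEVELS[max(requested, finest)]


for para in ("n", "t", "et"):
    print(para, "->", effective_agg(para, "10 min"))
```

So a request for “10 min” would give 10-minute precipitation next to daily T and ET, while a request for “month” applies to all three parameters.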

get_filled_period(kinds='best', from_meta=True, join_how='inner')[source]

Get the combined filled period for all 3 stations.

This is the maximum possible timerange for these stations.

Parameters:
  • kinds (str) – The data kind to look for filled period. Must be a column in the timeseries DB. Must be one of “raw”, “qc”, “filled”, “adj”. If “best” is given, then depending on the parameter of the station the best kind is selected. For Precipitation this is “corr” and for the other this is “filled”. For the precipitation also “qn” and “corr” are valid. The default is “best”.

  • from_meta (bool, optional) – Should the period be taken from the meta table? If True, this function is only a wrapper for .get_period_meta. If False, the period is computed from the time series. The default is True.

  • join_how (str, optional) – How should the different periods get joined. If “inner” then the minimal period that is inside of all the filled_periods is returned. If “outer” then the maximal possible period is returned. The default is “inner”.

Returns:

The maximum filled period for the 3 parameters for this station.

Return type:

TimestampPeriod
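
The difference between join_how=“inner” and join_how=“outer” can be shown with plain timestamps. The helper join_periods is a hypothetical stand-in, the three periods are made up, and returning (NaT, NaT) for non-overlapping periods is an assumption.

```python
import pandas as pd


def join_periods(periods, join_how="inner"):
    """Combine (start, end) periods; hypothetical stand-in for the join_how logic."""
    starts = [start for start, _ in periods]
    ends = [end for _, end in periods]
    if join_how == "inner":
        start, end = max(starts), min(ends)
        if start > end:
            return (pd.NaT, pd.NaT)   # assumption: no common period
        return (start, end)
    if join_how == "outer":
        return (min(starts), max(ends))
    raise ValueError("join_how must be 'inner' or 'outer'")


# Made-up filled periods for the three parameters of one station.
periods = [
    (pd.Timestamp("1995-01-01"), pd.Timestamp("2022-12-31")),  # N
    (pd.Timestamp("1990-01-01"), pd.Timestamp("2021-12-31")),  # T
    (pd.Timestamp("2000-01-01"), pd.Timestamp("2023-06-30")),  # ET
]
inner = join_periods(periods, "inner")
outer = join_periods(periods, "outer")
print(inner)
print(outer)
```

The inner join is the safer choice when all three parameters are needed for every time step, for example as model input.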

get_geom()[source]

get_max_period(kinds, nas_allowed=False)[source]

Get the maximum available period for this stations timeseries.

If nas_allowed is True, the maximum range of the time series is returned. Otherwise, the minimal filled period is returned.

Parameters:
  • kinds (str or list of str) – The data kinds to update. Must be a column in the timeseries DB. Must be one of “raw”, “qc”, “filled”, “adj”. For the precipitation also “qn” and “corr” are valid.

  • nas_allowed (bool, optional) – Should NAs be allowed? If True, then the maximum possible period is returned, even if there are NAs in the timeserie. If False, then the minimal filled period is returned. The default is False.

Returns:

The maximum Timestamp Period

Return type:

utils.TimestampPeriod

get_meta(paras='all', **kwargs)[source]

Get the meta information for every parameter of this station.

Parameters:
  • paras (list of str or str, optional) – Give the parameters for which to get the meta information. Can be “n”, “t”, “et” or “all”. If “all”, then every available station parameter is returned. The default is “all”

  • kwargs (dict, optional) – The optional keyword arguments are handed to the single Station get_meta methods. Can be e.g. “infos”.

Returns:

dict with the information. There is one subdict per parameter. If only one parameter is asked for, then there is no subdict, but only a single value.

Return type:

dict

classmethod get_meta_explanation(infos='all')[source]

Get the explanations of the available meta fields.

Parameters:

infos (list or string, optional) – The infos you wish to get an explanation for. If “all” then all the available information get returned. The default is “all”

Returns:

a pandas Series with the information names as index and the explanation as values.

Return type:

pd.Series

get_name()[source]

StationBase…

These are the base station classes on which the real station classes above depend. None of them works on its own, because the class variables are not yet set correctly.

class weatherDB.station.StationBase(id, _skip_meta_check=False)[source]

Bases: object

This is the base class for one station. It does not work on its own, because its parameters need to get defined in the real classes.

Create a Station object.

Parameters:
  • id (int) – The stations ID.

  • _skip_meta_check (bool, optional) – Should the check whether the station is in the database meta file get skipped? Be careful when skipping this check, because it can lead to problems. It exists for computational reasons, because it makes the initialization faster. It is used by the stations classes, because they only initialize objects that are in the meta table. The default is False.

Raises:

NotImplementedError – _description_

Public Methods:

__init__(id[, _skip_meta_check])

Create a Station object.

isin_db()

Check if Station is already in a timeseries table.

isin_meta()

Check if Station is already in the meta table.

isin_ma()

Check if Station is already in the multi annual table.

is_virtual()

Check if the station is a real station or only a virtual one.

is_real()

Check if the station is a real station or only a virtual one.

is_last_imp_done(kind)

Is the last import for the given kind already worked in?

update_period_meta(kind)

Update the time period in the meta file.

update_ma([skip_if_exist, drop_when_error])

Update the multi annual values in the stations_raster_values table.

update_raw([only_new, ftp_file_list, remove_nas])

Download data from CDC and upload to database.

get_zipfiles([only_new, ftp_file_list])

Get the zipfiles on the CDC server with the raw data.

download_raw([only_new])

Download the timeserie from the CDC Server.

quality_check([period])

Quality check the raw data for a given period.

fillup([period])

Fill up missing data with measurements from nearby stations.

last_imp_quality_check()

Do the quality check of the last import.

last_imp_qc()

last_imp_fillup([_last_imp_period])

Do the gap filling of the last import.

get_meta_explanation([infos])

Get the explanations of the available meta fields.

get_meta([infos])

Get Information from the meta table.

get_geom([format, crs])

Get the point geometry of the station.

get_geom_shp([crs])

Get the geometry of the station as a shapely Point object.

get_name()

count_holes([weeks, kind, period, ...])

Count holes in timeseries depending on their length.

get_period_meta(kind[, all])

Get a specific period from the meta information table.

get_filled_period(kind[, from_meta])

Get the min and max Timestamp for which there is data in the corresponding timeserie.

get_max_period(kinds[, nas_allowed])

Get the maximum available period for this stations timeseries.

get_last_imp_period([all])

Get the last imported Period for this Station.

get_neighboor_stids([n, only_real, p_elev, ...])

Get a list with Station Ids of the nearest neighbor stations.

get_multi_annual()

Get the multi annual value(s) for this station.

get_ma()

get_raster_value(raster)

get_coef(other_stid[, in_db_unit])

Get the regionalisation coefficients due to the height.

get_df(kinds[, period, agg_to, nas_allowed, ...])

Get a timeseries DataFrame from the database.

get_raw(**kwargs)

Get the raw timeserie.

get_qc(**kwargs)

Get the quality checked timeserie.

get_dist([period])

Get the time series with the information from which station the data got filled and the corresponding distance to this station.

get_filled([period, with_dist])

Get the filled timeserie.

get_adj(**kwargs)

Get the adjusted timeserie.

plot([period, kind, agg_to])

Plot the data of this station.


count_holes(weeks=[2, 4, 8, 12, 16, 20, 24], kind='qc', period=(None, None), between_meta_period=True, crop_period=False, **kwargs)[source]

Count holes in timeseries depending on their length.

Parameters:
  • weeks (list, optional) – A list of hole lengths to count. Every hole longer than the duration of weeks specified is counted. The default is [2, 4, 8, 12, 16, 20, 24]

  • kind (str) – The kind of the timeserie to analyze. Should be one of [‘raw’, ‘qc’, ‘filled’]. For N also “corr” is possible. Normally only “raw” and “qc” make sense, because the other timeseries should not have holes.

  • period (TimestampPeriod or (tuple or list of datetime.datetime or None), optional) – The minimum and maximum Timestamp for which to analyze the timeseries. If None is given, the maximum and minimal possible Timestamp is taken. The default is (None, None).

  • between_meta_period (bool, optional) – Only check between the respective period that is defined in the meta table. If “qc” is chosen as kind, then the “raw” meta period is taken. The default is True.

  • crop_period (bool, optional) – Should the period get cropped to the maximum filled period? This will result in holes being ignored when they are at the end or at the beginning of the time series. If period = (None, None) is given, then this parameter is set to True. The default is False.

Returns:

A Pandas Dataframe, with station_id as index and one column per week. The numbers in the table are the amount of NA-periods longer than the respective amount of weeks.

Return type:

pandas.DataFrame

Raises:

ValueError – If the input parameters were not correct.
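
The counting rule (every NA period longer than a given number of weeks is counted) can be sketched for a single series with pandas. The helper count_na_runs is hypothetical and simplified: it assumes a regular DatetimeIndex and ignores the meta-period and cropping options described above.

```python
import numpy as np
import pandas as pd


def count_na_runs(series, weeks=(2, 4, 8)):
    """Count runs of consecutive NAs lasting longer than each number of weeks.

    Simplified sketch: assumes a regular DatetimeIndex with at least two entries.
    """
    step = series.index[1] - series.index[0]
    is_na = series.isna()
    # Every change between NA and non-NA starts a new run.
    run_id = (is_na != is_na.shift()).cumsum()
    # Number of time steps in each NA run.
    run_lengths = is_na[is_na].groupby(run_id[is_na]).size()
    return {w: int((run_lengths > pd.Timedelta(weeks=w) / step).sum()) for w in weeks}


index = pd.date_range("2020-01-01", periods=120, freq="D")
values = pd.Series(1.0, index=index)
values.iloc[10:13] = np.nan   # hole of 3 days
values.iloc[30:50] = np.nan   # hole of 20 days
values.iloc[60:95] = np.nan   # hole of 35 days
print(count_na_runs(values))
```

Note that a long hole is counted in every column whose threshold it exceeds, so the counts never increase from left to right.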

download_raw(only_new=False)[source]

Download the timeserie from the CDC Server.

This function only returns the timeserie, but is not updating the database.

Parameters:

only_new (bool, optional) – Get only the files that are not yet in the database? If False all the available files are loaded again. The default is False.

Returns:

The Timeseries as a DataFrame with a Timestamp Index.

Return type:

pandas.DataFrame

fillup(period=(None, None), **kwargs)[source]

Fill up missing data with measurements from nearby stations.

Parameters:
  • period (util.TimestampPeriod or (tuple or list of datetime.datetime or None), optional) – The minimum and maximum Timestamp for which to gap fill the timeseries. If None is given, the maximum or minimal possible Timestamp is taken. The default is (None, None).

  • kwargs (dict, optional) – Additional arguments for the fillup function. e.g. p_elev to consider the elevation to select nearest stations. (only for T and ET)

get_adj(**kwargs)[source]

Get the adjusted timeserie.

The timeserie is adjusted to the multi annual mean. So the overall mean of the given period will be the same as the multi annual mean.

Parameters:

kwargs (dict, optional) – The keyword arguments are passed to the get_df function. Possible parameters are “period”, “agg_to” or “nas_allowed”.

Returns:

A timeserie with the adjusted data.

Return type:

pandas.DataFrame
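
One way to picture the adjustment is a multiplicative scaling so that the mean yearly sum of the series matches the multi annual value. This is only a sketch under that assumption; the helper name, the made-up numbers and the multiplicative form are not taken from the library, which may adjust differently (especially for temperature).

```python
import pandas as pd


def adjust_to_multi_annual(daily, multi_annual_yearly):
    """Scale a daily series so its mean yearly sum equals the multi annual value.

    Assumption: a purely multiplicative correction over complete years.
    """
    mean_yearly_sum = daily.groupby(daily.index.year).sum().mean()
    return daily * (multi_annual_yearly / mean_yearly_sum)


index = pd.date_range("2019-01-01", "2020-12-31", freq="D")
et = pd.Series(1.5, index=index)            # made-up ET in mm/day
adjusted = adjust_to_multi_annual(et, 600)  # made-up multi annual value in mm/year
adjusted_mean = adjusted.groupby(adjusted.index.year).sum().mean()
print(adjusted_mean)
```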

get_coef(other_stid, in_db_unit=False)[source]

Get the regionalisation coefficients due to the height.

Those are the values from the DWD, HYRAS or REGNIE grids.

Parameters:
  • other_stid (int) – The Station Id of the other station from which to regionalise for the own station.

  • in_db_unit (bool, optional) – Should the coefficients be returned in the unit as stored in the database? This is only relevant for the temperature. The default is False.

Returns:

A list of coefficients. For T, ET and N-daily only the yearly coefficient is returned. For N the winter and summer half-yearly coefficients are returned as a tuple. None is returned if either the own or the other station’s multi-annual value is not available.

Return type:

list of floats or None
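
As an illustration of the return value, the sketch below assumes the coefficient is the ratio of the own station's multi annual value to that of the other station. The helper regionalisation_coef and the ratio itself are assumptions (the coefficient for T in particular may be defined differently); only the None case follows directly from the description above.

```python
def regionalisation_coef(own_ma, other_ma):
    """Ratio of the own to the other station's multi annual value(s).

    Assumption: a multiplicative coefficient, which fits N and ET; the
    coefficient used for T may be defined differently.
    """
    if own_ma is None or other_ma is None:
        # One of the multi annual values is not available.
        return None
    return [own / other for own, other in zip(own_ma, other_ma)]


# N: (winter, summer) half-yearly sums in mm, made-up numbers.
print(regionalisation_coef((450, 500), (300, 400)))
print(regionalisation_coef((450, 500), None))
```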

get_df(kinds, period=(None, None), agg_to=None, nas_allowed=True, add_na_share=False, db_unit=False, sql_add_where=None, **kwargs)[source]

Get a timeseries DataFrame from the database.

Parameters:
  • kinds (str or list of str) – The data kinds to update. Must be a column in the timeseries DB. Must be one of “raw”, “qc”, “filled”, “adj”, “filled_by”, “filled_share”. For the precipitation also “qn” and “corr” are valid. If “filled_by” is given together with an aggregation step, the “filled_by” is replaced by the “filled_share”. The “filled_share” gives the share of filled values in the aggregation group in percent.

  • period (TimestampPeriod or (tuple or list of datetime.datetime or None), optional) – The minimum and maximum Timestamp for which to get the timeseries. If None is given, the maximum or minimal possible Timestamp is taken. The default is (None, None).

  • agg_to (str or None, optional) – Aggregate to a given timespan. If more than 20% of the values in an aggregation group are missing, the aggregated value will be None. Can be anything smaller than the maximum timespan of the saved data. If a Timeperiod smaller than the saved data is given, then the maximum possible timeperiod is returned. For T and ET it can be “month”, “year”. For N it can also be “hour”. If None then the maximum timeperiod is taken. The default is None.

  • nas_allowed (bool, optional) – Should NAs be allowed? If True, then the maximum possible period is returned, even if there are NAs in the timeseries. If False, then the minimal filled period is returned. The default is True.

  • add_na_share (bool, optional) – Should one or several columns with the share of NAs in the data be added to the DataFrame? This is especially important when the station’s data gets aggregated, because the aggregation doesn’t make sense if there are a lot of NAs in the original data. If True, one column per requested kind is added with the respective share of NAs, if the aggregation step is not the smallest. The “kind”_na_share column is in percent. The default is False.

  • db_unit (bool, optional) – Should the result be in the database unit? If False, the values are converted to the normal unit, like mm or °C. The numbers are saved as integers in the database and are therefore multiplied by 10 or 100 before being stored. The default is False.

  • sql_add_where (str or None, optional) – Additional SQL WHERE condition to filter the output, e.g. “EXTRACT(MONTH FROM timestamp) = 2”. The default is None.

Returns:

The timeseries DataFrame with a DatetimeIndex.

Return type:

pandas.DataFrame
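The 20% rule of agg_to and the NA share reported by add_na_share can be illustrated without a database. The following is a minimal sketch in plain Python, not weatherDB's implementation: the helper name aggregate_hourly is hypothetical, values are summed (as would suit precipitation), and missing values are assumed to be present in the input as None.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def aggregate_hourly(values, max_na_share=20.0):
    """Sum 10-minute values per hour (hypothetical helper, not part of weatherDB).

    values: dict mapping datetime -> float or None (None marks a missing value).
    Returns a dict mapping the hour start -> (sum or None, NA share in percent).
    """
    groups = defaultdict(list)
    for ts, val in values.items():
        # group every timestamp under the start of its hour
        groups[ts.replace(minute=0, second=0, microsecond=0)].append(val)

    result = {}
    for hour, vals in sorted(groups.items()):
        na_share = 100.0 * sum(v is None for v in vals) / len(vals)
        if na_share > max_na_share:
            total = None  # too many missing values in this group
        else:
            total = sum(v for v in vals if v is not None)
        result[hour] = (total, na_share)
    return result


start = datetime(2020, 1, 1)
series = {start + timedelta(minutes=10 * i): 0.5 for i in range(12)}
series[start + timedelta(minutes=10)] = None  # 1 of 6 values missing in hour 0
series[start + timedelta(minutes=60)] = None  # 2 of 6 values missing in hour 1
series[start + timedelta(minutes=70)] = None

hourly = aggregate_hourly(series)
```

In this example the first hour keeps its sum, because only about 17% of its values are missing, while the second hour exceeds the threshold and is set to None.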

get_dist(period=(None, None))[source]

Get the timeseries with the information about which station the data was filled from and the corresponding distance to that station.

Parameters:

period (TimestampPeriod or (tuple or list of datetime.datetime or None), optional) – The minimum and maximum Timestamp for which to get the timeseries. If None is given, the maximum or minimal possible Timestamp is taken. The default is (None, None).

Returns:

The timeseries for this station and the given period, with the station_id of the station the data was filled from and the distance to it in meters.

Return type:

pd.DataFrame

get_filled(period=(None, None), with_dist=False, **kwargs)[source]

Get the filled timeseries.

Either only the timeseries is returned, or also the ID of the station from which the data was filled, together with the distance to that station in m.

Parameters:
  • period (TimestampPeriod or (tuple or list of datetime.datetime or None), optional) – The minimum and maximum Timestamp for which to get the timeseries. If None is given, the maximum or minimal possible Timestamp is taken. The default is (None, None).

  • with_dist (bool, optional) – Should the distance to the stations from which the timeseries was filled be added? The default is False.

Returns:

The filled timeseries for this station and the given period.

Return type:

pd.DataFrame

get_filled_period(kind, from_meta=False)[source]

Get the min and max Timestamp for which there is data in the corresponding timeseries.

Computes the period from the timeseries or takes it from the meta table.

Parameters:
  • kind (str) – The data kind for which to look up the filled period. Must be a column in the timeseries DB, i.e. one of “raw”, “qc”, “filled”, “adj”. For precipitation, “qn” and “corr” are also valid. If “best” is given, then the best kind is selected depending on the parameter of the station: “corr” for precipitation and “filled” for the others.

  • from_meta (bool, optional) – Should the period be taken from the meta table? If True, this function is only a wrapper for .get_period_meta. If False, the period is computed from the timeseries. The default is False.

Raises:
  • NotImplementedError – If the given kind is not valid.

  • ValueError – If the given kind is not a string.

Returns:

A TimestampPeriod of the filled timeseries. (NaT, NaT) if the timeseries is completely empty or not defined.

Return type:

util.TimestampPeriod
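What "filled period" means when it is computed from the timeseries itself can be sketched as follows. This is an illustration under assumptions, not the library's code: the helper filled_period is hypothetical, the timeseries is modelled as a dict, and (None, None) stands in for (NaT, NaT).

```python
from datetime import datetime


def filled_period(series):
    """Return the first and last timestamp that carry a value.

    series: dict mapping datetime -> float or None (None marks a missing value).
    Hypothetical helper; (None, None) plays the role of (NaT, NaT).
    """
    stamps = sorted(ts for ts, val in series.items() if val is not None)
    if not stamps:
        return (None, None)  # empty or all-missing timeseries
    return (stamps[0], stamps[-1])


example = {
    datetime(2020, 1, 1): None,
    datetime(2020, 1, 2): 1.2,
    datetime(2020, 1, 3): 0.0,
    datetime(2020, 1, 4): None,
}
period = filled_period(example)
```

Leading and trailing missing values are excluded, so the period in this example runs from 2 to 3 January 2020.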

get_geom(format='EWKT', crs=None)[source]

Get the point geometry of the station.

Parameters:
  • format (str or None, optional) – The format of the geometry to return. Needs to be a format that is understood by PostgreSQL, i.e. a corresponding ST_AsXXXXX function needs to exist in PostgreSQL. If None, then the binary representation is returned. The default is “EWKT”.

  • crs (str, int or None, optional) – If None, then the geometry is returned in WGS84 (EPSG:4326). If string, then it should be one of “WGS84” or “UTM”. If int, then it should be the EPSG code.

Returns:

string or bytes representation of the geometry, depending on the selected format.

Return type:

str or bytes

get_geom_shp(crs=None)[source]

Get the geometry of the station as a shapely Point object.

Parameters:

crs (str, int or None, optional) – If None, then the geometry is returned in WGS84 (EPSG:4326). If string, then it should be one of “WGS84” or “UTM”. If int, then it should be the EPSG code.

Returns:

The location of the station as shapely Point.

Return type:

shapely.geometry.Point

get_last_imp_period(all=False)[source]

Get the last imported Period for this Station.

Parameters:

all (bool, optional) – Should the maximum Timespan for all the last imports be returned. If False only the period for this station is returned. The default is False.

Returns:

(minimal datetime, maximal datetime)

Return type:

TimestampPeriod or tuple of datetime.datetime

get_ma()[source]
get_max_period(kinds, nas_allowed=False, **kwargs)[source]

Get the maximum available period for this station’s timeseries.

If nas_allowed is True, then the maximum range of the timeseries is returned. Otherwise the minimal filled period is returned.

Parameters:
  • kinds (str or list of str) – The data kinds to consider. Each must be a column in the timeseries DB, i.e. one of “raw”, “qc”, “filled”, “adj”. For precipitation, “qn” and “corr” are also valid.

  • nas_allowed (bool, optional) – Should NAs be allowed? If True, then the maximum possible period is returned, even if there are NAs in the timeseries. If False, then the minimal filled period is returned. The default is False.

Returns:

The maximum Timestamp Period

Return type:

utils.TimestampPeriod
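One possible reading of the two modes is sketched below. It is an assumption, not something the documentation states: with nas_allowed=True the result is taken as the span covering the filled periods of all requested kinds, and with nas_allowed=False as the span in which every kind has data. The helper max_period and the example dates are hypothetical.

```python
from datetime import datetime


def max_period(periods, nas_allowed=False):
    """Combine per-kind filled periods (hypothetical helper).

    periods: list of (start, end) datetime tuples, one per kind.
    nas_allowed=True  -> span covering all periods (NAs may occur inside).
    nas_allowed=False -> overlap of all periods, or (None, None) if empty.
    """
    starts = [start for start, _ in periods]
    ends = [end for _, end in periods]
    if nas_allowed:
        return (min(starts), max(ends))
    start, end = max(starts), min(ends)
    if start > end:
        return (None, None)  # the kinds have no common period
    return (start, end)


raw_period = (datetime(1990, 1, 1), datetime(2020, 12, 31))
filled_period = (datetime(1995, 1, 1), datetime(2022, 12, 31))

widest = max_period([raw_period, filled_period], nas_allowed=True)
common = max_period([raw_period, filled_period], nas_allowed=False)
```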

get_meta(infos='all')[source]

Get Information from the meta table.

Parameters:

infos (list of str or str, optional) – A list of the information to get from the database. If “all”, then all the information is returned. The default is “all”.

Returns:

dict with the meta information. The first level has one entry per parameter. The second level has one entry per information, asked for. If only one information is asked for, then it is returned as single value and not as subdict.

Return type:

dict or int/string

classmethod get_meta_explanation(infos='all')[source]

Get the explanations of the available meta fields.

Parameters:

infos (list or string, optional) – The infos you wish to get an explanation for. If “all”, then all the available information is returned. The default is “all”.

Returns:

A pandas Series with the information names as index and the explanations as values.

Return type:

pd.Series

get_multi_annual()[source]

Get the multi annual value(s) for this station.

Returns:

The corresponding multi annual value. For T and ET the yearly value is returned. For N the winter and summer half-yearly sums are returned as a tuple. The returned unit is mm or °C.

Return type:

list or number

get_name()[source]
get_neighboor_stids(n=5, only_real=True, p_elev=None, period=None, **kwargs)[source]

Get a list with the station IDs of the nearest neighbor stations.

Parameters:
  • n (int, optional) – The number of stations to return. If None, then all the possible stations are returned. The default is 5.

  • only_real (bool, optional) – Should only real stations be considered? If False, virtual stations are also part of the result. The default is True.

  • p_elev (tuple of float or None, optional) – The parameters (P_1, P_2) to weight the height differences between stations. The elevation difference is considered with the formula from LARSIM (equation 3-18 & 3-19 from the LARSIM manual): $L_{gewichtet} = L_{horizontal} * (1 + (\frac{|\delta H|}{P_1})^{P_2})$ If None, then the height difference is not considered and only the nearest stations are returned. The default is None.

  • period (utils.TimestampPeriod or None, optional) – The period for which the nearest neighbors are returned. The neighbor station needs to have raw data for at least one half of the period. If None, then the availability of the data is not checked. The default is None.

Returns:

A list of station IDs in order of distance. The closest station is the first in the list.

Return type:

list of int
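The elevation weighting controlled by p_elev multiplies the horizontal distance by 1 + (|δH| / P_1)^P_2. A minimal sketch of that formula and of the resulting ordering follows. The function name, the station IDs and the parameter values are made up for illustration and are not taken from weatherDB or the LARSIM manual.

```python
def weighted_distance(l_horizontal, delta_h, p_1, p_2):
    """Horizontal distance weighted by the elevation difference.

    Implements L_weighted = L_horizontal * (1 + (|delta_h| / p_1) ** p_2).
    """
    return l_horizontal * (1 + (abs(delta_h) / p_1) ** p_2)


# (station id, horizontal distance in m, elevation difference in m), all hypothetical
candidates = [
    (101, 10_000.0, 500.0),
    (102, 12_000.0, 0.0),
]
p_elev = (250.0, 1.0)  # hypothetical (P_1, P_2)

by_horizontal = [stid for stid, dist, _ in sorted(candidates, key=lambda c: c[1])]
by_weighted = [
    stid
    for stid, dist, dh in sorted(
        candidates, key=lambda c: weighted_distance(c[1], c[2], *p_elev)
    )
]
```

With these made-up numbers station 101 is the closest horizontally, but its 500 m elevation difference triples its weighted distance, so station 102 is ranked first once the weighting is applied.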

get_period_meta(kind, all=False)[source]

Get a specific period from the meta information table.

This function returns the information from the meta table. In this table there are several periods saved, like the period of the last import.

Parameters:
  • kind (str) – The kind of period to return. Should be one of [‘filled’, ‘raw’, ‘last_imp’]. filled: the maximum filled period of the filled timeseries. raw: the maximum filled period of the raw data. last_imp: the maximum filled period of the last import.

  • all (bool, optional) – Should the maximum timespan for all the filled periods be returned? If False, only the period for this station is returned. The default is False.

Returns:

The TimestampPeriod of the station or of all the stations if all=True.

Return type:

TimestampPeriod

Raises:

ValueError – If a wrong kind is handed in.

get_qc(**kwargs)[source]

Get the quality checked timeseries.

Parameters:

kwargs (dict, optional) – The keyword arguments get passed to the get_df function. Possible parameters are “period”, “agg_to” or “nas_allowed”

Returns:

The quality checked timeseries for this station and the given period.

Return type:

pd.DataFrame

get_raster_value(raster)[source]
get_raw(**kwargs)[source]

Get the raw timeseries.

Parameters:

kwargs (dict, optional) – The keyword arguments get passed to the get_df function. Possible parameters are “period”, “agg_to” or “nas_allowed”

Returns:

The raw timeseries for this station and the given period.

Return type:

pd.DataFrame

get_zipfiles(only_new=True, ftp_file_list=None)[source]

Get the zipfiles on the CDC server with the raw data.

Parameters:
  • only_new (bool, optional) – Get only the files that are not yet in the database? If False all the available files are loaded again. The default is True

  • ftp_file_list (list of (strings, datetime), optional) – A list of files on the FTP server together with their modification time. If None, then the list is fetched from the server. The default is None

Returns:

A DataFrame of zipfiles and the corresponding modification time on the CDC server to import.

Return type:

pandas.DataFrame or None

is_last_imp_done(kind)[source]

Has the last import for the given kind already been processed?

Parameters:

kind (str) – The data kind to look for filled period. Must be a column in the timeseries DB. Must be one of “raw”, “qc”, “filled”, “adj”, “best”. If “best” is given, then depending on the parameter of the station the best kind is selected. For Precipitation this is “corr” and for the other this is “filled”. For the precipitation also “qn” and “corr” are valid.

Returns:

True if the last import of the given kind is already treated.

Return type:

bool

is_real()[source]

Check if the station is a real station or only a virtual one.

Real means that the DWD is measuring here. Virtual means, that there are no measurements here, but the station got created to have timeseries for every parameter for every precipitation station.

Returns:

true if the station is real, false if it is virtual.

Return type:

bool

is_virtual()[source]

Check if the station is a real station or only a virtual one.

Real means that the DWD is measuring here. Virtual means, that there are no measurements here, but the station got created to have timeseries for every parameter for every precipitation station.

Returns:

true if the station is virtual, false if it is real.

Return type:

bool

isin_db()[source]

Check if Station is already in a timeseries table.

Returns:

True if Station has a table in DB, no matter if it is filled or not.

Return type:

bool

isin_ma()[source]

Check if Station is already in the multi annual table.

Returns:

True if Station is in multi annual table.

Return type:

bool

isin_meta()[source]

Check if Station is already in the meta table.

Returns:

True if Station is in meta table.

Return type:

bool

last_imp_fillup(_last_imp_period=None)[source]

Do the gap filling of the last import.

last_imp_qc()[source]
last_imp_quality_check()[source]

Do the quality check of the last import.

plot(period=(None, None), kind='filled', agg_to=None, **kwargs)[source]

Plot the data of this station.

Parameters:
  • period (TimestampPeriod or (tuple or list of datetime.datetime or None), optional) – The minimum and maximum Timestamp for which to get the timeseries. If None is given, the maximum or minimal possible Timestamp is taken. The default is (None, None).

  • kind (str, optional) – The data kind to plot. Must be a column in the timeseries DB, i.e. one of “raw”, “qc”, “filled”, “adj”. For precipitation, “qn” and “corr” are also valid. The default is “filled”.

  • agg_to (str or None, optional) – Aggregate to a given timespan. Can be anything smaller than the maximum timespan of the saved data. If a time period smaller than that of the saved data is given, then the maximum possible time period is returned. For T and ET it can be “month” or “year”. For N it can also be “hour”. If None, then the maximum time period is taken. The default is None.

quality_check(period=(None, None), **kwargs)[source]

Quality check the raw data for a given period.

Parameters:

period (util.TimestampPeriod or (tuple or list of datetime.datetime or None), optional) – The minimum and maximum Timestamp for which to get the timeseries. If None is given, the maximum or minimal possible Timestamp is taken. The default is (None, None).

update_ma(skip_if_exist=True, drop_when_error=True)[source]

Update the multi annual values in the stations_raster_values table.

Get new values from the raster and put in the table.

update_period_meta(kind)[source]

Update the time period in the meta file.

Compute the filled period of a timeseries and save it in the meta table.

Parameters:

kind (str) – The data kind for which to update the filled period. Must be a column in the timeseries DB, i.e. one of “raw”, “qc”, “filled”. For precipitation, “corr” is also valid. If “best” is given, then the best kind is selected depending on the parameter of the station: “corr” for precipitation and “filled” for the others.

update_raw(only_new=True, ftp_file_list=None, remove_nas=True)[source]

Download data from CDC and upload to database.

Parameters:
  • only_new (bool, optional) – Get only the files that are not yet in the database? If False all the available files are loaded again. The default is True

  • ftp_file_list (list of (strings, datetime), optional) – A list of files on the FTP server together with their modification time. If None, then the list is fetched from the server. The default is None

  • remove_nas (bool, optional) – Remove the NAs from the downloaded data before updating it to the database. This has computational advantages. The default is True.

Returns:

The raw Dataframe of the Stations data.

Return type:

pandas.DataFrame

class weatherDB.station.StationNBase(id, _skip_meta_check=False)[source]

Bases: StationBase

Create a Station object.

Parameters:
  • id (int) – The stations ID.

  • _skip_meta_check (bool, optional) – Should the check whether the station is in the database meta file be skipped? Be careful when skipping this check, because it can lead to problems. The option exists for performance reasons, because it makes the initialization faster. It is used by the stations classes, because they only initialize objects that are in the meta table. The default is False.

Raises:

NotImplementedError – _description_

Public Methods:

get_adj(**kwargs)

Get the adjusted timeseries.

Inherited from StationBase

__init__(id[, _skip_meta_check])

Create a Station object.

isin_db()

Check if Station is already in a timeseries table.

isin_meta()

Check if Station is already in the meta table.

isin_ma()

Check if Station is already in the multi annual table.

is_virtual()

Check if the station is a real station or only a virtual one.

is_real()

Check if the station is a real station or only a virtual one.

is_last_imp_done(kind)

Has the last import for the given kind already been processed?

update_period_meta(kind)

Update the time period in the meta file.

update_ma([skip_if_exist, drop_when_error])

Update the multi annual values in the stations_raster_values table.

update_raw([only_new, ftp_file_list, remove_nas])

Download data from CDC and upload to database.

get_zipfiles([only_new, ftp_file_list])

Get the zipfiles on the CDC server with the raw data.

download_raw([only_new])

Download the timeseries from the CDC Server.

quality_check([period])

Quality check the raw data for a given period.

fillup([period])

Fill up missing data with measurements from nearby stations.

last_imp_quality_check()

Do the quality check of the last import.

last_imp_qc()

last_imp_fillup([_last_imp_period])

Do the gap filling of the last import.

get_meta_explanation([infos])

Get the explanations of the available meta fields.

get_meta([infos])

Get Information from the meta table.

get_geom([format, crs])

Get the point geometry of the station.

get_geom_shp([crs])

Get the geometry of the station as a shapely Point object.

get_name()

count_holes([weeks, kind, period, ...])

Count holes in timeseries depending on their length.

get_period_meta(kind[, all])

Get a specific period from the meta information table.

get_filled_period(kind[, from_meta])

Get the min and max Timestamp for which there is data in the corresponding timeseries.

get_max_period(kinds[, nas_allowed])

Get the maximum available period for this stations timeseries.

get_last_imp_period([all])

Get the last imported Period for this Station.

get_neighboor_stids([n, only_real, p_elev, ...])

Get a list with the station IDs of the nearest neighbor stations.

get_multi_annual()

Get the multi annual value(s) for this station.

get_ma()

get_raster_value(raster)

get_coef(other_stid[, in_db_unit])

Get the regionalisation coefficients due to the height.

get_df(kinds[, period, agg_to, nas_allowed, ...])

Get a timeseries DataFrame from the database.

get_raw(**kwargs)

Get the raw timeseries.

get_qc(**kwargs)

Get the quality checked timeseries.

get_dist([period])

Get the timeseries with the information about which station the data was filled from and the corresponding distance to that station.

get_filled([period, with_dist])

Get the filled timeseries.

get_adj(**kwargs)

Get the adjusted timeseries.

plot([period, kind, agg_to])

Plot the data of this station.


count_holes(weeks=[2, 4, 8, 12, 16, 20, 24], kind='qc', period=(None, None), between_meta_period=True, crop_period=False, **kwargs)

Count holes in timeseries depending on their length.

Parameters:
  • weeks (list, optional) – A list of hole length to count. Every hole longer than the duration of weeks specified is counted. The default is [2, 4, 8, 12, 16, 20, 24]

  • kind (str) – The kind of the timeseries to analyze. Should be one of [‘raw’, ‘qc’, ‘filled’]. For N also “corr” is possible. Normally only “raw” and “qc” make sense, because the other timeseries should not have holes.

  • period (TimestampPeriod or (tuple or list of datetime.datetime or None), optional) – The minimum and maximum Timestamp for which to analyze the timeseries. If None is given, the maximum and minimal possible Timestamp is taken. The default is (None, None).

  • between_meta_period (bool, optional) – Only check between the respective period that is defined in the meta table. If “qc” is chosen as kind, then the “raw” meta period is taken. The default is True.

  • crop_period (bool, optional) – Should the period get cropped to the maximum filled period? This will result in holes being ignored when they are at the end or at the beginning of the timeseries. If period = (None, None) is given, then this parameter is set to True. The default is False.

Returns:

A pandas DataFrame with station_id as index and one column per week. The numbers in the table are the number of NA periods longer than the respective number of weeks.

Return type:

pandas.DataFrame

Raises:

ValueError – If the input parameters were not correct.
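The counting logic can be sketched in a few lines of plain Python: find every run of consecutive missing values and count, per threshold, the runs that last longer than that many weeks. The helper count_long_gaps is hypothetical and only approximates the idea. It does not reproduce the options between_meta_period or crop_period.

```python
from datetime import timedelta


def count_long_gaps(values, step, weeks=(2, 4, 8)):
    """Count runs of missing values longer than each number of weeks.

    values: list of float or None, ordered in time at a fixed time step.
    step: timedelta between two consecutive values.
    Hypothetical helper, not part of weatherDB.
    """
    runs = []  # length (in time steps) of every run of missing values
    current = 0
    for val in values:
        if val is None:
            current += 1
        else:
            if current:
                runs.append(current)
            current = 0
    if current:
        runs.append(current)  # a gap that reaches the end of the series

    return {
        w: sum(run * step > timedelta(weeks=w) for run in runs)
        for w in weeks
    }


# daily values with one 15-day gap and one 30-day gap
daily = [1.0] * 10 + [None] * 15 + [1.0] * 5 + [None] * 30 + [1.0] * 3
holes = count_long_gaps(daily, step=timedelta(days=1))
```

Both gaps are longer than two weeks, only the 30-day gap is longer than four weeks, and none exceeds eight weeks.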

download_raw(only_new=False)

Download the timeseries from the CDC Server.

This function only returns the timeseries and does not update the database.

Parameters:

only_new (bool, optional) – Get only the files that are not yet in the database? If False all the available files are loaded again. The default is False.

Returns:

The Timeseries as a DataFrame with a Timestamp Index.

Return type:

pandas.DataFrame

fillup(period=(None, None), **kwargs)

Fill up missing data with measurements from nearby stations.

Parameters:
  • period (util.TimestampPeriod or (tuple or list of datetime.datetime or None), optional) – The minimum and maximum Timestamp for which to gap fill the timeseries. If None is given, the maximum or minimal possible Timestamp is taken. The default is (None, None).

  • kwargs (dict, optional) – Additional arguments for the fillup function. e.g. p_elev to consider the elevation to select nearest stations. (only for T and ET)
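The basic idea of gap filling from nearby stations can be sketched as follows. This is a simplified illustration and not the library's algorithm: the helper fill_from_neighbours and the station IDs are hypothetical, neighbours are assumed to be sorted by distance already, and no regionalisation with height coefficients (see get_coef) is applied.

```python
def fill_from_neighbours(own, neighbours):
    """Fill missing values with those of the nearest station that has data.

    own: list of float or None (None marks a missing value).
    neighbours: list of (station_id, values), ordered from nearest to farthest;
        each values list has the same length as own.
    Returns (filled, filled_by); filled_by holds the donor station id, or None
    where the station's own value was kept or no donor had data.
    """
    filled = list(own)
    filled_by = [None] * len(own)
    for stid, values in neighbours:
        for i, val in enumerate(values):
            # only fill what is still missing, so nearer stations win
            if filled[i] is None and val is not None:
                filled[i] = val
                filled_by[i] = stid
    return filled, filled_by


own_values = [1.0, None, None, 4.0]
nearby = [
    (101, [None, 2.0, None, 9.0]),  # nearest station (hypothetical id)
    (102, [5.0, 6.0, 3.0, 8.0]),    # second nearest station
]
filled, filled_by = fill_from_neighbours(own_values, nearby)
```

The second list mirrors what the “filled_by” kind records: which station each filled value came from.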

get_adj(**kwargs)[source]

Get the adjusted timeseries.

The timeseries gets adjusted to match the multi-annual value over the given period. So the yearly variability is kept and only the whole period is adjusted.

The basis for the adjusted timeseries is the filled data and not the Richter corrected data, as the multi-annual values are also uncorrected values.

Returns:

The adjusted timeseries with the timestamp as index.

Return type:

pd.DataFrame
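The effect of adjusting only the whole period can be illustrated with a single scaling factor. The sketch below assumes a multiplicative adjustment against one yearly multi-annual value, which the documentation does not spell out. The helper adjust_to_multi_annual and the numbers are hypothetical.

```python
def adjust_to_multi_annual(yearly_sums, ma_value):
    """Scale yearly sums so that their mean equals the multi-annual value.

    yearly_sums: dict mapping year -> yearly sum of the filled data.
    ma_value: multi-annual yearly value in the same unit.
    One factor is used for the whole period, so the ratio between
    individual years (the yearly variability) is preserved.
    """
    mean_sum = sum(yearly_sums.values()) / len(yearly_sums)
    factor = ma_value / mean_sum
    return {year: total * factor for year, total in yearly_sums.items()}


filled_sums = {2001: 800.0, 2002: 1000.0, 2003: 1200.0}  # mm per year
adjusted = adjust_to_multi_annual(filled_sums, ma_value=900.0)
```

Here the mean of the filled data is 1000 mm, so every year is scaled by 0.9. The wet year stays 1.5 times as wet as the dry year.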

get_coef(other_stid, in_db_unit=False)

Get the regionalisation coefficients due to the height.

Those are the values from the DWD grid, HYRAS or REGNIE grids.

Parameters:
  • other_stid (int) – The station ID of the other station from which to regionalise for this station.

  • in_db_unit (bool, optional) – Should the coefficients be returned in the unit as stored in the database? This is only relevant for the temperature. The default is False.

Returns:

A list of coefficients. For T, ET and N-daily only the yearly coefficient is returned. For N the winter and summer half-yearly coefficients are returned as a tuple. None is returned if either this station’s or the other station’s multi-annual value is not available.

Return type:

list of floats or None

get_df(kinds, period=(None, None), agg_to=None, nas_allowed=True, add_na_share=False, db_unit=False, sql_add_where=None, **kwargs)

Get a timeseries DataFrame from the database.

Parameters:
  • kinds (str or list of str) – The data kinds to get. Each must be a column in the timeseries DB, i.e. one of “raw”, “qc”, “filled”, “adj”, “filled_by”, “filled_share”. For precipitation, “qn” and “corr” are also valid. If “filled_by” is given together with an aggregation step, “filled_by” is replaced by “filled_share”, which gives the share of filled values in the aggregation group in percent.

  • period (TimestampPeriod or (tuple or list of datetime.datetime or None), optional) – The minimum and maximum Timestamp for which to get the timeseries. If None is given, the maximum or minimal possible Timestamp is taken. The default is (None, None).

  • agg_to (str or None, optional) – Aggregate to a given timespan. If more than 20% of the values in an aggregation group are missing, the aggregated value will be None. Can be anything smaller than the maximum timespan of the saved data. If a time period smaller than that of the saved data is given, then the maximum possible time period is returned. For T and ET it can be “month” or “year”. For N it can also be “hour”. If None, then the maximum time period is taken. The default is None.

  • nas_allowed (bool, optional) – Should NAs be allowed? If True, then the maximum possible period is returned, even if there are NAs in the timeseries. If False, then the minimal filled period is returned. The default is True.

  • add_na_share (bool, optional) – Should one or several columns with the share of NAs in the data be added to the DataFrame? This is especially important when the station’s data gets aggregated, because the aggregation doesn’t make sense if there are a lot of NAs in the original data. If True, one column per requested kind is added with the respective share of NAs, if the aggregation step is not the smallest. The “kind”_na_share column is in percent. The default is False.

  • db_unit (bool, optional) – Should the result be in the database unit? If False, the values are converted to the normal unit, like mm or °C. The numbers are saved as integers in the database and are therefore multiplied by 10 or 100 before being stored. The default is False.

  • sql_add_where (str or None, optional) – Additional SQL WHERE condition to filter the output, e.g. “EXTRACT(MONTH FROM timestamp) = 2”. The default is None.

Returns:

The timeseries DataFrame with a DatetimeIndex.

Return type:

pandas.DataFrame
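The integer storage behind db_unit amounts to a fixed decimal scaling. The sketch below shows the round trip. The helper names are hypothetical, and which parameter uses the factor 10 and which the factor 100 is not stated here, so the number of decimals is passed in explicitly.

```python
def to_db_unit(value, decimals):
    """Convert a value in its normal unit to the integer stored in the database.

    decimals=1 corresponds to a factor of 10, decimals=2 to a factor of 100.
    Hypothetical helper, not part of weatherDB.
    """
    return round(value * 10 ** decimals)


def from_db_unit(db_value, decimals):
    """Convert a stored integer back to the normal unit (e.g. mm or °C)."""
    return db_value / 10 ** decimals


stored = to_db_unit(12.3, decimals=1)       # stored as an integer
restored = from_db_unit(stored, decimals=1)  # back in the normal unit
```

Requesting db_unit=True therefore yields the integers as stored, while the default returns values in the normal unit.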

get_dist(period=(None, None))

Get the timeseries with the information about which station the data was filled from and the corresponding distance to that station.

Parameters:

period (TimestampPeriod or (tuple or list of datetime.datetime or None), optional) – The minimum and maximum Timestamp for which to get the timeseries. If None is given, the maximum or minimal possible Timestamp is taken. The default is (None, None).

Returns:

The timeseries for this station and the given period, with the station_id of the station the data was filled from and the distance to it in meters.

Return type:

pd.DataFrame

get_filled(period=(None, None), with_dist=False, **kwargs)

Get the filled timeseries.

Either only the timeseries is returned, or also the ID of the station from which the data was filled, together with the distance to that station in m.

Parameters:
  • period (TimestampPeriod or (tuple or list of datetime.datetime or None), optional) – The minimum and maximum Timestamp for which to get the timeseries. If None is given, the maximum or minimal possible Timestamp is taken. The default is (None, None).

  • with_dist (bool, optional) – Should the distance to the stations from which the timeseries was filled be added? The default is False.

Returns:

The filled timeseries for this station and the given period.

Return type:

pd.DataFrame

get_filled_period(kind, from_meta=False)

Get the min and max Timestamp for which there is data in the corresponding timeseries.

Computes the period from the timeseries or takes it from the meta table.

Parameters:
  • kind (str) – The data kind for which to look up the filled period. Must be a column in the timeseries DB, i.e. one of “raw”, “qc”, “filled”, “adj”. For precipitation, “qn” and “corr” are also valid. If “best” is given, then the best kind is selected depending on the parameter of the station: “corr” for precipitation and “filled” for the others.

  • from_meta (bool, optional) – Should the period be taken from the meta table? If True, this function is only a wrapper for .get_period_meta. If False, the period is computed from the timeseries. The default is False.

Raises:
  • NotImplementedError – If the given kind is not valid.

  • ValueError – If the given kind is not a string.

Returns:

A TimestampPeriod of the filled timeseries. (NaT, NaT) if the timeseries is completely empty or not defined.

Return type:

util.TimestampPeriod

get_geom(format='EWKT', crs=None)

Get the point geometry of the station.

Parameters:
  • format (str or None, optional) – The format of the geometry to return. Needs to be a format that is understood by PostgreSQL, i.e. a corresponding ST_AsXXXXX function needs to exist in PostgreSQL. If None, then the binary representation is returned. The default is “EWKT”.

  • crs (str, int or None, optional) – If None, then the geometry is returned in WGS84 (EPSG:4326). If string, then it should be one of “WGS84” or “UTM”. If int, then it should be the EPSG code.

Returns:

string or bytes representation of the geometry, depending on the selected format.

Return type:

str or bytes

get_geom_shp(crs=None)

Get the geometry of the station as a shapely Point object.

Parameters:

crs (str, int or None, optional) – If None, then the geometry is returned in WGS84 (EPSG:4326). If string, then it should be one of “WGS84” or “UTM”. If int, then it should be the EPSG code.

Returns:

The location of the station as shapely Point.

Return type:

shapely.geometry.Point

get_last_imp_period(all=False)

Get the last imported Period for this Station.

Parameters:

all (bool, optional) – Should the maximum Timespan for all the last imports be returned. If False only the period for this station is returned. The default is False.

Returns:

(minimal datetime, maximal datetime)

Return type:

TimestampPeriod or tuple of datetime.datetime

get_ma()
get_max_period(kinds, nas_allowed=False, **kwargs)

Get the maximum available period for this station’s timeseries.

If nas_allowed is True, then the maximum range of the timeseries is returned. Otherwise the minimal filled period is returned.

Parameters:
  • kinds (str or list of str) – The data kinds to consider. Each must be a column in the timeseries DB, i.e. one of “raw”, “qc”, “filled”, “adj”. For precipitation, “qn” and “corr” are also valid.

  • nas_allowed (bool, optional) – Should NAs be allowed? If True, then the maximum possible period is returned, even if there are NAs in the timeseries. If False, then the minimal filled period is returned. The default is False.

Returns:

The maximum Timestamp Period

Return type:

utils.TimestampPeriod

get_meta(infos='all')

Get Information from the meta table.

Parameters:

infos (list of str or str, optional) – A list of the information to get from the database. If “all”, then all the information is returned. The default is “all”.

Returns:

dict with the meta information. The first level has one entry per parameter. The second level has one entry per information, asked for. If only one information is asked for, then it is returned as single value and not as subdict.

Return type:

dict or int/string

classmethod get_meta_explanation(infos='all')

Get the explanations of the available meta fields.

Parameters:

infos (list or string, optional) – The infos you wish to get an explanation for. If “all”, then all the available information is returned. The default is “all”.

Returns:

A pandas Series with the information names as index and the explanations as values.

Return type:

pd.Series

get_multi_annual()

Get the multi annual value(s) for this station.

Returns:

The corresponding multi annual value. For T and ET the yearly value is returned. For N the winter and summer half-yearly sums are returned as a tuple. The returned unit is mm or °C.

Return type:

list or number

get_name()
get_neighboor_stids(n=5, only_real=True, p_elev=None, period=None, **kwargs)

Get a list with the station IDs of the nearest neighbor stations.

Parameters:
  • n (int, optional) – The number of stations to return. If None, then all the possible stations are returned. The default is 5.

  • only_real (bool, optional) – Should only real stations be considered? If False, virtual stations are also part of the result. The default is True.

  • p_elev (tuple of float or None, optional) – The parameters (P_1, P_2) to weight the height differences between stations. The elevation difference is considered with the formula from LARSIM (equation 3-18 & 3-19 from the LARSIM manual): $L_{gewichtet} = L_{horizontal} * (1 + (\frac{|\delta H|}{P_1})^{P_2})$ If None, then the height difference is not considered and only the nearest stations are returned. The default is None.

  • period (utils.TimestampPeriod or None, optional) – The period for which the nearest neighbors are returned. The neighbor station needs to have raw data for at least one half of the period. If None, then the availability of the data is not checked. The default is None.

Returns:

A list of station IDs in order of distance. The closest station is the first in the list.

Return type:

list of int

get_period_meta(kind, all=False)

Get a specific period from the meta information table.

This functions returns the information from the meta table. In this table there are several periods saved, like the period of the last import.

Parameters:
  • kind (str) – The kind of period to return. Should be one of [‘filled’, ‘raw’, ‘last_imp’]. filled: the maximum filled period of the filled timeserie. raw: the maximum filled timeperiod of the raw data. last_imp: the maximum filled timeperiod of the last import.

  • all (bool, optional) – Should the maximum Timespan for all the filled periods be returned. If False only the period for this station is returned. The default is False.

Returns:

The TimestampPeriod of the station or of all the stations if all=True.

Return type:

TimestampPeriod

Raises:

ValueError – If a wrong kind is handed in.

get_qc(**kwargs)

Get the quality checked timeserie.

Parameters:

kwargs (dict, optional) – The keyword arguments get passed to the get_df function. Possible parameters are “period”, “agg_to” or “nas_allowed”

Returns:

The quality checked timeserie for this station and the given period.

Return type:

pd.DataFrame

get_raster_value(raster)
get_raw(**kwargs)

Get the raw timeserie.

Parameters:

kwargs (dict, optional) – The keyword arguments get passed to the get_df function. Possible parameters are “period”, “agg_to” or “nas_allowed”

Returns:

The raw timeserie for this station and the given period.

Return type:

pd.DataFrame

get_zipfiles(only_new=True, ftp_file_list=None)

Get the zipfiles on the CDC server with the raw data.

Parameters:
  • only_new (bool, optional) – Get only the files that are not yet in the database? If False all the available files are loaded again. The default is True

  • ftp_file_list (list of (strings, datetime), optional) – A list of files on the FTP server together with their modification time. If None, then the list is fetched from the server. The default is None

Returns:

A DataFrame of zipfiles and the corresponding modification time on the CDC server to import.

Return type:

pandas.DataFrame or None
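
The only_new selection can be pictured with a small stand-alone sketch. It assumes that "not yet in the database" is decided by comparing modification times, which the description above does not state explicitly. The helper select_zipfiles and the imported mapping are hypothetical names for this example, and a plain list is used instead of a DataFrame.

```python
from datetime import datetime


def select_zipfiles(ftp_file_list, imported, only_new=True):
    """Pick the files that still need to be imported.

    ftp_file_list: list of (filepath, modification time) tuples
    imported: dict mapping filepath -> modification time at the last import
              (hypothetical bookkeeping, assumed for this sketch)
    """
    if not only_new:
        # load everything again
        return list(ftp_file_list)
    return [
        (path, mtime)
        for path, mtime in ftp_file_list
        # keep files never imported, or changed since the last import
        if path not in imported or imported[path] < mtime
    ]
```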

is_last_imp_done(kind)

Is the last import for the given kind already worked in?

Parameters:

kind (str) – The data kind to look for filled period. Must be a column in the timeseries DB. Must be one of “raw”, “qc”, “filled”, “adj”, “best”. If “best” is given, then depending on the parameter of the station the best kind is selected. For Precipitation this is “corr” and for the other this is “filled”. For the precipitation also “qn” and “corr” are valid.

Returns:

True if the last import of the given kind is already treated.

Return type:

bool

is_real()

Check if the station is a real station or only a virtual one.

Real means that the DWD is measuring here. Virtual means, that there are no measurements here, but the station got created to have timeseries for every parameter for every precipitation station.

Returns:

true if the station is real, false if it is virtual.

Return type:

bool

is_virtual()

Check if the station is a real station or only a virtual one.

Real means that the DWD is measuring here. Virtual means, that there are no measurements here, but the station got created to have timeseries for every parameter for every precipitation station.

Returns:

true if the station is virtual, false if it is real.

Return type:

bool

isin_db()

Check if Station is already in a timeseries table.

Returns:

True if Station has a table in DB, no matter if it is filled or not.

Return type:

bool

isin_ma()

Check if Station is already in the multi annual table.

Returns:

True if Station is in multi annual table.

Return type:

bool

isin_meta()

Check if Station is already in the meta table.

Returns:

True if Station is in meta table.

Return type:

bool

last_imp_fillup(_last_imp_period=None)

Do the gap filling of the last import.

last_imp_qc()
last_imp_quality_check()

Do the quality check of the last import.

plot(period=(None, None), kind='filled', agg_to=None, **kwargs)

Plot the data of this station.

Parameters:
  • period (TimestampPeriod or (tuple or list of datetime.datetime or None), optional) – The minimum and maximum Timestamp for which to get the timeseries. If None is given, the maximum or minimal possible Timestamp is taken. The default is (None, None).

  • kind (str, optional) – The data kind to plot. Must be a column in the timeseries DB. Must be one of “raw”, “qc”, “filled”, “adj”. For the precipitation also “qn” and “corr” are valid. The default is “filled”.

  • agg_to (str or None, optional) – Aggregate to a given timespan. Can be anything smaller than the maximum timespan of the saved data. If a Timeperiod smaller than the saved data is given, then the maximum possible timeperiod is returned. For T and ET it can be “month”, “year”. For N it can also be “hour”. If None, then the maximum timeperiod is taken. The default is None.

quality_check(period=(None, None), **kwargs)

Quality check the raw data for a given period.

Parameters:

period (util.TimestampPeriod or (tuple or list of datetime.datetime or None), optional) – The minimum and maximum Timestamp for which to get the timeseries. If None is given, the maximum or minimal possible Timestamp is taken. The default is (None, None).

update_ma(skip_if_exist=True, drop_when_error=True)

Update the multi annual values in the stations_raster_values table.

Get new values from the raster and put in the table.

update_period_meta(kind)

Update the time period in the meta file.

Compute the filled period of a timeserie and save it in the meta table.

Parameters:

kind (str) – The data kind to look for filled period. Must be a column in the timeseries DB. Must be one of “raw”, “qc”, “filled”. If “best” is given, then depending on the parameter of the station the best kind is selected. For Precipitation this is “corr” and for the other this is “filled”. For the precipitation also “corr” is valid.

update_raw(only_new=True, ftp_file_list=None, remove_nas=True)

Download data from CDC and upload to database.

Parameters:
  • only_new (bool, optional) – Get only the files that are not yet in the database? If False all the available files are loaded again. The default is True

  • ftp_file_list (list of (strings, datetime), optional) – A list of files on the FTP server together with their modification time. If None, then the list is fetched from the server. The default is None

  • remove_nas (bool, optional) – Remove the NAs from the downloaded data before updating it to the database. This has computational advantages. The default is True.

Returns:

The raw Dataframe of the Stations data.

Return type:

pandas.DataFrame

class weatherDB.station.StationCanVirtualBase(id, _skip_meta_check=False)[source]

Bases: StationBase

A class to add the methods for stations that can also be virtual. Virtual means that there is no real DWD station with measurements. But to have data for every parameter at every 10 min precipitation station location, it is necessary to add stations and fill the gaps with data from neighbors.

Create a Station object.

Parameters:
  • id (int) – The stations ID.

  • _skip_meta_check (bool, optional) – Should the check whether the station is in the database meta file be skipped? Be careful when skipping this check, because it can lead to problems. Skipping it makes the initialization faster. It is used by the stations classes, because they only initialize objects that are in the meta table. The default is False.

Raises:

NotImplementedError – _description_

Public Methods:

isin_meta_n()

Check if Station is in the precipitation meta table.

quality_check([period])

Quality check the raw data for a given period.

Inherited from StationBase

__init__(id[, _skip_meta_check])

Create a Station object.

isin_db()

Check if Station is already in a timeseries table.

isin_meta()

Check if Station is already in the meta table.

isin_ma()

Check if Station is already in the multi annual table.

is_virtual()

Check if the station is a real station or only a virtual one.

is_real()

Check if the station is a real station or only a virtual one.

is_last_imp_done(kind)

Is the last import for the given kind already worked in?

update_period_meta(kind)

Update the time period in the meta file.

update_ma([skip_if_exist, drop_when_error])

Update the multi annual values in the stations_raster_values table.

update_raw([only_new, ftp_file_list, remove_nas])

Download data from CDC and upload to database.

get_zipfiles([only_new, ftp_file_list])

Get the zipfiles on the CDC server with the raw data.

download_raw([only_new])

Download the timeserie from the CDC Server.

quality_check([period])

Quality check the raw data for a given period.

fillup([period])

Fill up missing data with measurements from nearby stations.

last_imp_quality_check()

Do the quality check of the last import.

last_imp_qc()

last_imp_fillup([_last_imp_period])

Do the gap filling of the last import.

get_meta_explanation([infos])

Get the explanations of the available meta fields.

get_meta([infos])

Get Information from the meta table.

get_geom([format, crs])

Get the point geometry of the station.

get_geom_shp([crs])

Get the geometry of the station as a shapely Point object.

get_name()

count_holes([weeks, kind, period, ...])

Count holes in timeseries depending on their length.

get_period_meta(kind[, all])

Get a specific period from the meta information table.

get_filled_period(kind[, from_meta])

Get the min and max Timestamp for which there is data in the corresponding timeserie.

get_max_period(kinds[, nas_allowed])

Get the maximum available period for this stations timeseries.

get_last_imp_period([all])

Get the last imported Period for this Station.

get_neighboor_stids([n, only_real, p_elev, ...])

Get a list with Station Ids of the nearest neighbor stations.

get_multi_annual()

Get the multi annual value(s) for this station.

get_ma()

get_raster_value(raster)

get_coef(other_stid[, in_db_unit])

Get the regionalisation coefficients due to the height.

get_df(kinds[, period, agg_to, nas_allowed, ...])

Get a timeseries DataFrame from the database.

get_raw(**kwargs)

Get the raw timeserie.

get_qc(**kwargs)

Get the quality checked timeserie.

get_dist([period])

Get the timeserie with the information from which station the data got filled and the corresponding distance to this station.

get_filled([period, with_dist])

Get the filled timeserie.

get_adj(**kwargs)

Get the adjusted timeserie.

plot([period, kind, agg_to])

Plot the data of this station.


count_holes(weeks=[2, 4, 8, 12, 16, 20, 24], kind='qc', period=(None, None), between_meta_period=True, crop_period=False, **kwargs)

Count holes in timeseries depending on their length.

Parameters:
  • weeks (list, optional) – A list of hole length to count. Every hole longer than the duration of weeks specified is counted. The default is [2, 4, 8, 12, 16, 20, 24]

  • kind (str) – The kind of the timeserie to analyze. Should be one of [‘raw’, ‘qc’, ‘filled’]. For N also “corr” is possible. Normally only “raw” and “qc” make sense, because the other timeseries should not have holes.

  • period (TimestampPeriod or (tuple or list of datetime.datetime or None), optional) – The minimum and maximum Timestamp for which to analyze the timeseries. If None is given, the maximum and minimal possible Timestamp is taken. The default is (None, None).

  • between_meta_period (bool, optional) – Only check between the respective period that is defined in the meta table. If “qc” is chosen as kind, then the “raw” meta period is taken. The default is True.

  • crop_period (bool, optional) – Should the period get cropped to the maximum filled period? This will result in holes being ignored when they are at the end or at the beginning of the timeserie. If period = (None, None) is given, then this parameter is set to True. The default is False.

Returns:

A Pandas Dataframe, with station_id as index and one column per week. The numbers in the table are the amount of NA-periods longer than the respective amount of weeks.

Return type:

pandas.DataFrame

Raises:

ValueError – If the input parameters were not correct.
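
The counting rule (every hole longer than the given number of weeks is counted) can be illustrated without a database. The following is a minimal sketch under stated assumptions: the function name count_holes_sketch is made up, missing values are represented by None, a hole is measured from its first missing timestamp to the next valid one, and holes at the very beginning or end are ignored, similar to crop_period=True. The library itself returns a DataFrame; a dict is used here for brevity.

```python
from datetime import datetime, timedelta


def count_holes_sketch(timestamps, values, weeks=(2, 4, 8, 12, 16, 20, 24)):
    """Count runs of missing values that last longer than each threshold."""
    hole_lengths = []
    hole_start = None
    seen_valid = False
    for ts, val in zip(timestamps, values):
        if val is None:
            # open a hole only after the first valid value (leading gap ignored)
            if seen_valid and hole_start is None:
                hole_start = ts
        else:
            if hole_start is not None:
                hole_lengths.append(ts - hole_start)
                hole_start = None
            seen_valid = True
    # a hole that is still open at the end is ignored (trailing gap)
    return {
        w: sum(1 for length in hole_lengths if length > timedelta(weeks=w))
        for w in weeks
    }
```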

download_raw(only_new=False)

Download the timeserie from the CDC Server.

This function only returns the timeserie, but is not updating the database.

Parameters:

only_new (bool, optional) – Get only the files that are not yet in the database? If False all the available files are loaded again. The default is False.

Returns:

The Timeseries as a DataFrame with a Timestamp Index.

Return type:

pandas.DataFrame

fillup(period=(None, None), **kwargs)

Fill up missing data with measurements from nearby stations.

Parameters:
  • period (util.TimestampPeriod or (tuple or list of datetime.datetime or None), optional) – The minimum and maximum Timestamp for which to gap fill the timeseries. If None is given, the maximum or minimal possible Timestamp is taken. The default is (None, None).

  • kwargs (dict, optional) – Additional arguments for the fillup function. e.g. p_elev to consider the elevation to select nearest stations. (only for T and ET)
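
As a rough picture of gap filling from nearby stations, the sketch below takes each missing value from the closest neighbor that has a measurement at the same time step. It is deliberately simplified and not the library's algorithm: the name fillup_sketch and the list-based data layout are assumptions, and any regionalisation of the neighbor values (see get_coef) is left out.

```python
def fillup_sketch(own, neighbors):
    """Fill None values in `own` from the nearest neighbor with data.

    own: list of values, None marks a gap
    neighbors: list of (station_id, values) tuples, nearest station first
    Returns the filled values and, per time step, the station id that
    provided the value (None where the own measurement was kept).
    """
    filled = list(own)
    filled_by = [None] * len(own)
    for i, val in enumerate(own):
        if val is not None:
            continue
        for station_id, series in neighbors:
            if series[i] is not None:
                filled[i] = series[i]
                filled_by[i] = station_id
                break
    return filled, filled_by
```

The second return value corresponds loosely to the “filled_by” kind that get_df can return.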

get_adj(**kwargs)

Get the adjusted timeserie.

The timeserie is adjusted to the multi annual mean. So the overall mean of the given period will be the same as the multi annual mean.

Parameters:

kwargs (dict, optional) – The keyword arguments are passed to the get_df function. Possible parameters are “period”, “agg_to” or “nas_allowed”.

Returns:

A timeserie with the adjusted data.

Return type:

pandas.DataFrame
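
The idea that the adjusted series has the same overall mean as the multi annual value can be shown with a small sketch. It assumes a multiplicative scaling, which is only one possible way to reach that property (an additive offset would be another) and is not confirmed by the description above. The function name adjust_to_multi_annual is hypothetical.

```python
def adjust_to_multi_annual(values, multi_annual_mean):
    """Scale a series so that its mean equals the multi annual mean.

    values: list of numbers, None marks missing values (kept as None)
    multi_annual_mean: target mean of the adjusted series
    """
    valid = [v for v in values if v is not None]
    if not valid:
        return list(values)
    current_mean = sum(valid) / len(valid)
    if current_mean == 0:
        # scaling is undefined for a zero mean; return the data unchanged
        return list(values)
    factor = multi_annual_mean / current_mean
    return [None if v is None else v * factor for v in values]
```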

get_coef(other_stid, in_db_unit=False)

Get the regionalisation coefficients due to the height.

Those are the values from the dwd grid, HYRAS or REGNIE grids.

Parameters:
  • other_stid (int) – The Station Id of the other station from which to regionalise for the own station.

  • in_db_unit (bool, optional) – Should the coefficients be returned in the unit as stored in the database? This is only relevant for the temperature. The default is False.

Returns:

A list of coefficients. For T, ET and N-daily only the yearly coefficient is returned. For N the winter and summer half yearly coefficients are returned as a tuple. None is returned if either the own or the other station's multi-annual value is not available.

Return type:

list of floats or None

get_df(kinds, period=(None, None), agg_to=None, nas_allowed=True, add_na_share=False, db_unit=False, sql_add_where=None, **kwargs)

Get a timeseries DataFrame from the database.

Parameters:
  • kinds (str or list of str) – The data kinds to update. Must be a column in the timeseries DB. Must be one of “raw”, “qc”, “filled”, “adj”, “filled_by”, “filled_share”. For the precipitation also “qn” and “corr” are valid. If “filled_by” is given together with an aggregation step, the “filled_by” is replaced by the “filled_share”. The “filled_share” gives the share of filled values in the aggregation group in percent.

  • period (TimestampPeriod or (tuple or list of datetime.datetime or None), optional) – The minimum and maximum Timestamp for which to get the timeseries. If None is given, the maximum or minimal possible Timestamp is taken. The default is (None, None).

  • agg_to (str or None, optional) – Aggregate to a given timespan. If more than 20% of the values in an aggregation group are missing, the aggregated value will be None. Can be anything smaller than the maximum timespan of the saved data. If a Timeperiod smaller than the saved data is given, then the maximum possible timeperiod is returned. For T and ET it can be “month”, “year”. For N it can also be “hour”. If None, then the maximum timeperiod is taken. The default is None.

  • nas_allowed (bool, optional) – Should NAs be allowed? If True, then the maximum possible period is returned, even if there are NAs in the timeserie. If False, then the minimal filled period is returned. The default is True.

  • add_na_share (bool, optional) – Should one or several columns be added to the Dataframe with the share of NAs in the data. This is especially important, when the stations data get aggregated, because the aggregation doesn’t make sense if there are a lot of NAs in the original data. If True, one column per asked kind is added with the respective share of NAs, if the aggregation step is not the smallest. The “kind”_na_share column is in percentage. The default is False.

  • db_unit (bool, optional) – Should the result be in the database unit? If False, the unit is converted to the normal unit, like mm or °C. The numbers are saved as integers in the database and were therefore multiplied by 10 or 100 to get an integer. The default is False.

  • sql_add_where (str or None, optional) – Additional SQL WHERE statement to filter the output, e.g. “EXTRACT(MONTH FROM timestamp) = 2”. The default is None.

Returns:

The timeserie Dataframe with a DatetimeIndex.

Return type:

pandas.DataFrame
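
The 20% rule of agg_to and the percentage reported by add_na_share can be sketched for a single aggregation group. This is an illustration only: the helper aggregate_group is hypothetical, missing values are None, and the group is summed (as one would for precipitation; a mean would be the natural choice for temperature).

```python
def aggregate_group(values, max_na_percent=20):
    """Aggregate one group of values and report the share of NAs.

    Returns (aggregated value, NA share in percent). The aggregated value
    is None if more than `max_na_percent` of the values are missing.
    """
    if not values:
        return None, None
    n_na = sum(1 for v in values if v is None)
    na_percent = 100 * n_na / len(values)
    if na_percent > max_na_percent:
        return None, na_percent
    return sum(v for v in values if v is not None), na_percent
```

A group with exactly 20% missing values is still aggregated, because only groups with more than 20% missing values are dropped.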

get_dist(period=(None, None))

Get the timeserie with the information from which station the data got filled and the corresponding distance to this station.

Parameters:

period (TimestampPeriod or (tuple or list of datetime.datetime or None), optional) – The minimum and maximum Timestamp for which to get the timeserie. If None is given, the maximum or minimal possible Timestamp is taken. The default is (None, None).

Returns:

The timeserie for this station and the given period with the station_id and the distance in meters from which the data got filled from.

Return type:

pd.DataFrame

get_filled(period=(None, None), with_dist=False, **kwargs)

Get the filled timeserie.

Either only the timeserie is returned or also the id of the station from which the station data got filled, together with the distance to this station in m.

Parameters:
  • period (TimestampPeriod or (tuple or list of datetime.datetime or None), optional) – The minimum and maximum Timestamp for which to get the timeserie. If None is given, the maximum or minimal possible Timestamp is taken. The default is (None, None).

  • with_dist (bool, optional) – Should the distance to the stations from which the timeseries got filled be added. The default is False.

Returns:

The filled timeserie for this station and the given period.

Return type:

pd.DataFrame

get_filled_period(kind, from_meta=False)

Get the min and max Timestamp for which there is data in the corresponding timeserie.

Computes the period from the timeserie or meta table.

Parameters:
  • kind (str) – The data kind to look for filled period. Must be a column in the timeseries DB. Must be one of “raw”, “qc”, “filled”, “adj”. If “best” is given, then depending on the parameter of the station the best kind is selected. For Precipitation this is “corr” and for the other this is “filled”. For the precipitation also “qn” and “corr” are valid.

  • from_meta (bool, optional) – Should the period be taken from the meta table? If True, this function is only a wrapper for get_period_meta. If False, the period is computed from the timeserie. The default is False.

Raises:
  • NotImplementedError – If the given kind is not valid.

  • ValueError – If the given kind is not a string.

Returns:

A TimestampPeriod of the filled timeserie. (NaT, NaT) if the timeserie is all empty or not defined.

Return type:

util.TimestampPeriod
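
Computing the filled period from the timeserie itself (from_meta=False) amounts to finding the first and last timestamp that carry a value. A minimal stand-alone sketch, in which the name filled_period is an assumption and (None, None) stands in for the (NaT, NaT) result of an empty timeserie:

```python
from datetime import datetime


def filled_period(timestamps, values):
    """Return (min, max) timestamp with data, or (None, None) if empty."""
    filled = [ts for ts, val in zip(timestamps, values) if val is not None]
    if not filled:
        return (None, None)
    return (min(filled), max(filled))
```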

get_geom(format='EWKT', crs=None)

Get the point geometry of the station.

Parameters:
  • format (str or None, optional) – The format of the geometry to return. Needs to be a format that is understood by PostgreSQL, meaning that the corresponding ST_AsXXXXX function needs to exist. If None, then the binary representation is returned. The default is “EWKT”.

  • crs (str, int or None, optional) – If None, then the geometry is returned in WGS84 (EPSG:4326). If string, then it should be one of “WGS84” or “UTM”. If int, then it should be the EPSG code.

Returns:

string or bytes representation of the geometry, depending on the selected format.

Return type:

str or bytes

get_geom_shp(crs=None)

Get the geometry of the station as a shapely Point object.

Parameters:

crs (str, int or None, optional) – If None, then the geometry is returned in WGS84 (EPSG:4326). If string, then it should be one of “WGS84” or “UTM”. If int, then it should be the EPSG code.

Returns:

The location of the station as shapely Point.

Return type:

shapely.geometry.Point

get_last_imp_period(all=False)

Get the last imported Period for this Station.

Parameters:

all (bool, optional) – Should the maximum Timespan for all the last imports be returned. If False only the period for this station is returned. The default is False.

Returns:

(minimal datetime, maximal datetime)

Return type:

TimestampPeriod or tuple of datetime.datetime

get_ma()
get_max_period(kinds, nas_allowed=False, **kwargs)

Get the maximum available period for this stations timeseries.

If nas_allowed is True, then the maximum range of the timeserie is returned. Else the minimal filled period is returned

Parameters:
  • kinds (str or list of str) – The data kinds to update. Must be a column in the timeseries DB. Must be one of “raw”, “qc”, “filled”, “adj”. For the precipitation also “qn” and “corr” are valid.

  • nas_allowed (bool, optional) – Should NAs be allowed? If True, then the maximum possible period is returned, even if there are NAs in the timeserie. If False, then the minimal filled period is returned. The default is False.

Returns:

The maximum Timestamp Period

Return type:

utils.TimestampPeriod
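
One way to read the nas_allowed switch when several kinds are requested is as the difference between the overall range and the common range of their periods. The sketch below follows that reading, which is an interpretation of the description above rather than confirmed behaviour. The name max_period and the (start, end) tuples are assumptions.

```python
from datetime import datetime


def max_period(periods, nas_allowed=False):
    """Combine the (start, end) periods of several kinds.

    nas_allowed=True: widest range covered by any kind (NAs possible).
    nas_allowed=False: range covered by all kinds, or (None, None) if
    the periods do not overlap.
    """
    starts = [start for start, _ in periods]
    ends = [end for _, end in periods]
    if nas_allowed:
        return (min(starts), max(ends))
    start, end = max(starts), min(ends)
    if start > end:
        return (None, None)
    return (start, end)
```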

get_meta(infos='all')

Get Information from the meta table.

Parameters:

infos (list of str or str, optional) – A list of the information to get from the database. If “all” then all the information are returned. The default is “all”.

Returns:

dict with the meta information. The first level has one entry per parameter. The second level has one entry per information, asked for. If only one information is asked for, then it is returned as single value and not as subdict.

Return type:

dict or int/string

classmethod get_meta_explanation(infos='all')

Get the explanations of the available meta fields.

Parameters:

infos (list or string, optional) – The infos you wish to get an explanation for. If “all” then all the available information get returned. The default is “all”

Returns:

a pandas Series with the information names as index and the explanation as values.

Return type:

pd.Series

get_multi_annual()

Get the multi annual value(s) for this station.

Returns:

The corresponding multi annual value. For T and ET the yearly value is returned. For N the winter and summer half yearly sums are returned as a tuple. The returned unit is mm or °C.

Return type:

list or number

get_name()
get_neighboor_stids(n=5, only_real=True, p_elev=None, period=None, **kwargs)

Get a list with Station Ids of the nearest neighbor stations.

Parameters:
  • n (int, optional) – The number of stations to return. If None, then all the possible stations are returned. The default is 5.

  • only_real (bool, optional) – Should only real stations be considered? If False, virtual stations are also part of the result. The default is True.

  • p_elev (tuple of float or None, optional) – The parameters (P_1, P_2) to weight the height differences between stations. The elevation difference is considered with the formula from LARSIM (equations 3-18 and 3-19 from the LARSIM manual): $L_{gewichtet} = L_{horizontal} \cdot \left(1 + \left(\frac{|\delta H|}{P_1}\right)^{P_2}\right)$ If None, then the height difference is not considered and only the nearest stations are returned. The default is None.

  • period (utils.TimestampPeriod or None, optional) – The period for which the nearest neighbors are returned. A neighbor station needs to have raw data for at least one half of the period. If None, then the availability of the data is not checked. The default is None.

Returns:

A list of station Ids in order of distance. The closest station is the first in the list.

Return type:

list of int

get_period_meta(kind, all=False)

Get a specific period from the meta information table.

This function returns the information from the meta table. Several periods are saved in this table, like the period of the last import.

Parameters:
  • kind (str) – The kind of period to return. Should be one of [‘filled’, ‘raw’, ‘last_imp’]. filled: the maximum filled period of the filled timeserie. raw: the maximum filled timeperiod of the raw data. last_imp: the maximum filled timeperiod of the last import.

  • all (bool, optional) – Should the maximum Timespan for all the filled periods be returned. If False only the period for this station is returned. The default is False.

Returns:

The TimestampPeriod of the station or of all the stations if all=True.

Return type:

TimestampPeriod

Raises:

ValueError – If a wrong kind is handed in.

get_qc(**kwargs)

Get the quality checked timeserie.

Parameters:

kwargs (dict, optional) – The keyword arguments get passed to the get_df function. Possible parameters are “period”, “agg_to” or “nas_allowed”

Returns:

The quality checked timeserie for this station and the given period.

Return type:

pd.DataFrame

get_raster_value(raster)
get_raw(**kwargs)

Get the raw timeserie.

Parameters:

kwargs (dict, optional) – The keyword arguments get passed to the get_df function. Possible parameters are “period”, “agg_to” or “nas_allowed”

Returns:

The raw timeserie for this station and the given period.

Return type:

pd.DataFrame

get_zipfiles(only_new=True, ftp_file_list=None)

Get the zipfiles on the CDC server with the raw data.

Parameters:
  • only_new (bool, optional) – Get only the files that are not yet in the database? If False all the available files are loaded again. The default is True

  • ftp_file_list (list of (strings, datetime), optional) – A list of files on the FTP server together with their modification time. If None, then the list is fetched from the server. The default is None

Returns:

A DataFrame of zipfiles and the corresponding modification time on the CDC server to import.

Return type:

pandas.DataFrame or None

is_last_imp_done(kind)

Is the last import for the given kind already worked in?

Parameters:

kind (str) – The data kind to look for filled period. Must be a column in the timeseries DB. Must be one of “raw”, “qc”, “filled”, “adj”, “best”. If “best” is given, then depending on the parameter of the station the best kind is selected. For Precipitation this is “corr” and for the other this is “filled”. For the precipitation also “qn” and “corr” are valid.

Returns:

True if the last import of the given kind is already treated.

Return type:

bool

is_real()

Check if the station is a real station or only a virtual one.

Real means that the DWD is measuring here. Virtual means, that there are no measurements here, but the station got created to have timeseries for every parameter for every precipitation station.

Returns:

true if the station is real, false if it is virtual.

Return type:

bool

is_virtual()

Check if the station is a real station or only a virtual one.

Real means that the DWD is measuring here. Virtual means, that there are no measurements here, but the station got created to have timeseries for every parameter for every precipitation station.

Returns:

true if the station is virtual, false if it is real.

Return type:

bool

isin_db()

Check if Station is already in a timeseries table.

Returns:

True if Station has a table in DB, no matter if it is filled or not.

Return type:

bool

isin_ma()

Check if Station is already in the multi annual table.

Returns:

True if Station is in multi annual table.

Return type:

bool

isin_meta()

Check if Station is already in the meta table.

Returns:

True if Station is in meta table.

Return type:

bool

isin_meta_n()[source]

Check if Station is in the precipitation meta table.

Returns:

True if Station is in the precipitation meta table.

Return type:

bool

last_imp_fillup(_last_imp_period=None)

Do the gap filling of the last import.

last_imp_qc()
last_imp_quality_check()

Do the quality check of the last import.

plot(period=(None, None), kind='filled', agg_to=None, **kwargs)

Plot the data of this station.

Parameters:
  • period (TimestampPeriod or (tuple or list of datetime.datetime or None), optional) – The minimum and maximum Timestamp for which to get the timeseries. If None is given, the maximum or minimal possible Timestamp is taken. The default is (None, None).

  • kind (str, optional) – The data kind to plot. Must be a column in the timeseries DB. Must be one of “raw”, “qc”, “filled”, “adj”. For the precipitation also “qn” and “corr” are valid. The default is “filled”.

  • agg_to (str or None, optional) – Aggregate to a given timespan. Can be anything smaller than the maximum timespan of the saved data. If a Timeperiod smaller than the saved data is given, then the maximum possible timeperiod is returned. For T and ET it can be “month”, “year”. For N it can also be “hour”. If None, then the maximum timeperiod is taken. The default is None.

quality_check(period=(None, None), **kwargs)[source]

Quality check the raw data for a given period.

Parameters:

period (util.TimestampPeriod or (tuple or list of datetime.datetime or None), optional) – The minimum and maximum Timestamp for which to get the timeseries. If None is given, the maximum or minimal possible Timestamp is taken. The default is (None, None).

update_ma(skip_if_exist=True, drop_when_error=True)

Update the multi annual values in the stations_raster_values table.

Get new values from the raster and put in the table.

update_period_meta(kind)

Update the time period in the meta file.

Compute the filled period of a timeserie and save it in the meta table.

Parameters:

kind (str) – The data kind to look for filled period. Must be a column in the timeseries DB. Must be one of “raw”, “qc”, “filled”. If “best” is given, then depending on the parameter of the station the best kind is selected. For Precipitation this is “corr” and for the other this is “filled”. For the precipitation also “corr” is valid.

update_raw(only_new=True, ftp_file_list=None, remove_nas=True)

Download data from CDC and upload to database.

Parameters:
  • only_new (bool, optional) – Get only the files that are not yet in the database? If False all the available files are loaded again. The default is True

  • ftp_file_list (list of (strings, datetime), optional) – A list of files on the FTP server together with their modification time. If None, then the list is fetched from the server. The default is None

  • remove_nas (bool, optional) – Remove the NAs from the downloaded data before updating it to the database. This has computational advantages. The default is True.

Returns:

The raw Dataframe of the Stations data.

Return type:

pandas.DataFrame

class weatherDB.station.StationTETBase(id, _skip_meta_check=False)[source]

Bases: StationCanVirtualBase

A base class for T and ET.

This class adds methods that are only used by temperature and evapotranspiration stations.

Create a Station object.

Parameters:
  • id (int) – The stations ID.

  • _skip_meta_check (bool, optional) – Should the check whether the station is in the database meta file be skipped? Be careful when skipping this check, because it can lead to problems. Skipping it makes the initialization faster. It is used by the stations classes, because they only initialize objects that are in the meta table. The default is False.

Raises:

NotImplementedError – _description_

Public Methods:

get_neighboor_stids([p_elev])

Get the 5 nearest stations to this station.

fillup([p_elev])

Set the default P values.

get_adj(**kwargs)

Get the adjusted timeserie.

Inherited from StationCanVirtualBase

isin_meta_n()

Check if Station is in the precipitation meta table.

quality_check([period])

Quality check the raw data for a given period.

Inherited from StationBase

__init__(id[, _skip_meta_check])

Create a Station object.

isin_db()

Check if Station is already in a timeseries table.

isin_meta()

Check if Station is already in the meta table.

isin_ma()

Check if Station is already in the multi annual table.

is_virtual()

Check if the station is a real station or only a virtual one.

is_real()

Check if the station is a real station or only a virtual one.

is_last_imp_done(kind)

Is the last import for the given kind already worked in?

update_period_meta(kind)

Update the time period in the meta file.

update_ma([skip_if_exist, drop_when_error])

Update the multi annual values in the stations_raster_values table.

update_raw([only_new, ftp_file_list, remove_nas])

Download data from CDC and upload to database.

get_zipfiles([only_new, ftp_file_list])

Get the zipfiles on the CDC server with the raw data.

download_raw([only_new])

Download the timeseries from the CDC server.

quality_check([period])

Quality check the raw data for a given period.

fillup([period])

Fill up missing data with measurements from nearby stations.

last_imp_quality_check()

Do the quality check of the last import.

last_imp_qc()

last_imp_fillup([_last_imp_period])

Do the gap filling of the last import.

get_meta_explanation([infos])

Get the explanations of the available meta fields.

get_meta([infos])

Get Information from the meta table.

get_geom([format, crs])

Get the point geometry of the station.

get_geom_shp([crs])

Get the geometry of the station as a shapely Point object.

get_name()

count_holes([weeks, kind, period, ...])

Count holes in timeseries depending on their length.

get_period_meta(kind[, all])

Get a specific period from the meta information table.

get_filled_period(kind[, from_meta])

Get the min and max Timestamp for which there is data in the corresponding timeserie.

get_max_period(kinds[, nas_allowed])

Get the maximum available period for this stations timeseries.

get_last_imp_period([all])

Get the last imported Period for this Station.

get_neighboor_stids([n, only_real, p_elev, ...])

Get a list with Station Ids of the nearest neighboor stations.

get_multi_annual()

Get the multi annual value(s) for this station.

get_ma()

get_raster_value(raster)

get_coef(other_stid[, in_db_unit])

Get the regionalisation coefficients due to the height.

get_df(kinds[, period, agg_to, nas_allowed, ...])

Get a timeseries DataFrame from the database.

get_raw(**kwargs)

Get the raw timeserie.

get_qc(**kwargs)

Get the quality checked timeserie.

get_dist([period])

Get the timeseries with the information about which station the data was filled from and the corresponding distance to this station.

get_filled([period, with_dist])

Get the filled timeserie.

get_adj(**kwargs)

Get the adjusted timeserie.

plot([period, kind, agg_to])

Plot the data of this station.


count_holes(weeks=[2, 4, 8, 12, 16, 20, 24], kind='qc', period=(None, None), between_meta_period=True, crop_period=False, **kwargs)

Count holes in timeseries depending on their length.

Parameters:
  • weeks (list, optional) – A list of hole lengths to count, in weeks. Every hole longer than the respective number of weeks is counted. The default is [2, 4, 8, 12, 16, 20, 24].

  • kind (str) – The kind of the timeseries to analyze. Should be one of [‘raw’, ‘qc’, ‘filled’]. For N also “corr” is possible. Normally only “raw” and “qc” make sense, because the other timeseries should not have holes.

  • period (TimestampPeriod or (tuple or list of datetime.datetime or None), optional) – The minimum and maximum Timestamp for which to analyze the timeseries. If None is given, the maximum and minimal possible Timestamp is taken. The default is (None, None).

  • between_meta_period (bool, optional) – Only check between the respective period that is defined in the meta table. If “qc” is chosen as kind, then the “raw” meta period is taken. The default is True.

  • crop_period (bool, optional) – Should the period be cropped to the maximum filled period? This results in holes at the beginning or at the end of the timeseries being ignored. If period = (None, None) is given, then this parameter is set to True. The default is False.

Returns:

A pandas DataFrame with station_id as index and one column per week threshold. The values are the number of NA periods longer than the respective number of weeks.

Return type:

pandas.DataFrame

Raises:

ValueError – If the input parameters were not correct.
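
The counting rule described above can be illustrated on a plain pandas Series. This is only a minimal sketch of the idea and not the library's implementation: the helper name count_holes_sketch and the sample data are made up for illustration, and a regular time step is assumed.

```python
import numpy as np
import pandas as pd


def count_holes_sketch(series: pd.Series, weeks=(2, 4, 8)) -> pd.Series:
    """Count the NA runs that last longer than each threshold in `weeks`."""
    is_na = series.isna()
    # Label consecutive runs: the id increases whenever the NA state flips.
    run_id = (is_na != is_na.shift()).cumsum()
    # Assumption: the index has a regular time step.
    step = series.index[1] - series.index[0]
    # Duration of every NA run = number of missing steps * step width.
    durations = is_na[is_na].groupby(run_id[is_na]).size() * step
    return pd.Series(
        {w: int((durations > pd.Timedelta(weeks=w)).sum()) for w in weeks},
        name="holes",
    )


index = pd.date_range("2020-01-01", periods=120, freq="D")
values = pd.Series(1.0, index=index)
values.iloc[10:30] = np.nan  # a hole of 20 days
values.iloc[50:95] = np.nan  # a hole of 45 days
holes = count_holes_sketch(values)
```

Here both holes are longer than 2 weeks, only the second one is longer than 4 weeks, and none is longer than 8 weeks.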

download_raw(only_new=False)

Download the timeseries from the CDC server.

This function only returns the timeseries; it does not update the database.

Parameters:

only_new (bool, optional) – Get only the files that are not yet in the database? If False all the available files are loaded again. The default is False.

Returns:

The Timeseries as a DataFrame with a Timestamp Index.

Return type:

pandas.DataFrame

fillup(p_elev=(250, 1.5), **kwargs)[source]

Set the default P values. See _get_sql_near_median for more information.

get_adj(**kwargs)[source]

Get the adjusted timeserie.

The timeseries is adjusted to match the multi-annual value over the given period. The yearly variability is kept, and only the level of the whole period is adjusted.

Returns:

The adjusted timeserie with the timestamp as index.

Return type:

pd.DataFrame
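
One way to picture such an adjustment is a single scaling factor for the whole period. The following sketch assumes a multiplicative correction towards a yearly multi-annual sum, which fits a quantity like ET; the library's actual method may differ (for temperature a shift could be more appropriate). The function name and the numbers are hypothetical.

```python
import pandas as pd


def adjust_to_multi_annual(filled: pd.Series, ma_yearly: float) -> pd.Series:
    """Scale a series so that its mean yearly sum equals `ma_yearly`."""
    yearly_sums = filled.groupby(filled.index.year).sum()
    # Assumption: one constant factor for the whole period.
    factor = ma_yearly / yearly_sums.mean()
    # A constant factor keeps the ratio between the individual years.
    return filled * factor


index = pd.date_range("2019-01-01", "2020-12-31", freq="D")
et = pd.Series(1.0, index=index)
et.loc["2020"] = 2.0  # 2020 is the "wetter" year in this toy example
adjusted = adjust_to_multi_annual(et, ma_yearly=600.0)
```

After the scaling, the mean yearly sum equals the target value, while 2020 is still about twice as high as 2019.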

get_coef(other_stid, in_db_unit=False)

Get the regionalisation coefficients due to the height.

Those are the values from the DWD grid or from the HYRAS or REGNIE grids.

Parameters:
  • other_stid (int) – The station ID of the other station from which to regionalise for this station.

  • in_db_unit (bool, optional) – Should the coefficients be returned in the unit as stored in the database? This is only relevant for the temperature. The default is False.

Returns:

A list of coefficients. For T, ET and N-daily only the yearly coefficient is returned. For N the winter and summer half-yearly coefficients are returned as a tuple. None is returned if the multi-annual value of either this station or the other station is not available.

Return type:

list of floats or None

get_df(kinds, period=(None, None), agg_to=None, nas_allowed=True, add_na_share=False, db_unit=False, sql_add_where=None, **kwargs)

Get a timeseries DataFrame from the database.

Parameters:
  • kinds (str or list of str) – The data kinds to get. Must be a column in the timeseries DB. Must be one of “raw”, “qc”, “filled”, “adj”, “filled_by”, “filled_share”. For the precipitation also “qn” and “corr” are valid. If “filled_by” is given together with an aggregation step, “filled_by” is replaced by “filled_share”. “filled_share” gives the share of filled values in the aggregation group in percent.

  • period (TimestampPeriod or (tuple or list of datetime.datetime or None), optional) – The minimum and maximum Timestamp for which to get the timeseries. If None is given, the maximum or minimal possible Timestamp is taken. The default is (None, None).

  • agg_to (str or None, optional) – Aggregate to a given timespan. If more than 20% of the values in an aggregation group are missing, the aggregated value will be None. Can be anything smaller than the maximum timespan of the saved data. If a time period smaller than the saved data is given, then the maximum possible time period is returned. For T and ET it can be “month”, “year”. For N it can also be “hour”. If None, then the maximum time period is taken. The default is None.

  • nas_allowed (bool, optional) – Should NAs be allowed? If True, then the maximum possible period is returned, even if there are NAs in the timeserie. If False, then the minimal filled period is returned. The default is True.

  • add_na_share (bool, optional) – Should one or several columns be added to the Dataframe with the share of NAs in the data. This is especially important, when the stations data get aggregated, because the aggregation doesn’t make sense if there are a lot of NAs in the original data. If True, one column per asked kind is added with the respective share of NAs, if the aggregation step is not the smallest. The “kind”_na_share column is in percentage. The default is False.

  • db_unit (bool, optional) – Should the result be in the database unit? If False, the values are converted to the normal unit, like mm or °C. The numbers are saved as integers in the database and were therefore multiplied by 10 or 100 to get an integer. The default is False.

  • sql_add_where (str or None, optional) – An additional SQL WHERE condition to filter the output, e.g. “EXTRACT(MONTH FROM timestamp) = 2”. The default is None.

Returns:

The timeserie Dataframe with a DatetimeIndex.

Return type:

pandas.DataFrame
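
The aggregation rule for agg_to and the add_na_share column can be pictured with plain pandas. This is a sketch under assumptions, not the SQL the library runs: the function name is hypothetical, sums are used as the aggregation, and the 20% limit is taken from the parameter description above.

```python
import numpy as np
import pandas as pd


def aggregate_monthly(series: pd.Series, max_na_share: float = 0.2) -> pd.DataFrame:
    """Monthly sums; months with too many missing values become NaN."""
    month = series.index.to_period("M")
    sums = series.groupby(month).sum(min_count=1)
    # Share of missing values per month, between 0 and 1.
    na_share = series.isna().groupby(month).mean()
    sums[na_share > max_na_share] = np.nan
    # The NA share is reported in percent, like the "kind"_na_share column.
    return pd.DataFrame({"filled": sums, "filled_na_share": na_share * 100})


index = pd.date_range("2021-01-01", "2021-02-28", freq="D")
daily = pd.Series(1.0, index=index)
daily.iloc[:10] = np.nan    # 10 of 31 January values are missing
daily.iloc[31:33] = np.nan  # 2 of 28 February values are missing
monthly = aggregate_monthly(daily)
```

January exceeds the limit and is returned as NaN, while February is aggregated from its 26 remaining values.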

get_dist(period=(None, None))

Get the timeseries with the information about which station the data was filled from and the corresponding distance to this station.

Parameters:

period (TimestampPeriod or (tuple or list of datetime.datetime or None), optional) – The minimum and maximum Timestamp for which to get the timeserie. If None is given, the maximum or minimal possible Timestamp is taken. The default is (None, None).

Returns:

The timeseries for this station and the given period, with the station_id of the station from which the data was filled and the distance to it in meters.

Return type:

pd.DataFrame

get_filled(period=(None, None), with_dist=False, **kwargs)

Get the filled timeserie.

Either only the timeseries is returned, or additionally the ID of the station from which the data was filled, together with the distance to that station in m.

Parameters:
  • period (TimestampPeriod or (tuple or list of datetime.datetime or None), optional) – The minimum and maximum Timestamp for which to get the timeserie. If None is given, the maximum or minimal possible Timestamp is taken. The default is (None, None).

  • with_dist (bool, optional) – Should the distance to the stations from which the timeseries got filled be added. The default is False.

Returns:

The filled timeserie for this station and the given period.

Return type:

pd.DataFrame

get_filled_period(kind, from_meta=False)

Get the min and max Timestamp for which there is data in the corresponding timeserie.

Computes the period from the timeserie or meta table.

Parameters:
  • kind (str) – The data kind to look for filled period. Must be a column in the timeseries DB. Must be one of “raw”, “qc”, “filled”, “adj”. If “best” is given, then depending on the parameter of the station the best kind is selected. For Precipitation this is “corr” and for the other this is “filled”. For the precipitation also “qn” and “corr” are valid.

  • from_meta (bool, optional) – Should the period be taken from the meta table? If True, this function is only a wrapper for .get_period_meta. If False, the period is computed from the timeseries. The default is False.

Raises:
  • NotImplementedError – If the given kind is not valid.

  • ValueError – If the given kind is not a string.

Returns:

A TimestampPeriod of the filled timeserie. (NaT, NaT) if the timeserie is all empty or not defined.

Return type:

util.TimestampPeriod
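
Computing the filled period from a timeseries amounts to finding the first and last timestamp that holds a value. A minimal pandas sketch, with a hypothetical helper name and a plain tuple instead of a TimestampPeriod:

```python
import numpy as np
import pandas as pd


def filled_period(series: pd.Series) -> tuple:
    """Return (first, last) timestamp holding a value, or (NaT, NaT)."""
    first = series.first_valid_index()
    last = series.last_valid_index()
    if first is None:
        # The series is empty or consists only of NAs.
        return (pd.NaT, pd.NaT)
    return (first, last)


index = pd.date_range("2022-01-01", periods=10, freq="D")
values = pd.Series(1.0, index=index)
values.iloc[:2] = np.nan  # leading gap
values.iloc[7:] = np.nan  # trailing gap
empty = pd.Series(np.nan, index=index)
```

For values the period runs from 2022-01-03 to 2022-01-07; for empty both ends are NaT.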

get_geom(format='EWKT', crs=None)

Get the point geometry of the station.

Parameters:
  • format (str or None, optional) – The format of the geometry to return. Needs to be a format that is understood by PostgreSQL, i.e. the corresponding ST_AsXXXXX function needs to exist. If None, then the binary representation is returned. The default is “EWKT”.

  • crs (str, int or None, optional) – If None, then the geometry is returned in WGS84 (EPSG:4326). If string, then it should be one of “WGS84” or “UTM”. If int, then it should be the EPSG code.

Returns:

string or bytes representation of the geometry, depending on the selected format.

Return type:

str or bytes
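
With the default format, the result is an EWKT string, which PostGIS writes as an SRID prefix followed by the WKT geometry. The sketch below parses such a point string with the standard library only. The helper name and the coordinates are made up for illustration, and the pattern only covers simple 2D points.

```python
import re

# Matches strings like "SRID=4326;POINT(7.85 47.99)".
EWKT_POINT = re.compile(
    r"^SRID=(?P<srid>\d+);POINT\s*\(\s*"
    r"(?P<x>-?\d+(?:\.\d+)?)\s+(?P<y>-?\d+(?:\.\d+)?)\s*\)$"
)


def parse_ewkt_point(ewkt: str) -> tuple:
    """Split an EWKT point into (srid, x, y)."""
    match = EWKT_POINT.match(ewkt.strip())
    if match is None:
        raise ValueError(f"Not a 2D EWKT point: {ewkt!r}")
    return int(match["srid"]), float(match["x"]), float(match["y"])


# Illustrative coordinates, not taken from the database.
srid, lon, lat = parse_ewkt_point("SRID=4326;POINT(7.85 47.99)")
```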

get_geom_shp(crs=None)

Get the geometry of the station as a shapely Point object.

Parameters:

crs (str, int or None, optional) – If None, then the geometry is returned in WGS84 (EPSG:4326). If string, then it should be one of “WGS84” or “UTM”. If int, then it should be the EPSG code.

Returns:

The location of the station as shapely Point.

Return type:

shapely.geometry.Point

get_last_imp_period(all=False)

Get the last imported Period for this Station.

Parameters:

all (bool, optional) – Should the maximum Timespan for all the last imports be returned. If False only the period for this station is returned. The default is False.

Returns:

(minimal datetime, maximal datetime)

Return type:

TimestampPeriod or tuple of datetime.datetime

get_ma()

get_max_period(kinds, nas_allowed=False, **kwargs)

Get the maximum available period for this stations timeseries.

If nas_allowed is True, then the maximum range of the timeseries is returned. Otherwise the minimal filled period is returned.

Parameters:
  • kinds (str or list of str) – The data kinds to consider. Must be a column in the timeseries DB. Must be one of “raw”, “qc”, “filled”, “adj”. For the precipitation also “qn” and “corr” are valid.

  • nas_allowed (bool, optional) – Should NAs be allowed? If True, then the maximum possible period is returned, even if there are NAs in the timeserie. If False, then the minimal filled period is returned. The default is False.

Returns:

The maximum Timestamp Period

Return type:

utils.TimestampPeriod
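
When several kinds are requested, the effect of nas_allowed can be read as the difference between the widest range and the range shared by all kinds. The sketch below follows that reading, which is an interpretation of the description above; the function name and the periods are hypothetical.

```python
import pandas as pd


def max_period(periods, nas_allowed=False):
    """Combine the (start, end) periods of several kinds into one period."""
    starts = [start for start, _ in periods]
    ends = [end for _, end in periods]
    if nas_allowed:
        # Widest range: some kinds may have no data at the edges.
        return (min(starts), max(ends))
    # Range in which every kind has data.
    return (max(starts), min(ends))


raw = (pd.Timestamp("2000-01-01"), pd.Timestamp("2020-12-31"))
filled = (pd.Timestamp("1995-01-01"), pd.Timestamp("2019-12-31"))
widest = max_period([raw, filled], nas_allowed=True)
common = max_period([raw, filled], nas_allowed=False)
```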

get_meta(infos='all')

Get Information from the meta table.

Parameters:

infos (list of str or str, optional) – A list of the information to get from the database. If “all” then all the information are returned. The default is “all”.

Returns:

dict with the meta information. The first level has one entry per parameter. The second level has one entry per information, asked for. If only one information is asked for, then it is returned as single value and not as subdict.

Return type:

dict or int/string

classmethod get_meta_explanation(infos='all')

Get the explanations of the available meta fields.

Parameters:

infos (list or string, optional) – The infos you wish to get an explanation for. If “all” then all the available information get returned. The default is “all”

Returns:

a pandas Series with the information names as index and the explanation as values.

Return type:

pd.Series

get_multi_annual()

Get the multi annual value(s) for this station.

Returns:

The corresponding multi annual value. For T and ET the yearly value is returned. For N the winter and summer half-yearly sums are returned as a tuple. The returned unit is mm or °C.

Return type:

list or number

get_name()

get_neighboor_stids(p_elev=(250, 1.5), **kwargs)[source]

Get the 5 nearest stations to this station.

Parameters:

p_elev (tuple, optional) –

In Larsim those parameters are defined as $P_1 = 500$ and $P_2 = 1$. Stoelzle et al. (2016) found that $P_1 = 100$ and $P_2 = 4$ is better for Baden-Württemberg to account for the quick changes in topography. For all of Germany, those parameter values give too much weight to the elevation difference, which can result in getting neighbor stations from the border of the Czech Republic for the Feldberg station. Therefore the values $P_1 = 250$ and $P_2 = 1.5$ are used as default values.

The default is (250, 1.5).

Returns:

_description_

Return type:

_type_
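
To see how $P_1$ and $P_2$ shift the choice of neighbors, the sketch below ranks candidate stations by an elevation-weighted distance. The weighting $L_w = L \cdot (1 + (|\Delta H| / P_1)^{P_2})$ is assumed here as the Larsim-style formula; it is not stated in this section, so treat it as an assumption. The function names, station IDs and distances are hypothetical.

```python
def weighted_distance(dist_m, elev_diff_m, p_elev=(250, 1.5)):
    """Horizontal distance, penalised by the elevation difference."""
    p1, p2 = p_elev
    # Assumed Larsim-style weighting, not confirmed by this section.
    return dist_m * (1 + (abs(elev_diff_m) / p1) ** p2)


def nearest_station_ids(candidates, n=5, p_elev=(250, 1.5)):
    """Return the IDs of the `n` stations with the smallest weighted distance.

    `candidates` holds (station_id, distance_m, elevation_difference_m) tuples.
    """
    ranked = sorted(candidates, key=lambda c: weighted_distance(c[1], c[2], p_elev))
    return [stid for stid, _, _ in ranked[:n]]


# Hypothetical stations: station 2 is closest but 250 m higher.
candidates = [(1, 10_000, 0), (2, 8_000, 250), (3, 13_000, 0)]
default_choice = nearest_station_ids(candidates, n=2)
larsim_choice = nearest_station_ids(candidates, n=2, p_elev=(500, 1))
```

With the default values, station 2 is penalised enough to drop behind station 3, whereas with the Larsim values it stays among the two nearest stations.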

get_period_meta(kind, all=False)

Get a specific period from the meta information table.

This function returns the information from the meta table. Several periods are saved in this table, like the period of the last import.

Parameters:
  • kind (str) – The kind of period to return. Should be one of [‘filled’, ‘raw’, ‘last_imp’]. filled: the maximum filled period of the filled timeserie. raw: the maximum filled timeperiod of the raw data. last_imp: the maximum filled timeperiod of the last import.

  • all (bool, optional) – Should the maximum Timespan for all the filled periods be returned. If False only the period for this station is returned. The default is False.

Returns:

The TimestampPeriod of the station or of all the stations if all=True.

Return type:

TimestampPeriod

Raises:

ValueError – If a wrong kind is handed in.

get_qc(**kwargs)

Get the quality checked timeserie.

Parameters:

kwargs (dict, optional) – The keyword arguments get passed to the get_df function. Possible parameters are “period”, “agg_to” or “nas_allowed”

Returns:

The quality checked timeserie for this station and the given period.

Return type:

pd.DataFrame

get_raster_value(raster)

get_raw(**kwargs)

Get the raw timeserie.

Parameters:

kwargs (dict, optional) – The keyword arguments get passed to the get_df function. Possible parameters are “period”, “agg_to” or “nas_allowed”

Returns:

The raw timeserie for this station and the given period.

Return type:

pd.DataFrame

get_zipfiles(only_new=True, ftp_file_list=None)

Get the zipfiles on the CDC server with the raw data.

Parameters:
  • only_new (bool, optional) – Get only the files that are not yet in the database? If False all the available files are loaded again. The default is True

  • ftp_file_list (list of (strings, datetime), optional) – A list of files on the FTP server together with their modification time. If None, then the list is fetched from the server. The default is None

Returns:

A DataFrame of zipfiles and the corresponding modification time on the CDC server to import.

Return type:

pandas.DataFrame or None
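
The only_new filter can be thought of as a comparison between the modification times on the server and the times of the last import. The sketch below shows that comparison with pandas; how the library stores the imported state is not described here, so the imported dictionary, the function name and the file names are assumptions.

```python
import pandas as pd


def select_new_files(ftp_files, imported):
    """Keep files unknown to `imported` or modified after the stored time.

    `ftp_files` is a list of (filepath, modification time) tuples and
    `imported` maps a filepath to the modification time at its last import.
    """
    df = pd.DataFrame(ftp_files, columns=["filepath", "modtime"])
    df["modtime"] = pd.to_datetime(df["modtime"])
    # Unknown files map to NaT, so they always count as new.
    known = pd.to_datetime(df["filepath"].map(imported))
    is_new = known.isna() | (df["modtime"] > known)
    return df[is_new].reset_index(drop=True)


# Hypothetical file names and times.
ftp_files = [
    ("a.zip", pd.Timestamp("2023-01-01")),  # unchanged since the import
    ("b.zip", pd.Timestamp("2023-02-01")),  # modified after the import
    ("c.zip", pd.Timestamp("2023-01-15")),  # never imported
]
imported = {
    "a.zip": pd.Timestamp("2023-01-01"),
    "b.zip": pd.Timestamp("2023-01-01"),
}
new_files = select_new_files(ftp_files, imported)
```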

is_last_imp_done(kind)

Has the last import for the given kind already been processed?

Parameters:

kind (str) – The data kind to look for filled period. Must be a column in the timeseries DB. Must be one of “raw”, “qc”, “filled”, “adj”, “best”. If “best” is given, then depending on the parameter of the station the best kind is selected. For Precipitation this is “corr” and for the other this is “filled”. For the precipitation also “qn” and “corr” are valid.

Returns:

True if the last import of the given kind is already treated.

Return type:

bool

is_real()

Check if the station is a real station or only a virtual one.

Real means that the DWD is measuring here. Virtual means, that there are no measurements here, but the station got created to have timeseries for every parameter for every precipitation station.

Returns:

True if the station is real, False if it is virtual.

Return type:

bool

is_virtual()

Check if the station is a real station or only a virtual one.

Real means that the DWD is measuring here. Virtual means, that there are no measurements here, but the station got created to have timeseries for every parameter for every precipitation station.

Returns:

True if the station is virtual, False if it is real.

Return type:

bool

isin_db()

Check if Station is already in a timeseries table.

Returns:

True if Station has a table in DB, no matter if it is filled or not.

Return type:

bool

isin_ma()

Check if Station is already in the multi annual table.

Returns:

True if Station is in multi annual table.

Return type:

bool

isin_meta()

Check if Station is already in the meta table.

Returns:

True if Station is in meta table.

Return type:

bool

isin_meta_n()

Check if Station is in the precipitation meta table.

Returns:

True if Station is in the precipitation meta table.

Return type:

bool

last_imp_fillup(_last_imp_period=None)

Do the gap filling of the last import.

last_imp_qc()

last_imp_quality_check()

Do the quality check of the last import.

plot(period=(None, None), kind='filled', agg_to=None, **kwargs)

Plot the data of this station.

Parameters:
  • period (TimestampPeriod or (tuple or list of datetime.datetime or None), optional) – The minimum and maximum Timestamp for which to get the timeseries. If None is given, the maximum or minimal possible Timestamp is taken. The default is (None, None).

  • kind (str, optional) – The data kind to plot. Must be a column in the timeseries DB. Must be one of “raw”, “qc”, “filled”, “adj”. For the precipitation also “qn” and “corr” are valid. The default is “filled”.

  • agg_to (str or None, optional) – Aggregate to a given timespan. Can be anything smaller than the maximum timespan of the saved data. If a time period smaller than the saved data is given, then the maximum possible time period is returned. For T and ET it can be “month”, “year”. For N it can also be “hour”. If None, then the maximum time period is taken. The default is None.

quality_check(period=(None, None), **kwargs)

Quality check the raw data for a given period.

Parameters:

period (util.TimestampPeriod or (tuple or list of datetime.datetime or None), optional) – The minimum and maximum Timestamp for which to get the timeseries. If None is given, the maximum or minimal possible Timestamp is taken. The default is (None, None).

update_ma(skip_if_exist=True, drop_when_error=True)

Update the multi annual values in the stations_raster_values table.

Get new values from the raster and put in the table.

update_period_meta(kind)

Update the time period in the meta file.

Compute the filled period of a timeseries and save it in the meta table.

Parameters:

kind (str) – The data kind to look for filled period. Must be a column in the timeseries DB. Must be one of “raw”, “qc”, “filled”. If “best” is given, then depending on the parameter of the station the best kind is selected. For precipitation this is “corr” and for the others this is “filled”. For precipitation “corr” is also valid.

update_raw(only_new=True, ftp_file_list=None, remove_nas=True)

Download data from CDC and upload to database.

Parameters:
  • only_new (bool, optional) – Get only the files that are not yet in the database? If False all the available files are loaded again. The default is True

  • ftp_file_list (list of (strings, datetime), optional) – A list of files on the FTP server together with their modification time. If None, then the list is fetched from the server. The default is None

  • remove_nas (bool, optional) – Remove the NAs from the downloaded data before updating it to the database. This has computational advantages. The default is True.

Returns:

The raw Dataframe of the Stations data.

Return type:

pandas.DataFrame