twodlearn.datasets.tsdataset module

class twodlearn.datasets.tsdataset.AsynchronousRecord(data, start_time=None, prop=None, name='')[source]

Bases: object

collapse()[source]

Collapse the list of DataFrames in data into a single DataFrame.
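
A minimal sketch of how an AsynchronousRecord might be built and collapsed, assuming data is a list of pandas DataFrames (e.g. signals logged at different rates); the constructor argument layout and whether collapse() mutates in place or returns the result are assumptions, not documented here:

    import pandas as pd
    from twodlearn.datasets.tsdataset import AsynchronousRecord

    # two signals logged at different rates (assumed layout)
    fast = pd.DataFrame({'u': [0.0, 0.1, 0.2, 0.3]})
    slow = pd.DataFrame({'y': [1.0, 1.2]})
    record = AsynchronousRecord([fast, slow], name='async_run')
    record.collapse()   # merge the list of DataFrames into a single DataFrame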

property data[source]
property end_time[source]
get_group(group_name)[source]

Returns the data for the given group as an np.ndarray.

mean(group)[source]
property n_samples[source]
prop2array()[source]
prop2list()[source]
set_groups(group_tags)[source]
property start_time[source]
std(group)[source]
class twodlearn.datasets.tsdataset.DatasetsSaver(train, valid, test)[source]

Bases: tuple

property test[source]

Alias for field number 2

property train[source]

Alias for field number 0

property valid[source]

Alias for field number 1

class twodlearn.datasets.tsdataset.Record(data, start_time=None, prop=None, name='')[source]

Bases: object

property columns[source]
copy()[source]

Creates a new copy of the record.

Returns

new copy of the record.

Return type

Record

property data[source]
property end_time[source]
classmethod from_saved_data(data)[source]

Create a record from saved Record.SaveData.

get_group(group_name)[source]

Get the np.ndarray corresponding to the given group.

Parameters

group_name (str) – name of the group.

Returns

Array corresponding to the data of the group.

Return type

np.ndarray

get_save_data()[source]
property group_tags[source]

Dictionary with the names of the groups and their associated columns.

property n_samples[source]
prop2array()[source]
prop2list()[source]
set_groups(group_tags)[source]

Group the features of the data according to the provided tags.

Parameters

group_tags (dict) – dictionary where the keys are the group names and the values are the columns belonging to each group.
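
A minimal usage sketch, assuming data is a pandas.DataFrame; the column and group names ('u1', 'u2', 'y', 'inputs', 'outputs') are illustrative:

    import pandas as pd
    from twodlearn.datasets.tsdataset import Record

    df = pd.DataFrame({'u1': [0.0, 1.0, 2.0],
                       'u2': [0.5, 0.5, 0.5],
                       'y':  [1.0, 1.5, 2.1]})
    record = Record(df, name='experiment_01')
    # group the control columns under 'inputs' and the measurement under 'outputs'
    record.set_groups({'inputs': ['u1', 'u2'], 'outputs': ['y']})
    inputs = record.get_group('inputs')   # np.ndarray with the 'u1' and 'u2' columns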

split_continuous(column_name)[source]

Split the record into continuous chunks of data, determined from the provided series.

property start_time[source]
class twodlearn.datasets.tsdataset.RecordSaveData(data, start_time, prop, name)[source]

Bases: tuple

property data[source]

Alias for field number 0

property name[source]

Alias for field number 3

property prop[source]

Alias for field number 2

property start_time[source]

Alias for field number 1

class twodlearn.datasets.tsdataset.TSDataset(records=[])[source]

Bases: object

class BatchNormalizer[source]

Bases: object

property mu[source]
normalize(batch)[source]
reset()[source]
property std[source]
class Cursor(dataset, global_pointer)[source]

Bases: object

Manages one of the continuous elements of the batch; hence, there are as many cursors as elements in the batch.

property cummulative_n_samples[source]
property local_pointer[source]
next_sequence(window_size)[source]
property record[source]
property record_id[source]
class DatasetStats(mean, stddev, min, max, n_samples)[source]

Bases: tuple

property max[source]

Alias for field number 3

property mean[source]

Alias for field number 0

property min[source]

Alias for field number 2

property n_samples[source]

Alias for field number 4

property stddev[source]

Alias for field number 1

add_record(other)[source]

Add a record to the dataset.

Parameters

other (Record) – record to be added to the dataset.
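
A sketch of building a dataset record by record; the assumption that a Record wraps a pandas.DataFrame is carried over from the Record example above:

    import pandas as pd
    from twodlearn.datasets.tsdataset import Record, TSDataset

    dataset = TSDataset()
    for i in range(3):
        df = pd.DataFrame({'u1': [0.0, 1.0, 2.0], 'y': [1.0, 1.5, 2.1]})  # placeholder data
        dataset.add_record(Record(df, name='run_{}'.format(i)))
    dataset.set_groups({'inputs': ['u1'], 'outputs': ['y']})
    print(dataset.n_samples)   # total number of samples over all records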

as_array()[source]

Returns the dataset as arrays, separated by group.

property columns[source]
cummulative_n_samples(recompute=False)[source]
property dtypes[source]
classmethod from_saved_data(saved_data)[source]
get_prop_mat()[source]
get_prop_table()[source]
get_save_data()[source]
get_stats(groups=None)[source]

Obtain mean and standard deviation of the dataset to be used for normalization.

Parameters

groups – list of the group names that you want to measure.
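
A sketch of the call, with illustrative group names; the DatasetStats named tuple above (mean, stddev, min, max, n_samples) suggests the kind of information returned, but the exact return container is not documented here:

    stats = dataset.get_stats(groups=['inputs', 'outputs'])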

classmethod load(filename)[source]
property n_samples[source]

total number of samples in all records

n_vars(group=None)[source]
next_batch(window_size, batch_size, reset=False)[source]

Returns the next batch_size sequences of length window_size.

Parameters
  • window_size – length of each returned sequence.

  • batch_size – number of sequences in the batch.

  • reset – reset the cursors that track where data is currently being extracted.

Returns

A dictionary with the batch samples. The format is:

batch[group] = array[window_size, batch_size, n_vars(group)]

Return type

dict
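
A sketch of the documented batch layout; 'inputs' is an illustrative group name and dataset is a TSDataset with groups already set:

    batch = dataset.next_batch(window_size=20, batch_size=4)
    x = batch['inputs']   # array of shape (window_size, batch_size, n_vars('inputs'))
                          # here: (20, 4, dataset.n_vars('inputs'))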

next_batch_discontinuous(batch_size)[source]

Get a batch when window_size is 1

This function is used by next_batch and is not intended to be used outside the class.

next_windowed_batch(sequences_length, batch_size, window_size, groups=None, reset=False)[source]

Returns the next batch, where each sample contains a sequence of window_size elements.

Parameters
  • sequences_length – length of the sequences

  • batch_size – number of sequences

  • window_size – size of the window

Returns

A dictionary with the batch samples. The format is:

batch[group] = array[sequences_length, batch_size, n_vars(group)*window_size]

Return type

dict
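
A sketch following the documented format, again with an illustrative group name:

    batch = dataset.next_windowed_batch(sequences_length=50, batch_size=8, window_size=3)
    x = batch['inputs']   # shape (sequences_length, batch_size, n_vars('inputs') * window_size)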

normalize(groups=None, mu=None, std=None)[source]

Normalize the given groups of the dataset using mean and standard deviation.

Parameters

groups – list of the group names that you want to normalize.
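
A sketch of normalizing selected groups; passing precomputed statistics through mu and std (e.g. statistics from a training split) is an assumption based on the signature, and their expected format is not documented here:

    dataset.normalize(groups=['inputs', 'outputs'])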

property normalizer[source]
reset_cursors(batch_size=None)[source]
save(filename)[source]
set_groups(group_tags)[source]
split_continuous(column_name, min_samples=None)[source]

Split the records into continuous chunks of data, determined from the provided column.

to_dense()[source]

Return a dense representation of the dataset.

Returns

a tuple of the dense array and the length of each record. The records are padded with nan values.

Return type

(array, length)
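
A sketch of unpacking the documented return value:

    dense, lengths = dataset.to_dense()
    # dense: nan-padded array of all records; lengths: valid length of each record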

to_tf_dataset(dtype=<class 'numpy.float32'>)[source]

Get a tf.data.Dataset with a dense representation of the dataset.

Returns

Dataset with elements ‘data’ and ‘length’. ‘data’ is a dense tensor representation of the dataset, formatted as (record, time, features). Records are padded with nan values.

Return type

tf.data.Dataset
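
A sketch of consuming the returned tf.data.Dataset; iterating element by element uses the standard tf.data API, and how the records are split across elements is an assumption:

    import numpy as np

    tf_data = dataset.to_tf_dataset(dtype=np.float32)
    for element in tf_data:
        data = element['data']       # dense tensor representation, nan-padded
        length = element['length']   # valid record length(s)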

update_props()[source]

Updates static properties. This function is called by add_record() after a new record has been added to the dataset.

class twodlearn.datasets.tsdataset.TSDatasetSaver(records_data, group_tags)[source]

Bases: tuple

property group_tags[source]

Alias for field number 1

property records_data[source]

Alias for field number 0

class twodlearn.datasets.tsdataset.TSDatasets(train=None, valid=None, test=None)[source]

Bases: object

classmethod from_saved_file(filename, encoding=None)[source]
get_save_data()[source]
normalize(groups)[source]
save(filename)[source]
set_groups(group_tags)[source]
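
A sketch of a typical train/valid/test workflow; train_set, valid_set and test_set are assumed to be TSDataset instances built elsewhere, and the filename is hypothetical:

    from twodlearn.datasets.tsdataset import TSDatasets

    datasets = TSDatasets(train=train_set, valid=valid_set, test=test_set)
    datasets.set_groups({'inputs': ['u1', 'u2'], 'outputs': ['y']})
    datasets.normalize(groups=['inputs', 'outputs'])
    datasets.save('tsdatasets.pkl')   # hypothetical filename
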
twodlearn.datasets.tsdataset.sample_batch_window(data, length, window_size, batch_size=None)[source]

Sample continuous windows of window_size from tensor data.

Parameters
  • data (tf.Tensor) – dense representation of a set of continuous records. The format should be (record, time, features).

  • length – length of each record.

  • window_size – size of the window to sample.

  • batch_size – batch size of data. If not provided, batch_size = data.shape[0].

Returns

Random continuous windows. The format is (record, time, features).

Return type

tf.Tensor
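
A sketch with synthetic data; the dtype used for length and the acceptance of numpy-backed constants are assumptions:

    import numpy as np
    import tensorflow as tf
    from twodlearn.datasets.tsdataset import sample_batch_window

    # 5 records, 100 time steps, 3 features (dense, nan-padded representation)
    data = tf.constant(np.random.randn(5, 100, 3), dtype=tf.float32)
    length = tf.constant([100, 80, 100, 60, 90])
    windows = sample_batch_window(data, length, window_size=20)
    # windows: (record, time, features) with the time dimension equal to window_size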

twodlearn.datasets.tsdataset.sample_window(data, length, window_size)[source]

Sample continuous windows of window_size from tensor data.

Parameters
  • data (tf.Tensor) – dense representation of a set of continuous records. The format should be (time, features).

  • length – length of each record.

  • window_size – size of the window to sample.

Returns

Random continuous windows. The format is (record, time, features).

Return type

tf.Tensor

twodlearn.datasets.tsdataset.signal_edges(series, dtype=<class 'int'>)[source]

Detect signal edges in the given series.