twodlearn.datasets.tsdataset module

class twodlearn.datasets.tsdataset.AsynchronousRecord(data, start_time=None, prop=None, name='')[source]

Bases: object

collapse()[source]

Collapse the list of DataFrames in data into a single DataFrame.
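
A minimal sketch of how an AsynchronousRecord might be built and collapsed, assuming data is a list of pandas DataFrames (e.g. signals logged at different rates); the constructor argument layout and whether collapse() mutates in place or returns the result are assumptions, not documented here:

    import pandas as pd
    from twodlearn.datasets.tsdataset import AsynchronousRecord

    # two signals logged at different rates (assumed layout)
    fast = pd.DataFrame({'u': [0.0, 0.1, 0.2, 0.3]})
    slow = pd.DataFrame({'y': [1.0, 1.2]})
    record = AsynchronousRecord([fast, slow], name='async_run')
    record.collapse()   # merge the list of DataFrames into a single DataFrame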

property data[source]
property end_time[source]
get_group(group_name)[source]

Returns the data for the given group as an np.ndarray.

mean(group)[source]
property n_samples[source]
prop2array()[source]
prop2list()[source]
set_groups(group_tags)[source]
property start_time[source]
std(group)[source]
class twodlearn.datasets.tsdataset.DatasetsSaver(train, valid, test)[source]

Bases: tuple

property test[source]

Alias for field number 2

property train[source]

Alias for field number 0

property valid[source]

Alias for field number 1

class twodlearn.datasets.tsdataset.Record(data, start_time=None, prop=None, name='')[source]

Bases: object

property columns[source]
copy()[source]

Creates a new copy of the record.

Returns

new copy of the record.

Return type

Record

property data[source]
property end_time[source]
classmethod from_saved_data(data)[source]

Create a record from saved Record.SaveData.

get_group(group_name)[source]

Get the np.ndarray corresponding to the given group.

Parameters

group_name (str) – name of the group.

Returns

Array corresponding to the data of the group.

Return type

np.ndarray

get_save_data()[source]
property group_tags[source]

Dictionary with the names of the groups and their associated columns.

property n_samples[source]
prop2array()[source]
prop2list()[source]
set_groups(group_tags)[source]

Group the features of the data according to the provided tags.

Parameters

group_tags (dict) – dictionary where the keys are the group names and the values are the columns belonging to each group.
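
A minimal usage sketch, assuming data is a pandas.DataFrame; the column and group names ('u1', 'u2', 'y', 'inputs', 'outputs') are illustrative:

    import pandas as pd
    from twodlearn.datasets.tsdataset import Record

    df = pd.DataFrame({'u1': [0.0, 1.0, 2.0],
                       'u2': [0.5, 0.5, 0.5],
                       'y':  [1.0, 1.5, 2.1]})
    record = Record(df, name='experiment_01')
    # group the control columns under 'inputs' and the measurement under 'outputs'
    record.set_groups({'inputs': ['u1', 'u2'], 'outputs': ['y']})
    inputs = record.get_group('inputs')   # np.ndarray with the 'u1' and 'u2' columns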

split_continuous(column_name)[source]

Split the record into continuous chunks of data, determined from the provided series.

property start_time[source]
class twodlearn.datasets.tsdataset.RecordSaveData(data, start_time, prop, name)[source]

Bases: tuple

property data[source]

Alias for field number 0

property name[source]

Alias for field number 3

property prop[source]

Alias for field number 2

property start_time[source]

Alias for field number 1

class twodlearn.datasets.tsdataset.TSDataset(records=[])[source]

Bases: object

class BatchNormalizer[source]

Bases: object

property mu[source]
normalize(batch)[source]
reset()[source]
property std[source]
class Cursor(dataset, global_pointer)[source]

Bases: object

Manages one of the continuous elements of the batch; hence, there are as many cursors as elements in the batch.

property cummulative_n_samples[source]
property local_pointer[source]
next_sequence(window_size)[source]
property record[source]
property record_id[source]
class DatasetStats(mean, stddev, min, max, n_samples)[source]

Bases: tuple

property max[source]

Alias for field number 3

property mean[source]

Alias for field number 0

property min[source]

Alias for field number 2

property n_samples[source]

Alias for field number 4

property stddev[source]

Alias for field number 1

add_record(other)[source]

Add a record to the dataset.

Parameters

other (Record) – record to be added to the dataset.
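
A sketch of building a dataset record by record; the assumption that a Record wraps a pandas.DataFrame is carried over from the Record example above:

    import pandas as pd
    from twodlearn.datasets.tsdataset import Record, TSDataset

    dataset = TSDataset()
    for i in range(3):
        df = pd.DataFrame({'u1': [0.0, 1.0, 2.0], 'y': [1.0, 1.5, 2.1]})  # placeholder data
        dataset.add_record(Record(df, name='run_{}'.format(i)))
    dataset.set_groups({'inputs': ['u1'], 'outputs': ['y']})
    print(dataset.n_samples)   # total number of samples over all records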

as_array()[source]

Returns the dataset as arrays, separated by group.

property columns[source]
cummulative_n_samples(recompute=False)[source]
property dtypes[source]
classmethod from_saved_data(saved_data)[source]
get_prop_mat()[source]
get_prop_table()[source]
get_save_data()[source]
get_stats(groups=None)[source]

Obtain mean and standard deviation of the dataset to be used for normalization.

Parameters

groups – list of the group names that you want to measure.
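
A sketch of the call, with illustrative group names; the DatasetStats named tuple above (mean, stddev, min, max, n_samples) suggests the kind of information returned, but the exact return container is not documented here:

    stats = dataset.get_stats(groups=['inputs', 'outputs'])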

classmethod load(filename)[source]
property n_samples[source]

total number of samples in all records

n_vars(group=None)[source]
next_batch(window_size, batch_size, reset=False)[source]

Returns the next batch_size sequences of length window_size.

Parameters
  • window_size – length of each returned sequence.

  • batch_size – number of sequences in the batch.

  • reset – reset the cursors that track where data is currently being extracted.

Returns

A dictionary with the batch samples. The format is:

batch[group] = array[window_size, batch_size, n_vars(group)]

Return type

dict
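
A sketch of the documented batch layout; 'inputs' is an illustrative group name and dataset is a TSDataset with groups already set:

    batch = dataset.next_batch(window_size=20, batch_size=4)
    x = batch['inputs']   # array of shape (window_size, batch_size, n_vars('inputs'))
                          # here: (20, 4, dataset.n_vars('inputs'))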

next_batch_discontinuous(batch_size)[source]

Get a batch when window_size is 1

This function is used by next_batch and is not intended to be used outside the class.

next_windowed_batch(sequences_length, batch_size, window_size, groups=None, reset=False)[source]

Returns the next batch, where each sample contains a sequence of window_size elements.

Parameters
  • sequences_length – length of the sequences

  • batch_size – number of sequences

  • window_size – size of the window

Returns

A dictionary with the batch samples. The format is:

batch[group] = array[sequences_length, batch_size, n_vars(group)*window_size]

Return type

dict
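
A sketch following the documented format, again with an illustrative group name:

    batch = dataset.next_windowed_batch(sequences_length=50, batch_size=8, window_size=3)
    x = batch['inputs']   # shape (sequences_length, batch_size, n_vars('inputs') * window_size)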

normalize(groups=None, mu=None, std=None)[source]

Normalize the given groups of the dataset using mean and standard deviation.

Parameters

groups – list of the group names that you want to normalize.
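
A sketch of normalizing selected groups; passing precomputed statistics through mu and std (e.g. statistics from a training split) is an assumption based on the signature, and their expected format is not documented here:

    dataset.normalize(groups=['inputs', 'outputs'])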

property normalizer[source]
reset_cursors(batch_size=None)[source]
save(filename)[source]
set_groups(group_tags)[source]
split_continuous(column_name, min_samples=None)[source]

Split the records into continuous chunks of data, determined from the provided column.

to_dense()[source]

Return a dense representation of the dataset.

Returns

a tuple of the dense array and the length of each record. The records are padded with nan values.

Return type

(array, length)
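
A sketch of unpacking the documented return value:

    dense, lengths = dataset.to_dense()
    # dense: nan-padded array of all records; lengths: valid length of each record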

to_tf_dataset(dtype=<class 'numpy.float32'>)[source]

Get a tf.data.Dataset with a dense representation of the dataset.

Returns

Dataset with elements ‘data’ and ‘length’. ‘data’ is a dense tensor representation of the dataset, formatted as (record, time, features). Records are padded with nan values.

Return type

tf.data.Dataset
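
A sketch of consuming the returned tf.data.Dataset; iterating element by element uses the standard tf.data API, and how the records are split across elements is an assumption:

    import numpy as np

    tf_data = dataset.to_tf_dataset(dtype=np.float32)
    for element in tf_data:
        data = element['data']       # dense tensor representation, nan-padded
        length = element['length']   # valid record length(s)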

update_props()[source]

Updates static properties. This function is called by add_record() after a new record has been added to the dataset.

class twodlearn.datasets.tsdataset.TSDatasetSaver(records_data, group_tags)[source]

Bases: tuple

property group_tags[source]

Alias for field number 1

property records_data[source]

Alias for field number 0

class twodlearn.datasets.tsdataset.TSDatasets(train=None, valid=None, test=None)[source]

Bases: object

classmethod from_saved_file(filename, encoding=None)[source]
get_save_data()[source]
normalize(groups)[source]
save(filename)[source]
set_groups(group_tags)[source]
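
A sketch of a typical train/valid/test workflow; train_set, valid_set and test_set are assumed to be TSDataset instances built elsewhere, and the filename is hypothetical:

    from twodlearn.datasets.tsdataset import TSDatasets

    datasets = TSDatasets(train=train_set, valid=valid_set, test=test_set)
    datasets.set_groups({'inputs': ['u1', 'u2'], 'outputs': ['y']})
    datasets.normalize(groups=['inputs', 'outputs'])
    datasets.save('tsdatasets.pkl')   # hypothetical filename
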
twodlearn.datasets.tsdataset.sample_batch_window(data, length, window_size, batch_size=None)[source]

Sample continuous windows of window_size from tensor data.

Parameters
  • data (tf.Tensor) – dense representation of a set of continuous records. The format should be (record, time, features).

  • length – length of each record.

  • window_size – size of the window to sample.

  • batch_size – batch size of data. If not provided, batch_size = data.shape[0].

Returns

Random continuous windows. The format is (record, time, features).

Return type

tf.Tensor
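
A sketch with synthetic data; the dtype used for length and the acceptance of numpy-backed constants are assumptions:

    import numpy as np
    import tensorflow as tf
    from twodlearn.datasets.tsdataset import sample_batch_window

    # 5 records, 100 time steps, 3 features (dense, nan-padded representation)
    data = tf.constant(np.random.randn(5, 100, 3), dtype=tf.float32)
    length = tf.constant([100, 80, 100, 60, 90])
    windows = sample_batch_window(data, length, window_size=20)
    # windows: (record, time, features) with the time dimension equal to window_size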

twodlearn.datasets.tsdataset.sample_window(data, length, window_size)[source]

Sample continuous windows of window_size from tensor data.

Parameters
  • data (tf.Tensor) – dense representation of a set of continuous records. The format should be (time, features).

  • length – length of each record.

  • window_size – size of the window to sample.

Returns

Random continuous windows. The format is (record, time, features).

Return type

tf.Tensor

twodlearn.datasets.tsdataset.signal_edges(series, dtype=<class 'int'>)[source]

Detect signal edges in the given series.