modbot.training.data_handling.DataGenerator

class modbot.training.data_handling.DataGenerator(df, **kwargs)

Bases: Sequence

Generator class for batching texts and targets from a dataframe

Methods

get_deterministic_batch(i)

Get batches of texts and targets selected deterministically by chunk index

get_random_batch(_)

Get batches of randomly selected texts and targets with specified weights

on_epoch_end()

Method called at the end of every epoch.

Attributes

batch_chunks

Get index chunks for batching.

chunk_size

Get the chunk size used to split data loading into smaller reads

chunks

Get dataset chunks.

indices

Indices for data samples

n_batches

Get number of batches based on batch size

n_chunks

Get number of chunks to divide full dataset into for smaller reads

n_samples

Number of data samples

property batch_chunks

Get index chunks for batching. Each chunk corresponds to a batch
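The idea of one index chunk per batch can be sketched as follows. This is a minimal illustration, not modbot's implementation: `make_batch_chunks` is a hypothetical helper, and it assumes consecutive slices of at most `batch_size` indices, with a shorter final chunk when the sample count is not a multiple of the batch size.

```python
import math


def make_batch_chunks(indices, batch_size):
    """Split sample indices into consecutive chunks of at most batch_size.

    Hypothetical helper for illustration only.
    """
    return [
        indices[start:start + batch_size]
        for start in range(0, len(indices), batch_size)
    ]


indices = list(range(10))
batch_chunks = make_batch_chunks(indices, batch_size=4)

# Under this assumption the number of batches is ceil(n_samples / batch_size).
n_batches = len(batch_chunks)
expected_n_batches = math.ceil(len(indices) / 4)
```

With this layout, `n_batches` follows directly from the number of chunks, which matches the description of `n_batches` as being based on batch size.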

property chunk_size

Get the chunk size used to split data loading into smaller reads

property chunks

Get dataset chunks. Used to keep only part of the full dataset in memory at a time.
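A rough sketch of chunked reading, where only one chunk of rows is materialized at a time. Everything here is an assumption made for illustration: `iter_chunks` is a hypothetical helper, the column names `text` and `is_offensive` are invented, and an in-memory CSV stands in for whatever storage the class actually reads from.

```python
import csv
import io
from itertools import islice


def iter_chunks(handle, chunk_size):
    """Yield lists of at most chunk_size rows from a CSV file handle.

    Hypothetical helper: rows are pulled lazily, so earlier chunks can be
    garbage collected once the caller is done with them.
    """
    reader = csv.DictReader(handle)
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk


# Invented sample data; the real column names are not given in this page.
raw = io.StringIO(
    "text,is_offensive\n"
    "hello,0\n"
    "bad words,1\n"
    "nice,0\n"
    "rude,1\n"
    "ok,0\n"
)
chunk_sizes = [len(chunk) for chunk in iter_chunks(raw, chunk_size=2)]
```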

get_deterministic_batch(i)

Get batches of texts and targets selected deterministically by chunk index

Parameters

i (int) – Index of chunk used to select slice of full dataframe

Returns

arrs – List of data batches

Return type

list
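A sketch of index-based batch selection under stated assumptions. The function below is a standalone stand-in, not the method itself: it takes the data and the index chunks as explicit arguments, and it assumes the returned list holds texts first and targets second.

```python
def get_deterministic_batch(texts, targets, batch_chunks, i):
    """Return [texts, targets] for the i-th index chunk.

    Hypothetical standalone version: the same i always yields the same batch.
    """
    idx = batch_chunks[i]
    return [[texts[j] for j in idx], [targets[j] for j in idx]]


# Invented sample data for illustration.
texts = ["a", "b", "c", "d", "e"]
targets = [0, 1, 0, 1, 0]
batch_chunks = [[0, 1], [2, 3], [4]]

batch = get_deterministic_batch(texts, targets, batch_chunks, 1)
repeat = get_deterministic_batch(texts, targets, batch_chunks, 1)
```

Because the selection depends only on `i`, repeated calls return identical batches, which is useful for validation passes where reproducibility matters.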

get_random_batch(_)

Get batches of randomly selected texts and targets with specified weights

Returns

arrs – List of randomly selected data batches

Return type

list
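Weighted random selection can be sketched with the standard library. This is an illustration only: the function signature, the per-sample `weights` list, and sampling with replacement are all assumptions, since this page does not say how the weights are specified or applied.

```python
import random


def get_random_batch(texts, targets, weights, batch_size, rng):
    """Return [texts, targets] for batch_size indices drawn by weight.

    Hypothetical standalone version; draws with replacement, so a sample
    with a large weight may appear more than once in a batch.
    """
    idx = rng.choices(range(len(texts)), weights=weights, k=batch_size)
    return [[texts[j] for j in idx], [targets[j] for j in idx]]


# Invented sample data; zero weights exclude the first two samples.
texts = ["a", "b", "c", "d"]
targets = [0, 0, 1, 1]
weights = [0.0, 0.0, 1.0, 1.0]

rng = random.Random(0)  # seeded so the sketch is repeatable
batch_texts, batch_targets = get_random_batch(texts, targets, weights, 6, rng)
```

Weighting of this kind is commonly used to oversample a rare class, though whether the class uses it that way is not stated here.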

property indices

Indices for data samples

property n_batches

Get number of batches based on batch size

property n_chunks

Get number of chunks to divide full dataset into for smaller reads

property n_samples

Number of data samples

on_epoch_end()

Method called at the end of every epoch.
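The base class name suggests a Keras-style Sequence, where the training loop reads the batch count from `__len__`, fetches batches with `__getitem__`, and calls `on_epoch_end` after each epoch. That reading is an assumption, and so is the reshuffling shown below: this page does not say what the method does. `ShufflingBatcher` is a hypothetical plain-Python stand-in that mimics the protocol without importing Keras.

```python
import random


class ShufflingBatcher:
    """Hypothetical stand-in for a Sequence-style generator; not modbot's class."""

    def __init__(self, n_samples, batch_size, seed=0):
        self.indices = list(range(n_samples))
        self.batch_size = batch_size
        self._rng = random.Random(seed)

    def __len__(self):
        # Number of batches per epoch, rounding up for a final partial batch.
        return -(-len(self.indices) // self.batch_size)

    def __getitem__(self, i):
        # Return the sample indices that make up batch i.
        start = i * self.batch_size
        return self.indices[start:start + self.batch_size]

    def on_epoch_end(self):
        # Assumed behavior: reshuffle so the next epoch sees a new batch order.
        self._rng.shuffle(self.indices)


batcher = ShufflingBatcher(n_samples=10, batch_size=4)
first_epoch = [batcher[i] for i in range(len(batcher))]
batcher.on_epoch_end()
second_epoch = [batcher[i] for i in range(len(batcher))]
```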