modbot.training.data_handling.WeightedGenerator

class modbot.training.data_handling.WeightedGenerator(df, **kwargs)

Generator class for training over batches

Methods

`get_deterministic_batch`(i)	Get the batch of texts and targets selected by chunk index i
`get_random_batch`(_)	Get batches of randomly selected texts and targets with specified weights
`on_epoch_end`()	Method called at the end of every epoch.
`sample_indices`(Y[, offensive_weight, ...])	Sample indices so that the sample matches the requested offensive weight
`transform`(function)	Transform texts

Attributes

`X`	alias for texts
`Y`	alias for targets
`batch_chunks`	Get index chunks for batching.
`chunk_size`	Get chunk size for splitting data loading
`chunks`	Get dataset chunks.
`indices`	Indices for data samples
`n_batches`	Get number of batches based on batch size
`n_chunks`	Get number of chunks to divide full dataset into for smaller reads
`n_samples`	Number of data samples
`one_count`	Get number of samples with target=1
`zero_count`	Get number of samples with target=0

property batch_chunks: Get index chunks for batching. Each chunk corresponds to a batch

property chunks: Get dataset chunks. Used to only keep part of full dataset sample in memory.

get_deterministic_batch(i)

Get the batch of texts and targets selected by chunk index i

Parameters: i (int) – Index of chunk used to select slice of full dataframe
Returns: arrs – List of data batches
Return type: list
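A minimal sketch of index-based batch selection, assuming the batch is a plain slice of the data picked by chunk index. Here NumPy arrays stand in for the dataframe columns, and `deterministic_batch_sketch`, `texts`, and `targets` are illustrative names rather than the library's own.

```python
import numpy as np


def deterministic_batch_sketch(texts, targets, batch_chunks, i):
    """Return [texts, targets] for the i-th index chunk (hypothetical helper)."""
    idx = batch_chunks[i]
    # The same i always selects the same rows, so repeated calls agree.
    return [texts[idx], targets[idx]]


# Toy data standing in for the text and target columns.
texts = np.array(["a", "b", "c", "d", "e"])
targets = np.array([0, 1, 0, 0, 1])

# Two index chunks over five samples: the first holds three rows, the second two.
batch_chunks = np.array_split(np.arange(len(texts)), 2)
first = deterministic_batch_sketch(texts, targets, batch_chunks, 0)
```

Because the selection depends only on `i`, this kind of batch suits validation or evaluation passes where runs need to be repeatable.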

get_random_batch(_)

Get batches of randomly selected texts and targets with specified weights

property n_chunks: Get number of chunks to divide full dataset into for smaller reads

classmethod sample_indices(Y, offensive_weight=None, sample_size=None)

Sample indices so that the sample matches the requested offensive weight

Parameters

Y (pd.DataFrame) – Pandas dataframe of labels for the corresponding texts
offensive_weight (float) – Desired ratio of labels equal to 1 to the total number of labels
sample_size (int) – Desired sample size. Corresponds to the size of the returned texts and labels

Returns

indices – Indices of the samples such that X[indices] has the requested sample size and the requested offensive_weight

Return type

np.ndarray
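One way to read the contract above is as class-stratified sampling: draw enough label-1 rows to hit the requested ratio and fill the rest with label-0 rows. The sketch below takes that reading. The function name, the `seed` argument, sampling with replacement, and the fallback to uniform sampling when `offensive_weight` is None are all assumptions of this example and may not match the actual implementation.

```python
import numpy as np


def sample_indices_sketch(y, offensive_weight=None, sample_size=None, seed=0):
    """Pick indices so that a fraction `offensive_weight` of them has label 1.

    Hypothetical helper; assumes y contains both 0 and 1 labels.
    """
    y = np.asarray(y).ravel()
    rng = np.random.default_rng(seed)
    if sample_size is None:
        sample_size = len(y)
    if offensive_weight is None:
        # Assumed fallback: uniform sampling over all rows.
        return rng.choice(len(y), size=sample_size, replace=True)

    ones = np.flatnonzero(y == 1)
    zeros = np.flatnonzero(y == 0)
    n_ones = int(round(offensive_weight * sample_size))
    n_zeros = sample_size - n_ones

    # Sampling with replacement lets a rare class be over-represented.
    picked = np.concatenate([
        rng.choice(ones, size=n_ones, replace=True),
        rng.choice(zeros, size=n_zeros, replace=True),
    ])
    rng.shuffle(picked)  # avoid returning all ones followed by all zeros
    return picked


labels = np.array([0] * 90 + [1] * 10)
idx = sample_indices_sketch(labels, offensive_weight=0.5, sample_size=20)
```

In this toy case only 10% of `labels` are 1, yet half of `labels[idx]` are 1, which is the rebalancing effect `offensive_weight` describes.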