modbot.training.data_handling.WeightedGenerator

class modbot.training.data_handling.WeightedGenerator(df, **kwargs)

Bases: DataGenerator

Generator class for training over batches

Methods

get_deterministic_batch(i)

Get batches of texts and targets from the chunk at index i

get_random_batch(_)

Get batches of randomly selected texts and targets with specified weights

on_epoch_end()

Method called at the end of every epoch.

sample_indices(Y[, offensive_weight, ...])

Sample data indices so that the sample matches the requested offensive weight

transform(function)

Transform texts

Attributes

X

alias for texts

Y

alias for targets

batch_chunks

Get index chunks for batching.

chunk_size

Get chunk size for splitting data loading

chunks

Get dataset chunks.

indices

Indices for data samples

n_batches

Get number of batches based on batch size

n_chunks

Get number of chunks to divide full dataset into for smaller reads

n_samples

Number of data samples

one_count

Get number of samples with target=1

zero_count

Get number of samples with target=0

property X

alias for texts

property Y

alias for targets

property batch_chunks

Get index chunks for batching. Each chunk corresponds to a batch
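One way index chunks like these could be built is sketched below. This is an illustration only, not the modbot implementation: the helper name make_batch_chunks and its arguments are hypothetical, and it assumes that the number of batches is the sample count divided by the batch size, rounded up.

```python
import math

import numpy as np


def make_batch_chunks(n_samples, batch_size):
    """Split sample indices into one index array per batch (hypothetical helper)."""
    # Assumption: a partial final batch still counts as a batch.
    n_batches = math.ceil(n_samples / batch_size)
    indices = np.arange(n_samples)
    # array_split tolerates sizes that do not divide evenly.
    return np.array_split(indices, n_batches)


chunks = make_batch_chunks(n_samples=10, batch_size=4)
for batch_number, chunk in enumerate(chunks):
    print(batch_number, chunk)
```

Each returned array holds the row indices for one batch, so indexing the texts and targets with a chunk yields that batch.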

property chunk_size

Get chunk size for splitting data loading

property chunks

Get dataset chunks. Used to only keep part of full dataset sample in memory.

get_deterministic_batch(i)

Get batches of texts and targets from the chunk at index i

Parameters

i (int) – Index of chunk used to select slice of full dataframe

Returns

arrs – List of data batches

Return type

list
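The chunk-index lookup described above might look like the following sketch. It is not the library's code: deterministic_batch is a hypothetical stand-in, and plain NumPy arrays stand in for the text and target columns of the dataframe.

```python
import numpy as np


def deterministic_batch(texts, targets, batch_chunks, i):
    """Return [texts, targets] for the i-th index chunk (hypothetical helper)."""
    idx = batch_chunks[i]
    # The same i always selects the same rows, so batches are reproducible.
    return [texts[idx], targets[idx]]


texts = np.array(["a", "b", "c", "d", "e"])
targets = np.array([0, 1, 0, 0, 1])
# Assumption: chunks are contiguous slices of the sample indices.
batch_chunks = np.array_split(np.arange(len(texts)), 2)

batch_texts, batch_targets = deterministic_batch(texts, targets, batch_chunks, 0)
```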

get_random_batch(_)

Get batches of randomly selected texts and targets with specified weights

Returns

arrs – List of randomly selected data batches

Return type

list

property indices

Indices for data samples

property n_batches

Get number of batches based on batch size

property n_chunks

Get number of chunks to divide full dataset into for smaller reads

property n_samples

Number of data samples

on_epoch_end()

Method called at the end of every epoch.

property one_count

Get number of samples with target=1

classmethod sample_indices(Y, offensive_weight=None, sample_size=None)

Sample data indices so that the sample matches the requested offensive weight

Parameters
  • Y (pd.DataFrame) – Pandas dataframe of labels for the corresponding texts

  • offensive_weight (float) – Desired ratio of labels equal to 1 (offensive) to the total number of labels

  • sample_size (int) – Desired sample size. Corresponds to the size of the returned texts and labels

Returns

indices – Indices of the samples such that X[indices] has the requested sample size and the requested offensive_weight

Return type

np.array
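A minimal sketch of sampling indices to hit a target label ratio is shown below. It is an illustration under stated assumptions, not the modbot source. The function name sample_weighted_indices and the seed argument are hypothetical. Sampling with replacement, and falling back to the observed ratio and the full dataset size when the arguments are None, are guesses about reasonable behavior.

```python
import numpy as np


def sample_weighted_indices(y, offensive_weight=None, sample_size=None, seed=None):
    """Pick indices so that a chosen fraction of them have label 1 (hypothetical)."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y).ravel()
    ones = np.flatnonzero(y == 1)
    zeros = np.flatnonzero(y == 0)

    # Assumed defaults: keep the dataset size and the observed label ratio.
    if sample_size is None:
        sample_size = len(y)
    if offensive_weight is None:
        offensive_weight = len(ones) / len(y)

    n_ones = int(round(offensive_weight * sample_size))
    n_zeros = sample_size - n_ones

    # Assumption: sample with replacement so a rare class can be oversampled.
    picked = np.concatenate([
        rng.choice(ones, size=n_ones, replace=True),
        rng.choice(zeros, size=n_zeros, replace=True),
    ])
    rng.shuffle(picked)  # mix the two classes within the sample
    return picked


labels = np.array([0] * 90 + [1] * 10)
indices = sample_weighted_indices(labels, offensive_weight=0.5, sample_size=20, seed=0)
```

Here labels[indices] contains 20 entries, half of them equal to 1, even though only 10% of the original labels are 1.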

transform(function)

Transform texts

property zero_count

Get number of samples with target=0