modbot.training.data_handling.WeightedGenerator
- class modbot.training.data_handling.WeightedGenerator(df, **kwargs)
Bases: DataGenerator
Generator class for training over batches
Methods
get_deterministic_batch(i) – Get batches of randomly selected texts and targets
get_random_batch(_) – Get batches of randomly selected texts and targets with specified weights
on_epoch_end() – Method called at the end of every epoch.
sample_indices(Y[, offensive_weight, ...]) – Sample data with weights as per offensive weight
transform(function) – Transform texts
Attributes
X – alias for texts
Y – alias for targets
batch_chunks – Get index chunks for batching.
chunk_size – Get chunk size for splitting data loading
chunks – Get dataset chunks.
indices – Indices for data samples
n_batches – Get number of batches based on batch size
n_chunks – Get number of chunks to divide full dataset into for smaller reads
n_samples – Number of data samples
one_count – Get number of samples with target=1
zero_count – Get number of samples with target=0
- property X
alias for texts
- property Y
alias for targets
- property batch_chunks
Get index chunks for batching. Each chunk corresponds to a batch
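A minimal sketch of how batch index chunks could be built, assuming the sample indices are split in order into consecutive runs of `batch_size` (a hypothetical parameter name here) with a shorter final chunk. `make_batch_chunks` is illustrative only and is not part of modbot.

```python
import numpy as np

def make_batch_chunks(indices, batch_size):
    """Split an index array into consecutive chunks of at most batch_size entries.

    Hypothetical helper: the last chunk is kept even if it is shorter.
    """
    indices = np.asarray(indices)
    # Split positions every batch_size entries, e.g. [4, 8] for 10 items and batch_size=4
    splits = np.arange(batch_size, len(indices), batch_size)
    return np.split(indices, splits)

chunks = make_batch_chunks(np.arange(10), 4)
print([c.tolist() for c in chunks])  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Each chunk then maps to exactly one batch, which is what lets a batch be looked up by its chunk index.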
- property chunk_size
Get chunk size for splitting data loading
- property chunks
Get dataset chunks. Used to keep only part of the full dataset sample in memory.
- get_deterministic_batch(i)
Get batches of randomly selected texts and targets
- Parameters
i (int) – Index of the chunk used to select a slice of the full dataframe
- Returns
arrs – List of data batches
- Return type
list
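A sketch of a deterministic, index-based lookup, assuming `batch_chunks` is a list of positional index arrays and the texts and targets are aligned pandas Series. `deterministic_batch` and its arguments are hypothetical and do not reproduce modbot's implementation.

```python
import numpy as np
import pandas as pd

def deterministic_batch(texts, targets, batch_chunks, i):
    """Return [texts, targets] arrays for the i-th precomputed index chunk.

    Hypothetical sketch: texts and targets are aligned pandas Series,
    batch_chunks is a list of positional index arrays.
    """
    idx = batch_chunks[i]
    # The same i always yields the same rows until batch_chunks is rebuilt
    return [texts.iloc[idx].to_numpy(), targets.iloc[idx].to_numpy()]

texts = pd.Series(["a", "b", "c", "d", "e", "f"])
targets = pd.Series([0, 1, 0, 1, 1, 0])
chunks = [np.array([0, 1, 2]), np.array([3, 4, 5])]
X, Y = deterministic_batch(texts, targets, chunks, 1)
print(X.tolist(), Y.tolist())  # ['d', 'e', 'f'] [1, 1, 0]
```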
- get_random_batch(_)
Get batches of randomly selected texts and targets with specified weights
- Returns
arrs – List of randomly selected data batches
- Return type
list
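One way to draw a batch "with specified weights" is probability-weighted sampling: every row with target=1 shares a total probability mass `offensive_weight`, and every row with target=0 shares the rest. This is a hedged sketch of that general technique, not modbot's actual weighting scheme; `random_weighted_batch` is hypothetical, and it assumes both labels are present.

```python
import numpy as np

def random_weighted_batch(texts, targets, batch_size, offensive_weight, rng=None):
    """Draw one batch in which label-1 rows make up about offensive_weight of draws.

    Hypothetical sketch; assumes targets contains both 0 and 1.
    """
    rng = np.random.default_rng() if rng is None else rng
    targets = np.asarray(targets)
    ones = targets == 1
    n_one, n_zero = ones.sum(), (~ones).sum()
    # Spread each class's probability mass evenly over that class's rows
    p = np.where(ones, offensive_weight / n_one, (1 - offensive_weight) / n_zero)
    # Sampling with replacement so a small class can still fill its share
    idx = rng.choice(len(targets), size=batch_size, replace=True, p=p)
    return [np.asarray(texts)[idx], targets[idx]]
```

With 90 zeros and 10 ones, `offensive_weight=0.5` makes label-1 rows appear in roughly half of the drawn samples, even though they are only 10% of the data.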
- property indices
Indices for data samples
- property n_batches
Get number of batches based on batch size
- property n_chunks
Get number of chunks to divide full dataset into for smaller reads
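Assuming a final partial batch or chunk is kept rather than dropped, both `n_batches` and `n_chunks` reduce to integer ceiling division. Whether modbot rounds up or down is not stated here, so treat `n_parts` as a hypothetical illustration.

```python
def n_parts(n_items: int, part_size: int) -> int:
    """Number of parts of size part_size needed to cover n_items (last may be short)."""
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    # Integer ceiling division without floating point
    return -(-n_items // part_size)

print(n_parts(10, 4))  # 3 batches: 4 + 4 + 2
```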
- property n_samples
Number of data samples
- on_epoch_end()
Method called at the end of every epoch.
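In Keras-style data generators this hook is commonly used to reshuffle the sample order so that batches differ between epochs. The class below is a hypothetical stand-in showing that pattern; what modbot's `on_epoch_end` actually does is not documented here.

```python
import numpy as np

class ShufflingIndexer:
    """Hypothetical stand-in illustrating a typical on_epoch_end."""

    def __init__(self, n_samples, seed=None):
        self.indices = np.arange(n_samples)
        self._rng = np.random.default_rng(seed)

    def on_epoch_end(self):
        # New permutation of the same indices each epoch
        self._rng.shuffle(self.indices)
```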
- property one_count
Get number of samples with target=1
- classmethod sample_indices(Y, offensive_weight=None, sample_size=None)
Sample data so that the fraction of labels equal to 1 matches offensive_weight
- Parameters
Y (pd.DataFrame) – Pandas dataframe of labels for the corresponding texts
offensive_weight (float) – Desired ratio of labels equal to 1 to the total number of labels
sample_size (int) – Desired sample size. Corresponds to the size of the returned texts and labels
- Returns
indices – Indices of the samples such that X[indices] has the requested sample size and the requested offensive_weight
- Return type
np.ndarray
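A minimal sketch of ratio-controlled sampling, assuming `Y` holds a single 0/1 label column. It draws exactly `round(offensive_weight * sample_size)` label-1 rows and fills the rest with label-0 rows, sampling with replacement only when a class is too small. The handling of `None` defaults (keep the observed ratio, use all rows) is an assumption, and this is not modbot's implementation.

```python
import numpy as np
import pandas as pd

def sample_indices(Y, offensive_weight=None, sample_size=None, rng=None):
    """Return positional indices whose label-1 fraction is about offensive_weight.

    Hypothetical sketch: None for offensive_weight keeps the observed ratio,
    None for sample_size uses the number of rows in Y.
    """
    rng = np.random.default_rng() if rng is None else rng
    labels = np.asarray(Y).ravel()  # assumes a single label column
    ones = np.flatnonzero(labels == 1)
    zeros = np.flatnonzero(labels == 0)
    if sample_size is None:
        sample_size = len(labels)
    if offensive_weight is None:
        offensive_weight = len(ones) / len(labels)
    n_one = int(round(offensive_weight * sample_size))
    n_zero = sample_size - n_one
    # Oversample (with replacement) only when a class has fewer rows than requested
    pick_one = rng.choice(ones, size=n_one, replace=n_one > len(ones))
    pick_zero = rng.choice(zeros, size=n_zero, replace=n_zero > len(zeros))
    idx = np.concatenate([pick_one, pick_zero])
    rng.shuffle(idx)  # mix the two classes
    return idx
```

Because the sketch returns positions, select rows with `X.iloc[idx]` on a pandas object rather than label-based indexing.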
- property zero_count
Get number of samples with target=0
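`one_count` and `zero_count` together give the label balance that `sample_indices` adjusts. A hypothetical helper computing both from a 0/1 label array might look like this:

```python
import numpy as np

def label_counts(targets):
    """Return (one_count, zero_count) for a 1-D array-like of 0/1 labels."""
    labels = np.asarray(targets).ravel()
    return int(np.sum(labels == 1)), int(np.sum(labels == 0))

ones, zeros = label_counts([0, 1, 1, 0, 0])
print(ones, zeros, ones / (ones + zeros))  # 2 3 0.4
```

The last value is the dataset's observed label-1 fraction, the quantity `offensive_weight` sets for a resampled dataset.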