modbot.training.models.ModerationModel
- class modbot.training.models.ModerationModel(texts=None, model=None, **kwargs)
 Bases: ABC
 Base moderation model class
Methods
clean_text(text) – Clean single text so it is UTF-8 compliant
clean_texts(X) – Clean texts so they are UTF-8 compliant
continue_training(data_file, config) – Continue training
detailed_score([test_gen, n_matches, out_dir]) – Score model and print confusion matrix and several other metrics
Get info about class numbers
get_data_generators(data_file, **kwargs) – Get data generators with correct sizes
load(inpath, **kwargs) – Load model from path
load_data(data_file) – Load data from CSV file
model_test() – Test model on some key phrases
predict(X[, verbose]) – Predict classification
predict_one(X[, verbose]) – Predict probability of label=1
predict_proba(X[, verbose]) – Predict probability
predict_zero(X[, verbose]) – Predict probability of label=0
run(data_file, config) – Run model pipeline
save(outpath) – Save model
save_params(outpath, kwargs) – Save params to model path
score(X, Y) – Score model against targets
split_data(df[, test_split]) – Split data into training and test sets
train(train_gen, test_gen, **kwargs) – Train model
transform(X) – Transform texts
- static clean_text(text)
 Clean a single text string so that it is UTF-8 compliant
- Parameters
 text (str) – Text string to clean
- Returns
 Cleaned text string
- Return type
 str
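The source does not say how the cleaning is performed. A minimal sketch, assuming "UTF-8 compliant" means dropping code points that cannot be encoded as UTF-8 (such as lone surrogates); this illustrates the idea and is not the library's implementation:

```python
def clean_text(text: str) -> str:
    """Hypothetical sketch of UTF-8 cleaning, not the library's implementation.

    Assumes "UTF-8 compliant" means removing code points that cannot be
    encoded as UTF-8, such as lone surrogates.
    """
    # errors="ignore" silently drops any code point the codec cannot encode;
    # decoding the resulting bytes always succeeds because they are valid UTF-8.
    return text.encode("utf-8", errors="ignore").decode("utf-8")


cleaned = clean_text("hello \ud800world")  # "\ud800" is a lone surrogate
print(cleaned)  # hello world
```

Dropping characters is only one possible policy; replacing them (errors="replace") would keep string lengths closer to the input at the cost of inserting placeholder characters.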
- classmethod clean_texts(X)
 Clean all texts in a dataframe so they are UTF-8 compliant
- Parameters
 X (pd.DataFrame) – Pandas dataframe of texts
- Returns
 X – Pandas dataframe of cleaned texts
- Return type
 pd.DataFrame
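A minimal sketch of the DataFrame variant, assuming each string cell is cleaned by dropping code points that cannot be encoded as UTF-8, and that non-string cells (numbers, NaN) are left untouched. Both the cleaning rule and the pass-through behavior are assumptions, not documented behavior:

```python
import pandas as pd


def clean_text(text):
    # Assumed rule: drop code points that cannot be encoded as UTF-8.
    return text.encode("utf-8", errors="ignore").decode("utf-8")


def clean_texts(X):
    """Hypothetical sketch: clean every string cell of a DataFrame."""
    # Non-string cells (numbers, NaN) pass through unchanged.
    return X.apply(lambda col: col.map(lambda v: clean_text(v) if isinstance(v, str) else v))


cleaned_df = clean_texts(pd.DataFrame({"text": ["ok", "bad\ud800text"], "label": [0, 1]}))
print(cleaned_df["text"].tolist())  # ['ok', 'badtext']
```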
- classmethod continue_training(data_file, config)
 Continue training an existing model: load the model, load the data, tokenize the texts, and train
- Parameters
 data_file (str) – Path to csv file storing texts and labels
config (RunConfig) – Config class with kwargs
- Returns
 Trained and evaluated Sequential model
- Return type
 keras.Sequential
- detailed_score(test_gen=None, n_matches=10, out_dir=None)
 Score the model and print the confusion matrix along with several other metrics
- Parameters
 test_gen (WeightedGenerator) – Generator for test data
n_matches (int) – Number of positive matches to print
out_dir (str | None) – Path to save scores
- Returns
 df_scores – A dataframe containing all model scores
- Return type
 pd.DataFrame
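The source does not list which metrics detailed_score reports. As an illustration only, the sketch below computes a binary confusion matrix and the standard accuracy, precision, recall, and F1 scores from 0/1 labels. The helper name binary_metrics is hypothetical, and it returns a plain dict rather than the documented pd.DataFrame:

```python
from collections import Counter


def binary_metrics(y_true, y_pred):
    """Hypothetical helper: confusion matrix and common metrics for 0/1 labels."""
    counts = Counter(zip(y_true, y_pred))  # (true, predicted) -> count
    tn, fp = counts[(0, 0)], counts[(0, 1)]
    fn, tp = counts[(1, 0)], counts[(1, 1)]
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        # Rows are true labels 0/1, columns are predicted labels 0/1.
        "confusion_matrix": [[tn, fp], [fn, tp]],
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


report = binary_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
print(report["confusion_matrix"])  # [[1, 1], [1, 2]]
```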
- classmethod get_data_generators(data_file, **kwargs)
 Get training and test data generators with the correct sizes
- Parameters
 data_file (str) – Path to csv file storing texts and labels
kwargs (dict) – Dictionary with optional keyword parameters. Can include sample_size, batch_size, epochs, n_batches.
- Returns
 train_gen (WeightedGenerator) – WeightedGenerator instance used for training batches
test_gen (WeightedGenerator) – WeightedGenerator instance used for evaluation batches
- classmethod load_data(data_file)
 Load data from a CSV file
- Parameters
 data_file (str) – Path to csv file storing texts and labels
- Returns
 df – Pandas dataframe of texts and labels
- Return type
 pd.DataFrame
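A minimal sketch of CSV loading with pandas. The column names text and label, and dropping rows with a missing value, are assumptions; the source does not document the file's schema:

```python
import io

import pandas as pd


def load_data(data_file):
    """Hypothetical sketch: read a CSV of texts and labels into a DataFrame."""
    df = pd.read_csv(data_file)
    # Assumed schema: a "text" column and a "label" column.
    # Dropping incomplete rows is an illustrative choice, not documented behavior.
    return df.dropna(subset=["text", "label"]).reset_index(drop=True)


# An in-memory file stands in for a path on disk; read_csv accepts either.
csv_file = io.StringIO("text,label\nhello there,0\n,1\nyou are awful,1\n")
df = load_data(csv_file)
print(df)
```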
- model_test()
 Test the model on a set of key phrases
- Parameters
 model (ModerationModel) – Model to test
- predict(X, verbose=False)
 Predict the class label of each input text
- Parameters
 X (ndarray | list | pd.DataFrame) – Set of texts to classify
verbose (bool) – Whether to show a progress bar during prediction
- Returns
 List of predicted class labels for the input texts
- Return type
 list
- predict_one(X, verbose=False)
 Predict the probability of label=1
- Parameters
 X (ndarray | list | pd.DataFrame) – Set of texts to classify
verbose (bool) – Whether to show a progress bar during prediction
- Returns
 List of predicted probabilities of label=1 for the input texts
- Return type
 list
- predict_zero(X, verbose=False)
 Predict the probability of label=0
- Parameters
 X (ndarray | list | pd.DataFrame) – Set of texts to classify
verbose (bool) – Whether to show a progress bar during prediction
- Returns
 List of predicted probabilities of label=0 for the input texts
- Return type
 list
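The three prediction methods are closely related. Assuming predict_proba returns one row of class probabilities per text (column 0 for label 0, column 1 for label 1), predict_zero and predict_one would each select a column, and predict would take the most probable class. The sketch below shows that relationship on a hypothetical probability array; these helpers take probabilities rather than raw texts and are not the library's code:

```python
def predict_zero(proba):
    """Probability of label 0 per text (assumed to be column 0)."""
    return [row[0] for row in proba]


def predict_one(proba):
    """Probability of label 1 per text (assumed to be column 1)."""
    return [row[1] for row in proba]


def predict(proba):
    """Predicted label per text: the index of the most probable class."""
    return [max(range(len(row)), key=row.__getitem__) for row in proba]


# Hypothetical predict_proba output for three texts, one [p0, p1] row per text.
proba = [[0.9, 0.1], [0.3, 0.7], [0.55, 0.45]]
labels = predict(proba)
print(labels)  # [0, 1, 0]
```

For a two-class softmax output, each row sums to 1, so predict_zero and predict_one are complements of each other.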
- classmethod run(data_file, config)
 Run the model pipeline: load data, tokenize texts, and train
- Parameters
 data_file (str) – Path to csv file storing texts and labels
config (RunConfig) – Config class with kwargs
- Returns
 Trained and evaluated Keras model or SVM
- Return type
 
- static save_params(outpath, kwargs)
 Save the parameters used to build the model to the model path
- Parameters
 outpath (str) – Path to model
kwargs (dict) – Dictionary of kwargs used to build model
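A minimal sketch of persisting build parameters, assuming outpath is a directory and the kwargs are JSON-serializable. The file name params.json is hypothetical; the source does not say how or where the parameters are written:

```python
import json
import os
import tempfile


def save_params(outpath, kwargs):
    """Hypothetical sketch: store the model-building kwargs as JSON in outpath."""
    os.makedirs(outpath, exist_ok=True)
    # "params.json" is an assumed file name; the source does not specify one.
    with open(os.path.join(outpath, "params.json"), "w", encoding="utf-8") as f:
        json.dump(kwargs, f, indent=2, sort_keys=True)


with tempfile.TemporaryDirectory() as tmp:
    save_params(tmp, {"batch_size": 64, "epochs": 3})
    with open(os.path.join(tmp, "params.json"), encoding="utf-8") as f:
        loaded = json.load(f)
print(loaded)  # {'batch_size': 64, 'epochs': 3}
```

Storing the kwargs next to the model weights lets a later load reconstruct the model with the same configuration.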
- score(X, Y)
 Score the model against target labels
- Parameters
 X (pd.DataFrame) – Pandas dataframe of texts
Y (pd.DataFrame) – Pandas dataframe of labels for the corresponding texts
- Returns
 Accuracy, calculated as the fraction of predictions that match the ground-truth labels
- Return type
 float
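The accuracy described above is the number of correct predictions divided by the number of samples. A stdlib sketch, assuming predictions and targets are aligned sequences of labels; treating score(X, Y) as roughly accuracy(model.predict(X), Y) is an assumption about how the two methods interact:

```python
def accuracy(y_pred, y_true):
    """Fraction of predictions equal to the ground-truth labels."""
    if len(y_pred) != len(y_true):
        raise ValueError("predictions and labels must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == t for p, t in zip(y_pred, y_true))
    return correct / len(y_true)


acc = accuracy([1, 0, 1, 1], [1, 0, 0, 1])
print(acc)  # 0.75
```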
- classmethod split_data(df, test_split=0.1)
 Split data into training and test sets
- Parameters
 df (pd.DataFrame) – Pandas dataframe of texts and labels
test_split (float) – Fraction of full dataset to use for test data
- Returns
 df_train (pd.DataFrame) – Pandas dataframe of texts and labels for training
df_test (pd.DataFrame) – Pandas dataframe of texts and labels for testing
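A minimal sketch of a shuffled split with pandas, assuming test_split is the held-out fraction and the test size is rounded to the nearest row. The random_state argument is a hypothetical addition for reproducibility and is not part of the documented signature:

```python
import pandas as pd


def split_data(df, test_split=0.1, random_state=0):
    """Hypothetical sketch: shuffle rows, then hold out a test_split fraction.

    random_state is an added parameter (not in the documented signature)
    that makes the shuffle reproducible.
    """
    shuffled = df.sample(frac=1.0, random_state=random_state).reset_index(drop=True)
    n_test = round(len(shuffled) * test_split)
    df_test = shuffled.iloc[:n_test]
    df_train = shuffled.iloc[n_test:]
    return df_train, df_test


df = pd.DataFrame({"text": [f"text {i}" for i in range(20)], "label": [i % 2 for i in range(20)]})
df_train, df_test = split_data(df, test_split=0.1)
print(len(df_train), len(df_test))  # 18 2
```

Shuffling before slicing avoids a test set that only contains the last rows of the file, which matters when the CSV is sorted by label or source.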