Training Command

Update the model with new data.

usage: python -m modbot.training [-h] [-infile INFILE] [-clean] [-train]
                                 [-append] [-running_check]
                                 [-review_decisions] [-model_type MODEL_TYPE]
                                 [-model_path MODEL_PATH]
                                 [-offensive_weight OFFENSIVE_WEIGHT]
                                 [-n_batches N_BATCHES] [-epochs EPOCHS]
                                 [-batch_size BATCH_SIZE]
                                 [-chunk_size CHUNK_SIZE]
                                 [-sample_size SAMPLE_SIZE]
                                 [-test_split TEST_SPLIT]
                                 [-eval_steps EVAL_STEPS] [-continue_training]
                                 [-just_evaluate] [-log_dir LOG_DIR]
                                 [-data_dir DATA_DIR]
                                 [-bert_preprocess BERT_PREPROCESS]
                                 [-bert_encoder BERT_ENCODER]
                                 [-chatty_dir CHATTY_DIR] [-channel CHANNEL]
                                 [-nickname NICKNAME] [-config CONFIG]
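
As a rough illustration of how the flags above combine, the following sketch assembles an invocation from Python. Only the flag names come from the usage text; the helper `build_training_command`, the file name `data/chat.csv`, and the chosen values are hypothetical.

```python
import sys


def build_training_command(infile, clean=False, train=False, **options):
    """Build an argv list for `python -m modbot.training`.

    Hypothetical helper: only the flag names are taken from the usage text.
    """
    cmd = [sys.executable, "-m", "modbot.training", "-infile", str(infile)]
    # Boolean switches take no value.
    if clean:
        cmd.append("-clean")
    if train:
        cmd.append("-train")
    # Valued options such as epochs=5 become "-epochs 5".
    for name, value in options.items():
        cmd += [f"-{name}", str(value)]
    return cmd


# Placeholder input file and hyperparameters, chosen only for illustration.
cmd = build_training_command("data/chat.csv", train=True, epochs=5, batch_size=64)
print(" ".join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` to start the run.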

Named Arguments

-infile

Input file for training and/or classification.

-clean

Process infile for future training.

Default: False

-train

Vectorize text and train model.

Default: False

-append

Append from input file to existing classification dataset.

Default: False

-running_check

Use the model to recheck messages that may have been missed, namely those that meet the lower probability threshold (CHECK_PMIN) defined in the environment variables.

Default: False
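
The threshold check described for -running_check might look roughly like the sketch below. It assumes messages already carry a model probability; the function `messages_to_recheck`, the `(text, probability)` pair format, and the 0.5 fallback are hypothetical, and only the variable name CHECK_PMIN comes from the option text.

```python
import os


def messages_to_recheck(scored_messages, pmin=None):
    """Return the texts whose probability is at least pmin.

    scored_messages: iterable of (text, probability) pairs.
    """
    if pmin is None:
        # CHECK_PMIN is named in the option text; the 0.5 fallback is an assumption.
        pmin = float(os.environ.get("CHECK_PMIN", "0.5"))
    return [text for text, prob in scored_messages if prob >= pmin]


# Made-up scores for illustration only.
scored = [("hello there", 0.05), ("borderline remark", 0.62), ("clear insult", 0.97)]
flagged = messages_to_recheck(scored, pmin=0.6)
```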

-review_decisions

Reclassify all decisions made by the bot.

Default: False

-model_type

Model type to train.

-model_path

Path to the model.

-offensive_weight

Desired ratio of offensive samples (label 1) to the total number of samples.
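
One way a target ratio like this could be reached is by downsampling the majority class. The sketch below is an assumption about the mechanism, not modbot's actual implementation; `downsample_to_ratio` and its arguments are hypothetical. With `p` positive samples and target ratio `w`, it keeps `p * (1 - w) / w` negatives so that `p / total` is approximately `w`.

```python
import random


def downsample_to_ratio(samples, labels, offensive_weight, seed=0):
    """Keep every label-1 sample and drop label-0 samples until
    ones / total is approximately offensive_weight."""
    if not 0 < offensive_weight <= 1:
        raise ValueError("offensive_weight must be in (0, 1]")
    ones = [i for i, y in enumerate(labels) if y == 1]
    zeros = [i for i, y in enumerate(labels) if y == 0]
    # Number of negatives needed for the requested ratio, capped by availability.
    n_zeros = round(len(ones) * (1 - offensive_weight) / offensive_weight)
    n_zeros = min(n_zeros, len(zeros))
    rng = random.Random(seed)
    keep = sorted(ones + rng.sample(zeros, n_zeros))
    return [samples[i] for i in keep], [labels[i] for i in keep]


# Toy dataset: 10 offensive and 90 non-offensive samples.
labels = [1] * 10 + [0] * 90
samples = list(range(100))
xs, ys = downsample_to_ratio(samples, labels, offensive_weight=0.25)
```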

-n_batches

Number of training batches per epoch.

-epochs

Number of training epochs.

-batch_size

Number of samples per batch.

-chunk_size

Number of samples to transform at one time.
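
Transforming a fixed number of samples at a time keeps memory use bounded. A minimal sketch of the idea follows; `transform_in_chunks` and the stand-in `lowercase` transform are hypothetical and do not reflect modbot's real vectorizer.

```python
def transform_in_chunks(texts, transform, chunk_size):
    """Apply `transform` to `texts` in slices of at most `chunk_size` items."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    results = []
    for start in range(0, len(texts), chunk_size):
        chunk = texts[start:start + chunk_size]
        # Each call sees at most chunk_size items.
        results.extend(transform(chunk))
    return results


def lowercase(chunk):
    """Stand-in transform used only for this example."""
    return [text.lower() for text in chunk]


out = transform_in_chunks(["Hi", "GG", "LOL", "Nice", "WP"], lowercase, chunk_size=2)
```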

-sample_size

Number of total samples to use for training.

-test_split

Fraction of the full dataset held out for validation.
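
A holdout split of this kind can be sketched as follows. The helper `split_dataset`, the shuffling, and the fixed seed are assumptions for illustration; the option text only says that a fraction of the dataset is used for validation.

```python
import random


def split_dataset(samples, test_split, seed=0):
    """Shuffle `samples` and hold out a `test_split` fraction for validation."""
    if not 0 <= test_split < 1:
        raise ValueError("test_split must be in [0, 1)")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_val = round(len(shuffled) * test_split)
    # The first n_val shuffled items form the validation set.
    return shuffled[n_val:], shuffled[:n_val]


train, val = split_dataset(range(1000), test_split=0.1)
```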

-eval_steps

Number of training steps between model evaluations.
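
The batch-related options (-n_batches, -epochs, -batch_size, -eval_steps) combine by simple arithmetic. The sketch below assumes that one training step corresponds to one batch, which the option text does not state; `training_schedule` is a hypothetical helper.

```python
def training_schedule(n_batches, epochs, batch_size, eval_steps):
    """Summarize a run, assuming one training step equals one batch."""
    total_steps = n_batches * epochs
    return {
        "samples_per_epoch": n_batches * batch_size,
        "total_steps": total_steps,
        "evaluations": total_steps // eval_steps,
    }


# Illustrative values only.
plan = training_schedule(n_batches=100, epochs=5, batch_size=32, eval_steps=50)
```

Under that assumption, these values give 3200 samples per epoch, 500 steps in total, and 10 evaluations.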

-continue_training

Whether to continue training from a saved model.

Default: False

-just_evaluate

Whether to just evaluate and skip training.

Default: False

-log_dir

Directory in which to save logs.

Default: “/home/runner/work/modbot/modbot/data/logs”

-data_dir

Parent data directory (the default log directory sits beneath it).

Default: “/home/runner/work/modbot/modbot/data”

-bert_preprocess

Path to the BERT preprocessing model.

Default: “https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3”

-bert_encoder

Path to the BERT encoder.

Default: “https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-2_H-128_A-2/2”

-chatty_dir

Path to chatty logs. Used only when updating, appending, or rerunning a log with source=chatty.

-channel

Channel to moderate.

-nickname

Name of modbot.

-config, -c

Configuration file.