modbot.preprocessing

Preprocessing methods

Functions

check_msgs(line, checks)

Check line for specified phrases
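A minimal sketch of the kind of check `check_msgs` performs. It assumes `checks` is a list of plain phrases and that matching is case-insensitive substring search. This is a hypothetical illustration, not modbot's actual implementation.

```python
from typing import Iterable


def check_msgs(line: str, checks: Iterable[str]) -> bool:
    """Hypothetical sketch: True if any phrase in `checks` occurs in `line`.

    Assumption: matching is a case-insensitive substring test.
    """
    lowered = line.lower()
    return any(phrase.lower() in lowered for phrase in checks)


print(check_msgs("You are SO annoying", ["so annoying", "go away"]))
```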

check_phrases(infile, outfile)

Check phrases for reclassification

check_probs(config, infile, outfile, bounds, wc)

Check messages for reclassification if they have a non-wholesome probability above the minimum threshold

contains_link(text)

Check whether message contains a link
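One way such a link check can be written is with a regular expression. The pattern below is an assumption chosen for illustration: it treats `http(s)://` URLs, `www.` prefixes and bare domains with a few common TLDs as links. modbot's real rule may differ.

```python
import re

# Assumption: what counts as a "link" is defined by this illustrative pattern.
LINK_RE = re.compile(
    r"(https?://\S+|www\.\S+|\b[\w-]+\.(?:com|net|org|tv|gg|io)\b)",
    re.IGNORECASE,
)


def contains_link(text: str) -> bool:
    """Hypothetical sketch: True if `text` matches the link pattern."""
    return LINK_RE.search(text) is not None
```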

correct_messages(infile, outfile)

Sanitize messages for easier classification

correct_msg(line)

Sanitize a single message
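To make "sanitize" concrete, here is a sketch of typical normalization steps for chat text. The specific steps (lowercasing, shortening character floods, collapsing whitespace) are assumptions for illustration and are not taken from the modbot source.

```python
import re


def correct_msg(line: str) -> str:
    """Hypothetical sketch of message sanitization (assumed steps)."""
    text = line.lower()
    # Shorten runs of 3+ identical characters to two ("sooooo" -> "soo").
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)
    # Collapse any whitespace run to a single space and trim the ends.
    return re.sub(r"\s+", " ", text).strip()


print(correct_msg("  HELLOOOO   there!!!! "))  # helloo there!!
```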

filter_all_emotes(texts, y, proc_config)

Filter emotes from all texts

filter_emotes(line, proc_config)

Filter emotes from a single line
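The two emote filters can be sketched together. This assumes `proc_config` is a dict whose hypothetical `"emotes"` key lists emote names, and guesses that `filter_all_emotes` takes `y` so that labels stay aligned when an emote-only message is dropped. Both the key and the dropping behavior are assumptions.

```python
def filter_emotes(line: str, proc_config: dict) -> str:
    """Hypothetical sketch: drop whole words that are known emotes."""
    # Assumption: proc_config["emotes"] holds the emote names to remove.
    emotes = set(proc_config.get("emotes", ()))
    return " ".join(word for word in line.split() if word not in emotes)


def filter_all_emotes(texts, y, proc_config):
    """Hypothetical sketch: filter every text and keep labels aligned."""
    out_texts, out_y = [], []
    for text, label in zip(texts, y):
        cleaned = filter_emotes(text, proc_config)
        if cleaned:  # Assumption: messages left empty are dropped.
            out_texts.append(cleaned)
            out_y.append(label)
    return out_texts, out_y
```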

filter_log(infile, outfile, proc_config)

Filter the log in infile using proc_config and write the result to outfile

get_info_from_chatty(line)

Populate info dictionary with info from IRC line
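A sketch of parsing a log line into an info dictionary. It assumes a line layout of `[HH:MM:SS] <user> message` and uses the hypothetical keys `time`, `user` and `msg`. Real Chatty logs may carry extra prefixes or a different layout.

```python
import re

# Assumption: "[time] <user> message"; real log lines may differ.
CHATTY_RE = re.compile(r"^\[(?P<time>[^\]]+)\]\s+<(?P<user>[^>]+)>\s(?P<msg>.*)$")


def get_info_from_chatty(line: str) -> dict:
    """Hypothetical sketch: return parsed fields, or {} if no match."""
    info = {}
    match = CHATTY_RE.match(line.strip())
    if match:
        info.update(match.groupdict())
    return info
```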

join_words(lines)

Join words back together after splitting
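A sketch of the inverse of splitting, assuming `lines` is a list of token lists and words are rejoined with single spaces (an assumption about the separator).

```python
def join_words(lines):
    """Hypothetical sketch: turn each list of words back into one string."""
    return [" ".join(words) for words in lines]


split = [msg.split() for msg in ["good game", "gg wp"]]
print(join_words(split))  # ['good game', 'gg wp']
```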

my_lemmatizer(words)

Lemmatize words

my_tokenizer(s)

Tokenize string
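A simple regex tokenizer illustrates the idea. The token definition (lowercased runs of letters, digits and apostrophes) is an assumption. The real `my_tokenizer` may rely on a library tokenizer instead.

```python
import re

# Assumption: a token is a run of lowercase letters, digits or apostrophes.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def my_tokenizer(s: str) -> list:
    """Hypothetical sketch: lowercase `s` and extract word tokens."""
    return TOKEN_RE.findall(s.lower())


print(my_tokenizer("Don't SPAM, ok?"))  # ["don't", 'spam', 'ok']
```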

preproc_words(texts)

Tokenize texts

read_data(dfile)

Read csv file and create DataFrame

remove_stopwords(words)

Remove stop words from word list
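Stop-word removal is a set-membership filter. The tiny `STOP_WORDS` set below is a placeholder assumption. A real pipeline would typically use a curated list.

```python
# Assumption: small illustrative stop-word set, not modbot's actual list.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "and", "of"}


def remove_stopwords(words):
    """Hypothetical sketch: keep words not in STOP_WORDS (case-insensitive)."""
    return [w for w in words if w.lower() not in STOP_WORDS]
```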

segment_words(line)

Segment message into words

separate_links(infile, outfile)

Separate messages with and without links

separate_tocheck(config, infile, bounds, wc)

Separate to_check from all other messages

write_data(outfile, texts, y)

Write data to outfile
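A sketch of writing texts and labels as CSV with the standard library. The column names `text` and `label` and the header row are assumptions. The file layout `read_data` actually expects is not stated here.

```python
import csv


def write_data(outfile, texts, y):
    """Hypothetical sketch: one CSV row per message with its label."""
    # Assumption: header "text,label"; texts and y have equal length.
    with open(outfile, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["text", "label"])
        writer.writerows(zip(texts, y))
```

The `csv` writer quotes fields containing commas, so messages like `"bad, words"` survive a round trip.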

Classes

LogCleaning(config[, model, filter_emotes])

Class to handle different types of log cleaning

MsgMemory([config])

Class to store messages and message info
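A sketch of a message store that keeps each user's most recent messages. The `max_msgs` config key, the `add_msg`/`recent` method names and the per-user bounded history are all hypothetical choices for illustration, not the real `MsgMemory` interface.

```python
from collections import defaultdict, deque


class MsgMemory:
    """Hypothetical sketch: bounded per-user message history."""

    def __init__(self, config=None):
        # Assumption: config may define "max_msgs"; default to 10.
        max_msgs = (config or {}).get("max_msgs", 10)
        # Each user gets a deque that discards its oldest entry when full.
        self.messages = defaultdict(lambda: deque(maxlen=max_msgs))

    def add_msg(self, user, text, info=None):
        self.messages[user].append({"text": text, "info": info or {}})

    def recent(self, user):
        return [m["text"] for m in self.messages[user]]
```

Using `deque(maxlen=...)` keeps memory bounded per user without manual trimming.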