modbot.preprocessing.LogCleaning
- class modbot.preprocessing.LogCleaning(config, model=None, filter_emotes=False)[source]
Bases: object
Class to handle different types of log cleaning
Methods
act_check(m) – Check if banned or deleted message
act_filt_check(user, m) – Check if deleted or banned and a valid message
append_messages(bmsgs, cmsgs) – Append messages to final texts array
ban_filt_check(user, m) – Check if banned and a valid message
clean_check(m) – Check if clean message
clean_filt_check(user, m) – Check if clean and valid message
clean_log(rawfile[, cleanfile]) – Clean log.
del_filt_check(user, m) – Check if deleted and a valid message
divide_messages(tocheck, bmsgs, dmsgs, ctmp) – Divide messages into clean, bad, and to_check.
further_review(tocheck, bmsgs, cmsgs) – Review tocheck messages and append to either cmsgs or bmsgs
is_valid_line(line) – Check if line contains relevant information
mod_check(m) – Check if mod not in ignore list
msg_check(m) – Check if message is not None and not a link
prep_log(rawfile) – Read log and do some preprocessing.
read_log(rawfile) – Read log and return lines
Remove whitelisted phrases from msg
review_messages(tocheck, bmsgs, cmsgs, ctmp, ...) – Perform initial review on temporarily clean messages
valid_check(user, m) – Check if sub or pleb and not in ignore list
- append_messages(bmsgs, cmsgs)[source]
Append messages to final texts array
- Parameters
bmsgs (list) – Messages classified as non-wholesome
cmsgs (list) – Messages classified as wholesome
- Returns
texts (list) – Final list of texts to write to file
y (list) – Corresponding list of classifications/targets
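The pairing of texts and targets described above can be illustrated with a minimal standalone sketch. It is not the library's implementation: the function name `append_messages_sketch`, the ordering (non-wholesome first), and the label convention (1 for non-wholesome, 0 for wholesome) are all assumptions made for illustration.

```python
def append_messages_sketch(bmsgs, cmsgs):
    """Combine two labelled message lists into parallel texts/targets lists."""
    texts = []
    y = []
    # Assumed label convention: 1 = non-wholesome, 0 = wholesome.
    for msg in bmsgs:
        texts.append(msg)
        y.append(1)
    for msg in cmsgs:
        texts.append(msg)
        y.append(0)
    return texts, y


texts, y = append_messages_sketch(["bad msg"], ["hello chat", "gg"])
```

Keeping `texts` and `y` as parallel lists means index `i` in one always refers to the same message in the other, which is the shape most text classifiers expect for training data.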
- clean_log(rawfile, cleanfile=None)[source]
Clean log. Do preprocessing and classify messages according to whether they were timed out or banned by a moderator. Manually check some lines if they contain certain phrases or if they were moderated by this bot.
- Parameters
rawfile (str) – Path to raw chat data
cleanfile (str) – Path to clean output file
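The classification rule described for clean_log might look roughly like the following sketch. The record layout `(message, action, moderator)`, the bot name, and the review phrases are hypothetical; the actual log format and configuration used by modbot are not specified in this reference.

```python
# Hypothetical record layout: (message, action, moderator).
# action is None for untouched messages, otherwise "ban" or "timeout".
BOT_NAME = "modbot"                    # assumed name of the moderating bot
CHECK_PHRASES = ("free", "giveaway")   # assumed phrases that trigger manual review


def route_sketch(records):
    """Split records into bad, clean, and to-check message lists."""
    bad, clean, tocheck = [], [], []
    for message, action, moderator in records:
        if action in ("ban", "timeout"):
            # Actions taken by the bot itself are re-checked by hand.
            if moderator == BOT_NAME:
                tocheck.append(message)
            else:
                bad.append(message)
        elif any(phrase in message.lower() for phrase in CHECK_PHRASES):
            tocheck.append(message)
        else:
            clean.append(message)
    return bad, clean, tocheck


records = [
    ("hello everyone", None, None),
    ("some insult", "ban", "human_mod"),
    ("borderline joke", "timeout", "modbot"),
    ("FREE skins here", None, None),
]
bad, clean, tocheck = route_sketch(records)
```

Messages moderated by a human go straight to the bad list, while anything the bot moderated or that matches a review phrase is held back for a manual decision.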
- divide_messages(tocheck, bmsgs, dmsgs, ctmp)[source]
Divide messages into clean, bad, and to_check. Includes checks on each user according to configuration parameters.
- Parameters
tocheck (list) – List of messages to manually check
bmsgs (list) – Messages from timeouts or bans
dmsgs (list) – Deleted messages
ctmp (list) – Messages for another layer of review
- Returns
tocheck (list) – List of messages to manually check
bmsgs (list) – Messages from timeouts or bans
dmsgs (list) – Deleted messages
ctmp (list) – Messages for another layer of review
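The per-user checks mentioned for divide_messages could be sketched as below. The `ChatMessage` type, the configuration keys (`include_subs`, `include_plebs`, `ignore_users`), and the helper name are hypothetical stand-ins; the real config schema is not documented on this page.

```python
from dataclasses import dataclass


@dataclass
class ChatMessage:
    user: str
    is_sub: bool
    text: str


# Hypothetical configuration values, for illustration only.
CONFIG = {
    "include_subs": True,
    "include_plebs": True,
    "ignore_users": {"nightbot"},
}


def user_is_valid(msg, config):
    """Return True if the sender passes the ignore list and sub/pleb settings."""
    if msg.user.lower() in config["ignore_users"]:
        return False
    if msg.is_sub:
        return config["include_subs"]
    return config["include_plebs"]


messages = [
    ChatMessage("alice", True, "hi chat"),
    ChatMessage("Nightbot", False, "follow the rules"),
    ChatMessage("bob", False, "gg"),
]
kept = [m.text for m in messages if user_is_valid(m, CONFIG)]
```

A filter like this would run before a message is assigned to any of the four lists, so that ignored accounts never reach the training data.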
- further_review(tocheck, bmsgs, cmsgs)[source]
Review tocheck messages and append to either cmsgs or bmsgs
- Parameters
tocheck (list) – List of messages to manually check
bmsgs (list) – Messages classified as non-wholesome
cmsgs (list) – Messages classified as wholesome
- Returns
bmsgs (list) – Messages classified as non-wholesome
cmsgs (list) – Messages classified as wholesome
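A minimal sketch of this manual review step, assuming the decision comes from a callable so the loop can be tested without a terminal. The names `further_review_sketch`, `decide`, and `ask_reviewer` are illustrative and do not appear in the library.

```python
def further_review_sketch(tocheck, bmsgs, cmsgs, decide):
    """Send each message in tocheck to bmsgs or cmsgs based on decide(msg)."""
    for msg in tocheck:
        if decide(msg):
            bmsgs.append(msg)   # reviewer marked it non-wholesome
        else:
            cmsgs.append(msg)   # reviewer marked it wholesome
    return bmsgs, cmsgs


def ask_reviewer(msg):
    """Interactive decision: returns True when the reviewer answers 'y'."""
    answer = input(f"Non-wholesome? [y/N] {msg!r}: ")
    return answer.strip().lower() == "y"


# Scripted decision for demonstration; pass ask_reviewer for an interactive session.
bmsgs, cmsgs = further_review_sketch(
    ["you are bad", "nice play"],
    [],
    ["hello"],
    decide=lambda msg: "bad" in msg,
)
```

Injecting the decision function keeps the routing logic separate from the prompt, so the same loop works for scripted and interactive review.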
- static is_valid_line(line)[source]
Check if line contains relevant information
- Parameters
line (str) – String containing line to check
- Return type
bool
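What counts as "relevant information" is not defined on this page. As one possible reading, the sketch below assumes raw Twitch IRC lines and treats a line as relevant when it carries a chat message (`PRIVMSG`), a ban or timeout (`CLEARCHAT`), or a message deletion (`CLEARMSG`). The log format actually consumed by modbot may differ.

```python
# Assumed markers: Twitch IRC commands for chat messages, bans/timeouts, and deletions.
RELEVANT_COMMANDS = ("PRIVMSG", "CLEARCHAT", "CLEARMSG")


def is_valid_line_sketch(line):
    """Return True if the line contains one of the assumed relevant commands."""
    return any(f" {cmd} " in line for cmd in RELEVANT_COMMANDS)


chat_line = ":alice!alice@alice.tmi.twitch.tv PRIVMSG #channel :hello"
ban_line = ":tmi.twitch.tv CLEARCHAT #channel :bob"
ping_line = "PING :tmi.twitch.tv"

results = [is_valid_line_sketch(x) for x in (chat_line, ban_line, ping_line)]
```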
- classmethod prep_log(rawfile)[source]
Read log and do some preprocessing. Remove usernames and only return valid lines
- Parameters
rawfile (str) – Path to raw chat data
- Returns
List of valid lines from log with usernames removed
- Return type
list
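The two preprocessing steps named here, keeping only valid lines and removing usernames, can be sketched on an in-memory list of lines. How the library removes usernames is not documented on this page; this sketch assumes it means dropping `@mention` tokens, and it takes the validity test as a parameter rather than guessing at it.

```python
import re

# Assumption: "usernames" are @mentions inside the message text.
MENTION = re.compile(r"@\w+")


def prep_lines_sketch(lines, is_valid):
    """Keep lines accepted by is_valid and strip @mentions from them."""
    prepared = []
    for line in lines:
        if not is_valid(line):
            continue
        # Drop @mentions, then collapse the leftover whitespace.
        cleaned = MENTION.sub("", line)
        prepared.append(" ".join(cleaned.split()))
    return prepared


lines = ["@alice  nice shot", "", "gg @bob wp"]
prepared = prep_lines_sketch(lines, is_valid=lambda line: bool(line.strip()))
```

Removing usernames before training keeps a classifier from learning who is usually moderated instead of what was said.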
- classmethod read_log(rawfile)[source]
Read log and return lines
- Parameters
rawfile (str) – Path to raw chat file
- Returns
List of lines from log
- Return type
list
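A minimal sketch of reading a log into a list of lines. The UTF-8 encoding, the lenient error handling, and the stripping of trailing newlines are assumptions; the page does not say whether the returned lines keep their line endings.

```python
import os
import tempfile


def read_log_sketch(rawfile):
    """Return the lines of rawfile as a list, without trailing newlines."""
    # errors="replace" avoids a crash on the odd malformed byte in chat logs.
    with open(rawfile, encoding="utf-8", errors="replace") as f:
        return [line.rstrip("\n") for line in f]


# Demonstration on a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "chat.log")
    with open(path, "w", encoding="utf-8") as f:
        f.write("first line\nsecond line\n")
    lines = read_log_sketch(path)
```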
- review_messages(tocheck, bmsgs, cmsgs, ctmp, probs)[source]
Perform initial review on temporarily clean messages
- Parameters
tocheck (list) – List of messages to manually check
bmsgs (list) – Messages classified as non-wholesome
cmsgs (list) – Messages classified as wholesome
ctmp (list) – Messages for another layer of review
probs (list) – List of probabilities for the messages in ctmp
- Returns
tocheck (list) – List of messages to manually check
bmsgs (list) – Messages classified as non-wholesome
cmsgs (list) – Messages classified as wholesome
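One way a probability-based initial review could work is sketched below. It assumes `probs[i]` is a model's probability that `ctmp[i]` is non-wholesome and uses two hypothetical thresholds, `low` and `high`; the actual decision rule and any threshold values used by the library are not given on this page.

```python
def review_messages_sketch(tocheck, bmsgs, cmsgs, ctmp, probs, low=0.3, high=0.7):
    """Route each temporarily clean message by its predicted probability."""
    # Assumption: probs[i] is the probability that ctmp[i] is non-wholesome.
    for msg, p in zip(ctmp, probs):
        if p >= high:
            bmsgs.append(msg)     # confidently non-wholesome
        elif p <= low:
            cmsgs.append(msg)     # confidently wholesome
        else:
            tocheck.append(msg)   # uncertain: leave for manual review
    return tocheck, bmsgs, cmsgs


tocheck, bmsgs, cmsgs = review_messages_sketch(
    [], [], [], ["a", "b", "c"], [0.9, 0.1, 0.5]
)
```

Only the uncertain middle band reaches a human reviewer, which keeps the manual workload small when the model is confident about most messages.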