modbot.preprocessing.LogCleaning

class modbot.preprocessing.LogCleaning(config, model=None, filter_emotes=False)[source]

Bases: object

Class to handle different types of log cleaning

Methods

act_check(m)

Check if banned or deleted message

act_filt_check(user, m)

Check if deleted or banned and a valid message

append_messages(bmsgs, cmsgs)

Append messages to final texts array

ban_filt_check(user, m)

Check if banned and a valid message

clean_check(m)

Check if clean message

clean_filt_check(user, m)

Check if clean and valid message

clean_log(rawfile[, cleanfile])

Clean log.

del_filt_check(user, m)

Check if deleted and a valid message

divide_messages(tocheck, bmsgs, dmsgs, ctmp)

Divide messages into clean, bad, and to_check.

further_review(tocheck, bmsgs, cmsgs)

Review tocheck messages and append to either cmsgs or bmsgs

is_valid_line(line)

Check if line contains relevant information

mod_check(m)

Check if mod is not in ignore list

msg_check(m)

Check if message is not None and not a link

prep_log(rawfile)

Read log and do some preprocessing.

read_log(rawfile)

Read log and return lines

remove_whitelist(m)

Remove whitelisted phrases from msg

review_messages(tocheck, bmsgs, cmsgs, ctmp, ...)

Perform initial review on temporarily clean messages

valid_check(user, m)

Check if sub or pleb and not in ignore list

act_check(m)[source]

Check if banned or deleted message

act_filt_check(user, m)[source]

Check if deleted or banned and a valid message

append_messages(bmsgs, cmsgs)[source]

Append messages to final texts array

Parameters
  • bmsgs (list) – Messages classified as non-wholesome

  • cmsgs (list) – Messages classified as wholesome

Returns

  • texts (list) – Final list of texts to write to file

  • y (list) – Corresponding list of classifications/targets
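
The pairing of texts and targets described above can be sketched as follows. This is a minimal illustration, not the modbot implementation: the function name `append_messages_sketch` is hypothetical, and the label convention (1 for non-wholesome, 0 for wholesome) is an assumption that the source does not state.

```python
def append_messages_sketch(bmsgs, cmsgs):
    """Combine flagged and clean messages into parallel text/label lists."""
    texts = []
    y = []
    # Assumed label convention: 1 = non-wholesome, 0 = wholesome.
    for msg in bmsgs:
        texts.append(msg)
        y.append(1)
    for msg in cmsgs:
        texts.append(msg)
        y.append(0)
    return texts, y


# Usage: one flagged message followed by two clean ones.
texts, y = append_messages_sketch(["bad msg"], ["hello", "gg"])
print(texts)
print(y)
```

Keeping `texts` and `y` index-aligned means the i-th target always describes the i-th text, which is what most training pipelines expect.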

ban_filt_check(user, m)[source]

Check if banned and a valid message

clean_check(m)[source]

Check if clean message

clean_filt_check(user, m)[source]

Check if clean and valid message

clean_log(rawfile, cleanfile=None)[source]

Clean log. Preprocess the raw log and classify messages according to whether they were timed out or banned by a moderator. Lines that contain certain phrases, or that were moderated by this bot, are set aside for manual checking.

Parameters
  • rawfile (str) – Path to raw chat data

  • cleanfile (str) – Path to clean output file

del_filt_check(user, m)[source]

Check if deleted and a valid message

divide_messages(tocheck, bmsgs, dmsgs, ctmp)[source]

Divide messages into clean, bad, and to_check. Includes checks on each user according to configuration parameters.

Parameters
  • tocheck (list) – List of messages to manually check

  • bmsgs (list) – Messages from timeouts or bans

  • dmsgs (list) – Deleted messages

  • ctmp (list) – Messages for another layer of review

Returns

  • tocheck (list) – List of messages to manually check

  • bmsgs (list) – Messages from timeouts or bans

  • dmsgs (list) – Deleted messages

  • ctmp (list) – Messages for another layer of review

further_review(tocheck, bmsgs, cmsgs)[source]

Review tocheck messages and append to either cmsgs or bmsgs

Parameters
  • tocheck (list) – List of messages to manually check

  • bmsgs (list) – Messages classified as non-wholesome

  • cmsgs (list) – Messages classified as wholesome

Returns

  • bmsgs (list) – Messages classified as non-wholesome

  • cmsgs (list) – Messages classified as wholesome

static is_valid_line(line)[source]

Check if line contains relevant information

Parameters

line (str) – String containing line to check

Return type

bool

mod_check(m)[source]

Check if mod is not in ignore list

msg_check(m)[source]

Check if message is not None and not a link
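
A check of this kind might look like the sketch below. The name `msg_check_sketch` and the `LINK_RE` pattern are hypothetical. The source does not say how a link is detected, so this version assumes that any message containing a URL-like token counts as a link.

```python
import re

# Hypothetical link pattern (an assumption): matches http(s):// or www. prefixes.
LINK_RE = re.compile(r"(https?://|www\.)\S+", re.IGNORECASE)


def msg_check_sketch(m):
    """Return True if m is a non-None message that contains no link."""
    if m is None:
        return False
    return LINK_RE.search(m) is None


print(msg_check_sketch("hello chat"))
print(msg_check_sketch("see https://example.com"))
print(msg_check_sketch(None))
```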

classmethod prep_log(rawfile)[source]

Read log and do some preprocessing: remove usernames and return only valid lines.

Parameters

rawfile (str) – Path to raw chat data

Returns

List of valid lines from log with usernames removed

Return type

list

classmethod read_log(rawfile)[source]

Read log and return lines

Parameters

rawfile (str) – Path to raw chat file

Returns

List of lines from log

Return type

list
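
For orientation, reading a log into a list of lines can be sketched as below. The function name `read_log_sketch`, the UTF-8 encoding, and the stripping of trailing newlines are all assumptions; the actual method may keep line endings or use a different encoding.

```python
import os
import tempfile


def read_log_sketch(rawfile):
    """Read a text file and return its lines without trailing newlines."""
    # Assumption: the log is UTF-8 encoded plain text.
    with open(rawfile, "r", encoding="utf-8") as f:
        return f.read().splitlines()


# Usage: write a small temporary log, then read it back.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "chat.log")
    with open(path, "w", encoding="utf-8") as f:
        f.write("first line\nsecond line\n")
    lines = read_log_sketch(path)

print(lines)
```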

remove_whitelist(m)[source]

Remove whitelisted phrases from msg
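
One way to strip whitelisted phrases is sketched below. The names `remove_whitelist_sketch` and `WHITELIST` and the example phrases are hypothetical. Case-insensitive matching and whitespace collapsing are assumptions, since the source does not describe how matching is done or where the whitelist is configured.

```python
import re

# Hypothetical whitelist; the real phrases presumably come from the configuration.
WHITELIST = ["killing it", "destroyed that boss"]


def remove_whitelist_sketch(m, whitelist=WHITELIST):
    """Remove each whitelisted phrase from m, ignoring case."""
    for phrase in whitelist:
        # re.escape keeps any punctuation in the phrase literal.
        m = re.sub(re.escape(phrase), "", m, flags=re.IGNORECASE)
    # Collapse the whitespace left behind by removed phrases.
    return " ".join(m.split())


print(remove_whitelist_sketch("you are Killing It today"))
```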

review_messages(tocheck, bmsgs, cmsgs, ctmp, probs)[source]

Perform initial review on temporarily clean messages

Parameters
  • tocheck (list) – List of messages to manually check

  • bmsgs (list) – Messages classified as non-wholesome

  • cmsgs (list) – Messages classified as wholesome

  • ctmp (list) – Messages for another layer of review

  • probs (list) – List of probabilities for the messages in ctmp

Returns

  • tocheck (list) – List of messages to manually check

  • bmsgs (list) – Messages classified as non-wholesome

  • cmsgs (list) – Messages classified as wholesome
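
The sketch below shows one plausible shape for this kind of probability-based review. It is not the modbot logic. The function name `review_messages_sketch`, the `threshold` parameter, and the interpretation of `probs` as the probability that a message is non-wholesome are all assumptions. In this sketch, messages from `ctmp` above the threshold are routed to `tocheck`, the rest go to `cmsgs`, and `bmsgs` passes through unchanged.

```python
def review_messages_sketch(tocheck, bmsgs, cmsgs, ctmp, probs, threshold=0.5):
    """Route temporarily clean messages to manual review or to the clean list."""
    for msg, prob in zip(ctmp, probs):
        if prob > threshold:
            # Assumed: a high score means the message deserves a manual look.
            tocheck.append(msg)
        else:
            cmsgs.append(msg)
    return tocheck, bmsgs, cmsgs


# Usage: two temporarily clean messages with hypothetical scores.
tocheck, bmsgs, cmsgs = review_messages_sketch(
    [], ["banned msg"], [], ["hi all", "borderline msg"], [0.1, 0.9]
)
print(tocheck)
print(cmsgs)
```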

valid_check(user, m)[source]

Check if sub or pleb and not in ignore list