### editor is a language-agnostic linter
# the overall idea is to address general cognitive density in a largely
# language-agnostic way, although obviously thresholds will need to be
# configurable per language (statically typed languages typically end up
# with longer words, languages with braces tend to have more indentation).
## levels & items
# the levels are: [directory] -> [file] -> [block] -> [line] -> [token] -> [word]
# a [file] splits into [blocks] at the top level of indentation (many languages
# will specify only one top-level module per file)
# a [block] splits into [lines] obviously, but may also contain other blocks
# at a deeper indentation level
# a [line] splits into [names] and special characters
# a [name] splits into [words] by case semantics (CamelCase, kebab-case, snake_case) -- see the split_name sketch below
# [level]: an organizational unit that defines the code's structure
# [item]: an instance of a level
# [item name]: the name of this item, for example a directory or file name. like the
#   [names] pulled out of [lines], it will be split into [words] for name-level analysis
# [child item]: an item "embedded" in this one, either at the same level or the next
# level "down"
## level-agnostic metrics
# although largely level-agnostic, these might not apply to every level -- e.g. a line
# doesn't really have a [name], so it doesn't make sense to check name length.
# there is likely both a maximum and a minimum threshold for each metric -- say, max 10, min 2
# [obfuscation count]: the number of non-dictionary words in a name (customizable to problem domain)
# [special character density]: special character to alphanumeric character ratio
# [child count]: the number of child items
# - e.g. the number of lines per block
# [child complexity]: the number of child items with bad metrics
# - e.g. the number of sub-directories with only one file
# - apply a [same level factor] if the child item is at the same hierarchy level
# [child depth]: the deepest nesting of child items at the same hierarchy level
# - maybe covered by child complexity?
# - e.g. the max directory depth under this directory, or blocks nested within other blocks
# [child names]: the collection of all names used in child items
# [prefix repetition]: the number of times a prefix of words appears in [child names]
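# a rough sketch of the per-item metrics bundle plus two of the simpler
# calculations -- field names mirror the bracketed terms in these notes, but the
# exact shape is an assumption, not a settled interface:
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class ItemMetrics:
    obfuscation_count: int = 0
    helper_word_count: int = 0
    special_char_density: float = 0.0
    child_count: int = 0
    child_complexity: float = 0.0
    child_depth: int = 0
    prefix_repetition: int = 0
    overall: float = 0.0
    child_names: Counter = field(default_factory=Counter)

def special_character_density(text: str) -> float:
    """[special character density]: special character to alphanumeric character ratio"""
    specials = sum(1 for c in text if not c.isalnum() and not c.isspace())
    alphanumerics = sum(1 for c in text if c.isalnum())
    return specials / alphanumerics if alphanumerics else float(specials)

def count_prefix_repetition(child_names: Counter, prefix_len: int = 2) -> int:
    """[prefix repetition]: how many [child names] share a leading prefix of [words]"""
    prefixes = Counter(tuple(split_name(name)[:prefix_len]) for name in child_names)
    return sum(count for count in prefixes.values() if count > 1)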
### the editor
# starting at the root directory of the project, work down through its child items
# and collect complexity metrics on the way up
# a mini-hypothesis: [child complexity] is the best thing to try to address
## overall
# if there is an [item name]
# run the name-level editor
# add to [child names]
# for each child:
# run editor for that level
# returns [child complexity], [child names], [child depth]
# if child is at the same hierarchy level:
# multiply [child complexity] by [same level factor]
# overwrite [child depth] if it's the greatest so far
# collect into [child complexity] (max or sum? :thinking:)
# collect into [child names]
# add 1 to [child count]
# add 1 to [child depth]
# [prefix repetition] for [child names]
# calculate an overall complexity (weighted sum?)
# return metrics for parent
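# a rough sketch of the walk described above: recurse top-down, aggregate on the
# way back up. SAME_LEVEL_FACTOR, the "bad metrics" thresholds, and the weights in
# the overall score are placeholders for the open questions in these notes;
# run_line_editor and run_name_editor are sketched in the sections below:
from typing import List, Optional

SAME_LEVEL_FACTOR = 1.5  # assumption: same-level nesting (e.g. block in block) costs extra

@dataclass
class Item:
    level: str                      # "directory" | "file" | "block" | "line"
    name: Optional[str] = None      # [item name], if this level has one
    text: str = ""                  # raw text, for line-level items
    children: List["Item"] = field(default_factory=list)

def has_bad_metrics(metrics: ItemMetrics) -> bool:
    # assumption: placeholder thresholds -- these would be configurable per language
    return (metrics.obfuscation_count > 2
            or metrics.special_char_density > 0.5
            or metrics.child_count > 10)

def run_editor(item: Item) -> ItemMetrics:
    if item.level == "line":
        return run_line_editor(item.text)
    metrics = ItemMetrics()
    if item.name is not None:
        name_metrics = run_name_editor(item.name)        # run the name-level editor
        metrics.obfuscation_count += name_metrics.obfuscation_count
        metrics.helper_word_count += name_metrics.helper_word_count
        metrics.child_names[item.name] += 1              # add to [child names]
    for child in item.children:
        child_metrics = run_editor(child)                # run editor for that level
        badness = 1.0 if has_bad_metrics(child_metrics) else 0.0
        if child.level == item.level:                    # same hierarchy level
            badness *= SAME_LEVEL_FACTOR
        metrics.child_complexity += badness + child_metrics.child_complexity  # sum for now
        metrics.child_names.update(child_metrics.child_names)
        metrics.child_count += 1
        metrics.child_depth = max(metrics.child_depth, child_metrics.child_depth + 1)
    metrics.prefix_repetition = count_prefix_repetition(metrics.child_names)
    # overall complexity as a placeholder weighted sum
    metrics.overall = (metrics.child_complexity
                       + 0.5 * metrics.prefix_repetition
                       + 0.5 * metrics.obfuscation_count)
    return metrics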
## line-level editor
# adds [special character density] -- the ratio of special characters to alphanumeric characters on the line
# the source for most [child names] (~~~ maybe [child words] with usage count? ~~~)
# - filters out language keywords & special characters
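# a rough sketch of the line-level pass -- the keyword set is a tiny placeholder
# (per-language config in practice) and the name regex is deliberately naive:
KEYWORDS = {"def", "class", "return", "if", "else", "for", "while", "import", "from"}

def run_line_editor(line: str) -> ItemMetrics:
    metrics = ItemMetrics()
    metrics.special_char_density = special_character_density(line)
    # pull the [names] out of the line, filtering out language keywords
    names = [token for token in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", line)
             if token not in KEYWORDS]
    metrics.child_names.update(names)
    return metrics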
## name-level editor
# adds [obfuscation count] -- the number of words in this name that are not dictionary words
# (specific to language and problem domain).
# adds [helper word count] -- the number of "helper" type words that don't add context on their own
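# a rough sketch of the name-level pass -- DICTIONARY and HELPER_WORDS are tiny
# stand-in word lists; in practice they would be configurable per language and
# problem domain:
DICTIONARY = {"user", "name", "count", "file", "line", "block", "editor", "parse"}
HELPER_WORDS = {"util", "utils", "helper", "manager", "data", "info", "impl", "misc"}

def run_name_editor(name: str) -> ItemMetrics:
    metrics = ItemMetrics()
    words = split_name(name)
    metrics.obfuscation_count = sum(1 for word in words if word not in DICTIONARY)
    metrics.helper_word_count = sum(1 for word in words if word in HELPER_WORDS)
    return metrics

# run_name_editor("UserDataUtil") -> obfuscation_count 2, helper_word_count 2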
### to thonk
## if we're thinking of getting language-specific
# - type-hinting? :squint: recommend at a certain size of codebase, or to address other recommendations?
# - ignore library code, of course
# - method call tracing
# - max call stack depth (non-built-in, non-library)
# - daisy-chain methods vs controller method (a->b->c->d vs a->b,c,d)
# - max scroll diff (how far are you going up and down, how often do you have to switch directions)
# - max file changes in a stack trace
# - "empty" files (e.g just a class definition)
# - avoid use of `not` in complex expressions
# - avoid mixing `and` and `or` and `not`
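# a rough sketch of the last two checks -- detecting `not` mixed into `and`/`or`
# expressions via the ast module; mixes_bool_ops and its exact rule are
# assumptions about how such a check might look, not a settled design:
import ast

def mixes_bool_ops(expr: str) -> bool:
    """flag expressions that mix `and`/`or`, or that fold `not` into a boolean chain"""
    tree = ast.parse(expr, mode="eval")
    bool_ops = {type(node.op).__name__ for node in ast.walk(tree)
                if isinstance(node, ast.BoolOp)}
    has_not = any(isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not)
                  for node in ast.walk(tree))
    return len(bool_ops) > 1 or (has_not and bool(bool_ops))

# mixes_bool_ops("a and b")          -> False
# mixes_bool_ops("a and b or not c") -> True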