### editor is a language-agnostic linter
# the overall idea is to address general cognitive density in a largely
# language-agnostic way, although obviously thresholds will need to be
# configurable per language (statically typed languages typically end up
# with longer words, languages with braces tend to have more indentation).
## levels & items
# the levels are: [directory] -> [file] -> [block] -> [line] -> [token] -> [word]
# a [file] splits into [blocks] at the top level of indentation (many languages
# will specify only one top-level module per file)
# a [block] splits into [lines] obviously, but may also contain other blocks
# at a deeper indentation level
# a [line] splits into [names] and special characters
# a [name] splits into [words] by case semantics (CamelCase, kebab-case, snake_case) -- see the split_name sketch below
# [level]: an organizational unit that defines the code's structure
# [item]: an instance of a level
# [item name]: the name of this item, for example a directory or file name. like the
#   [names] pulled out of [lines], it will be split into [words] for name-level analysis
# [child item]: an item "embedded" in this one, either at the same level or the next
# level "down"
## level-agnostic metrics
# although largely level-agnostic, these might not apply to every level -- e.g. a line
# doesn't really have a [name], so it doesn't make sense to check name length.
# there is likely both a maximum and a minimum threshold for each metric -- say, max 10, min 2
# [obfuscation count]: the number of non-dictionary words in a name (customizable to problem domain)
# [special character density]: special character to alphanumeric character ratio
# [child count]: the number of child items
# - e.g. the number of lines per block
# [child complexity]: the number of child items with bad metrics
# - e.g. the number of sub-directories with only one file
# - apply a [same level factor] if the child item is at the same hierarchy level
# [child depth]: the deepest nesting of child items at the same hierarchy level
# - maybe covered by child complexity?
# - e.g. the max directory depth under this directory, or blocks nested within other blocks
# [child names]: the collection of all names used in child items
# [prefix repetition]: the number of times a prefix of words appears in [child names]
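# a rough sketch of the per-item metrics bundle plus two of the simpler
# calculations -- field names mirror the bracketed terms in these notes, but the
# exact shape is an assumption, not a settled interface:
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class ItemMetrics:
    obfuscation_count: int = 0
    helper_word_count: int = 0
    special_char_density: float = 0.0
    child_count: int = 0
    child_complexity: float = 0.0
    child_depth: int = 0
    prefix_repetition: int = 0
    overall: float = 0.0
    child_names: Counter = field(default_factory=Counter)

def special_character_density(text: str) -> float:
    """[special character density]: special character to alphanumeric character ratio"""
    specials = sum(1 for c in text if not c.isalnum() and not c.isspace())
    alphanumerics = sum(1 for c in text if c.isalnum())
    return specials / alphanumerics if alphanumerics else float(specials)

def count_prefix_repetition(child_names: Counter, prefix_len: int = 2) -> int:
    """[prefix repetition]: how many [child names] share a leading prefix of [words]"""
    prefixes = Counter(tuple(split_name(name)[:prefix_len]) for name in child_names)
    return sum(count for count in prefixes.values() if count > 1)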
### the editor
# starting at the root directory of the project, work down through its child items
# and collect complexity metrics on the way up
# a mini-hypothesis: [child complexity] is the best thing to try to address
## overall
# if there is an [item name]
# run the name-level editor
# add to [child names]
# for each child:
# run editor for that level
# returns [child complexity], [child names], [child depth]
# if child is at the same hierarchy level:
# multiply [child complexity] by [same level factor]
# overwrite [child depth] if it's the greatest so far
# collect into [child complexity] (max or sum? :thinking:)
# collect into [child names]
# add 1 to [child count]
# add 1 to [child depth]
# [prefix repetition] for [child names]
# calculate an overall complexity (weighted sum?)
# return metrics for parent
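# a rough sketch of the walk described above: recurse top-down, aggregate on the
# way back up. SAME_LEVEL_FACTOR, the "bad metrics" thresholds, and the weights in
# the overall score are placeholders for the open questions in these notes;
# run_line_editor and run_name_editor are sketched in the sections below:
from typing import List, Optional

SAME_LEVEL_FACTOR = 1.5  # assumption: same-level nesting (e.g. block in block) costs extra

@dataclass
class Item:
    level: str                      # "directory" | "file" | "block" | "line"
    name: Optional[str] = None      # [item name], if this level has one
    text: str = ""                  # raw text, for line-level items
    children: List["Item"] = field(default_factory=list)

def has_bad_metrics(metrics: ItemMetrics) -> bool:
    # assumption: placeholder thresholds -- these would be configurable per language
    return (metrics.obfuscation_count > 2
            or metrics.special_char_density > 0.5
            or metrics.child_count > 10)

def run_editor(item: Item) -> ItemMetrics:
    if item.level == "line":
        return run_line_editor(item.text)
    metrics = ItemMetrics()
    if item.name is not None:
        name_metrics = run_name_editor(item.name)        # run the name-level editor
        metrics.obfuscation_count += name_metrics.obfuscation_count
        metrics.helper_word_count += name_metrics.helper_word_count
        metrics.child_names[item.name] += 1              # add to [child names]
    for child in item.children:
        child_metrics = run_editor(child)                # run editor for that level
        badness = 1.0 if has_bad_metrics(child_metrics) else 0.0
        if child.level == item.level:                    # same hierarchy level
            badness *= SAME_LEVEL_FACTOR
        metrics.child_complexity += badness + child_metrics.child_complexity  # sum for now
        metrics.child_names.update(child_metrics.child_names)
        metrics.child_count += 1
        metrics.child_depth = max(metrics.child_depth, child_metrics.child_depth + 1)
    metrics.prefix_repetition = count_prefix_repetition(metrics.child_names)
    # overall complexity as a placeholder weighted sum
    metrics.overall = (metrics.child_complexity
                       + 0.5 * metrics.prefix_repetition
                       + 0.5 * metrics.obfuscation_count)
    return metrics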
## line-level editor
# adds [special character density] -- the ratio of special characters to alphanumeric characters on the line
# the source for most [child names] (~~~ maybe [child words] with usage count? ~~~)
# - filters out language keywords & special characters
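# a rough sketch of the line-level pass -- the keyword set is a tiny placeholder
# (per-language config in practice) and the name regex is deliberately naive:
KEYWORDS = {"def", "class", "return", "if", "else", "for", "while", "import", "from"}

def run_line_editor(line: str) -> ItemMetrics:
    metrics = ItemMetrics()
    metrics.special_char_density = special_character_density(line)
    # pull the [names] out of the line, filtering out language keywords
    names = [token for token in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", line)
             if token not in KEYWORDS]
    metrics.child_names.update(names)
    return metrics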
## name-level editor
# adds [obfuscation count] -- the number of words in this name that are not dictionary words
# (specific to language and problem domain).
# adds [helper word count] -- the number of "helper" type words that don't add context on their own
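# a rough sketch of the name-level pass -- DICTIONARY and HELPER_WORDS are tiny
# stand-in word lists; in practice they would be configurable per language and
# problem domain:
DICTIONARY = {"user", "name", "count", "file", "line", "block", "editor", "parse"}
HELPER_WORDS = {"util", "utils", "helper", "manager", "data", "info", "impl", "misc"}

def run_name_editor(name: str) -> ItemMetrics:
    metrics = ItemMetrics()
    words = split_name(name)
    metrics.obfuscation_count = sum(1 for word in words if word not in DICTIONARY)
    metrics.helper_word_count = sum(1 for word in words if word in HELPER_WORDS)
    return metrics

# run_name_editor("UserDataUtil") -> obfuscation_count 2, helper_word_count 2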
### to thonk
## if we're thinking of getting language-specific
# - type-hinting? :squint: recommend at a certain size of codebase, or to address other recommendations?
# - ignore library code, of course
# - method call tracing
# - max call stack depth (non-built-in, non-library)
# - daisy-chain methods vs controller method (a->b->c->d vs a->b,c,d)
# - max scroll diff (how far are you going up and down, how often do you have to switch directions)
# - max file changes in a stack trace
# - "empty" files (e.g just a class definition)
# - avoid use of `not` in complex expressions
# - avoid mixing `and` and `or` and `not`
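# a rough sketch of the last two checks -- detecting `not` mixed into `and`/`or`
# expressions via the ast module; mixes_bool_ops and its exact rule are
# assumptions about how such a check might look, not a settled design:
import ast

def mixes_bool_ops(expr: str) -> bool:
    """flag expressions that mix `and`/`or`, or that fold `not` into a boolean chain"""
    tree = ast.parse(expr, mode="eval")
    bool_ops = {type(node.op).__name__ for node in ast.walk(tree)
                if isinstance(node, ast.BoolOp)}
    has_not = any(isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not)
                  for node in ast.walk(tree))
    return len(bool_ops) > 1 or (has_not and bool(bool_ops))

# mixes_bool_ops("a and b")          -> False
# mixes_bool_ops("a and b or not c") -> True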