| 1 | +# [Problem 726: Number of Atoms](https://leetcode.com/problems/number-of-atoms/description/?envType=daily-question) |
| 2 | + |
| 3 | +## Initial thoughts (stream-of-consciousness) |
| 4 | +- We are not dealing with super long formulas, so we can afford to be a little inefficient (if needed) |
| 5 | +- I think we should start by "tokenizing" the formula by splitting it into: |
| 6 | + - Elements (capital letter followed by 0+ lowercase letters) |
| 7 | + - Numbers (consecutive sequences of digits; convert these to integers) |
| 8 | + - Parentheses |
| 9 | +- Tokenizing will allow us to do bookkeeping more easily |
| 10 | +- I also think it'd be worth doing a first pass to identify the positions of all the parentheses. In $O(n)$ time we could start a counter at 0 and then move through the string character by character. Each time we hit a "(" we increment the counter and add the position to a hash table (key: counter value? position?; value: could either be the position of the matching closing parenthesis, or a list where the first element is the position of the open parenthesis and the second element is the position of the matching close parenthesis). If we hit a ")" we decrement the counter and update the hash table accordingly. This will let us easily do recursion later: |
| 11 | + - When we're doing the main processing, if we hit "(" we can get from the hash table the entire contents (up to its matching ")"), run our helper counting function on that sub-string, and then add it to our running total. (The running total, btw, should also be stored in a hash table. Aside: I'm not sure if dicts can be added, or how that works if they don't have exactly the same keys; need to figure this out...) |
| 12 | + - Once we've finished processing the content inside the parentheses, we can skip ahead to after the parentheses |
| 13 | + - This will save a lot of time, because we won't have to keep scanning forward (potentially recursively) to match up parentheses |
| 14 | + - Note: the "helper" function (i.e., the function called recursively) will need to have an "offset" parameter (default: 0) to enable us to avoid needing to re-compute the parenthesis matching each time we enter a new recursion depth. E.g. something like `close_pos = parens[i + offset] - offset`. And then if we encounter nested parentheses, we'd need to pass in `offset = i + offset` to the recursion call. |
| 15 | +- Then I think the basic approach is straightforward: |
| 16 | + - Tokenize the string |
| 17 | + - Create a hash table for the parentheses pairings |
| 18 | + - Start a hash table with the atom counts: |
| 19 | + - This could either be created during the tokenization process (e.g., whenever an element is found, add a key for that element and initialize its count to 0), or we could just initialize the hash table to an empty dict and add new elements as needed if they haven't already been accounted for. |
| 20 | + - Set `current` to `{}` (used to process digits) |
| 21 | + - Then go through each token one by one: |
| 22 | + - If we encounter an element (`x`): |
| 23 | + - Add `current` to the running totals |
| 24 | + - update `current` to `{x: 1}` |
| 25 | + - If we encounter a number (`i`): |
| 26 | + - Multiply every value in `current` by `i` |
| 27 | + - Add `current` to the running totals |
| 28 | + - Reset `current` to `{}` |
| 29 | + - If we encounter a parenthesis: |
| 30 | + - `current = helper(<get contents of parens>, offset=i + offset)` |
| 31 | + - At the end of the helper function, add `current` to the total and then return the total counts (a dict) |
| 32 | + - Finally, put the output in the right format: |
| 33 | + - Let's say that `counts` is the element-wise counts |
| 34 | + - `counts = sorted([[key, val] for key, val in counts.items()], key=lambda x: x[0])` |
| 35 | + - `return ''.join([f'{x[0]}{x[1]}' if x[1] > 1 else x[0] for x in counts])` |
| 36 | + |
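The parenthesis-matching pass described above could be sketched roughly as follows. This version swaps the depth counter for a stack of open positions, which is one common way to pair parentheses in a single $O(n)$ pass. The name `match_parens` and the choice to key the table by the position of the "(" are assumptions made for illustration; the notes above leave both open.

```python
def match_parens(formula: str) -> dict:
    """Map the index of each "(" to the index of its matching ")".

    Hypothetical helper: assumes the formula's parentheses are balanced.
    """
    pairs = {}
    open_positions = []  # stack of indices of "(" that are not yet closed
    for i, c in enumerate(formula):
        if c == "(":
            open_positions.append(i)
        elif c == ")":
            # The most recent unmatched "(" pairs with this ")"
            pairs[open_positions.pop()] = i
    return pairs


print(match_parens("K4(ON(SO3)2)2"))
```

With a table like this, hitting a "(" at position `i` gives the end of the group as `pairs[i]` directly, so the main pass never has to scan forward to find it.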
| 37 | +## Refining the problem, round 2 thoughts |
| 38 | +- Some helper functions are needed: |
| 39 | + - Tokenize the formula-- take in the formula and return a list of tokens |
| 40 | + - This might have some tricky parts to it |
| 41 | + - What I'm imagining is that we initialize `t` (current token) to an empty string and then go through character by character (current character: `c`): |
| 42 | + - If `c in "()"`: |
| 43 | + - append `c` to the current list of parsed tokens |
| 44 | + - set `t = ''` |
| 45 | + - If `c` is a capital letter: |
| 46 | + - if `len(t) > 0`: |
| 47 | + - if `t[0]` is a digit: |
| 48 | + - `t = int(t)` |
| 49 | + - append `t` to the current list of parsed tokens |
| 50 | + - otherwise, if `t[0]` is a lowercase or capital letter, append `t` to the current list of parsed tokens |
| 51 | + - reset `t` to `''` |
| 52 | + - set `t = c` |
| 53 | + - If `c` is a lowercase letter, `t += c` |
| 54 | + - If `c` is a digit: |
| 55 | + - If `len(t) > 0`: |
| 56 | + - If `t[-1]` is also a digit, `t += c` |
| 57 | + - Else: |
| 58 | + - Append `t` to the current list of parsed tokens |
| 59 | + - Set `t = c` |
| 60 | + - Otherwise `t = c` |
| 61 | + - At the end, make sure to add `t` to the list of tokens if it's not empty. (If it's a digit, convert to an `int` first.) |
| 62 | + - Then just return the list of parsed tokens |
| 63 | + - Parenthesis matching function |
| 64 | + - A potentially tricky case could arise, whereby the "depth" for several parenthesis pairs is the same. E.g., for the formula "X(XX)XXX(XX)XXXX..." both parenthesis pairs have the same depth. I think a hash table is still the "right" way to handle parenthesis matching, but instead of using 2-element lists of ints, maybe we should instead use lists of 2-element lists. Then as we use each new pair, we'll just dequeue it from the front of that entry in the hash table so that we don't need to continually match up the current position with all of the entries. |
| 65 | + - Add two dicts, potentially with mismatched keys-- take in two count dicts and return a single "merged" count dict |
| 66 | + - Multiply a dict by a constant-- take in a count dict and an integer and return a new count dict with updated values |
| 67 | + - Main helper function-- take in a list of tokens and an offset (default: 0) and return a count dict |
| 68 | +- I might be missing an edge case...but if not, nothing here is too crazy. There are just a bunch of pieces to this problem (more than the usual short solutions). |
| 69 | + |
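For the two dict helpers listed above, one option is to lean on `collections.Counter` from the standard library, whose `+` operator sums counts key by key and tolerates mismatched keys. A minimal sketch, where `add_counts` and `scale_counts` are hypothetical names:

```python
from collections import Counter


def add_counts(a: dict, b: dict) -> dict:
    """Merge two count dicts; a key missing from one side counts as 0."""
    # Note: Counter's + drops non-positive totals, which is fine here
    # because atom counts are always at least 1.
    return dict(Counter(a) + Counter(b))


def scale_counts(counts: dict, k: int) -> dict:
    """Return a new count dict with every value multiplied by k."""
    return {atom: n * k for atom, n in counts.items()}


print(add_counts({"H": 2, "O": 1}, {"O": 1, "Mg": 1}))
print(scale_counts({"O": 1, "H": 1}, 2))
```

Both helpers return new dicts rather than mutating their inputs, which keeps the recursive calls from stepping on each other's running totals.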
| 70 | +## Attempted solution(s) |
```python
class Solution:
    def countOfAtoms(self, formula: str) -> str:
        def tokenize(formula):
            """Split the formula into element names (str), counts (int), and parentheses."""
            digits = '0123456789'
            lowercase = 'abcdefghijklmnopqrstuvwxyz'
            uppercase = lowercase.upper()
            parentheses = '()'

            tokens = []
            t = ''  # token being built: an element name or a run of digits

            def flush():
                # Emit the pending token (if any), converting digit runs to int
                if len(t) > 0:
                    tokens.append(int(t) if t[0] in digits else t)

            for c in formula:
                if c in parentheses:
                    flush()  # emit any pending element or number before the parenthesis
                    tokens.append(c)
                    t = ''
                elif c in uppercase:  # a capital letter always starts a new element
                    flush()
                    t = c
                elif c in lowercase:
                    t += c
                else:  # c is a digit
                    if len(t) > 0 and t[-1] not in digits:
                        flush()  # an element name just ended
                        t = ''
                    t += c

            flush()  # emit whatever is left at the end
            return tokens
```
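One way the remaining pieces might fit together is sketched below. It is not the solution these notes are building toward: it swaps the hand-written tokenizer for a regular expression, and instead of the parenthesis hash table plus `offset` bookkeeping, the recursive helper returns the index where it stopped. The names `count_of_atoms` and `parse` are hypothetical, and the formula is assumed to be valid.

```python
import re
from collections import Counter


def count_of_atoms(formula: str) -> str:
    # Tokens: element names, digit runs, and single parentheses
    tokens = re.findall(r"[A-Z][a-z]*|\d+|[()]", formula)

    def parse(i: int):
        """Count atoms from tokens[i] up to the enclosing ")" or the end.

        Returns (counts, index of the first token not consumed).
        """
        counts = Counter()
        while i < len(tokens) and tokens[i] != ")":
            if tokens[i] == "(":
                group, i = parse(i + 1)  # i now points at the matching ")"
                i += 1                   # step past that ")"
            else:
                group = Counter({tokens[i]: 1})
                i += 1
            # An optional number right after an element or group multiplies it
            if i < len(tokens) and tokens[i].isdigit():
                mult = int(tokens[i])
                i += 1
            else:
                mult = 1
            for atom, n in group.items():
                counts[atom] += n * mult
        return counts, i

    counts, _ = parse(0)
    # Elements in sorted order; a count of 1 is left implicit
    return "".join(
        atom + (str(n) if n > 1 else "") for atom, n in sorted(counts.items())
    )


print(count_of_atoms("Mg(OH)2"))
```

Returning the stop index from `parse` means each token is visited once, so the offset arithmetic from the earlier plan is not needed in this variant.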