GLSLShader refactoring: singleton factory architecture#12587
GLSLShader refactoring: singleton factory architecture#12587Lex-DRL wants to merge 12 commits intoComfy-Org:masterfrom
GLSLShader refactoring: singleton factory architecture#12587Conversation
📝 WalkthroughWalkthroughThe pull request refactors the OpenGL context management in 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
comfy_extras/nodes_glsl.py (1)
251-264:⚠️ Potential issue | 🔴 Critical
NameErrorifglGenVertexArraysitself raises.
vaois only assigned inside thetrybody. IfglGenVertexArraysthrows (e.g., on systems where VAOs are unsupported), the name is never bound and theif vao:guard in theexceptblock raisesNameError, masking the original exception.🐛 Proposed fix
+ vao = None try: vao = gl.glGenVertexArrays(1) gl.glBindVertexArray(vao)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@comfy_extras/nodes_glsl.py` around lines 251 - 264, The try/except can raise a NameError in GLContext.__init__ because vao is only assigned inside the try; initialize a sentinel before calling gl.glGenVertexArrays (e.g., set vao = None) and change the cleanup guard to check "if vao is not None" before calling gl.glDeleteVertexArrays; ensure self._vao is only set after successful bind as currently done and that the except block references the sentinel to avoid masking the original exception from gl.glGenVertexArrays/gl.glBindVertexArray.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@comfy_extras/nodes_glsl.py`:
- Around line 748-749: The code in _init_backend_concrete (OSMesa) sets
os.environ["PYOPENGL_PLATFORM"] = "osmesa" as a permanent side effect; change
this to save the original os.environ.get("PYOPENGL_PLATFORM") before setting,
perform the OSMesa/OpenGL initialization inside a try block, and in the
except/finally restore the original value (either resetting to the saved value
or deleting the key if it was absent) so the PYOPENGL_PLATFORM value is only
left modified on successful initialization and is reverted on failure.
- Around line 563-566: The pad logic currently appends the same numpy buffer
`black_img` multiple times, causing aliasing when converted via
`torch.from_numpy`; fix by creating a fresh array for each unused slot (e.g.,
allocate a new zeros array or append `black_img.copy()`) when adding to
`batch_outputs` in the padding loop that references `black_img`,
`batch_outputs`, `MAX_OUTPUTS`, and later
`all_outputs`/`torch.from_numpy`/`torch.stack`, so each padded output has its
own independent buffer.
- Around line 157-163: The singleton initialization in GLContext.__new__ has a
TOCTOU race: guard the check-and-create sequence with a class-level lock (e.g.,
a threading.Lock stored on GLContext, like GLContext._init_lock) so only one
thread can run the creation path that calls
GLContext.__new_instance_using_concrete_class_fallback_order and assigns
GLContext.__instance; drop the redundant assert isinstance(...) after
assignment. Ensure the lock is acquired before checking GLContext.__instance and
released after assignment (use a context manager).
---
Outside diff comments:
In `@comfy_extras/nodes_glsl.py`:
- Around line 251-264: The try/except can raise a NameError in
GLContext.__init__ because vao is only assigned inside the try; initialize a
sentinel before calling gl.glGenVertexArrays (e.g., set vao = None) and change
the cleanup guard to check "if vao is not None" before calling
gl.glDeleteVertexArrays; ensure self._vao is only set after successful bind as
currently done and that the except block references the sentinel to avoid
masking the original exception from gl.glGenVertexArrays/gl.glBindVertexArray.
| def __new__(cls): | ||
| if cls._instance is None: | ||
| cls._instance = super().__new__(cls) | ||
| return cls._instance | ||
| # Since ``GLContext`` is a singleton anyway, we should store it | ||
| # explicitly in ``GLContext.__instance``, NOT in ``cls.__instance``. | ||
| if GLContext.__instance is None: | ||
| GLContext.__instance = GLContext.__new_instance_using_concrete_class_fallback_order() | ||
| assert isinstance(GLContext.__instance, GLContext) | ||
| return GLContext.__instance |
There was a problem hiding this comment.
Missing lock on singleton initialization creates a TOCTOU race.
Two concurrent callers can both observe GLContext.__instance is None, both invoke __new_instance_using_concrete_class_fallback_order, and both create a real GL context. The second assignment on line 161 silently overwrites the first, leaking the earlier context and leaving whichever thread received the first instance holding a now-abandoned GL context object.
🔒 Suggested fix: guard with a class-level lock
+import threading
class GLContext:
...
__instance: 'GLContext' = None
+ __init_lock: threading.Lock = threading.Lock()
def __new__(cls):
- if GLContext.__instance is None:
- GLContext.__instance = GLContext.__new_instance_using_concrete_class_fallback_order()
- assert isinstance(GLContext.__instance, GLContext)
- return GLContext.__instance
+ if GLContext.__instance is None:
+ with GLContext.__init_lock:
+ if GLContext.__instance is None: # double-checked locking
+ GLContext.__instance = GLContext.__new_instance_using_concrete_class_fallback_order()
+ return GLContext.__instanceThe assert can be dropped — it is vacuously true by construction and is silently removed under python -O.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@comfy_extras/nodes_glsl.py` around lines 157 - 163, The singleton
initialization in GLContext.__new__ has a TOCTOU race: guard the
check-and-create sequence with a class-level lock (e.g., a threading.Lock stored
on GLContext, like GLContext._init_lock) so only one thread can run the creation
path that calls GLContext.__new_instance_using_concrete_class_fallback_order and
assigns GLContext.__instance; drop the redundant assert isinstance(...) after
assignment. Ensure the lock is acquired before checking GLContext.__instance and
released after assignment (use a context manager).
| # Pad with black images for unused outputs | ||
| black_img = np.zeros((height, width, 4), dtype=np.float32) | ||
| for _ in range(num_outputs, MAX_OUTPUTS): | ||
| batch_outputs.append(black_img) |
There was a problem hiding this comment.
Padding outputs within a single batch share the same np.zeros buffer.
All unused output slots in one batch iteration point to the same black_img array. torch.from_numpy shares the buffer rather than copying, so the tensors in all_outputs are aliased until torch.stack materializes them. If any code path between these lines and torch.stack were to write into one of those tensors, it would corrupt the others. Constructing a fresh array per slot (or using .copy()) eliminates the alias.
✨ Suggested fix
- black_img = np.zeros((height, width, 4), dtype=np.float32)
for _ in range(num_outputs, MAX_OUTPUTS):
- batch_outputs.append(black_img)
+ batch_outputs.append(np.zeros((height, width, 4), dtype=np.float32))🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@comfy_extras/nodes_glsl.py` around lines 563 - 566, The pad logic currently
appends the same numpy buffer `black_img` multiple times, causing aliasing when
converted via `torch.from_numpy`; fix by creating a fresh array for each unused
slot (e.g., allocate a new zeros array or append `black_img.copy()`) when adding
to `batch_outputs` in the padding loop that references `black_img`,
`batch_outputs`, `MAX_OUTPUTS`, and later
`all_outputs`/`torch.from_numpy`/`torch.stack`, so each padded output has its
own independent buffer.
| logger.debug("_init_backend_concrete (OSMesa): starting") | ||
| os.environ["PYOPENGL_PLATFORM"] = "osmesa" |
There was a problem hiding this comment.
PYOPENGL_PLATFORM is set as a permanent process-wide side effect.
os.environ["PYOPENGL_PLATFORM"] = "osmesa" persists for the lifetime of the process even if OSMesa initialization subsequently fails. Any other component in the process that later imports OpenGL.GL (before or after this node) will inherit the OSMesa platform selection. The effect is intentional here (the env var must precede the OpenGL.GL import), but the lack of cleanup on failure is worth noting — a context manager or try/finally that restores the original value on failure would be safer.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@comfy_extras/nodes_glsl.py` around lines 748 - 749, The code in
_init_backend_concrete (OSMesa) sets os.environ["PYOPENGL_PLATFORM"] = "osmesa"
as a permanent side effect; change this to save the original
os.environ.get("PYOPENGL_PLATFORM") before setting, perform the OSMesa/OpenGL
initialization inside a try block, and in the except/finally restore the
original value (either resetting to the saved value or deleting the key if it
was absent) so the PYOPENGL_PLATFORM value is only left modified on successful
initialization and is reverted on failure.
christian-byrne
left a comment
There was a problem hiding this comment.
Is this going to work with process isolation and async nodes? i.e., each node being in a separate process.
|
I haven't changed the original behavior in any way. I merely rearranged the existing code. Originally, the backend initialization was done entirely in What I did is extracted the body of those try-excepts into special class-specific functions, to be called from If anything, the initialization became more syncronous, i.e. more thread-safe. Though, as I mentioned, I couldn't fully test it because the node in its current implementation (before refactoring) doesn't work for me.. |
|
What I can tell is the existing implementation seem to be an MVP with a HUUUUUGE room for improvement, so probably asyncronous execution wasn't considered by the original author ( @pythongosssss ) |
The current
GLSLShadernode implementation is designed (vibe coded?) in such a way, that it contains runtime conditional logic for various backends (which is checked at each execution).This causes GLFW/EGL/OSMesa-specific code to be spread across the whole module and mixed together in the same functions, i.e. it relies on globals, is bug prone, and hard to maintain/extend in the future (e.g., add a new OGL backend).
I've refactored the entire module to adhere to singleton factory pattern instead:
GLContextis still a singleton, but it's an "abstract" one (follows the same approach as with WebDriver in Selenium) and works like this:GLContextinstance methods.No changes to the logic itself.
Though, I wasn't able to test it because the current implementation doesn't work for me either: #12584
But at least the error I get is the same as the one I got before refactoring.