GLSLShader refactoring: singleton factory architecture #12587

Open
Lex-DRL wants to merge 12 commits into Comfy-Org:master from Lex-DRL:glsl-context-singleton-factory

Conversation

Lex-DRL commented Feb 23, 2026

The current GLSLShader node implementation is designed (vibe coded?) in such a way that it contains runtime conditional logic for various backends, checked on every execution.
This spreads GLFW/EGL/OSMesa-specific code across the whole module, mixed together in the same functions: it relies on globals, is bug-prone, and is hard to maintain or extend in the future (e.g., to add a new OpenGL backend).

I've refactored the entire module to follow a singleton factory pattern instead: GLContext is still a singleton, but it's now an "abstract" one (the same approach as WebDriver in Selenium) and works like this:

  • On its first initialization, the instance is created from one of several concrete subclasses, each dedicated to a different backend.
  • All backend-specific code is contained within these concrete classes, some of it hidden as protected/private members.
  • All shared rendering functions (the ones that depend on a backend already being initialized) become GLContext instance methods.
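The layout described above can be sketched roughly like this (a minimal illustration; the class and method names here are hypothetical, not the actual API in comfy_extras/nodes_glsl.py):

```python
# Minimal sketch of an "abstract singleton" with a concrete-class fallback
# order. All names here are illustrative, not the real nodes_glsl.py ones.
class GLContext:
    _instance = None

    def __new__(cls):
        if GLContext._instance is None:
            # Try each backend in a fixed fallback order; the first one
            # whose initialization succeeds becomes the singleton.
            for concrete in (GLFWContext, EGLContext, OSMesaContext):
                try:
                    inst = super().__new__(concrete)
                    inst._init_backend()  # backend-specific, may raise
                except Exception:
                    continue
                GLContext._instance = inst
                break
            else:
                raise RuntimeError("No usable OpenGL backend found")
        return GLContext._instance

    def _init_backend(self):
        raise NotImplementedError  # implemented by concrete subclasses

    def render(self, shader_src):
        # Shared rendering logic lives here: by the time any instance
        # method runs, a backend is guaranteed to be initialized.
        pass


class GLFWContext(GLContext):
    def _init_backend(self):
        raise RuntimeError("pretend GLFW is unavailable on this machine")


class EGLContext(GLContext):
    def _init_backend(self):
        pass  # pretend EGL initializes fine


class OSMesaContext(GLContext):
    def _init_backend(self):
        pass
```

With GLFW "failing" as above, GLContext() falls through to EGLContext, and every later GLContext() call returns that same instance.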

No changes to the logic itself.
I wasn't able to test it, though, because the current implementation doesn't work for me either: #12584
But at least the error I get is the same as the one I got before the refactoring.

coderabbitai bot commented Feb 23, 2026

📝 Walkthrough

The pull request refactors the OpenGL context management in comfy_extras/nodes_glsl.py by introducing a singleton-like GLContext framework that replaces previous lazy-loading patterns. A new backend selection system attempts to instantiate one of three concrete implementations (GLFW, EGL, or OSMesa) in a defined fallback order. The GLContext class now encompasses shader compilation, program creation, and a render_shader_batch workflow that handles ES-to-desktop GLSL conversion, framebuffer pipeline setup, multi-pass rendering, and resource management. The GLSLShader execute path is updated to leverage this new GLContext architecture for rendering operations.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning — Docstring coverage is 52.00%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Title check ✅ Passed — The title accurately describes the main refactoring work: a singleton factory architecture pattern applied to GLSLShader.
  • Description check ✅ Passed — The pull request description clearly explains the refactoring from per-execution backend conditionals to a singleton factory pattern with concrete subclasses, directly related to the substantial changes in comfy_extras/nodes_glsl.py.


coderabbitai bot left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
comfy_extras/nodes_glsl.py (1)

251-264: ⚠️ Potential issue | 🔴 Critical

NameError if glGenVertexArrays itself raises.

vao is only assigned inside the try body. If glGenVertexArrays throws (e.g., on systems where VAOs are unsupported), the name is never bound and the if vao: guard in the except block raises NameError, masking the original exception.

🐛 Proposed fix
+        vao = None
         try:
             vao = gl.glGenVertexArrays(1)
             gl.glBindVertexArray(vao)
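The same sentinel pattern in a self-contained form (the `gen`/`bind`/`delete` callables are stand-ins for `glGenVertexArrays`/`glBindVertexArray`/`glDeleteVertexArrays`, not real PyOpenGL calls):

```python
def create_bound_object(gen, bind, delete):
    obj = None  # sentinel: bound even if gen() itself raises
    try:
        obj = gen()
        bind(obj)
        return obj
    except Exception:
        # Without the sentinel, a failure inside gen() would leave `obj`
        # unbound and this guard would raise NameError, masking the
        # original exception.
        if obj is not None:
            delete(obj)
        raise
```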
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@comfy_extras/nodes_glsl.py` around lines 251 - 264, The try/except can raise
a NameError in GLContext.__init__ because vao is only assigned inside the try;
initialize a sentinel before calling gl.glGenVertexArrays (e.g., set vao = None)
and change the cleanup guard to check "if vao is not None" before calling
gl.glDeleteVertexArrays; ensure self._vao is only set after successful bind as
currently done and that the except block references the sentinel to avoid
masking the original exception from gl.glGenVertexArrays/gl.glBindVertexArray.

Comment on lines 157 to +163
     def __new__(cls):
-        if cls._instance is None:
-            cls._instance = super().__new__(cls)
-        return cls._instance
+        # Since ``GLContext`` is a singleton anyway, we should store it
+        # explicitly in ``GLContext.__instance``, NOT in ``cls.__instance``.
+        if GLContext.__instance is None:
+            GLContext.__instance = GLContext.__new_instance_using_concrete_class_fallback_order()
+            assert isinstance(GLContext.__instance, GLContext)
+        return GLContext.__instance

⚠️ Potential issue | 🟠 Major

Missing lock on singleton initialization creates a TOCTOU race.

Two concurrent callers can both observe GLContext.__instance is None, both invoke __new_instance_using_concrete_class_fallback_order, and both create a real GL context. The second assignment on line 161 silently overwrites the first, leaking the earlier context and leaving whichever thread received the first instance holding a now-abandoned GL context object.

🔒 Suggested fix: guard with a class-level lock
+import threading
 
 class GLContext:
     ...
     __instance: 'GLContext' = None
+    __init_lock: threading.Lock = threading.Lock()
 
     def __new__(cls):
-        if GLContext.__instance is None:
-            GLContext.__instance = GLContext.__new_instance_using_concrete_class_fallback_order()
-            assert isinstance(GLContext.__instance, GLContext)
-        return GLContext.__instance
+        if GLContext.__instance is None:
+            with GLContext.__init_lock:
+                if GLContext.__instance is None:  # double-checked locking
+                    GLContext.__instance = GLContext.__new_instance_using_concrete_class_fallback_order()
+        return GLContext.__instance

The assert can be dropped — it is vacuously true by construction and is silently removed under python -O.
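A self-contained sketch of the double-checked-locking idiom suggested above (the `Singleton` class is illustrative, not the actual GLContext code):

```python
import threading

class Singleton:
    _instance = None
    _lock = threading.Lock()

    def __new__(cls):
        if cls._instance is None:          # fast path: no lock once created
            with cls._lock:
                if cls._instance is None:  # re-check under the lock
                    cls._instance = super().__new__(cls)
        return cls._instance
```

Even when many threads race through __new__ concurrently, only one instance is ever created; the second check under the lock is what closes the TOCTOU window.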

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@comfy_extras/nodes_glsl.py` around lines 157 - 163, The singleton
initialization in GLContext.__new__ has a TOCTOU race: guard the
check-and-create sequence with a class-level lock (e.g., a threading.Lock stored
on GLContext, like GLContext._init_lock) so only one thread can run the creation
path that calls GLContext.__new_instance_using_concrete_class_fallback_order and
assigns GLContext.__instance; drop the redundant assert isinstance(...) after
assignment. Ensure the lock is acquired before checking GLContext.__instance and
released after assignment (use a context manager).

Comment on lines +563 to +566
            # Pad with black images for unused outputs
            black_img = np.zeros((height, width, 4), dtype=np.float32)
            for _ in range(num_outputs, MAX_OUTPUTS):
                batch_outputs.append(black_img)

⚠️ Potential issue | 🟡 Minor

Padding outputs within a single batch share the same np.zeros buffer.

All unused output slots in one batch iteration point to the same black_img array. torch.from_numpy shares the buffer rather than copying, so the tensors in all_outputs are aliased until torch.stack materializes them. If any code path between these lines and torch.stack were to write into one of those tensors, it would corrupt the others. Constructing a fresh array per slot (or using .copy()) eliminates the alias.

✨ Suggested fix
-            black_img = np.zeros((height, width, 4), dtype=np.float32)
             for _ in range(num_outputs, MAX_OUTPUTS):
-                batch_outputs.append(black_img)
+                batch_outputs.append(np.zeros((height, width, 4), dtype=np.float32))
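The aliasing is easy to reproduce with plain NumPy (the sizes here are arbitrary; torch.from_numpy keeps the same buffer-sharing semantics):

```python
import numpy as np

height, width, slots = 2, 2, 3

# Aliased padding: every slot references the *same* buffer.
black_img = np.zeros((height, width, 4), dtype=np.float32)
aliased = [black_img for _ in range(slots)]
aliased[0][0, 0, 0] = 1.0  # a write through one slot...
# ...is visible through all of them, because they share one buffer.

# Independent padding: one fresh buffer per slot.
fresh = [np.zeros((height, width, 4), dtype=np.float32) for _ in range(slots)]
fresh[0][0, 0, 0] = 1.0    # the other slots stay all-zero
```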
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@comfy_extras/nodes_glsl.py` around lines 563 - 566, The pad logic currently
appends the same numpy buffer `black_img` multiple times, causing aliasing when
converted via `torch.from_numpy`; fix by creating a fresh array for each unused
slot (e.g., allocate a new zeros array or append `black_img.copy()`) when adding
to `batch_outputs` in the padding loop that references `black_img`,
`batch_outputs`, `MAX_OUTPUTS`, and later
`all_outputs`/`torch.from_numpy`/`torch.stack`, so each padded output has its
own independent buffer.

Comment on lines +748 to +749
        logger.debug("_init_backend_concrete (OSMesa): starting")
        os.environ["PYOPENGL_PLATFORM"] = "osmesa"

⚠️ Potential issue | 🟡 Minor

PYOPENGL_PLATFORM is set as a permanent process-wide side effect.

os.environ["PYOPENGL_PLATFORM"] = "osmesa" persists for the lifetime of the process even if OSMesa initialization subsequently fails. Any other component in the process that later imports OpenGL.GL (before or after this node) will inherit the OSMesa platform selection. The effect is intentional here (the env var must precede the OpenGL.GL import), but the lack of cleanup on failure is worth noting — a context manager or try/finally that restores the original value on failure would be safer.
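One way to express that cleanup (a sketch; `env_var_rollback_on_failure` is a hypothetical helper, not existing code in the module):

```python
import os
from contextlib import contextmanager

@contextmanager
def env_var_rollback_on_failure(name, value):
    """Set os.environ[name] = value; undo it if the body raises.

    On success the new value is kept (PYOPENGL_PLATFORM must stay set
    for later OpenGL.GL imports); on failure the original value is
    restored, or the key removed if it was absent before.
    """
    original = os.environ.get(name)
    os.environ[name] = value
    try:
        yield
    except Exception:
        if original is None:
            os.environ.pop(name, None)
        else:
            os.environ[name] = original
        raise
```

The OSMesa path would then wrap its initialization in `with env_var_rollback_on_failure("PYOPENGL_PLATFORM", "osmesa"): ...` so the variable only survives a successful setup.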

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@comfy_extras/nodes_glsl.py` around lines 748 - 749, The code in
_init_backend_concrete (OSMesa) sets os.environ["PYOPENGL_PLATFORM"] = "osmesa"
as a permanent side effect; change this to save the original
os.environ.get("PYOPENGL_PLATFORM") before setting, perform the OSMesa/OpenGL
initialization inside a try block, and in the except/finally restore the
original value (either resetting to the saved value or deleting the key if it
was absent) so the PYOPENGL_PLATFORM value is only left modified on successful
initialization and is reverted on failure.

christian-byrne (Contributor) left a comment

Is this going to work with process isolation and async nodes? i.e., each node being in a separate process.

Lex-DRL (Author) commented Feb 23, 2026

I haven't changed the original behavior in any way; I merely rearranged the existing code.
If the original implementation wasn't going to work with process isolation and async nodes, this refactor won't work either.

Originally, the backend initialization was done entirely in GLContext.__init__ with a series of try-except statements, sequentially trying various backends and setting GLContext._initialized = True at the end.

What I did is extract the body of those try-excepts into backend-specific functions, called from the __init__ of each concrete class (with no try-except there, so errors propagate to the outer scope). When the singleton is first created, GLContext.__new__ now performs BOTH instance creation (__new__ itself) and initialization (__init__, explicitly called right after creation). If any error occurs, it's caught at the __new__ stage (not the __init__ stage), and the next concrete class is tried. Also, I now mark the instance as initialized, not the class.

If anything, the initialization became more synchronous, i.e. more thread-safe.

Though, as I mentioned, I couldn't fully test it because the node in its current implementation (before the refactoring) doesn't work for me.

Lex-DRL (Author) commented Feb 23, 2026

What I can tell is that the existing implementation seems to be an MVP with HUUUUUGE room for improvement, so asynchronous execution probably wasn't considered by the original author ( @pythongosssss ).
