Conversation
In this JEP we propose:
1. A way for objects to represent themselves at runtime for AI
2. A registry for users to define representations for objects that do not have them
3. A new messaging protocol to query for this data

JEP for jupyter#128
dlqqq
left a comment
@mlucool @govinda18 Thank you for opening this JEP! Really excited to see this moving forward. 💪
Left some feedback & typo fixes below.
> #### Introducing the `_ai_repr_` Protocol
>
> The `_ai_repr_` method allows objects to define representations tailored for AI interactions. This method returns a dictionary (`Dict[str, Any]`), where keys are MIME types (e.g., `text/plain`, `image/png`) and values are the corresponding representations.
Consumers of this protocol (e.g. Jupyter AI) will likely not be able to support every MIME type, nor do they need to. Jupyter AI should document which MIME types we read from. That way, people implementing these methods know how to define `_ai_repr_()` to provide usable reprs for Jupyter AI. Other consuming extensions should do the same.
> ```python
> async def _ai_repr_(self, **kwargs):
>     return {
>         "text/plain": f"A chart titled {self.title} with series {self.series_names}",
>         "image/png": await self.render_image()
>     }
> ```
Is it worth defining a type for the dictionary returned by `_ai_repr_()` instead of using `Dict[str, Any]`? This may be important in the case of image reprs, since the same MIME type may be encoded in different ways. For example, it's ambiguous as to whether `image/png` may return a `bytes` object or a base64-encoded string.

MIME types do allow encodings/parameters to also be specified, so `image/png;base64` could refer to a string object while `image/png` could refer to a `bytes` object. It may be worth defining this in the same proposal to reduce ambiguity.
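A consumer could resolve this ambiguity along the lines described above. This is a sketch only: `normalize_png` is a hypothetical helper name, and `image/png;base64` is the parameterised MIME type proposed in this comment, not a settled part of the JEP.

```python
import base64

def normalize_png(repr_dict):
    # Hypothetical helper (not part of the JEP): treat the ";base64" MIME
    # parameter as the marker for string-encoded data, and the bare MIME
    # type as raw bytes, so consumers always end up with bytes.
    if "image/png" in repr_dict:
        value = repr_dict["image/png"]
        if not isinstance(value, bytes):
            raise TypeError("bare image/png is expected to carry bytes")
        return value
    if "image/png;base64" in repr_dict:
        return base64.b64decode(repr_dict["image/png;base64"])
    return None
```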
Hmm, one problem is that using a `TypedDict` doesn't allow extra keys to be defined. So if we define `_ai_repr_() -> AiRepr`, implementers can only use the keys we define in the `AiRepr` type. Another issue is that this custom type would have to be provided by some dependency, which means every consumer & implementer needs one extra dependency.
We should continue to think of ways to provide better type safety guarantees on the values in the returned dictionary.
> In this case, `@my_tbl` would not only give the LLM data about its schema, but we'd know this is Pandas or Polars without a user having to specify this.
>
> It's possible that we'd want both an async and non-async version supported (or even just a sync version). If so, we can default one to the other:
It may be better to have the async version live in a different method, e.g. `_async_ai_repr_()`.
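A minimal sketch of how the sync/async split could default one to the other. Both method names (`_ai_repr_`, `_async_ai_repr_`) are proposals under discussion, and the mixin and class names here are hypothetical:

```python
import asyncio

class AiReprDefaults:
    # Hypothetical mixin: keep the async variant in a separate method
    # that falls back to the sync one, so a class only overrides the
    # method matching how it actually computes its repr.
    def _ai_repr_(self, **kwargs):
        # Sync default: a plain-text repr is always available.
        return {"text/plain": repr(self)}

    async def _async_ai_repr_(self, **kwargs):
        # Async default: delegate to the sync version.
        return self._ai_repr_(**kwargs)

class Chart(AiReprDefaults):
    def __repr__(self):
        return "Chart(title='sales')"
```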
> - **`object_name`**: (Required) The name of the variable or object in the kernel namespace for which the representation is being requested.
> - **`kwargs`**: (Optional) Key-value pairs allowing customization of the representation (e.g., toggling multimodal outputs, adjusting verbosity).
It seems like the kwargs structure needs to be documented somewhere. Would this be standardized by the kernel implementation when handling `ai_repr_request`?
My understanding is that there should be some consensus, but also some free parameters that could depend on which repr understands the needs of which models.
My intent was not to be prescriptive here and to let that evolve naturally. That is, given jupyter-ai's popularity, I expect it to implicitly set some kwargs as a standard. "No kwarg should be required" is maybe a better statement here, but plugins are free to document things they support.
> - Should we support both or one of async/sync `_repr_ai_`
> - What are good recommended kwargs to pass
> - How should this relate to `repr_*`, if at all.
> - What is the right default for objects without reprs/formatters defined? `str(obj)`, `None`, or `_repr_`?
A registry can define AI representations for objects that lack a `_ai_repr_()`. It seems natural to also allow the registry to define a "fallback" AI representation, i.e. the method to be used when neither a `_ai_repr_()` method nor a registry entry exists for an object.
My guess is the registry is already the fallback, because the object does not have a `_ai_repr_`. Unless you see the registry as an override of `_ai_repr_`?
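The resolution order being discussed could be sketched roughly as below. All names here are hypothetical, and the final default for objects with no repr at all is one of the JEP's open questions (`str(obj)`, `None`, or `_repr_`):

```python
# Hypothetical registry sketch: object's own _ai_repr_ wins, then a
# registry entry (searched along the MRO), then a global fallback.
_REGISTRY = {}  # maps type -> callable(obj, **kwargs) -> repr dict

def register(cls, fn):
    _REGISTRY[cls] = fn

def ai_repr(obj, **kwargs):
    method = getattr(type(obj), "_ai_repr_", None)
    if method is not None:
        return method(obj, **kwargs)
    for cls in type(obj).__mro__:
        if cls in _REGISTRY:
            return _REGISTRY[cls](obj, **kwargs)
    # Open question in the JEP: str(obj), None, or _repr_?
    return {"text/plain": str(obj)}
```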
Co-authored-by: David L. Qiu <david@qiu.dev>
> - **`object_name`**: (Required) The name of the variable or object in the kernel namespace for which the representation is being requested.
Do we want to allow dots in names? They may be properties and trigger side effects, but I think that is fine.
I think that's ok too
> - Should we support both or one of async/sync `_repr_ai_`
> - What are good recommended kwargs to pass
> - How should this relate to `repr_*`, if at all.
> - What is the right default for objects without reprs/formatters defined? `str(obj)`, `None`, or `_repr_`?
> - Should thread-safety be required so that this can be called via a comm
> - Can `ai_repr_request` be canceled?
For passed kwargs, I would at least standardise one, which is the list/set of accepted mimetypes; I think it should be fairly standard to select between text or images, and having a standard would be good to start with.
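One way to standardise an accepted-mimetypes kwarg is to mirror IPython's existing `_repr_mimebundle_(include=..., exclude=...)` convention. A sketch, assuming a hypothetical `include` kwarg (the class and method body are illustrative, not part of the JEP):

```python
class Table:
    # Sketch: an `include` kwarg (name borrowed from IPython's
    # _repr_mimebundle_) lets the caller skip representations it
    # cannot consume, avoiding wasted computation.
    def _ai_repr_(self, include=None, **kwargs):
        def wanted(mime):
            return include is None or mime in include

        reprs = {}
        if wanted("text/plain"):
            reprs["text/plain"] = "a 2-column table"
        if wanted("image/png"):
            # Only pay the rendering cost when images are accepted.
            reprs["image/png"] = self._render_png()
        return reprs

    def _render_png(self):
        return b"\x89PNG placeholder"  # stand-in for real rendering
```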
I think beyond the protocol, nothing is jupyter specific, and I'm happy to add some integration with IPython – even if likely not necessary.
I think we should likely have introspection facilities: do we want to be able to list the various mimetypes an object can provide? And do we want the user to be able to ask for these?
There is also the slight technical question that `_ai_repr_(**kwargs) -> Dict[str, T]` requires `T` to be serialisable, not really `Any`.
But in general I'm +1, I'll see if I can prototype and push that to the Jupyter EC/SSC.
> There is also the slight technical question that `_ai_repr_(**kwargs) -> Dict[str, T]` requires `T` to be serialisable, not really `Any`.
Agreed
I think there was a bug in GitHub (or my laptop): I submitted my review yesterday, but there was no internet, and it resubmitted today when I reopened the tab and internet was back. Sorry if there are crossed wires.
Co-authored-by: M Bussonnier <bussonniermatthias@gmail.com>
krassowski
left a comment
This was discussed in the jupyter-ai call today.
We just wanted to ping CC @jupyter/software-steering-council on this :)
Co-authored-by: Michał Krassowski <5832902+krassowski@users.noreply.github.com>
minrk
left a comment
This seems generally sensible
My main design question is at the protocol level: why does this require a distinct message handler, as opposed to defining a schema for one (or more) custom `x-jupyter/ai-repr` mimetypes via the existing display protocol, e.g. an `inspect_request`, which already returns a mimebundle representing an object?
I don't really understand why the existing mimebundle messages have "performance concerns and their inability to adapt" while this doesn't, when it appears to do the same thing (returns a mimebundle representing an object) and has the same design (registers repr methods both via method name and explicit dispatch).
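For concreteness, the alternative being suggested could look like the following sketch, which reuses IPython's existing `_repr_mimebundle_` hook. The `x-jupyter/ai-repr` key name and the nested payload shape are illustrative only, taken from this comment rather than any specification:

```python
class Chart:
    title = "sales"

    # Real IPython hook: returns a mimebundle for display/inspection.
    # The custom key below is a hypothetical convention, not a standard.
    def _repr_mimebundle_(self, include=None, exclude=None):
        return {
            "text/plain": f"<Chart {self.title}>",
            "x-jupyter/ai-repr": {
                "text/plain": f"A chart titled {self.title}",
            },
        }
```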
> ### Summary
>
> This proposal introduces a standardized method, `_ai_repr_(self, **kwargs) -> Dict[str, Any]`, for objects in Jupyter environments to provide context-sensitive representations optimized for AI interactions. The protocol allows flexibility for multimodal or text-only representations and supports user-defined registries for objects lacking native implementations. Additionally, we propose a new messaging protocol for retrieving these representations, facilitating seamless integration into Jupyter AI tools.
Absolutely minor detail, but is there a reason this inverts the standardized _repr_ai_ naming scheme to _ai_repr_? If not, then I'd suggest following the prefix precedent (using a prefix improves discoverability via tab completion, etc.).
Now, another message already in the protocol that could be considered is `richInspectVariables`:

```
{
    'type' : 'request',
    'command' : 'richInspectVariables',
    'arguments' : {
        'variableName' : str,
        # The frameId is used when the debugger hit a breakpoint only.
        'frameId' : int
    }
}
```

and it returns:

```
{
    'type' : 'response',
    'success' : bool,
    'body' : {
        # Dictionary of rich representations of the variable
        'data' : dict,
        'metadata' : dict
    }
}
```

The advantage of using frame IDs is that local variables can be distinguished from global variables. Now, the problem is that the debugger is opt-in, has side-effects and performance overhead, so it is not necessarily a go-to solution. The basic DAP uses all three: I wonder if, when the debugger is active, we should have a way to pass `frameId`.
A broader edge-case is definition disambiguation: while it is fine to return a description of all cases, it can grow long. In these cases, to get a definition of a function unambiguously we would need to know both the function and the argument(s). This JEP does not say how a user would specify that. In R, if they loaded a few statistical packages, they could be presented with a list of dozens of different variants; a simple case (for brevity):

```r
# S3 method for glm
predict(object, newdata = NULL,
        type = c("link", "response", "terms"),
        se.fit = FALSE, dispersion = NULL, terms = NULL,
        na.action = na.pass, ...)

# S3 method for lm
predict(object, newdata, se.fit = FALSE, scale = NULL, df = Inf,
        interval = c("none", "confidence", "prediction"),
        level = 0.95, type = c("response", "terms"),
        terms = NULL, na.action = na.pass,
        pred.var = res.var/weights, weights = 1, ...)
```

Now, we may want to present these as separate items. Then if a user chooses one or another, a different snippet would show up. These separate items could be annotated with a variable reference, so when it is used for the AI request proposed in this JEP the user's choice would be remembered. I think defining a variable reference in …
AI is such a nebulous term that whatever description or formulation this representation would hold could both be useful to humans, and also useless to past, present, and future applications that would still nevertheless be reasonably called AI. So I came here to chime in that this can already be done as a set of conventions around mimebundle keys, and hence not require a protocol revision. Fortunately, @minrk is already doing the same.

We need to be very deliberate and conservative with protocol revisions; I think this would grow the surface area and complexity of interactions for the Jupyter protocol. For example, if you end up with objects that declare "ai repr" but not other rich reprs, do you end up with users who aren't using "ai" having to inspect objects twice in separate ways, hoping to get something better than …?
I agree it is nebulous, do you have a better term we should use? I started with calling it …

This is an important question and I believe there are a number of reasons.

That all being said, 1/2 are a bit unrelated to the protocol change (we still need a way to get this data from UIs). If we came up with other solutions here (e.g. an async mimebundle, where there was a clean way to pass args to), I don't think it needs to be "ai".
I don't want to get off on too much of a tangent, but being able to select supported mimetype representations in the request is something that's seemed useful for some time, to avoid computing representations that won't be understood (like an HTTP Accept header). Would a custom mimetype in existing messages be satisfactory if it could be opt-in via the request, and not computed by default? Then the same request that asks for the "genai repr" could explicitly exclude all the other reprs, too. That appears to address all of the performance/efficiency questions, IIUC.
Not a tangent at all. It's a great question. Here's a first pass at what I think would have to change:
Thanks for answering, I think the kwargs was the big missing piece I didn't understand before, which is quite different from my initial interpretation.
Great! @minrk is there anything else I can answer to help clarify the JEP?
@mlucool thanks, I think my questions have been addressed.
krassowski
left a comment
This is just to align headings with the fixes in #130.
Also, can you merge with master branch to pull the fixes for build?
Co-authored-by: Michał Krassowski <5832902+krassowski@users.noreply.github.com>
Would a JSON-LD'ified nbformat and …
MCP: Model Context Protocol; https://modelcontextprotocol.io/introduction ; https://github.com/modelcontextprotocol

Are there other similar standards for what the MCP Model Context Protocol specification solves for?

Edit: …
I believe MCP is a bit tangential to this. MCP is more for agent tools (i.e. something that can do work) and this is meant to be a way to ask for specific information about a specific variable.
Having a dedicated message makes it simpler for kernels to implement and is backward compatible, at the cost of potentially duplicating requests (inspect and ai_repr). Another solution could be to add optional arguments to `inspect_request`, that could be used to specify that the user wants an AI representation, and would be unused in the case of a classical `inspect_request`. Similarly the … In both cases kernels can advertise that they support this new feature via the optional-features mechanism, making a backward-compatible protocol extension possible.
Use cases: an MCP server for Jupyter: …
Could we have `_ai_repr_` go to a specialized `text/llm+plain` so that it's differentiable from classic `text/plain` and can technically be richer than a regular plain repr?
I wrote a quick tool to transform AI chat exports from .md/.json to .ipynb a few weeks ago, "transform_md", that I still haven't factored extra functionality out of.

ipynb supports multiple mimetype outputs if they have different types; but not multiple `text/html` outputs per input?

Where to display the model's thinking as output?

- input
- output_thinking
- output_response

or:

- I: input
- O1: output_thinking
- O1: output_response

or:

- I1: input
- O1: output_thinking
- I2:
- O2: output_response

It's clunky to try and copy/paste just an output_response if it's string-concatenated with output_thinking.

GitHub Copilot JSON is a more complete trace than most other chat AI export formats?

Is there already a standardized JSON-LD/YAML-LD "Open Trace" format that covers this? That can support *multiple* `_ai_repr_` outputs per input? With which MIME types?

And then, just like notebooks in Markdown, copying binary files inline into an .ipynb file is clunky. IIRC the markdown nbformat issue(s) already discuss how moving from .ipynb to .zip basically implies that you're creating or adopting a new packaging format (when all they want is support for pyproject.toml, requirements.txt, and also environment.yml, and PEP 723 inline script metadata, repo2docker/repo2jupyterlite and container2wasm).

- "feat(cli): Add /export command to export chat history to markdown and jsonl" google-gemini/gemini-cli#5342 (review)
- "How to export the chat history of GitHub Copilot Chat?" community/community#57190

Open questions:

- How many outputs per input?
- Which MIME types for the `_repr_ai_` outputs?
- How/when to merge all of the JSON outputs into one document?
- How to merge all of the JSON-LD outputs?
- How to merge all of the `_repr_json_` and `_ai_repr_` outputs?

How to merge all of the `_repr_json_` and `_ai_repr_` outputs, with notebook-level metadata and cell-level metadata, into one JSON document with a JSON-LD @context?

- The other methods have the prefix `_repr_` as a convention, so `_repr_ai_` would be better
I've been stewing on this a bit and I don't really like the addition of a protocol to get what is effectively a summary and call it AI. It's clearly intended to be used by AI but is more like a variable inspection request where we pull summary data. There are several areas lurking at once in this JEP that I think are easy to push on without having to make a kernel protocol, as we can use existing machinery or create more direct building blocks on top of existing ones.
How to indicate the schema version of what is returned by `_repr_json_` or `_repr_jsonld_` or `_repr_yamlld12_` or `_repr_ai_`?

Or just gather each from all output cells into a document after each cell runs.

How to publish linked data for linked research, from document-level metadata including kernelspecs, cell-level metadata, and cell output data?

- [x] schema.org/CreativeWork
- [x] schema.org/DigitalDocument
- [x] schema.org/ScholarlyArticle
- [ ] schema.org/Notebook, schema.org/DigitalNotebook
- [ ] schema.org/JupyterNotebook or just Notebook

Justify adding `_repr_ai_` or linked data support as described.