New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

AI Representation #129

Open

mlucool wants to merge 6 commits into jupyter:master from mlucool:master

+209 −0

Contributor

mlucool commented Feb 7, 2025

In this JEP we propose:

A way for objects to represent themselves at runtime for AI
A registry for users to define representations for objects that do not have them
A new messaging protocol to query for this data

JEP for #128


          AI Representation

8f8711c

In this JEP we propose:
1. A way for objects to represent themselves at runtime for AI
2. A registry for users to define representations for objects that
do not have them
3. A new messaging protocol to query for this data

JEP for jupyter#128

Contributor Author

mlucool commented Feb 7, 2025 •

edited

Loading

CC @krassowski @ellisonbg @dlqqq @jtpio @SylvainCorlay @Zsailer @brichet @Carreau

dlqqq reviewed

View reviewed changes

dlqqq left a comment

@mlucool @govinda18 Thank you for opening this JEP! Really excited to see this moving forward. 💪

Left some feedback & typo fixes below.

129-ai-representation/ai-representation.md Outdated Show resolved Hide resolved

129-ai-representation/ai-representation.md Outdated Show resolved Hide resolved

129-ai-representation/ai-representation.md


		#### Introducing the `_ai_repr_` Protocol

		The `_ai_repr_` method allows objects to define representations tailored for AI interactions. This method returns a dictionary (`Dict[str, Any]`), where keys are MIME types (e.g., `text/plain`, `image/png`) and values are the corresponding representations.

dlqqq Feb 11, 2025

Comment: Consumers of this protocol (e.g. Jupyter AI) will likely not be able to support every MIME type, nor do they need to. Jupyter AI should document which MIME types we read from. That way, people implementing these methods know how to define _ai_repr_() to provide usable reprs for Jupyter AI. Other consuming extensions should do the same.

129-ai-representation/ai-representation.md

Comment on lines +37 to +41

+                  async def _ai_repr_(self, **kwargs):
+                      return {
+                          "text/plain": f"A chart titled {self.title} with series {self.series_names}",
+                          "image/png": await self.render_image()
+                      }

dlqqq Feb 11, 2025

Is it worth defining a type for the dictionary returned by _ai_repr_() instead of using Dict[str, Any]? This may be important in the case of image reprs, since the same MIME type may be encoded in different ways. For example, it's ambiguous as to whether image/png may return a bytes object or a base64-encoded string.

MIME types do allow encodings/parameters to also be specified, so image/png;base64 could refer to a string object while image/png could refer to a bytes object. It may be worth defining this in the same proposal to reduce ambiguity.

dlqqq Feb 11, 2025

Hmm, one problem is that using a TypedDict doesn't allow extra keys to be defined. So if we define _ai_repr_() -> AiRepr, implementers can only use the keys we define in the AiRepr type. Another issue that this custom type would have to be provided by some dependency, which means every consumer & implementer needs 1 extra dependency.

We should continue to think of ways to provide better type safety guarantees on the values in the returned dictionary.

129-ai-representation/ai-representation.md

+              In this case, `@my_tbl` would not only give the LLM data about its schema, but we'd know this is Pandas or Polars without
+              a user having to specify this.
+              It's possible that we'd want both an async and non-async version supported (or even just a sync version). If so, we can default one to the other:

dlqqq Feb 11, 2025

It may be better to have the async version live in a different method, e.g. _async_ai_repr_().

129-ai-representation/ai-representation.md Show resolved Hide resolved

129-ai-representation/ai-representation.md

+              }
+              ```
+              - **`object_name`**: (Required) The name of the variable or object in the kernel namespace for which the representation is being requested.
+              - **`kwargs`**: (Optional) Key-value pairs allowing customization of the representation (e.g., toggling multimodal outputs, adjusting verbosity).

dlqqq Feb 11, 2025

It seems like the kwargs structure needs to be documented somewhere. Would this be standardized by the kernel implementation when handling ai_repr_request?

Member

Carreau Feb 12, 2025

my understanding is that they should be some consensus, but some free parameters that could depends on which repr understand the need of which models.

Contributor Author

mlucool Feb 12, 2025

My intent was not to be prescriptive here and let that evolve naturally. That is, given jupyter-ai popularity, I expect it to implicitly set some kwargs as a standard. No kwarg should be required is maybe a better statement here, but plugins are free to document things they support.

129-ai-representation/ai-representation.md

+              - Should we support both or one of async/sync `_repr_ai_`
+              - What are good reccomended kwargs to pass
+              - How should this related to `repr_*`, if at all.
+              - What is the right default for objects without reprs/formatters defined? `str(obj)`, `None`, or `_repr_`?

dlqqq Feb 11, 2025

A registry can define AI representations for objects that lack a _ai_repr_(). It seems natural to also a registry to define a "fallback" AI representation, i.e. the method to be used when neither a _ai_repr_() method or a registry entry exist for an object.

Member

Carreau Feb 12, 2025

My guess is the registry is already the fallback, because the object does not have a _ai_repr_, Unless you see the registry as an override of _ai_repr_ ?

129-ai-representation/ai-representation.md Outdated Show resolved Hide resolved

mlucool and others added 4 commits

February 11, 2025 18:10


          Update 129-ai-representation/ai-representation.md

b278d7f

Co-authored-by: David L. Qiu <[email protected]>


          Update 129-ai-representation/ai-representation.md

90d20be

Co-authored-by: David L. Qiu <[email protected]>


          Update 129-ai-representation/ai-representation.md

7e4fccf

Co-authored-by: David L. Qiu <[email protected]>


          Update 129-ai-representation/ai-representation.md

f805b4e

Co-authored-by: David L. Qiu <[email protected]>

Carreau reviewed

View reviewed changes

129-ai-representation/ai-representation.md Outdated Show resolved Hide resolved

129-ai-representation/ai-representation.md

+                  }
+              }
+              ```
+              - **`object_name`**: (Required) The name of the variable or object in the kernel namespace for which the representation is being requested.

Member

Carreau Feb 11, 2025

Do we want to allow dots in names ? it may be property and trigger side effects, but I think that is fine.

Contributor Author

mlucool Feb 12, 2025

I think that's ok too

129-ai-representation/ai-representation.md Outdated

Comment on lines 203 to 208

+              - Should we support both or one of async/sync `_repr_ai_`
+              - What are good reccomended kwargs to pass
+              - How should this related to `repr_*`, if at all.
+              - What is the right default for objects without reprs/formatters defined? `str(obj)`, `None`, or `_repr_`?
+              - Should thread-saftey be required so that this can be called via a comm
+              - Can `ai_repr_request` be canceled?

Member

Carreau Feb 11, 2025

For passed kwargs, I would at least standardise 1 which is the list/set of mimetype accepted; I think it should be fairly standard to select between or text images, and having a standard would be good to start.

I think beyond the protocol, nothing is jupyter specific, and I'm happy to add some integration with IPython – even if likely not necessary.

I think we should likely have introspection facilities, like do we want to be able to list the various mimetype an object can provide ? And do we want the user to be able to ask for these ?

There is also the slight technical question that _ai_repr_(**kwargs) -> Dict[str, T], requires T to be serialisable, no really be Any.

But in general I'm +1, I'll see if I can prototype and push that to the Jupyter EC/SSC.

Contributor Author

mlucool Feb 12, 2025

There is also the slight technical question that ai_repr(**kwargs) -> Dict[str, T], requires T to be serialisable, no really be Any.

Agreed

Member

Carreau commented Feb 12, 2025

I think there was a bug in github (or my laptop), I submitted my review yesterday, but there was no internet, and it resubmitted today when I reopened the tab and internet was back. Sorry if there are crossed wires.


          Update 129-ai-representation/ai-representation.md

Co-authored-by: M Bussonnier <[email protected]>

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet