@ali-aljufairi

Add the ability to display images directly in the terminal when MCP tools return image data. This enables extensions like nanobanana to show generated images inline without requiring external viewers.

Changes:

  • Add ink-picture dependency for cross-terminal image rendering
  • Create ImageDisplay component with self-contained TerminalInfoProvider
  • Add ImageData and ImageResult types to core tools module
  • Update MCP tool handler to detect and return image blocks
  • Integrate image display into ToolResultDisplay component

Supported terminal protocols (auto-detected):

  • Kitty graphics protocol
  • iTerm2 inline images
  • Sixel graphics
  • Half-block/Braille/ASCII fallbacks

BREAKING CHANGE: None

⚠️ This is a Proof of Concept (POC) - NOT ready for merge

Note: This is an experimental feature exploring terminal image display capabilities. We acknowledge that not all terminals support inline images, and there are known limitations.

Details

What this PR does:

  • Adds ink-picture dependency for cross-terminal image rendering
  • Creates ImageDisplay component with self-contained TerminalInfoProvider
  • Adds ImageData and ImageResult types to core tools module
  • Updates MCP tool handler to detect and return image blocks from extensions
  • Integrates image display into ToolResultDisplay component
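For readers who want a picture of the new types, here is a minimal sketch inferred only from the fields that the ImageDisplay code quoted in the review below destructures (base64, filePath, mimeType, alt). The exact declarations in the PR may differ, and `resolveImageSrc` is a hypothetical helper name used here for illustration.

```typescript
// Assumed shapes, reconstructed from how the review snippets use them.
interface ImageData {
  base64?: string; // raw base64 payload, without a data: prefix
  filePath?: string; // path to an image file on disk
  mimeType?: string; // e.g. 'image/png'
  alt?: string; // caption / alternative text
}

interface ImageResult {
  image: ImageData;
}

// Hypothetical helper: mirrors the source-selection logic in ImageDisplay.
// base64 wins when present, otherwise fall back to the file path.
function resolveImageSrc(image: ImageData): string | undefined {
  if (image.base64) {
    const mime = image.mimeType || 'image/png';
    return `data:${mime};base64,${image.base64}`;
  }
  return image.filePath;
}
```

A result with neither field yields `undefined`, which corresponds to the "no valid source" message in the component.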

Terminal Protocol Support (via ink-picture):

  • ✅ Kitty graphics protocol
  • ✅ iTerm2 inline images
  • ✅ Sixel graphics
  • ⚠️ Half-block/Braille/ASCII fallbacks (lower quality)
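As a rough illustration of what auto-detection can look like, the sketch below guesses a protocol from environment variables that Kitty and iTerm2 are known to set. This is not how ink-picture detects support (its mechanism is not described in this PR); `guessGraphicsProtocol` and the `GraphicsProtocol` type are hypothetical names. Sixel is deliberately left out because it generally cannot be identified from the environment alone.

```typescript
type GraphicsProtocol = 'kitty' | 'iterm2' | 'unknown';

// Heuristic only: inspects environment variables, does not query the terminal.
function guessGraphicsProtocol(
  env: Record<string, string | undefined>,
): GraphicsProtocol {
  // Kitty exports KITTY_WINDOW_ID and uses TERM=xterm-kitty.
  if (env['KITTY_WINDOW_ID'] || (env['TERM'] ?? '').includes('kitty')) {
    return 'kitty';
  }
  // iTerm2 sets TERM_PROGRAM=iTerm.app.
  if (env['TERM_PROGRAM'] === 'iTerm.app') {
    return 'iterm2';
  }
  // Anything else: let the renderer fall back to text-based output.
  return 'unknown';
}
```

In a Node process this would be called as `guessGraphicsProtocol(process.env)`; an `'unknown'` result would map to the half-block/Braille/ASCII fallbacks.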

Known Limitations & Issues:

  1. Not all terminals support inline images - This is a fundamental limitation. Terminals without graphics support will show ASCII/text fallbacks.
  2. Flickering - There is a known flickering issue during image rendering. The cause is not yet known and needs investigation.
  3. Image sizing - Currently fixed at 128x128 terminal cells.

Changes required in nanobanana extension (not in this PR):

To test this feature, the following modifications were made to the nanobanana MCP server locally:

  1. types.ts - Added imageData and imageMimeType fields to ImageGenerationResponse
  2. imageGenerator.ts - Capture first image base64 data during generation
  3. index.ts - Include image content block in MCP response:
    ```typescript
    content.push({
      type: 'image',
      data: result.imageData,
      mimeType: result.imageMimeType || 'image/png',
    });
    ```

Demo

2025-11-27_13-37-40.mov

Related Issues

How to Validate

Prerequisites:

  • A terminal that supports inline images (tested on iTerm2 only)
  • nanobanana extension with image data modifications (see above)

Steps:

  1. Build the packages: npm run build
  2. Run gemini CLI in iTerm2
  3. Use nanobanana to generate an image: /nanobanana generate image of a sunset
  4. Observe the image displayed inline in the terminal

Expected Results:

  • Image should render inline in the terminal output
  • On unsupported terminals, ASCII/text fallback should display

Success Criteria for Future Development

Before this can be considered for merge:

  1. Resolve or document flickering issue
  2. Test on multiple terminals (Kitty, WezTerm, VS Code integrated terminal)
  3. Add configuration option to enable/disable image display
  4. Consider adding protocol selection setting for users
  5. Document terminal compatibility matrix
  6. Upstream nanobanana extension changes (if desired)

Pre-Merge Checklist

⚠️ POC - Not all items completed

  • Updated relevant documentation and README (if needed)
  • Added/updated tests (if needed)
  • Noted breaking changes (if any)
  • Validated on required platforms/methods:
    • MacOS
      • npm run (iTerm2 only)
      • npx
      • Docker
      • Podman
      • Seatbelt
    • Windows
      • npm run
      • npx
      • Docker
    • Linux
      • npm run
      • npx
      • Docker

@ali-aljufairi ali-aljufairi requested review from a team as code owners November 27, 2025 04:52
@gemini-code-assist

Summary of Changes

Hello @ali-aljufairi, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces an experimental feature to render images directly in the terminal output of the CLI. By integrating the ink-picture library and modifying the tool result display, the CLI can now interpret and show image data returned by extensions, providing a richer interactive experience. This is currently a Proof of Concept, with identified limitations and a clear path for future development.

Highlights

  • Terminal Image Display: Added the capability to display images directly within the terminal when MCP tools return image data, enhancing the CLI experience for extensions like nanobanana.
  • New Component and Types: Introduced an ImageDisplay React component for rendering images and defined ImageData and ImageResult types to handle image data from tools.
  • MCP Tool Integration: Updated the MCP tool handler to detect and process image blocks from extension responses, converting them into ImageResult objects for display.
  • Cross-Terminal Support: Leverages the ink-picture library to support various terminal graphics protocols (Kitty, iTerm2, Sixel) with fallbacks to half-block/Braille/ASCII for wider compatibility.

@ali-aljufairi ali-aljufairi marked this pull request as draft November 27, 2025 04:53

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces an exciting proof-of-concept for displaying images directly in the terminal. The implementation using ink-picture is a great start. I've identified a few areas for improvement to enhance robustness and correctness before this feature is ready for merging. My main concerns are around handling of mixed-content responses from tools, which could lead to information loss in the UI, and a bug in the new ImageDisplay component where a prop is not being used. Addressing these points will make the feature more reliable.

Comment on lines +31 to +74
export const ImageDisplay: React.FC<ImageDisplayProps> = ({
  image,
  width,
  height,
}) => {
  const { base64, filePath, mimeType, alt } = image;

  // Determine the image source
  let src: string | undefined;

  if (base64) {
    // Convert base64 to data URL
    const mime = mimeType || 'image/png';
    src = `data:${mime};base64,${base64}`;
  } else if (filePath) {
    src = filePath;
  }

  if (!src) {
    return (
      <Box>
        <Text color="yellow">[Image: Unable to display - no valid source]</Text>
      </Box>
    );
  }

  return (
    <TerminalInfoProvider>
      <Box flexDirection="column">
        <Image
          src={src}
          width={width}
          height={height}
          alt={alt || 'Generated image'}
        />
        {alt && (
          <Box marginTop={1}>
            <Text dimColor>{alt}</Text>
          </Box>
        )}
      </Box>
    </TerminalInfoProvider>
  );
};

high

The preserveAspectRatio prop is defined in ImageDisplayProps but is not being destructured from the props, and therefore not passed to the underlying Image component. This makes the prop non-functional, as it will always use the default behavior of ink-picture.

export const ImageDisplay: React.FC<ImageDisplayProps> = ({
  image,
  width,
  height,
  preserveAspectRatio,
}) => {
  const { base64, filePath, mimeType, alt } = image;

  // Determine the image source
  let src: string | undefined;

  if (base64) {
    // Convert base64 to data URL
    const mime = mimeType || 'image/png';
    src = `data:${mime};base64,${base64}`;
  } else if (filePath) {
    src = filePath;
  }

  if (!src) {
    return (
      <Box>
        <Text color="yellow">[Image: Unable to display - no valid source]</Text>
      </Box>
    );
  }

  return (
    <TerminalInfoProvider>
      <Box flexDirection="column">
        <Image
          src={src}
          width={width}
          height={height}
          preserveAspectRatio={preserveAspectRatio}
          alt={alt || 'Generated image'}
        />
        {alt && (
          <Box marginTop={1}>
            <Text dimColor>{alt}</Text>
          </Box>
        )}
      </Box>
    </TerminalInfoProvider>
  );
};

Comment on lines +402 to +425
if (imageBlock) {
  // Collect any text content
  const textParts = mcpContent
    .filter((block): block is McpTextBlock => block.type === 'text')
    .map((block) => block.text)
    .join('\n');

  // Try to extract file path from text content (e.g., "• /path/to/image.png")
  const filePathMatch = textParts.match(
    /[\-*]\s*([^\n]+\.(png|jpg|jpeg|gif|webp))/i,
  );
  const filePath = filePathMatch ? filePathMatch[1].trim() : undefined;

  const imageResult: ImageResult = {
    image: {
      // Prefer file path over base64 for better compatibility with ink-picture
      filePath,
      base64: filePath ? undefined : imageBlock.data,
      mimeType: imageBlock.mimeType,
      alt: textParts || undefined,
    },
  };
  return imageResult;
}

high

The current logic for displaying images finds the first image block and immediately returns, causing any other non-text content blocks in the response to be ignored by the UI. This can lead to information loss if a tool returns an image alongside other important content like resource links.

To prevent this, consider processing the image only if it's the sole non-text content type. If other content types are present, you could fall back to the existing string-based display for all blocks to ensure no information is hidden from the user.
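One way to express that guard is sketched below. It assumes a simplified content-block union; `McpOtherBlock` and `shouldRenderAsImage` are hypothetical names, and the real block types in the codebase may carry more fields. The sketch also treats multiple image blocks as "mixed" content, since the current handler would only show the first one.

```typescript
interface McpTextBlock {
  type: 'text';
  text: string;
}

interface McpImageBlock {
  type: 'image';
  data: string;
  mimeType: string;
}

// Stand-in for every other block kind (assumed, simplified).
interface McpOtherBlock {
  type: 'resource_link' | 'resource' | 'audio';
  [key: string]: unknown;
}

type McpContentBlock = McpTextBlock | McpImageBlock | McpOtherBlock;

// Render inline only when exactly one image is the sole non-text block;
// otherwise the caller should fall back to the string-based display.
function shouldRenderAsImage(content: McpContentBlock[]): boolean {
  const nonText = content.filter((block) => block.type !== 'text');
  return (
    nonText.length === 1 && nonText.every((block) => block.type === 'image')
  );
}
```

The handler would check `shouldRenderAsImage(mcpContent)` before building an ImageResult and take the existing text path when it returns false, so resource links and similar blocks stay visible.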

Comment on lines +409 to +413
  // Try to extract file path from text content (e.g., "• /path/to/image.png")
  const filePathMatch = textParts.match(
    /[\-*]\s*([^\n]+\.(png|jpg|jpeg|gif|webp))/i,
  );
  const filePath = filePathMatch ? filePathMatch[1].trim() : undefined;

high

Extracting the file path from the text content using a regular expression is brittle as it relies on a specific text format. If the tool's text output for file paths changes, this logic will break.

A more robust approach would be for the MCP tool to provide the file path in a structured field within the image content block itself, avoiding the need for string parsing. While the server-side changes might be out of scope for this PR, this approach should be considered for a more resilient implementation.
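A minimal sketch of the structured alternative follows. The optional `filePath` field on the image block is hypothetical: it is not part of the block shape shown in this PR and would require the server-side change mentioned above. `toImageData` is likewise an illustrative name.

```typescript
interface McpImageBlock {
  type: 'image';
  data: string;
  mimeType: string;
  // Hypothetical structured field supplied by the MCP server.
  filePath?: string;
}

interface ImageData {
  base64?: string;
  filePath?: string;
  mimeType?: string;
  alt?: string;
}

// Reads the path from the block itself instead of regex-parsing the text.
// Keeps the PR's existing preference: file path first, base64 otherwise.
function toImageData(block: McpImageBlock, altText?: string): ImageData {
  return {
    filePath: block.filePath,
    base64: block.filePath ? undefined : block.data,
    mimeType: block.mimeType,
    alt: altText || undefined,
  };
}
```

With this shape, a change in how the tool formats its text output no longer affects image display, because the text is used only as the caption.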
