(EN/CN) Allow Multiple API Keys for Custom OCR (Gemini API) and Combined OCR and Translation in a Single Gemini API Request #1406

darthalex2014 · 2025-02-17T15:31:16Z


Feature Request 1: Allow Multiple API Keys for Custom OCR (Gemini API)

Summary: Currently, when configuring a custom OCR using the Gemini API, only a single API key can be entered, whereas the custom Translator configuration accepts multiple API keys. This feature request asks for the ability to add multiple API keys to the custom OCR (Gemini API) configuration for redundancy, load balancing, or cost management.

Motivation: Adding multiple API keys would improve reliability and potentially reduce costs by allowing for automatic failover or load distribution across different keys.

Proposed Solution: Implement for custom OCR (Gemini API) the same multiple-API-key functionality that is currently available for the custom Translator.
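A minimal sketch of how multi-key support for the custom OCR could work, assuming the keys are kept as a list, handed out round-robin for load distribution, and tried in turn when a request fails (failover). `ApiKeyPool` and `ExecuteWithFailoverAsync` are hypothetical names, not part of the existing OCR or Translator code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not part of the existing code): a pool of Gemini API keys that hands
// out keys round-robin and fails over to the next key when a request throws.
public sealed class ApiKeyPool
{
    private readonly string[] _keys;
    private int _counter = -1; // Incremented before use, so the first key returned is _keys[0]

    public ApiKeyPool(IEnumerable<string> keys)
    {
        // Skip blank entries (e.g. an empty line in a multi-line settings field) and trim the rest
        _keys = keys.Where(k => !string.IsNullOrWhiteSpace(k)).Select(k => k.Trim()).ToArray();
    }

    public int Count => _keys.Length;

    // Thread-safe round-robin index; the uint cast keeps the result non-negative after overflow
    private int NextIndex() =>
        (int)((uint)Interlocked.Increment(ref _counter) % (uint)_keys.Length);

    // Next key in round-robin order, or null when no keys are configured
    public string? NextKey() => _keys.Length == 0 ? null : _keys[NextIndex()];

    // Runs the request with one key; if it throws (e.g. quota exceeded or invalid key),
    // retries with the following keys until one succeeds or every key has been tried once.
    public async Task<T> ExecuteWithFailoverAsync<T>(Func<string, Task<T>> request)
    {
        if (_keys.Length == 0)
            throw new InvalidOperationException("No API keys configured.");

        int start = NextIndex();
        var errors = new List<Exception>();
        for (int attempt = 0; attempt < _keys.Length; attempt++)
        {
            string key = _keys[(start + attempt) % _keys.Length];
            try
            {
                return await request(key);
            }
            catch (Exception ex)
            {
                errors.Add(ex); // Remember the failure and move on to the next key
            }
        }
        throw new AggregateException("All configured API keys failed.", errors);
    }
}
```

Failover only triggers when the request delegate throws. A method like `TranslateImageGeminiAsync` in the reply below, which returns error messages as strings, would need to throw on HTTP errors instead for the next key to be tried.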

Feature Request 2: Combined OCR and Translation in a Single Gemini API Request

Summary: Currently, performing OCR and translation requires two separate Gemini API requests: one for OCR and one for translation. This feature request asks for the ability to send an image to the Gemini API and receive the translated text in a single request.

Motivation: Combining OCR and translation into a single request would reduce latency, simplify the workflow, and potentially reduce API usage costs.

Proposed Solution: Implement functionality that lets users send an image to the Gemini API and receive the translated text as a single response. This could involve adding a new API endpoint or extending the existing translation endpoint to accept image input; Gemini's generateContent method already accepts an image and a text prompt in the same request.


darthalex2014 (Author) · 2025-02-17T15:41:17Z

Example: Translating Images with the Gemini API (C#)

This example shows how to send an image to the Gemini API and receive the translated text in a single request using C#. It focuses on the core logic of making the API request and handling the response.

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

// These are members of a translator class that also has a _geminiConfiguration field
// providing GetRandomApiKey(), GeminiModelName and GeminiPromptText.

// Share one HttpClient across requests; creating a new one per call can exhaust sockets.
private static readonly HttpClient SharedClient = new HttpClient();

public async Task<string> TranslateImageGeminiAsync(byte[] imageBytes)
{
    string apiKey = _geminiConfiguration.GetRandomApiKey(); // Pick one of the configured Gemini API keys
    if (string.IsNullOrEmpty(apiKey))
    {
        return "No Gemini API keys configured in settings.";
    }

    // Gemini REST endpoint for the configured model
    var apiUrl = $"https://generativelanguage.googleapis.com/v1beta/models/{_geminiConfiguration.GeminiModelName}:generateContent";

    var base64Image = Convert.ToBase64String(imageBytes);

    // One request carries both the image and the translation prompt
    var requestBody = new
    {
        contents = new[]
        {
            new
            {
                parts = new object[]
                {
                    new
                    {
                        inlineData = new
                        {
                            mimeType = "image/png", // Must match the actual encoding of imageBytes
                            data = base64Image
                        }
                    },
                    // Prompt; note that the target language (Russian) is hard-coded
                    new { text = $"{_geminiConfiguration.GeminiPromptText} image to Russian." }
                }
            }
        },
        safetySettings = GetSafetySettings() // Disable safety filters (use with caution!)
    };

    using var request = new HttpRequestMessage(HttpMethod.Post, apiUrl)
    {
        Content = new StringContent(JsonConvert.SerializeObject(requestBody), Encoding.UTF8, "application/json")
    };
    // Send the key in the x-goog-api-key header instead of the ?key= query parameter,
    // so it does not appear in URLs or request logs
    request.Headers.Add("x-goog-api-key", apiKey);

    try
    {
        using var response = await SharedClient.SendAsync(request);
        response.EnsureSuccessStatusCode();

        var jsonResponse = await response.Content.ReadAsStringAsync();
        JObject parsedResponse = JObject.Parse(jsonResponse);

        // Text of the first part of the first candidate
        var translatedText = parsedResponse["candidates"]?[0]?["content"]?["parts"]?[0]?["text"]?.ToString();

        return translatedText ?? "Image translation failed: Gemini response format error.";
    }
    catch (Exception ex)
    {
        return $"Image translation error: {ex.Message}";
    }
}

// Example of disabling safety settings (use with caution): BLOCK_NONE for every harm category
private static object GetSafetySettings()
{
    return new[]
    {
        new { category = "HARM_CATEGORY_HARASSMENT", threshold = "BLOCK_NONE" },
        new { category = "HARM_CATEGORY_HATE_SPEECH", threshold = "BLOCK_NONE" },
        new { category = "HARM_CATEGORY_SEXUALLY_EXPLICIT", threshold = "BLOCK_NONE" },
        new { category = "HARM_CATEGORY_DANGEROUS_CONTENT", threshold = "BLOCK_NONE" },
        new { category = "HARM_CATEGORY_CIVIC_INTEGRITY", threshold = "BLOCK_NONE" }
    };
}
```

Key Points:

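TranslateImageGeminiAsync: takes the image as a byte array, Base64-encodes it, sends it together with the translation prompt in a single generateContent request, and returns the text of the first candidate (or an error message string).
GetSafetySettings: sets the threshold to BLOCK_NONE for each harm category, which disables Gemini's safety filtering for those categories.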
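The example reads only `parts[0]` of the first candidate and reports any other shape as a format error. A slightly more defensive sketch, assuming the documented `generateContent` response shape (`candidates[].content.parts[].text`, `candidates[].finishReason`, `promptFeedback.blockReason`), joins all text parts and surfaces the reason when no text comes back. `GeminiResponse.ExtractCandidateText` is a hypothetical helper, and it uses System.Text.Json instead of Newtonsoft.Json so it needs no extra package.

```csharp
using System.Text;
using System.Text.Json;

public static class GeminiResponse
{
    // Hypothetical helper: returns the concatenated text of all parts of the first candidate,
    // or null if there is none. failureReason receives promptFeedback.blockReason when the
    // prompt was blocked, or the candidate's finishReason (e.g. "SAFETY") when it has no text.
    public static string? ExtractCandidateText(string json, out string? failureReason)
    {
        failureReason = null;
        using JsonDocument doc = JsonDocument.Parse(json);
        JsonElement root = doc.RootElement;
        if (root.ValueKind != JsonValueKind.Object)
            return null;

        if (root.TryGetProperty("promptFeedback", out JsonElement feedback) &&
            feedback.ValueKind == JsonValueKind.Object &&
            feedback.TryGetProperty("blockReason", out JsonElement blockReason))
        {
            failureReason = blockReason.GetString();
        }

        if (!root.TryGetProperty("candidates", out JsonElement candidates) ||
            candidates.ValueKind != JsonValueKind.Array ||
            candidates.GetArrayLength() == 0)
        {
            return null;
        }

        JsonElement first = candidates[0];
        var text = new StringBuilder();
        if (first.TryGetProperty("content", out JsonElement content) &&
            content.TryGetProperty("parts", out JsonElement parts) &&
            parts.ValueKind == JsonValueKind.Array)
        {
            foreach (JsonElement part in parts.EnumerateArray())
            {
                // Skip parts without text (e.g. inline data)
                if (part.TryGetProperty("text", out JsonElement partText))
                    text.Append(partText.GetString());
            }
        }

        if (text.Length > 0)
            return text.ToString();

        // No text: report why the candidate stopped, unless a block reason was already found
        if (first.TryGetProperty("finishReason", out JsonElement finishReason))
            failureReason ??= finishReason.GetString();
        return null;
    }
}
```

A caller could then return `failureReason` to the user (for example "SAFETY") instead of a generic format error.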
