
MartialTerran/Project_OpenScreenAI._Unlocking_Chrome-s-On-Device-AI_for_Python


Project_OpenScreenAI._Unlocking_Chrome-s-On-Device-AI_for_Python

This repository is dedicated to the reverse-engineering and implementation of a Python interface for Google Chrome's internal Screen AI library. Our goal is to enable Python scripts to directly access and utilize the powerful on-device models Chrome uses for OCR and main content extraction.

The Discovery: A Powerful On-Device AI

Within Chrome's user data directory (%LOCALAPPDATA%\Google\Chrome\User Data\screen_ai\), a powerful, private AI library exists. According to its own README.md file:

Chrome Screen AI Library

Purpose

Chrome Screen AI library provides two on-device functionalities for Chrome and ChromeOS:

  • Main Content Extraction: Intelligently isolates the main content of a web page...
  • Optical Character Recognition: Extracts text from image.

These functionalities are entirely on device and do not send any data to network or store on disk.

Our investigation has confirmed that these models use the TensorFlow Lite (.tflite) format, meaning they are highly optimized for local, efficient inference. This presents a unique opportunity to leverage a sophisticated, multi-language OCR and layout analysis engine directly from our own tools.
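As a quick sanity check before digging deeper, you can confirm that a copied file really is a TensorFlow Lite model: TFLite models are FlatBuffers, and the format places the four-byte file identifier "TFL3" at byte offset 4. A minimal check:

```python
def looks_like_tflite(path):
    """Return True if the file carries the TFLite FlatBuffer identifier.

    FlatBuffer files store a 4-byte file identifier at offset 4;
    for TensorFlow Lite models it is the ASCII string 'TFL3'.
    """
    with open(path, "rb") as f:
        header = f.read(8)
    return len(header) == 8 and header[4:8] == b"TFL3"
```

This only verifies the container format, not that the model is loadable; the inspection script further below does the real test.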

The Challenge: It's More Than Just a Model File

While we can locate and copy the model files (like screen2x_model.tflite and the various gocr_...tflite files), they are not usable out-of-the-box. The core challenge of this project lies in reconstructing the logic that surrounds the model calls.

This work is broken down into two main parts:

1. Reconstructing Input Preprocessing

The model doesn't just "see" an image. It expects a precisely formatted numerical tensor. To make it work, we need to figure out:

  • Precise Dimensions: What exact height and width (e.g., 224x224, 320x320) does the model require?
  • Color Channel Order: Does the model expect pixel data in RGB (Red, Green, Blue) or BGR (Blue, Green, Red) order?
  • Normalization: How are pixel values (typically 0-255) scaled? Is it to a [0, 1] range or a [-1, 1] range? What are the exact normalization constants?

This preprocessing logic is currently embedded within Chrome's compiled C++ source code and is not publicly documented.
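To make these unknowns concrete, here is a sketch of what such a preprocessing stage typically looks like. Every constant in it (the 320x320 target size, RGB channel order, the [-1, 1] normalization range, and the nearest-neighbour resize) is a placeholder assumption to be replaced once the real values are recovered from Chromium's code:

```python
import numpy as np

# Placeholder values -- the real ones must be recovered from Chromium's C++ code.
TARGET_H, TARGET_W = 320, 320
USE_BGR = False
NORM_RANGE = (-1.0, 1.0)

def preprocess(image_u8):
    """Convert an HxWx3 uint8 RGB array into a 1xTARGET_HxTARGET_Wx3 float32 tensor."""
    h, w, _ = image_u8.shape
    # Nearest-neighbour resize via index arithmetic (a stand-in for the real resampling).
    rows = np.arange(TARGET_H) * h // TARGET_H
    cols = np.arange(TARGET_W) * w // TARGET_W
    resized = image_u8[rows][:, cols]
    if USE_BGR:
        resized = resized[..., ::-1]  # flip RGB -> BGR if the model expects it
    lo, hi = NORM_RANGE
    scaled = resized.astype(np.float32) / 255.0 * (hi - lo) + lo
    # Add the batch dimension most TFLite vision models expect.
    return scaled[np.newaxis, ...]
```

The input shape and dtype reported by the inspection script below will pin down the target size and batch layout; the channel order and normalization constants can only come from the source code or experimentation.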

2. Reconstructing Output Postprocessing

The model's output is not human-readable text. It is a raw tensor of per-character or per-token scores (logits). To get a useful result, we must reverse-engineer the decoding pipeline. This involves:

  • Decoding Logits: Understanding the algorithm that converts the tensor of scores into the most likely sequence of characters.
  • Using Auxiliary Files: The model directory contains crucial helper files:
    • .binarypb files that likely define the overall processing graph.
    • .fst (Finite State Transducer) and _lm (Language Model) files that help the model make more accurate predictions based on language context.
    • .syms and _label_map.pb files that map the model's numerical output to actual characters.

This decoding logic is also internal to Chrome's source code.
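As a working hypothesis, the decoder may resemble a standard greedy CTC-style pass: take the argmax symbol at each timestep, collapse consecutive repeats, drop the blank symbol, and map IDs to characters via the .syms table. Everything below is an assumption, including the blank being ID 0 and the .syms file using OpenFst's "symbol id" line format:

```python
import numpy as np

BLANK_ID = 0  # assumption: the CTC blank symbol has index 0

def load_syms(path):
    """Parse an OpenFst-style .syms file: one '<symbol> <id>' pair per line (format assumed)."""
    table = {}
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) == 2:
                table[int(parts[1])] = parts[0]
    return table

def greedy_ctc_decode(logits, id_to_char):
    """logits: TxV array of per-timestep scores; returns the collapsed character string."""
    ids = np.argmax(logits, axis=-1)
    out, prev = [], None
    for i in ids:
        i = int(i)
        if i != prev and i != BLANK_ID:
            out.append(id_to_char.get(i, "?"))
        prev = i
    return "".join(out)
```

The .fst and _lm files suggest the real pipeline rescores candidate sequences with a language model rather than taking a pure greedy path, so this sketch is at best the first stage of the decoder.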

Our Mission and How You Can Help

Our mission is to build an open-source Python library that reimplements this preprocessing and postprocessing logic, allowing any developer to use Chrome's powerful Screen AI models in their own applications.

We are looking for collaborators with the following skills:

  • TensorFlow Lite Experts: Anyone experienced with the tflite-runtime interpreter in Python.
  • C++ and Chromium Source Code Sleuths: The answers we need are likely hidden in the Chromium source code. If you are comfortable reading C++, you can help us find the exact functions responsible for the pre/post-processing pipelines.
  • OCR and Language Model Specialists: Expertise in how OCR pipelines, language models, and Finite State Transducers (.fst files) work would be invaluable for reconstructing the output decoder.
  • Reverse Engineering Enthusiasts: If you love taking things apart to see how they work, this is the project for you.

Getting Started

  1. Find the Model Files: Locate the screen_ai folder in your Chrome user data directory and copy the latest version folder to this repository.
  2. Inspect the Model Signature: Use the Python script below to inspect the .tflite files and see their expected input/output shapes. This is our starting point.
  3. Dive into the Source Code: Start exploring the public Chromium source code at source.chromium.org. Search for terms like "ScreenAI," "OCR," "Aksara," and the model filenames.
  4. Share Your Findings: Open an issue or start a discussion to share what you've learned!
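For step 1, a small helper can locate the screen_ai folder and list its per-version subdirectories. The default path below matches the Windows location quoted earlier; on other platforms the Chrome user data directory differs, so the root is parameterized:

```python
import os

def find_screen_ai_versions(user_data_root=None):
    """Return (screen_ai_dir, sorted list of version subfolder names).

    Defaults to the Windows Chrome profile location; pass user_data_root
    explicitly on other platforms.
    """
    if user_data_root is None:
        user_data_root = os.path.join(
            os.environ.get("LOCALAPPDATA", ""), "Google", "Chrome", "User Data"
        )
    screen_ai_dir = os.path.join(user_data_root, "screen_ai")
    if not os.path.isdir(screen_ai_dir):
        return screen_ai_dir, []
    versions = sorted(
        name for name in os.listdir(screen_ai_dir)
        if os.path.isdir(os.path.join(screen_ai_dir, name))
    )
    return screen_ai_dir, versions
```

The highest-sorting version folder is normally the latest; copy its contents rather than moving them, so Chrome's own installation stays intact.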

Python Script to Inspect a .tflite Model

Save this as inspect_model.py and run it with the path to a copied .tflite file as its argument.

# Prefer the lightweight tflite-runtime package; fall back to full TensorFlow if absent.
try:
    import tflite_runtime.interpreter as tflite
except ImportError:
    import tensorflow.lite as tflite
import os
import sys

def inspect_tflite_model(model_path: str):
    if not os.path.exists(model_path):
        print(f"Error: Model file not found at '{model_path}'")
        return

    try:
        interpreter = tflite.Interpreter(model_path=model_path)
        interpreter.allocate_tensors()
        input_details = interpreter.get_input_details()
        output_details = interpreter.get_output_details()

        print(f"--- Inspection Results for: {model_path} ---\n")
        print("--- Input Details ---")
        for detail in input_details:
            print(f"  Name: {detail['name']}, Shape: {detail['shape']}, Data Type: {detail['dtype']}")
        
        print("\n--- Output Details ---")
        for detail in output_details:
            print(f"  Name: {detail['name']}, Shape: {detail['shape']}, Data Type: {detail['dtype']}")

    except Exception as e:
        print(f"An error occurred: {e}")

if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("Usage: python inspect_model.py <path_to_model.tflite>")
    else:
        inspect_tflite_model(sys.argv[1])

Disclaimer: This project is for experimental and educational purposes. The internal structure of the Screen AI library is undocumented and may change with any Chrome update, which could break this project.
