This repository is dedicated to the reverse-engineering and implementation of a Python interface for Google Chrome's internal Screen AI library. Our goal is to enable Python scripts to directly access and utilize the powerful on-device models Chrome uses for OCR and main content extraction.
Within Chrome's user data directory (%LOCALAPPDATA%\Google\Chrome\User Data\screen_ai\), a powerful, private AI library exists. According to its own README.md file:
> Chrome Screen AI library provides two on-device functionalities for Chrome and ChromeOS:
>
> - Main Content Extraction: Intelligently isolates the main content of a web page...
> - Optical Character Recognition: Extracts text from image.
>
> These functionalities are entirely on device and do not send any data to network or store on disk.
Our investigation has confirmed that these models use the TensorFlow Lite (.tflite) format, meaning they are highly optimized for local, efficient inference. This presents a unique opportunity to leverage a sophisticated, multi-language OCR and layout analysis engine directly from our own tools.
While we can locate and copy the model files (like screen2x_model.tflite and the various gocr_...tflite files), they are not usable out-of-the-box. The core challenge of this project lies in reconstructing the logic that surrounds the model calls.
This work is broken down into two main parts:
The model doesn't just "see" an image. It expects a precisely formatted numerical tensor. To make it work, we need to figure out:
- Precise Dimensions: What exact height and width (e.g., 224x224, 320x320) does the model require?
- Color Channel Order: Does the model expect pixel data in `RGB` (Red, Green, Blue) or `BGR` (Blue, Green, Red) order?
- Normalization: How are pixel values (typically 0-255) scaled? Is it to a `[0, 1]` range or a `[-1, 1]` range? What are the exact normalization constants?
This preprocessing logic is currently embedded within Chrome's compiled C++ source code and is not publicly documented.
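Until those answers are recovered from Chromium, the preprocessing step can only be sketched. The snippet below shows the general shape of such a pipeline; the 320x320 target size, RGB channel order, and `[0, 1]` normalization range are placeholders, not confirmed values:

```python
import numpy as np


def preprocess(image: np.ndarray, size=(320, 320), scale_to=(0.0, 1.0)) -> np.ndarray:
    """Resize an HxWx3 uint8 image and scale its pixel values.

    NOTE: the target size, the assumed RGB channel order, and the
    [0, 1] output range are guesses until the real constants are
    recovered from Chrome's compiled preprocessing code.
    """
    h, w = image.shape[:2]
    th, tw = size
    # Nearest-neighbour resize using integer index maps (no external deps).
    rows = np.arange(th) * h // th
    cols = np.arange(tw) * w // tw
    resized = image[rows][:, cols]
    # Scale 0-255 pixel values into the [lo, hi] range.
    lo, hi = scale_to
    tensor = resized.astype(np.float32) / 255.0 * (hi - lo) + lo
    # Add a batch dimension -> (1, th, tw, 3), the usual TFLite layout.
    return tensor[np.newaxis, ...]
```

Once the real dimensions and normalization constants are found, they can be dropped into the `size` and `scale_to` parameters.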
The model's output is not human-readable text. It's a raw tensor of probabilities (logits) for characters or tokens. To get a useful result, we must reverse-engineer the decoding pipeline. This involves:
- Decoding Logits: Understanding the algorithm that converts the tensor of probabilities into the most likely sequence of characters.
- Using Auxiliary Files: The model directory contains crucial helper files:
  - `.binarypb` files that likely define the overall processing graph.
  - `.fst` (Finite State Transducer) and `_lm` (Language Model) files that help the model make more accurate predictions based on language context.
  - `.syms` and `_label_map.pb` files that map the model's numerical output to actual characters.
This decoding logic is also internal to Chrome's source code.
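As a starting hypothesis, OCR models of this kind are often decoded with greedy (best-path) CTC: take the argmax symbol at each time step, collapse repeats, and drop the blank token. Whether Screen AI actually uses CTC, and which index is the blank, are assumptions to be verified against the `.syms` / `_label_map.pb` files:

```python
import numpy as np


def greedy_ctc_decode(logits: np.ndarray, symbols: list, blank: int = 0) -> str:
    """Greedy CTC decode of a (T, num_symbols) logits tensor.

    ASSUMPTIONS: the model emits per-timestep symbol scores, uses a
    CTC-style blank token at index `blank`, and `symbols` mirrors the
    mapping stored in the .syms / _label_map.pb files.
    """
    # Most likely symbol index at each time step.
    best = logits.argmax(axis=-1)
    # Collapse consecutive repeats (CTC merges repeated emissions).
    collapsed = [int(s) for i, s in enumerate(best) if i == 0 or s != best[i - 1]]
    # Drop blanks and map the remaining indices to characters.
    return "".join(symbols[s] for s in collapsed if s != blank)
```

A language model or FST rescoring pass would sit on top of this, re-ranking candidate sequences; the greedy path is just the simplest baseline to test against.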
Our mission is to build an open-source Python library that reimplements this preprocessing and postprocessing logic, allowing any developer to use Chrome's powerful Screen AI models in their own applications.
We are looking for collaborators with the following skills:
- TensorFlow Lite Experts: Anyone experienced with the `tflite-runtime` interpreter in Python.
- C++ and Chromium Source Code Sleuths: The answers we need are likely hidden in the Chromium source code. If you are comfortable reading C++, you can help us find the exact functions responsible for the pre/post-processing pipelines.
- OCR and Language Model Specialists: Expertise in how OCR pipelines, language models, and Finite State Transducers (`.fst` files) work would be invaluable for reconstructing the output decoder.
- Reverse Engineering Enthusiasts: If you love taking things apart to see how they work, this is the project for you.
- Find the Model Files: Locate the `screen_ai` folder in your Chrome user data directory and copy the latest version folder to this repository.
- Inspect the Model Signature: Use the Python script below to inspect the `.tflite` files and see their expected input/output shapes. This is our starting point.
- Dive into the Source Code: Start exploring the Chromium source code link above. Search for terms like "ScreenAI," "OCR," "Aksara," and the model filenames.
- Share Your Findings: Open an issue or start a discussion to share what you've learned!
Save this as `inspect_model.py` and run it with the path to a copied `.tflite` file.

```python
import os
import sys

import tflite_runtime.interpreter as tflite


def inspect_tflite_model(model_path: str):
    if not os.path.exists(model_path):
        print(f"Error: Model file not found at '{model_path}'")
        return
    try:
        interpreter = tflite.Interpreter(model_path=model_path)
        interpreter.allocate_tensors()

        input_details = interpreter.get_input_details()
        output_details = interpreter.get_output_details()

        print(f"--- Inspection Results for: {model_path} ---\n")
        print("--- Input Details ---")
        for detail in input_details:
            print(f"  Name: {detail['name']}, Shape: {detail['shape']}, Data Type: {detail['dtype']}")

        print("\n--- Output Details ---")
        for detail in output_details:
            print(f"  Name: {detail['name']}, Shape: {detail['shape']}, Data Type: {detail['dtype']}")
    except Exception as e:
        print(f"An error occurred: {e}")


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("Usage: python inspect_model.py <path_to_model.tflite>")
    else:
        inspect_tflite_model(sys.argv[1])  # pass the path argument, not the whole argv list
```
Disclaimer: This project is for experimental and educational purposes. The internal structure of the Screen AI library is undocumented and may change with any Chrome update, which could break this project.