Description of the bug:
The `GenerativeServiceClient` and `GenerativeModel` classes don't document their thread-safety assumptions, and they don't appear to be usable from multiple threads for making concurrent API requests.
I'd suggest either:
- documenting thread safety assumptions and guarantees, or
- investigating behaviour when a client is shared between threads
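
Until the guarantees are documented, one conservative workaround is to serialize all calls to the shared client through a single lock. Here is a minimal sketch of that pattern; `call_api` below is a hypothetical stand-in for the real SDK call, not part of the library:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_lock = threading.Lock()

def call_api(prompt):
    # Hypothetical stand-in for model.generate_content(prompt).text
    return f"echo: {prompt}"

def generate_serialized(prompt):
    # Hold the lock so the shared client never sees two calls at once.
    # This trades away all parallelism, but rules out concurrency bugs.
    with _lock:
        return call_api(prompt)

with ThreadPoolExecutor(max_workers=5) as executor:
    # executor.map preserves input order in its results
    outputs = list(executor.map(generate_serialized, ["a", "b", "c"]))
```

This obviously defeats the purpose of the thread pool, but it is a useful baseline for confirming whether the failures are concurrency-related.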
Behaviour observed: after making concurrent calls to the generative text API, most calls failed with a 60 s timeout. The client never recovered; every subsequent call attempt also froze for 60 s and then timed out with an error.
Sample error output:
```
10%|▉ | 199/2047.0 [29:31<5:46:51, 11.26s/it]
HTTPConnectionPool(host='localhost', port=46423): Read timed out. (read timeout=60.0)
10%|▉ | 204/2047.0 [30:22<4:59:27, 9.75s/it]
('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
10%|█ | 209/2047.0 [31:10<6:08:26, 12.03s/it]
('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
11%|█ | 216/2047.0 [31:43<3:43:00, 7.31s/it]
('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
11%|█ | 225/2047.0 [32:48<3:52:42, 7.66s/it]
('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
11%|█▏ | 231/2047.0 [33:38<4:22:00, 8.66s/it]
('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
12%|█▏ | 245/2047.0 [35:55<6:14:28, 12.47s/it]
HTTPConnectionPool(host='localhost', port=46423): Read timed out. (read timeout=60.0)
('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
14%|█▍ | 296/2047.0 [43:38<4:30:46, 9.28s/it]
HTTPConnectionPool(host='localhost', port=46423): Read timed out. (read timeout=60.0)
```
Example snippet:
```python
# [... regular imports ...]
from concurrent.futures import ThreadPoolExecutor
import tqdm

safety_settings = ...
executor = ThreadPoolExecutor(max_workers=5)

def build_data_batch():
    ## build batches of data to process
    pass

def generate(data_batch):
    model_out = 'error'
    try:
        # this ends up failing whether the model client is
        # created freshly per-request or shared across threads
        model = genai.GenerativeModel('models/gemini-pro',
                                      safety_settings=safety_settings)
        model_out = model.generate_content(build_prompt(data_batch)).text
    except Exception as e:
        print(e)
    return model_out

all_outputs = executor.map(generate, build_data_batch())
with open('./outputs.txt', 'w') as f:
    for result in tqdm.tqdm(all_outputs, total=totalbatches):
        f.write(result)
```
Actual vs expected behavior:
Actual: all calls fail.
Expected: concurrent use should either work, or the client docs should state that clients are not thread-safe, given how common batch-inference scenarios are likely to be.
Any other information you'd like to share?
No response