Build direct interface #57
We believe there are fundamental limitations to file-based I/O, as pointed out in a comment by the developer of JuMP.jl.
Hi @metab0t! Thank you for bringing PyOptInterface to my attention! Your approach, if I understand correctly, of calling Gurobi's C API directly is very neat! Great job implementing that, as I imagine it was non-trivial to get the Python-C bindings working properly.

Pyoframe is built on Polars, a Rust-based dataframe library that follows the Apache Arrow columnar format (not NumPy's). So one issue I foresee with building on PyOptInterface is the conversion from Polars to your C++ API (which might be slow?). I think long-term this could be a good goal, as I agree file-based I/O is an inefficient way to build models. However, for now, I think file-based IO is good enough: our Polars-based writer is extremely fast (~10 s for very large models), Gurobi reads the model in just as quickly (~10 s for very large models), and we don't use files to read back the results. Incremental modification, re-solves, etc. are rather niche cases in my opinion, although they're something I'd like to support in the long term, at which point PyOptInterface might make a lot of sense.

If you have an easy way to integrate Polars dataframes with your library, do let me know, as it would be great to support 4 solvers. However, I would guess such an integration is non-trivial and such a project would need to wait. Let me know! (Also happy to set up a call to discuss.) Thanks for reaching out!!
Thanks for your explanation! I don't think it would be difficult to integrate Polars DataFrames with PyOptInterface. Variables and constraints in PyOptInterface are just lightweight Python objects, and they can be stored as an Object column in a Polars DataFrame. There is no need to store the UB, LB, or RC of variables, because they are stored internally by Gurobi and can be queried on demand. The file-based IO can be skipped because the variables and constraints have already been added to the Gurobi model when they are created; just call model.optimize(). I will give a brief example later.
```python
import polars as pl
import pyoptinterface as poi
from pyoptinterface import gurobi

model = gurobi.Model()

# Create a DataFrame; the scalar lb/ub values are broadcast to every row
df = pl.DataFrame({
    "X": [1, 2, 3],
    "Y": [4, 5, 6],
    "lb": 0.0,
    "ub": 2.0,
})


def addvar(lb, ub):
    return model.add_variable(lb=lb, ub=ub)


# Store one PyOptInterface variable per row as an Object column
df = df.with_columns(
    pl.struct(["lb", "ub"])
    .map_elements(lambda x: addvar(x["lb"], x["ub"]), return_dtype=pl.Object)
    .alias("Variable")
)

vars = df["Variable"]
model.add_linear_constraint(poi.quicksum(vars), poi.Geq, 1.0)

obj = poi.quicksum(v * v for v in vars)
model.set_objective(obj)
model.optimize()

# Query the solution values back into the DataFrame
df = df.with_columns(
    pl.col("Variable")
    .map_elements(lambda x: model.get_value(x), return_dtype=pl.Float64)
    .alias("Value")
)
print(df)
```

@staadecker This is a simple example of combining PyOptInterface and Polars to solve a QP problem.
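As a small follow-up to the point above about not storing UB/LB/RC: once the variables live in the Gurobi model, such attributes can be queried on demand. A minimal sketch continuing the example above; the attribute-query API shown here (get_variable_attribute and the VariableAttribute member names) is an assumption, not something confirmed in this thread:

```python
# Continuing the example above: query attributes on demand instead of
# storing them in the DataFrame. Only model.get_value() appears in the
# example; the attribute-query calls below are assumed names.
for v in df["Variable"]:
    value = model.get_value(v)
    lb = model.get_variable_attribute(v, poi.VariableAttribute.LowerBound)  # assumed name
    ub = model.get_variable_attribute(v, poi.VariableAttribute.UpperBound)  # assumed name
    print(value, lb, ub)
```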
Thank you @metab0t! I don't think we'd want to change the expression generation code over to this approach. In any case, I'm currently swamped with work, so I need to put this on hold.
I have played with Polars and found that its support for Python objects is not complete (pola-rs/polars#10189). The design of Pyoframe is quite neat: Constraint.lhs.data and Variable.data are compact polars.DataFrame objects that store the terms and indices, which makes a possible switch easy in the future. In general, using a DataFrame to represent multidimensional indices and their sparse combinations is a great choice. I remember the GAMS benchmark and the response from JuMP.jl, where using DataFrames.jl improved performance significantly. https://github.com/Gurobi/gurobipy-pandas is also an interesting project that uses pandas.DataFrame as a container for optimization.
@metab0t thank you, I'm glad you like it :) Before building the library I actually tried to do something simple like gurobipy-pandas with Polars, but because Python objects are not fully supported, I couldn't store Gurobi Python expressions in a dataframe the way gurobipy-pandas does.
Support for Python objects does not seem to be a priority for Polars; otherwise, an API similar to gurobipy-pandas would be easy to implement (storing persistent variable/constraint objects in a DataFrame directly). Besides, the expression system of PyOptInterface is quite fast at constructing expressions with many terms; the core is implemented with an efficient hashmap in C++. I prepared an example based on the facility_problem in the Pyoframe repo, along with the timing results from my computer, at https://gist.github.com/metab0t/c3c685a8b2ec1f14171772bd7bc7ea3e
Very neat comparison. Do you have a breakdown of where the time is being spent in Pyoframe (expression building vs. IO)?
I have updated my gist to report the timing of Pyoframe in detail, broken down into expression building, writing the LP file, and reading the LP file.
Very neat. This confirms that file IO is not ideal and that, when I have time, I should build a direct interface, perhaps using PyOptInterface. For context, expressions are stored in a "narrow" format where each row is a term, with one column for the term's coefficient and another with an ID indicating the variable (see the sketch below). Would that be something easily converted to your API? I'm thinking it is at that level that I'd want to pass things off to C.
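To make the "narrow" format concrete, here is a rough sketch of such a table in Polars; the column names are illustrative, not Pyoframe's actual schema:

```python
import polars as pl

# One row per term: a coefficient column and a variable-ID column.
# (Column names are illustrative, not Pyoframe's real schema.)
expr = pl.DataFrame({
    "variable_id": [0, 1, 2],
    "coefficient": [2.0, -1.0, 0.5],
})

# The expression 2*x0 - 1*x1 + 0.5*x2 is then just two contiguous columns,
# which could be handed to a lower-level API as NumPy arrays without a
# Python-level loop.
coefficients = expr["coefficient"].to_numpy()
variable_ids = expr["variable_id"].to_numpy()
print(coefficients, variable_ids)
```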
The file-based IO also requires a lot of code (i.e., all of io.py and io_mappers.py), so it would be great to get rid of it (we could always use Gurobi to generate the .lp file for inspection).
That representation of expressions is fine. In fact, a variable in PyOptInterface is a thin wrapper around its ID, and a linear expression is two vectors representing the coefficients and the indices of the variables. Storing variable objects (from PyOptInterface or gurobipy) directly in Polars is not recommended because Polars supports the Object dtype poorly. You can build ONE big array that stores all the variables in the model, with the variable ID pointing into that array. When you want to add a constraint to the model, just traverse the rows and construct the expression object; see the sketch below. By the way, PyOptInterface supports writing the model to LP/MPS files as well; we use the native C API provided by Gurobi, so the output should be identical to gurobipy's.
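A minimal sketch of that suggestion, under illustrative assumptions (the narrow column names and the plain Python list used as the lookup array are not Pyoframe's actual internals):

```python
import polars as pl
import pyoptinterface as poi
from pyoptinterface import gurobi

model = gurobi.Model()

# One big array (here a plain Python list) holds every variable object;
# DataFrames store only integer positions into it, never the objects.
all_vars = [model.add_variable(lb=0.0) for _ in range(3)]

# A constraint's left-hand side in narrow form (illustrative schema).
lhs = pl.DataFrame({
    "variable_id": [0, 1, 2],
    "coefficient": [1.0, 2.0, 3.0],
})

# Traverse the rows once, look up each variable by ID, and build the
# expression object to hand to the model.
expr = poi.quicksum(
    coef * all_vars[var_id]
    for var_id, coef in zip(lhs["variable_id"], lhs["coefficient"])
)
model.add_linear_constraint(expr, poi.Geq, 1.0)
```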
Just for my future reference, I re-ran @metab0t's benchmark script with our new version (faster writing) and got the following:
Note weirdly if
I think I could build on top of PyOptInterface at the io_mapper stage, by simply creating a solver that creates a variable and constraint mapping where the IDs are taken from PyOptInterface.
You mean replacing writing/reading the LP file with using PyOptInterface to create the model?
Precisely!
It is a minimal-effort approach to supporting multiple solvers. I see that performance could be accelerated further if PyOptInterface could construct linear expressions directly from two NumPy arrays (coefficients and variables), because Pyoframe has already formulated them as columns of tables. If you find this feature useful, I can implement it in the C++ core of PyOptInterface.
Yes, that would be very useful! A function to do that in your library would be key if it doesn't already exist. By the way, Polars uses PyArrow under the hood, not NumPy, and I think PyArrow is generally considered the modern successor to NumPy. Any plans to use PyArrow instead?
A PyArrow array can be passed without copying, either as a NumPy array or via the DLPack protocol. The proposed API could be named and shaped along the lines sketched below.
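As a rough illustration of what such an array-based entry point could look like from Pyoframe's side; the function name add_linear_constraint_from_arrays and its signature are hypothetical, not an existing PyOptInterface API:

```python
import pyarrow as pa

# Coefficients and variable indices as Arrow arrays (as Pyoframe's Polars
# columns would expose them).
coefficients = pa.array([1.0, 2.0, 3.0])
variable_indices = pa.array([0, 1, 2])

# Numeric Arrow arrays expose zero-copy NumPy views, so a C++ core could
# consume them without copying.
coefs_np = coefficients.to_numpy(zero_copy_only=True)
idx_np = variable_indices.to_numpy(zero_copy_only=True)

# Hypothetical API sketch -- neither the name nor the signature exists today:
# model.add_linear_constraint_from_arrays(coefs_np, idx_np, poi.Geq, 1.0)
print(coefs_np, idx_np)
```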
@metab0t sounds good to me!
@metab0t, after some further investigation, this will be very difficult if not impossible. It turns out that to modify variable attributes through your API I need the VariableIndex object, not just the variable index integer. Since Polars doesn't support storing objects, I'm not too sure what to do. I could of course store the objects in a Python list or dictionary, but from experience I feel like that won't be very performant...
The VariableIndex is just a thin wrapper around the integer index. Now that Pyoframe has its own abstraction for constructing expressions, you still want to modify variable attributes via their indices. I think there are two solutions; the first is to construct the VariableIndex wrapper on the fly from the raw integer index whenever you need it.
I prefer the first approach, because ordinary users of PyOptInterface will only use the VariableIndex objects returned by the model.
@metab0t this is helpful! I hadn't realized the IDs are monotonic from 0. Definitely your suggestion 1 is the one that makes more sense. I'll still need to think about this a bit more, since it is possible that some Pyoframe variable IDs are skipped (e.g., depending on what the user runs). In any case, I'll generate the wrapper on the fly, thanks!
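A minimal sketch of "generating the wrapper on the fly", assuming poi.VariableIndex can be constructed directly from a raw integer index, that variables expose that index via an index attribute, and that attributes are modified via set_variable_attribute; all three names are assumptions based on this discussion, not verified here:

```python
import pyoptinterface as poi
from pyoptinterface import gurobi

model = gurobi.Model()
x = model.add_variable(lb=0.0)

# Keep only the raw integer index (e.g. in a Polars column)...
raw_index = x.index  # assumed attribute exposing the underlying integer ID

# ...and rebuild the lightweight wrapper only when an attribute must be
# modified. Assumes VariableIndex is constructible from the raw index and
# that the attribute API looks like this.
wrapper = poi.VariableIndex(raw_index)
model.set_variable_attribute(wrapper, poi.VariableAttribute.LowerBound, 1.0)
```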
How does Pyoframe assign the indices of variables? I see that a user can create free variables and attach them as attributes of the model. For example, after creating a 2-element Variable, what indices do its elements get?
@metab0t The index is incremented every time a variable is created. (For multi-dimensional variables it's effectively the same.)
Got it, so all variables share one global index counter. Most modeling frameworks adopt a model-binding design, where a variable/constraint is bound to a specific model and cannot be used in other model instances. The other design (like Pyoframe's) treats variables/constraints as freestanding entities that can be used interchangeably between models, so there is inevitably a translation layer from the object index to the internal model index, even without introducing PyOptInterface.
@metab0t I've made a lot of progress on integrating this; however, I've just noticed that one operation I need has no equivalent in PyOptInterface.
I just pushed a new commit to allow it: metab0t/PyOptInterface@ae311fe. I assume that Pyoframe mainly deals with linear constraints, so the new functionality should cover what you need. You can apply this patch locally to use it temporarily before the upcoming major release.
A line that queries the dual value of a constraint raises an error, and I couldn't figure out why; it seems something goes wrong at some point.
Getting the dual of a constraint has been fixed in metab0t/PyOptInterface@cdc20ac
@staadecker These fixes have been included in the latest 0.3.0 release of PyOptInterface.
Hello, @staadecker!
I am the author of PyOptInterface, an efficient modeling interface for mathematical optimization in Python. Its performance is quite competitive compared with existing solutions (faster than the vendors' own Python bindings for some optimizers).
As a power systems researcher, I deeply understand the need to construct large-scale optimization models efficiently. PyOptInterface provides a user-friendly, expression-based API to formulate problems with different optimizers, and the lightweight handles of variables and constraints can be stored freely in its built-in multidimensional container, NumPy ndarrays, or Python dataframes (as you like).
I think that PyOptInterface might be a good abstraction layer for your package to handle different optimizers. You can try out our package and evaluate its performance (both memory consumption and speed) if you are interested, and we welcome feedback and suggestions in any form.