Build direct interface #57
We believe there are fundamental limitations to file-based I/O, as pointed out in a comment by the developer of JuMP.jl.
Hi @metab0t! Thank you for bringing PyOptInterface to my attention! Your approach, if I understand correctly, of calling Gurobi's C API directly is very neat! Great job implementing that, as I imagine it was non-trivial to get the Python-C bindings working properly.

Pyoframe is built on Polars, a Rust-based dataframe library that follows the Apache Arrow columnar format (not NumPy's). So one issue I foresee with building on PyOptInterface is the conversion from Polars to your C++ API (which might be slow?). I think long-term this could be a good goal, as I agree file-based I/O is an inefficient way to build models. However, for now, I think file-based IO is good enough: our Polars-based writer is extremely fast (~10 s for very large models), Gurobi reads the model in just as quickly (~10 s for very large models), and we don't use files to read back the results. Incremental modification, re-solves, etc. are rather niche cases in my opinion, although they're something I'd like to support in the long term, at which point PyOptInterface might make a lot of sense.

If you have an easy way to integrate Polars dataframes with your library, do let me know, as it would be great to support 4 solvers. However, I would guess such an integration is non-trivial and such a project would need to wait. Let me know! (Also happy to set up a call to discuss.) Thanks for reaching out!!
Thanks for your explanation! I don't think it would be difficult to integrate Polars DataFrames with PyOptInterface. Variables and constraints in PyOptInterface are just lightweight Python objects, and they can be stored as an Object column in a Polars DataFrame. There is no need to store the UB, LB, or RC of variables, because they are stored internally by Gurobi and can be queried on demand. The file-based IO can be skipped because the variables and constraints have already been added to the Gurobi model when they are created; just call model.optimize(). I will give a brief example later.
```python
import polars as pl
import pyoptinterface as poi
from pyoptinterface import gurobi

model = gurobi.Model()

# Create a DataFrame; the scalar lb/ub values are broadcast to every row
df = pl.DataFrame({
    "X": [1, 2, 3],
    "Y": [4, 5, 6],
    "lb": 0.0,
    "ub": 2.0,
})


def addvar(lb, ub):
    return model.add_variable(lb=lb, ub=ub)


# Store one PyOptInterface variable per row as an Object column
df = df.with_columns(
    pl.struct(["lb", "ub"])
    .map_elements(lambda x: addvar(x["lb"], x["ub"]), return_dtype=pl.Object)
    .alias("Variable")
)

vars = df["Variable"]
model.add_linear_constraint(poi.quicksum(vars), poi.Geq, 1.0)

obj = poi.quicksum(v * v for v in vars)
model.set_objective(obj)
model.optimize()

# Query the solution values back into the DataFrame
df = df.with_columns(
    pl.col("Variable")
    .map_elements(lambda x: model.get_value(x), return_dtype=pl.Float64)
    .alias("Value")
)
print(df)
```

@staadecker This is a simple example of combining PyOptInterface and Polars to solve a QP problem.
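As a small follow-up to the point above about not storing UB/LB/RC: once the variables live in the Gurobi model, such attributes can be queried on demand. A minimal sketch continuing the example above; the attribute-query API shown here (get_variable_attribute and the VariableAttribute member names) is an assumption, not something confirmed in this thread:

```python
# Continuing the example above: query attributes on demand instead of
# storing them in the DataFrame. Only model.get_value() appears in the
# example; the attribute-query calls below are assumed names.
for v in df["Variable"]:
    value = model.get_value(v)
    lb = model.get_variable_attribute(v, poi.VariableAttribute.LowerBound)  # assumed name
    ub = model.get_variable_attribute(v, poi.VariableAttribute.UpperBound)  # assumed name
    print(value, lb, ub)
```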
Thank you @metab0t! I don't think we'd want to change the expression generation code over to this approach. In any case, I'm currently swamped with work, so I need to put this on hold.
I have played with Polars and found that its support for Python objects is not complete (pola-rs/polars#10189). The design of Pyoframe is quite neat: Constraint.lhs.data and Variable.data are compact polars.DataFrame objects that store the terms and indices, which makes a possible switch easy in the future. In general, using a DataFrame to represent multidimensional indices and their sparse combinations is a great choice. I remember the GAMS benchmark and the response from JuMP.jl, where using DataFrames.jl improved performance significantly. https://github.com/Gurobi/gurobipy-pandas is also an interesting project that uses pandas.DataFrame as a container for optimization.
@metab0t thank you, I'm glad you like it :) Before building the library I actually tried to do something simple like gurobipy-pandas with Polars, but because Python objects are not fully supported, I couldn't store Gurobi Python expressions in a dataframe the way gurobipy-pandas does.
Support for Python objects does not seem to be a priority for Polars; otherwise, an API similar to gurobipy-pandas would be easy to implement (storing persistent variable/constraint objects in a DataFrame directly). Besides, the expression system of PyOptInterface is quite fast at constructing expressions with many terms; the core is implemented with an efficient hashmap in C++. I prepared an example based on the facility_problem in the Pyoframe repo, along with the timing results from my computer, at https://gist.github.com/metab0t/c3c685a8b2ec1f14171772bd7bc7ea3e
Very neat comparison. Do you have a breakdown of where the time is being spent in Pyoframe (expression building vs. IO)?
I have updated my gist to report the timing of Pyoframe in detail, broken down into expression building, writing the LP file, and reading the LP file.
Very neat. This confirms that file IO is not ideal and that, when I have time, I should build a direct interface, perhaps using PyOptInterface. For context, expressions are stored in a "narrow" format where each row is a term, with one column for the term's coefficient and another with an ID indicating the variable (see the sketch below). Would that be something easily converted to your API? I'm thinking it is at that level that I'd want to pass things off to C.
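To make the "narrow" format concrete, here is a rough sketch of such a table in Polars; the column names are illustrative, not Pyoframe's actual schema:

```python
import polars as pl

# One row per term: a coefficient column and a variable-ID column.
# (Column names are illustrative, not Pyoframe's real schema.)
expr = pl.DataFrame({
    "variable_id": [0, 1, 2],
    "coefficient": [2.0, -1.0, 0.5],
})

# The expression 2*x0 - 1*x1 + 0.5*x2 is then just two contiguous columns,
# which could be handed to a lower-level API as NumPy arrays without a
# Python-level loop.
coefficients = expr["coefficient"].to_numpy()
variable_ids = expr["variable_id"].to_numpy()
print(coefficients, variable_ids)
```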
The file-based IO also requires a lot of code (i.e., all of io.py and io_mappers.py), so it would be great to get rid of it (we could always use Gurobi to generate the .lp file for inspection).
That representation of expressions is fine. In fact, a variable in PyOptInterface is a thin wrapper around its ID, and a linear expression is two vectors representing the coefficients and the indices of the variables. Storing variable objects (from PyOptInterface or gurobipy) directly in Polars is not recommended because Polars supports the Object dtype poorly. You can build ONE big array that stores all the variables in the model, with the variable ID pointing into that array. When you want to add a constraint to the model, just traverse the rows and construct the expression object; see the sketch below. By the way, PyOptInterface supports writing the model to LP/MPS files as well; we use the native C API provided by Gurobi, so the output should be identical to gurobipy's.
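A minimal sketch of that suggestion, under illustrative assumptions (the narrow column names and the plain Python list used as the lookup array are not Pyoframe's actual internals):

```python
import polars as pl
import pyoptinterface as poi
from pyoptinterface import gurobi

model = gurobi.Model()

# One big array (here a plain Python list) holds every variable object;
# DataFrames store only integer positions into it, never the objects.
all_vars = [model.add_variable(lb=0.0) for _ in range(3)]

# A constraint's left-hand side in narrow form (illustrative schema).
lhs = pl.DataFrame({
    "variable_id": [0, 1, 2],
    "coefficient": [1.0, 2.0, 3.0],
})

# Traverse the rows once, look up each variable by ID, and build the
# expression object to hand to the model.
expr = poi.quicksum(
    coef * all_vars[var_id]
    for var_id, coef in zip(lhs["variable_id"], lhs["coefficient"])
)
model.add_linear_constraint(expr, poi.Geq, 1.0)
```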
Just for my future reference, I re-ran @metab0t's benchmark script with our new version (faster writing) and got the following:
Note weirdly if
I think I could build on top of PyOptInterface at the io_mapper stage, by simply creating a solver that creates a variable and constraint mapping where the IDs are taken from PyOptInterface.
You mean replacing writing/reading the LP file with using PyOptInterface to create the model?
Precisely!
It is a minimal-effort approach to supporting multiple solvers. I see that performance could be accelerated further if PyOptInterface could construct linear expressions directly from two NumPy arrays (coefficients and variables), because Pyoframe has already formulated them as columns of tables. If you find this feature useful, I can implement it in the C++ core of PyOptInterface.
Yes, that would be very useful! A function to do that in your library would be key if it doesn't already exist. By the way, Polars uses PyArrow under the hood, not NumPy, and I think PyArrow is generally considered the modern successor to NumPy. Any plans to use PyArrow instead?
A PyArrow array can be passed without copying, either as a NumPy array or via the DLPack protocol. The proposed API could be named and shaped along the lines sketched below.
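As a rough illustration of what such an array-based entry point could look like from Pyoframe's side; the function name add_linear_constraint_from_arrays and its signature are hypothetical, not an existing PyOptInterface API:

```python
import pyarrow as pa

# Coefficients and variable indices as Arrow arrays (as Pyoframe's Polars
# columns would expose them).
coefficients = pa.array([1.0, 2.0, 3.0])
variable_indices = pa.array([0, 1, 2])

# Numeric Arrow arrays expose zero-copy NumPy views, so a C++ core could
# consume them without copying.
coefs_np = coefficients.to_numpy(zero_copy_only=True)
idx_np = variable_indices.to_numpy(zero_copy_only=True)

# Hypothetical API sketch -- neither the name nor the signature exists today:
# model.add_linear_constraint_from_arrays(coefs_np, idx_np, poi.Geq, 1.0)
print(coefs_np, idx_np)
```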
@metab0t sounds good to me!
@metab0t, after some further investigation, this will be very difficult if not impossible. It turns out that to modify variable attributes through your API I need the VariableIndex object, not just the variable index integer. Since Polars doesn't support storing objects, I'm not too sure what to do. I could of course store the objects in a Python list or dictionary, but from experience I feel like that won't be very performant...
The VariableIndex is just a thin wrapper around the integer index. Now that Pyoframe has its own abstraction for constructing expressions, you still want to modify variable attributes via their indices. I think there are two solutions; the first is to construct the VariableIndex wrapper on the fly from the raw integer index whenever you need it.
I prefer the first approach, because ordinary users of PyOptInterface will only use the VariableIndex objects returned by the model.
@metab0t this is helpful! I hadn't realized the IDs are monotonic from 0. Definitely your suggestion 1 is the one that makes more sense. I'll still need to think about this a bit more, since it is possible that some Pyoframe variable IDs are skipped (e.g., depending on what the user runs). In any case, I'll generate the wrapper on the fly, thanks!
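A minimal sketch of "generating the wrapper on the fly", assuming poi.VariableIndex can be constructed directly from a raw integer index, that variables expose that index via an index attribute, and that attributes are modified via set_variable_attribute; all three names are assumptions based on this discussion, not verified here:

```python
import pyoptinterface as poi
from pyoptinterface import gurobi

model = gurobi.Model()
x = model.add_variable(lb=0.0)

# Keep only the raw integer index (e.g. in a Polars column)...
raw_index = x.index  # assumed attribute exposing the underlying integer ID

# ...and rebuild the lightweight wrapper only when an attribute must be
# modified. Assumes VariableIndex is constructible from the raw index and
# that the attribute API looks like this.
wrapper = poi.VariableIndex(raw_index)
model.set_variable_attribute(wrapper, poi.VariableAttribute.LowerBound, 1.0)
```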
How does Pyoframe assign the indices of variables? I see that a user can create free variables and attach them as attributes of the model. For example, after creating a 2-element Variable, what indices do its elements get?
@metab0t The index is incremented every time a variable is created. (For multi-dimensional variables it's effectively the same.)
Got it, so all variables share one global index counter. Most modeling frameworks adopt a model-binding design, where a variable/constraint is bound to a specific model and cannot be used in other model instances. The other design (like Pyoframe's) treats variables/constraints as freestanding entities that can be used interchangeably between models, so there is inevitably a translation layer from the object index to the internal model index, even without introducing PyOptInterface.
@metab0t I've made a lot of progress on integrating this; however, I've just noticed that one operation I need has no equivalent in PyOptInterface.
I just pushed a new commit to allow it: metab0t/PyOptInterface@ae311fe. I assume that Pyoframe mainly deals with linear constraints, so the new functionality should cover what you need. You can apply this patch locally to use it temporarily before the upcoming major release.
A line that queries the dual value of a constraint raises an error, and I couldn't figure out why; it seems something goes wrong at some point.
Getting the dual of a constraint has been fixed in metab0t/PyOptInterface@cdc20ac
@staadecker These fixes have been included in the latest 0.3.0 release of PyOptInterface.
Hello, @staadecker!
I am the author of PyOptInterface, an efficient modeling interface for mathematical optimization in Python. Its performance is quite competitive compared with existing solutions (faster than the vendors' own Python bindings for some optimizers).
As a power systems researcher, I deeply understand the need to construct large-scale optimization models efficiently. PyOptInterface provides a user-friendly, expression-based API to formulate problems with different optimizers, and the lightweight handles of variables and constraints can be stored freely in its built-in multidimensional container, NumPy ndarrays, or Python dataframes (as you like).
I think that PyOptInterface might be a good abstraction layer for your package to handle different optimizers. You can try out our package and evaluate its performance (both memory consumption and speed) if you are interested, and we welcome feedback and suggestions in any form.