-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Enhancement] Matlab-like type system #21
Conversation
…labClassWrapper) + [Enh] better logic in __new__
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for implementing all these changes @balbasty -- I think this is definitely a big step forward for our type system!
I left a few comments in the review, in particular about attaching helper functions as static class methods to the class they belong to, to keep a clear namespace for people doing from spm import *
. I think the main implementations (e.g., for converting a cell to a num array) should be attached to the class it belong (e.g., Array.from_cell(...)
or Cell.as_array()
), with the possibility of adding helper wrappers for users (e.g., num2cell
) that just rely on this implementation and that can be imported selectively (e.g., from spm.helpers import *
). This would keep ease of use for Matlab users but allow for more Python-like syntaxes, e.g.
my_cell.as_array().as_type(dtype=np.uint8).reshape((4,3))
The second point is that we need to test that this type systems actually interfaces from and to Matlab without issues. The tests do not test for that yet. A simple check is to construct an identity function and test for equality:
idt = Runtime.call('eval', '@(x) x')
assert Runtime.call(idt, s) == s
I think we need to get this into the tests before approving the PR.
Other than this, everything looks good, thank you for this :)
spm/__wrapper__.py
Outdated
# ---------------------------------------------------------------------- | ||
|
||
|
||
def cell(*iterable): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest to overload object constructors instead of implementing new types helper. One reason for that is that it links constructors to the objects to which they belong. As most of spm functions have the format spm_xxx
, one would do from spm import *
for convenience, and should be able to use most of these helpers function without having a cluttered namespace.
Alternatives would be:
- several possible syntaxes in
__init__
, e.g.,__init__(self, shape_or_iterable, ...)
- static methods to implement specific constructors, we could have
Cell.from_iterable(iterable)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Current constructors do indeed follow the __init__(self, shape_or_iterable, ...)
syntax. However, it leaves room to ill-definite cases:
Cell([1, 2, 3])
generates a cell-array of size1x2x3
so, how to create a cell that contains1, 2, 3
-- i.e.{1, 2, 3}
in matlab?Cell([[3, 4], [5, 6]])
generates a cell-array of size2x2
that contains the values3, 4, 5, 6
, so how to create a cell of cells -- i.e.{{3, 4}, {5, 6}}
in matlab?
So the cell(...)
helper (which resolves these issues) should probably become the Cell.from_iterable(iterable)
class method that you propose.
Alternatively, we could have specific methods to construct cell/struct/num arrays of a given shape, like Cell.from_shape(shape)
or Cell.empty(shape)
, and always trigger a copy-construction when using the Cell(array_like)
syntax. But it deviates a bit from the matlab syntax, so I am not a fan.
Do we agree with that we should implement the Cell.from_iterable(iterable)
syntax?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a really good point !
Do we agree with that we should implement the
Cell.from_iterable(iterable)
syntax?
Yes, that's my preferred option, if there is no strong arguments against it.
spm/__wrapper__.py
Outdated
return obj | ||
|
||
|
||
def num2cell(array): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to above, I think Array.from_cell
could be preferred. Maybe we could move these explicit translation of Matlab functions in a different subpackage, e.g., from spm.cheats import *
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good!
spm/__wrapper__.py
Outdated
return asmatlab(Struct(other)) | ||
|
||
|
||
def asmatlab(other): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for clearing that out :)
I think I would still prefer to link it to some class (e.g., Runtime), to keep things ordered. In particular, this method should be hidden (e.g., _as_matlab
), as it could be used by the advanced user to debug their code but should not appear in the scope by default. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thinking was that some users may want to convert python/json structures to our types, e.g.
asmatlab([{"a": 1}, [0, 1, 2]])
returns aCell
that contains aStruct
and anArray
.
But I agree that may not be needed (especially as users can directly pass such a structure to an spm_*
function without needing the explicit conversion).
spm/__wrapper__.py
Outdated
) | ||
|
||
|
||
def _as_matlab_object(obj): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can rename this to _from_matlab
to relate to _as_matlab
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, this makes me think we could just create an abstract base class with these two abstract methods, in order to reunite MatlabClassWrapper and Matlab types you proposed.
spm/__wrapper__.py
Outdated
obj = obj.reshape(shape) | ||
return obj.tolist() | ||
return obj | ||
# TODO: what about sparse numpy arrays? Does matlab understand them? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO: what about sparse numpy arrays? Does matlab understand them?
No, it does not. Matlab does not understand all types, and the trick I have used is to label data with the Matlab type they should have.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK so if the user passes a sparse numpy array as input, we should convert it to a dict with type__="sparse"
?
Do you have a documentation about the expected layout/content of the data__
field in this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, sparse arrays are converted to dense arrays and passed in the data__
field, we could find a more efficient way to handle this in the future.
spm/__wrapper__.py
Outdated
if isinstance(other, Array): | ||
return other | ||
|
||
if isinstance(other, Cell): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, I am surprised the tests work: Matlab only supports cell vector (M, 1) or (1, N), not cell arrays (M, N). For this, I have used the "type__" field of a struct to specify their type. All of these types of translations are made here. This makes me think tests need to be updated as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there's a misunderstanding about the use of the asmatlab
function. It does not convert objects to representations that we can pass to the runtime -- I use the _as_matlab_object
for this purpose.
Instead, asmatlab
converts python objects to our type system (recursively).
Hence why this function should be renamed.
Essentially there are two functions
- One that converts python objects to "Our types"
- One that converts python objects to "Matlab-compatible types"
Matlab | Matlab-compatible types | Our types |
---|---|---|
Nx... {single, double, ...} |
{ndarray, matlab.single, matlab.double, ...} |
Array(N, ...) |
1x1 {single, double, ...} |
{double, int, complex, bool} |
Array() |
1xN cell |
{list, tuple, set} |
Cell(N) |
NxMx... cell |
dict(type__="cellarray") |
Cell(N, M, ...) |
1x1 struct |
dict |
Struct() |
Nx... struct |
dict(type__="structarray") |
Struct(N, ...) |
Is that it?
spm/__wrapper__.py
Outdated
* `a.as_cell[x,y] = any` indicates that `a` is a cell array; | ||
* `a.as_struct[x,y].f = any` indicates that `a` is a struct array; | ||
* `a.as_cell[x,y].f = any` indicates that `a` is a cell array that contains a struct; | ||
* `a.as_num[x,y] = num` indicates that `a` is a numeric array. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amazing, thank you for putting in examples :)
spm/__wrapper__.py
Outdated
# * we should hide all math functions. | ||
# * `__add__`, `__iadd__`, `__radd__` should fallback to `extend()`, | ||
# as in tuples, rather than `np.add` as in arrays. | ||
# * "scalar" cells should be forbidden (already the case at |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't agree with this: in Matlab, both {}
and {'a'}
would create cells. User should have a way to specify scalar cells, as it might appear in matlabbatch specification.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we agree. A Cell
with a zero or one elements exist, but they must have at least one dimension -- their shape must be (0,)
or (1,)
. What I call a "scalar cell" (and think should be forbidden) is a Cell
whose shape is an empty tuple()
, which could be created by doing something like Cell(1).reshape([])
.
In contrast, "scalar" Array
and Struct
are allowed:
- a "scalar array" is the same as e.g.
np.asarray(1.0)
, which is identical to the python scalar1.0
for all purposes. - a "scalar struct" is the same as a dictionary (with dot access to fields)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, I had the wrong idea of what a scalar cell is. This sounds good!
spm/__wrapper__.py
Outdated
|
||
# TODO: | ||
# I am thinking that cells should behave more like | ||
# "multidimensional tuples/lists" than "ndarray". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I totally agree with this, they should behave like multidimensional list (mutability is necessary).
spm/__wrapper__.py
Outdated
# * "scalar" cells should be forbidden (already the case at | ||
# construction, but we should also check that reshape/view do | ||
# not return scalar cells). | ||
# * maybe we should inherit from `tuple` (or `list`?). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That could be a nice idea. As you mentioned below, we have to change the insert/pop/remove
Following up on the |
Thanks for the thorough review @johmedr! I've answered to a few comments. I should have some time next week to implement your suggested changes. It would be nice to have the identity function in the bindings, indeed. Or, can we call pure matlab function through the binding? Like can I call |
Great, thank you! I'll try to update the tests asap to go pass data to the Matlab Runtime and back.
We can call |
The type system is now in close-to-final form, but needs many more unit tests. One final important point to deal: currently, I have renamed quite a few methods. @johmedr can you please go through the code and check that the API suits you? The final thing that is not renamed is |
One of the tests fails (but was not failing locally). It seems the Fixed by 5a5dbaa |
Also, I've added |
It's starting to look good. I've put a couple of questions/comments in the code, but it would be good of someone else than me tried it now :) |
Also, it seems that flat iterators on struct are infinite, I don't really now what is happening. Any idea of what's happening? |
I had the same issue with iter which I had to overload. I think it's because numpy relies on IndexError to stop its iteration. But we don't have them anymore thanks to implicit resizing 😅 |
Here is what I get: >>> Struct(1).flat # stuck
>>> [*Struct(1).flat] # stuck
>>> s = Struct()
>>> s.foo = 'bar'
>>> s.flat
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/johan/Documents/Python/spm-python/spm/__wrapper__.py", line 551, in _error_is_not_finalized
raise IndexOrKeyOrAttributeError(
spm.__wrapper__.IndexOrKeyOrAttributeError: 'This DelayedArray has not been finalized, and you are attempting to use it in a way that may break its finalization cycle.\nIt most likely means that you are indexing out-of-bounds without *setting* the out-of-bound value.\n* Correct usage: a.b(i).c = x\n* Incorrect usage: x = a.b(i).c\n'
>>> [*s.flat] # stuck |
Ha that's different. I did not include "flat" as one of the authorised methods or properties in a Struct. I tried to keep them at a minimum so that they can be used as keys instead. If you want to add flat to the list, you can add it to the class variable defined at the beginning of Struct. |
@balbasty, here are the remaining failing tests:Cells have one thing left to fix:>>> Runtime.call('cell', 1,3)
Cell([Cell([]), Cell([]), Cell([])]) should be >>> Runtime.call('cell', 1,3)
Cell([Array([]), Array([]), Array([])]) and same for Structs have one thing to fixDoing s = Struct()
s.foo = "bar"
s.bar = "baz"
s[1].baz = 42 does not initialize the missing fields in subslices: >>> list(s[1].keys())
['baz'] Any fixes in mind? |
…cts + infinite loop when accessing undefined fields
@johmedr I believe I have fixed everything. I had to change some tests (mostly replaced 1D indexing of column vectors with 2D indexing). Please check that you're fine with their behaviour. One discrepency remaining is that |
Amazing, thank you so much ! Happy with the change you made.
Yes, I think that makes sense (for consistency). |
One (last?) inconsistency: >>> Runtime.call('cell', 1, 3).shape
(3,)
>>> Runtime.call('zeros', 1, 3).shape
(1, 3)
>>> Runtime.call('struct', 'a', [1, 2, 3], 'b', [4, 5, 6]).shape
(1, 3) So in essence, a row cell is converted to 1D Should we always convert row arrays to 1D things python side? And let me paste comments from the code so that they are not forgotten:
|
That could be nice to squeeze the first axis of 2d Arrays and Structs so that indexing is consistent, e.g., >>> s = Runtime.call('struct', 'a', [1, 2, 3], 'b', [4, 5, 6])
>>> s[1].b
5 instead of >>> s[0, 1].b
5
Makes sense.
The problem is that we don't have much freedom with this, see Matlab doc here and here.
We could use
Agreed. In a first time, if you |
OK so the only thing that's blocking the merge is squeezing the first axis of 2d arrays and structs. Do you want me to do it python side (in |
I've done it python side. It's actually quite nice that |
Fantastic! I also think it's better and easier to squeeze on the Python side. |
All looks good, merging! Thanks for this huge development @balbasty! |
Implementation of #20
I am leaving this as a draft PR for now.
Todo:
The(fixed)MatlabClassWrapper
test fails on my side, but I don't think I've modified that class.