-
-
Notifications
You must be signed in to change notification settings - Fork 32.2k
bpo-34953: Implement mmap.mmap.__repr__
#9891
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hello, and thanks for your contribution! I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA). Unfortunately our records indicate you have not signed the CLA. For legal reasons we need you to sign this before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue. You can check yourself to see if the CLA has been received. Thanks again for your contribution, we look forward to reviewing it! |
mmap.mmap.__repr__
mmap.mmap.__repr__
Modules/mmapmodule.c
Outdated
|
||
reprfmt = "<%s is_closed=True fileno=%d access=%s>"; | ||
repr = PyUnicode_FromFormat(reprfmt, tp_name, fd, access_str); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for the wrapped empty lines. And most C APIs could fail and we need to check it.
Please add some tests. |
Okay, thanks. I'll remove wrapped empty lines and add some tests. |
@thautwarm Did you have questions about creating the tests for this? |
Sorry, I'll have it out before the next week. |
Sorry for my late commits. May you please review my codes @zhangyangyu when your schedule is available? |
Modules/mmapmodule.c
Outdated
break; | ||
|
||
default: | ||
PyErr_SetString(PyExc_IOError, "Unknown access mode for mmap object."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not possible to reach here IMHO. And IOError
is not a right exception type here. I think no need to raise a exception here, a simple comment // should not get here
, or an assertion assert(null)
, or access_str = "UNKNOWN"
is better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood. I'll insert an assert
here.
Modules/mmapmodule.c
Outdated
} | ||
|
||
Py_ssize_t size = m_obj->size; | ||
const char *tp_name = self->ob_type->tp_name; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could use Py_TYPE
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I'll assemble your notes and make a commit tonight.
Modules/mmapmodule.c
Outdated
|
||
repr = PyUnicode_FromFormat(reprfmt, tp_name, fileno, access_str, | ||
length, offset, slice1, slice2); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this part is necessary. IMHO we should not show the content in repr.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original proposer of this said he felt like to have these ones:
- Length
- Whether it's a file or anonymous memory
- The entire contents
- The clipped contents
- Whether it's opened or closed
- The access type
I think having entire contents is infeasible, but others seem to be reasonable.
|
Modules/mmapmodule.c
Outdated
default: | ||
// should not get here | ||
assert(0); | ||
return NULL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember we get a new Py_UNREACHABLE
macro, we could use it here, no need to return.
Modules/mmapmodule.c
Outdated
Py_ssize_t pos = m_obj->pos; | ||
|
||
PyObject *length = PyLong_FromSize_t(size); | ||
PyObject *offset = PyLong_FromSize_t(pos); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
C APIs could fail. You need to check here and if they fail, let the function returns and propagates exceptions. And PyUnicode_FromFormat
directly support format characters for these basic types, do we still need to do this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah we don't really need this. I used PyObject
for arbitrary precision, but it seems that mmap_object
supports only long integer size. Thus I use %lu
instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pos
and size
are both of type Py_ssize_t
, you don't need to care what actually precision it is for represention case, the format function will take care of it. %lu
is not the exact format here, checkout https://docs.python.org/3/c-api/unicode.html#c.PyUnicode_FromFormat.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, thanks for this!
|
||
repr = PyUnicode_FromFormat(reprfmt, tp_name, fileno, access_str, | ||
length, offset); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, no difference between these. I'd suggest make closed
a local variable and the logic could be clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a difference, that when data
is not available, getting offset
could be insecure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I don't get it. Why could it be insecure? After new_mmap_object
they are all initialized right? What I am missing here? And if they are secure, I don't think we need to distingiush them between closed and open status, just show whatever they are. One more thing, why leave out the actual offset
property and make pos
the offset
? It's confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First, I'd say sorry for offset
should be pos
.
At the first beginning I mean if mmap
gets closed, the data
of mmap_object
has already been NULL
. In this situation why should we allow uers to get a few outdated members like pos
or size
? I know people might feel like to ask a closed mmap
for the historical records so I think you're right in that case, but in fact, the tell
method and size
method just raise exception PyErr_SetString(PyExc_ValueError, "mmap closed or invalid")
:
- tell: https://github.com/python/cpython/blob/master/Modules/mmapmodule.c#L567
- size: https://github.com/python/cpython/blob/master/Modules/mmapmodule.c#L417
So there's no way to get size
and pos
from a mmap
object when it has been closed, why should we break this in just repr
method? That's also what now motivates me to still distinguish closed status from the open one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I know why I once used offset
now..
It's the argument of mmap constructor: https://github.com/python/cpython/blob/master/Modules/mmapmodule.c#L1058
Also, length
is used there instead of size
.
I think we should do more to make it consistent in both low level(C data structures) and high level(interfaces for Python).
It might be better to change size
to length
and consist using offset
via
- https://github.com/python/cpython/blob/master/Modules/mmapmodule.c#L1144
- https://github.com/python/cpython/blob/master/Modules/mmapmodule.c#L1133
The wording seems to be a thing, e.t.c, in previous codes there're already phrases like length of mmap
, file size
or memory offset
:
https://github.com/python/cpython/blob/master/Modules/mmapmodule.c#L1133-L1145
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. Some properties are meaningless when closed, we don't need them, I mean pos
. For others, I'd suggest just using the constructor arguments, length
is length
, pos
is pos
, offset
is offset
. Don't use size
please since it's confusing, it's stored internally as a caculated field but unforunately there is also a size()
method returns something different. As for fileno
, I am okay to omit it for now. Actually no pos
is also okay for me.
Modules/mmapmodule.c
Outdated
repr = PyUnicode_FromFormat(reprfmt, tp_name, fileno, access_str); | ||
} | ||
else { | ||
const char *data = m_obj->data; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need.
Modules/mmapmodule.c
Outdated
|
||
|
||
#ifdef MS_WINDOWS | ||
int fileno = m_obj->file_handle; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem right. HANDLE
on windows is a pointer. Seems no official way to stringify HANDLE
, it's okay for me to omit it on Windows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, fileno
is quite inaccurate here. However I've already found a use case that giving fileno
here as an integer will make sense, say, people might try to parse some repr string by regex, if we remove fileno
here, it can cause some incompatibility between Unix and Windows.
What deserves to be mentioned is, there's no way to get fileno
from mmap
regardless of we're in Windows or Unix, so I think it'll be appealing to remove fileno
here(although the author of this bpo-34953 has requested it and fileno
in Unix does have useful semantics).
…arify usage of 'pos', 'offset' and 'length')
Modules/mmapmodule.c
Outdated
PyObject *offset = PyLong_FromSize_t((Py_ssize_t)m_obj->offset); | ||
reprfmt = "<%s closed=False access=%s length=%R pos=%R offset=%R>"; | ||
repr = PyUnicode_FromFormat(reprfmt, tp_name, access_str, | ||
length, pos, offset); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be reprfmt = "<%s closed=False, pos=%zd, length=%zd, offset=%ld/%lld, access=%s>"
, then PyUnicode_FromFormat(reprfmt, tp_name, pos, length, offset, access_str)
. You don't need to call the PyLong_FromSize_t
yourself. More, I suggest add comma between properties. And be careful to decide which format character offset
uses, you'd better don't cast anyway, you need to decide using macros, check new_mmap_object
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, but for the ,
separator, I'm not sure yet. In fact there're both styles(<a b c>
like fileio and <a, b, c>
) in repr
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For offset
, I do want to use PyLong_FromOff_t
because off_t
type is a bit tricky in cross-platform issues.
If I include another header file for only one function, it's much too cumbersome; but if I don't, I have to do what that've been done in _iomodule.h once again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For comma it's just preference, I prefer it. There is no PyLong_FromOff_t
and don't include _iomodule.h, usually we don't include dependency between modules. Use macros to determine the right format character, refer new_mmap_object please. Don't call PyLong_From*
yourself, just use PyUnicode_FromFormat
. Check my previous comment please.
Modules/mmapmodule.c
Outdated
#endif | ||
{ | ||
reprfmt = "<%s closed=True access=%s>"; | ||
repr = PyUnicode_FromFormat(reprfmt, tp_name, access_str); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even for closed one, I think offset
and length
could still be displayed, they are passed in by users and not gonna be changed. We could tell more info then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact they could get changed: https://github.com/python/cpython/blob/master/Modules/mmapmodule.c#L557
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't matter. Once it's closed, you can't do more operations on it. Just show what they are when they are closed.
Modules/mmapmodule.c
Outdated
@@ -712,6 +766,7 @@ static struct PyMethodDef mmap_object_methods[] = { | |||
{"write_byte", (PyCFunction) mmap_write_byte_method, METH_VARARGS}, | |||
{"__enter__", (PyCFunction) mmap__enter__method, METH_NOARGS}, | |||
{"__exit__", (PyCFunction) mmap__exit__method, METH_VARARGS}, | |||
{"__repr__", (PyCFunction) mmap__repr__method, METH_NOARGS}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this being added here? tp_repr
should be sufficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Just got it after doing some search. Will remove this.
Add pretty
__repr__
tommap.mmap
and its subtypes.If the contents are too long, only the first
5032 and the last5032 would be presented:https://bugs.python.org/issue34953