Skip to content

bpo-34953: Implement mmap.mmap.__repr__ #9891

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 17 commits into from
Oct 17, 2019
Merged

Conversation

thautwarm
Copy link
Contributor

@thautwarm thautwarm commented Oct 15, 2018

Add pretty __repr__ to mmap.mmap and its subtypes.

import mmap
f = open('../a.txt', 'r+b')
mm = mmap.mmap(f.fileno(), 0)
print(mm) # =><mmap.mmap is_closed=False fileno=4 access=ACCESS_DEFAULT length=4 offset=0 entire_contents=b'\x00\xf0\x93S'>
mm.close()
print(mm) # => <mmap.mmap is_closed=True fileno=-1 access=ACCESS_DEFAULT>

If the contents are too long, only the first 50 32 and the last 50 32 would be presented:

... content=b'This is my' ... b'file.'>

https://bugs.python.org/issue34953

@the-knights-who-say-ni
Copy link

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA).

Unfortunately our records indicate you have not signed the CLA. For legal reasons we need you to sign this before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

You can check yourself to see if the CLA has been received.

Thanks again for your contribution, we look forward to reviewing it!

@thautwarm thautwarm changed the title bpo-34953 bpo-34953 Implement mmap.mmap.__repr__ Oct 15, 2018
@zhangyangyu zhangyangyu changed the title bpo-34953 Implement mmap.mmap.__repr__ bpo-34953: Implement mmap.mmap.__repr__ Oct 16, 2018
@zhangyangyu zhangyangyu self-requested a review October 16, 2018 02:32

reprfmt = "<%s is_closed=True fileno=%d access=%s>";
repr = PyUnicode_FromFormat(reprfmt, tp_name, fd, access_str);

Copy link
Member

@zhangyangyu zhangyangyu Oct 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for the wrapped empty lines. And most C APIs could fail and we need to check it.

@zhangyangyu
Copy link
Member

Please add some tests.

@thautwarm
Copy link
Contributor Author

Okay, thanks. I'll remove wrapped empty lines and add some tests.

@csabella
Copy link
Contributor

@thautwarm Did you have questions about creating the tests for this?

@thautwarm
Copy link
Contributor Author

thautwarm commented Apr 10, 2019

@thautwarm Did you have questions about creating the tests for this?

Sorry, I'll have it out before the next week.

@thautwarm
Copy link
Contributor Author

Sorry for my late commits. May you please review my codes @zhangyangyu when your schedule is available?

break;

default:
PyErr_SetString(PyExc_IOError, "Unknown access mode for mmap object.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not possible to reach here IMHO. And IOError is not a right exception type here. I think no need to raise a exception here, a simple comment // should not get here, or an assertion assert(null), or access_str = "UNKNOWN" is better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood. I'll insert an assert here.

}

Py_ssize_t size = m_obj->size;
const char *tp_name = self->ob_type->tp_name;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could use Py_TYPE.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I'll assemble your notes and make a commit tonight.


repr = PyUnicode_FromFormat(reprfmt, tp_name, fileno, access_str,
length, offset, slice1, slice2);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this part is necessary. IMHO we should not show the content in repr.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original proposer of this said he felt like to have these ones:

  • Length
  • Whether it's a file or anonymous memory
  • The entire contents
  • The clipped contents
  • Whether it's opened or closed
  • The access type

I think having entire contents is infeasible, but others seem to be reasonable.

@thautwarm
Copy link
Contributor Author

assert(false) passed my local compilation, but fail in azure pipeline.

default:
// should not get here
assert(0);
return NULL;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember we get a new Py_UNREACHABLE macro, we could use it here, no need to return.

Py_ssize_t pos = m_obj->pos;

PyObject *length = PyLong_FromSize_t(size);
PyObject *offset = PyLong_FromSize_t(pos);
Copy link
Member

@zhangyangyu zhangyangyu Apr 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

C APIs could fail. You need to check here and if they fail, let the function returns and propagates exceptions. And PyUnicode_FromFormat directly support format characters for these basic types, do we still need to do this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah we don't really need this. I used PyObject for arbitrary precision, but it seems that mmap_object supports only long integer size. Thus I use %lu instead.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pos and size are both of type Py_ssize_t, you don't need to care what actually precision it is for represention case, the format function will take care of it. %lu is not the exact format here, checkout https://docs.python.org/3/c-api/unicode.html#c.PyUnicode_FromFormat.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, thanks for this!


repr = PyUnicode_FromFormat(reprfmt, tp_name, fileno, access_str,
length, offset);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, no difference between these. I'd suggest make closed a local variable and the logic could be clearer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a difference, that when data is not available, getting offset could be insecure.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I don't get it. Why could it be insecure? After new_mmap_object they are all initialized right? What I am missing here? And if they are secure, I don't think we need to distingiush them between closed and open status, just show whatever they are. One more thing, why leave out the actual offset property and make pos the offset? It's confusing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First, I'd say sorry for offset should be pos.

At the first beginning I mean if mmap gets closed, the data of mmap_object has already been NULL. In this situation why should we allow uers to get a few outdated members like pos or size? I know people might feel like to ask a closed mmap for the historical records so I think you're right in that case, but in fact, the tell method and size method just raise exception PyErr_SetString(PyExc_ValueError, "mmap closed or invalid"):

So there's no way to get size and pos from a mmap object when it has been closed, why should we break this in just repr method? That's also what now motivates me to still distinguish closed status from the open one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I know why I once used offset now..
It's the argument of mmap constructor: https://github.com/python/cpython/blob/master/Modules/mmapmodule.c#L1058

Also, length is used there instead of size.

I think we should do more to make it consistent in both low level(C data structures) and high level(interfaces for Python).
It might be better to change size to length and consist using offset via

The wording seems to be a thing, e.t.c, in previous codes there're already phrases like length of mmap, file size or memory offset:
https://github.com/python/cpython/blob/master/Modules/mmapmodule.c#L1133-L1145

Copy link
Member

@zhangyangyu zhangyangyu Apr 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Some properties are meaningless when closed, we don't need them, I mean pos. For others, I'd suggest just using the constructor arguments, length is length, pos is pos, offset is offset. Don't use size please since it's confusing, it's stored internally as a caculated field but unforunately there is also a size() method returns something different. As for fileno, I am okay to omit it for now. Actually no pos is also okay for me.

repr = PyUnicode_FromFormat(reprfmt, tp_name, fileno, access_str);
}
else {
const char *data = m_obj->data;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need.



#ifdef MS_WINDOWS
int fileno = m_obj->file_handle;
Copy link
Member

@zhangyangyu zhangyangyu Apr 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem right. HANDLE on windows is a pointer. Seems no official way to stringify HANDLE, it's okay for me to omit it on Windows.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, fileno is quite inaccurate here. However I've already found a use case that giving fileno here as an integer will make sense, say, people might try to parse some repr string by regex, if we remove fileno here, it can cause some incompatibility between Unix and Windows.

What deserves to be mentioned is, there's no way to get fileno from mmap regardless of we're in Windows or Unix, so I think it'll be appealing to remove fileno here(although the author of this bpo-34953 has requested it and fileno in Unix does have useful semantics).

…arify usage of 'pos', 'offset' and 'length')
PyObject *offset = PyLong_FromSize_t((Py_ssize_t)m_obj->offset);
reprfmt = "<%s closed=False access=%s length=%R pos=%R offset=%R>";
repr = PyUnicode_FromFormat(reprfmt, tp_name, access_str,
length, pos, offset);
Copy link
Member

@zhangyangyu zhangyangyu May 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be reprfmt = "<%s closed=False, pos=%zd, length=%zd, offset=%ld/%lld, access=%s>", then PyUnicode_FromFormat(reprfmt, tp_name, pos, length, offset, access_str). You don't need to call the PyLong_FromSize_t yourself. More, I suggest add comma between properties. And be careful to decide which format character offset uses, you'd better don't cast anyway, you need to decide using macros, check new_mmap_object.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, but for the , separator, I'm not sure yet. In fact there're both styles(<a b c> like fileio and <a, b, c>) in repr.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For offset, I do want to use PyLong_FromOff_t because off_t type is a bit tricky in cross-platform issues.
If I include another header file for only one function, it's much too cumbersome; but if I don't, I have to do what that've been done in _iomodule.h once again.

Copy link
Member

@zhangyangyu zhangyangyu May 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For comma it's just preference, I prefer it. There is no PyLong_FromOff_t and don't include _iomodule.h, usually we don't include dependency between modules. Use macros to determine the right format character, refer new_mmap_object please. Don't call PyLong_From* yourself, just use PyUnicode_FromFormat. Check my previous comment please.

#endif
{
reprfmt = "<%s closed=True access=%s>";
repr = PyUnicode_FromFormat(reprfmt, tp_name, access_str);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even for closed one, I think offset and length could still be displayed, they are passed in by users and not gonna be changed. We could tell more info then.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't matter. Once it's closed, you can't do more operations on it. Just show what they are when they are closed.

@@ -712,6 +766,7 @@ static struct PyMethodDef mmap_object_methods[] = {
{"write_byte", (PyCFunction) mmap_write_byte_method, METH_VARARGS},
{"__enter__", (PyCFunction) mmap__enter__method, METH_NOARGS},
{"__exit__", (PyCFunction) mmap__exit__method, METH_VARARGS},
{"__repr__", (PyCFunction) mmap__repr__method, METH_NOARGS},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this being added here? tp_repr should be sufficient.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Just got it after doing some search. Will remove this.

@zhangyangyu zhangyangyu requested a review from jdemeyer October 13, 2019 17:19
@zhangyangyu zhangyangyu merged commit d8ca235 into python:master Oct 17, 2019
jacobneiltaylor pushed a commit to jacobneiltaylor/cpython that referenced this pull request Dec 5, 2019
shihai1991 pushed a commit to shihai1991/cpython that referenced this pull request Jan 31, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants