-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix unpacking of filenames with contains UTF-8 characters #52
Conversation
Pull Request Test Coverage Report for Build 618426689Warning: This coverage report may be inaccurate.This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.
Details
💛 - Coveralls |
But does it mean that user code in callbacks (for example, functions on pub fn uncompress_data<R, W>(source: R, target: W) -> Result<usize>
where
R: Read,
W: Write, I think this is not ideal? Also, does it mean that libarchive functions are called with a different locale when reading or writing archive and when opening or closing archive? |
0f0bd54
to
49f8c01
Compare
Yes, @afdw. This was the best choice I have found up to now. Do you see another way to solve this? I'd love to have a second pair of eyes on the issue and a proposed alternative. |
I think what can be done is the following: on every transition of execution to libarchive, locale is changed to UTF-8; on every transition of execution from libarchive, locale is restored. This means the following:
One the ways to guarantee both of these is creating a module with the same API as the |
In general, not all functions need to take this into consideration. I am aware of Your idea isn't bad. In fact, it's a really clever idea. We found this error while unpacking a rootfs during an OTA update so we'll first confirm this current approach works and if it does, we can rework it as you said. |
By the way, I did not find how to do the same thing on Windows. Are you aware of some way to do it? |
a76136c
to
6a22b7d
Compare
6a22b7d
to
d4ad584
Compare
Restoring locale? Yes, after reading the Microsoft's documentation on
So, I think something like that should do the trick: #[cfg(windows)]
unsafe fn set_utf8_locale() -> std::ffi::CString {
let locale = b".UTF-8\0";
let old_locale = std::ffi::CStr::from_ptr(libc::setlocale(libc::LC_CTYPE, std::ptr::null())).to_owned();
libc::setlocale(
libc::LC_CTYPE,
std::ffi::CStr::from_bytes_with_nul_unchecked(locale).as_ptr(),
);
old_locale
}
#[cfg(windows)]
unsafe fn restore_locale(old_locale: std::ffi::CString) {
libc::setlocale(libc::LC_CTYPE, old_locale.as_ptr());
} |
233a1fd
to
560d9ee
Compare
@afdw mostly, yes. However, it seems we are getting a segfault. I tried using |
f995cf6
to
d4ad584
Compare
@afdw I tried: #[cfg(windows)]
unsafe fn set_utf8_locale() -> Option<ffi::CString> {
let locale = b".UTF-8\0";
let old_locale = {
let old_locale = libc::setlocale(libc::LC_CTYPE, ptr::null());
if old_locale.is_null() {
None
} else {
Some(ffi::CStr::from_ptr(old_locale).to_owned())
}
};
libc::setlocale(
libc::LC_CTYPE,
ffi::CStr::from_bytes_with_nul_unchecked(locale).as_ptr(),
);
old_locale
}
#[cfg(windows)]
unsafe fn restore_locale(old_locale: Option<ffi::CString>) {
if let Some(old_locale) = old_locale {
libc::setlocale(libc::LC_CTYPE, old_locale.as_ptr());
}
} However, it seems to segfault. |
@innobead, could you take a look at this? As you seem to be using the library in Windows, it'd be nice if you could help. |
0ab5b16
to
6191f16
Compare
I finally got it working. The code is much simpler, and we essentially rely on Windows to do the right thing (if possible, heh). I ended using: #[cfg(windows)]
unsafe fn set_utf8_locale() -> *mut libc::c_char {
let locale = b".UTF-8\0";
let old_locale = libc::setlocale(libc::LC_CTYPE, std::ptr::null());
libc::setlocale(libc::LC_CTYPE, locale.as_ptr() as *const libc::c_char);
old_locale
}
#[cfg(windows)]
unsafe fn restore_locale(old_locale: *mut libc::c_char) {
libc::setlocale(libc::LC_CTYPE, old_locale);
} We need to confirm this work on the embedded device before the merge. It was a long journey, but it seems we got it right after all. I want to thank @afdw for his nice idea of wrapping the FFI method and his help in figuring how to do the locale setting in Windows; really appreciated. |
6191f16
to
ee476e5
Compare
The reason why I did not write it initially like that was:
So it seems to me that But I am not sure if I understand the documentation correctly here. |
For me, Were you getting a segfault on |
So the code (#52 (comment)) is the closest to the correct approach I've tested.
Just on What puzzles me is it is flaky as 1 of 3 Windows runs has complete successfully, so it points to some concurrency or other indeterministic issue. |
Do you have a backtrace? If not, I will probably need to install a Windows VM to test it myself, maybe I will be able to say something (as I can not get any code version to run this test case on Wine, probably due to some unrelated issue). Another thing which is I think worth trying is wrapping every method in change-to-UTF-8/restore locale code, not just |
I don't. It is not shown in the CI job.
Indeed; this is worth checking. |
cda039d
to
cc89ccd
Compare
cc89ccd
to
83e54d3
Compare
@Jonathas-Conceicao if CI passes, please invert the order of patches so it updates the container version first. |
Signed-off-by: Jonathas-Conceicao <jonathas.conceicao@ossystems.com.br>
This commit is a preparation to a bigger change to wrapper few `ffi` calls. We are adding the generated bindings inside an inner `ffi::generated` module and re-exporting it as `ffi` so requiring no code changes to use it. This also change the symbols visibility to be pub(crate) as we do not intent people to call the `ffi` module directly. Signed-off-by: Otavio Salvador <otavio@ossystems.com.br>
Change from the C to C.UTF-8 locale, allowing libarchive to handle filenames in UTF-8. We restrict to change LC_CTYPE only, since libarchive only needs the charset set. See on libarchive Website for a more complete description of the issue: libarchive/libarchive#587 https://github.com/libarchive/libarchive/wiki/Filenames Once we complete the uncompress operation, we restore the original LC_CTYPE after extraction to avoid side effects. Signed-off-by: Otavio Salvador <otavio@ossystems.com.br>
f388ee1
to
aac7dbf
Compare
Change from the C to C.UTF-8 locale, allowing libarchive to handle
filenames in UTF-8. We restrict to change LC_CTYPE only, since
libarchive only needs the charset set.
See on libarchive Website for a more complete description of the issue:
libarchive/libarchive#587
https://github.com/libarchive/libarchive/wiki/Filenames
Once we complete the uncompress operation, we restore the original
LC_CTYPE after extraction to avoid side effects.
Signed-off-by: Otavio Salvador otavio@ossystems.com.br