-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Mach-O file format support #617
Conversation
Test cases for this module are failing when compiled for 32bit with -m32. Check this: https://travis-ci.org/VirusTotal/yara/jobs/208363567. Could you take a look at what's happening? |
I managed to fix 32-bit build error. I will need help with rest of the build issues as I do not have neither Windows nor OS X machine. |
Conflicts: Makefile.am libyara/Makefile.am tests/blob.h
with friend's help I managed to fix rest of the build issues on Windows and OS X. Module should be ready for review. |
libyara/modules/macho.c
Outdated
if (is_undefined(module, "nfat_arch")) | ||
return_integer(UNDEFINED); | ||
|
||
for (uint64_t i = 0; i < nfat; ++i) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need to use a 64bit iterator here, the fat counter is ut32, also is better to use i++, because preincrement doesn't gives any benefit here
libyara/modules/macho.c
Outdated
int64_t type_arg = integer_argument(1); | ||
int64_t subtype_arg = integer_argument(2); | ||
|
||
uint64_t nfat = get_integer(module, "nfat_arch"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nfat is uint32_t, there's no need to use 64bit integer here
libyara/modules/macho.c
Outdated
size_t offset = yr_be32toh(archs[i].offset); | ||
if (size < offset) | ||
{ | ||
st->offsets = (size_t*)yr_realloc(st->offsets, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dont do realloc over the same pointer, use a temporary intermediate one and check if it's returning null. to avoid memory leaks
libyara/modules/macho.c
Outdated
set_integer(yr_be32toh(header->nfat_arch), object, "nfat_arch"); | ||
|
||
fat_arch_t* archs = (fat_arch_t*)(header + 1); | ||
for (size_t i = 0; i < yr_be32toh(header->nfat_arch); ++i) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dont preincrement
libyara/modules/macho.c
Outdated
set_integer(yr_be32toh(header->nfat_arch), object, "nfat_arch"); | ||
|
||
fat_arch_t* archs = (fat_arch_t*)(header + 1); | ||
for (size_t i = 0; i < yr_be32toh(header->nfat_arch); ++i) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't call yr_be32toh on every iteration, this field is not suposed to change during the loop
libyara/modules/macho.c
Outdated
|
||
// All data in Mach-O fat binary header(s) is in big-endian byte order. | ||
|
||
fat_header_t* header = (fat_header_t*)data; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use const
libyara/modules/macho.c
Outdated
YR_OBJECT* object, | ||
YR_SCAN_CONTEXT* context) | ||
{ | ||
uint8_t* magic = (uint8_t*)data; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use const
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i cant see any boundary checks, without trying this pr, it feels like this code will crash when parsing truncated files
libyara/modules/macho.c
Outdated
|
||
// Get length of string with limit (some strings are terminated by size limit). | ||
|
||
size_t macho_limited_strlen(char* string, size_t limit) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's already in libc and it's called strnlen
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but as long as it's from ISO C90, microsoft didn't had time yet to include it in their toolchains, so this strnlen will not work for mingw/msvc builds. But libc version will be more optimal, so it will be good to use it if available
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
input string is const
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replaced with strnlen - seems OK according to build tests.
libyara/modules/macho.c
Outdated
|
||
// Get length of string with limit (some strings are terminated by size limit). | ||
|
||
size_t macho_limited_strlen(char* string, size_t limit) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this function should be static, it's not used outside this .c file
|
||
// Convert virtual address to file offset. Segments have to be already loaded. | ||
|
||
int macho_rva_to_offset( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stdbool is available on all compilers, maybe it will be good to use bool
and true
/ false
here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure about this. It is not used anywhere in the project. Maybe I would prefer consistency here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's a good code practice, but consistency across the project makes sense. this change should be proposed across the whole project, so it fits better in a separate PR.
libyara/modules/macho.c
Outdated
for (uint64_t i = 0; i < segment_count; ++i) | ||
{ | ||
uint64_t start = get_integer(object, "segments[%i].vmaddr", i); | ||
uint64_t end = start + get_integer(object, "segments[%i].vmsize", i); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
end can be < start because get_integer returns a signed integer.
libyara/modules/macho.c
Outdated
) | ||
{ | ||
uint64_t segment_count = get_integer(object, "number_of_segments"); | ||
for (uint64_t i = 0; i < segment_count; ++i) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dont preincrement unless strictly necessary
libyara/modules/macho.c
Outdated
YR_OBJECT* object | ||
) | ||
{ | ||
uint64_t segment_count = get_integer(object, "number_of_segments"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
get_integer returns a signed integer, casting it blindly to unsigned can result on some unexpected behaviour
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this problem even with exact-wide types (has to be exactly wide and in 2's complement)? If so, how to go around this? Via pointers - *(uint64_t*)&value?
Because I need it to be unsigned (e.g. offset in file bigger that 2^32 - unlikely but possible).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did you tested this code with any mach0 kernel or module which use 64bit addresses to see if it behaves correctly? probably an & -1LL would remove that undefined behaviour
int64_t type = get_integer(module, "fat_arch[%i].cputype", i); | ||
if (type == type_arg) | ||
{ | ||
uint64_t file_offset = get_integer(module, "fat_arch[%i].offset", i); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
implicit sign cast
libyara/modules/macho.c
Outdated
|
||
foreach_memory_block(iterator, block) | ||
{ | ||
void* block_data = block->fetch_data(block); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fetch_data should return the amount of data left
libyara/modules/macho.c
Outdated
YR_MEMORY_BLOCK_ITERATOR* iterator = context->iterator; | ||
|
||
// Prepare storage. | ||
FAT_DATA* st = (FAT_DATA*)yr_malloc(sizeof(FAT_DATA)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check if malloc returns null
libyara/modules/macho.c
Outdated
void* file_data = (uint8_t*)block_data + offset; | ||
YR_OBJECT* object = get_object(module_object, "file[%i]", | ||
st->file_count); | ||
macho_parse_file(file_data, object, context); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
macho_parse_file knows the file_data pointer, but doesn't knows the size of it
Looks good to merge for me! cc @plusvic |
looks good to me for merging! |
@plusvic ? there are tons of prs that lgtm and are green for a long time, anything missing to merge? |
I just need to make some time for reviewing the PR, I did take a quick look
and there's something I miss, which is making the inclusion of the
"macho"module configurable at build time, just like "dotnet". The idea is
that new modules are off by default and once they are tested enough
in the wild they are enabled by default.
…On Fri, Aug 11, 2017 at 6:28 PM, Sergi Àlvarez i Capilla < ***@***.***> wrote:
@plusvic <https://github.com/plusvic> ? there are tons of prs that lgtm
and are green for a long time, anything missing to merge?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#617 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AALKmZsxGv94mjYLaR5x3nHu4_F5oBaYks5sXIExgaJpZM4MUuJ8>
.
|
i think that modules should change to be correct. there’s no sense to modify a project just to include an extension, those should be self contained in separate directories, without having to patch makefiles and they should be loaded dynamically too. it’s pretty anoying to have different forks of yara just for each seaprate module just because its not included in master or in release or just because of testing.
the only reason for not supporting dynamically loaded modules is performance, but that shouldnt be a problem if people just use statically linked modules like they do right now. but this is probably a topic to discuss in a different thread not here
… On 11 Aug 2017, at 18:39, Victor M. Alvarez ***@***.***> wrote:
I just need to make some time for reviewing the PR, I did take a quick look
and there's something I miss, which is making the inclusion of the
"macho"module configurable at build time, just like "dotnet". The idea is
that new modules are off by default with, and once they are tested enough
in the wild they are enabled by default.
On Fri, Aug 11, 2017 at 6:28 PM, Sergi Àlvarez i Capilla <
***@***.***> wrote:
> @plusvic <https://github.com/plusvic> ? there are tons of prs that lgtm
> and are green for a long time, anything missing to merge?
>
> —
> You are receiving this because you were mentioned.
> Reply to this email directly, view it on GitHub
> <#617 (comment)>, or mute
> the thread
> <https://github.com/notifications/unsubscribe-auth/AALKmZsxGv94mjYLaR5x3nHu4_F5oBaYks5sXIExgaJpZM4MUuJ8>
> .
>
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub <#617 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AA3-lqh8VgvAEVNNpzbSo2r3ugGs3Bxxks5sXIO8gaJpZM4MUuJ8>.
|
I'm not sure that dynamically loadable modules are worth the effort, at
least for the typical use case, and by dynamically loadable modules I mean
a module in compiled form that yara can load on the fly, a dynamic library
in essence.
Having the modules in separate self-contained directories would be helpful
indeed, because that way you can have your module in a separate repository
and use git submodules to include it into yara's repo.
…On Fri, Aug 11, 2017 at 6:42 PM, radare ***@***.***> wrote:
i think that modules should change to be correct. there’s no sense to
modify a project just to include an extension, those should be self
contained in separate directories, without having to patch makefiles and
they should be loaded dynamically too. it’s pretty anoying to have
different forks of yara just for each seaprate module just because its not
included in master or in release or just because of testing.
the only reason for not supporting dynamically loaded modules is
performance, but that shouldnt be a problem if people just use statically
linked modules like they do right now. but this is probably a topic to
discuss in a different thread not here
> On 11 Aug 2017, at 18:39, Victor M. Alvarez ***@***.***>
wrote:
>
> I just need to make some time for reviewing the PR, I did take a quick
look
> and there's something I miss, which is making the inclusion of the
> "macho"module configurable at build time, just like "dotnet". The idea is
> that new modules are off by default with, and once they are tested enough
> in the wild they are enabled by default.
>
> On Fri, Aug 11, 2017 at 6:28 PM, Sergi Àlvarez i Capilla <
> ***@***.***> wrote:
>
> > @plusvic <https://github.com/plusvic> ? there are tons of prs that
lgtm
> > and are green for a long time, anything missing to merge?
> >
> > —
> > You are receiving this because you were mentioned.
> > Reply to this email directly, view it on GitHub
> > <#617 (comment)>,
or mute
> > the thread
> > <https://github.com/notifications/unsubscribe-auth/
AALKmZsxGv94mjYLaR5x3nHu4_F5oBaYks5sXIExgaJpZM4MUuJ8>
> > .
> >
> —
> You are receiving this because you commented.
> Reply to this email directly, view it on GitHub <https://github.com/
VirusTotal/yara#617#issuecomment-321861133>, or mute the thread <
https://github.com/notifications/unsubscribe-auth/AA3-
lqh8VgvAEVNNpzbSo2r3ugGs3Bxxks5sXIO8gaJpZM4MUuJ8>.
>
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#617 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AALKmWOvn8uDJBbB5KCla9-lvzGQ6pxdks5sXIR_gaJpZM4MUuJ8>
.
|
looks green for travis, but there are still comments from my review that havent been fixed. and cant be merged because of conflicts, pls review. PRs shouldnt take ages to be merged :(( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added some comments. My biggest concern with this PR so far is that the parsing is done assuming the correctness of the parsed file, no efforts are made in order to make sure that it won't crash with malformed files.
libyara/modules/macho.c
Outdated
|
||
uint32_t* magic = (uint32_t*)block_data; | ||
// Parse classic Mach-O file. | ||
if (macho_is_file_block(magic)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
macho_is_file_block
dereferences a pointer to uint32_t
without checking if the buffer size is at least sizeof(uint32_t)
libyara/modules/macho.c
Outdated
\ | ||
uint64_t num_sg = 0; \ | ||
uint8_t *command = (uint8_t*)(header + 1); \ | ||
for (unsigned i = 0; i < yr_##bo##32toh(header->ncmds); i++) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No sanity checks for preventing out-of-bounds reads. If header->ncmds
is altered or the file is truncated after the header this will segfault.
libyara/modules/macho.c
Outdated
size_t offset = yr_be##bits##toh(archs[i].offset); \ | ||
if (size < offset) \ | ||
{ \ | ||
st->offsets = (size_t*)yr_realloc(st->offsets, \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yr_realloc
could return NULL.
libyara/modules/macho.c
Outdated
\ | ||
/* Get specific Mach-O file data. */ \ | ||
void* file_data = (uint8_t*)data + offset; \ | ||
macho_parse_file(file_data, size, get_object(object, "file[%i]", i), \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This blindly assume that offset
and data
are correct and will lay inside the buffer, but these values come from the file header and should be sanitized.
libyara/modules/macho.c
Outdated
} | ||
|
||
// Parse Mach-O fat binary. | ||
if (macho_is_fat_file_block(magic)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same problem here, we don't know if the buffer is at least sizeof(uint32_t)
.
libyara/modules/macho.c
Outdated
set_integer(yr_##bo##32toh(sg->flags), \ | ||
object, "segments[%i].flags", i); \ | ||
\ | ||
for (unsigned j = 0; j < yr_##bo##32toh(sg->nsects); ++j) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No sanity checks for preventing out-of-bounds reads. If sg->nsects
is altered or the file is truncated after the header this will segfault.
libyara/modules/macho.c
Outdated
|
||
declare_function("file_index", "i", "i", file_index_type); | ||
declare_function("file_index", "ii", "i", file_index_subtype); | ||
declare_function("ep_for_arch", "i", "i", entry_point_for_arch_type); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would call this entry_point_for_arch
for consistency with entry_point
.
libyara/modules/macho.c
Outdated
} | ||
|
||
// Parse previously out of block files. | ||
for (size_t i = 0; i < st->count; i++) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not actually necessary. You can safely assume that a Mach-O file will come in a single block, no matter how big it is. Multiple blocks are mainly for scanning process memory, where each block represents memory region.
I was working on other task at work and it is middle of midterm exams season too, I will take look at it asap - probably end of this week or next one. |
Conflicts: Makefile.am libyara/Makefile.am
This is not necessary as multiple memory blocks are used only in process scans.
Most problems should be fixed. There are two problems remaining:
|
i think its ok to merge as it is, other modules got merged with much less review and those tiny issues only affect when working with kernel mach0s and can be fixed after that with proper tests |
Coverity has found 8 issues with this PR:
|
Along with the coverity findings mentioned above, this PR unintentionally reverts some changes to |
I'll review the changes in Besides the issues found by Coverity, ASAN found a heap overflow, it may be related, maybe not, I didn't look into the details. I can share the poc file.
|
In 300374f I've excluded the |
New module adds support for Mach-O files (binary format for OS X, iOS, watchOS and tvOS). Module can load basic header (architecture information, file type...), segments, sections and entry point information. Module also supports fat Mach-O files (archive for Mach-O files for different architectures). Tests are included. Documentation is not included, but I can write it when module is eventually accepted.