Skip to content

Fix custom write function segfault#32

Merged
freedmand merged 7 commits into
washingtonpost:developfrom
actblue:develop
Aug 1, 2022
Merged

Fix custom write function segfault#32
freedmand merged 7 commits into
washingtonpost:developfrom
actblue:develop

Conversation

@james-clemer-actblue

Copy link
Copy Markdown
Contributor

This is just a PR-back of this PR in our fork

Below is the original PR description reproduced for convenience.

One caveat is, the git history here isn't as clean as might be hoped for in this repository. If there are any org-wide contribution guidelines or implicit norms that this PR breaks, let me know. I'm happy to edit the PR, or else reproduce it as a PR from a different fork &c.

Addresses this issue.

We call client.py via parse_as_files_custom.

In turn parse_as_files_custom takes an argument, open_function, and passes it in to utils.provide_write_callback, which handles writing to various files and uses open_function to create new file handles for various files.

utils.provide_write_callback uses a type-factory from ctypes and deems it the CUSTOM_WRITE type.

CUSTOM_WRITE represents, on the C side of things, CustomWriteFunction.

CUSTOM_WRITE = CFUNCTYPE(None, c_char_p, c_char_p, POINTER(c_char), c_int)

...

typedef void (*CustomWriteFunction)(char *filename, char *extension, char *contents, int numBytes);

But! Calling any python CustomWriteFunction induces a SEGFAULT!

So, I wrote a CustomWriteFunction in C, and passed that in. This, and some heavy printf-debugging, led me to notice that contents is not a NULL-terminated char*. That made me wonder if somewhere along the way, there was machinery in ctypes that assumed it was, since NULL-terminated char*s are the defacto str type in C-land.

It turns out, in the docs for ctypes.c_char_p, we're told that c_char_p:

Represents the C char* datatype when it points to a zero-terminated string. For a general character pointer that may also point to binary data, POINTER(c_char) must be used. The constructor accepts an integer address, or a bytes object.

Doing so means that calls to:

#! python

import smart_open
from fastfec import *


if __name__ == "__main__":
    headers = {'headers': {'User-Agent': 'Mozilla/5.0'}}
    with smart_open.open(f'http://docquery.fec.gov/dcdev/posted/1606847.fec', 'rb', transport_params=headers) as f:
        with FastFEC() as fastfec:
            fastfec.parse_as_files(f, "some_output_directory", include_filing_id='1606847')

no longer SEGFAULT.

@freedmand

freedmand commented Aug 1, 2022

Copy link
Copy Markdown
Contributor

Thanks for the PR! This is a very important one that will solve some issues others have encountered as well. I just pushed a lot of changes to the release process for FastFEC that should make the GitHub actions checks pass. Could you pull/merge develop back into your branch @james-clemer-actblue? Once the GH actions checks pass, I will give it the go-ahead.

@freedmand freedmand left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works! Thanks again

@freedmand freedmand merged commit 57b8b3f into washingtonpost:develop Aug 1, 2022
@james-clemer-actblue

Copy link
Copy Markdown
Contributor Author

Awesome! For future reference, is "open an issue, investigate, open a PR" the best way to contribute? This seems like it went smoothly enough but if you have any processes that make it easier on you, I'm happy to respect them.

freedmand added a commit that referenced this pull request Aug 1, 2022
* Remove external deps (Curl) and refactor release process (#35)

* fix: remove curl and get pcre to work as natural dependency

* feat: readme updates, mappings test script

* feat: reusable github workflow revision

* fix: license includes BSD/PCRE, fix tests

* fix: don't rely on windows compress-archive

* fix: refactor release workflow, bump timeout to 10 mins

* fix: add input for pr-release workflow

* fix: refactor workflows to use nested with format

* feat: CLI tests

* chore: remove unneeded cli test comments

* docs: update docquery->filing urls, and secondary -> converted paper

* fix: trailing cli filing id bug

* fix a few oversights with v8.4 mappings (#31)

Co-authored-by: Dylan Freedman <freedmand@gmail.com>

* Fix custom write function segfault (#32)

* DENG-116 Change buffer size

* Revert "DENG-116 Change buffer size"

This reverts commit 0ae41c3.

* Use a POINTER(c_char) instead of a c_char_p for CUSTOM_WRITE contents

* Add a smoke test for filing 1606847

Co-authored-by: crystaljg <70042918+crystaljg@users.noreply.github.com>
Co-authored-by: Crystal Gong <cgong@actbluetech.com>

* fix: release to latest tag, bump to 0.1.0 (#36)

Co-authored-by: Chris Zubak-Skees <chriszs@gmail.com>
Co-authored-by: Evan Sonderegger <evan@rpy.xyz>
Co-authored-by: james-clemer-actblue <69259105+james-clemer-actblue@users.noreply.github.com>
Co-authored-by: crystaljg <70042918+crystaljg@users.noreply.github.com>
Co-authored-by: Crystal Gong <cgong@actbluetech.com>
@freedmand freedmand mentioned this pull request Aug 1, 2022
freedmand added a commit that referenced this pull request Aug 1, 2022
* v0.1.0 (#37)

* Remove external deps (Curl) and refactor release process (#35)

* fix: remove curl and get pcre to work as natural dependency

* feat: readme updates, mappings test script

* feat: reusable github workflow revision

* fix: license includes BSD/PCRE, fix tests

* fix: don't rely on windows compress-archive

* fix: refactor release workflow, bump timeout to 10 mins

* fix: add input for pr-release workflow

* fix: refactor workflows to use nested with format

* feat: CLI tests

* chore: remove unneeded cli test comments

* docs: update docquery->filing urls, and secondary -> converted paper

* fix: trailing cli filing id bug

* fix a few oversights with v8.4 mappings (#31)

Co-authored-by: Dylan Freedman <freedmand@gmail.com>

* Fix custom write function segfault (#32)

* DENG-116 Change buffer size

* Revert "DENG-116 Change buffer size"

This reverts commit 0ae41c3.

* Use a POINTER(c_char) instead of a c_char_p for CUSTOM_WRITE contents

* Add a smoke test for filing 1606847

Co-authored-by: crystaljg <70042918+crystaljg@users.noreply.github.com>
Co-authored-by: Crystal Gong <cgong@actbluetech.com>

* fix: release to latest tag, bump to 0.1.0 (#36)

Co-authored-by: Chris Zubak-Skees <chriszs@gmail.com>
Co-authored-by: Evan Sonderegger <evan@rpy.xyz>
Co-authored-by: james-clemer-actblue <69259105+james-clemer-actblue@users.noreply.github.com>
Co-authored-by: crystaljg <70042918+crystaljg@users.noreply.github.com>
Co-authored-by: Crystal Gong <cgong@actbluetech.com>

* fix: main release workflow bug

* feat: 0.1.1

* feat: use ref name to get current tag

* feat: version 0.1.3

Co-authored-by: Chris Zubak-Skees <chriszs@gmail.com>
Co-authored-by: Evan Sonderegger <evan@rpy.xyz>
Co-authored-by: james-clemer-actblue <69259105+james-clemer-actblue@users.noreply.github.com>
Co-authored-by: crystaljg <70042918+crystaljg@users.noreply.github.com>
Co-authored-by: Crystal Gong <cgong@actbluetech.com>
freedmand added a commit that referenced this pull request Aug 4, 2022
* Sync main->dev (#39)

* v0.1.0 (#37)

* Remove external deps (Curl) and refactor release process (#35)

* fix: remove curl and get pcre to work as natural dependency

* feat: readme updates, mappings test script

* feat: reusable github workflow revision

* fix: license includes BSD/PCRE, fix tests

* fix: don't rely on windows compress-archive

* fix: refactor release workflow, bump timeout to 10 mins

* fix: add input for pr-release workflow

* fix: refactor workflows to use nested with format

* feat: CLI tests

* chore: remove unneeded cli test comments

* docs: update docquery->filing urls, and secondary -> converted paper

* fix: trailing cli filing id bug

* fix a few oversights with v8.4 mappings (#31)

Co-authored-by: Dylan Freedman <freedmand@gmail.com>

* Fix custom write function segfault (#32)

* DENG-116 Change buffer size

* Revert "DENG-116 Change buffer size"

This reverts commit 0ae41c3.

* Use a POINTER(c_char) instead of a c_char_p for CUSTOM_WRITE contents

* Add a smoke test for filing 1606847

Co-authored-by: crystaljg <70042918+crystaljg@users.noreply.github.com>
Co-authored-by: Crystal Gong <cgong@actbluetech.com>

* fix: release to latest tag, bump to 0.1.0 (#36)

Co-authored-by: Chris Zubak-Skees <chriszs@gmail.com>
Co-authored-by: Evan Sonderegger <evan@rpy.xyz>
Co-authored-by: james-clemer-actblue <69259105+james-clemer-actblue@users.noreply.github.com>
Co-authored-by: crystaljg <70042918+crystaljg@users.noreply.github.com>
Co-authored-by: Crystal Gong <cgong@actbluetech.com>

* fix: main release workflow bug

* feat: 0.1.1

* feat: use ref name to get current tag

* feat: version 0.1.3

Co-authored-by: Chris Zubak-Skees <chriszs@gmail.com>
Co-authored-by: Evan Sonderegger <evan@rpy.xyz>
Co-authored-by: james-clemer-actblue <69259105+james-clemer-actblue@users.noreply.github.com>
Co-authored-by: crystaljg <70042918+crystaljg@users.noreply.github.com>
Co-authored-by: Crystal Gong <cgong@actbluetech.com>

* fix: upload artifacts in release process (#40)

* fix: upload artifacts in release process

* fix: erroneous steps

* fix: run wheels on all platforms

* fix: add inputs prefix to if conditionals in actions

* feat: bump version to 0.1.4

* feat: GH actions matrix

* fix: add exe ext to actions map

* feat: remove apt-get update

* fix: give windows the .exe extension, not linux

* feat: matrix for wheels as well

* test: speed up test workflow

* feat: split test workflow into two parallel tracks

test c and test python

* fix: missing actions checkout step added

* feat: remove non-wheel artifacts for PyPI publish (#41)

* test: use uppercase secrets

* feat: refactor pypiToken as reusable workflow input

* fix: pass secrets properly

* fix: pass secrets properly

* test: pass secrets appropriately

* test: inherit secrets

* test: remove type from secrets

* test: remove secrets section

Co-authored-by: Chris Zubak-Skees <chriszs@gmail.com>
Co-authored-by: Evan Sonderegger <evan@rpy.xyz>
Co-authored-by: james-clemer-actblue <69259105+james-clemer-actblue@users.noreply.github.com>
Co-authored-by: crystaljg <70042918+crystaljg@users.noreply.github.com>
Co-authored-by: Crystal Gong <cgong@actbluetech.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants