ENH: read_sas, to_sas #4052
Can you post a link to the format? And see if any converters have been written in Python? Obviously the idea would be to read the native format file.
So it looks like simple binary read/write stuff... it could be done. The only other question: is there any license issue with doing this?
Not sure. @benjello, do you want to contact SAS and ask them?
That technote is describing the XPORT format, which isn't the native binary format. I've never seen a published layout of the native format. Some people have partially reverse engineered it, but I've never seen a solution that could handle data sets with compression.
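To make the "simple binary read/write" point concrete: per the published XPORT layout (SAS technote TS-140), a transport file is a sequence of fixed 80-byte records, and the first record is a fixed ASCII library header. The sketch below only splits a buffer into records and checks that header; it does not parse the member, NAMESTR or observation records. The names `records` and `is_xport` are hypothetical and not part of pandas or any existing package.

```python
from typing import Iterator

RECORD_LENGTH = 80  # XPORT files are written as fixed-length 80-byte records

# Fixed first record of an XPORT library (see SAS technote TS-140)
LIBRARY_HEADER = (
    b"HEADER RECORD*******LIBRARY HEADER RECORD!!!!!!!"
    b"000000000000000000000000000000  "
)


def records(data: bytes) -> Iterator[bytes]:
    """Yield consecutive 80-byte records; a trailing partial record is yielded as-is."""
    for start in range(0, len(data), RECORD_LENGTH):
        yield data[start:start + RECORD_LENGTH]


def is_xport(data: bytes) -> bool:
    """Return True if the buffer begins with the XPORT library header record."""
    return data[:RECORD_LENGTH] == LIBRARY_HEADER


# Usage with a real file (the path is hypothetical):
# with open("example.xpt", "rb") as fh:
#     raw = fh.read()
# print(is_xport(raw), sum(1 for _ in records(raw)))
```

Checking the fixed header first is a cheap way to reject non-XPORT input (for example a native sas7bdat file) before attempting any field-level parsing.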
@mtkni thanks, I had no idea.
Aside from using SAS to actually export (e.g. CSV or whatever), is there a format one could save to that provides some interoperability (and is open-ish)?
A 10-minute search suggests no, but maybe someone else knows more.
I am not a specialist, much more a potential user in heavy need of such a tool.
Probably could do:
I don't think 1) is a good idea: export data in Stata to XPORT format or CSV format.
OK, just throwing it out there. I don't like calling out to other programs either, but this seems like it's going to be tough. I can implement the R code above if you think that's a good idea, but it basically forces users to use that particular version of the format, and if it ever changes we won't know until it breaks.
I wouldn't be able to test to_sas, though, since I don't have SAS.
I would be glad to test anything that would do the job. I have SAS.
I've spent some time on this in the past. These are my thoughts:
As much as I wish there was a good solution to this, and as much as I'd be willing to help build it, I just don't think there is. I've built read_sas using CSV as an intermediate format. Obviously, this requires a SAS license. It takes very few lines to implement, but most of those lines are specific to our SAS environment and are not very portable. I will study the performance of XPORT vs CSV next week. If it's dramatically faster, then it may be worth the effort to implement. Even then, I'm not sure it's worth taking that on as part of the pandas project. I would be interested in comments from other SAS users on that. Just my two cents.
Nice to hear from someone who tried to do this. FWIW, I think it might be tough to beat CSV for speed; most of it is written in C/Cython.
Agreed. The new, fast CSV reader was a game changer.
Can SAS export to HDF5?
No, it can't.
Export in Stata format?
No, and if it did it would be an expensive add-on module. SAS is pretty good at reading from databases (http://www.sas.com/resources/factsheet/sas-access-factsheet.pdf), although each database platform is a separate license. I haven't found it good at all at writing to databases (it can, but it's slow). Other than that, interoperability doesn't seem to be part of their business model.
Oh wait, I may have spoken too soon. Apparently I can export to a Stata file: http://support.sas.com/documentation/cdl/en/acpcref/63184/HTML/default/viewer.htm#a003102702.htm Is Stata supported in pandas? It would still require a SAS license, but I can benchmark that versus CSV.
There is a read_stata that will be available in the coming version, and it is already available on GitHub.
BTW, @mtkni, I would be happy to look at the read_sas you implemented if you would share it.
Just to close the loop on this, exporting to Stata requires an add-on for which I'm not licensed, so I can't benchmark it.
FYI, the XPT or transport format is a non-proprietary format that has no licensing issues and is the only format currently accepted by the Food and Drug Administration (FDA) for clinical trial data. Most pharmaceutical companies submit XPT format to the FDA. It would be nice to have a way to read these files in just like a CSV file.
@dramage1 Is "XPT" the same as "XPORT" above?
Heyo, there's at least one Python package for reading XPT files: https://pypi.python.org/pypi/xport/0.1.0
Just what I needed, much appreciated.
@dramage1 if you use this enough to want to write up a pandas wrapper for it, that could be a useful addition to pandas (depending on the stability of xport).
@jreback I'm actually working on transcribing the SAScii (http://cran.r-project.org/web/packages/SAScii/index.html) package from R to Python. I would be happy to share the results if that is OK with the author of that package. I haven't done much development, though, so I don't know much about SOPs. I'm also not sure how compatible it would be with the library you mentioned. Perhaps just an add-on with an option like format="sas", or something along those lines.
Just saw this: https://pypi.python.org/pypi/sas7bdat/2.0.1. Even if this is pure Python (slower), that is OK to start. Better to be able to read than not. cc @kshedden
@jreback @kshedden I use https://pypi.python.org/pypi/sas7bdat/2.0.1 extensively. It is slow but works well.
So, I don't think the https://pypi.python.org/pypi/psid_py I have a few days during which I could work on incorporating the
I am going to mark this for 0.17.1. The implementation using sas7bdat is quite trivial, so we should start with that.
I used
There is some useful information hidden in the SAS file that does not make it into the DataFrame, though, such as the column labels.
Agreed. All that is really needed are:
In case you're curious, I revised the
@jreback I'd want to change the API. So long as
@selik, what do you need to change? The user API is very simple actually, just
@jreback Sounds good. Not sure what I might need to change internally, but the effort is more pleasant if there's more freedom. I'd say I'll get around to it soon, but looking back, I suddenly realize it took me 3+ years from the first time I told myself I'd revise the XPORT reader.
Sure, feel free to take a look around.
Just FYI, I added The conversion from Python floats to IBM-mainframe 64-bit floats seems to be working quite well, very rarely losing precision, at least when I round-trip from IEEE to IBM and back to IEEE.
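For readers unfamiliar with the IBM format: an IBM-mainframe 64-bit float stores a sign bit, a 7-bit base-16 exponent biased by 64, and a 56-bit fraction F, so its value is (-1)^s * (F / 2^56) * 16^(e - 64). Below is a minimal stdlib sketch of both directions. It is not the code mentioned above, the function names are hypothetical, `ibm_to_ieee` treats every 8-byte value as an ordinary number (SAS missing-value codes are not special-cased), and `ieee_to_ibm` raises ValueError for NaN, infinity and magnitudes outside the IBM range.

```python
import math

FRACTION_BITS = 56  # IBM double: 1 sign bit, 7 exponent bits, 56 fraction bits
EXPONENT_BIAS = 64  # base-16 exponent stored in excess-64 notation


def ibm_to_ieee(data: bytes) -> float:
    """Decode an 8-byte big-endian IBM hexadecimal float into a Python float."""
    if len(data) != 8:
        raise ValueError("expected exactly 8 bytes")
    word = int.from_bytes(data, "big")
    sign = -1.0 if word >> 63 else 1.0
    exponent = (word >> FRACTION_BITS) & 0x7F
    fraction = word & ((1 << FRACTION_BITS) - 1)
    if fraction == 0:
        return 0.0
    # value = sign * (fraction / 2**56) * 16**(exponent - 64)
    return sign * math.ldexp(fraction, 4 * (exponent - EXPONENT_BIAS) - FRACTION_BITS)


def ieee_to_ibm(value: float) -> bytes:
    """Encode a finite Python float as an 8-byte big-endian IBM hexadecimal float."""
    if math.isnan(value) or math.isinf(value):
        raise ValueError("NaN and infinity have no IBM representation")
    if value == 0.0:
        return bytes(8)
    sign_bit = 0x80 if value < 0 else 0x00
    # abs(value) == mantissa * 2**exp2 with 0.5 <= mantissa < 1
    mantissa, exp2 = math.frexp(abs(value))
    # ceil(exp2 / 4): smallest base-16 exponent that keeps the fraction below 1
    exp16 = -(-exp2 // 4)
    shift = 4 * exp16 - exp2  # 0..3 bits lost to base-16 normalization
    biased = exp16 + EXPONENT_BIAS
    if not 0 <= biased <= 0x7F:
        raise ValueError("magnitude outside the IBM double range")
    # mantissa has 53 significant bits, so this is an exact integer below 2**56
    fraction = int(math.ldexp(mantissa, FRACTION_BITS - shift))
    return bytes([sign_bit | biased]) + fraction.to_bytes(7, "big")
```

For example, `ieee_to_ibm(1.0)` gives the bytes `41 10 00 00 00 00 00 00`. Because a normalized IBM fraction has room for 56 bits and a double carries 53, IEEE to IBM and back round-trips exactly for in-range values in this sketch; precision can only be lost going the other way, when an IBM fraction uses more than 53 significant bits.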
It would be really convenient to be able to at least import SAS tables into a pandas DataFrame. Is this planned? Are there insurmountable issues?
Thanks