read
Functionality
Internally, reading should use the read_csv function from pandas. A few behaviors should be hardcoded by default:
Use the sep and encoding options provided in __init__
Pandas should not detect an index column from the data
Pandas should not try to infer datetime formats (or cast values to np.datetime64 objects). Any datetime column should be left as dtype 'object'
Pandas should not error on a badly formatted line. We should just raise a warning and read the remaining lines
After reading the data, we should use it to infer a MultiTableMetadata object. (Even if there is only 1 CSV file, we should still create a MultiTableMetadata object.)
Parameters
(required) folder_name: The name of the folder that contains the CSV files; it may include the full path to the folder
file_names: A list of file names inside the folder to read
(default) None: Read all files in the folder that end with ".csv"
list(str): Only files with these names will be read into Python
Returns
data: A dictionary mapping each table name to a pandas DataFrame with the data. The table name is the same as the file name (excluding the '.csv' suffix)
metadata: A MultiTableMetadata object that describes the data
write
Functionality
Internally, writing should use the to_csv function from pandas. A few behaviors should be hardcoded by default.
Parameters
(required) synthetic_data: A dictionary that maps each table name to a pandas.DataFrame containing that table's data
file_name_suffix: An optional suffix to add when writing each file
(default) None: Do not add a suffix. The file name will be the same as the table name with a '.csv' extension
string: Append the suffix after the table name. E.g., the suffix '_synthetic' will write a file named 'TABLENAME_synthetic.csv'
mode: A string signaling which mode of writing to use
(default) 'x': Write to new files, raising an error if a file with the same name already exists
'w': Write to the files, overwriting any existing files with the same name
'a': Append the new CSV rows to any existing files
Additional context
We will add a number of local file handlers for different file types (see Add ExcelHandler #1950). Therefore, the implementation of this class should also introduce a base class.
Optionally, the __init__, read and write functions can accept a subset of the arguments that the corresponding pandas functions use:
If a parameter behaves the same for both reading and writing in pandas (e.g. decimal), put it in __init__.
We can ignore most of these parameters; only add the ones that seem impactful.
If some parameters are common across the different file types, consider adding them to the base class.
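One way the shared base class could look. The name BaseLocalHandler and the choice of decimal as the shared parameter are illustrative assumptions, not part of the issue:

```python
from abc import ABC, abstractmethod


class BaseLocalHandler(ABC):
    """Illustrative base class for local file handlers (CSV, Excel, ...).

    Parameters shared by pandas' read and write paths (e.g. ``decimal``)
    live here; format-specific options go on the subclasses.
    """

    def __init__(self, decimal='.'):
        self.decimal = decimal

    @abstractmethod
    def read(self, folder_name, file_names=None):
        """Return (data, metadata) read from ``folder_name``."""

    @abstractmethod
    def write(self, synthetic_data, folder_name, file_name_suffix=None,
              mode='x'):
        """Write each table in ``synthetic_data`` to ``folder_name``."""
```

Declaring read and write as abstract methods keeps every future handler (CSV, Excel, ...) on the same call signature while letting each one pick its own pandas backend.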
Problem Description
As a user, I'd like a streamlined way to load my data and metadata from files so that I can get right to using SDV.
Expected behavior
In the sdv.io subpackage, add a folder called local containing a CSVHandler class with the __init__, read and write functionality described above.