I converted my pyspark dataframe down to two columns: a feature column holding a dense vector, and a label column. When I transform it to a tensorflow dataset using make_spark_converter, it raises an error:
/mnt/softwares/hvd_env/lib/python3.7/site-packages/petastorm/spark/spark_dataset_converter.py:28: FutureWarning: pyarrow.LocalFileSystem is deprecated as of 2.0.0, please use pyarrow.fs.LocalFileSystem instead.
from pyarrow import LocalFileSystem
/mnt/softwares/hvd_env/lib/python3.7/site-packages/petastorm/hdfs/namenode.py:270: FutureWarning: pyarrow.hdfs.connect is deprecated as of 2.0.0, please use pyarrow.fs.HadoopFileSystem instead.
return pyarrow.hdfs.connect(hostname, url.port or 8020, **kwargs)
Traceback (most recent call last):
File "/mytest/tf_with_spark.py", line 381, in <module>
train_test()
File "/mytest/tf_with_spark.py", line 345, in train_test
converter = make_spark_converter(train_transformed_sdf)
File "/mnt/softwares/hvd_env/lib/python3.7/site-packages/petastorm/spark/spark_dataset_converter.py", line 696, in make_spark_converter
df, parent_cache_dir_url, parquet_row_group_size_bytes, compression_codec, dtype)
File "/mnt/softwares/hvd_env/lib/python3.7/site-packages/petastorm/spark/spark_dataset_converter.py", line 512, in _cache_df_or_retrieve_cache_data_url
compression_codec, dtype)
File "/mnt/softwares/hvd_env/lib/python3.7/site-packages/petastorm/spark/spark_dataset_converter.py", line 436, in create_cached_dataframe_meta
dtype=dtype)
File "/mnt/softwares/hvd_env/lib/python3.7/site-packages/petastorm/spark/spark_dataset_converter.py", line 579, in _materialize_df
df = _convert_vector(df, dtype)
File "/mnt/softwares/hvd_env/lib/python3.7/site-packages/petastorm/spark/spark_dataset_converter.py", line 558, in _convert_vector
vector_to_array(df[col_name], dtype))
File "/mnt/softwares/hvd_env/lib/python3.7/site-packages/petastorm/spark/spark_dataset_converter.py", line 40, in vector_to_array
raise RuntimeError("Vector columns are only supported in pyspark>=3.0")
RuntimeError: Vector columns are only supported in pyspark>=3.0
Does it not support pyspark < 3.0? But in the setup.py file I see it requires 'pyspark>=2.1.0'. How do I solve this problem?
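
For now, the workaround I'm considering is to flatten the vector column into a plain array column myself before calling make_spark_converter, so petastorm never hits the vector_to_array code path. A rough sketch (the "features" column name is just a placeholder for my actual feature column):

from pyspark.sql.functions import udf
from pyspark.sql.types import ArrayType, FloatType
from petastorm.spark import make_spark_converter

# Convert the pyspark.ml Vector column into a plain list of floats.
# "features" is a placeholder name; substitute the real feature column.
to_float_array = udf(lambda v: v.toArray().tolist(), ArrayType(FloatType()))

train_transformed_sdf = train_transformed_sdf.withColumn(
    "features", to_float_array("features"))

converter = make_spark_converter(train_transformed_sdf)

Upgrading to pyspark >= 3.0, where pyspark.ml.functions.vector_to_array is available, would presumably also avoid the error, but I'd prefer to stay on my current version if possible.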