This repository was archived by the owner on Feb 2, 2024. It is now read-only.

Contributor

@akharche commented Apr 21, 2020

Extension for #801

  • Implementation of a new DataFrame structure based on lists instead of tuples
  • Improved df.count() codegen for testing

Example:

df = pd.DataFrame({'A': [1,2,3], 'B': [.5, .6, .7], 'C': [4, 5, 6], 'D': ['a', 'b', 'c']})

(['A', 'B', 'C', 'D'],)
([array([1, 2, 3], dtype=int64), array([4, 5, 6], dtype=int64)], [array([0.5, 0.6, 0.7])], [array(['a', 'b', 'c'], dtype=object)])
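The grouping visible in the output above (the two integer columns share one list, the float and string columns each get their own) can be illustrated with a minimal pure-Python sketch. This is not the PR's implementation: `group_columns_by_type` is a hypothetical helper, plain lists and Python types stand in for the arrays and Numba types, and every column is assumed to be non-empty.

```python
from typing import Dict, List, Tuple


def group_columns_by_type(columns: Dict[str, list]):
    """Group column data into one list per element type.

    Hypothetical helper: Python lists and Python types stand in for the
    arrays and Numba types used by the real code.
    """
    # element type -> (type_id, ids of the columns that have this type)
    data_typs_map: Dict[type, Tuple[int, List[int]]] = {}
    # column name -> (type_id, index of the column inside its type list)
    df_structure: Dict[str, Tuple[int, int]] = {}
    data: List[List[list]] = []

    for col_id, (col_name, values) in enumerate(columns.items()):
        col_typ = type(values[0])  # assumes a non-empty, homogeneous column
        if col_typ not in data_typs_map:
            type_id = len(data_typs_map)
            data_typs_map[col_typ] = (type_id, [col_id])
            data.append([values])
            # The first column of each type always has index 0
            df_structure[col_name] = (type_id, 0)
        else:
            type_id, col_ids = data_typs_map[col_typ]
            col_ids.append(col_id)
            data[type_id].append(values)
            df_structure[col_name] = (type_id, len(col_ids) - 1)

    return list(columns), tuple(data), df_structure


names, data, structure = group_columns_by_type(
    {'A': [1, 2, 3], 'B': [.5, .6, .7], 'C': [4, 5, 6], 'D': ['a', 'b', 'c']}
)
print(names)
print(data)
print(structure)
```

In this sketch a column is located in two steps: `structure['C']` gives the type list and the position inside it, and `data[type_id][index]` returns the values.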

Reproduce:

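import pandas as pd
from numba import njit
# Assumption: the package from this repository is also installed and imported,
# so that Numba can compile the pandas calls below.

@njit
def run_df():
    df = pd.DataFrame({'A': [1,2,3], 'B': [.5, .6, .7], 'C': [4, 5, 6], 'D': ['a', 'b', 'c']})

    # Internal layout: column names, then column data grouped by dtype
    print(df._columns)
    print(df._data)

    return df.count()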

if col_typ not in data_typs_map:
    data_typs_map[col_typ] = (type_id, [col_id])
    # The first column in each type always has 0 index
    df_structure[col_name] = (type_id, 0)
Collaborator

Perhaps we could use a named tuple here?
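
A minimal sketch of what the named-tuple suggestion could look like for the `(type_id, index)` pairs stored in `df_structure`. The class name `ColumnPosition` and its field names are hypothetical and do not come from the PR.

```python
from typing import NamedTuple


class ColumnPosition(NamedTuple):
    """Hypothetical name for the (type_id, index) pair stored per column."""
    type_id: int    # which per-type list holds the column
    col_index: int  # position of the column inside that list


df_structure = {
    'A': ColumnPosition(type_id=0, col_index=0),
    'C': ColumnPosition(type_id=0, col_index=1),
}

pos = df_structure['C']
print(pos.type_id, pos.col_index)

# A named tuple is still a tuple, so existing unpacking keeps working.
type_id, col_index = pos
```

Named fields make call sites self-describing, while code that already unpacks or indexes the plain tuple would not need to change.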

self.df_structure = df_structure
super(DataFrameType, self).__init__(
-    name="dataframe({}, {}, {}, {})".format(data, index, columns, has_parent))
+    name="dataframe({}, {}, {}, {}, {})".format(data, index, columns, has_parent, df_structure))
Collaborator

Do we really want the structure to be part of the type name?

('data', types.Tuple([types.List(typ) for typ in df_types])),
('index', fe_type.index),
- ('columns', types.UniTuple(string_type, n_cols)),
+ ('columns', types.UniTuple(types.List(string_type), 1)),
Collaborator

Why not just a list?

- ('columns', types.UniTuple(string_type, n_cols)),
+ ('columns', types.UniTuple(types.List(string_type), 1)),
('parent', types.pyobject),
('df_structure', types.pyobject),
Collaborator

Why do we need it here?

@akharche
Contributor Author

Duplicate of #817

@akharche closed this Apr 23, 2020
@akharche deleted the change_df_structure branch April 23, 2020 14:45
2 participants