Allow accessing the entire row of selected values in gr.DataFrame
#9128
🪼 branch checks and previews
Install Gradio from this PR
pip install https://gradio-pypi-previews.s3.amazonaws.com/83f33c80b5d4144aaba33f20340c84368da2d5d5/gradio-4.41.0-py3-none-any.whl
Install Gradio Python Client from this PR
pip install "gradio-client @ git+https://github.com/gradio-app/gradio@83f33c80b5d4144aaba33f20340c84368da2d5d5#subdirectory=client/python"
Install Gradio JS Client from this PR
npm install https://gradio-npm-previews.s3.amazonaws.com/83f33c80b5d4144aaba33f20340c84368da2d5d5/gradio-client-1.5.0.tgz
🦄 change detected
This Pull Request includes changes to the following packages.
With the following changelog entry.
Maintainers or the PR author can modify the PR title to modify this entry.
@abidlabs Based on my testing, works great! Thanks!
@abidlabs - seeing this weird behavior when sorting more than once - I think the https://www.loom.com/share/f8e799ef89c24e578979cf4e7703ccd9?sid=45e1e768-a361-4991-bad4-667684df436b I would be more in favor of passing the entire row value as
Code

import gradio as gr
import pandas as pd
from pathlib import Path

abs_path = Path(__file__).parent.absolute()

df = pd.read_json(str(abs_path / "assets/leaderboard_data.json"))
invisible_df = df.copy()

COLS = [
    "T",
    "Model",
    "Average ⬆️",
    "ARC",
    "HellaSwag",
    "MMLU",
    "TruthfulQA",
    "Winogrande",
    "GSM8K",
    "Type",
    "Architecture",
    "Precision",
    "Merged",
    "Hub License",
    "#Params (B)",
    "Hub ❤️",
    "Model sha",
    "model_name_for_query",
]
ON_LOAD_COLS = [
    "T",
    "Model",
    "Average ⬆️",
    "ARC",
    "HellaSwag",
    "MMLU",
    "TruthfulQA",
    "Winogrande",
    "GSM8K",
    "model_name_for_query",
]
TYPES = [
    "str",
    "markdown",
    "number",
    "number",
    "number",
    "number",
    "number",
    "number",
    "number",
    "str",
    "str",
    "str",
    "str",
    "bool",
    "str",
    "number",
    "number",
    "bool",
    "str",
    "bool",
    "bool",
    "str",
]
NUMERIC_INTERVALS = {
    "?": pd.Interval(-1, 0, closed="right"),
    "~1.5": pd.Interval(0, 2, closed="right"),
    "~3": pd.Interval(2, 4, closed="right"),
    "~7": pd.Interval(4, 9, closed="right"),
    "~13": pd.Interval(9, 20, closed="right"),
    "~35": pd.Interval(20, 45, closed="right"),
    "~60": pd.Interval(45, 70, closed="right"),
    "70+": pd.Interval(70, 10000, closed="right"),
}

MODEL_TYPE = [str(s) for s in df["T"].unique()]
Precision = [str(s) for s in df["Precision"].unique()]


# Searching and filtering
def update_table(
    hidden_df: pd.DataFrame,
    columns: list,
    type_query: list,
    precision_query: str,
    size_query: list,
    query: str,
):
    filtered_df = filter_models(hidden_df, type_query, size_query, precision_query)  # type: ignore
    filtered_df = filter_queries(query, filtered_df)
    df = select_columns(filtered_df, columns)
    return df


def search_table(df: pd.DataFrame, query: str) -> pd.DataFrame:
    return df[(df["model_name_for_query"].str.contains(query, case=False))]  # type: ignore


def select_columns(df: pd.DataFrame, columns: list) -> pd.DataFrame:
    # We use COLS to maintain sorting
    filtered_df = df[[c for c in COLS if c in df.columns and c in columns]]
    return filtered_df  # type: ignore


def filter_queries(query: str, filtered_df: pd.DataFrame) -> pd.DataFrame:
    final_df = []
    if query != "":
        queries = [q.strip() for q in query.split(";")]
        for _q in queries:
            _q = _q.strip()
            if _q != "":
                temp_filtered_df = search_table(filtered_df, _q)
                if len(temp_filtered_df) > 0:
                    final_df.append(temp_filtered_df)
        if len(final_df) > 0:
            filtered_df = pd.concat(final_df)
            filtered_df = filtered_df.drop_duplicates(  # type: ignore
                subset=["Model", "Precision", "Model sha"]
            )
    return filtered_df


def filter_models(
    df: pd.DataFrame,
    type_query: list,
    size_query: list,
    precision_query: list,
) -> pd.DataFrame:
    # Show all models
    filtered_df = df
    type_emoji = [t[0] for t in type_query]
    filtered_df = filtered_df.loc[df["T"].isin(type_emoji)]
    filtered_df = filtered_df.loc[df["Precision"].isin(precision_query + ["None"])]
    numeric_interval = pd.IntervalIndex(
        sorted([NUMERIC_INTERVALS[s] for s in size_query])  # type: ignore
    )
    params_column = pd.to_numeric(df["#Params (B)"], errors="coerce")
    mask = params_column.apply(lambda x: any(numeric_interval.contains(x)))  # type: ignore
    filtered_df = filtered_df.loc[mask]
    return filtered_df


demo = gr.Blocks(css=str(abs_path / "assets/leaderboard_data.json"))
with demo:
    gr.Markdown("""Test Space of the LLM Leaderboard""", elem_classes="markdown-text")
    with gr.Tabs(elem_classes="tab-buttons") as tabs:
        with gr.TabItem("🏅 LLM Benchmark", elem_id="llm-benchmark-tab-table", id=0):
            with gr.Row():
                with gr.Column():
                    with gr.Row():
                        search_bar = gr.Textbox(
                            placeholder=" 🔍 Search for your model (separate multiple queries with `;`) and press ENTER...",
                            show_label=False,
                            elem_id="search-bar",
                        )
                    with gr.Row():
                        shown_columns = gr.CheckboxGroup(
                            choices=COLS,
                            value=ON_LOAD_COLS,
                            label="Select columns to show",
                            elem_id="column-select",
                            interactive=True,
                        )
                with gr.Column(min_width=320):
                    filter_columns_type = gr.CheckboxGroup(
                        label="Model types",
                        choices=MODEL_TYPE,
                        value=MODEL_TYPE,
                        interactive=True,
                        elem_id="filter-columns-type",
                    )
                    filter_columns_precision = gr.CheckboxGroup(
                        label="Precision",
                        choices=Precision,
                        value=Precision,
                        interactive=True,
                        elem_id="filter-columns-precision",
                    )
                    filter_columns_size = gr.CheckboxGroup(
                        label="Model sizes (in billions of parameters)",
                        choices=list(NUMERIC_INTERVALS.keys()),
                        value=list(NUMERIC_INTERVALS.keys()),
                        interactive=True,
                        elem_id="filter-columns-size",
                    )
            selected_data = gr.Json()
            leaderboard_table = gr.components.Dataframe(
                value=df[ON_LOAD_COLS],  # type: ignore
                headers=ON_LOAD_COLS,
                datatype=TYPES,
                elem_id="leaderboard-table",
                interactive=False,
                visible=True,
            )
            # Dummy leaderboard for handling the case when the user uses backspace key
            hidden_leaderboard_table_for_search = gr.components.Dataframe(
                value=invisible_df[COLS],  # type: ignore
                headers=COLS,
                datatype=TYPES,
                visible=False,
            )
            search_bar.submit(
                update_table,
                [
                    hidden_leaderboard_table_for_search,
                    shown_columns,
                    filter_columns_type,
                    filter_columns_precision,
                    filter_columns_size,
                    search_bar,
                ],
                leaderboard_table,
            )
            for selector in [
                shown_columns,
                filter_columns_type,
                filter_columns_precision,
                filter_columns_size,
            ]:
                selector.change(
                    update_table,
                    [
                        hidden_leaderboard_table_for_search,
                        shown_columns,
                        filter_columns_type,
                        filter_columns_precision,
                        filter_columns_size,
                        search_bar,
                    ],
                    leaderboard_table,
                    queue=True,
                )

    def select_data(data: gr.SelectData):
        return {
            "index": data.index,
            "original_index": data.original_index,
            "model_name": df.iloc[data.original_index[0]]["model_name_for_query"],
        }

    leaderboard_table.select(select_data, None, selected_data)

if __name__ == "__main__":
    demo.launch()
Can you explain what you mean by "if you don't have access to the original dataframe"? You can always get the value of the dataframe by passing it in as an input component. I like the idea of passing in the original indices over passing in a row, in case we later introduce functionality that allows reordering the columns as well.
Yea that's true!
I'll check to see what the issue is, thanks
This does generally create an enormous payload tho, far from ideal imo.
In cases where a user doesn't have access to the original dataframe, yes, but otherwise it's just sending the tuple of indices. If we were to send the entire row, it'd be a bigger payload than just sending the tuple in all cases, and it might not be future-proof in case we introduce mechanisms to reorder columns at some point.
After thinking about it some more, I think that @freddyaboulton's suggestion of sending the updated row values makes more sense than sending the original index. Reason being that the concept of "original index" is not well-defined for interactive dataframes (e.g. if a user has modified the value of a cell, or has inserted/deleted rows/columns, what should the original index reflect?). Whereas a @dwipper if you have access to the sorted
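To make the ambiguity concrete, here is a small pandas-only sketch (not Gradio code) of why a positional "original index" stops identifying a record once the table is edited, while a copy of the row's values stays self-contained. The `record_id`/`model` columns and the data are made up for illustration.

```python
import pandas as pd

# Toy table standing in for the dataframe's original value (hypothetical data).
original = pd.DataFrame({"record_id": [101, 102, 103], "model": ["a", "b", "c"]})

# Suppose the user deletes the first row of an interactive table.
edited = original.drop(index=0).reset_index(drop=True)

# Position 1 pointed at record 102 before the edit...
before = original.iloc[1]["record_id"]
# ...but the same position now points at record 103.
after = edited.iloc[1]["record_id"]

# Shipping the row values with the event needs no lookup at all.
row_value = original.iloc[1].tolist()
```

The row copy costs more bytes per event than a pair of integers, which is the payload trade-off raised earlier in the thread.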
@abidlabs It's interesting. In 4.41 it appears the index_num = evt.index[0] is returning the original index value. This would have worked for my use case when I created a shadow DF that had the original DF, and was looking up a unique reference id in that shadow DF. Since that didn't work, I moved the reference id into a column of the DF, and now the
hmm @dwipper I can't think of any recent change to the dataframe that would have caused this.
@abidlabs To clarify, on the df.select event, index_num = evt.index[0] is providing the sorted row index. But when I use df.iat[index_num, 2] to try to get a cell value in the row of the sorted df, the lookup seems to reference the unsorted index values, and returns the cell value in the unsorted row. I'm thinking this isn't the expected/desired behavior? Running this modification of your above code in 4.41 will show you the issue:
@abidlabs My bigger picture use case is this: I have a list of data in a database table. The dataframe shows a subset of the columns in the table. When the user clicks on a row in the dataframe, I need to go back to the table and get additional fields to display in the UI for the row the user clicked on. In order to do that, I have a hidden record_id column in the dataframe to get back to the record in the table. Due to the above behavior of df.iat, if the dataframe is sorted, the incorrect record_id value is returned, and the wrong record and data from the table are displayed in the UI. So at least in my use case, the way df.iat works is the issue. In looking at the Pandas docs, I can't see how to access the sorted dataframe... If there were a function evt.selected_row that returned all the cells in the selected row as a list, that would work for my use case.
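The mismatch can be reproduced with pandas alone. This sketch assumes, as described in this thread, that the click reports a position in the displayed (sorted) order while the handler receives the dataframe in its original order; the `record_id`/`score` table is hypothetical. With this PR, the handler can instead read the clicked row from the event's `.row_value`, which corresponds to `displayed` below.

```python
import pandas as pd

# What the Python handler receives: the dataframe in its original order.
value = pd.DataFrame({"record_id": [11, 22, 33], "score": [0.5, 0.9, 0.1]})

# What the user sees after sorting the "score" column descending in the UI.
displayed = value.sort_values("score", ascending=False).reset_index(drop=True)

clicked_row = 0  # the user clicks the top row of the sorted view

# Using the displayed position against the unsorted value picks the wrong record.
wrong_id = value.iat[clicked_row, 0]
# Reading the displayed row itself gives the record that was actually clicked.
right_id = displayed.iat[clicked_row, 0]
```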
This works well @abidlabs! Two comments:
- Should we send the current headers in a separate key? In the event columns are reordered.
- Tables are typically taller than they are wide. I'm a bit worried sending the entire column over is a bit too much data. Perhaps that can be made opt-in?
This fixes the given issues so will approve
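The first review comment suggests sending the current headers alongside the row. A minimal sketch of why that helps: pairing each cell with its header lets the handler look up a column by name even if the columns are reordered. `row_as_dict` is a hypothetical helper, and it assumes the headers arrive in the same order as the row values.

```python
def row_as_dict(headers: list[str], row_value: list) -> dict:
    """Pair each cell of the selected row with its column header.

    Hypothetical helper: assumes `headers` is in the same order as
    `row_value`, e.g. the order the frontend currently displays.
    """
    if len(headers) != len(row_value):
        raise ValueError("headers and row_value must have the same length")
    return dict(zip(headers, row_value))


# The same record, before and after the user swaps two columns.
before = row_as_dict(["record_id", "model"], [102, "b"])
after = row_as_dict(["model", "record_id"], ["b", 102])
```

Looking up `record_id` by name returns 102 in both cases, whereas a fixed positional lookup would not.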
Good points @freddyaboulton, for now I'll only send
Thanks for the feedback everyone!
@abidlabs The new function works great! Based on some related testing, I figured out that the root issue here is that when the gr.Dataframe gets sorted by the user by clicking on the column header arrow, that sort isn't reflected in the gr.Dataframe instance. So when the gr.Dataframe instance is passed to a function, it has the original index order, not the sorted order that is represented in the UI. This is apparently why, as mentioned above, df.iat uses the original index order, not the sorted index order. Is this a bug or the expected behavior?
I see, yes that is expected behavior. Sorting a DataFrame only changes the "view" in the UI, but it doesn't actually change the underlying value, so when you access the value in your Python function, you'll get the original dataframe.
@abidlabs Thanks for the clarification. Can you think of any way around that, i.e. getting the sorted DF? Does gr.Dataset() function differently? While getting the row the user clicked on is really helpful, in my app I want the user to be able to scroll through the list with a VCR control (see following image). It works fine on the unsorted list, but if the user sorts the list, the scrolling doesn't work properly since it's based on the original index.
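One possible workaround, not proposed in this thread: because the UI sort is not sent back to Python, the app could apply the same ordering itself (for example, driven by its own sort control instead of the header arrows) and step through that ordered copy. The `step` helper, the unique `record_id` column, and the assumption that the app knows the sort column and direction are all hypothetical.

```python
import pandas as pd


def step(df: pd.DataFrame, sort_col: str, ascending: bool, current_id: int, delta: int) -> int:
    """Return the record_id `delta` rows away from `current_id` in sorted order.

    Hypothetical helper: assumes a unique `record_id` column and that the
    app tracks which column and direction the user sorted by.
    """
    # A stable sort keeps tied rows in their original relative order.
    ordered = df.sort_values(sort_col, ascending=ascending, kind="stable").reset_index(drop=True)
    pos = ordered["record_id"].tolist().index(current_id)
    # Clamp so "next" on the last row and "previous" on the first row stay put.
    new_pos = min(max(pos + delta, 0), len(ordered) - 1)
    return int(ordered["record_id"].iloc[new_pos])


records = pd.DataFrame({"record_id": [11, 22, 33], "score": [0.5, 0.9, 0.1]})
next_id = step(records, "score", False, 22, 1)  # next row after 22, sorted by score descending
```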
- Fixed bug that arises from the unexpected behavior of gradio dataframes not passing an updated index when sorting by clicking column headers (gradio-app/gradio#9128)
- Implemented a "send to generation tab" button from the history page.
Closes: #7601
Closes: #7127
This PR adds a .row_value parameter to gr.SelectData for gr.DataFrame. Example: