Skip to content

Fixing the namespace issue #13

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

sadavarterohit
Copy link

No description provided.

text <- paste(text, collapse = "")

# remove line breaks and trailing white space
text <- trimws(text)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain why you took out removing the line breaks? (It may not be necessary to remove them but I haven't fully tested.)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I apolgize, I responded to another question instead of this about why I did this.

While I was testing on this example out of the readme, I accidentally selected #Before as well, and that put the mtcars variable in the comment, the line breaking did not persist some of the line breaks that already exist.
Since we're unlisting, breaking and putting the strings back together based on the operators in between, this didn't change anything. (atleast in the cases I tested.)

Removing the breaklines solved the issue like a charm, since the line break simply never got removed from the end of the comment.

# BEFORE
mtcars |> group_by(cyl) |> summarize(mean_mpg = mean(mpg)) |> ggplot(aes(factor(cyl),
mean_mpg)) + theme_bw(14) + geom_col() + labs(title = "mtcars", x = "number of cylinders", y = "average miles per gallon")

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds right... I will do a little more testing as soon as I get a chance.

text <- gsub("\\s*\\+\\s*\\+\\s*", " \\+ ", text)

# remove + before split words
splitwords <- c("\\s\\w*:?:?geom_","stat_", "coord_", "facet_", "scale_", "xlim\\(",
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding the regex here partially works -- the problem is that the splitwords are also used for the token column later on for the purpose of sorting. We'll need to think about this. A minor issue: \\s shouldn't be there because it's not necessary for there to be a space before a geom.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fair, I did make that assumption in most of the code I tested.
Let me try to come up with a different regex to solve this.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it will work to keep "geom_" as a split word and add "\w*::geom_" as an additional one. Then in the section where df$token is created, remove \w*:: from the namespaced tokens so that all the tokens will be recognized as factor levels and the sort can be performed correctly.

@jtr13
Copy link
Owner

jtr13 commented Nov 7, 2024

Thanks for tackling this issue... it's a good start! Please see the comments above. Here's an example that I tested it on:

mtcars |> group_by(cyl) |> summarize(mean_mpg = mean(mpg)) |> ggplot(aes(factor(cyl),
mean_mpg)) + theme_bw(14) + geom_col() + labs(title = "mtcars",
                                              
  x = "number of cylinders", y = "average miles per gallon")
 + ggalluvial::geom_alluvium()

For the reasons I detailed above the sort order is not correct:

mtcars |>
  group_by(cyl) |>
  summarize(mean_mpg = mean(mpg)) |>
  ggplot(aes(factor(cyl), mean_mpg)) +
  labs(title = "mtcars", x = "number of cylinders", y = "average miles per gallon") +
  theme_bw(14) +
  geom_col() +
  ggalluvial::geom_alluvium()

More specifically, the df$token column after sorting should be:

      token
1 firstline
2        |>
3        |>
4   ggplot(
6     geom_
8     geom_
7     labs(
5    theme_

but with the new code it's:

     token
1 firstline
2        |>
3        |>
4   ggplot(
7     labs(
5    theme_
6      <NA>
8      <NA>

since it doesn't recognize either of the geoms as tokens.

I'd be glad to meet so we can brainstorm how to address this if you're interested. (To be clear, you don't need to complete this for the community contribution!)

Readme.md Outdated
the search terms.

## How To :
Here's the link of a how to video on installation, usage and contribution : [link](...)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once you have the link, please add it here and then I'll merge it (whether or not the namespace issue is completed.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants