Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

deduplicate() and extract_domain() #97

Closed
sebstier opened this issue Dec 9, 2023 · 6 comments
Closed

deduplicate() and extract_domain() #97

sebstier opened this issue Dec 9, 2023 · 6 comments

Comments

@sebstier
Copy link
Collaborator

sebstier commented Dec 9, 2023

Hi,

the new version of deduplicate() returns a data.frame instead of a wt_dt object.

the new version of extract_domain() with the adaR implementation does not accept drop_na = FALSE anymore, but keeps all rows by default. That is intentional, right?

schochastics added a commit that referenced this issue Dec 9, 2023
@schochastics
Copy link
Member

deduplicate is fixed now.

The second was intentional, but probably should have a proper note. The reasons to drop the parameter are twofold:

  1. adaR fails less (actually almost never) to parse the domain and
  2. if we want to drop certain rows then we should write a proper filter function

@sebstier sebstier closed this as completed Dec 9, 2023
@sebstier sebstier reopened this Dec 9, 2023
@sebstier
Copy link
Collaborator Author

sebstier commented Dec 9, 2023

Thanks. Two more points on extract_domain():

  • Examples are still tailored towards the previous version
  • extract_domain() only seems to extract a valid domain when a url starts with http/https. That is sometimes not the case, e.g. in the 6-country 2019 data.

@schochastics
Copy link
Member

The open pull request #95 is going to deal with those issues. Just had no time to finalize it yet

@sebstier
Copy link
Collaborator Author

two more issues:

  • deduplicate() creates two timestamp columns
  • deduplicate() converts format of column timestamp from dttm to dbl

@schochastics
Copy link
Member

thanks, thats a bug. I will fix that in the open PR

@schochastics
Copy link
Member

schochastics commented Dec 11, 2023

@sebstier can you post a snippet that produces this error? cannot reproduce it
nvm, found an example. Happens for method=aggregation

schochastics added a commit that referenced this issue Dec 11, 2023
schochastics added a commit that referenced this issue Dec 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants