-
Notifications
You must be signed in to change notification settings - Fork 20
Development #22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Development #22
Conversation
Some of these libraries get flagged as security vulnerabilities by GitHub dependency checks. Given that they are not required, removing is simpler than updating.
Although a link to Google Colab is a good idea, the data is not on Google Colab, making executing the notebook there impossible. The notebook could be adapted to deal with that, but for now, it is better to remove it.
- Update all packages - Add Altair
Reviewer's Guide by SourceryThis pull request includes several updates and additions, including adding new dependencies, updating the Altair notebook, improving the formatting of the polars notebook, adding a new notebook illustrating Copy-on-Write, and adding a new script to preprocess the OpenMP FAQ. Updated class diagram for Altair chart selectionsclassDiagram
class alt_selection {
+type: str
+nearest: bool
+on: str
+fields: list[str]
+empty: str
}
class alt_selection_point {
+type: str
+fields: list[str]
+bind: str
}
note for alt_selection "Replaced alt.selection with alt.selection_point"
note for alt_selection_point "To be used for legend and interactive charts"
alt_selection <|-- alt_selection_point
Flow diagram for web scraping preprocessinggraph LR
A[HTML File] --> B(Beautiful Soup Parsing)
B --> C{Extract Questions and Answers}
C --> D[JSONL Output]
style A fill:#f9f,stroke:#333,stroke-width:2px
style D fill:#ccf,stroke:#333,stroke-width:2px
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @gjbex - I've reviewed your changes and found some issues that need to be addressed.
Blocking issues:
- The Google Analytics ID is hardcoded in the HTML. (link)
- The Google Analytics ID is hardcoded in the HTML. (link)
- The Google Analytics ID is hardcoded in the HTML. (link)
- The Google Analytics ID is hardcoded in the HTML. (link)
- reCAPTCHA site key is hardcoded in the HTML. (link)
- reCAPTCHA site key is hardcoded in the HTML. (link)
- reCAPTCHA site key is hardcoded in the HTML. (link)
Overall Comments:
- The environment.yml file has a large number of dependencies; consider using a more minimal set of dependencies and letting conda resolve the rest.
- The notebook diffs show changes to the vega-lite version; consider updating the notebook to use the latest version of vega-lite.
Here's what I looked at during the review
- 🟡 General issues: 4 issues found
- 🔴 Security: 7 blocking issues
- 🟢 Testing: all looks good
- 🟢 Complexity: all looks good
- 🟢 Documentation: all looks good
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
def clean_answer(text): | ||
text = text.replace('\n', ' ').strip() | ||
pos = text.find('Version 3.0 Last updated:') | ||
text = text[:pos] | ||
pos = text.find('(Quote from:') | ||
return text[:pos].strip() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: Consider handling cases when the expected marker substring is not found.
The function uses text.find() and then slices using the returned position. If the substring (e.g. 'Version 3.0 Last updated:' or '(Quote from:') is absent, find() returns -1 and may lead to unintended truncation. Adding an explicit check for -1 would make the code more robust.
def clean_answer(text): | |
text = text.replace('\n', ' ').strip() | |
pos = text.find('Version 3.0 Last updated:') | |
text = text[:pos] | |
pos = text.find('(Quote from:') | |
return text[:pos].strip() | |
def clean_answer(text): | |
text = text.replace('\n', ' ').strip() | |
pos = text.find('Version 3.0 Last updated:') | |
if pos != -1: | |
text = text[:pos] | |
pos = text.find('(Quote from:') | |
if pos != -1: | |
text = text[:pos] | |
return text.strip() |
"source": [ | ||
"## Requirments" | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick (typo): Typo in markdown header.
The header '## Requirments' contains a typo; it should be spelled '## Requirements'.
"source": [ | |
"## Requirments" | |
] | |
"source": [ | |
"## Requirements" | |
] |
"source": [ | ||
"Indeed, the first column of `data2` was affected, its minimum and maximem values aore no longer approximately -1000 and 1000 as before, but rather -500 and 500 respectively." | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick (typo): Typographical errors in descriptive text.
The markdown text contains typographical errors: 'maximem' should be 'maximum' and 'aore' should be 'are'. Correcting these will improve the professionalism of the notebook.
"source": [ | |
"Indeed, the first column of `data2` was affected, its minimum and maximem values aore no longer approximately -1000 and 1000 as before, but rather -500 and 500 respectively." | |
] | |
"source": [ | |
"Indeed, the first column of `data2` was affected, its minimum and maximum values are no longer approximately -1000 and 1000 as before, but rather -500 and 500 respectively." | |
] |
1. `copy_on_write.ipynb`: Jupyter notebook that illustrates how data is shared | ||
between related notebooks and the role Copy-on-Write plays in order to | ||
prevent accidental data modifications in more than one dataframe. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (typo): "Copy-on-Write" should likely be "copy-on-write".
<meta property="og:image:height" content="100"/> | ||
<meta property="og:image:type" content="image/png"/> | ||
<!-- This site uses the Google Analytics by MonsterInsights plugin v9.2.2 - Using Analytics tracking - https://www.monsterinsights.com/ --> | ||
<script src="//www.googletagmanager.com/gtag/js?id=G-WFCTSPN1R1" data-cfasync="false" data-wpfc-render="false" type="text/javascript" async></script> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚨 issue (security): The Google Analytics ID is hardcoded in the HTML.
/* ]]> */ | ||
</script> | ||
<script type="text/javascript" src="https://www.openmp.org/wp-content/plugins/contact-form-7/includes/js/index.js?ver=6.0" id="contact-form-7-js"></script> | ||
<script type="text/javascript" src="https://www.google.com/recaptcha/api.js?render=6Lepn-wUAAAAAJLIq_PnJ7mEek8wpQyykB8Ktt2v&ver=3.0" id="google-recaptcha-js"></script> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚨 issue (security): reCAPTCHA site key is hardcoded in the HTML.
<script type="text/javascript" src="https://www.openmp.org/wp-includes/js/dist/vendor/wp-polyfill.min.js?ver=3.15.0" id="wp-polyfill-js"></script> | ||
<script type="text/javascript" id="wpcf7-recaptcha-js-before"> | ||
/* <![CDATA[ */ | ||
var wpcf7_recaptcha = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚨 issue (security): reCAPTCHA site key is hardcoded in the HTML.
<script type="text/javascript" id="wpcf7-recaptcha-js-before"> | ||
/* <![CDATA[ */ | ||
var wpcf7_recaptcha = { | ||
"sitekey": "6Lepn-wUAAAAAJLIq_PnJ7mEek8wpQyykB8Ktt2v", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚨 issue (security): reCAPTCHA site key is hardcoded in the HTML.
answer = '' | ||
next_tag = h4_tag.find_next('p') | ||
while next_tag and next_tag.name == 'p': | ||
answer += next_tag.text.strip() + ' ' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (code-quality): Use f-string instead of string concatenation (use-fstring-for-concatenation
)
answer += next_tag.text.strip() + ' ' | |
answer += f'{next_tag.text.strip()} ' |
json_strs = [] | ||
for tag in doc.find_all(is_question_header): | ||
json_strs.append(f'{{"input": "{get_question(tag)}", "output": "{get_answer(tag)}"}}') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (code-quality): Convert for loop into list comprehension (list-comprehension
)
json_strs = [] | |
for tag in doc.find_all(is_question_header): | |
json_strs.append(f'{{"input": "{get_question(tag)}", "output": "{get_answer(tag)}"}}') | |
json_strs = [ | |
f'{{"input": "{get_question(tag)}", "output": "{get_answer(tag)}"}}' | |
for tag in doc.find_all(is_question_header) | |
] |
Summary by Sourcery
Update project dependencies, documentation, and add new example scripts for web scraping and data preprocessing
New Features:
Enhancements:
Documentation: