Commit 4ba156e

Notes for March and April meetings (#36)
* added notes for 3/5, 3/19, 4/2
* Added 4/16 meeting notes
* Added upcoming meeting dates to READMEs
* Added April 30, 2024 notes
1 parent 1a24f8d commit 4ba156e

File tree

2 files changed

+162
-3
lines changed

README.md

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -4,9 +4,9 @@ Shared space for the Python{4}Lib group.
44
See our [meeting notes](mtg_notes.md) for more details.
55

66
Upcoming meetings (meetings at 11am Eastern time):
7-
+ February 20th, 2024
8-
+ March 5th, 2024
9-
+ March 19th, 2024
7+
+ *No meeting on May 14 during Code4Lib conference*
8+
+ May 28, 2024
9+
+ June 11, 2024
1010

1111
Would like to suggest a worthy resource? See [contributing instructions](CONTRIBUTING.md).
1212

mtg_notes.md

Lines changed: 159 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,3 +1,162 @@
1+
### April 30, 2024
2+
+ David asked if anyone had experience with or knew of any automated discard assessment tools
3+
+ Javier said he has 25,000 volumes to assess for discard
4+
+ Tomasz said other groups may know more about these types of tools because tech services may not have responsibility for collections assessment. Reference librarians may know more about potential tools to use.
5+
+ Sara Amato has used OCLC API “to look at WC holdings and compare also to HathiTrust and comparisons to other libraries in our group to help make decisions - not great for large scale projects but good for smaller lists. I don’t have the code up anywhere though… and it doesn’t have any item level data like circ.”
6+
+ Tomasz asked if Pymarc will have a new release due to a change in how indicators are handled
7+
+ Indicators will be a named tuple that can only have two positions rather than a list which could be of any length
8+
+ The change is outlined in this merge request: https://gitlab.com/pymarc/pymarc/-/merge_requests/206
9+
+ Ed: No scheduled release, reluctant to introduce another major version with breaking changes
10+
+ More discussion of the change is in the [pymarc google group](https://groups.google.com/g/pymarc/c/cMkDb-dDDBY?pli=1)
11+
+ Michael asked if anyone has experience working with the APIs for Wikimedia/Wikimedia Commons
12+
+ He has copyright-free newspaper images he would like to upload in bulk as PDFs (rather than as image files, which the other Wikimedia Commons tools can use)
13+
+ Javier mentioned using the APIs to get data out of Wikimedia Commons but not to POST data
14+
+ Tomasz asked about Michael’s involvement in the movement to preserve Ukrainian cultural heritage materials after the start of the full-scale invasion
15+
+ Michael noted there are two parts to this preservation work:
16+
+ [SUCHO](https://www.sucho.org/) works on preserving publicly available materials
17+
+ There is a separate effort to back up digital materials that are not publicly available
18+
+ Michael mentioned Maryna Paliienko, a Fulbright Scholar from Taras Shevchenko University, whose project focuses on archives
19+
+ Maryna and Michael recently gave a presentation at NYU: https://www.nycarchivists.org/event-5671162
20+
+ Michelle asked for help figuring out why her API calls hang when she tries to upload large files
21+
+ Files are ~2GB and she is posting them using the DSpace API. The files have to be read in binary before uploading them and the requests just hang after uploading the file successfully
22+
+ Yamil mentioned that Python has issues with downloading files that are larger than available RAM and wondered if it has a similar issue with uploading files larger than available RAM
23+
+ He also provided a link to streaming uploads with Requests: https://requests.readthedocs.io/en/latest/user/advanced/#streaming-uploads
24+
+ Impromptu code review: https://github.com/mjanowiecki/dspace7-rest-api/blob/main/post/postItemsToCollection.py
25+
+ Susan asked if the code is sending the correct residual size
26+
+ If chunks are in unequal sizes (or the last chunk is not the same size as the others), the API will wait for the last chunk to reach the size of the other chunks
27+
+ Ed said it could be helpful to add the complete upload size in the content-length header with the POST request
28+
+ Michelle provided a link to a tool that makes it easier to authenticate using the DSpace API: https://github.com/the-library-code/dspace-rest-python/tree/main
29+
+ John asked if anyone had recommendations for tools to use to take messy data from google docs and publish it to a dashboard a couple of times a year
30+
+ Has been looking at [Streamlit](https://streamlit.io/) and [Pygwalker](https://github.com/Kanaries/pygwalker) as potential options
31+
+ Pygwalker has a Tableau-like display
32+
+ Jeremy used Streamlit for a project with Hopkins Marine Station: https://taxa.stanford.edu/
33+
+ One issue he noted was that every time a user would interact with the dashboard it would completely reload
34+
+ Michael mentioned stumbling across a tool called [Discorpy](https://discorpy.readthedocs.io/en/latest/index.html) and thought it may be of interest after discussion in last Python4Lib session about image cropping/manipulation
35+
+ It is a tool for measuring lens distortion in a camera
36+
+ Yamil mentioned he is learning about [SeleniumBase](https://seleniumbase.io/)
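
The pymarc indicator change discussed above (a two-position named tuple instead of a list of any length) can be illustrated in plain Python. This is a minimal sketch and not pymarc's implementation: the `Indicators` class and its `first`/`second` field names are assumptions chosen for illustration, and the linked merge request remains the reference for the actual design.

```python
from typing import NamedTuple


class Indicators(NamedTuple):
    """Hypothetical stand-in for a two-position indicator pair."""
    first: str
    second: str


# A list accepts any number of values, so a malformed field goes unnoticed.
list_indicators = ["1", "0", "4"]

# A named tuple with exactly two fields rejects a third value at construction.
ok = Indicators("1", "0")
try:
    Indicators("1", "0", "4")
    rejected = False
except TypeError:
    rejected = True

# Positions can be read by name or by index.
same_value = ok.first == ok[0]
```

Code that builds indicators as lists would need updating under a change like this, which is why it counts as breaking.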
37+
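
The large-file upload discussion above touched on two points: streaming the body instead of reading it all into memory, and stating the complete upload size. The sketch below assumes a generic HTTP endpoint; the `upload` function, its `url` parameter, and the 60-second timeout are hypothetical and do not reflect the DSpace API or Michelle's script. `iter_chunks` shows that the final chunk is usually shorter than the others.

```python
import os
from typing import Iterator


def iter_chunks(path: str, chunk_size: int = 1024 * 1024) -> Iterator[bytes]:
    """Yield the file in fixed-size chunks; the last chunk may be shorter."""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            yield chunk


def upload(url: str, path: str) -> int:
    """Stream a file to `url` (hypothetical endpoint) without loading it into RAM."""
    import requests  # third-party; imported here so iter_chunks works without it

    size = os.path.getsize(path)
    with open(path, "rb") as handle:
        # Passing an open file object lets Requests stream the body.
        # Content-Length is set explicitly to mirror the suggestion in the notes.
        response = requests.post(
            url,
            data=handle,
            headers={"Content-Length": str(size)},
            timeout=60,
        )
    return response.status_code
```

Comparing the sum of the chunk lengths against `os.path.getsize` is a quick way to check that the residual chunk is accounted for.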
38+
### April 16, 2024
39+
+ David provided an update on the upcoming Python4Lib presentation schedule:
40+
+ April 30 - open topics
41+
+ May 14 - skipped, C4L in person
42+
+ May 28 - Thomas will be talking Jupyter Kernel Gateways
43+
+ June 11 - Rebecca will be talking Postman
44+
+ Eric Phetteplace spoke about hosting a Python4Lib workshop at the upcoming Code4Lib conference
45+
+ https://2024.code4lib.org/workshop/Python4Lib
46+
+ He mentioned that he would welcome a volunteer to help with the session and that he can probably get the cost of the workshop refunded for the volunteer
47+
+ It’ll be a loose conversation similar to a Python4Lib meeting and will cover more specific topics in the second half
48+
+ He mentioned asyncio as a potential topic he would like to explore in the session
49+
+ Eric spoke about getting access to some High Performance Computing and exploring parallel processing
50+
+ He mentioned that this set up has a “head node” that coordinates with the other nodes
51+
+ We shared some links with information on parallel work in Python
52+
+ https://realpython.com/python-concurrency/
53+
+ https://docs.python.org/3/library/multiprocessing.html
54+
+ https://realpython.com/async-io-python/
55+
+ https://realpython.com/python-gil/
56+
+ Then we spent a long time talking about the pros and cons of doing parallel work with Python
57+
+ Clinton had some details and examples of why Python’s language design makes it very slow for parallel work compared to many other languages like Rust and C
58+
+ The GIL is becoming optional, starting with Python 3.13: https://www.blog.pythonlibrary.org/2023/08/16/global-interpreter-lock-optional-in-python-3-13/
59+
+ We also talked about how, even though Python is slower than other languages, you can move existing Python code/projects to the current parallel options in Python and in many situations still get really good performance improvements
60+
+ Michelle shared an example of working with the Alma API using asyncio
61+
+ Her work went from a runtime of 1 hour for 2000 API calls to 5 minutes for 2000 API calls
62+
+ https://github.com/jhu-library-applications/alma-api/blob/main/updateItemFieldsFromCSVAsync.py
63+
+ Her code updates Alma items from a CSV, doing batches of 1000 rows at a time from the spreadsheet (to help catch errors in more manageable sets)
64+
+ Clinton also shared a Python profiler, to help see what parts of your code are running slow/fast and which parts are using C-based code (which runs faster)
65+
+ https://github.com/plasma-umass/scalene
66+
+ He also shared a presentation on Python performance
67+
+ [Python Performance Matters by Emery Berger (Strange Loop 2022)](https://www.youtube.com/watch?v=vVUnCXKuNOg)
68+
+ Jerrell asked if anyone had been working on AI assisted image cropping
69+
+ No one had worked on this yet but many people are interested in the topic
70+
+ We briefly talked about the use of [Whisper (from OpenAI)](https://openai.com/research/whisper) to create transcripts of videos
71+
+ We also spoke about [Otter AI](https://otter.ai/), another transcript platform that can use Zoom
72+
+ Handprint also came up
73+
+ https://2022.code4lib.org/talks/Handprint-A-program-to-explore-and-compare-major-cloudbased-services-for-handwritten-text-recognition
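
The asyncio speed-up described above for batched API calls follows a common pattern: work through the rows in batches and let the calls inside each batch overlap while they wait on the network. The sketch below is not Michelle's script. `update_item` is a hypothetical stand-in that sleeps instead of calling the Alma API, and the concurrency cap of 10 is an assumed value.

```python
import asyncio
from typing import Iterable


async def update_item(row: dict, limit: asyncio.Semaphore) -> str:
    """Stand-in for one API call; a real script would send an HTTP request here."""
    async with limit:
        await asyncio.sleep(0.01)  # simulated network wait
        return f"updated {row['item_id']}"


def batches(rows: list, size: int) -> Iterable[list]:
    """Split rows into consecutive batches of at most `size`."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


async def run(rows: list, batch_size: int = 1000, max_concurrent: int = 10) -> list:
    """Process rows batch by batch, overlapping the calls within each batch."""
    limit = asyncio.Semaphore(max_concurrent)
    results = []
    for batch in batches(rows, batch_size):
        outcome = await asyncio.gather(
            *(update_item(row, limit) for row in batch),
            # Collect errors alongside results instead of raising on the first one.
            return_exceptions=True,
        )
        results.extend(outcome)
    return results
```

Calling `asyncio.run(run(rows))` with a list of row dictionaries returns the results in input order. The semaphore keeps the number of simultaneous requests bounded, which matters for APIs that enforce rate limits.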
74+
75+
### April 2, 2024
76+
+ Charlotte and Tomasz have released a new [version (1.0) of Bookops-Worldcat](https://github.com/BookOps-CAT/bookops-worldcat), a Python wrapper for the WorldCat Metadata API.
77+
+ The new version supports changes made in [version 2.0 of the Metadata API](https://developer.api.oclc.org/wc-metadata-v2).
78+
+ The documentation is available on GitHub pages: https://bookops-cat.github.io/bookops-worldcat/
79+
+ Lauren at Rice is working on a reclamation project and gave a shoutout to Rebecca for some Python notes she shared in the past.
80+
+ Here is Rebecca’s code:
81+
+ Pulls specified data from holdings records in Alma, using the Bibs API
82+
+ https://github.com/LibraryNinja/Holdings_Record_Inpsector
83+
+ Rebecca talked about her recent work using Tkinter. She has been converting code written with PySimpleGUI to Tkinter after PySimpleGUI changed its licensing to require a fee for higher ed use.
84+
+ https://docs.python.org/3/library/tkinter.html
85+
+ https://realpython.com/python-gui-tkinter/
86+
+ https://github.com/TomSchimansky/CustomTkinter
87+
+ Someone asked Rebecca for beginner Tkinter resources and she recommended two courses/videos
88+
+ [Create Graphical User Interfaces With Python And TKinter](https://www.youtube.com/playlist?list=PLCC34OHNcOtoC6GglhF3ncJ5rLwQrLGnV)
89+
+ [A Linkedin Learning Course](https://www.linkedin.com/learning/python-gui-development-with-tkinter-2?u=2147385)
90+
+ Eric asked if one can create a single executable with a custom desktop icon for the resulting app with Tkinter
91+
+ Rebecca said it is possible, but would require the use of a packaging utility
92+
+ Rebecca: “PyInstaller is the thing that packages it all up using the command line, Auto-py-to-exe is a layer on top for it”
93+
+ Emily had a question about using pymarc for some batch edits, but it did not work as she hoped(?)
94+
+ “At my institution, we’ve got one person (me) identifying OCLC numbers for changes in one, now pymarc script, that a second person then feeds into the Metadata API 2.0 to make changes. Using the BookOps library would we be able to integrate the script searching for identifiers with the script that makes batch changes?”
95+
+ Charles shared a new project he and Eddie are working on using Flask to connect to the Alma API
96+
+ https://flask.palletsprojects.com/en/3.0.x/
97+
+ https://en.wikipedia.org/wiki/Flask_(web_framework)
98+
+ The application lives on the Azure cloud, but it runs via Docker for local tests and on the cloud
99+
+ Javier asked Charles about his use of ChatGPT 4 and whether he could share reasons that justify its cost
100+
+ Javier also asked about the various “personas” that Charles used.
101+
+ Charles then explained how to give “context” to each “persona,” such as stating that the human user is already experienced in programming.
102+
+ Charles also mentioned that he asks ChatGPT which questions it may need answered before it can properly answer a particular prompt (or all prompts going forward for a single “persona”)
103+
+ Charles also recommended other LLMs that worked well for him for code questions if you cannot pay for ChatGPT 4 (some of the ones below have paid versions too)
104+
+ https://www.phind.com/search
105+
+ https://www.anthropic.com/claude
106+
107+
### March 19th, 2024
108+
+ Yamil and Charlotte gave a presentation on Python Virtual Environments & requirements.txt
109+
+ https://docs.google.com/presentation/d/1XvnmQFdCkBWnD4javgJ0SPn-Uzp7F8if4dIh6qPxKos/edit?usp=sharing
110+
+ Q&A/Discussion
111+
+ Using pyproject.toml vs. requirements.txt
112+
+ pyproject.toml files are more complex/powerful
113+
+ this should be a presentation topic in the future
114+
+ https://packaging.python.org/en/latest/guides/writing-pyproject-toml/
115+
+ Dependency management and how to properly deploy code to someone else’s machine
116+
+ pipx: https://github.com/pypa/pipx
117+
+ how to install packages globally while still keeping them separate from the global Python install
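
As a small companion to the virtual environment discussion above, the standard library's `venv` module can create an environment from Python itself. This is a minimal sketch and is not taken from the presentation slides: the `create_env` helper and the `.venv` directory name are assumptions.

```python
import tempfile
import venv
from pathlib import Path


def create_env(parent: Path, name: str = ".venv") -> Path:
    """Create a virtual environment under `parent` and return its path."""
    env_dir = parent / name
    # with_pip=False keeps the sketch fast; pass with_pip=True for real use.
    venv.create(env_dir, with_pip=False)
    return env_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env = create_env(Path(tmp))
        # pyvenv.cfg marks the directory as a virtual environment.
        print((env / "pyvenv.cfg").read_text())
```

In everyday use the same result comes from `python -m venv .venv`, followed by `python -m pip install -r requirements.txt` with the environment's interpreter to install a project's listed dependencies.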
118+
119+
### March 5th, 2024
120+
+ Rebecca mentioned that PySimpleGUI has moved to a license model and was wondering if it is common for a package to move to a closed license
121+
+ Clinton mentioned he has seen it maybe 5 times
122+
+ It makes projects very brittle because every person needs to get a key annually
123+
+ We discussed alternatives to PySimpleGUI
124+
+ TKinter: https://docs.python.org/3/library/tkinter.html
125+
+ PyQt: https://wiki.python.org/moin/PyQt
126+
+ Clinton also mentioned using a python backend with a simple HTML frontend in the past as a potential alternative to PySimpleGUI
127+
+ If the project doesn't need the user interface to change, the project won't require any JavaScript
128+
+ Buttons can send calls to Flask endpoints
129+
+ Example: randomizing math exercises from text book
130+
+ Basic inputs with some rendering in Flask
131+
+ It has a low barrier to entry
132+
+ The Python code runs locally and you enter the localhost address in the browser
133+
+ Will always use a browser as the front end
134+
+ Brooks mentioned [FastUI](https://github.com/pydantic/FastUI) and [DearPyGUI](https://github.com/hoffstadt/DearPyGui)
135+
+ https://talkpython.fm/episodes/show/348/dear-pygui-simple-yet-fast-python-gui-apps
136+
+ Tomasz mentioned that python isn’t really known for windows apps especially because TKinter is part of the standard library but looks very dated
137+
+ The library isn’t copied into your virtual environment
138+
+ https://beeware.org/project/projects/libraries/toga/
139+
+ Rebecca mentioned TTKbootstrap: https://ttkbootstrap.readthedocs.io/en/latest/
140+
+ Rebecca asked how to ensure that one won’t be burned in the future
141+
+ Clinton suggested focussing on tools with very wide adoption (like Flask or Django)
142+
+ Tools that are widely used can’t make that sort of change without it being too disruptive
143+
+ If anyone would like to evaluate any of these tools and present on their findings it would be a welcome presentation
144+
+ Rebecca mentioned a self-checkout tool that she is developing and asked for feedback
145+
+ She is working with a group within CUNY to develop this tool
146+
+ It will run in a terminal where someone could enter their User ID and check out a book
147+
+ Charlotte asked for feedback on [bookops-worldcat](https://github.com/BookOps-CAT/bookops-worldcat)
148+
+ David mentioned that he and Lauren are working on an OCLC reclamation using [bookops-worldcat](https://github.com/BookOps-CAT/bookops-worldcat)
149+
+ Clinton offered to present on creating simple APIs in the future
150+
+ Eric said he was interested in learning more about FastAPI
151+
+ Tomasz asked about Jupyter Kernel Gateway to implement a local API to query from within an OpenRefine project
152+
+ https://github.com/MichaelMarkert/GND4C/blob/main/APIs_for_OpenRefine/localAPI.ipynb
153+
+ Kate asked about adding 758 fields to ILS records
154+
+ She is exploring adding them to their collection in a batch
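
The "Python backend with a simple HTML frontend" idea discussed above, using the randomized math exercises example, can be sketched without JavaScript. Note the swap: the notes describe Flask, but this sketch uses the standard library's WSGI tools so it runs without installing anything. The function names, the port, and the addition problems are all assumptions for illustration.

```python
import random
from html import escape
from wsgiref.simple_server import make_server


def make_exercises(count: int, seed=None) -> list:
    """Return `count` random addition problems as strings."""
    rng = random.Random(seed)
    return [f"{rng.randint(1, 20)} + {rng.randint(1, 20)} =" for _ in range(count)]


def render_page(exercises: list) -> str:
    """Plain HTML: the button reloads the page through a form, no JavaScript."""
    items = "".join(f"<li>{escape(text)}</li>" for text in exercises)
    return (
        "<!doctype html><title>Exercises</title>"
        f"<ol>{items}</ol>"
        '<form method="get" action="/"><button>New exercises</button></form>'
    )


def app(environ, start_response):
    """WSGI entry point; in Flask this would be a function behind a route."""
    body = render_page(make_exercises(5)).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def serve(port: int = 8000) -> None:
    """Run locally, then visit http://localhost:<port> in a browser."""
    with make_server("localhost", port, app) as server:
        server.serve_forever()
```

Calling `serve()` starts the local server; each click on the button sends a new GET request and the backend returns a freshly randomized page.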
155+
156+
### February 20th, 2024
157+
(Missing notes from Jeremy's presentation on pyscript)
158+
159+
1160
### February 6, 2024
2161
+ Upcoming scheduled presentations/chats:
3162
+ Jeremy Nelson will talk about [pyscript](https://pyscript.net/) on Feb 20
