| 1 | +### April 30, 2024 |
| 2 | + + David asked if anyone had experience with or knew of any automated discard assessment tools |
| 3 | + + Javier said he has 25,000 volumes to assess for discard |
| 4 | + + Tomasz said other groups may know more about these types of tools because tech services may not have responsibility for collections assessment. Reference librarians may know more about potential tools to use. |
| 5 | + + Sara Amato has used OCLC API “to look at WC holdings and compare also to HathiTrust and comparisons to other libraries in our group to help make decisions - not great for large scale projects but good for smaller lists. I don’t have the code up anywhere though… and it doesn’t have any item level data like circ.” |
| 6 | + + Tomasz asked if Pymarc will have a new release due to a change in how indicators are handled |
| 7 | + + Indicators will be a named tuple that can only have two positions rather than a list which could be of any length |
| 8 | + + The change is outlined in this merge request: https://gitlab.com/pymarc/pymarc/-/merge_requests/206 |
| 9 | + + Ed: No scheduled release, reluctant to introduce another major version with breaking changes |
| 10 | + + More discussion of the change is in the [pymarc google group](https://groups.google.com/g/pymarc/c/cMkDb-dDDBY?pli=1) |
| 11 | + + Michael asked if anyone has experience working with APIs for wikimedia/wikimedia commons |
| 12 | + + He has copyright free newspaper images he would like to upload in bulk as PDFs (rather than image files which the other wikicommons tools can use) |
| 13 | + + Javier mentioned using the APIs to get data out of wikimedia commons but not to POST data |
| 14 | + + Tomasz asked about Michael’s involvement in the movement to preserve Ukrainian cultural heritage materials after the start of the full-scale invasion |
| 15 | + + Michael noted there are two parts to this preservation work: |
| 16 | + + [SUCHO](https://www.sucho.org/) works on preserving publicly available materials |
| 17 | + + There is a separate effort to back up digital materials that are not publicly available |
| 18 | + + Michael mentioned Maryna Paliienko, a Fulbright Scholar from Taras Shevchenko University, whose project focuses on archives |
| 19 | + + Maryna and Michael recently gave a presentation at NYU: https://www.nycarchivists.org/event-5671162 |
| 20 | + + Michelle asked for help figuring out why her API calls hang when she tries to upload large files |
| 21 | + + Files are ~2GB and she is posting them using the DSpace API. The files have to be read in binary before uploading them and the requests just hang after uploading the file successfully |
| 22 | + + Yamil mentioned that Python has issues with downloading files that are larger than available RAM and wondered if it has a similar issue with uploading files larger than available RAM |
| 23 | + + He also provided a link to streaming uploads with Requests: https://requests.readthedocs.io/en/latest/user/advanced/#streaming-uploads |
| 24 | + + Impromptu code review: https://github.com/mjanowiecki/dspace7-rest-api/blob/main/post/postItemsToCollection.py |
| 25 | + + Susan asked if the code is sending the correct residual size |
| 26 | + + If chunks are in unequal sizes (or the last chunk is not the same size as the others), the API will wait for the last chunk to reach the size of the other chunks |
| 27 | + + Ed said it could be helpful to add the complete upload size in the content-length header with the POST request |
| 28 | + + Michelle provided a link to a tool that makes it easier to authenticate using the DSpace API: https://github.com/the-library-code/dspace-rest-python/tree/main |
| 29 | + + John asked if anyone had recommendations for tools to use to take messy data from google docs and publish it to a dashboard a couple of times a year |
| 30 | + + Has been looking at [Streamlit](https://streamlit.io/) and [Pygwalker](https://github.com/Kanaries/pygwalker) as potential options |
| 31 | + + Pygwalker has tableau-like display |
| 32 | + + Jeremy used streamlit for a project with Hopkins Marine Station: https://taxa.stanford.edu/ |
| 33 | + + One issue he noted was that every time a user would interact with the dashboard it would completely reload |
| 34 | + + Michael mentioned stumbling across a tool called [Discorpy](https://discorpy.readthedocs.io/en/latest/index.html) and thought it may be of interest after discussion in last Python4Lib session about image cropping/manipulation |
| 35 | + + It is a tool for measuring lens distortion in a camera |
| 36 | + + Yamil mentioned he is learning about [SeleniumBase](https://seleniumbase.io/) |
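
To illustrate the indicator change discussed above, here is a minimal sketch using only the standard library. The `Indicators` class below is a hypothetical stand-in written for this example, not pymarc's own implementation; it only shows why a two-position named tuple is stricter than a list of arbitrary length.

```python
from typing import NamedTuple


class Indicators(NamedTuple):
    """Hypothetical stand-in for a two-position MARC indicator pair."""

    first: str
    second: str


# A plain list accepts any number of positions, so a malformed value slips through.
as_list = ["1", "0", "4"]

# A named tuple fixes the number of positions when the value is created.
title_indicators = Indicators("1", "0")
print(title_indicators.first, title_indicators.second)

try:
    Indicators("1", "0", "4")
except TypeError as err:
    # A third position is rejected instead of being stored silently.
    print(f"rejected: {err}")
```

Code that unpacks indicators as `first, second = ...` keeps working with a tuple like this, while code that appends to or resizes the indicator list would need to change, which is presumably why the change is considered breaking.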
| 37 | + |
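
The large-upload discussion above can be sketched roughly as follows. This is not Michelle's code: `upload_streaming`, its `url` and `headers` parameters, and the timeout values are placeholders chosen for illustration, and a real DSpace call would also need authentication and the correct bitstream endpoint. The sketch relies on the documented Requests behavior that passing an open file object as `data` streams the body rather than loading it into memory.

```python
import os
import tempfile


def iter_chunks(path, chunk_size=8 * 1024 * 1024):
    """Yield the file at `path` in binary chunks; the last one may be shorter."""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            yield chunk


def upload_streaming(url, path, headers=None):
    """POST a file without reading it all into memory (placeholder endpoint)."""
    import requests  # third-party; imported here so iter_chunks works without it

    with open(path, "rb") as handle:
        # An open file object is streamed, and Requests derives the
        # Content-Length header from the file size.
        return requests.post(url, data=handle, headers=headers, timeout=(10, 600))


# Local check of the chunk sizes only; no network request is made.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 2500)
sizes = [len(chunk) for chunk in iter_chunks(tmp.name, chunk_size=1000)]
total = os.path.getsize(tmp.name)
os.remove(tmp.name)
print(sizes, total)
```

The chunk check shows the shorter final chunk that Susan's question was about. Passing a generator such as `iter_chunks(...)` as `data` makes Requests use chunked transfer encoding with no Content-Length, so sending the file object (or setting the header explicitly, as Ed suggested) may be the safer option if the server waits on a declared size.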
| 38 | +### April 16, 2024 |
| 39 | + + David provided an update on the upcoming Python4Lib presentation schedule: |
| 40 | + + April 30 - open topics |
| 41 | + + May 14 - skipped, C4L in person |
| 42 | + + May 28 - Thomas will be talking Jupyter Kernel Gateways |
| 43 | + + June 11 - Rebecca will be talking Postman |
| 44 | + + Eric Phetteplace spoke about hosting a Python4Lib workshop at the upcoming Code4Lib conference |
| 45 | + + https://2024.code4lib.org/workshop/Python4Lib |
| 46 | + + He mentioned that he would welcome a volunteer to help with the session and that he can probably get the cost of the workshop refunded for the volunteer |
| 47 | + + It’ll be a loose conversation similar to a Python4Lib meeting and will cover more specific topics in the second half |
| 48 | + + He mentioned asyncio as a potential topic he would like to explore in the session |
| 49 | + + Eric spoke about getting access to some High Performance Computing and exploring parallel processing |
| 50 | + + He mentioned that this set up has a “head node” that coordinates with the other nodes |
| 51 | + + We shared some links with information on parallel work in Python |
| 52 | + + https://realpython.com/python-concurrency/ |
| 53 | + + https://docs.python.org/3/library/multiprocessing.html |
| 54 | + + https://realpython.com/async-io-python/ |
| 55 | + + https://realpython.com/python-gil/ |
| 56 | + + Then we spent a long time talking about the pros and cons of doing parallel work with Python |
| 57 | + + Clinton shared details and examples of why Python’s language design makes it much slower for parallel work than languages like Rust and C |
| 58 | + + GIL is going away https://www.blog.pythonlibrary.org/2023/08/16/global-interpreter-lock-optional-in-python-3-13/ |
| 59 | + + We also talked about how, although Python is slower than other languages, you can move existing Python code/projects to the current parallel options in Python and in many situations still get really good performance improvements |
| 60 | + + Michelle shared an example of working with the Alma API using asyncio |
| 61 | + + Her work went from a runtime of 1 hour for 2000 API calls to 5 minutes for 2000 API calls |
| 62 | + + https://github.com/jhu-library-applications/alma-api/blob/main/updateItemFieldsFromCSVAsync.py |
| 63 | + + Her code updates Alma items from a CSV, doing batches of 1000 rows at a time from the spreadsheet (to help catch errors in more manageable sets) |
| 64 | + + Clinton also shared a Python profiler, to help see what parts of your code are running slow/fast and which parts are using C-based code (which runs faster) |
| 65 | + + https://github.com/plasma-umass/scalene |
| 66 | + + He also shared a presentation on Python performance |
| 67 | + + [Python Performance Matters by Emery Berger (Strange Loop 2022)](https://www.youtube.com/watch?v=vVUnCXKuNOg) |
| 68 | + + Jerrell asked if anyone had been working on AI assisted image cropping |
| 69 | + + No one had worked on this yet but many people are interested in the topic |
| 70 | + + We briefly talked about the use of [Whisper (from OpenAI)](https://openai.com/research/whisper) to create transcripts of videos |
| 71 | + + We also spoke about [Otter AI](https://otter.ai/), another transcript platform that can use Zoom |
| 72 | + + Handprint also came up |
| 73 | + + https://2022.code4lib.org/talks/Handprint-A-program-to-explore-and-compare-major-cloudbased-services-for-handwritten-text-recognition |
| 74 | + |
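
The asyncio pattern discussed above (many API calls run concurrently, processed in batches) can be sketched as follows. This is not the linked Alma script: `fake_api_call` simulates a request with `asyncio.sleep`, and the batch size and concurrency limit are illustrative assumptions. A real script would swap in an async HTTP client and respect the API's rate limits.

```python
import asyncio


async def fake_api_call(item_id, delay=0.01):
    """Stand-in for one HTTP request; a real script would use an async HTTP client."""
    await asyncio.sleep(delay)
    return {"item_id": item_id, "status": "updated"}


async def update_batch(item_ids, max_concurrent=10):
    """Run the calls for one batch concurrently, capped by a semaphore."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(item_id):
        async with semaphore:
            return await fake_api_call(item_id)

    # return_exceptions=True returns errors alongside results instead of
    # raising on the first failure, so a batch can be reviewed as a whole.
    return await asyncio.gather(
        *(guarded(i) for i in item_ids), return_exceptions=True
    )


def batches(rows, size):
    """Split a list of rows into consecutive slices of at most `size` items."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


async def main(all_ids, batch_size=1000):
    results = []
    for batch in batches(all_ids, batch_size):
        results.extend(await update_batch(batch))
    return results


results = asyncio.run(main(list(range(50)), batch_size=20))
print(len(results), results[0])
```

The speedup comes from overlapping the time spent waiting on the network, which is why this approach helps with API-bound work even though it does not make the Python code itself run faster.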
| 75 | +### April 2, 2024 |
| 76 | + + Charlotte and Tomasz have released a new [version (1.0) of Bookops-Worldcat](https://github.com/BookOps-CAT/bookops-worldcat), a Python wrapper for the WorldCat Metadata API. |
| 77 | + + The new version supports changes made in [version 2.0 of the Metadata API](https://developer.api.oclc.org/wc-metadata-v2). |
| 78 | + + The documentation is available on GitHub pages: https://bookops-cat.github.io/bookops-worldcat/ |
| 79 | + + Lauren at Rice is working on a reclamation project and gave a shoutout to Rebecca for some Python notes she shared in the past. |
| 80 | + + Here is Rebecca’s code: |
| 81 | + + Pulls specified data from holdings records in Alma, using the Bibs API |
| 82 | + + https://github.com/LibraryNinja/Holdings_Record_Inpsector |
| 83 | + + Rebecca talked about her recent work using Tkinter. She has been converting code written with PySimpleGUI to Tkinter after PySimpleGUI changed its licensing to require a fee for higher ed use. |
| 84 | + + https://docs.python.org/3/library/tkinter.html |
| 85 | + + https://realpython.com/python-gui-tkinter/ |
| 86 | + + https://github.com/TomSchimansky/CustomTkinter |
| 87 | + + Someone asked Rebecca for beginner Tkinter resources and she recommended two courses/videos |
| 88 | + + [Create Graphical User Interfaces With Python And TKinter](https://www.youtube.com/playlist?list=PLCC34OHNcOtoC6GglhF3ncJ5rLwQrLGnV) |
| 89 | + + [A Linkedin Learning Course](https://www.linkedin.com/learning/python-gui-development-with-tkinter-2?u=2147385) |
| 90 | + + Eric asked if one can create a single executable with a custom desktop icon for the resulting app with Tkinter |
| 91 | + + Rebecca said it is possible, but would require the use of a packaging utility |
| 92 | + + Rebecca: “PyInstaller is the thing that packages it all up using the command line, Auto-py-to-exe is a layer on top for it” |
| 93 | + + Emily had a question about using pymarc for some batch edits, but it did not work as she hoped(?) |
| 94 | + + “At my institution, we’ve got one person (me) identifying OCLC numbers for changes in one, now pymarc script, that a second person then feeds into the Metadata API 2.0 to make changes. Using the BookOps library would we be able to integrate the script searching for identifiers with the script that makes batch changes?” |
| 95 | + + Charles shared a new project he and Eddie are working on using Flask to connect to the Alma API |
| 96 | + + https://flask.palletsprojects.com/en/3.0.x/ |
| 97 | + + https://en.wikipedia.org/wiki/Flask_(web_framework) |
| 98 | + + The application lives on the Azure cloud, but it runs via Docker for local tests and on the cloud |
| 99 | + + Javier asked about Charles' use of ChatGPT 4 and whether he could share reasons that justify its cost |
| 100 | + + Javier also asked about the various “personas” that Charles used. |
| 101 | + + Charles then explained how to give “context” to each “persona,” such as stating that the human user is already experienced in programming. |
| 102 | + + Charles also mentioned that he asks ChatGPT what questions it may need answered before it can properly answer a particular prompt (or all prompts going forward for a single “persona”) |
| 103 | + + Charles also recommended other LLMs that worked well for him for code questions if you cannot pay for ChatGPT 4 (some of the ones below have paid versions too) |
| 104 | + + https://www.phind.com/search |
| 105 | + + https://www.anthropic.com/claude |
| 106 | + |
| 107 | +### March 19th, 2024 |
| 108 | + + Yamil and Charlotte gave a presentation on Python Virtual Environments & requirements.txt |
| 109 | + + https://docs.google.com/presentation/d/1XvnmQFdCkBWnD4javgJ0SPn-Uzp7F8if4dIh6qPxKos/edit?usp=sharing |
| 110 | + + Q&A/Discussion |
| 111 | + + Using pyproject.toml vs. requirements.txt |
| 112 | + + pyproject.toml files are more complex/powerful |
| 113 | + + this should be a presentation topic in the future |
| 114 | + + https://packaging.python.org/en/latest/guides/writing-pyproject-toml/ |
| 115 | + + Dependency management and how to properly deploy code to someone else’s machine |
| 116 | + + pipx: https://github.com/pypa/pipx |
| 117 | + + how to install packages globally while still keeping them separate from the global Python install |
| 118 | + |
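
As a small illustration of the virtual environment and requirements.txt topic above, here is a sketch using the standard library `venv` module. It is not taken from the slides; the helper name `create_project_env` and the package pins are illustrative assumptions. The sketch skips installing pip into the environment to stay fast.

```python
import tempfile
import venv
from pathlib import Path


def create_project_env(project_dir, requirements):
    """Create a .venv folder and a requirements.txt inside project_dir."""
    project = Path(project_dir)
    env_dir = project / ".venv"
    # with_pip=False keeps this sketch fast; use with_pip=True for real work.
    venv.create(env_dir, with_pip=False)
    req_file = project / "requirements.txt"
    req_file.write_text("\n".join(requirements) + "\n", encoding="utf-8")
    return env_dir, req_file


with tempfile.TemporaryDirectory() as tmp:
    # The pins below are examples only, not recommended versions.
    env_dir, req_file = create_project_env(tmp, ["requests==2.31.0", "pymarc>=5.0"])
    has_config = (env_dir / "pyvenv.cfg").exists()
    pinned = req_file.read_text(encoding="utf-8").splitlines()

print(has_config, pinned)
```

In everyday use the same result usually comes from running `python -m venv .venv` in a terminal, activating the environment, and then running `pip install -r requirements.txt`.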
| 119 | +### March 5th, 2024 |
| 120 | + + Rebecca mentioned that PySimpleGUI has moved to a paid license model and wondered whether it is common for a package to move to a closed license |
| 121 | + + Clinton mentioned he has seen it maybe 5 times |
| 122 | + + It makes projects very brittle because every person needs to get a key annually |
| 123 | + + We discussed alternatives to PySimpleGUI |
| 124 | + + TKinter: https://docs.python.org/3/library/tkinter.html |
| 125 | + + PyQt: https://wiki.python.org/moin/PyQt |
| 126 | + + Clinton also mentioned using a python backend with a simple HTML frontend in the past as a potential alternative to PySimpleGUI |
| 127 | + + If the project doesn't need the user interface to change, the project won't require any JavaScript |
| 128 | + + Buttons can send calls to Flask endpoints |
| 129 | + + Example: randomizing math exercises from text book |
| 130 | + + Basic inputs with some rendering in Flask |
| 131 | + + It has a low barrier to entry |
| 132 | + + Python runs locally and you enter the localhost address in the browser |
| 133 | + + Will always use a browser as the front end |
| 134 | + + Brooks mentioned [FastUI](https://github.com/pydantic/FastUI) and [DearPyGUI](https://github.com/hoffstadt/DearPyGui) |
| 135 | + + https://talkpython.fm/episodes/show/348/dear-pygui-simple-yet-fast-python-gui-apps |
| 136 | + + Tomasz mentioned that python isn’t really known for windows apps especially because TKinter is part of the standard library but looks very dated |
| 137 | + + The library isn’t copied into your virtual environment |
| 138 | + + https://beeware.org/project/projects/libraries/toga/ |
| 139 | + + Rebecca mentioned TTKbootstrap: https://ttkbootstrap.readthedocs.io/en/latest/ |
| 140 | + + Rebecca asked how to ensure that one won’t be burned in the future |
| 141 | + + Clinton suggested focussing on tools with very wide adoption (like Flask or Django) |
| 142 | + + Tools that are widely used can’t make that sort of change without it being too disruptive |
| 143 | + + If anyone would like to evaluate any of these tools and present on their findings it would be a welcome presentation |
| 144 | + + Rebecca mentioned a self-checkout tool that she is developing and asked for feedback |
| 145 | + + She is working with a group within CUNY to develop this tool |
| 146 | + + It will run in a terminal where someone could enter their User ID and check out a book |
| 147 | + + Charlotte asked for feedback on [bookops-worldcat](https://github.com/BookOps-CAT/bookops-worldcat) |
| 148 | + + David mentioned that he and Lauren are working on an OCLC reclamation using [bookops-worldcat](https://github.com/BookOps-CAT/bookops-worldcat) |
| 149 | + + Clinton offered to present on creating simple APIs in the future |
| 150 | + + Eric said he was interested in learning more about FastAPI |
| 151 | + + Tomasz asked about Jupyter Kernel Gateway to implement a local API to query from within an OpenRefine project |
| 152 | + + https://github.com/MichaelMarkert/GND4C/blob/main/APIs_for_OpenRefine/localAPI.ipynb |
| 153 | + + Kate asked about adding 758 fields to ILS records |
| 154 | + + She is exploring adding them to their collection in a batch |
| 155 | + |
| 156 | +### February 20th, 2024 |
| 157 | +(Missing notes from Jeremy's presentation on pyscript) |
| 158 | + |
| 159 | + |
1 | 160 | ### February 6, 2024 |
2 | 161 | + Upcoming scheduled presentations/chats: |
3 | 162 | + Jeremy Nelson will talk about [pyscript](https://pyscript.net/) on Feb 20 |