mollydesjardin/README.md

Hi there ✌️

Right now I am actively working on:

Past projects

These are older tutorials (with Python code) for pre-processing Japanese text datasets for use with common analysis software, which I'm sharing as-is. Please freely reuse, fork, adapt, and/or steal for your own purposes -- that's why it's here!

The writeups are much longer than the code itself! I created them as a resource for getting started with the niche technical issues you'll often encounter when trying to use Japanese data sources. (Non-Unicode encodings and the lack of word boundaries are the main challenges.)
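
As a rough illustration of those two challenges (not code taken from the tutorials themselves), here is a minimal Python sketch. It assumes the source text is encoded as Shift_JIS, a common legacy Japanese encoding; the sample sentence and the `decode_legacy` helper are hypothetical and exist only for this example.

```python
# Sample sentence ("I am a cat."), encoded as Shift_JIS to stand in
# for the bytes of a legacy, non-Unicode file. (Assumed for illustration.)
sample = "吾輩は猫である。"
raw_bytes = sample.encode("shift_jis")


def decode_legacy(data: bytes, encoding: str = "shift_jis") -> str:
    """Decode legacy-encoded bytes into a Python (Unicode) string."""
    return data.decode(encoding)


# Challenge 1: the bytes are not valid UTF-8, so a naive decode fails.
try:
    raw_bytes.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

text = decode_legacy(raw_bytes)

# Challenge 2: Japanese has no spaces between words, so splitting on
# whitespace returns the whole sentence as a single "token".
tokens = text.split()

print(utf8_ok, text, len(tokens))
```

Getting real word-level tokens generally takes a morphological analyzer rather than `str.split()`, which is why segmentation tends to be its own pre-processing step.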

Other resources

Each of my projects above has its own dataset-specific resource section, but you might be interested in a more extensive guide at the East Asian Digital Humanities page (external link). It includes a semester-long course syllabus, weekend workshop materials, and previous blog posts about the Aozora project. It is not being actively updated, so be aware that nothing is more recent than late 2019.

East Asian Digital Humanities has been taught since 2021 as part of UPenn's annual Dream Lab digital humanities workshop series, by Paula Curtis and Paul Vierthaler. Paula has taught extensively about Japanese text mining and digital methods, and you can find more information on her website.

Check out Digital Humanities Japan for a wiki and mailing list promoting resource-sharing and collaboration on Japanese-language digital projects and tech issues.

Pinned

  1. aozora-lambda Public

    Aozora Corpus Builder for AWS Lambda

    Python

  2. aozora Public

    Aozora Corpus Builder

    Python 1

  3. narwhals-dev/narwhals Public

    Lightweight and extensible compatibility layer between dataframe libraries!

    Python 1.5k 176