Data Refining Workshop #2

Open
ebeshero opened this issue May 9, 2018 · 5 comments

@ebeshero
Member

ebeshero commented May 9, 2018

  • Send the spreadsheet of MSS

  • Subset of data (perhaps all historical people) from SI for VIAF and for occupations.

  • I will work on "cross-walking" back from TSV to XML via ID transform.
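A minimal sketch of the cross-walk in the last item. It assumes "ID transform" means passing the XML through unchanged (as an XSLT identity transform would) while merging in refined values matched on `xml:id`. It swaps in Python's standard `xml.etree.ElementTree` for XSLT, and the TSV column names (`xml:id`, `viaf`), the TEI namespace, and the `<idno type="VIAF">` encoding are all assumptions, not the project's confirmed markup.

```python
import csv
import io
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"  # assumption: si.xml uses the TEI namespace
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
ET.register_namespace("", TEI)  # write TEI back out as the default namespace


def crosswalk_viaf(root, tsv_text):
    """Copy VIAF values from a refined TSV onto <person> elements, matched by xml:id.

    Hypothetical layout: the TSV has 'xml:id' and 'viaf' columns, and VIAF is
    stored as <idno type="VIAF">. All other nodes are left as they were.
    Returns the number of <person> elements updated.
    """
    viaf_by_id = {
        row["xml:id"]: (row.get("viaf") or "").strip()
        for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    }
    updated = 0
    for person in root.iter(f"{{{TEI}}}person"):
        viaf = viaf_by_id.get(person.get(XML_ID))
        if not viaf:
            continue  # no TSV row for this id, or the row still has no VIAF
        idno = person.find(f"{{{TEI}}}idno[@type='VIAF']")
        if idno is None:
            # No existing VIAF idno: append a new one to this person
            idno = ET.SubElement(person, f"{{{TEI}}}idno", {"type": "VIAF"})
        idno.text = viaf  # overwrite or fill in the refined value
        updated += 1
    return updated
```

After the call, `ET.ElementTree(root).write("si-refined.xml", encoding="utf-8", xml_declaration=True)` would save the merged file (the filename is hypothetical). Matching on `xml:id` rather than on names keeps the round trip safe even when names change during refining. Unlike a true identity transform, ElementTree's default parser drops comments and processing instructions, so XSLT remains the safer tool for the real file.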

@mbolam

@ebeshero
Member Author

ebeshero commented May 9, 2018

By SUNDAY!

@ebeshero ebeshero self-assigned this May 9, 2018
@ebeshero
Member Author

@mbolam Yikes--looks like I missed a deadline to deliver files to you. But I'm also wondering whether we've actually settled on what you need: would you like to work with the Digital Mitford Site Index? You can download the always-current one here:

http://digitalmitford.org/si.xml

There's also the giant spreadsheet of Mitford MSS locations. Instead of pushing it to GitHub, here's a direct link to it at its Box home.

Let me know what you need...I'm back at Pitt in Pittsburgh tomorrow if you'd like to meet and chat a bit.

@mbolam
Collaborator

mbolam commented May 16, 2018

@ebeshero -- You had volunteered to extract tabular data (CSV) from si.xml, specifically the historical people section, I believe.

@ebeshero
Member Author

@mbolam That’s right—will do this AM!

@ebeshero
Member Author

ebeshero commented May 16, 2018

@mbolam I've made a TSV file from the Digital Mitford Site Index here:
https://github.com/ebeshero/DigMitCS/tree/master/data-cleaning

Each line of the tab-separated file holds the following data for one historical person:

  • xml:id
  • a joined list of names associated with this person
  • a joined list of occupations associated with this person
  • a VIAF entry if we currently have one (we don't for everyone)

I can include more, but because each entry can contain one or more persNames and one or more occupations, those fields have to be joined lists, and that struck me as potentially complicating things. So I figured this would be a decent start for us! Let me know how this looks, and sorry for the delay!
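A sketch of the export described above, for anyone reproducing or extending it. It assumes si.xml is TEI with `<person>` elements inside `<listPerson>`, names in `<persName>`, occupations in `<occupation>`, and VIAF in `<idno type="VIAF">`. The `" | "` join delimiter and the default selector are hypothetical: the thread doesn't say which delimiter the actual TSV uses, or how the historical-people list is marked.

```python
import csv
import io
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"  # assumption: standard TEI namespace
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
NS = {"tei": TEI}
JOIN = " | "  # hypothetical delimiter for the joined-list columns


def clean_text(elem):
    """All text inside elem (including nested forename/surname), whitespace collapsed."""
    return " ".join("".join(elem.itertext()).split())


def people_to_tsv(root, list_selector=".//tei:listPerson/tei:person"):
    """Return a TSV string with one row per <person>: xml:id, names, occupations, VIAF."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["xml:id", "names", "occupations", "viaf"])
    for person in root.iterfind(list_selector, NS):
        # A person may have several persNames and occupations, hence joined lists
        names = [clean_text(p) for p in person.iterfind("tei:persName", NS)]
        occupations = [clean_text(o) for o in person.iterfind("tei:occupation", NS)]
        # Empty string when no VIAF is recorded for this person
        viaf = person.findtext("tei:idno[@type='VIAF']", default="", namespaces=NS).strip()
        writer.writerow([person.get(XML_ID, ""), JOIN.join(names), JOIN.join(occupations), viaf])
    return out.getvalue()
```

`people_to_tsv(ET.parse("si.xml").getroot())` returns the whole table as a string. Passing a narrower `list_selector`, for example one with a `[@type='...']` predicate on `listPerson`, would restrict it to the historical people if that list carries a distinguishing attribute. A non-empty joined cell splits back into a list with `cell.split(" | ")`.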

2 participants