Commit 6eb55ec

Updating the readme, and fixing the commit flag for insert_new.
1 parent 55512a9 commit 6eb55ec

4 files changed: +74 -14 lines changed

.gitignore

Lines changed: 20 additions & 0 deletions
@@ -32,3 +32,23 @@ helpers/archives/
 helpers/figshareUpload/settings.yaml
 
 helpers/figshareUpload/lib/__pycache__/
+
+helpers/.env
+
+Proposals/publications/.env
+
+Proposals/publications/data/neotoma_citations_202410070925.csv
+
+Proposals/publications/data/neotoma_publications_202410151545.csv
+
+Proposals/publications/json_pubs.json
+
+Proposals/publications/output.csv
+
+*.csv#
+
+Proposals/speleothem_data/.env
+
+Proposals/publications/src/publications/__pycache__/
+
+Proposals/publications/data/raw_doi.csv

Proposals/publications/README.md

Lines changed: 43 additions & 3 deletions
@@ -4,12 +4,20 @@ This project is intended to help add DOIs to existing publications, support meta
 
 ## Updating Existing Neotoma Records
 
+These scripts are run (for the most part) directly against the Neotoma Database, so they read their connection settings from a `.env` file. The `.env` file should define `DBAUTH` as a JSON object holding the database connection settings. For example:
+
+```env
+DBAUTH={"host":"localhost","port":5555,"user":"PUBLICATIONGUY","password":"PUBLICATIONSCOOL","database":"neotoma"}
+```
+
+Without this connection string the scripts will not work. For development work, please use `neotomatank` as the database for connection.
+
 ### Finding new DOIs
 
 For existing publications that do not include DOIs we can scan the Neotoma Publications database from the command line:
 
-```python
-uv run src/find_potential_dois.py --limit 100 --skip 100 --output ./data/offset100.csv
+```bash
+uv run src/find_potential_dois.py --limit 30000 --output ./data/offset100.csv
 ```
 
 This will return a CSV file (saved to the `--output` path) with the Neotoma `publicationid`, the current `citation`, the `doi` stored in Neotoma (generally empty), and then columns for the `newdoi`, obtained from a CrossRef search, and the `bibtex` citation.
@@ -32,9 +40,41 @@ The `--commit` flag allows us to test the run, to ensure that we don't accidenta
 
 Otherwise the upload will end with the statement:
 
-```
+```bash
 The --commit flag was set to False, rolling back operation.
 ```
 
 ## Inserting New Publications from DOIs
 
+To facilitate bulk uploads of records we can use a file of raw DOI strings and let CrossRef resolve each citation for us. The functions we use are in the [`publications`](./src/publications/) folder: [`return_bibtex.py`](./src/publications/return_bibtex.py), which takes the DOI and returns a formatted BibTeX citation, and [`add_citation.py`](./src/publications/add_citation.py), which takes the BibTeX citation and formats it using APA style.
+
+Given a text file of DOIs, possibly with blank lines (so that a column can be copied straight from a spreadsheet), the script processes each unique entry.
+
+```csv
+10.1017/S0033822200001089
+
+10.1017/S0033822200020452
+10.1017/S0033822200001089
+10.1017/S0033822200001089
+10.1017/S0033822200001089
+10.1080/0734578X.2017.1377510
+```
+
+Using the script:
+
+```bash
+uv run insert_new_from_doi.py --input FILEPATH.csv
+```
+
+By default the script parses each DOI and performs a dry run of the insertion. To insert the records into the database, include the `--commit` flag. If `--commit` is set, you will see the output:
+
+```text
+Committing the following citation:
+Emslie, S. D., & Mead, J. I. (2020 , August). The age and vertebrate paleontology of labor-of-love cave, white pine county, nevada. Western North American Naturalist, 80(3). URL: http://dx.doi.org/10.3398/064.080.0301, doi:10.3398/064.080.0301
+```
+
+for each record submitted. Note that capitalization is often inconsistent: most publishers on CrossRef do not capitalize their records properly, so any automated system will do a poor job of returning properly capitalized records.
+
+## Conclusion
+
+Together, the scripts support the management of publication records within Neotoma.
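
Review note on the README additions: the two inputs described above (the `DBAUTH` setting and the DOI file with blank lines and repeats) can be sketched as below. This is only an illustration under stated assumptions, not the code in the repository: `load_dbauth` and `unique_dois` are hypothetical helper names, and the sketch assumes the `.env` file has already been loaded into the environment (the script calls `dotenv.load_dotenv()` for that).

```python
import json
import os


def load_dbauth(env=None):
    """Parse the DBAUTH JSON object from the environment (hypothetical helper).

    `env` can be any mapping, which makes the function easy to test.
    """
    env = os.environ if env is None else env
    raw = env.get("DBAUTH")
    if not raw:
        raise RuntimeError("DBAUTH is not set; check the .env file.")
    return json.loads(raw)


def unique_dois(lines):
    """Return the non-empty DOI strings in first-seen order, without repeats."""
    seen = set()
    result = []
    for line in lines:
        doi = line.strip()
        if doi and doi not in seen:
            seen.add(doi)
            result.append(doi)
    return result


# The sample file from the README, including the blank line and the repeats.
sample = """10.1017/S0033822200001089

10.1017/S0033822200020452
10.1017/S0033822200001089
10.1017/S0033822200001089
10.1017/S0033822200001089
10.1080/0734578X.2017.1377510
"""

settings = load_dbauth({"DBAUTH": '{"host":"localhost","port":5555,"database":"neotoma"}'})
print(settings["host"], settings["port"])
print(unique_dois(sample.splitlines()))
```

For the README sample, the seven lines reduce to three DOIs, so each publication would be looked up only once.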

Proposals/publications/src/insert_new_from_doi.py

Lines changed: 9 additions & 10 deletions
@@ -7,10 +7,10 @@
 
 parser = argparse.ArgumentParser()
 parser.add_argument('--input', '-I', help="A valid output filename.", type= str, default = 'output.csv')
-parser.add_argument('--doi', '-d', help="Which column contains the DOI for upload?", type=str)
-parser.add_argument('--commit', '-c', help="Should we commit the data to the database?", type=bool, default= False)
+parser.add_argument('--commit', '-c', help="Should we commit the data to the database? (exclude flag to avoid committing)", type=bool, default= False)
 
 args = parser.parse_args()
+print(args)
 
 dotenv.load_dotenv()
 
@@ -29,14 +29,13 @@
     citation = add_citation(bibtex)
     if bibtex is not None and citation is not None:
         with conn.cursor() as cur:
-            cur.execute(QUERY, {'doi': i,
-                                'bibtex': bibtex,
-                                'citation': citation})
-            if args.commit:
-                print(f'Committing the following citation:\n{citation}.')
-                conn.commit()
-            cur.close()
-
+            cur.execute(QUERY, {'doi': i,
+                                'bibtex': bibtex,
+                                'citation': citation})
+            if args.commit is True:
+                print(f'Committing the following citation:\n{citation}.')
+                conn.commit()
+            cur.close()
 if args.commit is True:
     conn.commit()
 else:
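
Review note on the `--commit` flag: the argument is still declared with `type=bool`, so argparse expects a value after the flag and converts that string with `bool()`. Any non-empty string, including `"False"`, is therefore read as `True`, and a bare `--commit` (as the README suggests) is rejected for lacking an argument. A common alternative is `action='store_true'`. The sketch below illustrates the difference; `build_parser` is a hypothetical name and the help strings are reworded for the example, so treat it as a suggestion rather than the committed code.

```python
import argparse


def build_parser():
    """Parser sketch where --commit is a true on/off flag (hypothetical helper)."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--input', '-I', type=str, default='output.csv',
                        help="Path to the file of DOIs to read.")
    # store_true: the flag takes no value; absent means False, present means True.
    parser.add_argument('--commit', '-c', action='store_true',
                        help="Commit the inserts (omit the flag for a dry run).")
    return parser


# The declaration used in the commit: bool() is applied to the raw string,
# and every non-empty string is truthy.
legacy = argparse.ArgumentParser()
legacy.add_argument('--commit', '-c', type=bool, default=False)

print(legacy.parse_args(['--commit', 'False']).commit)   # True
print(build_parser().parse_args([]).commit)              # False
print(build_parser().parse_args(['--commit']).commit)    # True
```

With `store_true`, the README invocation `--input FILEPATH.csv --commit` behaves as documented, and leaving the flag off gives the dry run.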

Proposals/publications/src/publications/add_citation.py

Lines changed: 2 additions & 1 deletion
@@ -29,4 +29,5 @@ def add_citation(bibtex:str) -> str:
         else:
             return [entry.text.render_as('text') for entry in formattedBib][0]
     except Exception as e:
-
+        print(e)
+        return ''
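
Review note on the new return value: after this change `add_citation` returns an empty string when formatting fails, while the caller in `insert_new_from_doi.py` guards the insert with `citation is not None`. An empty string passes that guard, so a failed citation could still reach the `INSERT`. A minimal sketch of a stricter guard follows; `should_insert` is a hypothetical helper, not part of the repository.

```python
def should_insert(bibtex, citation):
    """True only when both values are present and non-empty (hypothetical helper)."""
    return bool(bibtex) and bool(citation)


citation = ''  # what add_citation returns after an exception in this commit

print(citation is not None)                      # True: the existing guard lets it through
print(should_insert('@article{key}', citation))  # False: a truthiness check rejects it
```

Returning `None` from the `except` branch would be another way to keep the existing guard meaningful.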
