Commit 0826107: created sections

Committed by Philip Mateescu; 1 parent: 5c0d6fc

1 file changed: 17 additions, 6 deletions


README.md

```diff
@@ -1,20 +1,22 @@
-From original source: http://code.google.com/p/discogs-sql-importer/
 
 ----
-
-This is a python program for importing the discogs data dumps found at http://www.discogs.com/data/ into a PostgreSQL database.
+# What is it?
+This is a python program for importing the discogs data dumps found at http://www.discogs.com/data/ into PostgreSQL, CouchDB, or MongoDB database.
 
 MySQL or other databases are not supported at the moment, but you are welcome to submit a patch.
 
+
+# How do I use it?
 Steps to import the datadumps (into PostgreSQL):
 
-1. Download and extract the data dumps
+1. Download and extract the data dumps (you can use `get_latest_dump.sh` to get the latest dumps).
 2. Create the empty database: `createdb -U {user-name} discogs`
 3. Import the database schema: `psql -U {user-name} -d discogs -f discogs.sql`
 4. The XML data dumps often contain control characters and do not have root tags. To fix this run `fix-xml.py _release_`, where release is the release date of the dump, for example `20100201`.
 5. Finally import the data with `python discogsparser.py -o pgsql -p "dbname=discogs" pgsql _release_`, where release is the release date of the dump, for example `20100201`
 
```
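Step 4 names two problems with the raw dumps. `fix-xml.py` itself is not part of this commit, so the following is only a minimal sketch of those two fixes, assuming a dump small enough to read into memory and without an XML declaration: it deletes the control characters XML 1.0 does not allow and wraps the top-level elements in a single root element. `fix_xml_fragment` and the root name `artists` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# XML 1.0 forbids C0 control characters other than tab, LF and CR.
# Sketch assumption: these are the "control characters" that need stripping.
_INVALID_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def fix_xml_fragment(raw: str, root_tag: str) -> str:
    """Remove XML-illegal control characters and wrap everything in one root element.

    Assumes `raw` has no XML declaration (a declaration inside the root would be
    invalid) and fits in memory. `root_tag` is a hypothetical parameter.
    """
    cleaned = _INVALID_XML_CHARS.sub("", raw)
    return f"<{root_tag}>\n{cleaned}\n</{root_tag}>"


if __name__ == "__main__":
    # Two top-level <artist> elements with no shared root, plus a stray BEL (\x07).
    fragment = "<artist><name>A\x07B</name></artist>\n<artist><name>C</name></artist>"
    root = ET.fromstring(fix_xml_fragment(fragment, "artists"))
    print([artist.findtext("name") for artist in root])  # ['AB', 'C']
```

For full-size dumps, a streaming variant that reads and writes line by line would avoid holding the whole file in memory; the character filter stays the same.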

```diff
-Options for `discogsparser.py`:
+
+# Options for `discogsparser.py`
 
 * **Input**: `-d`/`--date` parses all three files (artists, labels, releases) for a given monthly dump:
 * `discogsparser.py -d 20111101` will look for `discogs_20111101_artists.xml`, `discogs_20111101_labels.xml`, and `discogs_20111101_releases.xml` in the current directory;
```
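The `-d` rule is a fixed naming pattern, which makes it easy to check that all three files are present before starting a long import. A small sketch of that pattern; `dump_paths` is a hypothetical helper (not a function from `discogsparser.py`), and the eight-digit `YYYYMMDD` check is an assumption based on the example dates.

```python
from pathlib import Path

# Order follows the README: artists, labels, releases.
DUMP_KINDS = ("artists", "labels", "releases")


def dump_paths(date: str, directory: Path = Path(".")) -> list[Path]:
    """Return the three dump file paths that `-d DATE` is described as looking for."""
    # Assumption: dates look like the examples (20100201, 20111101).
    if len(date) != 8 or not date.isdigit():
        raise ValueError(f"expected a YYYYMMDD date, got {date!r}")
    return [directory / f"discogs_{date}_{kind}.xml" for kind in DUMP_KINDS]


if __name__ == "__main__":
    for path in dump_paths("20111101"):
        # Report which expected files exist in the current directory.
        print(path, "found" if path.exists() else "missing")
```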
```diff
@@ -34,10 +36,19 @@ Options for `discogsparser.py`:
 * `-o mongo -p "file:///path/to/dir/"`: outputs each of the Artists, Labels, Releases into a separate JSON file into the specified directory, `/path/to/dir/` in this case, one line for each. Pass `--ignoreblanks` to `mongoimport` in case extra new-lines are added; you probably also want `--upsert --upseftFields id`.
 
 
```
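The `mongo` file output is described as one JSON document per line, in one file per entity type. A minimal sketch of that layout using only the standard library; the helper names and the sample fields are hypothetical, and the reader skips blank lines because the option text warns that extra newlines may appear. When loading such files, note that `mongoimport` spells the field-list option `--upsertFields`.

```python
import io
import json
from typing import Iterable, Iterator, TextIO


def write_json_lines(docs: Iterable[dict], out: TextIO) -> int:
    """Write each document as one compact JSON object per line; return the count."""
    count = 0
    for doc in docs:
        # ensure_ascii=False keeps non-ASCII names readable in the output file.
        out.write(json.dumps(doc, ensure_ascii=False, separators=(",", ":")))
        out.write("\n")
        count += 1
    return count


def read_json_lines(src: TextIO) -> Iterator[dict]:
    """Yield the documents back, skipping blank lines."""
    for line in src:
        if line.strip():
            yield json.loads(line)


if __name__ == "__main__":
    # Hypothetical sample documents; real field names depend on the parser.
    artists = [{"id": 1, "name": "Björk"}, {"id": 2, "name": "Aphex Twin"}]
    buf = io.StringIO()
    write_json_lines(artists, buf)
    buf.write("\n")  # simulate a stray extra newline
    buf.seek(0)
    print([a["name"] for a in read_json_lines(buf)])  # ['Björk', 'Aphex Twin']
```

To write a real file, pass `open(path, "w", encoding="utf-8")` as `out`.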

```diff
-Examples:
+# Examples:
 
 discogsparser.py -n 200 -o couch --params http://127.0.0.1:5984/discogs -d 20111101
 discogsparser.py -o mongo -p mongodb://localhost,remote1/discogs discogs_20111101_artists.xml discogs_20111101_releases.xml
 discogsparser.py -o pgsql -p "host=remote1 dbname=discogs user=postgres password=s3cret" discogs_20111101_artists.xml
 
 
```
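Each example passes a differently shaped `-p`/`--params` value: a CouchDB server URL with the database as its path, a MongoDB URI listing one or more hosts (or a `file://` directory, per the option list), and a libpq-style `key=value` string for PostgreSQL. The sketch below only shows how those strings break apart. `split_params` is a hypothetical helper, not part of `discogsparser.py`, and it approximates libpq quoting with `shlex` rather than implementing libpq's exact rules.

```python
import shlex
from urllib.parse import urlparse


def split_params(backend: str, params: str) -> dict:
    """Break a -p value into its parts for the given -o backend (illustrative only)."""
    if backend == "pgsql":
        # libpq keyword/value form: "host=remote1 dbname=discogs ...".
        # shlex approximates libpq's quoting; it is not an exact match.
        pairs = (item.split("=", 1) for item in shlex.split(params))
        return {key: value for key, value in pairs}
    url = urlparse(params)
    if backend == "mongo" and url.scheme == "file":
        # File output: one JSON-lines file per entity type in this directory.
        return {"directory": url.path}
    if backend in ("couch", "mongo"):
        return {
            "scheme": url.scheme,
            # MongoDB URIs may list several hosts separated by commas.
            "hosts": url.netloc.split(","),
            "database": url.path.lstrip("/"),
        }
    raise ValueError(f"unknown backend {backend!r}")


if __name__ == "__main__":
    print(split_params("pgsql", "host=remote1 dbname=discogs user=postgres password=s3cret"))
    print(split_params("couch", "http://127.0.0.1:5984/discogs"))
    print(split_params("mongo", "mongodb://localhost,remote1/discogs"))
```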

```diff
+# Credits
+
+Original project: [discogs-sql-importer](http://code.google.com/p/discogs-sql-importer/)
+
+# Some sort of changelog
+
+* v0.60 - support for CouchDB and MongoDB
+* v0.50 - command line parameters controlling various import options
+* v0.15 - Original import of discogs-sql-importer
```
