Commit 5c0d6fc

Author: Philip Mateescu
script to retrieve the latest xmls
1 parent 1fbb9a5 commit 5c0d6fc

1 file changed, 19 additions and 0 deletions
get_latest_dumps.sh

@@ -0,0 +1,19 @@
+#!/bin/bash
+# Download the four newest Discogs XML dumps; with --test, only check that they exist.
+USER_AGENT="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_2) AppleWebKit/534.51.22 (KHTML, like Gecko) Version/5.1.1 Safari/534.51.22"
+ACCEPT="Accept-Encoding: gzip, deflate"
+D_URL="http://www.discogs.com/data/"
+D_TMP=/tmp/discogs.urls
+D_PATTERN='discogs_\d+_(artists|labels|masters|releases)\.xml\.gz'
+
+TEST=()
+[[ "$1" == '--test' ]] && TEST=(--spider -S)
+
+: > "$D_TMP"  # start from an empty URL list
+
+# Scrape the dump names from the index page and keep the last four after sorting.
+for f in $(wget --user-agent="$USER_AGENT" --header="$ACCEPT" -qO- "$D_URL" | ack -io "$D_PATTERN" | sort -u | tail -n 4); do
+    echo "$D_URL$f" >> "$D_TMP"
+done
+
+wget -c --user-agent="$USER_AGENT" --header="$ACCEPT" --no-clobber --input-file="$D_TMP" "${TEST[@]}" --append-output="$D_TMP.log"
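
The selection step in the loop (match the dump names, de-duplicate, keep the last four) can be tried offline. The sketch below is not part of the commit: `latest_dumps` and the sample listing are hypothetical, and `grep -ioE` with `[0-9]+` stands in for `ack -io` with `\d+` so that it runs without ack or network access.

```shell
#!/bin/bash
# Hypothetical helper (not in the commit): read an HTML listing on stdin
# and print the last four distinct dump file names in sorted order.
latest_dumps() {
    # grep -ioE is a stand-in for `ack -io`; [0-9]+ replaces Perl's \d+.
    grep -ioE 'discogs_[0-9]+_(artists|labels|masters|releases)\.xml\.gz' \
        | sort -u \
        | tail -n 4
}

# Made-up listing with two dump dates. Each name appears twice per line
# (href and link text), which is why the de-duplication step is needed.
sample='<a href="discogs_20111101_artists.xml.gz">discogs_20111101_artists.xml.gz</a>
<a href="discogs_20111101_labels.xml.gz">discogs_20111101_labels.xml.gz</a>
<a href="discogs_20111101_masters.xml.gz">discogs_20111101_masters.xml.gz</a>
<a href="discogs_20111101_releases.xml.gz">discogs_20111101_releases.xml.gz</a>
<a href="discogs_20111201_artists.xml.gz">discogs_20111201_artists.xml.gz</a>
<a href="discogs_20111201_labels.xml.gz">discogs_20111201_labels.xml.gz</a>
<a href="discogs_20111201_masters.xml.gz">discogs_20111201_masters.xml.gz</a>
<a href="discogs_20111201_releases.xml.gz">discogs_20111201_releases.xml.gz</a>'

printf '%s\n' "$sample" | latest_dumps
```

For this sample the function prints the four `20111201` names. Assuming the digits are a date stamp of fixed width, a plain sort puts the newest names last, but `tail -n 4` yields a complete set only when all four files are listed for the newest date; otherwise names from an older dump fill the gap.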
