download_metadata fails to download #404

Closed
louis-h-p opened this issue Feb 28, 2022 · 4 comments · Fixed by #405
@louis-h-p

I'm trying to install geonames. I've done this dozens of times in the past, but I can't get it to download any data now. I've tried using the geonames docker image, an AWS instance and a local vagrant image.

During the postinstall steps (npm run download_metadata) nothing is downloaded; it throws the error below immediately. Note that I can download the entire AU.zip (or others) from geonames without issue.

Another question: is it possible to use geonames to import from a local file, or do I have to rely on download_metadata etc.?

vagrant@ubuntu-focal:~/geonames$ npm run download_metadata

> pelias-geonames@0.0.0-development download_metadata /home/vagrant/geonames
> mkdirp metadata && node bin/updateMetadata.js

internal/streams/legacy.js:61
      throw er; // Unhandled stream error in pipe.
      ^

CsvError: Invalid Record Length: columns length is 19, got 1 on line 1
    at Parser.__onRecord (/home/vagrant/geonames/node_modules/csv-parse/lib/index.js:792:9)
    at Parser.__parse (/home/vagrant/geonames/node_modules/csv-parse/lib/index.js:668:38)
    at Parser._transform (/home/vagrant/geonames/node_modules/csv-parse/lib/index.js:474:22)
    at Parser.Transform._read (_stream_transform.js:191:10)
    at Parser.Transform._write (_stream_transform.js:179:12)
    at doWrite (_stream_writable.js:403:12)
    at writeOrBuffer (_stream_writable.js:387:5)
    at Parser.Writable.write (_stream_writable.js:318:11)
    at Request.ondata (internal/streams/legacy.js:19:31)
    at Request.emit (events.js:314:20) {
  code: 'CSV_RECORD_DONT_MATCH_COLUMNS_LENGTH',
  bytes: 36,
  comment_lines: 0,
  empty_lines: 0,
  invalid_field_length: 0,
  lines: 1,
  records: 0,
  columns: [
    { name: 'ISO' },
    { name: 'ISO3' },
    { name: 'ISO_Numeric' },
    { name: 'fips' },
    { name: 'Country' },
    { name: 'Capital' },
    { name: 'Area' },
    { name: 'Population' },
    { name: 'Continent' },
    { name: 'tld' },
    { name: 'CurrencyCode' },
    { name: 'CurrencyName' },
    { name: 'Phone' },
    { name: 'Postal_Code_Format' },
    { name: 'Postal_Code_Regex' },
    { name: 'Languages' },
    { name: 'geonameid' },
    { name: 'neighbours' },
    { name: 'EquivalentFipsCode' }
  ],
  error: undefined,
  header: false,
  index: 1,
  column: 'ISO3',
  quoting: false,
  record: [ '# ================================' ]
}
npm ERR! code ELIFECYCLE
npm ERR! errno 1
npm ERR! pelias-geonames@0.0.0-development download_metadata: `mkdirp metadata && node bin/updateMetadata.js`
npm ERR! Exit status 1
npm ERR!
npm ERR! Failed at the pelias-geonames@0.0.0-development download_metadata script.
npm ERR! This is probably not a problem with npm. There is likely additional logging output above.

npm ERR! A complete log of this run can be found in:
npm ERR!     /home/vagrant/.npm/_logs/2022-02-28T23_38_26_524Z-debug.log

@missinglink
Member

Hi @louis-h-p, I'm not 100% sure what's going on here, but it seems to be due to a change in the file format, specifically in how the '#' is being used for comments.

Using a totally unrelated CSV tool I'm able to reproduce this error, which makes me more confident the issue isn't in our codebase:

curl -s http://download.geonames.org/export/dump/countryInfo.txt | sed '/^#/d' | xsv cat rows
AD	AND	020	AN	Andorra	Andorra la Vella	468	77006	EU	.ad	EUR	Euro	376	AD###	^(?:AD)*(\d{3})$	ca	3041565	ES,FR
CSV error: record 1 (line: 1, byte: 110): found record with 6 fields, but the previous record has 2 fields

That said, we can hopefully work around it. I will open a PR which implements my own handling of CSV comments, which seems to work fine. I'm still not completely clear on why mine works and these other ones don't 🤔
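
For anyone following along, the idea of handling the comments before the parser sees them can be sketched roughly as below. This is not the code from the PR, only a minimal illustration under the assumption that every comment line starts with '#' in column one. The names `createLineFilter` and `stripCommentLines` are hypothetical and do not come from the pelias codebase.

```javascript
const { Transform } = require('stream');
const { StringDecoder } = require('string_decoder');

// Hypothetical helper: buffers text into lines and keeps only the lines
// that do not start with '#'. A line split across two chunks is held
// back until its newline arrives.
function createLineFilter() {
  let pending = ''; // partial line carried over from the previous chunk
  const keep = (line) => !line.startsWith('#');
  return {
    write(text) {
      const lines = (pending + text).split('\n');
      pending = lines.pop(); // last element is an unfinished line (or '')
      return lines.filter(keep).map((line) => line + '\n').join('');
    },
    end() {
      const rest = pending;
      pending = '';
      return rest !== '' && keep(rest) ? rest + '\n' : '';
    }
  };
}

// Hypothetical wrapper: exposes the filter as a Transform stream so it can
// sit in a pipe between a download stream and a CSV parser.
function stripCommentLines() {
  const filter = createLineFilter();
  const decoder = new StringDecoder('utf8'); // avoids splitting multi-byte characters
  return new Transform({
    transform(chunk, _encoding, callback) {
      const out = filter.write(decoder.write(chunk));
      if (out) this.push(out);
      callback();
    },
    flush(callback) {
      const out = filter.write(decoder.end()) + filter.end();
      if (out) this.push(out);
      callback();
    }
  });
}
```

In a pipeline this would sit directly in front of the parser, so the parser only ever receives data rows and never has to interpret the '#' header itself.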

@orangejulius
Member

The Geonames servers are pretty notorious for changing file formats or hosting broken files for quite some time. Usually they change it back after a while.

But I checked and found the same thing as @missinglink. It looks like the countryInfo.txt file has a bunch of comments at the start. Pruning those out might help prevent issues like this.
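
As a rough illustration of what pruning could look like on an already downloaded copy of the file, here is a small sketch. The column names are copied from the error output above. The function name `parseCountryInfo` is hypothetical, and padding short rows with empty strings is an assumption made for this example, not documented geonames or pelias behaviour.

```javascript
// Column names taken from the error output earlier in this thread.
const COLUMNS = [
  'ISO', 'ISO3', 'ISO_Numeric', 'fips', 'Country', 'Capital', 'Area',
  'Population', 'Continent', 'tld', 'CurrencyCode', 'CurrencyName', 'Phone',
  'Postal_Code_Format', 'Postal_Code_Regex', 'Languages', 'geonameid',
  'neighbours', 'EquivalentFipsCode'
];

// Hypothetical helper: drops blank lines and lines starting with '#',
// then maps each tab-separated row onto the column names.
function parseCountryInfo(text) {
  return text
    .split(/\r?\n/)
    .filter((line) => line !== '' && !line.startsWith('#'))
    .map((line) => {
      const fields = line.split('\t');
      if (fields.length > COLUMNS.length) {
        throw new Error(`expected at most ${COLUMNS.length} fields, got ${fields.length}`);
      }
      const record = {};
      COLUMNS.forEach((name, i) => {
        // Assumption: missing trailing fields are treated as empty strings.
        record[name] = fields[i] !== undefined ? fields[i] : '';
      });
      return record;
    });
}
```

Reading the file with `fs.readFileSync(path, 'utf8')` and passing the result to this function would return one object per country row, with the comment header ignored.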

@missinglink
Member

missinglink commented Mar 1, 2022

Thanks for the bug report @louis-h-p. This issue seems to be due to the geonames files changing to include a CSV comment header prefixed with # characters.

Since it's a non-standard format, things broke, but we're handling it in our codebase now, so please try again.

@louis-h-p
Author

Thanks @missinglink & @orangejulius. That works now.
