Deprecated URL in the script kaldi/tools/extras/install_srilm.sh #4771

Open
nayanjha16 opened this issue Aug 9, 2022 · 12 comments
Labels
enhancement in progress Issue has been taken and is being worked on

Comments

@nayanjha16

nayanjha16 commented Aug 9, 2022

The URL in kaldi/tools/extras/install_srilm.sh used to download SRILM seems to be deprecated.
The URL currently in the script is: http://www.speech.sri.com/projects/srilm/srilm_download.php
Kindly update it with the new one.

@nayanjha16 nayanjha16 added the bug label Aug 9, 2022
@jtrmal
Contributor

jtrmal commented Aug 10, 2022 via email

@jtrmal
Contributor

jtrmal commented Aug 12, 2022

I contacted some people at SRI; hopefully they will resolve this. Not our bug, but I'll keep this open to keep track of things.

@kkm000
Contributor

kkm000 commented Aug 26, 2022

The URL responds now. But this whole automation idea is borked from multiple angles. First, it's in a kinda legal gray area; at the very least we should point to the license. Second, the script does not URL-encode the data it posts. wget doesn't care, and we apparently do not require curl, which is a bummer. I'll see what we can do with Python 3; it has urllib, which can do it properly.
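A minimal sketch of what URL-encoding the form data with Python 3's urllib could look like. The field names (`name`, `org`, `email`, `license`) and values are hypothetical placeholders; the fields the SRILM download page actually expects are not given in this thread. The request is only built, not sent; passing it to `urllib.request.urlopen()` would perform the POST.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical form fields; the real SRILM download form may use other names.
form = {
    "name": "Jane Doe",
    "org": "Example Lab & Co.",
    "email": "jane+srilm@example.org",
    "license": "accept",
}

# urlencode() percent-encodes reserved characters such as '&', '+', '@'
# and turns spaces into '+', which naive string concatenation would not do.
body = urlencode(form).encode("ascii")

req = Request(
    "http://www.speech.sri.com/projects/srilm/srilm_download.php",
    data=body,  # supplying a body makes this a POST request
    headers={"Content-Type": "application/x-www-form-urlencoded"},
)

print(body.decode())
# name=Jane+Doe&org=Example+Lab+%26+Co.&email=jane%2Bsrilm%40example.org&license=accept
```

Without encoding, the `&` in the organization name would be read by the server as the start of a new field.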

@kkm000
Contributor

kkm000 commented Aug 27, 2022

Nice. A legitimate download of version 1.7.3 through the website, with all form fields filled out, returns 200 OK with a zero-length file. Tarballs for 1.7.2, 1.7.1 and 1.6.0 download fine; only 1.7.3 is missing. @jtrmal, if you know who to talk to, could you please let them know?
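Because the server answers 200 OK even when the body is empty, checking the HTTP status alone would not catch this failure. A minimal sketch of a post-download sanity check, assuming the tarball has already been saved to disk; the function name `looks_like_valid_tarball` is hypothetical and not part of install_srilm.sh.

```python
import os
import sys
import tarfile


def looks_like_valid_tarball(path: str) -> bool:
    """Return False for an empty or non-tar file, e.g. a zero-length 200 OK body."""
    if os.path.getsize(path) == 0:
        return False
    # is_tarfile() autodetects compression (gzip, bz2, xz) and returns False
    # instead of raising when the file cannot be read as a tar archive.
    return tarfile.is_tarfile(path)


if __name__ == "__main__":
    # Usage: python check_tarball.py srilm.tar.gz [...]
    for p in sys.argv[1:]:
        print(p, "OK" if looks_like_valid_tarball(p) else "BAD")
```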

As I read the SRILM license, it allows redistribution "with a prominent license notice", nothing unusual. @danpovey, @jtrmal, could we put the source on openslr? Or better yet, on GitHub, because it's source code only? We can pre-apply the configuration, avoiding the sed gymnastics in install_srilm.sh. The license allows this, under the usual source availability requirement. GitHub looks even better w.r.t. visibility of the applied changes. We have OpenFST and sph2pipe on GitHub already. Tangentially, maybe moving this stuff to the kaldi-asr org is a good idea? Or, even better, creating a separate org for Kaldi dependencies? For SRILM, the only special requirement in its license is to register the URL of the redistribution point:

3.3. Licensee Registration. Before You Distribute [SRILM] under this License, You must first register by sending email [...] to srilm@speech.sri.com, including a statement confirming that you accept [...] this License and [...] identify the URL [used] to make Source Code available.

@kkm000
Contributor

kkm000 commented Aug 29, 2022

Meanwhile, I googled up this: https://github.com/weimeng23/SRILM :)

@danpovey
Contributor

If the license allows that, then yes, I think we could just put a fork on GitHub.

@kkm000
Contributor

kkm000 commented Aug 30, 2022

@danpovey, I'm thinking of setting up an org, e.g. kaldi-dependencies, with pre-patched and/or fixed dependencies like this. For SRILM in particular, we apply a patch for 1.7.1 and earlier (probably no longer relevant) and always do awk/sed gymnastics on the makefiles. We have openfst (patched for Windows) and sph2pipe (with a makefile) under my own account; this would be the third repo. We have ancient dependencies, like sph2pipe, which need to be maintained. I expect their number to grow over time; they are rarely needed but can't be dropped. Besides, check_dependencies has pointers to mirrors on openslr, and we once had an issue with the main mirror unresponsive and the backup outdated. GitHub is consistently available.

Or do you think kaldi-asr is a better place? My take: I don't want to clutter it.

@danpovey
Contributor

danpovey commented Sep 1, 2022

I think that's a great idea!
If it's on GitHub, I think it will be more future-proof.

@jtrmal
Contributor

jtrmal commented Sep 1, 2022 via email

@kkm000
Contributor

kkm000 commented Sep 1, 2022

On second thought, @jtrmal is right. In any case, we have nothing in kaldi-asr but Kaldi. There are orgs with hundreds of repos, and we're talking about no more than 5 now, probably under 10 in the future, and that's it. That's far from clutter. GitHub sorts repos with the most active first, so Kaldi will show up on top, and it can also be pinned. @danpovey, what's your take?

@kkm000
Contributor

kkm000 commented Sep 5, 2022

@danpovey?

@danpovey
Contributor

danpovey commented Sep 5, 2022

Yes, I agree, kaldi-asr is fine.

@kkm000 kkm000 self-assigned this Sep 14, 2022
@kkm000 kkm000 added help wanted Please help us with this issue! in progress Issue has been taken and is being worked on enhancement and removed bug help wanted Please help us with this issue! labels Sep 14, 2022
Projects
None yet
Development

No branches or pull requests

4 participants