This repository was archived by the owner on Oct 30, 2018. It is now read-only.

chechu commented Feb 29, 2012

First, thank you for your work!

I have added automatic detection of the web page's character encoding using the juniversalchardet library. Feel free to merge it into the master branch or discard it :-)
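
For readers unfamiliar with juniversalchardet, here is a minimal sketch of charset detection with a fallback. It is not the code in this pull request: the object name `EncodingSketch`, the helpers `detectCharset` and `decode`, the chunk size, and the UTF-8 fallback are all assumptions made for illustration. It needs the juniversalchardet jar on the classpath.

```scala
import java.nio.charset.{Charset, StandardCharsets}
import scala.util.Try
import org.mozilla.universalchardet.UniversalDetector

// Illustrative names only; none of these come from the pull request.
object EncodingSketch {

  /** Guess the charset of raw page bytes; return `fallback` when nothing is detected. */
  def detectCharset(bytes: Array[Byte], fallback: String = "UTF-8"): String = {
    val detector = new UniversalDetector(null) // null: no CharsetListener callback
    val chunkSize = 4096                       // assumed chunk size
    var offset = 0
    // Feed the bytes in chunks until the detector is confident or the input ends.
    while (offset < bytes.length && !detector.isDone) {
      val len = math.min(chunkSize, bytes.length - offset)
      detector.handleData(bytes, offset, len)
      offset += len
    }
    detector.dataEnd() // signal end of input so the detector can make a final guess
    // getDetectedCharset returns null when no charset was detected.
    Option(detector.getDetectedCharset).getOrElse(fallback)
  }

  /** Decode the bytes, falling back to UTF-8 if the JVM does not support the detected name. */
  def decode(bytes: Array[Byte]): String = {
    val charset = Try(Charset.forName(detectCharset(bytes))).getOrElse(StandardCharsets.UTF_8)
    new String(bytes, charset)
  }
}
```

The fallback matters because detection can fail on short or pure-ASCII input, and because the detector may report a charset name that the running JVM cannot decode.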

I have also added some code to StopWords.scala that makes it easy to integrate a language-detection library (such as jlangdetect or LingPipe). For now I am using my own private language identifier, but it would be easy to plug in another library. Maybe in the future :-)
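
One way such an integration point could look is sketched below. This is a guess at the general shape, not the actual change to StopWords.scala: the `LanguageDetector` trait, the `FixedLanguage` placeholder, the tiny stop-word lists, and `countStopWords` are all hypothetical names invented for this example.

```scala
// Illustrative names only; none of these come from the pull request.
object StopWordsSketch {

  /** Hypothetical hook: any library or custom identifier can sit behind this trait. */
  trait LanguageDetector {
    def detect(text: String): Option[String] // language code, or None when unsure
  }

  /** Trivial placeholder that always answers with one fixed language. */
  final class FixedLanguage(code: String) extends LanguageDetector {
    def detect(text: String): Option[String] = Some(code)
  }

  // Tiny illustrative lists; a real setup would load a full list per language.
  val stopWords: Map[String, Set[String]] = Map(
    "en" -> Set("the", "and", "of", "to", "is"),
    "es" -> Set("el", "la", "y", "de", "que")
  )

  /** Detect the language first, then count stop words from that language's list. */
  def countStopWords(fragment: String,
                     detector: LanguageDetector,
                     default: String = "en"): Int = {
    val lang  = detector.detect(fragment).getOrElse(default)
    val words = stopWords.getOrElse(lang, Set.empty[String])
    // Split on runs of non-letters so accented characters stay inside words.
    fragment.toLowerCase.split("[^\\p{L}]+").count(words.contains)
  }
}
```

With the placeholder detector, `StopWordsSketch.countStopWords("El perro y la casa", new StopWordsSketch.FixedLanguage("es"))` returns 3, while the same fragment counted against the English list returns 0. Swapping in a real detector only requires another implementation of the trait.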

Thank you again, and good luck

Jesus Lanchas added 2 commits February 29, 2012 14:18
…library.

Moreover, the system is now prepared to run language detection before counting the stop words in each fragment.