SberFraudDetection

An algorithm for recognizing fraudulent calls, written as part of a joint hackathon between Sberbank and MIPT.

Algorithm

Receive a speech file via JSON and transcribe it with the Whisper "small" speech-recognition model.
Auto-correct the text produced by Whisper using the autocorrection library.
Normalise the text.
Apply CountVectorizer.
Partially fit MultinomialNB on the train-samples data.
Run the server and receive audio. We use a Postgres database to collect feedback and check our predictions. If a predicted value does not match the target value, we run partial fit again.
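
The text-classification part of the steps above could look roughly like the sketch below. It is not the code from app.py or solution.py: the Whisper and auto-correction steps are left out, the transcripts are made-up placeholders, and the names `normalise`, `predict`, `feedback` and the 0/1 label encoding are assumptions made for illustration. The vocabulary is fitted once on the initial data, so later partial-fit updates reuse the same feature columns and ignore words that were not seen at that point.

```python
import re

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

CLASSES = [0, 1]  # assumed encoding: 0 = legitimate call, 1 = fraudulent call


def normalise(text: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


# Made-up transcripts standing in for auto-corrected Whisper output.
train_texts = [
    "Hello, this is the bank security service, tell me the code from the SMS",
    "Your card is blocked, transfer the money to a safe account",
    "Hi, are we still meeting for lunch tomorrow?",
    "Your parcel has arrived at the pickup point",
]
train_labels = [1, 1, 0, 0]

# Fit the vocabulary once; every later update goes through transform() only.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform([normalise(t) for t in train_texts])

# The first partial_fit call must be told the full set of classes.
model = MultinomialNB()
model.partial_fit(X_train, train_labels, classes=CLASSES)


def predict(text: str) -> int:
    """Return the predicted label for one transcript."""
    features = vectorizer.transform([normalise(text)])
    return int(model.predict(features)[0])


def feedback(text: str, target: int) -> bool:
    """Update the model when its prediction disagrees with the confirmed label.

    Returns True if a partial-fit update was applied.
    """
    if predict(text) == target:
        return False
    features = vectorizer.transform([normalise(text)])
    model.partial_fit(features, [target])
    return True
```

In the setup described above, `target` would come from the feedback stored in Postgres; here it is simply passed in as an argument.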

During the hackathon received:

  python3 app.py
