This thesis represents my bachelor’s project in Computer Science at the University of Catania, developed for the TAP course taught by Professor Salvatore Nicotra. “PARL” is a platform that aims to improve the experience of speakers and listeners through real-time Speech to Text and in-depth analysis of audio files. It runs on a containerized, distributed architecture built with Docker and relies on open-source technologies such as Kafka, Spark, Fluent Bit, Elasticsearch, and Kibana to ingest, process, index, and visualize the data.
“PARL” offers advanced features, including high-accuracy real-time Speech to Text, Topic Modeling, Sentiment Analysis, and Text Summarization. The project targets a diverse audience, including professionals in the audiovisual industry, journalists, academic researchers, and businesses, and uses stream processing tools to improve their experience. These tools process the audio signal in real time and run text comprehension and analysis tasks on the resulting transcript, opening new opportunities for semantic and sentiment analysis in information and communication.
- Clone the repository
git clone https://github.com/ManciSee/PARL.git
cd PARL
- Install the requirements listed in the transcript folder
cd transcript
pip install -r requirements.txt
- Go back to the repository root, then build and run the Docker containers
cd ..
docker compose up -d
- Run the Flask server
cd transcript
python3 transcript.py
- Open the page served on port 8880 in your browser
- Before starting a recording, set the correct language in the HTML file; alternatively, upload a WAV file
- Enjoy!
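Before opening the page, it can help to confirm that the Flask server is actually listening. The following is a minimal sketch using only the Python standard library; the helper name `wait_for_port` is hypothetical, and it assumes the server is reachable from the machine running the check (use the IP address you configured for the server if it is not `127.0.0.1`).

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A completed TCP handshake means some process is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline expires.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port("127.0.0.1", 8880)` returns True as soon as transcript.py is serving on port 8880, and False if nothing answers within 30 seconds.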
If Docker Compose does not work, start the containers manually in this order:
- ZooKeeper
- Kafka
- Fluent Bit
- Elasticsearch
- Spark
- Kibana
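The manual startup order above can be scripted. The sketch below is only an illustration: the service names in `START_ORDER` are hypothetical and must be replaced with the names actually defined in the repository's docker-compose.yml, and it uses the Docker Compose v2 syntax `docker compose up -d <service>`. It defaults to a dry run that only prints the commands.

```python
import subprocess

# Hypothetical service names: replace them with the ones in docker-compose.yml.
START_ORDER = ["zookeeper", "kafka", "fluent-bit", "elasticsearch", "spark", "kibana"]


def compose_up_commands(services: list[str]) -> list[list[str]]:
    """Build one `docker compose up -d <service>` command per service, keeping the order."""
    return [["docker", "compose", "up", "-d", service] for service in services]


def start_in_order(services: list[str], dry_run: bool = True) -> list[list[str]]:
    """Start each service in turn; with dry_run=True only print the commands."""
    commands = compose_up_commands(services)
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if a service fails to start,
            # so later services are never launched without their dependencies.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    start_in_order(START_ORDER, dry_run=True)
```

Keep in mind that `docker compose up` also starts any services listed under a service's `depends_on`, and that a started container is not necessarily ready: if Kafka fails because ZooKeeper is still booting, wait a few seconds between steps.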
If your browser does not support the Web Speech API, switch to a different browser. At the moment, only Google Chrome (not Brave or other browsers) seems to work reliably.
After downloading the repository, and before starting anything, you need to update the server's ports and IP addresses in the various files. In some files, alternative code sections for localhost are included but commented out.
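Instead of editing the addresses by hand in each file, one option is to read them from environment variables in a single place. This is a hypothetical sketch, not how PARL is currently configured: the variable names are invented for illustration, and the defaults assume the standard ports of Kafka (9092) and Elasticsearch (9200) plus the Flask port 8880 mentioned above.

```python
import os

# Hypothetical variable names and defaults; adapt them to your deployment.
DEFAULTS = {
    "PARL_SERVER_HOST": "localhost",
    "PARL_SERVER_PORT": "8880",
    "PARL_KAFKA_BOOTSTRAP": "localhost:9092",
    "PARL_ELASTICSEARCH_URL": "http://localhost:9200",
}


def load_config(env=None) -> dict:
    """Return the defaults, overridden by any matching variable found in `env`."""
    env = os.environ if env is None else env
    return {key: env.get(key, default) for key, default in DEFAULTS.items()}


if __name__ == "__main__":
    config = load_config()
    host, port = config["PARL_SERVER_HOST"], int(config["PARL_SERVER_PORT"])
    print(f"Flask server address: http://{host}:{port}")
```

With this approach, moving from localhost to a remote server means exporting, for example, `PARL_SERVER_HOST=192.168.1.10` rather than uncommenting code in several files.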
The first video demonstrates microphone input for accurate voice transcription, while the second shows a WAV file being uploaded for quick transcription. Both videos have been sped up for easier viewing. Choose the Whisper model that fits your needs: larger models are more accurate but slower, smaller ones are faster but less accurate.
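Choosing a model usually comes down to a single name passed when loading Whisper. A minimal sketch, assuming the `openai-whisper` Python package (its `whisper.load_model` and `model.transcribe` functions); the wrapper `transcribe_wav` is a hypothetical helper, not the code in transcript.py.

```python
from typing import Optional

# A subset of the model names accepted by whisper.load_model,
# roughly ordered from fastest/least accurate to slowest/most accurate.
WHISPER_MODELS = ("tiny", "base", "small", "medium", "large")


def transcribe_wav(path: str, model_name: str = "base", language: Optional[str] = None) -> str:
    """Transcribe an audio file with the chosen Whisper model and return the text."""
    if model_name not in WHISPER_MODELS:
        raise ValueError(f"unknown Whisper model {model_name!r}; choose one of {WHISPER_MODELS}")
    # Imported lazily so this module loads even where openai-whisper is not installed.
    import whisper

    model = whisper.load_model(model_name)
    # language=None lets Whisper detect the language; pass e.g. "it" or "en" to force it.
    result = model.transcribe(path, language=language)
    return result["text"]
```

For example, `transcribe_wav("lecture.wav", "small", language="it")` trades some speed for better accuracy than the default `"base"` model; the file name is only illustrative.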