Replies: 6 comments 1 reply
-
This was really fun to play around with! I had a paper last year at ACSAC on acoustic keystroke transcription: Our approach was definitely trained (basically adapt a speech-to-text deep learning model to transcribe keystrokes); we experimented with unsupervised approaches and had a really hard time getting them to converge to anything meaningful, especially when room noise, different microphone placements, and errors/backspaces/special characters are involved. Cool stuff! |
Beta Was this translation helpful? Give feedback.
-
One additional example of Keytap2 in action: I found this CTF challenge from 2012 and decided to solve it using Keytap2. Click on the image below to watch a video of the process: |
Beta Was this translation helpful? Give feedback.
-
Disclaimer: I am by no means versed in AI or cipher braking, but I had an idea. You make predictions of characters based on n-gram statistics. In a way, n-gram stats are a primitive language model. Maybe it would be possible to combine your approach (or replace n-grams by) a state-of-the-art language model like GPT-2. Since GPT-2 is able to create texts that are really difficult distinguish from a human-generated text, it might give you faster convergence on the true input text. To be fair, I wouldn't really know how to combine GPT-2 with your approach, just thought I'd share the thought. |
Beta Was this translation helpful? Give feedback.
-
Hi, @ggerganov ! About that:
I trying rusificate your programm now, but i have some troubles with it. if I understand correctly, to translate both programs (keytap and keytap2), you need to make changes to the files:
but, when i change that files, i had a bugs, e.g.: changes in the key_logger.cpp they did not give any results, they still return the same English letters from there when i change letters in constants.h in ( line #49 ) :
it turns out this error: i had this or this
considering that in Bulgarian (you're from Bulgaria, right?) and in the Russian language, approximately the same alphabet (Cyrillic) is used, perhaps you will be able to give good advice on this |
Beta Was this translation helpful? Give feedback.
-
Sir @ggerganov, |
Beta Was this translation helpful? Give feedback.
-
对于不同的环境中,音频所解析出来的东西是不一样的,不同音频代表这不一样的编码介意的功能不完全相同,如果你是想要解析以下mp4,这可能会耗费些功夫, |
Beta Was this translation helpful? Give feedback.
-
Introduction
Keytap is my hobby project for acoustic keyboard eavesdropping. In short, it works like this:
Here is a short 1 min video of my original approach where I train Keytap on my keyboard and then use it to sniff keys from another application:
Vid. Eavesdropping on a Google Chrome browser using Keytap
One of the main drawbacks in the outlined process is that one first needs to collect training data for the specific keyboard that they would be eavesdropping on (step 1). Moreover, in step 2, one has to use the exact same keyboard+microphone setup that they used in step 1 in order to get somewhat reliable results.
The new Keytap2 algorithm is an attempt to solve these problems. In this approach we no longer need to collect training data. The algorithm works by clustering the detected keystrokes based on their sound similarity and then using statistical information about the frequency of the letter n-grams in the supposed language of the text (for example, English). This implies that we need to have prior knowledge about the language of the text, but this is now a lot weaker requirement compared to step 1 above. Additionally, we need enough recorded samples in order to perform successful statistical analysis.
In this post, I will give some details about the implementation. Before starting, here is a 5 min video demonstrating Keytap2 in action where I use it to recover an unknown English text just from the recorded audio of me tapping on the keyboard (note that Keytap2 runs in the browser and I am typing in another application):
Vid. Short 5 min. demo of Keytap2
As we all know how new things are usually presented for the first time - these are some "typical" results... meaning - I have carefully selected an example where my algorithm produces good results 😄 So, keep in mind - this is just a proof-of-concept, so if you decide to try it - lower your expectations.
An online version of the algorithm is available here: keytap2.ggerganov.com (about 8 MB WASM module + assets)
Alternatively, you can build this repository locally and run the keytap2-gui binary.
Fundamentals
Most of the algorithm fundamentals are the same as in Keytap, so make sure to check my previous blog post for more details. In short:
We use an adaptive threshold technique on the recorded sound waveform in order to automatically detect the location of the keystrokes.
Fig. The detected keystrokes in the recording are marked with red rectangles. The tool also allows manually specifying the location of the keys.
Source code
Next, we compute the "Key Similarity Map" Sij - the "sound similarity" between the detected keystrokes i and j. The higher the value of S, the more similar the 2 keystrokes sound and Sii=1. In Keytap2 I continue to use the Cross Correlation between the waveforms of the keystrokes to calculate Sij
Source code
Fig. The "Key Similarity Map" Sij presented visually: brighter cells indicate higher value.
Substitution cipher
Yes, you read that correctly! In Keytap2 we transform the problem of recovering the unknown text into the problem of breaking a substitution cipher.
To explain this better, lets for a moment imagine that our Key Similarity Map Sij was somehow perfect. This means we are somehow able to know with 100% certainty whether 2 detected keystrokes are produced by the same keyboard key, just from the sound they emitted. In other words, for every keystroke pair ij for which the pressed keys were the same, we get Sij=1. Otherwise, we get Sij=0.
If that were the case, we could then group the keystrokes into clusters based on Sij and obtain an "encryption" of the original text that we are trying to recover. This "encryption" is exactly a substitution cipher.
Breaking a substitution cipher is relatively easy, given that we have enough encrypted text. One popular technique is to use a genetic algorithm which tries to maximize the probability of encountering the deciphered n-grams, given their frequency distribution in the target language. In Keytap2 I used a variation of the following implementation. I found that if the encrypted text has at least 100 letters, it manages to break it quite easily.
Source Code
English 3-grams
Non-exact Substitution Cipher
In reality, our Sij is nowhere near perfect. This is due to many factors - the same key can sound different depending on the way it was pressed, recording noise, background noise, etc. Still, we can at least hope that Sij will be correlated to the probability that the two keystrokes i and j were produced by the same keyboard key.
In this setup, we can still try to group the keystrokes into clusters and convert the problem into breaking a substitution cipher. However, to increase chances of success, the algorithm needs to explore many different clusterings of the keystrokes and find balance between how well the clusters fit the Key Similarity Map and how well the deciphered text fits the language n-gram statistics. Keytap2 tries to solve this problem by generating a large number of clusterings using a Metropolis-Hastings-like approach. For each clustering that it generates, it applies the original substitution-cipher-breaking algorithm. In the process, it keeps track of the best results that match well enough the English n-gram frequency stats.
The clustering generation works like this:
Source Code
Overall, this is the most crucial part of the described approach. I don't think my solution is the best out there and I believe there could be some interesting alternative approaches to attack this problem. Definitely let me know if you have ideas how to solve this challenge.
Performance
The performance of Keytap2 is questionable. I kind of made it work with my own setup, but I have not tried it in any other setup, so there is a high probability that the algorithm is "over fitted" to my configuration.
There are few important limitations when using Keytap2:
I couldn't achieve full automation of the attack. Therefore, I introduced an option to provide "hints" to the algorithm. This is demonstrated best in the video at the beginning of this post. You can provide such hints while the optimization is running and the algorithm will automatically readjust. Sometimes, it is very useful to reset the optimization in order to avoid getting stuck in local minima.
Fig. Provide "hints" to the algorithm by selecting decoded parts that you think make sense.
You need a mechanical keyboard. Not sure if all mechanical keyboards are susceptible to this type of attack, but mine is since it makes loud clicking noises.
The browser version of Keytap2 is quite CPU heavy so if your machine is not powerful enough, make sure to try the native program.
Currently, the recorded text must be in English, since this is the n-gram frequency data that I have. In theory, one can use n-gram frequencies for other languages.
Make sure to record at least 100 characters of meaningful text. It is very important that the text makes sense, otherwise it will not follow the average statistical distribution of the n-grams and the approach will completely fail.
Do not record more than 300 characters or the computation will become extremely slow.
Try not to make mistakes when typing :) This can confuse the algorithm. Also, use only letters and "Space" - there is no statistical information about the other keys.
Feedback
It was very fun working on this problem. Although I couldn't achieve the performance that I was initially hoping for, I still think there are some interesting ideas here and things to explore. Let me know what you think.
Beta Was this translation helpful? Give feedback.
All reactions