teawd's blog

10.04.2022

Speech to text diary scripts

First of all, this blog post is about a somewhat specific problem. Despite this, I think it is quite useful to most readers looking for some Linux knowledge as the tools I used are very general. Let's start with the explanation of

The problem

Sometimes I have good ideas when I'm outside. Sometimes I really wanna preserve these ideas, but it's very inconvenient to use phone keyboards, especially on the move. One solution is recording my voice. But then, I would want to transfer those recordings to my computer for future reference. But what if I also want to read the text version of those voice recordings? Let's handle these problems one at a time.

Syncing files

Syncthing

Syncthing is an open source cross-platform program that can sync folders across multiple devices. It's perfect for this problem, as it allows for P2P syncing (for example from a phone to a computer).
It's very easy to set up as well: download it on both devices, open a browser tab at 127.0.0.1:8384 (Syncthing has a web frontend) and scan the device ID QR code with your phone's camera. Then share the folder that holds your audio recordings.

If your init system is systemd, you can start Syncthing and enable autostart at boot by running (where USER is your Linux user):

    systemctl enable syncthing@USER.service --now

Syncthing has configuration options for encryption and file filtering. If you want more robust syncing, you could also host a Syncthing instance on a server.

Speech to text

To convert my speech to text I use the SpeechRecognition Python package:

    pip install SpeechRecognition

It supports many speech recognition options, including offline ones. It's also really easy to use; check my script.

Options (present in my script as comments):

Pocketsphinx (low-accuracy offline option): pip install pocketsphinx
Google translate (has a limit on length)
Google cloud (free, requires registration)

There are also other online and offline options. (SpeechRecognition documentation)
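To give an idea of how little code this takes, here is a minimal sketch of transcribing one file with SpeechRecognition. It is not my actual script: the function name transcribe and its parameters are made up for illustration. Note that sr.AudioFile reads WAV, AIFF and FLAC, so a phone recording in another format (like m4a) has to be converted first, for example with ffmpeg.

```python
def transcribe(wav_path, engine="sphinx", language="en-US"):
    """Return the transcription of a WAV/AIFF/FLAC file, or "" if nothing was understood.

    `transcribe` and its parameters are hypothetical names for this sketch.
    """
    # Third-party package: pip install SpeechRecognition
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = recognizer.record(source)  # read the whole file into memory

    try:
        if engine == "sphinx":
            # Offline option; also needs: pip install pocketsphinx
            return recognizer.recognize_sphinx(audio)
        # Online option; sends the audio to Google's web speech API
        return recognizer.recognize_google(audio, language=language)
    except sr.UnknownValueError:
        # The engine could not make out any speech in the recording
        return ""
```

Calling transcribe("recording.wav") uses the offline engine, and transcribe("recording.wav", engine="google") uses the online one. Network or API failures raise sr.RequestError, which this sketch deliberately lets propagate.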

Storing recordings in notes

Vimwiki

I use vimwiki, a (neo)vim plugin that makes managing markdown files in a wiki-like manner extremely easy. Vimwiki also has a diary option. To store the recording path and text in my notes, I wrote a few scripts.

The speech-to-diary.sh script calls the speech recognition script on a selected file, appends the transcription to a vimwiki diary and (optionally) moves the voice recording to the vimwiki folder. It has a bunch of flags; you can get a help message by running:

    ./speech-to-diary.sh -h

The text that it puts looks like this:

    20220328_162342.m4a 16:24:05
    Whatever speech recognition returned
    /home/tea/vimwiki/diary/resources/20220328_162342.wav

I have a vim shortcut that plays an audio file under the cursor so that I can also listen to it if the transcription is not accurate enough.

If you don't use vim or vimwiki, you can just change the last line of the script to save the text to a file of your choosing.
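If you'd rather write that part yourself, the idea can be sketched in a few lines of Python. This is not what speech-to-diary.sh does internally; the function names, the three-part entry layout (name and time, transcription, path) and the YYYY-MM-DD.md diary file name are assumptions for illustration, so adjust them to your own setup.

```python
from datetime import datetime
from pathlib import Path


def format_entry(recording_name, when, text, audio_path):
    """Build one diary entry: name and time, transcription, then the audio path.

    The layout is an assumption modelled on the sample entry above.
    """
    return f"{recording_name} {when:%H:%M:%S}\n{text}\n{audio_path}\n"


def append_to_diary(entry, diary_dir, when):
    """Append the entry to the diary file for the given day and return its path.

    Assumes one markdown file per day named YYYY-MM-DD.md inside diary_dir.
    """
    diary_file = Path(diary_dir) / f"{when:%Y-%m-%d}.md"
    diary_file.parent.mkdir(parents=True, exist_ok=True)
    with diary_file.open("a", encoding="utf-8") as handle:
        handle.write("\n" + entry)  # blank line separates entries
    return diary_file
```

Appending instead of overwriting means several recordings from the same day simply stack up in that day's file.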

Tying it all together

Actual Linux knowledge

Now, we need a way to react to new files appearing in the synced directory. For that, we can use the Linux inotify interface, which reports changes to files. For Syncthing, the following command works (-m keeps it running, -r watches subdirectories, -e attrib limits it to attribute-change events and -c prints the events as CSV):

    inotifywait -c -r -m -e attrib $PATH_TO_WATCH

You can also see it used in a script. Be sure to read the readme in the repo for some additional information on setting up a systemd service (if you don't know how).
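My watch script is a shell script, but the same loop can be sketched in Python if you prefer. The sketch below starts the inotifywait command from above and parses its CSV lines (directory, events, file name). The function names and the list of audio extensions are made up for this example, and it assumes inotifywait (from inotify-tools) is installed.

```python
import csv
import os
import subprocess

AUDIO_EXTENSIONS = (".m4a", ".wav")  # assumption: adjust to what your phone records


def parse_event(line):
    """Split one line of `inotifywait -c` output into (directory, events, filename)."""
    directory, events, filename = next(csv.reader([line]))
    # Several events on one file arrive as a single quoted field, e.g. "ATTRIB,ISDIR"
    return directory, events.split(","), filename


def is_new_recording(events, filename):
    """True for attribute changes on audio files, False for directories and other files."""
    return "ISDIR" not in events and filename.lower().endswith(AUDIO_EXTENSIONS)


def watch(path):
    """Yield the full path of every audio file that inotifywait reports under path."""
    cmd = ["inotifywait", "-c", "-r", "-m", "-e", "attrib", path]
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:
            directory, events, filename = parse_event(line.rstrip("\n"))
            if is_new_recording(events, filename):
                yield os.path.join(directory, filename)
```

A loop like `for recording in watch("/path/to/synced/folder"): ...` could then hand each file to the transcription step. Using the csv module instead of splitting on commas keeps the quoted event field in one piece.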

Now, if you autostart Syncthing and the watch script at system boot, every new audio file that gets synced will be processed automatically.

GH Repo SpeechRec Docs
