Even more unnotable note: this is a really rough getting started guide to Kaldi that I also use to refresh myself on Kaldi things. I’ll probably write another document later on going into more detail about the structures and whatnot of Kaldi.
The following downloading and compiling section works best on MacOS and Linux. On Windows, WSL 2 may work (I have not personally tried it extensively), but line endings have to be fixed with a tool like dos2unix. Docker should also work on Windows machines.
Downloading and compiling
Before starting, the system should have a C/C++ compiler installed (usually gcc and g++) along with some other required libraries. If any related errors occur, just install the missing packages with the appropriate package manager, e.g. apt on Debian and Ubuntu, pacman on Arch etc.
From the command line, inside your working directory, clone the GitHub repository with git clone https://github.com/kaldi-asr/kaldi.git.
Enter the directory created (usually named kaldi, or whatever name was given when cloning from GitHub) with cd kaldi.
Reading the INSTALL file there gives the overall instructions: build everything in the tools directory first, then everything in src.
With this, move into the tools directory and read the INSTALL file there.
First, run extras/check_dependencies.sh and install any missing tools it reports (as mentioned above). Then run make, or make -j <num-threads> to compile with multiple parallel jobs, where <num-threads> can be taken from the output of the nproc command.
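Since nproc is a GNU coreutils tool and is not available by default on MacOS, here is a minimal sketch of picking the job count portably. The function name num_threads and the variable JOBS are my own, and the script only prints the make command rather than running it:

```shell
#!/usr/bin/env bash
# Print the number of CPU cores to use for "make -j".
# num_threads is a hypothetical helper name, not part of Kaldi.
num_threads() {
  if command -v nproc >/dev/null 2>&1; then
    nproc                                   # GNU coreutils (Linux)
  elif command -v sysctl >/dev/null 2>&1 && sysctl -n hw.ncpu >/dev/null 2>&1; then
    sysctl -n hw.ncpu                       # MacOS / BSD
  else
    echo 1                                  # fallback: single-threaded build
  fi
}

JOBS="$(num_threads)"
echo "make -j ${JOBS}"                      # print the command instead of running it
```

If the machine is also used for other work, using fewer jobs than the number of cores keeps it responsive during the build.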
Optionally, install IRSTLM as well (extras/install_irstlm.sh, also run from the tools directory), which is useful for building your own language models. You can always come back to this later.
The next part of the first INSTALL file is to enter the src directory (cd ../src) and work from there.
From here on it is pretty straightforward, provided the tools compiled successfully: just follow the INSTALL file in src. First, run the ./configure script.
Then run make depend, using the appropriate number of threads as above (make depend -j <num-threads>).
Then run make, again with -j <num-threads>.
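The tools and src steps above can be collected into one script. This is only a sketch: the run and build_kaldi helpers and the DRY_RUN and JOBS variables are my own names, and by default it only prints the commands so the sequence can be reviewed before anything is cloned or compiled.

```shell
#!/usr/bin/env bash
# Sketch of the download-and-compile sequence described above.
# DRY_RUN defaults to 1, so commands are only printed; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"
JOBS="${JOBS:-4}"            # hypothetical default; use your own core count

# Print a command, and execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

build_kaldi() {
  run git clone https://github.com/kaldi-asr/kaldi.git &&
  run cd kaldi/tools &&
  run extras/check_dependencies.sh &&
  run make -j "$JOBS" &&
  run cd ../src &&
  run ./configure &&
  run make depend -j "$JOBS" &&
  run make -j "$JOBS"
}

build_kaldi
```

The steps are chained with && so that a real run stops at the first failure, which matters here because check_dependencies.sh is meant to be fixed up before make is attempted.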
Important note for MacOS:
You may run into a few errors concerning command-line tools and their arguments in certain Kaldi scripts, involving tools such as awk, find and sed. This is because MacOS ships the BSD versions of these tools, as opposed to the GNU versions found on Linux.
You can check this with man, which shows the manual of the tool in question (e.g. man awk); it may say BSD in the first few lines.
Fix this by installing the GNU version of the tool from the appropriate package, via Homebrew or an alternative, and changing the PATH in your shell config script so that the GNU version is found first.
An important package is coreutils, but a few others may be needed as well, such as gawk, gnu-sed and findutils (e.g. brew install coreutils gawk gnu-sed findutils).
These tools can be run with the g prefix in front of their names, such as gawk instead of awk.
Otherwise, to use them under their usual names, check the output of brew info for each package, which describes the PATH change needed.
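The man check above can also be scripted. The following is a rough heuristic rather than a definitive test: it assumes that GNU tools mention "GNU" in their --version output, while BSD tools either reject --version or print a version string without it. The function name tool_flavour is my own.

```shell
#!/usr/bin/env bash
# Guess whether a command-line tool is the GNU implementation.
# Heuristic: GNU tools mention "GNU" in the first line of --version;
# BSD tools usually reject --version or print a line without "GNU".
tool_flavour() {
  if "$1" --version 2>/dev/null </dev/null | head -n 1 | grep -q "GNU"; then
    echo "gnu"
  else
    echo "non-gnu"
  fi
}

for tool in awk sed find; do
  echo "$tool: $(tool_flavour "$tool")"
done
```

After changing PATH, running this again should report gnu for the tools that were replaced.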
Testing out Kaldi
To see if Kaldi works, try out the yesno example, which classifies whether the speaker is saying “yes” or “no”.
First, from the Kaldi root, enter the yesno example directory with cd egs/yesno/s5.
Then run ./run.sh, which should finish without any errors.
Some important files and directories in Kaldi
The Kaldi directory has a few important sub-directories.
The tools directory is where the important things Kaldi depends on can be installed, such as OpenFst.
The src directory holds the C++ source code. The programs compiled from it are called from the shell scripts in the recipes that come with Kaldi.
The egs directory is where the example recipes are. These, like the yesno example, can be run (mostly) directly with minimal tweaking for out-of-the-box ASR systems.
Before the scripts in egs can be run directly, however, you have to prepare the data to run the recipes on. These files have to be put into a data directory that run.sh and the other scripts can reference.
Looking at the yesno example, there is no data directory before running ./run.sh.
After ./run.sh, an extra data directory is created.
Some files of note are wav.scp, segments and utt2spk.
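wav.scp links the actual audio file to a “recording ID”, which can be referenced later on in other files. Each line has the format <recording-id> <extended-filename>, where the extended filename is either a path to an audio file or a command that writes one to a pipe.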
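As an illustration of the wav.scp format, here is a small sketch that builds one from a directory of .wav files. It assumes the recording ID is simply the file name without its extension, which is my own naming choice; make_wav_scp and the demo file names are hypothetical.

```shell
#!/usr/bin/env bash
# Write "<recording-id> <path-to-wav>" lines for every .wav file in a directory.
# Assumption: the recording ID is the file name without the .wav extension.
make_wav_scp() {
  local wav_dir="$1"
  for wav in "$wav_dir"/*.wav; do
    [ -e "$wav" ] || continue               # directory had no .wav files
    echo "$(basename "$wav" .wav) $wav"
  done | LC_ALL=C sort                      # Kaldi expects data files sorted in the C locale
}

# Demo with empty placeholder files (hypothetical names).
demo_dir="$(mktemp -d)"
touch "$demo_dir/rec_b.wav" "$demo_dir/rec_a.wav"
make_wav_scp "$demo_dir" > "$demo_dir/wav.scp"
cat "$demo_dir/wav.scp"
```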
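segments indicates the segments in the audio file where speech occurs. It links an “utterance ID” to a recording ID and to the start and end time (in seconds) within that recording, one utterance per line: <utterance-id> <recording-id> <segment-begin> <segment-end>.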
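A quick sanity check of a segments file might look like the sketch below. It only checks that each line has four fields and that the end time comes after the start time; check_segments and the demo IDs are hypothetical, and Kaldi's own utils/validate_data_dir.sh performs a much more thorough check of a whole data directory.

```shell
#!/usr/bin/env bash
# Check that every line of a segments file has four fields and that the
# end time is later than the start time. Prints each offending line and
# returns non-zero if any were found.
check_segments() {
  awk 'NF != 4 || $4 + 0 <= $3 + 0 { print "bad line " NR ": " $0; bad = 1 }
       END { exit bad }' "$1"
}

# Demo file with made-up utterance and recording IDs.
demo="$(mktemp)"
cat > "$demo" <<'EOF'
rec_a-0000 rec_a 0.00 2.35
rec_a-0001 rec_a 3.10 5.80
EOF
check_segments "$demo" && echo "segments OK"
```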
utt2spk links each utterance to its speaker, which may simply be the recording ID depending on the naming scheme. Each line has the format <utterance-id> <speaker-id>.
This means that in some cases, such as in diarization, the utt2spk file can be created directly from segments by taking its first two columns.
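A minimal sketch of that, assuming the recording ID is reused as the speaker ID: the first two columns of segments are copied into utt2spk with awk. The demo IDs and times are made up.

```shell
#!/usr/bin/env bash
# Build utt2spk by reusing the recording ID (column 2 of segments)
# as the speaker ID for each utterance (column 1).
demo_dir="$(mktemp -d)"
cat > "$demo_dir/segments" <<'EOF'
rec_a-0000 rec_a 0.00 2.35
rec_a-0001 rec_a 3.10 5.80
rec_b-0000 rec_b 0.50 4.00
EOF

awk '{ print $1, $2 }' "$demo_dir/segments" > "$demo_dir/utt2spk"
cat "$demo_dir/utt2spk"
```

Prefixing each utterance ID with its speaker (here, recording) ID, as in the demo data, keeps the sorted order of utt2spk consistent with the other files.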