Forced alignment

Use the Penn Phonetics Lab Forced Aligner (P2FA) to replicate part of the vowel plot assignment

Compared to present-day alternatives, P2FA is a simple aligner. It expects one wav file and one text file in English, and it will write the result to a textgrid file whose name you specify. To align the test files that came with P2FA, do this:

python2 /phon/p2fa/align.py ../files/BREY00538.wav ../files/BREY00538.txt BREY00538.TextGrid

Copy the wav file for my vowel plot recording to your directory and align it too. You can align any arbitrary English wav and text file in this way.

cp ../files/jeffmielke2017.wav ./
python2 /phon/p2fa/align.py jeffmielke2017.wav ../files/OHDARE2.txt jeffmielke2017.TextGrid

The aligner will warn you that it doesn’t recognize one of the words. To fix this, copy a dictionary supplement into your directory as dict.local:

cp ../files/vowelplot.dict ./dict.local

Then align again as above (you don’t need to copy the wav file again). Try pressing the up arrow to recall the python2 command instead of retyping it.
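If you have several wav/txt pairs to align, the same command can be generated once per pair. This is a minimal Python 3 sketch, not part of P2FA. The function name p2fa_commands and the choice to name each TextGrid after its wav file are assumptions made for illustration; only the python2 /phon/p2fa/align.py command line comes from the instructions above.

```python
from pathlib import Path

# Location of the aligner script, as used in the commands above.
ALIGN_SCRIPT = "/phon/p2fa/align.py"


def p2fa_commands(data_dir, out_dir="."):
    """Build one P2FA command (as an argument list) per wav/txt pair in data_dir.

    A wav file without a txt file of the same name is skipped.
    """
    commands = []
    for wav in sorted(Path(data_dir).glob("*.wav")):
        txt = wav.with_suffix(".txt")
        if not txt.exists():
            continue  # no transcript, so nothing to align
        textgrid = Path(out_dir) / (wav.stem + ".TextGrid")
        commands.append(
            ["python2", ALIGN_SCRIPT, str(wav), str(txt), str(textgrid)]
        )
    return commands
```

Passing each returned list to subprocess.run(cmd, check=True) would run the alignments one after another.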

Prepare the Montreal Forced Aligner

Compared to P2FA, the Montreal Forced Aligner has more features. It supports several languages with pretrained models and it allows you to train new models for any language. It is also easier to install on your own computer.

Do these commands the first time you want to use MFA on a *particular* lab computer. First install the aligner (say Y when prompted):

conda create -n aligner -c conda-forge montreal-forced-aligner

Set up the aligner for aligning English recordings (downloading acoustic models of English sounds, an English pronunciation dictionary, and loading our own English dictionary that includes additional words from our interviews):

conda activate aligner
mfa model download acoustic english_us_arpa
mfa model download dictionary english_us_arpa
mfa model save --name ral_mfa dictionary /phon/ENG523/files/ral_mfa_dict

Set up the aligner for Spanish:

conda activate aligner
mfa model download acoustic spanish_mfa 
mfa model download dictionary spanish_mfa

To see what other languages are supported, see the list of pretrained acoustic models and dictionaries.

This is how you exit conda:

conda deactivate

Remember that if you connect to a different phonetics lab computer in the future you will need to repeat these steps the first time you use MFA on it.

Use the Montreal Forced Aligner

P2FA expects one wav file and one txt file as input. MFA expects a directory with one or more wav files, each with a matching txt file (with all the words in it) or TextGrid file (which can have different tiers for different speakers and different intervals for different speaker turns). If you have interview1.wav and interview2.wav, you should also have either interview1.TextGrid and interview2.TextGrid or interview1.txt and interview2.txt. txt files are appropriate for short recordings of one speaker. TextGrid files are better for longer recordings, and they are necessary when you want to distinguish the speech of multiple speakers. TextGrid tier names will be interpreted as speaker labels. File names are only used to match up sound files with text files.
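Before running MFA, it can save time to confirm that every wav file has a transcript with the same name. The following is a minimal sketch of that check, assuming all files sit directly in one directory. The function name unmatched_wavs is hypothetical and is not part of MFA.

```python
from pathlib import Path


def unmatched_wavs(corpus_dir):
    """Return the names of wav files that have neither a .txt nor a .TextGrid partner."""
    missing = []
    for wav in sorted(Path(corpus_dir).glob("*.wav")):
        has_txt = wav.with_suffix(".txt").exists()
        has_textgrid = wav.with_suffix(".TextGrid").exists()
        if not (has_txt or has_textgrid):
            missing.append(wav.name)
    return missing
```

An empty list means every sound file in the directory has a transcript for the aligner to use.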

Start your aligner session:

conda activate aligner

Align Jeff’s vowelplot recording (using files in /phon/ENG523/files/jeff_vowelplot).

mfa align ../files/jeff_vowelplot ral_mfa english_us_arpa jeff_vowelplot_output

After it completes and says “Done! Everything took ??? seconds”, you may see the error message “psycopg2.OperationalError: server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request.” This is not a problem as long as it happens after your files get created. Check whether your textgrid(s) have been created like this:

ls -l jeff_vowelplot_output

If you see alignment_analysis.csv and jeff.TextGrid, your alignment was successful.

Here is how to align another English recording (using files in /phon/ENG523/files/english_data):

mfa align --clean ../files/english_data ral_mfa english_us_arpa english_data_output

Align a sample Spanish speech recording (using files in /phon/ENG523/files/spanish_data):

mfa align --clean ../files/spanish_data spanish_mfa spanish_mfa spanish_data_output


Forced alignment on your own computer

Installing P2FA on your own computer is complicated and not recommended. MFA is easier to install and more powerful. There have been recent changes to how MFA is installed. Here are instructions for installing the current version of MFA.

Below are step-by-step instructions for installing older versions of MFA in case you need them or are unable to install the latest version.

Installing MFA 1.0.0 in Mac OS X

Go to this page to download MFA version 1.0.0. It’s the version before the latest stable version, but the last one that has a Mac OS executable (which means you don’t have to compile the source code to install it):

Click on “montreal-forced-aligner_macosx.zip” to download it.

Open it with the Archive Utility. I extracted it to my Documents folder, so the aligner ended up at:

/Users/rmdodswo/Documents/montreal-forced-aligner_macosx/montreal-forced-aligner

To make it easier to call the aligner, move the folder montreal-forced-aligner out of montreal-forced-aligner_macosx and rename it to mfa. After doing this, my aligner is at:

/Users/rmdodswo/Documents/mfa

Let’s check whether MFA appears to be working, without giving it anything to align yet:

Open a Terminal window. Run this command to test the aligner.

/Users/rmdodswo/Documents/mfa/bin/mfa_align

If it’s working, you should see a message that tells you what arguments are required to run the aligner. Depending on your OS X version (I think), it may say that it’s blocking the app you are trying to run because it is from an unknown developer. To fix this we need to allow the terminal to run in Developer Mode. To do this, enter this in your terminal:

spctl developer-mode enable-terminal

Then in the menus (at the top of the screen), go to Apple symbol > System Preferences > Security & Privacy > Privacy, then scroll down to Developer Tools, click Unlock, and then check Terminal in the window on the right. You may need to close and re-open your terminal to cause this change to take effect. Then repeat the mfa_align command above and see if you get the expected output now.

Installing MFA 1.0.1 in Windows 10

Note: This will eventually be replaced by instructions for installing MFA 2.0.0 in Windows.

Visit the home page for MFA version 1.0.1.

Click on “Windows” under “Installation”, to see the installation instructions.

Scroll back up and click on “Montreal Forced Aligner releases”, then scroll down to “Version 1.0.1” and click on “montreal-forced-aligner_win64.zip” to download it.

Extract it somewhere on your computer. I extracted it to my Documents folder, so the aligner ended up at:

C:\Users\jimielke\Documents\montreal-forced-aligner_win64\montreal-forced-aligner

To make it easier to call the aligner, move the folder montreal-forced-aligner out of montreal-forced-aligner_win64 and rename it to mfa. After doing this, my aligner is at:

C:\Users\jimielke\Documents\mfa

Let’s check whether MFA appears to be working, without giving it anything to align yet:

Open the Windows Command Prompt by clicking on “Type here to search” in the taskbar, typing cmd, and pressing Enter. You can paste text into the Command Prompt window by right-clicking on it. Run this command to test the aligner.

C:\Users\jimielke\Documents\mfa\bin\mfa_align

If it’s working, you should see a message that tells you what arguments are required to run the aligner.

Downloading the other files

You should create a directory somewhere (these instructions will assume it’s in your Documents folder) and put all these files in it.

The SLAAP dictionary:
slaap_dict

one_script version 24 and associated files:
one_script.zip

Sound files and transcripts for the Lab 2 corpus:
lab2_wavs_and_transcripts.zip

Align the Lab 2 corpus in Mac OS

Go to your Terminal window and cd to your lab6 directory.

Run the aligner like this (adjusting paths for your computer):

/Users/rmdodswo/Documents/mfa/bin/mfa_align lab2_wavs_and_transcripts slaap_dict english lab2_corpus

It should say “There are words not found in the dictionary. Would you like to abort to fix them? (Y/N)”

You can enter Y, then dig around in Documents/MFA/lab2_wavs_and_transcripts and its subdirectories until you find a file called oovs_found.txt. Open it in a text editor to see what words are missing. Open slaap_dict in a text editor, and paste the words from oovs_found.txt at the bottom, and add their transcriptions. Refer to other words in the dictionary if you aren’t sure what phones to use. Save this as slaap_dict2 (your own expanded dictionary).
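If there are many missing words, the copy-and-paste step can be scripted. The sketch below rests on two assumptions that you should check against your own files: that oovs_found.txt lists one word per line, and that each dictionary line is a word followed by its phones separated by spaces. The function name expand_dictionary is hypothetical, and you still have to supply the transcriptions yourself.

```python
from pathlib import Path


def expand_dictionary(dict_path, oov_path, transcriptions, out_path):
    """Write a copy of the dictionary with one new line per transcribed OOV word.

    transcriptions maps a word to a list of phones, e.g. {"DOG": ["D", "AO1", "G"]}.
    Returns the OOV words that still have no transcription.
    """
    oov_text = Path(oov_path).read_text(encoding="utf-8")
    oovs = [line.strip() for line in oov_text.splitlines() if line.strip()]

    original = Path(dict_path).read_text(encoding="utf-8")
    if original and not original.endswith("\n"):
        original += "\n"  # make sure new entries start on their own line

    new_lines = []
    still_missing = []
    for word in oovs:
        phones = transcriptions.get(word)
        if phones:
            new_lines.append(word + " " + " ".join(phones) + "\n")
        else:
            still_missing.append(word)

    Path(out_path).write_text(original + "".join(new_lines), encoding="utf-8")
    return still_missing
```

Calling it with slaap_dict as the input and slaap_dict2 as out_path leaves the original dictionary untouched, as in the manual steps.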

Run the aligner again using slaap_dict2. It should not complain about missing words this time:

/Users/rmdodswo/Documents/mfa/bin/mfa_align lab2_wavs_and_transcripts slaap_dict2 english lab2_corpus

Depending on your computer’s speed, it could finish the alignment in a few minutes, or it could take longer. After it completes, there should be an output directory called lab2_corpus, containing the aligned textgrids for the Lab 2 corpus. Copy or move lab2_files.csv and the seven wav files from lab2_wavs_and_transcripts into lab2_corpus.

Finally, open lab2_files.csv in Excel or another spreadsheet program, and change the file paths in the wav and textgrid columns to match their actual locations on your computer. For example, the first wav file no longer has the path /home/jeff/teaching/eng523tutorial/lab2_corpus/daulton.wav (where it was on my computer). Its path is now something like /Users/rmdodswo/Documents/lab2_corpus/daulton.wav.
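If you would rather not edit the paths by hand, the change can be sketched in Python. This assumes lab2_files.csv is a comma-separated file with a header row containing columns named wav and textgrid, as described above. The function name relocate_paths is hypothetical. It keeps each file name and replaces only the directory part, so it works for both Mac and Windows destinations.

```python
import csv


def relocate_paths(csv_in, csv_out, new_dir, columns=("wav", "textgrid")):
    """Point the path columns of a csv file at new_dir, keeping the file names."""
    # Use backslashes if the new directory looks like a Windows path.
    sep = "\\" if "\\" in new_dir else "/"
    new_dir = new_dir.rstrip("/\\")

    with open(csv_in, newline="", encoding="utf-8") as src:
        reader = csv.DictReader(src)
        fieldnames = reader.fieldnames
        rows = list(reader)

    for row in rows:
        for column in columns:
            old_path = row.get(column)
            if old_path:
                # Take the last path component, whichever slash the old path used.
                filename = old_path.replace("\\", "/").rsplit("/", 1)[-1]
                row[column] = new_dir + sep + filename

    with open(csv_out, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

Writing to a new file name (rather than overwriting lab2_files.csv) lets you compare the result with the original before using it.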

Align the Lab 2 corpus in Windows

Go to your Command Prompt window and cd to your lab6 directory.

Run the aligner like this (adjusting paths for your computer):

C:\Users\jimielke\Documents\mfa\bin\mfa_align lab2_wavs_and_transcripts slaap_dict english lab2_corpus

It should say “There are words not found in the dictionary. Would you like to abort to fix them? (Y/N)”

You can enter Y, then dig around in Documents\MFA\lab2_wavs_and_transcripts and its subdirectories until you find a file called oovs_found.txt. Open it in a text editor to see what words are missing. Open slaap_dict in a text editor, and paste the words from oovs_found.txt at the bottom, and add their transcriptions. Refer to other words in the dictionary if you aren’t sure what phones to use. Save this as slaap_dict2 (your own expanded dictionary).

Run the aligner again using slaap_dict2. It should not complain about missing words this time:

C:\Users\jimielke\Documents\mfa\bin\mfa_align lab2_wavs_and_transcripts slaap_dict2 english lab2_corpus

Depending on your computer’s speed, it could finish the alignment in a few minutes, or it could take longer. After it completes, there should be an output directory called lab2_corpus, containing the aligned textgrids for the Lab 2 corpus. Copy or move the seven wav files and the one csv file from lab2_wavs_and_transcripts into lab2_corpus.

Finally, open lab2_files.csv in Excel or another spreadsheet program, and change the file paths in the wav and textgrid columns to match their actual locations on your computer. For example, the first wav file no longer has the path /phon/ENG536/files/lab2_corpus/daulton_lab2.wav (where it was on the lab computer). Its path is now something like C:\Users\jimielke\Documents\lab2_corpus\daulton_lab2.wav.

Older Forced Alignment Instructions (mostly superseded by what’s above)

There are a few options for automatically aligning a transcript to your sound file. For each of them, you will need a wav file and a text file containing the transcript (in words). Most forced alignment systems are based on the HTK Speech Recognition Toolkit. HTK stands for Hidden Markov Model Toolkit. It was developed at Cambridge in the late 1980s. P2FA (the Penn Phonetics Lab Forced Aligner) is now a popular python-based interface to HTK. It uses HTK, the CMU Pronouncing Dictionary, and a set of acoustic models derived from a corpus of recordings of SCOTUS, the U.S. Supreme Court.

A WEB INTERFACE FOR THE PENN PHONETICS LAB FORCED ALIGNER

If you don’t need to change anything about the alignment process (such as adding words to the dictionary or doing batches of files), your easiest option may be to use the web interface to a public installation of P2FA. Where it says “text file (optional)”, upload your transcript. Where it says “wav file”, upload your wav file. Then enter your e-mail address and click “Submit”. When the program is done (usually a few minutes), you will receive a TextGrid file at the e-mail address you entered. Currently the file size limit for this public installation is about 20 MB: around 16 minutes of 16-bit audio sampled at 11025 Hz, or 4 minutes sampled at 44100 Hz. All uploaded files are downsampled to 11025 Hz for processing, so consider downsampling before uploading to maximize file duration.
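The durations quoted above follow from simple arithmetic on the size limit. The sketch below reproduces them under assumptions that are mine, not the web interface's: mono audio, a limit of exactly 20 × 1024 × 1024 bytes, and no allowance for the small wav header.

```python
def max_minutes(limit_bytes, sample_rate_hz, bits_per_sample=16, channels=1):
    """Longest uncompressed recording, in minutes, that fits in limit_bytes."""
    bytes_per_second = sample_rate_hz * (bits_per_sample // 8) * channels
    return limit_bytes / bytes_per_second / 60


LIMIT = 20 * 1024 * 1024  # assumed meaning of "about 20 MB"

print(round(max_minutes(LIMIT, 11025), 1))  # 15.9
print(round(max_minutes(LIMIT, 44100), 1))  # 4.0
```

A stereo file holds half as many minutes at the same sample rate, which is another reason to downsample (and convert to mono) before uploading.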

RUNNING P2FA ON THE PHONETICS LAB COMPUTER

P2FA is installed on our server, so you can use it by logging in to a lab computer instead of using the web interface. Running P2FA this way will allow you to add custom dictionary entries or do multiple batches of alignments. Run the aligner by issuing a command like this:

python /phon/p2fa/align.py yourwavfile.wav yourtranscript.txt youroutput.TextGrid

The aligner may work for a long time, and when it’s done, the prompt will reappear and there should be a TextGrid that wasn’t there before. It is also worth remembering that it takes much longer or just doesn’t work if any of the files you are aligning are open.

If you think an alignment may take longer than you will be at your computer, you can use the screen command. Another way to handle this is to tell the server to keep processing the files even if you log off. Any completed TextGrids will be available in your directory the next time you log on. Just type your command like this instead:

nohup python /phon/p2fa/align.py yourwavfile.wav yourtranscript.txt youroutput.TextGrid

If you want to align part of a wav file (for example, only the second minute of your recording), you can enter the start time and end time in seconds like this:

python /phon/p2fa/align.py -s 60 -e 120 yourwavfile.wav yourtranscript.txt youroutput.TextGrid

To see instructions, you can simply type this:

python /phon/p2fa/align.py

ADDING WORDS TO THE DICTIONARY

A major advantage of using the aligner directly (instead of through the web interface) is being able to modify the dictionary. You can see the first 100 lines of the CMU dictionary that P2FA uses by typing this:

head -100 /phon/p2fa/model/dict

You can create a file in your home directory (or wherever you align files) called dict.local. Whenever you run the aligner the contents of this file will be appended to the regular dictionary. You can create it on your own computer and then upload it, or you can create it directly on the server like this:

vim dict.local

Follow the formatting of the original dict file. The aligner is very picky about formatting. Entries must be in all capitals, and each word must be followed by two spaces and then all of its phonetic symbols separated by one space. The phonetic symbols are as follows:

English consonants: [p]=P, [t]=T, [k]=K, [b]=B, [d]=D, [ɡ]=G, [tʃ]=CH, [dʒ]=JH, [f]=F, [θ]=TH, [s]=S, [ʃ]=SH, [h]=HH, [v]=V, [ð]=DH, [z]=Z, [ʒ]=ZH, [m]=M, [n]=N, [ŋ]=NG, [l]=L, [w]=W, [ɹ]=R, [j]=Y

English primary stressed vowels: [i]=IY1, [u]=UW1, [ɪ]=IH1, [ʊ]=UH1, [e]=EY1, [o]=OW1, [ʌ]=AH1, [ɛ]=EH1, [ɚ]=ER1, [ɔ]=AO1, [æ]=AE1, [ɑ]=AA1, [aɪ]=AY1, [aʊ]=AW1, [ɔɪ]=OY1

English secondary stressed vowels: [i]=IY2, etc.

English unstressed vowels: [i]=IY0, [ə]=AH0, etc.

The dictionary also contains the following noises, which you can put in your transcript as needed (the curly brackets are important): cough={CG}, laugh={LG}, lip smack={LS}, noise={NS}, silence={SL}
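Because the aligner is picky about dict.local formatting, it may help to build entries with a small helper instead of typing them. This is a sketch of the word formatting rules described above (all capitals, two spaces after the word, phones separated by one space), using only the phone symbols listed here. The function name format_entry and the example word are hypothetical, and the transcription is only an illustration.

```python
CONSONANTS = {
    "P", "T", "K", "B", "D", "G", "CH", "JH", "F", "TH", "S", "SH",
    "HH", "V", "DH", "Z", "ZH", "M", "N", "NG", "L", "W", "R", "Y",
}
VOWELS = {
    "IY", "UW", "IH", "UH", "EY", "OW", "AH", "EH",
    "ER", "AO", "AE", "AA", "AY", "AW", "OY",
}


def format_entry(word, phones):
    """Return one dict.local line: WORD, two spaces, then space-separated phones."""
    for phone in phones:
        # Vowels are two letters plus a stress digit: 1, 2, or 0.
        is_vowel = len(phone) == 3 and phone[:2] in VOWELS and phone[2] in "012"
        if not (phone in CONSONANTS or is_vowel):
            raise ValueError("unknown phone symbol: " + repr(phone))
    return word.upper() + "  " + " ".join(phones)


print(format_entry("vowelplot", ["V", "AW1", "AH0", "L", "P", "L", "AA2", "T"]))
# VOWELPLOT  V AW1 AH0 L P L AA2 T
```

Rejecting unknown symbols catches common slips such as a vowel without its stress digit before the aligner sees them.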

Troubleshooting

If you made your audio recording in Ultraspeech, there is a good chance that you have a 32-bit wave file, and when you run align.py or align_interview_turns.py, you will get an error ending in wave.Error: unknown format: 65534.

It’s easy to make a 16-bit version of your wav file:

sox yourwavefile.wav -b 16 yourwavefile2.wav

Then run the aligner using yourwavefile2.wav instead of yourwavefile.wav. You can continue to analyze your audio using either file.
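To confirm what kind of wav file you have before or after converting, you can read its header directly. The sketch below does not use Python's wave module, since that is where the error above comes from. The function name wav_format_info is hypothetical. A format tag of 1 means plain PCM, and 65534 is the "extensible" format named in the error message.

```python
import struct


def wav_format_info(path):
    """Return the format tag, channel count, sample rate, and bit depth of a wav file."""
    with open(path, "rb") as f:
        riff, _, wave_id = struct.unpack("<4sI4s", f.read(12))
        if riff != b"RIFF" or wave_id != b"WAVE":
            raise ValueError("not a RIFF/WAVE file")
        while True:
            header = f.read(8)
            if len(header) < 8:
                raise ValueError("no fmt chunk found")
            chunk_id, size = struct.unpack("<4sI", header)
            if chunk_id == b"fmt ":
                data = f.read(size)
                # The first 16 bytes of the fmt chunk have a fixed layout.
                tag, channels, rate, _, _, bits = struct.unpack("<HHIIHH", data[:16])
                return {
                    "format_tag": tag,
                    "channels": channels,
                    "sample_rate": rate,
                    "bits_per_sample": bits,
                }
            # Skip other chunks; chunk data is padded to an even length.
            f.seek(size + (size & 1), 1)
```

After running sox as shown above, the converted file would be expected to report 16 for bits_per_sample.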

ALIGNING INTERVIEWS USING P2FA

The basic align.py script for P2FA creates a textgrid with one words tier and one phone tier, so it doesn’t distinguish between different speakers. To process interviews, we made a python script that reads turns from a textgrid, aligns each turn separately, and makes a textgrid file with separate tiers for each speaker.

python /phon/p2fa/align_interview_turns.py yourwavfile.wav yourtranscript.TextGrid youroutput.TextGrid

ALIGNING SPANISH RECORDINGS

In addition to P2FA for English, we have FASE, a Spanish forced aligner created by Eric Wilbanks (formerly of NCSU). Running FASE is similar to running P2FA (but note the -w, -t, and -o tmp arguments). You can use a text file (*.txt) with speaker tags in curly brackets or a textgrid with different speakers’ turns segmented on different tiers.

python /phon/fase/fase_align.py -w yourwavfile.wav -t yourtranscript.txt -o tmp
python /phon/fase/fase_align.py -w yourwavfile.wav -t yourtranscript.TextGrid -o tmp

FASE must create a temporary directory (called “tmp” in this example) which it deletes when it’s finished. FASE assumes the directory does not already exist, so if it does, you need to delete it first:

rm -rf tmp

This is a very powerful command (to delete a directory and its contents without asking you to confirm individual files), so use it very carefully.

Troubleshooting

There is a good chance that FASE will tell you that your transcript has words that are missing from the dictionary. To see the list of missing words, do this:

cat tmp/missing_words

Then make a text file with one line for each missing word, with its phonetic transcription. If the content of missing_words was ANANÁS, make a text file with one line:

ANANÁS a n a n a s

If you save your dictionary text file as dict.local, delete tmp (as above) and then run the aligner with the -m option pointing to your dictionary, e.g.:

python /phon/fase/fase_align.py -w yourwavfile.wav -t yourtranscript.TextGrid -o tmp -m dict.local

If you made your audio recording in Ultraspeech, there is a good chance that you have a 32-bit wave file, and when you run fase_align, you will get an error ending in wave.Error: unknown format: 65534.

It’s easy to make a 16-bit version of your wav file:

sox yourwavefile.wav -b 16 yourwavefile2.wav

Then run the aligner using yourwavefile2.wav instead of yourwavefile.wav. You can continue to analyze your audio using either file.

ALIGNING FRENCH RECORDINGS

For French, we have SPLaligner, a forced aligner created by Peter Milne of the University of Ottawa. Using SPLaligner is very similar to using P2FA (but the dictionary uses IPA symbols):

python /phon/SPLaligner/align.py yourwavfile.wav yourtranscript.txt youroutput.TextGrid yourdictionary

ALIGNING OR REALIGNING IN PRAAT

We have a Praat script that is useful for touching up existing textgrids: /phon/scripts/editor_align.praat. It requires P2FA or another aligner to be installed on the computer where you are using it. Therefore you can use it on the lab computers that run Linux. The script is an editor script, so it needs to be opened from inside Praat’s Editor window.