part 3

Measurements that are already available

If you want to work with formant measurements that were created in a typical way from transcribed corpora that we possess, your measurements probably exist already. Here are some available repositories of corpus measurements.

SLAAP corpus formant measurements

We have run one_script on the corpora that can be measured on the phonetics lab computers, using the following query for each corpus:

praat /phon/scripts/one_script.praat /phon/ENG523/files/nc_files.csv 'VOWEL VL' 'formants(measurements=21)' 'l'

The output files can be found in /phon/one_script_formants/, including the following:

  • one_script_out_buckeye_formants.csv (252718 vowel tokens)
  • one_script_out_nc_formants.csv (213675 vowel tokens)
  • one_script_out_northtown_formants.csv (60168 vowel tokens)
  • one_script_out_ohio_formants.csv (138020 vowel tokens)
  • one_script_out_raleigh_formants.csv (1250726 vowel tokens)
  • one_script_out_exslave_formants.csv (7532 vowel tokens)
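Some of these files are too big to open comfortably in a spreadsheet program. Here is a minimal Python sketch for looking at the column names and the first few rows without loading the whole file. It assumes the files are ordinary comma-separated text with a header row; the function name peek_csv is made up for this example and is not part of one_script.

```python
import csv
from itertools import islice


def peek_csv(path, n_rows=5):
    """Return (header, first n_rows data rows) without reading the whole file."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, [])            # first line is assumed to be column names
        rows = list(islice(reader, n_rows))  # stop reading after n_rows rows
    return header, rows


# Example use (path taken from the list above):
# header, rows = peek_csv("/phon/one_script_formants/one_script_out_exslave_formants.csv")
# print(header)
# for row in rows:
#     print(row)
```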

The main reason you would want to measure vowel formants yourself in these corpora is if you wanted to change measurement or output settings.

SPADE measurements

The SPeech Across Dialects of English (SPADE) project has used a one_script-like procedure to measure dozens of English speech corpora. Single-time point formant measurements, sibilant fricative spectral measurements, and duration measures have been extracted from all these corpora and csv files have been made publicly available on OSF. Speaker mean values can be viewed in the SPADE shiny app. For the corpora that were processed at NCSU, we also have 21-time point formant tracks and speaker demographic information stored in /phon/SPADE_measurements, including the following:

  • spade-Raleigh_formant_tracks_all.csv (414735 stressed vowel tokens)
  • spade-SLAAP-Ex-Slave_formant_tracks.csv (2641 stressed vowel tokens)
  • spade-SLAAP-NC-AA_formant_tracks.csv (57834 stressed vowel tokens)
  • spade-SLAAP-NC-misc_formant_tracks.csv (15487 stressed vowel tokens)
  • spade-SLAAP-NorthTown-Anglo_formant_tracks.csv (7636 stressed vowel tokens)
  • spade-SLAAP-NorthTown-Latinx_formant_tracks.csv (18723 stressed vowel tokens)
  • spade-SLAAP-Ohio_formant_tracks.csv (50823 stressed vowel tokens)

Using screen to run big queries

Some one_script queries take a long time to complete. Creating the one_script formant files listed above took 25 minutes for the smallest corpus and three days for the largest corpus. To keep them running after you log out, use the command screen to open up a new terminal session that persists after you log out:

screen

You will probably see some messages that you need to press Enter to clear. Then you can enter whatever commands you want to run on their own for a while. Then press Ctrl-A followed by Ctrl-D to disconnect from the session.

Later you can ssh to the same computer and resume your session like this:

screen -r

If you have multiple active sessions, you will see a list, and then you can try again specifying one of the numbers, like this:

screen -r 67910

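If the list includes dead sessions that you want to clear away, you can do this (screen -wipe only removes dead sessions; to end a session that is still running, resume it and type exit):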

screen -wipe

What to do with your one_script output

The file linking_one_script_output.r provides examples of R commands to link acoustic measurements to MFC coding of the same tokens and to link acoustic measurements to demographic information from the corpora.
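If you would rather work in Python, the general idea of linking measurements to demographic information can be sketched as below. This is not a translation of linking_one_script_output.r. It assumes both files are comma-separated with a header row and share a speaker column; the column name "speaker" and the function name link_demographics are assumptions made for this example, so check them against your actual files.

```python
import csv


def link_demographics(measure_path, demo_path, out_path, key="speaker"):
    """Append demographic columns to each measurement row, matched on `key`.

    `key` is a hypothetical shared column name; change it to match your files.
    Returns the number of measurement rows written.
    """
    with open(demo_path, newline="", encoding="utf-8") as f:
        demo_reader = csv.DictReader(f)
        demo_fields = [c for c in (demo_reader.fieldnames or []) if c != key]
        demo = {row[key]: row for row in demo_reader}  # one row per speaker

    with open(measure_path, newline="", encoding="utf-8") as fin, \
            open(out_path, "w", newline="", encoding="utf-8") as fout:
        reader = csv.DictReader(fin)
        fields = list(reader.fieldnames or [])
        extra = [c for c in demo_fields if c not in fields]
        writer = csv.DictWriter(fout, fieldnames=fields + extra,
                                extrasaction="ignore")
        writer.writeheader()
        n = 0
        for row in reader:
            info = demo.get(row[key], {})  # speakers with no demographics get blanks
            for c in extra:
                row[c] = info.get(c, "")
            writer.writerow(row)
            n += 1
    return n
```

Measurement rows whose speaker is missing from the demographic file are kept, with empty demographic cells, so no tokens are silently dropped.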

Forced alignment on your own computer

Installing P2FA on your own computer is complicated and not recommended. MFA is easier to install and more powerful. There have been recent changes to how MFA is installed. Here are instructions for installing the current version of MFA.

Below are step-by-step instructions for installing older versions of MFA in case you need them or are unable to install the latest version.

Installing MFA 1.0.0 in Mac OS X

Go to this page to download MFA version 1.0.0. It’s the version before the latest stable version, but the last one that has a Mac OS executable (which means you don’t have to compile the source code to install it):

Click on “montreal-forced-aligner_macosx.zip” to download it.

Open it with the Archive Utility. I extracted it to my Documents folder, so the aligner ended up at:

/Users/rmdodswo/Documents/montreal-forced-aligner_macosx/montreal-forced-aligner

To make it easier to call the aligner, move the folder montreal-forced-aligner out of montreal-forced-aligner_macosx and rename it to mfa. After doing this, my aligner is at:

/Users/rmdodswo/Documents/mfa

Let’s check whether MFA appears to be working, without giving it anything to align yet:

Open a Terminal window. Run this command to test the aligner.

/Users/rmdodswo/Documents/mfa/bin/mfa_align

If it’s working, you should see a message that tells you what arguments are required to run the aligner. Depending on your OS X version (I think), it may say that it’s blocking the app you are trying to run because it is from an unknown developer. To fix this we need to allow the terminal to run in Developer Mode. To do this, enter this in your terminal:

spctl developer-mode enable-terminal

Then in the menus (at the top of the screen), go to Apple symbol > System Preferences > Security & Privacy > Privacy, then scroll down to Developer Tools, click Unlock, and then check Terminal in the window on the right. You may need to close and re-open your terminal to cause this change to take effect. Then repeat the mfa_align command above and see if you get the expected output now.

Installing MFA 1.0.1 in Windows 10

Note: This will eventually be replaced by instructions for installing MFA 2.0.0 in Windows

Visit the home page for MFA version 1.0.1.

Click on “Windows” under “Installation”, to see the installation instructions.

Scroll back up and click on “Montreal Forced Aligner releases”, then scroll down to “Version 1.0.1” and click on “montreal-forced-aligner_win64.zip” to download it.

Extract it somewhere on your computer. I extracted it to my Documents folder, so the aligner ended up at:

C:\Users\jimielke\Documents\montreal-forced-aligner_win64\montreal-forced-aligner

To make it easier to call the aligner, move the folder montreal-forced-aligner out of montreal-forced-aligner_win64 and rename it to mfa. After doing this, my aligner is at:

C:\Users\jimielke\Documents\mfa

Let’s check whether MFA appears to be working, without giving it anything to align yet:

Open the Windows Command Prompt: click on “Type here to search” in the taskbar, type cmd, and press Enter. A Command Prompt window will open. You can paste text into this window by right-clicking on it. Run this command to test the aligner.

C:\Users\jimielke\Documents\mfa\bin\mfa_align

If it’s working, you should see a message that tells you what arguments are required to run the aligner.

Downloading the other files

You should create a directory somewhere (these instructions will assume it’s in your Documents folder) and put all these files in it.

The SLAAP dictionary:
slaap_dict

one_script version 24 and associated files:
one_script.zip

Sound files and transcripts for the Lab 2 corpus:
lab2_wavs_and_transcripts.zip

Align the Lab 2 corpus in Mac OS

Go to your Terminal window and cd to your lab6 directory.

Run the aligner like this (adjusting paths for your computer):

/Users/rmdodswo/Documents/mfa/bin/mfa_align lab2_wavs_and_transcripts slaap_dict english lab2_corpus

It should say “There are words not found in the dictionary. Would you like to abort to fix them? (Y/N)”

You can enter Y, then dig around in Documents/MFA/lab2_wavs_and_transcripts and its subdirectories until you find a file called oovs_found.txt. Open it in a text editor to see what words are missing. Open slaap_dict in a text editor, and paste the words from oovs_found.txt at the bottom, and add their transcriptions. Refer to other words in the dictionary if you aren’t sure what phones to use. Save this as slaap_dict2 (your own expanded dictionary).
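The copy-and-paste part of this step can be sketched in Python as below. It assumes that oovs_found.txt has one word per line and that each dictionary line is a word followed by its phones separated by spaces; check both against your own files. The function name draft_expanded_dict and the FIXME placeholder are made up for this example. The script cannot transcribe the words for you, so you still need to open slaap_dict2 afterwards and replace every FIXME with real phones.

```python
def draft_expanded_dict(dict_path, oov_path, out_path, placeholder="FIXME"):
    """Copy the dictionary to out_path and append one placeholder line per OOV word.

    Returns the list of words that were appended.
    """
    with open(dict_path, encoding="utf-8") as f:
        dict_lines = f.read().splitlines()
    # The first field of each dictionary line is assumed to be the word itself.
    known = {line.split()[0] for line in dict_lines if line.strip()}

    with open(oov_path, encoding="utf-8") as f:
        oovs = [w.strip() for w in f if w.strip()]

    new_words = []
    for w in oovs:
        if w not in known and w not in new_words:  # skip duplicates
            new_words.append(w)

    with open(out_path, "w", encoding="utf-8") as f:
        for line in dict_lines:
            f.write(line + "\n")
        for w in new_words:
            f.write(f"{w} {placeholder}\n")  # transcription must be filled in by hand
    return new_words


# Example use, from the directory that contains slaap_dict and a copy of oovs_found.txt:
# draft_expanded_dict("slaap_dict", "oovs_found.txt", "slaap_dict2")
```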

Run the aligner again using slaap_dict2. It should not complain about missing words this time:

/Users/rmdodswo/Documents/mfa/bin/mfa_align lab2_wavs_and_transcripts slaap_dict2 english lab2_corpus

Depending on your computer’s speed, it could finish the alignment in a few minutes, or it could take longer. After it completes, there should be an output directory called lab2_corpus, containing the aligned textgrids for the Lab 2 corpus. Copy or move lab2_files.csv and the seven wav files from lab2_wavs_and_transcripts into lab2_corpus.

Finally, open lab2_files.csv in Excel or another spreadsheet program, and change the file paths in the wav and textgrid columns to match their actual locations on your computer. For example, the first wav file no longer has the path /home/jeff/teaching/eng523tutorial/lab2_corpus/daulton.wav (where it was on my computer). Its path is now something like /Users/rmdodswo/Documents/lab2_corpus/daulton.wav.
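If you would rather not edit the paths by hand, the change can be sketched in Python as below. It assumes lab2_files.csv is comma-separated with a header row and that the two path columns are literally named wav and textgrid; check the header of your copy first. The function name relocate_paths is made up for this example. It keeps each file name and swaps everything before it for the directory you give, and it works with either / or \ in the old and new paths.

```python
import csv
import re


def relocate_paths(csv_in, csv_out, new_dir, columns=("wav", "textgrid")):
    """Rewrite the directory part of the paths in `columns`; keep the file names.

    Returns the number of rows written.
    """
    sep = "\\" if "\\" in new_dir else "/"  # follow the style of the new directory
    with open(csv_in, newline="", encoding="utf-8") as fin:
        reader = csv.DictReader(fin)
        fields = reader.fieldnames or []
        rows = list(reader)

    for row in rows:
        for col in columns:
            if row.get(col):
                filename = re.split(r"[/\\]", row[col])[-1]  # text after the last slash
                row[col] = new_dir.rstrip("/\\") + sep + filename

    with open(csv_out, "w", newline="", encoding="utf-8") as fout:
        writer = csv.DictWriter(fout, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)


# Example use (write to a new file so the original is kept as a backup):
# relocate_paths("lab2_files.csv", "lab2_files_local.csv",
#                "/Users/rmdodswo/Documents/lab2_corpus")
```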

Align the Lab 2 corpus in Windows

Go to your Command Prompt window and cd to your lab6 directory.

Run the aligner like this (adjusting paths for your computer):

C:\Users\jimielke\Documents\mfa\bin\mfa_align lab2_wavs_and_transcripts slaap_dict english lab2_corpus

It should say “There are words not found in the dictionary. Would you like to abort to fix them? (Y/N)”

You can enter Y, then dig around in Documents\MFA\lab2_wavs_and_transcripts and its subdirectories until you find a file called oovs_found.txt. Open it in a text editor to see what words are missing. Open slaap_dict in a text editor, and paste the words from oovs_found.txt at the bottom, and add their transcriptions. Refer to other words in the dictionary if you aren’t sure what phones to use. Save this as slaap_dict2 (your own expanded dictionary).

Run the aligner again using slaap_dict2. It should not complain about missing words this time:

C:\Users\jimielke\Documents\mfa\bin\mfa_align lab2_wavs_and_transcripts slaap_dict2 english lab2_corpus

Depending on your computer’s speed, it could finish the alignment in a few minutes, or it could take longer. After it completes, there should be an output directory called lab2_corpus, containing the aligned textgrids for the Lab 2 corpus. Copy or move the seven wav files and the one csv file from lab2_wavs_and_transcripts into lab2_corpus.

Finally, open lab2_files.csv in Excel or another spreadsheet program, and change the file paths in the wav and textgrid columns to match their actual locations on your computer. For example, the first wav file no longer has the path /phon/ENG536/files/lab2_corpus/daulton_lab2.wav (where it was on the lab computer). Its path is now something like C:\Users\jimielke\Documents\lab2_corpus\daulton_lab2.wav.

Diarization and transcription on your own computer

To use diarization and transcription tools on another computer you will need to get vosk, pyannote, and the tools they depend on.

To use transcribe_wav.py, you will need to install python and vosk, and download transcribe_wav.py and alignbrary3.py and put them in the same folder. Download the language model vosk-model-en-us-0.22 from the vosk models page.

Install SoX (Sound eXchange), which is used for preprocessing sound files.

When running the script, you will need to tell it where you put the model, something like this:

python transcribe_wav.py --input yoursoundfile.wav --model path_to_model_on_your_computer

You can modify line 28 of your local copy of transcribe_wav.py to make your model path the default.
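If you have a whole folder of sound files, you could build one transcribe_wav.py command per file, as in the Python sketch below. It uses only the --input and --model options shown above. The function name build_transcribe_commands is made up for this example, and the sketch assumes transcribe_wav.py is in the directory you run it from.

```python
import sys
from pathlib import Path


def build_transcribe_commands(wav_dir, model_path, script="transcribe_wav.py"):
    """Return one command (as a list of arguments) per .wav file in wav_dir."""
    wavs = sorted(Path(wav_dir).glob("*.wav"))
    return [
        [sys.executable, script, "--input", str(wav), "--model", str(model_path)]
        for wav in wavs
    ]


# Example use: run the commands one after another, stopping if one fails.
# import subprocess
# for cmd in build_transcribe_commands("my_wavs", "path_to_model_on_your_computer"):
#     subprocess.run(cmd, check=True)
```

Passing each command as a list means file names with spaces do not need extra quoting.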

To diarize and transcribe on your own computer, install vosk like this (you only need to do this once on each computer where you want to use vosk for transcription):

pip install vosk

Install SoX as above, and install anaconda. Then follow the instructions above to install pyannote in anaconda. Download dt.py and alignbrary3.py and put them in the same folder. The pretrained models that pyannote uses are hosted on Hugging Face. As of late 2022, it is necessary to register on their website in order to use a model required by pyannote. The first time you try using pyannote on another computer without registering, you will probably get a cryptic error message. You can search for that error message on the internet and get information about registration.

When running the diarization and transcription script you will need to specify the path to the language model (or edit your copy of the script to make it the default):

python dt.py --input yoursoundfile.wav --model path_to_model_on_your_computer

one_script on your own computer

If you use Linux, running one_script on your computer should be the same as running it on a phonetics lab computer. If you use Mac OS or Windows, it will be a little different.

Measure formants in the Lab 2 corpus with Praat and one_script in Mac OS

We should be able to run one_script in the Terminal in Mac OS just like we do in Linux. This did not work on the computer I tested these commands on, but this probably varies depending on things like how Praat was installed. If this doesn’t work, you can always just open Praat and run one_script like a regular Praat script.

Try the following to see if it will run in the Terminal:

/Applications/Praat.app/Contents/MacOS/Praat '/Users/rmdodswo/Documents/one_script/one_script.praat' '/Users/rmdodswo/Documents/lab2_corpus/lab2_files.csv' 'VOWEL VL' 'formants()' 'l'
/Users/rmdodswo/Desktop/Praat '/Users/rmdodswo/Documents/one_script/one_script.praat' '/Users/rmdodswo/Documents/lab2_corpus/lab2_files.csv' 'VOWEL VL' 'formants()' 'l'
Praat '/Users/rmdodswo/Documents/one_script/one_script.praat' '/Users/rmdodswo/Documents/lab2_corpus/lab2_files.csv' 'VOWEL VL' 'formants()' 'l'

For more information you can visit https://www.fon.hum.uva.nl/praat/manual/Scripting_6_9__Calling_from_the_command_line.html

If one of these worked, note which one. The one_script_out file should appear in your current working directory (shown in the command prompt).
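The trial and error above can also be sketched in Python: look for the first Praat location that actually exists, then build the same list of arguments. The function names find_praat and one_script_command are made up for this example, and the candidate locations are just the ones tried above. Whether Praat then runs the script from the command line still depends on your Praat installation.

```python
import shutil


def find_praat(candidates):
    """Return the first candidate that is an executable file or a command on PATH."""
    for candidate in candidates:
        found = shutil.which(candidate)
        if found:
            return found
    return None


def one_script_command(praat, script, file_list,
                       phon_string="VOWEL VL", operations="formants()", options="l"):
    """Build the argument list in the same order as the commands above."""
    return [praat, script, file_list, phon_string, operations, options]


# Example use (paths from the commands above; adjust for your computer):
# import subprocess
# praat = find_praat(["/Applications/Praat.app/Contents/MacOS/Praat",
#                     "/Users/rmdodswo/Desktop/Praat", "Praat"])
# if praat:
#     subprocess.run(one_script_command(
#         praat,
#         "/Users/rmdodswo/Documents/one_script/one_script.praat",
#         "/Users/rmdodswo/Documents/lab2_corpus/lab2_files.csv"), check=True)
```

Because the arguments are passed as a list, they do not need the surrounding quotes that the Terminal requires.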

If you aren’t able to run one_script in the terminal, run it from within the Praat GUI. Run Praat, then open one_script.praat as a Praat script and run it. Fill out the “Run script: Choices” form that pops up like this (without surrounding single or double quotes):

file list: /Users/rmdodswo/Documents/lab2_corpus/lab2_files.csv
phon string: VOWEL VL
operations: formants()
options: l

It might run more slowly than at the command line, but otherwise the result should be the same.

Measure formants in the Lab 2 corpus with Praat and one_script in Windows

For Windows, we apparently need to copy lab2_files.csv into the one_script directory to help the script find it. This may not be absolutely necessary but I haven’t figured out how to get the script to open it somewhere else. Maybe you will figure it out.

Running one_script is like on the lab computers, but the executable is called Praat.exe instead of praat, and the paths look different, and you need to put double quotes around the one_script arguments (instead of the single quotes you used in Linux):

C:\Users\jimielke\Desktop\Praat.exe C:\Users\jimielke\Documents\one_script\one_script.praat "lab2_files.csv" "VOWEL VL" "formants()" "l"

The one_script_out file should appear in your current working directory (shown in the command prompt).

If you aren’t able to run one_script in the terminal, run it from within the Praat GUI. Run Praat, then open one_script.praat as a Praat script and run it. Fill out the “Run script: Choices” form that pops up like this (without surrounding single or double quotes):

file list: lab2_files.csv
phon string: VOWEL VL
operations: formants()
options: l

It might run more slowly than at the command line, but otherwise the result should be the same.

Running one_script interactively

Another way to run one_script (in any operating system) is to run Praat normally and open one_script.praat as a Praat script and run it. You can enter the usual arguments in the form that appears, and then you can see the objects appear and disappear from the object list. It will probably be slower than running at the command line, because it has to draw all the windows, but this can help you find problems that are hard to find at the command line. If you open a wav file and textgrid and select them in the object list, and leave the file list argument unchanged, one_script will process the files you selected and pause at each token to let you see what it is measuring.