LIUM_SpkDiarization

LIUM_SpkDiarization is a software package dedicated to speaker diarization (i.e. speaker segmentation and clustering). It is similar to mClust, but it is written in Java and integrates the most recent developments in the domain.

LIUM_SpkDiarization comprises a full set of tools to build a complete speaker diarization system, going from the audio signal to speaker clustering based on the NCLR metric. These tools cover MFCC computation, speech / non-speech detection, and speaker diarization.
This toolkit was developed for the French ESTER2 evaluation campaign, where it obtained the best result for the task of speaker diarization of broadcast news.

License

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

Source and Documentation

LIUM_SpkDiarization is written in Java and uses Java 1.6 features. It requires a Sun JRE and does not work with the GNU Compiler for Java (gcj).

In order to compile LIUM_SpkDiarization, the following 3rd party packages are needed:

  • gnu-getopt package to manage command lines with long options.
  • lapack package to manage matrices (blas, f2util and lapack).
  • Sphinx4 and jsapi packages to compute MFCCs.

For convenience, those packages are included in the precompiled version available on this site.

No documentation is provided other than this page, because of lack of time, sorry. Some javadoc comments are available in the source code, though.

LIUM segmentation format

The format for segmentation files is close to the MDTM or STM NIST format. Each line corresponds to a segment.
Example: "19981217_0700_0800_inter_fm_dga 1 1 317 U U U spk0"

  • field 1: "19981217_0700_0800_inter_fm_dga" = the show name
  • field 2: "1" the channel number
  • field 3: "1" the start of the segment (in features)
  • field 4: "317" the length of the segment (in features)
  • field 5: "U" the speaker gender
  • field 6: "U" the type of band (telephone, studio)
  • field 7: "U" the type of environment (music, speech only, ...)
  • field 8: "spk0" the speaker label
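
As an illustration of the field layout above, the following sketch prints the speaker label of each segment with its start and end times in seconds. It assumes 100 features per second (a 10 ms frame shift); this rate is not stated on this page, so pass a different value if your features differ. The function name seg_to_seconds is hypothetical and not part of the toolkit.

```shell
#!/bin/bash

# Hypothetical helper: read LIUM segmentation lines on stdin and print
# "label start_seconds end_seconds" for each segment.
# Assumption: 100 features per second unless another rate is passed as $1.
seg_to_seconds() {
    local rate="${1:-100}"
    awk -v rate="$rate" '
        # Only lines with the 8 fields described above are processed;
        # anything else is silently skipped.
        NF == 8 {
            start = $3 / rate            # field 3: start, in features
            end = ($3 + $4) / rate       # field 4: length, in features
            printf "%s %.2f %.2f\n", $8, start, end
        }
    '
}
```

For instance, `echo "19981217_0700_0800_inter_fm_dga 1 1 317 U U U spk0" | seg_to_seconds` prints `spk0 0.01 3.18` under the 100 features per second assumption.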

Quick Start

Suppose we need to compute the diarization "./showName.seg" of the audio file "./showName.wav". The command line to accomplish this would be:

java -Xmx2048m -jar ./LIUM_SpkDiarization.v2.1.jar \
--fInputMask=./showName.wav --sOutputMask=./showName.seg --doCEClustering showName

corresponding to:

  • java the name of the java virtual machine.
  • option -Xmx2048m sets the memory of the JVM to 2048MB, which is appropriate to treat a one-hour show.
  • option -jar ./LIUM_SpkDiarization.v2.1.jar specifies the jar to use.
  • option --fInputMask=./showName.wav is the name of the audio file. It can be in Sphere format or Wave format; the type is detected automatically from the file extension.
  • option --sOutputMask=./showName.seg is the output file containing the segmentation.
  • if the option --doCEClustering is set, the program computes the NCLR/CE clustering at the end. The diarization error rate is minimized. If this option is not set, the program stops right after the detection of the gender and the resulting segmentation is sufficient for a transcription system.
  • showName is the name of the show.

The other possible options are:

  • --trace to display information during processing.
  • --help to display a brief usage guide of the tools.
  • --system=current selects the diarization system (currently unused).
  • --saveAllStep saves every step of the diarization.
  • --loadInputSegmentation loads the initial segmentation (UEM) from the file specified by the option --sInputMask. By default, the initial segmentation is composed of one segment ranging from the start to the end of the show.
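
To process several recordings with the Quick Start command, a small wrapper can derive the show name from each file name. The sketch below only prints the command line it would run (a dry run), using the options listed on this page. The function name diarize_cmd and the JAR variable are hypothetical, and the jar path is assumed to be the one used in the Quick Start.

```shell
#!/bin/bash

# Assumption: the precompiled jar sits in the current directory,
# under the same name as in the Quick Start.
JAR=./LIUM_SpkDiarization.v2.1.jar

# Hypothetical helper: print the Quick Start command line for one audio file.
# The show name is the file name without its directory and extension,
# and the segmentation is written next to the audio file.
diarize_cmd() {
    local audio="$1"
    local base show dir
    base="$(basename "$audio")"
    show="${base%.*}"
    dir="$(dirname "$audio")"
    echo "java -Xmx2048m -jar $JAR --fInputMask=$audio --sOutputMask=$dir/$show.seg --doCEClustering $show"
}
```

A loop such as `for f in ./*.wav; do diarize_cmd "$f"; done` lists one command per recording; replacing the echo with a direct call to java would run them instead.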

Advanced Use

Suppose we want to compute the diarization "./showName.seg" for the audio file "./showName.sph".

First we need to compute the MFCC. Any tool can be used for this, as long as the resulting file is in Sphinx format. An example is given below using Sphinx 3 (which is easier to use for this task than Sphinx 4). The command line is:

feat_sphinx.sh ./showName.sph ./showName.mfcc ./showName.uem.seg

The script feat_sphinx.sh consists of:

#!/bin/bash

sph=$1                                                                                                                          
mfcc=$2
uem=$3

show=`basename $sph .sph`

echo "sphinx: $sph --> ($mfcc, $uem)"

# sphinxBase
sphinx_fe -nist yes -i $sph -o $mfcc 2> /dev/null

#or with the old version
#wave2feat -nist -i $sph -o $mfcc 2> /dev/null

#get the header in a temporary file
sphinx_cepview -d 0 -e 1 -header 1 -f $mfcc 2> tmp_$$.txt

#get the number of computed MFCC vectors
nbf=`cat tmp_$$.txt | grep frames | awk '{print $4;}'`

#make a uem composed of one segment starting at feature 0 with $nbf features                                      
echo "$show 1 0 $nbf U U U 1" > $uem

#remove the temporary file
rm -f tmp_$$.txt

Now we have the MFCC and initial segmentation files: ./showName.mfcc and ./showName.uem.seg.
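
Before starting the diarization, it can be worth checking that the initial segmentation is well formed. The following sketch, under the hypothetical name check_seg, verifies that every non-empty line has the eight fields described in the segmentation format section and that the start and length are non-negative integers. It is only a sanity check under those assumptions, not a tool shipped with LIUM_SpkDiarization.

```shell
#!/bin/bash

# Hypothetical helper: return 0 if every non-empty line of the given
# segmentation file has 8 fields with an integer start (field 3) and an
# integer length (field 4). Otherwise report the first bad line on stderr
# and return 1.
check_seg() {
    awk '
        NF == 0 { next }    # ignore blank lines
        NF != 8 || $3 !~ /^[0-9]+$/ || $4 !~ /^[0-9]+$/ {
            printf "line %d is malformed: %s\n", NR, $0 > "/dev/stderr"
            bad = 1
            exit            # jumps to the END rule
        }
        END { exit bad }    # exit status 0 when no bad line was seen
    ' "$1"
}
```

It could be called as `check_seg ./showName.uem.seg && ./diarization.sh ./showName.mfcc ./showName.uem.seg`, so that the diarization only starts when the check passes.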

Next, we compute the diarization of ./showName.mfcc. The command line is:

./diarization.sh  ./showName.mfcc ./showName.uem.seg

The script ./diarization.sh consists of:

#!/bin/bash

PATH=$PATH:..:.

#the MFCC file
features="$1"

#the MFCC corresponds to sphinx 12 MFCC + Energy
# sphinx=the mfcc was computed by the sphinx tools
# 1: static coefficients are present in the file
# 1: energy coefficient is present in the file
# 0: delta coefficients are not present in the file
# 0: delta energy coefficient is not present in the file
# 0: delta delta coefficients are not present in the file
# 0: delta delta energy coefficient is not present in the file
# 13: total size of a feature vector in the mfcc file
# 0:0:0: no feature normalization  
fDesc="sphinx,1:1:0:0:0:0,13,0:0:0"

#this variable is used in CLR/NCLR clustering and gender detection
#the MFCC corresponds to sphinx 12 MFCC + E
# sphinx=the mfcc is computed by sphinx tools
# 1: static coefficients are present in the file
# 3: energy coefficient is present in the file but will not be used
# 2: delta coefficients are not present in the file and will be computed on the fly
# 0: delta energy coefficient is not present in the file
# 0: delta delta coefficients are not present in the file
# 0: delta delta energy coefficient is not present in the file
# 13: size of a feature vector in the mfcc file
# 1:1:300:4: the MFCC are warped (feature warping using a sliding window of 300 features),
#                   next the features are centered and reduced: mean and variance are computed by segment
fDescCLR="sphinx,1:3:2:0:0:0,13,1:1:300:4"

#extract the name of the show  
show=`basename $features .mfcc`

#get the initial segmentation file
uem="$2"

#set the java virtual machine program
java=/usr/bin/java

#define the directory where the results will be saved
datadir=${show}

#define where the UBM GMM is
ubm=./model/ubm.gmm


#define where the speech / non-speech set of GMMs is
pmsgmm=./model/pms.gmms

#define where the silence set of GMMs is
sgmm=./model/s.gmms

#define where the gender and bandwidth set of GMMs (4 models) is
#(female studio, male studio, female telephone, male telephone) 
ggmm=./model/gender.gmm


echo "#####################################################"
echo "#   $show"
echo "#####################################################"

# Create the working directory
mkdir ./$datadir >& /dev/null

# Check the validity of the MFCC
$java -Xmx1024m -jar ./LIUM_SpkDiarization.jar fr.lium.sphinx_clust.programs.MSegInit --trace --help \
 --fInMask=$features --fDesc=$fDesc --sInMask=$uem --sOutMask=./$datadir/%s.i.seg $show

# Speech / non-speech segmentation using a set of GMMs
iseg=./$datadir/$show.i.seg
pmsseg=./$datadir/$show.pms.seg
$java -Xmx1024m -classpath "$LOCALCLASSPATH" fr.lium.sphinx_clust.programs.MDecode --trace --help \
 --fDesc=sphinx,1:3:2:0:0:0,13,0:0:0 --fInMask=$features --sInMask=$iseg \
 --sOutMask=$pmsseg --dPenality=10,10,50 --tInMask=$pmsgmm $show

# GLR-based segmentation, make small segments
$java -Xmx1024m -classpath "$LOCALCLASSPATH" fr.lium.sphinx_clust.programs.MSeg --trace --help \
 --kind=FULL --sMethod=GLR --fInMask=$features --fDesc=$fDesc --sInMask=./$datadir/%s.i.seg \
 --sOutMask=./$datadir/%s.s.seg $show

# Linear clustering, fuse consecutive segments of the same speaker from the start to the end
$java -Xmx1024m -classpath "$LOCALCLASSPATH" fr.lium.sphinx_clust.programs.MClust --trace --help \
 --fInMask=$features --fDesc=$fDesc --sInMask=./$datadir/%s.s.seg \
 --sOutMask=./$datadir/%s.l.seg --cMethod=l --cThr=2 $show

# Hierarchical bottom-up BIC clustering
$java -Xmx1024m -classpath "$LOCALCLASSPATH" fr.lium.sphinx_clust.programs.MClust --trace --help \
 --fInMask=$features --fDesc=$fDesc --sInMask=./$datadir/%s.l.seg \
 --sOutMask=./$datadir/%s.h.seg --cMethod=h --cThr=3 $show

# Initialize one speaker GMM with 8 diagonal Gaussian components for each cluster
$java -Xmx1024m -classpath "$LOCALCLASSPATH" fr.lium.sphinx_clust.programs.MTrainInit --help --trace \
 --nbComp=8 --kind=DIAG --fInMask=$features --fDesc=$fDesc --sInMask=./$datadir/%s.h.seg \
 --tOutMask=./$datadir/%s.init.gmms $show

# EM computation for each GMM
$java -Xmx1024m -classpath "$LOCALCLASSPATH" fr.lium.sphinx_clust.programs.MTrainEM --help --trace \
 --nbComp=8 --kind=DIAG --fInMask=$features --fDesc=$fDesc --sInMask=./$datadir/%s.h.seg \
 --tOutMask=./$datadir/%s.gmms --tInMask=./$datadir/%s.init.gmms $show

# Viterbi decoding using the set of GMMs trained by EM
$java -Xmx1024m -classpath "$LOCALCLASSPATH" fr.lium.sphinx_clust.programs.MDecode --trace --help \
 --fInMask=${features} --fDesc=$fDesc --sInMask=./$datadir/%s.h.seg \
 --sOutMask=./$datadir/%s.d.seg --dPenality=250 --tInMask=$datadir/%s.gmms $show

# Adjust segment boundaries near silence sections
adjseg=./$datadir/$show.adj.$h.seg
$java -Xmx1024m -classpath "$LOCALCLASSPATH" fr.lium.sphinx_clust.tools.SAdjSeg --help --trace \
 --fInMask=$features --fDesc=sphinx,1:1:0:0:0:0,13,0:0:0 --sInMask=./$datadir/%s.d.seg \
 --sOutMask=$adjseg $show

# Filter speaker segmentation according to speech / non-speech segmentation
fltseg=./$datadir/$show.flt.seg
$java -Xmx1024m -classpath "$LOCALCLASSPATH" fr.lium.sphinx_clust.tools.SFilter --help --trace \
 --fDesc=sphinx,1:3:2:0:0:0,13,0:0:0 --fInMask=$features --sSegMinLenSpeech=150 --sSegMinLenSil=25 \
 --sFltClusterName=j --sSegPadding=25 --sInFltMask=$pmsseg --sInMask=$adjseg --sOutMask=$fltseg $show

# Split segments longer than 20s (useful for transcription)
splseg=./$datadir/$show.spl.seg
$java -Xmx1024m -classpath "$LOCALCLASSPATH" fr.lium.sphinx_clust.tools.SSplitSeg --help \
 --sInFltMask=$pmsseg --sFltClusterName=iS,iT,j --sInMask=$fltseg \
 --sOutMask=$splseg --fInMask=$features --fDesc=sphinx,1:3:2:0:0:0,13,0:0:0 --tInMask=$sgmm $show

#-------------------------------------------------------------------------------
# Set gender and bandwidth
gseg=./$datadir/$show.g.seg
$java -Xmx1024m -classpath "$LOCALCLASSPATH" fr.lium.sphinx_clust.programs.MScore --help \
 --sGender --sByCluster --fDesc=sphinx,1:3:2:0:0:0,13,1:1:0 --fInMask=$features --sInMask=$splseg \
 --sOutMask=$gseg --tInMask=$ggmm $show

# NCLR clustering
# Features contain static and delta and are centered and reduced (--fdesc)
c=1.7
spkseg=./$datadir/$show.c.seg
$java -Xmx1024m -classpath "$LOCALCLASSPATH" fr.lium.sphinx_clust.programs.MClust --help --trace \
 --fInMask=$features --fDesc=$fDescCLR --sInMask=$gseg \
 --sOutMask=./$datadir/%s.c.$h.seg --cMethod=ce --cThr=$c --tInMask=$ubm \
 --emCtrl=1,5,0.01 --sTop=5,$ubm --tOutMask=./$show/$show.c.gmm $show
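
The --fDesc strings used throughout the script pack four comma-separated parts: the feature type, six colon-separated flags, the vector size in the file, and the normalization settings. The following sketch, under the hypothetical name describe_fdesc, splits such a string and labels each flag using only the meanings given in the script comments (0, 1, 2 and 3). Applying each meaning to every flag position is an assumption, since the comments only show 2 for the delta coefficients and 3 for the energy; any other value is reported as unknown rather than guessed.

```shell
#!/bin/bash

# Hypothetical helper: print a human-readable breakdown of an fDesc string
# such as "sphinx,1:3:2:0:0:0,13,1:1:300:4".
describe_fdesc() {
    local type flags size norm
    # Split the four comma-separated parts.
    IFS=',' read -r type flags size norm <<< "$1"
    echo "type: $type"

    # Flag order as listed in the script comments.
    local names=("static" "energy" "delta" "delta energy" "delta delta" "delta delta energy")
    local values
    IFS=':' read -r -a values <<< "$flags"

    local i meaning
    for i in "${!names[@]}"; do
        case "${values[$i]}" in
            0) meaning="not present in the file" ;;
            1) meaning="present in the file" ;;
            2) meaning="not present in the file, computed on the fly" ;;
            3) meaning="present in the file but not used" ;;
            *) meaning="unknown flag value" ;;
        esac
        echo "${names[$i]}: $meaning"
    done

    echo "vector size in file: $size"
    # The normalization part is printed as-is; it is not interpreted here.
    echo "normalization: $norm"
}
```

Calling `describe_fdesc "sphinx,1:3:2:0:0:0,13,1:1:300:4"` lists one line per part, which makes it easier to compare the descriptors passed to the different steps.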

Other tools

Qingsong Lui, from USTC, has developed a Windows program to view diarization results. The program is available here. A video is also available here.

 
