ocvolume.dsp
Class featureExtraction

java.lang.Object
  |
  +--ocvolume.dsp.featureExtraction

public class featureExtraction
extends java.lang.Object

last updated on June 15, 2002
description: feature extraction class used to extract mel-frequency cepstral coefficients from an input signal
calls: none
called by: volume, train
input: speech signal
output: mel-frequency cepstral coefficients


Field Summary
protected static fft FFT
          Fast Fourier Transformation
protected static int fftSize
          FFT Size (Must be a power of 2)
protected static int frameLength
          Number of samples per frame
protected static double[][] frames
          All the frames of the input signal
protected static double[] hammingWindow
          Hamming window values
protected static double lowerFilterFreq
          lower limit of filter (or 64 Hz?)
protected static int numCepstra
          Number of MFCCs per frame
protected static int numMelFilters
          number of mel filters (SPHINX-III uses 40)
protected static double preEmphasisAlpha
          Pre-Emphasis Alpha (Set to 0 if no pre-emphasis should be performed)
protected static double samplingRate
          sample rate in Hz
protected static int shiftInterval
          Number of overlapping samples (usually 50% of frame length)
protected static double upperFilterFreq
          upper limit of filter (or half of sampling freq.?)
 
Constructor Summary
featureExtraction()
           
 
Method Summary
protected static void framing(double[] inputSignal)
          performs Frame Blocking to break down a speech signal into frames
calls: none
called by: featureExtraction
protected static double freqToMel(double freq)
          convert frequency to mel-frequency
calls: none
called by: featureExtraction
protected static double log10(double value)
          calculates logarithm with base 10
calls: none
called by: featureExtraction
protected static double[] magnitudeSpectrum(double[] frame)
          computes the magnitude spectrum of the input frame
calls: none
called by: featureExtraction
protected static double[] preEmphasis(short[] inputSignal)
          performs pre-emphasis to equalize the amplitudes of high- and low-frequency components
calls: none
called by: featureExtraction
static double[][] process(short[] inputSignal)
          takes a speech signal and returns the Mel-Frequency Cepstral Coefficients (MFCCs)
calls: fft
called by: volume, train
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

samplingRate

protected static final double samplingRate
sample rate in Hz

See Also:
Constant Field Values

frameLength

protected static final int frameLength
Number of samples per frame

See Also:
Constant Field Values

shiftInterval

protected static final int shiftInterval
Number of overlapping samples (usually 50% of frame length)

See Also:
Constant Field Values

numCepstra

protected static final int numCepstra
Number of MFCCs per frame

See Also:
Constant Field Values

fftSize

protected static final int fftSize
FFT Size (Must be a power of 2)

See Also:
Constant Field Values

preEmphasisAlpha

protected static final double preEmphasisAlpha
Pre-Emphasis Alpha (Set to 0 if no pre-emphasis should be performed)

See Also:
Constant Field Values

lowerFilterFreq

protected static final double lowerFilterFreq
lower limit of filter (or 64 Hz?)

See Also:
Constant Field Values

upperFilterFreq

protected static final double upperFilterFreq
upper limit of filter (or half of sampling freq.?)

See Also:
Constant Field Values

numMelFilters

protected static final int numMelFilters
number of mel filters (SPHINX-III uses 40)

See Also:
Constant Field Values

frames

protected static double[][] frames
All the frames of the input signal


hammingWindow

protected static double[] hammingWindow
Hamming window values
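
The page does not say how the window values are computed. As a minimal sketch, assuming the textbook Hamming definition w[n] = 0.54 - 0.46 cos(2πn / (N - 1)) with N equal to the frame length, the table could be built as follows. The class name HammingWindowSketch is hypothetical and is not part of ocvolume.

```java
// Hypothetical helper, not the library's code: builds a Hamming window table.
final class HammingWindowSketch {

    // Returns frameLength coefficients of the symmetric Hamming window.
    static double[] hammingWindow(int frameLength) {
        if (frameLength <= 0) {
            throw new IllegalArgumentException("frameLength must be positive");
        }
        double[] window = new double[frameLength];
        if (frameLength == 1) {
            window[0] = 1.0; // avoid division by zero for a single-sample frame
            return window;
        }
        for (int n = 0; n < frameLength; n++) {
            window[n] = 0.54 - 0.46 * Math.cos(2.0 * Math.PI * n / (frameLength - 1));
        }
        return window;
    }
}
```

Each frame would then be multiplied element by element with this table before the spectrum is taken.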


FFT

protected static fft FFT
Fast Fourier Transformation

Constructor Detail

featureExtraction

public featureExtraction()
Method Detail

process

public static double[][] process(short[] inputSignal)
takes a speech signal and returns the Mel-Frequency Cepstral Coefficients (MFCCs)
calls: fft
called by: volume, train

Parameters:
inputSignal - Speech Waveform (16 bit integer data)
Returns:
Mel Frequency Cepstral Coefficients (64 bit floating point data, one double[] per frame)

log10

protected static double log10(double value)
calculates logarithm with base 10
calls: none
called by: featureExtraction

Parameters:
value - Number to take the log of
Returns:
base 10 logarithm of the input value
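
Math.log10 was only added in Java 5, which may be why a class from 2002 ships its own helper. A minimal sketch of such a helper, assuming it uses the change-of-base identity on the natural logarithm (the class name Log10Sketch is hypothetical):

```java
// Hypothetical helper, not the library's code: base-10 logarithm via change of base.
final class Log10Sketch {

    private static final double LN_10 = Math.log(10.0);

    // log10(x) = ln(x) / ln(10); returns NaN for negative input, -Infinity for 0.
    static double log10(double value) {
        return Math.log(value) / LN_10;
    }
}
```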

freqToMel

protected static double freqToMel(double freq)
convert frequency to mel-frequency
calls: none
called by: featureExtraction

Parameters:
freq - Frequency
Returns:
Mel-Frequency
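
The conversion formula is not given on this page. A minimal sketch, assuming the widely used mapping mel = 2595 · log10(1 + f / 700), is shown below. The class name MelScaleSketch and the inverse method melToFreq are hypothetical additions for illustration. The inverse is the usual way to turn equally spaced mel points between lowerFilterFreq and upperFilterFreq back into filter centre frequencies in Hz.

```java
// Hypothetical helper, not the library's code: Hz <-> mel conversion.
final class MelScaleSketch {

    // Assumed formula: mel = 2595 * log10(1 + f / 700), with f in Hz.
    static double freqToMel(double freqHz) {
        return 2595.0 * Math.log10(1.0 + freqHz / 700.0);
    }

    // Inverse of the assumed formula: f = 700 * (10^(mel / 2595) - 1).
    static double melToFreq(double mel) {
        return 700.0 * (Math.pow(10.0, mel / 2595.0) - 1.0);
    }
}
```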

magnitudeSpectrum

protected static double[] magnitudeSpectrum(double[] frame)
computes the magnitude spectrum of the input frame
calls: none
called by: featureExtraction

Parameters:
frame - Input frame signal
Returns:
Magnitude Spectrum array
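
To make the quantity concrete: the magnitude spectrum is |X[k]| = sqrt(Re(X[k])² + Im(X[k])²) for each frequency bin k. The class itself relies on its fft field. The sketch below swaps in a direct O(N²) discrete Fourier transform so that it stays self-contained, so it illustrates the result rather than the library's method. The class name MagnitudeSpectrumSketch is hypothetical.

```java
// Hypothetical helper, not the library's code: magnitude spectrum via a direct DFT.
final class MagnitudeSpectrumSketch {

    // Returns |X[k]| for k = 0 .. frame.length - 1.
    static double[] magnitudeSpectrum(double[] frame) {
        int n = frame.length;
        double[] magnitude = new double[n];
        for (int k = 0; k < n; k++) {
            double re = 0.0;
            double im = 0.0;
            for (int t = 0; t < n; t++) {
                double angle = -2.0 * Math.PI * k * t / n;
                re += frame[t] * Math.cos(angle);
                im += frame[t] * Math.sin(angle);
            }
            magnitude[k] = Math.sqrt(re * re + im * im);
        }
        return magnitude;
    }
}
```

For real-valued input the spectrum is symmetric, so only the first half of the bins carries independent information.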

framing

protected static void framing(double[] inputSignal)
performs Frame Blocking to break down a speech signal into frames
calls: none
called by: featureExtraction

Parameters:
inputSignal - Speech signal (double-precision samples, not 16 bit integers; the parameter type is double[])
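
Frame blocking can be sketched as below. This is a minimal illustration under stated assumptions, not the library's code: shiftInterval is treated as the hop between frame starts (with the usual 50% setting, hop and overlap are the same number), trailing samples that do not fill a whole frame are dropped, and the frames are returned rather than stored in the static frames field. The class name FramingSketch is hypothetical.

```java
// Hypothetical helper, not the library's code: splits a signal into overlapping frames.
final class FramingSketch {

    // Frame i covers samples [i * shiftInterval, i * shiftInterval + frameLength).
    static double[][] framing(double[] signal, int frameLength, int shiftInterval) {
        if (frameLength <= 0 || shiftInterval <= 0) {
            throw new IllegalArgumentException("frameLength and shiftInterval must be positive");
        }
        if (signal.length < frameLength) {
            return new double[0][frameLength]; // not enough samples for one frame
        }
        int numFrames = (signal.length - frameLength) / shiftInterval + 1;
        double[][] frames = new double[numFrames][frameLength];
        for (int i = 0; i < numFrames; i++) {
            System.arraycopy(signal, i * shiftInterval, frames[i], 0, frameLength);
        }
        return frames;
    }
}
```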

preEmphasis

protected static double[] preEmphasis(short[] inputSignal)
performs pre-emphasis to equalize the amplitudes of high- and low-frequency components
calls: none
called by: featureExtraction

Parameters:
inputSignal - Speech Signal (16 bit integer data)
Returns:
Speech signal after pre-emphasis (double-precision samples; the return type is double[])
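
Pre-emphasis is commonly written as the first-order filter y[n] = x[n] - α · x[n-1]. The sketch below assumes that form and passes the first sample through unchanged. It takes α as a parameter instead of reading the preEmphasisAlpha field, and the class name PreEmphasisSketch is hypothetical. With α = 0 the output equals the input, which matches the field description ("Set to 0 if no pre-emphasis should be performed").

```java
// Hypothetical helper, not the library's code: first-order pre-emphasis filter.
final class PreEmphasisSketch {

    // y[0] = x[0]; y[n] = x[n] - alpha * x[n - 1] for n >= 1.
    static double[] preEmphasis(short[] inputSignal, double alpha) {
        double[] output = new double[inputSignal.length];
        if (inputSignal.length == 0) {
            return output;
        }
        output[0] = inputSignal[0];
        for (int n = 1; n < inputSignal.length; n++) {
            output[n] = inputSignal[n] - alpha * inputSignal[n - 1];
        }
        return output;
    }
}
```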