Monday, June 3, 2019
Speech Enhancement And De-Noising By Wavelet Thresholding And Transform II Computer Science Essay
In this project the experimenter will try to design and implement techniques to de-noise a noisy audio signal using the MATLAB software and its functions. A literature review will be carried out and summarized to give details of the contribution to the area of study. Different techniques that have been used in audio and speech processing will be analyzed and studied. The implementation will be done using MATLAB version 7.0.
Introduction
The Fourier analysis of a signal can be used as a very powerful tool: it can obtain the frequency component and the amplitude component of signals. Fourier analysis can be used to analyze stationary signals, that is, signals that repeat and are composed of sine and cosine components. For non-stationary signals, which have no repetition in the region that is sampled, the Fourier transform is not very efficient. The wavelet transform, on the other hand, allows these signals to be analyzed. The basic concept behind wavelets is that a signal can be analyzed by splitting it into different components, which are then studied individually in terms of their frequency and time. In Fourier analysis the signal is analyzed in terms of its sine and cosine components, but the wavelet approach is different: the wavelet algorithm processes and analyzes the data at different scales and resolutions. In wavelet analysis, one wavelet, referred to as the mother wavelet, is used as the main wavelet type for analysis; analysis in time is then performed with a version of the mother wavelet that is of higher frequency.
Frequency analysis of the signal is done with a dilated form of the mother wavelet, and further analysis can then be made on the wavelet coefficients obtained by this process. Haar wavelets are very compact, and this is one of their defining features: outside a finite interval they vanish. The Haar wavelets have a major limiting factor, however: they are not continuously differentiable. In the analysis of a given signal, the time domain component can be used to analyze the frequency component of that signal. This concept is the Fourier transform, where a signal is translated from a time domain function to the frequency domain so that its frequency content can be analyzed. This is possible because Fourier analysis expresses the signal in terms of the sines and cosines of each frequency. When the Fourier transform is applied to a finite set of sampled points, the result is the discrete Fourier transform (DFT); the sample points are taken to be typical of what the original signal looks like. To approximate a function by its samples, and to approximate the Fourier integral by the discrete Fourier transform, a matrix is applied whose order is the total number of sample points, so the problem worsens quickly as the number of samples increases. If the samples are uniformly spaced, it is possible to factor the Fourier matrix into a product of a few matrices, and the resulting factors can be applied to a vector in on the order of m log m operations. The result is known as the Fast Fourier Transform (FFT). Both Fourier transforms mentioned above are linear transforms.
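The matrix view of the discrete Fourier transform described above can be made concrete with a small sketch. It is written in Python rather than the MATLAB used in the project, and the function name `dft` is an illustrative assumption. It applies the full m-by-m Fourier matrix directly, so it costs on the order of m squared operations; it is not the factored FFT.

```python
import cmath


def dft(samples):
    """Discrete Fourier transform as an explicit m-by-m matrix product.

    Entry (k, n) of the Fourier matrix is exp(-2*pi*i*k*n/m), so each
    output bin is the inner product of one matrix row with the samples.
    """
    m = len(samples)
    return [
        sum(samples[n] * cmath.exp(-2j * cmath.pi * k * n / m) for n in range(m))
        for k in range(m)
    ]


# A unit impulse has a flat spectrum; a constant signal has only a zero-frequency bin.
impulse_spectrum = dft([1.0, 0.0, 0.0, 0.0])
constant_spectrum = dft([1.0, 1.0, 1.0, 1.0])
```

Doubling the number of samples roughly quadruples the work here, which is the growth that the factorization behind the FFT avoids.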
For both the FFT and the DWT, the inverse transform matrix is the transpose of the original. In the Fourier transform the domain of analysis consists of sines and cosines, but for the wavelet transform the domain consists of more complex basis functions called wavelets, or mother wavelets. The basis functions are localized in the frequency domain, as can be seen in the power spectra; this proves useful in finding the frequency and power distribution. Wavelet functions are also localized in space, whereas the Fourier sine and cosine functions are not. This feature makes wavelets a useful candidate for the purpose of this research: it makes operations using wavelet transforms sparse, which is useful for noise removal. A major advantage of using wavelets is that the windows vary. To isolate portions of a signal that are not continuous, having short wavelet functions is good practice, but to obtain more in-depth frequency analysis, longer functions are best. A practice that is used is to have basis functions that are short and of high frequency together with basis functions that are long and of low frequency (A. Graps, 1995-2004). A point to note is that, unlike Fourier analysis, which has a limited set of basis functions (sine and cosine), wavelets have an unlimited set of basis functions. This is a very important feature, as it allows wavelets to identify information in a signal that can be hidden by other time-frequency methods, namely
Fourier analysis.
Wavelets consist of different families. Within each family of wavelets there exist different subclasses that are differentiated by their number of coefficients and their level of iteration. Wavelets are mostly classified by their number of vanishing moments, which is related to the number of coefficients by a mathematical relationship.
Figure above showing examples of wavelets (N. Rao 2001)
One of the most helpful and defining features of using wavelets is that the experimenter has control over the wavelet coefficients for a wavelet type. Families of wavelets were developed that proved to be very efficient in the representation of polynomial behavior; the simplest of these is the Haar wavelet. The coefficients can be thought of as filters: they are placed in a transformation matrix and applied to a raw data vector. The coefficients are ordered in two patterns, one that works as a smoothing filter and another whose function is to bring out the detail information of the data (D. Aerts and I. Daubechies 1979). The coefficient matrix for the wavelet analysis is then applied in a hierarchical algorithm. Based on its arrangement, the odd rows contain the coefficients that act as smoothing filters, and the even rows contain the wavelet coefficients that capture the details. The matrix is first applied to the full-length data vector, which is then smoothed and decimated by half. The step is then repeated with the matrix, so that more smoothing takes place and the number of coefficients is halved again. This process is repeated several times until only a small amount of smoothed data remains. What this process actually does is bring out the highest resolutions from the data source while also smoothing the data.
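The hierarchical smooth-and-halve procedure described above can be sketched with the Haar wavelet, the simplest family mentioned. This is a minimal Python illustration, not the project's MATLAB code; the function names and the power-of-two length restriction are assumptions made to keep the example short.

```python
import math


def haar_step(data):
    """One pass: pairwise scaled sums (smooth) and differences (detail)."""
    s = 1.0 / math.sqrt(2.0)
    smooth = [(data[i] + data[i + 1]) * s for i in range(0, len(data), 2)]
    detail = [(data[i] - data[i + 1]) * s for i in range(0, len(data), 2)]
    return smooth, detail


def haar_pyramid(data):
    """Repeat the pass on the smooth half until a single value remains."""
    if len(data) == 0 or len(data) & (len(data) - 1):
        raise ValueError("length must be a power of two")
    smooth = list(data)
    details = []  # details[0] is the finest level
    while len(smooth) > 1:
        smooth, detail = haar_step(smooth)
        details.append(detail)
    return smooth, details


samples = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
smooth, details = haar_pyramid(samples)
```

Each pass halves the amount of smoothed data, so eight samples give detail vectors of length 4, 2 and 1 plus one final smooth value.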
In the removal of noise from data, wavelet applications have proved very efficient and successful, as can be seen in the work done by David Donoho. The process of noise removal is called wavelet shrinkage and thresholding. When data is decomposed using wavelets, some filters act as averaging filters while the others produce details. Some of the coefficients relate to details of the data set, and if a given detail is small it can be removed from the data set without affecting any major feature of the data. The basic idea of thresholding is setting coefficients that are at or below a particular threshold to zero; the coefficients are then used in an inverse wavelet transform to reconstruct the data set (S. Cai and K. Li, 2010).
Literature Review
The work done by the student Nikhil Rao (2001) was reviewed. In that work a completely new algorithm was developed that focused on the compression of speech signals, based on techniques for discrete wavelet transforms. MATLAB version 6 was used to simulate and implement the code. The steps that were taken to achieve the compression are listed below:
Choose wavelet function
Select decomposition level
Input speech signal
Divide speech signal into frames
Decompose each frame
Calculate thresholds
Truncate coefficients
Encode zero-valued coefficients
Quantize and bit encode
Transmit data frame
Parts of the extract above are taken from the said work by Nikhil Rao (2001). In the experiment that was conducted, the Haar and Daubechies wavelets were used in the speech coding and synthesis. The MATLAB functions used in computing the wavelet transforms were dwt, wavedec, waverec, and idwt (Nikhil Rao, 2001). The wavedec function performs the task of signal decomposition, and the waverec function reconstructs the signal from its coefficients.
The idwt function performs the inverse transform on the signal of interest, and all these functions can be found in the MATLAB software. The speech file that was analyzed was split into frames of 20 ms, which is 160 samples per frame, and each frame was then decomposed and compressed. The file format used was .OD files; because of the length of the files, they could be decomposed without being divided into frames. Both global and by-level thresholding were used in the experiment. The main aim of global thresholding is to keep the largest coefficients, independently of the size of the decomposition tree for the wavelet transform. With by-level thresholding, the approximation coefficients are kept at the decomposition level. During the process two bytes are used to encode the zero values: the first byte specifies the starting point of a run of zeros and the other byte tracks the number of successive zeros.
The work done by Qiang Fu and Eric A. Wan (2003) was also reviewed; their work was the enhancement of speech based on a wavelet de-noising framework. In their approach, the noisy speech signal was first processed using a spectral subtraction method, the aim of which is to remove noise from the signal under study before the wavelet transform is applied. The traditional approach was then followed, where the wavelet transform is used to decompose the speech into different levels and threshold estimation is then carried out on the different levels; however, in their project a modified version of the Ephraim/Malah suppression rule was used for the threshold estimates. Finally, to enhance the speech signal, the inverse wavelet transform was applied.
It was shown that the pre-processing of the speech signal removed small levels of noise while at the same time minimizing the distortion of the original speech signal. A generalized spectral subtraction algorithm, proposed by Bai and Wan, was used to accomplish this task.
The wavelet transform for this approach used wavelet packet decomposition. A six-stage tree structure decomposition was performed using a 16-tap FIR filter derived from the Daubechies wavelet; for a speech signal sampled at 8 kHz, the decomposition resulted in 18 levels. The estimation method used to calculate the threshold levels was of a new type: it took into account the noise deviation for the different levels and for each time frame. An altered version of the Ephraim/Malah suppression rule was used to achieve soft thresholding. The re-synthesis of the signal was done using the inverse perceptual wavelet transform, and this is the very last stage.
The work done by S. Manikandan (2006) focused on the reduction of noise present in a received wireless signal using special adaptive techniques. The signal of interest in the study was corrupted by white noise. A time-frequency dependent threshold approach was taken to estimate the threshold level, and both hard and soft thresholding techniques were used in the de-noising process. With hard thresholding, coefficients below a certain value are set to zero. In the project a universal threshold was used for the added Gaussian noise, with a mean squared error criterion. Based on the experiments that were done, it was found that this approximation is not very efficient when it comes to speech, mainly because of the poor relation between the quality and the existence of correlated noise.
A new thresholding technique was implemented in which the standard deviation of the noise was first estimated for the different levels and time frames. The threshold is calculated for the signal and also for each sub-band and its related time frame. Soft thresholding was also implemented, with a modified Ephraim/Malah suppression rule, as seen before in the other works in this area. Based on the results obtained, there was an unnatural voice pattern, and to overcome this a new technique based on a modification of the Ephraim and Malah rule was implemented.
Procedure
The procedure that was undertaken involved the following steps:
Several voice recordings were made and the file was read using the wavread function, because the file was stored in .wav format.
The length to be analyzed was decided; for my project the entire length of the signal was analyzed.
The uncorrupted signal power and signal to noise ratio (SNR) were calculated using different MATLAB functions.
Additive White Gaussian Noise (AWGN) was then added to the original recording, making the uncorrupted signal now corrupted.
The average power of the signal corrupted by noise and its signal to noise ratio (SNR) were then calculated.
Signal analysis then followed. The procedure involved in the signal analysis included:
The wavedec function in MATLAB was used in the decomposition of the signal.
The detail coefficients and approximation coefficients were then extracted and plots made to show the different levels of decomposition.
The different levels of coefficients were then analyzed and compared, with a detailed analysis of what the decomposition resulted in.
After decomposition of the different levels, de-noising took place; this was done with the ddencmp function in MATLAB, which supplies default de-noising parameters.
The actual de-noising process was then undertaken using the wdencmp function in MATLAB, and a plot comparison was made of the noise-corrupted signal and the de-noised signal.
The average power and SNR of the de-noised signal were calculated and
a comparison was made between it and the original signal.
Implementation/Discussion
The first part of the project consisted of making a recording in MATLAB. A recording was made of my own voice using the default sample rate, Fs = 11025. Code was used to make the recordings in MATLAB, and different variables were altered and specified in that code; the m file that is submitted with this project gives all the code that was used for the project. The recordings lasted 9 seconds. The wavplay function was then used to replay each recording until a desired recording was obtained, after which the wavwrite function was used to store the recorded data in a wav file. The data written to the wav file was originally stored in the variable y and then given the name recording1. A plot was then made to show the waveform of the recorded speech file.
Fig 1
Fig 1: plot showing the original recording without any noise corruption
According to Fig 1, the maximum amplitude of the signal is +0.5 and the minimum amplitude is -0.3. From observation with the naked eye it can be seen that most of the information in the speech signal is confined between the amplitudes +0.15 and -0.15.
The power of the speech signal was then calculated in MATLAB using a periodogram spectrum. This produces an estimate of the spectral density of the signal and is computed from the finite-length digital sequence using the Fast Fourier Transform (The MathWorks 1984-2010). The window parameter that was used was the Hamming window; a window function is a function that is zero outside some chosen interval.
The Hamming window is a typical window function and is usually applied by a point-by-point multiplication to the input of the fast Fourier transform. This controls the adjacent levels of spectral artifacts that would appear in the magnitude of the fast Fourier transform results in cases where the input frequencies do not correspond with a bin center. Windowing can be considered as a convolution in the frequency domain, which is the same as performing a multiplication in the time domain. The result of this multiplication is that samples outside a given frequency will affect the overall amplitude of that frequency.
Fig 2
Fig 2: plot showing periodogram spectral analysis of the original recording
From the spectral analysis it was calculated that the power of the signal is 0.0011 watt.
After the signal was analyzed, noise was added to it. The noise that was added was additive white Gaussian noise (AWGN), a random signal that has a flat power spectral density (Wikipedia, 2010). At a given center frequency, additive white noise contains equal power within a fixed bandwidth; the term white means that the frequency spectrum is continuous and uniform over the entire frequency band.
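The windowed periodogram used for the power estimate above (a Hamming window multiplied point by point into the samples, followed by a Fourier transform) can be sketched as follows. This is a Python illustration under stated assumptions, not the MATLAB periodogram function: the normalization by the window energy and the direct, slow transform are choices made here for brevity, and the test tone is invented for the example.

```python
import cmath
import math


def hamming(n_points):
    """Symmetric Hamming window: 0.54 - 0.46*cos(2*pi*n/(N-1))."""
    if n_points == 1:
        return [1.0]
    return [
        0.54 - 0.46 * math.cos(2.0 * math.pi * n / (n_points - 1))
        for n in range(n_points)
    ]


def periodogram(samples):
    """Power per frequency bin of the Hamming-windowed samples."""
    m = len(samples)
    window = hamming(m)
    windowed = [x * w for x, w in zip(samples, window)]
    window_energy = sum(w * w for w in window)  # compensates for the taper
    power = []
    for k in range(m):
        acc = sum(
            windowed[n] * cmath.exp(-2j * cmath.pi * k * n / m) for n in range(m)
        )
        power.append(abs(acc) ** 2 / window_energy)
    return power


# Hypothetical test tone: 8 cycles across 64 samples, so the peak falls in bin 8.
tone = [math.sin(2.0 * math.pi * 8 * n / 64) for n in range(64)]
spectrum = periodogram(tone)
```

The window lowers the leakage into distant bins at the cost of widening the main peak slightly, which is the trade-off the paragraph above describes.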
In the project, additive simply means that this impairment is added to the original signal, corrupting the speech. The MATLAB code that was used to add the noise to the recording can be seen in the m file.
For the very first recording the power in the signal was set to 1 watt and the SNR set to 80. The code was applied to signal z, which is a copy of the original recording y. Below is the plot showing the noise-corrupted recording.
Fig 3
Fig 3: plot showing the original recording corrupted by noise
Based on observation of the plot above, it can be estimated that information in the original recording is masked by the white noise added to the signal. This has a negative effect, as the useful information is masked out by the noise, an effect known as masking. Because the amplitude of the additive noise is greater than the amplitude of the recording, it causes distortion; observation of the graph shows that the amplitude of the corrupted signal is greater than that of the original recording. The noise power of the corrupted signal was calculated by dividing the signal power by the signal to noise ratio; the noise power calculated from the first recording is 1.37e-005.
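Adding white Gaussian noise at a chosen signal to noise ratio, as described above, can be sketched in a few lines. This is a Python illustration, not the MATLAB code from the m file. It assumes the SNR is given as a plain power ratio, matching the way the noise power is computed in the text (signal power divided by SNR); the function names, the seed and the test tone are hypothetical.

```python
import math
import random


def average_power(samples):
    """Mean of the squared sample values."""
    return sum(x * x for x in samples) / len(samples)


def add_awgn(samples, snr_ratio, seed=0):
    """Add zero-mean Gaussian noise so signal power / noise power ~ snr_ratio."""
    rng = random.Random(seed)  # fixed seed keeps the example repeatable
    noise_power = average_power(samples) / snr_ratio
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in samples]


# Hypothetical clean signal standing in for the voice recording.
clean = [math.sin(2.0 * math.pi * 5 * n / 1000) for n in range(4000)]
noisy = add_awgn(clean, snr_ratio=80)
```

Because the noise is random, the measured ratio for any one realization is only approximately the requested value.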
With the noise power of the corrupted signal at 1.37e-005, the spectrum periodogram was then used to calculate the average power of the corrupted signal; based on the MATLAB calculations the power was calculated to be 0.0033 watt.
Fig 4
Fig 4: plot showing periodogram spectral analysis of the corrupted signal
From analysis of the plot above it can be seen that the frequency content of the corrupted signal spans a wider band. The spectral analysis of the original recording showed a value of -20Hz, whereas the corrupted signal showed a value of 30Hz. This increase in the corrupted signal is attributed to the added noise, which masked out the original recording, the same masking effect as before.
It was seen that the average power of the corrupted signal was greater than that of the original signal; this increase in power can be attributed to the additive noise added to the signal.
The signal to noise ratio (SNR) of the corrupted signal was calculated from the formula corrupted power/noise power, and the corrupted SNR was found to be 240, as compared to 472.72 for the de-noised signal. The decrease in signal to noise ratio can be attributed to the additive noise, which raised the level of noise relative to the level of the clean recording; the increase in the SNR of the cleaned signal will be discussed further in the discussion.
The signal to noise ratio is used to measure how much a signal is corrupted by noise: the lower this ratio is, the more corrupted the signal will be.
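The figures quoted above can be checked with a short worked calculation. This Python sketch uses only the values stated in the text (0.0011 watt, an SNR setting of 80 treated as a plain ratio, and 0.0033 watt); the decibel conversion at the end is an added convenience, not something the essay reports.

```python
import math

signal_power = 0.0011     # watt, clean recording (periodogram estimate)
snr_setting = 80          # ratio used when the noise was added
corrupted_power = 0.0033  # watt, recording after noise was added

# Noise power = signal power / SNR, as described in the text.
noise_power = signal_power / snr_setting

# SNR of the corrupted signal = corrupted power / noise power.
corrupted_snr = corrupted_power / noise_power

# The same ratio expressed in decibels (10 * log10 of a power ratio).
corrupted_snr_db = 10.0 * math.log10(corrupted_snr)
```

The first division gives 1.375e-5, which matches the quoted 1.37e-005, and the second gives 240, which matches the quoted corrupted SNR.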
The calculation method that was used to calculate this ratio is $\mathrm{SNR} = P_{\mathrm{signal}} / P_{\mathrm{noise}}$, where the signal and noise powers were calculated in MATLAB as seen above.
The analysis of the signal then commenced. A .wav file was created for the corrupted signal using the MATLAB command wavwrite, with Fs being the sample frequency, N being the corrupted file and the name being noise recording. A file x1 that was going to be analysed was then created using the MATLAB command wavread.
Wavelet multilevel decomposition was then performed on the signal x1 using the MATLAB command wavedec. This function performs the wavelet decomposition of the signal; the decomposition is a multilevel one-dimensional decomposition, and the discrete wavelet transform (DWT) is computed with a filtering algorithm. During the decomposition the signal is passed through a high-pass and a low-pass filter. The output of the low-pass filter is further passed through a high-pass and a low-pass filter, and this process continues (The MathWorks 1994-2010) for as many levels as the programmer specifies. The high-pass filter is a linear time-invariant filter that passes high frequencies and attenuates frequencies below a threshold called the cut-off frequency; the rate of attenuation is specified by the designer. The low-pass filter, on the other hand, is the opposite of the high-pass filter: it passes only low-frequency signals and attenuates signals that contain frequencies higher than the cut-off. The decomposition procedure above was carried out 8 times, and at each level of decomposition the signal is down sampled by a factor of 2.
The high-pass output at each stage represents the actual wavelet-transformed data; these are called the detail coefficients (The MathWorks 1994-2010).
Fig 5
Fig 5: levels of decomposition (The MathWorks 1994-2010)
Block C above contains the decomposition vectors and Block L contains the bookkeeping vector. Based on the representation above, a signal x of a specific length is decomposed into coefficients. The first part of the decomposition produces two sets of coefficients, the approximation coefficients cA1 and the detail coefficients cD1: to get the approximation coefficients the signal x is convolved with a low-pass filter, and to get the detail coefficients the signal x is convolved with a high-pass filter. The second stage is similar, only this time the signal that is filtered is cA1 rather than x; it is again passed through the low-pass and high-pass filters to produce approximation and detail coefficients respectively. At each stage the signal is down sampled, and the factor of down sampling is two.
In the first-level decomposition that was done in MATLAB (The MathWorks 1994-2010), the original signal x(t) is decomposed into approximation and detail coefficients. Passing the approximation through the filters again extracts further detail coefficients, giving the details D2(t) + D1(t). This analysis uses a single-stage filter bank; further analysis through the filter bank produces further stages of detail coefficients (The MathWorks 1994-2010).
The coefficients cAm(k) and cDm(k) for m = 1, 2, 3 can be calculated by iterating or cascading the single-stage filter bank to obtain a multiple-stage filter bank (The MathWorks 1994-2010).
Fig 6
Fig 6: graphical representation of multilevel decomposition (The MathWorks 1994-2010)
At each level it is observed that the signal is down sampled and the sampling factor is 2.
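The cascade described above can be summarized in one identity. The notation below is an assumption chosen to match the text: $A_m(t)$ and $D_m(t)$ denote the approximation and detail reconstructed at level $m$, and $M$ is the number of levels (8 in this project).

```latex
\begin{aligned}
x(t)       &= A_1(t) + D_1(t), \\
A_{m-1}(t) &= A_m(t) + D_m(t), \qquad m = 2, \dots, M, \\
x(t)       &= A_M(t) + \sum_{m=1}^{M} D_m(t).
\end{aligned}
```

For two levels this reads $x(t) = A_2(t) + D_2(t) + D_1(t)$, which is the $D_2(t) + D_1(t)$ detail term mentioned above plus the remaining approximation.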
At d8, observation shows that the signal is down sampled by $2^8$, i.e. $60{,}000/2^8$. All this is done for better frequency resolution. Lower frequencies are present at all times; I am mostly concerned with the higher frequencies, which contain the actual data.
I have used the Daubechies wavelet type 4 (db4). The Daubechies wavelets are defined by computing running averages and differences via scalar products with scaling signals and wavelets (M.I. Mahmoud, M. I. M. Dessouky, S. Deyab, and F. H. Elfouly, 2007). For this type of wavelet there exists a balanced frequency response, but the phase response is non-linear. The Daubechies wavelets use overlapping windows to ensure that the high-frequency coefficients reflect any changes in the high frequencies. Based on these properties, the Daubechies wavelets prove to be an efficient tool in the de-noising and compression of audio signals.
The Daubechies D4 transform has four wavelet function coefficients and four scaling function coefficients. The scaling function coefficients are $h_0 = \frac{1+\sqrt{3}}{4\sqrt{2}}$, $h_1 = \frac{3+\sqrt{3}}{4\sqrt{2}}$, $h_2 = \frac{3-\sqrt{3}}{4\sqrt{2}}$, $h_3 = \frac{1-\sqrt{3}}{4\sqrt{2}}$.
Each step of the wavelet transform applies the scaling function to the data input: if the data set being analyzed contains N values, the scaling function is applied to calculate N/2 smoothed values. In the ordered wavelet transform the smoothed values are stored in the lower half of the N-element input vector. The wavelet function coefficient values are $g_0 = h_3$, $g_1 = -h_2$, $g_2 = h_1$, $g_3 = -h_0$.
The scaling function and wavelet function values are calculated by taking the inner product of the coefficients and four data values.
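One forward step of the D4 transform described above (inner products of the four scaling and four wavelet coefficients with four data values, moving along two samples at a time) can be sketched as follows. This is a Python illustration rather than the project's MATLAB code. The standard D4 coefficient values are used, and wrapping around at the end of the data is an assumption made here to handle the last window.

```python
import math

SQRT3 = math.sqrt(3.0)
DENOM = 4.0 * math.sqrt(2.0)

# Standard D4 scaling coefficients h0..h3.
H = [(1 + SQRT3) / DENOM, (3 + SQRT3) / DENOM, (3 - SQRT3) / DENOM, (1 - SQRT3) / DENOM]
# Wavelet coefficients from the relation g0 = h3, g1 = -h2, g2 = h1, g3 = -h0.
G = [H[3], -H[2], H[1], -H[0]]


def d4_step(data):
    """One forward D4 pass; smooth values fill the lower half of the output."""
    n = len(data)
    if n < 4 or n % 2:
        raise ValueError("need an even number of at least four samples")
    smooth, detail = [], []
    for i in range(0, n, 2):  # the index advances by two each repetition
        window = [data[(i + j) % n] for j in range(4)]  # wraps at the end (assumption)
        smooth.append(sum(h * x for h, x in zip(H, window)))
        detail.append(sum(g * x for g, x in zip(G, window)))
    return smooth + detail


flat = d4_step([1.0] * 8)
mixed_input = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
mixed = d4_step(mixed_input)
```

A constant input produces zero detail values, because the wavelet coefficients sum to zero; repeating the step on the lower half gives the multilevel transform.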
The equations (Ian Kaplan, July 2001) are, for a data vector s and step index i, the smoothed value $a_i = h_0 s_{2i} + h_1 s_{2i+1} + h_2 s_{2i+2} + h_3 s_{2i+3}$ and the wavelet value $c_i = g_0 s_{2i} + g_1 s_{2i+1} + g_2 s_{2i+2} + g_3 s_{2i+3}$.
Repeating these steps of the wavelet transform calculates the wavelet function values and the scaling function values: for each repetition the index increases by two, and a different wavelet and scaling function value is produced.
Fig 7
Diagram above showing the steps involved in the forward transform (The MathWorks 1994-2010)
The diagram above illustrates the steps in the forward transform. Based on observation of the diagram, it can be seen that the data is divided into even and odd elements; the first half of the elements is stored in the even array and the second half of the elements is stored in the odd array. The diagram shows two normalization steps, although in reality these are folded into a single function.
The input signal in the algorithm above (Ian Kaplan, July 2001) is then broken down into what are called wavelets. One of the most significant benefits of the wavelet transform is that its window varies. To identify parts of a signal that are not continuous, short basis functions are most desirable, but to obtain detailed frequency analysis it is better to have long basis functions. A good way to achieve this compromise is to have short high-frequency functions and long low-frequency ones (Swathi Nibhanupudi, 2003).
Wavelet analysis has an infinite set of basis functions; this gives wavelet transforms and analysis the ability to reveal cases that cannot easily be revealed by other time-frequency methods, namely Fourier transforms.
MATLAB code is then used to extract the detail coefficients; the m file shows this code. The Daubechies orthogonal wavelets D2-D20 are the ones often used.
The index number refers to the number of coefficients, and each wavelet has a number of vanishing moments equal to half its number of coefficients. For example, D2 has only one vanishing moment, D4 has two, and so on. The vanishing moments of a wavelet refer to its ability to represent the polynomial behavior or information in a signal. The D2 type, with only one moment, easily encodes polynomials of one coefficient, that is, constant signal components. The D4 type encodes polynomials of two coefficients, the D6 type encodes polynomials of three coefficients, and so on.
The scaling and wavelet functions have to be normalized by a normalisation factor. The coefficients for the wavelet are derived by reversing the order of the scaling function coefficients and then reversing the sign of every second one (D4 wavelet = -0.1830125, -0.3169874, 1.1830128, -0.6830128). Mathematically, this looks like $b_k = (-1)^k c_{N-1-k}$, where k is the coefficient index, b is a wavelet coefficient and c a scaling function coefficient. N is the wavelet index, i.e. 4 for D4 (M. Bahoura, J. Bouat.
2009)
Fig 7
Plot of fig 7 showing approximated coefficient of the level 8 decomposition
Fig 8
Plot of fig 8 showing detailed coefficient of the level 1 decomposition
Fig 9
Plot of fig 9 showing approximated coefficient of the level 3 decomposition
Fig 10
Plot of fig 10 showing approximated coefficient of the level 5 decomposition
Fig 11
Plot of fig 11 showing comparison of the different levels of decomposition
Fig 12
Plot of fig 12 showing the details of all the levels of the coefficients
The next step in the de-noising process is the actual removal of the noise after the coefficients have been obtained and calculated. The MATLAB functions that are used for de-noising are the ddencmp and wdencmp functions.
This process removes noise by a method called thresholding. De-noising, the task of removing or suppressing uninformative noise from signals, is an important part of many signal or image processing applications. Wavelets are common tools in the field of signal processing. The popularity of wavelets in de-noising is largely due to the computationally efficient algorithms as well as to the sparsity of the wavelet representation of data. By sparsity I mean that the majority of the wavelet coefficients have very small magnitudes, whereas only a small subset of coefficients have large magnitudes. I may informally state that this small subset contains the interesting, informative part of the signal, whereas the rest of the coefficients describe noise and can be discarded to give a noise-free reconstruction.
The best known wavelet de-noising methods are thresholding approaches. In hard thresholding, all the coefficients with magnitudes greater than the threshold are retained unmodified, because they comprise the informative part of the data, while the rest of the coefficients are considered to represent noise and are set to zero. However, it is reasonable to assume that coefficients are not purely either noise or informative but mixtures of the two.
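The hard rule just described, and the soft variant discussed next, can be sketched in a few lines. This is a Python illustration, not the MATLAB wdencmp function; the function names and the example coefficients are hypothetical, and the threshold is simply passed in rather than estimated from the data.

```python
import math


def hard_threshold(coeffs, threshold):
    """Keep coefficients whose magnitude exceeds the threshold; zero the rest."""
    return [c if abs(c) > threshold else 0.0 for c in coeffs]


def soft_threshold(coeffs, threshold):
    """Zero small coefficients and shrink the others toward zero by the threshold."""
    shrunk = []
    for c in coeffs:
        magnitude = abs(c) - threshold
        shrunk.append(math.copysign(magnitude, c) if magnitude > 0 else 0.0)
    return shrunk


example = [0.1, -0.5, 2.0, -3.0]
hard = hard_threshold(example, 1.0)
soft = soft_threshold(example, 1.0)
```

With a threshold of 1.0 the hard rule keeps 2.0 and -3.0 unchanged, while the soft rule reduces them to 1.0 and -2.0; both rules set 0.1 and -0.5 to zero.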
To cope with the fact that coefficients are mixtures of noise and information, soft thresholding approaches have been proposed. In soft thresholding, coefficients that are smaller than the threshold are set to zero, and the coefficients that are kept are shrunk towards zero by the amount of the threshold value, in order to decrease the effect of the noise assumed to corrupt all the wavelet coefficients. In my project I have chosen to do an eight-level decomposition before applying the de-noising algorithm. The decomposition levels of the eight levels are obtained, because the signal of in