Facial Expression Recognition Literature
Journal of Donghua University (Eng. Ed.) Vol. 29, No. 1 (2012)
Facial Expression Recognition Based on the Q-shift DT-CWT and Rotation Invariant LBP
CHEN Lei (陈蕾)*, WANG Jia-jun (王加俊), SUN Bing (孙兵)
School of Electronics & Information Engineering, Soochow University, Suzhou 215021, China
Abstract: In this paper, a novel method based on the dual-tree complex wavelet transform (DT-CWT) and the rotation invariant local binary pattern (LBP) is proposed for facial expression recognition. The quarter sample shift (Q-shift) DT-CWT provides a group delay of 1/4 of a sample period and satisfies the usual 2-band filter bank constraints of no aliasing and perfect reconstruction. To resolve illumination variation in expression verification, the low-frequency coefficients produced by the DT-CWT are set to zero, the high-frequency coefficients are used to reconstruct the image, and the basic LBP histogram is mapped onto the reconstructed image by means of histogram specification. LBP is capable of encoding the texture and shape information of the preprocessed images. The histograms built from multi-scale rotation invariant LBPs are combined to serve as the feature for recognition. Template matching is adopted to classify facial expressions for its simplicity. The experimental results show that the proposed approach performs well in both efficiency and accuracy.
Keywords: facial expression recognition; dual-tree complex wavelet transform (DT-CWT); local binary pattern (LBP); histogram; similarity measure
CLC number: TP391  Document code: A  Article ID: 1672-5220(2012)01-0071-05
Introduction
Facial expression recognition aims to analyze and detect the expression state in given expression images or video frames and then to infer the subject's underlying emotion, enabling smarter and more natural interaction between human beings and computers. In human-to-human interaction, Mehrabian found that verbal cues provide 7% of the meaning of a message, vocal cues 38%, and facial expressions 55% [1]. Facial expression thus conveys more information about the interaction than the spoken words. Automatic facial expression recognition plays an important role in the development of pattern recognition, computer vision, computer graphics, artificial intelligence, physiology, psychology, and so on. The study of facial expression recognition has also found economic and social value.
Owing to its applications in sociology and computer vision, automatic facial expression recognition has attracted more and more attention. In 1978, on the basis of anatomy, Ekman and Friesen built the facial action coding system (FACS), which associates facial expressions with muscle movements [2]. With FACS, encoding all possible facial expressions became a reality. Then in 1984, Ekman and Friesen proposed that combinations of specific FACS action units could indicate facial expressions of emotion [3]. According to the emotion expressed, facial expressions can be divided into six types: happiness, sadness, surprise, fear, anger, and disgust [4, 5]. These six basic types are widely accepted by researchers and treated as the facial expression categories. Facial expression recognition generally includes three stages: face detection and localization, facial feature extraction, and expression recognition. Facial expressions are very complex. For example, an open mouth does not necessarily represent a smile; it may indicate crying or surprise. The same kind of expression may also take many different forms: happiness may appear as open-mouthed laughter or as a closed-mouth smile. Therefore, facial expression analysis is a difficult task, and the difficulty lies mainly in the accuracy and validity of expression feature extraction. Researchers have made some achievements in facial expression analysis. Existing methods include principal component analysis (PCA), independent component analysis (ICA), linear discriminant analysis (LDA), the Fisher linear discriminant method, clustering discriminant analysis, elastic graph matching, the Gabor wavelet method, and local principal component analysis. Each of these algorithms is effective to some degree, but none is fully satisfactory and all need to be improved.
In this paper, the DT-CWT and local binary patterns (LBPs) are briefly reviewed and a novel method for facial expression feature extraction is proposed. In this method, the approximation coefficients derived from the DT-CWT are reset to zero and the detail coefficients are preserved. The inverse DT-CWT is applied to the modified coefficients, and the reconstructed image is referred to as $I_{dt}$. The basic LBP histogram of the original image $I$ is mapped onto $I_{dt}$ by means of histogram specification, and the resulting image is denoted as $I_{LBP}$. The fusion of $I_{dt}$ and $I_{LBP}$ is denoted as $I_{pro}$. Multi-scale spatial decomposition is applied to $I_{pro}$. The combination of the rotation invariant LBP histograms at each scale is used as the feature for recognition. Experiments show that the method presented in this paper achieves a higher recognition rate and efficiency.
1 The Quarter Sample Shift (Q-shift) DT-CWT
The discrete wavelet transform (DWT) is most commonly used in its maximally decimated form. This works well for compression, but its use for other signal analysis and reconstruction tasks has been hampered by two main disadvantages: lack of shift invariance and poor directional selectivity. In Refs. [6, 7], Kingsbury introduced a new form of DWT, which generates complex coefficients by using a dual tree of wavelet filters to obtain their real and imaginary parts. This introduces limited redundancy and allows the transform to provide approximate shift invariance and directionally selective filters while preserving the usual properties of perfect reconstruction and computational efficiency with well-balanced frequency responses. Compared with the shift invariant DWT (SIDWT), the DT-CWT has reduced overcompleteness. Compared with the DWT, it has increased directional sensitivity: it is able to distinguish between positive and negative orientations, giving six distinct sub-bands at each level, oriented at ±15°, ±45°, and ±75°. The DT-CWT gives perfect reconstruction as the filters are chosen from a perfect-reconstruction bi-orthogonal set. The Q-shift DT-CWT is a variant of the earlier form, designed to give the dual tree improved orthogonality and symmetry properties. The Q-shift version of the DT-CWT is shown in Fig. 1, in which all the filters beyond level 1 are of even length, but they are no longer strictly linear phase. Instead, they are designed to have a group delay of approximately 1/4 sample (+q). The required delay difference of 1/2 sample (2q) is then achieved by using the time reverse of the tree a filters in tree b, so that the delay becomes 3q. Furthermore, because the filter coefficients are no longer symmetric, it is now possible to design the perfect-reconstruction filter sets to be orthonormal, so that the reconstruction filters are just the time reverse of the equivalent analysis filters in both trees. Hence all filters beyond level 1 are derived from the same orthonormal prototype set.
Received date: 2011-09-28
*Correspondence should be addressed to CHEN Lei, E-mail: chenlei@suda.edu.cn
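The lack of shift invariance of the maximally decimated DWT noted in this section can be seen in a toy example. The sketch below is not the DT-CWT: it uses a one-level Haar transform as an assumed stand-in chosen for brevity, and the function names are illustrative. It shows that moving a step edge by a single sample changes the detail-band energy from zero to a nonzero value.

```python
import math

def haar_details(signal):
    """One-level decimated Haar detail coefficients of an even-length signal."""
    return [(signal[i] - signal[i + 1]) / math.sqrt(2.0)
            for i in range(0, len(signal), 2)]

def energy(coeffs):
    """Sum of squared coefficients."""
    return sum(c * c for c in coeffs)

step = [0, 0, 0, 0, 1, 1, 1, 1]      # edge falls between two sample pairs
shifted = [0, 0, 0, 1, 1, 1, 1, 1]   # same edge moved by one sample

print(energy(haar_details(step)))     # no detail energy
print(energy(haar_details(shifted)))  # nonzero detail energy
```

A transform with approximate shift invariance, which is what the dual-tree construction aims for, would keep the sub-band energy nearly constant under such a shift.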
Fig. 2 Q-shift DT-CWT on a facial expression image: (a) original image; (b) real part; (c) imaginary part; (d) magnitude
2 Rotation Invariant LBPs and Facial Expression Feature Extraction
2.1 LBPs
The LBP operator was first introduced by Ojala et al. [8] and has proved to be a powerful means of texture description. The operator labels the pixels of an image by thresholding the 3×3 neighborhood of each pixel with the center value and reading the result as a binary number (see Fig. 3 for an illustration). By definition, the LBP operator is insensitive to monotonic illumination changes, since it depends only on the ordering of the gray levels and not on their absolute values. This makes it attractive for dealing with varying illumination, which is the main concern of our research.
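As a minimal sketch of the basic operator, the code below thresholds the eight neighbors of a 3×3 patch against the center and packs the results into one byte. The clockwise bit ordering starting at the top-left corner and the name `basic_lbp` are assumptions made for illustration; the paper does not fix the ordering. Adding a constant brightness offset leaves the code unchanged, which illustrates the insensitivity to monotonic gray-level changes.

```python
def basic_lbp(patch):
    """Return the basic LBP code of a 3x3 patch given as a list of rows."""
    center = patch[1][1]
    # Neighbors visited clockwise from the top-left corner (assumed ordering).
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (r, c) in enumerate(offsets):
        if patch[r][c] >= center:   # neighbor at least as bright as the center
            code |= 1 << bit
    return code

patch = [[5, 9, 1],
         [4, 6, 7],
         [2, 3, 8]]
brighter = [[v + 10 for v in row] for row in patch]  # uniform brightness offset

print(basic_lbp(patch), basic_lbp(brighter))
```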
Fig. 1 The Q-shift DT-CWT, giving real and imaginary parts of complex coefficients from tree a and tree b respectively (q = 1/4 sample period)
Note that for the Q-shift CWT each complex wavelet basis is centered on the equivalent complex scaling function basis, and each of these is centered between a pair of adjacent complex bases from the previous (finer) level. In this way, each complex wavelet coefficient at level k has two complex children located symmetrically above it at level k-1. For the odd/even DT-CWT, such symmetries do not occur. In summary, the Q-shift DT-CWT has the following properties: approximate shift invariance, good directional selectivity in 2 dimensions (2-D) with Gabor-like filters (also true for higher dimensionality, m-D), perfect reconstruction using short linear-phase filters, limited redundancy, efficient order-N computation, and improved orthogonality and symmetry properties. The Q-shift DT-CWT of a facial expression image is shown in Fig. 2.
Fig. 3 The basic LBP operator (LBP = 1 + 8 + 32 + 128 = 169)
The basic LBP is not rotation invariant, which is undesirable in certain applications. It is possible to define rotation invariant versions of LBP, and one solution is illustrated in Fig. 4, where LBPROT represents the value of one rotation invariant pattern. The binary values of the thresholded neighborhood are mapped into an 8-bit word in clockwise or counter-clockwise order. The word is then circularly shifted an arbitrary number of times until it matches one of the 36 different patterns of “0” and “1” that an 8-bit word can form under rotation. The index of the matching pattern is used as the feature value, describing the rotation invariant LBP of this particular neighborhood [9].
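The count of 36 patterns can be checked with a short sketch. It assumes that the canonical representative of each rotation class is the smallest value reachable by circular shifts, which is one common convention; the names `rotate_right`, `rotation_invariant`, and `index_of` are illustrative.

```python
def rotate_right(code, bits=8):
    """Circularly shift a `bits`-wide word one position to the right."""
    mask = (1 << bits) - 1
    return ((code >> 1) | ((code & 1) << (bits - 1))) & mask

def rotation_invariant(code, bits=8):
    """Smallest value reachable from `code` by circular bit shifts."""
    best = code
    for _ in range(bits - 1):
        code = rotate_right(code, bits)
        best = min(best, code)
    return best

# Canonical representatives of all 8-bit words, and an index for each class.
canonical = sorted({rotation_invariant(c) for c in range(256)})
index_of = {value: i for i, value in enumerate(canonical)}

print(len(canonical))  # number of distinct rotation classes
```

Here `index_of[rotation_invariant(code)]` plays the role of the feature value described above.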
Fig. 4 Rotation-invariant version of LBP
The derived binary numbers codify local primitives including different types of curved edges, spots, flat areas, etc. (as shown in Fig. 5), so each LBP code can be regarded as a micro-pattern. The limitation of the basic LBP operator is its small 3×3 neighborhood, which cannot capture dominant features with large-scale structures. Hence the operator was later extended to use neighborhoods of different sizes [10]. Using circular neighborhoods and bilinearly interpolating the pixel values allows any radius and any number of pixels in the neighborhood. Figure 6 illustrates some examples of the extended LBP operator, where the notation (P, R) denotes a neighborhood of P equally spaced sampling points on a circle of radius R that form a circularly symmetric neighbor set.
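The (P, R) sampling can be sketched as follows. The starting angle, the direction of traversal, and the function names are assumptions made for illustration, and the caller is assumed to keep the whole circle inside the image.

```python
import math

def bilinear(image, x, y):
    """Bilinearly interpolate `image` (a list of rows) at real-valued (x, y)."""
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1 = min(x0 + 1, len(image[0]) - 1)   # clamp at the right border
    y1 = min(y0 + 1, len(image) - 1)      # clamp at the bottom border
    fx, fy = x - x0, y - y0
    top = (1 - fx) * image[y0][x0] + fx * image[y0][x1]
    bottom = (1 - fx) * image[y1][x0] + fx * image[y1][x1]
    return (1 - fy) * top + fy * bottom

def circular_neighbors(image, xc, yc, points, radius):
    """Gray values at `points` equally spaced samples on a circle around (xc, yc)."""
    values = []
    for p in range(points):
        angle = 2.0 * math.pi * p / points
        x = xc + radius * math.cos(angle)
        y = yc - radius * math.sin(angle)   # image rows grow downwards
        values.append(bilinear(image, x, y))
    return values

# Small synthetic image whose gray value is x + 10 * y.
image = [[x + 10 * y for x in range(5)] for y in range(5)]
print(circular_neighbors(image, 2, 2, 8, 1))
```

Thresholding these P values against the center pixel then gives the extended LBP code in the same way as for the 3×3 operator.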
The LBP operator $LBP_{P,R}$ produces $2^P$ different output values, corresponding to the $2^P$ different binary patterns that can be formed by the P pixels in the neighbor set. It has been shown that certain patterns contain more information than others. Therefore, it is possible to use only a subset of the $2^P$ LBPs to describe the texture of images. Ojala et al. [8] called these fundamental patterns uniform patterns. An LBP is called uniform if it contains at most two bitwise transitions from 0 to 1 or vice versa when the binary string is considered circular. For example, 00000000, 00111000, and 11100001 are uniform patterns. It is observed that uniform patterns account for nearly 90% of all patterns in the (8, 1) neighborhood and for about 70% in the (16, 2) neighborhood in texture images. Accumulating the patterns which have more than 2 transitions into a single bin yields an LBP operator, denoted $LBP^{riu2}_{P,R}$, with fewer than $2^P$ bins. The superscript riu2 stands for rotation invariant uniform patterns with at most 2 transitions, all remaining patterns being given a single label [11, 12]. For example, the number of labels for a neighborhood of 8 pixels is 256 for the standard LBP, 59 when only uniformity is imposed ($LBP^{u2}_{8,R}$), and 10 for $LBP^{riu2}_{8,R}$. After labeling an image with the LBP operator, a histogram of the labeled image $f_l(x, y)$ can be defined as
$$H_i = \sum_{x,y} I(f_l(x, y) = i), \quad i = 0, 1, \ldots, n-1, \tag{1}$$
where n is the number of different labels produced by the LBP operator and
$$I(A) = \begin{cases} 1, & A \text{ is true}, \\ 0, & A \text{ is false}. \end{cases} \tag{2}$$
This LBP histogram contains information about the distribution of the local micro-patterns, such as edges, spots, and flat areas, over the whole image, so it can be used to describe image characteristics statistically.
2.2 Facial expression feature extraction
Automatic facial expression recognition involves two vital aspects: facial representation and classifier design. Facial representation derives a set of features from the original face images to represent faces effectively. The optimal features should minimize within-class variations of expressions while maximizing between-class variations. If inadequate features are used, even the best classifier could fail to achieve accurate recognition. In this paper, a feature extraction algorithm based on the DT-CWT and LBP histograms is proposed. The whole feature extraction process is carried out as follows.
(1) A multi-level DT-CWT with the Q-shift filter is employed to decompose the gray-scale facial image $I$ into two components: low-frequency components ($cA_j$) and high-frequency components ($cD_j$).
(2) Zero all the coefficients in $cA_j$ and preserve the coefficients in $cD_j$. Apply the inverse DT-CWT to the modified $cA_j$ together with $cD_j$; the result is referred to as $I_{dt}$.
(3) Derive the basic LBP histogram from the original image $I$.
(4) Map the basic LBP histogram of $I$ onto $I_{dt}$ by means of histogram specification. The resulting image is denoted as $I_{LBP}$.
(5) Transform both $I_{dt}$ and $I_{LBP}$ with the Q-shift DT-CWT and then fuse the approximation and detail coefficients respectively. An image $I_{pro}$ is reconstructed from the fused coefficients.
(6) The preprocessed image $I_{pro}$ is divided into multi-level sub-regions. The rotation invariant LBP histograms are derived from these sub-regions, normalized according to the region sizes, and weighted according to the region locations. These histograms built from sub-blocks can effectively describe facial expression micro-patterns. They are combined and serve as feature vectors for recognition.
Figure 7 shows the low- and high-frequency directional coefficients of a facial expression image.
Fig. 7 Approximation and detail coefficients, with Q-shift filter: (a) an original image; (b) approximation component; (c) six directional detail components (magnitude) at level 2; (d) detail components at level 3
By zeroing all the low-frequency coefficients and preserving the high-frequency coefficients, we obtain the reconstructed image in Fig. 8(b) by applying the inverse DT-CWT. Figure 8(a) is the basic LBP histogram of Fig. 7(a), which is mapped onto Fig. 8(b); Fig. 8(c) is the image resulting from histogram specification. Figure 8(d) is the fusion of Fig. 8(b) and Fig. 8(c) obtained with the DT-CWT fusion method.
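Histogram specification, which the preprocessing uses to map the basic LBP histogram of $I$ onto $I_{dt}$, can be sketched with cumulative distributions. This is a generic CDF-matching sketch under stated assumptions, not the authors' implementation: the image is a flat list of integer gray levels, `target_hist` is any histogram with `levels` bins, and the function names are illustrative.

```python
def cdf(hist):
    """Cumulative distribution of a histogram, normalized to end at 1."""
    total = float(sum(hist))
    running, out = 0, []
    for count in hist:
        running += count
        out.append(running / total)
    return out

def specify_histogram(pixels, target_hist, levels=256):
    """Remap `pixels` so that their histogram approximates `target_hist`."""
    source_hist = [0] * levels
    for p in pixels:
        source_hist[p] += 1
    source_cdf, target_cdf = cdf(source_hist), cdf(target_hist)
    mapping, t = [], 0
    for s in range(levels):
        # Advance to the first target level whose CDF reaches the source CDF.
        while t < levels - 1 and target_cdf[t] < source_cdf[s]:
            t += 1
        mapping.append(t)
    return [mapping[p] for p in pixels]

# Toy example with 4 gray levels: push a flat histogram onto levels 2 and 3.
pixels = [0, 0, 1, 1, 2, 2, 3, 3]
print(specify_histogram(pixels, [0, 0, 4, 4], levels=4))
```

In the pipeline described here, `pixels` would hold the gray levels of $I_{dt}$ and `target_hist` the 256-bin basic LBP histogram of $I$.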
The characteristic vectors are extracted using the multi-level LBP histograms of the preprocessed image in Fig. 8(d). An LBP histogram computed over the whole face image encodes only the occurrences of the micro-patterns without any indication of their locations. To also take the shape information of facial expressions into account, face images are equally divided into small regions $R_0, R_1, \ldots, R_m$ from which LBP histograms are extracted (as shown in Fig. 9(a)). The LBP features extracted from each sub-region are concatenated into a single, spatially enhanced feature histogram. The extracted feature histogram represents the local texture and global shape of face images. Some parameters can be optimized for better feature extraction. One is the LBP operator, and the other is the number of regions. We select the $LBP^{riu2}_{24,3}$ operator, which defines 26 rotation invariant micro-patterns, and divide the face images into 1, 2, 4, 9, and 16 regions respectively, giving a good trade-off between recognition performance and feature vector length. Thus face images are divided into 32 regions in total (1 + 2 + 4 + 9 + 16 = 32). The LBP features extracted from each sub-region are normalized according to the sub-region sizes and then concatenated into a single feature histogram with a length of 832 (32 × 26 = 832).
Fig. 8 Image fusion: (a) the basic LBP histogram of Fig. 7(a); (b) the image reconstructed from high-frequency coefficients; (c) image resulting from histogram specification; (d) fused image for feature extraction
Fig. 9 The normalized LBP histograms: (a) sub-regions at each level; (b) labeled image of micro-patterns; (c) $LBP^{riu2}_{24,3}$ normalized histograms of 9 sub-regions; (d) concatenated histogram serving as the feature for recognition
3 Experimental Results
For the experiments we used images from one of the popular databases for facial expression recognition, the JAFFE database. The JAFFE database contains 213 images of seven facial expressions (6 basic facial expressions and 1 neutral) posed by 10 Japanese female models. For each woman, there are 2-4 images of every facial expression. Each image is a TIFF image of size 256×256 with 256 gray levels. All the images were taken against a homogeneous background with the subjects in frontal position. Some images of the 6 basic facial expressions (anger, disgust, fear, joy, sadness, and surprise) are shown in Fig. 10 and the corresponding images processed by the Q-shift DT-CWT in Fig. 11.
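The spatially enhanced feature histogram built from rotation invariant uniform labels can be sketched as follows. The riu2 labeling rule used here (number of ones for uniform patterns, P + 1 otherwise) is the standard definition, which yields P + 2 labels, so 26 for P = 24. The function names and the dummy region contents are hypothetical; each region is assumed to be non-empty.

```python
def riu2_label(bits):
    """riu2 label of a circular bit pattern: count of ones if uniform, else P + 1."""
    p = len(bits)
    transitions = sum(bits[i] != bits[(i + 1) % p] for i in range(p))
    return sum(bits) if transitions <= 2 else p + 1

def histogram(labels, n_labels):
    """Label histogram of one region, normalized by the region size."""
    counts = [0] * n_labels
    for label in labels:
        counts[label] += 1
    total = float(len(labels))
    return [c / total for c in counts]

def concatenate(region_label_lists, n_labels):
    """Concatenate the normalized histograms of all regions into one vector."""
    feature = []
    for labels in region_label_lists:
        feature.extend(histogram(labels, n_labels))
    return feature

# Dummy label data: 32 regions, each holding labels from the 26 possible values.
regions = [[0, 25, 3]] * 32
feature = concatenate(regions, 26)
print(len(feature))  # 32 regions x 26 labels
```

With 32 regions and 26 labels per region, the concatenated vector has 32 × 26 = 832 entries.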
Correct labels for the training samples are very important for recognition. As several expression images are mislabeled in the database, we corrected them before our experiments. We carried out different experiments using different features and two matching methods to analyze the recognition performance. The combined LBP histograms of the 32 multi-scale sub-regions serve as features, and the histograms of sub-regions involving the mouth or eyes are given larger weights. This can improve the recognition accuracy effectively. We adopted template matching to classify facial expressions and employed two methods for similarity measurement.
(1) For each expression of one subject, we test three times in turn and take the average recognition rate as the final result. Each time, we take one of the facial expression images as the test sample and the rest as training samples. There is no overlap between the training and test images. We employ the Euclidean distance for recognition.
(2) In training, the histograms of the expression images in a given class are averaged to generate a template for this class. A nearest-neighbor classifier is then used, with the distance between the target face histogram and each class template as the dissimilarity measure.
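Matching method (2) can be sketched as a nearest-class-template classifier. The use of the Euclidean distance for the template comparison is an assumption carried over from method (1), and the names and toy feature vectors below are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def class_templates(training):
    """Average the feature vectors of each class; `training` maps label -> vectors."""
    templates = {}
    for label, vectors in training.items():
        n = float(len(vectors))
        templates[label] = [sum(column) / n for column in zip(*vectors)]
    return templates

def classify(feature, templates):
    """Return the label whose template is closest to `feature`."""
    return min(templates, key=lambda label: euclidean(feature, templates[label]))

# Toy two-bin histograms standing in for the real feature vectors.
training = {
    "joy": [[0.9, 0.1], [0.7, 0.3]],
    "anger": [[0.2, 0.8], [0.0, 1.0]],
}
templates = class_templates(training)
print(classify([0.6, 0.4], templates))
```

Method (1) corresponds to calling `euclidean` against every individual training vector instead of the class averages.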
Results obtained from the different experiments are presented in Table 1. The table shows how different features and similarity measures affect the recognition rate.
Table 1 Recognition rates obtained
Features / matching method    Recognition rate (6 expressions)
LBP of 16 same-scale sub-regions / distance between test sample and training samples    65%
LBP of 32 multi-scale sub-regions / distance between test sample and training samples    84.5%
Weighted LBP of 32 multi-scale sub-regions / distance between test sample and training samples    89%
Weighted LBP of 32 multi-scale sub-regions / distance between test sample and class centre    100%
4 Conclusions
In this paper we analyzed the basic principles of the DT-CWT and LBPs. A novel method based on the Q-shift DT-CWT and rotation invariant LBP was proposed, which is efficient for recognition. The Q-shift DT-CWT was used to resolve illumination variation in expression verification. Rotation invariant LBPs are capable of describing texture and shape information. To enhance the facial representation, we divided the preprocessed facial images into several sub-regions of different scales. The LBP histograms of the sub-regions were normalized according to the region sizes and given different weights according to the role of the regions in recognition. For instance, since the mouth regions are important for recognition, a high weight can be assigned to the corresponding LBP histograms. The combined histograms, which can effectively describe facial expression micro-patterns, serve as features for recognition. Our main goal in this paper was to show the high discriminative power of the proposed facial features. Therefore, we used template matching to classify facial expressions for its simplicity. Compared with the recognition results obtained with PCA and LDA approaches (76.3% and 69.5% for PCA and LDA, respectively), the method proposed in this paper clearly showed better performance in recognition efficiency and accuracy.
References
[1] Mehrabian A. Silent Messages [M]. Belmont, CA: Wadsworth Publishing Company, Inc., 1971.
[2] Ekman P, Friesen W. Facial Action Coding System (FACS): a Technique for the Measurement of Facial Movement [M]. Palo Alto: Consulting Psychologists Press, 1978.
[3] Ekman P, Friesen W. Unmasking the Face [M]. Palo Alto: Consulting Psychologists Press, 1984.
[4] Pantic M, Rothkrantz L J M. Facial Action Recognition for Facial Expression Analysis from Static Face Images [J]. IEEE Transactions on Systems, Man, and Cybernetics, 2004, 34(3): 1449-1461.
[5] Pantic M, Rothkrantz L. Automatic Analysis of Facial Expressions: the State of the Art [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000, 22(12): 1424-1445.
[6] Kingsbury N G. The Dual-Tree Complex Wavelet Transform: a New Efficient Tool for Image Restoration and Enhancement [C]. Proceedings in EUSIPCO 98, Rhodes, Greece, 1998: 319-322.
[7] Kingsbury N G. Complex Wavelets for Shift Invariant Analysis and Filtering of Signals [J]. Journal of Applied and Computational Harmonic Analysis, 2001, 10(3): 234-253.
[8] Ojala T, Pietikäinen M, Harwood D. A Comparative Study of Texture Measures with Classification Based on Feature Distributions [J]. Pattern Recognition, 1996, 29(1): 51-59.
[9] Pietikäinen M, Ojala T, Xu Z. Rotation-Invariant Texture Classification Using Feature Distributions [J]. Pattern Recognition, 2000, 33(1): 43-52.
[10] Ojala T, Pietikäinen M, Mäenpää T. Multiresolution Gray-Scale and Rotation Invariant Texture Classification with Local Binary Patterns [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002, 24(7): 971-987.
[11] Ahonen T, Hadid A, Pietikäinen M. Face Recognition with Local Binary Patterns [C]. Proceedings of the 8th European Conference on Computer Vision, Prague, The Czech Republic, 2004: 469-481.
[12] Moore S, Bowden R. Local Binary Patterns for Multi-view Facial Expression Recognition [J]. Computer Vision and Image Understanding, 2011, 115(4): 541-558.