paperId,uses dataset or derivative,dataset(s) / model(s) used,unable to disambiguate,cites D18,cites D19,cites D20(1),cites D20(2),cites D20(3),cites D21,cites D22,cites D23,cites D24,cites D25,cites D26,title,abstract,year,venue,arxivId,doi,pdfUrl
006af49a030aa5b17046cfaf40de8f9246b96adf,0,,,0,1,0,0,0,0,0,0,0,0,0,Super-Resolution on Image and Video,"In this project, we explore image super-resolution using generative adversarial networks. Super-resolution is a problem that has been addressed using signal processing methods, but has only recently been tackled using deep learning, especially generative models. We start with a network inspired by Ledig et al [9], explore changes to the network and test our models on various datasets. We train models for both 2x and 4x upsampling and arrive at results that beat simple interpolations. Our 4x GAN is visually comparable to industry standards in deep learning, although trained on a dataset with comparatively few class labels. We also have preliminary results on super-resolution with video that are promising but computationally expensive.",2017,,,,https://pdfs.semanticscholar.org/006a/f49a030aa5b17046cfaf40de8f9246b96adf.pdf
010575e2ea475cde1b3037ed09a37622261d7abe,0,,,0,1,0,0,0,0,0,0,0,0,0,Identification of Cherry Leaf Disease Infected by Podosphaera Pannosa via Convolutional Neural Network,"The cherry leaves infected by Podosphaera pannosa will suffer powdery mildew, which is a serious disease threatening the cherry production industry. In order to identify the diseased cherry leaves in early stage, the authors formulate the cherry leaf disease infected identification as a classification problem and propose a fully automatic identification method based on convolutional neural network (CNN). The GoogLeNet is used as backbone of the CNN. Then, transferred learning techniques are applied to fine-tune the CNN from pre-trained GoogLeNet on ImageNet dataset. This article compares the proposed method against three traditional machine learning methods i.e., support vector machine (SVM), k-nearest neighbor (KNN) and back propagation (BP) neural network. Quantitative evaluations conducted on a data set of 1,200 images collected by smart phones, demonstrates that the CNN achieves best precise performance in identifying diseased cherry leaves, with the testing accuracy of 99.6%. Thus, a CNN can be used effectively in identifying the diseased cherry leaves.",2019,Int. J. Agric. Environ. Inf. Syst.,,10.4018/IJAEIS.2019040105,
0252256fa23eceb54d9eea50c9fb5c775338d9ea,0,,,1,0,0,1,0,0,0,0,0,0,0,Application-driven Advances in Multi-biometric Fusion,"Biometric recognition is the automated recognition of individuals based on their behavioral or biological characteristics. Beside forensic applications, this technology aims at replacing the outdated and attack prone, physical and knowledge-based, proofs of identity. Choosing one biometric characteristic is a tradeoff between universality, acceptability, and permanence, among other factors. Moreover, the accuracy cap of the chosen characteristic may limit the scalability and usability for some applications. The use of multiple biometric sources within a unified frame, i.e. multi-biometrics, aspires to tackle the limitations of single source biometrics and thus enables a wider implementation of the technology. This work aims at presenting application-driven advances in multi-biometrics by addressing different elements of the multi-biometric system work-flow. At first, practical oriented pre-fusion issues regarding missing data imputation and score normalization are discussed. This includes presenting a novel performance anchored score normalization technique that aligns certain performance-related score values in the fused biometric sources leading to more accurate multi-biometric decisions when compared to conventional normalization approaches. Missing data imputation within score-level multi-biometric fusion is also addressed by analyzing the behavior of different approaches under different operational scenarios. Within the multi-biometric fusion process, different information sources can have different degrees of reliability. This is usually influenced in the fusion process by assigning relative weights to the fused sources. This work presents a number of weighting approaches aiming at optimizing the decision made by the multi-biometric system. First, weights that try to capture the overall performance of the biometric source, as well as an indication of its confidence, are proposed and proved to outperform the state-of-the-art weighting approaches. The work also introduces a set of weights derived from the identification performance representation, the cumulative match characteristics. The effect of these weights is analyzed under the verification and identification scenarios. To further optimize the multi-biometric process, information besides the similarity between two biometric captures can be considered. Previously, the quality measures of biometric captures were successfully integrated, which requires accessing and processing raw captures. In this work, supplementary information that can be reasoned from the comparison scores are in focus. First, the relative relation between different biometric comparisons is discussed and integrated in the fusion process resulting in a large reduction in the error rates. Secondly, the coherence between scores of multi-biometric sources in the same comparison is defined and integrated into the fusion process leading to a reduction in the error rates, especially when processing noisy data. Large-scale biometric deployments are faced by the huge computational costs of running biometric searches and duplicate enrollment checks. Data indexing can limit the search domain leading to faster searches. Multi-biometrics provides richer information that can enhance the retrieval performance. This work provides an optimizable and configurable multi-biometric data retrieval solution that combines and enhances the robustness of rank-level solutions and the performance of feature-level solutions. Furthermore, this work presents biometric solutions that complement and utilize multi-biometric fusion. The first solution captures behavioral and physical biometric characteristics to assure a continuous user authentication. Later, the practical use of presentation attack detection is discussed by investigating the more realistic scenario of cross-database evaluation and presenting a state-of-the-art performance comparison. Finally, the use of multi-biometric fusion to create face references from videos is addressed. Face selection, feature-level fusion, and score-level fusion approaches are evaluated under the scenario of face recognition in videos.",2018,,,,https://pdfs.semanticscholar.org/0252/256fa23eceb54d9eea50c9fb5c775338d9ea.pdf
04a58d24ee4f2e41563c744917d21c09fd1a7ada,0,,,1,0,0,0,0,0,0,0,0,0,0,Local appearance modeling for objects class recognition,"In this work, we propose a new formulation of the objects modeling combining geometry and appearance; it is useful for detection and recognition. The object local appearance location is referenced with respect to an invariant which is a geometric landmark. The appearance (shape and texture) is a combination of Harris–Laplace descriptor and local binary pattern (LBP), all being described by the invariant local appearance model (ILAM). We use an improved variant of LBP traits at regions located by Harris–Laplace detector to encode local appearance. We applied the model to describe and learn object appearances (e.g., faces) and to recognize them. Given the extracted visual traits from a test image, ILAM model is carried out to predict the most similar features to the facial appearance: first, by estimating the highest facial probability and then in terms of LBP histogram-based measure, by computing the texture similarity. Finally, by a geometric calculation the invariant allows to locate an appearance in the image. We evaluate the model by testing it on different face images databases. The experiments show that the model results in high accuracy of detection and provides an acceptable tolerance to the appearance variability.",2017,Pattern Analysis and Applications,,10.1007/s10044-017-0639-2,
04c4754d21f01333a113553ca3fff67177929cff,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,Compositional Hierarchical Tensor Factorization: Representing Hierarchical Intrinsic and Extrinsic Causal Factors,"Visual objects are composed of a recursive hierarchy of perceptual wholes and parts, whose properties, such as shape, reflectance, and color, constitute a hierarchy of intrinsic causal factors of object appearance. However, object appearance is the compositional consequence of both an object's intrinsic and extrinsic causal factors, where the extrinsic causal factors are related to illumination, and imaging conditions. Therefore, this paper proposes a unified tensor model of wholes and parts, and introduces a compositional hierarchical tensor factorization that disentangles the hierarchical causal structure of object image formation, and subsumes multilinear block tensor decomposition as a special case. The resulting object representation is an interpretable combinatorial choice of wholes' and parts' representations that renders object recognition robust to occlusion and reduces training data requirements. We demonstrate our approach in the context of face recognition by training on an extremely reduced dataset of synthetic images, and report encouraging face verification results on two datasets - the Freiburg dataset, and the Labeled Face in the Wild (LFW) dataset consisting of real world images, thus, substantiating the suitability of our approach for data starved domains.",2019,ArXiv,1911.04180,,https://arxiv.org/pdf/1911.04180.pdf
05b3d10a85e0380df44abb30ddba9f7da5adf80f,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,Boosted multi-task learning for face verification with applications to web image and video search,"Face verification has many potential applications including filtering and ranking image/video search results on celebrities. Since these images/videos are taken under uncontrolled environments, the problem is very challenging due to dramatic lighting and pose variations, low resolutions, compression artifacts, etc. In addition, the available number of training images for each celebrity may be limited, hence learning individual classifiers for each person may cause overfitting. In this paper, we propose two ideas to meet the above challenges. First, we propose to use individual bins, instead of whole histograms, of Local Binary Patterns (LBP) as features for learning, which yields significant performance improvements and computation reduction in our experiments. Second, we present a novel Multi-Task Learning (MTL) framework, called Boosted MTL, for face verification with limited training data. It jointly learns classifiers for multiple people by sharing a few boosting classifiers in order to avoid overfitting. The effectiveness of Boosted MTL and LBP bin features is verified with a large number of celebrity images/videos from the web.",2009,2009 IEEE Conference on Computer Vision and Pattern Recognition,,10.1109/cvprw.2009.5206736,http://www.ee.cuhk.edu.hk/~xgwang/webface.pdf
06a1c935f8bea60d4f4f8d3fb99cae7f90166cd2,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,Probabilistic Classifier and Its Application to Face Recognition,"This paper proposes a method to classify different subjects from a large set of subjects. Taking correct decision in the process of classification of various subjects from the large set is an arduous task, since its probability is very low. This task is made simple by the proposed Probabilistic Classifier (PC). Maximum Likelihood Estimation (MLE) and Error Minimizing Algorithms (EMA) are the basis for the proposed classifier. Interpreting the EMA output in a probabilistic manner gives rise to PC. Concept of feedback is used in the classification process to enhance the decision rule. Experimental results obtained by applying the proposed classifier on various benchmark facial datasets, show its promising performance. Eventually, PC is found to be independent of the datasets.",2014,FICTA,,10.1007/978-3-319-11933-5_24,
07a1e6d26028b28185b7a3eee86752c240a24261,1,[D18],,1,1,0,0,0,0,0,0,0,0,0,MODE: automated neural network model debugging via state differential analysis and input selection,"Artificial intelligence models are becoming an integral part of modern computing systems. Just like software inevitably has bugs, models have bugs too, leading to poor classification/prediction accuracy. Unlike software bugs, model bugs cannot be easily fixed by directly modifying models. Existing solutions work by providing additional training inputs. However, they have limited effectiveness due to the lack of understanding of model misbehaviors and hence the incapability of selecting proper inputs. Inspired by software debugging, we propose a novel model debugging technique that works by first conducting model state differential analysis to identify the internal features of the model that are responsible for model bugs and then performing training input selection that is similar to program input selection in regression testing. Our evaluation results on 29 different models for 6 different applications show that our technique can fix model bugs effectively and efficiently without introducing new bugs. For simple applications (e.g., digit recognition), MODE improves the test accuracy from 75% to 93% on average whereas the state-of-the-art can only improve to 85% with 11 times more training time. For complex applications and models (e.g., object recognition), MODE is able to improve the accuracy from 75% to over 91% in minutes to a few hours, whereas state-of-the-art fails to fix the bug or even degrades the test accuracy.",2018,ESEC/SIGSOFT FSE,,10.1145/3236024.3236082,https://www.cs.purdue.edu/homes/ma229/papers/FSE18.pdf
089c6224cfbcf5c18b63564eb65001c7c42a7acf,0,,,0,1,0,0,0,0,0,0,0,0,0,Knockoff Nets: Stealing Functionality of Black-Box Models,"Machine Learning (ML) models are increasingly deployed in the wild to perform a wide range of tasks. In this work, we ask to what extent can an adversary steal functionality of such ``victim'' models based solely on blackbox interactions: image in, predictions out. In contrast to prior work, we study complex victim blackbox models, and an adversary lacking knowledge of train/test data used by the model, its internals, and semantics over model outputs. We formulate model functionality stealing as a two-step approach: (i) querying a set of input images to the blackbox model to obtain predictions; and (ii) training a ``knockoff'' with queried image-prediction pairs. We make multiple remarkable observations: (a) querying random images from a different distribution than that of the blackbox training data results in a well-performing knockoff; (b) this is possible even when the knockoff is represented using a different architecture; and (c) our reinforcement learning approach additionally improves query sample efficiency in certain settings and provides performance gains. We validate model functionality stealing on a range of datasets and tasks, as well as show that a reasonable knockoff of an image analysis API could be created for as little as $30.",2019,2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR),1812.02766,10.1109/CVPR.2019.00509,https://arxiv.org/pdf/1812.02766.pdf
0981a71137c64ec2628714017db5b016571b26f4,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,Common CNN-based Face Embedding Spaces are (Almost) Equivalent,"CNNs are the dominant method for creating face embeddings for recognition. It might be assumed that, since these networks are distinct, complex, nonlinear functions, that their embeddings are network specific, and thus have some degree of anonymity. However, recent research has shown that distinct networks' features can be directly mapped with little performance penalty (median 1.9% reduction across 90 distinct mappings) in the context of the 1,000 object ImageNet recognition task. This finding has revealed that embeddings coming from different systems can be meaningfully compared, provided the mapping. However, prior work only considered networks trained and tested on a closed set classification task. Here, we present evidence that a linear mapping between feature spaces can be easily discovered in the context of open set face recognition. Specifically, we demonstrate that the feature spaces of four face recognition models, of varying architecture and training datasets, can be mapped between with no more than a 1.0% penalty in recognition accuracy on LFW . This finding, which we also replicate on YouTube Faces, demonstrates that embeddings from different systems can be readily compared once the linear mapping is determined. In further analysis, fewer than 500 pairs of corresponding embeddings from two systems are required to calculate the full mapping between embedding spaces, and reducing the dimensionality of the mapping from 512 to 64 produces negligible performance penalty.",2020,ArXiv,,,
09a179cc3a195f2d414d46d3378eb74d0baa0292,0,,,0,1,0,0,0,0,0,0,0,0,0,Deformable Generator Network: Unsupervised Disentanglement of Appearance and Geometry,"We present a deformable generator model to disentangle the appearance and geometric information for both image and video data in a purely unsupervised manner. The appearance generator network models the information related to appearance, including color, illumination, identity or category, while the geometric generator performs geometric warping, such as rotation and stretching, through generating deformation field which is used to warp the generated appearance to obtain the final image or video sequences. Two generators take independent latent vectors as input to disentangle the appearance and geometric information from image or video sequences. For video data, a nonlinear transition model is introduced to both the appearance and geometric generators to capture the dynamics over time. The proposed scheme is general and can be easily integrated into different generative models. An extensive set of qualitative and quantitative experiments shows that the appearance and geometric information can be well disentangled, and the learned geometric generator can be conveniently transferred to other image datasets that share similar structure regularity to facilitate knowledge transfer tasks.",2020,IEEE transactions on pattern analysis and machine intelligence,1806.06298,10.1109/tpami.2020.3013905,http://www.stat.ucla.edu/~sczhu/papers/Conf_2019/CVPR_2019_Deep_AAM_1806.06298.pdf
09f2653e2caff8e84f49e346ed763ba9f50a3d2e,0,,,0,0,0,0,0,1,0,0,0,0,0,Profile Face Image Frontalization based on landmark points and 3D Generic Elastic Model,"The first step in most face recognition systems is the alignment of the detected faces. When faces exhibit large pose variations, the alignment process must be able to generate frontal faces. Different methods have been proposed for face frontalization, but most of them cannot reconstruct a frontal image from a fully profile face. In this work, a frontalization method based on 3D Generic Elastic Models (3DGEM) is extended with the goal of recovering frontal images from profile faces. First, it is determined whether the image corresponds to a right-profile or a left-profile face. An Active Shape Model (ASM) is trained to detect the landmark points on profile faces. Then, a relationship is established between the profile landmark points and the points located on the 3D model, which is efficiently fitted to the image to be frontalized. Facial symmetry is taken into account to project the appearance of the frontalized face. The proposal is evaluated by frontalizing the face images in the ICB-RW and CFPW databases. The importance of frontalization for correct face recognition is shown.",2019,,,,http://scielo.sld.cu/pdf/eac/v40n3/1815-5928-eac-40-03-72.pdf
0a869336c65185f078ba473d7ca5b86a371ab929,0,,,0,1,0,0,0,0,0,0,0,0,0,MisGAN: Learning from Incomplete Data with Generative Adversarial Networks,"Generative adversarial networks (GANs) have been shown to provide an effective way to model complex distributions and have obtained impressive results on various challenging tasks. However, typical GANs require fully-observed data during training. In this paper, we present a GAN-based framework for learning from complex, high-dimensional incomplete data. The proposed framework learns a complete data generator along with a mask generator that models the missing data distribution. We further demonstrate how to impute missing data by equipping our framework with an adversarially trained imputer. We evaluate the proposed framework using a series of experiments with several types of missing data processes under the missing completely at random assumption.",2019,ICLR,1902.09599,,https://arxiv.org/pdf/1902.09599.pdf
0ab7cff2ccda7269b73ff6efd9d37e1318f7db25,0,,,1,1,0,0,0,0,0,0,0,0,1,Facial Coding Scheme Reference 1 Craniofacial Distances,"Face recognition is a long-standing challenge in the field of Artificial Intelligence (AI). The goal is to create systems that detect, recognize, verify and understand characteristics of human faces. There are significant technical hurdles in making these systems accurate, particularly in unconstrained settings, due to confounding factors related to pose, resolution, illumination, occlusion and viewpoint. However, with recent advances in neural networks, face recognition has achieved unprecedented accuracy, built largely on data-driven deep learning methods. While this is encouraging, a critical aspect limiting face recognition performance in practice is intrinsic facial diversity. Every face is different. Every face reflects something unique about us. Aspects of our heritage – including race, ethnicity, culture, geography – and our individual identity – age, gender and visible forms of self-expression – are reflected in our faces. Faces are personal. We expect face recognition to work accurately for each of us. Performance should not vary for different individuals or different populations. As we rely on data-driven methods to create face recognition technology, we need to answer a fundamental question: does the training data for these systems fairly represent the distribution of faces we see in the world? At the heart of this core question are deeper scientific questions about how to measure facial diversity, what features capture intrinsic facial variation and how to evaluate coverage and balance for face image data sets. Towards the goal of answering these questions, Diversity in Faces (DiF ) provides a new data set of annotations of one million publicly available face images for advancing the study of facial diversity. The annotations are generated using ten facial coding schemes that provide human-interpretable quantitative measures of intrinsic facial features. We believe that making these descriptors available will encourage deeper research on this important topic and accelerate efforts towards creating more fair and accurate face recognition systems.",2019,,,,https://www.research.ibm.com/artificial-intelligence/trusted-ai/diversity-in-faces/documents/Diversity-in-Faces-Publication.pdf
0b37e25d1efda01ca8950eda36e8a012d3b996ee,0,,,0,1,0,0,0,0,0,0,0,0,0,Adversarial Code Learning for Image Generation,"We introduce the ""adversarial code learning"" (ACL) module that improves overall image generation performance to several types of deep models. Instead of performing a posterior distribution modeling in the pixel spaces of generators, ACLs aim to jointly learn a latent code with another image encoder/inference net, with a prior noise as its input. We conduct the learning in an adversarial learning process, which bears a close resemblance to the original GAN but again shifts the learning from image spaces to prior and latent code spaces. ACL is a portable module that brings up much more flexibility and possibilities in generative model designs. First, it allows flexibility to convert non-generative models like Autoencoders and standard classification models to decent generative models. Second, it enhances existing GANs' performance by generating meaningful codes and images from any part of the prior. We have incorporated our ACL module with the aforementioned frameworks and have performed experiments on synthetic, MNIST, CIFAR-10, and CelebA datasets. Our models have achieved significant improvements which demonstrated the generality for image generation tasks.",2020,ArXiv,2001.11539,,https://arxiv.org/pdf/2001.11539.pdf
0ba6614ff9ed1cd00e07d44c5c61879958e7566b,0,,,0,0,0,0,0,1,0,0,0,0,0,Representation Learning by Rotating Your Faces,"The large pose discrepancy between two face images is one of the fundamental challenges in automatic face recognition. Conventional approaches to pose-invariant face recognition either perform face frontalization on, or learn a pose-invariant representation from, a non-frontal face image. We argue that it is more desirable to perform both tasks jointly to allow them to leverage each other. To this end, this paper proposes a Disentangled Representation learning-Generative Adversarial Network (DR-GAN) with three distinct novelties. First, the encoder-decoder structure of the generator enables DR-GAN to learn a representation that is both generative and discriminative, which can be used for face image synthesis and pose-invariant face recognition. Second, this representation is explicitly disentangled from other face variations such as pose, through the pose code provided to the decoder and pose estimation in the discriminator. Third, DR-GAN can take one or multiple images as the input, and generate one unified identity representation along with an arbitrary number of synthetic face images. Extensive quantitative and qualitative evaluation on a number of controlled and in-the-wild databases demonstrate the superiority of DR-GAN over the state of the art in both learning representations and rotating large-pose face images.",2019,IEEE Transactions on Pattern Analysis and Machine Intelligence,1705.11136,10.1109/TPAMI.2018.2868350,https://arxiv.org/pdf/1705.11136.pdf
0bc1f4e7e59c4268166db5f16353a56d333616e6,0,,,0,1,0,0,0,0,0,0,0,0,0,Bridged Variational Autoencoders for Joint Modeling of Images and Attributes,"Generative models have recently shown the ability to realistically generate data and model the distribution accurately. However, joint modeling of an image with the attribute that it is labeled with requires learning a cross modal correspondence between image and attribute data. Though the information present in a set of images and its attributes possesses completely different statistical properties altogether, there exists an inherent correspondence that is challenging to capture. Various models have aimed at capturing this correspondence either through joint modeling of a variational autoencoder or through separate encoder networks that are then concatenated. We present an alternative by proposing a bridged variational autoencoder that allows for learning cross-modal correspondence by incorporating cross-modal hallucination losses in the latent space. In comparison to the existing methods, we have found that by using a bridge connection in latent space we not only obtain better generation results, but also obtain highly parameter-efficient model which provide 40% reduction in training parameters for bimodal dataset and nearly 70% reduction for trimodal dataset. We validate the proposed method through comparison with state of the art methods and benchmarking on standard datasets.",2020,2020 IEEE Winter Conference on Applications of Computer Vision (WACV),,10.1109/WACV45572.2020.9093565,http://openaccess.thecvf.com/content_WACV_2020/papers/Yadav_Bridged_Variational_Autoencoders_for_Joint_Modeling_of_Images_and_Attributes_WACV_2020_paper.pdf
0c24a2c32d23dce1b7e229fd7ab26d9064ea6b1f,0,,,1,0,0,0,0,0,0,0,0,0,0,SuperPatchMatch : Un algorithme de correspondances robustes de patchs de superpixels,"Superpixels have become very popular in many computer vision applications. Nevertheless, they remain underexploited because of the irregularity of the decompositions that differ according to the images. In this work, we first introduce a novel structure, a superpixel-based patch, called SuperPatch. The proposed structure, based on superpixel neighborhood, leads to a robust descriptor including spatial relations between neighboring superpixels. The generalization of the search matching method PatchMatch to these SuperPatches, named SuperPatchMatch, is introduced. Finally, we propose a framework to perform automatic labeling from a library of example images. We demonstrate the potential of our approach by outperforming learning-based approaches on face labeling experiments.",2018,,,,https://pdfs.semanticscholar.org/0c24/a2c32d23dce1b7e229fd7ab26d9064ea6b1f.pdf
0cd28730e3f1643945417faa4d1858bfdf687d60,0,,,0,1,0,0,0,0,0,0,0,0,0,Pixel Transposed Convolutional Networks,"Transposed convolutional layers have been widely used in a variety of deep models for up-sampling, including encoder-decoder networks for semantic segmentation and deep generative models for unsupervised learning. One of the key limitations of transposed convolutional operations is that they result in the so-called checkerboard problem. This is caused by the fact that no direct relationship exists among adjacent pixels on the output feature map. To address this problem, we propose the pixel transposed convolutional layer (PixelTCL) to establish direct relationships among adjacent pixels on the up-sampled feature map. Our method is based on a fresh interpretation of the regular transposed convolutional operation. The resulting PixelTCL can be used to replace any transposed convolutional layer in a plug-and-play manner without compromising the fully trainable capabilities of original models. The proposed PixelTCL may result in slight decrease in efficiency, but this can be overcome by an implementation trick. Experimental results on semantic segmentation demonstrate that PixelTCL can consider spatial features such as edges and shapes and yields more accurate segmentation outputs than transposed convolutional layers. When used in image generation tasks, our PixelTCL can largely overcome the checkerboard problem suffered by regular transposed convolutional operations.",2020,IEEE Transactions on Pattern Analysis and Machine Intelligence,,10.1109/TPAMI.2019.2893965,
0d307221fa52e3939d46180cb5921ebbd92c8adb,0,,,1,0,0,0,0,0,0,0,0,0,0,Word Spotting in the Wild,"We present a method for spotting words in the wild, i.e., in real images taken in unconstrained environments. Text found in the wild has a surprising range of difficulty. At one end of the spectrum, Optical Character Recognition (OCR) applied to scanned pages of well formatted printed text is one of the most successful applications of computer vision to date. At the other extreme lie visual CAPTCHAs - text that is constructed explicitly to fool computer vision algorithms. Both tasks involve recognizing text, yet one is nearly solved while the other remains extremely challenging. In this work, we argue that the appearance of words in the wild spans this range of difficulties and propose a new word recognition approach based on state-of-the-art methods from generic object recognition, in which we consider object categories to be the words themselves. We compare performance of leading OCR engines - one open source and one proprietary - with our new approach on the ICDAR Robust Reading data set and a new word spotting data set we introduce in this paper: the Street View Text data set. We show improvements of up to 16% on the data sets, demonstrating the feasibility of a new approach to a seemingly old problem.",2010,ECCV,,10.1007/978-3-642-15549-9_43,https://vision.cornell.edu/se3/wp-content/uploads/2014/09/wang_eccv2010.pdf
0dcdef6b8d97483f4d4dab461e1cb5b3c4d1fe1a,0,,,0,1,0,0,0,0,0,0,0,0,0,Probabilistic Semantic Inpainting with Pixel Constrained CNNs,"Semantic inpainting is the task of inferring missing pixels in an image given surrounding pixels and high level image semantics. Most semantic inpainting algorithms are deterministic: given an image with missing regions, a single inpainted image is generated. However, there are often several plausible inpaintings for a given missing region. In this paper, we propose a method to perform probabilistic semantic inpainting by building a model, based on PixelCNNs, that learns a distribution of images conditioned on a subset of visible pixels. Experiments on the MNIST and CelebA datasets show that our method produces diverse and realistic inpaintings.",2019,AISTATS,1810.03728,,https://arxiv.org/pdf/1810.03728.pdf
0de1450369cb57e77ef61cd334c3192226e2b4c2,1,[D19],,1,1,0,0,0,0,0,0,0,0,0,"In defense of low-level structural features and SVMs for facial attribute classification: Application to detection of eye state, Mouth State, and eyeglasses in the wild","The current trend in image analysis is to employ automatically detected feature types, such as those obtained using deep-learning techniques. For some applications, however, manually crafted features such as Histogram of Oriented Gradients (HOG) continue to yield better performance in demanding situations. This paper considers both approaches for the problem of facial attribute classification, for images obtained “in the wild.” Attributes of particular interest are eye state (open/closed), mouth state (open/closed), and eyeglasses (present/absent). We present a full face-processing pipeline that employs conventional machine learning techniques, from detection to attribute classification. Experimental results have indicated better performance using RootSIFT with a conventional support-vector machine (SVM) approach, as compared to deep-learning approaches that have been reported in the literature. Our proposed open/closed eye classifier has yielded an accuracy of 99.3% on the CEW dataset, and an accuracy of 98.7% on the ZJU dataset. Similarly, our proposed open/closed mouth classifier has achieved performance similar to deep learning. Also, our proposed presence/absence eyeglasses classifier delivered very good performance, being the best method on LFWA, and second best for the CelebA dataset. The system reported here runs at 30 fps on HD-sized video using a CPU-only implementation.",2017,2017 IEEE International Joint Conference on Biometrics (IJCB),,10.1109/BTAS.2017.8272747,
0e22687fe92f765df06495df1462fc632ea240e4,1,,1,1,0,0,0,0,0,0,0,0,0,0,Likelihood-enhanced Bayesian constrained local models,"This paper addresses to the problem of aligning images in unseen faces. The Constrained Local Models (CLM) are popular methods that combine a set of local landmark detectors whose locations are constrained to lie in a subspace spanned by a linear shape model. The CLM fitting is usually based on a two step approach: locally search, using the detectors, producing response maps (likelihood) followed by a global optimization strategy that jointly maximize all detections at once. In this paper, we mainly focus on the first stage: improving the detectors reliability. Usually the local landmarks detectors are far from perfect. Most often are designed to be fast, having a small support region and are learnt from limited data. As consequence, they will suffer from detection ambiguities. Here we propose to improve the detectors performance by considering multiple detection per landmark. In particular, we propose a joint learning of the detectors by clustering of their training data. Afterwards, the multiple likelihoods are combined using a nonlinear fusion approach. The performance evaluation shows that our (extended) approach further increases the fitting performance of the CLM formulation, when compared with recent state-of-the-art methods.",2014,2014 IEEE International Conference on Image Processing (ICIP),,10.1109/ICIP.2014.7025060,http://home.isr.uc.pt/~pedromartins/Publications/pmartins_icip2014.pdf
123ad2f9d2be35102f1205988613a602232132d0,1,[D18],,1,0,0,0,0,1,0,0,0,0,0,Feature-Improving Generative Adversarial Network for Face Frontalization,"Face frontalization can boost the performance of face recognition methods and has made significant progress with the development of Generative Adversarial Networks (GANs). However, many GAN-based face frontalization methods still perform relatively weak on face recognition tasks under large face poses. In this paper, we propose Feature-Improving GAN (FI-GAN) for face frontalization, which aims to improve the recognition performance under large face poses. We assume that there is an inherent mapping between the frontal face and profile face, and their discrepancy in deep representation space can be estimated. The generation module of FI-GAN has a compact module named Feature-Mapping Block that helps to map the features of profile face images to the frontal space. Moreover, we produce a feature discriminator that can distinguish the features of profile face images from those of ground-truth frontal face images, which guides the generation module to provide high-quality features of profile faces. We conduct experiments on the MultiPIE, Labeled Faces in the Wild (LFW), and Celebrities in Frontal-Profile (CFP) databases. Our method is comparable to state-of-the-art methods under small poses and outperforms them on large pose face recognition.",2020,IEEE Access,,10.1109/ACCESS.2020.2986079,https://ieeexplore.ieee.org/ielx7/6287639/8948470/09057608.pdf
123bbc9e6987c1374442bfebaf195409ec6c2e4e,1,[D18],,1,1,0,0,0,0,0,0,0,0,0,ByeGlassesGAN: Identity Preserving Eyeglasses Removal for Face Images,"In this paper, we propose a novel image-to-image GAN framework for eyeglasses removal, called ByeGlassesGAN, which is used to automatically detect the position of eyeglasses and then remove them from face images. Our ByeGlassesGAN consists of an encoder, a face decoder, and a segmentation decoder. The encoder is responsible for extracting information from the source face image, and the face decoder utilizes this information to generate glasses-removed images. The segmentation decoder is included to predict the segmentation mask of eyeglasses and completed face region. The feature vectors generated by the segmentation decoder are shared with the face decoder, which facilitates better reconstruction results. Our experiments show that ByeGlassesGAN can provide visually appealing results in the eyeglasses-removed face images even for semi-transparent color eyeglasses or glasses with glare. Furthermore, we demonstrate significant improvement in face recognition accuracy for face images with glasses by applying our method as a pre-processing step in our face recognition experiment.",2020,ECCV,2008.11042,10.1007/978-3-030-58526-6_15,https://arxiv.org/pdf/2008.11042.pdf
124f6992202777c09169343d191c254592e4428c,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,Visual Psychophysics for Making Face Recognition Algorithms More Explainable,"Scientific fields that are interested in faces have developed their own sets of concepts and procedures for understanding how a target model system (be it a person or algorithm) perceives a face under varying conditions. In computer vision, this has largely been in the form of dataset evaluation for recognition tasks where summary statistics are used to measure progress. While aggregate performance has continued to improve, understanding individual causes of failure has been difficult, as it is not always clear why a particular face fails to be recognized, or why an impostor is recognized by an algorithm. Importantly, other fields studying vision have addressed this via the use of visual psychophysics: the controlled manipulation of stimuli and careful study of the responses they evoke in a model system. In this paper, we suggest that visual psychophysics is a viable methodology for making face recognition algorithms more explainable. A comprehensive set of procedures is developed for assessing face recognition algorithm behavior, which is then deployed over state-of-the-art convolutional neural networks and more basic, yet still widely used, shallow and handcrafted feature-based approaches.",2018,ECCV,1803.07140,10.1007/978-3-030-01267-0_16,https://arxiv.org/pdf/1803.07140.pdf
12c36adc7329ae96bd54ef2ab415566796842b64,0,,,0,1,0,0,0,0,0,0,0,0,0,"Securing Social Identity in Mobile Platforms: Technologies for Security, Privacy and Identity Management",,2020,,,10.1007/978-3-030-39489-9,
12cb66f687e0f0a79c995c0aa08ff855ef1c4ed0,1,,1,0,0,0,0,0,0,1,0,0,0,0,Incorporating higher-order point distribution model priors into MRFs using convex quadratic programming,"Recently, with the advent of powerful optimisation algorithms for Markov random fields (MRFs), priors of high arity (more than two) have been put into practice more widely. The statistical relationship between object parts encoding shape in a covariant space, also known as the point distribution model (PDM), is a widely employed technique in computer vision which has been largely overlooked in the context of higher-order MRF models. This paper focuses on such higher-order statistical shape priors and illustrates that in a spatial transformation invariant space, these models can be formulated as convex quadratic programmes. As such, the associated energy of a PDM may be optimised efficiently using a variety of different dedicated algorithms. Moreover, it is shown that such an approach in the context of graph matching can be utilised to incorporate both a global rigid and a non-rigid deformation prior into the problem in a parametric form, a problem which has been rarely addressed in the literature. The paper then illustrates an application of PDM priors for different tasks using graphical models incorporating factors of different cardinalities.",2016,Machine Vision and Applications,,10.1007/s00138-016-0774-6,
15325f98f245c5aa9261339cff87e9adb4f713e9,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,Foresight: Real Time Facial Detection and Recognition Using WebAssembly and Localized Deep Neural Networks,"The emergence of facial recognition technology is an appealing solution to address the many present-day needs for verification of identity claims. The application of such technology has become publicly available and has proved its effectiveness as an added layer of security through native applications. Until now, there has been no previous attempt at bringing this solution to a web-based platform supporting real time classification. With frequent reports of websites being exploited and databases being leaked, there is an urgent need for an intelligent security mechanism to overlay the current traditional authentication methods, including usernames and passwords. This paper investigates the possibility of unobtrusive, continuous authentication for web applications based on facial data collected using WebAssembly driven detection algorithms. This novel detection technique has proved the viability to perform real-time image processing on the web with the possibility of achieving near native speeds. This is accompanied with a competitive server side facial recognition rate of 91.67% achieved on the Labeled Faces in the Wild dataset.",2019,2019 Conference on Information Communications Technology and Society (ICTAS),,10.1109/ICTAS.2019.8703634,
166b3ca3a43102e7d0a35d883e650ec4bbf2c3a5,0,,,0,0,0,0,0,0,0,0,0,1,0,Facial Landmark Localization in the Wild by Backbone-Branches Representation Learning,"Facial landmark localization plays a critical role in face recognition and analysis. In this paper, we propose a novel cascaded Backbone-Branches Fully Convolutional Neural Network (BB-FCN) for rapidly and accurately localizing facial landmarks in unconstrained and cluttered settings. Our proposed BB-FCN generates facial landmark response maps directly from raw images without any pre-processing. It follows a coarse-to-fine cascaded pipeline, which consists of a backbone network for roughly detecting the locations of all facial landmarks and one branch network for each type of detected landmarks for further refining their locations. Extensive experimental evaluations demonstrate that our proposed BB-FCN can significantly outperform the state of the art under both constrained (i.e. within detected facial regions only) and unconstrained settings.",2018,2018 IEEE Fourth International Conference on Multimedia Big Data (BigMM),,10.1109/BigMM.2018.8499059,https://i.cs.hku.hk/~yzyu/publication/BBFCN-BigMM2018.pdf
18d5b0d421332c9321920b07e0e8ac4a240e5f1f,1,[D20],,1,0,0,0,0,0,0,0,0,0,0,Collaborative Representation Classification Ensemble for Face Recognition,"Collaborative Representation Classification (CRC) for face recognition attracts a lot attention recently due to its good recognition performance and fast speed. Compared to Sparse Representation Classification (SRC), CRC achieves a comparable recognition performance with 10-1000 times faster speed. In this paper, we propose to ensemble several CRC models to promote the recognition rate, where each CRC model uses different and divergent randomly generated biologically-inspired features as the face representation. The proposed ensemble algorithm calculates an ensemble weight for each CRC model that guided by the underlying classification rule of CRC. The obtained weights reflect the confidences of those CRC models where the more confident CRC models have larger weights. The proposed weighted ensemble method proves to be very effective and improves the performance of each CRC model significantly. Extensive experiments are conducted to show the superior performance of the proposed method.",2015,ArXiv,1507.08064,,https://arxiv.org/pdf/1507.08064.pdf
191392fe1916f97058fe15df9f847aa1996c27d3,0,,,0,1,0,0,0,0,0,0,0,0,0,Adversarial Face De-Identification,"Recently, much research has been done on how to secure personal data, notably facial images. Face de-identification is one example of privacy protection that protects person identity by fooling intelligent face recognition systems, while typically allowing face recognition by human observers. While many face de-identification methods exist, the generated de-identified facial images do not resemble the original ones. This paper proposes the usage of adversarial examples for face de-identification that introduces minimal facial image distortion, while fooling automatic face recognition systems. Specifically, it introduces P-FGVM, a novel adversarial attack method, which operates on the image spatial domain and generates adversarial de-identified facial images that resemble the original ones. A comparison between P-FGVM and other adversarial attack methods shows that P-FGVM both protects privacy and preserves visual facial image quality more efficiently.",2019,2019 IEEE International Conference on Image Processing (ICIP),,10.1109/ICIP.2019.8803803,
19705579b8e7d955092ef54a22f95f557a455338,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,Fiducial facial point extraction with cross ratio,"Automatic extraction of fiducial facial points is one of the key steps to face tracking, recognition and animation as well as video communication. In this paper, we present a method to localize 8 fiducial points in a face image with cross ratio (CR), a fundamental projective invariant. We derive strong shape priors, which characterize the intrinsic geometries shared by human faces, from CR statistics on a moderate size (515) of frontal upright faces. We combine these shape priors with Gabor textural features and edge/corner into a convex optimization. The Gabor features of local patches and geometric constraints from CR are insensitive to global perspective transformations. Thereafter, the proposed approach renders the robustness to great pose or viewpoint changes. Extensive experiments on facial images from several data sets with great variations on expressions, illuminations and poses demonstrate the effectiveness of the proposed approach.",2014,2014 IEEE International Conference on Image Processing (ICIP),,10.1109/ICIP.2014.7025277,
19c53302bda8a82ec40d314a85b1713f43058a1a,0,,,1,0,0,0,0,0,1,0,0,0,0,Deep learning models of biological visual information processing,"Improved computational models of biological vision can shed light on key processes contributing to the high accuracy of the human visual system. Deep learning models, which extract multiple layers of increasingly complex features from data, achieved recent breakthroughs on visual tasks. This thesis proposes such flexible data-driven models of biological vision and also shows how insights regarding biological visual processing can lead to advances within deep learning. To harness the potential of deep learning for modelling the retina and early vision, this work introduces a new dataset and a task simulating an early visual processing function and evaluates deep belief networks (DBNs) and deep neural networks (DNNs) on this input. The models are shown to learn feature detectors similar to retinal ganglion and V1 simple cells and execute early vision tasks. To model high-level visual information processing, this thesis proposes novel deep learning architectures and training methods. Biologically inspired Gaussian receptive field constraints are imposed on restricted Boltzmann machines (RBMs) to improve the fidelity of the data representation to encodings extracted by visual processing neurons. Moreover, concurrently with learning local features, the proposed local receptive field constrained RBMs (LRF-RBMs) automatically discover advantageous non-uniform feature detector placements from data. Following the hierarchical organisation of the visual cortex, novel LRF-DBN and LRF-DNN models are constructed using LRF-RBMs with gradually increasing receptive field sizes to extract consecutive layers of features. On a challenging face dataset, unlike DBNs, LRF-DBNs learn a feature hierarchy exhibiting hierarchical part-based composition. Also, the proposed deep models outperform DBNs and DNNs on face completion and dimensionality reduction, thereby demonstrating the strength of methods inspired by biological visual processing.",2016,,,,http://eprints.nottingham.ac.uk/35561/1/thesis_DianaTurcsany.pdf
1a11c31ccf0238c6d294cdf5f3c0b08f75679877,0,,,0,1,0,0,0,0,0,0,0,0,0,Knowledge Distillation in Deep Learning and its Applications,"Deep learning based models are relatively large, and it is hard to deploy such models on resource-limited devices such as mobile phones and embedded devices. One possible solution is knowledge distillation whereby a smaller model (student model) is trained by utilizing the information from a larger model (teacher model). In this paper, we present a survey of knowledge distillation techniques applied to deep learning models. To compare the performances of different techniques, we propose a new metric called distillation metric. Distillation metric compares different knowledge distillation algorithms based on sizes and accuracy scores. Based on the survey, some interesting conclusions are drawn and presented in this paper.",2020,ArXiv,2007.09029,,https://arxiv.org/pdf/2007.09029.pdf
1a34974b2191541b0f47abbbf50f8e6b2af8cbf2,0,,,0,0,0,0,0,0,0,0,0,1,0,Joint Face Alignment: Rescue Bad Alignments with Good Ones by Regularized Re-fitting,"Nowadays, more and more applications need to jointly align a set of facial images from one specific person, which forms the so-called joint face alignment problem. To address this problem, in this paper, starting from an initial face alignment results, we propose to enhance the alignments by a fundamentally novel idea: rescuing the bad alignments with their well-aligned neighbors. In our method, a discriminative alignment evaluator is well designed to assess the initial face alignments and separate the well-aligned images from the badly-aligned ones. To correct the bad ones, a robust regularized re-fitting algorithm is proposed by exploiting the appearance consistency between the badly-aligned image and its k well-aligned nearest neighbors. Experiments conducted on faces in the wild demonstrate that our method greatly improves the initial face alignment results of an off-the-shelf facial landmark locator. In addition, the effectiveness of our method is validated through comparing with other state-of-the-art methods in joint face alignment under complex conditions.",2012,ECCV,,10.1007/978-3-642-33709-3_44,
1a40c2a2d17c52c8b9d20648647d0886e30a60fa,1,"[D18], [D21]",,1,0,0,0,0,1,0,0,0,0,0,Hybrid hypergraph construction for facial expression recognition,"In this paper, we proposed a novel framework for facial expression recognition, in which face images were taken as vertices in a hypergraph and the task of expression recognition was formulated as the problem of hypergraph based inference. A hybrid strategy was developed to construct hyperedges: we generated probabilities of facial action units by deep convolutional networks and took each action unit as an ‘attribute’ to represent a hyperedge; we also formed hyperedges by using embedded network features before the last full connected layer to perform local clustering. In this way, each face image was assigned to various hyperedges by exploiting the representational power of deep convolutional networks. Our facial expression recognition system generates expression labels by a hypergraph based transductive inference approach, which tends to assign the same label to vertices that share many incidental hyperedges, with the constraints that predicted labels of training images should be similar to their ground truth labels. We compared the proposed approach to state-of-the-art methods and its effectiveness was demonstrated by extensive experimentation.",2016,2016 23rd International Conference on Pattern Recognition (ICPR),,10.1109/ICPR.2016.7900283,
1b797ec702b511613126579028d93ded341b70eb,0,,,0,1,0,0,0,0,0,0,0,0,0,Dairy Goat Image Generation Based on Improved-Self-Attention Generative Adversarial Networks,"The lack of long-range dependence in convolutional neural networks causes weaker performance in generative adversarial networks (GANs) with regard to generating image details. The self-attention generative adversarial network (SAGAN) uses the self-attention mechanism to calculate the correlation coefficient between feature vectors, which improves the global coherence of the network. In this paper, we put forward improved-self-attention GANs (Improved-SAGAN) to improve the method for calculating correlation in the SAGAN. We can better measure the correlation between features by normalizing the feature vectors to eliminate as many errors caused by noise as possible. As the network learns the global information by calculating the correlation coefficient between all features, it can make up for the defects of local receptive field in the convolution network. We replace the conventional one-hot label with multi-label to obtain more supervised information for generative adversarial networks. We generate dairy goat images based on an auxiliary condition generative adversarial network (ACGAN) incorporating the normalized self-attention mechanism and prove that images generated under multi-label are of higher quality than images generated under one-hot label. The generative results of different networks on the public dataset are compared by the inception score and FID evaluation algorithms, and we propose a new evaluation algorithm called SSIM-Mean to measure the quality of generated dairy goat images to further verify the effectiveness of the improved-self-attention GANs.",2020,IEEE Access,,10.1109/ACCESS.2020.2981496,https://ieeexplore.ieee.org/ielx7/6287639/8948470/09039669.pdf
1d2b92dc49f3c3d69f629cf89c2d20d47feac532,0,,,0,1,0,0,0,0,0,0,0,0,0,Latent space mapping for generation of object elements with corresponding data annotation,"Deep neural generative models such as Variational Auto-Encoders (VAE) and Generative Adversarial Networks (GAN) give promising results in estimating the data distribution across a range of machine learning fields of application. Recent results have been especially impressive in image synthesis where learning the spatial appearance information is a key goal. This enables the generation of intermediate spatial data that corresponds to the original dataset. In the training stage, these models learn to decrease the distance of their output distribution to the actual data and, in the test phase, they map a latent space to the data space. Since these models have already learned their latent space mapping, one question is whether there is a function mapping the latent space to any aspect of the database for the given generator. In this work, it has been shown that this mapping is relatively straightforward using small neural network models and by minimizing the mean square error. As a demonstration of this technique, two example use cases have been implemented: firstly, the idea to generate facial images with corresponding landmark data and secondly, generation of low-quality iris images (as would be captured with a smartphone user-facing camera) with a corresponding ground-truth segmentation contour.",2018,Pattern Recognit. Lett.,,10.1016/j.patrec.2018.10.025,
1d3dd9aba79a53390317ec1e0b7cd742cba43132,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,A maximum entropy feature descriptor for age invariant face recognition,"In this paper, we propose a new approach to overcome the representation and matching problems in age invariant face recognition. First, a new maximum entropy feature descriptor (MEFD) is developed that encodes the microstructure of facial images into a set of discrete codes in terms of maximum entropy. By densely sampling the encoded face image, sufficient discriminatory and expressive information can be extracted for further analysis. A new matching method is also developed, called identity factor analysis (IFA), to estimate the probability that two faces have the same underlying identity. The effectiveness of the framework is confirmed by extensive experimentation on two face aging datasets, MORPH (the largest public-domain face aging dataset) and FGNET. We also conduct experiments on the famous LFW dataset to demonstrate the excellent generalizability of our new approach.",2015,2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR),,10.1109/CVPR.2015.7299166,
1dcaa4803f58f230a1feac6ae20c00bd9d7c65c4,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,Quadruplet-Center Loss for Face Verification,"Deep learning for face verification applications has proven to be productive. Most existing face verification methods focus on enhancing the discriminative power of the deeply learned features with softmax loss or learning discriminative features with deep metric learning, which have their own advantages, but the combination of these two directions is more or less ignored. In this paper, a novel loss named quadruplet-center loss is proposed to learn more discriminative features for the face verification task. The proposed quadruplet-center loss learns a center for deep features of each class, which forces the distances between the samples and centers from different classes to be larger than those from the same class, regardless of whether they contain different probe images or not. It is worth mentioning that a dynamic margin is presented based on the average distance between samples and corresponding centers for loss functions in a batch. Our method is evaluated on two widely-used benchmarks for face verification, which outperforms most of the state-of-the-art algorithms. The experimental results clearly demonstrate the effectiveness of our proposed classification loss.",2019,2019 Chinese Automation Congress (CAC),,10.1109/CAC48633.2019.8997490,
1e2339708811897942c18aa2da8c6eced90126bb,0,,,0,0,0,0,0,0,0,0,0,0,1,Age and Gender Prediction from Face Images Using Convolutional Neural Network,"Attribute information such as age and gender improves the performance of face recognition. This paper proposes an age and gender prediction method from face images using a convolutional neural network. Through a set of experiments using public face databases, we demonstrate that the proposed method exhibits efficient performance on age and gender prediction compared with conventional methods.",2018,2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC),,10.23919/APSIPA.2018.8659655,http://www.apsipa.org/proceedings/2018/pdfs/0000007.pdf
2003e228e7c7c5cb662cccb922cb9de559e88380,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,End-to-End Facial Deep Learning Feature Compression with Teacher-Student Enhancement,"In this paper, we propose a novel end-to-end feature compression scheme by leveraging the representation and learning capability of deep neural networks, towards intelligent front-end equipped analysis with promising accuracy and efficiency. In particular, the extracted features are compactly coded in an end-to-end manner by optimizing the rate-distortion cost to achieve feature-in-feature representation. In order to further improve the compression performance, we present a latent code level teacher-student enhancement model, which could efficiently transfer the low bit-rate representation into a high bit rate one. Such a strategy further allows us to adaptively shift the representation cost to decoding computations, leading to more flexible feature compression with enhanced decoding capability. We verify the effectiveness of the proposed model with the facial feature, and experimental results reveal better compression performance in terms of rate-accuracy compared with existing models.",2020,ArXiv,2002.03627,,https://arxiv.org/pdf/2002.03627.pdf
201d15361a78fe0d6428d939dff19f5bae0871a7,0,,,0,1,0,0,0,0,0,0,0,0,0,Facial Expression Recognition with Multi-scale Convolution Neural Network,"We present a deep convolutional neural network (CNN) architecture for facial expression recognition. Inspired by the fact that regions located around certain facial parts (e.g., mouth, nose, eyes, and brows) contain the most representative information of expressions, an architecture that extracts features at different scales from intermediate layers is designed to combine both local and global information. In addition, we notice that, specifically for facial expression recognition, traditional face alignment would distort the images and lose expression information. To avoid this side effect, we apply batch normalization to the architecture instead of face alignment and feed the network with original images. Moreover, considering the tiny differences between classes caused by the same facial movements, a triplet-loss learning method is used to train the architecture, which improves the discrimination of deep features. Experiments show that the proposed architecture achieves superior performance to other state-of-the-art methods on the FER2013 dataset.",2016,PCM,,10.1007/978-3-319-48890-5_37,
203d0eedfc7b9a4ef29afc1a4fd01fe4db6b6e99,0,,,1,1,0,0,0,0,0,0,0,0,0,A Review on Face Reenactment Techniques,"Existing Face Re-enactment approaches have two major limitations: first, they require a large dataset of images to create photo-realistic face models, and second, they do not generalize well if the facial images are not available in the training dataset. The generation of a new facial reenactment requires a large image dataset, and hours are required to train these models. Some progress in Deep Learning has shown quite significant results using Generative Adversarial Networks (GANs). Recent works in GANs have solved the problem of large training datasets by introducing the concept of few-shot learning. This paper reviews existing approaches in Face Re-enactment with few-shot learning techniques and other approaches in Face Re-enactment.",2020,2020 International Conference on Industry 4.0 Technology (I4Tech),,10.1109/I4Tech48345.2020.9102668,
20677be4f548e9268a9fd2f6c64bb621e168b7e1,0,,,0,1,0,0,0,0,0,0,0,0,0,Learning to Inpaint by Progressively Growing the Mask Regions,"Image inpainting is one of the most challenging tasks in computer vision. Recently, generative-based image inpainting methods have been shown to produce visually plausible images. However, they still have difficulties generating the correct structures and colors as the masked region grows large. This drawback is due to the training stability issue of the generative models. This work introduces a new curriculum-style training approach in the context of image inpainting. The proposed method increases the masked region size progressively at training time; at test time, the user gives multiple holes of variable size at arbitrary locations. Incorporating such an approach in GANs may stabilize the training, provide better color consistency, and capture object continuities. We validate our approach on the MSCOCO and CelebA datasets. We report qualitative and quantitative comparisons of our training approach in different models.",2019,2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW),2002.09280,10.1109/ICCVW.2019.00562,https://arxiv.org/pdf/2002.09280.pdf
20a224c5745a65f83e43a6954ea8ee053a2baae5,0,,,0,1,0,0,0,0,0,0,0,0,0,Attributing and Detecting Fake Images Generated by Known GANs,"The quality of GAN-generated fake images has improved significantly, and recent GAN approaches, such as StyleGAN, achieve near indistinguishability from real images for the naked eye. As a result, adversaries are attracted to using GAN-generated fake images for disinformation campaigns and fraud on social networks. However, training an image generation network to produce realistic-looking samples remains a time-consuming and difficult problem, so adversaries are more likely to use published GAN models to generate fake images. In this paper, we analyze the frequency domain to attribute and detect fake images generated by a known GAN model. We derive a similarity metric on the frequency domain and develop a new approach for GAN image attribution. We conduct experiments on four trained GAN models and two real image datasets. Our results show high attribution accuracy against real images and those from other GAN models. We further analyze our method under evasion attempts and find the frequency-based approach is comparatively robust.",2020,,,,https://personal.utdallas.edu/~shao/papers/joslin_dls20.pdf
21346e7fdffe3a388a62ef5edeb3a0a9736b903b,1,[D20],,1,0,0,1,0,0,0,0,0,0,0,Learning hierarchical representations for face verification,"Most modern face recognition systems rely on a feature representation given by a hand-crafted image descriptor, such as Local Binary Patterns (LBP), and achieve improved performance by combining several such representations. In this paper, we propose deep learning as a natural source for obtaining additional, complementary representations. To learn features in high-resolution images, we make use of convolutional deep belief networks. Moreover, to take advantage of global structure in an object class, we develop local convolutional restricted Boltzmann machines, a novel convolutional learning model that exploits the global structure by not assuming stationarity of features across the image, while maintaining scalability and robustness to small misalignments. We also present a novel application of deep learning to descriptors other than pixel intensity values, such as LBP. In addition, we compare performance of networks trained using unsupervised learning against networks with random filters, and empirically show that learning weights not only is necessary for obtaining good multilayer representations, but also provides robustness to the choice of the network architecture parameters. Finally, we show that a recognition system using only representations obtained from deep learning can achieve comparable accuracy with a system using a combination of hand-crafted image descriptors. Moreover, by combining these representations, we achieve state-of-the-art results on a real-world face verification database.",2012,CVPR 2012,,,
21e158bcda4e10da88ee8da3799a6144b60d791f,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,Population Matching Discrepancy and Applications in Deep Learning,"A differentiable estimation of the distance between two distributions based on samples is important for many deep learning tasks. One such estimation is maximum mean discrepancy (MMD). However, MMD suffers from its sensitive kernel bandwidth hyper-parameter, weak gradients, and large mini-batch size when used as a training objective. In this paper, we propose population matching discrepancy (PMD) for estimating the distribution distance based on samples, as well as an algorithm to learn the parameters of the distributions using PMD as an objective. PMD is defined as the minimum weight matching of sample populations from each distribution, and we prove that PMD is a strongly consistent estimator of the first Wasserstein metric. We apply PMD to two deep learning tasks, domain adaptation and generative modeling. Empirical results demonstrate that PMD overcomes the aforementioned drawbacks of MMD, and outperforms MMD on both tasks in terms of the performance as well as the convergence speed.",2017,NIPS,,,https://pdfs.semanticscholar.org/21e1/58bcda4e10da88ee8da3799a6144b60d791f.pdf
222f8c3df6ef628c14e07c24b2d394754a4454a4,0,,,1,0,0,0,0,0,0,0,0,0,0,Efficient resource allocation for automotive active vision systems,"Individual mobility on roads has a noticeable impact upon people’s lives, including traffic accidents resulting in severe, or even lethal injuries. Therefore, the main goal when operating a vehicle is to safely participate in road-traffic while minimising the adverse effects on our environment. This goal is pursued by road safety measures ranging from safety-oriented road design to driver assistance systems. The latter require exteroceptive sensors to acquire information about the vehicle’s current environment. In this thesis an efficient resource allocation for automotive vision systems is proposed. The notion of allocating resources implies the presence of processes that observe the whole environment and that are able to efficiently direct attentive processes. Directing attention constitutes a decision making process dependent upon the environment it operates in, the goal it pursues, and the sensor resources and computational resources it allocates. The sensor resources considered in this thesis are a subset of the multi-modal sensor system on a test vehicle provided by Audi AG, which is also used to evaluate our proposed resource allocation system. This thesis presents an original contribution in three respects. First, a system architecture designed to efficiently allocate both high-resolution sensor resources and computationally expensive processes based upon low-resolution sensor data is proposed. Second, a novel method to estimate 3-D range motion, efficient scan-patterns for spin image based classifiers, and an evaluation of track-to-track fusion algorithms present contributions in the field of data processing methods. Third, a Pareto efficient multi-objective resource allocation method is formalised, implemented, and evaluated using road traffic test sequences.",2009,,,,
2275c30c87b6aa33c691c703d8434abf8bb2b569,1,"[D18], [D20]",,1,0,0,1,0,0,0,0,0,0,0,A Novel Gaussian Mixture Model for Classification,"Gaussian Mixture Model (GMM) is a probabilistic model for representing normally distributed subpopulations within an overall population. It is usually used for unsupervised learning to learn the subpopulations and the subpopulation assignment automatically. It is also used for supervised learning or classification to learn the boundary of subpopulations. However, the performance of GMM as a classifier is not impressive compared with other conventional classifiers such as k-nearest neighbors (KNN), support vector machine (SVM), decision tree and naive Bayes. In this paper, we attempt to address this problem. We propose a GMM classifier, SC-GMM, based on the separability criterion in order to separate the Gaussian models as much as possible. This classifier finds the optimal number of Gaussian components for each class based on the separability criterion and then determines the parameters of these Gaussian components by using the expectation maximization algorithm. Extensive experiments have been carried out on classification tasks from general data mining to face verification. Results show that SC-GMM significantly outperforms the original GMM classifier. Results also show that SC-GMM is comparable in classification accuracy to three variants of GMM classifier: Akaike Information Criterion based GMM (AIC-GMM), Bayesian Information Criterion based GMM (BIC-GMM) and variational Bayesian Gaussian mixture (VBGM). However, SC-GMM is significantly more efficient than both AIC-GMM and BIC-GMM. Furthermore, compared with KNN, SVM, decision tree and naive Bayes, SC-GMM achieves competitive classification performance.",2019,"2019 IEEE International Conference on Systems, Man and Cybernetics (SMC)",,10.1109/SMC.2019.8914215,
22d51400424c93eebc136213d51c1aebc9c1f715,0,,,0,1,0,0,0,0,0,0,0,0,0,Learning Implicit Generative Models by Teaching Explicit Ones,"Implicit generative models are difficult to train as no explicit density functions are defined. Generative adversarial nets (GANs) present a minimax framework to train such models, which however can suffer from mode collapse due to the nature of the JS-divergence. This paper presents a learning by teaching (LBT) approach to learning implicit models, which intrinsically avoids the mode collapse problem by optimizing a KL-divergence rather than the JS-divergence in GANs. In LBT, an auxiliary explicit model is introduced to fit the distribution defined by the implicit model while the latter teaches the explicit model to match the data distribution. LBT is formulated as a bilevel optimization problem, whose optimal generator matches the data distribution. LBT can be naturally integrated with GANs to derive a hybrid LBT-GAN that enjoys complementary benefits. Finally, we present a stochastic gradient ascent algorithm with unrolling to solve the challenging learning problems. Experimental results demonstrate the effectiveness of our method.",2018,ArXiv,1807.03870,,https://arxiv.org/pdf/1807.03870.pdf
22e34d1577578ceaa36e573a67802f63bc586f53,0,,,0,1,0,0,0,0,0,0,0,0,0,Deep Learning based Detection of Hair Loss Levels from Facial Images,"Hair loss is a phenomenon known to affect people's morale and self-confidence. Often, the awareness of the phenomenon and the possibilities of treatment is late. This paper investigates deep learning methods for detecting hair loss levels in men from face images. In this context, a specific training dataset has been prepared with face images having varied levels of baldness. Moreover, in spite of the low visibility of hair in such images, a matching method is proposed for automatically classifying facial images with respect to pattern classification tables of male baldness from the medical area. Experimental results show the potential and the efficiency for medical, security and commercial applications.",2019,"2019 Ninth International Conference on Image Processing Theory, Tools and Applications (IPTA)",,10.1109/IPTA.2019.8936122,
234c106036964131c0f2daf76c47ced802652046,1,,1,1,0,0,0,0,0,0,0,0,0,0,Adaptive facial point detection and emotion recognition for a humanoid robot,"We propose a robust landmark detector to deal with pose variation and occlusions. SVRs and NNs are respectively used to estimate intensities of 18 selected AUs. Fuzzy c-means clustering is employed to detect seven basic and compound emotions. Our unsupervised facial point detector outperforms other supervised models. The overall development is integrated with a modern humanoid robot platform. Automatic perception of facial expressions with scaling differences, pose variations and occlusions would greatly enhance natural human robot interaction. This research proposes unsupervised automatic facial point detection integrated with regression-based intensity estimation for facial action units (AUs) and emotion clustering to deal with such challenges. The proposed facial point detector is able to detect 54 facial points in images of faces with occlusions, pose variations and scaling differences using Gabor filtering, BRISK (Binary Robust Invariant Scalable Keypoints), an Iterative Closest Point (ICP) algorithm and fuzzy c-means (FCM) clustering. Especially, in order to effectively deal with images with occlusions, ICP is first applied to generate neutral landmarks for the occluded facial elements. Then FCM is used to further reason the shape of the occluded facial region by taking the prior knowledge of the non-occluded facial elements into account. Post landmark correlation processing is subsequently applied to derive the best fitting geometry for the occluded facial element to further adjust the neutral landmarks generated by ICP and reconstruct the occluded facial region. We then conduct AU intensity estimation respectively using support vector regression and neural networks for 18 selected AUs. FCM is also subsequently employed to recognize seven basic emotions as well as neutral expressions. It also shows great potential to deal with compound and newly arrived novel emotion class detection. The overall system is integrated with a humanoid robot and enables it to deal with challenging real-life facial emotion recognition tasks.",2015,Comput. Vis. Image Underst.,,10.1016/j.cviu.2015.07.007,https://ijcter.com/papers/volume-2/issue-4/facial-point-detection-and-emotion-recognition-for-a-humanoid-robot.pdf
23eb127b0f74aa4cbd5761516dd45cf9089ad15d,1,[D20],,1,0,1,0,0,0,0,0,0,0,0,Uncorrelated regularized local Fisher discriminant analysis for face recognition,"A local Fisher discriminant analysis can work well for a multimodal problem. However, it often suffers from the undersampled problem, which makes the local within-class scatter matrix singular. We develop a supervised discriminant analysis technique called uncorrelated regularized local Fisher discriminant analysis for image feature extraction. In this technique, the local within-class scatter matrix is approximated by a full-rank matrix that not only solves the undersampled problem but also eliminates the poor impact of small and zero eigenvalues. Statistically uncorrelated features are obtained to remove redundancy. A trace ratio criterion and the corresponding iterative algorithm are employed to globally solve the objective function. Experimental results on four famous face databases indicate that our proposed method is effective and outperforms the conventional dimensionality reduction methods.",2014,J. Electronic Imaging,,10.1117/1.JEI.23.4.043017,
23f27c91fc7fee369aa99f607386c3b1b4652fff,0,,,0,1,0,0,0,0,0,0,0,0,0,MSG-CapsGAN: Multi-Scale Gradient Capsule GAN for Face Super Resolution,"One of the most useful sub-fields of Super-Resolution (SR) is face SR. Given a Low-Resolution (LR) image of a face, the High-Resolution (HR) counterpart is demanded. However, performing the SR task on extremely low-resolution images is very challenging due to the image distortion in the HR results. Many deep learning-based SR approaches have attempted to solve this issue by using attribute domain information. However, they require more complex data and even additional networks. To simplify this process and yet preserve the precision, a novel Multi-Scale Gradient GAN with Capsule Network as its discriminator is proposed in this paper. MSG-CapsGAN surpassed the state-of-the-art face SR networks in terms of PSNR. This network is a step towards a precise pose invariant SR system.",2020,"2020 International Conference on Electronics, Information, and Communication (ICEIC)",,10.1109/ICEIC49074.2020.9051244,
23f44c187f579d8fa698b639556d37046cb30c86,0,,,0,1,0,0,0,0,0,0,0,0,0,Generative Landmark Guided Face Inpainting,,2020,PRCV,,10.1007/978-3-030-60633-6_2,
2563b2adba98788a217565ba5a648f83cb75eeeb,0,,,1,0,0,0,0,0,0,0,0,0,0,Weight-Optimal Local Binary Patterns,"In this work, we have proposed a learning paradigm for obtaining weight-optimal local binary patterns (WoLBP). We first re-formulate the LBP problem into matrix multiplication with all the bitmaps flattened and then resort to the Fisher ratio criterion for obtaining the optimal weight matrix for LBP encoding. The solution is closed form and can be easily solved using one eigen-decomposition. The experimental results on the FRGC ver2.0 database have shown that the WoLBP gains significant performance improvement over traditional LBP, and such a WoLBP learning procedure can be directly ported to many other LBP variants to further improve their performances.",2014,ECCV Workshops,,10.1007/978-3-319-16181-5_11,http://vigir.missouri.edu/~gdesouza/Research/Conference_CDs/ECCV_2014/workshops/w07/W07-15.pdf
25fce91ce1b974865506c14d2e4714d8db2672d1,0,,,1,0,0,0,0,0,0,0,0,0,0,Toward a Practical Face Recognition System: Robust Alignment and Illumination by Sparse Representation,"Many classic and contemporary face recognition algorithms work well on public data sets, but degrade sharply when they are used in a real recognition system. This is mostly due to the difficulty of simultaneously handling variations in illumination, image misalignment, and occlusion in the test image. We consider a scenario where the training images are well controlled and test images are only loosely controlled. We propose a conceptually simple face recognition system that achieves a high degree of robustness and stability to illumination variation, image misalignment, and partial occlusion. The system uses tools from sparse representation to align a test face image to a set of frontal training images. The region of attraction of our alignment algorithm is computed empirically for public face data sets such as Multi-PIE. We demonstrate how to capture a set of training images with enough illumination variation that they span test images taken under uncontrolled illumination. In order to evaluate how our algorithms work under practical testing conditions, we have implemented a complete face recognition system, including a projector-based training acquisition system. Our system can efficiently and effectively recognize faces under a variety of realistic conditions, using only frontal images under the proposed illuminations as training.",2012,IEEE Transactions on Pattern Analysis and Machine Intelligence,,10.1109/TPAMI.2011.112,http://www.columbia.edu/~jw2966/papers/WWGZMM12-PAMI.pdf
269c965748104e54a309980a7e9c29e16845df35,0,,,1,1,0,0,0,0,0,0,0,0,0,Evaluating Face Tracking for Political Analysis in Japanese News Over a Long Period of Time,"TV news is the major source of political information for most people, exercising a strong influence on public opinion. Japanese news media try to carefully balance politicians’ representation, but it is important to empirically examine this balance longitudinally. However, it is a tedious, and in many cases impossible, task to achieve manually, especially when the news archive covers over a decade. We therefore rely on automatic procedures to do it computationally rather than manually. In this paper, we compare two face tracking methods as well as a traditional text-based method by using the same dataset of Japanese broadcasting news spanning over a decade. We evaluate the three methods against a manually curated random sample of NHK’s News 7, the flagship news program of Japanese public broadcasting. The first tracking method is inherited from previous works used on the same dataset and based on traditional Viola-Jones detections and VGGFace for embeddings. Our second method uses modern deep learning techniques with MTCNN for face detection, and ResNet50 trained with VGGFace2 for embeddings. We not only demonstrate that our modern implementation outperforms the two other methods, but also discuss implications and application for social scientific studies.",2019,WI,,10.1145/3358695.3360928,
273785b386eaf01be96e217a2a8aa1c2ee694c2e,0,,,0,1,0,0,0,0,0,0,0,0,0,ReRAM-based accelerator for deep learning,"Big data computing applications such as deep learning and graph analytic usually incur a large amount of data movements. Deploying such applications on conventional von Neumann architecture that separates the processing units and memory components likely leads to performance bottleneck due to the limited memory bandwidth. A common approach is to develop architecture and memory co-design methodologies to overcome the challenge. Our research follows the same strategy by leveraging resistive memory (ReRAM) to further enhance the performance and energy efficiency. Specifically, we employ the general principles behind processing-in-memory to design efficient ReRAM based accelerators that support both testing and training operations. Related circuit and architecture optimization will be discussed too.",2018,"2018 Design, Automation & Test in Europe Conference & Exhibition (DATE)",,10.23919/DATE.2018.8342118,http://alchem.usc.edu/portal/static/download/reram.pdf
27ce54d9d9212e133adee12eb21b5a03ec803a3d,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,Towards large scale multimedia indexing: A case study on person discovery in broadcast news,"The rapid growth of multimedia databases and the human interest in their peers make indices representing the location and identity of people in audio-visual documents essential for searching archives. Person discovery in the absence of prior identity knowledge requires accurate association of audio-visual cues and detected names. To this end, we present 3 different strategies to approach this problem: clustering-based naming, verification-based naming, and graph-based naming. Each of these strategies utilizes different recent advances in unsupervised face / speech representation, verification, and optimization. To have a better understanding of the approaches, this paper also provides a quantitative and qualitative comparative study of these approaches using the associated corpus of the Person Discovery challenge at MediaEval 2016. From the results of our experiments, we can observe the pros and cons of each approach, thus paving the way for future promising research directions.",2017,CBMI '17,,10.1145/3095713.3095732,https://hal.archives-ouvertes.fr/hal-01551690/file/Le_et_al_CBMI.pdf
28134e3a7e88faf2402b8420a8ebe5fe35990613,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,Face Recognition Based on Deep Learning,"As one of the non-contact biometrics, face representation has been widely used in many circumstances. However, conventional methods can no longer satisfy present demands, due to their low recognition accuracy and their restrictions in many scenarios. In this paper, we present a deep learning method to achieve facial landmark detection and unrestricted face recognition. To solve the facial landmark detection problem, this paper proposes a layer-by-layer training method for a deep convolutional neural network to help the network converge, and proposes a sample transformation method to avoid over-fitting. This method reached an accuracy of 91% on the ORL face database. To solve the face recognition problem, this paper proposes a Siamese convolutional neural network which is trained on different parts and scales of a face and concatenates the face representations. The face recognition algorithm reached an accuracy of 91% on the ORL and 81% on the LFW face database.",2014,HCC,,10.1007/978-3-319-15554-8_73,
28f6b4efb25e51eb49424185a602ee2bf2d06164,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,Virtual Portraitist: Aesthetic Evaluation of Selfies Based on Angle,"This work addresses the Huawei Grand Challenge that seeks solutions for quality improvement and functionality extension in computational photography. We propose virtual portraitist, a new method that helps users take good selfies in terms of angle. Unlike current solutions that mostly use a post-processing step to fix a photograph, the proposed method enables a novel function of recommending a good look before the photo is captured. This is achieved by using an automatic approach for estimating the aesthetic quality score of a selfie based on angle. In particular, a set of distinctive patterns discovered from a collection of online profile pictures are combined with head pose and camera orientation to rate the quality of a selfie. Experiments validate the effectiveness of the approach.",2014,MM '14,,10.1145/2647868.2656401,
2a35d20b2c0a045ea84723f328321c18be6f555c,0,,,1,0,0,0,0,0,0,0,0,0,0,Boost Picking: A Universal Method on Converting Supervised Classification to Semi-supervised Classification,"This paper proposes a universal method, Boost Picking, to train supervised classification models mainly by unlabeled data. Boost Picking only adopts two weak classifiers to estimate and correct the error. It is theoretically proved that Boost Picking could train a supervised model mainly by unlabeled data as effectively as the same model trained by 100% labeled data, only if recalls of the two weak classifiers are all greater than zero and the sum of precisions is greater than one. Based on Boost Picking, we present ""Test along with Training (TawT)"" to improve the generalization of supervised models. Both Boost Picking and TawT are successfully tested on various small data sets.",2016,ArXiv,1602.05659,,https://arxiv.org/pdf/1602.05659.pdf
2ac0ddfcf03943d3462f01b95ee7be0a3e6ef724,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,Fusing global and local features for face verification,"In the literature of neurophysiology and computer vision, global and local features have both been demonstrated to be complementary for robust face recognition and verification. In this paper, we propose an approach for face verification by fusing global and local discriminative features. In this method, global features are extracted from whole face images by Fourier transform and local features are extracted from ten different component patches by a new image representation method named Histogram of Local Phase Quantization Ordinal Measures (HOLPQOM). Experimental results on the Labeled Faces in the Wild (LFW) benchmark show the robustness of the proposed local descriptor, compared with other often-used descriptors.",2013,Other Conferences,,10.1117/12.2030875,
2ad96488aa03242a8ef83e97a1c45632dd092867,0,,,0,1,0,0,0,0,0,0,0,0,0,GuidedStyle: Attribute Knowledge Guided Style Manipulation for Semantic Face Editing,"Although significant progress has been made in synthesizing high-quality and visually realistic face images by unconditional Generative Adversarial Networks (GANs), there is still a lack of control over the generation process in order to achieve semantic face editing. In addition, it remains very challenging to keep other face information untouched while editing the target attributes. In this paper, we propose a novel learning framework, called GuidedStyle, to achieve semantic face editing on StyleGAN by guiding the image generation process with a knowledge network. Furthermore, we allow an attention mechanism in the StyleGAN generator to adaptively select a single layer for style manipulation. As a result, our method is able to perform disentangled and controllable edits along various attributes, including smiling, eyeglasses, gender, mustache and hair color. Both qualitative and quantitative results demonstrate the superiority of our method over other competing methods for semantic face editing. Moreover, we show that our model can also be applied to different types of real and artistic face editing, demonstrating strong generalization ability.",2020,,2012.11856,,https://arxiv.org/pdf/2012.11856.pdf
2ae2bd011af817e75ea3fecef75325df9651986d,0,,,0,0,0,0,0,0,0,0,0,1,0,Facial component-landmark detection,"Landmark detection has proven to be a very challenging task in biometrics. In this paper, we address the task of facial component-landmark detection. By “component” we refer to a rectangular subregion of the face, containing an anatomical component (e.g., “eye”). We present a fully-automated system for facial component-landmark detection based on multi-resolution isotropic analysis and adaptive bag-of-words descriptors incorporated into a cascade of boosted classifiers. Specifically, first each component-landmark detector is applied independently and then the information obtained is used to make inferences for the localization of multiple components. The advantage of our approach is that it has robustness to pose as well as illumination. Our method has a failure rate lower than that of commercial software. Additionally, we demonstrate that using our method for the initialization of a point landmark detector results in performance comparable with that of state-of-the-art methods. All of our experiments are carried out using data from a publicly available database.",2011,Face and Gesture 2011,,10.1109/FG.2011.5771411,http://cbl.uh.edu/pub_files/comp_land_det17_148.pdf
2b695a7ca8d1dc63309f92f914027b69dd9f8d0b,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,"Face Recognition from Multiple Stylistic Sketches: Scenarios, Datasets, and Evaluation","Matching a face sketch against mug shots, which plays an important role in law enforcement and security, is an interesting and challenging topic in the face recognition community. Although great progress has been made in recent years, the main focus of existing studies is face recognition based on a SINGLE sketch. In this paper, we present a fundamental study of face recognition from multiple stylistic sketches. Three specific scenarios with corresponding datasets are carefully introduced to mimic real-world situations: (1) recognition from multiple hand-drawn sketches; (2) recognition from a hand-drawn sketch and composite sketches; (3) recognition from multiple composite sketches. We further provide the evaluation protocols and several benchmarks on these proposed scenarios. Finally, we discuss the many challenges and possible future directions that are worth further investigation. All the materials will be publicly available online (Available at http://chunleipeng.com/FRMSketches.html.) for comparisons and further study of this problem.",2016,ECCV Workshops,,10.1007/978-3-319-46604-0_1,
2b8fe56756a2c103bc60cc81881d39a0ee163627,0,,,0,1,0,0,0,0,0,0,0,0,0,Multi-scale Generative Adversarial Learning for Facial Attribute Transfer,"Generative Adversarial Network (GAN) has shown its impressive ability on facial attribute transfer. One crucial part in facial attribute transfer is to retain the identity. To achieve this, most existing approaches employ the L1 norm to maintain the cycle consistency, which tends to cause blurry results due to the weakness of the L1 loss function. To address this problem, we introduce the Structural Similarity Index (SSIM) in our GAN training objective as the measurement between input images and reconstructed images. Furthermore, we also incorporate a multi-scale feature fusion structure into the generator to facilitate feature learning and encourage long-term correlation. Qualitative and quantitative experiments show that our method has achieved better visual quality and fidelity than the baseline on facial attribute transfer.",2019,IFTC,,10.1007/978-981-15-3341-9_8,
2c2de09ca3750f8dbdbad9b16cd70850f5594deb,1,[D34],,1,1,0,0,0,0,0,0,0,0,0,MimicGAN: Robust Projection onto Image Manifolds with Corruption Mimicking,"In the past few years, Generative Adversarial Networks (GANs) have dramatically advanced our ability to represent and parameterize high-dimensional, non-linear image manifolds. As a result, they have been widely adopted across a variety of applications, ranging from challenging inverse problems like image completion, to problems such as anomaly detection and adversarial defense. A recurring theme in many of these applications is the notion of projecting an image observation onto the manifold that is inferred by the generator. In this context, Projected Gradient Descent (PGD) has been the most popular approach, which essentially optimizes for a latent vector that minimizes the discrepancy between a generated image and the given observation. However, PGD is a brittle optimization technique that fails to identify the right projection (or latent vector) when the observation is corrupted, or perturbed even by a small amount. Such corruptions are common in the real world, for example images in the wild come with unknown crops, rotations, missing pixels, or other kinds of non-linear distributional shifts which break current encoding methods, rendering downstream applications unusable. To address this, we propose corruption mimicking—a new robust projection technique, that utilizes a surrogate network to approximate the unknown corruption directly at test time, without the need for additional supervision or data augmentation. The proposed method is significantly more robust than PGD and other competing methods under a wide variety of corruptions, thereby enabling a more effective use of GANs in real-world applications. More importantly, we show that our approach produces state-of-the-art performance in several GAN-based applications—anomaly detection, domain adaptation, and adversarial defense, that benefit from an accurate projection.",2020,International Journal of Computer Vision,1912.07748,10.1007/s11263-020-01310-5,https://arxiv.org/pdf/1912.07748.pdf
2c60d3bd2b53d9eaeae9d91d43761b0974abe704,0,,,0,1,0,0,0,0,0,0,0,0,0,Relaxed Multivariate Bernoulli Distribution and Its Applications to Deep Generative Models,"Recent advances in variational auto-encoder (VAE) have demonstrated the possibility of approximating the intractable posterior distribution with a variational distribution parameterized by a neural network. To optimize the variational objective of VAE, the reparameterization trick is commonly applied to obtain a low-variance estimator of the gradient. The main idea of the trick is to express the variational distribution as a differentiable function of parameters and a random variable with a fixed distribution. To extend the reparameterization trick to inference involving discrete latent variables, a common approach is to use a continuous relaxation of the categorical distribution as the approximate posterior. However, when applying continuous relaxation to the multivariate cases, multiple variables are typically assumed to be independent, making it suboptimal in applications where modeling dependency is crucial to the overall performance. In this work, we propose a multivariate generalization of the Relaxed Bernoulli distribution, which can be reparameterized and can capture the correlation between variables via a Gaussian copula. We demonstrate its effectiveness in two tasks: density estimation with Bernoulli VAE and semi-supervised multi-label classification.",2020,UAI,,,https://pdfs.semanticscholar.org/2c60/d3bd2b53d9eaeae9d91d43761b0974abe704.pdf
2cc541baf1f0b46e87b77a448e972e44ae55f3ca,0,,,1,0,0,0,0,0,0,0,0,0,0,On Low-Resolution Face Recognition in the Wild: Comparisons and New Techniques,"Although face recognition systems have achieved impressive performance in recent years, the low-resolution face recognition task remains challenging, especially when the low-resolution faces are captured under non-ideal conditions, which is widely prevalent in surveillance-based applications. Faces captured in such conditions are often contaminated by blur, non-uniform lighting, and non-frontal face pose. In this paper, we analyze the face recognition techniques using data captured under low-quality conditions in the wild. We provide a comprehensive analysis of the experimental results for two of the most important applications in real surveillance applications, and demonstrate practical approaches to handle both cases that show promising performance. The following three contributions are made: (i) we conduct experiments to evaluate super-resolution methods for low-resolution face recognition; (ii) we study face re-identification on various public face datasets, including real surveillance and low-resolution subsets of large-scale datasets, presenting a baseline result for several deep learning-based approaches, and improve them by introducing a generative adversarial network pre-training approach and fully convolutional architecture; and (iii) we explore the low-resolution face identification by employing a state-of-the-art supervised discriminative learning approach. The evaluations are conducted on challenging portions of the SCface and UCCSface datasets.",2019,IEEE Transactions on Information Forensics and Security,1805.11529,10.1109/TIFS.2018.2890812,https://arxiv.org/pdf/1805.11529.pdf
2dafea864f74a477414c3b71b742f7997e216102,0,,,1,0,0,0,0,0,0,0,0,0,0,Energy-Aware Mobile Edge Computing and Routing for Low-Latency Visual Data Processing,"New paradigms such as Mobile Edge Computing (MEC) are becoming feasible for use in, e.g., real-time decision-making during disaster incident response to handle the data deluge occurring in the network edge. However, MEC deployments today lack flexible IoT device data handling such as handling user preferences for real-time versus energy-efficient processing. Moreover, MEC can also benefit from a policy-based edge routing to handle sustained performance levels with efficient energy consumption. In this paper, we study the potential of MEC to address application issues related to energy management on constrained IoT devices with limited power sources, while also providing low-latency processing of visual data being generated at high resolutions. Using a facial recognition application that is important in disaster incident response scenarios, we propose a novel “offload decision-making” algorithm that analyzes the tradeoffs in computing policies to offload visual data processing (i.e., to an edge cloud or a core cloud) at low-to-high workloads. This algorithm also analyzes the impact on energy consumption in the decision-making under different visual data consumption requirements (i.e., users with thick clients or thin clients). To address the processing-throughput versus energy-efficiency tradeoffs, we propose a “Sustainable Policy-based Intelligence-Driven Edge Routing” algorithm that uses machine learning within Mobile Ad hoc Networks. This algorithm is energy aware and improves the geographic routing baseline performance (i.e., minimizes impact of local minima) for throughput performance sustainability, while also enabling flexible policy specification. We evaluate our proposed algorithms by conducting experiments on a realistic edge and core cloud testbed in the GENI Cloud infrastructure, and recreate disaster scenes of tornado damages within simulations. Our empirical results show how MEC can provide flexibility to users who desire energy conservation over low latency or vice versa in the visual data processing with a facial recognition application. In addition, our simulation results show that our routing approach outperforms existing solutions under diverse user preferences, node mobility, and severe node failure conditions.",2018,IEEE Transactions on Multimedia,,10.1109/TMM.2018.2865661,http://cell.missouri.edu/media/publications/2018_08438902.pdf
2ddabbf8b84239c42efbffbfbe7a3bfc5e2d6404,0,,,1,1,0,0,0,0,0,0,0,0,0,GAN-Leaks: A Taxonomy of Membership Inference Attacks against GANs,"In recent years, the success of deep learning has carried over from discriminative models to generative models. In particular, generative adversarial networks (GANs) have facilitated a new level of performance ranging from media manipulation to dataset re-generation. Despite the success, the potential risks of privacy breach stemming from GANs are less well explored. In this paper, we focus on membership inference attacks against GANs, which have the potential to reveal information about victim models' training data. Specifically, we present the first taxonomy of membership inference attacks, which encompasses not only existing attacks but also our novel ones. We also propose the first generic attack model that can be instantiated in various settings according to the adversary's knowledge about the victim model. We complement our systematic analysis of attack vectors with a comprehensive experimental study that investigates the effectiveness of these attacks w.r.t. model type, training configurations, and attack type across three diverse application scenarios ranging from images, over medical data, to location data. We show consistent effectiveness in all the setups, which bridges the assumption gap and performance gap in previous studies with a complete spectrum of performance across settings. We conclusively remind users to think carefully before publicizing any part of their models.",2019,ArXiv,1909.03935,,https://arxiv.org/pdf/1909.03935.pdf
3002c16de1027be0911ba2642811c68d6059d37a,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,Boosting Network Weight Separability via Feed-Backward Reconstruction,"This paper proposes a new evaluation metric and a boosting method for weight separability in neural network design. In contrast to general visual recognition methods designed to encourage both intra-class compactness and inter-class separability of latent features, we focus on estimating linear independence of column vectors in the weight matrix and improving the separability of weight vectors. To this end, we propose an evaluation metric for weight separability based on semi-orthogonality of a matrix, Frobenius distance, and the feed-backward reconstruction loss, which explicitly encourages weight separability between the column vectors in the weight matrix. The experimental results on image classification and face recognition demonstrate that the weight separability boosting via minimization of feed-backward reconstruction loss can improve the visual recognition performance, hence universally boosting the performance on various visual recognition tasks.",2020,IEEE Access,1910.09024,10.1109/ACCESS.2020.3041470,https://ieeexplore.ieee.org/ielx7/6287639/6514899/09274406.pdf
30282087b90d9103c8b428a4a6d5ebf4cc84794b,0,,,0,1,0,0,0,0,0,0,0,0,0,GMM-UNIT: Unsupervised Multi-Domain and Multi-Modal Image-to-Image Translation via Attribute Gaussian Mixture Modeling,"Unsupervised image-to-image translation (UNIT) aims at learning a mapping between several visual domains by using unpaired training images. Recent studies have shown remarkable success for multiple domains but they suffer from two main limitations: they are either built from several two-domain mappings that are required to be learned independently, or they generate low-diversity results, a problem known as mode collapse. To overcome these limitations, we propose a method named GMM-UNIT, which is based on a content-attribute disentangled representation where the attribute space is fitted with a GMM. Each GMM component represents a domain, and this simple assumption has two prominent advantages. First, it can be easily extended to most multi-domain and multi-modal image-to-image translation tasks. Second, the continuous domain encoding allows for interpolation between domains and for extrapolation to unseen domains and translations. Additionally, we show how GMM-UNIT can be constrained down to different methods in the literature, meaning that GMM-UNIT is a unifying framework for unsupervised image-to-image translation.",2020,ArXiv,2003.06788,,https://arxiv.org/pdf/2003.06788.pdf
304fa538ea00166f7100d99f2a7bb808e02547e4,0,,,0,1,0,0,0,0,0,0,0,0,0,Energy-Inspired Models: Learning with Sampler-Induced Distributions,"Energy-based models (EBMs) are powerful probabilistic models, but suffer from intractable sampling and density evaluation due to the partition function. As a result, inference in EBMs relies on approximate sampling algorithms, leading to a mismatch between the model and inference. Motivated by this, we consider the sampler-induced distribution as the model of interest and maximize the likelihood of this model. This yields a class of energy-inspired models (EIMs) that incorporate learned energy functions while still providing exact samples and tractable log-likelihood lower bounds. We describe and evaluate three instantiations of such models based on truncated rejection sampling, self-normalized importance sampling, and Hamiltonian importance sampling. These models outperform or perform comparably to the recently proposed Learned Accept/Reject Sampling algorithm and provide new insights on ranking Noise Contrastive Estimation and Contrastive Predictive Coding. Moreover, EIMs allow us to generalize a recent connection between multi-sample variational lower bounds and auxiliary variable variational inference. We show how recent variational bounds can be unified with EIMs as the variational family.",2019,NeurIPS,1910.14265,,https://arxiv.org/pdf/1910.14265.pdf
3060ac37dec4633ef69e7bc63488548ab3511f61,0,,,0,0,0,0,0,1,0,0,0,0,0,A hybrid deep learning neural approach for emotion recognition from facial expressions for socially assistive robots,"We have recently seen significant advancements in the development of robotic machines that are designed to assist people with their daily lives. Socially assistive robots are now able to perform a number of tasks autonomously and without human supervision. However, if these robots are to be accepted by human users, there is a need to focus on the form of human–robot interaction that is seen as acceptable by such users. In this paper, we extend our previous work, originally presented in Ruiz-Garcia et al. (in: Engineering applications of neural networks: 17th international conference, EANN 2016, Aberdeen, UK, September 2–5, 2016, proceedings, pp 79–93, 2016. https://doi.org/10.1007/978-3-319-44188-7_6), to provide emotion recognition from human facial expressions for application on a real-time robot. We expand on previous work by presenting a new hybrid deep learning emotion recognition model and preliminary results using this model on real-time emotion recognition performed by our humanoid robot. The hybrid emotion recognition model combines a Deep Convolutional Neural Network (CNN) for self-learnt feature extraction and a Support Vector Machine (SVM) for emotion classification. Compared to more complex approaches that use more layers in the convolutional model, this hybrid deep learning model produces a state-of-the-art classification rate of 96.26% when tested on the Karolinska Directed Emotional Faces dataset (Lundqvist et al. in The Karolinska Directed Emotional Faces—KDEF, 1998), and offers similar performance on unseen data when tested on the Extended Cohn–Kanade dataset (Lucey et al. in: Proceedings of the third international workshop on CVPR for human communicative behaviour analysis (CVPR4HB 2010), San Francisco, USA, pp 94–101, 2010). This architecture also takes advantage of batch normalisation (Ioffe and Szegedy in Batch normalization: accelerating deep network training by reducing internal covariate shift. http://arxiv.org/abs/1502.03167, 2015) for fast learning from a smaller number of training samples. A comparison between Gabor filters and CNN for feature extraction, and between SVM and multilayer perceptron for classification is also provided.",2018,Neural Computing and Applications,,10.1007/s00521-018-3358-8,http://eprints.leedsbeckett.ac.uk/4939/6/AHybridDeepLearningNeural%20pproachforEmotionAM-ALTAHHAN.pdf
309352addd5225ef997ec1520f0489c00e14c14a,0,,,1,0,0,0,0,0,0,0,0,0,0,Machine Learning and Understanding for Intelligent Extreme Scale Scientific Computing and Discovery DOE Workshop Report,"Cover: Machine learning techniques can be applied to a wide range of DOE research areas, such as automatically identifying weather phenomena in massive simulation datasets.",2015,,,,http://web.eecs.utk.edu/~mberry/ascr/ML_DOE_Report_5.pdf
30eeb0331e281fb64e093507ee9f76e9f3818767,0,,,0,0,0,0,0,0,0,1,0,0,0,"Enhancement in speaker recognition for optimized speech features using GMM, SVM and 1-D CNN","Contemporary automatic speaker recognition (ASR) systems do not provide 100% accuracy, making it imperative to explore different techniques to improve it. Easy access to mobile devices and advances in sensor technology have made voice a preferred parameter for biometrics. Here, a comparative analysis of the accuracies obtained in ASR is presented, employing the classical Gaussian mixture model (GMM), the support vector machine (SVM) machine learning algorithm, and the state-of-the-art 1-D CNN as classifiers. The authors propose considering dynamic voice features along with static features, as the relevant speaker information in them leads to a substantial improvement in ASR accuracy. As concatenation of features leads to redundancy and increased computational complexity, the Fisher score algorithm was employed to select the best contributing features, resulting in an improvement in accuracy. The results indicate that the SVM and the 1-D neural network outperform the GMM. The SVM and 1-D CNN gave comparable results, with the 1-D CNN giving an improved accuracy of 94.77% in ASR.",2020,,,10.1007/s10772-020-09771-2,
3136cab00cfb223ceb9aff78af2c165b6e71a878,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,Open source biometric recognition,"The biometrics community enjoys an active research field that has produced algorithms for several modalities suitable for real-world applications. Despite these developments, there exist few open source implementations of complete algorithms that are maintained by the community or deployed outside a laboratory environment. In this paper we motivate the need for more community-driven open source software in the field of biometrics and present OpenBR as a candidate to address this deficiency. We overview the OpenBR software architecture and consider still-image frontal face recognition as a case study to illustrate its strengths and capabilities. All of our work is available at www.openbiometrics.org.",2013,"2013 IEEE Sixth International Conference on Biometrics: Theory, Applications and Systems (BTAS)",,10.1109/BTAS.2013.6712754,http://biometrics.cse.msu.edu/Publications/GeneralBiometrics/Klontzetal_OpenSourceBiometricRecognition_BTAS13.pdf
314c4c95694ff12b3419733db387476346969932,1,"[D18], [D28], [D27]",,1,0,0,0,0,0,0,0,0,0,0,Adaptive Metric Learning with the Low Rank Constraint,"Good quality distance metrics can significantly promote the performance of many computer vision applications. In order to learn an appropriate distance metric, most of the existing metric learning approaches restrict the learned distances between similar pairs to be smaller than a given lower bound, while the learned distances between dissimilar pairs are required to be larger than a given upper bound. However, the learned metrics may not perform well when leveraging such fixed bounds, especially when the data distributions are complex in practical applications. Besides, most methods attempt to learn a distance metric with a full rank matrix transformation from the given training data, which is not only inefficient to compute but also prone to overfitting. In this paper, we propose an Adaptive Metric Learning with the Low Rank Constraint (AML-LR) method, which restricts the learned distances between examples of pairs using adaptive bounds while the rank of the learned matrix is minimized. Therefore, the learned metric is adaptive to different data distributions and robust against overfitting. To solve the proposed optimization problem efficiently, we present an effective optimization algorithm based on the accelerated gradient method. Experimental results on UCI datasets and face verification databases demonstrate that AML-LR achieves competitive results compared with other state-of-the-art metric learning methods.",2016,ICIMCS'16,,10.1145/3007669.3007672,
31ee9185ae5d97d91bc31322a3fdb431eac66cb9,0,,,1,0,0,0,0,0,0,0,0,0,0,Feature Fusion with Covariance Matrix Regularization in Face Recognition,"The fusion of multiple features is important for achieving state-of-the-art face recognition results. This has been proven in both traditional and deep learning approaches. Existing feature fusion methods either reduce the dimensionality of each feature first and then concatenate all low-dimensional feature vectors, named as DR-Cat, or vice versa, named as Cat-DR. However, DR-Cat ignores the correlation information between different features which is useful for classification. In Cat-DR, on the other hand, the correlation information estimated from the training data may not be reliable especially when the number of training samples is limited. We propose a covariance matrix regularization (CMR) technique to solve the problems of DR-Cat and Cat-DR. It works by assigning weights to cross-feature covariances in the covariance matrix of training data. Thus, the feature correlation estimated from training data is regularized before being used to train the feature fusion model. The proposed CMR is applied to 4 feature fusion schemes: fusion of pixel values from 3 color channels, fusion of LBP features from 3 color channels, fusion of pixel values and LBP features from a single color channel, and fusion of CNN features extracted by 2 deep models. Extensive experiments of face recognition and verification are conducted on databases including MultiPIE, Georgia Tech, AR and LFW. Results show that the proposed CMR technique significantly and consistently outperforms the best single feature, DR-Cat and Cat-DR.",2019,,,,
32fbd7e8e9fa2387cbd6d23b9925fe7c2072cb92,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,A New Robust Color Descriptor for Face Detection,"Most state-of-the-art approaches to object and face detection rely on intensity information and ignore color information, as it usually exhibits variations due to illumination changes and shadows, and due to the lower spatial resolution in color channels than in the intensity image. We propose a new color descriptor, derived from a variant of Local Binary Patterns, designed to achieve invariance to monotonic changes in chroma. The descriptor is produced by histograms of encoded color texture similarity measures of small radially-distributed patches. As it is based on similarities of local patches, we expect the descriptor to exhibit a high degree of invariance to local appearance and pose changes. We demonstrate empirically by simulation the invariance of the descriptor to photometric variations, i.e. illumination changes and image noise, geometric variations, i.e. face pose and camera viewpoint, and discriminative power in a face detection setting. Lastly, we show that the contribution of the presented descriptor to face detection performance is significant and superior to several other color descriptors, which are in use for object detection. This color descriptor can be applied in color-based object detection and recognition tasks.",2015,ICPRAM,,10.5220/0005177400130021,https://pdfs.semanticscholar.org/32fb/d7e8e9fa2387cbd6d23b9925fe7c2072cb92.pdf
33658ee91ae67f3c92542dd0f0838b48c994ae4d,0,,,0,1,0,0,0,0,0,0,0,0,0,Robust Head Detection in Collaborative Learning Environments Using AM-FM Representations,"The paper introduces the problem of robust head detection in collaborative learning environments. In such environments, the camera remains fixed while the students are allowed to sit at different parts of a table. Example challenges include the fact that students may be facing away from the camera or exposing different parts of their face to the camera. To address these issues, the paper proposes the development of two new methods based on Amplitude Modulation-Frequency Modulation (AM-FM) models. First, a combined approach based on color and FM texture is developed for robust face detection. Secondly, a combined approach based on processing the AM and FM components is developed for robust, back of the head detection. The results of the two approaches are also combined to detect all of the students sitting at each table. The robust face detector achieved 79% accuracy on a set of 1000 face image examples. The back of the head detector achieved 91% accuracy on a set of 363 test image examples.",2018,2018 IEEE Southwest Symposium on Image Analysis and Interpretation (SSIAI),,10.1109/SSIAI.2018.8470355,http://ivpcl.unm.edu/bibtex_php/Conferences_Pdfs/RobustHeadDetectioninCollaborativeLearning.pdf
33ac7fd3a622da23308f21b0c4986ae8a86ecd2b,0,,,0,0,1,0,0,0,0,0,0,0,0,Building an On-Demand Avatar-Based Health Intervention for Behavior Change,"We discuss the design and implementation of the prototype of an avatar-based health system aimed at providing people access to an effective behavior change intervention which can help them to find and cultivate motivation to change unhealthy lifestyles. An empathic Embodied Conversational Agent (ECA) delivers the intervention. The health dialog is directed by a computational model of Motivational Interviewing, a novel effective face-to-face patient-centered counseling style which respects an individual's pace toward behavior change. Although conducted on a small sample size, results of a preliminary user study to assess users' acceptance of the avatar counselor indicate that the current early version of the system prototype is well accepted by 75% of users.",2012,FLAIRS Conference,,,http://cake.fiu.edu/Publications/Lisetti+al-12-BO.Building_an_On-demand_Avatar-based_Health_Intervention.FLAIRS2012.AAAI-published.paper.pdf
34fe617eb4881289eb1d33c4aedc905e9c3f22b9,0,,,0,1,0,0,0,0,0,0,0,0,0,Joint Multi-View Face Alignment in the Wild,"The de facto algorithm for facial landmark estimation involves running a face detector with a subsequent deformable model fitting on the bounding box. This encompasses two basic problems: 1) the detection and deformable fitting steps are performed independently, while the detector might not provide the best-suited initialization for the fitting step, and 2) the face appearance varies hugely across different poses, which makes the deformable face fitting very challenging and thus distinct models have to be used (e.g., one for profile and one for frontal faces). In this paper, we propose the first, to the best of our knowledge, joint multi-view convolutional network to handle large pose variations across faces in-the-wild, and elegantly bridge face detection and facial landmark localization tasks. The existing joint face detection and landmark localization methods focus only on a very small set of landmarks. By contrast, our method can detect and align a large number of landmarks for semi-frontal (68 landmarks) and profile (39 landmarks) faces. We evaluate our model on a plethora of datasets including the standard static image datasets such as IBUG, 300W, COFW, and the latest Menpo Benchmark for both semi-frontal and profile faces. A significant improvement over the state-of-the-art methods on deformable face tracking is witnessed on the 300VW benchmark. We also demonstrate state-of-the-art results for face detection on FDDB and MALF datasets.",2019,IEEE Transactions on Image Processing,1708.06023,10.1109/TIP.2019.2899267,https://arxiv.org/pdf/1708.06023.pdf
36e25994cfeab3dc487f9a82139c08f26cebf92f,0,,,0,1,0,0,0,0,0,0,0,0,0,Annealed Generative Adversarial Networks,"We introduce a novel framework for adversarial training where the target distribution is annealed between the uniform distribution and the data distribution. We posited a conjecture that learning under continuous annealing in the nonparametric regime is stable irrespective of the divergence measures in the objective function and proposed an algorithm, dubbed β-GAN, as a corollary. In this framework, the fact that the initial support of the generative network is the whole ambient space combined with annealing are key to balancing the minimax game. In our experiments on synthetic data, MNIST, and CelebA, β-GAN with a fixed annealing schedule was stable and did not suffer from mode collapse.",2017,ArXiv,1705.07505,,https://arxiv.org/pdf/1705.07505.pdf
38d8ff137ff753f04689e6b76119a44588e143f3,0,,,1,0,0,0,0,0,0,0,0,0,0,When 3D-Aided 2D Face Recognition Meets Deep Learning: An extended UR2D for Pose-Invariant Face Recognition,"Most of the face recognition works focus on specific modules or demonstrate a research idea. This paper presents a pose-invariant 3D-aided 2D face recognition system (UR2D) that is robust to pose variations as large as 90° by leveraging deep learning technology. The architecture and the interface of UR2D are described, and each module is introduced in detail. Extensive experiments are conducted on the UHDB31 and IJB-A, demonstrating that UR2D outperforms existing 2D face recognition systems such as VGG-Face, FaceNet, and a commercial off-the-shelf software (COTS) by at least 9% on the UHDB31 dataset and 3% on the IJB-A dataset on average in face identification tasks. UR2D also achieves state-of-the-art performance of 85% on the IJB-A dataset by comparing the Rank-1 accuracy score from template matching. It fills a gap by providing a 3D-aided 2D face recognition system that has compatible results with 2D face recognition systems using deep learning techniques.",2017,ArXiv,1709.06532,,https://arxiv.org/pdf/1709.06532.pdf
392c8e575f8520bb880959d494be0911d091b525,1,"[D18], [D28]",,1,0,0,0,0,0,0,0,0,0,0,Cross-Modal Metric Learning for AUC Optimization,"Cross-modal metric learning (CML) deals with learning distance functions for cross-modal data matching. The existing methods mostly focus on minimizing a loss defined on sample pairs. However, the numbers of intraclass and interclass sample pairs can be highly imbalanced in many applications, and this can lead to deteriorating or unsatisfactory performances. The area under the receiver operating characteristic curve (AUC) is a more meaningful performance measure for the imbalanced distribution problem. To tackle the problem as well as to make samples from different modalities directly comparable, a CML method is presented by directly maximizing AUC. The method can be further extended to focus on optimizing partial AUC (pAUC), which is the AUC between two specific false positive rates (FPRs). This is particularly useful in certain applications where only the performances assessed within predefined false positive ranges are critical. The proposed method is formulated as a log-determinant regularized semidefinite optimization problem. For efficient optimization, a minibatch proximal point algorithm is developed. The algorithm is experimentally verified stable with the size of sampled pairs that form a minibatch at each iteration. Several data sets have been used in evaluation, including three cross-modal data sets on face recognition under various scenarios and a single modal data set, the Labeled Faces in the Wild. Results demonstrate the effectiveness of the proposed methods and marked improvements over the existing methods. Specifically, pAUC-optimized CML proves to be more competitive for performance measures such as Rank-1 and verification rate at FPR = 0.1%.",2018,IEEE Transactions on Neural Networks and Learning Systems,,10.1109/TNNLS.2017.2769128,https://www.research.manchester.ac.uk/portal/files/60894764/AUC_Metric_Learning_TNNLS_Revise_SecondRound_revb.pdf
3a43935221938c868b33cbb68ccaaa0b5118694e,0,,,1,0,0,0,0,0,0,0,0,0,0,Facial image registration,"Face alignment is an important step in a typical automatic face recognition system. This thesis addresses the alignment of faces for face recognition application in video surveillance context. The main challenging factors of this research include the low quality of images (e.g., low resolution, motion blur, and noise), uncontrolled illumination conditions, pose variations, expression changes, and occlusions. In order to deal with these problems, we propose several face alignment methods using different strategies. The first part of our work is a three-stage method for facial point localization which can be used for correcting mis-alignment errors. While existing algorithms mostly rely on a priori knowledge of facial structure and on a training phase, our approach works in an online mode without requirements of pre-defined constraints on feature distributions. The proposed method works well on images under expression and lighting variations. The key contributions of this thesis are about joint image alignment algorithms where a set of images is simultaneously aligned without a biased template selection. We respectively propose two unsupervised joint alignment algorithms: ""Lucas-Kanade entropy congealing"" (LKC) and ""gradient correlation congealing"" (GCC). In LKC, an image ensemble is aligned by minimizing a sum-of-entropy function defined over all images. GCC uses gradient correlation coefficient as similarity measure. The proposed algorithms perform well on images under different conditions. To further improve the robustness to mis-alignments and the computational speed, we apply a multi-resolution framework to joint face alignment algorithms. Moreover, our work is not limited in the face alignment stage. Since face alignment and face acquisition are interrelated, we develop an adaptive appearance face tracking method with alignment feedbacks. This closed-loop framework shows its robustness to large variations in target's state, and it significantly decreases the mis-alignment errors in tracked faces.",2012,,,,https://pdfs.semanticscholar.org/7011/7b802bb446d6d2594d246e34e5e391fa51e0.pdf
3a4f8a62abef8c10a6e561572fa9bfdbf6099af3,1,[D20],,1,0,0,0,1,0,0,0,0,0,0,Robust Linear Subspace for Image Set Retrieval,"This paper attempts to take advantage of both dual linear regression and sparse coding for set-to-set based object recognition. In order to determine the right category of a test image set, our algorithm finds a virtual object in the intersection of two subspaces, one of which is represented by a sparse linear combination of the images in the test image set, the other represented by a sparse linear combination of all images in the gallery. The quality of the representation of the virtual object using images of each category in the gallery is evaluated and used to make the classification decision. Experiments on the benchmarks Caltech101, YouTube and LFW are carried out to verify the effectiveness of the algorithm. The results demonstrate that our algorithm achieved the best classification accuracy compared with state-of-the-art methods.",2020,ICMLC,,10.1145/3383972.3383976,
3b3941524d97e7f778367a1250ba1efb9205d5fc,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,Open Source Face Recognition Performance Evaluation Package,"Biometrics-related research has been accelerated significantly by deep learning technology. However, there are limited open-source resources to help researchers evaluate their deep learning-based biometrics algorithms efficiently, especially for the face recognition tasks. In this work, we design and implement a light-weight, maintainable, scalable, generalizable, and extendable face recognition evaluation toolbox named FaRE that supports both online and offline evaluation to provide feedback to algorithm development and accelerate biometrics-related research. FaRE consists of a set of evaluation metric functions and provides various APIs for commonly-used face recognition datasets including LFW, CFP, UHDB31, and IJB-series datasets, which can be easily extended to include other customized datasets. The package and the pre-trained baseline models will be released for public academic research use after obtaining university approval.",2019,ArXiv,1901.09447,,https://arxiv.org/pdf/1901.09447.pdf
3b3efb099c993514dcf87b19b78d7dc341d0c034,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,Face Extraction and Recognition from Public Images Using HIPI,"Social networking services with public data are widely used nowadays. Billions of images are uploaded to the internet each day around the world. This paper proposes the idea of a system which is currently being developed. The system collects images from public sources by some specific criteria, applies face detection and recognition algorithms on the collected images, and provides the results in readable form. The Apache Hadoop library is used to increase the performance of the system, and images are downloaded with the help of the HIPI library. For testing purposes, the Labeled Faces in the Wild benchmark is used as the image database, containing 13,233 images of 5,749 identities. Each image is in JPG format at 250×250 resolution. Results were very good after testing the system on this database for face detection and recognition.",2018,2018 14th International Conference on Electronics Computer and Computation (ICECCO),,10.1109/ICECCO.2018.8634718,
3c93a0d4722a151559b16b5ad3ddb693cef636e1,0,,,0,1,0,0,0,0,0,0,0,0,0,Efficient Process-in-Memory Architecture Design for Unsupervised GAN-based Deep Learning using ReRAM,"The ending of Moore's Law makes domain-specific architectures the future of computing. The most representative example is the emergence of various deep learning accelerators. Among the proposed solutions, resistive random access memory (ReRAM) based process-in-memory (PIM) architecture is anticipated as a promising candidate because ReRAM has the capability of both data storage and in-situ computation. However, we found that existing solutions are unable to efficiently support the computational needs required by the training of unsupervised generative adversarial networks (GANs), due to the lack of the following two features: 1) Computation efficiency: GAN utilizes a new operator, called transposed convolution. It inserts a massive number of zeros into its input before a convolution operation, resulting in significant resource under-utilization; 2) Data traffic: The data-intensive training process of GANs often incurs structurally heavy data traffic as well as frequent massive data swaps. Our research follows the PIM strategy by leveraging the energy-efficiency of ReRAM arrays for vector-matrix multiplication to enhance the performance and energy efficiency. Specifically, we propose a novel computation deformation technique that can skip zero-insertions in transposed convolution for computation efficiency improvement. Moreover, we explore an efficient pipelined training procedure to reduce on-chip memory access. The implementation of related circuits and architecture is also discussed. At the end, we present our perspective on the future trend and opportunities of deep learning accelerators.",2019,ACM Great Lakes Symposium on VLSI,,10.1145/3299874.3319482,
3cadb554a0022a5b12b87930177ba95d4fe0d868,0,,,1,0,1,0,0,0,0,0,0,0,0,Template adaptation for face verification and identification,"Face recognition performance evaluation has traditionally focused on one-to-one verification, popularized by the Labeled Faces in the Wild data set [1] for imagery and the YouTubeFaces data set [2] for videos. In contrast, the newly released IJB-A face recognition data set [3] unifies evaluation of one-to-many face identification with one-to-one face verification over templates, or sets of imagery and videos for a subject. In this paper, we study the problem of template adaptation, a form of transfer learning to the set of media in a template. Extensive performance evaluations on IJB-A show a surprising result, that perhaps the simplest method of template adaptation, combining deep convolutional network features with template specific linear SVMs, outperforms the state-of-the-art by a wide margin. We study the effects of template size, negative set construction and classifier fusion on performance, then compare template adaptation to convolutional networks with metric learning, 2D and 3D alignment. Our unexpected conclusion is that these other methods, when combined with template adaptation, all achieve nearly the same top performance on IJB-A for template-based face verification and identification.",2018,Image Vis. Comput.,,10.1016/j.imavis.2018.09.002,
3d17bd832ca3e3f1fc84624a3093ae84d2bce041,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,Hierarchical Feature-Pair Relation Networks for Face Recognition,"We propose a novel face recognition method using a Hierarchical Feature Relational Network (HFRN) which extracts facial part representations around facial landmark points, and predicts hierarchical latent relations between facial part representations. These hierarchical latent relations should be unique relations within the same identity and discriminative relations among different identities for the face recognition task. To do this, the HFRN extracts appearance features as facial part representations around facial landmark points on the feature maps, globally pools these extracted appearance features into single feature vectors, and captures the relations for the pairs of appearance features. The HFRN captures the locally detailed relations in the low-level layers and the locally abstracted global relations in the high-level layers for the pairs of appearance features extracted around facial landmark points projected on each layer, respectively. These relations from low-level layers to high-level layers are concatenated into a single hierarchical relation feature. To further improve the accuracy of face recognition, we combine the global appearance feature with the hierarchical relation feature. In experiments, the proposed method achieves performance comparable to existing state-of-the-art methods in the 1:1 face verification and 1:N face identification tasks on the challenging IARPA Janus Benchmark A (IJB-A) and IARPA Janus Benchmark B (IJB-B) datasets.",2019,2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW),,10.1109/CVPRW.2019.00286,http://openaccess.thecvf.com/content_CVPRW_2019/papers/Biometrics/Kang_Hierarchical_Feature-Pair_Relation_Networks_for_Face_Recognition_CVPRW_2019_paper.pdf
3dced9b381570a52d77e8c3470d31d3ee9507f9c,1,"[D21], [D31]",,0,0,0,0,0,1,0,0,0,0,0,Age and gender classification in the wild with unsupervised feature learning,"Inspired by unsupervised feature learning (UFL) within the self-taught learning framework, we propose a method based on UFL, convolution representation, and part-based dimensionality reduction to handle facial age and gender classification, which are two challenging problems under unconstrained circumstances. First, UFL is introduced to learn selective receptive fields (filters) automatically by applying whitening transformation and spherical k-means on random patches collected from unlabeled data. The learning process is fast and has no hyperparameters to tune. Then, the input image is convolved with these filters to obtain filtering responses on which local contrast normalization is applied. Average pooling and feature concatenation are then used to form global face representation. Finally, linear discriminant analysis with part-based strategy is presented to reduce the dimensions of the global representation and to improve classification performances further. Experiments on three challenging databases, namely, Labeled faces in the wild, Gallagher group photos, and Adience, demonstrate the effectiveness of the proposed method relative to that of state-of-the-art approaches.",2017,J. Electronic Imaging,,10.1117/1.JEI.26.2.023007,
3e8bc8ed766a5fb0224bf61409599de21c221c56,0,,,1,0,0,0,0,0,0,0,0,0,0,A Face in any Form: New Challenges and Opportunities for Face Recognition Technology,"Despite new technologies that make face detection and recognition more sophisticated, long-recognized problems in security, privacy, and accuracy persist. Refining this technology and introducing it into new domains will require solving these problems through focused interdisciplinary efforts among developers, researchers, and policymakers.",2017,Computer,,10.1109/MC.2017.119,
3e93d3b6b0cdc24ea5ee2b1b03e6a6a5a1d97f0c,0,,,0,1,0,0,0,0,0,0,0,0,0,Cross-Modal Deep Face Normals With Deactivable Skip Connections,"We present an approach for estimating surface normals from in-the-wild color images of faces. While data-driven strategies have been proposed for single face images, limited available ground truth data makes this problem difficult. To alleviate this issue, we propose a method that can leverage all available image and normal data, whether paired or not, thanks to a novel cross-modal learning architecture. In particular, we enable additional training with single modality data, either color or normal, by using two encoder-decoder networks with a shared latent space. The proposed architecture also enables face details to be transferred between the image and normal domains, given paired data, through skip connections between the image encoder and normal decoder. Core to our approach is a novel module that we call deactivable skip connections, which allows integrating both the auto-encoded and image-to-normal branches within the same architecture that can be trained end-to-end. This allows learning of a rich latent space that can accurately capture the normal information. We compare against state-of-the-art methods and show that our approach can achieve significant improvements, both quantitative and qualitative, with natural face images.",2020,2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR),2003.09691,10.1109/CVPR42600.2020.00503,https://arxiv.org/pdf/2003.09691.pdf
3e9ab40e6e23f09d16c852b74d40264067ac6abc,1,[D18],,1,0,0,1,0,0,0,0,0,0,0,Learning Locally-Adaptive Decision Functions for Person Verification,"This paper considers the person verification problem in modern surveillance and video retrieval systems. The problem is to identify whether a pair of face or human body images is about the same person, even if the person has not been seen before. Traditional methods usually look for a distance (or similarity) measure between images (e.g., by metric learning algorithms), and make decisions based on a fixed threshold. We show that this is nevertheless insufficient and sub-optimal for the verification problem. This paper proposes to learn a decision function for verification that can be viewed as a joint model of a distance metric and a locally adaptive thresholding rule. We further formulate the inference on our decision function as a second-order large-margin regularization problem, and provide an efficient algorithm in its dual form. We evaluate our algorithm on both human body verification and face verification problems. Our method outperforms not only classical metric learning algorithms including LMNN and ITML, but also the state-of-the-art in the computer vision community.",2013,2013 IEEE Conference on Computer Vision and Pattern Recognition,,10.1109/CVPR.2013.463,http://www.ifp.illinois.edu/~chang87/papers/cvpr_2013.pdf
3f3d10db649a63d7142906511e8cff0a2e1e1cd3,0,,,0,0,0,0,0,0,0,0,0,1,0,Mural Sketch Generation via Style-aware Convolutional Neural Network,"Sketch is one of the most important art expression forms for traditional Chinese painting. This paper presents a complete sketch generation framework for ancient mural paintings. First, we propose a deep learning network to perform mural-to-sketch prediction by combining meaningful convolutional features in a holistic manner. A dedicated mural database with fine-grained ground truth is built for network training and testing. Then we design a style-aware image fusion approach by detecting the specific feature region in a mural, from which the artistic style can be maximally preserved. Experimental results have demonstrated its validity in extracting stylistic mural sketches. This work has the potential to provide a computer aided tool for artists and restorers to imitate and restore time-honored paintings.",2018,CGI 2018,,10.1145/3208159.3208160,
40096a032691b5cd6372be50aefed57f1dd9949c,0,,,0,1,0,0,0,0,0,0,0,0,0,Learning Spatial Attention for Face Super-Resolution,"General image super-resolution techniques have difficulties in recovering detailed face structures when applied to low resolution face images. Recent deep learning based methods tailored for face images have achieved improved performance by being jointly trained with additional tasks such as face parsing and landmark prediction. However, multi-task learning requires extra manually labeled data. Besides, most of the existing works can only generate relatively low resolution face images (e.g., 128×128), and their applications are therefore limited. In this paper, we introduce a novel SPatial Attention Residual Network (SPARNet) built on our newly proposed Face Attention Units (FAUs) for face super-resolution. Specifically, we introduce a spatial attention mechanism to the vanilla residual blocks. This enables the convolutional layers to adaptively bootstrap features related to the key face structures and pay less attention to those less feature-rich regions. This makes the training more effective and efficient as the key face structures only account for a very small portion of the face image. Visualization of the attention maps shows that our spatial attention network can capture the key face structures well even for very low resolution faces (e.g., 16×16). Quantitative comparisons on various kinds of metrics (including PSNR, SSIM, identity similarity, and landmark detection) demonstrate the superiority of our method over the current state of the art. We further extend SPARNet with multi-scale discriminators, named as SPARNetHD, to produce high resolution results (i.e., 512×512). We show that SPARNetHD trained with synthetic data can not only produce high quality and high resolution outputs for synthetically degraded face images, but also shows good generalization ability to real world low quality face images.",2020,IEEE transactions on image processing : a publication of the IEEE Signal Processing Society,2012.01211,10.1109/TIP.2020.3043093,https://arxiv.org/pdf/2012.01211.pdf
4015d798a8c6feef1ff21b44948d9f6c537ce64b,0,,,0,1,0,0,0,0,0,0,0,0,0,Parallel-Pathway Generator for Generative Adversarial Networks to Generate High-Resolution Natural Images,"Generative Adversarial Networks (GANs) can learn various generative models, such as probability distributions and images, although it is difficult to make their training converge. There are few successful methods for generating high-resolution images. In this paper, we propose the parallel-pathway generator network to generate high-resolution natural images. Our parallel network is constructed from parallelly stacked generators with different structures. To investigate the effect of our structure, we apply it to two image generation tasks: human-face images and road images, the latter of which do not have square resolution. Results indicate that our method can generate high-resolution natural images with little parameter tuning.",2017,ICANN,,10.1007/978-3-319-68612-7_74,
40638a7a9e0a0499af46053c6efc05ce0b088a28,0,,,0,1,0,0,0,0,0,0,0,0,0,On the convergence properties of GAN training,"Recent work has shown local convergence of GAN training for absolutely continuous data and generator distributions. In this note we show that the requirement of absolute continuity is necessary: we describe a simple yet prototypical counterexample showing that in the more realistic case of distributions that are not absolutely continuous, unregularized GAN training is generally not convergent. Furthermore, we discuss recent regularization strategies that were proposed to stabilize GAN training. Our analysis shows that while GAN training with instance noise or gradient penalties converges, Wasserstein-GANs and Wasserstein-GANs-GP with a finite number of discriminator updates per generator update do not, in general, converge to the equilibrium point. We explain these results and show that both instance noise and gradient penalties constitute solutions to the problem of purely imaginary eigenvalues of the Jacobian of the gradient vector field. Based on our analysis, we also propose a simplified gradient penalty with the same effects on local convergence as more complicated penalties.",2018,ArXiv,,,
413992b048847aa6e82631420799403e61516d23,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,Analyzing and Reducing the Damage of Dataset Bias to Face Recognition With Synthetic Data,"It is well known that deep learning approaches to face recognition suffer from various biases in the available training data. In this work, we demonstrate the large potential of synthetic data for analyzing and reducing the negative effects of dataset bias on deep face recognition systems. In particular we explore two complementary application areas for synthetic face images: 1) Using fully annotated synthetic face images we can study the face recognition rate as a function of interpretable parameters such as face pose. This enables us to systematically analyze the effect of different types of dataset biases on the generalization ability of neural network architectures. Our analysis reveals that deeper neural network architectures can generalize better to unseen face poses. Furthermore, our study shows that current neural network architectures cannot disentangle face pose and facial identity, which limits their generalization ability. 2) We pre-train neural networks with large-scale synthetic data that is highly variable in face pose and the number of facial identities. After a subsequent fine-tuning with real-world data, we observe that the damage of dataset bias in the real-world data is largely reduced. Furthermore, we demonstrate that the size of real-world datasets can be reduced by 75% while maintaining competitive face recognition performance. The data and software used in this work are publicly available.",2019,2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW),,10.1109/CVPRW.2019.00279,https://edoc.unibas.ch/75257/1/20200128164027_5e3055eb775f1.pdf
4180978dbcd09162d166f7449136cb0b320adf1f,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,Real-time head pose classification in uncontrolled environments with Spatio-Temporal Active Appearance Models,"In this paper, we present a fully automatic real-time system for recognizing the head pose in uncontrolled environments over continuous spatio-temporal behavior by the subject. The method is based on tracking facial features through Active Appearance Models. To differentiate and identify the different head poses, we use a multiclassifier composed of different binary Support Vector Machines. Finally, we propose a continuous solution to the problem using the Tait-Bryan angles, addressing the head pose as an object that performs a rotary motion in three-dimensional space.",2010,,,,https://pdfs.semanticscholar.org/4180/978dbcd09162d166f7449136cb0b320adf1f.pdf
430cd3385d26ea1f58d61fb700635fdbe037b823,0,,,0,0,1,0,0,0,0,0,0,0,0,Multi-directional local adjacency descriptors (MDLAD) for heterogeneous face recognition,"This paper presents new image descriptors for heterogeneous face recognition (HFR). The proposed descriptors combine directional and neighborhood information using a rotating spoke and concentric rings concept. We name the descriptors multi-directional local adjacency descriptors (MDLAD). This family of descriptors captures the directional information through successive rotations of a pair of orthogonal spokes. Likewise, they capture the adjacency information by comparing concentric rings around the central pixel of a window against that central pixel. The MDLAD is found to describe face images well for recognition purposes when matched using the chi-squared distance. The face recognition performance with MDLAD improves with its use as a layer in a deep neural network, which yields robust classification for heterogeneous face recognition relative to state-of-the-art methods. The MDLADNET deep network is easily trainable with few hyperparameters and limited data samples as compared to existing similar deep networks. We have experimented on different heterogeneous modalities, viz. Extended Yale B, CASIA, CUFSF, IIITD, LFW, Multi-PIE, and CARL, and have found proficient results.",2020,IET Image Process.,,10.1049/iet-ipr.2019.0199,
43bb20ccfda7b111850743a80a5929792cb031f0,0,,,0,0,0,0,0,0,0,0,0,1,0,Discrimination of Computer Generated versus Natural Human Faces,"The development of computer graphics technologies has been bringing realism to computer generated multimedia data, e.g., scenes, human characters and other objects, making them achieve a very high quality level. However, these synthetic objects may be used to create situations which may not be present in the real world, hence raising the demand for advanced tools for differentiating between real and artificial data. Indeed, since 2005 the research community on multimedia forensics has started to develop methods to identify computer generated multimedia data, focusing mainly on images. However, most of them have not achieved very good performance on the problem of identifying CG characters. The objective of this doctoral study is to develop efficient techniques to distinguish between computer generated and natural human faces. We focused our study on geometric-based forensic techniques, which exploit the structure of the face and its shape, proposing methods both for image and video forensics. Firstly, we proposed a method to differentiate between computer generated and photographic human faces in photos. Based on the estimation of face asymmetry, a given photo is classified as computer generated or not. Secondly, we introduced a method to distinguish between computer generated and natural faces based on facial expression analysis. In particular, small variations of the facial shape models corresponding to the same expression are used as evidence of synthetic characters. Finally, by exploiting the differences between face models over time, we can identify synthetic animations, since their models are usually recreated or performed in patterns, compared to the models of natural animations.",2014,,,,http://eprints-phd.biblio.unitn.it/1168/1/dnductien_PhDThesis.pdf
43e9e3b8f61b9e950633fe7415e8be9ed79c1f22,0,,,0,1,0,0,0,0,0,0,0,0,0,Modifying social dimensions of human faces with ModifAE,"At first glance, humans extract social judgments from faces, including how trustworthy, attractive, and aggressive they look. These impressions have profound social, economic, and political consequences, as they subconsciously influence decisions like voting and criminal sentencing. Therefore, understanding human perception of these judgments is important for the social sciences. In this work, we present a modifying autoencoder (ModifAE, pronounced ""modify"") that can model and alter these facial impressions. We assemble a face impression dataset large enough for training a generative model by applying a state-of-the-art (SOTA) impression predictor to faces from CelebA. Then, we apply ModifAE to learn generalizable modifications of these continuous-valued traits in faces (e.g., make a face look slightly more intelligent or much less aggressive). ModifAE can modify face images to create controlled social science experimental datasets, and it can reveal dataset biases by creating direct visualizations of what makes a face salient in social dimensions. The ModifAE architecture is also smaller and faster than SOTA image-to-image translation models, while outperforming SOTA in quantitative evaluations.",2019,CogSci,,,https://pdfs.semanticscholar.org/43e9/e3b8f61b9e950633fe7415e8be9ed79c1f22.pdf
44f48a4b1ef94a9104d063e53bf88a69ff0f55f3,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,Automatically Building Face Datasets of New Domains from Weakly Labeled Data with Pretrained Models,"Training data are critical in face recognition systems. However, labeling large-scale face data for a particular domain is very tedious. In this paper, we propose a method to automatically and incrementally construct datasets from massive weakly labeled data of the target domain which are readily available on the Internet, with the help of a pretrained face model. More specifically, given a large scale weakly labeled dataset in which each face image is associated with a label, i.e. the name of an identity, we create a graph for each identity with edges linking matched faces verified by the existing model under a tight threshold. Then we use the maximal subgraph as the cleaned data for that identity. With the cleaned dataset, we update the existing face model and use the new model to filter the original dataset to get a larger cleaned dataset. We collect a large weakly labeled dataset containing 530,560 Asian face images of 7,962 identities from the Internet, which will be published for the study of face recognition. By running the filtering process, we obtain a cleaned dataset (99.7+% purity) of size 223,767 (recall 70.9%). On our testing dataset of Asian faces, the model trained by the cleaned dataset achieves a recognition rate of 93.1%, which clearly outperforms the model trained by the public dataset CASIA, whose recognition rate is 85.9%.",2016,ArXiv,1611.08107,,https://arxiv.org/pdf/1611.08107.pdf
457de9ee7729629dbcb2f0ff9bef434977e6d8a3,0,,,1,0,0,0,0,0,0,0,0,0,0,A Selection Module for Large-Scale Face Recognition Systems,"Face recognition systems aimed at working on large scale datasets are required to overcome specific hurdles. In particular, due to the huge amount of data, it becomes mandatory to furnish a very fast and effective approach. Moreover, the solution should be scalable, that is, it should efficiently handle the growth of the gallery as new subjects are added. In the literature, most of the works tackling this problem are composed of two stages, namely the selection and the classification. The former is aimed at significantly pruning the face image gallery, while the latter, often expensive but precise, determines the probe identity on this reduced domain. In this article a new selection method is presented, combining a multi-feature representation and the least squares method. Data are split into sub-galleries so as to make the system more efficient and scalable. Experiments on the union of four challenging datasets and comparisons with the state-of-the-art prove the effectiveness of our method.",2015,ICIAP,,10.1007/978-3-319-23234-8_49,https://air.unimi.it/retrieve/handle/2434/426481/663299/ICIAP_2015_FR.pdf
458322fe323106409a73b44c2f0efeb2339e9af8,0,,,1,0,0,0,0,0,0,0,0,0,0,Unconstrained Facial Recognition Systems: A Review,"Face recognition presents a challenging problem in the field of image analysis and computer vision, and as such has received a great deal of attention over the last few years because of its applications in various domains. Face recognition under controlled environments, that is, where pose, illumination and other factors are controlled, is well developed in the literature, and near-perfect accuracy has been achieved. However, the unconstrained counterpart, where these factors are not controlled, is still under heavy research. Recently, newly developed algorithms in the field that are based on deep learning technology have made significant progress. In this paper, an overview of the newly developed unconstrained facial recognition systems is presented.",2015,,,,
458677de7910a5455283a2be99f776a834449f61,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,Face Image Retrieval Using Facial Attributes By K-Means,"With the rapid growth of camera-equipped devices, people can freely take photos to capture precious moments of life, especially those spent with friends and family. With this growing volume of photos, large-scale content-based face image retrieval is an enabling technology for many emerging applications. In this paper, we propose a novel method for searching consumer photos that makes use of computer vision technologies, considering facial attributes and focusing on similarities between the faces of the target persons. In the proposed approach, we achieve immediate retrieval in a large-scale dataset by improving the face retrieval in the offline and online stages. The proposed fully automatic method not only allows the recognition of 22 AUs but also explicitly models their temporal characteristics (i.e., sequences of temporal segments: neutral, onset, apex, and offset).",2014,,,,https://pdfs.semanticscholar.org/4586/77de7910a5455283a2be99f776a834449f61.pdf
45fd600e5adca237c3c645b6691499234f19fd94,0,,,0,1,0,0,0,0,0,0,0,0,0,Example-Guided Image Synthesis across Arbitrary Scenes using Masked Spatial-Channel Attention and Self-Supervision,"Example-guided image synthesis has recently been attempted to synthesize an image from a semantic label map and an exemplary image. In the task, the additional exemplar image provides the style guidance that controls the appearance of the synthesized output. Despite the controllability advantage, the existing models are designed on datasets with specific and roughly aligned objects. In this paper, we tackle a more challenging and general task, where the exemplar is an arbitrary scene image that is semantically different from the given label map. To this end, we first propose a Masked Spatial-Channel Attention (MSCA) module which models the correspondence between two arbitrary scenes via efficient decoupled attention. Next, we propose an end-to-end network for joint global and local feature alignment and synthesis. Finally, we propose a novel self-supervision task to enable training. Experiments on the large-scale and more diverse COCO-stuff dataset show significant improvements over the existing methods. Moreover, our approach provides interpretability and can be readily extended to other content manipulation tasks including style and spatial interpolation or extrapolation.",2020,ArXiv,2004.10024,,https://arxiv.org/pdf/2004.10024.pdf
4793f11fbca4a7dba898b9fff68f70d868e2497c,0,,,1,0,0,0,0,0,0,0,0,0,0,Kinship Verification through Transfer Learning,"Because of inevitable impact factors on faces such as pose, expression, lighting and aging, identity verification through faces is still an unsolved problem. Research on biometrics raises an even more challenging problem: is it possible to determine kinship merely based on face images? A critical observation revealed by genetics studies is that faces of parents captured while they were young are more similar to their children's than images captured when they are old. This insight motivates the following research. First, a new kinship database named UB KinFace, composed of child, young parent and old parent face images, is collected from the Internet. Second, an extended transfer subspace learning method is proposed aiming at mitigating the enormous divergence of distributions between children and old parents. The key idea is to utilize an intermediate distribution close to both the source and target distributions to bridge them and reduce the divergence. Naturally the young parent set is suitable for this task. Through this learning process, the large gap between distributions can be significantly reduced and the kinship verification problem becomes more discriminative. Experimental results show that our hypothesis on the role of young parents is valid and transfer learning is effective to enhance the verification accuracy.",2011,IJCAI,,10.5591/978-1-57735-516-8/IJCAI11-422,https://pdfs.semanticscholar.org/4793/f11fbca4a7dba898b9fff68f70d868e2497c.pdf
480e000ffe8c3f97f38ab7871e7996401d64d660,0,,,1,0,0,1,0,0,0,0,0,0,0,Analyse faciale avec dérivées Gaussiennes,"In this thesis, we explore the use of multi-scale Gaussian derivatives as an initial representation for the detection, recognition and classification of human faces in images. We show that a fast, $O(N)$, binomial pyramid construction algorithm can be used to extract Gaussian derivatives with an impulse response identical up to a scale factor of $\sqrt{2}$. We then show that a vector composed of these derivatives at different scales and different orders at each pixel can be used as a basis for detection, classification and recognition algorithms which match or exceed state-of-the-art performance at a reduced computational cost. Moreover, the use of integer coefficients, with $O(N)$ computational complexity and memory requirements, makes such an approach suitable for real-time applications embedded on mobile systems. We test this representation using three classical facial image analysis problems: face detection, face recognition and age estimation. For face detection, we examine multi-scale Gaussian derivatives as an alternative to Haar wavelets for use in building a cascade of linear classifiers learned with the AdaBoost algorithm, popularized by Viola and Jones. We show that the pyramid representation can be used to optimize the detection process by adapting the position of the derivatives in the cascade. In these experiments we are able to show that we can obtain similar levels of detection performance (measured by ROC curves) with a significant reduction in computational cost. For face recognition and age estimation, we show that multi-scale Gaussian derivatives can be used to compute a tensor representation that retains the most important facial information. We show that, combined with Multilinear Principal Component Analysis and the Kernel Discriminative Common Vectors (KDCV) method, this tensor representation leads to an algorithm that is comparable to competing techniques for face recognition at a reduced computational cost. For age estimation from facial images, we show that our tensor representation using multi-scale Gaussian derivatives can be used with a relevance vector machine to provide age estimates with levels of performance similar to state-of-the-art methods.",2011,,,,https://pdfs.semanticscholar.org/2447/efd253931924617bcf26fd95a483374dd657.pdf
4b6662a1b123f2253b8f9bf0511ca57bfa9db951,1,"[D18], [D32]",,1,0,0,0,0,0,0,0,0,0,0,Knot Magnify Loss for Face Recognition,"Deep Convolutional Neural Networks (DCNN) have significantly improved the performance of face recognition in recent years. Softmax loss is the most widely used loss function for training DCNN-based face recognition systems. It gives the same weights to easy and hard samples in one batch, which leads to a performance gap on quality-imbalanced data. In this paper, we discover that the rare hard samples in the training dataset have become a main obstacle for training a robust face recognition model. We propose to address this problem with a new supervision signal that pays more attention to the rare hard samples and relatively reduces the effects of the easy samples. Our proposed novel Knot Magnify (KM) loss modulates the classical softmax loss to suppress the influence of easy samples and up-weight the loss of hard samples during training. Our results show that after training with KM loss, the face recognition model is able to achieve competitive accuracy on the well-known face recognition benchmark LFW dataset and the challenging CFP dataset.",2018,2018 25th IEEE International Conference on Image Processing (ICIP),,10.1109/ICIP.2018.8451838,
4b9b30066a05bdeb0e05025402668499ebf99a6b,0,,,1,0,0,0,0,0,0,0,0,0,0,Real-time face detection using Gentle AdaBoost algorithm and nesting cascade structure,"In this paper, a face detector based on the Gentle AdaBoost algorithm and a nesting cascade structure is proposed. The nesting cascade structure is introduced to prevent too many weak classifiers in a cascade classifier from slowing down its face detection speed. The Gentle AdaBoost algorithm is used to train node classifiers on a Haar-like feature set to improve the generalization ability of the node classifier. Consequently, the face detection performance of the face detector is improved. Experimental results have proved that the proposed algorithm can significantly reduce the number of weak classifiers, increase the detection speed, and slightly raise the detection accuracy as well. On CIF (352×288) video sequences, the average detection speed of the proposed face detector can reach 125 fps, which is superior to the state-of-the-art face detectors and completely satisfies the demand of real-time face detection.",2012,2012 International Symposium on Intelligent Signal Processing and Communications Systems,,10.1109/ISPACS.2012.6473448,
4bd2148f7034ea33525a8842e7095884d6e7ff8c,0,,,0,1,0,0,0,0,0,0,0,0,0,Artificial Intelligence Applications and Innovations,"Abstracts of Invited Talks: Learning from Electronic Health Records: From Temporal Abstractions to Time Series Interpretability. Panagiotis Papapetrou, Department of Computer and Systems Sciences, Stockholm University, [email protected]. Abstract: The first part of the talk will focus on data mining methods for learning from Electronic Health Records (EHRs), which are typically perceived as big and complex patient data sources. On them, scientists strive to perform predictions on patients’ progress, to understand and predict response to therapy, to detect adverse drug effects, and many other learning tasks. Medical researchers are also interested in learning from cohorts of population-based studies and of experiments. Learning tasks include the identification of disease predictors that can lead to new diagnostic tests and the acquisition of insights on interventions. The talk will elaborate on data sources, methods, and case studies in medical mining. The second part of the talk will tackle the issue of interpretability and explainability of opaque machine learning models, with a focus on time series classification. Time series classification has received great attention over the past decade, with a wide range of methods focusing on predictive performance by exploiting various types of temporal features. Nonetheless, little emphasis has been placed on interpretability and explainability. This talk will formulate the novel problem of explainable time series tweaking, where, given a time series and an opaque classifier that provides a particular classification decision for the time series, the objective is to find the minimum number of changes to be performed to the given time series so that the classifier changes its decision to another class. Moreover, it will be shown that the problem is NP-hard. Two instantiations of the problem will be presented. The classifier under investigation will be the random shapelet forest classifier. Moreover, two algorithmic solutions for the two problem instantiations will be presented, along with simple optimizations, as well as a baseline solution using the nearest neighbor classifier. Empirical Approach: How to Get Fast, Interpretable Deep Learning",2019,IFIP Advances in Information and Communication Technology,,10.1007/978-3-030-19823-7,https://hal.inria.fr/hal-02331312/file/IFIPAICT0559DL_2019_BookFrontmatter.pdf
4cab77870a69d1ddc83e200e79c07f7f9768cf5f,0,,,0,1,0,0,0,0,0,0,0,0,0,Boosted GAN with Semantically Interpretable Information for Image Inpainting,"Image inpainting aims at restoring missing regions of corrupted images, which has many applications such as image restoration and object removal. However, current GAN-based inpainting models fail to explicitly consider the semantic consistency between restored images and original images. For example, given a male image with image region of one eye missing, current models may restore it with a female eye. This is due to the ambiguity of GAN-based inpainting models: these models can generate many possible restorations given a missing region. To address this limitation, our key insight is that semantically interpretable information (such as attribute and segmentation information) of input images (with missing regions) can provide essential guidance for the inpainting process. Based on this insight, we propose a boosted GAN with semantically interpretable information for image inpainting that consists of an inpainting network and a discriminative network. The inpainting network utilizes two auxiliary pretrained networks to discover the attribute and segmentation information of input images and incorporates them into the inpainting process to provide explicit semantic-level guidance. The discriminative network adopts a multi-level design that can enforce regularizations not only on overall realness but also on attribute and segmentation consistency with the original images. Experimental results show that our proposed model can preserve consistency on both attribute and segmentation level, and significantly outperforms the state-of-the-art models.",2019,2019 International Joint Conference on Neural Networks (IJCNN),1908.04503,10.1109/IJCNN.2019.8851926,https://arxiv.org/pdf/1908.04503.pdf
4d015bb5211eb901251a019e34342b487c9b1047,0,,,1,0,0,0,0,0,0,0,0,0,0,Algorithmes de correspondance et superpixels pour l'analyse et le traitement d'images. (Matching algorithms and superpixels for image analysis and processing),"This thesis addresses several components of non-local image processing and analysis methods. These methods exploit the redundancy of information present in other images, and use correspondence search algorithms, generally based on patches, to extract and transfer information from these example images. Such approaches, widely used by the computer vision community, are often limited by the computation time of the search algorithm, applied at every pixel, and by the need for preprocessing or learning in order to use large databases. To overcome these limits, we propose several general, learning-free, fast methods that can easily be adapted to various natural or medical image processing and analysis applications. We introduce a correspondence search algorithm that quickly extracts patches from a large library of 3D images, which we apply to medical image segmentation. To use presegmentations into superpixels, which reduce the number of image elements, in a way similar to patches, we present a new superpixel neighborhood structure. This new descriptor enables superpixels to be used efficiently in non-local approaches. We also propose a regular and accurate superpixel decomposition method. We show how to evaluate this regularity robustly, and that regularity is necessary to obtain good performance in superpixel-based correspondence search.",2017,,,,https://pdfs.semanticscholar.org/d43f/9335dcfbc201df6dd6e476e7f0ca479016a1.pdf
4d9ac28a6b27d3383862fc6e15f4749a964eea2a,0,,,1,0,0,0,0,0,0,0,0,0,0,Alignment-Free Gender Recognition in the Wild,"Gender is possibly the most common facial attribute automatically estimated from images. Achieving robust gender classification “in the wild,” i.e. in images acquired in real settings, is still an open problem. Face pose variations are a major source of classification errors. They are solved using sophisticated face alignment algorithms that are costly computationally. They are also prone to getting stuck in local minima thus providing a poor pose invariance. In this paper we move the alignment problem to the learning stage. The result is an efficient pose-aware classifier with no on-line alignment. Our efficient procedure gets state of the art performance even with facial poses “in the wild.” In our experiments using “The Images of Groups” database we prove that by simultaneously predicting gender and pose we get an increase of about 5% in the performance of a linear state-of-the-art gender classifier.",2013,IbPRIA,,10.1007/978-3-642-38628-2_45,https://jmbuena.github.io/publications/ibpria2013.pdf
4df3b534beeba05ba930814162ea6e19948c5fcd,0,,,0,1,0,0,0,0,0,0,0,0,0,Watch Your Up-Convolution: CNN Based Generative Deep Neural Networks Are Failing to Reproduce Spectral Distributions,"Generative convolutional deep neural networks, e.g. popular GAN architectures, are relying on convolution based up-sampling methods to produce non-scalar outputs like images or video sequences. In this paper, we show that common up-sampling methods, i.e. known as up-convolution or transposed convolution, are causing the inability of such models to reproduce spectral distributions of natural training data correctly. This effect is independent of the underlying architecture and we show that it can be used to easily detect generated data like deepfakes with up to 100% accuracy on public benchmarks. To overcome this drawback of current generative models, we propose to add a novel spectral regularization term to the training optimization objective. We show that this approach not only allows to train spectral consistent GANs that are avoiding high frequency errors. Also, we show that a correct approximation of the frequency spectrum has positive effects on the training stability and output quality of generative networks.",2020,2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR),2003.01826,10.1109/CVPR42600.2020.00791,https://arxiv.org/pdf/2003.01826.pdf
4e91defcc0b5ddf18fa70c34d91ce94a0be0f4d7,0,,,0,1,0,0,0,0,0,0,0,0,0,Causal Implicit Generative Models with Adversarial Training,"We introduce causal implicit generative models (CiGMs): models that allow sampling from not only the observational but also the interventional distributions. We show that adversarial training can be used to learn a CiGM, if the generator architecture is structured based on a given causal graph. We consider the application of conditional and interventional sampling of face images with binary feature labels, such as mustache, young. We preserve the dependency structure between the labels with a given causal graph. We devise a two-stage procedure for learning a CiGM over the labels and the image. First we train a CiGM over the binary labels using a Wasserstein GAN where the generator neural network is consistent with the causal graph between the labels. Later, we combine this with a conditional GAN to generate images conditioned on the binary labels. We propose two new conditional GAN architectures: CausalGAN and CausalBEGAN. We show that the optimal generator of the CausalGAN, given the labels, samples from the image distributions conditioned on these labels. The conditional GAN combined with a trained CiGM for the labels is then a CiGM over the labels and the generated image. We show that the proposed architectures can be used to sample from observational and interventional image distributions, even for interventions which do not naturally occur in the dataset.",2018,,,,https://pdfs.semanticscholar.org/4e91/defcc0b5ddf18fa70c34d91ce94a0be0f4d7.pdf
4f00c357b4757e324dc5b0d45cf0d345e995d10d,0,,,0,1,0,0,0,0,0,0,0,0,0,Multimodal Generative Models for Scalable Weakly-Supervised Learning,"Multiple modalities often co-occur when describing natural phenomena. Learning a joint representation of these modalities should yield deeper and more useful representations. Previous work has proposed generative models to handle multi-modal input. However, these models either do not learn a joint distribution or require complex additional computations to handle missing data. Here, we introduce a multimodal variational autoencoder that uses a product-of-experts inference network and a sub-sampled training paradigm to solve the multi-modal inference problem. Notably, our model shares parameters to efficiently learn under any combination of missing modalities, thereby enabling weakly-supervised learning. We apply our method on four datasets and show that we match state-of-the-art performance using many fewer parameters. In each case our approach yields strong weakly-supervised results. We then consider a case study of learning image transformations---edge detection, colorization, facial landmark segmentation, etc.---as a set of modalities. We find appealing results across this range of tasks.",2018,NeurIPS,1802.05335,,https://arxiv.org/pdf/1802.05335.pdf
4f591e243a8f38ee3152300bbf42899ac5aae0a5,0,,,1,0,0,0,0,0,0,0,0,0,0,Understanding Higher-Order Shape via 3D Shape Attributes,"In this paper we investigate 3D shape attributes as a means to understand the shape of an object in a single image. To this end, we make a number of contributions: (i) we introduce and define a set of 3D shape attributes, including planarity, symmetry and occupied space; (ii) we show that such properties can be successfully inferred from a single image using a Convolutional Neural Network (CNN); (iii) we introduce a 143K image dataset of sculptures with 2197 works over 242 artists for training and evaluating the CNN; (iv) we show that the 3D attributes trained on this dataset generalize to images of other (non-sculpture) object classes; (v) we show that the CNN also provides a shape embedding that can be used to match previously unseen sculptures largely independent of viewpoint; and furthermore (vi) we analyze how the CNN predicts these attributes.",2016,ArXiv,1612.06836,,https://arxiv.org/pdf/1612.06836.pdf
5121f42de7cb9e41f93646e087df82b573b23311,0,,,1,0,0,0,0,0,0,0,0,0,0,Classifying Online Dating Profiles on Tinder using FaceNet Facial Embeddings,"A method to produce personalized classification models to automatically review online dating profiles on Tinder is proposed, based on the user's historical preference. The method takes advantage of a FaceNet facial classification model to extract features which may be related to facial attractiveness. The embeddings from a FaceNet model were used as the features to describe an individual's face. A user reviewed 8,545 online dating profiles. For each reviewed online dating profile, a feature set was constructed from the profile images which contained just one face. Two approaches are presented to go from the set of features for each face, to a set of profile features. A simple logistic regression trained on the embeddings from just 20 profiles could obtain a 65% validation accuracy. A point of diminishing marginal returns was identified to occur around 80 profiles, at which the model accuracy of 73% would only improve marginally after reviewing a significant number of additional profiles.",2018,ArXiv,1803.04347,,https://arxiv.org/pdf/1803.04347.pdf
5132f9fbf633a1030f6c0a2c484d9f4ca4072f83,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,Preventing Personal Data Theft in Images with Adversarial ML,"Facial recognition tools are becoming exceptionally accurate in identifying people from images. However, this comes at the cost of privacy for users of online services with photo management (e.g. social media platforms). Particularly troubling is the ability to leverage unsupervised learning to recognize faces even when the user has not labeled their images. This is made simpler by modern facial recognition tools, such as FaceNet, that use encoders to generate low dimensional embeddings that can be clustered to learn previously unknown faces. In this paper, we propose a strategy to generate non-invasive noise masks to apply to facial images for a newly introduced user, yielding adversarial examples and preventing the formation of identifiable clusters in the embedding space. We demonstrate the effectiveness of our method by showing that various classification and clustering methods cannot reliably cluster the adversarial examples we generate.",2020,ArXiv,2010.10242,,https://arxiv.org/pdf/2010.10242.pdf
52763a39817316e7a41f98dfb20c26d59d818a00,0,,,0,1,0,0,0,0,0,0,0,0,0,Generative image deblurring based on multi-scaled residual adversary network driven by composed prior-posterior loss,"Conditional Generative Adversarial Networks (CGANs) have been introduced to generate realistic images from extremely degraded inputs. However, such generative models, which lack prior knowledge of spatial distributions, have limited ability to deal with various complex scenes. In this paper, we propose an image deblurring network based on CGANs that generates ideal images without any blurring assumption. To overcome adversarial insufficiency, an extended classifier with different attribute domains is formulated to replace the original discriminator of CGANs. Inspired by residual learning, a set of skip-connections is added to transfer multi-scaled spatial features to the subsequent high-level operations. Furthermore, this adversarial architecture is driven by a composite loss that integrates the histogram of gradients (HoG) and geodesic distance. In experiments, a uniform adversarial iteration is applied cyclically to correct image degradations. Extensive results show that the proposed deblurring approach significantly outperforms state-of-the-art methods in both qualitative and quantitative evaluations.",2019,J. Vis. Commun. Image Represent.,,10.1016/j.jvcir.2019.102648,
53a87ec52acbc645189e379af5559169e3614ade,0,,,0,1,0,0,0,0,0,0,0,0,0,Likelihood Regret: An Out-of-Distribution Detection Score For Variational Auto-encoder,,2020,ArXiv,2003.02977,,https://arxiv.org/pdf/2003.02977.pdf
557b7222dac6d17397fc3402fec36e499d1a8270,0,,,0,1,0,0,0,0,0,0,0,0,0,Deep CNN Denoiser and Multi-layer Neighbor Component Embedding for Face Hallucination,"Most of the current face hallucination methods, whether they are shallow learning-based or deep learning-based, all try to learn a relationship model between Low-Resolution (LR) and High-Resolution (HR) spaces with the help of a training set. They mainly focus on modeling image prior through either model-based optimization or discriminative inference learning. However, when the input LR face is tiny, the learned prior knowledge is no longer effective and their performance will drop sharply. To solve this problem, in this paper we propose a general face hallucination method that can integrate model-based optimization and discriminative inference. In particular, to exploit the model based prior, the Deep Convolutional Neural Networks (CNN) denoiser prior is plugged into the super-resolution optimization model with the aid of image-adaptive Laplacian regularization. Additionally, we further develop a high-frequency details compensation method by dividing the face image to facial components and performing face hallucination in a multi-layer neighbor embedding manner. Experiments demonstrate that the proposed method can achieve promising super-resolution results for tiny input LR faces.",2018,IJCAI,1806.10726,10.24963/ijcai.2018/107,https://arxiv.org/pdf/1806.10726.pdf
55e1dc88736337f1cc8be91a3ed53aaa617b7711,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,Recognition of Faces using Efficient Multiscale Local Binary Pattern and Kernel Discriminant Analysis in Varying Environment,"Face recognition involves matching face images under different environmental conditions, which is not an easy task. Matching face images under variations such as changing illumination, pose and facial expression, and under uncontrolled conditions, is more difficult still. This paper focuses on accurately recognizing face images under all the above variations. The proposed system is based on collecting features from face images using Multiscale Local Binary Pattern (MLBP) with eight orientations out of 59 crucial ones and then finding similarity using kernel linear discriminant analysis. The literature suggests that MLBP can give up to 256 orientations for a single radius considered around a pixel and its neighborhood. The paper uses only 8 orientations per radius, and four such radii (1, 3, 5 and 7) are considered around a single pixel, giving (8x4) 32 histogram features and thus reducing the computational complexity. Various face image databases are considered in this paper, namely Labeled Faces in the Wild (LFW), Japanese Female Facial Expression (JAFFE), AR and Asian. Results showed that the proposed system correctly identified 9 out of 10 subjects. The proposed system involves preprocessing including alignment and noise reduction using a Gaussian filter, feature extraction using MLBP based histograms, and matching based on kernel linear discriminant analysis.",2017,,,10.3844/AJEASSP.2017.726.732,
562b21e58805061f898c421848ec837a70d3017d,0,,,1,1,0,0,0,0,0,0,0,0,1,A Survey of Deep Facial Attribute Analysis,"Facial attribute analysis has received considerable attention when deep learning techniques made remarkable breakthroughs in this field over the past few years. Deep learning based facial attribute analysis consists of two basic sub-issues: facial attribute estimation (FAE), which recognizes whether facial attributes are present in given images, and facial attribute manipulation (FAM), which synthesizes or removes desired facial attributes. In this paper, we provide a comprehensive survey of deep facial attribute analysis from the perspectives of both estimation and manipulation. First, we summarize a general pipeline that deep facial attribute analysis follows, which comprises two stages: data preprocessing and model construction. Additionally, we introduce the underlying theories of this two-stage pipeline for both FAE and FAM. Second, the datasets and performance metrics commonly used in facial attribute analysis are presented. Third, we create a taxonomy of state-of-the-art methods and review deep FAE and FAM algorithms in detail. Furthermore, several additional facial attribute related issues are introduced, as well as relevant real-world applications. Finally, we discuss possible challenges and promising future research directions.",2020,International Journal of Computer Vision,1812.10265,10.1007/s11263-020-01308-z,https://arxiv.org/pdf/1812.10265.pdf
564f084106ac6d662bf7ca5c5343a3c5997ad456,0,,,0,1,0,0,0,0,0,0,0,0,0,Non-parametric estimation of Jensen-Shannon Divergence in Generative Adversarial Network training,"Generative Adversarial Networks (GANs) have become a widely popular framework for generative modelling of high-dimensional datasets. However, their training is well-known to be difficult. This work presents a rigorous statistical analysis of GANs, providing straightforward explanations for common training pathologies such as vanishing gradients. Furthermore, it proposes a new training objective, Kernel GANs, and demonstrates its practical effectiveness on large-scale real-world datasets. A key element in the analysis is the distinction between training with respect to the (unknown) data distribution and training with respect to its empirical counterpart. To overcome issues in GAN training, we pursue the idea of smoothing the Jensen-Shannon Divergence (JSD) by incorporating noise into the input distributions of the discriminator. As we show, this effectively leads to an empirical version of the JSD in which the data and the generator densities are replaced by kernel density estimates, which leads to Kernel GANs.",2018,AISTATS,1705.09199,,https://arxiv.org/pdf/1705.09199.pdf
56e25358ebfaf8a8b3c7c33ed007e24f026065d0,1,,1,1,0,0,0,0,0,0,0,0,0,0,V-shaped interval insensitive loss for ordinal classification,We address a problem of learning ordinal classifiers from partially annotated examples. We introduce a V-shaped interval-insensitive loss function to measure discrepancy between predictions of an ordinal classifier and a partial annotation provided in the form of intervals of candidate labels. We show that under reasonable assumptions on the annotation process the Bayes risk of the ordinal classifier can be bounded by the expectation of an associated interval-insensitive loss. We propose several convex surrogates of the interval-insensitive loss which are used to formulate convex learning problems. We described a variant of the cutting plane method which can solve large instances of the learning problems. Experiments on a real-life application of human age estimation show that the ordinal classifier learned from cheap partially annotated examples can achieve accuracy matching the results of the so-far used supervised methods which require expensive precisely annotated examples.,2015,Machine Learning,,10.1007/s10994-015-5541-9,https://link.springer.com/content/pdf/10.1007/s10994-015-5541-9.pdf
56e79f0699f558dc0bb6ac8c3f60a8b587f81acd,0,,,1,0,0,0,0,0,0,0,0,0,0,Local binary patterns preprocessing for face identification/verification using the VanderLugt correlator,"Face recognition tasks can be divided into two categories: verification (i.e. comparing two images in order to decide whether they represent the same person) and identification (i.e. finding the identity of a person in the database). Several powerful face recognition methods exist in the literature for controlled environments: constrained illumination, frontal pose, neutral expression, etc. However, there are few reliable methods for the uncontrolled case. Optical correlation has shown its interest through relevant architectures for controlled and uncontrolled environments. Based on this architecture, we propose a novel method for verification and identification tasks under illumination variation. More specifically, we optimize the performance of a correlation method against illumination changes by using and adapting the Local Binary Patterns (LBP) description. The latter is widely used in the literature to describe the texture of an image using 8-bit words. For both the target image and the reference image, we begin by using a specific Gaussian function as the first step of the LBP-VLC correlator. This function filters the considered image with a band-pass filter in order to extract the edges. We then apply the adapted LBP-VLC method. To validate our new approach, we used a simple POF filter (other correlation filters can be used). The simulations are done using the YaleB and YaleB Extended databases, which contain respectively 10 and 38 identities with 64 illuminations. The results obtained reach more than 94% and 92% for verification and 93% and 90% for identification. These results show the good performance of our approach of LBP-correlation methods against illumination changes.",2014,Defense + Security Symposium,,10.1117/12.2051267,
57165bc624217f07bf6ecfb0481b13b88ca9ec74,0,,,0,0,0,0,0,1,0,0,0,0,0,Soft-Biometric Attributes from Selfie Images,"The aim of this chapter is to discuss the soft-biometric attributes that can be extracted from selfie images acquired from mobile devices. Existing literature suggests that various features in demographics, such as gender and age, in physical, such as periocular and eyebrow, and in material, such as eyeglasses and clothing, have been extracted from selfie images for continuous user authentication and performance enhancement of primary biometric traits. Due to the limited hardware resources, low resolution of front-facing cameras, and the usage of the device in different environmental conditions, factors such as robustness to low-quality data, consent-free acquisition, lower computational complexity, and privacy, favor soft-biometric prediction in mobile devices.",2019,Selfie Biometrics,,10.1007/978-3-030-26972-2_10,
5789f8420d8f15e7772580ec373112f864627c4b,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,Efficient Global Illumination for Morphable Models,"We propose an efficient self-shadowing illumination model for Morphable Models. Simulating self-shadowing with ray casting is computationally expensive which makes them impractical in Analysis-by-Synthesis methods for object reconstruction from single images. Therefore, we propose to learn self-shadowing for Morphable Model parameters directly with a linear model. Radiance transfer functions are a powerful way to represent self-shadowing used within the precomputed radiance transfer framework (PRT). We build on PRT to render deforming objects with self-shadowing at interactive frame rates. It can be illuminated efficiently by environment maps represented with spherical harmonics. The result is an efficient global illumination method for Morphable Models, exploiting an approximated radiance transfer. We apply the method to fitting Morphable Model parameters to a single image of a face and demonstrate that considering self-shadowing improves shape reconstruction.",2017,2017 IEEE International Conference on Computer Vision (ICCV),,10.1109/ICCV.2017.417,https://edoc.unibas.ch/59209/1/20180118163113_5a60bdc1ba795.pdf
58fbd1d9e3e804b9faca470bddbced2df9612765,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,Discriminative low-rank projection for robust subspace learning,"Robustness to outliers, noise, and corruption has recently received more attention as a way to improve performance in linear feature extraction and image classification. As one of the most effective subspace learning methods, low-rank representation (LRR) can improve the robustness of an algorithm by exploring the global representative structure information among the samples. However, traditional LRR cannot project the training samples into a low-dimensional subspace with supervised information. Thus, in this paper, we integrate the properties of LRR with supervised dimensionality reduction techniques to obtain an optimal low-rank subspace and a discriminative projection at the same time. To achieve this goal, we propose a novel model named Discriminative Low-Rank Projection (DLRP). Furthermore, DLRP can break the limitation of the small class problem, in which the number of projections is bounded by the number of classes. Our model can be solved by alternating between the linearized alternating direction method with adaptive penalty and the singular value decomposition. Besides, analyses of the differences between DLRP and previous related models are given. Extensive experiments conducted on various contaminated databases have confirmed the superiority of the proposed method.",2020,Int. J. Mach. Learn. Cybern.,,10.1007/s13042-020-01113-7,
5922e26c9eaaee92d1d70eae36275bb226ecdb2e,1,"[D18], [D28], [D27]",,1,0,0,0,0,0,0,0,0,0,0,Boosting Classification Based Similarity Learning by using Standard Distances,"Metric learning has been shown to outperform standard classification based similarity learning in a number of different contexts. In this paper, we show that the performance of classification similarity learning strongly depends on the sample format used to learn the model. We also propose an enriched classification based set-up that uses a set of standard distances to supplement the information provided by the feature vectors of the training samples. The method is compared to state-of-the-art metric learning methods, using a linear SVM for classification. Results obtained show comparable performances, slightly in favour of the method",2015,CCIA,,10.3233/978-1-61499-578-4-153,https://pdfs.semanticscholar.org/5922/e26c9eaaee92d1d70eae36275bb226ecdb2e.pdf
598cdefe07655c88a5fa7b9d53b7b8c65735116a,0,,,0,1,0,0,0,0,0,0,0,0,0,Fine-grained Synthesis of Unrestricted Adversarial Examples,"We propose a novel approach for generating unrestricted adversarial examples by manipulating fine-grained aspects of image generation. Unlike existing unrestricted attacks that typically hand-craft geometric transformations, we learn stylistic and stochastic modifications leveraging state-of-the-art generative models. This allows us to manipulate an image in a controlled, fine-grained manner without being bounded by a norm threshold. Our approach can be used for targeted and non-targeted unrestricted attacks on classification, semantic segmentation and object detection models. Our attacks can bypass certified defenses, yet our adversarial images look indistinguishable from natural images as verified by human evaluation. Moreover, we demonstrate that adversarial training with our examples improves performance of the model on clean images without requiring any modifications to the architecture. We perform experiments on LSUN, CelebA-HQ and COCO-Stuff as high resolution datasets to validate efficacy of our proposed approach.",2019,ArXiv,1911.09058,,https://arxiv.org/pdf/1911.09058.pdf
59b6ff409ae6f57525faff4b369af85c37a8dd80,1,[D18],,1,1,0,0,0,0,0,0,0,0,0,Deep Attribute Driven Image Similarity Learning Using Limited Data,"In this work, we propose to derive an attribute-specific similarity score for a pair of images using an existing parent deep model. As an example, given two facial images, we derive a similarity score for attributes like gender and complexion using an existing face recognition model. It is not always feasible to train a new model for each attribute, as training a deep neural network based model requires a large number of labelled samples to reliably learn the parameters. Hence, in the proposed framework a similarity score for each attribute is obtained as a weighted combination of all the hidden layer features of the parent model. The weights are attribute specific, and are estimated by minimizing the proposed triplet based hinge loss criterion over a small number of labelled samples. Although generic, the proposed approach is developed in the context of a specific application: searching for social media profiles of suspects for law enforcement agencies. To measure the effectiveness of our proposed approach, we have also created a social media dataset ""LFW Social (LFW-S)"", corresponding to the Labeled Faces in the Wild (LFW) dataset. The key motivation behind our approach is not to improve upon the existing baseline methods but to reduce the overhead of generating a labeled dataset for learning a new attribute. However, it is worth noting that the learnt attribute-driven models perform on par with the existing baseline models on the attribute-driven ranking task.",2017,2017 IEEE International Symposium on Multimedia (ISM),,10.1109/ISM.2017.28,
59c47e49d8211953b1acd68984650b807ce69a71,0,,,1,0,0,0,0,0,0,0,0,0,0,Racial Faces in the Wild: Reducing Racial Bias by Information Maximization Adaptation Network,"Racial bias is an important issue in biometrics, but it has not been thoroughly studied in deep face recognition. In this paper, we first contribute a dedicated dataset called the Racial Faces in the Wild (RFW) database, on which we firmly validated the racial bias of four commercial APIs and four state-of-the-art (SOTA) algorithms. Then, we further present a solution using deep unsupervised domain adaptation and propose a deep information maximization adaptation network (IMAN) to alleviate this bias by using Caucasian as the source domain and other races as target domains. This unsupervised method simultaneously aligns the global distribution to decrease the race gap at the domain level, and learns discriminative target representations at the cluster level. A novel mutual information loss is proposed to further enhance the discriminative ability of the network output without label information. Extensive experiments on the RFW, GBU, and IJB-A databases show that IMAN successfully learns features that generalize well across different races and across different databases.",2019,2019 IEEE/CVF International Conference on Computer Vision (ICCV),,10.1109/ICCV.2019.00078,
59fb707bb8ac6e40fc40694162060788af8b4651,0,,,1,0,0,0,0,0,0,0,0,0,0,ANALYZING THE GEO-DEPENDENCE OF HUMAN FACE APPEARANCE AND ITS APPLICATIONS,"Human faces have been a subject of study in computer science for decades. The rich set of features from human faces has been used in solving various problems in computer vision, including person identification, facial expression analysis, and attribute classification. In this work, I explore the human facial features that depend on geo-location using a data-driven approach. I analyze millions of public domain images to extract geo-dependent human facial features and explore their applications. Using various machine learning and statistical techniques, I show that the geo-dependent features of human faces can be used to solve the image geo-localization task: given an image, predict where it was taken. Deep Convolutional Neural Networks (CNN) have recently been shown to excel at the image classification task; I have used CNNs to geo-localize images using the human face as a cue. I also show that the facial features used in image localization can be used to solve other problems, such as ethnicity, gender, and age estimation.",2016,,,10.13023/ETD.2016.323,
5a7e62fdea39a4372e25cbbadc01d9b2204af95a,0,,,0,1,0,0,0,0,0,0,0,0,0,Direct Shape Regression Networks for End-to-End Face Alignment,"Face alignment has been extensively studied in computer vision community due to its fundamental role in facial analysis, but it remains an unsolved problem. The major challenges lie in the highly nonlinear relationship between face images and associated facial shapes, which is coupled by underlying correlation of landmarks. Existing methods mainly rely on cascaded regression, suffering from intrinsic shortcomings, e.g., strong dependency on initialization and failure to exploit landmark correlations. In this paper, we propose the direct shape regression network (DSRN) for end-to-end face alignment by jointly handling the aforementioned challenges in a unified framework. Specifically, by deploying doubly convolutional layer and by using the Fourier feature pooling layer proposed in this paper, DSRN efficiently constructs strong representations to disentangle highly nonlinear relationships between images and shapes; by incorporating a linear layer of low-rank learning, DSRN effectively encodes correlations of landmarks to improve performance. DSRN leverages the strengths of kernels for nonlinear feature extraction and neural networks for structured prediction, and provides the first end-to-end learning architecture for direct face alignment. Its effectiveness and generality are validated by extensive experiments on five benchmark datasets, including AFLW, 300W, CelebA, MAFL, and 300VW. All empirical results demonstrate that DSRN consistently produces high performance and in most cases surpasses state-of-the-art.",2018,2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition,,10.1109/CVPR.2018.00529,http://openaccess.thecvf.com/content_cvpr_2018/papers/Miao_Direct_Shape_Regression_CVPR_2018_paper.pdf
5b0ac3e1d1a493a0664fadfaa054085d55701891,0,,,1,0,0,0,0,0,0,0,0,0,0,Automatic Eye Localization for Hospitalized Infants and Children Using Convolutional Neural Networks,"Abstract Background Reliable localization and tracking of the eye region in the pediatric hospital environment is a significant challenge for clinical decision support and patient monitoring applications. Existing work in eye localization achieves high performance on adult datasets but performs poorly in the busy pediatric hospital environment, where face appearance varies because of age, position and the presence of medical equipment. Methods We developed two new datasets: a training dataset using public image data from internet searches, and a test dataset using 59 recordings of patients in a pediatric intensive care unit. We trained two eye localization models, using the Faster R-CNN algorithm to fine-tune a pre-trained ResNet base network, and evaluated them using the images from the pediatric ICU. Results The convolutional neural network trained with a combination of adult and child data achieved an 79.7% eye localization rate, significantly higher than the model trained on adult data alone. With additional pre-processing to equalize image contrast, the localization rate rises to 84%. Conclusion The results demonstrate the potential of convolutional neural networks for eye localization and tracking in a pediatric ICU setting, even when training data is limited. We obtained significant performance gains by adding task-specific images to the training dataset, highlighting the need for custom models and datasets for specialized applications like pediatric patient monitoring. The moderate size of our added training dataset shows that it is feasible to develop an internal training dataset for clinical computer vision applications, and apply it with transfer learning to fine-tune existing pre-trained models.",2020,,,10.1016/j.ijmedinf.2020.104344,
5bb74bbb2fffe30f0dd28d32ba70ea24b4978a2a,0,,,1,0,0,0,0,0,0,0,0,0,0,LGLG-WPCA: An Effective Texture-based Method for Face Recognition,"In this paper, we propose an effective face feature extraction method that learns Gabor Log-Euclidean Gaussian features with Whitening Principal Component Analysis (WPCA), called LGLG-WPCA. The proposed method learns face features from the embedded multivariate Gaussian in the Gabor wavelet domain; it is robust to adverse conditions such as varying poses, skin aging and uneven illumination. Because the space of Gaussians is a Riemannian manifold, it is difficult to incorporate a learning mechanism into the model. To address this issue, we use L2EMG to map the multidimensional Gaussian model to a linear space, and then use WPCA to learn face features. We also implemented a key-point-based version of LGLG-WPCA, called LGLG(KP)-WPCA. Experiments show that the proposed methods are effective and promising for face texture feature extraction, and the combination of their features with the features of a Deep Convolutional Network (DCNN) achieved the best recognition accuracies on the FERET database compared to state-of-the-art methods. In the next version of this paper, we will test the performance of the proposed methods on databases with large pose variations.",2018,ArXiv,1811.08345,,https://arxiv.org/pdf/1811.08345.pdf
5cb9ddd676e25516f9b273c7714f02b1542cee7e,0,,,0,1,0,0,0,0,0,0,0,0,0,Dense 3D Face Decoding Over 2500FPS: Joint Texture & Shape Convolutional Mesh Decoders,"3D Morphable Models (3DMMs) are statistical models that represent facial texture and shape variations using a set of linear bases, and in particular Principal Component Analysis (PCA). 3DMMs were used as statistical priors for reconstructing 3D faces from images by solving non-linear least squares optimization problems. Recently, 3DMMs were used as generative models for training non-linear mappings (i.e., regressors) from image to the parameters of the models via Deep Convolutional Neural Networks (DCNNs). Nevertheless, all of the above methods use either fully connected layers or 2D convolutions on parametric unwrapped UV spaces, leading to large networks with many parameters. In this paper, we present the first, to the best of our knowledge, non-linear 3DMMs by learning joint texture and shape auto-encoders using direct mesh convolutions. We demonstrate how these auto-encoders can be used to train very light-weight models that perform Coloured Mesh Decoding (CMD) in-the-wild at a speed of over 2500 FPS.",2019,2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR),1904.03525,10.1109/CVPR.2019.00119,https://eprints.mdx.ac.uk/26524/1/Kotsia_dense3d.pdf
5d4821e83b33998266df11b125f8adf3b2b6fed1,1,[D18],,1,1,0,0,0,0,0,0,0,0,0,Subject Property Inference Attack in Collaborative Learning,"Binding decentralized data and computing resources together, collaborative learning is currently a booming research area driven by application requirements of efficiency and privacy. However, much work has shown that it may still expose record-level or statistical information about private local data. This paper aims to implement subject-level privacy inference during the training phase. The subject is the data source the training data comes from, such as a person or a certain environment. Based on auxiliary local data, gradients and intermediate outputs, we present passive and active property inference attacks on the lack of training data from the target subject. In the active attack, we choose a very innovative approach, CycleGAN, which reconstructs the impact of data with certain properties on the global model. We test our algorithms on 2 public image datasets, and give a comprehensive analysis of the privacy leakage in the subject property inference attack.",2020,2020 12th International Conference on Intelligent Human-Machine Systems and Cybernetics (IHMSC),,10.1109/IHMSC49165.2020.00057,
5dd0e7d11f990ba1e808e92455b67751106c9cb9,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,Multimodal Analytics for Real-world News using Measures of Cross-modal Entity Consistency,,2020,ICMR,2003.10421,10.1145/3372278.3390670,https://arxiv.org/pdf/2003.10421.pdf
5e21c4012e42d3c782f4815f15de8d1cea080baf,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,A Deep Deformable Convolutional Method for Age-Invariant Face Recognition,"With the rapid development of deep learning, face recognition has also improved dramatically. However, facial change remains a main obstacle to recognition accuracy, as complex factors such as aging, health state and emotion are hard to model. Unlike some previous methods that decompose facial features into age-related and identity-related parts, we propose an innovative end-to-end method that introduces a deformable convolution into a deep learning discriminant model and automatically learns how facial characteristics change over time, and we test its effectiveness on multiple datasets.",2019,CSPS,,10.1007/978-981-13-9409-6_245,
5e2b918f2dee17cb79d692e10aa2103ca9129e2c,0,,,1,0,0,0,0,0,0,0,0,0,0,Rotating your face using multi-task deep neural network,"Face recognition under viewpoint and illumination changes is a difficult problem, so many researchers have tried to solve this problem by producing the pose- and illumination- invariant feature. Zhu et al. [26] changed all arbitrary pose and illumination images to the frontal view image to use for the invariant feature. In this scheme, preserving identity while rotating pose image is a crucial issue. This paper proposes a new deep architecture based on a novel type of multitask learning, which can achieve superior performance in rotating to a target-pose face image from an arbitrary pose and illumination image while preserving identity. The target pose can be controlled by the user's intention. This novel type of multi-task model significantly improves identity preservation over the single task model. By using all the synthesized controlled pose images, called Controlled Pose Image (CPI), for the pose-illumination-invariant feature and voting among the multiple face recognition results, we clearly outperform the state-of-the-art algorithms by more than 4~6% on the MultiPIE dataset.",2015,2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR),,10.1109/CVPR.2015.7298667,http://www.cv-foundation.org/openaccess/content_cvpr_2015/app/1A_074.pdf
5f7b5b42f6e953e3ac8421dd51aa270b7ba6be92,0,,,1,0,0,0,0,0,0,0,0,0,0,Partial least squares for face hashing,"Abstract Face identification is an important research topic due to areas such as its application in surveillance, forensics and human–computer interaction. In the past few years, a myriad of methods for face identification has been proposed in the literature, with just a few among them focusing on scalability. In this work, we propose a simple but efficient approach for scalable face identification based on partial least squares (PLS) and random independent hash functions inspired by locality-sensitive hashing (LSH), resulting in the PLS for hashing (PLSH) approach. The original PLSH approach is further extended using feature selection to reduce the computational cost to evaluate the PLS-based hash functions, resulting in the state-of-the-art extended PLSH approach (ePLSH). The proposed approach is evaluated in the dataset FERET and in the dataset FRGCv1. The results show significant reduction in the number of subjects evaluated in the face identification (reduced to 0.3% of the gallery), providing averaged speedups up to 233 times compared to evaluating all subjects in the face gallery and 58 times compared to previous works in the literature.",2016,Neurocomputing,,10.1016/j.neucom.2016.02.083,https://repositorio.ufmg.br/bitstream/1843/ESBF-A3FFZQ/1/cassioelias.pdf
6075c07ecb29d551ffa474c3eca45f2da5fd5007,1,,1,1,1,0,0,0,0,0,0,0,0,0,Shallow convolutional neural network for eyeglasses detection in facial images,"Automatic eyeglasses detection plays a major role in many facial analysis systems. To improve the robustness of these systems and cope with real-world applications, a high-speed eyeglasses detector that can achieve high accuracy is needed. Recent studies indicate that the features extracted from convolutional neural networks are compelling. Therefore, this paper presents an effective and efficient method for eyeglasses detection in facial images based on extracting deep features from a well-designed shallow convolutional neural network (CNN). The main contribution of this paper is to address the two essential aspects of CNN: (1) the size of the training dataset required and (2) the depth of the network architecture. To this end, we initialize the learning parameters of the shallow CNN by the parameters of a deep CNN which is fine-tuned on a small dataset. The depth of the neural network is then decreased by removing some convolutional layers after testing its performance on the validation dataset. As a result, a significantly more accurate shallow CNN architecture, Shallow-GlassNet, is obtained, which achieves not only high accuracy but also high speed in eyeglasses detection. Evaluation experiments have been conducted on two large unconstrained facial image databases, LFW and Celeb Faces. The results have demonstrated the superior performance of the proposed framework which achieves a mean accuracy of 99.73%.",2017,2017 9th Computer Science and Electronic Engineering (CEEC),,10.1109/CEEC.2017.8101617,
60a05d1fa215146adc82ee1054534a925a3fb9c9,1,,1,0,1,1,0,0,0,0,0,0,0,0,Who's responsible? Jointly quantifying the contribution of the learning algorithm and training data,"A fancy learning algorithm $A$ outperforms a baseline method $B$ when they are both trained on the same data. Should $A$ get all of the credit for the improved performance or does the training data also deserve some credit? When deployed in a new setting from a different domain, however, $A$ makes more mistakes than $B$. How much of the blame should go to the learning algorithm or the training data? Such questions are becoming increasingly important and prevalent as we aim to make ML more accountable. Their answers would also help us allocate resources between algorithm design and data collection. In this paper, we formalize these questions and provide a principled Extended Shapley framework to jointly quantify the contribution of the learning algorithm and training data. Extended Shapley uniquely satisfies several natural properties that ensure equitable treatment of data and algorithm. Through experiments and theoretical analysis, we demonstrate that Extended Shapley has several important applications: 1) it provides a new metric of ML performance improvement that disentangles the influence of the data regime and the algorithm; 2) it facilitates ML accountability by properly assigning responsibility for mistakes; 3) it provides more robustness to manipulation by the ML designer.",2019,ArXiv,1910.04214,,https://arxiv.org/pdf/1910.04214.pdf
612e2788d58fba4f6c0566a894934584f91d3812,0,,,0,1,0,0,0,0,0,0,0,0,0,Domain Adaptation in Multi-Channel Autoencoder based Features for Robust Face Anti-Spoofing,"While the performance of face recognition systems has improved significantly in the last decade, they are proved to be highly vulnerable to presentation attacks (spoofing). Most of the research in the field of face presentation attack detection (PAD), was focused on boosting the performance of the systems within a single database. Face PAD datasets are usually captured with RGB cameras, and have very limited number of both bona-fide samples and presentation attack instruments. Training face PAD systems on such data leads to poor performance, even in the closed-set scenario, especially when sophisticated attacks are involved. We explore two paths to boost the performance of the face PAD system against challenging attacks. First, by using multichannel (RGB, Depth and NIR) data, which is still easily accessible in a number of mass production devices. Second, we develop a novel Autoencoders + MLP based face PAD algorithm. Moreover, instead of collecting more data for training of the proposed deep architecture, the domain adaptation technique is proposed, transferring the knowledge of facial appearance from RGB to multi-channel domain. We also demonstrate, that learning the features of individual facial regions, is more discriminative than the features learned from an entire face. The proposed system is tested on a very recent publicly available multi-channel PAD database with a wide variety of presentation attacks.",2019,2019 International Conference on Biometrics (ICB),1907.04048,10.1109/ICB45273.2019.8987247,https://arxiv.org/pdf/1907.04048.pdf
631776f6ef644abf55ff1b3e1eed5e22f0e4df3d,1,[D18],,1,1,0,0,0,0,0,0,0,0,0,Shaping Deep Feature Space towards Gaussian Mixture for Visual Classification,"The softmax cross-entropy loss function has been widely used to train deep models for various tasks. In this work, we propose a Gaussian mixture (GM) loss function for deep neural networks for visual classification. Unlike the softmax cross-entropy loss, our method explicitly shapes the deep feature space towards a Gaussian Mixture distribution. With a classification margin and a likelihood regularization, the GM loss facilitates both high classification performance and accurate modeling of the feature distribution. The GM loss can be readily used to distinguish abnormal inputs, such as the adversarial examples, based on the discrepancy between feature distributions of the inputs and the training set. Furthermore, theoretical analysis shows that a symmetric feature space can be achieved by using the GM loss, which enables the models to perform robustly against adversarial attacks. The proposed model can be implemented easily and efficiently without using extra trainable parameters. Extensive evaluations demonstrate that the proposed method performs favorably not only on image classification but also on robust detection of adversarial examples generated by strong attacks under different threat models.",2020,ArXiv,2011.09066,,https://arxiv.org/pdf/2011.09066.pdf
63cd4e4a39694fb6ee8189311cac89874262c463,1,,1,0,1,0,0,0,0,0,0,0,0,1,LGCN: Learnable Gabor Convolution Network for Human Gender Recognition in the Wild,"Human gender recognition in the wild is a challenging task due to complex face variations, such as poses, lighting, occlusions, etc. In this letter, learnable Gabor convolutional network (LGCN), a new neural network computing framework for gender recognition was proposed. In LGCN, a learnable Gabor filter (LGF) is introduced and combined with the convolutional neural network (CNN). Specifically, the proposed framework is constructed by replacing some first layer convolutional kernels of a standard CNN with LGFs. Here, LGFs learn intrinsic parameters by using standard back propagation method, so that the values of those parameters are no longer fixed by experience as traditional methods, but can be modified by self-learning automatically. In addition, the performance of LGCN in gender recognition is further improved by applying a proposed feature combination strategy. The experimental results demonstrate that, compared to the standard CNNs with identical network architecture, our approach achieves better performance on three challenging public datasets without introducing any sacrifice in parameter size. key words: gender recognition, learnable Gabor convolutional neural network, learnable Gabor filter, back propagation",2019,IEICE Trans. Inf. Syst.,,10.1587/transinf.2018edl8239,https://pdfs.semanticscholar.org/9af3/0b29aaaafdc2475f7383b638ae5b42b3fc02.pdf
63cd6407a47a3ddb80855569c5a699c75189215f,0,,,1,0,0,0,0,0,0,0,0,0,0,Transfiguring portraits,"People may look dramatically different by changing their hair color, hair style, when they grow older, in a different era style, or a different country or occupation. Some of those may transfigure appearance and inspire creative changes, some not, but how would we know without physically trying? We present a system that enables automatic synthesis of limitless numbers of appearances. A user inputs one or more photos (as many as they like) of his or her face, text queries an appearance of interest (just like they'd search an image search engine) and gets as output the input person in the queried appearance. Rather than fixing the number of queries or a dataset our system utilizes all the relevant and searchable images on the Internet, estimates a doppelgänger set for the inputs, and utilizes it to generate composites. We present a large number of examples on photos taken with completely unconstrained imaging conditions.",2016,TOGS,,10.1145/2897824.2925871,http://grail.cs.washington.edu/wp-content/uploads/2016/09/kemelmacher2016tp.pdf
641ff4551af4661ff96b8452c0e416c5e6f67113,0,,,1,0,0,0,0,0,0,0,0,0,0,Integrity-Preserving Image Aesthetic Assessment,"Image aesthetic assessment is a challenging problem in the field of computer vision. Recently, the input size of images has often been limited by the networks used for aesthetic problems. The methods of cropping, wrapping and padding unify images to the same size, which will destroy the aesthetic quality of the images and affect their aesthetic rating labels. In this paper, we present an end-to-end deep Multi-Task Spatial Pyramid Pooling Fully Convolutional Neural NasNet (MTP-NasNet) method for image aesthetic assessment that can directly manipulate the original size of the image without destroying its beauty. Our method is developed based on Fully Convolutional Network (FCN) and Spatial Pyramid Pooling (SPP). In addition, existing studies regard aesthetic assessment as a two-category task, a distribution predicting task or a style predicting task, but ignore the correlation between these tasks. To address this issue, we adopt a multi-task learning method that fuses the two-category task, the style task and the score distribution task. Moreover, this paper also explores the use of information such as the variance of the score distribution as a cue for image reliability. Our experimental results show that our approach performs well on the large-scale aesthetic assessment dataset (AVA [1]) and demonstrate the importance of multi-task learning and size preserving. Our study provides a powerful tool for image aesthetic assessment, which can be applied to the photography and image optimization fields.",2019,ICC 2019,,10.1007/978-3-030-41117-6_1,
6424b69f3ff4d35249c0bb7ef912fbc2c86f4ff4,1,[D19],,1,0,0,0,0,0,0,0,0,0,0,Deep Learning Face Attributes in the Wild,"Predicting face attributes in the wild is challenging due to complex face variations. We propose a novel deep learning framework for attribute prediction in the wild. It cascades two CNNs, LNet and ANet, which are fine-tuned jointly with attribute tags, but pre-trained differently. LNet is pre-trained by massive general object categories for face localization, while ANet is pre-trained by massive face identities for attribute prediction. This framework not only outperforms the state-of-the-art with a large margin, but also reveals valuable facts on learning face representation. (1) It shows how the performances of face localization (LNet) and attribute prediction (ANet) can be improved by different pre-training strategies. (2) It reveals that although the filters of LNet are fine-tuned only with image-level attribute tags, their response maps over entire images have strong indication of face locations. This fact enables training LNet for face localization with only image-level annotations, but without face bounding boxes or landmarks, which are required by all attribute recognition works. (3) It also demonstrates that the high-level hidden neurons of ANet automatically discover semantic concepts after pre-training with massive face identities, and such concepts are significantly enriched after fine-tuning with attribute tags. Each attribute can be well explained with a sparse linear combination of these concepts.",2015,2015 IEEE International Conference on Computer Vision (ICCV),1411.7766,10.1109/ICCV.2015.425,https://arxiv.org/pdf/1411.7766.pdf
64b6fe9fc52887974a88756e831ff3353a39e9e6,0,,,0,1,0,0,0,0,0,0,0,0,0,Attentive Semantic Exploring for Manipulated Face Detection.,"Face manipulation methods have developed rapidly in recent years, and their potential risk to society accounts for the emergence of research on detection methods. However, due to the diversity of manipulation methods and the high quality of fake images, detection methods suffer from a lack of generalization ability. To solve the problem, we find that segmenting images into semantic fragments could be effective, as discriminative defects and distortions are closely related to such fragments. Besides, highlighting discriminative regions in fragments and measuring the contribution of each fragment to the final prediction are efficient ways to improve generalization ability. Therefore, we propose a novel manipulated face detection method based on Multilevel Facial Semantic Segmentation and Cascade Attention Mechanism. To evaluate our method, we reconstruct two datasets: GGFI and FFMI, and also collect two open-source datasets. Experiments on four datasets verify the advantages of our approach against other state-of-the-art methods, especially its generalization ability.",2020,,2005.02958,,https://arxiv.org/pdf/2005.02958.pdf
6580807eeb2cba8e0cd577f3cc57904685edcc7f,0,,,0,1,0,0,0,0,0,0,0,0,0,Bidirectional One-Shot Unsupervised Domain Mapping,"We study the problem of mapping between a domain A, in which there is a single training sample and a domain B, for which we have a richer training set. The method we present is able to perform this mapping in both directions. For example, we can transfer all MNIST images to the visual domain captured by a single SVHN image and transform the SVHN image to the domain of the MNIST images. Our method is based on employing one encoder and one decoder for each domain, without utilizing weight sharing. The autoencoder of the single sample domain is trained to match both this sample and the latent space of domain B. Our results demonstrate convincing mapping between domains, where either the source or the target domain are defined by a single sample, far surpassing existing solutions. Our code is made publicly available at https://github.com/tomercohen11/BiOST.",2019,2019 IEEE/CVF International Conference on Computer Vision (ICCV),1909.01595,10.1109/ICCV.2019.00187,https://arxiv.org/pdf/1909.01595.pdf
659c15db147539006b08f238bf3ef5be0a9634f3,0,,,0,1,0,0,0,1,0,0,0,0,1,"Learning Quality, Aesthetics, and Facial Attributes for Image Annotation","Every day, a large number of digital images are produced by users of social networks, smartphone users, photography professionals, etc. This has caused problems in the management, organization, indexing, and retrieval of digital images. To ease this problem, several methods have been introduced in the literature to catalog images automatically. These methods are designed to associate images with one or more keywords belonging to a predefined dictionary, or to associate images with visual attributes such as, for example, quality, aesthetics, sentiment, memorability, interestingness, and complexity. This thesis investigates the use of deep convolutional neural networks for the automatic estimation of image quality and image aesthetics. In the last few years, several methods for automatic image quality assessment have been proposed. Most of them have been designed to deal with synthetically distorted images, which by definition do not truly model distortions afflicting real-world images. In this thesis a method for the automatic quality assessment of authentically distorted images is investigated. It shows better performance than state-of-the-art methods on both synthetically and authentically distorted image datasets. Differently from image quality, which characterizes the perceived quality of the image signal, aesthetics depicts perceived beauty. As a first step, the problem of aesthetic quality assessment of real-life general-content images has been investigated. The proposed solution outperformed state-of-the-art methods on the largest publicly available dataset. Given that one of the most popular visual contents is the face (e.g. on social networks for photo sharing), aesthetics assessment is, therefore, further investigated in the specific case of portrait images. To this end, in this thesis an algorithm is proposed that combines the previously investigated visual attributes (i.e. quality and aesthetics of general-content images) with the description of facial attributes (i.e. smiling, hair style, makeup). Facial attributes description is achieved thanks to two proposed methods. The first algorithm is a robust smile detector (an important visual feature for portrait aesthetics); the second is a multiple-task model designed to simultaneously estimate soft biometrics and attributes such as hair colors and styles, and types of beards. While the first algorithm outperforms state-of-the-art methods (also with respect to highly distorted images), the multi-task model demonstrates comparable performance. Experimental results for portrait image aesthetic assessment using the proposed algorithm show promising performance on three standard datasets.",2018,,,,https://pdfs.semanticscholar.org/659c/15db147539006b08f238bf3ef5be0a9634f3.pdf
666c6b06f7f7b997255f3a0b4950b203cc820bb5,0,,,1,0,0,0,0,0,0,0,0,0,0,Several models and applications for deep learning,"As a popular technology in recent years, deep learning has attracted widespread attention from academic research to industrial application. In this paper, we briefly summarize the important concepts and decisive factors in the development of deep learning. Then the representative contemporary algorithms are mentioned. It summarizes the 6 main deep learning models of the current mainstream academic research and expounds their principles, illustrating the concepts and characteristics of different kinds of neural network structure models. Certain industrial applications, including speech recognition, image recognition and artificial intelligence, are presented to analyze the future trends and the main challenges.",2017,2017 3rd IEEE International Conference on Computer and Communications (ICCC),,10.1109/COMPCOMM.2017.8322601,
6707b49dcd4b35ddf749c2ba77296f817bbf096b,0,,,1,0,0,0,0,0,0,0,0,0,0,Privacy Preservation for Cloud-Based Data Sharing and Data Analytics,"Data privacy is a globally recognized human right for individuals to control the access to their personal information, and bar the negative consequences from the use of this information. As communication technologies progress, the means to protect data privacy must also evolve to address new challenges come into view. Our research goal in this dissertation is to develop privacy protection frameworks and techniques suitable for the emerging cloud-based data services, in particular privacy-preserving algorithms and protocols for the cloud-based data sharing and data analytics services. Cloud computing has enabled users to store, process, and communicate their personal information through third-party services. It has also raised privacy issues regarding losing control over data, mass harvesting of information, and un-consented disclosure of personal content. Above all, the main concern is the lack of understanding about data privacy in cloud environments. Currently, the cloud service providers either advocate the principle of third-party doctrine and deny users’ rights to protect their data stored in the cloud; or rely the notice-and-choice framework and present users with ambiguous, incomprehensible privacy statements without any meaningful privacy guarantee. In this regard, our research has three main contributions. First, to capture users’ privacy expectations in cloud environments, we conceptually divide personal data into two categories, i.e., visible data and invisible data. The visible data refer to information users intentionally create, upload to, and share through the cloud; the invisible data refer to users’ information retained in the cloud that is aggregated, analyzed, and repurposed without their knowledge or understanding. Second, to address users’ privacy concerns raised by cloud computing, we propose two privacy protection frameworks, namely individual control and use limitation. The individual control framework emphasizes users’ capability to govern the access to the visible data stored in the cloud. The use limitation framework emphasizes users’ expectation to remain anonymous when the invisible data are aggregated and analyzed by cloud-based data services. Finally, we investigate various techniques to accommodate the new privacy protection frameworks, in the context of four cloud-based data services: personal health record sharing, location-based proximity test, link recommendation for social networks, and face tagging in photo management applications. For the first case, we develop a key-based protection technique to enforce fine-grained access control to users’ digital health records. For the second case, we develop a key-less protection technique to achieve location-specific user selection. For latter two cases, we develop distributed learning algorithms to prevent large scale data harvesting. We further combine these algorithms with query regulation techniques to achieve user anonymity. The picture that is emerging from the above works is a bleak one. Regarding to personal data, the reality is we can no longer control them all. As communication technologies evolve, the scope of personal data has expanded beyond local, discrete silos, and integrated into the Internet. The traditional understanding of privacy must be updated to reflect these changes. 
In addition, because privacy is a particularly nuanced problem that is governed by context, there is no one-size-fit-all solution. While some cases can be salvaged either by cryptography or by other means, in others a rethinking of the trade-offs between utility and privacy appears to be necessary. Privacy Preservation for Cloud-Based Data Sharing and Data Analytics",2016,,,,
671bfefb22d2044ab3e4402703bb88a10a7da78a,0,,,0,1,0,0,0,0,0,0,0,0,0,Triple consistency loss for pairing distributions in GAN-based face synthesis,"Generative Adversarial Networks have shown impressive results for the task of object translation, including face-to-face translation. A key component behind the success of recent approaches is the self-consistency loss, which encourages a network to recover the original input image when the output generated for a desired attribute is itself passed through the same network, but with the target attribute inverted. While the self-consistency loss yields photo-realistic results, it can be shown that the input and target domains, supposed to be close, differ substantially. This is empirically found by observing that a network recovers the input image even if attributes other than the inversion of the original goal are set as target. This stops one combining networks for different tasks, or using a network to do progressive forward passes. In this paper, we show empirical evidence of this effect, and propose a new loss to bridge the gap between the distributions of the input and target domains. This ""triple consistency loss"", aims to minimise the distance between the outputs generated by the network for different routes to the target, independent of any intermediate steps. To show this is effective, we incorporate the triple consistency loss into the training of a new landmark-guided face to face synthesis, where, contrary to previous works, the generated images can simultaneously undergo a large transformation in both expression and pose. To the best of our knowledge, we are the first to tackle the problem of mismatching distributions in self-domain synthesis, and to propose ""in-the-wild"" landmark-guided synthesis. Code will be available at this https URL",2018,ArXiv,1811.03492,,https://arxiv.org/pdf/1811.03492.pdf
6798ab287fd0d9aa29a78e31eecfac79a274c167,0,,,0,1,0,0,0,0,0,0,0,0,0,Algorithms above the noise floor,"Many success stories in the data sciences share an intriguing computational phenomenon. While the core algorithmic problems might seem intractable at first, simple heuristics or approximation algorithms often perform surprisingly well in practice. Common examples include optimizing non-convex functions or optimizing over non-convex sets. In theory, such problems are usually NP-hard. But in practice, they are often solved sufficiently well for applications in machine learning and statistics. Even when a problem is convex, we often settle for sub-optimal solutions returned by inexact methods like stochastic gradient descent. And in nearest neighbor search, a variety of approximation algorithms works remarkably well despite the ""curse of dimensionality"". In this thesis, we study this phenomenon in the context of three fundamental algorithmic problems arising in the data sciences. * In constrained optimization, we show that it is possible to optimize over a wide range of non-convex sets up to the statistical noise floor. * In unconstrained optimization, we prove that important convex problems already require approximation if we want to find a solution quickly. * In nearest neighbor search, we show that approximation guarantees can explain much of the good performance observed in practice. The overarching theme is that the computational hardness of many problems emerges only below the inherent ""noise floor"" of real data. Hence computational hardness of these problems does not prevent us from finding answers that perform well from a statistical perspective. This offers an explanation for why algorithmic problems in the data sciences often turn out to be easier than expected.",2018,,,,https://pdfs.semanticscholar.org/a79e/472f997463c0959d46d87d19d8cee62f1a9f.pdf
680405eab6adfdeab5aba0e182b6210e6dbe9406,0,,,0,1,0,0,0,0,0,0,0,0,0,Face Recognition Systems Under Morphing Attacks: A Survey,"Recently, researchers found that the intended generalizability of (deep) face recognition systems increases their vulnerability against attacks. In particular, the attacks based on morphed face images pose a severe security risk to face recognition systems. In the last few years, the topic of (face) image morphing and automated morphing attack detection has sparked the interest of several research laboratories working in the field of biometrics and many different approaches have been published. In this paper, a conceptual categorization and metrics for an evaluation of such methods are presented, followed by a comprehensive survey of relevant publications. In addition, technical considerations and tradeoffs of the surveyed methods are discussed along with open issues and challenges in the field.",2019,IEEE Access,,10.1109/ACCESS.2019.2899367,
680f3d35bf290b575c9d20dd846d928bad4e305c,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,Learning a perceptual manifold for image set classification,"We present a biologically motivated manifold learning framework for image set classification inspired by Independent Component Analysis for Grassmann manifolds. A Grassmann manifold is a collection of linear subspaces, such that each subspace is mapped on a single point on the manifold. We propose constructing Grassmann subspaces using Independent Component Analysis for robustness and improved class separation. The independent components capture spatially local information similar to Gabor-like filters within each subspace resulting in better classification accuracy. We further utilize linear discriminant analysis or sparse representation classification on the Grassmann manifold to achieve robust classification performance. We demonstrate the efficacy of our approach for image set classification on face and object recognition datasets.",2016,2016 IEEE International Conference on Image Processing (ICIP),,10.1109/ICIP.2016.7533198,https://sriramkumarwild.github.io/papers/ICIP_2016_GRAIL_final_v2_submit.pdf
68f19f06f49aa98b676fc6e315b25e23a1efb1f0,1,,1,0,0,1,0,0,1,0,0,0,0,0,Robust pose normalization for face recognition under varying views,"Unconstrained face recognition under varying views is one of the most challenging tasks, since the difference in appearances caused by poses may be even larger than that due to identity. In this paper, we exploit and analyze a novel pose normalization scheme for facial images under varying views via robust 3D shape reconstruction from single, unconstrained photos in the wild. Specifically, to address the problem of ambiguous 2D-to-3D landmark correspondence and imperfect landmark detector, for each input 2D face, the 3D shape is suggested to be learned by iteratively refining the 3D landmarks and the weighting coefficients of each landmark. Experimental results on both LFW and a large-scale self-collected face databases demonstrate that the proposed approach performs better than the existing representative technologies.",2015,2015 IEEE International Conference on Image Processing (ICIP),,10.1109/ICIP.2015.7351080,
69b2a7533e38c2c8c9a0891a728abb423ad2c7e7,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,Manifold based sparse representation for facial understanding in natural images,"Sparse representations, motivated by strong evidence of sparsity in the primate visual cortex, are gaining popularity in the computer vision and pattern recognition fields, yet sparse methods have not gained widespread acceptance in the facial understanding communities. A main criticism brought forward by recent publications is that sparse reconstruction models work well with controlled datasets, but exhibit coefficient contamination in natural datasets. To better handle facial understanding problems, specifically the broad category of facial classification problems, an improved sparse paradigm is introduced in this paper. Our paradigm combines manifold learning for dimensionality reduction, based on a newly introduced variant of semi-supervised Locality Preserving Projections, with an ℓ1 reconstruction error, and a regional based statistical inference model. We demonstrate state-of-the-art classification accuracy for the facial understanding problems of expression, gender, race, glasses, and facial hair classification. Our method minimizes coefficient contamination and offers a unique advantage over other facial classification methods when dealing with occlusions. Experimental results are presented on multi-class as well as binary facial classification problems using the Labeled Faces in the Wild, Cohn-Kanade, Extended Cohn-Kanade, and GEMEP-FERA datasets demonstrating how and under what conditions sparse representations can further the field of facial understanding.",2013,Image Vis. Comput.,,10.1016/j.imavis.2013.03.003,
6b0bbf3e7df725cc3b781d2648e41782cb3d8539,0,,,0,1,0,0,0,0,0,0,0,0,0,Generative Image Inpainting with Contextual Attention,"Recent deep learning based approaches have shown promising results for the challenging task of inpainting large missing regions in an image. These methods can generate visually plausible image structures and textures, but often create distorted structures or blurry textures inconsistent with surrounding areas. This is mainly due to ineffectiveness of convolutional neural networks in explicitly borrowing or copying information from distant spatial locations. On the other hand, traditional texture and patch synthesis approaches are particularly suitable when it needs to borrow textures from the surrounding regions. Motivated by these observations, we propose a new deep generative model-based approach which can not only synthesize novel image structures but also explicitly utilize surrounding image features as references during network training to make better predictions. The model is a feedforward, fully convolutional neural network which can process images with multiple holes at arbitrary locations and with variable sizes during the test time. Experiments on multiple datasets including faces (CelebA, CelebA-HQ), textures (DTD) and natural images (ImageNet, Places2) demonstrate that our proposed approach generates higher-quality inpainting results than existing ones. Code, demo and models are available at: https://github.com/JiahuiYu/generative_inpainting.",2018,2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition,1801.07892,10.1109/CVPR.2018.00577,https://arxiv.org/pdf/1801.07892.pdf
6b4b26ae9d9c3a6b98163a1d0dd2268092a66718,0,,,0,1,0,0,0,0,0,0,0,0,0,Advances in deep learning with limited supervision and computational resources,"Deep neural networks are the cornerstone of state-of-the-art systems for a wide range of tasks, including object recognition, language modelling and machine translation. In the last decade, research in the field of deep learning has led to numerous key advances in designing novel architectures and training algorithms for neural networks. However, most success stories in deep learning heavily relied on two main factors: the availability of large amounts of labelled data and massive computational resources. This thesis by articles makes several contributions to advancing deep learning, specifically in problems with limited or no labelled data, or with constrained computational resources. The first article addresses sparsity of labelled data that emerges in the application field of recommender systems. We propose a multi-task learning framework that leverages natural language reviews in improving recommendation. Specifically, we apply neural-network-based methods for learning representations of products from review text, while learning from rating data. We demonstrate that the proposed method can achieve state-of-the-art performance on the Amazon Reviews dataset. The second article tackles computational challenges in training large-scale deep neural networks. We propose a conditional computation network architecture which can adaptively assign its capacity, and hence computations, across different regions of the input. We demonstrate the effectiveness of our model on visual recognition tasks where objects are spatially localized within the input, while maintaining much lower computational overhead than standard network architectures. The third article contributes to the domain of unsupervised learning with the generative adversarial networks paradigm. We introduce a flexible adversarial training framework, in which not only the generator converges to the true data distribution, but also the discriminator recovers the relative density of the data at the optimum. We validate our framework empirically by showing that the discriminator is able to accurately estimate the true energy of data while obtaining state-of-the-art quality of samples. Finally, in the fourth article, we address the problem of unsupervised domain translation. We propose a model which can learn flexible, many-to-many mappings across domains from unpaired data. We validate our approach on several image datasets, and we show that it",2020,,,,
6be2522fa708de1334fea647c9151149d16ec9a1,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,Disentangling Features in 3D Face Shapes for Joint Face Reconstruction and Recognition,"This paper proposes an encoder-decoder network to disentangle shape features during 3D face reconstruction from single 2D images, such that the tasks of reconstructing accurate 3D face shapes and learning discriminative shape features for face recognition can be accomplished simultaneously. Unlike existing 3D face reconstruction methods, our proposed method directly regresses dense 3D face shapes from single 2D images, and tackles identity and residual (i.e., non-identity) components in 3D face shapes explicitly and separately based on a composite 3D face shape model with latent representations. We devise a training process for the proposed network with a joint loss measuring both face identification error and 3D face shape reconstruction error. To construct training data we develop a method for fitting 3D morphable model (3DMM) to multiple 2D images of a subject. Comprehensive experiments have been done on MICC, BU3DFE, LFW and YTF databases. The results show that our method expands the capacity of 3DMM for capturing discriminative shape features and facial detail, and thus outperforms existing methods both in 3D face reconstruction accuracy and in face recognition accuracy.",2018,2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition,1803.11366,10.1109/CVPR.2018.00547,https://arxiv.org/pdf/1803.11366.pdf
6c1b27242822485bea954cb66f6aa314a10f55aa,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,End-to-End Face Detection and Cast Grouping in Movies Using Erdös-Rényi Clustering,"We present an end-to-end system for detecting and clustering faces by identity in full-length movies. Unlike works that start with a predefined set of detected faces, we consider the end-to-end problem of detection and clustering together. We make three separate contributions. First, we combine a state-of-the-art face detector with a generic tracker to extract high quality face tracklets. We then introduce a novel clustering method, motivated by the classic graph theory results of Erdös and Rényi. It is based on the observations that large clusters can be fully connected by joining just a small fraction of their point pairs, while just a single connection between two different people can lead to poor clustering results. This suggests clustering using a verification system with very few false positives but perhaps moderate recall. We introduce a novel verification method, rank-1 counts verification, that has this property, and use it in a link-based clustering scheme. Finally, we define a novel end-to-end detection and clustering evaluation metric allowing us to assess the accuracy of the entire end-to-end system. We present state-of-the-art results on multiple video data sets and also on standard face databases.",2017,2017 IEEE International Conference on Computer Vision (ICCV),1709.02458,10.1109/ICCV.2017.564,http://people.cs.umass.edu/~elm/papers/Erdos.pdf
6cfa4ab327d42103195cb8e5c6181028cec8ee62,0,,,0,0,0,0,0,0,0,0,0,0,1,PAD-Net: Multi-tasks Guided Prediction-and-Distillation Network for Simultaneous Depth Estimation and Scene Parsing,"Depth estimation and scene parsing are two particularly important tasks in visual scene understanding. In this paper we tackle the problem of simultaneous depth estimation and scene parsing in a joint CNN. The task can be typically treated as a deep multi-task learning problem [42]. Different from previous methods directly optimizing multiple tasks given the input training data, this paper proposes a novel multi-task guided prediction-and-distillation network (PAD-Net), which first predicts a set of intermediate auxiliary tasks ranging from low level to high level, and then the predictions from these intermediate auxiliary tasks are utilized as multi-modal input via our proposed multi-modal distillation modules for the final tasks. During the joint learning, the intermediate tasks not only act as supervision for learning more robust deep representations but also provide rich multi-modal information for improving the final tasks. Extensive experiments are conducted on two challenging datasets (i.e. NYUD-v2 and Cityscapes) for both the depth estimation and scene parsing tasks, demonstrating the effectiveness of the proposed approach.",2018,2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition,1805.04409,10.1109/CVPR.2018.00077,https://arxiv.org/pdf/1805.04409.pdf
6d6709597acde263c032ec288725813f0157dde8,0,,,0,1,0,0,0,0,0,0,0,0,0,Leveraging Model Flexibility and Deep Structure: Non-Parametric and Deep Models for Computer Vision Processes with Applications to Deep Model Compression,"My dissertation presents several new algorithms incorporating non-parametric and deep learning approaches for computer vision and related tasks, including object localization, object tracking and model compression. With respect to object localization, I introduce a method to perform active localization by modeling spatial and other relationships between objects in a coherent “visual situation” using a set of probability distributions. I further refine this approach with the Multipole Density Estimation with Importance Clustering (MICSituate) algorithm. Next, I formulate active, “situation” object search as a Bayesian optimization problem using Gaussian Processes. Using my Gaussian Process ContextSituation Learning (GP-CL) algorithm, I demonstrate improved efficiency for object localization over baseline procedures. In subsequent work, I expand this research to frame object tracking in video as a temporally-evolving, dynamic Bayesian optimization problem. Here I present the Siamese-Dynamic Bayesian Tracking Algorithm (SDBTA), the first integrated dynamic Bayesian optimization framework in combination with deep learning for video tracking. Through experiments, I show improved results for video tracking in comparison with baseline approaches. Finally, I propose a novel data compression algorithm, Regularized L21 Semi-NonNegative Matrix Factorization (L21 SNF) which serves as a general purpose, parts-based compression algorithm, applicable to deep model compression.",2020,,,10.15760/etd.7320,https://pdfs.semanticscholar.org/6d67/09597acde263c032ec288725813f0157dde8.pdf
6ea6c39593853a5382ee975a0b7bb8222227a6cc,0,,,0,1,0,0,0,0,0,0,0,0,0,An Adaptive Control Algorithm for Stable Training of Generative Adversarial Networks,"Generative adversarial networks (GANs) have shown significant progress in generating high-quality visual samples, however they are still well known both for being unstable to train and for the problem of mode collapse, particularly when trained on data collections containing a diverse set of visual objects. In this paper, we propose an Adaptive k-step Generative Adversarial Network (Ak-GAN), which is designed to mitigate the impact of instability and saturation in the original by dynamically adjusting the ratio of the training steps of both the generator and discriminator. To accomplish this, we track and analyze stable training curves of relatively narrow datasets and use them as the target fitting lines when training more diverse data collections. Furthermore, we conduct experiments on the proposed procedure using several optimization techniques (e.g., supervised guiding from previous stable learning curves with and without momentum) and compare their performance with that of state-of-the-art models on the task of image synthesis from datasets consisting of diverse images. Empirical results demonstrate that Ak-GAN works well in practice and exhibits more stable behavior than regular GANs during training. A quantitative evaluation has been conducted on the Inception Score (IS) and the relative inverse Inception Score (RIS); compared with regular GANs, the former has been improved by 61% and 83%, and the latter by 21% and 60%, on the CelebA and the Anime datasets, respectively.",2019,IEEE Access,,10.1109/ACCESS.2019.2960461,
6eba25166fe461dc388805cc2452d49f5d1cdadd,0,,,1,0,0,0,0,0,0,0,0,0,0,Learning Grimaces by Watching TV,"Differently from computer vision systems which require explicit supervision, humans can learn facial expressions by observing people in their environment. In this paper, we look at how similar capabilities could be developed in machine vision. As a starting point, we consider the problem of relating facial expressions to objectively-measurable events occurring in videos. In particular, we consider a gameshow in which contestants play to win significant sums of money. We extract events affecting the game and corresponding facial expressions objectively and automatically from the videos, obtaining large quantities of labelled data for our study. We also develop, using benchmarks such as FER and SFEW 2.0, state-of-the-art deep neural networks for facial expression recognition, showing that pre-training on face verification data can be highly beneficial for this task. Then, we extend these models to use facial expressions to predict events in videos and learn nameable expressions from them. The dataset and emotion recognition models are available at http://www.robots.ox.ac.uk/~vgg/data/facevalue.",2017,,,,http://www.bmva.org/bmvc/2016/papers/paper122/paper122.pdf
6fb232c1335c7460200db2ca211d7d8e68df7f85,0,,,0,1,0,0,0,0,0,0,0,0,0,Rate-Distortion Optimization Guided Autoencoder for Isometric Embedding in Euclidean Latent Space,,2019,,1910.04329,,https://arxiv.org/pdf/1910.04329.pdf
704c29fa4ffa6dd5c0774467e38413085a26e361,0,,,1,0,0,0,0,0,0,0,0,0,0,Face database generation based on text-video correlation,"The size of databases is the key to success for face recognition systems. However, building such a database is both time-consuming and labor intensive. In this paper, we address the problem by proposing a database generation framework based on text-video correlation. Specifically, visual content of a video can be presented as a character sequence by face detection, tracking and recognition, while text information extracted from subtitles and scripts provides a complementary identity sequence. By correlating these two sequences, faces recognized can be refined without manual intervention. Experiments demonstrate that 90% of the human effort in face database construction can be reduced. Highlights: A face database generation framework based on text-video correlation is proposed. The system is able to reduce 90% of the human effort in face database construction. We utilize scripts and subtitles of videos to remove face recognition errors. We introduce a timing projection method for text and video correlation.",2016,Neurocomputing,,10.1016/j.neucom.2016.05.009,
70580ed8bc482cad66e059e838e4a779081d1648,1,[D20],,1,0,0,1,0,0,0,0,0,0,0,Gender Classification using Multi-Level Wavelets on Real World Face Images,"Gender classification is a major area of classification that has generated a lot of academic and research interest over the past decade or so. Being a recent area of interest in classification, there is still a lot of opportunity for further improvements in the existing techniques and their capabilities. In this paper, an attempt has been made to cover some of the limitations that the associated research community has faced by proposing a novel gender classification technique. In this technique, discrete wavelet transform has been used up to five levels for the purpose of feature extraction. To accommodate pose and expression variations, the energies of sub-bands are calculated and combined at the end. Only those features are used which are considered significant, and this significance is measured using Particle Swarm Optimization (PSO). The experimentation performed on real world images has shown a significant classification improvement and accuracy to the tune of 97%. The results also reveal the superiority of the proposed technique over others in its robustness, efficiency, illumination and pose change variation detection.",2013,,,10.12700/aph.10.04.2013.4.12,https://pdfs.semanticscholar.org/7058/0ed8bc482cad66e059e838e4a779081d1648.pdf
70d2ab1af0edd5c0a30d576a5d4aa397c4f92d3e,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,Elastic preserving projections based on L1-norm maximization,"Elastic preserving projections (EPP) is a classical manifold learning technique for dimensionality reduction, which has demonstrated good performance in pattern recognition. However, EPP is sensitive to outliers because it makes use of the L2-norm for optimization. In this paper, we propose an effective and robust EPP version based on L1-norm maximization (EPP-L1), which can learn the optimal projection vectors by maximizing the ratio of the global dispersion to the local dispersion using the L1-norm rather than the L2-norm. The proposed method is proved to be feasible and also robust to outliers while overcoming the singularity problem of the local scatter matrix in EPP. Experiments on five popular face image databases demonstrate the effectiveness of the proposed method.",2018,Multimedia Tools and Applications,,10.1007/s11042-018-5608-2,
71b7fc715e2f1bb24c0030af8d7e7b6e7cd128a6,1,[D18],,1,1,0,0,0,1,0,0,0,0,0,The Do’s and Don’ts for CNN-Based Face Verification,"While the research community appears to have developed a consensus on the methods of acquiring annotated data, design and training of CNNs, many questions still remain to be answered. In this paper, we explore the following questions that are critical to face recognition research: (i) Can we train on still images and expect the systems to work on videos? (ii) Are deeper datasets better than wider datasets? (iii) Does adding label noise lead to improvement in performance of deep networks? (iv) Is alignment needed for face recognition? We address these questions by training CNNs using CASIA-WebFace, UMD-Faces, and a new video dataset and testing on YouTube-Faces, IJB-A and a disjoint portion of UMDFaces datasets. Our new data set, which will be made publicly available, has 22,075 videos and 3,735,476 human annotated frames extracted from them.",2017,2017 IEEE International Conference on Computer Vision Workshops (ICCVW),1705.07426,10.1109/ICCVW.2017.299,https://arxiv.org/pdf/1705.07426.pdf
71bfa3fd13bc72c4143d36d01620de50eb2673ee,0,,,1,0,0,0,0,0,0,0,0,0,0,Exploiting random perturbations to defend against adversarial attacks,"Adversarial examples are deliberately crafted data points which aim to induce errors in machine learning models. This phenomenon has gained much attention recently, especially in the field of image classification, where many methods have been proposed to generate such malicious examples. In this paper we focus on defending a trained model against such attacks by introducing randomness to its inputs.",2018,"Symposium on Photonics Applications in Astronomy, Communications, Industry, and High-Energy Physics Experiments (WILGA)",,10.1117/12.2501606,
7343f0b7bcdaf909c5e37937e295bf0ac7b69499,1,[D37],,0,0,0,0,0,0,0,0,0,1,0,Adaptive Cascade Deep Convolutional Neural Networks for face alignment,"Abstract Deep convolutional network cascade has been successfully applied for face alignment. The configuration of each network, including the selecting strategy of local patches for training and the input range of local patches, is crucial for achieving desired performance. In this paper, we propose an adaptive cascade framework, termed Adaptive Cascade Deep Convolutional Neural Networks (ACDCNN) which adjusts the cascade structure adaptively. Gaussian distribution is utilized to bridge the successive networks. Extensive experiments demonstrate that our proposed ACDCNN achieves the state-of-the-art in accuracy, but with reduced model complexity and increased robustness.",2015,Comput. Stand. Interfaces,,10.1016/j.csi.2015.06.004,http://www1.ece.neu.edu/~yuewu/files/2015/1-s2.0-S0920548915000665-main.pdf
7365f887c938ca21a6adbef08b5a520ebbd4638f,0,,,0,1,0,0,0,0,0,0,0,0,0,Model Cards for Model Reporting,"Trained machine learning models are increasingly used to perform high-impact tasks in areas such as law enforcement, medicine, education, and employment. In order to clarify the intended use cases of machine learning models and minimize their usage in contexts for which they are not well suited, we recommend that released models be accompanied by documentation detailing their performance characteristics. In this paper, we propose a framework that we call model cards, to encourage such transparent model reporting. Model cards are short documents accompanying trained machine learning models that provide benchmarked evaluation in a variety of conditions, such as across different cultural, demographic, or phenotypic groups (e.g., race, geographic location, sex, Fitzpatrick skin type [15]) and intersectional groups (e.g., age and race, or sex and Fitzpatrick skin type) that are relevant to the intended application domains. Model cards also disclose the context in which models are intended to be used, details of the performance evaluation procedures, and other relevant information. While we focus primarily on human-centered machine learning models in the application fields of computer vision and natural language processing, this framework can be used to document any trained machine learning model. To solidify the concept, we provide cards for two supervised models: One trained to detect smiling faces in images, and one trained to detect toxic comments in text. We propose model cards as a step towards the responsible democratization of machine learning and related artificial intelligence technology, increasing transparency into how well artificial intelligence technology works. We hope this work encourages those releasing trained machine learning models to accompany model releases with similar detailed evaluation numbers and other relevant documentation.",2019,FAT* '19,1810.03993,10.1145/3287560.3287596,https://arxiv.org/pdf/1810.03993.pdf
7391db7c99c5b43428b093188a66a9e274d7ac08,0,,,0,1,0,0,0,0,0,0,0,0,0,GADE: A Generative Adversarial Approach to Density Estimation and its Applications,"Density estimation is a challenging unsupervised learning problem. Current maximum likelihood approaches for density estimation are either restrictive or incapable of producing high-quality samples. On the other hand, likelihood-free models such as generative adversarial networks produce sharp samples without a density model. The lack of a density estimate limits the applications to which the sampled data can be put, however. We propose a generative adversarial density estimator (GADE), a density estimation approach that bridges the gap between the two. Allowing for a prior on the parameters of the model, we extend our density estimator to a Bayesian model where we can leverage the predictive variance to measure our confidence in the likelihood. Our experiments on challenging applications such as visual dialog and autonomous driving, where the density and the confidence in predictions are crucial, show the effectiveness of our approach.",2020,International Journal of Computer Vision,,10.1007/s11263-020-01360-9,
741bf6ebb5d8abaa54ea4e4738d36ea54db313fe,0,,,0,1,0,0,0,0,0,0,0,0,0,Interpretable Set Functions,"We propose learning flexible but interpretable functions that aggregate a variable-length set of permutation-invariant feature vectors to predict a label. We use a deep lattice network model so we can architect the model structure to enhance interpretability, and add monotonicity constraints between inputs-and-outputs. We then use the proposed set function to automate the engineering of dense, interpretable features from sparse categorical features, which we call semantic feature engine. Experiments on real-world data show the achieved accuracy is similar to deep sets or deep neural networks, and is easier to debug and understand.",2018,ArXiv,1806.00050,,https://arxiv.org/pdf/1806.00050.pdf
75b9df987114dfcf9ca83a86ec324a5b3c0d6375,0,,,1,0,0,0,0,0,0,0,0,0,0,New face recognition method based on local binary pattern histogram,"Face recognition is one of the most important tasks in computer vision and biometrics, where many algorithms have been developed. The Local Binary Pattern (LBP) has been proved to be effective for facial image representation and analysis, but it is too local to be robust. In this paper, we present an improved method for face recognition named Elongated Multi-Block Local Ternary Pattern (EMBLTP), which is based on the Local Binary Pattern (LBP). The proposed method is tested on the Yale face database and compared with different variants of LBP. Experimental results show that the classification rate of the proposed method is appreciable.",2014,2014 15th International Conference on Sciences and Techniques of Automatic Control and Computer Engineering (STA),,10.1109/STA.2014.7086724,http://www.univ-oeb.dz/bibliotheque/wp-content/uploads/2017/01/new-face-recognition-method.pdf
75f41ad75a634b1371261edee03fef8fe292df0d,0,,,0,0,0,0,0,0,0,1,0,0,0,"Kernel analysis over Riemannian manifolds for visual recognition of actions, pedestrians and textures","A convenient way of analysing Riemannian manifolds is to embed them in Euclidean spaces, with the embedding typically obtained by flattening the manifold via tangent spaces. This general approach is not free of drawbacks. For example, only distances between points to the tangent pole are equal to true geodesic distances. This is restrictive and may lead to inaccurate modelling. Instead of using tangent spaces, we propose embedding into the Reproducing Kernel Hilbert Space by introducing a Riemannian pseudo kernel. We furthermore propose to recast a locality preserving projection technique from Euclidean spaces to Riemannian manifolds, in order to demonstrate the benefits of the embedding. Experiments on several visual classification tasks (gesture recognition, person re-identification and texture classification) show that in comparison to tangent-based processing and state-of-the-art methods (such as tensor canonical correlation analysis), the proposed approach obtains considerable improvements in discrimination accuracy.",2012,2012 IEEE Workshop on the Applications of Computer Vision (WACV),,10.1109/WACV.2012.6163005,https://espace.library.uq.edu.au/view/UQ:269137/UQ269137_OA.pdf
7651d6498f437e30d31e354933b93f52791b6542,0,,,0,1,0,0,0,0,0,0,0,0,0,Learning from Irregularly-Sampled Time Series: A Missing Data Perspective,"Irregularly-sampled time series occur in many domains including healthcare. They can be challenging to model because they do not naturally yield a fixed-dimensional representation as required by many standard machine learning models. In this paper, we consider irregular sampling from the perspective of missing data. We model observed irregularly-sampled time series data as a sequence of index-value pairs sampled from a continuous but unobserved function. We introduce an encoder-decoder framework for learning from such generic indexed sequences. We propose learning methods for this framework based on variational autoencoders and generative adversarial networks. For continuous irregularly-sampled time series, we introduce continuous convolutional layers that can efficiently interface with existing neural network architectures. Experiments show that our models are able to achieve competitive or better classification results on irregularly-sampled multivariate time series compared to recent RNN models while offering significantly faster training times.",2020,ICML,2008.07599,,https://arxiv.org/pdf/2008.07599.pdf
767a6054796e2e6c1de453afab0e05e55aadf825,0,,,0,1,0,0,0,0,0,0,0,0,0,Learning Continuous Image Representation with Local Implicit Image Function,"How to represent an image? While the visual world is presented in a continuous manner, machines store and see the images in a discrete way with 2D arrays of pixels. In this paper, we seek to learn a continuous representation for images. Inspired by the recent progress in 3D reconstruction with implicit function, we propose Local Implicit Image Function (LIIF), which takes an image coordinate and the 2D deep features around the coordinate as inputs, predicts the RGB value at a given coordinate as an output. Since the coordinates are continuous, LIIF can be presented in an arbitrary resolution. To generate the continuous representation for pixel-based images, we train an encoder and LIIF representation via a self-supervised task with superresolution. The learned continuous representation can be presented in arbitrary resolution even extrapolate to ×30 higher resolution, where the training tasks are not provided. We further show that LIIF representation builds a bridge between discrete and continuous representation in 2D, it naturally supports the learning tasks with size-varied image ground-truths and significantly outperforms the method with resizing the ground-truths. Our project page with code is at https://yinboc.github.io/liif/.",2020,,2012.09161,,https://arxiv.org/pdf/2012.09161.pdf
7714882e4adbeab92fc217f07d5c150229b0aadc,1,[D20],,0,0,0,1,0,0,0,0,0,0,0,Joint features classifier with genetic set for undersampled face recognition,"Face recognition with limited training samples is a very difficult task. Especially with only one training image per individual, achieving high accuracy seems nearly impossible. In this paper, we present a novel joint features classification approach with an external generic set for face recognition. The presented scheme leverages two representations based on Gabor features and local Gabor binary patterns (LGBP) features. Firstly, the Gabor feature-based representation with an external generic set and the LGBP feature-based representation with an external generic set are obtained independently. Then a weighted score-level fusion scheme is adopted to automatically combine the Gabor and LGBP features and to output the final decision. Three metrics, i.e., recognition rate, stability and execution time, are investigated in our evaluation of the performance of the presented method. The comprehensive experimental results on three large face databases (i.e., AR, FERET and WLF) demonstrated that the presented approach can always achieve very satisfactory accuracy and stability and that it is computationally tractable.",2017,Neural Computing and Applications,,10.1007/s00521-017-2897-8,
7717924aecb9ec5ebb8582aaa346c69eee1f86ca,0,,,0,1,0,0,0,0,0,0,0,0,0,No Representation without Transformation,"We extend the framework of variational autoencoders to represent transformations explicitly in the latent space. In the family of hierarchical graphical models that emerges, the latent space is populated by higher order objects that are inferred jointly with the latent representations they act on. To explicitly demonstrate the effect of these higher order objects, we show that the inferred latent transformations reflect interpretable properties in the observation space. Furthermore, the model is structured in such a way that in the absence of transformations, we can run inference and obtain generative capabilities comparable with standard variational autoencoders. Finally, utilizing the trained encoder, we outperform the baselines by a wide margin on a challenging out-of-distribution classification task.",2019,ArXiv,1912.03845,,https://arxiv.org/pdf/1912.03845.pdf
77c7d8012fe4179a814c1241a37a2256361bc1a4,0,,,1,0,0,0,0,0,0,0,0,0,0,BGP Face Retrieval Based on Coding Pyramid,"The traditional face image retrieval method is to compare the target picture with all the pictures in the database one by one, resulting in great time consumption. In this paper, the cascaded binary gradient pattern (BGP) is used to hierarchically encode faces and form an encoding pyramid. When searching, matching proceeds from fuzzy to exact according to the coding level, following a coarse-to-fine search strategy. It is verified on the Yale face database that the method significantly shortens retrieval time and improves search efficiency.",2018,2018 10th International Conference on Intelligent Human-Machine Systems and Cybernetics (IHMSC),,10.1109/IHMSC.2018.10167,
77f1a5e3166b7fca66870fb61ddc8d6070d12cd0,0,,,0,1,0,0,0,0,0,0,0,0,0,OC-FakeDect: Classifying Deepfakes Using One-class Variational Autoencoder,"An image forgery method called Deepfakes can cause security and privacy issues by changing the identity of a person in a photo through the replacement of his/her face with a computer-generated image or another person's face. Therefore, a new challenge of detecting Deepfakes arises to protect individuals from potential misuses. Many researchers have proposed various binary-classification based detection approaches to detect deepfakes. However, binary-classification based methods generally require a large amount of both real and fake face images for training, and it is challenging to collect sufficient fake images data in advance. Besides, when new deepfakes generation methods are introduced, little deepfakes data will be available, and the detection performance may be mediocre. To overcome these data scarcity limitations, we formulate deepfakes detection as a one-class anomaly detection problem. We propose OC-FakeDect, which uses a one-class Variational Autoencoder (VAE) to train only on real face images and detects non-real images such as deepfakes by treating them as anomalies. Our preliminary result shows that our one class-based approach can be promising when detecting Deepfakes, achieving a 97.5% accuracy on the NeuralTextures data of the well-known FaceForensics++ benchmark dataset without using any fake images for the training process.",2020,2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW),,10.1109/CVPRW50498.2020.00336,
787e4415baa7bfa198a7e4a245c9b11034b01510,0,,,0,0,0,0,0,0,1,0,0,0,0,Robust Recognition using L1-Principal Component Analysis,,2016,,,,
78bcad8018467af3a6350714b70c91bd44f1181f,0,,,0,1,0,0,0,0,0,0,0,0,0,Considering Race a Problem of Transfer Learning,,2019,2019 IEEE Winter Applications of Computer Vision Workshops (WACVW),1812.04751,10.1109/wacvw.2019.00022,https://arxiv.org/pdf/1812.04751.pdf
79033ec1b2c86034908febd444d6ed3c753e17b3,1,[D20],,1,0,0,1,0,0,0,0,0,0,0,Face Recognition via Globality-Locality Preserving Projections,"We present an improved Locality Preserving Projections (LPP) method, named Globality-Locality Preserving Projections (GLPP), to preserve both the global and local geometric structures of data. In our approach, an additional constraint on the geometry of classes is imposed on the objective function of conventional LPP in order to respect some more global manifold structures. Moreover, we formulate a two-dimensional extension of GLPP (2D-GLPP) as an example to show how to extend GLPP with other statistical techniques. We apply our work to face recognition on four popular face databases, namely the ORL, Yale, FERET and LFW-A databases, and extensive experimental results demonstrate that the considered global manifold information can significantly improve the performance of LPP and that the proposed face recognition methods outperform the state of the art.",2013,ArXiv,1311.1279,,https://arxiv.org/pdf/1311.1279.pdf
7923742e2af655dee4f9a99e39916d164bc30178,1,,1,1,1,0,0,0,0,0,0,0,0,0,Soft biometric privacy: Retaining biometric utility of face images while perturbing gender,"While the primary purpose of collecting biometric data (such as face images, iris, fingerprints, etc.) is person recognition, recent advances in machine learning have shown the possibility of extracting auxiliary information from biometric data such as age, gender, health attributes, etc. These auxiliary attributes are sometimes referred to as soft biometrics. This automatic extraction of soft biometric attributes can happen without the user's agreement, thereby raising several privacy concerns. In this work, we design a technique that modifies a face image such that its gender as assessed by a gender classifier is perturbed, while its biometric utility as assessed by a face matcher is retained. Given an arbitrary biometric matcher and an attribute classifier, the proposed method systematically perturbs the input image such that the output of the attribute classifier is confounded, while the output of the biometric matcher is not significantly impacted. Experimental analysis conveys the efficacy of the scheme in imparting gender privacy to face images.",2017,2017 IEEE International Joint Conference on Biometrics (IJCB),,10.1109/BTAS.2017.8272743,http://www.cse.msu.edu/~rossarun/pubs/MirjaliliRossSoftBiometricPrivacy_IJCB2017.pdf
79815f31f42708fd59da345f8fa79f635a070730,0,,,0,1,0,0,0,0,0,0,0,0,0,Autoregressive Quantile Networks for Generative Modeling,"We introduce autoregressive implicit quantile networks (AIQN), a fundamentally different approach to generative modeling than those commonly used, that implicitly captures the distribution using quantile regression. AIQN is able to achieve superior perceptual quality and improvements in evaluation metrics, without incurring a loss of sample diversity. The method can be applied to many existing models and architectures. In this work we extend the PixelCNN model with AIQN and demonstrate results on CIFAR-10 and ImageNet using Inception score, FID, non-cherry-picked samples, and inpainting results. We consistently observe that AIQN yields a highly stable algorithm that improves perceptual quality while maintaining a highly diverse distribution.",2018,ICML,1806.05575,,https://arxiv.org/pdf/1806.05575.pdf
799322fc525964889afbfde27db1fcef9feb3d61,0,,,0,1,0,0,0,0,0,0,0,0,0,Training Deep Neural Network in Limited Precision,"Energy- and resource-efficient training of DNNs will greatly extend the applications of deep learning. However, there are three major obstacles which mandate accurate calculation in high precision. In this paper, we tackle two of them, related to the loss of gradients during parameter update and to backpropagation through a softmax nonlinearity layer in low precision training. We implemented SGD with Kahan summation by employing an additional parameter to virtually extend the bit-width of the parameters for a reliable parameter update. We also proposed a simple guideline to help select the appropriate bit-width for the last FC layer followed by a softmax nonlinearity layer. It determines the lower bound of the required bit-width based on the class size of the dataset. Extensive experiments on various network architectures and benchmarks verify the effectiveness of the proposed technique for low precision training.",2018,ArXiv,1810.05486,,https://arxiv.org/pdf/1810.05486.pdf
7a5adfc8b3d1e5ae246b7d37c232ca4f7ad734f5,0,,,0,0,0,0,0,0,1,0,0,0,0,An Efficient Convolutional Neural Network Approach for Facial Recognition,"Data security, the main concern nowadays, has faced many threats in the form of information breaches that require immediate attention. Biometrics, now a part of deep learning, have long served this purpose. In the recent past, face recognition has become a very important tool for safety and security purposes. This paper presents the application of a face recognition technique making use of a Convolutional Neural Network (CNN) with Python, and a comparison is drawn with other techniques such as Principal Component Analysis (PCA), Local Binary Pattern (LBP) and K Nearest Neighbour (KNN). Unlike conventional methods, the proposed scheme uses four Convolutional layers with ReLu layers, four pooling layers, a fully connected layer and a Softmax Loss Layer to normalize the probability distribution. The dataset consists of 1500 images with different facial expressions, and the model is trained and tested in order to measure the accuracy of the CNN method. Experimental results show that the proposed Neural Network scored an accuracy of 96.96%.",2020,"2020 10th International Conference on Cloud Computing, Data Science & Engineering (Confluence)",,10.1109/Confluence47617.2020.9058109,
7b2e0ada9b4299380421944f809a98783cf065a1,1,[D23],,1,0,0,0,0,0,1,1,0,0,0,An Out-of-Sample Extension to Manifold Learning via Meta-Modeling,"Unsupervised manifold learning has become accepted as an important tool for reducing dimensionality of a dataset by finding its meaningful low-dimensional representation lying on an unknown nonlinear subspace. Most manifold learning methods only embed an existing dataset but do not provide an explicit mapping function for novel out-of-sample data, thereby potentially resulting in an ineffective tool for classification purposes, particularly for iterative methods, such as active learning. To address this issue, out-of-sample extension methods have been introduced to generalize an existing embedding of new samples. In this paper, a novel out-of-sample method is introduced by utilizing high dimensional model representation (HDMR) as a nonlinear multivariate regression with the Tikhonov regularizer for unsupervised manifold learning algorithms. The proposed method was extensively analyzed using illustrative datasets sampled from known manifolds. Several experiments with 3D synthetic datasets and face recognition datasets were also conducted, and the performance of the proposed method was compared to several well-known out-of-sample methods. The results obtained with locally linear embedding (LLE), Laplacian Eigenmaps (LE), and t-distributed stochastic neighbor embedding (t-SNE) showed that the proposed method achieves competitive even better performance than the other out-of-sample methods.",2019,IEEE Transactions on Image Processing,,10.1109/TIP.2019.2915162,
7b8dce13824bd1ba29b5b2ea772ee676e639c817,0,,,0,1,0,0,0,0,0,0,0,0,0,Coupled Learning for Image Generation and Latent Representation Inference Using MMD,"For modeling the data distribution or the latent representation distribution in the image domain, deep learning methods such as the variational autoencoder (VAE) and the generative adversarial network (GAN) have been proposed. However, despite its capability of modeling these two distributions, VAE tends to learn less meaningful latent representations; GAN can only model the data distribution using the challenging and unstable adversarial training. To address these issues, we propose an unsupervised learning framework to perform coupled learning of these two distributions based on kernel maximum mean discrepancy (MMD). Specifically, the proposed framework consists of (1) an inference network and a generation network for mapping between the data space and the latent space, and (2) a latent tester and a data tester for performing two-sample tests in these two spaces, respectively. On one hand, we perform a two-sample test between stochastic representations from the prior distribution and inferred representations from the inference network. On the other hand, we perform a two-sample test between the real data and generated data. In addition, we impose structural regularization that the two networks are inverses of each other, so that the learning of these two distributions can be coupled. Experimental results on benchmark image datasets demonstrate that the proposed framework is competitive on image generation and latent representation inference of images compared with representative approaches.",2018,PCM,,10.1007/978-3-030-00767-6_40,
7bf089c461a31b1737b2207448b21a2fbca998f9,0,,,0,1,0,0,0,0,0,0,0,0,0,Deep Model Transferability from Attribution Maps,"Exploring the transferability between heterogeneous tasks sheds light on their intrinsic interconnections, and consequently enables knowledge transfer from one task to another so as to reduce the training effort of the latter. In this paper, we propose an embarrassingly simple yet very efficacious approach to estimating the transferability of deep networks, especially those handling vision tasks. Unlike the seminal work of \emph{taskonomy} that relies on a large number of annotations as supervision and is thus computationally cumbersome, the proposed approach requires no human annotations and imposes no constraints on the architectures of the networks. This is achieved, specifically, via projecting deep networks into a \emph{model space}, wherein each network is treated as a point and the distances between two points are measured by deviations of their produced attribution maps. The proposed approach is several-magnitude times faster than taskonomy, and meanwhile preserves a task-wise topological structure highly similar to the one obtained by taskonomy. Code is available at \url{https://github.com/zju-vipa/TransferbilityFromAttributionMaps}.",2019,NeurIPS,1909.11902,,https://arxiv.org/pdf/1909.11902.pdf
7cfc822cab9e1c893cf6f6d6cb7ff2741d30a467,1,"[D18], [D22]",,1,0,0,0,0,1,1,0,0,0,0,Pose-robust face signature for multi-view face recognition,"Despite the great progress achieved in unconstrained face recognition, pose variations still remain a challenging and unsolved practical issue. We propose a novel framework for multi-view face recognition based on extracting and matching pose-robust face signatures from 2D images. Specifically, we propose an efficient method for monocular 3D face reconstruction, which is used to lift the 2D facial appearance to a canonical texture space and estimate the self-occlusion. On the lifted facial texture we then extract various local features, which are further enhanced by the occlusion encodings computed on the self-occlusion mask, resulting in a pose-robust face signature, a novel feature representation of the original 2D facial image. Extensive experiments on two public datasets demonstrate that our method not only simplifies the matching of multi-view 2D facial images by circumventing the requirement for pose-adaptive classifiers, but also achieves superior performance.",2015,"2015 IEEE 7th International Conference on Biometrics Theory, Applications and Systems (BTAS)",,10.1109/BTAS.2015.7358788,http://www.cbl.uh.edu/pub_files/PF_PRFS_submission_btas2015_v11.pdf
7d4fab16ace47e1ea041abaac4b7845987bc37e1,0,,,0,0,0,0,0,0,0,0,0,0,1,HeadNet: Pedestrian Head Detection Utilizing Body in Context,"Pedestrian heads with arbitrary poses and sizes are prohibitively difficult to detect in many real-world applications. An appealing alternative is to utilize object detection technologies, which are becoming increasingly mature and fast. However, general object detection technologies can hardly work in complicated scenarios where many heads are often too small to detect. In this paper, we present a novel approach that learns a semantic connection between the pedestrian head and other body parts for head detection. Specifically, the proposed model, named HeadNet, is based on a PVANet backbone and also introduces beneficial strategies including online hard example mining (OHEM), fine-grained feature maps, RoI Align and Body in Context (BiC). Experiments demonstrate that our approach is able to utilize spatial semantics of the entire body effectively, and achieves inspiring performance for pedestrian head detection.",2018,2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018),,10.1109/FG.2018.00089,http://vipl.ict.ac.cn/uploadfile/upload/2018070916430980.pdf
7dbaf162c668c03a492c80abf3344fa616b3325b,0,,,0,1,0,0,0,0,0,0,0,0,1,Slim-CNN: A Light-Weight CNN for Face Attribute Prediction,"We introduce a computationally efficient CNN micro-architecture, the Slim Module, to design a lightweight deep neural network, Slim-Net, for face attribute prediction. Slim Modules are constructed by assembling depthwise separable convolutions with pointwise convolution to produce a computationally efficient module. The problem of facial attribute prediction is challenging because of the large variations in pose, background and illumination, and because of dataset imbalance. We stack these Slim Modules to devise a compact CNN which still maintains very high accuracy. Additionally, the neural network has a very low memory footprint which makes it suitable for mobile and embedded applications. Experiments on the CelebA dataset show that Slim-Net achieves an accuracy of 91.24% with at least 25 times fewer parameters than comparably performing methods, which reduces the memory storage requirement of Slim-Net by at least 87%.",2019,ArXiv,1907.02157,,https://arxiv.org/pdf/1907.02157.pdf
8035e8796ed5bdd44477c523cd6b03f9adfa2d8e,0,,,1,0,0,0,0,0,0,0,0,0,0,Multimodal Feature Level Fusion based on Particle Swarm Optimization with Deep Transfer Learning,"Several biometric systems rely on a single biometric modality, most of them focusing on face, iris or fingerprint. Despite the good accuracies obtained with single modalities, these systems are more susceptible to attacks, i.e., spoofing attacks, and to noise of all kinds, especially in non-cooperative (in-the-wild) environments. Since non-cooperative environments are becoming more and more common, new approaches involving multimodal biometrics have received more attention. One challenge in multimodal biometric systems is how to integrate the data from different modalities. First, we propose a deep transfer learning representation, fine-tuned from a model trained for face recognition, which achieves an outstanding representation for the iris modality alone. Our feature-level fusion then performs feature selection by means of Particle Swarm Optimization (PSO). The fusion pool contains the proposed fine-tuned iris representation and a periocular representation from our previous work. We compare this feature-level fusion approach against three basic function rules for matching at score level: sum, multi, and min. Results are reported for the iris and periocular region (NICE.II competition database) and also in an open-world scenario. The experiments on the NICE.II competition database showed that our transfer learning representation for the iris modality achieved a new state of the art, i.e., a decidability of 2.22 and an EER of 14.56%. We also yielded a new state-of-the-art result when the feature-level fusion by PSO is performed on the periocular and iris modalities, i.e., a decidability of 3.45 and an EER of 5.55%.",2018,2018 IEEE Congress on Evolutionary Computation (CEC),,10.1109/CEC.2018.8477817,
804f5149ed3b3c2de7291d72740a1c8f601ca1ea,0,,,0,1,0,0,0,0,0,0,0,0,0,Disentangling Latent Factors of Variational Auto-encoder with Whitening,"After deep generative models were successfully applied to image generation tasks, learning disentangled latent variables of data has become a crucial part of deep generative model research. Many models have been proposed to learn an interpretable and factorized representation of the latent variables by modifying their objective function or model architecture. To disentangle the latent variables, some models sacrifice the quality of the reconstructed images, while others increase the model complexity, which makes them hard to train. In this paper, we propose a simple disentangling method based on a traditional whitening process. The proposed method is applied to the latent variables of the variational auto-encoder (VAE), although it can be applied to any generative model with latent variables. In experiments, we apply the proposed method to simple VAE models, and the experimental results confirm that our method finds more interpretable factors in the latent space while keeping the reconstruction error the same as the conventional VAE's error.",2019,ICANN,,10.1007/978-3-030-30508-6_47,
80fd2ab057b406eb0617ef07aae6a07545cc3b8a,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,Library Automation System: Book cover recognition using deep learning,"With thousands of books across hundreds of disciplines present in a library, manual allocation of books is time consuming, labour intensive, and costly. While manual labour provides employment to a section of society, it tends to slow down the lending process and increases the probability of errors. The proposed research work attempts to automate the library book management system, thereby reducing latency and long queues as well as the potential for mistakes in the distribution of books. Recent advancements in deep neural networks have improved textual recognition in natural scenes. The proposed system employs a neural network model for text detection and recognition, with the potential to reduce and even eliminate manual labour performed by the staff. This paper investigates neural networks for the application of library management by proposing our own Optical Character Recognition (OCR) and text matching algorithms. The task introduces further challenges such as extracting the exact name of the book and dealing with distorted images and varied backgrounds. The system performs very well on book covers, producing over 60% accuracy against current models that provide about 40%, thereby achieving state-of-the-art performance on book cover information retrieval.",2019,2019 4th International Conference on Computational Systems and Information Technology for Sustainable Solution (CSITSS),,10.1109/CSITSS47250.2019.9031052,
8224b69ba539b09487b85a5af273b396d5628f97,0,,,0,1,0,0,0,0,0,0,0,0,0,DEEP LEARNING ARCHITECTURES FOR COMPUTER VISION A Degree Thesis Submitted to the Faculty of the Escola Tècnica d'Enginyeria de Telecomunicació de Barcelona,"[No abstract recovered; the extracted text is thesis front matter only: Abstract; Resum; Resumen; Acknowledgements; Revision history and approval record; Table of contents; List of Figures; List of Tables.]",2016,,,,https://pdfs.semanticscholar.org/8224/b69ba539b09487b85a5af273b396d5628f97.pdf
822502df38f9b242a939e0ca19dd4e197690ef2f,0,,,0,1,0,0,0,0,0,0,0,0,0,Deception Detection by 2D-to-3D Face Reconstruction from Videos,"Lies and deception are common phenomena in society, both in our private and professional lives. However, humans are notoriously bad at accurate deception detection. Based on the literature, human accuracy of distinguishing between lies and truthful statements is 54% on average, in other words it is slightly better than a random guess. While people do not much care about this issue, in high-stakes situations such as interrogations for serious crimes and for evaluating the testimonies in court cases, accurate deception detection methods are highly desirable. To achieve a reliable, covert, and non-invasive deception detection, we propose a novel method that jointly extracts reliable low- and high-level facial features, namely, 3D facial geometry, skin reflectance, expression, head pose, and scene illumination in a video sequence. Then these features are modeled using a Recurrent Neural Network to learn temporal characteristics of deceptive and honest behavior. We evaluate the proposed method on the Real-Life Trial (RLT) dataset that contains high-stake deceptive and honest videos recorded in courtrooms. Our results show that the proposed method (with an accuracy of 72.8%) improves the state of the art as well as outperforming the use of manually coded facial attributes (67.6%) in deception detection.",2018,ArXiv,1812.10558,,https://arxiv.org/pdf/1812.10558.pdf
829f73b0ba27776c44a5914fad441615174ebd99,0,,,0,1,0,0,0,0,0,0,0,0,0,Sparse Generative Adversarial Network,"We propose a new approach to Generative Adversarial Networks (GANs) to achieve improved performance with additional robustness to their well-recognized mode collapse. We first proceed by mapping the desired data onto a frame-based space for a sparse representation to lift any limitation of small support features prior to learning the structure. To that end, we start by dividing an image into multiple patches and modifying the role of the generative network from producing an entire image, at once, to creating a sparse representation vector for each image patch. We synthesize an entire image by multiplying generated sparse representations by a pre-trained dictionary and assembling the resulting patches. This approach restricts the output of the generator to a particular structure, obtained by imposing a Union of Subspaces (UoS) model on the original training data, leading to more realistic images, while maintaining a desired diversity. To further regularize GANs in generating high-quality images and to avoid the notorious mode-collapse problem, we introduce a third player in GANs, called the reconstructor. This player utilizes an auto-encoding scheme to ensure that, first, the input-output relation in the generator is injective and, second, each real image corresponds to some input noise. We present a number of experiments in which the proposed algorithm shows a remarkably higher inception score compared to the equivalent conventional GANs.",2019,2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW),1908.08930,10.1109/ICCVW.2019.00369,https://arxiv.org/pdf/1908.08930.pdf
833ea9178db44444672d6d92ccaece50589250ac,0,,,1,0,0,0,0,0,0,0,0,0,0,Methodologies in Face Recognition for Surveillance,"Face recognition, the most significant part of biometrics and video surveillance systems, has to deal with many challenges such as pose, illumination, distance and expression variations, along with occlusion, low resolution and noise. This paper analyses the latest approaches and methodologies adopted by face recognition systems to overcome these challenges, with a focus on their performance on publicly available benchmark face databases. Among the approaches discussed in this paper are the Local Binary Pattern technique, Reference Face Graph framework, Principal Component Analysis algorithm, Linear Discriminant Analysis algorithm, MLP Neural Networks, 3D Modeling, Back Propagation Neural Network, Local Gradient Hexa Pattern descriptor, FPGA based architecture, Conditional Generative Adversarial Networks and Trunk-Branch Ensemble Convolutional Neural Networks. This paper provides insight into the selection of an approach for future researchers and practitioners targeting real-world implementation.",2018,2018 3rd International Conference on Computational Systems and Information Technology for Sustainable Solutions (CSITSS),,10.1109/CSITSS.2018.8768747,
83aae4a4f2d843720682f965fbbbe4919debe015,0,,,0,1,0,0,0,0,0,0,0,0,1,EGroupNet: A Feature-enhanced Network for Age Estimation with Novel Age Group Schemes,"Although age estimation is easily affected by smiling, race, gender, and other age-related attributes, most researchers have paid little attention to the correlations among these attributes. Moreover, many researchers perform age estimation over a wide age range; however, conducting age prediction over a narrow age range may achieve better results. This article proposes a hierarchic approach referred to as EGroupNet for age prediction. The method includes two main stages, i.e., feature enhancement via excavating the correlations among age-related attributes, and age estimation based on different age group schemes. First, we apply the multi-task learning model to learn multiple face attributes simultaneously to obtain discriminative features of the different attributes. Second, we project the outputs of the fully connected layers of several subnetworks into a highly correlated matrix space via the correlation learning process. Third, we classify these enhanced features into narrow age groups using two Extreme Learning Machine models. Finally, we make predictions based on the merged age-group results. We conduct a large number of experiments on MORPH-II, the LAP-2016 dataset, and the Adience benchmark. The mean absolute errors of the two different settings on MORPH-II are 2.48 and 2.13 years, respectively; the normal score (ε) on the LAP-2016 dataset is 0.3578; and the accuracy of age prediction on the Adience benchmark is 0.6978.",2020,ACM Trans. Multim. Comput. Commun. Appl.,,10.1145/3379449,
84b94515f07f46b3a48448974a4e2d55ec557e37,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,Euclidean-Distance Based Fuzzy Commitment Scheme for Biometric Template Security,"With the introduction of triplet loss and other Euclidean-distance based loss functions, significant performance enhancement has been obtained for deep-learning based face recognition systems. However, existing template security solutions are based on binary feature vectors or on the binarization of real-valued feature vectors using a shielding function. This paper proposes a key-binding cryptographic template security scheme that uses a lattice structure and sphere packing in the Euclidean space. In contrast to existing schemes, the proposed scheme can be applied to real-valued feature vectors. Therefore, it is more compatible with recent face recognition methods based on Euclidean distance. In this paper, two different versions of our proposed scheme are discussed in terms of security and complexity. Experimental investigations on the Labeled Faces in the Wild dataset suggest no degradation in the performance of the face recognition system after being secured by our proposed scheme.",2019,2019 7th International Workshop on Biometrics and Forensics (IWBF),,10.1109/IWBF.2019.8739177,
84eec311063320352b771b71156292148f25b0f3,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,Collaborative multi-view metric learning for visual classification,"Most distance metric learning algorithms learn a single distance metric over single-view data and cannot directly exploit multi-view data. In many visual classification applications, we have access to multi-view feature representations. To exploit more discriminative information for classification, it is desirable to learn several distance metrics from multi-view data. To this end, we propose a collaborative multi-view metric learning (CMML) method for visual classification. The proposed method jointly learns multiple distance metrics under which multiple feature representations are consistent across different views, i.e., the difference between the distance metrics learned in different views is enforced to be as small as possible. Experimental results on two visual classification tasks, face recognition and scene classification, show the efficacy of the CMML method.",2016,2016 IEEE International Conference on Multimedia and Expo (ICME),,10.1109/ICME.2016.7552996,
854aaeb2ad96d369a8d955b91dd7320056a71efb,0,,,1,1,0,0,0,1,0,0,0,0,0,M2FPA: A Multi-Yaw Multi-Pitch High-Quality Dataset and Benchmark for Facial Pose Analysis,"Facial images in surveillance or mobile scenarios often have large view-point variations in terms of pitch and yaw angles. These jointly occurred angle variations make face recognition challenging. Current public face databases mainly consider the case of yaw variations. In this paper, a new large-scale Multi-yaw Multi-pitch high-quality database is proposed for Facial Pose Analysis (M2FPA), including face frontalization, face rotation, facial pose estimation and pose-invariant face recognition. It contains 397,544 images of 229 subjects with yaw, pitch, attribute, illumination and accessory. M2FPA is the most comprehensive multi-view face database for facial pose analysis. Further, we provide an effective benchmark for face frontalization and pose-invariant face recognition on M2FPA with several state-of-the-art methods, including DR-GAN, TP-GAN and CAPG-GAN. We believe that the new database and benchmark can significantly push forward the advance of facial pose analysis in real-world applications. Moreover, a simple yet effective parsing guided discriminator is introduced to capture the local consistency during GAN optimization. Extensive quantitative and qualitative results on M2FPA and Multi-PIE demonstrate the superiority of our face frontalization method. Baseline results for both face synthesis and face recognition from state-of-the-art methods demonstrate the challenge offered by this new database.",2019,2019 IEEE/CVF International Conference on Computer Vision (ICCV),1904.00168,10.1109/ICCV.2019.01014,https://arxiv.org/pdf/1904.00168.pdf
85b047b3534f55b81ccf37d11abc59f7ba597845,0,,,0,1,0,0,0,0,0,0,0,0,0,Multi-view Generative Adversarial Networks,"Learning over multi-view data is a challenging problem with strong practical applications. Most related studies focus on the classification point of view and assume that all the views are available at any time. We consider an extension of this framework in two directions. First, based on the BiGAN model, the Multi-view BiGAN (MV-BiGAN) is able to perform density estimation from multi-view inputs. Second, it can deal with missing views and is able to update its prediction when additional views are provided. We illustrate these properties on a set of experiments over different datasets.",2017,ECML/PKDD,1611.02019,10.1007/978-3-319-71246-8_11,https://arxiv.org/pdf/1611.02019.pdf
86341567040276487163e3b65f5dabac17009230,1,[D20],,1,0,1,0,0,0,0,0,0,0,0,Graphical Representation for Heterogeneous Face Recognition,"Heterogeneous face recognition (HFR) refers to matching face images acquired from different sources (i.e., different sensors or different wavelengths) for identification. HFR plays an important role in both biometrics research and industry. In spite of the promising progress achieved in recent years, HFR is still a challenging problem due to the difficulty of representing two heterogeneous images in a homogeneous manner. Existing HFR methods either represent an image ignoring the spatial information, or rely on a transformation procedure which complicates the recognition task. Considering these problems, we propose a novel graphical representation based HFR method (G-HFR) in this paper. Markov networks are employed to represent heterogeneous image patches separately, which takes the spatial compatibility between neighboring image patches into consideration. A coupled representation similarity metric (CRSM) is designed to measure the similarity between the obtained graphical representations. Extensive experiments conducted on multiple HFR scenarios (viewed sketch, forensic sketch, near infrared image, and thermal infrared image) show that the proposed method outperforms state-of-the-art methods.",2017,IEEE Transactions on Pattern Analysis and Machine Intelligence,1503.00488,10.1109/TPAMI.2016.2542816,https://arxiv.org/pdf/1503.00488.pdf
878b53c6fb66b343e901200e8a88860a2a93d5f2,1,"[D18], [D28], [D27]",,1,0,0,0,0,0,0,0,0,0,0,OPML: A one-pass closed-form solution for online metric learning,"To achieve a low computational cost when performing online metric learning for large-scale data, we present a one-pass closed-form solution, namely OPML, in this paper. The proposed OPML first adopts a one-pass triplet construction strategy, which aims to use only a very small number of triplets to approximate the representation ability of the whole set of original triplets obtained by batch-manner methods. Then, OPML employs a closed-form solution to update the metric for new coming samples, which leads to low space (i.e., O(d)) and time (i.e., O(d^2)) complexity, where d is the feature dimensionality. In addition, an extension of OPML (namely COPML) is further proposed to enhance the robustness in the real case where the first several samples come from the same class (i.e., the cold start problem). In the experiments, we have systematically evaluated our methods (OPML and COPML) on three typical tasks, including UCI data classification, face verification, and abnormal event detection in videos, which aims to fully evaluate the proposed methods on different sample numbers, different feature dimensionalities and different feature extraction ways (i.e., hand-crafted and deeply-learned). The results show that OPML and COPML can obtain promising performance with a very low computational cost. Also, the effectiveness of COPML under the cold start setting is experimentally verified.",2018,Pattern Recognit.,1609.09178,10.1016/j.patcog.2017.03.016,https://arxiv.org/pdf/1609.09178.pdf
87e20228a43e395ff9c5f070c313c6dac475c183,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,An Illumination Augmentation Approach for Robust Face Recognition,"Deep learning has achieved great success in face recognition and significantly improved the performance of the existing face recognition systems. However, the performance of deep network-based methods degrades dramatically when the training data is insufficient to cover the intra-class variations, e.g., illumination. To solve this problem, we propose an illumination augmentation approach to augment the training set by constructing new training images with additional illumination components. The proposed approach first utilizes an external benchmark to generate several illumination templates. Then we combine the generated templates with the training images to simulate different illumination conditions. Finally, we conduct color correction by using the singular value decomposition (SVD) algorithm to confirm that the color of the augmented image is consistent with the input image. Experimental results demonstrate that the proposed illumination augmentation approach is effective for improving the performance of the existing deep networks.",2018,CCBR,,10.1007/978-3-319-97909-0_44,
8878428c0edb28fadf45cd2d97d1718c3a0ebbce,1,[D18],,1,1,0,0,0,0,0,0,0,0,0,Towards Universal Representation Learning for Deep Face Recognition,"Recognizing wild faces is extremely hard as they appear with all kinds of variations. Traditional methods either train with specifically annotated variation data from target domains, or introduce unlabeled target variation data to adapt from the training data. Instead, we propose a universal representation learning framework that can deal with larger variations unseen in the given training data without leveraging target domain knowledge. We first synthesize training data alongside some semantically meaningful variations, such as low resolution, occlusion and head pose. However, directly feeding the augmented data for training will not converge well as the newly introduced samples are mostly hard examples. We propose to split the feature embedding into multiple sub-embeddings, and associate different confidence values with each sub-embedding to smooth the training procedure. The sub-embeddings are further decorrelated by regularizing variation classification loss and variation adversarial loss on different partitions of them. Experiments show that our method achieves top performance on general face recognition datasets such as LFW and MegaFace, while performing significantly better on extreme benchmarks such as TinyFace and IJB-S.",2020,2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR),2002.11841,10.1109/cvpr42600.2020.00685,https://arxiv.org/pdf/2002.11841.pdf
89b3b7a5e6a3926c22d23879d1e68bd144453720,1,[D18],,1,0,0,0,1,0,0,0,0,0,0,Multi-Task Pose-Invariant Face Recognition,"Face images captured in unconstrained environments usually contain significant pose variation, which dramatically degrades the performance of algorithms designed to recognize frontal faces. This paper proposes a novel face identification framework capable of handling the full range of pose variations within ±90° of yaw. The proposed framework first transforms the original pose-invariant face recognition problem into a partial frontal face recognition problem. A robust patch-based face representation scheme is then developed to represent the synthesized partial frontal faces. For each patch, a transformation dictionary is learnt under the proposed multi-task learning scheme. The transformation dictionary transforms the features of different poses into a discriminative subspace. Finally, face matching is performed at patch level rather than at the holistic level. Extensive and systematic experimentation on FERET, CMU-PIE, and Multi-PIE databases shows that the proposed method consistently outperforms single-task-based baselines as well as state-of-the-art methods for the pose problem. We further extend the proposed algorithm for the unconstrained face verification problem and achieve top-level performance on the challenging LFW data set.",2015,IEEE Transactions on Image Processing,,10.1109/TIP.2015.2390959,
8a0214298c31145b932227d572daf828fb75d6b9,0,,,0,1,0,0,0,0,0,0,0,0,0,Rate-distortion optimization guided autoencoder for isometric embedding in Euclidean latent space,"To analyze high-dimensional and complex data in the real world, the generative model approach of machine learning aims to reduce the dimension and acquire a probabilistic model of the data. For this purpose, deep-autoencoder based generative models such as the variational autoencoder (VAE) have been proposed. However, in previous works, the scale of metrics between the real and the reduced-dimensional space (latent space) is not well-controlled. Therefore, the quantitative impact of the latent variable on real data is unclear. In the end, the probability distribution function (PDF) in the real space cannot be estimated accurately from that of the latent space. To overcome this problem, we propose a Rate-Distortion Optimization guided autoencoder. We show that our method has the following properties, theoretically and experimentally: (i) the columns of the Jacobian matrix between the two spaces form a constantly-scaled orthonormal system, so data can be embedded in a Euclidean space isometrically; (ii) the PDF of the latent space is proportional to that of the real space. Furthermore, to verify its usefulness in practical applications, we evaluate its performance in unsupervised anomaly detection, where it outperforms current state-of-the-art methods.",2020,ICML,1910.04329,,https://arxiv.org/pdf/1910.04329.pdf
8bfd9725d13152cfb17302b2a251c2a0a0bbe3e1,1,[D34],,1,0,0,0,0,0,0,0,0,0,0,Scaling Up Class-Specific Kernel Discriminant Analysis for Large-Scale Face Verification,"In this paper, a novel approximate solution of the criterion used in non-linear class-specific discriminant subspace learning is proposed. We build on the class-specific kernel spectral regression method, which is a two-step process formed by an eigenanalysis step and a kernel regression step. Based on the structure of the intra-class and out-of-class scatter matrices, we provide a fast solution for the first step. For the second step, we propose the use of approximate kernel space definitions. We analytically show that the adoption of randomized and class-specific kernels has the effect of regularization and Nyström-based approximation, respectively. We evaluate the proposed approach in face verification problems and compare it with the existing approaches. Experimental results show the effectiveness and efficiency of the proposed approximate class-specific kernel spectral regression method, since it can provide satisfactory performance and scale well with the size of the data.",2016,IEEE Transactions on Information Forensics and Security,,10.1109/TIFS.2016.2582562,
8d00cbf957147940c3082e0f53a5f67a2f9e4485,0,,,0,1,0,0,0,0,0,0,0,0,0,Nonlinear Monte Carlo Method for Imbalanced Data Learning,"For basic machine learning problems, expected error is used to evaluate model performance. Since the distribution of data is usually unknown, we can make the simple hypothesis that the data are sampled independently and identically distributed (i.i.d.), and the mean value of the loss function is then used as the empirical risk by the Law of Large Numbers (LLN). This is known as the Monte Carlo method. However, when the LLN is not applicable, as in imbalanced data problems, the empirical risk will cause overfitting and might decrease robustness and generalization ability. Inspired by the framework of nonlinear expectation theory, we substitute the mean value of the loss function with the maximum of the subgroup mean losses. We call this the nonlinear Monte Carlo method. In order to use numerical methods of optimization, we linearize and smooth the functional of the maximum empirical risk and obtain the descent direction via quadratic programming. With the proposed method, we achieve better performance than SOTA backbone models with fewer training steps, and greater robustness for basic regression and imbalanced classification tasks.",2020,ArXiv,2010.14060,,https://arxiv.org/pdf/2010.14060.pdf
8f99f7ccb85af6d4b9e015a9b215c529126e7844,1,,1,1,0,0,0,0,0,0,0,0,0,0,Face image-based age and gender estimation with consideration of ethnic difference,"This study presents an age and gender estimation system that considers ethnic difference in face images using a Convolutional Neural Network (CNN) and Support Vector Machine (SVM). Most age and gender estimation systems using face images are trained on ethnicity-biased databases. Therefore, these systems show limited performance on face images of ethnic groups occupying a small proportion of the training data. To resolve this problem, we propose an age and gender estimation system that considers the ethnic difference in face images. At the first stage of the system, the ethnicity of the facial image is determined by a CNN trained with manually collected face images of Asian and non-Asian celebrities. Then, one of the SVM classifiers is selected according to the ethnicity for the final age and gender estimation. We compared the proposed system with an estimation system that does not consider ethnic difference. The result shows improved performance for age estimation but no improvement for gender recognition.",2017,2017 26th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN),,10.1109/ROMAN.2017.8172359,
8ff988530e3329bd6ab00dc5eef635a1bc5812ca,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,SER-FIQ: Unsupervised Estimation of Face Image Quality Based on Stochastic Embedding Robustness,"Face image quality is an important factor to enable high-performance face recognition systems. Face quality assessment aims at estimating the suitability of a face image for the purpose of recognition. Previous work proposed supervised solutions that require artificially or human labelled quality values. However, both labelling mechanisms are error prone as they do not rely on a clear definition of quality and may not know the best characteristics for the utilized face recognition system. Avoiding the use of inaccurate quality labels, we proposed a novel concept to measure face quality based on an arbitrary face recognition model. By determining the embedding variations generated from random subnetworks of a face model, the robustness of a sample representation and thus, its quality is estimated. The experiments are conducted in a cross-database evaluation setting on three publicly available databases. We compare our proposed solution on two face embeddings against six state-of-the-art approaches from academia and industry. The results show that our unsupervised solution outperforms all other approaches in the majority of the investigated scenarios. In contrast to previous works, the proposed solution shows a stable performance over all scenarios. Utilizing the deployed face recognition model for our face quality assessment methodology avoids the training phase completely and further outperforms all baseline approaches by a large margin. Our solution can be easily integrated into current face recognition systems, and can be modified to other tasks beyond face recognition.",2020,2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR),2003.09373,10.1109/cvpr42600.2020.00569,https://arxiv.org/pdf/2003.09373.pdf
90110f8016439c22e8a1ec939522ab056968a1fb,0,,,0,0,0,0,0,0,0,1,0,0,0,Robust Foreground Object Segmentation via Adaptive Region-Based Background Modelling,"We propose a region-based foreground object segmentation method capable of dealing with image sequences containing noise, illumination variations and dynamic backgrounds (as often present in outdoor environments). The method utilises contextual spatial information through analysing each frame on an overlapping block-by-block basis and obtaining a low-dimensional texture descriptor for each block. Each descriptor is passed through an adaptive multi-stage classifier, comprised of a likelihood evaluation, an illumination invariant measure, and a temporal correlation check. The overlapping of blocks not only ensures smooth contours of the foreground objects but also effectively minimises the number of false positives in the generated foreground masks. The parameter settings are robust against a wide variety of sequences and post-processing of foreground masks is not required. Experiments on the challenging I2R dataset show that the proposed method obtains considerably better results (both qualitatively and quantitatively) than methods based on Gaussian mixture models (GMMs), feature histograms, and normalised vector distances. On average, the proposed method achieves 36% more accurate foreground masks than the GMM based method.",2010,2010 20th International Conference on Pattern Recognition,,10.1109/ICPR.2010.958,https://espace.library.uq.edu.au/view/UQ:222674/MIC12UQ222674.pdf
91e6e4e13750fb8a30a8a8a3c03afd43eefd8751,1,[D31],,1,0,0,0,0,0,0,0,0,0,0,Single- and cross- database benchmarks for gender classification under unconstrained settings,"Gender classification is one of the most important tasks in automated face analysis, and has attracted the interest of researchers for years. Up to now, most gender classification approaches have been tested using single-database experiments, and on quite controlled datasets such as the FERET database, which are not representative of real world settings. However, a recent trend towards more realistic benchmarks has emerged within the face analysis community, leading to the appearance of databases and protocols such as the Labeled Faces in the Wild (LFW) database, and the so-called Gallagher's database, which comprises images collected from Flickr.",2011,2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops),,10.1109/ICCVW.2011.6130514,https://www.gradiant.org/images/stories/publicaciones_tecnicas/2011_09_02_befit_genderrecognition.pdf
9264b390aa00521f9bd01095ba0ba4b42bf84d7e,1,[D20],,1,0,0,0,1,0,0,0,0,0,0,Displacement Template with Divide-&-Conquer Algorithm for Significantly Improving Descriptor Based Face Recognition Approaches,"This paper proposes a displacement template structure for improving descriptor based face recognition approaches. With this template structure, a face is represented by a template consisting of a set of piled blocks; each block pile consists of a few heavily overlapped blocks from the face image. An ensemble of blocks, one from each pile, is taken as a candidate image of the face. When a descriptor based approach is used, we are able to generate a displacement description template for the face by replacing each block in the template with its local description, where a concatenation of the local descriptions of the blocks, one from each pile, is taken to be a candidate description of the face. Using the description template together with a divide-and-conquer algorithm for computing the similarities between description templates, we have demonstrated the significantly improved performance of LBP, TPLBP and FPLBP templates over original LBP, TPLBP and FPLBP approaches by the experiments on benchmark face databases.",2012,ECCV,,10.1007/978-3-642-33715-4_16,http://web.unbc.ca/~chenl/papers-new/ECCVPaper1208.pdf
92a9c1b72ff0a7bd1745171395ccb0e54db6fad2,0,,,1,0,0,0,0,0,0,0,0,0,0,Optimal face templates: the next step in surveillance face recognition,"The paper deals with surveillance face recognition in security applications such as surveillance camera systems or access control systems. The presented research focuses on enhancing recognition performance and reducing classification time and memory requirements. We aim to make it feasible to implement face recognition in end devices such as cameras, identification terminals or popular IoT devices. Therefore, we utilize algorithms that require low computational power and optimize them in order to reach higher recognition rates. We present a novel higher quantile method that enhances recognition performance via the creation of robust and representative face templates for the nearest neighbor classifier. Templates computed by the higher quantile method are determined by tolerance intervals which handle feature variability caused by face pose, expression, illumination and possible low image quality. The recognition performance evaluation has been conducted on images captured by a surveillance camera system and contained in the unique IFaViD dataset. IFaViD is the only dataset captured by a real surveillance camera system that contains complex scenarios. The results show that the higher quantile method outperforms contemporary approaches by 4% or 10%, depending on the IFaViD test subset.",2019,Pattern Analysis and Applications,,10.1007/s10044-019-00842-y,
92abd0dd407d07de5f0433ecf307a4ce4b6bfbf3,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,Improved Performance and Execution Time of Face Recognition Using MRSRC,"Face recognition accuracy is vulnerable to environmental noise, low-resolution images, and other variations such as illumination, pose, and expression. The accuracy of face recognition mostly relies on the features of the training and testing samples. Recently, sparse representation based classification (SRC) has shown state-of-the-art results in face recognition, and several extended versions of SRC have been developed to improve its performance. The time complexity of SRC depends on the size of the dictionary. In this paper, a new fusion approach, MRSRC (multi-resolution sparse representation based classification), is developed by incorporating wavelet-compressed features into the dictionary. MRSRC shows better performance than an existing algorithm and also reduces time complexity. The experimentation is carried out on benchmark databases such as LFW and ORL.",2018,SocProS,,10.1007/978-981-15-0035-0_49,
93670f48e53619eacf7cbcb9d483ad6eaf0422d4,0,,,0,1,0,0,0,0,1,0,0,0,0,"Computer Vision and Image Processing: 4th International Conference, CVIP 2019, Jaipur, India, September 27–29, 2019, Revised Selected Papers, Part II","Image processing techniques are widely used in the sciences and in computer vision to enhance images and extract useful information from them. A key step in image processing is the removal of different kinds of noise from images. Noise can arise in an image while it is stored, transmitted or acquired. A model qualifies as a satisfactory denoising model if it preserves the image while removing the noise. An image can contain various kinds of noise, such as Gaussian, salt-and-pepper and speckle noise, and a model that can remove several different kinds of noise is considered superior to others. In this paper, we design an autoencoder-based model that can remove several kinds of noise from images. We perform a comparative study of the accuracy for each noise type using PSNR, SSIM and RMSE values. The PSNR and SSIM values increase from the original-versus-noisy comparison to the original-versus-reconstructed comparison, while the RMSE value decreases.",2020,CVIP,,10.1007/978-981-15-4018-9,
940e5c45511b63f609568dce2ad61437c5e39683,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,Fiducial Facial Point Extraction Using a Novel Projective Invariant,"Automatic extraction of fiducial facial points is one of the key steps to face tracking, recognition, and animation. Great facial variations, especially pose or viewpoint changes, typically degrade the performance of classical methods. Recent learning or regression-based approaches highly rely on the availability of a training set that covers facial variations as wide as possible. In this paper, we introduce and extend a novel projective invariant, named the characteristic number (CN), which unifies the collinearity, cross ratio, and geometrical characteristics given by more (≥6) points. We derive strong shape priors from CN statistics on a moderate size (515) of frontal upright faces in order to characterize the intrinsic geometries shared by human faces. We combine these shape priors with simple appearance based constraints, e.g., texture, edge, and corner, into a quadratic optimization. Thereafter, the solution to facial point extraction can be found by the standard gradient descent. The inclusion of these shape priors renders the robustness to pose changes owing to their invariance to projective transformations. Extensive experiments on the Labeled Faces in the Wild, Labeled Face Parts in the Wild and Helen database, and cross-set faces with various changes demonstrate the effectiveness of the CN-based shape priors compared with the state of the art.",2015,IEEE Transactions on Image Processing,,10.1109/TIP.2015.2390976,
94798f0f0d676fa8053e5f700a31e801fc9ba53e,0,,,0,0,0,0,0,0,0,0,0,0,1,Phased Groupwise Face Alignment,"A face exhibits not only rigid variations but also non-rigid distortions, which affect the performance of groupwise face alignment. A novel method for groupwise face alignment that considers both the rigid variations of a face and its non-rigid distortions is presented in this paper. The process of groupwise face alignment is divided into two stages, i.e., affine transformations and non-rigid distortions. At the affine transformation stage, the key points of a face are categorized into five groups and an affine transformation is applied to each group of key points. At the non-rigid distortion stage, a novel method is applied to all of the key points of a face. The two stages are independent of each other, and iterations are performed within each stage. Moreover, the results of the affine transformation stage are used as the input of the non-rigid distortion stage. Experiments show that the presented method for groupwise face alignment is better than methods that only consider global affine variations, and also better than methods that consider global affine variations together with local non-rigid distortions. The method can serve as a novel approach to groupwise face alignment.",2020,IEEE Access,,10.1109/ACCESS.2020.2983722,https://ieeexplore.ieee.org/ielx7/6287639/8948470/09049392.pdf
953e180ab8faae331afa93dbef6b8cdab63f9a13,0,,,0,1,0,0,0,0,0,0,0,0,0,Occlusion-Aware GAN for Face De-Occlusion in the Wild,"Occluded faces, a common scene in real life, have a significant negative impact on most face recognition systems. Existing methods try to remove the occlusions by a single-stage generative adversarial network (GAN), which is unaware of the occlusion and thus has difficulties in generalizing to a large variety of occlusion types, e.g., different objects at various positions. To this end, we propose the two-stage Occlusion-Aware GAN (OA-GAN), where the first GAN is for disentangling the occlusions, which will serve as the additional input of the second GAN for synthesizing the final de-occluded faces. In this way, our two-stage model can handle diverse occlusions in the wild and is naturally more explainable because of its awareness of the occluded objects. Extensive experiments on both synthetic and real-world datasets validate the superiority of the two-stage OA-GAN design. Furthermore, by applying the generated de-occluded faces to facial expression recognition (FER) systems, we find that our two-stage de-occlusion process significantly increases the accuracy of FER under occlusion.",2020,2020 IEEE International Conference on Multimedia and Expo (ICME),,10.1109/ICME46284.2020.9102788,
95bf7e3960d88f2492fcf10298b0719b1c7da248,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,Head Pose Recommendation for Taking Good Selfies,"We present a head-pose recommendation system that guides a user in how to best pose while taking a selfie. Given an input face image, the system finds the most attractive angle of the face and suggests how the pose should be adjusted. The recommendation results are determined adaptively to the appearance and initial pose of the input face. The user study shows the recommendation performance of the system is moderately related to the degree of conformity among the photographers' recommendations.",2017,MUSA2 '17,,10.1145/3132515.3132518,
95f7e7a90cb9bdaba593454a4f6cb70ea244d9f4,0,,,0,0,0,0,0,1,0,0,0,0,0,Robust Facial Landmark Detection by Multi-order Multi-constraint Deep Networks,"Recently, heatmap regression has been widely explored in facial landmark detection and obtained remarkable performance. However, most of the existing heatmap regression-based facial landmark detection methods neglect to explore the high-order feature correlations, which is very important to learn more representative features and enhance shape constraints. Moreover, no explicit global shape constraints have been added to the final predicted landmarks, which leads to a reduction in accuracy. To address these issues, in this paper, we propose a Multi-order Multi-constraint Deep Network (MMDN) for more powerful feature correlations and shape constraints learning. Specifically, an Implicit Multi-order Correlating Geometry-aware (IMCG) model is proposed to introduce the multi-order spatial correlations and multi-order channel correlations for more discriminative representations. Furthermore, an Explicit Probability-based Boundary-adaptive Regression (EPBR) method is developed to enhance the global shape constraints and further search the semantically consistent landmarks in the predicted boundary for robust facial landmark detection. It's interesting to show that the proposed MMDN can generate more accurate boundary-adaptive landmark heatmaps and effectively enhance shape constraints to the predicted landmarks for faces with large pose variations and heavy occlusions. Experimental results on challenging benchmark datasets demonstrate the superiority of our MMDN over state-of-the-art facial landmark detection methods. The code has been publicly available at https://github.com/junwan2014/MMDN-master.",2020,,2012.04927,,https://arxiv.org/pdf/2012.04927.pdf
96390f95a73a6bd495728b6cd2a97554ef187f76,0,,,0,1,0,0,0,0,0,0,0,0,0,Olympus: Sensor Privacy through Utility Aware Obfuscation,"Personal data garnered from various sensors are often offloaded by applications to the cloud for analytics. This leads to a potential risk of disclosing private user information. We observe that the analytics run on the cloud are often limited to a machine learning model such as predicting a user’s activity using an activity classifier. We present Olympus, a privacy framework that limits the risk of disclosing private user information by obfuscating sensor data while minimally affecting the functionality the data are intended for. Olympus achieves privacy by designing a utility aware obfuscation mechanism, where privacy and utility requirements are modeled as adversarial networks. By rigorous and comprehensive evaluation on a real world app and on benchmark datasets, we show that Olympus successfully limits the disclosure of private information without significantly affecting functionality of the application.",2018,,,,https://pdfs.semanticscholar.org/9639/0f95a73a6bd495728b6cd2a97554ef187f76.pdf
969fd48e1a668ab5d3c6a80a3d2aeab77067c6ce,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,End-to-End Spatial Transform Face Detection and Recognition,"Plenty of face detection and recognition methods have been proposed and have achieved excellent results in recent decades. A common face recognition pipeline consists of: 1) face detection, 2) face alignment, 3) feature extraction, 4) similarity calculation, which are separated and independent from each other. These separated face analysis stages lead to redundant computation and make end-to-end training hard. In this paper, we propose a novel end-to-end trainable convolutional network framework for face detection and recognition, in which a geometric transformation matrix is directly learned to align the faces, instead of predicting the facial landmarks. In the training stage, our single CNN model is supervised only by face bounding boxes and personal identities, which are publicly available from the WIDER FACE [52] dataset and the CASIA-WebFace [53] dataset. Tested on the Face Detection Dataset and Benchmark (FDDB) [21] dataset and the Labeled Faces in the Wild (LFW) [19] dataset, we have achieved 89.24% recall for the face detection task and 98.63% verification accuracy for the face recognition task simultaneously, which are comparable to state-of-the-art results.",2020,Virtual Real. Intell. Hardw.,,10.1016/j.vrih.2020.04.002,https://arxiv.org/pdf/1703.10818.pdf
9740464651e09598df8b6805899e0f4df87b1ee7,1,"[D18], [D27]",,0,1,0,0,0,0,0,0,0,0,0,TIPRDC: Task-Independent Privacy-Respecting Data Crowdsourcing Framework for Deep Learning with Anonymized Intermediate Representations,"The success of deep learning partially benefits from the availability of various large-scale datasets. These datasets are often crowdsourced from individual users and contain private information like gender, age, etc. The emerging privacy concerns from users on data sharing hinder the generation or use of crowdsourcing datasets and lead to hunger of training data for new deep learning applications. One naive solution is to pre-process the raw data to extract features at the user-side, and then only the extracted features will be sent to the data collector. Unfortunately, attackers can still exploit these extracted features to train an adversary classifier to infer private attributes. Some prior arts leveraged game theory to protect private attributes. However, these defenses are designed for known primary learning tasks, the extracted features work poorly for unknown learning tasks. To tackle the case where the learning task may be unknown or changing, we present TIPRDC, a task-independent privacy-respecting data crowdsourcing framework with anonymized intermediate representation. The goal of this framework is to learn a feature extractor that can hide the privacy information from the intermediate representations; while maximally retaining the original information embedded in the raw data for the data collector to accomplish unknown learning tasks. We design a hybrid training method to learn the anonymized intermediate representation: (1) an adversarial training process for hiding private information from features; (2) maximally retain original information using a neural-network-based mutual information estimator. We extensively evaluate TIPRDC and compare it with existing methods using two image datasets and one text dataset. Our results show that TIPRDC substantially outperforms other existing methods. Our work is the first task-independent privacy-respecting data crowdsourcing framework.",2020,KDD,2005.11480,10.1145/3394486.3403125,https://arxiv.org/pdf/2005.11480.pdf
979bcd527f2a1c0ffb5f08c57959f48a0bb65f84,0,,,1,0,0,0,0,0,0,0,0,0,0,Progressive deep feature learning for manga character recognition via unlabeled training data,"The recognition of manga (Japanese comics) characters is an essential step in industrial applications, such as manga character retrieval, content analysis and copyright protection. However, conventional methods for manga character recognition are mainly based on handcrafted features which are not robust enough for manga of various style. The emergence of deep learning based methods provides representational features, which has a huge demand for labeled data. In this paper, we propose a framework to exploit unlabeled manga data to facilitate the discriminative capability of deep feature representations for manga character recognition (i.e., unsupervised learning on manga images), which does not rely on any manual annotation. Specifically, we first train an initial feature model using an anime character dataset. Then, we adopt a Progressive Main Characters Mining (PMCM) strategy which iterates between two steps: 1) produce selected data with estimated labels from unlabeled data, 2) update the feature model by the selected data. These two steps are mutually promoted in essence. Experimental results on Manga109 dataset, to which we introduce new head annotations, demonstrate the effectiveness of the proposed framework and the usefulness in manga character verification and retrieval.",2019,ACM TUR-C,,10.1145/3321408.3322624,
97d811ae99bcbcf9f63c2f447041ab6d74a20b1e,0,,,1,0,0,0,0,0,0,0,0,0,0,Face recognition using truncated transform domain feature extraction,"Face Recognition (FR) under varying pose is challenging and extracting pose invariant features is an effective approach to solve this problem. In this paper, we propose a novel Truncated Transform Domain Feature Extractor (TTDFE) to improve the performance of the FR system. TTDFE involves a unique combination of Symlet-4 DWT, 2D-DCT, followed by a novel truncation process. The truncation process extracts higher amplitude coefficients from the Discrete Cosine Transform (DCT) matrix. An optimal Truncation Point (TP) is estimated, which is inspired by a relationship developed between the image dimensions and the positions of DCT amplitude peaks. TTDFE is used for efficient feature extraction and a Binary Particle Swarm Optimization (BPSO) based feature selection algorithm is used to search the feature space for the optimal feature subset. Experimental results, obtained by applying the proposed algorithm on 5 benchmark face databases with large pose variations, namely Facial Recognition Technology (FERET), University of Manchester Institute of Science and Technology (UMIST), Foundation for Education of Ignatius (FEI), Pointing'04 Head Pose image Database (PHPD) and Indian Face Database (IFD), show that the proposed system outperforms other FR systems. A significant increase in the Recognition Rate (RR) and a substantial reduction in the number of features selected are observed.",2015,Int. Arab J. Inf. Technol.,,,https://pdfs.semanticscholar.org/97d8/11ae99bcbcf9f63c2f447041ab6d74a20b1e.pdf
981e84780f5362b5c60cd424626c0cc8fd0b793a,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,Visual Chirality,"How can we tell whether an image has been mirrored? While we understand the geometry of mirror reflections very well, less has been said about how it affects distributions of imagery at scale, despite widespread use for data augmentation in computer vision. In this paper, we investigate how the statistics of visual data are changed by reflection. We refer to these changes as 'visual chirality,' after the concept of geometric chirality: the notion of objects that are distinct from their mirror image. Our analysis of visual chirality reveals surprising results, including low-level chiral signals pervading imagery stemming from image processing in cameras, to the ability to discover visual chirality in images of people and faces. Our work has implications for data augmentation, self-supervised learning, and image forensics.",2020,2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR),2006.09512,10.1109/CVPR42600.2020.01231,https://arxiv.org/pdf/2006.09512.pdf
982e47b702554801cdade0b77bd728a3ced57c17,1,[D18],,1,1,0,0,0,1,0,0,0,0,0,GDFace: Gated Deformation for Multi-View Face Image Synthesis,"Photorealistic multi-view face synthesis from a single image is an important but challenging problem. Existing methods mainly learn a texture mapping model from the source face to the target face. However, they fail to consider the internal deformation caused by the change of poses, leading to the unsatisfactory synthesized results for large pose variations. In this paper, we propose a Gated Deformable Face Synthesis Network to model the deformation of faces that aids the synthesis of the target face image. Specifically, we propose a dual network that consists of two modules. The first module estimates the deformation of two views in the form of convolution offsets according to the input and target poses. The second one, on the other hand, leverages the predicted deformation offsets to create the target face image. In this way, pose changes are explicitly modeled in the face generator to cope with geometric transformation, by adaptively focusing on pertinent regions of the source image. To compensate offset estimation errors, we introduce a soft-gating mechanism that enables adaptive fusion between deformable features and primitive features. Extensive experimental results on five widely-used benchmarks show that our approach performs favorably against the state-of-the-arts on multi-view face synthesis, especially for large pose changes.",2020,AAAI,,10.1609/AAAI.V34I07.6942,https://pdfs.semanticscholar.org/b9d0/9d7f40a77d54041a35bbf8b520a7342c1c8e.pdf
9853a348f61aec83b410f307ab905a4ae001fcd4,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,A Framework for Evaluating Gradient Leakage Attacks in Federated Learning,"Federated learning (FL) is an emerging distributed machine learning framework for collaborative model training with a network of clients (edge devices). FL offers default client privacy by allowing clients to keep their sensitive data on local devices and to only share local training parameter updates with the federated server. However, recent studies have shown that even sharing local parameter updates from a client to the federated server may be susceptible to gradient leakage attacks and intrude the client privacy regarding its training data. In this paper, we present a principled framework for evaluating and comparing different forms of client privacy leakage attacks. We first provide formal and experimental analysis to show how adversaries can reconstruct the private local training data by simply analyzing the shared parameter update from local training (e.g., local gradient or weight update vector). We then analyze how different hyperparameter configurations in federated learning and different settings of the attack algorithm may impact on both attack effectiveness and attack cost. Our framework also measures, evaluates, and analyzes the effectiveness of client privacy leakage attacks under different gradient compression ratios when using communication efficient FL protocols. Our experiments also include some preliminary mitigation strategies to highlight the importance of providing a systematic attack evaluation framework towards an in-depth understanding of the various forms of client privacy leakage threats in federated learning and developing theoretical foundations for attack mitigation.",2020,ArXiv,2004.10397,,https://arxiv.org/pdf/2004.10397.pdf
98f7740348036c2fb03815279b8ca94befb9847f,1,[D35],,1,0,0,0,0,0,0,0,0,0,0,FaceHop: A Light-Weight Low-Resolution Face Gender Classification Method,"A light-weight low-resolution face gender classification method, called FaceHop, is proposed in this research. We have witnessed a rapid progress in face gender classification accuracy due to the adoption of deep learning (DL) technology. Yet, DL-based systems are not suitable for resource-constrained environments with limited networking and computing. FaceHop offers an interpretable non-parametric machine learning solution. It has desired characteristics such as a small model size, a small training data amount, low training complexity, and low resolution input images. FaceHop is developed with the successive subspace learning (SSL) principle and built upon the foundation of PixelHop++. The effectiveness of the FaceHop method is demonstrated by experiments. For gray-scale face images of resolution 32×32 in the LFW and the CMU Multi-PIE datasets, FaceHop achieves correct gender classification rates of 94.63% and 95.12% with model sizes of 16.9K and 17.6K parameters, respectively. It outperforms LeNet-5 in classification accuracy while LeNet-5 has a model size of 75.8K parameters.",2020,ArXiv,2007.09510,,https://arxiv.org/pdf/2007.09510.pdf
9924616d6a236b628319da07b19aacf9147314d1,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,Towards Analysis-friendly Face Representation with Scalable Feature and Texture Compression,"It plays a fundamental role to compactly represent the visual information towards the optimization of the ultimate utility in myriad visual data centered applications. With numerous approaches proposed to efficiently compress the texture and visual features serving human visual perception and machine intelligence respectively, much less work has been dedicated to studying the interactions between them. Here we investigate the integration of feature and texture compression, and show that a universal and collaborative visual information representation can be achieved in a hierarchical way. In particular, we study the feature and texture compression in a scalable coding framework, where the base layer serves as the deep learning feature and enhancement layer targets to perfectly reconstruct the texture. Based on the strong generative capability of deep neural networks, the gap between the base feature layer and enhancement layer is further filled with the feature level texture reconstruction, aiming to further construct texture representation from feature. As such, the residuals between the original and reconstructed texture could be further conveyed in the enhancement layer. To improve the efficiency of the proposed framework, the base layer neural network is trained in a multi-task manner such that the learned features enjoy both high quality reconstruction and high accuracy analysis. We further demonstrate the framework and optimization strategies in face image compression, and promising coding performance has been achieved in terms of both rate-fidelity and rate-accuracy.",2020,ArXiv,2004.10043,,https://arxiv.org/pdf/2004.10043.pdf
9930e10a6216bc741c5f043abf168bba3611d96f,0,,,1,0,0,0,0,0,0,0,0,0,0,Self-Organizing Neural Visual Models to Learn Feature Detectors and Motion Tracking Behaviour by Exposure to Real-World Data,,2018,,,10.20381/ruor-21368,https://ruor.uottawa.ca/bitstream/10393/37096/1/Yogeswaran_Arjun_2018_thesis.pdf
993a2c02a5a3263b3047202e3d86aa9a0dd6ebfe,0,,,1,0,1,0,0,0,0,0,0,0,0,Motion Interchange Patterns for Action Recognition in Unconstrained Videos,"Action Recognition in videos is an active research field that is fueled by an acute need, spanning several application domains. Still, existing systems fall short of the applications' needs in real-world scenarios, where the quality of the video is less than optimal and the viewpoint is uncontrolled and often not static. In this paper, we consider the key elements of motion encoding and focus on capturing local changes in motion directions. In addition, we decouple image edges from motion edges using a suppression mechanism, and compensate for global camera motion by using an especially fitted registration scheme. Combined with a standard bag-of-words technique, our methods achieves state-of-the-art performance in the most recent and challenging benchmarks.",2012,ECCV,,10.1007/978-3-642-33783-3_19,http://www.cs.tau.ac.il/~wolf/papers/MIP_eccv12.pdf
99c57ec53f2598d63c010f791adbca386b276919,1,[D21],,1,0,0,0,0,0,0,0,0,0,0,Landmark-Guided Local Deep Neural Networks for Age and Gender Classification,"Many types of deep neural networks have been proposed to address the problem of human biometric identification, especially in the areas of face detection and recognition. Local deep neural networks have recently been used in face-based age and gender classification; despite their improvement in performance, their model training costs are rather expensive. In this paper, we propose to construct a local deep neural network for age and gender classification. In our proposed model, local image patches are selected based on the detected facial landmarks; the selected patches are then used for the network training. A holistic edge map for an entire image is also used for training a “global” network. The age and gender classification results are obtained by combining the outputs from both the “global” and the local networks. Our proposed model is tested on two face image benchmark datasets; competitive performance is obtained compared to the state-of-the-art methods.",2018,J. Sensors,,10.1155/2018/5034684,https://pdfs.semanticscholar.org/99c5/7ec53f2598d63c010f791adbca386b276919.pdf
9b38a536982409358030a97b58be3c9b05922db3,0,,,0,1,0,0,0,0,0,0,0,0,0,Convolutional neural networks for attribute-based active authentication on mobile devices,"We present a Deep Convolutional Neural Network (DCNN) architecture for the task of continuous authentication on mobile devices. To deal with the limited resources of these devices, we reduce the complexity of the networks by learning intermediate features such as gender and hair color instead of identities. We present a multi-task, part-based DCNN architecture for attribute detection that performs better than state-of-the-art methods in terms of accuracy. As a byproduct of the proposed architecture, we are able to explore the embedding space of the attributes extracted from different facial parts, such as mouth and eyes, to discover new attributes. Furthermore, through extensive experimentation, we show that the attribute features extracted by our method outperform a previously presented attribute-based method and a baseline LBP method for the task of active authentication. Lastly, we demonstrate the effectiveness of the proposed architecture in terms of speed and power consumption by deploying it on an actual mobile device.",2016,"2016 IEEE 8th International Conference on Biometrics Theory, Applications and Systems (BTAS)",1604.08865,10.1109/BTAS.2016.7791163,https://arxiv.org/pdf/1604.08865.pdf
9ca6d0555e8e1d62270b5b95aeca0165b741d026,0,,,1,0,0,0,0,1,0,0,0,0,0,Pooling Faces: Template Based Face Recognition with Pooled Face Images,"We propose a novel approach to template based face recognition. Our dual goal is to both increase recognition accuracy and reduce the computational and storage costs of template matching. To do this, we leverage an approach which was proven effective in many other domains, but, to our knowledge, never fully explored for face images: average pooling of face photos. We show how (and why!) the space of a template's images can be partitioned and then pooled based on image quality and head pose and the effect this has on accuracy and template size. We perform extensive tests on the IJB-A and Janus CS2 template based face identification and verification benchmarks. These show that not only does our approach outperform published state of the art despite requiring far fewer cross template comparisons, but also, surprisingly, that image pooling performs on par with deep feature pooling.",2016,2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW),1607.01450,10.1109/CVPRW.2016.23,https://arxiv.org/pdf/1607.01450.pdf
9cc4abd2ec10e5fa94ff846c5ee27377caf17cf0,0,,,0,1,0,0,0,0,0,1,0,0,0,Improved Techniques for GAN based Facial Inpainting,"In this paper we present several architectural and optimization recipes for generative adversarial network (GAN) based facial semantic inpainting. Current benchmark models are susceptible to initial solutions of non-convex optimization criterion of GAN based inpainting. We present an end-to-end trainable parametric network to deterministically start from good initial solutions leading to more photo realistic reconstructions with significant optimization speed up. For the first time, we show how to efficiently extend GAN based single image inpainter models to sequences by a) learning to initialize a temporal window of solutions with a recurrent neural network and b) imposing a temporal smoothness loss (during iterative optimization) to respect the redundancy in temporal dimension of a sequence. We conduct comprehensive empirical evaluations on CelebA images and pseudo sequences followed by real life videos of VidTIMIT dataset. The proposed method significantly outperforms current GAN based state-of-the-art in terms of reconstruction quality with a simultaneous speedup of over 15×. We also show that our proposed model is better in preserving facial identity in a sequence even without explicitly using any face recognition module during training.",2018,ArXiv,1810.08774,,https://arxiv.org/pdf/1810.08774.pdf
9cf718136c8a33659dc35bf0a7fd4a8c6c68f75d,0,,,1,0,0,0,0,0,0,0,0,0,0,From Images to 3D Shape Attributes,"Our goal in this paper is to investigate properties of 3D shape that can be determined from a single image. We define 3D shape attributes—generic properties of the shape that capture curvature, contact and occupied space. Our first objective is to infer these 3D shape attributes from a single image. A second objective is to infer a 3D shape embedding—a low dimensional vector representing the 3D shape. We study how the 3D shape attributes and embedding can be obtained from a single image by training a Convolutional Neural Network (CNN) for this task. We start with synthetic images so that the contribution of various cues and nuisance parameters can be controlled. We then turn to real images and introduce a large scale image dataset of sculptures containing 143K images covering 2197 works from 242 artists. For the CNN trained on the sculpture dataset we show the following: (i) which regions of the imaged sculpture are used by the CNN to infer the 3D shape attributes; (ii) that the shape embedding can be used to match previously unseen sculptures largely independent of viewpoint; and (iii) that the 3D attributes generalize to images of other (non-sculpture) object classes.",2019,IEEE Transactions on Pattern Analysis and Machine Intelligence,,10.1109/TPAMI.2017.2782810,
9d757c0fede931b1c6ac344f67767533043cba14,0,,,1,0,0,0,0,0,0,0,0,0,0,Search Based Face Annotation Using PCA and Unsupervised Label Refinement Algorithms,"Face recognition/detection presents a challenging problem in the field of image analysis and computer applications, and it is becoming increasingly popular because of its applications in various fields. Face annotation provides a way of recognizing facial images and is part of the face detection and face recognition process. Current research interest lies in mining weakly-labeled facial images to resolve research challenges in computer vision. A search-based face annotation framework is proposed to tackle the problems related to face image and label quality. This framework uses an Unsupervised Label Refinement (ULR) algorithm to refine weakly labeled facial images and a Clustering-based Approximation (CBA) algorithm to improve efficiency and scalability. Experimental results show that the ULR algorithm boosts the performance of the proposed search-based face annotation framework.",2015,,,,https://pdfs.semanticscholar.org/9d75/7c0fede931b1c6ac344f67767533043cba14.pdf
9de28803825a1c7b34f27569e718e872d1a86698,0,,,0,1,0,0,0,0,0,0,0,0,1,Face Attribute Detection with MobileNetV2 and NasNet-Mobile,"In this paper, we propose two simple yet effective methods to estimate facial attributes in unconstrained images. We use a straightforward and fast face alignment technique for preprocessing and estimate the face attributes using MobileNetV2 and Nasnet-Mobile, two lightweight CNN (Convolutional Neural Network) architectures. Both architectures perform similarly well in terms of accuracy and speed. A comparison with state-of-the-art methods with respect to processing time and accuracy shows that our proposed approach performs faster than the best state-of-the-art model and better than the fastest state-of-the-art model. Moreover, our approach is easy to use and capable of being deployed on mobile devices.",2019,2019 11th International Symposium on Image and Signal Processing and Analysis (ISPA),,10.1109/ISPA.2019.8868585,
9ded233c0c51f28aa00700deea916981578b5cac,1,[D18],,1,1,0,0,0,0,0,0,0,0,0,Secure Face Matching Using Fully Homomorphic Encryption,"Face recognition technology has demonstrated tremendous progress over the past few years, primarily due to advances in representation learning. As we witness the widespread adoption of these systems, it is imperative to consider the security of face representations. In this paper, we explore the practicality of using a fully homomorphic encryption based framework to secure a database of face templates. This framework is designed to preserve the privacy of users and prevent information leakage from the templates, while maintaining their utility through template matching directly in the encrypted domain. Additionally, we also explore a batching and dimensionality reduction scheme to trade-off face matching accuracy and computational complexity. Experiments on benchmark face datasets (LFW, IJB-A, IJB-B, CASIA) indicate that secure face matching can be practically feasible (16KB template size and 0.01 sec per match pair for 512-dimensional features from SphereFace [23]) while exhibiting minimal loss in matching performance.",2018,"2018 IEEE 9th International Conference on Biometrics Theory, Applications and Systems (BTAS)",,,
9e3a9bddd773cd34b186cbd3489a112598583294,1,[D29],,1,0,0,0,0,0,0,0,0,0,0,Masked Face Recognition Dataset and Application,"In order to effectively prevent the spread of COVID-19 virus, almost everyone wears a mask during the coronavirus epidemic. This almost makes conventional facial recognition technology ineffective in many cases, such as community access control, face access control, facial attendance, facial security checks at train stations, etc. Therefore, it is very urgent to improve the recognition performance of the existing face recognition technology on the masked faces. Most current advanced face recognition approaches are designed based on deep learning, which depend on a large number of face samples. However, at present, there are no publicly available masked face recognition datasets. To this end, this work proposes three types of masked face datasets, including Masked Face Detection Dataset (MFDD), Real-world Masked Face Recognition Dataset (RMFRD) and Simulated Masked Face Recognition Dataset (SMFRD). Among them, to the best of our knowledge, RMFRD is currently the world's largest real-world masked face dataset. These datasets are freely available to industry and academia, based on which various applications on masked faces can be developed. The multi-granularity masked face recognition model we developed achieves 95% accuracy, exceeding the results reported by the industry. Our datasets are available at: this https URL.",2020,ArXiv,2003.09093,,https://arxiv.org/pdf/2003.09093.pdf
9ec9a80b1c9ee6450f4419f01e457bb87d91bd5e,1,[D22],,1,0,0,0,0,0,1,0,0,0,0,Optimized projection for Collaborative Representation based Classification and its applications to face recognition,"A new dimensionality reduction method called OP-CRC is proposed. OP-CRC is designed based on Collaborative Representation based Classification (CRC). The projection matrix of OP-CRC is solved by iteration algorithm. OP-CRC is effective for face recognition. Collaborative Representation based Classification (CRC) is powerful for face recognition and has lower computational complexity than Sparse Representation based Classification (SRC). To improve the performance of CRC, this paper proposes a new dimensionality reduction method called Optimized Projection for Collaborative Representation based Classification (OP-CRC), which has the direct connection to CRC. CRC uses the minimum reconstruction residual based on collaborative representation as the decision rule. OP-CRC is designed according to this rule. The criterion of OP-CRC is maximizing the collaborative representation based between-class scatter and minimizing the collaborative representation based within-class scatter in the transformed space simultaneously. This criterion is solved by iterative algorithm and the algorithm converges fast. CRC performs very well in the transformed space of OP-CRC. Experimental results on Yale, AR, FERET, CMU_PIE and LFW databases show the effectiveness of OP-CRC in face recognition.",2016,Pattern Recognit. Lett.,,10.1016/j.patrec.2016.01.012,
a0eecf56c9b59406dae9138a02542a0154b65b80,0,,,0,1,0,0,0,0,0,0,0,0,0,Generalizing Energy-based Generative ConvNets from Particle Evolution Perspective,"Compared with Generative Adversarial Networks (GAN), the Energy-Based generative Model (EBM) possesses two appealing properties: i) it can be directly optimized without requiring an auxiliary network during the learning and synthesizing; ii) it can better approximate underlying distribution of the observed data by learning explicitly potential functions. This paper studies a branch of EBMs, i.e., the energy-based Generative ConvNet (GCN), which minimizes its energy function defined by a bottom-up ConvNet. From the perspective of particle physics, we solve the problem of unstable energy dissipation that might damage the quality of the synthesized samples during the maximum likelihood learning. Specifically, we establish a connection between FRAME model [1] and dynamic physics process and provide a generalized formulation of FRAME in discrete flow with a certain metric measure from particle perspective. To address KL-vanishing issue, we generalize the reformulated GCN from the KL discrete flow with KL divergence measure to a Jordan-Kinderlehrer-Otto (JKO) discrete flow with Wasserstein distance metric and derive a Wasserstein GCN (w-GCN). To further minimize the learning bias and improve the model generalization, we present a Generalized GCN (GGCN). GGCN introduces a hidden space mapping strategy and employs a normal distribution as hidden space for the reference distribution. Besides, it applies a matching trainable non-linear upsampling function for further generalization. Considering the limitation of the efficiency problem in MCMC based learning of EBMs, an amortized learning scheme is also proposed to improve the learning efficiency. Quantitative and qualitative experiments are conducted on several widely-used face and natural image datasets. Our experimental results surpass those of the existing models in both model stability and the quality of generated samples.",2019,ArXiv,1910.14216,,https://arxiv.org/pdf/1910.14216.pdf
a113b9ac18560277a47fe2442a1bdfdcb58aa01d,0,,,0,1,0,0,0,0,0,0,0,0,0,Learning 3D Face Reconstruction with a Pose Guidance Network,"We present a self-supervised learning approach to learning monocular 3D face reconstruction with a pose guidance network (PGN). First, we unveil the bottleneck of pose estimation in prior parametric 3D face learning methods, and propose to utilize 3D face landmarks for estimating pose parameters. With our specially designed PGN, our model can learn from both faces with fully labeled 3D landmarks and unlimited unlabeled in-the-wild face images. Our network is further augmented with a self-supervised learning scheme, which exploits face geometry information embedded in multiple frames of the same person, to alleviate the ill-posed nature of regressing 3D face geometry from a single image. These three insights yield a single approach that combines the complementary strengths of parametric model learning and data-driven learning techniques. We conduct a rigorous evaluation on the challenging AFLW2000-3D, Florence and FaceWarehouse datasets, and show that our method outperforms the state-of-the-art for all metrics.",2020,ArXiv,2010.04384,,https://arxiv.org/pdf/2010.04384.pdf
a324d61c79fe2e240e080f0dab358aa72dd002b3,1,,1,0,0,1,0,0,0,0,0,0,0,0,Adaptive noise dictionary construction via IRRPCA for face recognition,"Recently, regression analysis has become a popular method for face recognition. Various robust regression methods have been proposed to handle different recognition tasks. In this paper, we attempt to achieve this goal by the strategy of adding an adaptive noise dictionary (AND) to the training samples. In contrast to the previous methods, the noise dictionary (ND) is adaptive to different kinds of noise and extracted automatically. To get an effective noise dictionary, the Iteratively Reweighted Robust Principal Component Analysis (IRRPCA) is proposed. A corresponding classifier based on linear regression is presented for recognition. As this adaptive noise dictionary can describe the noise distribution of testing samples, it is robust to various kinds of noise and applicable for recognition tasks with occluded or corrupted images. This method is also extended to deal with misaligned images. Experiments are conducted on AR, Yale B, CMU PIE, CMU Multi-Pie, LFW and Pubfig databases to verify the robustness of our method to variations in occlusion, corruption, illumination, misalignment, etc.",2016,Pattern Recognit.,,10.1016/j.patcog.2016.02.005,
a45a3b89d0522562e90da93fa508d0d01df29240,0,,,1,0,0,0,0,0,0,0,0,0,0,Image Recognition Using Manifold Constrained Collaborative Representation,"Image recognition is still a challenging task due to existing illumination and view variations. Manifold learning and representation based classifiers (RCs) are two widely utilized methods to address image recognition. The common RCs only emphasize the representation by the training samples globally, while the geometric manifold structure of samples is not fully considered. In this letter, a novel manifold constrained collaborative representation is proposed, which aims to make the representation of a query sample similar to the codes of its nearby points. Thus, the obtained representations can be more discriminative for recognition. Extensive experiments on several popular databases show that our proposed method is promising in recognizing various images.",2018,"2018 International Conference on Security, Pattern Analysis, and Cybernetics (SPAC)",,10.1109/SPAC46244.2018.8965466,
a47ac8569ab1970740cff9f1643f77e9143a62d4,0,,,0,1,0,0,0,0,0,0,0,0,0,Associative Compression Networks for Representation Learning,"This paper introduces Associative Compression Networks (ACNs), a new framework for variational autoencoding with neural networks. The system differs from existing variational autoencoders (VAEs) in that the prior distribution used to model each code is conditioned on a similar code from the dataset. In compression terms this equates to sequentially transmitting the dataset using an ordering determined by proximity in latent space. Since the prior need only account for local, rather than global variations in the latent space, the coding cost is greatly reduced, leading to rich, informative codes. Crucially, the codes remain informative when powerful, autoregressive decoders are used, which we argue is fundamentally difficult with normal VAEs. Experimental results on MNIST, CIFAR-10, ImageNet and CelebA show that ACNs discover high-level latent features such as object class, writing style, pose and facial expression, which can be used to cluster and classify the data, as well as to generate diverse and convincing samples. We conclude that ACNs are a promising new direction for representation learning: one that steps away from IID modelling, and towards learning a structured description of the dataset as a whole.",2018,ArXiv,1804.02476,,https://arxiv.org/pdf/1804.02476.pdf
a50f604454670403e3a7993b7d8b3b246e4e5c2b,1,,1,0,0,0,0,0,1,1,0,0,0,0,Video Face Recognition Using Siamese Networks With Block-Sparsity Matching,"Deep learning models for still-to-video FR typically provide a low level of accuracy because faces captured in unconstrained videos are matched against a reference gallery comprised of a single facial still per individual. For improved robustness to intra-class variations, deep Siamese networks have recently been used for pair-wise face matching. Although these networks can improve state-of-the-art accuracy, the absence of prior knowledge from the target domain means that many images must be collected to account for all possible capture conditions, which is not practical for many real-world surveillance applications. In this paper, we propose the deep SiamSRC network that employs block-sparsity for face matching, while the reference gallery is augmented with a compact set of domain-specific facial images. Prior to deployment, clustering based on row sparsity is performed on unlabelled faces captured in videos from the target domain. Cluster centers discovered in the capture condition space (defined by, e.g., pose, scale and illumination) are used as rendering parameters with an off-the-shelf 3D face model, and a compact set of synthetic faces are thereby generated for each reference still based on representative intra-class information from the target domain. For pair-wise similarity matching with query facial images, the SiamSRC exploits sparse representation-based classification with a block structure. Experimental results obtained with the videos from the Chokepoint and COX-S2V datasets indicate that the proposed SiamSRC network can outperform state-of-the-art methods for still-to-video FR with a single sample per person, with only a moderate increase in computational complexity.",2020,"IEEE Transactions on Biometrics, Behavior, and Identity Science",,10.1109/TBIOM.2019.2949364,
a53e73139d9d6474ef8a002ee9c1dea49755ebc6,0,,,0,1,0,0,0,0,0,0,0,0,0,Dual Contradistinctive Generative Autoencoder,"We present a new generative autoencoder model with dual contradistinctive losses to improve generative autoencoders that perform simultaneous inference (reconstruction) and synthesis (sampling). Our model, named dual contradistinctive generative autoencoder (DC-VAE), integrates an instance-level discriminative loss (maintaining the instance-level fidelity for the reconstruction/synthesis) with a set-level adversarial loss (encouraging the set-level fidelity for the reconstruction/synthesis), both being contradistinctive. Extensive experimental results by DC-VAE across different resolutions including 32×32, 64×64, 128×128, and 512×512 are reported. The two contradistinctive losses in VAE work harmoniously in DC-VAE leading to a significant qualitative and quantitative performance enhancement over the baseline VAEs without architectural changes. State-of-the-art or competitive results among generative autoencoders for image reconstruction, image synthesis, image interpolation, and representation learning are observed. DC-VAE is a general-purpose VAE model, applicable to a wide variety of downstream tasks in computer vision and machine learning.",2020,ArXiv,2011.10063,,https://arxiv.org/pdf/2011.10063.pdf
a5eb36f1e77245dfc9e5c0c03998529331e4c89b,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,An optimal set of code words and correntropy for rotated least squares regression,"This paper presents a robust feature extraction method for face recognition based on least squares regression (LSR). Our focus is to enhance the robustness and discriminability of the LSR. First, an optimal set of code words is introduced in LSR. Compared to the traditional set of code words, this new set uses fewer code words. Furthermore, it can make the distance of the regression targets of different classes as large as possible. Then, correntropy is integrated into the LSR model for better robustness. Furthermore, considering the commonly used distance metrics such as Euclidean distance and Cosine distance in the subspace are invariant to rotation transformation, rotation is introduced as additional freedom to promote flexibility without sacrificing accuracy. Our objective function is optimized using half-quadratic (HQ) optimization, which facilitates algorithm development and convergence study. Experimental results show that our method outperforms several subspace methods for face recognition, which indicates the validity of the proposed method.",2014,IEEE International Joint Conference on Biometrics,,10.1109/BTAS.2014.6996222,
a73fd2a2a359bced998119f709c520d6d4ca6a75,0,,,0,1,0,0,0,0,0,0,0,0,0,Towards All-around Knowledge Transferring: Learning From Task-irrelevant Labels,"Deep neural models have hitherto achieved significant performances on numerous classification tasks, but meanwhile require sufficient manually annotated data. Since it is extremely time-consuming and expensive to annotate adequate data for each classification task, learning an empirically effective model with generalization on small dataset has received increased attention. Existing efforts mainly focus on transferring task-relevant knowledge from other similar data to tackle the issue. These approaches have yielded remarkable improvements, yet neglecting the fact that the task-irrelevant features could bring out massive negative transfer effects. To date, no large-scale studies have been performed to investigate the impact of task-irrelevant features, let alone the utilization of this kind of features. In this paper, we firstly propose Task-Irrelevant Transfer Learning (TIRTL) to exploit task-irrelevant features, which mainly are extracted from task-irrelevant labels. Particularly, we suppress the expression of task-irrelevant information and facilitate the learning process of classification. We also provide a theoretical explanation of our method. In addition, TIRTL does not conflict with those that have previously exploited task-relevant knowledge and can be well combined to enable the simultaneous utilization of task-relevant and task-irrelevant features for the first time. In order to verify the effectiveness of our theory and method, we conduct extensive experiments on facial expression recognition and digit recognition tasks. Our source code will be also available in the future for reproducibility.",2020,ArXiv,2011.08470,,https://arxiv.org/pdf/2011.08470.pdf
a799bec46cf4bdace26e8b136d5131a27ec6aa1a,1,[D22],,1,0,0,0,0,0,1,0,0,0,0,Image gradient orientations embedded structural error coding for face recognition with occlusion,"Partially occluded faces are very common in automatic face recognition (FR) in the real world. We explore the problem of FR with occlusion by embedding Image Gradient Orientations (IGO) into robust error coding. The existing works usually put stress on the error distribution in the non-occluded region but neglect the one in the occluded region due to its unpredictability incurred by irregular occlusion. However, in the IGO domain, the error distribution in the occluded region can be built simply and elegantly by a uniform distribution on the interval [-π, π), and the one in the non-occluded region can be well built by a weight-conditional Gaussian distribution. By incorporating the two error distributions and a Markov random field for the priori distribution of the occlusion support, we propose a joint probabilistic generative model for a novel IGO-embedded Structural Error Coding (IGO-SEC) model. Two methods, a new reconstruction method and a new robust structural error metric, are further presented to boost the performance of IGO-SEC. Extensive experiments on 8 popular robust FR methods and 4 benchmark face databases demonstrate the effectiveness and robustness of IGO-SEC in dealing with facial occlusion and occlusion-like variations.",2020,J. Ambient Intell. Humaniz. Comput.,,10.1007/S12652-019-01257-7,
a7c531c4a38721516c6c4c155ee2234e0a3656a1,0,,,0,1,0,0,0,0,0,0,0,0,0,RAG: Facial Attribute Editing by Learning Residual Attributes,"Facial attribute editing aims to modify face images in the desired manner, such as changing hair color, gender, and age, adding or removing eyeglasses, and so on. Recent researches on this topic largely leverage the adversarial loss so that the generated faces are not only realistic but also well correspond to the target attributes. In this paper, we propose Residual Attribute Generative Adversarial Network (RAG), a novel model to achieve unpaired editing for multiple facial attributes. Instead of directly learning the target attributes, we propose to learn the residual attributes, a more intuitive and understandable representation to convert the original task as a problem of arithmetic addition or subtraction for different attributes. Furthermore, we propose the identity preservation loss, which proves to facilitate convergence and provide better results. At last, we leverage effective visual attention to localize the related regions and preserve the unrelated content during transformation. The extensive experiments on two facial attribute datasets demonstrate the superiority of our approach to generate realistic and high-quality faces for multiple attributes. Visualization of the residual image, which is defined as the difference between the original image and the generated result, better explains which regions RAG focuses on when editing different attributes.",2019,IEEE Access,,10.1109/ACCESS.2019.2924959,
a7e5a46e47dd21cc9347b913dd3dde2f0ad832ed,0,,,0,1,0,0,0,0,0,0,0,0,0,On denoising autoencoders trained to minimise binary cross-entropy,"Denoising autoencoders (DAEs) are powerful deep learning models used for feature extraction, data generation and network pre-training. DAEs consist of an encoder and decoder which may be trained simultaneously to minimise a loss (function) between an input and the reconstruction of a corrupted version of the input. There are two common loss functions used for training autoencoders: the mean-squared error (MSE) and the binary cross-entropy (BCE). When training autoencoders on image data a natural choice of loss function is BCE, since pixel values may be normalised to take values in [0,1] and the decoder model may be designed to generate samples that take values in (0,1). We show theoretically that DAEs trained to minimise BCE may be used to take gradient steps in the data space towards regions of high probability under the data-generating distribution. Previously this had only been shown for DAEs trained using MSE. As a consequence of the theory, iterative application of a trained DAE moves a data sample from regions of low probability to regions of higher probability under the data-generating distribution. Firstly, we validate the theory by showing that novel data samples, consistent with the training data, may be synthesised when the initial data samples are random noise. Secondly, we motivate the theory by showing that initial data samples synthesised via other methods may be improved via iterative application of a trained DAE to those initial samples.",2017,ArXiv,1708.08487,,https://arxiv.org/pdf/1708.08487.pdf
a91593b6e6d587022351940358aaf5eb815471c2,0,,,0,1,0,0,0,0,0,0,0,0,0,Stacked Wasserstein Autoencoder,"Approximating distributions over complicated manifolds, such as natural images, is conceptually attractive. The deep latent variable model, trained using variational autoencoders and generative adversarial networks, is now a key technique for representation learning. However, it is difficult to unify these two models for exact latent-variable inference and parallelize both reconstruction and sampling, partly due to the regularization under the latent variables, to match a simple explicit prior distribution. These approaches are prone to be oversimplified, and can only characterize a few modes of the distribution. Building on the recently proposed Wasserstein autoencoder (WAE), with a new regularization as an optimal transport, this paper proposes a stacked Wasserstein autoencoder (SWAE) to learn a deep latent variable model. SWAE is a hierarchical model, which relaxes the optimal transport constraints at two stages. At the first stage, the SWAE flexibly learns a representation distribution, i.e., the encoded prior; and at the second stage, the encoded representation distribution is approximated with a latent variable model under the regularization encouraging the latent distribution to match the explicit prior. This model allows us to generate natural textual outputs as well as perform manipulations in the latent space to induce changes in the output space. Both quantitative and qualitative results demonstrate the superior performance of SWAE compared with the state-of-the-art approaches in terms of faithful reconstruction and generation quality.",2019,Neurocomputing,1910.02560,10.1016/J.NEUCOM.2019.06.096,https://arxiv.org/pdf/1910.02560.pdf
aa0c30bd923774add6e2f27ac74acd197b9110f2,0,,,0,0,0,0,0,1,0,0,0,0,0,Dynamic Probabilistic Linear Discriminant Analysis for video classification,"Component Analysis (CA) comprises statistical techniques that decompose signals into appropriate latent components, relevant to a task-at-hand (e.g., clustering, segmentation, classification). Recently, an explosion of research in CA has been witnessed, with several novel probabilistic models proposed (e.g., Probabilistic Principal CA, Probabilistic Linear Discriminant Analysis (PLDA), Probabilistic Canonical Correlation Analysis). PLDA is a popular generative probabilistic CA method, that incorporates knowledge regarding class-labels and furthermore introduces class-specific and sample-specific latent spaces. While PLDA has been shown to outperform several state-of-the-art methods, it is nevertheless a static model; any feature-level temporal dependencies that arise in the data are ignored. As has been repeatedly shown, appropriate modelling of temporal dynamics is crucial for the analysis of temporal data (e.g., videos). In this light, we propose the first, to the best of our knowledge, probabilistic LDA formulation that models dynamics, the so-called Dynamic-PLDA (DPLDA). DPLDA is a generative model suitable for video classification and is able to jointly model the label information (e.g., face identity, consistent over videos of the same subject), as well as dynamic variations of each individual video. Experiments on video classification tasks such as face and facial expression recognition show the efficacy of the proposed method.",2017,"2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)",,10.1109/ICASSP.2017.7952663,http://eprints.mdx.ac.uk/22042/6/dplda_kotsia.pdf
abaa114931d71f80e82fddf076e2a62666126f9d,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,Fast object detection based on several samples by training voting space,"In this paper, we propose a fast and novel detection method based on several samples to localize objects in target images or video. Firstly, we use several samples to train a voting space which is constructed by cells at corresponding positions. Each cell is described by a Gaussian distribution whose parameters are estimated by maximum likelihood estimation method. Then, we randomly choose one sample as a query image. Patches of target image are recognized by densely voting in the trained voting space. Next, we use a mean-shift method to refine multiple instances of object class. The high performance of our approach is demonstrated on several challenging data sets in both efficiency and effectiveness.",2015,Pattern Recognition and Image Analysis,,10.1134/S1054661815040227,
ac1a5ec957707d4139924d0d5035b9edc3c0b053,1,[D20],,1,0,0,1,0,0,0,0,0,0,0,Gender Classification on Real-Life Faces,"Gender recognition is one of the fundamental tasks of face image analysis. Most of the existing studies have focused on face images acquired under controlled conditions. However, real-world applications require gender classification on real-life faces, which is much more challenging due to significant appearance variations in unconstrained scenarios. In this paper, we investigate gender recognition on real-life faces using the recently built database, the Labeled Faces in the Wild (LFW). Local Binary Patterns (LBP) is employed to describe faces, and Adaboost is used to select the discriminative LBP features. We obtain a performance of 94.44% by applying Support Vector Machine (SVM) with the boosted LBP features. The public database used in this study makes future benchmarking and evaluation possible.",2010,ACIVS,,10.1007/978-3-642-17691-3_30,
acbe0df1363a3a40f1302420dd87dcbac02d994d,0,,,0,1,0,0,0,0,0,0,0,0,0,Uncertainty in Neural Processes,We explore the effects of architecture and training objective choice on amortized posterior predictive inference in probabilistic conditional generative models. We aim this work to be a counterpoint to a recent trend in the literature that stresses achieving good samples when the amount of conditioning data is large. We instead focus our attention on the case where the amount of conditioning data is small. We highlight specific architecture and objective choices that we find lead to qualitative and quantitative improvement to posterior inference in this low data regime. Specifically we explore the effects of choices of pooling operator and variational family on posterior quality in neural processes. Superior posterior predictive samples drawn from our novel neural process architectures are demonstrated via image completion/in-painting experiments.,2020,ArXiv,2010.03753,,https://arxiv.org/pdf/2010.03753.pdf
ad5a1621190d18dd429930ab5125c849ce7e4506,0,,,1,0,1,0,0,0,0,0,0,0,0,One shot emotion scores for facial emotion recognition,"Facial emotion recognition in unconstrained settings is a difficult task. The key problems are that people express their emotions in ways that are different from other people, and, for large datasets, there are not enough examples of a specific person to model his/her emotion. A model for predicting emotions will not generalize well to predicting the emotions of a person who has not been encountered during the training. We propose a system that addresses these issues by matching a face video to references of emotion. It does not require examples from the person in the video being queried. We compute the matching scores without requiring fine registration. The method is called one-shot emotion score. We improve the classification rate of inter-dataset experiments over a baseline system by 23% when training on MMI and testing on CK+.",2014,2014 IEEE International Conference on Image Processing (ICIP),,10.1109/ICIP.2014.7025275,http://www.cs.csub.edu/~acruz/papers/10.1109-ICIP.2014.7025275.pdf
af363da96ef9690560d8d030f3225b050eda8753,0,,,0,0,1,0,0,0,0,0,0,0,0,OCR-Free Transcript Alignment,"Recent large-scale digitization and preservation efforts have made images of original manuscripts, accompanied by transcripts, commonly available. An important challenge, for which no practical system exists, is that of aligning transcript letters to their coordinates in manuscript images. Here we propose a system that directly matches the image of a historical text with a synthetic image created from the transcript for this purpose, rather than attempting to recognize individual letters in the manuscript image using optical character recognition (OCR). Our method matches the pixels of the two images by employing a dedicated dense flow mechanism coupled with novel local image descriptors designed to spatially integrate local patch similarities. Matching these pixel representations is performed using a message passing algorithm. The various stages of our method make it robust with respect to document degradation, to variations between script styles and to non-linear image transformations. Robustness, as well as practicality of the system, are verified by comprehensive empirical experiments.",2013,2013 12th International Conference on Document Analysis and Recognition,,10.1109/ICDAR.2013.265,http://www.cs.tau.ac.il/~wolf/papers/ofta-online-version.pdf
b04f78429efe1ae240778142a3a55bcec969de2a,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,Metamorphic filtering of black-box adversarial attacks on multi-network face recognition models,"Adversarial examples pose a serious threat to the robustness of machine learning models in general and of deep learning models in particular. These carefully designed perturbations of input images can cause targeted misclassifications to a label of the attacker's choice, without being detectable to the naked eye. A particular class of adversarial attacks called black box attacks can be used to fool a target model despite not having access to the model parameters or to the input data used to train the model. In this paper, we first build a black box attack against robust multi-model face recognition pipelines and then test it against Google's FaceNet. We then present a novel metamorphic defense pipeline relying on nonlinear image transformations to detect adversarial attacks with a high degree of accuracy. We further use the results to create probabilistic metamorphic relations that define efficient decision boundaries between the safe and adversarial examples; achieving adversarial classification accuracy of up to 96%.",2020,ICSE,,10.1145/3387940.3391483,
b07582d1a59a9c6f029d0d8328414c7bef64dca0,1,[D36],,1,0,0,0,0,0,0,0,0,0,0,Employing Fusion of Learned and Handcrafted Features for Unconstrained Ear Recognition,"We present an unconstrained ear recognition framework that outperforms state-of-the-art systems in different publicly available image databases. To this end, we developed CNN-based solutions for ear normalization and description, we used well-known handcrafted descriptors, and we fused learned and handcrafted features to improve recognition. We designed a two-stage landmark detector that successfully worked under untrained scenarios. We used the results generated to perform a geometric image normalization that boosted the performance of all evaluated descriptors. Our CNN descriptor outperformed other CNN-based works in the literature, especially in more difficult scenarios. The fusion of learned and handcrafted matchers appears to be complementary as it achieved the best performance in all experiments. The obtained results outperformed all other reported results for the UERC challenge, which contains the most difficult database nowadays.",2018,IET Biom.,1710.07662,10.1049/iet-bmt.2017.0210,https://arxiv.org/pdf/1710.07662.pdf
b08f6c0e6020a551a7f96397fc64e7a85bea42e8,0,,,1,0,1,0,0,0,0,0,0,0,0,Hierarchical temporal graphical model for head pose estimation and subsequent attribute classification in real-world videos,"A hierarchical temporal model is used to estimate head pose in real-world videos. Head pose classification in (un)constrained databases shows superior performance. Proposed model is used to classify facial traits in real-world videos. Trait classification with and without using the estimated pose angle is performed. Facial trait classification using the proposed model shows superior performance. Recently, head pose estimation in real-world environments has been receiving attention in the computer vision community due to its applicability to a wide range of contexts. However, this task still remains an open problem because of the challenges presented by real-world environments. The focus of most of the approaches to this problem has been on estimation from single images or video frames, without leveraging the temporal information available in the entire video sequence. Other approaches frame the problem in terms of classification into a set of very coarse pose bins. In this paper, we propose a hierarchical graphical model that probabilistically estimates continuous head pose angles from real-world videos, by leveraging the temporal pose information over frames. The proposed graphical model is a general framework, which is able to use any type of feature and can be adapted to any facial classification task. Furthermore, the framework outputs the entire pose distribution for a given video frame. This permits robust temporal probabilistic fusion of pose information over the video sequence, and also probabilistically embedding the head pose information into other inference tasks. Experiments on large, real-world video sequences reveal that our approach significantly outperforms alternative state-of-the-art pose estimation methods. The proposed framework is also evaluated on gender and facial hair estimation. By incorporating pose information into the proposed hierarchical temporal graphical model, superior results are achieved for attribute classification tasks.",2015,Comput. Vis. Image Underst.,,10.1016/j.cviu.2015.03.005,
b0ab28da1f327e65c7e0c76a49faf28840ef6ffd,0,,,0,1,0,0,0,0,0,0,0,0,0,Sliced-Wasserstein Flows: Nonparametric Generative Modeling via Optimal Transport and Diffusions,"By building upon the recent theory that established the connection between implicit generative modeling (IGM) and optimal transport, in this study, we propose a novel parameter-free algorithm for learning the underlying distributions of complicated datasets and sampling from them. The proposed algorithm is based on a functional optimization problem, which aims at finding a measure that is close to the data distribution as much as possible and also expressive enough for generative modeling purposes. We formulate the problem as a gradient flow in the space of probability measures. The connections between gradient flows and stochastic differential equations let us develop a computationally efficient algorithm for solving the optimization problem. We provide formal theoretical analysis where we prove finite-time error guarantees for the proposed algorithm. To the best of our knowledge, the proposed algorithm is the first nonparametric IGM algorithm with explicit theoretical guarantees. Our experimental results support our theory and show that our algorithm is able to successfully capture the structure of different types of data distributions.",2019,ICML,1806.08141,,https://arxiv.org/pdf/1806.08141.pdf
b0f49ada8e9454048faf17f66d5e7520d5e46e98,0,,,0,1,0,0,0,0,0,0,0,0,0,Coherent Semantic Attention for Image Inpainting,"The latest deep learning-based approaches have shown promising results for the challenging task of inpainting missing regions of an image. However, the existing methods often generate contents with blurry textures and distorted structures due to the discontinuity of the local pixels. From a semantic-level perspective, the local pixel discontinuity is mainly because these methods ignore the semantic relevance and feature continuity of hole regions. To handle this problem, we investigate the human behavior in repairing pictures and propose a refined deep generative model-based approach with a novel coherent semantic attention (CSA) layer, which can not only preserve contextual structure but also make more effective predictions of missing parts by modeling the semantic relevance between the hole features. The task is divided into two steps, rough and refinement, and we model each step with a neural network under the U-Net architecture, where the CSA layer is embedded into the encoder of the refinement step. Meanwhile, we further propose consistency loss and feature patch discriminator to stabilize the network training process and improve the details. The experiments on CelebA, Places2, and Paris StreetView datasets have validated the effectiveness of our proposed methods in image inpainting tasks and can obtain images with a higher quality as compared with the existing state-of-the-art approaches. The codes and pre-trained models will be available at https://github.com/KumapowerLIU/CSA-inpainting.",2019,2019 IEEE/CVF International Conference on Computer Vision (ICCV),1905.12384,10.1109/ICCV.2019.00427,https://arxiv.org/pdf/1905.12384.pdf
b1855683aee9d635252c216acaaec0f661aeb365,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,Activation-Based Weight Significance Criterion for Pruning Deep Neural Networks,"Due to the massive amount of network parameters and great demand for computational resources, large-scale neural networks, especially deep convolutional neural networks (CNNs), can be inconvenient to implement for many real world applications. Therefore, sparsifying deep and densely connected neural networks is becoming a more and more important topic in the computer vision field for addressing these limitations. This paper starts from a very deep CNN trained for face recognition, then explores sparsifying neuron connections for network compression. We propose an activation-based weight significance criterion which estimates the contribution that each weight makes in the activations of the neurons in the next layer, then removes those weights that make the least contribution first. A concise but effective procedure is devised for pruning parameters of densely connected neural networks. In this procedure, one neuron is sparsified at a time, and a requested amount of parameters related to this neuron is removed. Applying the proposed method, we greatly compressed the size of a large-scale neural network without causing any loss in recognition accuracy. Furthermore, our experiments show that this procedure can work with different weight significance criteria for different expectations.",2017,ICIG,,10.1007/978-3-319-71589-6_6,
b28d061c0e26b4580f603c1fd919ce0c7a84c731,0,,,1,0,0,0,0,0,0,0,0,0,0,Face value of companies: deep learning for nonverbal communication,"As a side effect of digitalization, a massive amount of unstructured data is generated every day. Unstructured data comprises video, speech, text, and image data, which are easy to interpret for humans but can be challenging for computers. Financial research has been much engaged in recent history with decision-making based on textual or sentiment analysis. Textual analysis is based on the verbal part of communication, but in human interaction on a face-to-face level, nonverbal communication can play an equally important role in supporting a message. The interpretation of emotions in facial expressions is a major component of nonverbal communication. Deep learning is a versatile technique that is used in numerous applications, providing somewhat cognitive capabilities for machines. This thesis describes how to build a deep convolutional neural network with the ability to detect emotions in faces. Different approaches in deep convolutional model designs are tested and evaluated. The results are then used to evaluate videos of the regular press conference of the European Central Bank between January 2011 and September 2017. This processing step results in emotional-scores of facial expressions from 70 press conferences and more than 200,000 single pictures. It is investigated whether information of nonverbal communication, measured in levels of emotional excitement, can be linked to the movements of the Euro Stoxx 50 index. This ‘face value’ is compared to the value of speech and accompanying research. Using image data from press conferences as a source of unstructured data and transferring nonverbal communication to stock markets are both topics that, to the best of our knowledge, have not yet been a focus of research.",2017,,,,https://pdfs.semanticscholar.org/b28d/061c0e26b4580f603c1fd919ce0c7a84c731.pdf
b40290a694075868e0daef77303f2c4ca1c43269,0,,,1,0,0,0,0,0,0,0,0,0,0,Combining Local and Global Information for Hair Shape Modeling,"Hair plays an important role in human appearance. However, hair segmentation is still a challenging problem partially due to the lack of an effective model to handle its arbitrary shape variations. In this paper, we present a part-based model, which is robust to hair shape and environment variations. The model combines local and global information to describe the hair shape. The local model is learned by a series of algorithms, including global shape word vocabulary construction, shape word classifier learning and parameter optimization, while the global model which depicts different hair styles is learned using support vector machine (SVM) to configure parts and define potentials for all underlying hair shapes. Experiments performed on a set of consumer images show our algorithm's capability and robustness to handle hair shape variations and complex environments.",2014,,,,https://pdfs.semanticscholar.org/b402/90a694075868e0daef77303f2c4ca1c43269.pdf
b48b68f52b2ebaa8c7b428e98eafe1953045067f,0,,,0,1,0,0,0,0,0,0,0,0,0,Coevolution of Generative Adversarial Networks,"Generative adversarial networks (GAN) became a hot topic, presenting impressive results in the field of computer vision. However, there are still open problems with the GAN model, such as the training stability and the hand-design of architectures. Neuroevolution is a technique that can be used to provide the automatic design of network architectures even in large search spaces as in deep neural networks. Therefore, this project proposes COEGAN, a model that combines neuroevolution and coevolution in the coordination of the GAN training algorithm. The proposal uses the adversarial characteristic between the generator and discriminator components to design an algorithm using coevolution techniques. Our proposal was evaluated on the MNIST dataset. The results suggest the improvement of the training stability and the automatic discovery of efficient network architectures for GANs. Our model also partially solves the mode collapse problem.",2019,EvoApplications,1912.06172,10.1007/978-3-030-16692-2_32,https://arxiv.org/pdf/1912.06172.pdf
b4a25eafdd9e6c737176b4371e616cd91c8b9c7e,0,,,0,1,0,0,0,0,0,0,0,0,0,Structural Autoencoders Improve Representations for Generation and Transfer,"We study the problem of structuring a learned representation to significantly improve performance without supervision. Unlike most methods which focus on using side information like weak supervision or defining new regularization objectives, we focus on improving the learned representation by structuring the architecture of the model. We propose a self-attention based architecture to make the encoder explicitly associate parts of the representation with parts of the input observation. Meanwhile, our structural decoder architecture encourages a hierarchical structure in the latent space, akin to structural causal models, and learns a natural ordering of the latent mechanisms. We demonstrate how these models learn a representation which improves results in a variety of downstream tasks including generation, disentanglement, and transfer using several challenging and natural image datasets.",2020,ArXiv,2006.07796,,https://arxiv.org/pdf/2006.07796.pdf
b5353d2859e41db83c7e4e37c8f25edb3f833dd3,1,[D25],,0,0,0,0,0,0,0,0,0,1,0,An Iterative Regression Approach for Face Pose Estimation from RGB Images,"This paper presents an iterative optimization method, explicit shape regression (ESR), for face pose detection and localization. The regression function is learnt to find out the entire facial shape and minimize the alignment errors. A cascaded learning framework is employed to enhance shape constraint during detection. A combination of a two-level boosted regression, shape indexed features and a correlation-based feature selection method is used to improve the performance. In this paper, we explain the advantages of ESR for deformable objects such as face pose estimation and reveal generic applications of the method. In the experiment, we compare our results with related work and demonstrate the accuracy and robustness in different scenarios.",2017,ArXiv,1709.03170,,https://arxiv.org/pdf/1709.03170.pdf
b64ec3f7a89afdcc021d7a08ca7b775de48cc649,1,[D22],,0,0,0,0,1,0,1,0,0,0,0,Real-Time Face Identification via CNN and Boosted Hashing Forest,"The family of real-time face representations is obtained via Convolutional Network with Hashing Forest (CNHF). We learn the CNN, then transform the CNN to the multiple convolution architecture and finally learn the output hashing transform via the new Boosted Hashing Forest (BHF) technique. This BHF generalizes the Boosted SSC approach for hashing learning with joint optimization of face verification and identification. CNHF is trained on the CASIA-WebFace dataset and evaluated on the LFW dataset. We code the output of a single CNN with 97% on LFW. For Hamming embedding we get a CBHF-200 bit (25 byte) code with 96.3% and a 2000-bit code with 98.14% on LFW. CNHF with 2000×7-bit hashing trees achieves 93% rank-1 on LFW relative to the basic CNN's 89.9% rank-1. CNHF generates templates at the rate of 40+ fps with CPU Core i7 and 120+ fps with GPU GeForce GTX 650.",2016,2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW),,10.1109/CVPRW.2016.25,http://vislab.ucr.edu/Biometrics16/CVPRW_Vizilter.pdf
b64f8364c21394822bbf337abade00c9b8d5038d,1,"[D18], [D20]",,0,1,0,0,0,0,0,0,0,0,0,Improved Single Sample Per Person Face Recognition via Enriching Intra-Variation and Invariant Features,,2020,,,10.3390/app10020601,
b7bf1fead0b966de955b33b024fe84866be42695,0,,,0,1,0,0,0,0,0,0,0,0,0,Multi-label Object Attribute Classification using a Convolutional Neural Network,"Objects of different classes can be described using a limited number of attributes such as color, shape, pattern, and texture. Learning to detect object attributes instead of only detecting objects can be helpful in dealing with a priori unknown objects. With this inspiration, a deep convolutional neural network for low-level object attribute classification, called the Deep Attribute Network (DAN), is proposed. Since object features are implicitly learned by object recognition networks, one such existing network is modified and fine-tuned for developing DAN. The performance of DAN is evaluated on the ImageNet Attribute and aPascal datasets. Experiments show that in comparison with state-of-the-art methods, the proposed model achieves better results.",2018,,,,https://pdfs.semanticscholar.org/b7bf/1fead0b966de955b33b024fe84866be42695.pdf
b7f290e42dc8369a68367bb0ab171c26d99cbd5e,0,,,1,0,0,0,0,0,0,0,0,0,0,SkData: Data Sets and Algorithm Evaluation Protocols in Python,"Machine learning benchmark data sets come in all shapes and sizes, whereas classification algorithms assume sanitized input, such as (x, y) pairs with vector-valued input x and integer class label y. Researchers and practitioners know all too well how tedious it can be to get from the URL of a new data set to a NumPy ndarray suitable for e.g. pandas or sklearn. The SkData library handles that work for a growing number of benchmark data sets (small and large) so that one-off in-house scripts for downloading and parsing data sets can be replaced with library code that is reliable, community-tested, and documented. The SkData library also introduces an open-ended formalization of training and testing protocols that facilitates direct comparison with published research. This paper describes the usage and architecture of the SkData library. While the neatness of these mathematical abstractions is reflected in the organization of machine learning libraries such as sklearn, we believe there is a gap in Python's machine learning stack between raw data sets and such neat, abstract interfaces. Data, even when it is provided specifically to test classification algorithms, is seldom provided as (feature, label) pairs. Guidelines regarding standard experiment protocols (e.g. which data to use for training) are expressed informally in web page text if at all. The SkData library consolidates myriad little details of idiosyncratic data processing required to run experiments on standard data sets, and packages them as a library of reusable code. It serves as both a gateway to access a growing list of standard public data sets, and as a framework for expressing precise evaluation protocols that correspond to standard ways of using those data sets. This paper introduces the SkData library for accessing data sets in Python. SkData provides two levels of interface: 1. It provides low-level idiosyncratic logic for acquiring, unpacking, and parsing standard data sets so that they can be loaded into sensible Python data structures.",2013,,,10.25080/MAJORA-8B375195-004,http://conference.scipy.org/proceedings/scipy2013/pdfs/bergstra_skdata.pdf
b824cdb86ed0b5c43d46a9811c170a661b4646e9,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,A test sample oriented two-phase discriminative dictionary learning algorithm for face recognition,"In the field of face recognition, conventional dictionary learning algorithms mainly focus on reconstructing the training samples and cannot directly associate the learning procedure with the test samples. Thus, they may not well represent the test samples and obtain unsatisfactory classification performance. In addition, though different training samples have various contributions to learn a dictionary, conventional dictionary learning algorithms cannot well exploit these contributions. In order to address these problems, we present a test sample oriented two-phase dictionary learning (TSOTP-DL) algorithm for face recognition. In the first phase of the TSOTP-DL algorithm, we use all training samples to provide a linear representation of the test sample, and select K “important” training samples by using the variety of contributions. In the second phase of the TSOTP-DL algorithm, a dictionary is learned for the test sample by using the selected K “important” training samples. The TSOTP-DL algorithm utilizes the testing sample to select a subset of the training samples for learning a dictionary, which can reduce the influence of noise. Thus, the training samples are refined according to their contributions to the test sample in our algorithm, and it can improve the discriminative ability of the learned dictionary. In order to further improve the discriminative ability of the learned dictionary, a label embedding of atoms is constructed to encourage the same class training samples to have more similar coding coefficients than different classes. Experiment results demonstrate that our proposed algorithm achieves better classification results than some state-of-the-art dictionary learning and sparse coding algorithms on four public face databases.",2016,Intell. Data Anal.,,10.3233/IDA-150296,
b838de830d9e5e27deeffca0596ad8383eff7b4a,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,Data-Driven Sampling Matrix Boolean Optimization for Energy-Efficient Biomedical Signal Acquisition by Compressive Sensing,"Compressive sensing is widely used in biomedical applications, and the sampling matrix plays a critical role on both quality and power consumption of signal acquisition. It projects a high-dimensional vector of data into a low-dimensional subspace by matrix-vector multiplication. An optimal sampling matrix can ensure accurate data reconstruction and/or high compression ratio. Most existing optimization methods can only produce real-valued embedding matrices that result in large energy consumption during data acquisition. In this paper, we propose an efficient method that finds an optimal Boolean sampling matrix in order to reduce the energy consumption. Compared to random Boolean embedding, our data-driven Boolean sampling matrix can improve the image recovery quality by 9 dB. Moreover, in terms of sampling hardware complexity, it reduces the energy consumption by 4.6× and the silicon area by 1.9× over the data-driven real-valued embedding.",2017,IEEE Transactions on Biomedical Circuits and Systems,,10.1109/TBCAS.2016.2597310,https://ren-fengbo.lab.asu.edu/sites/default/files/07742902.pdf
b923d634c155850db1ab243634335c95207a31db,1,[D18],,1,0,0,0,0,1,0,0,0,0,0,Localized Deep Norm-CNN Structure for Face Verification,"Face verification is still a challenging problem due to the different image conditions such as expression, pose, and illumination. To address these challenges, we propose a new Deep Learning structure called Localized Deep-Norm CNN. Our model focuses on finding the correlations of features inside the sub-region of each learning face by adding a localized feature normalization layer. The model can recover all the important correlated features of face images. Intuitively, the Localized Deep-Norm CNN model mimics the primary visual context of the learned face image by combining the localized extracted feature representations. The local relational face features are extracted and normalized by assigning each sub-block to a local CNN model. Then, the global features are constructed by combining these localized high-level features to one fully connected layer to produce the final feature space of 4608 dimensions. In our model, two different optimization techniques are proposed to optimize the loss functions. The first optimization modifies the SoftMax loss function by using a cosine similarity metric instead of a Euclidean inner-product layer. The second optimization is done by combining different loss functions with different metric learning. Our model achieves a robust accuracy of 99.19% in characterizing the similarity of multiple faces, which is a 0.16% improvement on the LFW performance results.",2018,2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA),,10.1109/ICMLA.2018.00010,
ba2dcfbe724e3dab5ccab1229182a05efe59e5d0,0,,,1,0,0,0,0,0,1,0,0,0,0,Subspace Representations and Learning for Visual Recognition,"Pervasive and affordable sensor and storage technology enables the acquisition of an ever-rising amount of visual data. The ability to extract semantic information by interpreting, indexing and searching visual data is impacting domains such as surveillance, robotics, intelligence, human-computer interaction, navigation, healthcare, and several others. This further stimulates the investigation of automated extraction techniques that are more efficient, and robust against the many sources of noise affecting the already complex visual data, which is carrying the semantic information of interest. We address the problem by designing novel visual data representations, based on learning data subspace decompositions that are invariant against noise, while being informative for the task at hand. We use this guiding principle to tackle several visual recognition problems, including detection and recognition of human interactions from surveillance video, face recognition in unconstrained environments, and domain generalization for object recognition. By interpreting visual data with a simple additive noise model, we consider the subspaces spanned by the model portion (model subspace) and the noise portion (variation subspace). We observe that decomposing the variation subspace against the model subspace gives rise to the so-called parity subspace. Decomposing the model subspace against the variation subspace instead gives rise to what we name invariant subspace. We extend the use of kernel techniques for the parity subspace. This enables modeling the highly non-linear temporal trajectories describing human behavior, and performing detection and recognition of human interactions. In addition, we introduce supervised low-rank matrix decomposition techniques for learning the invariant subspace for two other tasks. We learn invariant representations for face recognition from grossly corrupted images, and we learn object recognition classifiers that are invariant to the so-called domain bias. Extensive experiments using the benchmark datasets publicly available for each of the three tasks, show that learning representations based on subspace decompositions invariant to the sources of noise leads to results comparable to or better than the state-of-the-art.",2017,,,10.33915/etd.6652,https://pdfs.semanticscholar.org/4622/3080936543ffb84e154bad50b711ae9e8322.pdf
badaab2798fbe4f5621280ea5f0705ea8ad56683,0,,,0,1,0,0,0,0,0,0,0,0,0,Single-Frame Regularization for Temporally Stable CNNs,"Convolutional neural networks (CNNs) can model complicated non-linear relations between images. However, they are notoriously sensitive to small changes in the input. Most CNNs trained to describe image-to-image mappings generate temporally unstable results when applied to video sequences, leading to flickering artifacts and other inconsistencies over time. In order to use CNNs for video material, previous methods have relied on estimating dense frame-to-frame motion information (optical flow) in the training and/or the inference phase, or by exploring recurrent learning structures. We take a different approach to the problem, posing temporal stability as a regularization of the cost function. The regularization is formulated to account for different types of motion that can occur between frames, so that temporally stable CNNs can be trained without the need for video material or expensive motion estimation. The training can be performed as a fine-tuning operation, without architectural modifications of the CNN. Our evaluation shows that the training strategy leads to large improvements in temporal smoothness. Moreover, for small datasets the regularization can help in boosting the generalization performance to a much larger extent than what is possible with naive augmentation strategies.",2019,2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR),1902.10424,10.1109/CVPR.2019.01143,https://arxiv.org/pdf/1902.10424.pdf
bb97664df153ac563e46ec2233346129cafe601b,0,,,0,1,0,0,0,1,0,0,0,0,0,A study on the use of Boundary Equilibrium GAN for Approximate Frontalization of Unconstrained Faces to aid in Surveillance,"Face frontalization is the process of synthesizing frontal facing views of faces given their angled poses. We implement a generative adversarial network (GAN) with spherical linear interpolation (Slerp) for frontalization of unconstrained facial images. Our special focus is on generating approximate frontal faces from the side-posed images captured by surveillance cameras. Specifically, the present work is a comprehensive study on the implementation of an auto-encoder based Boundary Equilibrium GAN (BEGAN) to generate frontal faces using an interpolation of a side view face and its mirrored view. To increase the quality of the interpolated output we implement a BEGAN with Slerp. This approach could produce a promising output along with a faster and more stable training for the model. The BEGAN model additionally has a balanced generator-discriminator combination, which prevents mode collapse along with a global convergence measure. It is expected that such an approximate face generation model would be able to replace face composites used in surveillance and crime detection.",2018,ArXiv,1809.05611,,https://arxiv.org/pdf/1809.05611.pdf
bc7799ef388bbdd2a122432affee9b132abed028,0,,,1,0,0,0,0,0,0,0,0,0,0,Towards a practical face recognition system: Robust registration and illumination by sparse representation,"Most contemporary face recognition algorithms work well under laboratory conditions but degrade when tested in less-controlled environments. This is mostly due to the difficulty of simultaneously handling variations in illumination, alignment, pose, and occlusion. In this paper, we propose a simple and practical face recognition system that achieves a high degree of robustness and stability to all these variations. We demonstrate how to use tools from sparse representation to align a test face image with a set of frontal training images in the presence of significant registration error and occlusion. We thoroughly characterize the region of attraction for our alignment algorithm on public face datasets such as Multi-PIE. We further study how to obtain a sufficient set of training illuminations for linearly interpolating practical lighting conditions. We have implemented a complete face recognition system, including a projector-based training acquisition system, in order to evaluate how our algorithms work under practical testing conditions. We show that our system can efficiently and effectively recognize faces under a variety of realistic conditions, using only frontal images under the proposed illuminations as training.",2009,CVPR,,10.1109/CVPR.2009.5206654,
be201c86efdf6f979257b6956518bf9311bbdfaa,0,,,0,1,0,0,0,0,0,0,0,0,0,Imperfect ImaGANation: Implications of GANs Exacerbating Biases on Facial Data Augmentation and Snapchat Selfie Lenses,"Recently, the use of synthetic data generated by GANs has become a popular method to do data augmentation for many applications. While practitioners celebrate this as an economical way to obtain synthetic data for training data-hungry machine learning models, it is not clear that they recognize the perils of such an augmentation technique when applied to an already-biased dataset. Although one expects GANs to replicate the distribution of the original data, in real-world settings with limited data and finite network capacity, GANs suffer from mode collapse. Especially when this data is coming from online social media platforms or the web which are never balanced. In this paper, we show that in settings where data exhibits bias along some axes (eg. gender, race), failure modes of Generative Adversarial Networks (GANs) exacerbate the biases in the generated data. More often than not, this bias is unavoidable; we empirically demonstrate that given input of a dataset of headshots of engineering faculty collected from 47 online university directory webpages in the United States is biased toward white males, a state-of-the-art (unconditional variant of) GAN ""imagines"" faces of synthetic engineering professors that have masculine facial features and white skin color (inferred using human studies and a state-of-the-art gender recognition system). We also conduct a preliminary case study to highlight how Snapchat's explosively popular ""female"" filter (widely accepted to use a conditional variant of GAN), ends up consistently lightening the skin tones in women of color when trying to make face images appear more feminine. Our study is meant to serve as a cautionary tale for the lay practitioners who may unknowingly increase the bias in their training data by using GAN-based augmentation techniques with web data and to showcase the dangers of using biased datasets for facial applications.",2020,ArXiv,2001.09528,,https://arxiv.org/pdf/2001.09528.pdf
bef416fd2e16bf7215eca7394ad1581f7caa8250,0,,,0,1,0,0,0,0,0,0,0,0,0,Deep Learning for Image Super-resolution: A Survey,"Image Super-Resolution (SR) is an important class of image processing techniques to enhance the resolution of images and videos in computer vision. Recent years have witnessed remarkable progress of image super-resolution using deep learning techniques. In this survey, we aim to give a systematic review of recent advances in image super-resolution techniques using deep learning approaches. In general, we can roughly group the existing studies of SR techniques into three major categories: supervised SR, unsupervised SR, and domain-specific SR. In addition, we also cover some other important issues, such as publicly available benchmark datasets and performance evaluation metrics. Finally, we conclude this survey by highlighting several future directions and open issues which should be further addressed by the community in the future.",2020,IEEE transactions on pattern analysis and machine intelligence,1902.06068,10.1109/TPAMI.2020.2982166,https://arxiv.org/pdf/1902.06068.pdf
c0e78d1bdc59fb076fbf57eb89bb5a83313e9f66,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,Noise-robust dictionary learning with slack block-Diagonal structure for face recognition,"Abstract Strict ‘0-1’ block-diagonal structure has been widely used for learning structured representation in face recognition problems. However, it is questionable and unreasonable to assume the within-class representations are the same. To circumvent this problem, in this paper, we propose a slack block-diagonal (SBD) structure for representation where the target structure matrix is dynamically updated, yet its block-diagonal nature is preserved. Furthermore, in order to depict the noise in face images more precisely, we propose a robust dictionary learning algorithm based on a mixed-noise model by utilizing the above SBD structure (SBD2L). SBD2L considers that there exist two forms of noise in the data, which are drawn from Laplacian and Gaussian distributions, respectively. Moreover, SBD2L introduces a low-rank constraint on the representation matrix to enhance the dictionary’s robustness to noise. Extensive experiments on four benchmark databases show that the proposed SBD2L can achieve better classification results than several state-of-the-art dictionary learning methods.",2020,Pattern Recognit.,,10.1016/j.patcog.2019.107118,
c1482491f553726a8349337351692627a04d5dbe,0,,,1,0,0,0,0,0,0,0,0,0,0,When Follow is Just One Click Away: Understanding Twitter Follow Behavior in the 2016 U.S. Presidential Election,"Motivated by the two paradoxical facts that the marginal cost of following one extra candidate is close to zero and that the majority of Twitter users choose to follow only one or two candidates, we study the Twitter follow behaviors observed in the 2016 U.S. presidential election. Specifically, we complete the following tasks: (1) analyze Twitter follow patterns of the presidential election on Twitter, (2) use negative binomial regression to study the effects of gender and occupation on the number of candidates that one follows, and (3) use multinomial logistic regression to investigate the effects of gender, occupation and celebrities on the choice of candidates to follow.",2017,SocInfo,1702.00048,10.1007/978-3-319-67217-5_25,https://arxiv.org/pdf/1702.00048.pdf
c1f8d69cdd27fc7b0fee22675634592eb98ccf91,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,Convolutional neural network with nonlinear competitive units,"Abstract Convolutional Neural Network (CNN) has been an important breakthrough in pattern recognition in recent years. Nevertheless, with the increase in complexity, CNN becomes more difficult to train. To alleviate the problem of training difficulties, we propose a novel nonlinear unit, called Nonlinear Competitive Unit (NCU). By comparing the elements from different network layers and selecting the larger signals element-wisely, it can not only strengthen feature propagation but also accelerate the convergence of CNN. This unit can be regarded as a feature fusion method as well as a kind of activation function. We evaluate our NCU-based models for face verification task and visual classification task on four benchmark datasets. The experimental results demonstrate the superior performance of our models over many state-of-the-art methods, which shows the advantage and potential of the NCU in networks.",2018,Signal Process. Image Commun.,,10.1016/j.image.2017.09.011,http://huamingwu.com/PDF/SPIC.pdf
c383566b0774c004650dcb635d7a2cf2fc11d0e9,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,Face Image Deblurring Based on Iterative Spiral Optimization,"The motion blurred image is caused by the relative motion between the target and the capturing device during the exposure time. It is difficult to analyze the face information of a motion blurred face image; therefore, motion deblurring is needed. However, the existing algorithms cannot deal well with the diversity of motion blur kernels. Based on that, this paper proposes an iterative spiral optimization algorithm for blind motion deblurring. The algorithm makes the blurred image spirally approximate the sharp image by calling the deblurring generator multiple times. It is proved that the algorithm can effectively restore motion blurred images with diverse blur kernels in the approximate natural state, and improve the visual effect of the image.",2019,CCBR,,10.1007/978-3-030-31456-9_18,
c5651aea43997f71891c2cc7694ddf51af95c2c0,1,[D18],,1,1,0,0,0,1,0,0,0,0,0,Accurate 3D Face Reconstruction With Weakly-Supervised Learning: From Single Image to Image Set,"Recently, deep learning based 3D face reconstruction methods have shown promising results in both quality and efficiency. However, training deep neural networks typically requires a large volume of data, whereas face images with ground-truth 3D face shapes are scarce. In this paper, we propose a novel deep 3D face reconstruction approach that 1) leverages a robust, hybrid loss function for weakly-supervised learning which takes into account both low-level and perception-level information for supervision, and 2) performs multi-image face reconstruction by exploiting complementary information from different images for shape aggregation. Our method is fast, accurate, and robust to occlusion and large pose. We provide comprehensive experiments on MICC Florence and Facewarehouse datasets, systematically comparing our method with fifteen recent methods and demonstrating its state-of-the-art performance. Code available at https://github.com/Microsoft/Deep3DFaceReconstruction",2019,2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW),1903.08527,10.1109/CVPRW.2019.00038,https://arxiv.org/pdf/1903.08527.pdf
c58ac6a18515dda8aa444a09815196abbfb82429,0,,,1,1,0,0,0,0,0,0,1,0,0,The Elements of End-to-end Deep Face Recognition: A Survey of Recent Advances,"Face recognition is one of the most fundamental and long-standing topics in the computer vision community. With the recent developments of deep convolutional neural networks and large-scale datasets, deep face recognition has made remarkable progress and been widely used in real-world applications. Given a natural image or video frame as input, an end-to-end deep face recognition system outputs the face feature for recognition. To achieve this, the whole system is generally built with three key elements: face detection, face preprocessing, and face representation. The face detection locates faces in the image or frame. Then, the face preprocessing is proceeded to calibrate the faces to a canonical view and crop them to a normalized pixel size. Finally, in the stage of face representation, the discriminative features are extracted from the preprocessed faces for recognition. All of the three elements are fulfilled by deep convolutional neural networks. In this paper, we present a comprehensive survey about the recent advances of every element of the end-to-end deep face recognition, since the thriving deep learning techniques have greatly improved their capability. To start with, we introduce an overview of the end-to-end deep face recognition, which, as mentioned above, includes face detection, face preprocessing, and face representation. Then, we review the deep learning based advances of each element, respectively, covering many aspects such as the up-to-date algorithm designs, evaluation metrics, datasets, performance comparison, existing challenges, and promising directions for future research. We hope this survey can offer helpful insights for a better understanding of the big picture of end-to-end face recognition and for deeper exploration in a systematic way.",2020,ArXiv,2009.13290,,https://arxiv.org/pdf/2009.13290.pdf
c67420ca490d2219de46ce0cf77bb2bd90ecccdc,0,,,0,1,0,0,0,0,0,0,0,0,0,PA-GAN: Progressive Attention Generative Adversarial Network for Facial Attribute Editing,"Facial attribute editing aims to manipulate attributes on the human face, e.g., adding a mustache or changing the hair color. Existing approaches suffer from a serious compromise between correct attribute generation and preservation of the other information such as identity and background, because they edit the attributes in the imprecise area. To resolve this dilemma, we propose a progressive attention GAN (PA-GAN) for facial attribute editing. In our approach, the editing is progressively conducted from high to low feature level while being constrained inside a proper attribute area by an attention mask at each level. This manner prevents undesired modifications to the irrelevant regions from the beginning, and then the network can focus more on correctly generating the attributes within a proper boundary at each level. As a result, our approach achieves correct attribute editing with irrelevant details much better preserved compared with the state-of-the-arts. Codes are released at this https URL.",2020,ArXiv,2007.05892,,https://arxiv.org/pdf/2007.05892.pdf
c84991fe3bf0635e326a05e34b11ccaf74d233dc,0,,,1,0,1,0,0,0,0,0,0,0,0,A parameter-free label propagation algorithm for person identification in stereo videos,"Motivated by relaxing expensive and laborious person identity annotation in stereo videos, a number of research efforts have recently been dedicated to label propagation. In this work, we propose two heuristic label propagation algorithms for annotating person identities in stereo videos under the observation that the actors in two consecutive facial images in a video are more likely to be identical. In the light of this, after adjacent video frames are divided into several groups, we propose our first algorithm (i.e. ZBLC4) to automatically annotate the unlabeled images with the one having the maximum summed similarity between unlabeled and labeled images in each group in a parameter-free manner. Moreover, to cope with singleton groups, an additional classifier is introduced into the ZBLC4 algorithm to mitigate the suffering of unreliable prediction dependent on neighbors. We conduct experiments on three public benchmark stereo videos, demonstrating that our algorithms are superior to the state of the art. Highlights: Propose a parameter-free label propagation framework for person identification. Capture temporal label correlation across video shots. Beat state-of-the-art label propagation methods for label annotation in stereo videos.",2016,Neurocomputing,,10.1016/j.neucom.2016.08.069,https://manuscript.elsevier.com/S0925231216309651/pdf/S0925231216309651.pdf
c887397c89a8739fc1208c2ef8cab2994b6be8d9,0,,,0,1,0,0,0,0,0,0,0,0,0,The Devil is in the Decoder,"Many machine vision applications require predictions for every pixel of the input image (for example semantic segmentation, boundary detection). Models for such problems usually consist of encoders which decrease spatial resolution while learning a high-dimensional representation, followed by decoders which recover the original input resolution and result in low-dimensional predictions. While encoders have been studied rigorously, relatively few studies address the decoder side. Therefore, this paper presents an extensive comparison of a variety of decoders for a variety of pixel-wise prediction tasks. Our contributions are: (1) Decoders matter: we observe significant variance in results between different types of decoders on various problems. (2) We introduce a novel decoder: bilinear additive upsampling. (3) We introduce new residual-like connections for decoders. (4) We identify two decoder types which give a consistently high performance.",2017,BMVC,1707.05847,10.5244/c.31.10,https://arxiv.org/pdf/1707.05847.pdf
c8ab7e84ff8d59e3ab48ae40eda9f34a7044732f,1,,1,1,0,0,0,0,0,0,0,0,0,0,Gender recognition based on face image using reinforced local binary patterns,"Gender recognition is a challenging and innovative research topic in the present sophisticated world of visual technology. This study proposes a system which can identify gender based on a face image. To find the location of the face region, each input image is divided into overlapping blocks and Gabor features are extracted with different scales and orientations. To generate the enhanced feature, the mean, standard deviation and skewness of the Gabor features obtained from each block are concatenated. For detecting the face region, this feature is passed to an ensemble classifier. To recognise the gender, reinforced local binary patterns are used to extract the local facial features. The Adaboost algorithm is used to select and classify the discriminative features as male or female. The authors' experimental results on the Labeled Faces in the Wild (LFW), FERET and Gallagher databases for face detection using Gabor features achieve 98, 98.5 and 96.5% accuracy, respectively. Moreover, the reinforced local binary patterns achieve gender classification accuracies of 97.08, 98.5 and 94.21% on the LFW, FERET and Gallagher databases, respectively. Both achieve improved performance compared with other standard methodologies described in the literature.",2017,IET Comput. Vis.,,10.1049/iet-cvi.2016.0087,
c8ad19f1a45b43dd5d338880d7d67a22cc263eb0,0,,,0,1,0,0,0,0,0,0,0,0,0,Tessellated Wasserstein Auto-Encoders,"Non-adversarial generative models such as variational auto-encoder (VAE), Wasserstein auto-encoders with maximum mean discrepancy (WAE-MMD), sliced-Wasserstein auto-encoder (SWAE) are relatively easy to train and have less mode collapse compared to Wasserstein auto-encoder with generative adversarial network (WAE-GAN). However, they are not very accurate in approximating the target distribution in the latent space because they don't have a discriminator to detect the minor difference between real and fake. To this end, we develop a novel non-adversarial framework called Tessellated Wasserstein Auto-encoders (TWAE) to tessellate the support of the target distribution into a given number of regions by the centroidal Voronoi tessellation (CVT) technique and design batches of data according to the tessellation instead of random shuffling for accurate computation of discrepancy. Theoretically, we demonstrate that the error of estimate to the discrepancy decreases when the numbers of samples $n$ and regions $m$ of the tessellation become larger with rates of $\mathcal{O}(\frac{1}{\sqrt{n}})$ and $\mathcal{O}(\frac{1}{\sqrt{m}})$, respectively. Given fixed $n$ and $m$, a necessary condition for the upper bound of measurement error to be minimized is that the tessellation is the one determined by CVT. TWAE is very flexible to different non-adversarial metrics and can substantially enhance their generative performance in terms of Frechet inception distance (FID) compared to VAE, WAE-MMD, SWAE. Moreover, numerical results indeed demonstrate that TWAE is competitive to the adversarial model WAE-GAN, demonstrating its powerful generative ability.",2020,ArXiv,2005.09923,,https://arxiv.org/pdf/2005.09923.pdf
c8b6f55984ef6cc169dea26cd9e152cedf438a18,0,,,1,0,0,0,0,0,0,0,0,0,0,Face Recognition in Adverse Conditions: A Look at Achieved Advancements,"In this chapter, the authors discuss the main outcomes from both the most recent literature and the research activities summarized in this book. Of course, a complete review is not possible. It is evident that each issue related to face recognition in adverse conditions can be considered as a research topic in itself and would deserve a detailed survey of its own. However, it is interesting to provide a compass to orient one in the presently achieved results in order to identify open problems and promising research lines. In particular, the final chapter provides more detailed considerations about possible future developments.",2014,,,10.4018/978-1-4666-5966-7.CH018,https://pdfs.semanticscholar.org/c8b6/f55984ef6cc169dea26cd9e152cedf438a18.pdf
c9982b87e46aa91384704fa8a0f28acfc4c17989,0,,,1,0,0,0,0,0,0,0,0,0,0,Visual recognition of human communication,"The objective of this work is visual recognition of speech and gestures. Solving this problem opens up a host of applications, such as transcribing archival silent films, or resolving multi-talker simultaneous speech, but most importantly it helps to advance the state of the art in speech recognition by enabling machines to take advantage of the multi-modal nature of human communications. However, visual recognition of speech and gestures is a challenging problem, in part due to the lack of annotations and datasets, but also due to the inter- and intra-personal variations, and in the case of visual speech, ambiguities arising from homophones. Training a deep learning algorithm requires a lot of training data. We propose a method to automatically collect, process and generate a large-scale audio-visual corpus from television videos temporally aligned with the transcript. To build such a dataset, it is essential to know 'who' is speaking 'when'. We develop a ConvNet model that learns joint embedding of the sound and the mouth images from unlabelled data, and apply this network to the tasks of audio-to-video synchronisation and active speaker detection. Not only does this play a crucial role in building the dataset that forms the basis of much of the research done in this thesis, the method learns powerful representations of the visual and auditory inputs which can be used for related tasks such as lip reading. We also show that the methods developed here can be extended to the problem of generating talking faces from audio and still images. We then propose a number of deep learning models that are able to recognise visual speech at word and sentence level. In both scenarios, we also demonstrate recognition performance that exceeds the state of the art on public datasets; and in the case of the latter, the lip reading performance beats a professional lip reader on videos from BBC television. We also demonstrate that if audio is available, then visual information helps to improve speech recognition performance. Next, we present a method to recognise and localise short temporal signals in image time series, where strong supervision is not available for training. We propose image encodings and ConvNet-based architectures to first recognise the signal, and then to localise the signal using back-propagation. The method is demonstrated for localising spoken words in audio, and for localising signed gestures in British Sign Language (BSL) videos. Finally, we explore the problem of speaker recognition. Whereas previous works for speaker identification have been limited to constrained conditions, here we build a new large-scale speaker recognition dataset collected from 'in the wild' videos using an automated pipeline. We propose a number of ConvNet architectures that outperform traditional baselines on this dataset.",2017,,,,https://pdfs.semanticscholar.org/c998/2b87e46aa91384704fa8a0f28acfc4c17989.pdf
c9eae2c5db4c2502ca223953851821d931d262a8,0,,,0,1,0,0,0,0,0,0,0,0,0,DLGAN: Disentangling Label-Specific Fine-Grained Features for Image Manipulation,"Recent studies have shown how disentangling images into content and feature spaces can provide controllable image translation/manipulation. In this paper, we propose a framework to enable utilizing discrete multi-labels to control which features are to be disentangled, i.e., disentangling label-specific fine-grained features for image manipulation (dubbed DLGAN). By mapping the discrete label-specific attribute features into a continuous prior distribution, we leverage the advantages of both discrete labels and reference images to achieve image manipulation in a hybrid fashion. For example, given a face image dataset (e.g., CelebA) with multiple discrete fine-grained labels, we can learn to smoothly interpolate a face image between black hair and blond hair through reference images while immediately controlling the gender and age through discrete input labels. To the best of our knowledge, this is the first work that realizes such a hybrid manipulation within a single model. More importantly, it is the first work to achieve image interpolation between two different domains without requiring continuous labels as the supervision. Qualitative and quantitative experiments demonstrate the effectiveness of the proposed method.",2019,ArXiv,1911.09943,,https://arxiv.org/pdf/1911.09943.pdf
cc989b88f3799835f16842b066a36e171b607e7f,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,ShuffleFaceNet: A Lightweight Face Architecture for Efficient and Highly-Accurate Face Recognition,"The recent success of convolutional neural networks has led to the development of a variety of new effective and efficient architectures. However, few of them have been designed for the specific case of face recognition. Inspired by the state-of-the-art ShuffleNetV2 model, a lightweight face architecture is presented in this paper. The proposal, named ShuffleFaceNet, introduces significant modifications in order to improve face recognition accuracy. First, the Global Average Pooling layer is replaced by a Global Depth-wise Convolution layer, and Parametric Rectified Linear Unit is used as a non-linear activation function. Under the same experimental conditions, ShuffleFaceNet achieves significantly higher accuracy than the original ShuffleNetV2, maintaining the same speed and compact storage. In addition, extensive experiments conducted on three challenging benchmark face datasets show that our proposal outperforms not only state-of-the-art lightweight models but also very deep face recognition models.",2019,2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW),,10.1109/ICCVW.2019.00333,http://openaccess.thecvf.com/content_ICCVW_2019/papers/LSR/Martindez-Diaz_ShuffleFaceNet_A_Lightweight_Face_Architecture_for_Efficient_and_Highly-Accurate_Face_ICCVW_2019_paper.pdf
cc9b992a90a6ed6c34c9ac7877f58924d3c2adbe,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,A family of online boosting algorithms,"Boosting has become a powerful and useful tool in the machine learning and computer vision communities in recent years, and many interesting boosting algorithms have been developed to solve various challenging problems. In particular, Friedman proposed a flexible framework called gradient boosting, which has been used to derive boosting procedures for regression, multiple instance learning, semi-supervised learning, etc. Recently some attention has been given to online boosting (where the examples become available one at a time). In this paper we develop a boosting framework that can be used to derive online boosting algorithms for various cost functions. Within this framework, we derive online boosting algorithms for Logistic Regression, Least Squares Regression, and Multiple Instance Learning. We present promising results on a wide range of data sets.",2009,"2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops",,10.1109/ICCVW.2009.5457453,http://vision.ucsd.edu/sites/default/files/osb_iccv09_cam.pdf
cc9d068cf6c4a30da82fd6350a348467cb5086d4,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,Protecting Your Faces: MeshFaces Generation and Removal via High-Order Relation-Preserving CycleGAN,"Protecting people's face photos from being misused has become an important issue with the rapid development of ubiquitous face sensors. MeshFaces provide a simple and inexpensive way to protect facial photos and have been widely used in China. This paper treats MeshFace generation and removal as a dual learning problem and proposes a high-order relation-preserving CycleGAN framework to solve this problem. First, dual transformations between the distributions of MeshFaces and clean faces in pixel space are learned under the CycleGAN framework, which can efficiently utilize unpaired data. Then, a novel High-order Relation-preserving (HR) loss is imposed on CycleGAN to recover the finer texture details and generate much sharper images. Different from the L1 and L2 losses, which result in smooth and blurry images, the HR loss can better capture the appearance variation of MeshFaces and hence facilitates their removal. Moreover, an Identity Preserving loss is proposed to preserve both global and local identity information. Experimental results on three databases demonstrate that our approach is highly effective for MeshFace generation and removal.",2018,2018 International Conference on Biometrics (ICB),,10.1109/ICB2018.2018.00020,
cd595ed12927be2f05d986709548f86ae070b133,0,,,1,0,0,0,0,0,0,0,0,0,0,Hidden assumption of face recognition evaluation under different quality conditions,"Automatic face recognition remains a challenging task due to factors such as variations in recording condition, pose, and age. Many schemes have emerged to enhance the performance of face recognition to deal with poor quality facial images. It has been shown that reporting average accuracy, to cover a wide range of image quality, does not reflect the system's performance at any specific quality level. This raises the need to evaluate a biometric system's performance at each quality level separately. Challenging face databases have been recorded with varied face image qualities. Unfortunately, the performance of face recognition schemes under different quality conditions, reported in the literature, is evaluated under a hidden assumption which cannot be achieved in real-life applications. In fact, this problem could be a source of attack that interferes with the verification through manipulating the recording condition. In order to remedy this problem, two requirements are to be imposed: 1) the matching criteria should be based on an Adaptive Quality-Based Threshold (AQBT) and 2) at the verification stage the quality level of an input face image should be determined and classified into one of a set of non-overlapping predefined quality levels. We illustrate our idea by experiments conducted on the extended Yale B face benchmark dataset. Our experimental results indicate that if AQBT is not adopted, false rejection rates become very high (always reject) when using low quality face images.",2011,International Conference on Information Society (i-Society 2011),,10.1109/I-SOCIETY18435.2011.5978491,
cdd464c6075b5b5cbfb54710b0254cc371c47a6c,0,,,0,0,0,0,0,0,1,0,0,0,0,Submodular Mini-Batch Training in Generative Moment Matching Networks,"This article was withdrawn because (1) it was uploaded without the co-authors' knowledge or consent, and (2) there are allegations of plagiarism.",2017,ArXiv,1707.05721,,https://arxiv.org/pdf/1707.05721.pdf
ce49cd33cd4c28053737a30076786ec2d1fbad20,0,,,0,1,0,0,0,0,0,0,0,0,0,Unsupervised Transformation Network Based on GANs for Target-Domain Oriented Multi-domain Image Translation,"Multi-domain image translation with unpaired data is a challenging problem. This paper proposes a generalized GAN-based unsupervised multi-domain transformation network (UMT-GAN) for image translation. The generation network of UMT-GAN consists of a universal encoder, a reconstructor and a series of translators corresponding to different target domains. The encoder is used to learn the universal information among different domains. The reconstructor is designed to extract the hierarchical representations of the images by minimizing the reconstruction loss. The translators are used to perform the multi-domain translation. Each translator and reconstructor are connected to a discriminator for adversarial training. Importantly, the high-level representations are shared between the source and multiple target domains, and all network structures are trained together by using a joint loss function. In particular, instead of using a random vector z as the input to generate high-resolution images, UMT-GAN rather employs the source domain images as the inputs of the generator, hence helping the model escape from mode collapse to a certain extent. The experimental studies demonstrate the effectiveness and superiority of the proposed algorithm compared with several state-of-the-art algorithms.",2018,ACCV,,10.1007/978-3-030-20890-5_26,
ce57c1426693a910173b33b43410a7c97d3a25ed,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,DeepEthnic: Multi-label Ethnic Classification from Face Images,"Ethnic group classification is a well-researched problem, which has been pursued mainly during the past two decades via traditional approaches of image processing and machine learning. In this paper, we propose a method of classifying a face image into an ethnic group by applying transfer learning from a previously trained classification network for large-scale data recognition. Our proposed method yields state-of-the-art success rates of 99.02%, 99.76%, 99.2%, and 96.7%, respectively, for the four ethnic groups: African, Asian, Caucasian, and Indian.",2018,ICANN,1912.02983,10.1007/978-3-030-01424-7_59,https://arxiv.org/pdf/1912.02983.pdf
ce75deb5c645eeb08254e9a7962c74cab1e4c480,0,,,0,0,0,0,0,1,0,0,0,0,0,Emotion-Preserving Representation Learning via Generative Adversarial Network for Multi-View Facial Expression Recognition,"Face frontalization is one way to overcome the pose variation problem, which simplifies multi-view recognition into one canonical-view recognition. This paper presents a multi-task learning approach based on the generative adversarial network (GAN) that learns emotion-preserving representations in the face frontalization framework. Taking advantage of the adversarial relationship between the generator and the discriminator in GAN, the generator can frontalize input non-frontal face images into frontal face images while preserving the identity and expression characteristics; in the meantime, it can employ the learnt emotion-preserving representations to predict the expression class label from the input face. The proposed network is optimized by combining both synthesis and classification objective functions to make the learnt representations generative and discriminative simultaneously. Experimental results demonstrate that the proposed face frontalization system is very effective for expression recognition with large head pose variations.",2018,2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018),,10.1109/FG.2018.00046,
cf0e805a928f4ce9a643052c20e14fb57b126e1a,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,Automated 3D Face Reconstruction from Multiple Images Using Quality Measures,"Automated 3D reconstruction of faces from images is challenging if the image material is difficult in terms of pose, lighting, occlusions and facial expressions, and if the initial 2D feature positions are inaccurate or unreliable. We propose a method that reconstructs individual 3D shapes from multiple single images of one person, judges their quality and then combines the best of all results. This is done separately for different regions of the face. The core element of this algorithm and the focus of our paper is a quality measure that judges a reconstruction without information about the true shape. We evaluate different quality measures, develop a method for combining results, and present a complete processing pipeline for automated reconstruction.",2016,2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR),,10.1109/CVPR.2016.372,http://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Piotraschke_Automated_3D_Face_CVPR_2016_paper.pdf
cfbb2d32586b58f5681e459afd236380acd86e28,1,"[D18], [D20], [D30]",,1,0,0,1,0,0,0,0,0,0,0,Improving alignment of faces for recognition,"Face recognition systems for uncontrolled environments often work through an alignment, feature extraction, and recognition pipeline. Effective alignment of faces is thus crucial, as it is an entry point in the process, and poor alignments can greatly affect recognition performance. The task of alignment is particularly difficult when a face comes from highly unconstrained environments or so called faces in the wild. A lot of recent research activity has focused on faces in the wild and even simple similarity or affine transformations have proven both effective and essential to achieving state of the art performance. In this paper we explore a straightforward, fast and effective approach to aligning faces based on detecting facial landmarks using Haar-like image features and a cascade of boosted classifiers. Our approach is reminiscent of widely used face detection approaches, but focused on much more detailed features of a face such as eye centres, the nose tip and corners of the mouth. This process generates multiple candidates for each landmark and we present a fast and effective filtering strategy allowing us to find sets of landmarks that are consistent. Our experiments show that this approach can outperform contemporary methods and easily fits into popular processing pipelines for faces in the wild.",2011,2011 IEEE International Symposium on Robotic and Sensors Environments (ROSE),,10.1109/ROSE.2011.6058545,http://www.professeurs.polymtl.ca/christopher.pal/2011/ROSE.v2.5.pdf
cfcb4d0d9ba7eb86f068c4fe0f9e6676a37481bc,0,,,0,1,0,0,0,0,0,0,0,0,0,Max-Boost-GAN: Max Operation to Boost Generative Ability of Generative Adversarial Networks,"Generative adversarial networks (GANs) can be used to learn a generation function from a joint probability distribution as an input, and then visual samples with semantic properties can be generated from a marginal probability distribution. In this paper, we propose a novel algorithm named Max-Boost-GAN, which is demonstrated to boost the generative ability of GANs when the error of generation is upper bounded. Moreover, the Max-Boost-GAN can be used to learn the generation functions from two marginal probability distributions as the input, and samples of higher visual quality and variety could be generated from the joint probability distribution. Finally, novel objective functions are proposed for obtaining convergence when training the Max-Boost-GAN. Experiments on the generation of binary digits and RGB human faces show that the Max-Boost-GAN achieves boosted ability of generation as expected.",2017,2017 IEEE International Conference on Computer Vision Workshops (ICCVW),,10.1109/ICCVW.2017.140,http://openaccess.thecvf.com/content_ICCV_2017_workshops/papers/w18/Di_Max-Boost-GAN_Max_Operation_ICCV_2017_paper.pdf
cfd3476b49c98c7e4e3f1bc800b0b8735d6d9532,1,[D30],,1,0,0,0,0,0,0,0,0,0,0,SuperPatchMatch: An Algorithm for Robust Correspondences Using Superpixel Patches,"Superpixels have become very popular in many computer vision applications. Nevertheless, they remain underexploited, since the superpixel decomposition may produce irregular and unstable segmentation results due to their dependency on the image content. In this paper, we first introduce a novel structure, a superpixel-based patch, called SuperPatch. The proposed structure, based on superpixel neighborhood, leads to a robust descriptor, since spatial information is naturally included. The generalization of the PatchMatch method to SuperPatches, named SuperPatchMatch, is introduced. Finally, we propose a framework to perform fast segmentation and labeling from an image database, and demonstrate the potential of our approach, since we outperform, in terms of computational cost and accuracy, the results of state-of-the-art methods on both face labeling and medical image segmentation.",2017,IEEE Transactions on Image Processing,1903.07169,10.1109/TIP.2017.2708504,https://arxiv.org/pdf/1903.07169.pdf
d0c615786458b02b18044c132ba9d58605611f65,0,,,1,0,0,0,0,0,0,0,0,0,0,Unsupervised face analysis from multi-view,,2014,,,10.32657/10356/59221,https://dr.ntu.edu.sg//bitstream/10356/59221/1/EEE__G0902328E_ANVAR.pdf
d0fea2f72fe11846ac07afaa8cf44bc6f2c84509,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,AverageExplorer: interactive exploration and alignment of visual data collections,"This paper proposes an interactive framework that allows a user to rapidly explore and visualize a large image collection using the medium of average images. Average images have been gaining popularity as means of artistic expression and data visualization, but the creation of compelling examples is a surprisingly laborious and manual process. Our interactive, real-time system provides a way to summarize large amounts of visual data by weighted average(s) of an image collection, with the weights reflecting user-indicated importance. The aim is to capture not just the mean of the distribution, but a set of modes discovered via interactive exploration. We pose this exploration in terms of a user interactively ""editing"" the average image using various types of strokes, brushes and warps, similar to a normal image editor, with each user interaction providing a new constraint to update the average. New weighted averages can be spawned and edited either individually or jointly. Together, these tools allow the user to simultaneously perform two fundamental operations on visual data: user-guided clustering and user-guided alignment, within the same framework. We show that our system is useful for various computer vision and graphics applications.",2014,TOGS,,10.1145/2601097.2601145,http://www.eecs.berkeley.edu/~junyanz/pdf/junyanz_cv.pdf
d124aa3a7cc0f748e59297eb99e883e163a3371a,0,,,0,0,0,0,0,0,0,1,0,0,0,Fast Lip Feature Extraction Using Psychologically Motivated Gabor Features,"The extraction of relevant lip features is of continuing interest in the speech domain. Using end-to-end feature extraction can produce good results, but at the cost of the results being difficult for humans to comprehend and relate to. We present a new, lightweight feature extraction approach, motivated by glimpse-based psychological research into facial barcodes. This allows for 3D geometric features to be produced using Gabor-based image patches. This new approach can successfully extract lip features with a minimum of processing, with parameters that can be quickly adapted and used for detailed analysis, and with preliminary results showing successful feature extraction from a range of different speakers. These features can be generated online without the need for trained models, and are also robust and can recover from errors, making them suitable for real world speech analysis.",2018,2018 IEEE Symposium Series on Computational Intelligence (SSCI),,10.1109/SSCI.2018.8628931,http://www.cs.stir.ac.uk/~lss/recentpapers/FASLIP.pdf
d2a4361533fe6657762b38e445d19b300b572672,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,Application of Difficult Sample Mining based on Cosine Loss in Face Recognition,"Due to the development of deep convolutional neural networks, face recognition has made great progress, and its main goal is to improve feature recognition capabilities. In this regard, several loss functions based on angular boundaries have been proposed to increase the feature margin between different classes. Although very good results have been achieved in this direction, there are still some problems. These loss functions only expand the feature margin from the perspective of real classification in training, and do not provide distinguishability for misclassified samples. In order to solve this problem, this paper improves on the original cosine loss function, and implements feature learning in the direction of difficult samples based on misclassified feature vectors.",2020,2020 IEEE International Conference on Mechatronics and Automation (ICMA),,10.1109/ICMA49215.2020.9233852,
d2ea8cfd31e5a5df2448166953cfd8657e90c48e,1,[D18],,1,0,0,1,0,1,0,0,0,0,0,Multi-supervised metric learning for fisher vector faces,"Metric learning has been widely used in face verification. However, most existing metric learning methods only have one single supervised goal, which is insufficient. This paper makes two contributions: first, we show that multi-supervised metric learning on Fisher vector faces is better than the original one, and is capable of surpassing the state-of-the-art face verification performance on the challenging “LFW” benchmark under the condition of 2D alignment. Second, we show that patch-based alignment and 3D alignment are useful for Fisher vector faces, and can improve the final result.",2015,2015 IEEE International Conference on Progress in Informatics and Computing (PIC),,10.1109/PIC.2015.7489803,
d309b1b2d8b6667fac99f83bde278ceb33e0f3dd,0,,,0,1,0,0,0,0,0,0,0,0,0,Improving Confidence Estimates for Unfamiliar Examples,,2020,2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR),1804.03166,10.1109/cvpr42600.2020.00276,https://arxiv.org/pdf/1804.03166.pdf
d33f5f3f230dafd06f0d3599fce3a478a1cf3b53,0,,,1,0,0,0,0,0,0,0,0,0,0,Category space dimensionality reduction for supervised learning,,2013,,,,
d41f3f473aa34f8c184f31bc18bb66c117d6fbc6,0,,,0,0,0,0,0,0,0,0,0,0,1,Automatic Engagement Prediction with GAP Feature,"In this paper, we propose an automatic engagement prediction method for the Engagement in the Wild sub-challenge of EmotiW 2018. We first design a novel Gaze-AU-Pose (GAP) feature taking into account the information of gaze, action units and head pose of a subject. The GAP feature is then used for the subsequent engagement level prediction. To efficiently predict the engagement level for a long-time video, we divide the long-time video into multiple overlapped video clips and extract the GAP feature for each clip. A deep model consisting of a Gated Recurrent Unit (GRU) layer and a fully connected layer is used as the engagement predictor. Finally, a mean pooling layer is applied to the per-clip estimation to get the final engagement level of the whole video. Experimental results on the validation set and test set show the effectiveness of the proposed approach. In particular, our approach achieves a promising result with an MSE of 0.0724 on the test set of the Engagement Prediction Challenge of EmotiW 2018.",2018,ICMI,,10.1145/3242969.3264982,http://vipl.ict.ac.cn/uploadfile/upload/2018110210230494.pdf
d48d644758dd74fae7265bd0db13d73f98bf3b7a,1,[D33],,1,0,0,0,0,0,0,0,0,0,0,Regional Attention Network (RAN) for Head Pose and Fine-grained Gesture Recognition,"Affect is often expressed via non-verbal body language such as actions/gestures, which are vital indicators for human behaviors. Recent studies on recognition of fine-grained actions/gestures in monocular images have mainly focused on modeling the spatial configuration of body parts representing body pose, human-object interactions and variations in local appearance. The results show that this is a brittle approach since it relies on accurate body part/object detection. In this work, we argue that there exist local discriminative semantic regions, whose “informativeness” can be evaluated by the attention mechanism for inferring fine-grained gestures/actions. To this end, we propose a novel end-to-end Regional Attention Network (RAN), which is a fully Convolutional Neural Network (CNN) to combine multiple contextual regions through attention mechanism, focusing on parts of the images that are most relevant to a given task. Our regions consist of one or more consecutive cells and are adapted from the strategies used in computing HOG (Histogram of Oriented Gradient) descriptor. The model is extensively evaluated on ten datasets belonging to 3 different scenarios: 1) head pose recognition, 2) driver state recognition, and 3) human action and facial expression recognition. The proposed approach outperforms the state-of-the-art by a considerable margin in different metrics.",2020,,,10.1109/taffc.2020.3031841,https://pdfs.semanticscholar.org/4f61/35c4cfe3fd8be7e944ad49d52d073dd45d99.pdf
d88207e1edff90a549413fafd71abafc7afa9838,0,,,1,0,0,0,0,0,0,0,0,0,0,Robust graph transduction,"Given a weighted graph, graph transduction aims to assign unlabeled examples explicit class labels rather than build a general decision function based on the available labeled examples. Practically, a dataset usually contains many noisy data, such as the “bridge points” located across different classes, and the “outliers” that incur abnormal distances from the normal examples of their classes. The labels of these examples are usually ambiguous and also difficult to decide. Labeling them incorrectly may further bring about erroneous classifications on the remaining unlabeled examples. Therefore, their accurate classifications are critical to obtaining satisfactory final performance. Unfortunately, current graph transduction algorithms usually fall short of tackling the noisy but critical examples, so they may become fragile and produce imperfect results sometimes. Therefore, in this thesis we aim to develop a series of robust graph transduction methodologies in iterative or non-iterative ways, so that they can perfectly handle the difficult noisy data points. Our works are summarized as follows: In Chapter 2, we propose a robust non-iterative algorithm named “Label Prediction via Deformed Graph Laplacian” (LPDGL). Different from the existing methods that usually employ a traditional graph Laplacian to achieve label smoothness among pairs of examples, in LPDGL we introduce a deformed graph Laplacian, which not only induces the existing pairwise smoothness term, but also leads to a novel local smoothness term. This local smoothness term detects the ambiguity of each example by exploring the associated degree, and assigns confident labels to the examples with large degree, as well as allocates “weak labels” to the uncertain examples with small degree. As a result, the negative effects of outliers and bridge points are suppressed, leading to more robust transduction performance than some existing representative algorithms. Although LPDGL is designed for transduction purpose, we show that it can be easily extended to inductive settings. In Chapter 3, we develop an iterative label propagation approach, called “Fick’s Law Assisted Propagation” (FLAP), for robust graph transduction. To be specific, we regard label propagation on the graph as the practical fluid diffusion on a plane, and develop a novel label propagation algorithm by utilizing a well-known physical theory called Fick’s Law of Diffusion. Different from existing machine learning models that are based on some heuristic principles, FLAP conducts label propagation in a “natural” way, namely when and how much label information is received or transferred by an example, or where these labels should be propagated to, are naturally governed. As a consequence, FLAP not only yields more robust propagation results, but also requires less computational time than the existing iterative methods. In Chapter 4, we propose a propagation framework called “Teaching-to-Learn and Learning-to-Teach” (TLLT), in which a “teacher” (i.e. a teaching algorithm) is introduced to guide the label propagation. Different from existing methods that equally treat all the unlabeled examples, in TLLT we assume that different examples have different classification difficulties, and their propagations should follow a simple-to-difficult sequence. As such, the previously “learned” simple examples can ease the learning for the subsequent more difficult examples, and thus these difficult examples can be correctly classified. In each iteration of propagation, the teacher will designate the simplest examples to the “learner” (i.e. a propagation algorithm). After “learning” these simplest examples, the learner will deliver a learning feedback to the teacher to assist it in choosing the next simplest examples. Due to the collaborative teaching and learning process, all the unlabeled examples are propagated in a well-organized sequence, which contributes to the improved performance over existing methods. In Chapter 5, we apply the TLLT framework proposed in Chapter 4 to accomplish saliency detection, so that the saliency values of all the superpixels are decided from simple superpixels to more difficult ones. The difficulty of a superpixel is judged by its informativity, individuality, inhomogeneity, and connectivity. As a result, our saliency detector generates manifest saliency maps, and outperforms baseline methods on the typical public datasets.",2016,,,,https://pdfs.semanticscholar.org/19e0/13bfe1be379909a0b48cf752a59f2d3af47c.pdf
d8b997237a30f7fd87a824c065597b759f5be72f,1,,1,1,0,0,0,0,0,0,0,0,0,0,Cross-Resolution Face Recognition via Prior-Aided Face Hallucination and Residual Knowledge Distillation,"Recent deep learning based face recognition methods have achieved great performance, but it still remains challenging to recognize a very low-resolution query face, e.g. 28x28 pixels, when the CCTV camera is far from the captured subject. Such a very low-resolution face lacks the detailed identity information of its normal-resolution counterparts in a gallery, making it hard to find corresponding faces therein. To this end, we propose a Resolution Invariant Model (RIM) for addressing such cross-resolution face recognition problems, with three distinct novelties. First, RIM is a novel and unified deep architecture, containing a Face Hallucination sub-Net (FHN) and a Heterogeneous Recognition sub-Net (HRN), which are jointly learned end to end. Second, FHN is a well-designed tri-path Generative Adversarial Network (GAN) which simultaneously perceives facial structure and geometry prior information, i.e. landmark heatmaps and parsing maps, incorporated with an unsupervised cross-domain adversarial training strategy to super-resolve a very low-resolution query image to an 8x larger one without requiring the images to be well aligned. Third, HRN is a generic Convolutional Neural Network (CNN) for heterogeneous face recognition with our proposed residual knowledge distillation strategy for learning discriminative yet generalized feature representations. Quantitative and qualitative experiments on several benchmarks demonstrate the superiority of the proposed model over the state-of-the-arts. Codes and models will be released upon acceptance.",2019,ArXiv,1905.10777,,https://arxiv.org/pdf/1905.10777.pdf
d8c62e812a03f7cefe1e5300232e2a031e3dadfb,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,FACE IMAGE RECOGNITION BASED ON PARTIAL FACE MATCHING USING GENETIC ALGORITHM,"In various real-world face recognition applications such as forensics and surveillance, only a partial face image is available. Hence, template matching and recognition are strongly needed. In this paper, a genetic algorithm to match a pattern of an image and then recognize this image by this pattern is proposed. This algorithm can use any pattern of an image such as an eye, mouth or ear to recognize the image. The proposed genetic algorithm uses a short chromosome to decrease the search space, and hence the results can be obtained in a short time. Two datasets were used to test the proposed method, the AR Face database and the LFW face database; the overall matching and recognition accuracy was calculated based on conducting sequences of experiments on random sub-datasets, where the overall matching and recognition accuracy was 91.7% and 90% respectively. The results of the proposed algorithm demonstrate its robustness and efficiency compared with other state-of-the-art algorithms.",2017,,,,https://pdfs.semanticscholar.org/d8c6/2e812a03f7cefe1e5300232e2a031e3dadfb.pdf
d97d557ccf228a5f26feee33312afe0973bfc349,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,Smoking Image Detection Based on Convolutional Neural Networks,"With the development of Internet technology and the improvement of network quality, online videos have become increasingly popular. In particular, online live broadcast has become a hotspot in recent years, and smoking behavior in these broadcasts is harmful to smokers and the surrounding environment. Therefore, it is necessary to detect and thereby effectively control smoking behaviors in video content. Traditionally, smoking images are detected based on the detection algorithms of cigarette smoke. Given the limited resolution of live broadcast videos, cigarette smoke is not visually apparent in the video content. This paper proposes a smoking image detection model based on a convolutional neural network, referred to as SmokingNet, which automatically detects smoking behaviors in video content through images. This method can detect smoking images by utilizing only the information of human smoking gestures and cigarette image characteristics without requiring the detection of cigarette smoke, showing high accuracy and superior performance for real-time monitoring.",2018,2018 IEEE 4th International Conference on Computer and Communications (ICCC),,10.1109/CompComm.2018.8781009,
da180d6bd0d609d74f2fe174a68f2fb41ea68683,0,,,0,1,0,0,0,0,0,0,0,0,0,Generative Restricted Kernel Machines,"We introduce a novel framework for generative models based on Restricted Kernel Machines (RKMs) with multi-view generation and uncorrelated feature learning capabilities, called Gen-RKM. To incorporate multi-view generation, this mechanism uses a shared representation of data from various views. The mechanism is flexible enough to incorporate kernel-based, (deep) neural network and convolutional models within the same setting. To update the parameters of the network, we propose a novel training procedure which jointly learns the features and shared subspace representation. The latent variables are given by the eigen-decomposition of the kernel matrix, where the mutual orthogonality of eigenvectors represents uncorrelated features. Experiments demonstrate the potential of the framework through qualitative and quantitative evaluation of generated samples on various standard datasets.",2019,ArXiv,1906.08144,,https://arxiv.org/pdf/1906.08144.pdf
dc696c21c68a9f679abdf85daf8f69e1a232159b,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,Iterative projection based sparse reconstruction for face recognition,"This paper presents a projection based iterative method (PIM) for solving the L1-minimization problem with its application to sparse representation and reconstruction. First, the unconstrained basis pursuit denoising (BPDN) problem is transformed into the cross-and-bouquet (CAB) form with a variable λ, and an iterative algorithm is proposed based on the projection method with the gradient of ‖x‖1 being transformed into a piecewise-linear function, which enhances the convergence of the algorithm. The global convergence of the algorithm is proved by the Lyapunov method. Then, experiments conducted on random Gaussian sparse signal reconstruction and five well-known face data sets demonstrate the effectiveness and robustness of the proposed algorithm. It is also shown that the algorithm is robust to different sparsity levels and amplitudes of signals, and has a higher convergence rate and recognition accuracy compared with other L1-minimization algorithms, especially in the case of noise interference.",2018,Neurocomputing,,10.1016/j.neucom.2018.01.014,
dd2334020dee24ab81716478592dcb8eb4ddf687,0,,,0,1,0,0,0,0,0,0,0,0,0,One-Shot Image-to-Image Translation via Part-Global Learning with a Multi-adversarial Framework,"It is well known that humans can learn and recognize objects effectively from several limited image samples. However, learning from just a few images is still a tremendous challenge for existing main-stream deep neural networks. Inspired by analogical reasoning in the human mind, a feasible strategy is to translate the abundant images of a rich source domain to enrich the relevant yet different target domain with insufficient image data. To achieve this goal, we propose a novel, effective multi-adversarial framework (MA) based on part-global learning, which accomplishes one-shot cross-domain image-to-image translation. Specifically, we first devise a part-global adversarial training scheme to provide an efficient way for feature extraction and prevent the discriminators from over-fitting. Then, a multi-adversarial mechanism is employed to enhance the image-to-image translation ability and unearth the high-level semantic representation. Moreover, a balanced adversarial loss function is presented, which aims to balance the training data and stabilize the training process. Extensive experiments demonstrate that the proposed approach can obtain impressive results on various datasets between two extremely imbalanced image domains and outperform state-of-the-art methods on one-shot image-to-image translation.",2019,ArXiv,1905.04729,,https://arxiv.org/pdf/1905.04729.pdf
dd84369dd0bb476d0872a1f7b9914c2794be188b,0,,,0,1,0,0,0,0,0,0,0,0,0,Generative adversarial networks with decoder-encoder output noises,"In recent years, research on image generation has been developing very fast. The generative adversarial network (GAN) emerges as a promising framework, which uses adversarial training to improve the generative ability of its generator. However, since GAN and most of its variants use randomly sampled noises as the input of their generators, they have to learn a mapping function from a whole random distribution to the image manifold. As the structures of the random distribution and the image manifold are generally different, this makes GAN and its variants difficult to train and converge. In this paper, we propose a novel deep model called generative adversarial networks with decoder-encoder output noises (DE-GANs), which take advantage of both the adversarial training and the variational Bayesian inference to improve GAN and its variants on image generation performances. DE-GANs use a pre-trained decoder-encoder architecture to map the random noise vectors to informative ones and feed them to the generator of the adversarial networks. Since the decoder-encoder architecture is trained with the same data set as the generator, its output vectors, as the inputs of the generator, could carry the intrinsic distribution information of the training images, which greatly improves the learnability of the generator and the quality of the generated images. Extensive experiments demonstrate the effectiveness of the proposed model, DE-GANs.",2020,Neural Networks,,10.1016/j.neunet.2020.04.005,
ddbd2e29728ae6686d19f3cae5bf245e7efdf9e6,0,,,1,0,0,0,0,0,0,0,0,0,0,Improved Face Recognition approach Using ILTP for Low Resolution Images,"The field of biometrics examines the unique physical or behavioural traits that can be used to determine a person’s identity. Biometric recognition is the automatic recognition of a person based on one or more of these traits. Low resolution is a major problem in face recognition that degrades the performance of recognition approaches. In this paper, an ILTP approach that computes texture features from grayscale images is presented. The ILTP approach is used for the extraction of face texture features from low-resolution images based on DCT and DWT wavelet filters. An SVM classifier is used for the matching between the training and testing images. In our work we improve the accuracy for low-resolution images.",2016,,,,
ddf099f0e0631da4a6396a17829160301796151c,1,"[D18], [D24]",,1,0,0,0,0,0,0,0,1,0,0,Learning Face Image Quality From Human Assessments,"Face image quality can be defined as a measure of the utility of a face image to automatic face recognition. In this work, we propose (and compare) two methods for learning face image quality based on target face quality values from (i) human assessments of face image quality (matcher-independent), and (ii) quality values computed from similarity scores (matcher-dependent). A support vector regression model trained on face features extracted using a deep convolutional neural network (ConvNet) is used to predict the quality of a face image. The proposed methods are evaluated on two unconstrained face image databases, LFW and IJB-A, which both contain facial variations encompassing a multitude of quality factors. Evaluation of the proposed automatic face image quality measures shows we are able to reduce the FNMR at 1% FMR by at least 13% for two face matchers (a COTS matcher and a ConvNet matcher) by using the proposed face quality to select subsets of face images and video frames for matching templates (i.e., multiple faces per subject) in the IJB-A protocol. To our knowledge, this is the first work to utilize human assessments of face image quality in designing a predictor of unconstrained face quality that is shown to be effective in cross-database evaluation.",2018,,,,https://pdfs.semanticscholar.org/ddf0/99f0e0631da4a6396a17829160301796151c.pdf
de016dd2588b0550a6cc3c1236592b1879c94b7d,0,,,1,0,0,0,0,0,0,0,0,0,0,Automatically Generating Large Freely Available Image Datasets From the Web,"Although there are a few standard datasets in the computer vision community, there are several issues with creating new more challenging datasets. Most of these issues stem from privacy and copyright concerns. This project extends work done by Mears [1] to develop a new paradigm for collecting and sharing image datasets. In this paradigm, only links to online images are shared using image feeds. Filters can be created and used to produce a new feed that is a subset of an already existing feed, allowing for the easy creation of a specific dataset by using an existing broader dataset feed or the cleaning up of a feed generated by a web crawler. The system consists of three main parts: a dataset feed generator, a feed subscriber, and a contest engine which will allow real-time participation in computer vision contests. Architectures for all three parts are provided in this paper and the first two have been implemented. The framework presented in this paper aids in the creation of new computer vision datasets that contain a large number of images, are more representative of the real world, and are less subject to copyright and privacy issues.",2011,,,,https://pdfs.semanticscholar.org/de01/6dd2588b0550a6cc3c1236592b1879c94b7d.pdf
de3415565a8c3072b9ab2016272eff360fc8cd67,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,Probabilistic Morphable Models,"3D Morphable Face Models have been introduced for the analysis of 2D face photographs. The analysis is performed by actively reconstructing the three-dimensional face from the image in an Analysis-by-Synthesis loop, exploring statistical models for shape and appearance. Here we follow a probabilistic approach to acquire a robust and automatic model adaptation. The probabilistic formulation helps to overcome two main limitations of the classical approach. First, Morphable Model adaptation is highly dependent on a good initialization. The initial position of landmark points and face pose was given by manual annotation in previous approaches. Our fully probabilistic formulation allows us to integrate unreliable Bottom-Up cues from face and feature point detectors. This integration is superior to the classical feed-forward approach, which is prone to early and possibly wrong decisions. The integration of uncertain Bottom-Up detectors leads to a fully automatic model adaptation process. Second, the probabilistic framework gives us a natural way to handle outliers and occlusions. Face images are recorded in highly unconstrained settings. Often parts of the face are occluded by various objects. Unhandled occlusions can mislead the model adaptation process. The probabilistic interpretation of our model makes it possible to detect and segment occluded parts of the image and leads to robust model adaptation. Throughout this chapter we develop a fully probabilistic framework for image interpretation. We start by reformulating the Morphable Model as a probabilistic model in a fully Bayesian framework. Given an image, we search for a posterior distribution of possible image explanations. The integration of Bottom-Up information and the model parameters adaptation is performed using a Data Driven Markov Chain Monte Carlo approach. The face model is extended to be occlusion-aware and explicitly segments the image into face and non-face regions during the model adaptation process. The segmentation and model adaptation is performed in an Expectation-Maximization-style algorithm utilizing a robust illumination estimation method. The presented fully automatic face model adaptation can be used in a wide range of applications like face analysis, face recognition or face image manipulation. Our framework is able to handle images containing strong outliers, occlusions and facial expressions under arbitrary poses and illuminations. Furthermore, the fully probabilistic embedding has the additional advantage that it also delivers the uncertainty of the resulting image interpretation.",2017,,,10.1016/B978-0-12-810493-4.00006-7,http://gravis.dmi.unibas.ch/publications/2017/2017_Chapter_ProbabilisticMorphableModels.pdf
de7924a2bf2f83064d209867221eb49a2d90047b,0,,,1,0,0,0,0,0,0,0,0,0,0,A Neural Framework for Low-Shot Learning,"There has been a growing interest in developing machine learning models that are capable of low-shot learning, the machine learning problem of learning from little data. Progress on low-shot learning has important practical applications to domains in which data for training state-of-the-art algorithms are scarce, for example in identifying rare diseases in medical images or personalizing online services to a user’s activity. Improvements on this task would also have important theoretical implications, as a successful solution in low-shot learning would likely also push the boundary in representation learning, natural language or image understanding, etc. Matching networks are a recently proposed model for low-shot learning that combine neural networks with nonparametric models. They were shown to perform well on benchmark low-shot learning tasks, highlighting the potential of this approach. To better understand the strengths and shortcomings of this family of models, in this work, we compare matching networks and several variants, against a strong baseline when applied to a diverse set of tasks. We find that on relatively simple low-shot learning tasks such as character recognition, specialized low-shot models are not necessary to do well. On more complex tasks such as facial recognition, we see significant improvements in accuracy when using matching networks.",2017,,,,
dec4a2ed38895e837e6f47d779257260ecd292dc,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,ARZombie: A mobile augmented reality game with multimodal interaction,"Augmented reality games have the power to extend virtual gaming into real world scenarios with real people, while enhancing the senses of the user. This paper describes the ARZombie game developed with the aim of studying and developing mobile augmented reality applications, specifically for tablets, using face recognition interaction techniques. The goal of the ARZombie player is to kill zombies that are detected through the display of the device. Instead of using markers as a means of tracking the zombies, this game incorporates a facial recognition system, which will enhance the user experience by improving the interaction of players with the real world. As the player moves around the environment, the game will display virtual zombies on the screen if the detected faces are recognized as belonging to the class of the zombies. ARZombie was tested with users to evaluate the interaction proposals and its components were evaluated regarding performance in order to ensure a better gaming experience.",2015,2015 7th International Conference on Intelligent Technologies for Interactive Entertainment (INTETAIN),,10.4108/icst.intetain.2015.259743,
deee22c979beda2740b0ba8b2fcaa7c9524b03a8,0,,,0,1,0,0,0,0,0,0,0,0,0,InterFaceGAN: Interpreting the Disentangled Face Representation Learned by GANs,"Although Generative Adversarial Networks (GANs) have made significant progress in face synthesis, there is still limited understanding of what GANs have learned in the latent representation to map a randomly sampled code to a photo-realistic face image. In this work, we propose a framework, called InterFaceGAN, to interpret the disentangled face representation learned by the state-of-the-art GAN models and thoroughly analyze the properties of the facial semantics in the latent space. We first find that GANs actually learn various semantics in some linear subspaces of the latent space when being trained to synthesize high-quality faces. After identifying the subspaces of the corresponding latent semantics, we are able to realistically manipulate the facial attributes occurring in the synthesized images without retraining the model. We then conduct a detailed study on the correlation between different semantics and manage to better disentangle them via subspace projection, resulting in more precise control of the attribute manipulation. Besides manipulating gender, age, expression, and the presence of eyeglasses, we can even alter the face pose as well as fix the artifacts accidentally generated by GANs. Extensive experimental results suggest that learning to synthesize faces spontaneously brings a disentangled and controllable face representation.",2020,IEEE transactions on pattern analysis and machine intelligence,2005.09635,10.1109/tpami.2020.3034267,https://arxiv.org/pdf/2005.09635.pdf
df2494da8efa44d70c27abf23f73387318cf1ca8,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,Supervised Filter Learning for Representation Based Face Recognition,"Representation based classification methods, such as Sparse Representation Classification (SRC) and Linear Regression Classification (LRC), have been developed for the face recognition problem successfully. However, most of these methods use the original face images without any preprocessing for recognition. Thus, their performances may be affected by some problematic factors (such as illumination and expression variances) in the face images. In order to overcome this limitation, a novel supervised filter learning algorithm is proposed for representation based face recognition in this paper. The underlying idea of our algorithm is to learn a filter so that the within-class representation residuals of the faces' Local Binary Pattern (LBP) features are minimized and the between-class representation residuals of the faces' LBP features are maximized. Therefore, the LBP features of filtered face images are more discriminative for representation based classifiers. Furthermore, we also extend our algorithm for the heterogeneous face recognition problem. Extensive experiments are carried out on five databases and the experimental results verify the efficacy of the proposed algorithm.",2016,PloS one,,10.1371/journal.pone.0159084,
e0a3737381fe393f6d761ddba16c795b31bcdad2,0,,,0,1,0,0,0,0,0,0,0,0,0,High Fidelity Face Manipulation with Extreme Pose and Expression,"Face manipulation has shown remarkable advances with the flourish of Generative Adversarial Networks. However, due to the difficulties of controlling the structure and texture in high-resolution, it is challenging to simultaneously model pose and expression during manipulation. In this paper, we propose a novel framework that simplifies face manipulation with extreme pose and expression into two correlated stages: a boundary prediction stage and a disentangled face synthesis stage. In the first stage, we propose to use a boundary image for joint pose and expression modeling. An encoder-decoder network is employed to predict the boundary image of the target face in a semi-supervised way. Pose and expression estimators are used to improve the prediction accuracy. In the second stage, the predicted boundary image and the original face are encoded into the structure and texture latent space by two encoder networks respectively. A proxy network and a feature threshold loss are further imposed as constraints to disentangle the latent space. In addition, we build up a new high quality Multi-View Face (MVF-HQ) database that contains 120K high-resolution face images of 479 identities with pose and expression variations, which will be released soon. Qualitative and quantitative experiments on four databases show that our method pushes forward the advance of extreme face manipulation from 128×128 resolution to 1024×1024 resolution, and significantly improves the face recognition performance under large poses.",2019,ArXiv,1903.12003,,https://arxiv.org/pdf/1903.12003.pdf
e0f70bd37f12eba3e622a5f961bba9d4d14e1ea2,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,Learning Discriminative and Complementary Patches for Face Recognition,"The ensemble of convolutional neural networks (CNNs) has widely been used in many computer vision tasks including face recognition. Many existing ensembles of face recognition CNNs apply a two-stage pipeline to target performance improvement [10], [20], [22], [23], [29]: (1) it trains multiple CNNs separately with many face patches covering different facial areas; (2) the features derived from different models are aggregated off-line by different fusion methods. The well-known face recognition work, DeepID2 [20], trains 200 networks based on 200 arbitrarily chosen facial areas and chooses the best 25 ones to achieve impressive performance. However, it is very time-consuming to train so many networks. In addition, a brute-force-like way of choosing facial patches is used without knowing which face patches are complementary and discriminative. It might therefore lack generalization capability for cross-database applications. To solve this, we propose a novel end-to-end CNN ensemble architecture which automatically learns the complementary and discriminative patches for face recognition. Specifically, we propose a novel Patch Generation Engine (PGE) with a Patch Search Spatial Transformer Network (PS-STN) and an ROI shrunk loss to perform the patch selection process. The ROI shrunk loss enlarges the distance of learned features in spatial space and feature space and learns complementary features. In order to obtain the final aggregated feature, we use a supervised fusion module named the Two Stage Discriminative Fusion Module (TSDFM), which is effective at capturing global and local information and further guides the PGE to learn better patches. Extensive experiments conducted on the LFW and YTF datasets show the effectiveness of our novel end-to-end ensemble method.",2019,2019 14th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2019),,10.1109/FG.2019.8756598,
e1dd586842419f3c40c0d7b70c120cdea72f5b5c,0,,,1,0,0,0,0,0,0,0,0,0,0,Facial feature detection using Conditional Regression Forests,"Even though there are many studies on facial feature detection from two-dimensional still images, real-time facial feature detection is still a fresh field. In this paper, a structure including Conditional Regression Forests and Local Zernike Moments is introduced to solve this problem. In this study, regression forests learn the relations between facial image patches and the locations of facial feature points, conditioned on head pose. This method is evaluated on the Labeled Faces in the Wild (LFW) [2] database and promising results are obtained.",2015,2015 23nd Signal Processing and Communications Applications Conference (SIU),,10.1109/SIU.2015.7130080,
e3c211c6e8dbea9849790a1c9491aed290a1e144,0,,,0,1,0,0,0,0,0,0,0,0,0,Towards Better Representations with Deep/Bayesian Learning,Towards Better Representations with Deep/Bayesian Learning,2018,,,,http://chunyuan.li/doc/dissertation_cli.pdf
e4aaaf7034201fa94a9b9dc9bc8915cbe01c2c84,0,,,1,0,0,0,0,0,0,0,0,0,0,Multi-stage face recognition for biometric access,"Protecting the privacy of user-identification data is fundamental to protect the information systems from attacks and vulnerabilities. Providing access to such data only to the limited and legitimate users is the key motivation for `Biometrics'. In `Biometric Systems' confirming a user's claim of his/her identity reliably is more important than focusing on `what he/she really possesses' or `what he/she remembers'. In this paper, the use of face images for biometric access is proposed using two multistage face recognition algorithms that employ biometric facial features to validate the user's claim. The proposed algorithms use standard algorithms and classifiers such as EigenFaces, PCA and LDA in stages. Performance evaluation of both proposed algorithms is carried out using two standard datasets, the Extended Yale database and the AT&T database. Results using the proposed multi-stage algorithms are better than those using other standard algorithms. Current limitations and possible applications of the proposed algorithms are also discussed, along with the further scope of making these robust to pose, illumination and noise variations.",2015,2015 Annual IEEE India Conference (INDICON),,10.1109/INDICON.2015.7443449,
e59f25a68ed5f66bce4e0c14c026cfa7c9424fd4,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,Evolutionary Cost-Sensitive Extreme Learning Machine,"Conventional extreme learning machines (ELMs) solve a Moore–Penrose generalized inverse of the hidden-layer activation matrix and analytically determine the output weights to achieve generalized performance, by assuming the same loss from different types of misclassification. The assumption may not hold in cost-sensitive recognition tasks, such as a face recognition-based access control system, where misclassifying a stranger as a family member may result in more serious disaster than misclassifying a family member as a stranger. Though recent cost-sensitive learning can reduce the total loss with a given cost matrix that quantifies how severe one type of mistake is against another, in many realistic cases, the cost matrix is unknown to users. Motivated by these concerns, this paper proposes an evolutionary cost-sensitive ELM, with the following merits: 1) to the best of our knowledge, it is the first proposal of ELM in the evolutionary cost-sensitive classification scenario; 2) it well addresses the open issue of how to define the cost matrix in cost-sensitive learning tasks; and 3) an evolutionary backtracking search algorithm is induced for adaptive cost matrix optimization. Experiments in a variety of cost-sensitive tasks well demonstrate the effectiveness of the proposed approaches, with about 5%–10% improvements.",2017,IEEE Transactions on Neural Networks and Learning Systems,1505.04373,10.1109/TNNLS.2016.2607757,https://arxiv.org/pdf/1505.04373.pdf
e64d3c30f67be23b85aeb74a0585820285593485,0,,,0,0,1,0,0,0,0,0,0,0,0,Robust face representation and recognition under low resolution and difficult lighting conditions,"This dissertation focuses on different aspects of face image analysis for accurate face recognition under low resolution and poor lighting conditions. A novel resolution enhancement technique is proposed for enhancing a low resolution face image into a high resolution image for better visualization and improved feature extraction, especially in a video surveillance environment. This method performs kernel regression and component feature learning in the local neighborhood of the face images. It uses a directional Fourier phase feature component to adaptively learn the regression kernel based on local covariance to estimate the high resolution image. For each patch in the neighborhood, four directional variances are estimated to adapt the interpolated pixels. A Modified Local Binary Pattern (MLBP) methodology for feature extraction is proposed to obtain robust face recognition under varying lighting conditions. The original LBP operator compares pixels in a local neighborhood with the center pixel and converts the resultant binary string to an 8-bit integer value. So, it is less effective under difficult lighting conditions where variation between pixels is negligible. The proposed MLBP uses a two stage encoding procedure which is more robust in detecting this variation in a local patch. A novel dimensionality reduction technique called Marginality Preserving Embedding (MPE) is also proposed for enhancing the face recognition accuracy. Unlike Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA), which project data in a global sense, MPE seeks a local structure in the manifold. This is similar to other subspace learning techniques, but the difference from other manifold learning methods is that MPE preserves marginality in local reconstruction. Hence it provides better representation in low dimensional space and achieves lower error rates in face recognition. Two new concepts for robust face recognition are also presented in this dissertation. In the first approach, a neural network is used for training the system, where input vectors are created by measuring the distance from each input to its class mean. In the second approach, half-face symmetry is used, realizing the fact that face images may contain various expressions such as open/closed eyes or an open/closed mouth; the top half and bottom half are classified separately and the two results are finally fused. By performing experiments on several standard face datasets, improved results were observed with all the newly proposed methodologies. Research is progressing in developing a unified approach for the extraction of features suitable for accurate face recognition in a long range video sequence in complex environments.",2012,,,10.25777/F12J-ZX22,
e86f94293c31594dfa03a2553f44549371b8cf74,0,,,0,1,0,0,0,0,0,0,0,0,0,Generalizing Variational Autoencoders with Hierarchical Empirical Bayes,"Variational Autoencoders (VAEs) have experienced recent success as data-generating models by using simple architectures that do not require significant fine-tuning of hyperparameters. However, VAEs are known to suffer from over-regularization which can lead to failure to escape local maxima. This phenomenon, known as posterior collapse, prevents learning a meaningful latent encoding of the data. Recent methods have mitigated this issue by deterministically moment-matching an aggregated posterior distribution to an aggregate prior. However, abandoning a probabilistic framework (and thus relying on point estimates) can both lead to a discontinuous latent space and generate unrealistic samples. Here we present Hierarchical Empirical Bayes Autoencoder (HEBAE), a computationally stable framework for probabilistic generative models. Our key contributions are two-fold. First, we make gains by placing a hierarchical prior over the encoding distribution, enabling us to adaptively balance the trade-off between minimizing the reconstruction loss function and avoiding over-regularization. Second, we show that assuming a general dependency structure between variables in the latent space produces better convergence onto the mean-field assumption for improved posterior inference. Overall, HEBAE is more robust to a wide range of hyperparameter initializations than an analogous VAE. Using data from MNIST and CelebA, we illustrate the ability of HEBAE to generate higher quality samples based on FID score than existing autoencoder-based approaches.",2020,ArXiv,2007.10389,,https://arxiv.org/pdf/2007.10389.pdf
e8ad62733a17aaef834d26ef395cfdf63a66d1e4,0,,,0,1,0,0,0,0,0,0,0,0,0,Harnessing Adversarial Distances to Discover High-Confidence Errors,"Given a deep neural network image classification model that we treat as a black box, and an unlabeled evaluation dataset, we develop an efficient strategy by which the classifier can be evaluated. Randomly sampling and labeling instances from an unlabeled evaluation dataset allows traditional performance measures like accuracy, precision, and recall to be estimated. However, random sampling may miss rare errors for which the model is highly confident in its prediction, but wrong. These high-confidence errors can represent costly mistakes, and therefore should be explicitly searched for. Past works have developed search techniques to find classification errors above a specified confidence threshold, but ignore the fact that errors should be expected at confidence levels anywhere below 100%. In this work, we investigate the problem of finding errors at rates greater than expected given model confidence. Additionally, we propose a query-efficient and novel search technique that is guided by adversarial perturbations to find these mistakes in black box models. Through rigorous empirical experimentation, we demonstrate that our Adversarial Distance search discovers high-confidence errors at a rate greater than expected given model confidence.",2020,2020 International Joint Conference on Neural Networks (IJCNN),2006.16055,10.1109/IJCNN48605.2020.9207395,https://arxiv.org/pdf/2006.16055.pdf
e8f4ded98f5955aad114f55e7aca6b540599236b,1,[D18],,1,0,0,0,0,0,1,0,0,0,0,Convolutional Fusion Network for Face Verification in the Wild,"Part-based methods have seen popular applications for face verification in the wild, since they are more robust to local variations in terms of pose, illumination, and so on. However, most of the part-based approaches are built on hand-crafted features, which may not be suitable for the specific face verification purpose. In this paper, we propose to learn a part-based feature representation under the supervision of face identities through a deep model that ensures that the generated representations are more robust and suitable for face verification. The proposed framework consists of the following two deliberate components: 1) a deep mixture model (DMM) to find accurate patch correspondence and 2) a convolutional fusion network (CFN) to extract the part-based facial features. Specifically, DMM robustly depicts the spatial-appearance distribution of patch features over the faces via several Gaussian mixtures, which provide more accurate patch correspondence even in the presence of local distortions. Then, DMM only feeds the patches which preserve the identity information to the following CFN. The proposed CFN is a two-layer cascade of convolutional neural networks: 1) a local layer built on face patches to deal with local variations and 2) a fusion layer integrating the responses from the local layer. CFN jointly learns and fuses multiple local responses to optimize the verification performance. The composite representation obtained possesses certain robustness to pose and illumination variations and shows comparable performance with the state-of-the-art methods on two benchmark data sets.",2016,IEEE Transactions on Circuits and Systems for Video Technology,,10.1109/TCSVT.2015.2406191,https://labicvl.github.io/docs/pubs/Chao_TCSVT_2015.pdf
e966a19e5009d79b40251790722ea0374ff28585,1,,1,0,0,0,0,0,0,1,0,0,0,0,Image steganography using texture features and GANs,"As steganography is the main practice of hidden writing, many deep neural networks have been proposed to conceal secret information in images, but their invisibility and security are unsatisfactory. In this paper, we present an encoder-decoder framework with an adversarial discriminator to conceal messages or images in natural images. The message is embedded into a QR code first, which significantly improves the fault tolerance. Considering that the mean squared error (MSE) is not conducive to perfectly learning the invisible perturbations of cover images, we introduce a texture-based loss that is helpful for hiding information in the complex texture regions of an image, improving the invisibility of hidden information. In addition, we design a truncated layer to cope with stego image distortions caused by data type conversion and a moment layer to train our model with varisized images. Finally, our experiments demonstrate that the proposed model improves the security and visual quality of stego images.",2019,2019 International Joint Conference on Neural Networks (IJCNN),,10.1109/IJCNN.2019.8852252,
ea74b140d928c655251b097726a14874e8c09952,0,,,0,1,0,0,0,0,0,0,0,0,0,Geometry-Aware GAN for Face Attribute Transfer,"In this paper, the geometry-aware GAN is proposed to address the issue of facial attribute transfer with unpaired data. To tackle the unpaired training sample problem, the CycleGAN architecture is applied, where the bilateral mappings between the source and target domains are learned. The deformation flow is learned to capture the geometric variation between the two domains. We first warp the source face into the desired pose and shape according to the flow. Then, the transfer sub-network is designed to refine the results by hallucinating new components on the warped image. The attribute is removed by the reconstruction sub-network, coupled with the warping process. Experiments on a benchmark demonstrate the advantages of our method compared to baselines.",2019,2019 IEEE International Conference on Image Processing (ICIP),,10.1109/ICIP.2019.8803830,
eaca35ef2df920a4975b1bfca2ae7d0352a01f58,0,,,0,0,0,0,0,0,1,0,0,0,0,Efficient planar affine canonicalization,"This paper presents a fast and accurate affine canonicalization method for planar shapes. This method improves on previous ones based on iterative optimization that produce multiple canonical versions. Canonicalization provides a common reference frame for shape comparison without the loss of discrimination ability often caused by invariant features. It also gives for free the alignment transformation between any pair of shapes. The proposed method is based on the properties of the joint angular distribution of marginal skewness and kurtosis, the so-called SK signature, which can be efficiently computed in closed form from the raw image moments. The experiments demonstrate that the method is robust to the non-affine distortions caused by natural perspective image conditions. Thus, it can be used as an automatic preprocessing step to add affine invariance in statistical pattern recognition applications.",2017,Pattern Recognit.,,10.1016/j.patcog.2017.07.017,
eae7d5b15423a148e6bb32d24bbabedfacd0e2df,0,,,0,1,0,0,0,0,0,0,0,0,0,Learning deep representations by mutual information estimation and maximization,"This work investigates unsupervised learning of representations by maximizing mutual information between an input and the output of a deep neural network encoder. Importantly, we show that structure matters: incorporating knowledge about locality in the input into the objective can significantly improve a representation’s suitability for downstream tasks. We further control characteristics of the representation by matching to a prior distribution adversarially. Our method, which we call Deep InfoMax (DIM), outperforms a number of popular unsupervised learning methods and compares favorably with fully-supervised learning on several classification tasks with some standard architectures. DIM opens new avenues for unsupervised learning of representations and is an important step towards flexible formulations of representation learning objectives for specific end-goals.",2019,ICLR,1808.06670,,https://arxiv.org/pdf/1808.06670.pdf
ebc2a3e8a510c625353637e8e8f07bd34410228f,1,"[D37], [D38]",,1,0,0,0,0,0,0,0,0,0,0,Dual Sparse Constrained Cascade Regression for Robust Face Alignment,"Localizing facial landmarks is a fundamental step in facial image analysis. However, the problem continues to be challenging due to the large variability in expression, illumination, pose, and the existence of occlusions in real-world face images. In this paper, we present a dual sparse constrained cascade regression model for robust face alignment. Instead of using the least-squares method during the training process of regressors, a sparse constraint is introduced to select robust features and compress the size of the model. Moreover, a sparse shape constraint is incorporated between each cascade regression, and the explicit shape constraints are able to suppress the ambiguity in local features. To improve the model's adaptation to large pose variation, face pose is estimated by five fiducial landmarks located by a deep convolutional neural network, which is used to adaptively design the cascade regression model. To the best of our knowledge, this is the first attempt to fuse explicit shape constraint (sparse shape constraint) and implicit context information (sparse feature selection) for robust face alignment in the framework of cascade regression. Extensive experiments on nine challenging wild data sets demonstrate the advantages of the proposed method over the state-of-the-art methods.",2016,IEEE Transactions on Image Processing,,10.1109/TIP.2015.2502485,
ec3eb92b9a56b1fa84b127b8acc980555cd1f2e0,0,,,0,1,0,0,0,0,0,0,0,0,0,Channel-Recurrent Variational Autoencoders,"Variational Autoencoder (VAE) is an efficient framework in modeling natural images with probabilistic latent spaces. However, when the input spaces become complex, VAE becomes less effective, potentially due to the oversimplification of its latent space construction. In this paper, we propose to integrate recurrent connections across channels to both inference and generation steps of VAE. Sequentially building up the complexity of high-level features in this way allows us to capture global-to-local and coarse-to-fine structures of the input data spaces. We show that our channel-recurrent VAE improves existing approaches in multiple aspects: (1) it attains lower negative log-likelihood than standard VAE on MNIST; when trained adversarially, (2) it generates face and bird images with substantially higher visual quality than the state-of-the-art VAE-GAN and (3) channel-recurrency allows learning more interpretable representations; finally (4) it achieves competitive classification results on STL-10 in a semi-supervised setup.",2017,ArXiv,1706.03729,,https://arxiv.org/pdf/1706.03729.pdf
ed2bf771f04bdb43915282afbbdb206b12533459,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,A Method for Efficient and Robust Facial Features Localization,"We present a fast and robust algorithm for face alignment. There are three key contributions. The first is the introduction of a new shape indexed feature called multi-resolution wrapped features (MRWF), which is robust to scale and pose variations, and can be calculated very efficiently. The second is a new gradient boosting method based on a mixture re-sampling strategy, which allows the model to be resistant to the imbalance of training samples. The third contribution is a method for localizing facial feature points of an unknown image in a new iterative manner, which makes the algorithm robust to the initial location. Extensive experiments over images with obvious pose, expression and illumination changes have shown the accuracy and efficiency of our method.",2013,CCBR,,10.1007/978-3-319-02961-0_12,
ed3a6bf2377867cb611fe2ff291d23cd6692dafa,0,,,0,0,0,0,1,0,0,0,0,0,0,A Deep Learning Approach for Age Invariant Face Recognition,"Soft computing, as a collection of methodologies, is an important element for constructing a new generation of computationally intelligent systems. This has helped it to achieve great success in solving practical computing problems, including face recognition. On the other hand, deep learning has become one of the most promising techniques in artificial intelligence in the past decade. The contemporary collection of soft computing methodologies, combined with deep learning technology, reveals a promising direction for complex problem solving. Particularly when applied to face recognition, it improves the overall efficiency of the algorithm and contributes significantly to the accuracy of recognized faces. Efficient face recognition still encounters several serious challenges despite the fact that there have been numerous recent advances in this field. In this paper we present a system that utilizes the power of deep learning through Convolutional Neural Networks, Recurrent Neural Networks and LSTMs to enable face recognition to be easily implemented. Our method uses a deep convolutional network trained to directly optimize the embedding of identity itself, rather than an intermediate bottleneck layer as in previous deep learning approaches. The benefit of our approach is much greater representational efficiency: we achieve state-of-the-art face recognition performance using only 256 bytes per face. On the widely used Cross-Age Celebrity Dataset (CACD), our system achieves a very high degree of accuracy.",2017,,,,https://pdfs.semanticscholar.org/ed3a/6bf2377867cb611fe2ff291d23cd6692dafa.pdf
ed8a029f792ac31e34ff2fba4c3d165ae0f3a6e8,1,[D20],,1,0,0,1,0,0,0,0,0,0,0,Gender identification in unconstrained scenarios using Self-Similarity of Gradients features,"Gender identification has been a hot research topic with wide application requirements from social life. In general, effective feature representation is the key to solving this problem. In this paper, a new feature named Self-Similarity of Gradients (GSS) is proposed, which captures pairwise statistics of localized gradient distributions. We make three contributions to practical gender identification. First, GSS features are proposed for gender identification in the wild, which achieve good performance compared with baseline approaches. Second, we originally utilize 31-dimensional HOG for practical gender identification, and its excellent results demonstrate that HOG with both contrast sensitive and insensitive information is a better fit for this topic than that with only contrast insensitive information. Last, feature combination and multi-classifier combination strategies are adopted and the best gender identification performance is achieved. Experimental results show that the combination of GSS, HOG and LBP using a linear SVM outperforms the state of the art on the LFW database, which meets the “wild” condition.",2014,2014 IEEE International Conference on Image Processing (ICIP),,10.1109/ICIP.2014.7026194,http://robotics.szpku.edu.cn/c/publication/paper/ICIP2014-gaoyuan2.pdf
ef2665a91921035b39a81e5a7c150a6dddf7dbef,0,,,0,1,0,0,0,0,0,0,0,0,0,Unpaired Image-to-Image Translation with Domain Supervision,"Image-to-image translation has been widely investigated in recent years. Existing approaches are elaborately designed in an unsupervised manner and little attention has been paid to domain information beneath unpaired data. In this work, we treat domain information as explicit supervision and design an unpaired image-to-image translation framework, Domain-supervised GAN (briefly, DosGAN), that takes the first step towards exploration of domain supervision. Instead of representing domain characteristics with different generators in CycleGAN [32] or multiple domain codes in StarGAN [2], we pre-train a classification network to classify the domain of an image. After pre-training, this network is used to extract domain features of each image by using the output of its second-to-last layer. Such features, together with the latent semantic features extracted by another encoder (shared across different domains), are used to generate an image in the target domain. Experiments on multiple hair color translation, multiple identity translation and conditional edges-to-shoes/handbags demonstrate the effectiveness of our method. In addition, we transfer the domain feature extractor obtained on the Facescrub dataset with domain supervision information, to the CelebA dataset without domain supervision information, and succeed in achieving conditional translation with any two images in CelebA, while previous models like StarGAN cannot handle this task. Our code is available at https://github.com/linjx-ustc1106/DosGAN-PyTorch.",2019,ArXiv,,,
ef5fc224c3a8bbbb3f11a7296d7193793898a823,0,,,0,1,0,0,0,0,0,0,0,0,0,Swapping Autoencoder for Deep Image Manipulation,"Deep generative models have become increasingly effective at producing realistic images from randomly sampled seeds, but using such models for controllable manipulation of existing images remains challenging. We propose the Swapping Autoencoder, a deep model designed specifically for image manipulation, rather than random sampling. The key idea is to encode an image with two independent components and enforce that any swapped combination maps to a realistic image. In particular, we encourage the components to represent structure and texture, by enforcing one component to encode co-occurrent patch statistics across different parts of an image. As our method is trained with an encoder, finding the latent codes for a new input image becomes trivial, rather than cumbersome. As a result, it can be used to manipulate real input images in various ways, including texture swapping, local and global editing, and latent code vector arithmetic. Experiments on multiple datasets show that our model produces better results and is substantially more efficient compared to recent generative models.",2020,NeurIPS,2007.00653,,https://arxiv.org/pdf/2007.00653.pdf
effc2b3f5cd09120eadc2e9b28116d4227326e04,0,,,1,0,0,0,0,0,0,0,0,0,0,Morphing Detection Using a General- Purpose Face Recognition System,"Image morphing has proven to be very successful at deceiving facial recognition systems. Such a vulnerability can be critical when exploited in an automatic border control scenario. Recent works on this topic rely on dedicated algorithms which require additional software modules deployed alongside an existing facial recognition system. In this work, we address the problem of morphing detection by using state-of-the-art facial recognition algorithms based on hand-crafted features and deep convolutional neural networks. We show that a general-purpose face recognition system combined with a simple linear classifier can be successfully used as a morphing detector. The proposed method reuses an existing feature extraction pipeline instead of introducing additional modules. It requires neither fine-tuning nor modifications to the existing recognition system and can be trained using only a small dataset. The proposed approach achieves state-of-the-art performance on our morphing datasets using a 5-fold cross-validation.",2018,2018 26th European Signal Processing Conference (EUSIPCO),,10.23919/EUSIPCO.2018.8553375,https://www.eurasip.org/Proceedings/Eusipco/Eusipco2018/papers/1570437948.pdf
f19108c55b7c1831566ce3250322e0f5637d44c9,0,,,0,0,0,0,0,0,1,0,0,0,0,Learning Image Matching by Simply Watching Video,"This work presents an unsupervised learning based approach to the ubiquitous computer vision problem of image matching. We start from the insight that the problem of frame interpolation implicitly solves for inter-frame correspondences. This permits the application of analysis-by-synthesis: we first train and apply a Convolutional Neural Network for frame interpolation, then obtain correspondences by inverting the learned CNN. The key benefit behind this strategy is that the CNN for frame interpolation can be trained in an unsupervised manner by exploiting the temporal coherence that is naturally contained in real-world video sequences. The present model therefore learns image matching by simply “watching videos”. Besides a promise to be more generally applicable, the presented approach achieves surprising performance comparable to traditional empirically designed methods.",2016,ECCV,1603.06041,10.1007/978-3-319-46466-4_26,https://arxiv.org/pdf/1603.06041.pdf
f2470fdff95c0d9d0f488590a2790016b5d1d0c3,0,,,0,1,0,0,0,0,0,0,0,0,0,Defending Adversarial Attacks via Semantic Feature Manipulation,"Machine learning models have demonstrated vulnerability to adversarial attacks, more specifically misclassification of adversarial examples. In this paper, we propose a one-off and attack-agnostic Feature Manipulation (FM)-Defense to detect and purify adversarial examples in an interpretable and efficient manner. The intuition is that the classification result of a normal image is generally resistant to non-significant intrinsic feature changes, e.g., varying thickness of handwritten digits. In contrast, adversarial examples are sensitive to such changes since the perturbation lacks transferability. To enable manipulation of features, a combo-variational autoencoder is applied to learn disentangled latent codes that reveal semantic features. The resistance to classification change over the morphs, derived by varying and reconstructing latent codes, is used to detect suspicious inputs. Further, combo-VAE is enhanced to purify the adversarial examples with good quality by considering both class-shared and class-unique features. We empirically demonstrate the effectiveness of detection and the quality of purified instances. Our experiments on three datasets show that FM-Defense can detect nearly $100\%$ of adversarial examples produced by different state-of-the-art adversarial attacks. It achieves more than $99\%$ overall purification accuracy on the suspicious instances that are close to the manifold of normal examples.",2020,ArXiv,2002.02007,,https://arxiv.org/pdf/2002.02007.pdf
f26dc5b1ff5482b368144c41694260cef066942e,0,,,0,0,0,0,0,1,0,0,0,0,0,Robust Facial Landmark Detection via Occlusion-Adaptive Deep Networks,"In this paper, we present a simple and effective framework called Occlusion-adaptive Deep Networks (ODN) with the purpose of solving the occlusion problem for facial landmark detection. In this model, the occlusion probability of each position in high-level features are inferred by a distillation module that can be learnt automatically in the process of estimating the relationship between facial appearance and facial shape. The occlusion probability serves as the adaptive weight on high-level features to reduce the impact of occlusion and obtain clean feature representation. Nevertheless, the clean feature representation cannot represent the holistic face due to the missing semantic features. To obtain exhaustive and complete feature representation, it is vital that we leverage a low-rank learning module to recover lost features. Considering that facial geometric characteristics are conducive to the low-rank module to recover lost features, we propose a geometry-aware module to excavate geometric relationships between different facial components. Depending on the synergistic effect of three modules, the proposed network achieves better performance in comparison to state-of-the-art methods on challenging benchmark datasets.",2019,2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR),,10.1109/CVPR.2019.00360,http://openaccess.thecvf.com/content_CVPR_2019/papers/Zhu_Robust_Facial_Landmark_Detection_via_Occlusion-Adaptive_Deep_Networks_CVPR_2019_paper.pdf
f397d51e91f068bfc08ae9e66232abd433069abb,0,,,0,0,0,0,0,1,0,0,0,0,0,Patch warping based face frontalization,"Face frontalization increases the accuracy of face and gesture recognition applications. In this paper, we propose a 2D patch warping based face frontalization method which has a simple but efficient flow due to its lower computation cost. We partition the human face into 23 nearly planar regions, constituted by 68 landmark points, to form a frontal face model used for the warping process. Planar regions are warped using homography, unlike other affine transform based methods. Warping rectangular regions with homography preserves the global structure of the face, and it decreases the computational cost of frontalization compared with approaches that work with many triangular regions, such as Delaunay triangulation. In order to test recognition performance, every test sample is frontalized with respect to an average face model computed as the average of all training samples. Test sets created according to the pose angles of the samples are tested separately to measure the contribution of the proposed method to recognition, and we compare the proposed method to another state-of-the-art frontalization method in the literature.",2018,2018 26th Signal Processing and Communications Applications Conference (SIU),,10.1109/SIU.2018.8404728,
f5870956b87e04ebfbc54f8702e2e19ee3f07c55,0,,,0,1,0,0,0,0,0,0,0,0,0,Ensemble Networks for Better Facial Recognition of Bearded Faces,"Face recognition systems such as FaceNet[9] perform poorly when certain facial features are obscured[10]. We propose an ensemble network architecture which combines FaceNet with a specialized secondary network for face recognition in the presence of the facial obscurity, as well as a dispatcher network to decide which face recognition model to use. We apply this architecture to bearded faces, and demonstrate superior performance over standalone face recognition systems. Our architecture extends to an arbitrary number of facial obscurities, indicating a potential for significant improvement to face recognition systems in general.",2019,,,,https://pdfs.semanticscholar.org/f587/0956b87e04ebfbc54f8702e2e19ee3f07c55.pdf
f7d64f6c88623acd53c7aff9d6062f749a464325,0,,,1,0,0,0,0,0,0,0,0,0,0,Privacy-Friendly Photo Sharing and Relevant Applications Beyond,"Popularization of online photo sharing brings people great convenience, but has also raised concerns for privacy. Researchers have proposed various approaches to enable image privacy, most of which focus on encrypting or distorting image visual content. In this thesis, we investigate novel solutions to protect image privacy with a particular emphasis on online photo sharing. To this end, we propose not only algorithms to protect visual privacy in image content but also the design of architectures for privacy-preserving photo sharing. Beyond privacy, we also explore additional impacts and potentials of employing daily images in three other relevant applications. First, we propose and study two image encoding algorithms to protect visual content in images, within a Secure JPEG framework. The first method scrambles a JPEG image by randomly changing the signs of its DCT coefficients based on a secret key. The second method, named JPEG Transmorphing, allows one to protect arbitrary image regions with any obfuscation, while secretly preserving the original image regions in application segments of the obfuscated JPEG image. Performance evaluations reveal a good degree of storage overhead and privacy protection capability for both methods, and particularly a good level of pleasantness for JPEG Transmorphing, if proper manipulations are applied. Second, we investigate the design of two architectures for privacy-preserving photo sharing. The first architecture, named ProShare, is built on a public key infrastructure (PKI) integrated with ciphertext-policy attribute-based encryption (CP-ABE), to enable secure and efficient access to user-posted photos protected by Secure JPEG. The second architecture is named ProShare S, in which a photo sharing service provider helps users make photo sharing decisions automatically based on their past decisions using machine learning. The photo sharing service analyzes not only the content of a user's photo, but also context information about the image capture and a prospective requester, and finally makes a decision whether or not to share a particular photo with the requester, and if yes, at which granularity. A user study along with extensive evaluations was performed to validate the proposed architecture. In the end, we investigate three relevant topics in regard to daily photos captured or shared by people, beyond their privacy implications. In the first study, inspired by JPEG Transmorphing, we propose an animated JPEG file format, named aJPEG. aJPEG preserves its animation frames as application markers in a JPEG image and provides smaller file size and better image quality than conventional GIF. In the second study, we attempt to understand the impact of popular image manipulations applied in online photo sharing on the evoked emotions of observers. The study reveals that image manipulations indeed influence people's emotions, but such impact also depends on the image content. In the last study, we employ a deep convolutional neural network (CNN), the GoogLeNet model, to perform automatic food image detection and categorization. The promising results obtained provide meaningful insights into the design of an automatic dietary assessment system based on multimedia techniques, e.g. image analysis.",2017,,,10.5075/EPFL-THESIS-7828,https://pdfs.semanticscholar.org/f7d6/4f6c88623acd53c7aff9d6062f749a464325.pdf
f9edf84ca07ba4b17f309b16f685af2819ead564,0,,,0,1,0,0,0,0,0,0,0,0,0,Attribute-Aware Face Aging With Wavelet-Based Generative Adversarial Networks,"Since it is difficult to collect face images of the same subject over a long range of age span, most existing face aging methods resort to unpaired datasets to learn age mappings. However, the matching ambiguity between young and aged face images inherent to unpaired training data may lead to unnatural changes of facial attributes during the aging process, which could not be solved by only enforcing identity consistency like most existing studies do. In this paper, we propose an attribute-aware face aging model with wavelet based Generative Adversarial Networks (GANs) to address the above issues. To be specific, we embed facial attribute vectors into both the generator and discriminator of the model to encourage each synthesized elderly face image to be faithful to the attribute of its corresponding input. In addition, a wavelet packet transform (WPT) module is incorporated to improve the visual fidelity of generated images by capturing age-related texture details at multiple scales in the frequency space. Qualitative results demonstrate the ability of our model in synthesizing visually plausible face images, and extensive quantitative evaluation results show that the proposed method achieves state-of-the-art performance on existing datasets.",2019,2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR),1809.06647,10.1109/CVPR.2019.01215,https://arxiv.org/pdf/1809.06647.pdf
fa2837e28b6953317cbdb64c72b66929e574fdd3,0,,,0,0,0,0,0,0,1,0,0,0,0,Joint Segmentation and Registration Through the Duality of Congealing and Maximum Likelihood Estimate,"In this paper we consider the task of joint registration and segmentation. A popular method which aligns images and simultaneously estimates a simple statistical shape model was proposed by E. Learned-Miller and is known as congealing. It considers the entropy of a simple, pixel-wise independent distribution as the objective function for searching the unknown transformations. Besides being intuitive and appealing, this idea raises several theoretical and practical questions, which we try to answer in this paper. First, we analyse the approach theoretically and show that the original congealing is in fact the DC-dual task (difference of convex functions) for a properly formulated Maximum Likelihood estimation task. This interpretation immediately leads to a different choice for the algorithm which is substantially simpler than the known congealing algorithm. The second contribution is to show how to generalise the task for models in which the shape prior is formulated in terms of segmentation labellings and is related to the signal domain via a parametric appearance model. We call this generalisation unsupervised congealing. The new approach is applied to the task of aligning and segmenting imaginal discs of Drosophila melanogaster larvae.",2015,IPMI,,10.1007/978-3-319-19992-4_27,http://cmp.felk.cvut.cz/~flachbor/publications/ipmi2015_fin.pdf
fa32b29e627086d4302db4d30c07a9d11dcd6b84,1,,1,0,1,0,0,0,0,0,0,0,0,0,Weakly Supervised Facial Attribute Manipulation via Deep Adversarial Network,"Automatically manipulating facial attributes is challenging because it needs to modify the facial appearances, while keeping not only the person's identity but also the realism of the resultant images. Unlike prior works on facial attribute parsing, we aim at an inverse and more challenging problem called attribute manipulation, which modifies a facial image in line with a reference facial attribute. Given a source input image and reference images with a target attribute, our goal is to generate a new image (i.e., target image) that not only possesses the new attribute but also keeps the same or similar content with the source image. In order to generate new facial attributes, we train a deep neural network with a combination of a perceptual content loss and two adversarial losses, which ensure the global consistency of the visual content while implementing the desired attributes often impacting local pixels. The model automatically adjusts the visual attributes on facial appearances and keeps the edited images as realistic as possible. The evaluation shows that the proposed model can provide a unified solution to both local and global facial attribute manipulation such as expression change and hair style transfer. Moreover, we further demonstrate that the learned attribute discriminator can be used for attribute localization.",2018,2018 IEEE Winter Conference on Applications of Computer Vision (WACV),,10.1109/WACV.2018.00019,http://www.public.asu.edu/~swang187/publications/WACV18.pdf
fac345e3ce205477365488c8279bdeab53d9638e,1,[D18],,1,1,0,0,0,0,0,0,0,0,0,Self-Supervised Learning of Detailed 3D Face Reconstruction,"In this article, we present an end-to-end learning framework for detailed 3D face reconstruction from a single image. Our approach uses a 3DMM-based coarse model and a displacement map in UV-space to represent a 3D face. Unlike previous work addressing the problem, our learning framework does not require supervision of surrogate ground-truth 3D models computed with traditional approaches. Instead, we utilize the input image itself as supervision during learning. In the first stage, we combine a photometric loss and a facial perceptual loss between the input face and the rendered face, to regress a 3DMM-based coarse model. In the second stage, both the input image and the regressed texture of the coarse model are unwrapped into UV-space, and then sent through an image-to-image translation network to predict a displacement map in UV-space. The displacement map and the coarse model are used to render a final detailed face, which again can be compared with the original input image to serve as a photometric loss for the second stage. The advantage of learning displacement map in UV-space is that face alignment can be explicitly done during the unwrapping, thus facial details are easier to learn from large amount of data. Extensive experiments demonstrate the superiority of our method over previous work.",2020,IEEE Transactions on Image Processing,1910.11791,10.1109/TIP.2020.3017347,https://arxiv.org/pdf/1910.11791.pdf
fc3326cb35519d4fb966216e267f088cb491c0b1,0,,,0,1,0,0,0,0,0,0,0,0,0,Online Exemplar Fine-Tuning for Image-to-Image Translation,"Existing techniques to solve exemplar-based image-to-image translation within deep convolutional neural networks (CNNs) generally require a training phase to optimize the network parameters on domain-specific and task-specific benchmarks, thus having limited applicability and generalization ability. In this paper, we propose a novel framework, for the first time, to solve exemplar-based translation through an online optimization given an input image pair, called online exemplar fine-tuning (OEFT), in which we fine-tune the off-the-shelf and general-purpose networks to the input image pair themselves. We design two sub-networks, namely correspondence fine-tuning and multiple GAN inversion, and optimize these network parameters and latent codes, starting from the pre-trained ones, with well-defined loss functions. Our framework does not require the off-line training phase, which has been the main challenge of existing methods, but only pre-trained networks, enabling online optimization. Experimental results prove that our framework generalizes well to unseen image pairs and even clearly outperforms state-of-the-art methods that need an intensive training phase.",2020,ArXiv,2011.09330,,https://arxiv.org/pdf/2011.09330.pdf
fd1003eeca71ea2e92747767bfc6c862c6036f37,0,,,1,0,0,0,0,0,0,1,0,0,0,Video analytics for surveillance camera networks,"International usage of and interest in Closed-Circuit Television (CCTV) for surveillance of public spaces has proved to be effective in forensic, or reactive, response to crime and terrorism. In the ideal scenario, it would be useful to detect events, in real-time or close to real-time, in order to mitigate possible harm. However, it is an issue to adequately monitor the video feeds with security guards. In this paper, we address the key problems in the existing surveillance system, followed by some discussions on the integration of social signals and video search to enhance the traditional surveillance system. We show the performance of face-based identity inference under surveillance, as well as video indexing and search results. Finally, we discuss related future research directions in this area.",2013,2013 19th IEEE International Conference on Networks (ICON),,10.1109/ICON.2013.6782002,
fd287ca0e82a7968dad25cc24483ca72e79ff010,0,,,0,1,0,0,0,0,0,0,0,0,0,GAN Memory with No Forgetting,,2020,ArXiv,2006.07543,,https://arxiv.org/pdf/2006.07543.pdf
fdd19fee07f2404952e629cc7f7ffaac14febe01,1,[D23],,1,0,0,0,0,0,0,1,0,0,0,Face recognition based on dictionary learning with the locality constraints of atoms,"Previous dictionary learning algorithms usually take the locality information of training samples into account in the learning process, which may degrade the robustness of the dictionary. In this paper, a new locality constrained dictionary learning algorithm (LCDL) for face recognition, which uses the locality characteristics of atoms, is proposed. Since the atoms are learned from the training samples, they are more robust to noise and outliers than the training samples. In the LCDL algorithm, we use atoms to construct a Laplacian graph, and then use the profile (the row vector of the coding coefficients matrix) to measure the similarity among them. Then, we construct a locality constraint term by using the profile matrix and Laplacian graph of atoms. Since the profile and atoms can be adaptively updated during dictionary learning, the locality constraint term can also be adaptively updated. Moreover, the locality constraint term can also inherit the geometrical structure of the training samples, and it can enhance the discriminative ability of the dictionary. Experimental results show that the LCDL algorithm achieves more promising performance than some state-of-the-art dictionary learning and sparse coding algorithms.",2016,"2016 9th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI)",,10.1109/CISP-BMEI.2016.7852754,
fe961cbe4be0a35becd2d722f9f364ec3c26bd34,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,"Computer-based tracking, analysis, and visualization of linguistically significant nonmanual events in American Sign Language (ASL)","Our linguistically annotated American Sign Language (ASL) corpora have formed a basis for research to automate detection by computer of essential linguistic information conveyed through facial expressions and head movements. We have tracked head position and facial deformations, and used computational learning to discern specific grammatical markings. Our ability to detect, identify, and temporally localize the occurrence of such markings in ASL videos has recently been improved by incorporation of (1) new techniques for deformable model-based 3D tracking of head position and facial expressions, which provide significantly better tracking accuracy and recover quickly from temporary loss of track due to occlusion; and (2) a computational learning approach incorporating 2-level Conditional Random Fields (CRFs), suited to the multi-scale spatio-temporal characteristics of the data, which analyses not only low-level appearance characteristics, but also the patterns that enable identification of significant gestural components, such as periodic head movements and raised or lowered eyebrows. Here we summarize our linguistically motivated computational approach and the results for detection and recognition of nonmanual grammatical markings; demonstrate our data visualizations, and discuss the relevance for linguistic research; and describe work underway to enable such visualizations to be produced over large corpora and shared publicly on the Web.",2014,,,,https://pdfs.semanticscholar.org/fe96/1cbe4be0a35becd2d722f9f364ec3c26bd34.pdf
ffdc2cc5ae2e0ff6b4165e3ee70a1405b1471f7a,1,[D18],,1,0,0,0,0,0,0,0,0,0,0,Mask Based Unsupervised Content Transfer,"We consider the problem of translating, in an unsupervised manner, between two domains where one contains some additional information compared to the other. The proposed method disentangles the common and separate parts of these domains and, through the generation of a mask, focuses the attention of the underlying network to the desired augmentation alone, without wastefully reconstructing the entire target. This enables state-of-the-art quality and variety of content translation, as shown through extensive quantitative and qualitative evaluation. Furthermore, the novel mask-based formulation and regularization is accurate enough to achieve state-of-the-art performance in the realm of weakly supervised segmentation, where only class labels are given. To our knowledge, this is the first report that bridges the problems of domain disentanglement and weakly supervised segmentation. Our code is publicly available at this https URL.",2020,ICLR,1906.06558,,https://arxiv.org/pdf/1906.06558.pdf