Daily Research Log 2014-7-17

Learning a Deep Convolutional Network for Image Super-Resolution, Chao Dong, Xiaoou Tang

  • Overview
    • Learns an end-to-end LR-to-HR mapping via deep learning, which differs fundamentally from traditional methods
      • Does not learn a dictionary or manifold explicitly
      • But learns it implicitly
      • Pipeline is fully learned, without hard-coded pre/post-processing
    • SRCNN's appealing properties
      • Intentionally designed with simplicity in mind
      • A moderate number of filters and layers makes the method fast and practical for online use
      • Huge potential for better performance once a larger dataset and a larger model are available
    • Contributions
      • CNN for SR, end-to-end mapping between LR and HR
      • Relates SRCNN to sparse-coding-based SR methods; this relationship guides the design of the network
      • Demonstrates that deep learning is useful in SR
  • Model
    • Notations
      • Y: LR image, X: HR image, F: X = F(Y)
    • F incorporates three operations
      • Patch extraction and representation
        • Extract overlapping patches from the LR image and represent each as a vector; the vectors together form a set of feature maps
      • Non-linear mapping
        • Map the features into another high-dimensional vector
      • Reconstruction
    • Patch extraction
      • Traditionally: pre-trained bases like PCA, DCT, Haar
      • Ours: optimize the filters in the framework
      • W1 is c*f1*f1*n1 dimensional
        • c: channels, f1: kernel size, n1: number of filters
      • B1: biases
      • Convolve the LR image with W1 and add B1 (followed by the ReLU non-linearity in the paper)
    • Non-linear mapping
      • Nothing to explain
      • W2 is of size n1*1*1*n2
    • Reconstruction
      • Traditionally the predicted overlapping HR patches are averaged
      • The averaging can be viewed as a predefined convolution kernel on the feature maps; here the reconstruction filter W3 is learned instead
      • W3 is n2*f3*f3*c dimensional
  • Experiment
    • Loss function: MSE
    • Parameters
      • f1=9, f2=1, f3=5, n1=64, n2=32 (see the forward-pass sketch after this list)
    • Implementation, using 'cuda-convnet'
    • Channels
      • Only consider the luminance channel in the YCbCr color space; the chrominance channels are upsampled with bicubic interpolation
    • Evaluation
      • No padding in the convolutional layers
      • Compute and evaluate only the inner area (a 20*20 central crop), since the valid convolutions shrink the output
      • Evaluate PSNR value
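
A minimal NumPy sketch of the three-layer mapping and the PSNR evaluation described above. The layer shapes follow the note's parameters (f1=9, f2=1, f3=5, n1=64, n2=32); the helper names, the use of scipy, and the explicit loops are my own rendering, not the paper's cuda-convnet implementation.

  import numpy as np
  from scipy.signal import correlate

  def conv_valid(x, W, b):
      # x: (c, H, W) feature maps; W: (n_out, c, k, k); b: (n_out,)
      # 'valid' convolution: no padding, matching the evaluation note above
      n_out, _, k, _ = W.shape
      H, Wd = x.shape[1] - k + 1, x.shape[2] - k + 1
      out = np.empty((n_out, H, Wd))
      for i in range(n_out):
          # sum of per-channel 2-D cross-correlations, plus bias
          out[i] = sum(correlate(x[c], W[i, c], mode='valid')
                       for c in range(x.shape[0])) + b[i]
      return out

  def srcnn_forward(y, layers):
      # y: bicubic-upscaled LR luminance, shape (1, H, W)
      # layers: [(W1, b1), (W2, b2), (W3, b3)]
      (W1, b1), (W2, b2), (W3, b3) = layers
      f1 = np.maximum(conv_valid(y, W1, b1), 0)    # patch extraction + ReLU
      f2 = np.maximum(conv_valid(f1, W2, b2), 0)   # 1x1 non-linear mapping + ReLU
      return conv_valid(f2, W3, b3)                # reconstruction (linear)

  def psnr(pred, ref, peak=1.0):
      # PSNR over the evaluated inner area
      mse = np.mean((pred - ref) ** 2)
      return 10 * np.log10(peak ** 2 / mse)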

Daily Research Log 2014-7-11

Restricted Boltzmann Machine Approach to Couple Dictionary Training for Image Super-Resolution, ICIP 2013, Junbin Gao

  • Review of SR via sparse representation
    • Assumption: HR and LR patches share the same sparse representation
    • Sparse prior on the patch codes
    • Learn coupled dictionaries D_h and D_l (see the reconstruction sketch after this list)
  • Our model
    • Use an RBM to help train the coupled dictionaries
    • Min.
    • Given input y
      • Generate an initial estimate x0 by a naïve method
      • Use [x0 y0] as the input, train the RBM, and obtain the dictionary coefficients
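
A small sketch of the sparse-representation reconstruction step reviewed above (the classic coupled-dictionary SR pipeline), not of the paper's RBM training itself. The dictionary shapes, the OMP solver, and the function name are my own assumptions.

  import numpy as np
  from sklearn.linear_model import orthogonal_mp

  def sr_patch(y_patch, D_l, D_h, n_nonzero=5):
      # y_patch: flattened LR patch (or its features), shape (d_l,)
      # D_l: (d_l, n_atoms), D_h: (d_h, n_atoms) -- coupled dictionaries
      # Assumption from the review above: LR and HR patches share the same sparse code
      alpha = orthogonal_mp(D_l, y_patch, n_nonzero_coefs=n_nonzero)
      return D_h @ alpha  # reconstruct the HR patch with the shared code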

Daily Research Log 2014-7-10

Space-Time Super-Resolution from a Single Video

  • Assumption
    • Small space-time patches recur many times within the same video, across spatio-temporal scales
    • This recurrence is verified statistically
  • Introduction
    • Problem Investigation
      • Spatial resolution
        • CCD density and the point spread function
      • Temporal resolution
        • Frame-rate and exposure time
  • Method
    • Find similar space-time patches at a coarser temporal scale
    • Use the corresponding fine-scale patch to fill in the super-resolved version of the current patch (a toy sketch follows below)
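
A toy sketch of the "match at a coarser temporal scale, copy back the fine-scale parent" idea in the two bullets above. The brute-force SSD search, the fixed patch sizes, the 2x temporal subsampling, and the function name are my own simplifications, not the paper's algorithm.

  import numpy as np

  def coarse_temporal_match(video, t, y, x, pt=3, ps=5, stride=2):
      # video: (T, H, W) grayscale frames; query patch starts at (t, y, x)
      query = video[t:t + pt, y:y + ps, x:x + ps]
      coarse = video[::stride]                      # coarser temporal scale
      best, best_err = None, np.inf
      Tc, H, W = coarse.shape
      for tc in range(Tc - pt + 1):
          for yc in range(H - ps + 1):
              for xc in range(W - ps + 1):
                  cand = coarse[tc:tc + pt, yc:yc + ps, xc:xc + ps]
                  err = np.sum((cand - query) ** 2)  # SSD patch distance
                  if err < best_err:
                      best_err, best = err, (tc * stride, yc, xc)
      # 'best' indexes the original (fine temporal scale) video: the frames
      # video[best[0] : best[0] + pt * stride] are the fine-scale counterpart
      # that can fill in the missing temporal detail of the query patch.
      return best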

Super-resolution & deep learning

Thoughts

  • Pros on using DL in video SR
    • Huge dataset, good for training
  • Cons on using DL in video SR
    • When converting a patch from LR to HR, the number of pixels changes with the scale factor, which implies a different number of neurons per scale
  • Intrinsic problem
    • Information increasing?
      • Increasing resolution is inherently an information-increasing problem, which is unsolvable without extra information
      • However, the example pool provides that extra information
      • The core problem is to find the 'subspace' in which the current video sequence lies; the LR image can then be seen as a projection into that subspace
  • Problems of current methods
    • I assume that current methods are exemplar-based, using raw image patches
    • We could use features instead, or decompose the existing exemplars and reconstruct the high-resolution patch from the decomposition
  • Possible methods
    • Method I, RF decomposition
      • Decompose patch into resolution and feature dimensions
      • Features are resolution-invariant
      • Each patch is a linear combination of certain feature bases at a certain resolution
      • For each feature basis, abundant multi-scale exemplars are available in the database, compared with raw exemplars
    • Method II
      • Directly train a low-to-high NN (toy sketch after this list)
        • The difference between scale and resolution
        • Multi-resolution auto-encoder
          • Pure hack
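
A toy illustration of Method II's "directly train a low-to-high NN" on fixed-size patch pairs (one network per scale factor, which sidesteps the varying-pixel-count issue noted under Cons). The random stand-in data, the patch sizes, and the sklearn MLP are all my own choices, just to show the shape of the idea.

  import numpy as np
  from sklearn.neural_network import MLPRegressor

  rng = np.random.default_rng(0)
  lr_patches = rng.random((1000, 5 * 5))    # stand-in 5x5 LR patches, flattened
  hr_patches = rng.random((1000, 10 * 10))  # corresponding 10x10 HR patches (2x scale)

  # one fixed-size network per scale factor
  net = MLPRegressor(hidden_layer_sizes=(128,), max_iter=200)
  net.fit(lr_patches, hr_patches)           # learn the LR -> HR patch mapping
  hr_pred = net.predict(lr_patches[:1]).reshape(10, 10)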

Notes: Learning to Detect A Salient Object, Liu Tie, Xiaoou Tang

Learning to Detect A Salient Object, PAMI 2011, Xiaoou Tang

  • First quantitative evaluation dataset for visual attention algorithms
  • Most existing saliency algorithms are based on bottom-up computational framework
    • Steps
      • Feature extraction
      • Saliency computation
        • Center-surround operation, self-information, graph-based random walk
      • Find fixations, or sparse points by winner-take-all or inhibition-of-return
    • Problem
      • Finding fixations rather than where visual attention should be
      • Focus on low-level features rather than real attention
  • Our model
    • Incorporate top-down information about salient object
      • User label is considered to be top-down information
    • Local, regional, global features to define generic salient object
    • Use Conditional Random Field (CRF) learning
  • Our dataset
    • 20,000+ well labeled images
    • What is salient object? Multiple user labeling
    • Selected 60,000 out of 130,000 images containing a salient object or a distinctive foreground object
      Further selected 20,000 images for labeling

Method Description

  • Our model
    • CRF model
    • Features
      • Energy E(A|I) = Σ_x Σ_k λ_k·F_k(a_x, I) + Σ_{x,x'} S(a_x, a_x', I): per-pixel salient-object feature terms plus a pairwise term for adjacent pixels
      • Salient object feature
        • For every pixel,
          • Salient object feature

            This is designed for finding the labeling A = {a} that minimizes sum(F):
            if a pixel is hypothesised to be salient, its feature response is negative; if not, it is positive.
          • Pairwise term: appearance difference multiplied by label difference. Penalizes adjacent pixels with similar color that are labeled differently. Naïve stuff.
    • CRF learning
      • Learn a weight lambda for each feature under the maximum-likelihood criterion, in the following sense
      • Objective function: maximize the sum of log-likelihoods of the ground-truth labelings over the training images
    • Salient object features
      • Multi-scale contrast
        • Naïve contrast (center vs. neighborhood) in a Gaussian pyramid (see the sketch after this list)
          • 6-level pyramid, 9x9 window
          • Highlights object boundaries
      • Center-surround histogram
        • A salient object usually has a larger extent of contrast with its surroundings
          • For each pixel, enumerate a hypothesis rectangle R and its surround R_S with varying aspect ratios and sizes
          • Measure the chi-square distance between the RGB histograms of R and R_S

      • Color spatial distribution
        • Distinctiveness of a color, measured by its spatial distribution
          • GMM color model
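
A sketch of the multi-scale contrast feature from the list above: per-pixel contrast against a 9x9 neighbourhood, accumulated over a 6-level Gaussian pyramid. The use of scipy.ndimage, the window-mean simplification, and the normalization are my own choices, not the paper's exact definition.

  import numpy as np
  from scipy.ndimage import gaussian_filter, uniform_filter, zoom

  def multiscale_contrast(img, levels=6, window=9):
      # img: grayscale image in [0, 1], shape (H, W)
      H, W = img.shape
      contrast = np.zeros((H, W))
      level = img
      for _ in range(levels):
          # contrast of each pixel with the mean of its window neighbourhood
          local_mean = uniform_filter(level, size=window)
          c = (level - local_mean) ** 2
          # accumulate the per-level contrast at the original resolution
          contrast += zoom(c, (H / c.shape[0], W / c.shape[1]), order=1)
          # move to the next, coarser pyramid level
          level = gaussian_filter(level, sigma=1.0)[::2, ::2]
      return contrast / (contrast.max() + 1e-12)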

Implementation

Daily Research Log 2014-7-6

Learning to Detect A Salient Object, Xiaoou Tang

  • Our model
    • CRF model
    • Features
      • Energy E(A|I) = Σ_x Σ_k λ_k·F_k(a_x, I) + Σ_{x,x'} S(a_x, a_x', I): per-pixel salient-object feature terms plus a pairwise term for adjacent pixels
      • Salient object feature
        • For every pixel,
          • Salient object feature

            This is designed for finding the labeling A = {a} that minimizes sum(F):
            if a pixel is hypothesised to be salient, its feature response is negative; if not, it is positive.
          • Pairwise term: appearance difference multiplied by label difference. Penalizes adjacent pixels with similar color that are labeled differently. Naïve stuff.
    • CRF learning
      • Learn a weight lambda for each feature under the maximum-likelihood criterion, in the following sense
      • Objective function: maximize the sum of log-likelihoods of the ground-truth labelings over the training images

Daily Research Log 2014-7-5

The Secrets of Salient Object Segmentation, CVPR2014, Xiaodi Hou, Alan

  • Fixation vs. salient object
    • The intrinsic difference between gaze data and salient objects has long been neglected; the two are often evaluated on the same datasets
  • Dataset
    • Our dataset
      • Segment, fixation, salient object
    • Analysis
      • Ground-truth consistency
        • Measure the F-score by comparing thresholded binary maps of the user-labeled salient objects
      • Bias
        • Design bias
          • FT dataset, choose images with predominant salient objects
        • Center bias
  • Fixation → salient object
    • Fixation-based representations are disadvantaged in a salient object segmentation task
      • [1] R. Achanta, S. Hemami, F. Estrada, and S. Susstrunk. Frequency-tuned salient region detection. CVPR 2009
      • [4] A. Borji, D. N. Sihite, and L. Itti. Salient object detection: A benchmark. ECCV 2012
    • Traditionally two steps
      • Design a suitable representation for salient object segmentation
      • Saliency principles
    • Our Model
      • Overview: generate candidate segments, then rank them by fixations
      • Step I
        • Object proposal by CPMC
      • Step II
        • Salient segment ranking
          • Density of fixation
          • Non-uniform spatial distribution of fixations, e.g. fixations near the center of a segment increase the probability that the segment is salient
        • Learn a scoring function for each object candidate, based on the candidate mask and the fixation distribution map
          • Features: 33
            • Shape features
              • Major axis length, eccentricity, minor axis length, Euler number
            • Fixation distribution features
              • Align by major axis, extract 4*3 histogram
            • No appearance feature
      • Random forest regressor to learn the scoring function (see the scoring sketch after this list)
  • Model Validation
    • Test the upper-bound of the selector
      • Run algorithm on ground-truth segments using PASCAL-S
    • Test the performance of the CPMC segmentation algorithm
      • Use the first 200 segments to compare with the PASCAL-S segmentation ground truth
  • QUESTIONS & ANSWER
    • Q: Is fixation predicting algorithm good enough?
    • A: According to the authors,
      • In addition to the fixation prediction results, we also tested the F-measure of the ground-truth human fixations on IS and PASCAL-S.
      • When we remove the effect of center bias and dataset design bias, the performance of fixation algorithms becomes very competitive. We also notice that both fixation and salient object algorithms are on a par with human fixation data in F-measure.
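
A sketch of the Step II scoring idea above: simple shape features plus a coarse fixation-distribution histogram per candidate segment, fed to a random forest. The note lists 33 features; this sketch uses only a handful, skips the major-axis alignment, and the helper names and the overlap-score target are my own assumptions.

  import numpy as np
  from skimage.measure import label, regionprops
  from sklearn.ensemble import RandomForestRegressor

  def segment_features(mask, fixation_map, bins=(4, 3)):
      # mask: binary segment mask (H, W); fixation_map: fixation density (H, W)
      props = regionprops(label(mask.astype(int)))[0]
      shape = [props.major_axis_length, props.minor_axis_length,
               props.eccentricity, props.euler_number]
      # coarse 4x3 histogram of fixation density inside the segment's bounding box
      r0, c0, r1, c1 = props.bbox
      crop = fixation_map[r0:r1, c0:c1] * mask[r0:r1, c0:c1]
      hist = [cell.sum()
              for row in np.array_split(crop, bins[0], axis=0)
              for cell in np.array_split(row, bins[1], axis=1)]
      return np.array(shape + hist)

  # training sketch: score each CPMC candidate by its overlap with the labeled
  # salient object, then regress that score from the features
  # X = np.stack([segment_features(m, fix_map) for m in candidate_masks])
  # model = RandomForestRegressor(n_estimators=100).fit(X, overlap_scores)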

Learning to Detect A Salient Object, Xiaoou Tang

  • First quantitative evaluation dataset for visual attention algorithms
  • Most existing saliency algorithms are based on bottom-up computational framework
    • Steps
      • Feature extraction
      • Saliency computation
        • Center-surround operation, self-information, graph-based random walk
      • Find fixations, or sparse points by winner-take-all or inhibition-of-return
    • Problem
      • Finding fixations rather than where visual attention should be
      • Focus on low-level features rather than real attention
  • Our model
    • Incorporate top-down information about salient object
      • User label is considered to be top-down information
    • Local, regional, global features to define generic salient object
    • Use Conditional Random Field (CRF) learning
  • Our dataset
    • 20,000+ well labeled images
    • What is salient object? Multiple user labeling
    • Selected 60,000 out of 130,000 images containing a salient object or a distinctive foreground object
      Further selected 20,000 images for labeling

  • 3rd party
    • CPMC
      • Overview: generate an over-complete set of candidate object segments and assess each with an objectness score
      • Initializes foreground seeds uniformly over the image and computes min-cuts