- Transductive Unbiased Embedding for Zero-Shot Learning
- Frustum PointNets for 3D Object Detection from RGB-D Data
- Enhancing the Spatial Resolution of Stereo Images using a Parallax Prior
- DiverseNet: When One Right Answer Is Not Enough
- SSNet: Scale Selection Network for Online 3D Action Prediction
- Very Large-Scale Global SfM by Distributed Motion Averaging
- PAD-Net: Multi-Tasks Guided Prediction-and-Distillation Network for Simultaneous Depth Estimation and Scene Parsing
- Dynamic Feature Learning for Partial Face Recognition
- Context-aware Deep Feature Compression for High-speed Visual Tracking
- Between-class Learning for Image Classification
- DVQA: Understanding Data Visualizations via Question Answering
- Human Appearance Transfer
- Learning to Segment Every Thing
- Globally Optimal Inlier Set Maximization for Atlanta Frame Estimation
- Re-weighted Adversarial Adaptation Network for Unsupervised Domain Adaptation
- Learning to Compare: Relation Network for Few-Shot Learning
- Arbitrary Style Transfer with Deep Feature Reshuffle
- Dynamic Scene Deblurring Using Spatially Variant Recurrent Neural Networks
- Robust Video Content Alignment and Compensation for Rain Removal in a CNN Framework
- Guided Proofreading of Automatic Segmentations for Connectomics
- Deep PhaseNet for Video Frame Interpolation
- Context-aware Synthesis for Video Frame Interpolation
- Lean Multiclass Crowdsourcing
- Unsupervised Deep Generative Adversarial Hashing Network
- R-FCN-3000 at 30fps: Decoupling Detection and Classification
- Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challenge
- Gated Fusion Network for Single Image Dehazing
- Learning a Complete Image Indexing Pipeline
- Mask-guided Contrastive Attention Model for Person Re-Identification
- Learning Pose Specific Representations by Predicting Different Views
- Deep Mutual Learning
- Improving Occlusion and Hard Negative Handling for Single-Stage Object Detectors
- Defense against adversarial attacks using guided denoiser
- Learning Attentions: Residual Attentional Siamese Network for High Performance Online Visual Tracking
- Structure Inference Net: Object Detection Using Scene-Level Context and Instance-Level Relationships
- Decorrelated Batch Normalization
- On the Duality Between Retinex and Image Dehazing
- CSRNet: Dilated Convolutional Neural Networks for Understanding the Highly Congested Scenes
- The Perception-Distortion Tradeoff
- Image Blind Denoising With Generative Adversarial Network Based Noise Modeling
- Distort-and-Recover: Color Enhancement using Deep Reinforcement Learning
- A Low Power, High Throughput, Fully Event-Based Stereo System
- Regularizing RNNs for Caption Generation by Reconstructing The Past with The Present
- End-to-end Flow Correlation Tracking with Spatial-temporal Attention
- Exploiting Transitivity for Learning Person Re-identification Models on a Budget
- Imagination-IQA: No-reference Image Quality Assessment via Adversarial Learning
- Egocentric Activity Recognition on a Budget
- Person Transfer GAN to Bridge Domain Gap for Person Re-Identification
- Duplex Generative Adversarial Network for Unsupervised Domain Adaptation
- Fine-grained Video Captioning for Sports Narrative
- High Performance Visual Tracking with Siamese Region Proposal Network
- Adversarially Occluded Samples for Person Re-identification
- MatNet: Modular Attention Network for Referring Expression Comprehension
- Low-Latency Video Semantic Segmentation
- MapNet: An Allocentric Spatial Memory for Mapping Environments
- Fast End-to-End Trainable Guided Filter
- Partial Transfer Learning with Selective Adversarial Networks
- Reconstruction Network for Video Captioning
- Improving Landmark Localization with Semi-Supervised Learning
- Unsupervised Person Image Synthesis in Arbitrary Poses
- Efficient Large-scale Approximate Nearest Neighbor Search on OpenCL FPGA
- Deep End-to-End Time-of-Flight Imaging
- Augmenting Crowd-Sourced 3D Reconstructions using Semantic Detections
- DocUNet: Document Image Unwarping via A Stacked U-Net
- Geometry Aware Optimization for Deep Learning: The Good Practice
- Learning to Detect Features in Texture Images
- LiteFlowNet: A Lightweight Convolutional Neural Network for Optical Flow Estimation
- Spatially-Adaptive Filter Units for Deep Neural Networks
- Revisiting Video Saliency: A Large-scale Benchmark and a New Model
- Real-World Repetition Estimation by Div, Grad and Curl
- Learning Visual Knowledge Memory Networks for Visual Question Answering
- Attention-aware Compositional Network for Person Re-Identification
- Sim2Real View Invariant Visual Servoing by Recurrent Control
- Time-resolved Light Transport Decomposition for Thermal Photometric Stereo
- Trapping Light for Time of Flight
- A Unifying Contrast Maximization Framework for Event Cameras, with Applications to Motion, Depth, and Optical Flow Estimation
- Global versus Localized Generative Adversarial Nets
- Shift: A Zero FLOP, Zero Parameter Alternative to Spatial Convolutions
- Learning a Toolchain for Image Restoration
- CNN based Learning using Reflection and Retinex Models for Intrinsic Image Decomposition
- Feature Quantization for Defending Against Distortion of Images
- A Minimalist Approach to Type-Agnostic Detection of Quadrics in Point Clouds
- Quantization of Fully Convolutional Networks for Accurate Biomedical Image Segmentation
- Aperture Supervision for Monocular Depth Estimation
- Divide and Conquer for Full-Resolution Light Field Deblurring
- Multi-shot Pedestrian Re-identification via Sequential Decision Making
- Weakly-Supervised Semantic Segmentation by Iteratively Mining Common Object Features
- Depth-Aware Stereo Video Retargeting
- Multistage Adversarial Losses for Pose-Based Human Image Synthesis
- Multi-Content GAN for Few-Shot Font Style Transfer
- Multi-Cue Correlation Filters for Robust Visual Tracking
- A Causal And-Or Graph Model for Visibility Fluent Reasoning in Tracking Interacting Objects
- Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
- Improving Color Reproduction Accuracy in the Camera Imaging Pipeline
- Net2Vec: Quantifying and Explaining how Concepts are Encoded by Filters in Deep Neural Networks
- Sketch-a-Classifier: Sketch-based Photo Classifier Generation
- Learning Time/Memory-Efficient Deep Architectures with Budgeted Super Networks
- TOM-Net: Learning Transparent Object Matting from a Single Image
- Estimation of Camera Locations in Highly Corrupted Scenarios: All About the Base, No Shape Trouble
- Direction-aware Spatial Context Features for Shadow Detection
- Neural Motifs: Scene Graph Parsing with Global Context
- Object Referring in Videos with Language and Human Gaze
- Learning Transferable Architectures for Scalable Image Recognition
- View Extrapolation of Human Body from a Single Image
- Probabilistic Plant Modeling via Multi-View Image-to-Image Translation
- Learning a Discriminative Prior for Blind Image Deblurring
- Optimal Structured Light a la Carte
- Revisiting Deep Intrinsic Image Decompositions
- GAGAN: Geometry Aware Generative Adversarial Networks
- Learning Multi-grid Generative ConvNets by Minimal Contrastive Divergence
- Towards Faster Training of Global Covariance Pooling Networks by Iterative Matrix Square Root Normalization
- Diversity Regularized Spatiotemporal Attention for Video-based Person Re-identification
- Variational Autoencoders for Deforming 3D Mesh Models
- Rotation Averaging and Strong Duality
- 3D Hand Pose Estimation: From Current Achievements to Future Goals
- Benchmarking 6DOF Outdoor Visual Localization in Changing Conditions
- A Robust Generative Framework for Generalized Zero-Shot Learning
- Two can play this Game: Visual Dialog with Discriminative Visual Question Generation and Visual Question Answering
- Rotation-sensitive Regression for Oriented Scene Text Detection
- Adversarial Feature Augmentation for Unsupervised Domain Adaptation
- Deep Regression Forests for Age Estimation
- FOTS: Fast Oriented Text Spotting with a Unified Network
- SoS-RSC: A Sum-of-Squares Polynomial Approach to Robustifying Subspace Clustering Algorithms
- Efficient Subpixel Refinement with Symbolic Linear Predictors
- Self-Supervised Feature Learning by Learning to Spot Artifacts
- PointFusion: Deep Sensor Fusion for 3D Bounding Box Estimation
- Scale-recurrent Network for Deep Image Deblurring
- Multi-Cell Classification by Convolutional Dictionary Learning with Class Proportion Priors
- Social GAN: Socially Acceptable Trajectories with Generative Adversarial Networks
- On the convergence of PatchMatch and its variants
- Clinical Skin Lesion Diagnosis using Representations Inspired by Dermatologist Criteria
- PoTion: Pose MoTion Representation for Action Recognition
- Zigzag Learning for Weakly Supervised Object Detection
- VITAL: VIsual Tracking via Adversarial Learning
- Crowd Counting with Deep Negative Correlation Learning
- Multi-Label Zero-Shot Learning with Structured Knowledge Graphs
- Learning a Discriminative Filter Bank within a CNN for Fine-grained Recognition
- A Closer Look at Spatiotemporal Convolutions for Action Recognition
- Dual Attention Matching Network for Context-Aware Feature Sequence based Person Re-Identification
- End-to-End Deep Kronecker-Product Matching for Person Re-identification
- Consensus Maximization for Semantic Region Correspondences
- SBNet: Sparse Blocks Network for Fast Inference
- Action Sets: Weakly Supervised Action Segmentation without Ordering Constraints
- Group Consistent Similarity Learning via Deep CRFs for Person Re-Identification
- Now You Shake Me: Towards Automatic 4D Cinema
- Defocus Blur Detection via Multi-Stream Bottom-Top-Bottom Fully Convolutional Network
- Interpret Neural Networks by Identifying Critical Data Routing Paths
- Deep Reinforcement Learning of Region Proposal Networks for Object Detection
- Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics
- Finding "It": Weakly-Supervised Reference-Aware Visual Grounding in Instructional Video
- Semantic Visual Localization
- DeblurGAN: Blind Motion Deblurring Using Conditional Adversarial Networks
- Composing Two Objects of Interest for Flying Camera Photography
- Kernelized Subspace Pooling for Deep Local Descriptors
- Learning to Generate Time-Lapse Videos Using Multi-Stage Dynamic Generative Adversarial Networks
- Stochastic Downsampling for Cost-Adjustable Inference and Improved Regularization in Convolutional Networks
- Deep Lesion Graph in the Wild: Relationship Learning and Organization of Significant Radiology Image Findings in a Diverse Large-scale Lesion Database
- An Efficient and Provable Approach for Mixture Proportion Estimation Using Linear Independence Assumption
- Eliminating Background-bias for Robust Person Re-identification
- Geometry-Aware Network for Non-Rigid Shape Prediction from a Single View
- High-order tensor regularization with application to attribute ranking
- Taskonomy: Disentangling Task Transfer Learning
- BlockDrop: Dynamic Inference Paths in Residual Networks
- Attend and Interact: Higher-Order Object Interactions for Video Understanding
- Bilateral Ordinal Relevance Multi-instance Regression for Facial Action Unit Intensity Estimation
- CarFusion: Combining Point Tracking and Part Detection for Dynamic 3D Reconstruction of Vehicles
- Transferable Joint Attribute-Identity Deep Learning for Unsupervised Person Re-Identification
- Large Scale Fine-Grained Categorization and the Effectiveness of Domain-Specific Transfer Learning
- BPGrad: Towards Global Optimality in Deep Learning via Branch and Pruning
- Improved Human Pose Estimation through Adversarial Data Augmentation
- ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices
- SINT++: Robust Visual Tracking via Adversarial Hard Positive Generation
- Structured Uncertainty Prediction Networks
- Geometry-Guided CNN for Self-supervised Video Representation learning
- Low-Shot Recognition with Imprinted Weights
- Self-supervised Learning of Geometrically Stable Features Through Probabilistic Introspection
- Disentangling Structure and Aesthetics for Content-aware Image Completion
- A Volumetric Descriptive Network for 3D Object Synthesis
- Interpretable Convolutional Neural Networks
- Single Image Dehazing via Conditional Generative Adversarial Network
- Neural Inverse Kinematics for Unsupervised Motion Retargetting
- Environment Upgrade Reinforcement Learning for Non-differentiable Multi-stage Pipelines
- Teaching Categories to Human Learners with Visual Explanations
- Facelet-Bank for Fast Portrait Manipulation
- Convolutional Sequence to Sequence Model for Human Dynamics
- Human Semantic Parsing for Person Re-identification
- Latent RANSAC
- LiDAR-Video Driving Dataset: Learning Driving Policies Effectively
- Actor and Observer: Joint Modeling of First and Third-Person Videos
- Controllable Video Generation with Sparse Trajectories
- What have we learned from deep representations for action recognition?
- Bidirectional Attentive Fusion with Context Gating for Dense Video Captioning
- Language-Based Image Editing with Recurrent Attentive Models
- Graph-Cut RANSAC
- Optimizing Filter Size in Convolutional Neural Networks for Facial Action Unit Recognition
- Memory Based Online Learning of Deep Representations from Video Streams
- Deep Layer Aggregation
- Learning Convolutional Networks for Content-weighted Image Compression
- Self-supervised Multi-level Face Model Learning for Monocular Reconstruction at over 250Hz
- Efficient, sparse representation of manifold distance matrices for classical scaling
- Visual to Sound: Generating Natural Sound for Videos in the Wild
- A Prior-Less Method for Multi-Face Tracking in Unconstrained Videos
- Real-Time Rotation-Invariant Face Detection with Progressive Calibration Networks
- Self-calibrating polarising radiometric calibration
- Pix3D: Dataset and Methods for 3D Object Modeling from a Single Image
- Learning to Promote Saliency Detectors
- Pose Transferrable Person Re-Identification
- Hashing as Tie-Aware Learning to Rank
- Baseline Desensitizing In Translation Averaging
- Conditional Image-to-Image Translation
- Blind Predicting Similar Quality Map for Image Quality Assessment
- Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?
- CNN Driven Sparse Multi-Level B-spline Image Registration
- Through-Wall Human Pose Estimation Using Radio Signals
- xUnit: Learning a Spatial Activation Function for Efficient Image Restoration
- CLIP-Q: Deep Network Compression Learning by In-Parallel Pruning-Quantization
- FoldingNet: Interpretable Unsupervised Learning on 3D Point Clouds
- Weakly Supervised Coupled Networks for Visual Sentiment Analysis
- Ring loss: Convex Feature Normalization for Face Recognition
- Fast Spectral Ranking for Similarity Search
- PackNet: Adding Multiple Tasks to a Single Network by Iterative Pruning
- AMNet: Memorability Estimation with Attention
- Webly Supervised Learning Meets Zero-shot Learning: A Hybrid Approach for Fine-grained Classification
- End-to-End Learning of Motion Representation for Video Understanding
- Smooth Neighbors on Teacher Graphs for Semi-supervised Learning
- SeedNet: Automatic Seed Generation with Deep Reinforcement Learning for Robust Interactive Segmentation
- Deep Spatio-Temporal Random Fields for Efficient Video Segmentation
- Perturbative Neural Networks: Rethinking Convolution in CNNs
- SYQ: Learning Symmetric Quantization For Efficient Deep Neural Networks
- Neural 3D Mesh Renderer
- Deep Parametric Continuous Convolutional Neural Networks
- Visual Question Reasoning on General Dependency Tree
- Non-local Neural Networks
- Light field intrinsics with a deep encoder-decoder network
- Feature Space Transfer for Data Augmentation
- Motion Segmentation by Exploiting Complementary Geometric Models
- Context Contrasted Feature and Gated Multi-scale Aggregation for Scene Segmentation
- Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation
- Towards a Mathematical Understanding of the Difficulty in Learning with Feedforward Neural Networks
- Few-Shot Image Recognition by Predicting Parameters from Activations
- Deep Video Super-Resolution Network Using Dynamic Upsampling Filters Without Explicit Motion Compensation
- CLEAR: Cumulative LEARning for One-Shot One-Class Image Recognition
- Pose-Robust Face Recognition via Deep Residual Equivariant Mapping
- Deep Cross-media Knowledge Transfer
- Large-scale Point Cloud Semantic Segmentation with Superpoint Graphs
- A Weighted Sparse Sampling and Smoothing Frame Transition Approach for Semantic Fast-Forward First-Person Videos
- Recurrent Slice Networks for 3D Segmentation on Point Clouds
- Dimensionality's Blessing: Detecting the distributions underlying images
- Augmented Skeleton Space Transfer for Depth-based Hand Pose Estimation
- Robust Classification with Convolutional Prototype Learning
- DecideNet: Counting Varying Density Crowds Through Attention Guided Detection and Density Estimation
- ICE-BA: Incremental, Consistent and Efficient Bundle Adjustment for Visual-Inertial SLAM
- Grounding Referring Expressions in Images by Variational Context
- Pseudo-Mask Augmented Object Detection
- Improvements to context based self-supervised learning
- Left-Right Comparative Recurrent Model for Stereo Matching
- Learning deep structured active contours end-to-end
- Efficient and Deep Person Re-Identification using Multi-Level Similarity
- Learning Intrinsic Image Decomposition from Watching the World
- Learning to Understand Image Blur
- Gaze Prediction in Dynamic $360^\circ$ Immersive Videos
- Emotional Attention: A Study of Image Sentiment and Visual Attention
- Single View Stereo Matching
- Zero-shot Recognition via Semantic Embeddings and Knowledge Graphs
- Video Representation Learning Using Discriminative Pooling
- Probabilistic Joint Face-Skull Modelling for Facial Reconstruction
- Indoor RGB-D Compass from a Single Line and Plane
- pOSE: Pseudo Object Space Error for Initialization-Free Bundle Adjustment
- Generative Adversarial Learning Towards Fast Weakly Supervised Detection
- Seeing Temporal Modulation of Lights from Standard Cameras
- Shape from Shading through Shape Evolution
- Parallel Attention: A Unified Framework for Visual Object Discovery through Dialogs and Queries
- Neural Style Transfer via Meta Networks
- UV-GAN: Adversarial Facial UV Map Completion for Pose-invariant Face Recognition
- Cascaded Pyramid Network for Multi-Person Pose Estimation
- Detect-and-Track: Efficient Pose Estimation in Videos
- SobolevFusion: 3D Reconstruction of Scenes Undergoing Free Non-rigid Motion
- NAG: Network for Adversary Generation
- Inferring Co-Attention in Social Scene Videos
- Unsupervised Learning of Single View Depth Estimation and Visual Odometry with Deep Feature Reconstruction
- Egocentric Basketball Motion Planning from a Single First-Person Image
- Geometric robustness of deep networks: analysis and improvement
- Pose-Guided Photorealistic Face Rotation
- Super SloMo: High Quality Estimation of Multiple Intermediate Frames for Video Interpolation
- Importance Weighted Adversarial Nets for Partial Domain Adaptation
- Towards High Performance Video Object Detection
- SurfConv: Bridging 3D and 2D Convolution for RGBD Images
- People, Penguins and Petri Dishes: Adapting Object Counting Models To New Visual Domains And Object Types Without Forgetting
- Fully Convolutional Adaptation Networks for Semantic Segmentation
- Towards Pose Invariant Face Recognition in the Wild
- Interactive Image Segmentation with Latent Diversity
- Label Denoising Adversarial Network (LDAN) for Inverse Lighting of Face Images
- Detecting and Recognizing Human-Object Interactions
- Deep Image Prior
- 2D/3D Pose Estimation and Action Recognition using Multitask Deep Learning
- Direct Shape Regression Networks for End-to-End Face Alignment
- Disentangling Features in 3D Face Shapes for Joint Face Reconstruction and Recognition
- Scale-Transferrable Object Detection
- Learning by Asking Questions
- 3D Pose Estimation and 3D Model Retrieval for Objects in the Wild
- Deep Progressive Reinforcement Learning for Skeleton-based Action Recognition
- Future Person Localization in First-Person Videos
- 3D-RCNN: Instance-level 3D Scene Understanding via Render-and-Compare
- Manifold Learning in Quotient Spaces
- Image Correction via Deep Reciprocating HDR Transformation
- Focus Manipulation Detection via Photometric Histogram Analysis
- Density Adaptive Point Set Registration
- Multi-view Harmonized Bilinear Network for 3D Object Recognition
- SeGAN: Segmenting and Generating the Invisible
- VizWiz Grand Challenge: Answering Visual Questions from Blind People
- Sparse, Smart Contours to Represent and Edit Images
- Generative Non-Rigid Shape Completion with Graph Convolutional Autoencoders
- The power of ensembles for active learning in image classification
- OLÉ: Orthogonal Low-rank Embedding, A Plug and Play Geometric Loss for Deep Learning
- Learning Compositional Visual Concepts with Mutual Consistency
- Adversarial Complementary Learning for Weakly Supervised Object Localization
- Analytical Modeling of Vanishing Points and Curves in Catadioptric Cameras
- Exploit the Unknown Gradually: One-Shot Video-Based Person Re-Identification by Stepwise Learning
- Learning to Sketch with Shortcut Cycle Consistency
- Domain Adaptive Faster R-CNN for Object Detection in the Wild
- Attentive Generative Adversarial Network for Raindrop Removal from A Single Image
- Independently Recurrent Neural Network (IndRNN): Building A Longer and Deeper RNN
- Making Convolutional Networks Recurrent for Visual Sequence Learning
- Multi-Task Adversarial Network for Disentangled Feature Learning
- Fight ill-posedness with ill-posedness: Single-shot variational depth super-resolution from shading
- Zero-Shot Sketch-Image Hashing
- Learning to Localize Sound Source in Visual Scenes
- Cross-Domain Weakly-Supervised Object Detection through Progressive Domain Adaptation
- Semi-parametric Image Synthesis
- Multi-scale Location-aware Kernel Representation for Object Detection
- W2F: A Weakly-Supervised to Fully-Supervised Framework for Object Detection
- Generative Modeling using the Sliced Wasserstein Distance
- MX-LSTM: mixing tracklets and vislets to jointly forecast trajectories and head poses
- Dynamic Video Segmentation Network
- Learning a Discriminative Feature Network for Semantic Segmentation
- Video Person Re-identification with Competitive Snippet-similarity Aggregation and Co-attentive Snippet Embedding
- Curve Reconstruction via the Global Statistics of Natural Curves
- Single-Shot Refinement Neural Network for Object Detection
- Density-aware Single Image De-raining using a Multi-stream Dense Network
- Learning Answer Embeddings for Visual Question Answering
- Attention Clusters: Purely Attention Based Local Feature Integration for Video Classification
- Translating and Segmenting Multimodal Medical Volumes with Cycle- and Shape-Consistency Generative Adversarial Network
- Learning from the Deep: A Revised Underwater Image Formation Model
- Mean-Variance Loss for Deep Age Estimation from a Face
- Disentangled Person Image Generation
- Deep Sparse Coding for Invariant Multimodal Halle Berry Neurons
- DeepMVS: Learning Multi-View Stereopsis
- Embodied Question Answering
- Deflecting Adversarial Attacks with Pixel Deflection
- Dynamic-Structured Semantic Propagation Network
- Integrated facial landmark localization and super-resolution of real-world very low resolution faces in arbitrary poses with GANs
- A Two-Step Disentanglement Method
- Towards Effective Low-bitwidth Convolutional Neural Networks
- Natural and Effective Obfuscation by Head Inpainting
- "Learning-Compression" algorithms for neural net pruning
- Salient Object Detection Driven by Fixation Prediction
- Scalable Dense Non-rigid Structure-from-Motion: A Grassmannian Perspective
- Uncalibrated Photometric Stereo under Natural Illumination
- Learning Monocular 3D Human Pose estimation on weakly-supervised Multi-view Images
- An Unsupervised Learning Model for Deformable Medical Image Registration
- Learning Deep Correspondence through Prior and Posterior Feature Constancy
- Anticipating Traffic Accidents with Adaptive Loss and Large-scale Incident DB
- A2-RL: Aesthetics Aware Reinforcement Learning for Image Cropping
- Learned Shape-Tailored Descriptors for Segmentation
- One-shot Action Localization by Sequence Matching Network
- Robust Physical-World Attacks on Deep Learning Visual Classification
- What Makes a Video a Video: Analyzing Temporal Information in Video Understanding Models and Datasets
- Bidirectional Retrieval Made Simple
- Reward Learning by Instruction
- MegaDepth: Learning Single-View Depth Prediction from Internet Photos
- Cross-Dataset Adaptation for Visual Question Answering
- Interpretable Video Captioning via Trajectory Structured Localization
- MoCoGAN: Decomposing Motion and Content for Video Generation
- Left/Right Asymmetric Layer Skippable Networks
- Learning Pixel-level Semantic Affinity with Image-level Supervision for Weakly Supervised Semantic Segmentation
- Unsupervised Discovery of Object Landmarks as Structural Representations
- Learning Deep Descriptors with Scale-Aware Triplet Networks
- Robust Depth Estimation from Auto Bracketed Images
- Aligning Infinite-Dimensional Covariance Matrices in Reproducing Kernel Hilbert Spaces for Domain Adaptation
- Local and Global Optimization Techniques in Graph-based Clustering
- Learning from Millions of 3D Scans for Large-scale 3D Face Recognition
- CBMV: A Coalesced Bidirectional Matching Volume for Disparity Estimation
- Image Collection Pop-up: 3D Reconstruction and Clustering of Rigid and Non-Rigid Categories
- Ordinal Depth Supervision for 3D Human Pose Estimation
- Learning to Hash by Discrepancy Minimization
- MapNet: Geometry-Aware Learning of Maps for Camera Localization
- Im2Struct: Recovering 3D Shape Structure from a Single RGB Image
- A Pose-Sensitive Embedding for Person Re-Identification with Expanded Cross Neighborhood Re-Ranking
- Analytic Expressions for Probabilistic Moments of PL-DNN with Gaussian Input
- Cross-Domain Self-supervised Multi-task Feature Learning Using Synthetic Game Imagery
- Coding Kendall's Shape Trajectories for 3D Action Recognition
- Camera Pose Estimation with Unknown Principal Point
- Learning Spatial-Aware Regressions for Visual Tracking
- The Easy, The Medium and The Hard: Adapting Across Varied Domain Shifts
- Detach and Adapt: Learning Cross-Domain Disentangled Deep Representation
- A Hybrid L1-L0 Layer Decomposition Model for Tone Mapping
- LIME: Live Intrinsic Material Estimation
- Learning Representations for Single Cells in Microscopy Images
- Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning
- clcNet: Improving the Efficiency of Convolutional Neural Network using Channel Local Convolutions
- Spanning Patches: Deep Patch Selection for Fast Multi-View Stereo
- LAMV: Learning to align and match videos with kernelized temporal layers
- Single Image Reflection Separation with Perceptual Losses
- Structure from Recurrent Motion: From Rigidity to Recurrency
- Customized Image Narrative Generation via Interactive Visual Question Generation and Answering
- Relation Networks for Object Detection
- An End-to-End TextSpotter with Explicit Alignment and Attention
- Photometric Stereo in Participating Media Considering Shape-Dependent Forward Scatter
- Sliced Wasserstein Distance for Learning Gaussian Mixture Models
- Generative Adversarial Image Synthesis with Decision Tree Latent Controller
- Disentangling 3D Pose in A Dendritic CNN for Unconstrained 2D Face Alignment
- Learning Multi-Instance Enriched Image Representation via Non-Greedy Simultaneous L1-Norm Minimization and Maximization
- Separating Self-Expression and Visual Content in Hashtag Supervision
- Residual Dense Network for Image Super-Resolution
- Hand PointNet: 3D Hand Pose Estimation using Point Sets
- Human-centric Indoor Scene Synthesis Using Stochastic Grammar
- Learning Facial Action Units from Web Images with Scalable Weakly Supervised Clustering
- Occlusion Aware Unsupervised Learning of Optical Flow
- Domain Generalization with Adversarial Feature Learning
- A Hierarchical Generative Model for Eye Image Synthesis and Eye Gaze Estimation
- PlaneNet: Piece-wise Planar Reconstruction from a Single RGB Image
- Deep Learning under Privileged Information Using Heteroscedastic Dropout
- Frame-Recurrent Video Super-Resolution
- Nonlocal Low-Rank Tensor Factor Analysis for Image Restoration
- Content-Sensitive Supervoxels via Uniform Tessellations on Video Manifolds
- Planar Shape Detection at Structural Scales
- Revisiting Oxford and Paris: Large-Scale Image Retrieval Benchmarking
- Learning to Parse Wireframes in Images of Man-Made Environments
- Harmonious Attention Network for Person Re-Identification
- Wing Loss for Robust Facial Landmark Localisation with Convolutional Neural Networks
- Every Smile is Unique: Landmark-guided Diverse Smile Generation
- Multi-Scale Weighted Nuclear Norm Image Restoration
- FeaStNet: Feature-Steered Graph Convolutions for 3D Shape Analysis
- Lightweight Probabilistic Deep Networks
- Learning Depth from Monocular Videos using Direct Methods
- Thoracic Disease Identification and Localization with Limited Supervision
- SGPN: Similarity Group Proposal Network for 3D Point Cloud Instance Segmentation
- Memory Matching Networks for One-Shot Image Recognition
- Compressed Video Action Recognition
- FFNet: Video Fast-Forwarding via Reinforcement Learning
- Representing and Learning High Dimensional Data with the Optimal Transport Map from a Probabilistic Viewpoint
- ScanComplete: Large-Scale Scene Completion and Semantic Segmentation for 3D Scans
- Fully Convolutional Attention Network for Multimodal Reasoning
- Lions and Tigers and Bears: Capturing Non-Rigid, 3D, Articulated Shape from Images
- Recurrent Pixel Embedding for Instance Grouping
- Name-removed-for-review: A Multi-camera HD Dataset for Dense Unscripted Pedestrian Detection
- SGAN: An Alternative Training of Generative Adversarial Networks
- Learning Markov Clustering Networks for Scene Text Detection
- Occlusion-Aware Rolling Shutter Rectification of 3D Scenes
- Beyond Gröbner Bases: Basis Selection for Minimal Solvers
- Improving Object Localization with Fitness NMS and Bounded IoU Loss
- Generative Adversarial Perturbations
- Deep Photo Enhancer: Unsupervised Learning of Image Enhancement from Photographs with GANs
- Eye In-Painting with Exemplar Generative Adversarial Networks
- Encoder-Decoder Alignment for Zero-Pair Image-to-Image Translation
- Learning Structure and Strength of CNN Filters for Small Sample Size Training
- Path Aggregation Network for Instance Segmentation
- Learning Superpixels with Segmentation-Aware Affinity Loss
- Data Distillation: Towards Omni-Supervised Learning
- Deep Diffeomorphic Transformer Networks
- CodeSLAM --- Learning a Compact, Optimisable Representation for Dense Visual SLAM
- Glimpse Clouds: Human Activity Recognition from Unstructured Feature Points
- Learning Latent Super-Events to Detect Multiple Activities in Videos
- MegDet: A Large Mini-Batch Object Detector
- Lose The Views: Limited Angle CT Reconstruction via Implicit Sinogram Completion
- Unsupervised Domain Adaptation with Similarity-Based Classifier
- Visual Feature Attribution using Wasserstein GANs
- Tell Me Where To Look: Guided Attention Inference Network
- Towards Open-Set Identity Preserving Face Synthesis
- Unsupervised Feature Learning via Non-Parametric Instance-level Discrimination
- Multi-Evidence Fusion and Filtering for Weakly Supervised Object Recognition, Detection and Segmentation
- Deep Material-aware Cross-spectral Stereo Matching
- MakeupGAN: Makeup Transfer via Cycle-Consistent Adversarial Networks
- M3: Multimodal Memory Modelling for Video Captioning
- Fooling Vision and Language Models Despite Localization and Attention Mechanism
- Total Capture: A 3D Deformation Model for Tracking Faces, Hands, and Bodies
- Jointly Localizing and Describing Events for Dense Video Captioning
- The Best of Both Worlds: Combining CNNs and Geometric Constraints for Hierarchical Motion Segmentation
- End-to-end learning of keypoint detector and descriptor for pose invariant 3D matching
- LDMNet: Low Dimensional Manifold Regularized Neural Networks
- 3D Human Pose Estimation in the Wild by Adversarial Learning
- Fast Video Object Segmentation by Reference-Guided Mask Propagation
- End-to-End Dense Video Captioning with Masked Transformer
- Towards dense object tracking in a 2D honeybee hive
- Appearance-and-Relation Networks for Video Classification
- StarGAN: Unified Generative Adversarial Networks for Controllable Multi-Domain Image-to-Image Translation
- Answer with Grounding Snippets: Focal Visual-Text Attention for Visual Question Answering
- GANerated Hands for Real-Time 3D Hand Tracking from Monocular RGB
- Weakly Supervised Human Body Part Parsing via Pose-Guided Knowledge Transfer
- ClusterNet: Detecting Small Objects in Large Scenes by Exploiting Spatio-Temporal Information
- Structured Set Matching Networks for One-Shot Part Labeling
- Real-Time Seamless Single Shot 6D Object Pose Prediction
- Triplet-Center Loss for Multi-View 3D Object Retrieval
- Pixels, voxels, and views: A study of shape representations for single view 3D object shape prediction
- Show Me a Story: Towards Coherent Neural Story Illustration
- DeLS-3D: Deep Localization and Segmentation with a 3D Semantic Map
- Missing Slice Recovery for Tensors Using a Low-rank Model in Embedded Space
- 3D Semantic Segmentation with Submanifold Sparse Convolutional Networks
- Learning Compact Recurrent Neural Networks with Block-Term Tensor Decomposition
- Link and code: Fast indexing with graphs and compact regression codes
- Two-Stream Convolutional Networks for Dynamic Texture Synthesis
- Weakly Supervised Action Localization by Sparse Temporal Pooling Network
- Viewpoint-aware Video Summarization
- 4D Human Body Correspondences from Panoramic Depth Maps
- Tighter Lifting-Free Convex Relaxations for Quadratic Matching Problems
- Discovering Point Lights with Intensity Distance Fields
- The Lovász-Softmax loss: A tractable surrogate for the optimization of the intersection-over-union measure in neural networks
- Geometry-aware Deep Network for Single-Image Novel View Synthesis
- Temporal Deformable Residual Networks for Action Segmentation in Videos
- Seeing Small Faces from Robust Anchor's Perspective
- Matryoshka Networks: Predicting 3D Geometry via Nested Shape Layers
- On the Importance of Label Quality for Semantic Segmentation
- AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions
- First-Person Hand Action Benchmark with RGB-D Videos and 3D Hand Pose Annotations
- Learning Deep Sketch Abstraction
- Non-Linear Temporal Subspace Representations for Activity Recognition
- A Biresolution Spectral framework for Product Quantization
- Unsupervised Cross-dataset Person Re-identification by Transfer Learning of Spatio-temporal Patterns
- Feature Super-Resolution: Make Machine See More Clearly
- Finding Tiny Faces in the Wild with Generative Adversarial Network
- DoubleFusion: Real-time Capture of Human Performance with Inner Body Shape from a Single Depth Sensor
- Deep Unsupervised Saliency Detection: A Multiple Noisy Labeling Perspective
- Multi-view Consistency as Supervisory Signal for Learning Shape and Pose Prediction
- Recognize Actions by Disentangling Components of Dynamics
- Who Let The Dogs Out? Modeling Dog Behavior From Visual Data
- Alive Caricature from 2D to 3D
- Learning Steerable Filters for Rotation Equivariant CNNs
- From source to target and back: Symmetric Bi-Directional Adaptive GAN
- Monocular Relative Depth Perception with Web Stereo Data Supervision
- Correlation Tracking via Joint Discrimination and Reliability Learning
- Boosting Domain Adaptation by Discovering Latent Domains
- HSA-RNN: Hierarchical Structure-Adaptive RNN for Video Summarization
- Learning from Noisy Web Data with Category-level Supervision
- Embodied Real-World Active Perception
- Boosting Self-Supervised Learning via Knowledge Transfer
- Video Captioning via Hierarchical Reinforcement Learning
- Weakly Supervised Phrase Localization with Multi-Scale Anchored Transformer Network
- Progressively Complementarity-aware Fusion Network for RGB-D Salient Object Detection
- Wide Compression: Tensor Ring Nets
- Demo2Vec: Reasoning Object Affordances from Online Videos
- A High-Quality Denoising Dataset for Smartphone Cameras
- Collaborative and Adversarial Network for Unsupervised domain adaptation
- End-to-end weakly-supervised semantic alignment
- Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference
- Feature Selective Networks for Object Detection
- Unsupervised Learning of Depth and Egomotion from Monocular Video Using 3D Geometric Constraints
- A Common Framework for Interactive Texture Transfer
- Depth and Transient Imaging with Compressive SPAD Array Cameras
- PointGrid: A Deep Network for 3D Shape Understanding
- A Network Architecture for Point Cloud Classification via Automatic Depth Images Generation
- Optimizing Local Feature Descriptors for Nearest Neighbor Matching
- 4DFAB: A Large Scale 4D Database for Facial Expression Analysis and Biometric Applications
- Zoom and Learn: Generalizing Deep Stereo Matching to Novel Domains
- Photographic Text-to-Image Synthesis with a Hierarchically-nested Adversarial Network
- Stacked Conditional Generative Adversarial Networks for Jointly Learning Shadow Detection and Shadow Removal
- What do Deep Networks Like to See?
- On the Robustness of Semantic Segmentation Models to Adversarial Attacks
- SketchMate: Deep Hashing for Million-Scale Human Sketch Retrieval
- Progressive Attention Guided Recurrent Network for Salient Object Detection
- IQA: Visual Question Answering in Interactive Environments
- Boosting Adversarial Attacks with Momentum
- Conditional Probability Models for Deep Image Compression
- Cascade R-CNN: Delving into High Quality Object Detection
- Scalable and Effective Deep CCA via Soft Decorrelation
- Discriminability objective for training descriptive captions
- Going from Image to Video Saliency: Augmenting Image Salience with Dynamic Attentional Push
- Recurrent Scene Parsing with Perspective Understanding in the Loop
- Semantic Video Segmentation by Gated Recurrent Flow Propagation
- FlipDial: A Generative Model for Two-Way Visual Dialogue
- Context Encoding for Semantic Segmentation
- Deep Marching Cubes: Learning Explicit Surface Representations
- Rethinking Feature Distribution for Loss Functions in Image Classification
- Optical Flow Guided Feature: A Motion Representation for Video Action Recognition
- Multimodal Explanations: Justifying Decisions and Pointing to the Evidence
- HATS: Histograms of Averaged Time Surfaces for Robust Event-based Object Classification
- Imagine it for me: Generative Adversarial Approach for Zero-Shot Learning from Noisy Texts
- Co-Occurrence Template Matching
- Defense against Universal Adversarial Perturbations
- PPFNet: Global Context Aware Local Features for Robust 3D Point Matching
- Dynamic Zoom-in Network for Fast Object Detection in Large Images
- Objects as context for detecting their semantic parts
- Spline Error Weighting for Robust Visual-Inertial Fusion
- GeoNet: Geometric Neural Network for Joint Depth and Surface Normal Estimation
- Where and Why Are They Looking? Jointly Inferring Human Attention and Intentions in Complex Tasks
- Robust Facial Landmark Detection via a Fully-Convolutional Local-Global Context Network
- Fast and Furious: Real Time End-to-End 3D Detection, Tracking and Motion Forecasting with a Single Convolutional Net
- CondenseNet: An Efficient DenseNet using Learned Group Convolutions
- Burst Denoising with Kernel Prediction Networks
- Leveraging Unlabeled Data for Crowd Counting by Learning to Rank
- Recurrent Saliency Transformation Network: Incorporating Multi-Stage Visual Cues for Small Organ Segmentation
- Classifier Learning with Prior Probabilities for Facial Action Unit Recognition
- Active Fixation Control to Predict Saccade Sequences
- Reflection Removal for Large-Scale 3D Point Clouds
- Mesoscopic Facial Geometry inference Using Deep Neural Networks
- VITON: An Image-based Virtual Try-on Network
- Beyond the Pixel-Wise Loss for Topology-Aware Delineation
- HashGAN: Deep Learning to Hash with Pair Conditional Wasserstein GAN
- A Globally Optimal Solution to the Non-Minimal Relative Pose Problem
- Learning distributions of shape trajectories from longitudinal datasets: a hierarchical model on a manifold of diffeomorphisms
- Multispectral Image Intrinsic Decomposition via Low Rank Constraint
- Dynamic Graph Generation Network: Generating Relational Knowledge from Diagrams
- Alternating-Stereo VINS: Observability Analysis and Performance Evaluation
- Im2Pano3D: Extrapolating 360 Structure and Semantics Beyond the Field of View
- Style Aggregated Network for Facial Landmark Detection
- VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection
- Supervision-by-Registration: An Unsupervised Approach to Improve the Precision of Facial Landmark Detectors
- Deep Adversarial Subspace Clustering
- Compassionately Conservative Balanced Cuts for Image Segmentation
- Deformable GANs for Pose-based Human Image Generation
- Avatar-Net: Multi-scale Zero-shot Style Transfer by Feature Decoration
- The iNaturalist Species Classification and Detection Dataset
- Categorizing Concepts with Basic Level for Vision-to-Language
- InverseFaceNet: Deep Monocular Inverse Face Rendering at over 250 Hz
- Textbook Question Answering under Teacher Guidance with Memory Networks
- Learning to Find Good Correspondences
- Hyperparameter Optimization for Tracking with Continuous Deep Q-Learning
- Adversarial Data Programming: Using GANs to Relax the Bottleneck of Curated Labeled Data
- Weakly Supervised Facial Action Unit Recognition through Adversarial Training
- Knowledge Aided Consistency for Weakly Supervised Phrase Grounding
- Neighbors Do Help: Deeply Exploiting Local Structures of Point Clouds
- The Unreasonable Effectiveness of Deep Features as a Perceptual Metric
- Dense 3D Regression for Hand Pose Estimation
- Detail-Preserving Pooling in Deep Networks
- Dense Decoder Shortcut Connections for Single-Pass Semantic Segmentation
- Reinforcement Cutting-Agent Learning for Video Object Segmentation
- SketchyGAN: Towards Diverse and Realistic Sketch to Image Synthesis
- Wrapped Gaussian Process Regression on Riemannian Manifolds
- Document Enhancement using Visibility Detection
- Learning Discriminative Evaluation Metrics for Image Captioning
- GraphBit: Bitwise Interaction Mining via Deep Reinforcement Learning
- Learning Intelligent Dialogs for Bounding Box Annotation
- Efficient Diverse Ensemble for Discriminative Co-Tracking
- Recovering Realistic Texture in Image Super-resolution by Spatial Feature Modulation
- Mining on Manifolds: Metric Learning without Labels
- Revisiting knowledge transfer for training object class detectors
- GeoNet: Unsupervised Learning of Dense Depth, Optical Flow and Camera Pose
- Differential Attention for Visual Question Answering
- A PID Controller Approach for Stochastic Optimization of Deep Networks
- Bootstrapping the Performance of Webly Supervised Semantic Segmentation
- Iterative Learning with Open-set Noisy Labels
- A Papier-Mâché Approach to Learning 3D Surface Generation
- Extreme 3D Face Reconstruction: Looking Past Occlusions
- High-speed Tracking with Multi-kernel Correlation Filters
- Attentive Fashion Grammar Network for Fashion Landmark Detection and Clothing Category Classification
- Separating Style and Content for Generalized Style Transfer
- Learning Dual Convolutional Neural Networks for Low-Level Vision
- Wasserstein Introspective Neural Networks
- Deep Semantic Face Deblurring
- InLoc: Indoor Visual Localization with Dense Matching and View Synthesis
- Temporal Hallucinating for Action Recognition with Few Still Images
- Deep Texture Manifold for Ground Terrain Recognition
- Discriminative Learning of Latent Features for Zero-Shot Recognition
- Neural Sign Language Translation
- GroupCap: Group-based Image Captioning with Structured Relevance and Diversity Constraints
- Repulsion Loss: Detecting Pedestrians in a Crowd
- Pulling Actions out of Context: Explicit Separation for Effective Combination
- Deep Group-shuffling Random Walk for Person Re-identification
- DenseASPP: Densely Connected Networks for Semantic Segmentation
- A Variational U-Net for Conditional Appearance and Shape Generation
- Universal Denoising Networks: A Novel CNN-based Network Architecture for Image Denoising
- Automatic 3D Indoor Scene Modeling from Single Panorama
- Five-point Fundamental Matrix Estimation for Uncalibrated Cameras
- PU-Net: Point Cloud Upsampling Network
- Generative Image Inpainting with Contextual Attention
- Im2Flow: Motion Hallucination from Static Images for Action Recognition
- Tagging Like Humans: Diverse and Distinct Image Annotation
- TextureGAN: Controlling Deep Image Synthesis with Texture Patches
- ISTA-Net: Interpretable Optimization-Inspired Deep Network for Image Compressive Sensing
- Optimizing Video Object Detection via a Scale-Time Lattice
- Context Embedding Networks
- Motion-Guided Cascaded Refinement Network for Video Object Segmentation
- RotationNet: Joint Object Categorization and Pose Estimation Using Multiviews from Unsupervised Viewpoints
- Conditional Generative Adversarial Network for Structured Domain Adaptation
- Large-scale Distance Metric Learning with Uncertainty
- Hierarchical Novelty Detection for Visual Object Recognition
- Deeper Look at Power Normalizations
- Disentangling Factors of Variation by Mixing Them
- Beyond Holistic Object Recognition: Enriching Image Understanding with Part States
- LSTM Pose Machines
- End-to-end Recovery of Human Shape and Pose
- Geometric Multi-Model Fitting with a Convex Relaxation Algorithm
- Revisiting Salient Object Detection: Simultaneous Detection, Ranking, and Subitizing of Multiple Salient Objects
- Modulated Convolutional Networks
- High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs
- Learning Compressible 360° Video Isomers
- Easy Identification from Better Constraints: Multi-Shot Person Re-Identification from Reference Constraints
- TieNet: Text-Image Embedding Network for Common Thorax Disease Classification and Reporting in Chest X-rays
- Good View Hunting: Learning Photo Composition from 1 Million View Pairs
- Visual Relationship Learning with a Factorization-based Prior
- Min-Entropy Latent Model for Weakly Supervised Object Detection
- Boundary Flow: A Siamese Network that Predicts Boundary Motion without Training on Motion
- SfSNet: Learning Shape, Reflectance and Illuminance of Faces 'in the wild'
- Facial Expression Recognition by De-expression Residue Learning
- Empirical study of the topology and geometry of deep networks
- Learning Globally Optimized Object Detector via Policy Gradient
- Learning from Synthetic Data: Semantic Segmentation using Generative Adversarial Networks
- Recurrent Residual Module for Fast Inference in Videos
- Viewpoint-aware Attentive Multi-view Inference for Vehicle Re-identification
- Weakly-Supervised Semantic Segmentation Network with Deep Seeded Region Growing
- Deep Adversarial Metric Learning
- Learning Deep Models for Face Anti-Spoofing: Binary or Auxiliary Supervision
- Art of singular vectors and universal adversarial perturbations
- Free supervision from video games
- Unifying Identification and Context Learning for Person Recognition
- DensePose: Multi-Person Dense Human Pose Estimation In The Wild
- End-to-end Convolutional Semantic Embeddings
- Convolutional Image Captioning
- Inferring Semantic Layout for Hierarchical Text-to-Image Synthesis
- Efficient Interactive Annotation of Segmentation Datasets with Polygon-RNN++
- Nonlinear 3D Face Morphable Model
- OATM: Occlusion Aware Template Matching by Consensus Set Maximization
- Multi-Image Semantic Matching by Mining Consistent Features
- Explicit Loss-Error-Aware Quantization for Deep Neural Networks
- Modeling Facial Geometry using Compositional VAEs
- Encoding Crowd Interaction with Deep Neural Network for Pedestrian Trajectory Prediction
- DeepVoting: A Robust and Explainable Deep Network for Semantic Part Detection under Partial Occlusion
- Attentional ShapeContextNet for Point Cloud Recognition
- Weakly Supervised Instance Segmentation using Class Peak Response
- Fast and Robust Estimation for Unit-Norm Constrained Linear Fitting Problems
- Maximum Classifier Discrepancy for Unsupervised Domain Adaptation
- Multi-Level Factorisation Net for Person Re-Identification
- Video Based Reconstruction of 3D People Models
- Real-Time Monocular Depth Estimation using Synthetic Data with Domain Adaptation via Image Style Transfer
- Logo Synthesis and Manipulation with Clustered Generative Adversarial Networks
- Improved Fusion of Visual and Language Representations by Dense Symmetric Co-Attention for Visual Question Answering
- Image Super-resolution via Dual-state Recurrent Neural Networks
- Excitation Backprop for RNNs
- Image Generation from Scene Graphs
- Learning Spatial-Temporal Regularized Correlation Filters for Visual Tracking
- Image Restoration by Estimating Frequency Distribution of Local Patches
- Learning to Adapt Structured Output Space for Semantic Segmentation
- Deep Spatial Feature Reconstruction for Partial Person Re-identification
- Tight Nonconvex Relaxation of MAP Inference
- Multiple Granularity Group Interaction Prediction
- Accurate and Diverse Sampling of Sequences based on a "Best of Many" Sample Objective
- Learning Rich Features for Image Manipulation Detection
- DA-GAN: Instance-level Image Translation by Deep Attention Generative Adversarial Network
- A Benchmark for Articulated Human Pose Estimation and Tracking
- Preserving Semantic Relations for Zero-Shot Learning
- Geometry-Aware Scene Text Detection with Instance Transformation Network
- CleanNet: Transfer Learning for Scalable Image Classifier Training with Label Noise
- Joint Cuts and Matching of Partitions in One Graph
- Fast and Accurate Online Video Object Segmentation via Tracking Parts
- Learning Nested Structures in Deep Neural Networks
- Practical Block-wise Neural Network Architecture Generation
- AdaDepth: Unsupervised Content Congruent Adaptation for Depth Estimation
- Modifying Non-Local Variations Across Multiple Views
- Connecting Pixels to Privacy and Utility: Automatic Redaction of Private Information in Images
- Divide and Grow: Capturing Huge Diversity in Crowd Images with Incrementally Growing CNN
- When will you do what? - Anticipating Temporal Occurrences of Activities
- Visual Question Answering with Memory-Augmented Networks
- Stochastic Variational Inference with Gradient Linearization
- Human Pose Estimation with Parsing Induced Learner
- 3D Registration of Curves and Surfaces using Local Differential Information
- Deformation Aware Image Compression
- PoseFlow: A Deep Motion Representation for Understanding Human Behaviors in Videos
- MovieGraphs: Towards Understanding Human-Centric Situations from Videos
- Hybrid Camera Pose Estimation
- Fast Monte-Carlo Localization on Aerial Vehicles using Approximate Continuous Belief Representations
- PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume
- Hierarchical Recurrent Attention Networks for Structured Online Maps
- Learning Less is More - 6D Camera Localization via 3D Surface Regression
- Visual Question Generation as Dual Task of Visual Question Answering
- 3D Object Detection with Latent Support Surfaces
- An Analysis of Scale Invariance in Object Detection - SNIP
- 3D Semantic Trajectory Reconstruction from 3D Pixel Continuum
- KIPPI: KInetic Polygonal Partitioning of Images
- COCO-Stuff: Thing and Stuff Classes in Context
- Joint Optimization Framework for Learning with Noisy Labels
- Improved Lossy Image Compression with Priming and Spatially Adaptive Bit Rates for Recurrent Networks
- Deep Cost-Sensitive and Order-Preserving Feature Learning for Cross-Population Age Estimation
- Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments
- Deep Back-Projection Networks For Super-Resolution
- Generating a Fusion Image: One's Identity and Another's Shape
- V2V-PoseNet: Voxel-to-Voxel Prediction Network for Accurate 3D Hand and Human Pose Estimation from a Single Depth Map
- Long-Term On-Board Prediction of People in Traffic Scenes under Uncertainty
- Cross-modal Deep Variational Hand Pose Estimation
- Learning to Estimate 3D Human Pose and Shape from a Single Color Image
- Video Rain Removal By Multiscale Convolutional Sparse Coding
- Toward Driving Scene Understanding: A Dataset for Learning Driver Behavior and Causal Reasoning
- Learning 3D Shape Completion from Point Clouds with Weak Supervision
- SplineCNN: Fast Geometric Deep Learning with Continuous B-Spline Kernels
- Salience Guided Depth Calibration for Perceptually Optimized Compressive Light Field 3D Display
- Weakly-supervised Deep Convolutional Neural Network Learning for Facial Action Unit Intensity Estimation
- Rolling Shutter and Radial Distortion are Features for High Frame Rate Multi-camera Tracking
- Robust Hough Transform Based 3D Reconstruction from Circular Light Fields
- Feedback-prop: Convolutional Neural Network Inference under Partial Evidence
- Learning Strict Identity Mappings in Deep Residual Networks
- Residual Parameter Transfer for Deep Domain Adaptation
- Exploring Disentangled Feature Representation Beyond Face Identification
- SPLATNet: Sparse Lattice Networks for Point Cloud Processing
- Unsupervised Training for 3D Morphable Model Regression
- A Bi-directional Message Passing Model for Salient Object Detection
- Learning to See in the Dark
- Erase or Fill? Deep Joint Recurrent Rain Removal and Reconstruction in Videos
- Finding beans in burgers: Deep semantic-visual embedding with localization
- Referring Relationships
- Adversarially Learned One-Class Classifier for Novelty Detection
- Surface Networks
- Efficient parametrization of multi-domain deep neural networks
- Recognizing Human Actions as Evolution of Pose Estimation Maps
- Soccer on Your Tabletop
- CVM-Net: Cross-View Matching Network for Image-Based Ground-to-Aerial Geo-Localization
- Gesture Recognition: Focus on the Hands
- Video Object Segmentation via Inference in A CNN-Based Higher-Order Spatio-Temporal MRF
- Real-world Anomaly Detection in Surveillance Videos
- Learning a Single Convolutional Super-Resolution Network for Multiple Degradations
- Iterative Visual Reasoning Beyond Convolutions
- Guide Me: Interacting with Deep Networks
- PiCANet: Learning Pixel-wise Contextual Attention for Saliency Detection
- Future Frame Prediction for Anomaly Detection - A New Baseline
- Structure Preserving Video Prediction
- Zero-Shot Visual Recognition using Semantics-Preserving Adversarial Embedding Networks
- Captioning Images with Style Transfer from Unaligned Text Corpora
- Anatomical Priors in Convolutional Networks for Unsupervised Biomedical Segmentation
- Illuminant Spectra-based Source Separation Using Flash Photography
- 3D Human Pose Reconstruction and Action Classification in Robot Assisted Therapy of Children with Autism
- Discrete-Continuous ADMM for Transductive Inference in Higher-Order MRFs
- Classification Driven Dynamic Image Enhancement
- Feature Generating Networks for Zero-Shot Learning
- Beyond Trade-off: Accelerate FCN-based Face Detection with Higher Accuracy
- MiCT: Mixed 3D/2D Convolutional Tube for Human Action Recognition
- Unsupervised Learning and Segmentation of Complex Activities from Video
- Sparse Photometric 3D Face Reconstruction Guided by Morphable Models
- LSTM stack-based Neural Multi-sequence Alignment TeCHnique (NeuMATCH)
- Inverse Composition Discriminative Optimization for Point Cloud Registration
- Inference in Higher Order MRF-MAP Problems with Small and Large Cliques
- Look at Boundary: A Boundary-Aware Face Alignment Algorithm
- LEGO: Learning Edge with Geometry all at Once by Watching Videos
- CosFace: Large Margin Cosine Loss for Deep Face Recognition
- Learning Semantic Concepts and Order for Image and Sentence Matching
- Learning to Look Around: Intelligently Exploring Unseen Environments for Unknown Tasks
- Low-shot learning with large-scale diffusion
- Multimodal Visual Concept Learning with Weakly Supervised Techniques
- Cross-View Image Synthesis using Conditional Generative Adversarial Nets
- Pixel-Wise Metric Learning for Blazingly Fast Video Object Segmentation
- PieAPP: Perceptual Image-Error Assessment through Pairwise Preference
- Cube Padding for Weakly-Supervised Saliency Prediction in 360° Videos
- CRRN: Multi-Scale Guided Concurrent Reflection Removal Network
- Stereoscopic Neural Style Transfer
- Low-shot Learning from Imaginary Data
- Fast, Simple, and Effective Resource-Constrained Structure Learning of Deep Networks
- Unsupervised Sparse Dirichlet-Net for Hyperspectral Image Super-Resolution
- Visual Grounding via Accumulated Attention
- Event-based Vision meets Deep Learning on Steering Prediction for Self-driving Cars
- Monocular 3D Pose and Shape Estimation of Multiple People in Natural Scenes
- Actor and Action Video Segmentation from a Sentence
- AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks
- CartoonGAN: Generative Adversarial Networks for Photo Cartoonization
- RayNet: Learning Volumetric 3D Reconstruction with Ray Potentials
- Tracking Multiple Objects Outside the Line of Sight using Speckle Imaging
- Structured Attention Guided Convolutional Neural Fields for Monocular Depth Estimation
- Densely Connected Pyramid Dehazing Network
- Matching Adversarial Networks
- Automatic Map Inference from Aerial Images
- Polarimetric Dense Monocular SLAM
- Learning Attribute Representations with Localization for Flexible Fashion Search
- Self-Supervised Adversarial Hashing Networks for Cross-Modal Retrieval
- Unsupervised CCA
- Analyzing Filters Toward Efficient ConvNet
- Good Appearance Features for Multi-Target Multi-Camera Tracking
- Are You Talking to Me? Reasoned Visual Dialog Generation through Adversarial Learning
- Efficient Optimization for Rank-based Loss Functions
- ST-GAN: Spatial Transformer Generative Adversarial Networks for Image Compositing
- A Perceptual Measure for Deep Single Image Camera Calibration
- Radially-Distorted Conjugate Translations
- Multi-task Learning by Maximizing Statistical Dependence
- Creating Capsule Wardrobes from Fashion Images
- Towards Human-Machine Cooperation: Evolving Active Learning with Self-supervised Process for Object Detection
- Synthesizing Images of Humans in Unseen Poses
- Learning to Act Properly: Predicting and Explaining Affordances from Images
- Pyramid Stereo Matching Network
- Factoring Shape, Pose, and Layout from the 2D Image of a 3D Scene
- A General Two-Step Quantization Approach for Low-bit Neural Networks with High Accuracy
- GVCNN: Group-View Convolutional Neural Networks for 3D Shape Recognition
- Convolutional Neural Networks with Alternately Updated Clique
- Squeeze-and-Excitation Networks
- NISP: Pruning Networks using Neuron Importance Score Propagation
- Audio to Body Dynamics
- ID-GAN: Learning a Symmetry Three-Player GAN for Identity-Preserving Face Synthesis
- Deep Learning of Graph Matching
- Neural Baby Talk
- Efficient Video Object Segmentation via Network Modulation
- Regularizing Deep Networks by Modeling and Predicting Label Structure
- Revisiting Dilated Convolution: A Simple Approach for Weakly- and Semi- Supervised Semantic Segmentation
- Face Detector Adaptation without Negative Transfer or Catastrophic Forgetting
- Motion-Appearance Co-Memory Networks for Video Question Answering
- Compare and Contrast: Learning Prominent Visual Differences
- Tangent Convolutions for Dense Prediction in 3D
- Single-Shot Object Detection with Enriched Semantics
- Generating Synthetic X-ray Images of a Person from the Surface Geometry
- Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering
- Edit Probability for Scene Text Recognition
- MaskLab: Instance Segmentation by Refining Object Detection with Semantic and Direction Features
- Weakly-Supervised Action Segmentation with Iterative Soft Boundary Assignment
- Texture Mapping for 3D Reconstruction with RGB-D Sensor
- Multi-Agent Diverse Generative Adversarial Networks
- Towards Universal Representation for Unseen Action Recognition
- Zero-Shot Kernel Learning
- DOTA: A Large-scale Dataset for Object Detection in Aerial Images
- Multi-Frame Quality Enhancement for Compressed Video
- From Lifestyle VLOGs to Everyday Interactions
- Occluded Pedestrian Detection through Guided Attention in CNNs
- Decoupled Networks
- Deep Cocktail Networks: Multi-source Unsupervised Domain Adaptation with Category Shift
- Partially Shared Multi-Task Convolutional Neural Network with Local Constraint for Face Attribute Learning
- Joint Pose and Expression Modeling for Facial Expression Recognition
- Unsupervised Textual Grounding: Linking Words to Image Concepts
- Interleaved Structured Sparse Convolutional Neural Networks
- Look, Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval with Generative Models
- ROAD: Reality Oriented Adaptation for Semantic Segmentation of Urban Scenes
- Image to Image Translation for Domain Adaptation
- A Face to Face Neural Conversation Model
- Image-Image Domain Adaptation with Preserved Self-Similarity and Domain-Dissimilarity for Person Re-identification
- FSRNet: End-to-End Learning Face Super-Resolution with Facial Priors
- SO-Net: Self-Organizing Network for Point Cloud Analysis
- MoNet: Moments Embedding Network
- Coupled End-to-end Transfer Learning with Generalized Fisher Information
- Inferring Light Fields from Shadows
- LayoutNet: Reconstructing the 3D Room Layout from a Single RGB Image
- Multi-Level Fusion based 3D Object Detection from Monocular Images
- Single-Image Depth Estimation Based on Fourier Domain Analysis
- Flow Guided Recurrent Neural Encoder for Video Salient Object Detection
- Super-Resolving Very Low-Resolution Face Images with Supplementary Attributes
- Seeing Voices and Hearing Faces: Cross-modal biometric matching
- Feature Mapping for Learning Fast and Accurate 3D Pose Inference from Synthetic Images
- Fast and Accurate Single Image Super-Resolution via Information Distillation Network
- Learning and Using the Arrow of Time
- Rethinking the Faster R-CNN Architecture for Temporal Action Localization
- Deeply Learned Filter Response Functions for Hyperspectral Reconstruction
- Fusing Crowd Density Maps and Visual Object Trackers for People Tracking in Crowd Scenes
- Intrinsic Image Transformation via Scale Space Decomposition
- Deep Ordinal Regression Network for Monocular Depth Estimation
- Multi-Oriented Scene Text Detection via Corner Localization and Region Segmentation
- Functional Map of the World
- CSGNet: Neural Shape Parser for Constructive Solid Geometry
- Instance Embedding Transfer to Unsupervised Video Object Segmentation
- Statistical Tomography of Microscopic Life
- Point-wise Convolutional Neural Networks
- PIXOR: Real-time 3D Object Detection from Point Clouds
- HydraNets: Specialized Dynamic Architectures for Efficient Inference
- Deep Depth Completion of a Single RGB-D Image
- Learning to Extract a Video Sequence from a Single Motion-Blurred Image
- A Fast Resection-Intersection Method for the Known Rotation Problem
- iVQA: Inverse Visual Question Answering
- Crowd Counting via Adversarial Cross-Scale Consistency Pursuit
- Trust your Model: Light Field Depth Estimation with inline Occlusion Handling
- PointNetVLAD: Deep Point Cloud Based Retrieval for Large-Scale Place Recognition
- A Memory Network Approach for Story-based Temporal Summarization of 360° Videos
- Tags2Parts: Discovering Semantic Regions from Shape Tags
- Jerk-Aware Video Acceleration Magnification
- A Robust Method for Strong Rolling Shutter Effects Correction Using Lines with Automatic Feature Selection
- Mobile Video Object Detection with Temporally-Aware Feature Maps
- VirtualHome: Simulating Household Activities via Programs
- MoNet: Deep Motion Exploitation for Video Object Segmentation
- Detect globally, refine locally: A novel approach to saliency detection
- EPINET: A Fully-Convolutional Neural Network for Light Field Depth Estimation by Using Epipolar Geometry
- Learning Face Age Progression: A Pyramid Architecture of GANs
- Normalized Cut Loss for Weakly Supervised CNN Segmentation
- Reconstructing Thin Structures of Manifold Surfaces by Integrating Spatial Curves
- Dynamic Few-Shot Visual Learning without Forgetting
- Camera Style Adaptation for Person Re-identification
- In-Place Activated BatchNorm for Memory-Optimized Training of DNNs
- NeuralNetwork-Viterbi: A Framework for Weakly Supervised Video Learning
- Resource Aware Person Re-identification across Multiple Resolutions
- Zero-Shot Super-Resolution using Deep Internal Learning
- Analysis of Hand Segmentation in the Wild
- Who's Better? Who's Best? Pairwise Deep Ranking for Skill Determination
- Face Aging with Identity-Preserved Conditional Generative Adversarial Networks
- Deep Extreme Cut: From Extreme Points to Object Segmentation
- Person Re-identification with Cascaded Pairwise Convolutions
- Distributable Consistent Multi-Graph Matching
- A Twofold Siamese Network for Real-Time Object Tracking
- AON: Towards Arbitrarily-Oriented Text Recognition
- Deep Cauchy Hashing for Hamming Space Retrieval
- Non-blind Deblurring: Handling Kernel Uncertainty with CNNs
- Referring Image Segmentation via Recurrent Refinement Networks
- Deep Density Clustering of Unconstrained Faces
- A Constrained Deep Neural Network for Ordinal Regression