Mirror, Mirror of the Flow: How Does Regularization Shape Implicit Bias?
Implicit bias plays an important role in explaining how overparameterized models
generalize well. Explicit regularization like weight decay is often employed in
addition to prevent overfitting. While both concepts have been studied separately,
in practice, they often act in tandem. Understanding their interplay is key to
controlling the shape and strength of implicit bias, as it can be modified by
explicit regularization. To this end, we incorporate explicit regularization
into the mirror flow framework and analyze its lasting effects
on the geometry of the training dynamics, covering three distinct effects:
positional bias, type of bias, and range shrinking. Our analytical approach
encompasses a broad class of problems, including sparse coding, matrix sensing,
single-layer attention, and LoRA, for which we demonstrate the utility of our
insights. To exploit the lasting effect of regularization and to highlight the
potential benefit of dynamic weight decay schedules, we propose switching off
weight decay during training, which, as our experiments demonstrate, can improve
generalization.
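
To make the proposed schedule concrete, the following is a minimal sketch of switching off weight decay partway through training, assuming a standard PyTorch training loop; the model, data, learning rate, and switch-off epoch are all illustrative choices, not values from the paper.

```python
import torch

# Toy model and data; shapes and values are purely illustrative.
model = torch.nn.Linear(20, 1)
x, y = torch.randn(256, 20), torch.randn(256, 1)

optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, weight_decay=1e-3)
switch_off_epoch = 50  # hypothetical point at which regularization is removed

for epoch in range(100):
    if epoch == switch_off_epoch:
        # Disable weight decay mid-training: per the paper's premise, its
        # effect on the geometry of the dynamics persists afterwards.
        for group in optimizer.param_groups:
            group["weight_decay"] = 0.0

    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()
```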