In deep neural networks with convolutional layers, all the neurons in each layer typically have receptive fields (RFs) of the same size and the same resolution. Convolutional layers whose neurons have large RFs capture global information from the input features, while layers whose neurons have small RFs capture local details with high resolution from the input features. In this work, we introduce novel deep multi-resolution fully convolutional neural networks (MR-FCN), where each layer has a range of neurons with different RF sizes to extract multi-resolution features that capture the global and local information from its input features. The proposed MR-FCN is applied to separate the singing voice from mixtures of music sources. Experimental results show that using MR-FCN improves the performance compared to feedforward deep neural networks (DNNs) and single-resolution deep fully convolutional neural networks (FCNs) on the audio source separation problem.


  author = {Grais, Emad M. and Wierstorf, H. and Ward, D. and Plumbley, M. D.},
  editor = {Deville, Y. and Gannot, S. and Mason, R. and Plumbley, M. D. and Ward, D.},
  title = {Multi-Resolution Fully Convolutional Neural Networks for Monaural Audio Source Separation},
  booktitle = {Latent Variable Analysis and Signal Separation: 14th International Conference, LVA/ICA 2018, Guildford, UK, July 3--5, 2018, Proceedings},
  month = jul,
  year = {2018},
  publisher = {Springer International Publishing},
  openaccess = {http://epubs.surrey.ac.uk/846316/},
  keywords = {"maruss"}