Contrastive Multi-View Textual-Visual Encoding:
Towards One Hundred Thousand-Scale One-Shot Logo Identification

Nakul Sharma, Abhirama Subramanyam Penamakuri, Anand Mishra

Indian Institute of Technology Jodhpur

ICVGIP 2022

[Paper] [Code] [Data] [arxiv]


     

Abstract

In this paper, we study the problem of identifying logos of business brands in natural scenes in an open-set one-shot setting. This problem setup is significantly more challenging than the traditionally studied ‘closed-set’ and ‘large-scale training samples per category’ logo recognition settings. We propose a novel multi-view textual-visual encoding framework that encodes both the text appearing in logos and their graphical design to learn robust contrastive representations. These representations are learned jointly for multiple views of logos over a batch, and thereby generalize well to unseen logos. We evaluate our proposed framework on three tasks: cropped logo verification, cropped logo identification, and end-to-end logo identification in natural scenes, and compare it against state-of-the-art methods. Further, the literature lacks a ‘very-large-scale’ collection of reference logo images that can facilitate the study of one-hundred-thousand-scale logo identification. To fill this gap, we introduce the Wikidata Reference Logo Dataset (WiRLD), containing logos for 100K business brands harvested from Wikidata. Our proposed framework achieves an area under the ROC curve of 91.3% on the QMUL-OpenLogo dataset for the verification task, and outperforms state-of-the-art methods by 9.1% and 2.6% on the one-shot logo identification task on the Toplogos-10 and the FlickrLogos32 datasets, respectively. Further, we show that our method is more stable than other baselines even when the number of candidate logos is on a 100K scale.
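To make the idea of contrastive learning over multiple views in a batch more concrete, here is a minimal NumPy sketch of a generic InfoNCE-style objective between two views of the same batch of logos. This page does not state the exact loss used in the paper, so the function name `info_nce_loss`, the temperature value, and the two-view setup are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def info_nce_loss(view_a, view_b, temperature=0.1):
    """Generic InfoNCE-style loss (illustrative assumption, not the paper's exact loss).

    view_a, view_b: arrays of shape (batch, dim); row i of each array is
    assumed to be an embedding of a different view of the same logo i.
    """
    # L2-normalize so that dot products are cosine similarities.
    a = view_a / np.linalg.norm(view_a, axis=1, keepdims=True)
    b = view_b / np.linalg.norm(view_b, axis=1, keepdims=True)

    # Pairwise similarities between every logo in view A and every logo in view B.
    logits = a @ b.T / temperature

    # Numerically stable row-wise log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # The matching pair for row i sits on the diagonal; all other
    # logos in the batch act as negatives.
    return float(-np.mean(np.diag(log_prob)))
```

The loss is small when each logo's two views are closer to each other than to any other logo in the batch, which is the property that lets such representations transfer to logos never seen during training.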

Highlights

  • Proposed a contrastive multi-view encoding of the visual-textual features of logos that learns more robust and generalizable features.
  • Studied the problem of logo identification in an extremely challenging scenario where the number of candidate logos is as large as 100K.
  • Introduced a very-large-scale logo dataset, namely the Wikidata Reference Logo Dataset (WiRLD), containing 100K reference logos.
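One common way to frame one-shot identification against a large reference set is nearest-neighbor search over embeddings: each brand contributes a single reference embedding, and a query logo is assigned to the brand whose reference is most similar. The sketch below assumes embeddings have already been computed by some encoder; the function name `identify_logo`, the use of cosine similarity, and the brand labels are hypothetical and are not taken from the released code.

```python
import numpy as np

def identify_logo(query, references, labels):
    """Return the label of the most similar reference logo and its score.

    query:      array of shape (dim,), embedding of the query logo.
    references: array of shape (num_brands, dim), one embedding per brand.
    labels:     list of num_brands brand names (hypothetical here).
    """
    # Normalize so that the dot product equals cosine similarity.
    q = query / np.linalg.norm(query)
    r = references / np.linalg.norm(references, axis=1, keepdims=True)

    # One similarity score per candidate brand.
    scores = r @ q
    best = int(np.argmax(scores))
    return labels[best], float(scores[best])

# Toy usage with three made-up brands and 3-dimensional embeddings.
references = np.eye(3)
labels = ["brand_a", "brand_b", "brand_c"]
label, score = identify_logo(np.array([0.1, 0.9, 0.2]), references, labels)
```

Because adding a brand only requires appending one row to `references`, this formulation needs no retraining as the candidate set grows, although at 100K candidates an approximate nearest-neighbor index may be preferable to the brute-force product used here.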

Dataset Downloads (100K logos)

  • Dataset Images and Readme at this [Repository]

  • Bibtex

    Please cite our work as follows:

    @inproceedings{oneshotlogo2022,
      author    = "Sharma, Nakul and 
                  Penamakuri, Abhirama S. and
                  Mishra, Anand",
      title     = "Contrastive Multi-View Textual-Visual Encoding: Towards One Hundred Thousand-Scale One-Shot Logo Identification",
      booktitle = "ICVGIP",
      year      = "2022",
    }

    Contact

    Nakul Sharma email: sharma.86@iitj.ac.in
    Abhirama Subramanyam email: penamakuri.1@iitj.ac.in

    Acknowledgements

    Abhirama S. Penamakuri is supported by the Prime Minister Research Fellowship (PMRF), Ministry of Education, Government of India.