
Abstract

In this paper, we study the problem of identifying logos of business brands in natural scenes in an open-set one-shot setting. This problem setup is significantly more challenging than the traditionally studied ‘closed-set’ and ‘large-scale training samples per category’ logo recognition settings. We propose a novel multi-view textual-visual encoding framework that encodes the text appearing in logos as well as their graphical design to learn robust contrastive representations. These representations are learned jointly for multiple views of logos over a batch and thereby generalize well to unseen logos. We evaluate our proposed framework on cropped logo verification, cropped logo identification, and end-to-end logo identification in natural scenes, and compare it against state-of-the-art methods. Further, the literature lacks a ‘very-large-scale’ collection of reference logo images that can facilitate the study of one-hundred-thousand-scale logo identification. To fill this gap, we introduce the Wikidata Reference Logo Dataset (WiRLD), containing logos for 100K business brands harvested from Wikidata. Our proposed framework achieves an area under the ROC curve of 91.3% on the QMUL-OpenLogo dataset for the verification task, and outperforms state-of-the-art methods by 9.1% and 2.6% on the one-shot logo identification task on the Toplogos-10 and FlickrLogos32 datasets, respectively. We also show that our method is more stable than other baselines even when the number of candidate logos is on a 100K scale.
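The abstract says the representations are learned contrastively over multiple views of each logo in a batch, but the loss itself is not given on this page. As a hedged illustration only, the sketch below uses a supervised-contrastive (SupCon-style) objective: every view in the batch acts as an anchor, other views of the same logo are its positives, and all remaining views are negatives. It assumes the fused textual-visual embeddings have already been produced by some encoder; the function names, the temperature of 0.1, and the toy batch are hypothetical, and the paper's actual loss may differ.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit L2 norm (eps guards against zero rows)."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def multiview_contrastive_loss(features, labels, temperature=0.1):
    """SupCon-style loss over all views in a batch (assumed formulation).

    features: (N, d) array; each row is one view of a logo, assumed to be a
              fused textual-visual embedding computed elsewhere.
    labels:   (N,) array of logo ids; views sharing an id are positives.
    """
    z = l2_normalize(np.asarray(features, dtype=np.float64))
    labels = np.asarray(labels)
    n = z.shape[0]
    logits = z @ z.T / temperature          # scaled cosine similarities
    self_mask = np.eye(n, dtype=bool)
    # Drop each anchor's similarity to itself from the softmax denominator.
    logits = np.where(self_mask, -np.inf, logits)
    # Numerically stable row-wise log-softmax.
    row_max = logits.max(axis=1, keepdims=True)
    shifted = logits - row_max
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Positives: other views carrying the same logo id.
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    n_pos = pos_mask.sum(axis=1)
    valid = n_pos > 0
    # Average log-probability of the positives for every anchor that has one.
    pos_sum = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    return float(-(pos_sum[valid] / n_pos[valid]).mean())


# Toy usage (hypothetical data): 4 logos x 2 views, 8-dim embeddings.
rng = np.random.default_rng(0)
toy_labels = np.repeat(np.arange(4), 2)
toy_features = rng.normal(size=(8, 8))
print(multiview_contrastive_loss(toy_features, toy_labels))
```

Pulling all views of the same logo together, rather than a single augmented pair, is what lets a batch supply several positives per anchor; whether the paper weights positives this way is an assumption.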

Highlights

Proposed a contrastive multi-view encoding of the visual and textual features of logos to learn more robust and generalizable features.
Studied the problem of logo identification in an extremely challenging scenario where the number of candidate logos is as large as 100K.
Introduced a very-large-scale logo dataset, namely the Wikidata Reference Logo Dataset (WiRLD), containing 100K reference logos.
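In the one-shot setting each brand contributes a single reference logo, so identification can be framed as ranking the reference gallery by similarity to a query embedding. The sketch below assumes L2-normalized embeddings and exact cosine-similarity search; `identify` and its parameters are hypothetical names, not the released code's API, and a 100K-logo gallery such as WiRLD would simply be a larger `reference_embs` matrix.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit L2 norm."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)


def identify(query_emb, reference_embs, reference_ids, k=5):
    """Rank one-shot reference logos by cosine similarity to a query.

    query_emb:      (d,) embedding of a cropped query logo (assumed given).
    reference_embs: (M, d) one embedding per brand (the one-shot gallery).
    reference_ids:  length-M sequence of brand names or ids.
    Returns up to k (brand_id, similarity) pairs, best first.
    """
    q = l2_normalize(np.asarray(query_emb, dtype=np.float64))
    refs = l2_normalize(np.asarray(reference_embs, dtype=np.float64))
    sims = refs @ q                       # (M,) cosine similarities
    k = min(k, sims.shape[0])
    # argpartition selects the k best in O(M); only those k are then sorted.
    top = np.argpartition(-sims, k - 1)[:k]
    top = top[np.argsort(-sims[top])]
    return [(reference_ids[i], float(sims[i])) for i in top]


# Toy usage (hypothetical gallery of three brands).
gallery = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
names = ["brand_a", "brand_b", "brand_c"]
print(identify(np.array([0.9, 0.1]), gallery, names, k=2))  # top-1 is brand_a
```

Exact search costs one matrix-vector product per query (M × d multiply-adds), which stays practical for M = 100K on a single machine; an approximate nearest-neighbour index would be a further assumption beyond what this page describes.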

Contrastive Multi-View Textual-Visual Encoding:
Towards One Hundred Thousand-Scale One-Shot Logo Identification

Nakul Sharma, Abhirama Subramanyam Penamakuri, Anand Mishra

Indian Institute of Technology Jodhpur

ICVGIP 2022

[Paper] [Code] [Data] [arxiv]

