FlickrBelgaLogos Dataset

Content-based retrieval of logos and trademarks in large natural image collections is of high interest for many applications, including dissemination impact evaluation, detection of prohibited or suspicious logos, and automatic annotation. Trademark recognition has been widely addressed by the pattern recognition community over the last decades, but surprisingly few works deal with natural image collections. The FlickrBelgaLogos dataset was created specifically for this purpose within the French ANR project OTMedia, using the logos from the BelgaLogos dataset.

Images

Evaluating the accuracy of object discovery and mining algorithms is more challenging than evaluating object retrieval with a fixed set of queries. It requires a complete groundtruth covering all repeated objects of the dataset, with the precise location of all their instances. No previous evaluation dataset met these objectives, so one contribution of the reference paper was to build one. We first extended the image-level groundtruth of the BelgaLogos dataset by manually annotating the bounding boxes of all instances of the 37 targeted logos (correcting a few errors along the way). The 9842 annotated instances were then visually classified as kept or rejected by 3 users: an instance was kept only if all of them were able to recognize it with confidence after it had been cropped from its image. After this step, only 2695 instances were classified as kept.

This extended annotation is, however, not sufficient to evaluate the precision of object mining algorithms. Besides the 37 logos, other objects are also instantiated several times in the dataset (including other logos, buildings, faces, near duplicates, etc.), so they would be counted as false positives when detected. We therefore decided to create a new synthetic dataset by cutting and pasting the cropped logos of BelgaLogos II into a dataset of 10K distractor images crawled from Flickr. To reduce the probability of finding repeated objects in the distractors, all images come from distinct users and distinct geographic areas. The BelgaLogos instances were then pasted without any modification (no rotation, scaling, etc.) at random positions in the distractors.
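
The construction described above can be sketched in a few lines of Python. This is a minimal illustration only, not the authors' tooling: images are modeled as plain 2D lists of pixel values, the names `is_kept` and `paste_at_random` are hypothetical, and drawing the position uniformly (with the logo fully inside the distractor) is an assumption, since the text only says "random positions".

```python
import random


def is_kept(votes):
    """Keep an instance only if every annotator recognized it."""
    return all(votes)


def paste_at_random(distractor, logo, rng):
    """Paste `logo` into a copy of `distractor` with no rotation or scaling.

    Both images are 2D lists (rows of pixel values). Returns the new image
    and the bounding box (x, y, width, height) of the pasted logo.
    Assumption: the position is drawn uniformly and the logo must fit
    entirely inside the distractor.
    """
    img_h, img_w = len(distractor), len(distractor[0])
    logo_h, logo_w = len(logo), len(logo[0])
    if logo_h > img_h or logo_w > img_w:
        raise ValueError("logo does not fit in the distractor image")

    # randint is inclusive on both ends, so the logo never crosses a border.
    x = rng.randint(0, img_w - logo_w)
    y = rng.randint(0, img_h - logo_h)

    # Copy the distractor row by row so the original stays untouched.
    result = [row[:] for row in distractor]
    for dy in range(logo_h):
        result[y + dy][x:x + logo_w] = logo[dy]
    return result, (x, y, logo_w, logo_h)


# Toy data: an 8x6 blank distractor and a 3x2 "logo".
distractor = [[0] * 8 for _ in range(6)]
logo = [[1, 1, 1], [1, 1, 1]]
rng = random.Random(0)  # fixed seed for reproducibility
image, bbox = paste_at_random(distractor, logo, rng)
```

Because the paste position is chosen by the generator, the bounding box of every pasted instance is known exactly, which is what makes a synthetic collection of this kind convenient as a groundtruth.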

Annotations

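The 10,000 images of the BelgaLogos dataset have been manually annotated: every instance of the 37 different logos has been surrounded with a rectangular bounding box. In the table below, #OK counts the instances classified as kept and #Junk those classified as rejected.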

Logo name          #OK   #Junk   Total

Adidas             147     896    1043
Adidas-text         63     115     178
Airness             11     109     120
Base               162      86     248
BFGoodrich          86     222     308
Bik                 65     205     270
Bouygues            14      18      32
Bridgestone         31      74     105
Bridgestone-text    64     137     201
Carglass            18      47      65
Citroen             78     164     242
Citroen-text       197     134     331
CocaCola            40      33      73
Cofidis             45      45      90
Dexia              235     391     626
ELeclerc            15       5      20
Ferrari             77     136     213
Gucci                2       2       4
Kia                141     101     242
Mercedes            86     193     279
Nike               235    2007    2242
Peugeot              5       2       7
Puma               157     643     800
Puma-text           27      53      80
Quick               57     196     253
Reebok              18      48      66
Roche                2       0       2
Shell              123     113     236
SNCF                 7       3      10
Std-Liege           98     283     381
StellaArtois        20       8      28
TNT                102      81     183
Total               78      18      96
US-President        14       0      14
Umbro              153     506     659
Veolia              12      65      77
VRT                 10       8      18
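
The counts in this table lend themselves to a quick consistency check, since each Total should equal #OK + #Junk. The snippet below is a small sketch using three rows copied from the table; the `rows` layout and the helper names `is_consistent` and `ok_ratio` are illustrative choices, not part of the dataset distribution.

```python
# (logo name, #OK, #Junk, Total), copied from three rows of the table.
rows = [
    ("Adidas", 147, 896, 1043),
    ("Nike", 235, 2007, 2242),
    ("Roche", 2, 0, 2),
]


def is_consistent(row):
    """Check that #OK + #Junk adds up to the reported Total."""
    _, ok, junk, total = row
    return ok + junk == total


def ok_ratio(row):
    """Fraction of annotated instances that were classified as kept."""
    _, ok, _, total = row
    return ok / total


# Names of rows whose counts do not add up (empty if all are consistent).
inconsistent = [r[0] for r in rows if not is_consistent(r)]

# Kept fraction per logo, rounded to three decimals.
ratios = {r[0]: round(ok_ratio(r), 3) for r in rows}
```

A ratio of this kind makes the imbalance visible at a glance: Nike has by far the most annotated instances, yet only about one in ten of them was kept.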

Download

To download the FlickrBelgaLogos dataset, please send an email to belgalogos@inria.fr with the following information:

References

All publications making use of the FlickrBelgaLogos dataset must include the following reference:

Pierre Letessier, Olivier Buisson, Alexis Joly, Scalable Mining of Small Visual Objects, In Proceedings of the 20th ACM International Conference on Multimedia, 2012.

@inproceedings{letessier12,
author = {Letessier, Pierre and Buisson, Olivier and Joly, Alexis},
title = {Scalable Mining of Small Visual Objects},
booktitle = {MM '12: Proceedings of the 20th ACM International Conference on Multimedia},
year = {2012},
}

Related publications

Please send your publications related to BelgaLogos to belgalogos@inria.fr

Contact

Alexis Joly, alexis(dot)joly(at)inria.fr
Pierre Letessier,

INRIA - Rocquencourt - IMEDIA Project