Belga INRIA OTMedia

FlickrBelgaLogos Dataset

Content-based retrieval of logos and trademarks in large natural image collections is of high interest for many applications, including dissemination impact evaluation, detection of prohibited or suspicious logos, and automatic annotation. Trademark recognition has been widely addressed by the pattern recognition community in recent decades, but surprisingly few works deal with natural image collections. The FlickrBelgaLogos dataset was created specifically for this purpose within the French ANR project OTMedia, using the logos from the BelgaLogos dataset.



Evaluating the accuracy of object discovery and mining algorithms is more challenging than evaluating object retrieval with a fixed set of queries. It requires a complete groundtruth covering all repeated objects of the dataset, with the precise location of all their instances. No previous evaluation dataset met these objectives, so one contribution of this paper was to build one.

We first extended the image-level groundtruth of the BelgaLogos dataset by manually annotating the bounding boxes of all instances of the 37 targeted logos (correcting a few errors along the way). The 9842 annotated instances were then visually classified as kept or rejected by 3 users, depending on whether or not all of them were able to recognize the instance with confidence after it had been cropped from its image. After this step, only 2695 instances were classified as kept.

This extended annotation is, however, not sufficient to evaluate the precision of object mining algorithms. Besides the 37 logos, other objects also appear several times in the dataset (including other logos, buildings, faces, near duplicates, etc.), and they would be counted as false positives when detected. We therefore decided to create a new synthetic dataset by cutting and pasting the cropped logos of BelgaLogos II into a dataset of 10K distractor images crawled from Flickr. To reduce the probability of finding repeated objects in the distractors, all images come from distinct users and distinct geographic areas. The BelgaLogos instances were then pasted without any modification (no rotation, scaling, etc.) at random positions in the distractors.



The 10,000 images of the BelgaLogos dataset have been manually annotated. Every instance of the 37 different logos has been surrounded with a rectangular bounding box.

Logo name           #OK   #Junk   Total
Adidas              147     896    1043
Adidas-text          63     115     178
Airness              11     109     120
Base                162      86     248
BFGoodrich           86     222     308
Bik                  65     205     270
Bouygues             14      18      32
Bridgestone          31    Junk     105
Bridgestone-text     64      74     201
Carglass             18      47      65
Citroen              78     164     242
Citroen-text        197     134     331
CocaCola             40      33      73
Cofidis              45      45      90
Dexia               235     391     626
ELeclerc             15       5      20
Ferrari              77     136     213
Gucci                 2       2       4
Kia                 141     101     242
Mercedes             86     193     279
Nike                235    2007    2242
Peugeot               5       2       7
Puma                157     643     800
Puma-text            27      53      80
Quick                57     196     253
Reebok               18      48      66
Roche                 2       0       2
Shell               123     113     236
SNCF                  7       3      10
Std-Liege            98     283     381
StellaArtois         20       8      28
TNT                 102      81     183
Total                78      18      96
US-President         14       0      14
Umbro               153     506     659
Veolia               12      65      77
VRT                  10       8      18
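
The cut-and-paste construction described above (cropped logos pasted unmodified at random positions in distractor images) could be sketched roughly as follows. This is only an illustrative sketch, not the authors' code: the function names `random_paste_box` and `paste_logo` are hypothetical, images are modelled as plain lists of pixel rows, and the choices of a uniform position and of keeping the logo entirely inside the distractor are assumptions, since the description only says "random positions".

```python
import random


def random_paste_box(distractor_size, logo_size, rng):
    """Return (left, top, right, bottom) for a logo placed at a random position.

    Assumption: the position is drawn uniformly and the logo must fit
    entirely inside the distractor; the source only says "random positions".
    """
    width, height = distractor_size
    logo_w, logo_h = logo_size
    if logo_w > width or logo_h > height:
        raise ValueError("logo does not fit inside the distractor image")
    left = rng.randint(0, width - logo_w)   # randint is inclusive on both ends
    top = rng.randint(0, height - logo_h)
    return (left, top, left + logo_w, top + logo_h)


def paste_logo(distractor, logo, box):
    """Copy the logo pixels into a copy of the distractor, unmodified
    (no rotation, no scaling). Images are lists of pixel rows."""
    left, top, _, _ = box
    out = [row[:] for row in distractor]    # leave the input image untouched
    for dy, logo_row in enumerate(logo):
        out[top + dy][left:left + len(logo_row)] = logo_row
    return out


rng = random.Random(0)                      # fixed seed for reproducibility
distractor = [[0] * 10 for _ in range(8)]   # toy 10x8 blank "image"
logo = [[1, 1, 1], [1, 1, 1]]               # toy 3x2 "logo"
box = random_paste_box((10, 8), (3, 2), rng)
synthetic = paste_logo(distractor, logo, box)
```

In such a setup, the returned `box` can double as the ground-truth bounding box of the pasted instance, which is what makes a synthetic dataset of this kind convenient for evaluating precision.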



To download the FlickrBelgaLogos dataset, please send an email to with the following information:



All publications making use of the FlickrBelgaLogos dataset must include the following reference:

Pierre Letessier, Olivier Buisson, Alexis Joly, Scalable Mining of Small Visual Objects, in Proceedings of the 20th ACM International Conference on Multimedia, 2012.

author = {Letessier, Pierre and Joly, Alexis and Buisson, Olivier},
title = {Scalable Mining of Small Visual Objects},
booktitle = {MM '12: Proceedings of the 20th ACM international conference on Multimedia},
year = {2012},


Related publications

Please send your publications related to BelgaLogos at



Alexis Joly, alexis(dot)joly(at)
Pierre Letessier,


INRIA - Rocquencourt - IMEDIA Project

INRIA   - updated July 3, 2012