QR Code web image analysis

This is a survey of images on the web that might contain a QR Code. A QR Code is a two-dimensional barcode; such codes are increasingly used in marketing campaigns, for product identification, and for storing information. They can be read by decoding software, and open-source libraries such as ZXing and the QR Code Library exist for that purpose. Furthermore, there are many reader applications designed for mobile phones and platforms, for instance Java ME, Symbian, BlackBerry, iPhone, or Android.

The images for this project were found through the interfaces of major search engines: Yahoo Image Search, Google Image Search, and the Flickr API. Each of these web services provides access to no more than 1000 results per query, but searching with the keywords “qrcode” and “qr code” yielded 4737 URLs in total.

Obviously, this pool of images does not represent all QR Codes that have ever been created or printed. The results can, however, give a current overview of the contents of codes that people created to show on websites and also described as QR Codes, so that the search engines could find them. On Flickr, these images can also be tagged, which is a new method of describing content such as blog postings, images, or other types of media.

On the downside, an image indexed by a search engine is not necessarily a QR Code, so the decoding rate cannot be taken as a measure of decoding quality itself. Also, the purpose of a QR Code is not to display the code image on the internet but to print it on real objects such as product packaging, business cards, flyers, newspapers, or apparel, so that the code creates a link from the non-virtual environment to its content, which is either text with personal contact information (vCard) or a URL pointing to a website.

Decoded

4737 images were processed, and 1935 of them were decoded successfully (40.85%).
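
The rate above is simply the share of processed images that produced a decoded result. A minimal sketch of that bookkeeping follows; the `decode` callable is a hypothetical stand-in for whatever decoder is used (the source does not say how its decoder was invoked), and treating an exception as "not decoded" is an assumption.

```python
from typing import Callable, Iterable, Optional, Tuple


def decode_rate(
    images: Iterable[bytes],
    decode: Callable[[bytes], Optional[str]],
) -> Tuple[int, int, float]:
    """Return (processed, decoded, percentage decoded).

    `decode` is a hypothetical placeholder: it should return the decoded
    text, or None if no code was found in the image.
    """
    processed = decoded = 0
    for image in images:
        processed += 1
        try:
            if decode(image) is not None:
                decoded += 1
        except Exception:
            # Assumption: a decoder error counts as a failed decode.
            pass
    rate = round(100 * decoded / processed, 2) if processed else 0.0
    return processed, decoded, rate


# Reproducing the figure reported above from the two counts:
print(round(100 * 1935 / 4737, 2))  # 40.85
```

Because not every indexed image is actually a QR Code, this number is a lower bound on how well the decoder performs on real codes.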

Words

The following tag cloud shows the 200 words that appeared most often in the decoded QR Codes. Clicking a word shows a random selection of links to images whose QR Code contains that word. At the moment, a list of common stop words is applied.


MEBKM tel mobile URL blog fc2 seesaa htm mailto yahoo sakura gurumezone MECARD mt4i cocolog tanoshii BEGIN typecast asp index Code code ocn web blogspot rakuten aki jpg gem BIZCARD livedoor dtiblog The When geocities tosp sayopee Ericsson www blog17 NewTRON inform que moburl Llamanos blogzine Neil way neene teacup Japan biz Text malcolmhall flickr site infoseek page wikipedia wordpress 0 facebook 1 Mobile PHP just skr ekk bbs town text mapion tea 0zero readable por Radd rain Happy lib Street Catriona kameya ska jrc 2 barcodes xrea para love cool cpcommunication 3 edu mysterious All Title These MEMORY nionet blog1 NSW shtml encountered Clie LLC WEB estas wallut fukuoka sun mydns make biglobe M302 ARTECH Web legal Apple aspx Ethnographer Folsom 4 wretch plala place city rss years ground 5 003 phone black sapporo 6 empresa san 7 8 Japanese read newtron touch photo Simmons Est Non Causa information linkbox bar talk Codes pocket earth geekzone Apparatus Sampo sub thelabel miramarin donews QuChao Jac BiG neocool diary suwamasaru buvcc NewFiles wonderland kyokai Photos hanabistore Last goodidea paloa mt3151 i_index Vaguely shirakawa houga blog14 DoCoMo cosmosint hatena nishijhs free guide acport mood Vanilla JUMP kenet iecats linkshare ray bio titech
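
A word list like the one above can be produced by splitting each decoded content into words, dropping stop words, and counting. The sketch below is only an illustration: the tokenization rule (runs of letters, digits, and underscores, case-sensitive) and the contents of `STOP_WORDS` are assumptions, since the source only states that some stop words are used.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical stop list; the list actually used is not given in the source.
STOP_WORDS = {"http", "https", "com", "jp", "net", "org", "the", "a", "of"}


def top_words(contents: Iterable[str], n: int = 200) -> List[Tuple[str, int]]:
    """Return the n most frequent words across all decoded contents."""
    counts: Counter = Counter()
    for text in contents:
        # \w+ keeps letters, digits, and underscores together (e.g. "i_index").
        for word in re.findall(r"\w+", text):
            if word.lower() not in STOP_WORDS:
                counts[word] += 1
    return counts.most_common(n)
```

Counting is case-sensitive here, which would explain why "Code" and "code" or "Web", "WEB", and "web" can appear as separate entries.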

Content

The following contents appeared in more than one image:

  • http://www.staatstheater-darmstadt.de/mobile (7 times)
  • www.tagmore.com/esmadrid (7 times)
  • 2021200000 (6 times)
  • Liang
    http://www.flickr.com/photos/liang_2005/
    (6 times)
  • MEBKM:URL:http\://en.wikipedia.org/wiki/Main_Page;; (5 times)
  • http://www.malcolmhall.com (5 times)
  • http://aboutme.fernandezdiaz.es (4 times)
  • http://thesun.mobi (4 times)
  • http://bbc.co.uk/programmes (4 times)
  • http://d-qr.net/dqrcgi/r.aspx?r=8r13eci7 (4 times)
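
A ranking like this one comes from counting identical decoded strings. A minimal sketch, assuming the decoded contents are available as a list of strings and are compared exactly (no normalization of case or whitespace):

```python
from collections import Counter
from typing import Iterable, List, Tuple


def repeated_contents(
    contents: Iterable[str], min_count: int = 2
) -> List[Tuple[str, int]]:
    """Return (content, count) pairs seen at least min_count times,
    most frequent first."""
    counts = Counter(contents)
    return [(c, n) for c, n in counts.most_common() if n >= min_count]
```

Multi-line contents, such as the "Liang" entry, are counted as a single string.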

Domains

The domains of the links that the codes point to show the following distribution:

  • .com (483 times, 38.21%)
  • .jp (327 times, 25.87%)
  • .net (95 times, 7.52%)
  • .co.jp (88 times, 6.96%)
  • .org (53 times, 4.19%)
  • .de (18 times, 1.42%)
  • .info (16 times, 1.27%)
  • .mobi (15 times, 1.19%)
  • .es (9 times, 0.71%)
  • .tv (8 times, 0.63%)
  • .ch (8 times, 0.63%)
  • .co.uk (8 times, 0.63%)
  • .au (7 times, 0.55%)
  • .ca (7 times, 0.55%)
  • .tw (7 times, 0.55%)
  • .be (7 times, 0.55%)
  • .cc (5 times, 0.40%)
  • .br (5 times, 0.40%)
  • .gs (5 times, 0.40%)
  • .cx (4 times, 0.32%)
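
The distribution above lists .co.jp and .co.uk separately from .jp, so the suffix was apparently not just the last label of the host name. The following sketch shows one way such a table could be computed; the set `TWO_LABEL_SUFFIXES` and the function names are assumptions for illustration, not the method actually used.

```python
import re
from collections import Counter
from typing import Iterable, List, Optional, Tuple
from urllib.parse import urlparse

# Hypothetical set of suffixes that are reported with two labels.
TWO_LABEL_SUFFIXES = {"co.jp", "co.uk"}


def domain_suffix(url: str) -> Optional[str]:
    """Return the suffix of the URL's host, e.g. '.com' or '.co.uk'."""
    # urlparse only recognizes the host when a scheme or '//' precedes it,
    # so scheme-less contents like 'www.tagmore.com/esmadrid' get a prefix.
    if not re.match(r"^[A-Za-z][A-Za-z0-9+.-]*://", url):
        url = "//" + url
    host = (urlparse(url).hostname or "").lower()
    labels = host.split(".")
    if len(labels) < 2:
        return None
    last_two = ".".join(labels[-2:])
    if last_two in TWO_LABEL_SUFFIXES:
        return "." + last_two
    return "." + labels[-1]


def suffix_distribution(urls: Iterable[str]) -> List[Tuple[str, int, float]]:
    """Return (suffix, count, percent) rows, most frequent first."""
    counts = Counter(s for s in map(domain_suffix, urls) if s)
    total = sum(counts.values())
    return [(s, n, round(100 * n / total, 2)) for s, n in counts.most_common()]
```

This sketch does not special-case numeric IP addresses, so a host such as 10.0.0.1 would be reported under a numeric "suffix".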



Code Types

Images

  • average dimensions of images that could be decoded: (219, 213)
  • average dimensions of images that could not be decoded: (336, 288)
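
These averages can be computed per group from the individual image sizes. A minimal sketch, assuming each size is a (width, height) pair in pixels (the source does not state the order or the unit) and that the result is rounded to whole numbers:

```python
from typing import Optional, Sequence, Tuple


def average_dimensions(
    sizes: Sequence[Tuple[int, int]]
) -> Optional[Tuple[int, int]]:
    """Return the rounded mean (width, height), or None for an empty group."""
    if not sizes:
        return None
    n = len(sizes)
    mean_width = round(sum(w for w, _ in sizes) / n)
    mean_height = round(sum(h for _, h in sizes) / n)
    return mean_width, mean_height
```

It would be called once with the sizes of the decoded images and once with the sizes of the undecoded ones.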

Conclusion

Not surprisingly, the MEBKM code type appeared very often, because QR Codes are frequently used in Japan to exchange contact data. The standard for this type was developed by NTT docomo and is called Bookmark Registration. Descriptions of other code types can also be found on their page.
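
To illustrate the MEBKM type, the sketch below parses the Wikipedia example from the Content list. It assumes that fields are separated by `;`, that each field has the form `KEY:value`, and that a backslash escapes the following character (as the `\:` in the example suggests). It is a simplified illustration and not a full implementation of the NTT docomo specification.

```python
import re
from typing import Dict


def parse_mebkm(content: str) -> Dict[str, str]:
    """Parse a MEBKM payload into a dict of field name -> value."""
    prefix = "MEBKM:"
    if not content.startswith(prefix):
        raise ValueError("not a MEBKM payload")
    body = content[len(prefix):]
    fields: Dict[str, str] = {}
    # Split on ';' that is not preceded by a backslash. Simplification:
    # an escaped backslash directly before a separator is not handled.
    for part in re.split(r"(?<!\\);", body):
        key, sep, value = part.partition(":")
        if not sep:
            continue  # empty piece, e.g. from the closing ';;'
        # Remove the escaping backslashes from the value.
        fields[key] = re.sub(r"\\(.)", r"\1", value)
    return fields


example = r"MEBKM:URL:http\://en.wikipedia.org/wiki/Main_Page;;"
print(parse_mebkm(example))
```

For the example, the result contains a single `URL` field holding the unescaped address of the Wikipedia main page.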

As mentioned at the beginning, QR Codes are meant to be used in the real world to link to content. This may also be the reason that I did not find any indexing services that create QR Codes in bulk, but only individual codes that users created with different encoders (since it is an open standard) and showed on their sites, so that the image search engines could find them.

Acknowledgement

Thanks to Philip for coming up with the idea to do this.