Automatic image annotation

From Wikipedia, the free encyclopedia

Automatic image annotation (also known as automatic image tagging or linguistic indexing) is the process by which a computer system automatically assigns metadata in the form of captioning or keywords to a digital image. This application of computer vision techniques is used in image retrieval systems to organize and locate images of interest from a database.

Figure: Output of DenseCap "dense captioning" software, analysing a photograph of a man riding an elephant.

This method can be regarded as a type of multi-class image classification with a very large number of classes, as large as the vocabulary size. Typically, image analysis in the form of extracted feature vectors and the training annotation words are used by machine learning techniques to attempt to apply annotations to new images automatically.[1] The first methods learned the correlations between image features and training annotations. Subsequently, techniques were developed that used machine translation to translate the textual vocabulary into a 'visual vocabulary' of clustered image regions known as blobs. Later work has included classification approaches, relevance models, and other related methods.
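The sketch below illustrates this classification view under simplifying assumptions: feature vectors are assumed to be already extracted (the 128-dimensional vectors and the small vocabulary are placeholders), and each vocabulary word is treated as an independent binary label in a one-vs-rest classifier. It is a minimal illustration of the idea, not a reproduction of any particular published method.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

# Toy training set: one pre-extracted feature vector and one set of
# annotation words per image (values here are placeholders).
rng = np.random.default_rng(0)
features = rng.random((6, 128))
annotations = [["sky", "beach"], ["elephant", "man"], ["sky"],
               ["beach"], ["elephant"], ["man", "sky"]]

# One binary indicator column per vocabulary word.
binarizer = MultiLabelBinarizer()
labels = binarizer.fit_transform(annotations)

# One-vs-rest logistic regression: effectively one classifier per
# vocabulary word, matching the "one class per word" view above.
model = OneVsRestClassifier(LogisticRegression(max_iter=1000))
model.fit(features, labels)

# Annotate a new image: keep every word whose predicted probability
# exceeds a chosen threshold.
new_image = rng.random((1, 128))
probabilities = model.predict_proba(new_image)[0]
predicted_words = [word for word, p in zip(binarizer.classes_, probabilities) if p > 0.5]
print(predicted_words)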

The advantage of automatic image annotation over content-based image retrieval (CBIR) is that queries can be specified more naturally by the user.[2] At present, CBIR generally requires users to search by visual properties such as color and texture, or by providing example images as queries. However, certain features of an example image may dominate retrieval rather than the concept the user is actually focusing on. Traditional methods of image retrieval, such as those used by libraries, have relied on manually annotated images, which is expensive and time-consuming, especially given the large and constantly growing image databases in existence.

Types of Image Annotation

  • 2D Bounding Boxes: Rectangles that define the boundaries of objects in two-dimensional image space (a sketch of such an annotation record follows this list).
  • Object Detection: Detects instances of semantic objects of a given class (such as humans, buildings, or cars) in digital images and videos.
  • Key Point Annotation: Marks facial gestures, human poses, expressions, emotions, body language, and sentiments by connecting multiple points.
  • Polygon Annotation: Involves drawing polygonal shapes on a digital image so that objects can be marked according to their position and orientation.
  • 3D Cuboid Annotation: Used for detecting and recognizing 3D objects in images by enclosing them in cuboids.
  • Semantic Segmentation: Assigns every pixel of an image to a category, so that all classes present in the image are located.
  • Image Classification: Classifies images, or objects within images, according to custom multi-level taxonomies such as land use or crop type.
  • Skeletal Annotation: Used to highlight body movement and alignment.[3]
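As a concrete illustration of how such annotations are commonly stored, the sketch below builds a single bounding-box record in a COCO-style layout; the field names and values are illustrative assumptions rather than a fixed standard.

import json

# Hypothetical 2D bounding-box annotation record, loosely following the
# COCO convention of [x, y, width, height] boxes measured in pixels from
# the top-left corner; the field names and values are illustrative.
annotation = {
    "image_id": 42,                       # image the annotation belongs to
    "category": "elephant",               # class label from the project's taxonomy
    "bbox": [120.0, 85.0, 310.0, 260.0],  # [x, y, width, height]
    "segmentation": [[120, 85, 430, 85, 430, 345, 120, 345]],  # optional polygon outline
    "keypoints": [],                      # optional (x, y, visibility) triples for key points
}

print(json.dumps(annotation, indent=2))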