Unstructured data

Information without a formal data model

Unstructured data (or unstructured information) is information that either does not have a pre-defined data model or is not organized in a pre-defined manner. Unstructured information is typically text-heavy, but may contain data such as dates, numbers, and facts as well. The resulting irregularities and ambiguities make it harder for traditional programs to interpret than data stored in fielded form in databases or annotated (semantically tagged) in documents.
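As an illustration of dates and numbers embedded in free text, the following minimal Python sketch pulls them out of a sentence with regular expressions. The sample note, the pattern names, and the `extract_fields` helper are hypothetical and invented for this example. The patterns assume ISO-style dates (YYYY-MM-DD) and plain decimal numbers, which real-world text often does not follow.

```python
import re

# Hypothetical free-text note; the content is invented for illustration.
note = (
    "Met the supplier on 2021-03-15 and agreed on 250 units at 19.99 each; "
    "follow-up due 2021-04-01."
)

# Assumed formats: ISO dates and plain integers or decimals.
DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
NUMBER_PATTERN = re.compile(r"\b\d+(?:\.\d+)?\b")


def extract_fields(text):
    """Return the dates and numbers found in a piece of free text."""
    dates = DATE_PATTERN.findall(text)
    # Blank out the dates first so their digits are not counted as numbers.
    remainder = DATE_PATTERN.sub(" ", text)
    numbers = [float(n) for n in NUMBER_PATTERN.findall(remainder)]
    return {"dates": dates, "numbers": numbers}


record = extract_fields(note)
print(record)
```

The sketch recovers the values but not their meaning: nothing in the text says that 250 is a quantity or that 19.99 is a price, which is the kind of ambiguity a fielded database record would avoid.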

In 1998, Merrill Lynch said "unstructured data comprises the vast majority of data found in an organization, some estimates run as high as 80%."[1] The source of this figure is unclear, but it is nonetheless accepted by some.[2] Other sources have reported similar or higher percentages of unstructured data.[3][4][5]

In 2012, IDC and Dell EMC projected that data would grow to 40 zettabytes by 2020, a 50-fold increase from the beginning of 2010.[6] More recently, IDC and Seagate predicted that the global datasphere would grow to 163 zettabytes by 2025,[7] and that the majority of it would be unstructured. Computer World magazine states that unstructured information might account for more than 70–80% of all data in organizations.