Delimiter-separated values

Text format for indicating tabular data

From Wikipedia, the free encyclopedia

Delimiter-separated values (DSV)[2]:113 is a way of storing a two-dimensional array of text data by separating the fields (values) of each row with a specific delimiter character. Typically, the data is like a database table with each row containing information about a different item (such as a book or company) and each field storing information about the item (such as title or name).[3]

A delimited text file is a text file that stores data as DSV. Such a file can be classified as a flat-file database if the data is database-like, that is, if accessing individual rows is meaningful.

Since DSV is commonly supported by database and spreadsheet software, it is often used for data exchange.

A commonly used alternative for text data is a fixed-width format, in which each column occupies a fixed number of characters, which limits the length of each field value. In contrast, DSV supports field values of any length.[4]
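The contrast can be illustrated with a minimal Python sketch. The helper names `to_fixed_width` and `to_dsv`, the column widths, and the choice to truncate over-long values are assumptions made for illustration; they are not part of any standard.

```python
def to_fixed_width(row, widths):
    """Pad each value to its column width; longer values are truncated (an assumed policy)."""
    return "".join(value[:width].ljust(width) for value, width in zip(row, widths))


def to_dsv(row, delimiter="\t"):
    """Join values with a delimiter; value length is not limited."""
    return delimiter.join(row)


row = ["Delimiter-separated values", "DSV"]
fixed = to_fixed_width(row, [10, 5])  # first value is cut to 10 characters
delimited = to_dsv(row)               # first value is kept whole
```

In the fixed-width line the first value loses everything after its tenth character, while the delimited line preserves it in full.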

Format

DSV is a category of data formats, not a particular format. To be useful, a convention must be established that defines the precise format. In general, a format is categorized as DSV if it consists of lines of delimiter-separated values (where lines are newline-separated). The first row is sometimes a special record containing the column names.
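As a minimal sketch of one such convention, the following Python function assumes that the first row holds column names, that the delimiter never occurs inside a value, and that no quoting is used. The function name `parse_dsv` and the sample data are hypothetical.

```python
def parse_dsv(text, delimiter):
    """Parse newline-separated lines of delimiter-separated values.

    Assumes the first line holds the column names and that the
    delimiter never appears inside a field value.
    """
    lines = text.splitlines()
    if not lines:
        return []
    header = lines[0].split(delimiter)
    return [dict(zip(header, line.split(delimiter))) for line in lines[1:]]


sample = "title|year\nDune|1965\nEmma|1815"
records = parse_dsv(sample, "|")
```

Each record is returned as a dictionary keyed by column name; all values remain text, since DSV itself carries no type information.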

Any character may be used to separate field values; the more commonly used include the comma, tab, colon, vertical bar (a.k.a. pipe) and space.[2]:113[5] ASCII and Unicode include control characters that are intended to be used as delimiters: file separator, group separator, record separator, and unit separator. Use of these in DSV data is relatively uncommon, although the MARC 21 bibliographic data format uses them.[6]
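A minimal Python sketch of the control-character approach follows. It assumes the unit separator (0x1F) between fields and the record separator (0x1E) between records in place of newlines; the function names `encode` and `decode` are hypothetical, and this is not a rendering of the MARC 21 format.

```python
UNIT_SEP = "\x1f"    # ASCII unit separator, used here between fields
RECORD_SEP = "\x1e"  # ASCII record separator, used here between records


def encode(rows):
    """Join fields with the unit separator and records with the record separator."""
    return RECORD_SEP.join(UNIT_SEP.join(row) for row in rows)


def decode(data):
    """Reverse encode(); assumes neither control character occurs in a value."""
    return [record.split(UNIT_SEP) for record in data.split(RECORD_SEP)]


rows = [["Bloggs, Fred", "C"], ["Doe, Jane", "B"]]
round_trip = decode(encode(rows))
```

Because these control characters rarely occur in ordinary text, values containing commas or other punctuation pass through unchanged.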

Two commonly used sub-categories of DSV, comma-separated values (CSV) and tab-separated values (TSV), are supported by many software packages, including many spreadsheet and statistical applications. Some can import such data even when the user does not describe the format, such as which character is the delimiter.[7][8] Although such an application may more directly support a more capable and possibly proprietary internal data model (for example, accdb or xlsx), it can map DSV data to that internal model.[citation needed]
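Automatic format detection of this kind can be sketched with the `csv.Sniffer` class from Python's standard library, which guesses a delimiter from a sample of the data. The sample text and the candidate delimiter list below are assumptions for illustration, and the heuristic may fail on small or irregular samples.

```python
import csv

# Hypothetical sample: a header row and two records, tab-separated.
sample = "name\tage\nAnn\t30\nBob\t25\n"

# Restrict the guess to a few common delimiter candidates.
dialect = csv.Sniffer().sniff(sample, delimiters=",\t;|")
detected = dialect.delimiter
```

The detected dialect can then be passed to `csv.reader` to parse the rest of the data.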

Challenges

A continual challenge with DSV data is ensuring valid data structure. In particular, if the number of fields on each line varies, importing into a system such as a database may fail.
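A structural check of this kind can be sketched in Python as follows. The sketch assumes a convention without quoting, so that a plain split on the delimiter yields the fields; the function name `inconsistent_lines` is hypothetical.

```python
def inconsistent_lines(text, delimiter):
    """Return (line number, field count) for each line whose field count
    differs from that of the first line. Assumes no quoting is in use."""
    lines = text.splitlines()
    if not lines:
        return []
    expected = len(lines[0].split(delimiter))
    problems = []
    for number, line in enumerate(lines, start=1):
        count = len(line.split(delimiter))
        if count != expected:
            problems.append((number, count))
    return problems


text = "a,b,c\n1,2,3\n4,5\n6,7,8,9"
problems = inconsistent_lines(text, ",")
```

Running such a check before an import makes it possible to report the offending line numbers instead of failing partway through.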

A particular challenge of DSV is delimiter collision, which happens when the delimiter character appears in a field value and the convention makes no accommodation for it. The character is then interpreted as a separator, splitting a single logical value into two. Some DSV conventions provide a way to avoid collision while others do not.
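A short Python illustration of a collision, assuming a naive convention that simply splits each line on commas; the sample line is made up for illustration.

```python
# Three logical values: a date, a name in "Last, First" form, and a grade.
line = "25 May,Bloggs, Fred,C"

# A naive split treats the comma inside the name as a separator.
fields = line.split(",")
```

The split yields four fields instead of three, because the name is broken into `Bloggs` and ` Fred`.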

A commonly used way to avoid delimiter collision is to enclose a field value in double quotes. A convention can require quoting for all values, or make it optional so that it is used only for values that contain the delimiter character.
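Both variants can be sketched with the `csv` module from Python's standard library, whose `QUOTE_ALL` and `QUOTE_MINIMAL` constants correspond roughly to mandatory and optional quoting. The helper name `write_row` and the sample row are assumptions for illustration.

```python
import csv
import io


def write_row(row, quoting):
    """Serialize one row as comma-separated text using the given quoting policy."""
    buffer = io.StringIO()
    csv.writer(buffer, quoting=quoting, lineterminator="\n").writerow(row)
    return buffer.getvalue()


row = ["25 May", "Bloggs, Fred", "C"]
always = write_row(row, csv.QUOTE_ALL)        # every value is quoted
optional = write_row(row, csv.QUOTE_MINIMAL)  # only values that need it are quoted
```

With optional quoting, only the value containing a comma is enclosed in double quotes.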

Collision can also be avoided if the convention disallows the delimiter in a field value, which is the tacit implication if the convention provides no other way to avoid collision. Using a relatively unusual character (e.g. tilde ~) limits the impact on possible field values. However, even a seemingly unusual character might appear in real data and then cause a processing error.
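A convention that disallows the delimiter can be enforced at write time, as in this minimal Python sketch. The function name `join_strict` and the decision to raise an error rather than alter the value are assumptions for illustration.

```python
def join_strict(values, delimiter="~"):
    """Join values with the delimiter, rejecting any value that contains it."""
    for value in values:
        if delimiter in value:
            raise ValueError(f"value {value!r} contains the delimiter {delimiter!r}")
    return delimiter.join(values)


line = join_strict(["Bloggs, Fred", "C"])
```

A value such as `approx. ~5` would be rejected here, which surfaces the problem when the data is written instead of when it is later read.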

Example

In the following example, fields are separated by a comma.

"Date","Pupil","Grade"
"25 May","Bloggs, Fred","C"
"25 May","Doe, Jane","B"
"15 July","Bloggs, Fred","A"
"15 April","Muniz, Alvin ""Hank""","A"

Each field value is enclosed in double quotes so that a field value can contain a comma. The comma in "Bloggs, Fred" is not a value separator because the text is enclosed in double quotes. Some formats allow a newline to be included in a value via this mechanism.

The format for this example allows a double quote to be embedded in a value by writing two sequential double quotes, where the first one acts as an escape character so that the second one is interpreted as a literal double quote instead of the beginning or end of a field. The value "Muniz, Alvin ""Hank""" is interpreted as Muniz, Alvin "Hank".
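The example can be read with the `csv` module from Python's standard library, whose default dialect uses the comma as delimiter and treats a doubled double quote inside a quoted value as one literal quote. This is a sketch of one parser that follows the convention described, not the only possible reading of the data.

```python
import csv
import io

data = '''"Date","Pupil","Grade"
"25 May","Bloggs, Fred","C"
"25 May","Doe, Jane","B"
"15 July","Bloggs, Fred","A"
"15 April","Muniz, Alvin ""Hank""","A"
'''

# The default dialect handles quoted commas and doubled quotes.
rows = list(csv.reader(io.StringIO(data)))
header, records = rows[0], rows[1:]
```

Each record has exactly three fields, and the pupil in the last record is read as `Muniz, Alvin "Hank"`.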
