Formats =>
    int32[n-widths] int32*[n-widths]
    string[locale]
    int32[current-layer]
    bool[x7] bool[x8] bool[x9]
    Y0
    CustomCurrency
    count(
      v1(X0?)
      v3(count(X1 count(X2)) count(X3)))
Y0 => int32[epoch] byte[decimal] byte[grouping]
CustomCurrency => int32[n-ccs] string*[n-ccs]
If n-widths is nonzero, then the accompanying integers are
column widths as manually adjusted by the user.
locale is a locale including an encoding, such as
en_US.windows-1252 or it_IT.windows-1252.
(locale is often duplicated in Y1, described below).
epoch is the year that starts the epoch.  A 2-digit year is
interpreted as belonging to the 100 years beginning at the epoch.  The
default epoch year is 69 years prior to the current year; thus, in
2017 this field by default contains 1948.  In the corpus, epoch
ranges from 1943 to 1948, although some members contain -1 instead.
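The epoch rule above can be sketched in a few lines of Python. The function names are hypothetical, and the sketch does not cover an epoch of -1, whose meaning the text does not state.

```python
def default_epoch(current_year: int) -> int:
    """Default epoch year: 69 years prior to the current year."""
    return current_year - 69


def expand_two_digit_year(yy: int, epoch: int) -> int:
    """Place a 2-digit year in the 100 years beginning at `epoch`."""
    if not 0 <= yy <= 99:
        raise ValueError("expected a 2-digit year")
    # Start in the century that contains the epoch year...
    year = (epoch - epoch % 100) + yy
    # ...and move forward one century if that lands before the epoch.
    if year < epoch:
        year += 100
    return year
```

With an epoch of 1948, the 2-digit year 48 maps to 1948 and 47 maps to 2047.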
decimal is the decimal point character.  The observed values
are ‘.’ and ‘,’.
grouping is the grouping character.  Usually, it is ‘,’ if
decimal is ‘.’, and vice versa.  Other observed values are
‘'’ (apostrophe), ‘ ’ (space), and zero (presumably
indicating that digits should not be grouped).
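As an illustration of how decimal and grouping might be applied when rendering a number, here is a minimal Python sketch. The function name is hypothetical, and treating a zero grouping byte as "no grouping" follows the presumption stated above rather than confirmed behavior.

```python
def format_with_separators(value: float, places: int,
                           decimal: str = ".", grouping: str = ",") -> str:
    """Format `value` using the given decimal and grouping characters."""
    # Python's "," format option always groups with ',' and uses '.'
    # as the decimal point; swap both characters in a single pass.
    text = f"{value:,.{places}f}"
    no_grouping = grouping in ("", "\x00")  # zero byte: assume no grouping
    table = {
        ord("."): decimal,
        ord(","): None if no_grouping else grouping,
    }
    return text.translate(table)
```

For example, with decimal set to ‘,’ and grouping set to ‘.’, the value 1234567.891 formatted to two places becomes 1.234.567,89.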
n-ccs is observed as either 0 or 5.  When it is 5, the
following strings are the CCA through CCE format strings.  See Custom
Currency Formats in PSPP.  Most commonly each of these is the string
-,,, although other strings occur.
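The Y0 and CustomCurrency productions could be read roughly as follows. This sketch assumes conventions that this section does not spell out: that int32 is little-endian and that a string is an int32 byte count followed by that many bytes. All function names are hypothetical.

```python
import struct


def read_int32(buf: bytes, pos: int):
    """Read a little-endian int32 (assumed byte order)."""
    (value,) = struct.unpack_from("<i", buf, pos)
    return value, pos + 4


def read_string(buf: bytes, pos: int):
    """Read an int32 length followed by that many bytes (assumed layout)."""
    n, pos = read_int32(buf, pos)
    if n < 0 or pos + n > len(buf):
        raise ValueError("string length out of range")
    return buf[pos:pos + n], pos + n


def read_y0(buf: bytes, pos: int):
    """Y0 => int32[epoch] byte[decimal] byte[grouping]"""
    epoch, pos = read_int32(buf, pos)
    decimal, grouping = buf[pos], buf[pos + 1]
    return {"epoch": epoch, "decimal": decimal, "grouping": grouping}, pos + 2


def read_custom_currency(buf: bytes, pos: int):
    """CustomCurrency => int32[n-ccs] string*[n-ccs]"""
    n_ccs, pos = read_int32(buf, pos)
    if n_ccs < 0:
        raise ValueError("negative n-ccs")
    strings = []
    for _ in range(n_ccs):
        s, pos = read_string(buf, pos)
        strings.append(s)
    return strings, pos
```

Each reader returns the parsed value together with the offset of the next unread byte, so the calls can be chained in grammar order.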
A writer may safely use false for x7, x8, and x9.
X0 only appears, optionally, in version 1 members.
X0 => byte*14 Y1 Y2
Y1 =>
    string[command] string[command-local]
    string[language] string[charset] string[locale]
    bool[x10] bool[include-leading-zero] bool[x12] bool[x13]
    Y0
Y2 => CustomCurrency byte[missing] bool[x17]
command describes the statistical procedure that generated the
output, in English.  It is not necessarily the literal syntax name of
the procedure: for example, NPAR TESTS becomes “Nonparametric
Tests.”  command-local is the procedure’s name, translated
into the output language; it is often empty and, when it is not,
sometimes the same as command.
include-leading-zero is the LEADZERO setting for the
table, where false is OFF (the default) and true is ON.
See SET LEADZERO in PSPP.
missing is the character used to indicate that a cell contains
a missing value.  It is always observed as ‘.’.
A writer may safely use false for x10 and x17 and true
for x12 and x13.
X1 only appears in version 3 members.
X1 =>
    bool[x14]
    byte[show-title]
    bool[x16]
    byte[lang]
    byte[show-variables]
    byte[show-values]
    int32[x18] int32[x19]
    00*17
    bool[x20]
    bool[show-caption]
lang may indicate the language in use.  Some values seem to be
0: en, 1: de, 2: es, 3: it, 5: ko, 6: pl, 8:
zh-tw, 10: pt_BR, 11: fr.
show-variables determines how variables are displayed by
default.  A value of 1 means to display variable names, 2 to display
variable labels when available, 3 to display both (name followed by
label, separated by a space).  The most common value is 0, which
probably means to use a global default.
show-values is a similar setting for values.  A value of 1
means to display the value, 2 to display the value label when
available, 3 to display both.  Again, the most common value is 0,
which probably means to use a global default.
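The show-variables rules can be sketched as below; show-values would work the same way with a value and its value label. The function name and the global_default parameter are hypothetical, and resolving 0 through a global default follows the guess in the text, not confirmed behavior. Falling back to the name when no label exists in mode 3 is also an assumption.

```python
from typing import Optional


def display_variable(name: str, label: Optional[str],
                     show_variables: int, global_default: int = 2) -> str:
    """Choose the text shown for a variable from show-variables."""
    # 0 probably means "use a global default" (assumption).
    mode = show_variables or global_default
    if mode == 1 or not label:
        return name                # name only, or no label available
    if mode == 2:
        return label               # label when available
    if mode == 3:
        return f"{name} {label}"   # name followed by label
    raise ValueError(f"unexpected show-variables value {show_variables}")
```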
show-title is 1 to show the title, 10 to hide it.
show-caption is true to show the caption, false to hide it.
A writer may safely use false for x14, false for x16, 0
for lang, -1 for x18 and x19, and false for
x20.
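A writer following these recommendations might emit X1 as in the sketch below. It assumes that bool occupies one byte (0 or 1) and that int32 is little-endian, neither of which this section states. The function name is hypothetical.

```python
import struct


def build_x1(show_title: bool = True, show_caption: bool = True,
             show_variables: int = 0, show_values: int = 0) -> bytes:
    """Serialize X1 with the writer-safe values for the unknown fields."""
    return struct.pack(
        "<BBBBBBii17xBB",
        0,                          # x14: false
        1 if show_title else 10,    # show-title: 1 shows, 10 hides
        0,                          # x16: false
        0,                          # lang: 0
        show_variables,             # 0 probably means global default
        show_values,                # 0 probably means global default
        -1, -1,                     # x18, x19
        # "17x" emits the 17 zero bytes (00*17)
        0,                          # x20: false
        int(show_caption),          # show-caption
    )
```

Under these assumptions the result is always 33 bytes long.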
X2 only appears in version 3 members.
X2 =>
    int32[n-row-heights] int32*[n-row-heights]
    int32[n-style-map] StyleMap*[n-style-map]
    int32[n-styles] StylePair*[n-styles]
    count((i0 i0)?)
StyleMap => int64[cell-index] int16[style-index]
If present, n-row-heights and the accompanying integers are row
heights as manually adjusted by the user.
The rest of X2 specifies styles for data cells. At first glance this is odd, because each data cell can have its own style embedded as part of the data, but in practice X2 specifies a style for a cell only if that cell is empty (and thus does not appear in the data at all). Each StyleMap specifies the index of a blank cell, calculated the same way as in Cells (see Cells), along with a 0-based index into the accompanying StylePair array.
A writer may safely omit the optional i0 i0 inside the
count(…).
X3 only appears in version 3 members.
X3 =>
    01 00 byte[x21] 00 00 00
    Y1
    double[small] 01
    (string[dataset] string[datafile] i0 int32[date] i0)?
    Y2
    (int32[x22] i0 01?)?
small is a small real number.  In the corpus, it overwhelmingly
takes the value 0.0001, with zero occasionally seen.  Nonzero numbers
with format 40 (see Value) whose magnitudes are smaller than small
are displayed in scientific notation.  (Thus, a small of zero
prevents scientific notation from being chosen.)
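The threshold test implied by small can be written as a one-line predicate. This is a minimal sketch with a hypothetical function name; it covers only the rule described here and not any other formatting logic.

```python
def uses_scientific_notation(value: float, small: float,
                             format_code: int = 40) -> bool:
    """True if a number should be displayed in scientific notation."""
    # Only nonzero numbers with format 40 are affected, and only when
    # their magnitude is strictly below the threshold.
    return format_code == 40 and value != 0 and abs(value) < small
```

Because no magnitude is less than zero, a small of zero makes the predicate false for every value.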
dataset is the name of the dataset analyzed to produce the
output, e.g. DataSet1, and datafile the name of the
file it was read from, e.g. C:\Users\foo\bar.sav.  The latter
is sometimes the empty string.
date is a date, as seconds since the epoch, i.e. since
January 1, 1970.  Pivot tables within an SPV file often have dates a
few minutes apart, so this is probably a creation date for the table
rather than for the file.
Sometimes dataset, datafile, and date are present
and other times they are absent.  The reader can distinguish by
assuming that they are present and then checking whether the
presumptive dataset contains a null byte (a valid string never
will).
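The presence test described above might look like the following sketch. It assumes little-endian int32, a string layout of int32 length plus bytes, and that i0 denotes a 32-bit zero; none of these is defined in this section. Beyond the null-byte check from the text, the sketch also rejects the group when later fields fail to parse or the i0 fields are nonzero, which is an extra precaution rather than documented behavior. The function names are hypothetical.

```python
import struct


def read_string(buf: bytes, pos: int):
    """Read an int32 length followed by that many bytes (assumed layout)."""
    (n,) = struct.unpack_from("<i", buf, pos)
    pos += 4
    if n < 0 or pos + n > len(buf):
        raise ValueError("string length out of range")
    return buf[pos:pos + n], pos + n


def try_read_dataset_group(buf: bytes, pos: int):
    """Parse (string[dataset] string[datafile] i0 int32[date] i0)?

    Returns ((dataset, datafile, date), new_pos) if the group appears
    to be present, or (None, pos) if it appears to be absent.
    """
    try:
        dataset, p = read_string(buf, pos)
        if b"\x00" in dataset:      # a valid string never has a null byte
            return None, pos
        datafile, p = read_string(buf, p)
        zero1, date, zero2 = struct.unpack_from("<iii", buf, p)
    except (ValueError, struct.error):
        return None, pos
    if zero1 != 0 or zero2 != 0:    # extra sanity check (assumption)
        return None, pos
    return (dataset, datafile, date), p + 12
```

When the group is absent, the bytes at this position begin Y2, so with n-ccs equal to 5 the presumptive dataset would start with the length prefix of the first currency string and therefore contain null bytes.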
x22 is usually 0 or 2000000.
A writer may safely use 4 for x21 and omit x22 and the
other optional bytes at the end.
Formats contains several indications of character encoding:
- locale in Formats itself.
- locale in Y1 (in version 1, Y1 is optionally nested inside X0;
  in version 3, Y1 is nested inside X3).
- charset in version 3, in Y1.
- lang in X1, in version 3.
charset, if present, is a good indication of character
encoding, and in its absence the encoding suffix on locale in
Formats will work.
locale in Y1 can be disregarded: it is normally the same as
locale in Formats, and it is only present if charset is
also.
lang is not helpful and should be ignored for character
encoding purposes.
However, the corpus contains many examples of light members whose strings are encoded in UTF-8 despite declaring some other character set. Furthermore, the corpus contains several examples of light members in which some strings are encoded in UTF-8 (and contain multibyte characters) and other strings are encoded in another character set (and contain non-ASCII characters). PSPP treats any valid UTF-8 string as UTF-8 and only falls back to the declared encoding for strings that are not valid UTF-8.
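The selection and fallback rules above can be sketched in Python as follows. This illustrates the described strategy and is not PSPP's actual code. The function names are hypothetical, and replacing undecodable bytes in the fallback path is a choice made for this sketch.

```python
from typing import Optional


def declared_encoding(formats_locale: str,
                      charset: Optional[str] = None) -> Optional[str]:
    """Prefer charset; otherwise use the encoding suffix of locale."""
    if charset:
        return charset
    _, dot, suffix = formats_locale.partition(".")
    return suffix if dot and suffix else None


def decode_light_string(raw: bytes, declared: Optional[str]) -> str:
    """Treat valid UTF-8 as UTF-8; otherwise use the declared encoding."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Fall back per string, since a single member may mix encodings.
        return raw.decode(declared or "latin-1", errors="replace")
```

Deciding per string rather than per member handles the mixed-encoding members mentioned above, at the cost of misreading a rare non-UTF-8 string that happens to be valid UTF-8.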
The pspp-output program’s strings command can help
analyze the encoding in an SPV light member.  Use pspp-output
--help-dev to see its usage.