3. The zLidar File Structure
3.1 Data Types
The following data types are used to store information within a zLidar file. Note that these data types conform to the 1999 ANSI C Language Specification.
u8 = unsigned 8-bit integer (byte)
u16 = unsigned 16-bit integer (short)
u32 = unsigned 32-bit integer (long)
u64 = unsigned 64-bit integer (long long)
i16 = signed 16-bit integer (signed short)
i32 = signed 32-bit integer (signed long)
f64 = 64-bit floating-point (double)
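A Python decoder can express these types as `struct` format codes. The sketch below assumes little-endian byte order, as used by LAS; this section does not state the byte order explicitly, so treat that as an assumption. `ZLIDAR_TYPES` is a hypothetical name.

```python
import struct

# Hypothetical mapping from zLidar type names to Python struct format codes.
# "<" selects little-endian with standard sizes and no alignment padding;
# little-endian is an assumption borrowed from the LAS format.
ZLIDAR_TYPES = {
    "u8": "<B",
    "u16": "<H",
    "u32": "<I",
    "u64": "<Q",
    "i16": "<h",
    "i32": "<i",
    "f64": "<d",
}

# With "<", struct uses standard sizes, so each width matches the list above.
for name, code in ZLIDAR_TYPES.items():
    print(name, struct.calcsize(code))
```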
3.2 File Header and Variable Length Records (VLRs)
The file header of a zLidar file is identical to that of a LAS file, with one notable exception: the File Signature field is changed from “LASF” to “ZLDR”. Variable length records (VLRs) are stored as described in the LAS specification. Because the zLidar format does not specify which LAS version the data are derived from, a zLidar encoder/decoder should be able to handle LAS v1.1, v1.2, v1.3 and v1.4 formatted header data.
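Since the header follows the LAS public header block, a decoder can check the “ZLDR” signature and then read the LAS fields it needs at their usual LAS 1.x byte positions (offset to point data at byte 96, scale factors at byte 131, offsets at byte 155). The following is a minimal sketch, assuming little-endian byte order; `read_zlidar_header` is a hypothetical helper, not part of any existing library.

```python
import struct

def read_zlidar_header(buf: bytes) -> dict:
    """Read a few LAS-style header fields from the start of a zLidar file.

    Byte positions follow the LAS 1.x public header block; little-endian
    byte order is assumed.
    """
    if buf[0:4] != b"ZLDR":
        raise ValueError("not a zLidar file (signature is not 'ZLDR')")
    version_major, version_minor = struct.unpack_from("<BB", buf, 24)
    # Header size (u16), offset to point data (u32), number of VLRs (u32).
    header_size, offset_to_points, num_vlrs = struct.unpack_from("<HII", buf, 94)
    point_format = buf[104]
    scale = struct.unpack_from("<3d", buf, 131)   # x, y, z scale factors
    offset = struct.unpack_from("<3d", buf, 155)  # x, y, z offsets
    return {
        "version": (version_major, version_minor),
        "header_size": header_size,
        "offset_to_point_data": offset_to_points,
        "number_of_vlrs": num_vlrs,
        "point_format": point_format,
        "scale": scale,
        "offset": offset,
    }
```

The scale and offset values returned here are the ones needed later to convert coordinates to their i32 representations.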
3.3 Point Record Data
Some point data formats require points to be sorted by geographic criteria (e.g. for spatial indexing), but this is not the case for zLidar files. There are no requirements for the ordering of points in a zLidar file, although it is common to preserve the point order of the source LAS file. The start of the point data must be aligned to a 32-bit (4-byte word) boundary; padding with zero bytes may be used to meet this condition when the previous data section (e.g. the VLRs) does not end on a word boundary.
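The alignment rule reduces to a little modular arithmetic. A minimal sketch follows; `padding_for` and `align` are hypothetical helper names, and the 227-byte figure is simply the size of a LAS 1.2 header, used as an example.

```python
def padding_for(offset: int, alignment: int = 4) -> int:
    """Number of zero bytes needed so that `offset` lands on an alignment boundary."""
    return (-offset) % alignment

def align(offset: int, alignment: int = 4) -> int:
    """Round `offset` up to the next multiple of `alignment`."""
    return offset + padding_for(offset, alignment)

# A 227-byte LAS 1.2 header with no VLRs ends at byte 227,
# so one zero byte pads the start of the point data to byte 228.
print(padding_for(227), align(227))  # 1 228
```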
Point data are stored in blocks of points, and the structure of the point data within these blocks differs substantially from that of a LAS file. A zLidar encoder/decoder is not required to use a specific block size; in fact, each block may contain a different number of points. Point blocks, combined with the auxiliary index file, allow more localized access to point data, so that applications requiring only a subset of points need not read every point in the file. The WhiteboxTools zLidar encoder/decoder currently defaults to blocks of 50 000 points, except for the last block, which may have fewer points. Each point block begins with a header, structured as follows:
Table 1: The point block header structure of a zLidar file.
Description | Data Type | Bytes |
---|---|---|
NumberOfFields | u8 | 1 |
CompressionMethod | u8 | 1 |
MajorVersion | u8 | 1 |
MinorVersion | u8 | 1 |
FieldDescriptorValues | FieldDescriptor | 20 * NumberOfFields |
The offset to point data stored in the file header (i.e. the LAS header) should point to the start of the first point block header.
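Reading a point block header amounts to four single-byte fields followed by NumberOfFields 20-byte FieldDescriptor entries (4 + 8 + 8 bytes). The sketch below is one way to parse it, assuming little-endian byte order and no padding between the fixed fields and the descriptors; `FieldDescriptor`, `BlockHeader` and `read_block_header` are hypothetical names.

```python
import struct
from typing import NamedTuple

class FieldDescriptor(NamedTuple):
    data_code: int    # Table 3 code (0 = x, 1 = y, ...)
    file_offset: int  # absolute, zero-based offset of the compressed field
    byte_length: int  # compressed length, excluding alignment padding

class BlockHeader(NamedTuple):
    compression_method: int
    major_version: int
    minor_version: int
    fields: list

def read_block_header(buf: bytes, pos: int) -> BlockHeader:
    """Parse a point block header (Tables 1 and 2) starting at byte `pos`."""
    num_fields, method, major, minor = struct.unpack_from("<4B", buf, pos)
    pos += 4
    fields = []
    for _ in range(num_fields):
        # u32 + u64 + u64 = 20 bytes per FieldDescriptor, packed without padding.
        code, offset, length = struct.unpack_from("<IQQ", buf, pos)
        fields.append(FieldDescriptor(code, offset, length))
        pos += 20
    if method != 0:
        raise ValueError(f"unsupported compression method {method}")
    return BlockHeader(method, major, minor, fields)
```

A decoder would call this with `pos` set to the offset to point data from the file header.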
- NumberOfFields: This value indicates how many point fields are used to describe the points, and it is determined by the LAS point data record format. For example, LAS point format 0 data usually contain 9 point fields, namely x, y, z, intensity, the return data byte (containing the return number, number of returns, scan direction bit flag, and edge of flight line bit flag), the classification byte, the scan angle, the user data, and the point source ID. Note that intensity and user data are optional fields in the LAS specification, so there may be fewer than 9 fields. Other LAS point record formats have more fields (e.g. GPS time data, RGB colour data, etc.). The reader is referred to the LAS specification for a mapping of mandatory and optional point fields onto point record formats.
- CompressionMethod: Currently, only the DEFLATE compression method is supported, and this field is set to 0. Future versions of the specification may allow alternative compression methods (e.g. LZW), in which case non-zero values could be used in this field.
- MajorVersion and MinorVersion: A zLidar file that meets the current specification must set these values to 1 and 0, respectively.
- FieldDescriptorValues: A point block header ends with a list of NumberOfFields FieldDescriptors, one for each point field contained in the point block. The FieldDescriptor data structure is organized as follows:
Table 2: The FieldDescriptor data structure of a zLidar file.
Description | Data Type | Bytes |
---|---|---|
DataCode | u32 | 4 |
FileOffset | u64 | 8 |
ByteLength | u64 | 8 |
- DataCode: This value identifies the kind of data contained within a point field, using the following numeric scheme:
Table 3: Interpretation of the DataCode field used in a zLidar file.
Value | Field | Data Type |
---|---|---|
0 | x-coordinate value | i32 |
1 | y-coordinate value | i32 |
2 | z-coordinate value | i32 |
3 | intensity | u16 |
4 | point return bit-field | u8 |
5 | classification bit-field | u8 |
6 | scan angle | i16 |
7 | user data | u8 |
8 | point source ID | u16 |
9 | GPS time data | f64 |
10 | red-colour value | u16 |
11 | green-colour value | u16 |
12 | blue-colour value | u16 |
- FileOffset: This value is the offset to the start of the point field data. It is zero-based and measured from the beginning of the file (and therefore includes the length of the header and any preceding VLRs). Offset values should be aligned to 32-bit boundaries; if the end of the previous point field does not naturally fall on such a boundary, zero-byte padding may be used so that the starting byte of the field meets this alignment requirement.
- ByteLength: This value is the compressed byte length of the point field data. Importantly, the length must exclude any zero-byte padding used to align point field data to 32-bit boundaries.
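The FileOffset and ByteLength rules can be illustrated from the encoder side: compress each field, pad with zero bytes up to the next 32-bit boundary, record the absolute offset, and record the compressed length without the padding. This is a minimal sketch, assuming little-endian byte order and that CompressionMethod 0 means a raw DEFLATE stream (RFC 1951) with no zlib or gzip wrapper; if an implementation wraps the stream, the `wbits` value would need to change. `compress_field` and `build_point_block` are hypothetical names.

```python
import struct
import zlib

def compress_field(raw: bytes) -> bytes:
    # Raw DEFLATE stream (wbits=-15 means no zlib header); the wrapper choice is an assumption.
    comp = zlib.compressobj(wbits=-15)
    return comp.compress(raw) + comp.flush()

def build_point_block(fields, block_start: int) -> bytes:
    """Serialize one point block whose header begins at absolute offset `block_start`.

    `fields` is a list of (data_code, uncompressed_bytes) pairs in storage order.
    """
    payloads = [(code, compress_field(raw)) for code, raw in fields]
    header_len = 4 + 20 * len(payloads)
    cursor = block_start + header_len          # absolute position in the file
    descriptors = []
    body = bytearray()
    for code, data in payloads:
        pad = (-cursor) % 4                    # zero bytes to reach a 32-bit boundary
        body += b"\x00" * pad
        cursor += pad
        # FileOffset is absolute; ByteLength counts only the compressed bytes.
        descriptors.append(struct.pack("<IQQ", code, cursor, len(data)))
        body += data
        cursor += len(data)
    # NumberOfFields, CompressionMethod 0 (DEFLATE), MajorVersion 1, MinorVersion 0.
    header = struct.pack("<4B", len(payloads), 0, 1, 0) + b"".join(descriptors)
    return header + bytes(body)
```

Because the padding is written into the body but never counted in ByteLength, a decoder can slice each field as `file[FileOffset : FileOffset + ByteLength]` without having to skip padding.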
Notice that the point block header contains no information about the number of points in the block. A zLidar decoder can obtain this information either from the number of data values yielded by the decompression algorithm or from the auxiliary index file, if present. A zLidar decoder should check that the number of values is the same for every point field, i.e. that all point fields within a block contain a consistent number of values.
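Deriving the point count from the decompressed data only requires the element size of each DataCode from Table 3. The sketch below decompresses every field, divides by the element size, and rejects the block if the counts disagree. It assumes raw DEFLATE streams (no zlib header) and that the whole file is held in `buf`; `DATA_CODE_SIZES` and `decompress_block_fields` are hypothetical names.

```python
import zlib

# Element sizes in bytes for each DataCode in Table 3.
DATA_CODE_SIZES = {0: 4, 1: 4, 2: 4, 3: 2, 4: 1, 5: 1, 6: 2, 7: 1,
                   8: 2, 9: 8, 10: 2, 11: 2, 12: 2}

def decompress_block_fields(buf: bytes, descriptors) -> tuple:
    """Decompress each field and infer the block's point count.

    `descriptors` is a sequence of (data_code, file_offset, byte_length) tuples
    in header order; `buf` holds the whole file so offsets can be used directly.
    """
    decoded = {}
    counts = set()
    for code, offset, length in descriptors:
        # Raw DEFLATE (no zlib header) is assumed, hence wbits = -15.
        raw = zlib.decompress(buf[offset:offset + length], -15)
        size = DATA_CODE_SIZES[code]
        if len(raw) % size != 0:
            raise ValueError(f"field {code}: {len(raw)} bytes is not a multiple of {size}")
        counts.add(len(raw) // size)
        decoded[code] = raw
    if len(counts) != 1:
        raise ValueError(f"inconsistent value counts across fields: {sorted(counts)}")
    return counts.pop(), decoded
```

When an index file is available, its NumberOfPoints value offers an independent cross-check of the count returned here.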
The compressed point fields are stored sequentially after the point block header. Point fields may be stored in any order, provided that this order matches the order of the entries in the header's FieldDescriptorValues.
For most of the field data types (Table 3), the data are simply compressed using the appropriate compression method (e.g. DEFLATE) and then written to the file at the location given by the corresponding FileOffset value. For example, the compressed bytes for all of the x-coordinates in the point block are written to the file, followed by the compressed bytes for all of the y-coordinates, and so on. This differs from the LAS structure, in which all of the data associated with a single point are stored contiguously, point by point. The data type used for each point field matches that of the LAS specification; for example, x-coordinate data are represented as i32s, intensity data as u16s, etc. The reader should refer to the data types listed in Table 3, or to the LAS specification, for more detail.
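The switch from LAS's point-by-point layout to zLidar's field-by-field layout is essentially a transpose. The sketch below turns a list of point tuples into one packed byte string per field, ready for compression; it covers only a few DataCodes, ignores any pre-processing applied to particular fields, and assumes little-endian byte order. `FIELD_FORMATS` and `to_field_columns` are hypothetical names.

```python
import struct

# struct codes for a few Table 3 DataCodes: x, y, z (i32), intensity (u16), user data (u8).
FIELD_FORMATS = {0: "i", 1: "i", 2: "i", 3: "H", 7: "B"}

def to_field_columns(points, codes):
    """Transpose point-by-point records into one byte string per field.

    `points` is a list of tuples whose i-th element belongs to field codes[i].
    """
    columns = {}
    for i, code in enumerate(codes):
        values = [p[i] for p in points]
        # e.g. "<2H" packs two little-endian u16 values back to back.
        columns[code] = struct.pack(f"<{len(values)}{FIELD_FORMATS[code]}", *values)
    return columns

points = [(100, 200, 5, 812), (101, 203, 6, 790)]
cols = to_field_columns(points, [0, 1, 2, 3])
```

Storing like values together tends to compress better than interleaved records, since neighbouring bytes within a field are often similar.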
While most point fields are stored as the DEFLATE-compressed byte representation of the raw data, certain fields are pre-processed before compression to achieve greater compression rates. Specifically, the x- and y-coordinate data are first converted to their i32 representations (using the offset and scale factors contained within the LAS header), and then each value in the block is differenced against the previous value. It is these x and y difference values that are compressed and stored in the zLidar file. (Notice that the first point within the point block is left undifferenced, since it has no preceding value.) Scan angle and GPS time data are also differenced before compression is applied.
The z-coordinate data are compressed using a similar scheme (i.e. converting to i32 values using the offset and scale factors and then differencing), except that each value is differenced not from the previous point in the block, but from the previous point of the same broad return class. Two classes are distinguished: a late-return class (i.e. last and single returns) and an early-return class (i.e. first and intermediate returns). This adds the complication that a zLidar decoder must read the point return data before the z-coordinate data.
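The pre-processing described above can be sketched as follows. Several details are assumptions rather than statements from this specification: differences are taken as value minus predecessor, the first point of each return class is stored undifferenced, and the return byte follows the LAS 1.0 to 1.3 layout (bits 0-2 return number, bits 3-5 number of returns). All function names are hypothetical.

```python
def to_i32(values, scale, offset):
    """Convert real coordinates to LAS integer form: nearest integer of (v - offset) / scale."""
    return [round((v - offset) / scale) for v in values]

def delta_encode(values):
    """Keep the first value; replace every later value with value - previous (assumed sign)."""
    return values[:1] + [b - a for a, b in zip(values, values[1:])]

def delta_decode(deltas):
    out = []
    for d in deltas:
        out.append(out[-1] + d if out else d)
    return out

def is_late_return(return_byte: int) -> bool:
    """Last or single return, assuming the LAS 1.0-1.3 return byte layout."""
    return_number = return_byte & 0x07
    number_of_returns = (return_byte >> 3) & 0x07
    return return_number == number_of_returns

def delta_encode_z(z_values, return_bytes):
    """Difference each z from the previous z of the same broad return class."""
    previous = {}  # True = late class, False = early class -> last z seen
    out = []
    for z, rb in zip(z_values, return_bytes):
        cls = is_late_return(rb)
        out.append(z - previous[cls] if cls in previous else z)
        previous[cls] = z
    return out

def delta_decode_z(deltas, return_bytes):
    """Inverse of delta_encode_z; needs the return bytes, hence they are read first."""
    previous = {}
    out = []
    for d, rb in zip(deltas, return_bytes):
        cls = is_late_return(rb)
        z = previous[cls] + d if cls in previous else d
        out.append(z)
        previous[cls] = z
    return out
```

The same `delta_encode` helper would apply to the scan angle and GPS time fields. The z decoder makes the ordering constraint explicit: it cannot reconstruct any z value without the matching return byte.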
3.4 Auxiliary Index File
The index file is optional and allows point data to be retrieved more efficiently, without reading the entire zLidar data file. In this respect, the zLidar index file serves a similar function to the *.shx file for Shapefiles. zLidar encoders/decoders are not required to read or write these files; that is, a zLidar encoder/decoder should be able to read zLidar-formatted data without an accompanying index file. When provided, these sidecar files are stored using the *.zlidarx file extension. Thus, a zLidar file named 1km174180469202017LLAKEERIE.zlidar may have an accompanying 1km174180469202017LLAKEERIE.zlidarx index file.
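Locating the optional sidecar is a matter of swapping the extension and falling back to a sequential read of the point blocks when no index exists. A minimal sketch using the standard `pathlib` module; `index_path_for` and `find_index` are hypothetical names.

```python
from pathlib import Path

def index_path_for(zlidar_path) -> Path:
    """Return the sidecar index path, e.g. tile.zlidar -> tile.zlidarx."""
    return Path(zlidar_path).with_suffix(".zlidarx")

def find_index(zlidar_path):
    """Return the index path if the sidecar exists, else None.

    A caller receiving None must still be able to read the zLidar file
    by walking its point blocks in sequence.
    """
    candidate = index_path_for(zlidar_path)
    return candidate if candidate.exists() else None
```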
The zLidar index file begins with the same LAS file header and VLR data contained in the parent zLidar file. These are followed by a header for each point block in the parent zLidar file, structured as follows:
Table 4: The point block header structure of a zLidar index file.
Description | Data Type | Bytes |
---|---|---|
NumberOfFields | u8 | 1 |
CompressionMethod | u8 | 1 |
MajorVersion | u8 | 1 |
MinorVersion | u8 | 1 |
StartingPoint | u64 | 8 |
NumberOfPoints | u64 | 8 |
FieldDescriptorValues | FieldDescriptor | 20 * NumberOfFields |
Thus, the index file point block headers follow the same general format as the block headers in the parent zLidar file, with the addition of two fields:
- StartingPoint: This value is the index of the first point contained within the point block. Index values are zero-based; thus, in a file where a constant block size of 50 000 points is used, the first block header would have a StartingPoint of 0, the second a StartingPoint of 50 000, and so on.
- NumberOfPoints: This value is the number of points contained within the point block.
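With StartingPoint available for every block, finding the block that holds a given point is a binary search over the sorted StartingPoint values. The sketch below parses one index block header (4 single-byte fields, two u64 values, then the FieldDescriptors) and performs that lookup. It assumes little-endian byte order, fields packed in Table 4 order with no padding, and block headers stored back to back; `read_index_block_header` and `block_for_point` are hypothetical names.

```python
import bisect
import struct

def read_index_block_header(buf: bytes, pos: int):
    """Parse one index-file block header (Table 4) at `pos`.

    Returns (starting_point, number_of_points, descriptors, next_pos), where
    descriptors are (data_code, file_offset, byte_length) tuples.
    """
    num_fields, method, major, minor = struct.unpack_from("<4B", buf, pos)
    starting_point, number_of_points = struct.unpack_from("<QQ", buf, pos + 4)
    pos += 20  # 4 x u8 + 2 x u64
    descriptors = []
    for _ in range(num_fields):
        descriptors.append(struct.unpack_from("<IQQ", buf, pos))
        pos += 20
    return starting_point, number_of_points, descriptors, pos

def block_for_point(starting_points, point_index):
    """Index of the block containing `point_index`, given sorted StartingPoint values.

    The caller should still check point_index < StartingPoint + NumberOfPoints
    for the returned block.
    """
    i = bisect.bisect_right(starting_points, point_index) - 1
    if i < 0:
        raise IndexError(point_index)
    return i
```

With a constant block size of 50 000 points, point 49 999 falls in the first block and point 50 000 in the second.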