zLidar File Specification Version 1.0 (DRAFT)
John Lindsay, PhD
Geomorphometry and Hydrogeomatics Research Group
The University of Guelph
June 8, 2020
1. What is a zLidar file?
The zLidar file format (stored in *.zlidar files) is intended to store the data output from LiDAR scanner systems. It adds a lossless compression scheme to the widely used LAS file format. The ASPRS-developed LAS file format is well suited to storing LiDAR data and has a well-maintained open specification. It is also very well supported by existing geomatics software and many LiDAR hardware manufacturers have adopted the LAS specification. However, LAS files are often extremely large because of their verbose nature. This can make data handling for large LiDAR projects challenging or even unmanageable. The zLidar format is intended to ease this situation by adding a lossless data compression scheme to LAS data.
2. The Case For The zLidar Format
Some readers may ask why the zLidar format is needed when the LiDAR community already has other compressed file formats, such as zLAS and LAZ. Importantly, neither of these compressed LiDAR file formats has an open specification of the type that exists for LAS, e57, Shapefile, GeoTIFF, or other standard data formats used in geomatics.
The zLAS format (*.zlas), sometimes called the Esri optimized LAS file, is a proprietary format developed by Esri. While Esri maintains an open-source library (licensed under the Apache License, Version 2.0) that third-party developers can use to add support for reading and writing the format in their own software (Weitz, 2015), the zLAS format itself is closed-source. Furthermore, the zLAS encoding/decoding library is only available for the Microsoft Windows operating system and is therefore not a cross-platform solution.
The LAZ file format is a compelling file format for compressed LiDAR data and has been widely adopted by the geomatics community. It provides a LiDAR-specific compression scheme that often achieves impressive compression ratios; LAZ files are significantly smaller than the corresponding LAS files. LAZ is the LiDAR data format used within the popular LAStools software, developed by rapidlasso. As with Esri's zLAS format, an open-source LAZ encoding/decoding library (LASzip) is available and has been widely adopted by geomatics software applications, including many open-source GIS packages. Unlike the zLAS format, however, it is theoretically possible to determine how LAZ files are formatted by interrogating this library's source code.
The general structure of the LAZ file format is partially described in a short paper (Isenburg, 2013); however, this paper is evidently out of date with respect to the current LAZ scheme. For example, Isenburg (2013) refers to several future changes to the data format that may or may not have been adopted. Furthermore, given its brevity, the paper does not provide sufficient detail for a reader to develop an independent LAZ encoder/decoder. In 2017, an effort was made to create a file specification for the LAZ file format, and an 'in progress' draft document does exist. However, the effort to complete the specification appears to have been abandoned, as the draft has not been updated since it was created in 2017. Without a completed file specification, the LAZ file format is effectively defined by an open-source reference implementation rather than by an open specification. That is, unlike the LAS and e57 data formats, there is currently no document other than the source code that completely describes the internal structure of a LAZ file.
Interrogating the source code to determine the file structure is also very challenging, given the sophisticated nature of the data format and the scale of the codebase. The complexity of the LAZ format reflects its primary objective, which is to achieve the highest possible lossless compression ratio for LiDAR data, and the format has done a commendable job of this. While challenging, it is not impossible to create an independent LAZ encoder/decoder, as evidenced by the laz-perf project. However, efforts to create an independent encoder/decoder for the format will always be confounded by the possibility of ongoing changes to the reference implementation, LASzip. This makes maintaining alternative LAZ encoders/decoders very challenging and creates opportunities for fragmentation.
Using a reference implementation to define a file format carries with it the potential for data incompatibility, a situation in which the main reader/writer software no longer supports some legacy data after the implementation adopts backwards-incompatible changes. That is, if the maintainers of the LAZ format alter the file structure (or rather, the encoder/decoder implementation) in a backwards-incompatible manner, all existing data stored using the previous format would become incompatible with the main reader/writer of the format. While this is unlikely to occur, it is a general vulnerability associated with the use of a reference implementation rather than an open specification. Software implementations are constantly evolving, and therefore so too are implementation-defined file formats.
An open specification for a data format, by contrast, serves as a contract with the user and developer communities regarding the long-term stability and viability of a particular data format. With data formats defined by open specifications, developers must ensure that their third-party readers/writers are compatible with the specification, which itself follows strict versioning rules.
With the above rationale established, we present the zLidar data format in this document. The primary objectives of the zLidar data format are:
- To provide a simple and flexible means for storing LiDAR data in a losslessly compressed format that significantly reduces data storage needs compared with LAS data.
- To provide a compressed LiDAR data format that is defined by an open specification that allows for development of multiple independent encoder/decoder libraries that are accessible to the user and developer communities.
The main benefit of this approach is viable long-term storage of valuable LiDAR data resources in a manner that is less vulnerable to the ephemeral nature of individual software projects and their developers.
Isenburg, M., 2013. LASzip: lossless compression of LiDAR data. Photogrammetric Engineering and Remote Sensing, 79(2), pp. 209-217.
Weitz, L. (Esri), 2015. Esri Optimized LAS (zLAS) I/O Runtime Library is Now Available. Online resource, viewed 08/06/2020.
3. The zLidar File Structure
3.1 Data Types
The following data types are used to store information contained within a zLidar file. Note that these data types conform to the 1999 ANSI C language specification.
u8 = unsigned 8-bit integer (byte)
u16 = unsigned 16-bit integer (short)
u32 = unsigned 32-bit integer (long)
u64 = unsigned 64-bit integer (long long)
i16 = signed 16-bit integer (signed short)
i32 = signed 32-bit integer (signed long)
f64 = 64-bit floating-point (double)
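In Python, these data types correspond directly to `struct` format codes. The sketch below assumes little-endian byte order, as used by LAS; the byte order is not stated explicitly in this section. The helper names are hypothetical:

```python
import struct

# Python struct codes for the zLidar data types listed above.
# The "<" prefix selects little-endian byte order with standard sizes;
# little-endian is assumed here because LAS uses it.
ZLIDAR_TYPES = {
    "u8": "<B",
    "u16": "<H",
    "u32": "<I",
    "u64": "<Q",
    "i16": "<h",
    "i32": "<i",
    "f64": "<d",
}


def type_size(name: str) -> int:
    """Return the size in bytes of a zLidar data type."""
    return struct.calcsize(ZLIDAR_TYPES[name])


def unpack_one(name: str, buf: bytes, offset: int = 0):
    """Read a single value of the given zLidar type from buf at offset."""
    return struct.unpack_from(ZLIDAR_TYPES[name], buf, offset)[0]


if __name__ == "__main__":
    for name in ZLIDAR_TYPES:
        print(name, type_size(name))
```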
3.2 File Header and Variable Length Records (VLRs)
The file header of a zLidar file is exactly the same as that found in a LAS file, with the one notable exception that the File Signature field is changed from “LASF” to “ZLDR” in the zLidar file header. Variable length records (VLRs) are stored in the same way as described in the LAS specification. Note that the zLidar format does not specify which LAS version the data are derived from; therefore, a zLidar encoder/decoder should be able to handle LAS v1.1, v1.2, v1.3, and v1.4 formatted header data.
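Because the header is a LAS header with a different signature, a reader can identify a zLidar file by checking its first four bytes. The Python sketch below decodes a few public-header fields using the byte offsets defined in the LAS specification (version at bytes 24 and 25, header size at 94, offset to point data at 96, number of VLRs at 100, point data format at 104, scale factors at 131 and offsets at 155). The function name and the choice of fields are illustrative; VLR parsing is omitted.

```python
import struct

ZLIDAR_SIGNATURE = b"ZLDR"


def read_zlidar_header(buf: bytes) -> dict:
    """Parse selected fields of a zLidar file header.

    The header is a LAS public header block whose File Signature is "ZLDR"
    instead of "LASF". Byte offsets are those of the LAS public header
    (shared by LAS 1.1-1.4); only a few fields are decoded here.
    """
    if len(buf) < 227:
        raise ValueError("buffer is shorter than a LAS public header block")
    signature = bytes(buf[0:4])
    if signature != ZLIDAR_SIGNATURE:
        raise ValueError(f"not a zLidar file: signature {signature!r}")
    header_size, offset_to_points, num_vlrs = struct.unpack_from("<HII", buf, 94)
    point_format, record_length = struct.unpack_from("<BH", buf, 104)
    scale = struct.unpack_from("<3d", buf, 131)   # x, y, z scale factors
    offset = struct.unpack_from("<3d", buf, 155)  # x, y, z offsets
    return {
        "version": (buf[24], buf[25]),
        "header_size": header_size,
        "offset_to_points": offset_to_points,
        "num_vlrs": num_vlrs,
        "point_format": point_format,
        "point_record_length": record_length,
        "scale": scale,
        "offset": offset,
    }
```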
3.3 Point Record Data
Some point data formats require points to be sorted by geographic criteria (i.e., spatial indexing), but this is not the case for zLidar files. There are no requirements for the ordering of points in a zLidar file, although it is common to preserve the point order of the source LAS file. The start of the point data must be aligned to a 32-bit boundary (i.e., a 4-byte word), and padding with zero bytes may be used to meet this condition when the end of the previous data section (e.g., the VLRs) would not naturally fall on a word boundary.
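The alignment rule amounts to rounding an offset up to the next multiple of four bytes. A minimal sketch (the function names are illustrative, not part of the specification):

```python
def align_to_word(offset: int, word: int = 4) -> tuple[int, int]:
    """Round offset up to the next multiple of `word` bytes.

    Returns (aligned_offset, padding), where padding is the number of
    zero bytes to write so that the next section starts on the boundary.
    """
    padding = (-offset) % word
    return offset + padding, padding


def zero_padding(offset: int, word: int = 4) -> bytes:
    """Zero bytes to write after a section that ends at `offset`."""
    return bytes(align_to_word(offset, word)[1])


# Example: a 227-byte LAS 1.2-style header with no VLRs ends at byte 227,
# so one zero byte is written and the point data start at byte 228.
print(align_to_word(227))  # (228, 1)
```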
Point data are stored in blocks of points, and the structure of the point data within these blocks differs substantially from that of a LAS file. A zLidar encoder/decoder is not required to use a specific block size; in fact, each block may contain a different number of points. The use of point blocks, combined with the auxiliary index file, allows more localized access to point data, i.e., it limits the need to read all of the points in a file for applications that require only a subset of points. The WhiteboxTools zLidar encoder/decoder currently defaults to block sizes of 50 000 points, except for the last block, which may have fewer points. Each point block begins with a header, structured as follows:
Table 1: The point block header structure of a zLidar file.
|FieldDescriptorValues||FieldDescriptor||20 * NumberOfFields|
The offset to point data stored in the file header (i.e., the LAS header) should point to the start of the first point block header.
NumberOfFields: This value indicates how many point fields are used to describe point features. It is determined by the LAS point data record format. For example, LAS point format 0 data usually contain 9 point fields: x, y, z, intensity, the return data byte (containing the return number, number of returns, scan direction bit flag, and edge of flight line bit flag), the classification byte, the scan angle, the user data, and the point source ID. Note that intensity and user data are optional fields in the LAS specification, so there may be fewer than 9 fields. Other LAS point record formats have greater numbers of fields (e.g., GPS time data, RGB colour data, etc.). The reader is referred to the LAS specification for a mapping of mandatory and optional point fields onto point record formats.
CompressionMethod: At the moment only the DEFLATE compression method is supported and this field is set to 0. Future versions of the specification may allow for alternative compression methods (e.g. LZW), in which case other non-zero values could be used in this field.
MajorVersion and MinorVersion: A zLidar file that meets the current specification must set these values to 1 and 0 respectively.
FieldDescriptorValues: A point block header ends with a listing of FieldDescriptors, one for each point field (of which there are NumberOfFields fields) contained in the point block. The FieldDescriptor data structure is organized as follows:
Table 2: The FieldDescriptor data structure of a zLidar file.
- DataCode: This value documents the kind of data contained within a point field. This numeric code maps values onto point fields based on the following scheme:
Table 3: Interpretation of the DataCode field used in a zLidar file.
|4||point return bit-field||u8|
|8||point source ID||u16|
|9||GPS time data||f64|
FileOffset: This value is the zero-based byte offset, measured from the beginning of the file, to the start of the point field data (it therefore includes the length of the header and any preceding VLRs). Offset values should be aligned with 32-bit boundaries; if the length of the previous point data field does not naturally allow this, zero-byte padding may be used to ensure that the starting byte of a field meets the alignment requirement.
ByteLength: This value contains the compressed byte length of the point field data. Importantly, the length must exclude any zero-byte padding used to align point field data to 32-bit boundaries.
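The FileOffset and ByteLength rules can be expressed as a small layout routine: each field's offset is rounded up to a 32-bit boundary, zero padding fills the gap, and the recorded length excludes that padding. Because the fixed-size portion of the block header (Table 1) is not reproduced here, the sketch takes the total header length as a parameter; the function names are hypothetical.

```python
import zlib


def align4(offset: int) -> int:
    """Round offset up to the next 32-bit (4-byte) boundary."""
    return (offset + 3) & ~3


def layout_point_fields(block_start: int, header_len: int, compressed_fields: list[bytes]):
    """Assign FileOffset and ByteLength values to the compressed fields of one block.

    block_start: absolute file offset of the point block header.
    header_len:  total byte length of that header, including its
                 20 * NumberOfFields bytes of FieldDescriptors.
    Returns (descriptors, body): descriptors is a list of (FileOffset, ByteLength)
    pairs in field order; body holds the bytes that follow the header, with
    zero padding inserted wherever a field would otherwise start off-boundary.
    """
    descriptors = []
    body = bytearray()
    cursor = block_start + header_len  # absolute offset of the next unwritten byte
    for data in compressed_fields:
        start = align4(cursor)
        body += bytes(start - cursor)           # padding, excluded from ByteLength
        descriptors.append((start, len(data)))  # ByteLength is the unpadded length
        body += data
        cursor = start + len(data)
    return descriptors, bytes(body)


if __name__ == "__main__":
    fields = [zlib.compress(bytes(200)), zlib.compress(bytes(range(256)))]
    # The header length used here is purely illustrative.
    print(layout_point_fields(228, 44, fields)[0])
```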
Notice that there is no information contained within the header to describe the number of points in the point block. A zLidar decoder can obtain this information either from the number of data values that the decompression algorithm yields or from the auxiliary index file, if present. A zLidar decoder should perform checks to ensure that the number of values is the same for each point field, i.e. that there is a consistent block size of point field values within a point block.
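A decoder can recover the block size, and perform the consistency check described above, by dividing each decompressed field's byte length by the size of its data type. The sketch below knows only the three DataCodes reproduced in Table 3 and assumes that pre-differenced fields keep the element size of their data type; the function name is hypothetical. The specification names DEFLATE, but this section does not say whether each field is a zlib-wrapped stream (RFC 1950) or a raw DEFLATE stream (RFC 1951), so the wrapper is exposed through Python's `wbits` parameter.

```python
import zlib

# Bytes per value for the DataCodes shown in Table 3. Codes omitted from
# that table would need their sizes added from the full specification.
ELEMENT_SIZE = {
    4: 1,  # point return bit-field (u8)
    8: 2,  # point source ID (u16)
    9: 8,  # GPS time data (f64)
}


def decode_block_fields(fields, wbits=zlib.MAX_WBITS):
    """Decompress the fields of one point block and infer its point count.

    fields: iterable of (data_code, compressed_bytes) pairs.
    wbits:  zlib.MAX_WBITS expects zlib-wrapped DEFLATE streams; pass
            -zlib.MAX_WBITS instead if the streams are raw DEFLATE.
    Returns (number_of_points, {data_code: raw_bytes}).
    Raises ValueError if the fields disagree on the number of values.
    """
    counts = {}
    decoded = {}
    for code, compressed in fields:
        raw = zlib.decompress(compressed, wbits=wbits)
        size = ELEMENT_SIZE[code]
        if len(raw) % size:
            raise ValueError(f"field {code}: {len(raw)} bytes is not a multiple of {size}")
        counts[code] = len(raw) // size
        decoded[code] = raw
    if len(set(counts.values())) != 1:
        raise ValueError(f"inconsistent field lengths in block: {counts}")
    return next(iter(counts.values())), decoded
```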
Individual compressed point fields are stored sequentially after the point block header. There is no requirement that point fields appear in any particular order (e.g., the x-coordinate field need not come first); however, the fields must be stored in the same order in which they are listed in the header's FieldDescriptorValues.
For most of the field data types (Table 3), the data are simply compressed using the appropriate compression method (e.g., DEFLATE) and then saved to the file at the location given by the corresponding offset value. For example, the compressed bytes for all of the x-coordinates in the point block are saved to the file, followed by the compressed bytes for all of the y-coordinates, and so on. This differs from the LAS structure, in which all of the data associated with a single point are stored together, point by point. The data type used for each point field matches that of the LAS specification; for example, x-coordinate data are represented as i32s, intensity data as u16s, etc. The reader should refer to the data types listed in Table 3 or to the LAS specification for more detail.
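The column-oriented layout can be illustrated by transposing point-by-point records into one byte column per field and compressing each column separately. In this sketch the dict-based point records, the field names, little-endian packing, and the use of zlib-wrapped DEFLATE are all assumptions made for illustration:

```python
import struct
import zlib


def pack_columns(points, fields):
    """Transpose point-by-point records into one contiguous byte column per field.

    points: sequence of dicts, one per point (the LAS-style record layout).
    fields: sequence of (name, struct_code) pairs, e.g. ("x", "i") for i32
            or ("intensity", "H") for u16. Field names are illustrative.
    Values are packed little-endian (an assumption carried over from LAS).
    """
    columns = {}
    for name, code in fields:
        values = [p[name] for p in points]
        columns[name] = struct.pack(f"<{len(values)}{code}", *values)
    return columns


def compress_columns(columns, level=6):
    """DEFLATE each column independently (zlib wrapper assumed)."""
    return {name: zlib.compress(data, level) for name, data in columns.items()}


if __name__ == "__main__":
    pts = [{"x": 1000, "intensity": 40}, {"x": 1003, "intensity": 42}]
    cols = pack_columns(pts, [("x", "i"), ("intensity", "H")])
    print({k: len(v) for k, v in compress_columns(cols).items()})
```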
While most of the point fields are simply the DEFLATE-compressed byte representation of the raw data, certain fields are pre-processed before compression to allow for greater compression rates. Specifically, the x- and y-coordinate data are first converted to their i32 representations (using the offset and scale factors contained within the LAS header), and then each point value in the block sequence is subtracted from the previous value. It is these x and y point difference values that are then compressed and stored in the zLidar file. (Note that the first point within the point block is left undifferenced, since there is no preceding value.) Scan angle and GPS time data are also differenced before compression is applied. The z-coordinate data are compressed using a scheme similar to that for the x- and y-coordinate data (i.e., conversion to i32 values using the offset and scale values, followed by point differencing), except that each value is differenced not from the previous point in the block, but from the previous point of the same broad return class. Two classes are distinguished: a late-return class (i.e., last and single returns) and an early-return class (i.e., first and intermediate returns). This adds the extra complexity that a zLidar decoder must read the point return data before the point z-coordinate data.
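The differencing scheme can be sketched in Python as follows. This is an illustration under stated assumptions, not the WhiteboxTools implementation: each stored difference is taken to be the current value minus the previous one (so decoding is a running sum), the first point of each return class in a block is assumed to be stored undifferenced just like the first x and y values, and the return byte is assumed to use the LAS point formats 0-5 layout (return number in bits 0-2, number of returns in bits 3-5; formats 6-10 use 4-bit fields instead). Python integers do not overflow, so the sketch sidesteps how a fixed-width i32 implementation handles differences outside the i32 range. All function names are hypothetical.

```python
from itertools import accumulate


def to_i32(values, scale, offset):
    """LAS-style integer encoding: X_int = round((X - offset) / scale)."""
    return [round((v - offset) / scale) for v in values]


def delta_encode(ints):
    """Keep the first value; store every later value minus its predecessor."""
    return [v if i == 0 else v - ints[i - 1] for i, v in enumerate(ints)]


def delta_decode(deltas):
    """Invert delta_encode with a running sum."""
    return list(accumulate(deltas))


def is_late_return(return_byte):
    """True for last and single returns (return number == number of returns).

    Assumes the LAS point formats 0-5 bit layout: bits 0-2 hold the return
    number and bits 3-5 hold the number of returns.
    """
    return (return_byte & 0b111) == ((return_byte >> 3) & 0b111)


def z_delta_encode(z_ints, return_bytes):
    """Difference z against the previous point of the same return class.

    The first point of each class in the block is stored undifferenced.
    """
    previous = {}  # return class -> last z value seen in that class
    out = []
    for z, rb in zip(z_ints, return_bytes):
        cls = is_late_return(rb)
        out.append(z - previous[cls] if cls in previous else z)
        previous[cls] = z
    return out


def z_delta_decode(deltas, return_bytes):
    """Invert z_delta_encode; the return bytes must already be decoded."""
    previous = {}
    out = []
    for d, rb in zip(deltas, return_bytes):
        cls = is_late_return(rb)
        z = d + previous[cls] if cls in previous else d
        out.append(z)
        previous[cls] = z
    return out
```

For example, with return bytes `[17, 18, 17, 9]` (first of two, last of two, first of two, single) the early-return points are differenced only against each other, as are the late-return points.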
3.4 Auxiliary Index File
The index file is optional and allows more efficient retrieval of point data without the need to read the entire zLidar data file. In this respect, the zLidar index file serves a similar function to the *.shx file that accompanies a Shapefile. zLidar encoders/decoders are not required to read or write these files; that is, zLidar encoders/decoders should be able to read zLidar-formatted data without an accompanying index file. When provided, these sidecar files are stored using the *.zlidarx file extension. Thus, a zLidar file named 1km174180469202017LLAKEERIE.zlidar may have an accompanying 1km174180469202017LLAKEERIE.zlidarx index file.
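Locating the optional sidecar is a matter of swapping the file extension. This sketch assumes the lowercase `.zlidarx` extension used in the example file name above (on case-sensitive file systems the case matters); the helper names are hypothetical:

```python
from pathlib import Path


def index_path_for(zlidar_path):
    """Expected sidecar index path: same name, extension replaced by .zlidarx."""
    return Path(zlidar_path).with_suffix(".zlidarx")


def find_index(zlidar_path):
    """Return the index path if the sidecar exists, else None.

    Index files are optional, so a decoder should fall back to reading the
    point blocks sequentially when this returns None.
    """
    candidate = index_path_for(zlidar_path)
    return candidate if candidate.is_file() else None
```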
The zLidar index file begins with the same LAS file header and VLR data contained within the parent zLidar file. After these, the zLidar index file contains a data header for each of the point blocks contained within the parent zLidar file. This header is structured as follows:
Table 4: The point block header structure of a zLidar index file.
|FieldDescriptorValues||FieldDescriptor||20 * NumberOfFields|
Thus, the index file point block headers take the same general format as the block headers in the parent zLidar file, with the addition of two fields:
StartingPoint: This value is the index of the first point contained within the point block. Index values are zero-based, where index 0 refers to the first point in the file; thus, in a file where a constant block size of 50 000 is used, the first block header would have a StartingPoint of 0, the second of 50 000, etc.
NumberOfPoints: This value is the number of points contained within the point block.
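The StartingPoint values follow directly from the block sizes, and because they are sorted, an index file allows the block that holds any given point to be found by binary search. A sketch with hypothetical function names; the 50 000 default mirrors the WhiteboxTools default mentioned in Section 3.3, and the lookup also works for variable block sizes:

```python
from bisect import bisect_right


def block_ranges(num_points, block_size=50_000):
    """(StartingPoint, NumberOfPoints) pairs for fixed-size blocks.

    Every block holds block_size points except possibly the last.
    """
    return [
        (start, min(block_size, num_points - start))
        for start in range(0, num_points, block_size)
    ]


def block_for_point(ranges, point_index):
    """Index of the block containing a zero-based point index.

    ranges: (StartingPoint, NumberOfPoints) pairs sorted by StartingPoint,
    as they might be read from an index file.
    """
    starts = [start for start, _ in ranges]
    i = bisect_right(starts, point_index) - 1
    if i < 0 or point_index >= starts[i] + ranges[i][1]:
        raise IndexError(f"point {point_index} is not in any block")
    return i
```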
Appendix A: Compression Rates and File Size
zLidar files are typically about 18% of the size of the equivalent LAS files and about 52% larger than the equivalent LAZ files. The following table presents a comparison of zLidar file size with LAS and LAZ files for a number of multi-tile data sets of varying sizes.
Table A.1. Comparison of LiDAR data format file sizes in gigabytes for sample data sets.
|Dataset||Tiles||Points||LAS (GB)||LAZ (GB)||zLidar (GB)|
|Rondeau Bay 2012||586||1.141 × 10^9||31.95||3.34||5.08|
|Rondeau Bay 2018||953||1.014 × 10^10||304.19||29.88||44.16|
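The per-dataset ratios for the rows of Table A.1 can be computed directly from the listed sizes:

```python
# File sizes in GB copied from Table A.1: (LAS, LAZ, zLidar).
TABLE_A1 = {
    "Rondeau Bay 2012": (31.95, 3.34, 5.08),
    "Rondeau Bay 2018": (304.19, 29.88, 44.16),
}


def size_ratios(las_gb, laz_gb, zlidar_gb):
    """Return (zLidar as a fraction of LAS, zLidar size relative to LAZ)."""
    return zlidar_gb / las_gb, zlidar_gb / laz_gb


for name, sizes in TABLE_A1.items():
    vs_las, vs_laz = size_ratios(*sizes)
    print(f"{name}: zLidar/LAS = {vs_las:.1%}, zLidar/LAZ = {vs_laz:.2f}")
# Rondeau Bay 2012: zLidar/LAS = 15.9%, zLidar/LAZ = 1.52
# Rondeau Bay 2018: zLidar/LAS = 14.5%, zLidar/LAZ = 1.48
```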