Known Issues and Limitations
- The LAZ and zLAS (Esri) compressed LiDAR data formats are currently unsupported, although zLidar and zipped LAS files (.zip) are supported.
- There is no support for reading waveform data contained within or associated with LAS files.
- Directory names in file paths cannot contain apostrophes (', e.g. /John's data/), as they will be interpreted in the arguments array as single-quoted strings.
- The Python scripts included with WhiteboxTools require Python 3. They will not work with Python 2, which is frequently the default Python version installed on many systems.
- Not all types of GeoTIFF files are supported. The GeoTIFF format is very flexible and certain less common varieties may not be read properly by WhiteboxTools. The WhiteboxTools GeoTIFF reader/writer is a custom implementation that does not rely on any other library (e.g. GDAL); therefore, there may be difficulties when exchanging GeoTIFF data between WhiteboxTools and GDAL-supported software.
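The apostrophe limitation listed above can be caught before a tool is run. The following is a minimal Python 3 sketch and is not part of WhiteboxTools; the function name find_apostrophe_parts is hypothetical and the example paths are made up for illustration.

```python
from pathlib import PurePath


def find_apostrophe_parts(path):
    """Return the components of `path` that contain an apostrophe.

    Hypothetical helper: WhiteboxTools does not provide this function.
    An empty list means the path has no apostrophes in any component.
    """
    return [part for part in PurePath(path).parts if "'" in part]


# Example paths (assumed for illustration only).
bad_path = "/John's data/dem.tif"
good_path = "/data/dem.tif"

offending = find_apostrophe_parts(bad_path)
if offending:
    print("Rename these path components before running a tool:", offending)
```

A check like this only flags the problem; the directory still has to be renamed or the data moved before the path is passed to a tool.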
Memory Requirements
People sometimes ask me what the memory requirements of WhiteboxTools are. There is no simple answer to this question because each tool has different memory requirements, which depend on a number of factors. The tools that read and write raster data have among the largest memory requirements, although the LiDAR tools also have substantial memory footprints. When you are working with raster data, say a digital elevation model (DEM) data set, a good rule of thumb is that your system will need memory of at least 4X the size of your data set. You will likely want more than this minimum; 5-10X is an appropriate range.
However, there are some important factors to consider when trying to calculate the memory that you'll need to process a data set. First, the size of your raster file on disc does not tell you how large it will be when read into memory. Raster files are commonly compressed (e.g. GeoTIFFs are usually compressed) and the actual uncompressed size of the file may be considerably larger. The number of grid cells (rows x columns) and the data type (bit depth) are better indicators of memory size than the file size is. Secondly, when WhiteboxTools reads your file, it will convert the grid cell values into 64-bit floating point values, while it is quite likely that the data set contains data in a smaller data type. For example, it is common for DEM data to be stored as 32-bit floats, meaning that the in-memory footprint of your DEM will be doubled.
Why does WhiteboxTools do this? When WhiteboxTools reads your raster file, it converts it into a generic raster format that is then fed to the tool that you are running. The tool likely needs to read individual grid cell values. For example, it may scan an input raster DEM cell-by-cell and place the cell values into a variable (z = dem.get_value(row, col)) to then work with. The variable z needs to have a data type assigned to it at compile time, and the tool cannot know in advance what type of data will be contained within your file. And so z is assigned a data type that is large enough to hold (after conversion) any data type that the user may throw at it, which is a 64-bit float in most cases. This approach greatly simplifies the code for tools compared with the alternative, in which each possible input data type is handled in a separate code path. The main implication of this approach is that whatever the uncompressed data size of your input raster file is, it will likely be doubled (if it is 32-bit data on disc), or even quadrupled (if it is 16-bit data on disc), when it is read into memory.
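The arithmetic described above can be sketched in a few lines of Python. This is an illustrative estimate only, not code from WhiteboxTools; the function names and the 10,000 x 10,000 example grid are assumptions made for the example.

```python
def in_memory_bytes(rows, cols, bytes_per_cell=8):
    """Estimate the in-memory size of one raster.

    Assumes every grid cell is held as a 64-bit (8-byte) float once the
    raster has been read, as described in the text above.
    """
    return rows * cols * bytes_per_cell


def expansion_factor(bits_on_disc, bits_in_memory=64):
    """Ratio of in-memory size to uncompressed on-disc size."""
    return bits_in_memory / bits_on_disc


# Hypothetical DEM: 10,000 rows x 10,000 columns stored as 32-bit floats.
rows, cols = 10_000, 10_000
uncompressed_on_disc = rows * cols * 32 // 8   # 400,000,000 bytes
held_in_memory = in_memory_bytes(rows, cols)   # 800,000,000 bytes

print(expansion_factor(32))  # 2.0 (32-bit data doubles in memory)
print(expansion_factor(16))  # 4.0 (16-bit data quadruples in memory)
```

Note that the estimate starts from the number of grid cells rather than the file size, because a compressed file on disc says little about the uncompressed raster.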
Furthermore, in addition to the memory requirements of the input raster, most tools need to create an output raster, which exists in memory before being written to a file at the end of the operation. This is why it was stated above that the minimum memory requirement is 4X the size of the data set (8X if it is in 16-bit form on disc), i.e. a doubling for the data type issue and another doubling for the input/output rasters of the tool. Many tools, particularly those that perform complex tasks beyond simply scanning the data in a filter-type operation, will also have intermediate data that are often in the form of in-memory rasters or data structures used for spatial searches. This further increases the memory requirements of individual tools. I always try to develop tools so that they have as modest memory requirements as possible. However, in developing WhiteboxTools, I have made certain design decisions that favour speed over memory footprint. This is why I frequently get feedback from users that WhiteboxTools is impressively fast. The relation between performance speed and memory requirements is a balance, and Whitebox has certainly favoured speed. This was an intentional design decision in recognition that 1) system memory has increased significantly since 2017, when the project began, and continues to do so (systems with between 64 GB and 256 GB are now widely available), and RAM is becoming cheaper, and 2) users who are working with extremely massive data sets often have access to systems with large enough memory resources to handle the processing needs of WhiteboxTools when working with these types of data.
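The 4X and 8X minimums follow from two doublings, and a short Python sketch makes the reasoning explicit. It is a rough lower bound under stated assumptions (one input and one output raster, both held as 64-bit floats, no intermediate data), not a measurement of any particular tool; the function names are hypothetical.

```python
def minimum_multiplier(bits_on_disc, rasters_in_memory=2, bits_in_memory=64):
    """Minimum memory as a multiple of the uncompressed on-disc size.

    Assumes one input and one output raster (rasters_in_memory=2), each
    stored as 64-bit floats. Intermediate rasters and search structures
    used by more complex tools are not counted.
    """
    return rasters_in_memory * bits_in_memory / bits_on_disc


def minimum_memory_bytes(rows, cols, bits_on_disc):
    """Lower-bound memory estimate, in bytes, for a rows x cols raster."""
    uncompressed = rows * cols * bits_on_disc // 8
    return int(uncompressed * minimum_multiplier(bits_on_disc))


print(minimum_multiplier(32))  # 4.0
print(minimum_multiplier(16))  # 8.0

# Hypothetical 10,000 x 10,000 DEM: the lower bound is the same whether
# the file stores 32-bit or 16-bit values, because both are widened to
# 64 bits once in memory.
print(minimum_memory_bytes(10_000, 10_000, 32))  # 1600000000
print(minimum_memory_bytes(10_000, 10_000, 16))  # 1600000000
```

Because tools with intermediate data need more than this, the result is best read as a floor; the 5-10X range suggested earlier leaves headroom above it.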
WhiteboxTools' predecessor, Whitebox GAT, was developed using the Java programming language. Java is known to use memory very inefficiently, and I would regularly have users contact me with out-of-memory issues. This was one of the main motivations that I had in developing WhiteboxTools, and for choosing the Rust programming language to do so. Using Rust for this project has afforded me significant performance gains over Java, but it has also significantly lowered the memory requirements of equivalent tools. Rust uses memory more conservatively than Java, and this is the main reason why I rarely receive feedback from users who have encountered out-of-memory issues using WhiteboxTools. It certainly can happen, however, particularly if you are trying to process a large data set on a system with insufficient memory resources and without an understanding of the relationship between file size and system memory.