Tools for reading OMI level 2 data

This page contains some suggestions and links to get you started reading and using OMI level 2 data. All OMI level 2 data are stored in HDF-EOS 5 files. HDF-EOS 5 is an extension of standard HDF-5. It mainly standardises how metadata is stored in the files and where certain groups are located within a file. Standard HDF-5 tools can therefore be used to read the OMI level 2 files. Functions to read HDF-5 files are built into IDL and Matlab, as well as into a long list of other tools.
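HDF-EOS 5 fixes where swaths and their fields live inside the file. Because of this, the full internal path of any field can be built from just the swath name and the field name. The sketch below assumes the layout used by the Python example on this page: `/HDFEOS/SWATHS/<swath>/Data Fields/<field>` for data and `/HDFEOS/SWATHS/<swath>/Geolocation Fields/<field>` for geolocation. The helper `swath_field_path` is hypothetical and not part of any library.

```python
# Group names inside each HDF-EOS 5 swath, as used in the PyTables example on this page.
SWATH_GROUPS = ("Data Fields", "Geolocation Fields")


def swath_field_path(swath, field, group="Data Fields"):
    """Return the internal HDF-5 path of a field in an HDF-EOS 5 swath (hypothetical helper).

    Swath and field names are case-sensitive, so they must match the file
    specification exactly.
    """
    if group not in SWATH_GROUPS:
        raise ValueError("group must be one of %r" % (SWATH_GROUPS,))
    return "/HDFEOS/SWATHS/%s/%s/%s" % (swath, group, field)


print(swath_field_path("CloudFractionAndPressure", "CloudFraction"))
# -> /HDFEOS/SWATHS/CloudFractionAndPressure/Data Fields/CloudFraction
print(swath_field_path("CloudFractionAndPressure", "Latitude", "Geolocation Fields"))
# -> /HDFEOS/SWATHS/CloudFractionAndPressure/Geolocation Fields/Latitude
```

A path built this way can be handed to any generic HDF-5 reader, for example `h5dump -d` on the command line.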

When you install the HDF-5 library, available pre-compiled for most commonly used platforms and in source-form for all others, you will also get a set of tools to manipulate HDF-5 files from the command-line. The most important are:

  • h5import - Imports ASCII or binary data into HDF5.
  • h5diff - Compares two HDF5 files and reports the differences.
  • h5repack - Copies an HDF5 file to a new file with or without compression/chunking.
  • h5dump - Enables the user to examine the contents of an HDF5 file and dump those contents to an ASCII file.
  • h5ls - Lists selected information about file objects in the specified format.
A more complete list is available from the HDF Group web-site. The hdfview graphical browser for HDF files is highly recommended to investigate the structure of unknown HDF (4 or 5) files.

HDF-4, HDF-5, HDF-EOS, NetCDF, … help!

The decision to use HDF-EOS is a NASA-wide decision, followed by the OMI science team. Many other instruments in the A-train (MODIS, CloudSat) use the older HDF-4, often referred to simply as “HDF”. Note that applications meant for HDF-4 will not work on HDF-5 files. To complicate matters, the OMI level 1B data are also stored in HDF-EOS 4, whereas OMI level 2 data are stored in HDF-EOS 5. This page only deals with OMI level 2 data.

In the modelling world, a commonly used data format is NetCDF. As far as I can tell, there is no fundamental reason why NetCDF could not have been used for archiving OMI data; the structures it provides are in principle suitable. NetCDF 3 (the current version) has one limitation: transparent internal compression of the data is not available. Compression is a feature of HDF (both versions) and reduces the size of the data files significantly (~25 %). The good news is that NetCDF 4 will be file-compatible with HDF-5.
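Applications written for HDF-4 cannot read HDF-5 files, and NetCDF 3 is yet another format. It therefore helps to check which container a file actually uses before choosing a reader. The sketch below looks for the signature bytes defined by each format:

* HDF-5 files start with `\x89HDF\r\n\x1a\n`, possibly after a user block of 512, 1024, 2048, ... bytes.
* HDF-4 files start with `0x0e 0x03 0x13 0x01`.
* NetCDF 3 files start with `CDF` followed by a version byte.

The function names are hypothetical, and only the first 64 KiB of a file are examined.

```python
# Signature ("magic number") bytes at the start of each file format.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"
HDF4_SIGNATURE = b"\x0e\x03\x13\x01"
NETCDF3_SIGNATURES = (b"CDF\x01", b"CDF\x02")  # classic and 64-bit offset variants


def classify_header(data):
    """Guess the container format from the first bytes of a file."""
    if data.startswith(HDF4_SIGNATURE):
        return "HDF-4"
    if data[:4] in NETCDF3_SIGNATURES:
        return "NetCDF-3"
    # The HDF-5 superblock may sit at offset 0, 512, 1024, 2048, ... (after a user block).
    offset = 0
    while offset + len(HDF5_SIGNATURE) <= len(data):
        if data[offset:offset + len(HDF5_SIGNATURE)] == HDF5_SIGNATURE:
            return "HDF-5"
        offset = 512 if offset == 0 else offset * 2
    return "unknown"


def sniff_format(path, max_bytes=1 << 16):
    """Classify a file on disk; only the first max_bytes bytes are inspected."""
    with open(path, "rb") as f:
        return classify_header(f.read(max_bytes))
```

HDF-EOS 4 and HDF-EOS 5 files are reported as their underlying HDF-4 and HDF-5 containers. An OMI level 2 `.he5` file should therefore come out as "HDF-5" and a level 1B file as "HDF-4". NetCDF 4 files are HDF-5 files, so they are reported as "HDF-5" as well.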

As mentioned before, the OMI level 1B file format is HDF-4. This is because the OMI level 1B format was defined before the level 2 file-formats. All fields are stored as SD sets, and therefore code you have for reading MODIS level 2 files can probably be re-used for OMI level 1B. Be sure to read the documentation of the files, as the method by which the data is stored may not be obvious at first.

Regarding compression

There are two compression methods in use for HDF-5 files: one is based on the ZLIB library (gzip) and the other on the SZIP library. ZLIB is open source and can be used without limitations, while SZIP may require a separate license. The OMI files only use gzip compression, so the SZIP library is not required to use OMI data. If you need to play it safe (license-wise), compile your own library without SZIP.
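The gzip filter in HDF-5 uses the deflate algorithm from ZLIB. That same, freely usable algorithm ships with Python as the `zlib` module. The sketch below deflates a made-up float32 field with many repeated fill values and then restores it. It only illustrates the algorithm: HDF-5 applies the filter per chunk, transparently, inside the library. The fill value of -999.0 and the helper `deflate_roundtrip` are hypothetical.

```python
import struct
import zlib


def deflate_roundtrip(values, level=6):
    """Pack floats as little-endian float32, deflate them with zlib and unpack again.

    Returns (uncompressed size in bytes, compressed size in bytes, restored values).
    """
    fmt = "<%df" % len(values)
    raw = struct.pack(fmt, *values)
    packed = zlib.compress(raw, level)  # deflate, the algorithm behind the HDF-5 gzip filter
    restored = list(struct.unpack(fmt, zlib.decompress(packed)))
    return len(raw), len(packed), restored


# Made-up field: a few cloud fractions padded with a hypothetical fill value of -999.0.
field = ([0.25, 0.5, 0.75] + [-999.0] * 7) * 100
raw_size, packed_size, restored = deflate_roundtrip(field)
print(raw_size, packed_size, restored == field)
```

All values in the example are exactly representable as float32, so the round trip is lossless and `restored == field` holds. Deflate itself is always lossless. The size reduction for real OMI fields depends on the data.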

Building the library and tools from source

On my Mac I use the following steps to build the library. These instructions should work on most unix-like systems.

  1. Download and unpack the latest sources from the HDF Group download page.
  2. Move into the unpacked source directory and issue the following command:
        ./configure --prefix=/usr/local/ --enable-production --with-zlib
    You may notice that I do not enable the Fortran interface. Building the Fortran interface on a Macintosh PowerPC machine requires the IBM XLF compiler, which I do not have; using gfortran as a substitute did not work. The Fortran interface is not needed for the command-line tools.
  3. Now the software can be built and tested with the usual make and make test. Do not skip the test step!
  4. After testing, the software can be installed. How to do this is platform dependent: on Mac OS X you'll need sudo make install, run from an account with administrative privileges. On Linux you can probably use the same command, or use su root followed by make install. Ask your administrator (or an experienced colleague) to do this for you if you don't know how to proceed.
You should now have the command-line tools to investigate the contents of OMI level 2 files.
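A quick way to confirm the installation is to check that the tools are on your PATH. The sketch below does that, and also wraps `h5ls -r`, which recursively lists every group and dataset in a file. The function names `find_hdf5_tools` and `list_contents` are hypothetical. The sketch assumes the tools were installed under the names listed at the top of this page.

```python
import shutil
import subprocess

# Command-line tools installed together with the HDF-5 library.
HDF5_TOOLS = ("h5dump", "h5ls", "h5diff", "h5repack", "h5import")


def find_hdf5_tools(tools=HDF5_TOOLS):
    """Map each tool name to its full path, or to None if it is not on the PATH."""
    return {name: shutil.which(name) for name in tools}


def list_contents(filename):
    """Recursively list the groups and datasets in an HDF-5 file using `h5ls -r`."""
    h5ls = shutil.which("h5ls")
    if h5ls is None:
        raise FileNotFoundError("h5ls not found; is the HDF-5 bin directory on your PATH?")
    result = subprocess.run([h5ls, "-r", filename], capture_output=True, text=True, check=True)
    return result.stdout


missing = [name for name, path in find_hdf5_tools().items() if path is None]
print("missing tools:", missing or "none")
```

For an OMI level 2 file, `list_contents` should show the `/HDFEOS/SWATHS/...` groups that hold the data and geolocation fields.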

Documentation to help interpret the data correctly

For each of the OMI products there is a product page with an overview of the product, the read-me file, a link to data access at the DISC, a condensed file specification and a full product specification (the examples linked here are for the ozone DOAS product).

The file specification contains a short description of the flags in various flagging fields. The full product specification is a more verbose version of the file specification. The read-me file contains a description of the reliability of the various product fields, including a recommended set of filter-flags to remove data that is considered unreliable by the algorithm developers and validators.
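Flagging fields pack several yes/no conditions into the bits of a single integer. A recommended filter is then a set of bits that must all be zero. The sketch below shows the bit arithmetic involved. The reject mask (bits 0 and 3) is invented for illustration only. The real bit meanings and the recommended combination must be taken from the product's file specification and read-me file.

```python
def bits_set(flag_value, nbits=16):
    """Return the bit numbers (0 = least significant) that are set in a flag value."""
    return [bit for bit in range(nbits) if (flag_value >> bit) & 1]


def passes_filter(flag_value, reject_mask):
    """True when none of the bits in reject_mask are set in flag_value."""
    return (flag_value & reject_mask) == 0


# HYPOTHETICAL mask: reject pixels with bit 0 or bit 3 set.
REJECT_MASK = (1 << 0) | (1 << 3)

flags = [0b0000, 0b0100, 0b1000, 0b0001]
values = [0.1, 0.2, 0.3, 0.4]
kept = [v for v, f in zip(values, flags) if passes_filter(f, REJECT_MASK)]
print(kept)  # -> [0.1, 0.2]
```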

IDL code

Recent versions of IDL have the necessary functions built in. You'll need at least version 6.2, but 6.4 is strongly recommended, as earlier releases contain some serious bugs that may or may not affect you (mainly in dealing with byte fields).

The CAMA toolkit is a validation toolkit for OMI level 2 data. It contains some convenience functions for reading OMI level 2 data. Based on this toolkit I wrote a function for IDL to read OMI data and apply the appropriate scaling factors and offset values. This function is called READ_OMI_LEVEL2(), and a usage example for this function is given below.

filename   = 'OMI-Aura_L2-OMCLDO2_2006m0823t2027-o11207_v002-2006m0824t192131.he5'
swathname  = 'CloudFractionAndPressure'
; usage: data = READ_OMI_LEVEL2(filename, swathname, fieldname [,/FLAGS])
latitudes  = READ_OMI_LEVEL2(filename, swathname, 'Latitude')
longitudes = READ_OMI_LEVEL2(filename, swathname, 'Longitude')
CloudFraction = READ_OMI_LEVEL2(filename, swathname, 'CloudFraction')
CloudPressure = READ_OMI_LEVEL2(filename, swathname, 'CloudPressure')
ProcessingQF  = READ_OMI_LEVEL2(filename, swathname, 'ProcessingQualityFlags', /FLAGS)
The swathname differs for each of the OMI level 2 products. You can find the swath name for your product in its file specification, or look it up with the interactive tool hdfview; the same methods work for finding the fieldname. The last line reads the processing quality flags. Here the /FLAGS keyword is set to skip the scaling and fill-value filtering that is normally applied. Note that all strings (the swathname and fieldname) are case-sensitive.
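The scaling and fill-value filtering that READ_OMI_LEVEL2() performs can be sketched in plain Python as follows. This is not the IDL function itself. The helper `apply_scaling` is hypothetical. The formula physical = scale * (stored - offset) is an ASSUMPTION, because other conventions exist (CF-style files use stored * scale + offset). Check the product's file specification before relying on it.

```python
import math


def apply_scaling(raw_values, scale=1.0, offset=0.0, fill_value=None, flags=False):
    """Convert stored values to physical values (hypothetical helper).

    ASSUMPTION: physical = scale * (stored - offset); verify this against the
    file specification of the product you are reading.
    Stored values equal to fill_value become NaN. With flags=True the values
    are returned untouched, like the /FLAGS keyword described above.
    """
    if flags:
        return list(raw_values)
    physical = []
    for stored in raw_values:
        if fill_value is not None and stored == fill_value:
            physical.append(math.nan)
        else:
            physical.append(scale * (stored - offset))
    return physical


print(apply_scaling([100, 250, -32767], scale=0.5, fill_value=-32767))
# -> [50.0, 125.0, nan]
```

Returning NaN for fill values lets later statistics skip those pixels explicitly. Flag fields must never be scaled, because scaling would destroy their bit patterns.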

Python Code

The most elegant HDF-5 interface I've found for any language is for Python, using the PyTables package. Installation instructions can be found in the readme file within the tar-bundle. You will have to install the HDF-5 library and the Python NumPy package as well.

Once you have the required components, the following code reads the same geolocation and cloud-fraction fields as the IDL example above (without applying any scaling):

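import tables

filename  = "OMI-Aura_L2-OMCLDO2_2007m1017t0009-o17311_v002-2007m1017t172248.he5"
swathname = "CloudFractionAndPressure"

# Open the HDF-file, and offset the root so that path references become shorter
# (PyTables 3.x names; PyTables 2.x used openFile, rootUEP and getNode)
hdf5ref = tables.open_file(filename, mode="r", root_uep="/HDFEOS/SWATHS/" + swathname)

# get references to the geolocation and data fields
# (the group names contain spaces, so natural naming cannot be used for them)
geo  = hdf5ref.get_node("/", "Geolocation Fields")
data = hdf5ref.get_node("/", "Data Fields")

# read the raw data into NumPy arrays (no scaling or fill-value filtering yet)
latitude  = geo.Latitude.read()
longitude = geo.Longitude.read()
cldfrac   = data.CloudFraction.read()

# close access to the file
hdf5ref.close()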
The values for the offset, slope (scale factor) and fill value are stored as attributes of each field. The PyTables documentation has examples of how to read attributes.


© OMI -- Last update: Friday, 05-Dec-2008 12:05:10 UTC. --