PyFITS: FITS files in Python
In this article, we provide examples of using the python module PyFITS for working with FITS data. We first go through a brief overview of the FITS standard, and then we describe ways for accessing information in FITS files, using convenience functions defined in PyFITS. PyFITS offers facilities that provide more advanced access to information in FITS files, but these will not be discussed here. Perhaps, a future article will discuss these.
These instructions are based on the PyFITS Handbook (link is to a PDF file), which has exhaustive information on the facilties provided by PyFITS.
FITS stands for Flexibile Image Transport System, and is the IAU recognized standard for storing astronomical data.
A FITS file consists of one or more header and data units, referred to as HDUs. An HDU consists of a header, which describes the data contained in the HDU, followed by the data itself. The first HDU is called the Primary HDU. Other HDUs, if present, are referred to as extensions.
In a valid FITS file the data part of an HDU can be empty. The simplest valid FITS file is one with just the header of the Primary HDU. A FITS file with only the Primary HDU is refered to as a Basic FITS file or a Single Image FITS (SIF) file. A FITS file with extensions is referred to as a Multi-Extension FITS (MEF) file.
A header consists of records, also called card images, each having a maximum of 80 characters. A record contains a keyword with maximum length of 8 characters, an "=" sign at the 9th position, a space at the 10th position, followed by a value for the keyword. A "/" after the value of the keyword indicates the beginning of a commentfor the record and the characters from that point, through the 80th character becomes the comment. The only characters allowed in a header record are ASCII values 32 to 126. A record can be a blank record in which case the entire record will be filled with ASCII space character (ASCII 32). The special keywords HISTORY andCOMMENT, do not use "=" sign and simply uses positions 11 to 80, to store the value.
There is a set of standard keywords, some mandatory and others optional, defined in the FITS standard. Document describing the standard is available from the FITS Support Office. In addition to these there are many non-standard FITS keywords that are used quite frequently. The exact keywords included in an header depends on the type of the data in the data part of the HDU. A list of standard and, non-standard but frequently used, keywords is available athttp://fits.gsfc.nasa.gov/fits_dictionary.html. In addition, "World Coordinate System" (WCS) information is stored in keywords defined in appropriate standards, which are also linked to at the above url.
The value of a header keyword can be of the following types.
- String; ascii characters enclosed in single quotes.
- Logical; letter T or letter F.
- Integers; signed decimal numbers.
- Floating point numbers; similar to integers, with E or D denoting the exponent part.
- Complex numbers; specified as (real, imag).
The data in an HDU can be an image, which is an array of dimensions 1 to 999, a binary table or an ASCII table. There is another type of data structure called random groups, which is used only in radio interferometry work.
Data can be 8-bit characters, integers (8-bit unsigned, and 16-bit, 32-bit and 64-bit signed integers) and floating point numbers (32-bit and 64-bit). The data type is indicated using the BITPIX keyword, with value set to the number of bits, e.g., 8 for characters and unsigned integers. The values -32 and -64 are used for 32-bit and 64-bit formats, respectively.
An important thing to keep in mind is that the value stored in a data array, the raw value, need not be the actual value representing the physical quantity. If this is so, then the keywords BSCALE and BZERO will have values other than 1.0 and 0.0, respectively. The actual quantity that is being conveyed can be calculated as
physical_value = raw_value * BSCALE + BZERO.
For tables, the corresponding keywords are TSCALEn and TZEROn, where "n" denotes the field, i.e., table column, for which this value is applicable. So we have, 1 <= n <= TFIELDS, where TFIELDS is the number of columns in the table.
The number of "dimensions" for data arrays is specified using the keyword NAXIS. The length of each dimension is specified using the keyword NAXISn where 1 <= n <= NAXIS.
Consider a "data cube" with 200 rows (y-axis), 200 columns (x-axis) and 4 "pages" (z-axis). Then NAXIS = 3, NAXIS1 = 200 (x-axis), NAXIS2 = 200 (y-axis) and NAXIS3 = 4 (z-axis). The data is stored in "row-major" format so that, we would access an element in row 10, col 8 and page 1 as data in C, and asdata or data[0,9,8] in Python.
Tables can only appear in extensions and not in Primary HDU and the type of the table will be indicated in the XTENSION keyword of the header of the concerned HDU. If an extension contains image data, i.e., an array, then XTENSION = IMAGE.
For an ASCII table, i.e., XTENSION = TABLE, NAXIS will always be 2. NAXIS1 will be the total number of 8-bit characters in a row, and NAXIS2 will be the number of rows. In FITS a "column" is a position in a row, where as a "field" represents a column in the table. The number of columns in a table, is therefore, given by the value of the keyword TFIELDS.
Binary tables, i.e., XTENSION = BINTABLE, have the same interpretation for NAXIS and NAXIS2. In ASCII tables a value in a particular "cell" will be a scalar. But in a binary table this can be a vector. The value of NAXIS1 for a binary table will take this into account.
PyFITS provides several "convenience" functions that allow the user to work with FITS data, without having to deal with opening and closing files. As mentioned before, there are methods available that provide much finer control over accessing FITS data.
In the following examples, we use the FITS file "WFPC2u5780205r_c0fx.fits", which is linked to by the first link in the table athttp://fits.gsfc.nasa.gov/fits_samples.html.
In the code samples below, lines beginning with ">>>" are code entered into the python shell. Only part of long outputs are shown and these are indicated with the string "...".
>>> import pyfits >>> pyfits.info("WFPC2u5780205r_c0fx.fits") Filename: WFPC2u5780205r_c0fx.fits No. Name Type Cards Dimensions Format 0 PRIMARY PrimaryHDU 262 (200, 200, 4) float32 1 U5780205R_CVT.C0H.TAB TableHDU 353 4R x 49C [D25.17, D25.17, E15.7, E15.7, E15.7, E15.7, E15.7, E15.7, E15.7, E15.7, A1, E15.7, I12, I12, D25.17, D25.17, A8, A8, I12, E15.7, E15.7, E15.7, E15.7, E15.7, E15.7, I12, I12, I12, I12, I12, I12, I12, I12, A48, E15.7, E15.7, E15.7, E15.7, E15.7, E15.7, E15.7, E15.7, E15.7, E15.7, E15.7, E15.7, E15.7, E15.7, E15.7]
The output shows that the file has two HDUs. The header of the Primary HDU has 262 header records and the data part is a 4x200x200 array, i.e., 4 "pages", 200 rows and 200 columns, of 32 bit floating point numbers. The file has a table extension which has a header of 353 records and the data part has 4 rows and 49 columns. The data format in each column is also given. The total number of 8-bit bytes in one row is 747+49, i.e., 747 bytes indicated by the format strings and one 8-bit separator associated with each column.
Note that while using PyFITS we often refer to the Primary HDU as extension number 0 and the first FITS extension as extension number 1.
The getheader function returns the header of a specified extension. If no extension is given then the header of the Primary HDU is returned.
>>> header_primary = pyfits.getheader("WFPC2u5780205r_c0fx.fits")
The header can be treated as a python dictionary. So to find all keywords in the header we can execute
>>> header.keys() ['SIMPLE', 'BITPIX', 'NAXIS', 'NAXIS1', 'NAXIS2', 'NAXIS3', 'EXTEND', 'COMMENT', 'BSCALE', ...
The keys can be used to access information in the header. Note that the keys are case-insensitive.
>>> header['BITPIX'] -32 >>> header['bitpix'] -32 >>> header['naxis'] 3 >>> header['naxis1'] 200 >>> header['naxis2'] 200 >>> header['naxis3] 4
To get the header of the first extension we can give a number, indicating the extension, to the getheader function. Here, since the table is the first extension we give the number 1. To get the primary header we can either omit the number or give the number 0.
>>> header_table = pyfits.getheader("WFPC2u5780205r_c0fx.fits",1) >>> header_table.keys() ['XTENSION', 'BITPIX', 'NAXIS', 'NAXIS1', 'NAXIS2', 'PCOUNT', 'GCOUNT', 'TFIELDS', 'EXTNAME', ... >>> header_table['naxis'] 2 >>> header_table['tfields'] # A field is a table column. 49 >>> header_table['naxis2'] 4 >>> header_table['naxis1'] 796
If we know that a header has a keyword, and we just want to find its value, we can use the getval function.
>>> pyfits.getval('WFPC2u5780205r_c0fx.fits','bitpix',0) -32 >>> pyfits.getval('WFPC2u5780205r_c0fx.fits','bitpix',1) 8 >>> pyfits.getval('WFPC2u5780205r_c0fx.fits','xtension',1) 'TABLE'
The last line of the output shows that the table in the FITS file is an ASCII table. A binary table will have the value 'BINTABLE' and an image array will have the value 'IMAGE'.
To see the comment associated with a keyword, we will need to access the list of Card Object in the header.
>>> header1_cards = header1.ascardlist() >>> print header1_cards['bitpix'] BITPIX = 8 / Data typex
To get the data part from an HDU we use the getdata function, passing it the name of the FITS file and the HDU from which data needs to be obtained.
>>> data_cube = pyfits.getdata('WFPC2u5780205r_c0fx.fits",0)
To get header along with data set the header option to True.
>>> data_cube, header_data_cube = pyfits.getdata( 'WFPC2u5780205r_c0fx.fits', 0, header=True)
If data in the HDU is an image, then the data returned is a numpy ndarray object .
>>> type(data_cube) <type 'numpy.ndarray'> >>> data_cube.shape (4, 200, 200)
The shape parameter returns the shape of the array as a tuple of the format (NAXIS3, NAXIS2, NAXIS1) or in general (NAXISn, ..., NAXIS1), where n is the value of the keyword NAXIS. This means that NAXIS1 is the number of columns (x axis), NAXIS2 is the number of rows (y axis) and NAXIS3 is the number of pages (z-axes).
>>> import copy >>> data_cube_copy = copy.deepcopy(data_cube) >>> data_cube_copy_page_3 = data_cube_copy >>> import numdisplay >>> numdisplay.display(data_cube) Image displayed with Z1: -4.29414 Z2: 813.35 >>> data_cube_copy_page_3[90:110,:] = 32000.0 >>> numdisplay.display(data_cube_copy_page_3) Image displayed with Z1: -4.29414 Z2: 32000.0 >>> data_cube_copy_page_3[:,20:30] = 32000.0 >>> numdisplay.display(data_cube_copy_page_3) Image displayed with Z1: -4.29414 Z2: 32000.0 >>> data_cube_copy_page_3[:,-30:-20] = 32000.0 >>> numdisplay.display(data_cube_copy_page_3) Image displayed with Z1: -4.29414 Z2: 32000.0
If data in the HDU is a table then an object similar to a numpy record array is returned. As mentioned before, in FITS, what we call a "table column" is referred to as a "field". A "column" in FITS referes to an 8-bit byte in a row, i.e., a position in a row. Hence a "field" will span a range of "columns".
>>> data_table, header_table = pyfits.getdata("WFPC2u5780205r_c0fx.fits", 1, header=True) >>> type(data_table) <class pyfits.core.FITS_rec'> >>> data_table # First row ... >>> data_table # First row, first column 182.63118863080001 >>> data_table.field(0) # first column array([ 182.63118863, 182.62552336, 182.65237923, 182.65002236]) >>> data_table.names # Names of individual columns ... >>> len(data_table.names) # Should be 49, since there are 49 columns 49 >>> data_table.field('crval1') # Data in column named CRVAL1 array([ 182.63118863, 182.62552336, 182.65237923, 182.65002236]) >>> b = data_table.field('backgrnd') >>> b # first row in the column 'backgrnd' -0.36763531
Keys can be easily added to and removed from headers, using the methods provided by the "header object".
>>> del header_table['bitpix'] >>> header_table.has_key('bitpix') False >>> header_table.update('bitpix', 8, comment="Data type in bits.") >>> header_table['bitpix'] 8
To add a key use the update method of the header. This method will add a keyword if that is not present, and will modify the value of the keyword if it is present.
>>> header_table.has_key('avg1') False >>> header_table.update('AVG1', 23.0, "Avg in BOX1") >>> header_table['avg1'] 23.0 >>> header_table.ascardlist()['avg1'] AVG1 = 23.0 / Average in BOX1
The methods, add_comment, add_history and add_blank can be used to insert COMMENT, HISTORY and blank keywords, while the get_comment andget_history methods will retrieve a list of COMMENT and HISTORY keywords in the header. One can use the syntax header['COMMENT'] to get the value of a comment, but since a FITS header can contain more than 1 COMMENT and HISTORY keywords, this will only return the first such keyword. The methods to add the above keywords also allow one to specify where the keywords must be inserted.
>>> header_table.add_comment("A new comment", before="median")
will insert a COMMENT key, just before the key MEDIAN. There is a keyword parameter, after, to do the insertion after a specified keyword.
To create a new FITS file with some data and header information we can use the writeto function, providing it with a filename, some data and a header.
>>> pyfits.writeto(filename, data, header)
>>> pyfits.writeto("WFPC_copy_page_3.fits", data_copy_page_3)
Since a header was not provided, a minimal header will be inserted into the file.
The function append can be used to append an HDU to an already existing FITS file.
>>> pyfits.append(filename, data, header)
The data and header provided will be appended as an HDU extension to the FITS file represented by filename.
Note that in both writeto and append functions, the header is optional. If not given a minimal header will be inserted into the HDU extension.
The function update is used to update an HDU.
>>> pyfits.update(filename, data, header, extension)
Update extension number 3 with provided data and header
>>> pyfits.update(filename, data, header, ext=3)
Update extension 3 with data but do not change the header
>>> pyfits.update(filename, data, ext=3)
Update the extension named 'sci'
>>> pyfits.update(filename, data, ext='sci')
THe following are some of the important keywords defined in the FITS standard.
|SIMPLE||Set to T if the file conforms to the FITS standard and to F if not.|
|BITPIX||Datatype of the data in the HDU.|
|XTENSION||Type of extension: IMAGE, TABLE, BINTABLE.|
|NAXIS||Number of dimensions of a data array. For tables this is always 2.|
|NAXISn||Length of each dimension. Here 1 <= n <= NAXIS. For tables NAXIS1 is the number of table rows.|
|TFIELDS||Number of fields, i.e., table columns, in the table.|
|TFORMn||Format of data in the nth table column. See table 7.3 in page 50 of the FITS Standard.|
|BSCALE||Factor by which data in an image has been scaled.|
|BZERO||Zero point of the data in an image array.|
|TSCALEn||Same as BSCALE, but for table column n.|
|TZEROn||Same as BZERO, but for table column n.|