CIF2MTZ (CCP4: Supported Program)
NAME
cif2mtz - Convert an mmCIF reflection file to MTZ format
SYNOPSIS
DESCRIPTION
CIF2MTZ is a program to convert an mmCIF reflection file to MTZ format. mmCIF reflection files are typically obtained from the Protein Data Bank. There are examples below for some representative PDB entries.
In practice, mmCIF reflection files from the PDB can have a wide variety of data item names and contents. The program will attempt to identify quantities correctly, but you should always check the resulting MTZ file. Keywords are provided to supply missing information or to help the program make choices.
A large number of mmCIF reflection files in the PDB contain the PDB coordinate file header as a comment block. In particular, cell and symmetry information is held in the CRYST1 line rather than in the correct mmCIF categories. The program therefore looks for a CRYST1 line as well as the mmCIF categories. If neither is present, cell and symmetry information must be provided via keywords.
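To see what a CRYST1 line would supply, the following shell sketch prints the cell and space group from such a line as CELL and SYMMETRY keywords. It is an illustration only, not how CIF2MTZ itself reads the file. The function name cryst1_to_keywords is hypothetical, and the sketch assumes the line keeps the fixed-column PDB layout after the text "CRYST1" (cell lengths in columns 7-33, angles in columns 34-54, space group in columns 56-66).

```shell
# Hypothetical helper: print CELL and SYMMETRY keywords from the first CRYST1
# line found in a file. Assumes the PDB fixed-column layout after "CRYST1".
cryst1_to_keywords() {
    awk '
        function trim(s) { gsub(/^ +| +$/, "", s); return s }
        {
            i = index($0, "CRYST1")
            if (i == 0) next
            rec = substr($0, i)    # drop any comment prefix before CRYST1
            printf "CELL %s %s %s %s %s %s\n",
                trim(substr(rec, 7, 9)), trim(substr(rec, 16, 9)),
                trim(substr(rec, 25, 9)), trim(substr(rec, 34, 7)),
                trim(substr(rec, 41, 7)), trim(substr(rec, 48, 7))
            sg = trim(substr(rec, 56, 11))
            gsub(/ /, "", sg)      # "C 2 2 21" -> "C2221"
            print "SYMMETRY " sg
            exit                   # only the first CRYST1 line is used
        }
    ' "$1"
}
```

The output can be checked by eye and pasted into the keyworded input if the program does not pick up the cell and symmetry itself.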
Note: CIF2MTZ works with the macromolecular CIF format “mmCIF”, which is substantially different from the original “CIF” format. The latter is mostly used in small-molecule crystallography, but you may come across it with SHELX. Small-molecule “CIF” has a different syntax (it is based on DDL1 rather than DDL2) and so cannot easily be read by CIF2MTZ. If you don’t know what format a file is in, look for _cell.length_a (with a period: mmCIF) or _cell_length_a (with an underscore: CIF).
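The check described in the note can be scripted. The sketch below is one possible way to do it with grep; the function name cif_flavour is hypothetical. A file that carries no cell items at all (for example, one holding only a reflection list) is reported as "unknown" and has to be inspected by hand.

```shell
# Hypothetical helper: guess the CIF flavour from the cell-length item name.
cif_flavour() {
    if grep -q '_cell\.length_a' "$1"; then
        echo "mmCIF"      # DDL2 style: category.item
    elif grep -q '_cell_length_a' "$1"; then
        echo "CIF"        # DDL1 style: underscores only
    else
        echo "unknown"    # no cell item present; check the file manually
    fi
}
```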
KEYWORDED INPUT
Possible keywords are: TITLE, CELL, SYMMETRY, LABOUT, NAME, BLOCK, ANOMALOUS, STATUS, END.
All keywords are optional. The program will read in data, such as symmetry, from the mmCIF file if it is there. The keywords can be used to provide missing information, or to override existing information.
TITLE <title>
Put a suitable title in the output MTZ file.
CELL <a> <b> <c> [ <alpha> <beta> <gamma> ]
Followed by the cell lengths and angles.
SYMMETRY <spacegroup>
Followed by the standard space group name or number, or explicit symmetry operators.
LABOUT <program label>=<file label> …
The program currently recognises the following mmCIF item names for reflection data:

mmCIF item (and recognised alternative)         Program label  Description
_refln.index_h                                  H              h index
_refln.index_k                                  K              k index
_refln.index_l                                  L              l index
_refln.status                                   FREE           free R flag
_refln.F_meas_au, _refln.F_meas                 FP             structure factor
_refln.F_meas_sigma_au, _refln.F_meas_sigma     SIGFP          sigma(F)
_refln.F_calc_au, _refln.F_calc                 FC             calculated structure factor
_refln.phase_calc                               PHIC           calculated phase
_refln.phase_meas                               PHIB           experimental phase
_refln.fom, _refln.weight                       FOM            figure of merit
_refln.intensity_meas, _refln.F_squared_meas    I              intensity
_refln.intensity_sigma, _refln.F_squared_sigma  SIGI           sigma(I)
_refln.F_part_au                                FPART          partial structure factor
_refln.phase_part                               PHIP           partial phase
_refln.pdbx_F_plus                              F(+)
_refln.pdbx_F_plus_sigma                        SIGF(+)
_refln.pdbx_F_minus                             F(-)
_refln.pdbx_F_minus_sigma                       SIGF(-)
_refln.pdbx_anom_difference                     DP
_refln.pdbx_anom_difference_sigma               SIGDP
_refln.pdbx_I_plus                              I(+)
_refln.pdbx_I_plus_sigma                        SIGI(+)
_refln.pdbx_I_minus                             I(-)
_refln.pdbx_I_minus_sigma                       SIGI(-)
_refln.pdbx_HL_A_iso                            HLA            HL coefficient A
_refln.pdbx_HL_B_iso                            HLB            HL coefficient B
_refln.pdbx_HL_C_iso                            HLC            HL coefficient C
_refln.pdbx_HL_D_iso                            HLD            HL coefficient D

An MTZ column is output for each mmCIF item found. The default column name <program label> is given in the middle column above, but the LABOUT keyword can be used to rename these columns.
With the ANOMALOUS option, there are additional columns F(+) SIGF(+) F(-) SIGF(-) which can be renamed by LABOUT.
Note: the mmCIF file may use alternative labels, e.g. _refln.F_meas rather than _refln.F_meas_au. Some alternatives are recognised (see the table above). For any that are not, edit the label in the mmCIF file to match one of the labels listed above.
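As a sketch of how LABOUT might be used, the keyword input below renames the default FP and SIGFP columns, following the <program label>=<file label> syntax given above. The new column names (F_native, SIGF_native) and the file names (input.cif, output.mtz, cif2mtz_labout.inp) are made up for illustration.

```shell
# Write the keyword input to a file. The new column names are illustrative.
cat > cif2mtz_labout.inp <<'eof'
LABOUT FP=F_native SIGFP=SIGF_native
END
eof

# Run the conversion only if cif2mtz is on the PATH and the (hypothetical)
# input file exists.
if command -v cif2mtz >/dev/null 2>&1 && [ -f input.cif ]; then
    cif2mtz hklin input.cif hklout output.mtz < cif2mtz_labout.inp
fi
```

Columns not named on the LABOUT line keep the default labels from the table.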
NAME PROJECT <pname> CRYSTAL <xname> DATASET <dname>
[Note that the keywords PNAME <pname>, XNAME <xname> and DNAME <dname> are also available, but the NAME keyword is preferred.]
Specify the project, crystal and dataset names for the output MTZ file. <pname> and <xname> are taken from _entry.id if present in the mmCIF file, and <dname> is taken from _diffrn.id if present in the mmCIF file. If the mmCIF file does not contain _entry.id and _diffrn.id, then it is strongly recommended that this information is given. Otherwise, the default project, crystal and dataset names are “unknown”, “unknown” and “unknownddmmyy” respectively.
The project-name specifies a particular structure solution project, the crystal name specifies a physical crystal contributing to that project, and the dataset-name specifies a particular dataset obtained from that crystal. All three should be given.
BLOCK <blockname>
ANOMALOUS
Warning: this option works with mmCIF files such as those output by MTZ2VARIOUS, where the -h-k-l reflection immediately follows the corresponding hkl reflection, and where the hkl reflections are in the CCP4 asymmetric unit. It will fail in other cases. Without the ANOMALOUS option, all reflections are passed unchanged to HKLOUT.
Note also: this option deals with the case where anomalous pairs exist as different reflection rows. This is different from (and incompatible with) the case where anomalous pairs exist as different columns, e.g. _refln.pdbx_F_plus and _refln.pdbx_F_minus.
STATUS XPLO | CCP4
Specifies which convention, XPLO or CCP4, the input mmCIF file follows. The output MTZ file always adheres to the CCP4 convention.
END
End keyworded input.
ERROR MESSAGES
- “CCIF signal CCIF_PARTLOOP” / “Attempt to process loop with incomplete loop packet”
The file should contain a table of reflection data in which the total number of items is divisible by the number of columns. If the mmCIF file is badly formatted, two numbers may run together, reducing the apparent number of data items. This shouldn’t happen with files from the PDB, but may happen after local processing. If you get this error message, check through the data carefully, looking for such mistakes.
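A rough way to test for this condition before running the program is to count the items yourself. The sketch below (the function name check_refln_loop is hypothetical) counts the _refln item names in the first reflection loop and the whitespace-separated values that follow, and returns a non-zero status when the values do not divide evenly into the columns. It assumes plain, unquoted values with no embedded spaces or multi-line text fields, so treat a failure as a hint to inspect the file rather than as proof of an error.

```shell
# Hypothetical helper: check that the first _refln loop holds a whole number
# of rows. Assumes whitespace-separated, unquoted values.
check_refln_loop() {
    awk '
        state == 3 { next }                      # reflection loop already read
        $1 == "loop_" {
            if (state == 2) { state = 3 } else { state = 1; ncol = 0 }
            next
        }
        state == 1 && $1 ~ /^_refln\./ { ncol++; next }
        state == 1 && $1 ~ /^_/ { state = 0; ncol = 0; next }   # other category
        state == 1 && NF > 0 && $1 !~ /^#/ { state = (ncol > 0) ? 2 : 0 }
        state == 2 && ($1 ~ /^_/ || $1 ~ /^data_/) { state = 3; next }
        state == 2 && $1 !~ /^#/ { ntok += NF }  # count values on data lines
        END {
            if (ncol == 0) { print "no _refln loop found"; exit 2 }
            printf "%d columns, %d values\n", ncol, ntok
            exit (ntok % ncol != 0)              # status 1 if a row is incomplete
        }
    ' "$1"
}
```

When the check fails, the usual culprit is a row in which two numbers have run together.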
EXAMPLES
Structure factors and their sigmas from 1gme:
cif2mtz hklin r1gmesf.ent hklout 1gme.mtz <<eof
END
eof
Another example of diffraction data, this time containing squared structure factors (assumed to be intensities) and calculated structure factors. The file only contains the list of reflections, so additional information must be supplied:
cif2mtz hklin r1d9ysf.ent hklout 1d9y.mtz <<eof
CELL 40.583 111.009 140.423 90.00 90.00 90.00
SYMM C2221
NAME PROJ MBP CRYS apoprotein DATA native
END
eof
Finally, 1gr5 contains structure factors and phases from electron microscopy data:
cif2mtz hklin r1gr5sf.ent hklout 1gr5.mtz <<eof
END
eof
Note that this file contains dummy cell dimensions and symmetry, and it may be convenient to set these with the CELL and SYMM keywords.