Dictionary Indexing

Last Update: 1/25/2022

Please note that a much more extensive tutorial for dictionary indexing can be obtained in the Complete Examples section (item 7) of this wiki collection.

Dictionary Indexing, or DI for short, is a technique for diffraction pattern indexing that employs the complete pattern instead of extracting features from the pattern. In the traditional Hough-based indexing for EBSD or TKD patterns, individual Kikuchi band orientations and locations are extracted from the pattern and then processed to extract interplanar or interzonal angles that are then compared against a lookup table. In the DI approach, no feature extraction is performed and complete patterns are compared to simulated patterns; hence the need for a good forward model to predict patterns.

In the sections below, we describe the complete process of taking experimental patterns and indexing them using the EMDI program; note that in EMsoftOO version 6.0, there is a single program for the indexing of EBSD, ECP, and TKD patterns. Setting up a DI run is somewhat involved because many input parameters are needed to make things work.

Acquiring and preparing the data

This is the most crucial step for the whole DI approach. As you prepare to acquire your experimental data, you should add one step to your usual procedure, namely the acquisition of a single full size high quality pattern taken from near the center of your region of interest (ROI). This means no binning, and likely a longer exposure time. It is important that you do not change any microscope settings after you acquire this reference pattern; if you do change things, then the detector parameters determined by the fitting routine will be incorrect.

So, the suggested experimental procedure is as follows:

locate your ROI;
set up all the parameters for your data acquisition;
record a full high quality reference pattern (and store it in a separate file);
set the proper binning and exposure time for your data acquisition run;
and execute the run.

You can let the acquisition software index the patterns and generate an .ang or .ctf file, as usual; that way you can compare the DI results with the Hough-based indexing.

Dictionary indexing relies on the availability of experimental patterns, stored in some convenient format; you must therefore store all the patterns, so make sure you have plenty of disk space to do so. EMsoft supports the following data formats:

Binary: this format was originally implemented to convert individual pattern files (jpeg, tiff or bmp) into a single file with extension .data. We do not recommend that this format be used, but if you really have no other way to convert thousands of pattern files to any of the formats below, then this will be your only remaining option. Contact us to get a MATLAB script that will generate the binary file for you.
TSLup1 and TSLup2: The EDAX/TSL pattern acquisition software allows you to export patterns in .up1 (1-byte per pattern pixel) or .up2 (2-bytes per pattern pixel) formats; EMsoft can read both file formats.
TSLHDF: Recent versions of the EDAX/TSL acquisition software allow you to export the pattern to an HDF5 formatted file; this is really the preferred way of transporting pattern data, and EMsoft can readily read this format.
EMEBSD: When the EMEBSD program is used with the makedictionary parameter set to 'y', then the output file will be an HDF5 file that can subsequently be read by the indexing program. Pattern intensities are stored as bytes [0-255].
EMEBSD32i: same as previous but the patterns are stored as 4-byte integers.
EMEBSD32f: same as previous but the patterns are stored as 4-byte floats.
BrukerHDF: Recent versions of the Bruker acquisition program can export HDF5 files that can be read by EMsoft.
NORDIF: support for the NORDIF binary pattern file format.
OxfordHDF: Oxford/Aztec software does not currently allow the user to store the patterns in HDF5 format.
OxfordBinary: EMsoft can read the Oxford .ebsp file format as long as the patterns in that file are not compressed.
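
As a convenience, the formats in the list above that have a distinctive file extension can be mapped to the corresponding inputtype value. The sketch below is a hypothetical helper, not part of EMsoft; the function name and the lookup table are assumptions built only from the extensions mentioned in the list. HDF5 files cannot be told apart by extension alone, so the helper returns None for them.

```python
from pathlib import Path

# Extension-to-inputtype pairs taken from the format list above.
# This table is illustrative; it is not read from EMsoft itself.
EXTENSION_TO_INPUTTYPE = {
    ".data": "Binary",
    ".up1": "TSLup1",
    ".up2": "TSLup2",
    ".ebsp": "OxfordBinary",
}


def guess_inputtype(filename):
    """Return a likely inputtype string, or None when the extension is ambiguous."""
    return EXTENSION_TO_INPUTTYPE.get(Path(filename).suffix.lower())


print(guess_inputtype("scan1.up2"))
print(guess_inputtype("scan1.h5"))
```

For HDF5 files you still need to know which acquisition software produced the file (TSLHDF, BrukerHDF, or one of the EMEBSD variants) and choose the inputtype by hand.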

Computing the master pattern

For each of the phases present in your sample, you need to compute a master pattern using the EMMCOpenCL program (with the correct sample tilt, microscope voltage, and crystal structure) followed by the EMEBSDmaster program for the actual master pattern. You will use these files as input to the indexing program. Details of these programs can be found in other wiki pages; there is also a worked-out example available.

Fitting the detector parameters

This step is very important and makes use of the reference pattern that you recorded earlier. Details of the process are described on a separate wiki page.

Set up the DI parameters

The EMEBSDDI program takes a lot of parameters via the usual name list mechanism; execute the following command:

EMEBSDDI -t

to generate the template file, which will have the following content (broken up into blocks with explanations for clarity):

 &DIdata
! The line above must not be changed
!
! The values below are the default values for this program
!
!###################################################################
! INDEXING MODE
!###################################################################
!
! 'dynamic' for on the fly indexing or 'static' for pre calculated dictionary
 indexingmode = 'dynamic',
!

The indexingmode can take two values: dynamic or static; in static mode, the program will use an existing dictionary file created by the EMEBSD program. This mode is only recommended if you have a lot of data sets to index and they all have the same detector parameters (for instance, multiple slices from a FIB experiment). This requires a computer with a lot of memory (many tens of Gb of RAM, and lots of disk space). In the dynamic indexing mode, the dictionary patterns will be generated on-the-fly during the indexing process (and they are never stored on disk).

!###################################################################
! DICTIONARY PARAMETERS: COMMON TO 'STATIC' AND 'DYNAMIC'
!###################################################################
!
! do you want Email or Slack notification when the run has completed?
 Notify = 'Off',
! width of data set in pattern input file
 ipf_wd = 100,
! height of data set in pattern input file
 ipf_ht = 100,
! define the region of interest as x0 y0 w h;  leave all at 0 for full field of view
! region of interest has the point (x0,y0) as its lower left corner and is w x h patterns
 ROI = 0 0 0 0,
! X and Y sampling step sizes
 stepX = 1.0,
 stepY = 1.0,
! number of top matches to keep from the dot product results
 nnk = 50,
! number of top matches to use for orientation averaging
 nnav =  20,
! number of top matches to use for Orientation Similarity Map computation
 nosm = 20,
! to use a custom mask, enter the mask filename here; leave undefined for standard mask option
 maskfile = 'undefined',
! mask or not
 maskpattern = 'n',
! mask radius (in pixels, AFTER application of the binning operation)
 maskradius = 240,
! hi pass filter w parameter; 0.05 is a reasonable value
 hipassw = 0.05,
! number of regions for adaptive histogram equalization
 nregions = 10,

The parameters in this block are common to dynamic and static indexing modes. Since the final output of indexing is usually an inverse pole figure (ipf) map, you must specify the ipf width and height (in pixels) for the complete data set; let's say that our data region is 600x400 pixels. You can then select a sub-region via the ROI parameter, which has four integers; if all integers are set to 0, then the complete 600x400 ipf is indexed. If the integers are 60 100 200 200, then a square area of 200x200 pixels is selected with the upper left corner located at the point (60,100). The sampling step size is next and is specified in microns.

The next three integers (nnk, nnav, and nosm) define, respectively, how many of the top matches should be kept in the output file (typically 30-50 would be ok); how many of the top matches should be used to generate an IPF with orientations averaged over the top nnav matches (to be deprecated); and how many top matches should be used to generate the orientation similarity map (OSM).

Then the user can specify the filename for an optional mask file; this is an experimental option in which one can define an arbitrary mask to be applied to the patterns before indexing. For details of the file format, see the bottom of this wiki page. If maskpattern is set to 'y', then a circular mask of radius maskradius will be applied before indexing; this can be used to exclude the outer portion of the patterns. Finally, the hipassw and nregions parameters define the preprocessing parameters for the high pass filtering and adaptive histogram equalization steps that all patterns (both experimental and simulated) undergo before indexing. See the manual page for the EMDIpreview program for an explanation of how to determine these parameters.
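
To get a feel for how many patterns a given ROI setting selects, the following minimal sketch counts them from ipf_wd, ipf_ht, and the four ROI integers. The function name is hypothetical, and the bounds check is an assumption added for illustration (it treats x0 and y0 as offsets into the scan); the indexing program may validate the ROI differently.

```python
def roi_pattern_count(ipf_wd, ipf_ht, roi):
    """Number of patterns selected by ROI = x0 y0 w h (all zeros means full scan)."""
    x0, y0, w, h = roi
    if (x0, y0, w, h) == (0, 0, 0, 0):
        return ipf_wd * ipf_ht
    if w <= 0 or h <= 0 or x0 < 0 or y0 < 0:
        raise ValueError("ROI width/height must be positive and the corner non-negative")
    # Assumed sanity check: the ROI should fit inside the scan area.
    if x0 + w > ipf_wd or y0 + h > ipf_ht:
        raise ValueError("ROI extends beyond the scan area")
    return w * h


print(roi_pattern_count(600, 400, (0, 0, 0, 0)))        # full 600x400 scan
print(roi_pattern_count(600, 400, (60, 100, 200, 200)))  # 200x200 sub-region
```

Since the run time scales with the number of experimental patterns, indexing a small ROI first is a cheap way to check the detector parameters before committing to the full map.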

!###################################################################
! OPTIONAL: use non-local pattern averaging  [NLPAR]
!###################################################################
!
! set to .TRUE. in order to use NLPAR as part of the pattern preprocessing
! this means the preprocessing will take longer but the indexing results will
! be better for noisy data sets
 doNLPAR = .FALSE.,
! the NLPAR search window will have size (2*sw+1) by (2*sw+1)
 sw = 3,
! weight decay parameter
 lambda = 0.375,

This section deals with non-local pattern averaging, which may be useful for very noisy data sets. The details of this approach can be found in the following paper: PT Brewick, SI Wright, and DJ Rowenhorst, "NLPAR: Non-local smoothing for enhanced EBSD pattern indexing", Ultramicroscopy 200:50-61 (2019)

!###################################################################
! ONLY SPECIFY WHEN INDEXINGMODE IS 'DYNAMIC'
!###################################################################
!
! =============================
! ==== FOR ALL MODALITIES 
! number of cubochoric points to generate list of orientations
 ncubochoric = 100,
! intensity scaling mode 'not' = no scaling, 'lin' = linear, 'gam' = gamma correction
 scalingmode = 'not',
! gamma correction factor
 gammavalue = 1.0,

! =============================
! ==== FOR EBSD/TKD MODALITIES 
! distance between scintillator and illumination point [microns]
 L = 15000.0,
! tilt angle of the camera (positive below horizontal, [degrees])
 thetac = 10.0,
!=================================================================================================!
! The following three parameters require some careful consideration.                              ! 
! Please check the wiki EMEBSDDI help page for detailed information on how to set these parameters!
! This change was implemented starting with EMsoft 5.0.3.                                         !
!=================================================================================================!
! CCD pixel size on the scintillator surface [microns]; this refers to effective pixel size AFTER BINNING
 delta = 50.0,
! pattern center coordinates in units of pixels AFTER BINNING
 xpc = 0.0,
 ypc = 0.0,
! size of the *Experimental* patterns in pixels, and the binning factor to be used.
! Note that the binning factor is *only* applied to the experimental patterns. The 
! dictionary patterns will be computed directly for the binned pattern size. 
 exptnumsx = 640,
 exptnumsy = 480,
 binning = 1, 
! size of the *Dictionary* patterns in pixels; this will be set to be equal to the size
! of the experimental patterns divided by the binning factor by the EMDI program.
! You can set the next two parameters to the correct values if you wish, but these 
! parameters will be overwritten by the program; you can also comment out these two lines. 
 numsx = 640,
 numsy = 480,
! angle between normal of sample and detector
 omega = 0.0,
! minimum and maximum energy to use for interpolation [keV]
 energymin = 10.0,
 energymax = 20.0,
! incident beam current [nA]
 beamcurrent = 150.0,
! beam dwell time [micro s]
 dwelltime = 100.0,

! =============================
! ==== FOR ECP MODALITY 
! size of output pattern in pixels (image is always square npix x npix)
 npix = 256,
! half angle of cone for incident beams (degrees)
 conesemiangle = 5.0,
! working distance [in mm]
 workingdistance = 13.0,
! inner radius of annular detector [in mm]
 Rin = 2.0,
! outer radius of annular detector [in mm]
 Rout = 6.0,

In this block we define the detector parameters and the orientation sampling. The ncubochoric parameter defines the angular step size in orientation space; typically a value of 100 will produce good results. The detector parameters for EBSD and TKD modes are L (distance to detector), thetac (detector tilt from vertical), CCD pixel size, the number of pixels along x and y, the pattern center in units of pixel size (for definition, see the EBSD patterns simulation wiki page), omega (sample misalignment along RD axis), the energy range to be used in the pattern interpolation, beam current and dwell time (values don't really matter for indexing, as long as they are both non-zero), binning, scalingmode (typically you would use gamma scaling), and the gamma value (0.33 is a good value).

In EMsoft version 5.0.3, two new parameters were introduced in the namelist file; the new parameters exptnumsx and exptnumsy take over the role of the old numsx and numsy parameters which are now obsolete (but still present in the file). The reasoning behind this is that one often has large experimental patterns, say 1244x1024, but the indexing should be done on smaller patterns, say with 8x binning. The dictionary pattern size will automatically be set to (exptnumsx,exptnumsy)/binning so that the pattern generation occurs at maximal speed for the indexing size.

In addition to these changes, the pattern center coordinates (xpc, ypc) and detector pixel size delta also need to be modified. Let's assume that the PC coordinates were determined for the full size pattern before binning; to account for the binning, the pixel size delta must be multiplied by binning, and the (xpc,ypc) values need to be divided by binning.
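
The binning adjustments described above amount to a few lines of arithmetic. The sketch below assumes that delta, xpc, and ypc were determined for the full-size (unbinned) pattern; the function name and the returned dictionary are hypothetical. Integer division is used for the dictionary pattern size, which is an assumption: how the program treats pattern sizes that are not exact multiples of the binning factor is not covered here.

```python
def apply_binning(exptnumsx, exptnumsy, delta, xpc, ypc, binning):
    """Convert full-size detector parameters to the values expected AFTER binning."""
    return {
        # Dictionary pattern size; the program sets this itself,
        # the values are shown only for reference (assumed integer division).
        "numsx": exptnumsx // binning,
        "numsy": exptnumsy // binning,
        # Effective pixel size grows with the binning factor.
        "delta": delta * binning,
        # Pattern center is expressed in units of (binned) pixels.
        "xpc": xpc / binning,
        "ypc": ypc / binning,
    }


params = apply_binning(exptnumsx=640, exptnumsy=480, delta=50.0,
                       xpc=10.0, ypc=-20.0, binning=4)
for name, value in params.items():
    print(f" {name} = {value},")
```

Note that exptnumsx and exptnumsy themselves stay at the full experimental pattern size in the namelist file; only delta, xpc, and ypc need to be entered in their binned form.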

!###################################################################
! INPUT FILE PARAMETERS: COMMON TO 'STATIC' AND 'DYNAMIC'
!###################################################################
!
! name of datafile where the patterns are stored; path relative to EMdatapathname
 exptfile = 'undefined',
! input file type parameter: Binary, EMEBSD, TSLHDF, TSLup1, TSLup2, OxfordHDF, OxfordBinary, BrukerHDF, NORDIF
 inputtype = 'Binary',
! here we enter the HDF group names and data set names as individual strings (up to 10)
! enter the full path of a data set in individual strings for each group, in the correct order,
! and with the data set name as the last name; leave the remaining strings empty (they should all
! be empty for the Binary and TSLup1/2 formats)
 HDFstrings = '' '' '' '' '' '' '' '' '' '',
!

Next we have information about the pattern input file. There are several types (described above) and the correct type should be entered in the inputtype variable. The filename goes in the exptfile parameter (along with the appropriate partial path). If the input file is an HDF5 file, then you must define the complete path inside this file. For instance, if the pattern data set is called EBSDpatterns, and it is located inside a nested group Scan 1/data/EBSD, then you would enter four strings for HDFstrings: 'Scan 1', 'data', 'EBSD', and the last one is the data set name 'EBSDpatterns'. Note that these strings are all case sensitive, so make sure you get them right. You can use the HDFView program from the HDF Group to figure out what the correct strings are. Leave the other strings (there are 10 in total) empty.
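
The HDFstrings entry for the example above can be assembled mechanically from the full data set path. The helper below is a hypothetical convenience function, not an EMsoft tool; it splits the path on '/' and pads the result with empty strings up to the 10 slots used in the template.

```python
def hdfstrings_line(dataset_path, nslots=10):
    """Build the HDFstrings namelist line from an HDF5 data set path."""
    parts = [p for p in dataset_path.split("/") if p]
    if len(parts) > nslots:
        raise ValueError(f"path has more than {nslots} components")
    parts += [""] * (nslots - len(parts))  # remaining strings stay empty
    return " HDFstrings = " + " ".join(f"'{p}'" for p in parts) + ","


print(hdfstrings_line("Scan 1/data/EBSD/EBSDpatterns"))
```

The group and data set names are copied verbatim, so any mistake in capitalization in the path you pass in will carry over into the namelist file.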

!###################################################################
! OTHER FILE PARAMETERS: COMMON TO 'STATIC' AND 'DYNAMIC'
!###################################################################
!
! temporary data storage file name ; will be stored in $HOME/.config/EMsoft/tmp
 tmpfile = 'EMEBSDDict_tmp.data',
 keeptmpfile = 'n',
! output file ; path relative to EMdatapathname
 datafile = 'undefined',
! ctf output file ; path relative to EMdatapathname
 ctffile = 'undefined',
! ang output file ; path relative to EMdatapathname
 angfile = 'undefined',
! euler angle input file
 eulerfile = 'undefined'

This block defines where all the results and temporary files will be kept. The indexing program uses a temporary file with the pre-processed patterns in the standard tmp folder (usually in the .config/EMsoft/tmp folder in your user home directory). You need to define the name of this temporary file in the tmpfile variable (no path necessary); it is important to pick a unique name if you are running multiple simultaneous indexing runs. You can keep the file if you want by setting keeptmpfile to 'y'. The indexing output is stored in two files: datafile is an HDF5 output file that has all the program output in it, whereas ctffile is a standard Oxford .ctf output file that can be read by most EBSD analysis programs. You also have the option to output an EDAX/TSL .ang file instead of or in addition to the .ctf file. If you set the eulerfile parameter to anything other than 'undefined', then the program will use the orientations in that file instead of the cubochoric sampling of orientations controlled by the ncubochoric parameter. This can be useful if you know that all the orientations are clustered around some orientation; you can then use the EMsampleRFZ program to generate a uniform sampling around that orientation instead of sampling the complete Rodrigues fundamental zone.

!###################################################################
! ONLY IF INDEXINGMODE IS STATIC
!###################################################################
!
 dictfile = 'undefined',
!

In static indexing mode, this is where you define the file that has the complete dictionary in it (generated by the EMEBSD program). Dictionary files can get very, very large, so be careful that you have enough RAM if you decide to use static indexing. It can be useful for serial sectioning data sets, where you use the same dictionary for all consecutive slices. In our experience, it is usually best to use the dynamic indexing mode.

!###################################################################
! ONLY IF INDEXINGMODE IS DYNAMIC
!###################################################################
!
! master pattern input file; path relative to EMdatapathname
 masterfile = 'undefined',
!

In this block you define the master pattern file from which all the dictionary patterns are computed.

!
!###################################################################
! IF REFINEMENT IS NEEDED ...
!###################################################################
!
! enter the name of the nml file for the EMFitOrientation program
 refinementNMLfile = 'undefined',

For convenience, you can start the EMFitOrientation program immediately following the DI run by setting this parameter equal to the file name of the name list file for EMFitOrientation.

!###################################################################
! SYSTEM PARAMETERS: COMMON TO 'STATIC' AND 'DYNAMIC'
!###################################################################
!
! number of dictionary files arranged in column for dot product on GPU (multiples of 16 perform better)
 numdictsingle = 1024,
! number of experimental files arranged in column for dot product on GPU (multiples of 16 perform better)
 numexptsingle = 1024,
! number of threads for parallel execution
 nthreads = 1,
! platform ID for OpenCL portion of program
 platid = 1,
! if you are running EMEBSDDI, EMECPDI, EMTKDDI, then define the device you wish to use 
 devid = 1,
! if you are running EMEBSDDImem on multiple GPUs, enter their device ids (up to eight) here; leave others at zero
 multidevid = 0 0 0 0 0 0 0 0,
! how many GPU devices do you want to use?
 usenumd = 0,
 /

This final block controls the computational resources. Dictionary indexing requires a GPU (graphics processing unit); use the EMOpenCLinfo program to figure out the platform and device IDs for the GPU that you intend to use for indexing. In the present version of the code, only a single GPU can be used, but the namelist file already allows for multiple devices. If the GPU you want to use is part of platform 2, and is device number 4 (you should be so lucky...) then set platid to 2 and devid to 4; also, set usenumd to 1 and the first entry of multidevid to the same number as devid. The nthreads parameter defines how many CPU cores (threads) you wish to use for the pattern computations; the GPU takes care of computing the pattern dot products while the threads do their thing. Finally, the numdictsingle and numexptsingle parameters define how big the memory chunks are that the program will send to the GPU; for optimal performance, these numbers should be multiples of 16. If you set these parameters too large, then the GPU will not have sufficient global memory to perform the computations, and the program will likely abort with an early error message. So it can take a bit of experimenting to figure out what the best values are; it is suggested that you keep both numbers set to the same value. If your pattern size is 640 by 480, then the patterns will be organized as 1D vectors of length 640x480=307,200 and the GPU will receive two arrays of single precision floating point numbers, of dimensions 307,200 by numdictsingle and 307,200 by numexptsingle.
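
A quick estimate of the GPU memory taken up by the two pattern arrays can help in choosing numdictsingle and numexptsingle. The sketch below only counts the two single precision input arrays described above; it does not include the buffer for the dot product results or any other allocations the program makes, so treat the number as a lower bound. Both function names are hypothetical.

```python
def gpu_input_bytes(numsx, numsy, numdictsingle, numexptsingle, bytes_per_value=4):
    """Bytes needed for the dictionary and experimental pattern arrays (float32)."""
    pattern_length = numsx * numsy  # each pattern is stored as a 1D vector
    return bytes_per_value * pattern_length * (numdictsingle + numexptsingle)


def round_down_to_multiple(value, base=16):
    """Largest multiple of base that does not exceed value (at least base)."""
    return max(base, (value // base) * base)


nbytes = gpu_input_bytes(640, 480, 1024, 1024)
print(f"{nbytes} bytes = {nbytes / 2**30:.2f} GiB")
print(round_down_to_multiple(1000))
```

If the estimate is close to the global memory reported by EMOpenCLinfo for your device, reduce both chunk sizes (keeping them multiples of 16) or increase the binning factor.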

Note regarding the temporary files

It should be noted that the temporary pattern files that are generated by the indexing program can become very large, in some cases tens to hundreds of Gb, depending on your pattern size and how many patterns you have (obviously). If for some reason the indexing program aborts, or you decide to cancel the run, then this file will likely not be deleted. So, sporadically, you may want to make sure that the $HOME/.config/EMsoft/tmp folder on your drive is emptied, just so you won't run out of disk space. It is very easy to fill entire hard drives with these indexing runs...
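
One way to keep an eye on the temporary folder is to list its files by size from time to time. The following sketch assumes the default location $HOME/.config/EMsoft/tmp mentioned above; the function name is hypothetical, and nothing is deleted automatically, so you decide which leftover files to remove.

```python
from pathlib import Path


def list_tmp_files(folder):
    """Return (name, size in bytes) for each file in folder, largest first."""
    folder = Path(folder)
    if not folder.is_dir():
        return []
    files = [(p.name, p.stat().st_size) for p in folder.iterdir() if p.is_file()]
    return sorted(files, key=lambda item: item[1], reverse=True)


# Default temporary folder as described in the text above.
tmp_folder = Path.home() / ".config" / "EMsoft" / "tmp"
for name, size in list_tmp_files(tmp_folder):
    print(f"{size / 1e9:8.2f} GB  {name}")
```

Before removing a file, make sure that no indexing run is currently using it; this is one more reason to give each run a unique tmpfile name.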

Format for optional mask file

You can define your own pattern mask by generating a text file with the mask defined by strings of 1s and 0s. Let's assume that your patterns are 16 by 16 pixels (unlikely to happen, but this is just an example); then you generate a text file that has 16 strings of 16 characters each, with character 0 meaning no intensity will be allowed in that pixel, and 1 the opposite. So, your text file will look something like this:

0000001111000000
0000001111000000
0000001111000000
0000001111000000
0000001111000000
0000001111000000
1111111111111111
1111111111111111
1111111111111111
1111111111111111
0000001111000000
0000001111000000
0000001111000000
0000001111000000
0000001111000000
0000001111000000

This mask consists of a horizontal and a vertical band, each 4 pixels wide. You can play around with these masks to see how much pattern information you can remove and still be able to index the patterns. Obviously, the mask must have the correct size for your patterns (after binning).
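
Writing such a file by hand is tedious for realistic pattern sizes, so a short script is useful. The sketch below reproduces the 16 by 16 cross-shaped example above; the function names and the output file name are hypothetical, and the band width and pattern size are parameters that you would adapt to your own (binned) pattern dimensions.

```python
def cross_mask(size=16, band=4):
    """Rows of a square mask with a centered horizontal and vertical band of 1s."""
    start = (size - band) // 2
    stop = start + band
    rows = []
    for row in range(size):
        if start <= row < stop:
            rows.append("1" * size)  # horizontal band: whole row is kept
        else:
            rows.append("0" * start + "1" * band + "0" * (size - stop))
    return rows


def write_mask(filename, rows):
    """Write one mask row per line, as strings of 0s and 1s."""
    with open(filename, "w") as f:
        f.write("\n".join(rows) + "\n")


rows = cross_mask()
print("\n".join(rows))
# write_mask("mymask.txt", rows)  # hypothetical file name for the maskfile parameter
```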

Executing the indexing program

Once you have set up the name list file, you can run the program in the usual way:

EMEBSDDI inputfile.nml

Indexing runs can take a long time, and produce only a little bit of output; the dictionary is divided into chunks of numdictsingle patterns, and the GPU then computes the dot products between all experimental patterns and this chunk of the dictionary. When that is completed, a single line of output is produced that shows the largest dot product found for this chunk. Every ten chunks, an update of the 'time remaining' is shown as well. The program also generates an IPF-Z map using the current best fit orientations for all the experimental patterns; you can monitor that file to make sure that things are evolving properly. At the end of the run, all the requested output files are generated.
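
The chunked dot product loop described above can be illustrated with a toy example in pure Python. This is only a sketch of the idea, under the assumption that patterns are normalized vectors compared by their dot product; the actual program preprocesses the patterns, generates the dictionary from the master pattern, and performs the dot products on the GPU. All names below are hypothetical.

```python
import heapq
import math


def normalize(vector):
    """Scale a pattern vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def index_patterns(expt, dictionary, chunk_size, nnk):
    """For each experimental pattern, keep the nnk best (dot product, index) pairs."""
    expt = [normalize(p) for p in expt]
    best = [[] for _ in expt]
    # Process the dictionary one chunk at a time, as the indexing program does.
    for start in range(0, len(dictionary), chunk_size):
        chunk = [normalize(d) for d in dictionary[start:start + chunk_size]]
        for i, e in enumerate(expt):
            scores = [(sum(a * b for a, b in zip(e, d)), start + j)
                      for j, d in enumerate(chunk)]
            # Merge the new scores with the running top matches.
            best[i] = heapq.nlargest(nnk, best[i] + scores)
        print(f"chunk at {start}: max dot product = "
              f"{max(matches[0][0] for matches in best):.4f}")
    return best


dictionary = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0]]
expt = [[0.9, 0.1, 0.0], [0.0, 0.2, 1.0]]
result = index_patterns(expt, dictionary, chunk_size=2, nnk=2)
print([matches[0][1] for matches in result])  # best dictionary index per pattern
```

In this toy setting the dictionary index stands in for an orientation; in a real run each dictionary entry corresponds to one sampled orientation, and the top nnk matches per pattern are what ends up in the output file.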

At this point in time, it is not possible to interrupt the program and have it restart from where it was interrupted; this would obviously be a useful feature to have, but it is actually rather difficult to implement, given the complexity of the GPU + multi-threads coding.

Apple Mac OS X and the new M1 processors

We have not yet tried to build EMsoftOO on this platform, but if you would like to try it, please do so and let us know of any problems you run into. In the long run, Apple will no longer support OpenCL, which is the language that drives the GPU card for the indexing dot products. It is not clear at this point in time (January 2022) how we will deal with the disappearance of OpenCL.

Wiki pages are maintained by M. De Graef; they are part of the EMsoftOO package and fall under the same copyright (BSD2).
