An unofficial replacement for the (now defunct) official FPDS XML conversion utility. Converts one or more FPDS data archives to a SQLite3 database.
This project intends to provide an unofficial replacement to the FPDS XML archive conversion utility. It addresses several of the issues/limitations (identified below) with the existing, but now discontinued, utility. This conversion utility will convert one or more FPDS XML archives to a SQLite3 database and provides support for FPDS Specification Versions 1.4 and 1.5.
The Federal Procurement Data System (FPDS) houses procurement/spend data for the U.S. Government. Archives of annual spend data are available in XML format for each Federal agency. Previously, the General Services Administration (GSA; the Federal agency who manages FPDS) published an XML conversion utility for converting an FPDS data archive into a pipe-separated [flat] file. The conversion utility was useful since the converted data could easily be imported into spreadsheet software and statistical packages, which increases the accessibility of Federal procurement data to taxpayers, agencies, suppliers and other parties.1
Support for the GSA's XML conversion utility was discontinued in 2009/2010 and no official replacement has been published. While the utility is still available for download, several issues preclude its use:
-
Data complexity has increased such that a single flat file may no longer properly and efficiently represent relationships between data elements (e.g., in cardinality)
-
For several agencies, the quantity of data may exceed the limits of commonly-used spreadsheet software (e.g., Microsoft Excel, LibreOffice Calc) and/or preclude usability of pivot tables and other tools for summarizing/aggregating data.
-
Support for conversion of XML archives ended with FPDS Specification Version 1.3. Archives are no longer posted in this version. Version 1.3 was deprecated on December 31, 2010, and replaced with Version 1.4. Version 1.4 was deprecated on September 30, 2017, and replaced with Version 1.5
Agency archives can be obtained from https://www.fpds.gov.
This utility can be built using CMake >= 3.14. Obtain the necessary depedencies, for example on Debian/Ubuntu:
$ sudo apt-get -y install \
build-essential \
cmake \
liblzma-dev \
libncurses-dev \
libsqlite3-dev \
libxml2 \
libxslt1-dev \
uuid-dev \
xxd \
zlib1g-dev
Verify that you have CMake >= 3.14 using cmake --version
and then build:
$ git clone --recurse-submodules https://github.com/wamuir/fpds-conversion-utility
$ mkdir build && cmake -S fpds-conversion-utility -B build && cmake --build build
The compiled executable will be at build/app/conversion-utility
.
Given an FPDS XML archive archive.xml
the utility can be run as:
conversion-utility xml_archive sqlite3_target
Multiple XML archives can be combined into a single SQLite database by invoking
the append (-a
) flag:
./conversion-utility archive1.xml bundle.sqlite3
./conversion-utility -a archive2.xml bundle.sqlite3
./conversion-utility -a archive3.xml bundle.sqlite3
And an existing database can be overwritten by invoking the overwrite (-o
)
flag:
./conversion-utility -o archive.xml db.sqlite3
This utility implements a streaming XML parser to limit memory usage, which is especially useful for converting large archives. The conversion rate is generally greater than 100 records per second (machine dependent).
To make use of multiple threads, pass the (-t
) option with the desired number of
threads, such as:
./conversion-utility -t 4 archive.xml target.sqlite3
By default only a single thread is used (equivalent to -t 1
). The optimal value
for t
depends on machine characteristics such as the processor and io.
SQLite3 Table | Cardinality | FPDS Element Group | FPDS Elements |
---|---|---|---|
additionalReporting | one-to-many | Legislative Mandates | 7G |
documentID | one-to-one | Contract Identification Information | 1A-1D 1F-1H |
competitionInformation | one-to-one | Competition Information | 10* |
contractInformation | one-to-one | Contract Information | 6A-6H 6J-6N 6P-6R 6T |
contractMarketingData | one-to-one | Contract Marketing Data | 5* |
contractorDataA | one-to-one | Contractor Data | 9* |
contractorDataB | one-to-one | Contractor Data | 13* |
dates | one-to-one | Dates | 2* |
dollarValues | one-to-one | Dollar Values, Total Dollar Values | 3* 3T* |
legislativeMandates | one-to-one | Legislative Mandates | 7A-7F |
preferencePrograms | one-to-one | Preference Programs | 11* |
productOrServiceInformation | one-to-one | Product or Service Information | 8* |
purchaserInformation | one-to-one | Purchaser Information | 4* |
soliciationID | one-to-one | Contract Identification Information | 1E |
transactionInformation | one-to-one | Transaction Information | 12* |
treasuryAccount | one-to-many | Contract Information | 6SC, 6SG, 6SH, 6SI |
- Information on data elements can be found within the FPDS data dictionary,
available at fpds.gov. For each element, the
corresponding column name in the Sqlite3 database is identical to the
XML Tag Name
within the data dictionary.
- Two views are provided for ease of working with the data
-
Identifies document id (integer and primary key), document type (award, IDV) and contract identifiers (Agency ID, PIID, Modification Number, Transaction Number, etc.)
-
This view is created by the conversion utility, as:
CREATE VIEW IF NOT EXISTS documentID AS
SELECT record.id AS id, record.docType as docType,
awardContractID.agencyID AS awardContractAgencyID,
awardContractID.PIID AS awardContractPIID,
awardContractID.modNumber AS awardContractModNumber,
awardContractID.transactionNumber AS awardContractTransactionNumber,
IDVID.agencyID AS IDVAgencyID,
IDVID.PIID AS IDVPIID,
IDVID.modNumber AS IDVModNumber,
referencedIDVID.agencyID AS referencedIDVagencyID,
referencedIDVID.PIID AS referencedIDVPIID,
referencedIDVID.modNumber AS referencedIDVmodNumber
FROM record
LEFT JOIN awardContractID ON record.id = awardContractID.id
LEFT JOIN IDVID ON record.id = IDVID.id
LEFT JOIN referencedIDVID ON record.id = referencedIDVID.id;
-
For potential use when importing, exporting or other instances where a fact table might come in handy
-
This view is created by the conversion utility, as:
CREATE VIEW fact AS
SELECT *
FROM documentID
LEFT JOIN competitionInformation on documentID.id = competitionInformation.id
LEFT JOIN contractInformation on documentID.id = contractInformation.id
LEFT JOIN contractMarketingData on documentID.id = contractMarketingdata.id
LEFT JOIN contractorDataA on documentID.id = contractorDataA.id
LEFT JOIN contractorDataB on documentID.id = contractorDataB.id
LEFT JOIN dates on documentID.id = dates.id
LEFT JOIN dollarValues on documentID.id = dollarValues.id
LEFT JOIN legislativeMandates on documentID.id = legislativeMandates.id
LEFT JOIN preferencePrograms on documentID.id = preferencePrograms.id
LEFT JOIN productOrServiceInformation on documentID.id = productOrServiceInformation.id
LEFT JOIN purchaserInformation on documentID.id = purchaserInformation.id
LEFT JOIN solicitationID on documentID.id = solicitationID.id
LEFT JOIN transactionInformation on documentID.id = transactionInformation.id;
A simple example is given below for importing data into R:
#!/usr/bin/env Rscript
conn <- DBI::dbConnect(RSQLite::SQLite(), dbname="archive.sqlite3")
query <- DBI::dbSendQuery(conn, "SELECT documentID.*, dollarValues.obligatedAmount
FROM documentID
LEFT JOIN dollarValues ON documentID.id = dollarValues.id
WHERE documentID.docType = 'award';")
df <- DBI::dbFetch(query, n = -1)
DBI::dbClearResult(query)
DBI::dbDisconnect(conn)
And, for importing the same data into Python:
#!/usr/bin/env python3
import sqlite3
conn = sqlite3.connect("archive.sqlite3")
c = conn.cursor()
c.execute('''SELECT documentID.*, dollarValues.obligatedAmount
FROM documentID
LEFT JOIN dollarValues ON documentID.id = dollarValues.id
WHERE documentID.docType = 'award';''')
df = c.fetchall()
conn.close()
Ideally, don't do this. If you wish to flatten and export data, a flat table view is provided of one-to-one relationships and can be exported as follows:
.open archive.sqlite3
.headers on
.mode csv
.output exported.csv
SELECT * FROM fact;
-
Currently, no support for(Other Transaction supported added 2019-05-19)other transactions
-
Currently, no support for agency-specific (e.g., NASA) data elements
1 Specifically, this refers to the accessibility of
sets of data for analyses. Individual transactions can be searched/queried at
fpds.gov. Data is also available via ATOM Feed as well
as aggregator sites (e.g., usaspending.gov)
but do not resolve one or more of the issues identified or present additional
issues.↩