PLTTools for Spark

A spark package for processing RMS ELT (Event Loss Table) and PLT (Period Loss Table) data.

This is largely based on the work found in the plttools repo (MIT License).
Since we use spark for data processing, the code structure is somewhat different than what is found in the plttools repo.

Building and Running

To build and run, you will need to install

Java 8,
Scala 2.11.12,
Spark 2.4.5, and
sbt 1.3.13,

or alternatively a cloud service (e.g., EMR) with this software.

A Dockerfile has been provided for demonstration purposes.
To build the docker image, run

make build-docker
# this builds a docker image with name and tag plttools-spark:latest

To run the simple example that's been set up in the main class, use

make run-docker

Test cases can be run with

make test-docker

Using as a Spark Package for PLT Statistics

We describe how to use this package to calculate statistics for PLT records.

We assume that a Spark DataFrame df has been defined with the following fields:

Field Name	Type	Description
SubportfolioId	String	A subportfolio identifier used for grouping in the calculations of the statistics
PeriodId	Integer	A period identifier
EventId	Integer	The event identifier
EventDate	Timestamp	The date of the simulated event
Weight	Double	The weight associated with the period
Loss	Double	The loss associated with the period and event

To calculate Average Annual Loss (AAL) and the Standard Deviation, use the following

spark> val stats = PLT.calculateStatistics(df)
spark> stats.show(false)

stats is a DataFrame containing the PLT statistics AAL, Standard Deviation, and Coefficient of Variation by subportfolio ID.

To calculate the tail statistics, one can use something like

spark> val rps = EPCurve.RETURN_PERIODS.toSeq.map(_.toDouble) // Or: Array[Double](...) for custom values
spark> val numberOfPeriods = 50000.0 // Specify the total number of periods considered
spark> val zeroLossRecordWeight =  1.0 / numberOfPeriods
spark> val pmlsdf = PLT.calculateCombinedPMLForReturnPeriodsBySubportfolio(df, rps, Some(zeroLossRecordWeight))
spark> // print
spark> pmlsdf.orderBy(col("SubportfolioId"), col("ReturnPeriod").desc).show(false)

The resulting DataFrame pmlsdf contains the Occurrence Probable Maximum Loss (OPML) and Aggregate Probable Maximum Loss (APML) by subportfolio ID and return period.
We use the terminology Probable Maximum Loss as described in [1] here as we are looking at loss levels associated with return periods.
Note that in the documentation from RMS and in the code repository referenced above, RMS often extends the use of the term Exceedance Probability to describe these analytics as well (i.e., you might see these analytics referred to as OEP and AEP in RMS documents).

For anyone interested in using other methods found in this package for calculating statistics for ELTs or PLTs, or for calculating EP curves for PLTs, the test cases might prove instructive.

References

David Homer and Ming Li, Notes on Using Property Catastrophe Model Results, Casualty Actuarial Society E-Forum, Spring 2017-Volume 2

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
project		project
src		src
.gitignore		.gitignore
.scalafix.conf		.scalafix.conf
.scalafmt.conf		.scalafmt.conf
Dockerfile		Dockerfile
LICENSE.txt		LICENSE.txt
Makefile		Makefile
NOTICE.txt		NOTICE.txt
README.md		README.md
build.sbt		build.sbt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

PLTTools for Spark

Building and Running

Using as a Spark Package for PLT Statistics

References

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

PLTTools for Spark

Building and Running

Using as a Spark Package for PLT Statistics

References

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages