This repository contains some sample data that we'd like you to take a look at and analyze.
`test_data.csv` contains a sample of (made-up) data structured as follows:
Variable | Description |
---|---|
date | Date of the data breach |
org_id | A numerical identifier for the organization associated with the breach |
sector | A string indicating what industry sector the organization operates in |
cause | The category of the cause of the breach |
cost | The cost of the breach, in $ |
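
A minimal sketch of reading the file into typed records using only the Python standard library. It assumes `date` is stored as ISO-8601 (`YYYY-MM-DD`) and `cost` as a plain number with no currency symbol; the `Breach` class and `load_breaches` function are hypothetical names, not part of this repository.

```python
import csv
import datetime as dt
import io
from dataclasses import dataclass


@dataclass
class Breach:
    # Hypothetical record type mirroring the columns in the table above.
    date: dt.date
    org_id: int
    sector: str
    cause: str
    cost: float


def load_breaches(fileobj):
    """Parse CSV rows into Breach records.

    Assumes ISO-8601 dates (YYYY-MM-DD) and a purely numeric cost column.
    """
    reader = csv.DictReader(fileobj)
    return [
        Breach(
            date=dt.date.fromisoformat(row["date"]),
            org_id=int(row["org_id"]),
            sector=row["sector"],
            cause=row["cause"],
            cost=float(row["cost"]),
        )
        for row in reader
    ]


# Made-up sample rows; for the real file, pass open("test_data.csv", newline="").
sample = io.StringIO(
    "date,org_id,sector,cause,cost\n"
    "2019-03-04,1,Education,Phishing,12000\n"
    "2019-07-19,1,Education,Misconfiguration,4500.5\n"
)
records = load_breaches(sample)
print(records[0].sector, records[1].cost)
```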
- What is a 'typical' number of breaches an organization will experience in a year?
- How many breaches would an organization in the Education sector expect to experience in a year?
- What is a reasonable range for the number of breaches an Education organization would experience in a year?
- Create a breakdown of the frequency of breach causes by sector, similar to Figure 51 in the 2020 Verizon Data Breach Investigations Report (DBIR).
- What is a 'typical' breach cost?
- What is a typical cost for each of the different cause types?
- What is a reasonable range of costs for each cause type?
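
One possible starting point for the questions above, sketched in Python with a handful of made-up rows standing in for `test_data.csv`. It treats the median as the 'typical' value (loss data is often skewed, so a few large breaches can dominate the mean) and uses empirical percentiles for the ranges. All function names are hypothetical, and the year is taken from the first four characters of an assumed ISO-8601 date.

```python
import statistics
from collections import Counter, defaultdict

# Hypothetical, made-up rows shaped like test_data.csv:
# (date, org_id, sector, cause, cost). Replace with the real file's contents.
ROWS = [
    ("2019-01-10", 1, "Education", "Phishing", 12000.0),
    ("2019-05-02", 1, "Education", "Error", 3000.0),
    ("2019-08-21", 2, "Education", "Phishing", 8000.0),
    ("2020-02-14", 2, "Education", "Malware", 50000.0),
    ("2019-03-30", 3, "Finance", "Error", 1500.0),
    ("2019-11-11", 3, "Finance", "Phishing", 9000.0),
    ("2019-12-01", 3, "Finance", "Phishing", 7000.0),
]


def percentile(values, q):
    """Linear-interpolation percentile, q in [0, 1]."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    low = int(pos)
    high = min(low + 1, len(ordered) - 1)
    return ordered[low] + (ordered[high] - ordered[low]) * (pos - low)


def breaches_per_org_year(rows, sector=None):
    """Breach counts per (org_id, year), optionally restricted to one sector."""
    counts = Counter()
    for day, org, sec, _cause, _cost in rows:
        if sector is None or sec == sector:
            counts[(org, day[:4])] += 1  # assumes ISO dates: first 4 chars = year
    return list(counts.values())


def cause_share_by_sector(rows):
    """Fraction of each sector's breaches attributed to each cause."""
    tallies = defaultdict(Counter)
    for _day, _org, sec, cause, _cost in rows:
        tallies[sec][cause] += 1
    return {
        sec: {cause: n / sum(c.values()) for cause, n in c.items()}
        for sec, c in tallies.items()
    }


def cost_summary_by_cause(rows, lo=0.25, hi=0.75):
    """(median, lo-percentile, hi-percentile) of cost for each cause."""
    by_cause = defaultdict(list)
    for _day, _org, _sec, cause, cost in rows:
        by_cause[cause].append(cost)
    return {
        cause: (statistics.median(v), percentile(v, lo), percentile(v, hi))
        for cause, v in by_cause.items()
    }


all_counts = breaches_per_org_year(ROWS)
edu_counts = breaches_per_org_year(ROWS, sector="Education")
print("typical breaches per org-year:", statistics.median(all_counts))
print("Education median:", statistics.median(edu_counts))
print("Education 5th-95th pct:", percentile(edu_counts, 0.05), percentile(edu_counts, 0.95))
print("cause shares by sector:", cause_share_by_sector(ROWS))
print("typical cost (median):", statistics.median([row[4] for row in ROWS]))
print("cost by cause:", cost_summary_by_cause(ROWS))
```

One caveat worth checking against the real data: only org-years with at least one breach appear in breach-level records, so these counts are conditional on a breach occurring. With few org-years per sector, a fitted count distribution (for example, Poisson or negative binomial) or a bootstrap may give a steadier range than raw percentiles.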
How would you estimate the total losses an organization might accrue in a single year from multiple breaches? You don't need to code this one up if you'd rather not; just describe how you'd go about it.
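
One common way to approach this is a compound frequency/severity Monte Carlo simulation: draw a number of breaches for a simulated year, draw a cost for each breach, sum them, and repeat many times to get a distribution of annual totals. The sketch below assumes breach counts follow a Poisson distribution whose rate is estimated from per-org-year counts, and that each breach's cost is drawn independently (with replacement) from the observed costs. Both are simplifying assumptions, the `poisson` and `simulate_annual_losses` helpers are hypothetical, and the rate and cost values are made up.

```python
import math
import random


def poisson(rng, lam):
    """Knuth's method for one Poisson(lam) draw; adequate for small rates."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_annual_losses(rate, observed_costs, n_years=10_000, seed=0):
    """Simulated yearly total loss for one organization.

    Assumes Poisson(rate) breaches per year and i.i.d. costs resampled
    from observed_costs (an empirical bootstrap of severity).
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(n_years):
        n_breaches = poisson(rng, rate)
        totals.append(sum(rng.choice(observed_costs) for _ in range(n_breaches)))
    return totals


# Made-up inputs: rate = mean breaches per org-year, costs = observed breach costs.
costs = [1500.0, 3000.0, 7000.0, 8000.0, 9000.0, 12000.0, 50000.0]
totals = sorted(simulate_annual_losses(rate=1.75, observed_costs=costs))
print("median annual loss:", totals[len(totals) // 2])
print("95th percentile annual loss:", totals[int(0.95 * len(totals))])
```

The median of the simulated totals gives a typical annual loss and the upper percentiles a plausible bad year. Restricting the rate and costs to one sector (or sampling a cause first, then a cost for that cause) would make the estimate specific to, say, an Education organization.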