Commit

More examples
systay committed Apr 12, 2018
1 parent cbb52a8 commit 7f258be
Showing 1 changed file with 70 additions and 186 deletions.
256 changes: 70 additions & 186 deletions cip/1.accepted/CIP2017-06-18-multiple-graphs.adoc
@@ -720,7 +720,76 @@ This is the final step of the entire data integration pipeline, we return this g

image::opencypher-PersonCityCriminalEvents-graph.jpg[Graph,700,550]

// ._The full data integration query pipeline is given by_:

[[data-aggregation-example]]
=== Using a pipeline to perform aggregations and return tabular data and graphs

This example shows how to aggregate detailed sales data within a graph -- in effect, performing a 'roll-up' -- in order to obtain a high-level summarized view of the data, as a graph.
The summarized graph may be used to draw further high-level reports, but may also be used to undertake 'drill-down' actions by probing into the graph to extract more detailed information.

Assume we have the graph *SalesDetail*, representing the sale of products in stores across various regions:

image::opencypher-SalesDetail-graph.jpg[Graph,800,700]

This models the following entities:

* Regions may have many stores.
* Stores:
** A store is identified by a unique `code`.
** A store is contained in exactly one region.
** A store may have multiple orders.
* Products:
** A product is identified by a unique `code`.
** A product has a `RRP` property (Recommended Retail Price).
** A product may appear in one or more orders as a product _item_.
* Sales orders:
** An order is identified by a unique order number, given by `num`.
** The `YYYYMM` property represents the year and month portion of the date of the order.
** An order is associated with exactly one store and contains one or more product items, representing the fact that the product item was sold in the store and is a part of the order.
** The relationship between an order and a product contains the following properties:
*** `soldPrice`: the price at which the product item was actually sold (usually lower than the product's RRP).
*** `numItemsSold`: the number of the actual product items sold in the order.

The following pipeline will create a summarized graph view of this data and return it.

[source, cypher]
----
[ 0] FROM SalesDetail
[ 1] MATCH (p:Product)-[r:IN]->(:Order)<-[:HAS]-(s:Store)
[ 2] WITH p, s, sum(r.soldPrice * r.numItemsSold) AS storeProductTotal
[ 3] CONSTRUCT ON GRAPH CLONE p, s
[ 4] CREATE (p)-[:SUMMARY {totalSales: storeProductTotal}]->(s)
[ 5] WITH p, sum(storeProductTotal) AS productTotal
[ 6] CONSTRUCT ON GRAPH CLONE p
[ 7] CREATE (p)-[:SUMMARY]->(:SUMMARY {totalSales: productTotal})
[ 8] WITH p
[ 9] MATCH (p)-[r:SUMMARY]-(s:Store)-[:IN]-(reg:Region)
[10] WITH s, reg, sum(r.totalSales) AS storeTotal
[11] CONSTRUCT ON GRAPH CLONE s, reg
[12] CREATE (s)-[:SUMMARY]->({totalSales: storeTotal})
[13] WITH reg, sum(storeTotal) AS regionTotal
[14] CREATE (reg)-[:SUMMARY]->({totalSales: regionTotal})
[15] WITH reg
[16] MATCH (reg)<-[:IN]-(:Store)<-[summary:SUMMARY]-(p:Product)
[17] WITH reg, p, sum(summary.totalSales) AS regionProductTotal
[18] CONSTRUCT ON GRAPH CLONE reg, p
[19] CREATE (reg)-[:SUMMARY {totalSales: regionProductTotal}]->(p)
[20] RETURN GRAPH
----


We start by specifying that we are working on the *SalesDetail* graph [0], and then match every product item in an order together with the store in which the order was placed [1].
The next step is to sum all sales, grouped by product and store [2]. We then start building the summary graph by cloning the matched product and store nodes [3] and adding a `SUMMARY` relationship directly between each product and store, bypassing the order node [4].
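
The grouping performed in [2] can be sketched outside Cypher. The following Python fragment is only an illustration, not part of the proposal: the order lines are hypothetical values, and the function name `store_product_totals` is an assumption introduced here. It multiplies `soldPrice` by `numItemsSold` for each order line and sums the result per (product, store) pair.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical order lines: (product code, store code, soldPrice, numItemsSold).
# Each tuple stands for one (:Product)-[r:IN]->(:Order)<-[:HAS]-(:Store) match.
order_lines = [
    ("PEN-1", "AC-888", Decimal("2.50"), 4),
    ("PEN-1", "AC-888", Decimal("2.00"), 5),
    ("TOY-1", "AC-888", Decimal("15.00"), 3),
    ("TOY-1", "LK-709", Decimal("20.00"), 2),
]


def store_product_totals(lines):
    """Sum soldPrice * numItemsSold, grouped by (product, store)."""
    totals = defaultdict(Decimal)  # Decimal() is zero, so missing keys start at 0
    for product, store, sold_price, num_items_sold in lines:
        totals[(product, store)] += sold_price * num_items_sold
    return dict(totals)


totals = store_product_totals(order_lines)
for (product, store), amount in sorted(totals.items()):
    print(product, store, amount)
```

Each key of `totals` corresponds to one `SUMMARY` relationship created in [4], and the value to its `totalSales` property.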

Next, we aggregate all sales by product [5], use this to construct a graph [6], and attach a summary node to each product through a `SUMMARY` relationship [7].
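
As a rough check of the second aggregation [5], the sketch below rolls per-store product totals up to one total per product. The rows are the sample `storeProductTotal` figures from the tabular output in the previous version of this example; the function name `product_totals` is an assumption made for illustration.

```python
from collections import defaultdict
from decimal import Decimal

# (productCode, storeCode, storeProductTotal) rows, as produced by step [2].
store_product_rows = [
    ("PEN-1", "AC-888", Decimal("20.00")),
    ("TOY-1", "AC-888", Decimal("45.00")),
    ("BOOK-2", "LK-709", Decimal("10.00")),
    ("TOY-1", "LK-709", Decimal("40.00")),
    ("BOOK-5", "LK-709", Decimal("15.00")),
    ("BOOK-5", "WW-531", Decimal("18.00")),
    ("BULB-2", "WW-531", Decimal("190.00")),
    ("PC-1", "WW-531", Decimal("440.00")),
]


def product_totals(rows):
    """Sum storeProductTotal per product, ignoring the store."""
    totals = defaultdict(Decimal)
    for product, _store, amount in rows:
        totals[product] += amount
    return dict(totals)


product_total = product_totals(store_product_rows)
for product, amount in sorted(product_total.items()):
    print(product, amount)
```

Products sold in more than one store, such as `TOY-1` and `BOOK-5`, collapse into a single total, which is the value stored as `totalSales` on the summary node in [7].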

So far, we have been using the matches from the first `MATCH` [1], but now it is time to drop the incoming driving table [8] and start matching from scratch again [9]. We match the summary relationships added in [4] between stores and products, together with the region of each store, and use them to total the sales for each store [10].
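
The store-level and region-level roll-ups in [10] and [13] follow the same pattern. The sketch below is an illustration under stated assumptions: each row stands for one product-to-store `SUMMARY` relationship paired with the store's region, the figures come from the sample tabular output in the previous version of this example, and the helper `sum_by` is a name introduced here.

```python
from collections import defaultdict
from decimal import Decimal

# (storeCode, regionName, totalSales) for each product-to-store SUMMARY relationship.
summary_rows = [
    ("AC-888", "APAC", Decimal("20.00")),
    ("AC-888", "APAC", Decimal("45.00")),
    ("LK-709", "EMEA", Decimal("10.00")),
    ("LK-709", "EMEA", Decimal("40.00")),
    ("LK-709", "EMEA", Decimal("15.00")),
    ("WW-531", "EMEA", Decimal("18.00")),
    ("WW-531", "EMEA", Decimal("190.00")),
    ("WW-531", "EMEA", Decimal("440.00")),
]


def sum_by(pairs):
    """Sum amounts per key for an iterable of (key, amount) pairs."""
    totals = defaultdict(Decimal)
    for key, amount in pairs:
        totals[key] += amount
    return dict(totals)


# Step [10]: group by store and region.
store_total = sum_by(
    ((store, region), amount) for store, region, amount in summary_rows
)
# Step [13]: group the store totals by region.
region_total = sum_by(
    (region, amount) for (_store, region), amount in store_total.items()
)

print(store_total)
print(region_total)
```

The store totals agree with the `totalStoreSales` figures listed in the earlier tabular report (65.00, 65.00 and 648.00), and the region totals are simply the sums of their stores.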

// TODO: Finish explaining this example



//
@@ -796,191 +865,6 @@ image::opencypher-PersonCityCriminalEvents-graph.jpg[Graph,700,550]
//

//
// [[data-aggregation-example]]
// === Using a pipeline to perform aggregations and return tabular data and graphs
//
// This example shows how to aggregate detailed sales data within a graph -- in effect, performing a 'roll-up' -- in order to obtain a high-level summarized view of the data, stored and returned in another graph, as well as returning an even higher-level view as an executive report.
// The summarized graph may be used to draw further high-level reports, but may also be used to undertake 'drill-down' actions by probing into the graph to extract more detailed information.
//
// Assume we have the graph *SalesDetail*, representing the sale of products in stores across various regions:
//
// image::opencypher-SalesDetail-graph.jpg[Graph,800,700]
//
// This models the following entities:
//
// * Regions may have many stores.
// * Stores:
// ** A store is identified by a unique `code`.
// ** A store is contained in exactly one region.
// ** A store may have multiple orders.
// * Products:
// ** A product is identified by a unique `code`.
// ** A product has a `RRP` property (Recommended Retail Price).
// ** A product may appear in one or more orders as a product _item_.
// * Sales orders:
// ** An order is identified by a unique order number, given by `num`.
// ** The `YYYYMM` property represents the year and month portion of the date of the order.
// ** An order is associated with exactly one store and contains one or more product items, representing the fact that the product item was sold in the store and is a part of the order.
// ** The relationship between an order and a product contains the following properties:
// *** `soldPrice`: the price at which the product item was actually sold (usually lower than the product's RRP).
// *** `numItemsSold`: the number of the actual product items sold in the order.
//
// The following pipeline will create a summarized view of this data, and store it in a new summary graph called *SalesSummary*.
//
// We begin by referencing the *SalesDetail* graph, and matching on all products in all orders for all stores in all regions.
//
// [source, cypher]
// ----
// FROM GRAPH SalesDetail AT ‘graph://...’
// MATCH (p:Product)-[r:IN]->(o:Order)<-[HAS]-(s:Store)-[:IN]->(reg:Region)
// ----
//
// We aggregate the (tabular) data across all orders in order to obtain the total sales amount grouped by the product, store and region, and alias this value as `storeProductTotal`.
// As this tabular data is required to populate the summary graph later on, we pass it further down the pipeline:
//
// [source, cypher]
// ----
// WITH reg.name AS regionName,
// s.code AS storeCode,
// p.code AS productCode,
// sum(r.soldPrice * r.numItemsSold) AS storeProductTotal
// ----
//
// The tabular data consists of the following:
//
// [source, cypher]
// ----
// +------------+-----------+-------------+-------------------+
// | regionName | storeCode | productCode | storeProductTotal |
// +------------+-----------+-------------+-------------------+
// | APAC | AC-888 | PEN-1 | 20.00 |
// | APAC | AC-888 | TOY-1 | 45.00 |
// | EMEA | LK-709 | BOOK-2 | 10.00 |
// | EMEA | LK-709 | TOY-1 | 40.00 |
// | EMEA | LK-709 | BOOK-5 | 15.00 |
// | EMEA | WW-531 | BOOK-5 | 18.00 |
// | EMEA | WW-531 | BULB-2 | 190.00 |
// | EMEA | WW-531 | PC-1 | 440.00 |
// +------------+-----------+-------------+-------------------+
// 8 rows
// ----
//
// Next, we read from the *SalesDetail* graph to get the store, product and region information:
//
// [source, cypher]
// ----
// MATCH (p:Product)-[:IN]->(o:Order)<-[:HAS]-(s:Store)-[:IN]->(r:Region)
// ----
//
// We now create a new graph, *SalesSummary*, containing the summarized view of the sales information across regions, products and stores:
//
// [source, cypher]
// ----
// INTO NEW GRAPH SalesSummary
// MERGE (s:Store {storeCode: s.code})
// MERGE (r:Region {name: r.name})
// MERGE (p:Product {productCode: p.code, RRP: p.RRP})
// MERGE (s)-[:IN]->(r)
// MERGE (p)-[:SOLD_IN]->(s)
//
// // Get the total amount sold for a store
// WITH storeCode, sum(storeProductTotal) AS totalSales
// // Get the total amount sold for a product
// WITH productCode, sum(storeProductTotal) AS soldTotal
//
// // Update all store nodes with the new totalSales property
// MATCH (s:Store)
// SET s.totalSales = totalSales
// WHERE s.code = storeCode
//
// // Update all product nodes with the new soldTotal property
// MATCH (p:Product)
// SET p.soldTotal = soldTotal
// WHERE p.code = productCode
//
// // Update all (:Product)-[SOLD_IN]->(:Store) relationships with the new sold property
// MATCH (p:Product)-[r:SOLD_IN]->(s:Store)
// SET r.sold = storeProductTotal
// WHERE p.code = productCode
// AND s.code = storeCode
// ----
//
// As a final step, the *SalesSummary* graph is returned, along with a high-level summarized tabular view of store sales data.
//
// [source, cypher]
// ----
// RETURN regionName,
// storeCode,
// sum(storeProductTotal) AS totalStoreSales
// GRAPH SalesSummary
// ----
//
// The *SalesSummary* graph is comprised of the following:
//
// image::opencypher-SalesSummary-graph.jpg[Graph,800,700]
//
// The high-level summarized tabular data consists of the following:
//
// [source, cypher]
// ----
// +------------+-----------+-----------------+
// | regionName | storeCode | totalStoreSales |
// +------------+-----------+-----------------+
// | APAC | AC-888 | 65.00 |
// | EMEA | LK-709 | 65.00 |
// | EMEA | WW-531 | 648.00 |
// +------------+-----------+-----------------+
// 3 rows
// ----
//
// We note that the *SalesSummary* graph can be used to generate further high-level sales summaries, such as the total sales of a particular product (shown <<data-aggregation-external-example, here>>), as well as more detailed views.
//
// ._The full aggregation query pipeline is given by_:
// [source, cypher]
// ----
// FROM GRAPH SalesDetail AT ‘graph://...’
// MATCH (p:Product)-[r:IN]->(o:Order)<-[HAS]-(s:Store)-[:IN]->(reg:Region)
//
// WITH reg.name AS regionName,
// s.code AS storeCode,
// p.code AS productCode,
// sum(r.soldPrice * r.numItemsSold) AS storeProductTotal
//
// MATCH (p:Product)-[:IN]->(o:Order)<-[:HAS]-(s:Store)-[:IN]->(r:Region)
//
// INTO NEW GRAPH SalesSummary
// MERGE (s:Store {code: s.code})
// MERGE (r:Region {name: r.name})
// MERGE (p:Product {code: p.code, RRP: p.RRP})
// MERGE (s)-[:IN]->(r)
// MERGE (p)-[:SOLD_IN]->(s)
//
// // Get the total amount sold for a store
// WITH storeCode, sum(storeProductTotal) AS totalSales
// //Get the total amount sold for a product
// WITH productCode, sum(storeProductTotal) AS soldTotal
//
// // Update all store nodes with the new totalSales property
// MATCH (s:Store)
// SET s.totalSales = totalSales
// WHERE s.code = storeCode
//
// // Update all product nodes with the new soldTotal property
// MATCH (p:Product)
// SET p.soldTotal = soldTotal
// WHERE p.code = productCode
//
// // Update all (:Product)-[SOLD_IN]->(:Store) relationships with the new sold property
// MATCH (p:Product)-[r:SOLD_IN]->(s:Store)
// SET r.sold = storeProductTotal
// WHERE p.code = productCode
// AND s.code = storeCode
//
// RETURN regionName,
// storeCode,
// sum(storeProductTotal) AS totalStoreSales
// GRAPH SalesSummary
// ----
//
// [[data-aggregation-external-example]]
// === Using a pipeline in an external execution context
