GTMPeerRepl-0.1
Copyright (c) 2016 Fidelity National Information Services, Inc. and/or its subsidiaries. All rights reserved.

-------------------- REVISION HISTORY --------------------

0.1  2016-04-22 Initial Release
0.11 2016-05-12 Changes from Peer Review

-------------------- COPYRIGHT and LICENSING --------------------

No claim of copyright is made to the 3n+1 program in threeen1glo.m, or its input file prog_in.txt. Except where otherwise noted within a file, all other files in the GT.M Peer Replication package are covered by the Copyright statement at the beginning of this readme file.

If you receive the software in this package integrated with a GT.M distribution, the license for your GT.M distribution is also your license for this plugin. In the event the package contains a COPYING file, that is your license for this software. Except as noted below, you must retain that file and include it if you redistribute the package or a derivative work thereof. If you have a signed agreement that legitimately provides you with GT.M under a different license from that in the COPYING file, you may, at your option, delete the COPYING file and use this software as an integral part of GT.M under the same terms as that of the GT.M license. If your GT.M license permits redistribution, you may redistribute this package integrated with your redistributed GT.M.

Simple aggregation or bundling of this software with another for distribution does not create a derivative work. To make clear the distinction between this package and another with which it is aggregated or bundled, it suffices to place the package in a separate directory or folder when distributing the aggregate or bundle.

Should you receive this package not integrated with a GT.M distribution, and missing a COPYING file, you may create a file called COPYING from the GNU Affero General Public License Version 3 or later (https://www.gnu.org/licenses/agpl.txt) and use this package under the terms of that license.

-------------------- BACKGROUND --------------------

[In GT.M replication, an instance has a logically complete M global variable namespace that it either updates and replicates to other instances (a source), or to which it applies updates from a source (a receiver, of which there are two types, BC and SI). Instances for GT.M peer replication are logically complete M global variable namespaces, and an instance can be both a source and a receiver, i.e., instances are peers. Furthermore, while BC and SI replication are implemented by GT.M, peer replication is implemented using application code.]

Some types of business logic require serialization. For example, as the logic to execute each financial transaction on an account depends on the result of the preceding transaction on that account, and potentially also on related accounts, distributing the processing over multiple instances increases the time to serialize updates, which in turn has an impact on transaction throughput: for contemporary computer systems, update/transaction serialization is typically the ultimate limit to application scalability. So for application logic that requires serialization, centralizing serialization to a single computing node yields the best throughput on current technology. Other types of business logic either do not require serialization or do not need serialization to be determined by processing logic.
For example, a financial institution must keep track of balance inquiries, but while the balance reported by an inquiry depends on the previous update to that account, it does not depend on the result of the previous balance inquiry. Consider an application deployment where financial transactions for accounts are centrally processed on an instance A (with instances B and C for business continuity), and streamed in real time using GT.M replication to instances P, Q and R, which handle inquiries. Since each inquiry is unique and handled by one of P, Q, or R, simply aggregating the inquiries processed across all three inquiry servers provides a complete record of inquiries.

In other words, while there is business logic that requires serialization and hence is best centralized rather than distributed, there is other business logic that can be distributed: to improve application responsiveness by providing local service, for better ability to handle spikes in load, etc. Similar dichotomies exist in other application domains - for example, while orders to be fulfilled from a warehouse must be serialized (first come, first served; most important customer in line served first; whatever rules apply) since a warehouse with an inventory of four widgets cannot ship five, customer order histories are collections of orders regardless of which warehouse fulfilled each item, and can be aggregated without requiring serialization by application software.

GT.M Peer Replication is a design pattern, supported by a working reference implementation, that uses application level code to aggregate from multiple instances specified database updates which do not need serialization. While the unmodified reference implementation may well satisfy the needs of many applications, complex applications will likely need to adapt and refine it to suit their needs.

-------------------- THEORY OF OPERATION --------------------

The premise of the reference implementation is that application logic executed on each instance generates certain database updates to be captured and replicated to peer instances, to satisfy aggregation needs. The reference implementation expects that global variables to be replicated to peers are non-conflicting. This means that any global node updates replicated are either (a) unique to each node, e.g., the first subscript is a node id, or the timestamp of an inquiry (since the same inquiry cannot be independently satisfied by two different nodes at the same time); or (b) the same value if two instances compute a value for a given node. The latter is the case for the threeen1glo program used as a sample application for demonstration purposes packaged with the reference implementation.

The threeen1glo program computes the 3n+1 sequence length for all integers from 1 through a number read from its input. If a number n (n>1) is even, halve it to get the next number; if it is odd, the next number in the sequence is 3n+1. For example, the 3n+1 sequence for 3 is 3, 10, 5, 16, 8, 4, 2, 1, whose length is 8. The as yet unproven Collatz Conjecture (https://en.wikipedia.org/wiki/Collatz_conjecture) states that all sequences eventually reach 1.
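For illustration, the core of the computation can be sketched in a few lines of M (a simplified, standalone sketch; the label and variable names below are not those of threeen1glo.m, which additionally stores computed sequence lengths in a global variable so that values computed on one peer are available to the others):

threeen1(n) ; return the length of the 3n+1 sequence starting at integer n>=1
 new len
 set len=1
 for  quit:n=1  do
 . set n=$select(n#2:3*n+1,1:n\2)  ; odd: next number is 3n+1; even: halve it
 . set len=len+1
 quit len
 ; from within the same routine, write $$threeen1(3) prints 8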
In a sample output of the program:

GT.M V6.3-000 Linux x86_64 2016-05-12 12:07:50 threeen1glo 100,000 351 8.847738 seconds; 217,211 nodes; 118,803 sets]

the fields are:
- $ZVERSION of the GT.M implementation - GT.M V6.3-000 Linux x86_64;
- the current date and time in an international format - 2016-05-12 12:07:50;
- the name of the program - threeen1glo;
- the input number - 100,000;
- the longest 3n+1 sequence found - 351;
- elapsed time - 8.847738 seconds;
- the number of nodes computed (which is greater than the input number for all but the most trivial cases) - 217,211; and
- the number of nodes set locally (no greater than the number of nodes computed, because some of the nodes may have values replicated from peers) - 118,803.

In the reference implementation of peer replication, GT.M triggers on database updates capture each update to be replicated and append it as nodes indexed by a relative timestamp in a sub-tree of the ^%YGTMPeerRepl global variable. "Push" and "pull" replication processes monitor the end of this sub-tree and, when they observe new updates appended to the sub-tree, they replicate them in chronological order. Each push or pull process is specific to a pair of nodes, and uses two global directories: one that maps to a "local" instance, and another that maps to a "remote" instance. As local and remote instances are mapped by global directories, and as the GT.M application code of the reference implementation uses only functionality available on databases on its own computer system as well as databases accessed via GT.CM, the reference implementation of GT.M peer replication is agnostic as to whether a remote instance is on the same computer system as the local node (as is the case for the demonstration) or is on a remote computer system accessed via GT.CM. Note that ^%YGTMPeerRepl needs to be in the same BC/SI replication instance as the database updates that are being captured.

Push and pull processes are unidirectional, i.e., bidirectional replication between A and B requires two processes, one to replicate in each direction. This arrangement allows for finer grain control of application deployment configurations, e.g., if an application configuration requires A to replicate to B, but not the other way around.

The functionality of GT.M peer replication is implemented with a single file, GTMPeerRepl.m.
- mumps -run GTMPeerRepl --help prints a usage summary, and
- mumps -run GTMPeerRepl --help=full prints more detailed information.

Depending on the data to be replicated, bidirectional replication may have the potential for infinite loops. If the updates at each node are not distinguished by the node on which application logic originally computes the update, the update can be replicated to a peer node, from where a trigger captures the update, and peer replication replicates it to the originating node, where a trigger captures the update, and so on. In the reference implementation, the push and pull processes set the intrinsic special variable $ZTWORMHOLE to "GTMPeerRepl", and the triggers capture updates to be replicated only if $ZTWORMHOLE is not "GTMPeerRepl".

The reference implementation of GT.M Peer Replication requires the GT.M POSIX plugin to be installed.
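As a conceptual sketch of that loop-prevention check (this is not the code shipped in GTMPeerRepl.m - the actual trigger templates are at label trigdef in GTMPeerRepl.m - and the node layout and timestamp shown here are simplified placeholders), the logic invoked by a capture trigger might look like:

capture ; sketch of trigger -xecute logic for a node of a replicated global
 quit:$ztwormhole="GTMPeerRepl"  ; update arrived via peer replication - do not recapture it
 ; record the triggering update for the push/pull processes to replicate;
 ; $ztvalue is the value being assigned, and in trigger code $reference is the
 ; reference of the triggering node; the subscripting below is illustrative only
 set ^%YGTMPeerRepl($zut)=$reference_"="_$ztvalue
 quit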
-------------------- SPECIAL NOTE FOR APPLICATIONS USING TRANSACTION PROCESSING --------------------

GT.M executes an update and its associated trigger within a transaction. As GTMPeerRepl records each update by appending it to a subtree of ^%YGTMPeerRepl, during update intensive workloads the end of that subtree can become a potential cause of TP restarts. If application logic does not use transaction processing, the trigger transaction encompasses only the actual update and the record of the update, and even when a trigger causes a restart, the impact on the application is likely to be minimal, since restarts only recompute the actual database update (from the global variable node to database blocks). If the application uses transaction processing, the fact that a transaction is "open" for a longer period of time increases the probability of TP restarts. Furthermore, when application logic is executed inside a transaction, the cost of a restart is higher than when it is not.

Two possible remedies use $JOB to ensure that multiple processes will not set the same node at the same time, and in turn allow processes to set VIEW "NOISOLATION" for ^%YGTMPeerRepl.

- Adding $JOB as the last subscript to the nodes of ^%YGTMPeerRepl reduces the impact of "collisions". When two processes concurrently attempt to update the same block, a common scenario when appending to the end of a global variable subtree, GT.M recomputes the update (as it does for collisions that occur outside transactions) instead of restarting the transaction. This adds a small complication to the push and pull processes which, with VIEW "NOISOLATION", must deal with the possibility of multiple nodes (from different processes) with the same relative timestamp. (A sketch of this approach appears at the end of this section.)

- Adding $JOB as the first subscript to the nodes of ^%YGTMPeerRepl reduces restarts by reducing the probability of multiple processes updating the same data block at the same time. However, this significantly increases the complexity of the push and pull processes, each pass of which must now include an outer loop that scans across processes, and an inner loop that scans for updates by a process, creating a time ordered list of updates in a local variable to be replicated. Note that a process with a higher $JOB may have an update visible to a push or pull process scan that is more recent than an update made by an already scanned process after its updates were examined; while dealing with this is not rocket science, to ensure time ordering, you must deal with it. A heuristic to speed up the scans would be to have ^%YGTMPeerRepl($JOB) store the relative timestamp of the most recent update by a process.

To implement either of the above, modify the reference implementation in GTMPeerRepl.m to best match your situation. If you modify the structure of data in ^%YGTMPeerRepl, remember to adapt operations of GTMPeerRepl such as purge and status.

A third remedy, which is well suited to high-volume production deployments, is to replicate the data from the system of record on which application logic computes serialized database updates to other instances, using GT.M Supplementary Instance (SI) replication. As the only source of trigger updates is the Update Process, peer replication from these supplementary instances will not cause transaction restarts, or indeed have any impact on the system of record.
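The following is a minimal sketch of the first remedy above, assuming a hypothetical layout with a relative timestamp as the first subscript and $JOB as the last; it is not the code in GTMPeerRepl.m, and the VIEW argument form should be checked against the GT.M documentation for your version:

noiso ; once per process: request NOISOLATION so colliding block updates are recomputed, not restarted
 view "NOISOLATION":"^%YGTMPeerRepl"
 quit
record(stamp,update) ; append one captured update under relative timestamp stamp
 set ^%YGTMPeerRepl(stamp,$job)=update  ; $JOB as the last subscript keeps concurrent processes on distinct nodes
 quit

A push or pull process reading this layout must merge nodes that share the same stamp but differ in the $JOB subscript, as noted above.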
-------------------- MANIFEST --------------------

The GT.M Peer Replication package consists of the following files:
- clean.sh - a program to clean up files created by demolocal.sh
- COPYING - if present, the license under which this software is distributed (see LICENSING, above)
- demolocal.sh - a shell script to demonstrate & test GT.M Peer Replication on your system; discussed below
- gde_in.txt - input to GDE for the demonstration and test script
- gtmenv - a file sourced by demolocal.sh to attempt to set up GT.M environment variables for the demonstration & test
- GTMPeerReplDemo.m - a driver program for demolocal.sh
- GTMPeerRepl.m - the reference implementation of GT.M peer replication
- prog_in.txt - input to the program threeen1glo
- readme.txt - this file
- threeen1glo.m - code for the "application" discussed above, used to demonstrate GT.M Peer Replication
- trig.txt - a list of the global variable nodes to be replicated, one to a line

-------------------- DEMO / TEST --------------------

Unpack the GT.M Peer Replication distribution in a directory, and make the demolocal.sh and clean.sh files executable. Invoke ./demolocal.sh to run the demonstration, and ./clean.sh to clean up after one or more runs of demolocal.sh. The following is a run of demolocal.sh annotated with comments:

$ ./demolocal.sh
****Demo of GT.M Peer Replication
$gtm_dist=/usr/lib/fis-gtm/V6.3-000_x86_64
$GTMXC_gtmposix=/usr/lib/fis-gtm/V6.3-000_x86_64/plugin/gtmposix.xc
$gtmroutines="/home/gtmuser/GTMPeerReplDemo* /usr/lib/fis-gtm/V6.3-000_x86_64/plugin/o/_POSIX.so /usr/lib/fis-gtm/V6.3-000_x86_64/libgtmutil.so"
$timestamp=2016-05-12_16:59:40 UTC

demolocal.sh sources the gtmenv file in an attempt to find an installation of GT.M it can use, and sets environment variables for that installation. Check the values reported, and if they are incorrect for your system, edit gtmenv to provide correct values. In order to allow demolocal.sh to be run multiple times, the demonstration generates a timestamp, and creates subdirectories outfiles_$timestamp and peer{1,2,3}_$timestamp, to preserve the output of each run.

GTMPeerReplDemo 2016-05-12_16:59:40 GTMPeerRepl 0.1

GTMPeerReplDemo identifies itself, the timestamp, and the version of the reference implementation of GT.M Peer Replication.

MUPIP TRIGGER file to load triggers is /home/gtmuser/GTMPeerReplDemo/outfiles_2016-05-12_16:59:40/GTMPeerReplDemo_2016-05-12_16:59:40_31954_trigload.txt
MUPIP TRIGGER file to delete triggers is /home/gtmuser/GTMPeerReplDemo/outfiles_2016-05-12_16:59:40/GTMPeerReplDemo_2016-05-12_16:59:40_31954_trigdel.txt

From the list of global variables to be replicated in trig.txt, and the trigger templates at label trigdef in GTMPeerRepl.m, demolocal.sh creates files for MUPIP TRIGGER to load the trigger definitions needed for peer replication and to delete them on completion. You should review the *trigload.txt file to see the definitions of the triggers that capture updates.

****Checking --help: mumps -run GTMPeerRepl --help
GT.M Peer Replication - n way database peer replication
Usage: mumps -run GTMPeerRepl [option] ...
where options are:
--hang=time - when there is no data to be replicated, how many milliseconds to wait before checking again; default is 100 and zero means to keep checking in a tight loop
--help[=full] - print helpful information and exit without processing any following options
--[loca]lname=instname - name of this instance; instance names must start with a letter and are case-sensitive
--[pull[from]|push[to]]=instname[,gbldirname] - remote instance name and access information
--[purg]e[=time] - purge data recorded before time (measured as the number of microseconds since the UNIX epoch); 0 means all data recorded; a negative value is relative to the current time; default is all data already replicated to known instances
--[stat]us[=instname[,...]] - show replication status of named instances; default to all instances
--stop[repl][=instname[,...]] - stop replicating to/from named instances; default to all instances
Options are processed from left to right.

In many cases, you may be able to use GTMPeerRepl as distributed, at least to get started. mumps -run GTMPeerRepl --help gives you a convenient short list of the command line options it supports.

****Checking --help=full: $gtm_dist/mumps -run GTMPeerRepl --help=full >/home/gtmuser/GTMPeerReplDemo/outfiles_2016-05-12_16:59:40/help_full.txt

The --help=full command line option generates more voluminous help, which the demolocal.sh script places in the help_full.txt file.

****Loading triggers - outputs in outtrigload.txt files in each peer.
****Turning on peer replication

This is the beginning of the actual demonstration of peer replication. The demonstration creates three peers, peer1, peer2 and peer3, each in its own directory, and in each peer loads the triggers that capture the database updates to be replicated.

****Run peer instances of application
Peer1 pushes to peer2. Peer3 pulls from peer1, peer2 & peer3 push to each other.

This starts three instances of the threeen1glo "application", peer1, peer2 and peer3. In this demonstration, the replication between peer2 and peer3 is bidirectional, but the replication from peer1 to peer2 and peer3 is unidirectional (a push from peer1 to peer2, and a pull from peer1 by peer3).

****Check status - peer2 is only peer3 status, others are for all peers

While the threeen1glo application instances are running, the demonstration runs the --status option of GTMPeerRepl. Sample results below.

Localname=peer1 Lastseq=1,904,047,320,770
Peer=peer2 Sent=1,904,046,483,340 Received=0 Gbldir=/home/gtmuser/GTMPeerReplDemo/peer2_2016-05-12_16:59:40/gtm.gld push started pid=31970 at 1,904,046,085,229
Peer=peer3 Sent=1,904,046,229,617 Received=0

Captured updates are serialized by a relative microsecond time stamp. All instances must have a common time, and while the offset from $ZUT can be any value, all instances must agree on that offset. While the timestamps are nominally integer microseconds, multiple updates within the same microsecond increment the sequence number by .01, i.e., the resolution is theoretically ten nanoseconds, and when future computers exceed a sustained rate of 1E8 updates/second, GTMPeerRepl will need to be modified appropriately.
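As an illustration of that serialization scheme (a sketch only, with a hypothetical offset value and node layout; GTMPeerRepl.m is the authoritative code), a sequence number could be generated along these lines:

seqno() ; next capture sequence number: relative microseconds, advanced by .01 on collision
 new last,stamp
 set stamp=$zut-1460000000000000              ; hypothetical agreed offset from $ZUT; all instances must use the same value
 set last=$order(^%YGTMPeerRepl(""),-1)       ; most recent sequence number already recorded (layout is illustrative)
 set:(last'="")&(stamp<=last) stamp=last+.01  ; same microsecond, or clock behind the last record: bump by .01
 quit stamp

A push or pull process can then use $ORDER() to walk the captured updates in sequence, starting from the last sequence number it has already replicated to its peer.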
The status report for peer1 identifies the name of the instance (peer1), and the last update sequence number captured (1,904,047,320,770). The last update pushed to peer2 is 1,904,046,483,340 (i.e., the backlog is 837,430 microseconds worth of updates), and since the replication from peer1 to peer2 is a push, the peer1 status reports the global directory it uses to access peer2. The last update pulled from peer1 by peer3 is 1,904,046,229,617. Since peer3 is pulling data from peer1, peer1 does not have a global directory for peer3. As replication from peer1 to peer2 and peer3 is unidirectional, peer1 reports that it has received no updates from either.

Localname=peer2 Lastseq=1,904,047,494,105
Peer=peer3 Sent=1,904,046,533,849 Received=0 Gbldir=/home/gtmuser/GTMPeerReplDemo/peer3_2016-05-12_16:59:40/gtm.gld

As the status commands for peer1 and peer3 did not request status for a specific instance, they report status for all instances. The status command for peer2 in the demonstration requests status for peer3, for which it reports the status. Although each of peer2 and peer3 is pushing to the other (bidirectional peer replication), at the time of the status command, peer3 has not pushed any data to peer2.

Localname=peer3 Lastseq=0
Peer=peer1 Sent=0 Received=1,904,046,229,617 Gbldir=/home/gtmuser/GTMPeerReplDemo/peer1_2016-05-12_16:59:40/gtm.gld pull started pid=31982 at 1,904,046,206,899
Peer=peer2 Sent=0 Received=1,904,046,581,877 Gbldir=/home/gtmuser/GTMPeerReplDemo/peer2_2016-05-12_16:59:40/gtm.gld push started pid=31978 at 1,904,046,161,864

The status report for peer3 reports the replication status for both peer1 and peer2. Lastseq=0 means that peer3 has not computed and captured any updates locally - as of the time status was run, any data that the application logic on peer3 sought had already been replicated from peer1 or peer2. As peer3 is started after peer1 and peer2, Lastseq=0 is not unreasonable.

****Wait for programs to complete and get results - number of nodes should be the same for all peers
****Number of sets should be same as number of nodes for peer1; and less for other peers
Results
peer1: GT.M V6.3-000 Linux x86_64 2016-05-12 12:59:41 threeen1glo 100,000 351 35.875547 seconds; 217,211 nodes; 217,211 sets]
peer2: GT.M V6.3-000 Linux x86_64 2016-05-12 12:59:41 threeen1glo 100,000 351 36.755706 seconds; 217,211 nodes; 118,967 sets]
peer3: GT.M V6.3-000 Linux x86_64 2016-05-12 12:59:41 threeen1glo 100,000 351 36.620998 seconds; 217,211 nodes; 124,279 sets]

After the instances complete, they report results. The length of the maximum 3n+1 sequence found must be the same for all instances (351 for an input of 100,000), as must the number of nodes (217,211). However, there is a difference in the number of sets. For peer1, which receives no replication from another peer, the number of sets is equal to the number of nodes. But for peer2 and peer3, the number of sets is less than the number of nodes, as some of the nodes in their databases have been replicated from other peers.

****Stop replication - outputs in outstop.txt files in each peer, which should all be empty
****Deleting triggers - outputs in outtrigdel.txt files in each peer.

After the instances complete, replication is stopped and the update capturing triggers are deleted.
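In outline, stopping replication and removing the capture triggers for an instance look something like the following (an illustrative sketch, not the exact commands used by demolocal.sh; the instance name and trigger file path are placeholders, and the options are as described in the --help output above):

# stop peer replication to/from all known peers for this instance
$gtm_dist/mumps -run GTMPeerRepl --localname=<instname> --stop
# delete the capture triggers using the generated *trigdel.txt file
$gtm_dist/mupip trigger -triggerfile=<path to *trigdel.txt file>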
****Space in database before purge of updates to be replicated - outputs in outpurge.txt files in each peer, which should all be empty
peer1: Free/Total=1822/5000
peer2: Free/Total=1053/5000
peer3: Free/Total=4542/5000
****Space after purge - all data for peers 1; last 3 seconds for peer2; only already replicated data for peer3
peer1: Free/Total=3960/5000
peer2: Free/Total=2529/5000
peer3: Free/Total=4542/5000

The triggers capture updates in a subtree of ^%YGTMPeerRepl, but never automatically delete data. When it's safe to purge replicated updates, the --purge command of GTMPeerRepl can purge captured updates - all updates, all replicated updates, or all updates prior to a timestamp (which can be absolute or relative).

****Confirm that there are no remaining processes in any peer
%GTM-I-MUFILRNDWNSUC, File /home/gtmuser/GTMPeerReplDemo/peer1_2016-05-12_16:59:40/gtm.dat successfully rundown
%GTM-I-MUFILRNDWNSUC, File /home/gtmuser/GTMPeerReplDemo/peer2_2016-05-12_16:59:40/gtm.dat successfully rundown
%GTM-I-MUFILRNDWNSUC, File /home/gtmuser/GTMPeerReplDemo/peer3_2016-05-12_16:59:40/gtm.dat successfully rundown
$

This is just a final cleanup to confirm that there are no remaining processes from the demonstration.

-------------------- INSTALLATION AND USE --------------------

Review the GTMPeerRepl.m program, and determine whether the as-is reference implementation meets your needs; modify it as appropriate. Place it in the execution path for mumps processes as defined by $gtmroutines / $ZROUTINES. Compile it and locate the object code per your normal application configuration practices. Create triggers for global variables to be replicated. Set up a purge schedule, and deploy instances as needed.
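For example, a periodic purge job might discard captured updates older than a day with an invocation along the following lines (illustrative only; a negative --purge value is relative to the current time, per the --help text above, and the instance name is a placeholder):

# purge captured updates recorded more than 24 hours (86,400,000,000 microseconds) ago
$gtm_dist/mumps -run GTMPeerRepl --localname=<instname> --purge=-86400000000

Run without a value, --purge discards only data already replicated to all known instances, which is usually the most conservative choice.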