High Availability (DR) #1540
Link to Pool Replication for future me reading this.
I know about the Pool Replication feature, but it requires the same storage at both sites. Scenario: Site B, Pool B, SR B is hosted by a Dell EqualLogic SAN or any NFS/iSCSI-capable storage appliance. As long as XOA sees SR B on Pool B, it will be able to copy the VM.
Idea added to the wiki.
@ISECNOC how to know a VM should be considered as down? Just its power state? |
@olivierlambert Yeah, that is a good idea. Since XOA would handle both Pool A and Pool B, I don't see why that wouldn't work.
I noticed that I am logged in with my personal GitHub account, but the answer is the same :-)
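If power state is the criterion, the check could look like the following minimal Python sketch. It assumes the states are the XenAPI `VM.power_state` values (`Halted`, `Paused`, `Running`, `Suspended`) and that `None` stands for a VM whose state could not be read (for example because its whole pool is unreachable). The `expected_offline` set is a hypothetical addition so that VMs paused on purpose, such as during a failover test, are not mistaken for failures.

```python
from typing import AbstractSet, List, Mapping, Optional

# XenAPI reports VM.power_state as one of: "Halted", "Paused", "Running", "Suspended".
RUNNING = "Running"


def vms_considered_down(
    states: Mapping[str, Optional[str]],
    expected_offline: AbstractSet[str] = frozenset(),
) -> List[str]:
    """Return the VMs that should count as down, sorted by name.

    A state of None means it could not be read (e.g. the pool is unreachable).
    VMs in the hypothetical `expected_offline` set (e.g. paused on purpose during
    a failover test) are ignored.
    """
    return sorted(
        vm
        for vm, state in states.items()
        if state != RUNNING and vm not in expected_offline
    )
```

For example, `vms_considered_down({"web": "Running", "db": "Halted", "app": None})` returns `["app", "db"]`.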
I've been working in IT finance for a little more than 6 years now, and we've been audited numerous times; the questions are mostly the same.
Recently, let's say in the past 2 years, there has been a new recurring question: Do you have a DR plan, and how often do you test it? How long will it take to restore your infrastructure?
So I started thinking about this and decided that replicating the SAN, together with the built-in Disaster Recovery feature in XenServer, is probably the way to go.
That was the main goal until I started using XOA and realized that XOA could probably do everything for me: there is already a DR Copy feature in the Backup section, which does most of what I want.
I am thinking of a new feature called "High Availability (DR)" that would work like this:
The scenario is the following: you have Site A and Site B, Pool A and Pool B, and SR A and SR B, where SR A is connected to Pool A and SR B to Pool B.
You set up a "High Availability (DR)" job, exactly like you set up backup jobs (similar functionality to the DR Copy job), to copy "important" VMs from Site A, Pool A, SR A to Site B, Pool B, SR B.
The schedule can be anything from once per week or once per day to once per hour, if you have enough resources and bandwidth available.
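Such a job could be described by a small configuration object. The following Python sketch is hypothetical (the `HaDrJob` class and its fields are not part of XOA) and assumes each run snapshots the VMs when it starts and finishes before the next run begins. Under those assumptions, the worst-case data loss is one schedule interval plus one copy duration: if the site fails just before a run completes, the newest usable copy is the one that started one interval earlier.

```python
from dataclasses import dataclass, field
from datetime import timedelta
from typing import List


@dataclass
class HaDrJob:
    """Hypothetical "High Availability (DR)" job: copy VMs from a source pool/SR
    to a target pool/SR on a fixed schedule."""

    name: str
    source_pool: str
    source_sr: str
    target_pool: str
    target_sr: str
    vm_uuids: List[str] = field(default_factory=list)
    interval: timedelta = timedelta(days=1)  # e.g. weekly, daily or hourly

    def __post_init__(self) -> None:
        if self.interval <= timedelta(0):
            raise ValueError("interval must be positive")

    def worst_case_data_loss(self, copy_duration: timedelta) -> timedelta:
        """Assumes each run snapshots at its start and finishes before the next
        run starts. If the site fails just before a run completes, the newest
        usable copy started one full interval earlier."""
        if copy_duration >= self.interval:
            raise ValueError("copy must finish before the next run starts")
        return self.interval + copy_duration
```

With an hourly schedule and a 10-minute copy, that is up to 70 minutes of lost changes, a figure worth stating explicitly in a DR plan.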
XOA then monitors the source VMs and the source pool. If there is a normal failure, HA will automatically try to restart the VM in the same pool. But what if the whole pool, or even the whole site, goes down?
This is where it becomes interesting: in my world, XOA would try to reach the remote source pool (Pool A), and if that fails, it should automatically start the copied VMs on Site B, Pool B from SR B, or maybe trigger some kind of alert that lets the user choose whether the DR should be initiated.
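The decision described above could be sketched as a small state machine fed by periodic reachability checks of the source pool. In this hypothetical Python sketch, `failure_threshold` consecutive failed checks are required before acting, so a single dropped probe does not trigger DR, and `auto_failover` chooses between starting the copies and only raising an alert. One caveat: XOA losing contact with Pool A does not prove Pool A is down (the link between XOA and Site A may be what failed), so with automatic failover both sites could end up running the same VMs; the alert mode leaves that call to a human.

```python
from enum import Enum


class Action(Enum):
    NONE = "none"          # nothing to do
    ALERT = "alert"        # ask a human whether to initiate DR
    FAILOVER = "failover"  # start the copied VMs on Site B


class SourcePoolMonitor:
    """Hypothetical monitor fed with one reachability check per interval."""

    def __init__(self, failure_threshold: int, auto_failover: bool) -> None:
        if failure_threshold < 1:
            raise ValueError("failure_threshold must be at least 1")
        self.failure_threshold = failure_threshold
        self.auto_failover = auto_failover
        self.consecutive_failures = 0
        self.triggered = False  # act at most once until an operator resets it

    def record(self, source_reachable: bool) -> Action:
        if source_reachable:
            self.consecutive_failures = 0
            return Action.NONE
        self.consecutive_failures += 1
        if self.triggered or self.consecutive_failures < self.failure_threshold:
            return Action.NONE
        self.triggered = True
        return Action.FAILOVER if self.auto_failover else Action.ALERT

    def reset(self) -> None:
        """Called by an operator once the incident is handled."""
        self.consecutive_failures = 0
        self.triggered = False
```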
I'm not sure if you follow me, but I hope you do so far.
Now think of a scenario where you want to test this. How would it be done?
Well, in my world it should be pretty simple. This is how I am thinking:
1. You inform everyone about a failover simulation.
2. You go into the job and click "Test failover". It prompts you again, warning that there will be an interruption on the production site where the test is performed, and that there needs to be free RAM, CPU and similar resources at the destination pool.
3. HA is disabled on the source pool.
4. The VMs in the source pool are paused/suspended.
5. The copied VMs are started. This is when you can try to reach an application or service hosted on such a VM to confirm that it is really running properly.
6. After a predefined time (let's say a default of 10 minutes), the copied VMs are halted.
7. The VMs in the source pool are now unpaused so that normal production can continue. This happens only once it is either automatically confirmed that the VMs on the DR site have been halted, or the user "overrides" any failures and tells XOA to move on to the next step.
8. HA is re-enabled on the source pool.
9. A report is generated with status, timing and similar information that PwC and other auditors want to look at.
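The test procedure could be orchestrated roughly as in the following Python sketch. The `Hypervisor` protocol and its methods are hypothetical stand-ins for whatever XOA would call internally, not an actual Xen Orchestra or XenAPI interface; `sleep` and `clock` are injectable so the flow can be exercised without waiting 10 minutes. The source VMs are only unpaused once every copy is confirmed halted or an `override` callback approves it, and each step is timestamped for the audit report.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Protocol, Sequence, Tuple


class Hypervisor(Protocol):
    """Hypothetical operations; not an actual Xen Orchestra or XenAPI interface."""

    def set_ha(self, pool: str, enabled: bool) -> None: ...
    def pause(self, vm: str) -> None: ...
    def unpause(self, vm: str) -> None: ...
    def start(self, vm: str) -> None: ...
    def halt(self, vm: str) -> bool: ...  # True once the VM is confirmed halted


@dataclass
class FailoverReport:
    steps: List[Tuple[str, float]] = field(default_factory=list)  # (step, seconds since start)
    success: bool = False


def run_failover_test(
    hv: Hypervisor,
    source_pool: str,
    source_vms: Sequence[str],
    copy_vms: Sequence[str],
    duration_s: float = 600.0,  # default test window of 10 minutes
    override: Callable[[], bool] = lambda: False,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> FailoverReport:
    report = FailoverReport()
    t0 = clock()

    def log(step: str) -> None:
        report.steps.append((step, clock() - t0))

    hv.set_ha(source_pool, False)
    log("HA disabled on source pool")
    for vm in source_vms:
        hv.pause(vm)
    log("source VMs paused")
    for vm in copy_vms:
        hv.start(vm)
    log("copies started; verify applications now")
    sleep(duration_s)
    # Build the full list so every copy gets a halt attempt, even after a failure.
    all_halted = all([hv.halt(vm) for vm in copy_vms])
    log("copies halted" if all_halted else "halt of copies NOT confirmed")
    if not all_halted and not override():
        log("stopped: source VMs left paused for operator")
        return report
    for vm in source_vms:
        hv.unpause(vm)
    log("source VMs unpaused")
    hv.set_ha(source_pool, True)
    log("HA re-enabled on source pool")
    report.success = all_halted
    return report
```

Leaving the source VMs paused when a halt cannot be confirmed is deliberate: resuming them while a copy might still be running would create exactly the double-running situation the test must avoid.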
If there are any questions, please do not hesitate to contact me!
On IRC or by email: nikade@freenode or niklas.ahden@isec.com