Troubleshooting replication issues in agentless VMware VM migration

This article describes some common issues and specific errors that you might encounter when you replicate on-premises VMware VMs using the Azure Migrate: Server Migration agentless method.

When you replicate a VMware virtual machine using the agentless replication method, data from the virtual machine's disks (VMDKs) is replicated to replica managed disks in your Azure subscription. When replication starts for a VM, an initial replication cycle occurs, in which full copies of the disks are replicated. After initial replication completes, incremental replication cycles are scheduled periodically to transfer any changes that have occurred since the previous replication cycle.

You may occasionally see replication cycles failing for a VM. These failures can happen due to reasons ranging from problems in the on-premises network configuration to issues at the Azure Migrate Cloud Service backend. In this article, we will:

  • Show you how you can monitor replication status and resolve errors.
  • List some of the commonly occurring replication errors and suggest additional steps to remediate them.

Monitor replication status using the Azure portal

Use the following steps to monitor the replication status for your virtual machines:

  1. Go to the Servers page in Azure Migrate on the Azure portal.
  2. Navigate to the "Replicating machines" page by clicking on "Replicating servers" in the Server Migration tile.
  3. You'll see a list of replicating servers along with additional information such as status, health, last sync time, etc. The health column indicates the current replication health of the VM. A 'Critical' or 'Warning' value in the health column typically indicates that the previous replication cycle for the VM failed. To get more details, right-click on the VM, and select "Error Details." The Error Details page contains information on the error and additional details on how to troubleshoot. You'll also see a "Recent Events" link that can be used to navigate to the events page for the VM.
  4. Click "Recent Events" to see the previous replication cycle failures for the VM. In the events page, look for the most recent event of type "Replication cycle failed" or "Replication cycle failed for disk" for the VM.
  5. Click on the event to understand the possible causes of the error and recommended remediation steps. Use the information provided to troubleshoot and remediate the error.
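You can also check replication status from the command line. A minimal sketch, assuming the Az.Migrate PowerShell module is installed; the resource group and project names below are placeholders, and the output property names should be verified against your module version:

# Sign in, then list the replicating servers in a Migrate project.
Connect-AzAccount
Get-AzMigrateServerReplication -ResourceGroupName "MyMigrateRG" -ProjectName "MyMigrateProject" |
    Format-List MachineName, MigrationState, MigrationStateDescription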

Common Replication Errors

This section describes some of the common errors, and how you can troubleshoot them.

Key Vault operation failed error when trying to replicate VMs

Error: "Key Vault operation failed. Operation: Configure managed storage account, Key Vault: Key-vault-name, Storage Account: storage account name failed with the error:"

Error: "Key Vault operation failed. Operation: Generate shared access signature definition, Key Vault: Key-vault-name, Storage Account: storage account name failed with the error:"

This error typically occurs because the User Access Policy for the Key Vault doesn't give the currently logged in user the necessary permissions to configure storage accounts to be Key Vault managed. To check the user access policy on the key vault, go to the Key vault page on the portal for the Key vault and select Access policies.
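
You can also inspect the vault's access policies from PowerShell. A minimal sketch, assuming the Az.KeyVault module and a placeholder vault name:

# Lists each access policy on the vault, including the storage permissions it grants.
(Get-AzKeyVault -VaultName "keyvaultname").AccessPolicies |
    Select-Object DisplayName, ObjectId, PermissionsToStorage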

When the portal creates the key vault, it also adds a user access policy granting the currently logged in user permissions to configure storage accounts to be Key Vault managed. This can fail for two reasons:

  • The logged in user is a remote principal on the customer's Azure tenant (CSP subscription - and the logged in user is the partner admin). The workaround in this case is to delete the key vault, log out from the portal, and then log in with a user account from the customer's tenant (not a remote principal) and retry the operation. The CSP partner will typically have a user account in the customer's Azure Active Directory tenant that they can use. If not, they can create a new user account for themselves in the customer's Azure Active Directory tenant, log in to the portal as the new user, and then retry the replicate operation. The account used must have either Owner or Contributor+User Access Administrator permissions granted on the resource group (Migrate project resource group).

  • The other case where this may happen is when one user (user1) attempted to set up replication initially and encountered a failure, but the key vault has already been created (and the user access policy appropriately assigned to this user). Now at a later point, a different user (user2) tries to set up replication, but the Configure Managed Storage Account or Generate SAS definition operation fails as there is no user access policy corresponding to user2 in the key vault.

Resolution: To work around this issue, create a user access policy for user2 in the key vault granting user2 permission to configure managed storage accounts and generate SAS definitions. User2 can do this from Azure PowerShell using the cmdlets below:

$userPrincipalId = $(Get-AzureRmADUser -UserPrincipalName "user2_email_address").Id

Set-AzureRmKeyVaultAccessPolicy -VaultName "keyvaultname" -ObjectId $userPrincipalId -PermissionsToStorage get, list, delete, set, update, regeneratekey, getsas, listsas, deletesas, setsas, recover, backup, restore, purge
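
If you're on the newer Az PowerShell module rather than AzureRm, the equivalent commands should look like the following (a sketch; verify parameter support against your Az.KeyVault module version):

# Same access-policy grant, using Az module cmdlets.
$userPrincipalId = (Get-AzADUser -UserPrincipalName "user2_email_address").Id
Set-AzKeyVaultAccessPolicy -VaultName "keyvaultname" -ObjectId $userPrincipalId -PermissionsToStorage get, list, delete, set, update, regeneratekey, getsas, listsas, deletesas, setsas, recover, backup, restore, purge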

DisposeArtefactsTimedOut

Error ID: 181008

Error Message: VM: VMName. Error: Encountered timeout event 'DisposeArtefactsTimeout' in the state '[Gateway.Service.StateMachine.SnapshotReplication.SnapshotReplicationEngine+WaitingForArtefactsDisposalPreCycle' ('WaitingForArtefactsDisposalPreCycle')]'.

Possible Causes:

The component trying to replicate data to Azure is either down or not responding. The possible causes include:

  • The gateway service running in the Azure Migrate appliance is down.
  • The gateway service is experiencing connectivity issues to Service Bus/Event Hub/Appliance Storage account.

Identifying the exact cause for DisposeArtefactsTimedOut and the corresponding resolution:

  1. Ensure that the Azure Migrate appliance is up and running.

  2. Check if the gateway service is running on the appliance:

    1. Log in to the Azure Migrate appliance using remote desktop and do the following.

    2. Open the Microsoft services MMC snap-in (run > services.msc), and check if the "Microsoft Azure Gateway Service" is running. If the service is stopped or not running, start the service. Alternatively, you can open a command prompt or PowerShell and run: "Net Start asrgwy"
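
    From PowerShell, a quick sketch for checking and starting the service, using the standard Service cmdlets and the asrgwy service name from the command above:

    # Show the gateway service state; start it if it isn't running.
    Get-Service -Name asrgwy
    if ((Get-Service -Name asrgwy).Status -ne 'Running') { Start-Service -Name asrgwy }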

  3. Check for connectivity issues between the Azure Migrate appliance and the Appliance Storage Account:

    Run the following command after downloading azcopy in the Azure Migrate appliance:

    azcopy bench https://[account].blob.core.windows.net/[container]?SAS

    Steps to run the performance benchmark test:

    1. Download azcopy

    2. Look for the appliance Storage Account in the Resource Group. The Storage Account has a name that resembles migrategwsa**********. This is the value of the [account] parameter in the above command.

    3. Search for your storage account in the Azure portal. Ensure that the subscription you use to search is the same subscription (target subscription) in which the storage account is created. Go to Containers in the Blob Service section. Click on +Container and create a Container. Leave Public Access Level at the default selected value.

    4. Go to Shared Access Signature under Settings. Select Container in "Allowed Resource Types." Click on Generate SAS and connection string. Copy the SAS value.

    5. Execute the above command in Command Prompt by replacing account, container, and SAS with the values obtained in steps 2, 3, and 4 respectively.

    Alternatively, download the Azure Storage Explorer onto the appliance and try to upload 10 blobs of ~64 MB into the storage accounts. If there is no issue, the upload should be successful.
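
    For example, with placeholder values filled in (--file-count and --size-per-file are standard azcopy bench options; the account and container names below are hypothetical, and <SAS> stays a placeholder for your token):

    azcopy bench "https://migrategwsa1234567890.blob.core.windows.net/benchtest?<SAS>" --file-count 10 --size-per-file 64M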

    Resolution: If this test fails, there's a networking issue. Engage your local networking team to check connectivity issues. Typically, there can be some firewall settings that are causing the failures.

  4. Check for connectivity issues between the Azure Migrate appliance and Service Bus:

    This test checks if the Azure Migrate appliance can communicate to the Azure Migrate Cloud Service backend. The appliance communicates to the service backend through Service Bus and Event Hub message queues. To validate connectivity from the appliance to the Service Bus, download the Service Bus Explorer, try to connect to the appliance Service Bus and perform send message/receive message. If there is no issue, this should be successful. A quick TCP pre-check you can run first is shown after the steps below.

    Steps to run the test:

    1. Copy the connection string from the Service Bus that got created in the Migrate Project
    2. Open the Service Bus Explorer
    3. Go to File then Connect
    4. Paste the connection string and click Connect
    5. This will open the Service Bus Name Space
    6. Select Snapshot Manager in the topic. Right click on Snapshot Manager, select "Receive Messages" > select "peek", and click OK
    7. If the connection is successful, you will see "[10] messages received" on the console output. If the connection is not successful, you'll see a message stating that the connection failed
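
    As a quick pre-check before using Service Bus Explorer, you can test basic network reachability of the Service Bus endpoint from the appliance; the namespace name below is a hypothetical placeholder taken from the Service Bus resource in the Migrate project's resource group:

    # Service Bus namespaces resolve under servicebus.windows.net; port 443 covers the HTTPS path.
    Test-NetConnection "migratesbns1234567890.servicebus.windows.net" -Port 443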

    Resolution: If this test fails, there's a networking issue. Engage your local networking team to check connectivity issues. Typically, there can be some firewall settings that are causing the failures.

  5. Connectivity issues between the Azure Migrate appliance and Azure Key Vault:

    This test checks for connectivity issues between the Azure Migrate appliance and the Azure Key Vault. The Key Vault is used to manage Storage Account access used for replication.

    Steps to check connectivity:

    1. Fetch the Key Vault URI from the list of resources in the Resource Group corresponding to the Azure Migrate Project.

    2. Open PowerShell in the Azure Migrate appliance and run the following command:

    Test-NetConnection <Key Vault URI> -Port 443

    This command will attempt a TCP connection and will return an output.

    • In the output, check the field "TcpTestSucceeded". If the value is "True", there is no connectivity issue between the Azure Migrate appliance and the Azure Key Vault. If the value is "False", there is a connectivity issue.
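
    For example (the vault name is a hypothetical placeholder; pass just the host name from the vault URI, without the https:// prefix):

    Test-NetConnection "migratekv1234567890.vault.azure.net" -Port 443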

    Resolution: If this test fails, there's a connectivity issue between the Azure Migrate appliance and the Azure Key Vault. Engage your local networking team to check connectivity issues. Typically, there can be some firewall settings that are causing the failures.

DiskUploadTimedOut

Error ID: 1011

Error Message: The upload of data for disk DiskPath, DiskId of virtual machine VMName; VMId did not complete within the expected time.

This error typically indicates either that the Azure Migrate appliance performing the replication is unable to connect to the Azure Cloud Services, or that replication is progressing slowly, causing the replication cycle to time out.

The possible causes include:

  • The Azure Migrate appliance is down.
  • The replication gateway service on the appliance is not running.
  • The replication gateway service is experiencing connectivity issues to one of the following Azure service components that are used for replication: Service Bus/Event Hub/Azure cache Storage Account/Azure Key Vault.
  • The gateway service is being throttled at the vCenter level while trying to read the disk.

Identifying the root cause and resolving the issue:

  1. Ensure that the Azure Migrate appliance is up and running.

  2. Check if the gateway service is running on the appliance:

    1. Log in to the Azure Migrate appliance using remote desktop and do the following.

    2. Open the Microsoft services MMC snap-in (run > services.msc), and check if the "Microsoft Azure Gateway Service" is running. If the service is stopped or not running, start the service. Alternatively, you can open a command prompt or PowerShell and run: "Net Start asrgwy".

  3. Check for connectivity issues between the Azure Migrate appliance and the cache Storage Account:

    Run the following command after downloading azcopy in the Azure Migrate appliance:

    azcopy bench https://[account].blob.core.windows.net/[container]?SAS

    Steps to run the performance benchmark test:

    1. Download azcopy

    2. Look for the Appliance Storage Account in the Resource Group. The Storage Account has a name that resembles migratelsa**********. This is the value of the [account] parameter in the above command.

    3. Search for your storage account in the Azure portal. Ensure that the subscription you use to search is the same subscription (target subscription) in which the storage account is created. Go to Containers in the Blob Service section. Click on +Container and create a Container. Leave Public Access Level at the default selected value.

    4. Go to Shared Access Signature under Settings. Select Container in "Allowed Resource Types." Click on Generate SAS and connection string. Copy the SAS value.

    5. Execute the above command in Command Prompt by replacing account, container, and SAS with the values obtained in steps 2, 3, and 4 respectively.

    Alternatively, download the Azure Storage Explorer onto the appliance and attempt to upload 10 blobs of ~64 MB into the storage accounts. If there is no issue, the upload should be successful.

    Resolution: If this test fails, there's a networking issue. Engage your local networking team to check connectivity issues. Typically, there can be some firewall settings that are causing the failures.

  4. Connectivity issues between the Azure Migrate appliance and Azure Service Bus:

    This test will check whether the Azure Migrate appliance can communicate to the Azure Migrate Cloud Service backend. The appliance communicates to the service backend through Service Bus and Event Hub message queues. To validate connectivity from the appliance to the Service Bus, download the Service Bus Explorer, try to connect to the appliance Service Bus and perform send message/receive message. If there is no issue, this should be successful.

    Steps to run the test:

    1. Copy the connection string from the Service Bus that got created in the Resource Group corresponding to the Azure Migrate Project

    2. Open Service Bus Explorer

    3. Go to File > Connect

    4. Paste the connection string you copied in step 1, and click Connect

    5. This will open the Service Bus namespace.

    6. Select Snapshot Manager in the topic in the namespace. Right click on Snapshot Manager, select "Receive Messages" > select "peek", and click OK.

    If the connection is successful, you will see "[10] messages received" on the console output. If the connection is not successful, you'll see a message stating that the connection failed.

    Resolution: If this test fails, there's a connectivity issue between the Azure Migrate appliance and Service Bus. Engage your local networking team to check these connectivity issues. Typically, there can be some firewall settings that are causing the failures.

  5. Connectivity issues between the Azure Migrate appliance and Azure Key Vault:

    This test checks for connectivity issues between the Azure Migrate appliance and the Azure Key Vault. The Key Vault is used to manage Storage Account access used for replication.

    Steps to check connectivity:

    1. Fetch the Key Vault URI from the list of resources in the Resource Group corresponding to Azure Migrate Project.

    2. Open PowerShell in the Azure Migrate appliance and run the following command:

    Test-NetConnection <Key Vault URI> -Port 443

    This command will try a TCP connection and will return an output.

    • In the output, check the field "TcpTestSucceeded". If the value is "True", there is no connectivity issue between the Azure Migrate appliance and the Azure Key Vault. If the value is "False", there is a connectivity issue.

    Resolution: If this test fails, there's a connectivity issue between the Azure Migrate appliance and the Azure Key Vault. Engage your local networking team to check connectivity issues. Typically, there can be some firewall settings that are causing the failures.

Encountered an error while trying to fetch changed blocks

Error Message: 'Encountered an error while trying to fetch changed blocks'

The agentless replication method uses VMware's changed block tracking technology (CBT) to replicate data to Azure. CBT lets the Server Migration tool track and replicate only the blocks that have changed since the last replication cycle. This error occurs if changed block tracking for a replicating virtual machine is reset or if the changed block tracking file is corrupt.

This error can be resolved in the following two ways:

  • If you had opted for "Automatically repair replication" by selecting "Yes" when you triggered replication of the VM, the tool will try to repair it for you. Right click on the VM, and select "Repair Replication."
  • If you did not opt for "Automatically repair replication" or the above step did not work for you, then stop replication for the virtual machine, reset changed block tracking on the virtual machine, and then reconfigure replication.

One such known issue that may cause a CBT reset of a virtual machine on VMware vSphere 5.5 is described in VMware KB 1020128: Changed Block Tracking is reset after a storage vMotion operation in vSphere 5.x. If you are on VMware vSphere 5.5, ensure that you apply the updates described in this KB.

Alternatively, you can reset VMware changed block tracking on a virtual machine using VMware PowerCLI.
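
A minimal PowerCLI sketch of a CBT reset follows; it assumes the VMware.PowerCLI module, access to vCenter, and hypothetical server and VM names, and the change only takes effect after a subsequent power cycle or snapshot create/delete on the VM:

# Connect to vCenter and disable CBT through a reconfigure spec; re-enable it the same way afterwards.
Connect-VIServer -Server "vcenter.contoso.local"
$vm = Get-VM -Name "MyVM"
$spec = New-Object VMware.Vim.VirtualMachineConfigSpec
$spec.ChangeTrackingEnabled = $false
$vm.ExtensionData.ReconfigVM($spec)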

An internal error occurred

Sometimes you may hit an error that occurs due to issues in the VMware environment/API. We have identified the following set of errors as VMware environment-related errors. These errors have a fixed format.

Error Message: An internal error occurred. [Error message]

For example: Error Message: An internal error occurred. [An Invalid snapshot configuration was detected].

The following section lists some of the commonly seen VMware errors and how you can mitigate them.

Error Message: An internal error occurred. [Server Refused Connection]

The issue is a known VMware issue and occurs in VDDK 6.7. You need to stop the gateway service running in the Azure Migrate appliance, download an update from the VMware KB, and restart the gateway service.

Steps to stop the gateway service:

  1. Press Windows + R, open services.msc. Click on "Microsoft Azure Gateway Service", and stop it.
  2. Alternatively, you can open a command prompt or PowerShell and run: Net Stop asrgwy. Ensure you wait until you get the message that the service is no longer running.

Steps to start the gateway service:

  1. Press Windows + R, open services.msc. Right click on "Microsoft Azure Gateway Service", and start it.
  2. Alternatively, you can open a command prompt or PowerShell and run: Net Start asrgwy.

Error Message: An internal error occurred. ['An Invalid snapshot configuration was detected.']

If you have a virtual machine with multiple disks, you may encounter this error if you remove a disk from the virtual machine. To remediate this problem, refer to the steps in this VMware article.

Error Message: An internal error occurred. [Generate Snapshot Hung]

This issue occurs when snapshot generation stops responding. When this issue occurs, you can see the create snapshot task stop at 95% or 99%. Refer to this VMware KB to overcome this issue.

Error Message: An internal error occurred. [Failed to consolidate the disks on VM [Reasons]]

When we consolidate disks at the end of a replication cycle, the operation can fail. Follow the instructions in the VMware KB by selecting the appropriate Reason to resolve the issue.

The following errors occur when VMware snapshot-related operations (create, delete, or consolidate disks) fail. Follow the guidance in the next section to remediate the errors:

Error Message: An internal error occurred. [Another task is already in progress]

This issue occurs when there are conflicting virtual machine tasks running in the background, or when a task within the vCenter Server times out. Follow the resolution provided in the following VMware KB.

Error Message: An internal error occurred. [Operation not allowed in current state]

This issue occurs when vCenter Server management agents stop working. To resolve this issue, refer to the resolution in the following VMware KB.

Error Message: An internal error occurred. [Snapshot Disk size invalid]

This is a known VMware issue in which the disk size indicated by the snapshot becomes zero. Follow the resolution given in the VMware KB.

Error Message: An internal error occurred. [Memory allocation failed. Out of memory.]

This happens when the NFC host buffer is out of memory. To resolve this issue, you need to move the VM (compute vMotion) to a different host that has free resources.
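
The compute vMotion can be done from the vSphere client, or with a PowerCLI one-liner like the following sketch (the VM and host names are hypothetical placeholders; assumes the VMware.PowerCLI module and a vCenter connection):

# Migrate the VM's compute to another host with free resources.
Move-VM -VM (Get-VM -Name "MyVM") -Destination (Get-VMHost -Name "esxi-host-02.contoso.local")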

Error Message: An internal error occurred. [File is larger than the maximum file size supported (1012384)]

This happens when the file size is larger than the maximum supported file size while creating the snapshot. Follow the resolution given in the VMware KB.

Error Message: An internal error occurred. [Cannot connect to the host (1004109)]

This happens when ESXi hosts cannot connect to the network. Follow the resolution given in the VMware KB.

Error message: An error occurred while saving the snapshot: Invalid change tracker error code.

This error occurs when there's a problem with the underlying datastore on which the snapshot is being stored. Follow the resolution given in the VMware KB.

Error message: An error occurred while taking a snapshot: Unable to open the snapshot file.

This error occurs when the size of the snapshot file created is larger than the available free space in the datastore where the VM is located. Follow the resolution given in this document.

Replication cycle failed

Error ID: 181008

Error Message: VM: 'VMName'. Error: No disk snapshots were found for the snapshot replication with snapshot Id: 'SnapshotID'.

Possible Causes:

Possible reasons are:

  1. The path of one or more included disks changed due to Storage vMotion.
  2. One or more included disks is no longer attached to the VM.

Recommendation:

The following recommendations are provided:

  1. Restore the included disks to the original path using storage vMotion, and then disable storage vMotion. (A PowerCLI sketch for checking the current disk paths follows this list.)
  2. Disable Storage vMotion, if enabled, stop replication on the virtual machine, and replicate the virtual machine again. If the issue persists, contact support.
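
To verify the current datastore paths of a VM's disks, a PowerCLI sketch like the following can help (the VM name is a hypothetical placeholder; assumes the VMware.PowerCLI module and a vCenter connection):

# Show each virtual disk's name and its backing file path on the datastore.
Get-HardDisk -VM (Get-VM -Name "MyVM") | Select-Object Name, Filename, CapacityGB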

Next Steps

Continue VM replication, and perform test migration.