Media Agent Networking

I get a lot of questions about the best way to configure networking for backup media agents or media servers in order to get the best throughput. I thought a discussion of how the networking (and link aggregation) works would help shed some light.

Client to Media Agent:
In general we consider the media agents to be the ‘sink’ for data flows during backup from clients. This data flow typically originates from many clients destined for a single media agent. Environments with multiple media agents can be thought of as multiple single-agent configurations.

The nature of this is that we have many flows from many sources destined for a single sink. If we want to utilize multiple network interfaces on the sink (media agent), it is important that the switch to which it is attached be able to distribute the data across those interfaces. By definition, then, we must be in a switch-assisted network link aggregation scenario, meaning that the switch must be configured to use LACP or a similar protocol, and the server must be configured to use the same teaming method.

Why can’t we use adaptive load balancing (ALB) or other non-switch-assisted methods? The issue is that the decision of which member of a link aggregation group a packet is transmitted over is made by the device transmitting the packet. In the scenario above the bulk of the data is being transmitted from the switch to the media agent, so the switch must be configured to spread the traffic across multiple physical ports. ALB and other non-switch-assisted aggregation methods do not allow the switch to do this and will therefore result in the switch using only one member of the aggregation group to send data. The net result is that total throughput is restricted to that of a single link.

So, if you want to bond multiple 1GbE interfaces to support traffic from your clients to the media agent, the use of LACP or similar switch-assisted link aggregation is critical.
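
To make the idea concrete, here is a minimal Python sketch of how a switch-assisted aggregation group chooses an egress member. The hash below is purely illustrative (real switches use vendor-specific layer 2/3/4 hash policies), and all MAC addresses are hypothetical:

    import hashlib

    NUM_LINKS = 4  # e.g. four bonded 1GbE ports facing the media agent

    def pick_member(src_mac, dst_mac):
        # Hash the (source, destination) pair and map it onto a group member.
        digest = hashlib.md5(f"{src_mac}->{dst_mac}".encode()).digest()
        return digest[0] % NUM_LINKS

    media_agent = "00:1a:4b:00:00:99"  # hypothetical MA MAC address
    clients = [f"00:1a:4b:00:00:{i:02x}" for i in range(1, 13)]

    # Many clients -> one media agent: the switch sees many distinct source
    # MACs, so the backup flows spread across all four group members.
    for c in clients:
        print(f"client {c} -> member {pick_member(c, media_agent)}")

Because each client contributes a different source address, the hash distributes the flows across the group; with ALB the switch never gets to make this choice at all.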

Media Agent to IP Storage:
Now, from the media agent to storage, we consider that most traffic will originate at the media agent and be destined for the storage. There is not much in the way of many-to-one or one-to-many relationships here; it’s all one-to-one. The first question is always “will LACP or ALB help?” The answer is probably no. Why is that?

First, understand that the media agent is typically connected to a switch, and the storage is typically attached to the same or another switch. Therefore we have two hops to address: MA to switch, and switch to storage.

ALB does a very nice job of spreading transmitted packets from the MA to the switch across multiple physical ports. Unfortunately, all of these packets are destined for the same IP and MAC address (the storage). So while the packets are received by the switch on multiple physical ports, they are all going to the same destination and thus leave the switch on the same port. If the MA is attached via 1GbE and the storage via 10GbE this may be fine. If it’s 1GbE down to the storage, then bandwidth will be limited to that single link.

But didn’t I just say in the client section that LACP (switch-assisted aggregation) would address this? Yes and no. LACP can spread traffic across multiple links even if it has the same destination, but only if it comes from multiple sources. The reason is that LACP uses an IP- or MAC-based hash algorithm to decide which member of an aggregation group a packet should be transmitted on. That means all packets originating from MAC address X and going to MAC address Y will always go down the same group member. The same is true for source IP X and destination IP Y. So while LACP may help balance traffic from multiple hosts going to the same storage, it can’t solve the problem of a single host going to a single storage target.
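
The same illustrative hash from the sketch above makes the pinning problem obvious, using hypothetical addresses for a single MA/storage pair:

    import hashlib

    NUM_LINKS = 4

    def pick_member(src, dst):
        # Deterministic hash of the (source, destination) pair -- the property
        # shared by MAC-based and IP-based LACP hash policies alike.
        return hashlib.md5(f"{src}->{dst}".encode()).digest()[0] % NUM_LINKS

    # One media agent talking to one storage target: every packet carries the
    # same source/destination pair, so every packet maps to the same member.
    for _ in range(5):
        print(pick_member("10.0.0.10", "10.0.0.50"))  # same index every time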

By the way, this is a big part of the reason we don’t see many iSCSI storage vendors using a single IP for their arrays. By giving the arrays multiple IPs it becomes possible to spread the network traffic across multiple physical switch ports and network ports on the array. Combine that with multiple IPs on the media agent host and multipath I/O (MPIO) software, and now the host can talk to the array across all combinations of source and destination IPs (and thus physical ports) and fully utilize all the available bandwidth.
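
Extending the sketch, two hypothetical IPs on each end yield four distinct source/destination pairs, which is exactly the property MPIO exploits:

    import hashlib
    from itertools import product

    NUM_LINKS = 4

    def pick_member(src_ip, dst_ip):
        return hashlib.md5(f"{src_ip}->{dst_ip}".encode()).digest()[0] % NUM_LINKS

    host_ips = ["10.0.1.10", "10.0.1.11"]    # two NICs on the media agent
    array_ips = ["10.0.1.50", "10.0.1.51"]   # two iSCSI portals on the array

    # Four distinct src/dst pairs -> the hash can place the four iSCSI
    # sessions on different physical links, and MPIO stripes I/O across them.
    for src, dst in product(host_ips, array_ips):
        print(f"{src} -> {dst}: member {pick_member(src, dst)}")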

MPIO works great for iSCSI block storage. What about CIFS (or NFS) based storage? Unfortunately, MPIO sits low in the storage stack and isn’t part of the network file (requester) stack used by CIFS and NFS, which means MPIO can’t help. Worse, with the NFS and CIFS protocols the target storage is always defined by a single IP address or DNS name, so having multiple IPs on the array in and of itself doesn’t help either.

So what can we do for CIFS (or NFS)? Well, if you create multiple share points (shares) on the storage and bind each to a separate IP address, you create a situation where each share has isolated bandwidth, and by accessing the shares in parallel you can aggregate that bandwidth (between the switch and the storage). To aggregate between the host and the switch you must force traffic to originate from specific IPs, or use LACP to spread the traffic across multiple host interfaces. You could simulate MPIO-type behavior by using routing tables to map a host IP to an array IP one-to-one, as in the sketch below. It can be done, but there is no ‘easy’ button.
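
A rough sketch of that one-to-one mapping idea, with hypothetical share names and addresses:

    # Each share lives on its own array IP, and a specific host IP is forced
    # as the source, giving MPIO-like one-to-one path mapping for CIFS.
    SHARE_PATHS = {
        r"\\10.0.1.50\backup1": ("10.0.1.10", "10.0.1.50"),
        r"\\10.0.1.51\backup2": ("10.0.1.11", "10.0.1.51"),
    }

    for share, (src_ip, dst_ip) in SHARE_PATHS.items():
        # In practice the mapping is enforced with per-destination routes,
        # e.g. a host route to dst_ip bound to the interface holding src_ip.
        print(f"{share}: traffic pinned to {src_ip} -> {dst_ip}")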

So as we wrap this up, what do I recommend for media agent networking? And IP storage?
On the front end – aggregate interfaces with LACP.
On the back end – use iSCSI and MPIO rather than CIFS/NFS, or use 10GbE if you want/need CIFS/NFS.

Asigra Linux Restore with ‘sudo’

Conduct an Asigra restore to a UNIX or Linux server using sudo credentials

Verify that user is listed in /etc/sudoers file on restore target system

media_1373309867237.png

The sudo utility allows users root-level access to some (or all) subsystems without requiring them to know the root password. Please see the documentation for the sudo utility for more information.
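
For reference, a minimal /etc/sudoers entry for a hypothetical ‘asigra’ restore account might look like the following; always edit the file with visudo, and tighten the command list to match your security policy:

    # grant the 'asigra' user (hypothetical) the ability to run
    # any command as any user, from any host, via sudo
    asigra    ALL=(ALL)    ALL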

From Asigra restore dialog, choose files to be restored

media_1373309875752.png

Select Alternate location and click on ‘>>’

media_1373310026490.png

Enter server name or IP address for restore target and check both “Ask for credentials” and “‘sudo’ as alternate user”

media_1373310083088.png

Enter username and password for user configured in /etc/sudoers file

media_1373309230681.png

Enter “root” and same password as in previous step

media_1373309513645.png

Do NOT enter the ‘root’ password. The sudo utility uses the regular user’s password.

Select restore location and truncate path, if required

media_1373309543033.png
media_1373309558317.png

Accept defaults

media_1373309569710.png

Restore in progress…

media_1373310480356.png

Verify restore completed

media_1373310743301.png

Using Asigra DS-Client Logs

How to understand backup operations using the DS-Client logs

For Lewan Managed Data Protection customers wanting additional information beyond what is available in the daily or weekly reports, the Asigra software provides the ability to look at the DS-Client activity logs. This post assumes that the user has installed or been given access to the DS-User interface and is able to connect to their DS-Client server.
A previous blog post (http://blog.lewan.com/2012/03/29/asigra-ds-user-installation-and-log-file-viewing/) addressed the installation of the DS-User along with some basics on the activity logs. This post will provide additional detail regarding the data provided by the activity logs.

Open the DS-User interface and connect to the appropriate DS-Client

media_1370017433779.png

From the menus select “Logs” and open the “Activity Log”

media_1370017447703.png

Set the parameters for logs desired

media_1370017475203.png

By default the system will display all logs for the current and previous days. For this exercise only backup activity will be required. The date and time range, as well as specific nodes (backup clients) or backup sets, can also be selected.
Once all options have been set, click the “Find” button to locate the specified logs.

Backup windows

media_1370017539988.png

For each set backed up, the start time, end time and total duration of the backup job can be observed. Each column can be sorted to assist in viewing.

Online Data Changed

media_1370018486463.png

The column labeled “Online” indicates the total size of the changed files for the backup, that is, the total amount of space used by all files which had any change since the last backup session. For example, a server with a 30 GB database which has daily updates and 4 new 1 MB documents would show 32,216,449,024 (30 GB + 4 MB). This is the amount of data copied from the backup client to the DS-Client.

Data Transmitted to the cloud

media_1370018516767.png

The column labeled “Transmitted…” shows the actual amount of changed data copied to the cloud-based device. This is the amount of data contained in the changed blocks of all changed files, after compression and encryption. If, in the example above, the database file only had 1 MB of changes, the Transmitted column would contain a number similar to 5,242,880 (roughly 5 MB).
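
For those who want to check the math, the two example figures work out as follows (binary units assumed):

    # Arithmetic behind the example figures above
    GB = 2**30
    MB = 2**20

    online = 30 * GB + 4 * MB       # 30 GB changed database + four new 1 MB files
    transmitted = 1 * MB + 4 * MB   # 1 MB of changed blocks + the four new files
                                    # (before compression, which would shrink it)

    print(online)       # 32216449024 -> the 32,216,449,024 shown above
    print(transmitted)  # 5242880     -> roughly 5 MB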

Determining error and warning causes

media_1370019239770.png

In some cases a backup set will show a status of “Completed with Errors” or “Completed with Warnings”. In most cases the errors and warnings are inconsequential, but they should still be reviewed.
Select the line containing the backup set in question and click the “Event Log” button.

Backup Session Event Log

media_1370018398623.png

Each event in the backup session is listed in the log. Errors are flagged with a red ‘X’ and warnings with a yellow ‘!’. Selecting an event will show its detail. In the example shown above, the log shows an error for a file that the backup user does not have permission to read. Other common errors are caused by a file being in use by another process, or by a file that was moved or deleted between the initial scan of the file system and the attempt to access it for backup.
In some cases there will be a large number of “The network name cannot be found.” errors. These usually indicate a problem with the network connection between the DS-Client and the backup target, but they could also be caused by a reboot of the backup target or other connectivity issues.

For our Managed Data Protection customers, the Lewan Operations team checks backup sets for errors on a daily basis and will correct any critical issues.

Additional analysis

media_1370020250336.png

The activity log can also be saved to a file (text or Excel spreadsheet) for additional analysis. Right-click anywhere in the activity log and select “Save As”. Use the resulting dialog to configure the location and file type.

Lewan Achieves Veeam Gold/Platinum Partner Status

Our Enterprise Solutions team has been hard at work in the lab, training to become solutions experts for Veeam Data Protection and Backup tools. Their hard work, dedication and opportunity to train with Veeam’s technical team has earned Lewan the recognition of Gold Partner status in the Veeam ProPartner Program. Congratulations!

View all of Lewan’s Vendor Partners: http://www.lewan.com/vendorpartners