Application Sizing: It’s All About Your Assumptions

All of us in IT are familiar with the oft-used vendor answer, “It depends,” in response to questions about how big and how much when it comes to application and infrastructure sizing.  Of course, the companion questions to that stock answer could be “What are the requirements?” and “What are your assumptions?”  More often than not, it is the second question about assumptions, rather than the first about the actual hard requirements, that drives the solution design.  This exact scenario occurred yesterday when a customer asked me to design a Vblock based on minimal data.

In performing the sizing exercise, we came up with wildly divergent designs, all of which would meet the minimal requirements provided by the customer.  The divergence was a result of the differing assumptions that were made to fill in the gaps created by the absence of adequate data.  These are all the requirements that the customer initially provided:

  • Application: Microsoft SQL Server 2005
  • Capacity requirement: 4 TB
  • Performance requirement: 80,000 IOPS

As you can see, that is not much data to go on, and in fact we have gone back to the customer to request more data so we can successfully design the compute, network, virtualization, and storage layers.  However, we did have enough data to initially size out the storage component of a Vblock Series 300 HX, with an EMC VNX 7500 as the back-end.  I thought it would be educational to show the different design options and walk through the assumptions made to come up with these initial storage designs, particularly since storage sizing is no different for virtualized or bare-metal applications.

Two pieces of background information that you’ll need to keep in mind:

  • I use the following formula to calculate storage IOPS capabilities for SQL Server.  It is the basis for the calculator I created for storage sizing:

Number of RAID Groups x (((Read Ratio x Disk Operations/Sec) + ((Write Ratio x Disk Operations/Sec)/Write Penalty)) x Quantity of Disks in RAID Group) = Storage IOPS

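For example, for 15K SAS drives (180 disk operations per second each) in a single RAID 10 (4+4) RAID group, with a write penalty of 2 and a 70/30 read/write mix: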

1 x (((70% x 180) + (30% x 180)/2)) x 8 = 1,224 IOPS
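
The formula above can be sketched as a small Python function. This is a minimal illustration, not the calculator mentioned in this post; the function and parameter names are assumptions chosen for readability, and the write ratio is assumed to be whatever remains after the read ratio.

```python
def storage_iops(raid_groups, disks_per_group, disk_iops, read_ratio, write_penalty):
    """Estimate storage IOPS using the formula from the post.

    All names are illustrative. read_ratio is a fraction (0.70 for 70% reads),
    and the write ratio is assumed to be the remainder.
    """
    write_ratio = 1.0 - read_ratio
    # Writes cost more back-end operations, so they are divided by the penalty.
    per_disk = read_ratio * disk_iops + (write_ratio * disk_iops) / write_penalty
    return raid_groups * per_disk * disks_per_group


# Worked example from the text: one RAID 10 (4+4) group of 15K SAS drives.
example = storage_iops(raid_groups=1, disks_per_group=8, disk_iops=180,
                       read_ratio=0.70, write_penalty=2)
print(round(example))  # 1224, matching the worked example
```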


  • The Vblock leverages FAST technology in the EMC storage subsystem to automate block-level tiering across multiple drive types and tiers (FAST VP) and to use flash drives as a secondary cache for the front-end storage processors (FAST Cache).  The “FAST Effect” was factored into the different storage designs.


Here are the 4 design options, based on the requirements above:

[Image: SQL Options]

Let’s take a look at each design, focusing specifically on performance sizing since the capacity requirements were low and we were clearly sizing for performance.  First though, here are the assumptions that were made regarding the application workload profile:

  • 8 KB block size (note that disk operations per second vary depending on the block size)
  • Random I/O workload
  • 70% reads
  • 30% writes
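
Because these workload figures are assumptions rather than measurements, it can help to see how sensitive the result is to them. The sketch below applies the per-disk part of the formula to a few read ratios. The 180 disk operations per second and the write penalty of 2 come from the RAID 10 example earlier; the alternative read ratios (50% and 90%) and the function name are hypothetical, chosen only for illustration.

```python
def effective_disk_iops(disk_iops, read_ratio, write_penalty):
    """Per-disk IOPS after accounting for the RAID write penalty."""
    write_ratio = 1.0 - read_ratio
    return read_ratio * disk_iops + (write_ratio * disk_iops) / write_penalty


# 0.70 is the assumption used in this post; 0.50 and 0.90 are hypothetical.
for read_ratio in (0.50, 0.70, 0.90):
    iops = effective_disk_iops(disk_iops=180, read_ratio=read_ratio, write_penalty=2)
    print(f"{read_ratio:.0%} reads: about {iops:.0f} IOPS per disk")
```

Under these inputs, moving from 50% to 90% reads raises the per-disk figure from about 135 to about 171 IOPS, which is enough to change the drive count a design needs.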

Option 1


This is the most intriguing option since it is an all-flash solution.  The data set is so small that it fits on a relatively small number of flash drives, which together can satisfy the 80,000 IOPS requirement.  In this scenario, FAST VP is not needed since there is only one tier of drives, and FAST Cache is not needed since the entire database is already hosted on the highest-performing tier.  Some may wonder why the VNX 7500, which can house up to 960 drives, would be used in options 1, 3, and 4.  The reason is performance: the more powerful storage processors in the VNX 7500 would be needed to deliver the required 80,000+ IOPS, with some room to spare.

Option 2


This is a more traditional option that may be chosen by a customer who is not comfortable placing their data on flash media.  Although the entire database is on SAS drives, this solution assumes that the application is not cache-friendly and that FAST Cache will not be needed.  This option may also be chosen if a customer is conservative and wants to design for a worst-case scenario where caching is not available or has little to no impact.  For this option, the VNX 7500 would be chosen not only for performance but also because it is the only VNX system that would support more than 500 drives.
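
As a rough check on why an all-SAS design lands in this range, the sizing formula can be inverted to estimate a drive count. This sketch assumes the same figures as the earlier worked example (15K SAS at 180 disk operations per second, RAID 10 (4+4), 70% reads) and no help from cache. The actual design may have used different inputs, and the function name is hypothetical.

```python
import math


def raid_groups_needed(target_iops, disks_per_group, disk_iops, read_ratio, write_penalty):
    """Smallest whole number of RAID groups whose estimated IOPS meets the target."""
    write_ratio = 1.0 - read_ratio
    per_disk = read_ratio * disk_iops + (write_ratio * disk_iops) / write_penalty
    per_group = per_disk * disks_per_group
    return math.ceil(target_iops / per_group)


# Inputs mirror the worked example; they are assumptions, not measured values.
groups = raid_groups_needed(target_iops=80_000, disks_per_group=8,
                            disk_iops=180, read_ratio=0.70, write_penalty=2)
print(groups, "RAID groups,", groups * 8, "drives")
```

With these inputs the estimate comes to 66 RAID groups, or 528 drives before hot spares, which is consistent with needing a system that supports more than 500 drives.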

Option 3


This option attempts to use a mix of flash and SAS drives to meet the stated IOPS requirements.  FAST VP would be utilized with the assumption that while the bulk of the data set is housed on SAS, frequently accessed data will be migrated to the flash tier over time.  As in the previous option, it is assumed that FAST Cache will have little impact.

Option 4


The last option offers a mix of drives from all tiers and includes a healthy quantity of flash for FAST Cache.  The assumption here is that this is an extremely cache-friendly application.  FAST VP will migrate frequently accessed data to the flash tier over time so that FAST Cache can be used for other data that is “hot” at any given moment.

As mentioned earlier, we are asking the customer for more data and more exact requirements.  However, I hope this provides some insight into how sizing may be done on EMC storage and, more generally, into the role of assumptions in any application sizing methodology.

Looking forward to writing more in the future; see you all on the other side of 2013.
