The AWS Love/Hate Relationship with Data Gravity


I received the e-mail above from Amazon Web Services after recently signing up for another test account. The e-mail had me thinking about the impact of data gravity on AWS, both positively and negatively. For those who are new to the term, data gravity is a concept coined by Dave McCrory, current CTO of Basho. It refers to the idea that “As Data accumulates (builds mass) there is a greater likelihood that additional Services and Applications will be attracted to this data.” McCrory attributes this data gravity phenomenon to “Latency and Throughput, which act as the accelerators in continuing a stronger and stronger reliance or pull on each other.” This is so because the closer services and applications are to their data, i.e., in the same physical facility, the lower the latency and the higher the throughput. This in turn enables more useful and reliable services and applications.


A second characteristic of data gravity is that the more data is accumulated, the more difficult it is to move that data. That’s the reason services and applications tend to coalesce around data. The further you try to move data and the more data you try to move, the harder it is to do because latency increases and throughput decreases. This is known as the “speed of light problem.” Practically, this means that at a certain capacity, it becomes extremely difficult or too costly to move data to another facility, such as a cloud provider.
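To make the scale of the problem concrete, here is a rough back-of-the-envelope sketch in Python. The function name, the 1 Gbps link, and the 80% sustained utilization figure are my own assumptions for illustration, not measurements from any real migration, and the model ignores latency, protocol overhead, and retries.

```python
def transfer_days(data_terabytes: float, link_gbps: float, utilization: float = 0.8) -> float:
    """Estimate the days needed to push a data set over a network link.

    Assumes decimal units (1 TB = 10**12 bytes, 1 Gbps = 10**9 bits/s) and a
    link that sustains `utilization` of its nominal rate (an assumed figure).
    """
    bits_to_move = data_terabytes * 1e12 * 8
    effective_bits_per_second = link_gbps * 1e9 * utilization
    seconds = bits_to_move / effective_bits_per_second
    return seconds / 86_400  # seconds per day


if __name__ == "__main__":
    # Transfer time grows linearly with the amount of data accumulated.
    for size_tb in (10, 100, 1_000, 10_000):
        days = transfer_days(size_tb, link_gbps=1.0)
        print(f"{size_tb:>6} TB over a 1 Gbps link: {days:8.1f} days")
```

Under these assumptions, a petabyte takes roughly 116 days to move over a 1 Gbps link, which is why, past a certain size, shipping the data physically starts to look more attractive than sending it over the wire.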

Data gravity, therefore, represents both a challenge and an opportunity for Amazon Web Services. Given that the vast majority of data today lives outside of AWS data centers and has been accumulating for some time in locations such as customer data centers, data gravity becomes a major challenge for AWS adoption by established enterprises. This of course is something AWS must overcome to continue their growth beyond startups and niche workloads in enterprises. If AWS is able to remove the barriers to migrating data into their facilities, they can then turn data gravity into an advantage and an opportunity.

The opportunity that data gravity affords AWS is to continue and to extend their dominance as a cloud provider. As users store more data within AWS services such as S3 and EBS, data gravity kicks in and users often find it easier and more efficient to use additional AWS services to leverage that data more fully. This creates, for Amazon, a “virtuous cycle” where data gravity opens up opportunities for more AWS services to be used, which generates more data, which then opens up more services to be consumed.

Data gravity, and the need both to overcome and to utilize it, is the reason so many AWS services are focused on data and how it can be more easily moved to AWS or more fully leveraged to produce additional value for customers. Take a look below at some of the many services that are particularly designed to attenuate or to accentuate data gravity.

Service                     Description
Athena                      Query service for analyzing S3 data
Aurora                      MySQL-compatible, high-performance relational database
CloudFront                  Global content delivery network to accelerate content delivery to users
Data Pipeline               Orchestration service for reliably processing and moving data between compute and storage services
Database Migration Service  Migrates on-premises relational databases to Amazon RDS
DynamoDB                    Managed NoSQL database service
Elastic Block Store         Persistent block storage volumes attached to EC2 instances
Elastic File System         Scalable file storage that can be mounted by EC2 instances
Elastic MapReduce           Managed Hadoop framework for processing large-scale data
Glacier                     Low-cost storage for data archival and long-term backups
Glue                        Managed ETL service for moving data between data stores
Kinesis                     Service for loading and analyzing streaming data
QuickSight                  Managed business analytics service
RDS                         Managed relational database service
Redshift                    Petabyte-scale managed data warehouse service
S3                          Scalable and durable object storage for storing unstructured files
Snowball                    Petabyte-scale service using appliances to transfer data to and from AWS
Snowmobile                  Exabyte-scale service using shipping containers to transfer data to and from AWS
Storage Gateway             Virtual appliance providing hybrid storage between AWS and on-premises environments

So what are some takeaways as we consider AWS and its love/hate relationship with data gravity? Here are a few to consider:

  • If you are an enterprise that wants to migrate to AWS but is being held back by data gravity in your data center, expect that AWS will innovate beyond services like the Snowball and the Snowmobile to make migration of large data sets easier.
  • If you are a user who is “all-in” on AWS and has created or migrated all or most of your data on AWS, the good news is that you will continue to see an ever-growing number of services that will allow you to gain more value from that data.
  • If you are a user who is concerned about vendor/cloud provider lock-in, you need to consider carefully the benefits and consequences of creating or moving large amounts of data on AWS or using higher-level services such as RDS and Amazon Redshift. (As an aside, the subject of lock-in is probably worth a dedicated blog post since I believe that it is often misunderstood. In brief, each user should consider whether the benefits of being locked in may be greater than the perceived liability, e.g., what if lock-in potentially costs me $1 million but generates $2 million in revenues over the same time period? Opportunity cost is difficult to calculate and generally ignored in the ROI models I see.)
  • Finally, if you are an AWS partner or an individual who wants to work at AWS or an AWS partner, put some focus on storage, analytics, database, and data migration services (in addition to security and Lambda), since they are all strategic for Amazon in how it deals with the positive and negative impact of data gravity. This was in evidence at the most recent re:Invent conference, where much of the focus was placed on storage and database services such as EFS, Snowmobile, and Aurora.
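The lock-in aside in the third takeaway can be put into a quick calculation. This is only a sketch of the reasoning, not a real ROI model: the function names and the `portable_alternative_net` parameter (a stand-in for the opportunity cost, i.e. what a less locked-in option would have netted over the same period) are hypothetical and introduced here purely for illustration.

```python
def lock_in_net_value(revenue: float, lock_in_cost: float) -> float:
    """Net value of accepting lock-in: what it earns minus what it costs."""
    return revenue - lock_in_cost


def lock_in_is_worth_it(
    revenue: float,
    lock_in_cost: float,
    portable_alternative_net: float = 0.0,  # hypothetical opportunity-cost input
) -> bool:
    """True if lock-in nets more than the best portable alternative would."""
    return lock_in_net_value(revenue, lock_in_cost) > portable_alternative_net


if __name__ == "__main__":
    # The figures from the aside: $1 million in lock-in cost, $2 million in revenue.
    net = lock_in_net_value(revenue=2_000_000, lock_in_cost=1_000_000)
    print(f"Net value of lock-in: ${net:,.0f}")
    print("Worth it:", lock_in_is_worth_it(2_000_000, 1_000_000))
```

The hard part in practice is the third input: estimating what the portable alternative would have returned is exactly the opportunity cost that tends to be left out.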

Given its importance and critical impact, AWS observers should keep a careful eye on what Amazon will continue to do to both overcome and to leverage data gravity. It may very well dictate the future success of Amazon Web Services.


