With a stated mission of being “the most customer-centric company on earth,” Amazon.com faces the big job of ensuring an exceptional experience for its tens of millions of customers worldwide. To do this, Amazon.com’s Client Experience Analytics (CXA) team runs customer simulations against Amazon’s global web properties on an ongoing basis. These simulations help the team measure website latency across the globe, identify trends or issues, simulate website activity, and more. They run on a massive scale to mimic the 98 million active customer accounts across more than 10 web properties, and as a result they produce a lot of data.
Until recently, the CXA team used a traditional MySQL relational database management system (RDBMS) to store and query the data. But the data set was growing by 12-15 GB each day, which required the team to regularly upgrade to beefier, more expensive hardware. Even for the team’s MySQL-savvy staff, each upgrade took a significant time investment to provision the hardware and to set up and configure the software, not to mention the ongoing admin hours required to run regular backups and keep the software up to date with the latest patches.