• 1 Post
  • 126 Comments
Joined 1 year ago
Cake day: August 28th, 2023

  • That’s not really… possible at this point. We have thousands of customers (some very large ones, like A——n and G—-e and Wal___t) with tens or hundreds of millions of users, and even at the lowest-traffic periods we do 60k+ queries per second.

    This is the same MySQL instance I wrote about a while ago that hit the 16TiB table size limit (due to ext4 file system limitations) and caused a massive outage; the worst I’ve been involved in during my 26-year career.

    Every day I am shocked at our scale, considering my company is only like 90 engineers.

  • INC-224, never forget.

    I am an infra engineer at a fairly large-scale SaaS company (not Amazon-scale, but we have some BIG customers); despite our scale, we are only about 250 people, and of those only about 90 are engineers. We store a bunch of data in MySQL.

    15:30:00, I get a page: “MySQL table is full.” I immediately know my day is ruined; I’ve never heard of this error before, but I know it ain’t great.

    15:30:10, every PagerDuty escalation policy in the entire company gets bombarded with pages.

    I look at the database instance. The table size is “only” 16TiB, so it’s a bit confusing.

    We are hard down for several hours as we scramble to delete data or somehow free up space. Turns out, Google backs Cloud SQL MySQL instances with ext4 disks instead of ZFS, and the max file size on ext4 is… you guessed it, 16TiB. (A sketch of the kind of size check that would have caught this is at the end of this comment.)

    We learned a LOT of lessons from this, and are now offloading a shitload of JSON into either MongoDB or GCS, depending on the requirements. The largest table is down to 3TiB now :D
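
    For anyone who wants to keep an eye on this themselves, here’s a minimal sketch of the kind of table-size check that would have flagged it early. It only reads information_schema; the hostname, credentials, and 80% warning threshold are made-up placeholders, and it assumes a standard InnoDB setup with innodb_file_per_table=ON so each table maps to roughly one .ibd file on disk.

    ```python
    # Minimal sketch, not our real tooling: warn when an InnoDB table's on-disk
    # footprint creeps toward ext4's 16 TiB max file size. Host, credentials, and
    # the 80% threshold are placeholders.
    import pymysql

    EXT4_MAX_FILE_BYTES = 16 * 1024**4   # 16 TiB, ext4's max file size
    WARN_RATIO = 0.80                     # page someone well before the wall

    conn = pymysql.connect(host="mysql.internal.example", user="monitor", password="...")
    try:
        with conn.cursor() as cur:
            # DATA_LENGTH + INDEX_LENGTH is a rough proxy for the .ibd file size
            # when innodb_file_per_table is ON.
            cur.execute("""
                SELECT TABLE_SCHEMA, TABLE_NAME, DATA_LENGTH + INDEX_LENGTH AS approx_bytes
                FROM information_schema.TABLES
                WHERE ENGINE = 'InnoDB'
                ORDER BY approx_bytes DESC
                LIMIT 20
            """)
            for schema, table, approx_bytes in cur.fetchall():
                if approx_bytes and approx_bytes >= WARN_RATIO * EXT4_MAX_FILE_BYTES:
                    pct = approx_bytes / EXT4_MAX_FILE_BYTES
                    print(f"WARNING: {schema}.{table} is at {pct:.0%} of the ext4 16 TiB limit")
    finally:
        conn.close()
    ```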