Discussion in 'OT Technology' started by Peyomp, Dec 24, 2008.
Just to do it.
What properties would a 40 EBS volume RAID 0 have?
Curious? I am. I think I'm gonna try it, but with 100 meg volumes, and then put some files on there, and see what it's like with things like grep -R.
If it's interesting, I'll write it up and call it: "The world's fastest idiot: benchmarking inefficient operations on large EBS stripes"
Would 4 drives simply saturate a gigabit Ethernet link for sequential reads, or would ANY operation possibly benefit from a large striped array? Keep in mind this is happening in a single data center (ideally) in Amazon's cloud.
Some of you know rather a lot about network storage. Talk to me.
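For what it's worth, here is a back-of-envelope sketch of the gigabit question in Python. It assumes striped sequential reads scale linearly per volume until the network link caps them. The 40 MB/s per-volume figure is a made-up placeholder, not a measured EBS number, and the function names are just for illustration.

```python
import math

# Theoretical gigabit Ethernet payload rate, ignoring protocol overhead.
GIGABIT_LINK_MB_S = 1000 / 8  # 125 MB/s

def striped_read_ceiling(num_volumes, per_volume_mb_s, link_mb_s=GIGABIT_LINK_MB_S):
    """Best-case sequential read rate of a RAID 0 stripe behind one link.

    Assumes perfect linear scaling across volumes, capped by the link.
    """
    return min(num_volumes * per_volume_mb_s, link_mb_s)

def volumes_to_saturate(per_volume_mb_s, link_mb_s=GIGABIT_LINK_MB_S):
    """Smallest number of volumes whose combined rate reaches the link cap."""
    return math.ceil(link_mb_s / per_volume_mb_s)

if __name__ == "__main__":
    assumed_rate = 40  # MB/s per volume: hypothetical, swap in a measured value
    for n in (1, 2, 4, 40):
        print(n, "volumes ->", striped_read_ceiling(n, assumed_rate), "MB/s ceiling")
    print("volumes needed to fill the link:", volumes_to_saturate(assumed_rate))
```

If the real per-volume rate is anywhere near that placeholder, a handful of volumes fills the link for sequential reads, and anything past that would only help with seek-heavy or small random I/O, if it helps at all.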
I'm gonna do some SERIOUS data mining on the milfhunter collection.
Excellent! I'm sure you'll come across some very interesting data.
What interests me isn't the 40TB part. What interests me is the 40 EBS volumes part.
What if each EBS volume is only 100MB...?
Keeping in mind that they probably use 1TB SATA disks though... so smaller volumes could result in more than one per disk.
What is EBS?
Amazon Elastic Block Store. You can dynamically allocate volumes of up to 1TB each, ad infinitum, for your EC2 instances.
I am thinking I have a real use case for this. We have a large log of data in MySQL that spans many days and takes up about a gig a day. I'm going to split it up into partitions of 365 days. In practice, I'll only work with say... 10 days of data. So, I'll make a RAID 0 EBS drive out of several volumes for each partition, and see how different setups affect it.
how are these volumes made accessible to the host who is directly accessing the storage? do they provide any details of whether it is somehow directly attached over a fiber SAN or is it network attached SAN like iSCSI or is it more like a shared volume a la NFS or CIFS (or some proprietary mix of those)? i'm just wondering because striping generally gives you good performance, unless of course it is possible that a really slow storage source gets into the mix. for example, is it possible that it is all network attached and it is possible that a storage source from a different datacenter could get in the mix screwing up all of your IO?
hmm... i think i would rather have a bunch of tests with a single sized volume striped across increasingly smaller EBS volumes to limit the chance that the filesystem itself was affecting the tests.
Yeah, but with smaller EBS volumes you might get two on the same disk... then again, who knows if they even allocate whole disks?
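The test plan suggested above (one fixed array size, striped across more and smaller volumes) could be enumerated with something like this Python sketch. The 4000 MB total matches the 40 x 100 MB idea from earlier in the thread. The list of volume counts and the function name are arbitrary choices for illustration.

```python
def stripe_layouts(total_mb, volume_counts):
    """Return (volume_count, mb_per_volume) pairs for a fixed total array size.

    Counts that do not divide the total evenly are skipped, so every layout
    yields exactly the same usable size and the filesystem stays comparable.
    """
    layouts = []
    for n in volume_counts:
        if total_mb % n != 0:
            continue  # uneven split would change the array size
        layouts.append((n, total_mb // n))
    return layouts

if __name__ == "__main__":
    # 4000 MB total = 40 volumes x 100 MB at the far end of the sweep.
    for count, size in stripe_layouts(4000, [1, 2, 4, 8, 10, 20, 40]):
        print(f"{count:>2} volumes x {size} MB")
```

Running the same benchmark against each layout keeps the total size constant, so differences should come from the stripe width. It can't tell you whether two small volumes landed on the same physical disk, though.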