Demonstrating distributed set processing performance
Shard-Query + ICE scales very well up to at least 20 nodes
This post is a detailed performance analysis of what I’ve coined “distributed set processing”.
Please also read this post’s “sister post” which describes the distributed set processing technique.
Also, remember that Percona can help you get up and running with these tools in no time flat. We are the only ones with hands-on experience using them.
Twenty is the maximum number of nodes that I am allowed to create in EC2. I can test further on our 32-core system, but I wanted to run a real-world “cloud” test to show that this works over the network. There is a slight performance oddity at 16 nodes; I suspect the EC2 environment is the cause, but it requires further investigation.
Next I compared the performance of the 20 m1.large machines cold versus warm. I did not record the cold results on the c1.medium machines, so only the warm results are provided for reference. Remember that the raw input data set was 55GB before being converted to a star schema (21GB) and compressed by ICE to 2.5GB. Many of these queries examine the entire data set, computing origin/count(distinct destination) combinations across two dimensions (origin and dest), each with 400 unique values.
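To give a sense of the query shape, here is a minimal sketch in the same style as the benchmark scripts shown later in this post. The file name and the table/column names (ontime, origin, dest) are illustrative assumptions and may not match the actual schema:

$ cat example_query.sql
-- hypothetical example of an origin/count(distinct destination) query
SELECT origin, COUNT(DISTINCT dest) AS dest_count
FROM ontime
GROUP BY origin;
$ ./run_query < example_query.sql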
In the following chart, performance at a single node is shown as the tall blue bar and 20 nodes as the short cyan bar. To avoid putting too many bars on the chart, response times between 2 and 16 nodes (inclusive) are shown as lines.
This chart shows the same data in another way:
Concurrency testing is important too. I tested a 20-node m1.large system at 1, 2, 4, 8, 16, and 32 threads of concurrency.
The following simple bash scripts were used for the concurrency test:
$ cat start_bench
#!/bin/bash
# first argument: number of concurrent workers (default 1)
if [ "$1" = "" ]
then
  workers=1
else
  workers=$1
fi
# second argument: iterations per worker (default 3)
if [ "$2" = "" ]
then
  iterations=3
else
  iterations=$2
fi
for i in `seq 1 $workers`
do
  echo "Launching benchmark worker #$i with $iterations iterations"
  ./bench_worker $iterations $workers &
done
wait
$ cat bench_worker
#!/bin/bash
# $1 = iterations, $2 = worker count (used to group result files)
for i in `seq 1 $1`
do
  mkdir -p "results/$2/"
  ./run_query < queries.sql | egrep "rows|^-" > results/$2/raw.$$.$i.txt
done
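For reference, a typical invocation for the 32-thread run would look like this (a sketch based on the scripts above; the result paths follow from bench_worker):

# launch 32 concurrent workers, 3 iterations each
$ ./start_bench 32 3
# raw filtered output lands in results/32/, one file per worker per iteration
$ ls results/32/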
Query processing is handled by a Gearman queue, which limits the maximum number of concurrent storage-node queries. This prevents the system from being overloaded and keeps the average response time scaling predictably as concurrency increases. Another queue in front of the storage nodes is probably advisable as well.
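As a rough sketch of how a fixed Gearman worker pool caps concurrency: the job server invocation below is standard gearmand, but the worker script name and pool size are placeholders, not the actual Shard-Query configuration.

# start the Gearman job server as a daemon (default port 4730)
$ gearmand -d
# register a fixed pool of 8 workers; with only 8 workers available,
# at most 8 storage-node queries run at once and the rest wait in the queue
$ for i in `seq 1 8`; do php shard_query_worker.php & done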