Group 1 - Groups of scalability problems

Introduction

This group looks at how to categorize the different scalability challenges, typically by breaking scale down into dimensions such as:

data volume
data complexity
request volume
computational complexity
... and so forth

Group members

Tobias Torrissen
Kjetil Valstadsve
Johannes Brodwall
Emil Eifrem
Finn-Robert Kristensen

The result

Runtime scalability:

Request load
* Read
** Random
** Sequential
* Write
** Append-only
** Random
** Process and forward: large datasets that need to be aggregated
* Priority?

Data load
* Size
* Complexity
** Connectedness
** Semi-structure
* Volatility?

Consistency requirements
* Availability, reliability, freshness
* Overall: when does an actor need to see the effects of operations (its own and others')?

Development time scalability:

"What happens when you get many of---"
* KLOC size
** Unintended consequences
* Distributed services/components size
** Performance
** Change cost (formalized)
** Debugging
* Technologies complexity
** Open source frameworks + libraries + languages
** => assumptions inherited
** => interactions n^2
* Developers: scaling the number of developers
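The "interactions n^2" item can be made concrete with a small count. This is a minimal sketch, assuming that every pair of technologies can interact; the technology names are hypothetical placeholders, not taken from the group's notes. The exact number of pairs is n(n-1)/2, which grows on the order of n^2.

```python
from itertools import combinations


def pairwise_interactions(n: int) -> int:
    """Number of distinct pairs among n technologies: n * (n - 1) / 2."""
    return n * (n - 1) // 2


# Hypothetical stack, used only to illustrate the count.
technologies = ["framework", "library", "language", "app server"]

# Enumerating the pairs explicitly agrees with the closed formula.
pairs = list(combinations(technologies, 2))
assert len(pairs) == pairwise_interactions(len(technologies))

for n in (2, 5, 10, 20):
    print(n, "technologies ->", pairwise_interactions(n), "possible pairwise interactions")
```

Doubling the number of technologies roughly quadruples the number of pairs whose inherited assumptions may conflict.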

Operational scalability:

* Layers
** Infrastructure
** OS
** Platform (app server)
** App
* Concerns
** Monitoring/management
** Configuration
** Deployment
** Security

Causality / dependency suite

Problem domain => Runtime issues => Development issues => Operational issues

Case study: Twitter

Data is not naturally partitionable
High random reads
High write volume, append mostly
"Milk" data: mostly interested in recent data

Case study: Instrument data acquisition

10" write clients
10" packets/sec
Compress and store
Aggregate data
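The "compress and store" and "aggregate" steps could look roughly like the sketch below. It is only an illustration under stated assumptions: a packet is taken to be a list of floating-point samples, zlib stands in for whatever compression the real system would use, and the function names are hypothetical.

```python
import struct
import zlib
from statistics import mean


def compress_packet(samples):
    """Pack samples as little-endian doubles and compress them for storage."""
    raw = struct.pack(f"<{len(samples)}d", *samples)
    return zlib.compress(raw)


def decompress_packet(blob):
    """Inverse of compress_packet: recover the list of samples."""
    raw = zlib.decompress(blob)
    return list(struct.unpack(f"<{len(raw) // 8}d", raw))


def aggregate(samples):
    """Reduce a packet to a small summary that is cheap to keep and query."""
    return {
        "count": len(samples),
        "min": min(samples),
        "max": max(samples),
        "mean": mean(samples),
    }


packet = [1.0, 2.0, 3.0]
stored = compress_packet(packet)
print(aggregate(decompress_packet(stored)))
```

Storing the compressed packet and a separate aggregate keeps the write path append-only, while most reads can be served from the much smaller summaries.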

Case study: Green driving

Each 1 km of road has a price of 1-10 cents

Norway: 2 billion sensors
Peak number of cars: 500"
Each car drives at 60 km/h
15 million messages/sec, each message sends carId + sensorId
Extremely partitionable
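One way to read "extremely partitionable" is that each message can be routed by its carId alone, so every car's bill is computed on a single partition with no coordination between partitions. The sketch below assumes exactly that; the partition count, the price table, and the function names are hypothetical and only illustrate the idea.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 8  # hypothetical; a real deployment would size this to the load


def partition_for(car_id: str) -> int:
    """Stable hash of carId, so all messages for one car reach the same partition."""
    return zlib.crc32(car_id.encode("utf-8")) % NUM_PARTITIONS


def route(messages):
    """Group (carId, sensorId) messages by partition."""
    partitions = defaultdict(list)
    for car_id, sensor_id in messages:
        partitions[partition_for(car_id)].append((car_id, sensor_id))
    return partitions


def bill(partition_messages, price_cents):
    """Sum the road price per car within one partition, independent of the others."""
    totals = defaultdict(int)
    for car_id, sensor_id in partition_messages:
        totals[car_id] += price_cents[sensor_id]
    return dict(totals)


# Hypothetical price table: each sensor's road segment costs 1-10 cents.
prices = {"s1": 3, "s2": 10}
messages = [("car-1", "s1"), ("car-2", "s1"), ("car-1", "s2")]

merged = {}
for partition_messages in route(messages).values():
    merged.update(bill(partition_messages, prices))
print(merged)
```

Because no car appears in two partitions, the per-partition results can simply be merged; partitioning by sensorId instead would suit per-road statistics but would spread one car's bill across partitions.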