Data-Oriented Design
Not just for performance
What is DOD?
Think about the data FIRST, before anything else.
What does this mean? To really understand, you need to understand some context...
It came from game developers
DOD became popular in the mid to late 2000s with the advent of the Playstation 3 and Xbox 360
These processors have high clock rates, huge pipelines, and slow memory
What else to know about DOD?
Its sworn enemy is object-oriented programming
Why? We'll talk about this later, for now let's try to explain what DOD is technically.
Which function should we optimize?
func a(x float64) float64 {
return math.Sqrt(x)
}
var data []float64
func b(x int) float64 {
return data[x]
}
Imagine you're a game developer and your game is running slow
These functions are called about the same number of times
Which do you optimize? Think of your answer now
You might think square root is expensive, and it is, let's optimize it
This will lead us down a dark path...
float Q_rsqrt( float number )
{
long i;
float x2, y;
const float threehalfs = 1.5F;
x2 = number * 0.5F;
y = number;
i = * ( long * ) &y; // evil floating point bit level hacking
i = 0x5f3759df - ( i >> 1 ); // what the fuck?
y = * ( float * ) &i;
y = y * ( threehalfs - ( x2 * y * y ) ); // 1st iteration
return y;
}
This is the famous fast inverse square root function from Quake 3 Arena.
Although it was originally attributed to and made famous by John Carmack, an investigation traced variations of the algorithm back to the 80s.
It has been formatted to fit your screen.
This function goes to great lengths to avoid doing a real sqrt operation.
But is this sqrt really slower?
func a(x float64) float64 {
return math.Sqrt(x)
}
var data []float64
func b(x int) float64 {
return data[x]
}
Hopefully both get compiled into roughly one instruction each
a = some sort of sqrt instruction
b = some sort of load instruction
How long do these instructions take to execute?
Where's the data?
Register 0 cycles
L1 4 cycles
L2 11 cycles
L3 40-75 cycles
RAM 100-300 cycles
Each tier on this list gets bigger and slower
Let's try to get a feel for this...
Let's not talk about disk or network
We could do 20 sqrts while waiting for RAM
This is a huge problem
DOD was invented to solve it
How?
If we could just keep everything in cache...
Before we get to the solution, let's go back to a question from earlier...
Why oh why do DOD enthusiasts hate OOP?
What does OOP have to do with cache and RAM?
DOOM 3 is an example of an object-oriented game
Let's look at DOOM 3's update function
The update function executes 60 times per second in a game, or 30 times per second in a bad game
for ( idEntity* ent = activeEntities.Next();
ent != NULL;
ent = ent->activeNode.Next() )
{
if ( g_cinematic.GetBool() && inCinematic && !ent->cinematic )
{
ent->GetPhysics()->UpdateTime( time );
continue;
}
timer_singlethink.Clear();
timer_singlethink.Start();
RunEntityThink( *ent, cmdMgr );
timer_singlethink.Stop();
ms = timer_singlethink.Milliseconds();
if ( ms >= g_timeentities.GetFloat() )
Printf( "%d: entity '%s': %.1f ms\n", time, ent->name.c_str(), ms );
num++;
}
Great OOP design
Can change objects without changing this loop
Complexity encapsulated inside objects
Could be update loop for almost any game
Why does DOD hate this?
DOOM 3 memory visualized
(1) Objects are stored randomly on the heap
(2,3,4) We're gonna update Podrick, Jon, then Tyrion
(5) Let's do Podrick first. What happens? Data is loaded from RAM into cache
(6) Think of the cache as a spotlight that automatically loads things closeby
(7) When the spotlight shifts drastically, the whole cache is thrown away and reloaded from RAM
(8) When we come back to Tyrion, we have to reload everything around him a second time
Bad for performance but also makes it difficult to answer important DOD questions
Data-oriented questions
What is executing?
In what order is it executing?
Where is the data stored in memory?
OOP makes it very difficult to answer these questions
In fact the whole point of OOP is to make it so you can't answer these questions
It hides the details
DOD considers these questions to be very important
So how does DOD do this instead?
A data-oriented update function
for (int i = 0; i < rigid_bodies.length; i++)
rigid_bodies[i].update();
for (int i = 0; i < ai_controllers.length; i++)
ai_controllers[i].update();
for (int i = 0; i < players.length; i++)
players[i].update();
// etc...
Difference #1: lists of actual data, not pointers to data (this is language specific; doesn't apply to Python or Javascript where everything is a pointer)
Difference #2: we always know what kind of data we're working with
We can control the order they execute and where they're stored
What does our memory look like now?
Data-oriented memory visualized
We're accessing the data in order. This maximizes cache locality
So let's sum up what we've learned so far...
DOD = good performance
OOP = bad performance
Any questions?
This leads somewhere very unhelpful...
The discourse turns toxic
Game developers
"If you've never read a CPU spec sheet, you're not a real programmer"
"Electron is everything wrong with modern computing"
Everyone else
"I don't have time to obsess over bits and bytes"
"I'm not going to make my code ugly just to make it go faster."
Where does this misunderstanding come from?
A poor definition
DOD: it's about cache locality and performance
This is what we commonly understand as DOD, but I propose a broader definition
DOD is not just for performance wonks, it's for you!
My thesis
DOD: think about the data FIRST, before anything else.
This actually results in better design
It doesn't just improve performance, it improves your design
What does it mean to have "better design"?
What do I mean that DOD results in "better design"?
We've just seen this with DOOM 3
Switching to DOD didn't just improve performance and cache locality
It gave us a clearer mental model that allowed us to answer what is executing, in what order is it executing, and where is the data stored
These questions are useful not just for performance, but for readability and understandability
But let's try to make this "better design" more concrete...
What exactly is "better design"?
What is software design?
Quick detour: the history of software
Unstructured programming
L1: push ecx
cmp BYTE PTR[esi], 0
je L4
mov cl, [edi]
cmp cl,0
jge L2
rol BYTE PTR[esi],cl
jmp L3
L2: ror BYTE PTR[esi], cl
Here we are looking at the real stuff the computer is doing (sort of).
The code is completely flat, and we can jump anywhere in it.
It's chaos.
Structured programming
30 INPUT "How many stars do you want: "; N
40 S$ = ""
50 FOR I = 1 TO N
60 S$ = S$ + "*"
70 NEXT I
80 PRINT S$
Next comes if statements and loops.
These help us break down the code into manageable chunks.
Within each chunk, we have some assurance that the code will not jump somewhere else.
Functional decomposition
func doMath() {
add(1, 2)
add(2, 2)
}
func add(x, y int) {
fmt.Printf("%d\n", x + y)
}
Next we break down the code further into reusable functions.
The function has an interface that other functions can use without worrying about what happens inside the function.
This gives us some assurance that a change in one function will not sprawl out of control and impact other functions.
OOP
class Employee {
String getName();
void fire();
void promote();
}
Alan Kay coined the term
He regretted that inheritance became the focus of OOP later on, when in reality, message passing was its most important aspect
Message passing enabled two key goals of OOP: encapsulation and decoupling
This gives us the most assurance possible that a change in one object will not affect other objects, as long as the interface remains the same
We've now broken down our code into separate pieces as much as possible
OOP also gives us an abstraction that lets us easily think about real-world objects instead of having to think about what the computer is actually dealing with (i.e., the data)
This is also the first time we start talking about "creating" and "deleting" things
To sum it all up
Abstraction
The history of software is about building abstractions and design patterns
Let's try "abstraction-oriented design" and see where it gets us
Trying it out
Let's use abstraction to design a real-world project
Make sure to share sound
This is a game jam entry I did with some friends a few years ago
Watch a minute or two of the video to understand the goal
Apologies for the language
So let's try to make this game using "abstraction-oriented design"...
My friend starts with an abstraction
class Cell {
int getNumber();
int getX();
int getY();
Player getOwner();
}
But soon runs into problems...
When I combine cell A with cell B, do I delete A or B?
If I want to move cell A to the left, should the cell move, or should I delete it and create a new one?
Each cell will need a way to reference its neighbors...
I'll probably have an array of cells, how do I keep this up to date?
I'll have to pass the array in to each cell...
Dang, it's looking like the whole thing is going to be one giant object.
This really happened, my friend got completely bogged down.
Now let's try thinking about data first.
What's the most straightforward, simple way to represent this game in memory?
How about an array of numbers?
// v
a := [][]int{
{2, 2, 4, 8},
{0, 2, 0, 0},
{0, 2, 2, 0},// <
{0, 0, 0, 0},
}
a[2][1] += a[2][2]
a[2][2] = 0
// {2, 2, 4, 8},
// {0, 2, 0, 0},
// {0, 4, 0, 0},
// {0, 0, 0, 0},
To shift and combine this cell to the left...
// v
a := [][]int{
{2, 2, 4, 8},
{0, 2, 0, 0},
{0, 2, 2, 0},// <
{0, 0, 0, 0},
}
a[2][1] += a[2][2]
a[2][2] = 0
// {2, 2, 4, 8},
// {0, 2, 0, 0},
// {0, 4, 0, 0},
// {0, 0, 0, 0},
We just add it to the cell to the left, then zero it out.
// v
a := [][]int{
{2, 2, 4, 8},
{0, 2, 0, 0},
{0, 2, 2, 0},// <
{0, 0, 0, 0},
}
a[2][1] += a[2][2]
a[2][2] = 0
// {2, 2, 4, 8},
// {0, 2, 0, 0},
// {0, 4, 0, 0},
// {0, 0, 0, 0},
To shift the whole grid, we could loop over it and shift each cell.
for y := 0; y < 4; y++ {
for x := 1; x < 4; x++ {
if a[y][x-1] == a[y][x] {
a[y][x-1] += a[y][x]
a[y][x] = 0
}
}
}
I bet we could break this into functions...
func shiftGridLeft(a [][]int) {
for y := 0; y < 4; y++ {
for x := 1; x < 4; x++ {
shiftCellLeft(a, x, y)
}
}
}
func shiftCellLeft(a [][]int, x int, y int) {
if a[y][x-1] == a[y][x] {
return
}
a[y][x-1] += a[y][x]
a[y][x] = 0
}
Next...
We could turn these into pure functions
We could turn the whole grid into an object
We could write tests
You may notice something about the process we just went through...
We just progressed through the whole history of software
Unstructured programming: a = b
Structured programming: for { }
Functional decomposition: shiftCellLeft()
OOP
Using my definition, DOD doesn't preclude you from using objects. Just be careful...
Don't put the cart before the horse!
OOP
Functional decomposition
Structured programming
Unstructured programming
Again, doesn't mean we can't use OOP
DOD doesn't mean you can't use objects and other abstractions
You can use DOD in any domain and any language, even enterprise Java
There are many different ways of thinking about software, and DOD doesn't preclude you from using them... all it says is...
My thesis
Think about the data FIRST, before anything else.
Thank you!
If you think about it in terms of data first, everything else will fall into place.