Week 10 Day 1 - Fast Inverse Square Root and Duff's Device

Channel: Bill Kerney

Subscribers: 2,700

Published on March 17, 2021 5:55:39 AM ● Video Link: https://www.youtube.com/watch?v=gdN-pO3zrow

Duration: 1:54:18

636 views

We started off by reviewing Makefiles, as I am trying to encourage you all to make your own Makefiles and to feel comfortable tweaking them to squeeze more performance out of your code, or to reduce build times.

Next we reviewed pipelining again: how RAW (read-after-write) dependencies cause stalls in the pipeline, and how to reorder instructions to minimize those stalls. We then reviewed the IEEE 754 floating-point format.
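As a refresher on the IEEE 754 layout, here is a minimal sketch that splits a 32-bit float into its sign, exponent, and mantissa fields. The names float_fields and decode_float are made up for illustration, and the sketch assumes float is a 32-bit IEEE 754 single, which is true on most current platforms but not guaranteed by the C standard.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper type: the three fields of a 32-bit IEEE 754 float. */
typedef struct {
    uint32_t sign;      /* 1 bit: 0 = positive, 1 = negative */
    uint32_t exponent;  /* 8 bits, stored with a bias of 127 */
    uint32_t mantissa;  /* 23 bits, implicit leading 1 for normal numbers */
} float_fields;

/* Hypothetical helper: split a float into its raw bit fields. */
float_fields decode_float(float x)
{
    /* Assumption: float is a 32-bit IEEE 754 single on this platform. */
    assert(sizeof(float) == sizeof(uint32_t));

    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);  /* well-defined way to read the bit pattern */

    float_fields f;
    f.sign = bits >> 31;
    f.exponent = (bits >> 23) & 0xFFu;
    f.mantissa = bits & 0x7FFFFFu;
    return f;
}
```

For example, decode_float(1.0f) gives sign 0, exponent 127 (which is 0 after removing the bias), and mantissa 0.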

We then studied two of the most horrible and also awesome algorithms known to computer scientists: the Fast Inverse Square Root algorithm and Duff's Device.

Inverse square roots are used when normalizing vectors. To normalize a vector means to scale it to length one. For example, the vector {3,4} has a length of 5 (the square root of 3*3 + 4*4 = 25), so to normalize it you divide each element of the vector by the length, giving {3/5,4/5} as the result. Since the length is a square root, dividing by it is the same as multiplying by... an inverse square root.

The Fast Inverse Square Root algorithm is a lossy approximation of a real inverse square root, but it is good enough (up to about 4% error, or less than 1% error with one iteration of Newton's method) for computer graphics. While you might see 4% error with the human eye, less than 1% error is usually not perceptible unless you're really looking for it. In video games, speed is more important than accuracy, so this ungodly hack of an algorithm is used to make normalization go quickly. Depending on your architecture, it can be significantly faster than computing a square root and dividing. If you want more accuracy, you can run more Newton iterations, trading speed for precision.
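Here is a minimal sketch of the idea in C, not necessarily the exact code shown in lecture. The constant 0x5f3759df is the one from the widely circulated Quake III Arena source. The names fast_rsqrt and normalize2_fast and the newton_steps parameter are made up for illustration. The sketch uses memcpy instead of the original pointer cast to avoid undefined behavior, and it assumes a 32-bit IEEE 754 float and a positive, normal input.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Approximate 1/sqrt(x) for positive, normal x.
   newton_steps = 0 returns the raw bit-trick guess; each extra step
   costs a few multiplies and buys more accuracy. */
float fast_rsqrt(float x, int newton_steps)
{
    assert(x > 0.0f);

    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);     /* read the IEEE 754 bit pattern */
    bits = 0x5f3759dfu - (bits >> 1);   /* magic constant minus half the bits */

    float y;
    memcpy(&y, &bits, sizeof y);        /* reinterpret as the initial guess */

    for (int i = 0; i < newton_steps; i++)
        y = y * (1.5f - 0.5f * x * y * y);  /* Newton's method on 1/y^2 - x */
    return y;
}

/* Hypothetical helper: normalize a 2D vector in place with one Newton step. */
void normalize2_fast(float v[2])
{
    float inv_len = fast_rsqrt(v[0] * v[0] + v[1] * v[1], 1);
    v[0] *= inv_len;
    v[1] *= inv_len;
}
```

Calling normalize2_fast on {3,4} gives roughly {0.6,0.8}, off by a fraction of a percent. Passing a larger newton_steps to fast_rsqrt is the "slow it down to get more precision" knob.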

Duff's Device abuses the syntax of the C (and C++) language to unroll a loop 8 operations at a time (which can speed up code significantly) without requiring the input size to be a multiple of 8. (Padding arrays to a multiple of 8 is something I often do instead, since it's nearly free these days.) So if you're going to do 35 operations in a tight loop, it will do one partial pass of 3 operations and four full passes of 8 operations each (3 + 4x8 = 35 operations). The partial pass comes first, because a switch statement jumps into the middle of the unrolled loop body. Look at the syntax and cry. Or laugh? It's up to you.
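Here is a sketch of the technique as a memory-to-memory copy. Tom Duff's original wrote every value to a single output register rather than to an incrementing destination, so treat this as a common variant and not the historical code. The name duff_copy is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Copy count ints from src to dst, up to 8 per pass of the do-while loop. */
void duff_copy(int *dst, const int *src, size_t count)
{
    assert(count > 0);                /* this form would copy 8 items on count == 0 */

    size_t passes = (count + 7) / 8;  /* full passes plus one partial pass, if any */

    switch (count % 8) {              /* jump into the middle of the loop body */
    case 0: do { *dst++ = *src++;
    case 7:      *dst++ = *src++;
    case 6:      *dst++ = *src++;
    case 5:      *dst++ = *src++;
    case 4:      *dst++ = *src++;
    case 3:      *dst++ = *src++;
    case 2:      *dst++ = *src++;
    case 1:      *dst++ = *src++;
            } while (--passes > 0);
    }
}
```

With count = 35, passes is 5 and count % 8 is 3, so the switch lands on case 3, copies 3 items on the first pass, and then the loop makes four full passes of 8.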

Other Videos By Bill Kerney

2021-03-24	Week 11 Day 1 - From Low Level Syscalls to High Level Modern C++
2021-03-23	Week 10 Day 1 - Social Impact of Computer Science
2021-03-22	Week 10 - Triangularization and Rasterization
2021-03-22	Week 11 Day 1 - Move Constructors
2021-03-20	Week 10 Day 3 - Sets vs. Maps vs. Unordered Maps
2021-03-19	Week 9 Day 3 - Algorithmic Bias
2021-03-19	Week 10 Day 2 - HUDs
2021-03-18	Week 10 Day 2 - NEON Part III, CALL
2021-03-18	Week 9 Day 2 - Algorithms II
2021-03-17	Week 10 Day 2 - BSTs Part 3
2021-03-16	Week 10 Day 1 - Fast Inverse Square Root and Duff's Device
2021-03-16	Week 10 Day 1 - Guest Lecture + Animation
2021-03-16	Week 9 Day 1 - Algorithms
2021-03-15	Week 10 Day 1 - BSTs Part II
2021-03-13	Extra Credit Workshop: Sea Shanties
2021-03-13	Week 8 Day 3 - Quantifier Fallacies and Cognitive Biases
2021-03-12	Week 9 Day 3 - Binary Search Trees
2021-03-12	Week 9 Day 2 - Matrix Multiplies and Interactivity
2021-03-11	Lecture 9 Day 2 - Floating Point Numbers
2021-03-11	Week 8 Day 2 - Online Privacy
2021-03-10	Week 9 Day 2 - File I/O with Classes and NCURSES Programming

Tags: csci 45, duff's device, fast inverse square root, pipelining, ieee 754
