Duff's Device: loop unrolling for interpreted languages
In 1983, Tom Duff invented a really strange way to use the C language's switch and case statements for the in code "unrolling" optimization of large loops. As an example, he described a simple loop that copies an array to an output register:
send(to, from, count)
register short *to, *from;
register count;
{
do
*to = *from++;
while(--count>0);
}
On every iteration of the loop, in addition to the copy that is being performed, the count variable is decremented and a comparison is done against 0. Duff's Device allows you to achieve the same result, but with only an 8th of the overhead:
send(to, from, count)
register short *to, *from;
register count;
{
register n=(count+7)/8;
switch(count%8){
case 0: do{ *to = *from++;
case 7: *to = *from++;
case 6: *to = *from++;
case 5: *to = *from++;
case 4: *to = *from++;
case 3: *to = *from++;
case 2: *to = *from++;
case 1: *to = *from++;
}while(--n>0);
}
}
What happens is that the loop is unrolled 8 times, so each iteration of the loop runs the internal code 8 times over without the comparison. The genius of Duff's Device is that it takes advantage of the way the C switch/case structure works. The first time through, if the iterations don't divide evenly by 8, the loop code is executed enough times to equal the remainder of iterations/8. It's a little bizarre, because the "do" statement occurs within the switch, but there are "case" statements within the "do". Nevertheless, it's valid C.
Before someone cries foul, remember that this is only really suitable for speeding up the performance of inner loops when no suitable, better algorithm is available. If you code C, most modern compilers are smart enough to automatically optimize your code and unroll loops for you.
For interpreted languages like PHP or Javascript, however, you sometimes need to do a little optimization on your own if you want to squeeze out some extra performance. Luckily, both languages have a c-style switch statement.
Andy King wrote about a slightly altered version of this loop unrolling algorithm which ditches the switch statement and breaks the normal Duff's Device into two separate loops, one for the remainder and one for the unrolling. In Javascript, it performs a simple addition loop in only a third of the time of a normal for loop (testVal++ is the normal loop's interior):
function duffFasterLoop8(iterations) {
var testVal=0;
var n = iterations % 8;
if (n>0) {
do
{
testVal++;
}
while (--n); // n must be greater than 0 here
}
n = parseInt(iterations / 8);
do
{
testVal++;
testVal++;
testVal++;
testVal++;
testVal++;
testVal++;
testVal++;
testVal++;
}
while (--n);
}
It's not as syntactically clever as Duff's Device, but it's a good way to manually unroll an inner loop and get the best performance for your extra effort.
Duff's Device
Andy King's Optimizing JavaScript For Execution Speed
Posted by Jason Striegel |
May 20, 2008 08:50 PM
Software Engineering |
Permalink
| Comments (3)
Recent Entries
- Objective-J and Cappuccino released
- HOWTO - reset a lost Ubuntu password
- Google Chrome's comic-strip technical overview
- LEGO 3D printer
- Basement Apollo Guidance Computer
- Pringles can macro photography
- YouTube Comment Snob
- iPhone macro focus
- Multitouch touch-pad support for Linux laptops
- Dealing with large numbers of files in Unix
Comments
Newest comments listed first.
| Posted by: anti on May 21, 2008 at 7:48 AM |
I wonder why these stupid things keep coming up every 5 to 10 years.
Ever thought about cache behaviour ?
Or the fact that an "decrease, compare and jump" is basically free ?
Please take a look at the assembler that gets generated from "Duff's Device" and then tell me again that it's a good idea to use it.
Please you young and innocent out there:
Forget that you ever saw this, there is a good reason why it has been buried long time ago.
:)
| Posted by: on May 21, 2008 at 8:49 AM |
function duffFasterLoop8(iterations) {
var testVal=0;
for(n = iterations; n > 8; n -= 8)
{
testVal++;
testVal++;
testVal++;
testVal++;
testVal++;
testVal++;
testVal++;
testVal++;
}
do
{
testVal++;
}
while (--n);
}
| Posted by: on September 5, 2008 at 3:02 AM |
I'm one of those who re-discovered same optimization several years ago. And it works. Benchmarks show real speed improvements. And it probably worked even better in 1983, when C compilers were really bad.
What about code size and cache issues? As long as code completely fits into L1, who cares?
Leave a comment
Bloggers
Welcome to the Hacks Blog!
Categories
- Ajax
- Amazon
- AppleTV
- Astronomy
- Baseball
- BlackBerry
- Blogging
- Body
- Cars
- Cryptography
- Data
- Design
- Education
- Electronics
- Energy
- Events
- Excel
- Excerpts
- Firefox
- Flash
- Flickr
- Flying Things
- Food
- Gaming
- Gmail
- Google Earth
- Google Maps
- Government
- Greasemonkey
- Hacks Series
- Hackszine Podcast
- Halo
- Hardware
- Home
- Home Theater
- iPhone
- iPod
- IRC
- iTunes
- Java
- Kindle
- Knoppix
- Language
- LEGO
- Life
- Lifehacker
- Linux
- Linux Desktop
- Linux Multimedia
- Linux Server
- Mac
- Mapping
- Math
- Microsoft Office
- Mind
- Mind Performance
- Mobile Phones
- Music
- MySpace
- MySQL
- NetFlix
- Network Security
- olpc
- OpenOffice
- Outdoor
- Parenting
- PCs
- PDAs
- Perl
- Philosophy
- Photography
- PHP
- Pleo
- Podcast
- Podcasting
- Productivity
- PSP
- Retro Computing
- Retro Gaming
- Science
- Screencasts
- Security
- Shopping
- Skype
- Smart Home
- Software Engineering
- Sports
- SQL
- Statistics
- Survival
- TiVo
- Transportation
- Travel
- Ubuntu
- Video
- Virtualization
- Visual Studio
- VoIP
- Web
- Web Site Measurement
- Windows
- Windows Server
- Wireless
- Word
- World
- Xbox
- Yahoo!
- YouTube
Archives
- September 2008
- August 2008
- July 2008
- June 2008
- May 2008
- April 2008
- March 2008
- February 2008
- January 2008
- December 2007
- November 2007
- October 2007
- September 2007
- August 2007
- July 2007
- June 2007
- May 2007
- April 2007
- March 2007
- February 2007
- January 2007
- December 2006
- November 2006
- October 2006
- September 2006
Recent Posts
- Objective-J and Cappuccino released
- HOWTO - reset a lost Ubuntu password
- Google Chrome's comic-strip technical overview
- LEGO 3D printer
- Basement Apollo Guidance Computer
- Pringles can macro photography
- YouTube Comment Snob
- iPhone macro focus
- Multitouch touch-pad support for Linux laptops
- Dealing with large numbers of files in Unix
www.flickr.com
|





