---
created_at: '2012-05-11T15:09:01.000Z'
title: Writing the Fastest Code, by Hand, for Fun (2005)
url: http://www.nytimes.com/2005/11/28/technology/28super.html
author: gaius
points: 69
story_text: ''
comment_text:
num_comments: 15
story_id:
story_title:
story_url:
parent_id:
created_at_i: 1336748941
_tags:
- story
- author_gaius
- story_3959567
objectID: '3959567'
year: 2005
---
In the most recent ranking of supercomputers, I.B.M. machines overtook a
number of supercomputers using Mr. Goto's software to capture the top
three spots in the fastest computer rankings. Still, the Goto Basic
Linear Algebra Subroutines, or BLAS, as his programs are known, were
used by 4 of the world's 11 fastest computers.
Mr. Goto has become a legend in the supercomputing community because of
his solitary crusade. And he shows no signs of flagging in the contest
to wring every ounce of computing speed from the world's fastest
microprocessor chips.
But for all the acclaim he has received, Mr. Goto is a relative newcomer
to the supercomputing field, having made his breakthrough about a decade
ago.
"At first I didn't know anything," he said in an interview at the annual
supercomputing conference held in Seattle in mid-November. "This was all
trial and error, but now I have experience."
The value of his work goes far beyond setting speed records. Because his
programs can more efficiently solve complex linear equations, they can
offer better solutions to virtually every computational science and
engineering problem. For example, the subroutines are used in simulation
programs to model the flow of air over the surface of a plane or a car
more precisely.
One of Mr. Goto's principal rivals is a software project known as Atlas,
created by a group of researchers working with Jack Dongarra, a computer
scientist at the University of Tennessee. Atlas is an automated effort
to find the most efficient way to solve linear algebra functions for
specific microprocessors -- a task that Mr. Goto does meticulously by
hand.
Like chess-playing software, the Atlas project tries to overcome the
shortcomings of different kinds of computer designs by systematically
testing thousands of solutions for each chip to find the most efficient
one for each type of microprocessor.
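
The search described above can be illustrated with a toy sketch. It is not Atlas's actual code: the names `matmul_ijk`, `matmul_ikj` and `pick_fastest` are hypothetical, matrix multiplication is assumed as a stand-in workload, and only two candidates are timed rather than thousands.

```python
import time


def matmul_ijk(a, b):
    """Candidate 1: row-by-column loop order (hypothetical variant)."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            total = 0.0
            for k in range(m):
                total += a[i][k] * b[k][j]
            c[i][j] = total
    return c


def matmul_ikj(a, b):
    """Candidate 2: same arithmetic, but walks rows of b sequentially."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):
        row_c = c[i]
        for k in range(m):
            aik = a[i][k]
            row_b = b[k]
            for j in range(p):
                row_c[j] += aik * row_b[j]
    return c


def pick_fastest(candidates, a, b, repeats=3):
    """Time every candidate on this machine and return the quickest one."""
    timings = {}
    for name, fn in candidates.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            fn(a, b)
            best = min(best, time.perf_counter() - start)
        timings[name] = best
    winner = min(timings, key=timings.get)
    return winner, timings
```

Which candidate wins depends on the machine it runs on, which is the point of searching automatically instead of deciding in advance.
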
By contrast, Mr. Goto uses only a program called a software debugger
that allows him to track how data moves among different components of a
microprocessor.
He then reorganizes the individual software instructions so that his
subroutines perform crucial algebraic functions more quickly to gain
small amounts of processing speed from a specific type of computer chip.
Typically these are highly repetitive operations that can consume vast
amounts of computing capacity. For example, one challenging type of
calculation requires the microprocessor to multiply together numbers
from two tables stored in memory.
Mr. Dongarra acknowledges that Mr. Goto's hand-tuned programs are more
efficient and can still outperform Atlas.
"I tell them that if they want the fastest they should still turn to Mr.
Goto," said Mr. Dongarra, who is one of the researchers who maintains
the Top 500 listing of the world's fastest-performing computers from a
computing speed race held twice a year.
Mr. Goto came to his passion for supercomputing almost by accident.
Educated in power engineering at Waseda University in Tokyo, he worked
as an employee of the Japanese Patent Office, doing research on early
inventions like video recorders.
To help in his work, Mr. Goto purchased a Digital Equipment workstation
based on the Alpha microprocessor in 1994 to perform a simulation.
But when it arrived he could not understand why it was performing so
slowly. So he explored the Alpha's design to see where the performance
bottlenecks were.
He later purchased a second Alpha-based computer and by rewriting the
crucial subroutines was able to improve its performance to 78 percent of
its theoretical peak calculating speed, up from 44 percent.
Although he was not formally trained in computer or software design, he
perfected his craft by learning from programmers on an Internet mailing
list focusing on the Linux operating system for the Alpha chip. His
curiosity quickly became a passion that he pursued in his free time and
during his twice-daily two-hour train commute between his job in Tokyo
and his home in Kanagawa Prefecture.
"I would frequently work on these problems until midnight," he said. "I
did it to relax."
As a teenager, Mr. Goto developed a passion for electronic design,
building his own stereo equipment from the most basic components.
His current interest, he says, is not in the pure mathematics of the
linear equations, but rather in finding clever ways to overcome the
shortcomings of the architecture and internal organization of
microprocessors that are used in every kind of computer, from hand-held
devices to supercomputers.
Modern computers are organized to offer the programmer a hierarchical
series of data storage areas that range from the computer's disk drive
to DRAM memory, as well as relatively small temporary memory areas
called caches. Typically, the fastest memories are also the smallest.
One of the simplest ways to speed a program is to keep the calculation
in the memory unit that is closest to the microprocessor's calculating
engine.
Every time the calculation engine is required to stop what it is doing
to get new data from a more distant memory area, processing speed slows.
But in some cases, keeping data in the closest memory cache may not be
as efficient as keeping it in a larger cache that is farther away.
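
One common way to keep a calculation close to the processor is to split the tables into small blocks and finish all the work on one block before moving to the next. The sketch below shows that idea for multiplying two tables. It is an assumption-laden illustration, not Mr. Goto's method: the function name `blocked_matmul` and the `block` size are hypothetical, and in pure Python the loop structure is visible but the speed benefit is not.

```python
def blocked_matmul(a, b, block=2):
    """Multiply tables a (n x m) and b (m x p) one small tile at a time.

    `block` is a hypothetical tile size. In a tuned library it would be
    chosen so the tiles in use fit in a particular level of cache.
    """
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    # Outer loops pick a tile; inner loops do all the work on that tile
    # before moving on, so the same data is reused while it is nearby.
    for i0 in range(0, n, block):
        for k0 in range(0, m, block):
            for j0 in range(0, p, block):
                for i in range(i0, min(i0 + block, n)):
                    for k in range(k0, min(k0 + block, m)):
                        aik = a[i][k]
                        for j in range(j0, min(j0 + block, p)):
                            c[i][j] += aik * b[k][j]
    return c


a = [[1, 2, 3], [4, 5, 6]]
b = [[7, 8], [9, 10], [11, 12]]
print(blocked_matmul(a, b))  # [[58.0, 64.0], [139.0, 154.0]]
```

The result is the same whatever tile size is chosen; only the order in which memory is touched changes, and that order is the trade-off that has to be tuned for each chip.
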
Robert A. van de Geijn, a computer scientist who works with Mr. Goto at
the Texas Center, said that Mr. Goto's special skill was in the
step-by-step reordering of software instructions to take the greatest
advantage of the performance trade-offs offered by each type of chip.
"He combines both scientific insight and engineering skills," Mr. van de
Geijn said.
They met in 2002 when Mr. Goto took a sabbatical from his job at the
patent office to spend a year at the Texas center. (He has since
resigned from the patent office.)
Once Mr. Goto arrived in Texas, he turned his attention to optimizing
the speed of the Pentium 4 microprocessor. When computer scientists at
the University at Buffalo added Goto BLAS to their Pentium-based
supercomputer, the calculating power of the system jumped from 1.5
trillion to 2 trillion mathematical operations per second out of a
theoretical limit of 3 trillion.
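
The Buffalo figures can be restated as a fraction of theoretical peak, the same measure used for the Alpha results. The short calculation below uses only numbers from the article; the helper name `fraction_of_peak` is hypothetical.

```python
def fraction_of_peak(achieved, peak):
    """Return achieved speed as a fraction of the theoretical peak."""
    return achieved / peak


TRILLION = 1e12

# Buffalo's Pentium-based system, before and after adding Goto BLAS.
before = fraction_of_peak(1.5 * TRILLION, 3 * TRILLION)
after = fraction_of_peak(2 * TRILLION, 3 * TRILLION)
buffalo_speedup = after / before

# The Alpha machine went from 44 percent to 78 percent of peak.
alpha_speedup = 0.78 / 0.44

print(f"Buffalo: {before:.0%} -> {after:.0%} of peak, {buffalo_speedup:.2f}x")
print(f"Alpha:   44% -> 78% of peak, {alpha_speedup:.2f}x")
```

On these numbers the Buffalo machine rose from half of its theoretical peak to about two-thirds, a gain of roughly one-third, while the earlier Alpha tuning was a gain of roughly 77 percent.
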
The increase was so astounding that the record keepers for the
supercomputing Top 500 list called the researchers in Buffalo because
they did not think such a speed was credible.
"I teased them and suggested that the speed of light was faster in
Buffalo than it was in Tennessee," Mr. van de Geijn recalled.
Recently there has been a quiet controversy around the Goto BLAS because
Mr. Goto has been slow to offer his work as open-source software, the
free model of software distribution.
Some programmers have suggested that Mr. Goto has not joined the
open-source movement because he wants to protect his secrets and
strategies from competitors.
That is not so, he said recently, noting that the Goto BLAS software is
freely available for noncommercial use. And he said he was preparing an
open-source version.
He said his next big challenge was to expose chip designers to his ideas
to help speed their processors.
"Computer architects are stubborn," he observed. "They have their own
ideas." His ideas on computing efficiency, he said, speak for
themselves.