created_at title url author points story_text comment_text num_comments story_id story_title story_url parent_id created_at_i _tags objectID year
2012-05-11T15:09:01.000Z Writing the Fastest Code, by Hand, for Fun (2005) http://www.nytimes.com/2005/11/28/technology/28super.html gaius 69 15 1336748941
story
author_gaius
story_3959567
3959567 2005

In the most recent ranking of supercomputers, I.B.M. machines overtook a number of supercomputers using Mr. Goto's software to capture the top three spots in the fastest computer rankings. Still, the Goto Basic Linear Algebra Subroutines, or BLAS, as his programs are known, were used by 4 of the world's 11 fastest computers.

Mr. Goto has become a legend in the supercomputing community because of his solitary crusade. And he shows no signs of flagging in the contest to wring every ounce of computing speed from the world's fastest microprocessor chips.

But for all the acclaim he has received, Mr. Goto is a relative newcomer to the supercomputing field, having made his breakthrough about a decade ago.

"At first I didn't know anything," he said in an interview at the annual supercomputing conference held in Seattle in mid-November. "This was all trial and error, but now I have experience."

The value of his work goes far beyond setting speed records. Because his programs can more efficiently solve complex linear equations, they can offer better solutions to virtually every computational science and engineering problem. For example, the subroutines are used in simulation programs to model the flow of air over the surface of a plane or a car more precisely.

One of Mr. Goto's principal rivals is a software project known as Atlas, created by a group of researchers working with Jack Dongarra, a computer scientist at the University of Tennessee. Atlas is an automated effort to find the most efficient way to solve linear algebra functions for specific microprocessors -- a task that Mr. Goto does meticulously by hand.

Like chess-playing software, the Atlas project tries to overcome the shortcomings of different kinds of computer designs by systematically testing thousands of solutions for each chip to find the most efficient one for each type of microprocessor.
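The automated approach described above can be illustrated with a small empirical search. This is only a sketch of the general idea (time several candidate settings and keep the fastest), not Atlas's actual code; the toy kernel `blocked_sum`, the helper `autotune` and the candidate block sizes are all hypothetical names and values chosen for illustration.

```python
import time


def blocked_sum(data, block):
    """Toy kernel (hypothetical): sum a list in chunks of `block` elements."""
    total = 0
    for start in range(0, len(data), block):
        total += sum(data[start:start + block])
    return total


def autotune(kernel, data, candidates, repeats=3):
    """Time the kernel with each candidate setting and return the fastest."""
    best_param, best_time = None, float("inf")
    for param in candidates:
        elapsed = float("inf")
        for _ in range(repeats):
            t0 = time.perf_counter()
            kernel(data, param)
            # Keep the best of several runs to reduce timing noise.
            elapsed = min(elapsed, time.perf_counter() - t0)
        if elapsed < best_time:
            best_param, best_time = param, elapsed
    return best_param


data = list(range(100_000))
candidates = [16, 64, 256, 1024]  # illustrative values, not tuned for any chip
best = autotune(blocked_sum, data, candidates)
print("fastest block size on this machine:", best)
```

The winning setting depends on the machine running the search, which is the point of the approach: the search is repeated for each type of microprocessor rather than reasoned out by hand.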

By contrast, Mr. Goto uses only a program called a software debugger that allows him to track how data moves among different components of a microprocessor.

He then reorganizes the individual software instructions so that his subroutines perform crucial algebraic functions more quickly to gain small amounts of processing speed from a specific type of computer chip.

Typically these are highly repetitive operations that can consume vast amounts of computing capacity. For example, one challenging type of calculation, known as matrix multiplication, requires the microprocessor to multiply together numbers from two tables stored in memory.
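The calculation described above is what is usually called matrix multiplication. A minimal, unoptimized sketch of it follows, assuming the two tables are stored as lists of rows; the function name `matmul` is illustrative and this is the textbook method, not Mr. Goto's code.

```python
def matmul(a, b):
    """Multiply an n-by-k table `a` by a k-by-m table `b` (lists of rows)."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):          # each row of a
        for j in range(m):      # each column of b
            for p in range(k):  # multiply matching entries and accumulate
                c[i][j] += a[i][p] * b[p][j]
    return c


product = matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(product)
```

For two tables of n rows and n columns, the three nested loops perform on the order of n cubed multiplications, which is why the operation is so repetitive and so costly at large sizes.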

Mr. Dongarra acknowledges that Mr. Goto's hand-tuned programs are more efficient and can still outperform Atlas.

"I tell them that if they want the fastest they should still turn to Mr. Goto," said Mr. Dongarra, who is one of the researchers who maintains the Top 500 listing of the world's fastest-performing computers from a computing speed race held twice a year.

Mr. Goto came to his passion for supercomputing almost by accident. Educated in power engineering at Waseda University in Tokyo, he worked as an employee of the Japanese Patent Office, doing research on early inventions like video recorders.

To help in his work, Mr. Goto purchased a Digital Equipment workstation based on the Alpha microprocessor in 1994 to perform a simulation.

But when it arrived he could not understand why it was performing so slowly. So he explored the Alpha's design to see where the performance bottlenecks were.

He later purchased a second Alpha-based computer and by rewriting the crucial subroutines was able to improve its performance to 78 percent of its theoretical peak calculating speed, up from 44 percent.

Although he was not formally trained in computer or software design, he perfected his craft by learning from programmers on an Internet mailing list focusing on the Linux operating system for the Alpha chip. His curiosity quickly became a passion that he pursued in his free time and during his twice daily two-hour train commute between his job in Tokyo and his home in Kanagawa Prefecture.

"I would frequently work on these problems until midnight," he said. "I did it to relax."

As a teenager, Mr. Goto developed a passion for electronic design, building his own stereo equipment from the most basic components.

His current interest, he says, is not in the pure mathematics of the linear equations, but rather in finding clever ways to overcome the shortcomings of the architecture and internal organization of microprocessors that are used in every kind of computer, from hand-held devices to supercomputers.

Modern computers are organized to offer the programmer a hierarchical series of data storage areas that range from the computer's disk drive to DRAM memory to relatively small temporary memory areas called caches. Typically, the fastest memories are also the smallest.

One of the simplest ways to speed a program is to keep the calculation in the memory unit that is closest to the microprocessor's calculating engine.

Every time the calculation engine is required to stop what it is doing to get new data from a more distant memory area, processing speed slows. But in some cases, keeping data in the closest memory cache may not be as efficient as keeping it in a larger cache that is farther away.
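One common way to keep a calculation close to the processor is to split the work into small blocks, so that each piece of data is reused several times before it is pushed out of a fast cache. The sketch below applies that idea to the multiplication of two tables. It is a generic textbook technique, not Mr. Goto's method; the name `matmul_blocked` and the default block size are assumptions made for illustration.

```python
def matmul_blocked(a, b, block=2):
    """Multiply two tables (lists of rows) one small block at a time."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    # The outer three loops choose a block; the inner three work inside it.
    for i0 in range(0, n, block):
        for p0 in range(0, k, block):
            for j0 in range(0, m, block):
                for i in range(i0, min(i0 + block, n)):
                    for p in range(p0, min(p0 + block, k)):
                        a_ip = a[i][p]  # reused across the whole inner loop
                        for j in range(j0, min(j0 + block, m)):
                            c[i][j] += a_ip * b[p][j]
    return c


result = matmul_blocked([[1, 2], [3, 4]], [[5, 6], [7, 8]], block=1)
print(result)
```

In an interpreted language like Python the reordering shows the structure rather than the speedup; the gain appears in compiled code, where the block size is chosen to fit a particular cache. Choosing which cache level to target is exactly the kind of trade-off the passage describes.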

Robert A. van de Geijn, a computer scientist who works with Mr. Goto at the Texas center, said that Mr. Goto's special skill was in the step-by-step reordering of software instructions to take the greatest advantage of the performance trade-offs offered by each type of chip.

"He combines both scientific insight and engineering skills," Mr. van de Geijn said.

They met in 2002 when Mr. Goto took a sabbatical from his job at the patent office to spend a year at the Texas center. (He has since resigned from the patent office.)

Once Mr. Goto arrived in Texas, he turned his attention to optimizing the speed of the Pentium 4 microprocessor. When computer scientists at the University at Buffalo added Goto BLAS to their Pentium-based supercomputer, the calculating power of the system jumped from 1.5 trillion to 2 trillion mathematical operations per second out of a theoretical limit of 3 trillion.
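The Buffalo figures can be restated as a fraction of theoretical peak, the same measure used for the Alpha results. A short worked calculation follows, using only the numbers quoted above; the helper name `fraction_of_peak` is hypothetical.

```python
def fraction_of_peak(achieved, peak):
    """Return achieved speed as a fraction of the theoretical limit."""
    return achieved / peak


PEAK = 3.0  # trillion operations per second, as quoted in the article

before = fraction_of_peak(1.5, PEAK)  # before Goto BLAS was added
after = fraction_of_peak(2.0, PEAK)   # after Goto BLAS was added
speedup = 2.0 / 1.5                   # relative gain from the software alone

print(f"before: {before:.0%}, after: {after:.0%}, speedup: {speedup:.2f}x")
```

On these numbers the system went from half of its theoretical limit to about two-thirds, a gain of roughly one-third from a software change with no new hardware.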

The increase was so astounding that the record keepers for the supercomputing Top 500 list called the researchers in Buffalo because they did not think such a speed was credible.

"I teased them and suggested that the speed of light was faster in Buffalo than it was in Tennessee," Mr. van de Geijn recalled.

Recently there has been a quiet controversy around the Goto BLAS because Mr. Goto has been slow to offer his work as open-source software, the free model of software distribution.

Some programmers have suggested that Mr. Goto has not joined the open-source movement because he wants to protect his secrets and strategies from competitors.

That is not so, he said recently, noting that the Goto BLAS software is freely available for noncommercial use. And he said he was preparing an open-source version.

He said his next big challenge was to expose chip designers to his ideas to help speed their processors.

"Computer architects are stubborn," he observed. "They have their own ideas." His ideas on computing efficiency, he said, speak for themselves.
