[Source](http://fpgacpu.org/usenet/fpgas_as_pc_coprocessors.html "Permalink to fpgacpu.org - On FPGAs as PC Coprocessors")
# fpgacpu.org - On FPGAs as PC Coprocessors

Newsgroups: comp.arch.fpga

Subject: On FPGAs as PC coprocessors

Date: 6 May 1996 22:12:11 GMT

One of the "way out speculation" questions asked at FCCM 96 (IEEE
Symposium on FPGAs for Custom Computing Machines) was "when, if ever,
will an FPGA coprocessor ship on every PC motherboard?"

Ignoring the daunting language and interface standards issues, and just
looking at current hardware approaches to FPGA "coprocessors", the
answer must be "not any time soon".

Consider that today's power-user CPU (such as a 200 MHz Pentium Pro or
a 400 MHz Alpha 21164A), *running out of L1 cache*, issues at peak three
or four instructions per clock, with clock periods of 5 ns and 2.5 ns
respectively. Also consider that the current "low latency, high
bandwidth" approach to FPGA coprocessor integration is to hang the FPGA
on the PCI bus.

In this scenario, the kinds of quasi-general-purpose computing problems
that an FPGA coprocessor can usefully assist with are quite limited.
Issuing a write and then a read-back operation to the FPGA could easily
take 10 PCI bus cycles (300 ns), assuming no PCI bus contention. In
that time, given hand-crafted code (still less effort than building an
FPGA custom computing machine), a Pentium Pro could issue as many as 180
instructions, and the Alpha as many as 480 64-bit instructions. Future
versions of such processors will soon be doing VIS- or MMX-like limited
bytewise SIMD parallelism, 8 or more ways wide, and eventually
superscalar versions of the same. Such designs might issue between
500 (\*) and 4000 (\*\*) hand-coded packed byte operations while that
single 300 ns FPGA write/read is still in progress.
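
The latency arithmetic above can be checked in a few lines. This is a minimal sketch, assuming the posting's round figures: about 30 ns per 33 MHz PCI clock, a 10-cycle write/read-back, and peak issue rates of 3 instructions per clock for the 200 MHz Pentium Pro and 4 for the 400 MHz Alpha 21164A. The function names are hypothetical helpers, not from any library.

```python
"""Peak instructions a CPU can issue during one PCI round trip to an FPGA.

Assumes the posting's round numbers: ~30 ns per 33 MHz PCI clock and a
10-cycle write + read-back with no bus contention.
"""

PCI_CYCLE_NS = 30.0          # 33 MHz PCI, rounded to 30 ns as in the posting
PCI_ROUND_TRIP_CYCLES = 10   # one write plus one read-back (posting's estimate)


def pci_round_trip_ns(cycles: int = PCI_ROUND_TRIP_CYCLES,
                      cycle_ns: float = PCI_CYCLE_NS) -> float:
    """Latency of one FPGA write followed by a read-back, in ns."""
    return cycles * cycle_ns


def instructions_issued(window_ns: float, cpu_hz: float, issue_width: int) -> int:
    """Peak instructions issued in window_ns, counting only whole clocks."""
    period_ns = 1e9 / cpu_hz
    whole_clocks = int(window_ns // period_ns)
    return whole_clocks * issue_width


if __name__ == "__main__":
    window = pci_round_trip_ns()
    print(f"PCI write/read-back: {window:.0f} ns")
    print("200 MHz Pentium Pro, 3-issue:", instructions_issued(window, 200e6, 3))
    print("400 MHz Alpha 21164A, 4-issue:", instructions_issued(window, 400e6, 4))
```

Running it prints 300 ns, 180, and 480, matching the figures in the text.
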

(At peak speeds such as 400e6 clock/s * 4 issue/clock * 8 byte ops/issue,
i.e. 12.8e9 byte ops/s, you have to agree that superscalar micros with
bytewise SIMD are going to displace many FPGA applications on PC and
workstation platforms.)
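
The peak-rate figure in the parenthetical, and the footnoted 500 and 4000 operation estimates, follow from the same kind of arithmetic. A minimal sketch, assuming 8 bytes per packed operation as in the footnotes; the 4-issue bytewise-SIMD Alpha is the posting's own hypothetical, and the helper names are illustrative only.

```python
"""Packed-byte SIMD throughput, per second and per 300 ns PCI round trip."""

BYTES_PER_PACKED_OP = 8      # one 64-bit MMX/VIS-style operation = 8 byte ops
PCI_ROUND_TRIP_NS = 300.0    # the posting's FPGA write/read-back estimate


def peak_byte_ops_per_s(clock_hz: float, issue_per_clock: int,
                        bytes_per_op: int = BYTES_PER_PACKED_OP) -> float:
    """Peak byte operations per second: clock * issue width * bytes/op."""
    return clock_hz * issue_per_clock * bytes_per_op


def byte_ops_in_window(window_ns: float, clock_hz: float, issue_per_clock: int,
                       bytes_per_op: int = BYTES_PER_PACKED_OP) -> int:
    """Byte operations completed in whole clocks within window_ns."""
    whole_clocks = int(window_ns // (1e9 / clock_hz))
    return whole_clocks * issue_per_clock * bytes_per_op


if __name__ == "__main__":
    print(f"400 MHz, 4-issue, 8-byte SIMD: {peak_byte_ops_per_s(400e6, 4):.3g} byte ops/s")
    # Footnote (*): 200 MHz PPro with MMX, one 8-byte MMX instruction per clock.
    print("PPro+MMX ops per round trip:", byte_ops_in_window(PCI_ROUND_TRIP_NS, 200e6, 1))
    # Footnote (**): hypothetical 400 MHz Alpha, four 8-byte SIMD ops per clock.
    print("SIMD Alpha ops per round trip:", byte_ops_in_window(PCI_ROUND_TRIP_NS, 400e6, 4))
```

The exact counts are 480 and 3840, which the footnotes round to 500 and 4000.
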

So as long as FPGAs are attached to I/O buses that are glacially slow
relative to the CPU, including 32-bit 33 MHz PCI, it seems unlikely they
will be of much use in general-purpose PC processor acceleration. Sure,
for applications such as cryptography, image processing, and signal
processing, they might be a win (\*\*\*), given a semi-autonomous problem
which either fits in the FPGA and its local storage, or which can use
DMA to stream data into or through the FPGA without much CPU
intervention or management.
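
To see why a semi-autonomous, DMA-streamed problem matters, compare a CPU that mediates every word with a DMA burst. A rough sketch, assuming each CPU-mediated transfer moves one 32-bit word and pays the full 300 ns write/read-back, while a DMA burst approaches the 32-bit, 33 MHz PCI peak of 132 MB/s. Arbitration, DMA setup, and contention are ignored, so both numbers are optimistic upper bounds, and the helper names are hypothetical.

```python
"""CPU-mediated FPGA transfers vs. DMA streaming over 32-bit, 33 MHz PCI.

Rough upper bounds only: ignores arbitration, DMA setup, and bus contention.
"""

PCI_CLOCK_HZ = 33e6
PCI_WIDTH_BYTES = 4          # 32-bit PCI
ROUND_TRIP_NS = 300.0        # one CPU write + read-back (posting's estimate)


def cpu_mediated_mb_per_s(bytes_per_transfer: int = PCI_WIDTH_BYTES,
                          round_trip_ns: float = ROUND_TRIP_NS) -> float:
    """Throughput when every word costs a full CPU round trip (1 MB = 1e6 B)."""
    return bytes_per_transfer / (round_trip_ns * 1e-9) / 1e6


def dma_burst_peak_mb_per_s(clock_hz: float = PCI_CLOCK_HZ,
                            width_bytes: int = PCI_WIDTH_BYTES) -> float:
    """Peak burst throughput: one bus-width transfer per PCI clock (MB/s)."""
    return clock_hz * width_bytes / 1e6


if __name__ == "__main__":
    cpu = cpu_mediated_mb_per_s()
    dma = dma_burst_peak_mb_per_s()
    print(f"CPU-mediated: {cpu:.1f} MB/s, DMA burst peak: {dma:.0f} MB/s "
          f"(about {dma / cpu:.0f}x)")
```

Under these assumptions, streaming is roughly ten times faster than round-tripping each word through the CPU, before even counting the CPU time freed up.
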

Of course, the PCI ASIC crowd has the same latency problems, but they
don't share the FCCM aspiration of accelerating general-purpose
computing; rather, they focus on the special-purpose applications just
mentioned.

Five times better latency and four times better bandwidth could be
achieved if FPGA vendors devise a way to connect their parts directly to
the Pentium Pro external bus, as a peer of the memory/bus controller. A
custom, dedicated Pentium Pro interface would probably be required,
since FPGA configurable logic would be too slow and electrically
incompatible.
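
The "four times better bandwidth" figure is consistent with comparing peak data rates. A minimal sketch, assuming a Pentium Pro external bus with a 64-bit data path at 66 MHz against 32-bit PCI at 33 MHz. The 60 ns latency below is only what the posting's five-fold claim implies relative to the 300 ns PCI round trip, not a measured number.

```python
"""Peak bandwidth: 32-bit 33 MHz PCI vs. a 64-bit 66 MHz Pentium Pro bus."""


def peak_mb_per_s(clock_hz: float, width_bits: int) -> float:
    """One full-width transfer per clock, in MB/s (1 MB = 1e6 bytes)."""
    return clock_hz * (width_bits // 8) / 1e6


PCI_MB_S = peak_mb_per_s(33e6, 32)     # 132 MB/s
PPRO_MB_S = peak_mb_per_s(66e6, 64)    # 528 MB/s (assumed 64-bit data, 66 MHz)

# Latency implied by the posting's "five times better" claim, not measured.
IMPLIED_LATENCY_NS = 300.0 / 5


if __name__ == "__main__":
    print(f"PCI {PCI_MB_S:.0f} MB/s, PPro bus {PPRO_MB_S:.0f} MB/s, "
          f"ratio {PPRO_MB_S / PCI_MB_S:.1f}x")
    clocks = IMPLIED_LATENCY_NS * 66e6 / 1e9
    print(f"Implied round trip: {IMPLIED_LATENCY_NS:.0f} ns, "
          f"about {clocks:.0f} clocks at 66 MHz")
```

Doubling both the width and the clock gives the factor of four; the implied 60 ns round trip works out to about four bus clocks.
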

This could be a good volume business, and not quite the moving target
it might appear to be: I expect the PPro external bus to be just as
ubiquitous and long-lived as the 486 and Pentium buses have been.
Someone could make a plug-in card which sits in the PPro ZIF socket and
hosts a PPro and its FPGA(s).

Alternatively, the FPGA coprocessor could be attached to the new advanced
graphics memory port (or whatever it ends up being called) that will be
available in future Intel memory/PCI controller chipsets.

One might argue that Xilinx made a big mistake in not offering a
version of the XC6200 with a dedicated 66 MHz Pentium external bus
interface; after all, it is by far the most popular and most widely
supported processor interface for the most lucrative general computing
market.

If any vendor does pursue this idea, I would appreciate a couple of
sample parts. :-)

--


(\*) 500 ops in 300 ns: forthcoming 200 MHz PPro with MMX: 60 clocks x 1
8-byte MMX insn/clock = 480, roughly 500.

(\*\*) 4000 ops in 300 ns: hypothetical 400 MHz Alpha with each integer
unit enhanced for bytewise SIMD: 120 clocks x 4 8-byte insns/clock = 3840,
roughly 4000.

(\*\*\*) "win": much cheaper/faster than simply adding a second processor.

--

Acknowledgements: this posting is a spin-off of a discussion with Mark
Shand, and the "way out speculative" question was suggested by Mike
Butts.

Jan Gray

Copyright © 2000, Gray Research LLC. All rights reserved.

Last updated: Feb 03 2001
[1]: http://fpgacpu.org/index.html
[2]: http://fpgacpu.org/re.html
[3]: http://fpgacpu.org/emulating_fpgas.html
[4]: http://fpgacpu.org/bydate.html
[5]: http://fpgacpu.org/why.html
[6]: http://fpgacpu.org/homebrew.html
[7]: http://fpgacpu.org/a-x-announce.html
[8]: http://fpgacpu.org/soft.html
[9]: http://fpgacpu.org/lcc.html
[10]: http://fpgacpu.org/32bits.html
[11]: http://fpgacpu.org/superscalar.html
[12]: http://fpgacpu.org/javaproc.html
[13]: http://fpgacpu.org/forth.html
[14]: http://fpgacpu.org/alto.html
[15]: http://fpgacpu.org/transputer.html
[16]: http://fpgacpu.org/speed.html
[17]: http://fpgacpu.org/synth_cpu.html
[18]: http://fpgacpu.org/regfile.html
[19]: http://fpgacpu.org/regfile2.html
[20]: http://fpgacpu.org/fp.html
[21]: http://fpgacpu.org/bb.html
[22]: http://fpgacpu.org/altera_cpus.html
[23]: http://fpgacpu.org/altera_cpus_dual_port_EABs.html
[24]: http://fpgacpu.org/multiprocessors.html
[25]: http://fpgacpu.org/multis_and_fast_fpgacpus.html
[26]: http://fpgacpu.org/innerloop.html
[27]: http://fpgacpu.org/array.html
[28]: http://fpgacpu.org/buses.html
[29]: http://fpgacpu.org/memory.html
[30]: http://fpgacpu.org/vga.html
[31]: http://fpgacpu.org/12mm2.html
[32]: http://fpgacpu.org/cnets.html
[33]: http://fpgacpu.org/cnets_datapath.html
[34]: http://fpgacpu.org/generators.html
[35]: http://fpgacpu.org/cpus_vs_fpgas.html
[36]: http://fpgacpu.org/life.html
[37]: http://fpgacpu.org/max.html
[38]: http://fpgacpu.org/floorplanning.html
[39]: http://fpgacpu.org/rope_pushing.html
[40]: http://fpgacpu.org/virtex_spec.html
[41]: http://fpgacpu.org/rambus.html
[42]: http://fpgacpu.org/render.html
[43]: http://fpgacpu.org/lfsrs.html
[44]: http://www.google.com