---
created_at: '2015-11-16T08:46:47.000Z'
title: On FPGAs as PC Coprocessors (1996)
url: http://fpgacpu.org/usenet/fpgas_as_pc_coprocessors.html
author: luu
points: 73
story_text:
comment_text:
num_comments: 32
story_id:
story_title:
story_url:
parent_id:
created_at_i: 1447663607
_tags:
- story
- author_luu
- story_10573313
objectID: '10573313'
---

[Source](http://fpgacpu.org/usenet/fpgas_as_pc_coprocessors.html "Permalink to fpgacpu.org - On FPGAs as PC Coprocessors")

# fpgacpu.org - On FPGAs as PC Coprocessors

Newsgroups: comp.arch.fpga

Subject: On FPGAs as PC coprocessors

Date: 6 May 1996 22:12:11 GMT

One of the "way out speculation" questions asked at FCCM 96 (IEEE Symposium on FPGAs for Custom Computing Machines) was "when, if ever, will an FPGA coprocessor ship on every PC motherboard?"

Ignoring the daunting language and interface standards issues, and just looking at current hardware approaches to FPGA "coprocessors", the answer must be "not any time soon".

Consider that today's power user's CPU (such as a 200 MHz Pentium Pro or a 400 MHz Alpha 21164A) *running out of L1 cache* issues at peak three or four instructions per clock: three per 5 ns clock for the Pentium Pro, four per 2.5 ns clock for the Alpha. Also consider that the current "low latency high bandwidth" approach to FPGA coprocessor integration is to hang the FPGA on the PCI bus.

In this scenario, the kinds of quasi-general-purpose computing problems that an FPGA coprocessor can usefully assist with are quite limited. Issuing a write and then a read-back operation to the FPGA could easily take 10 PCI bus cycles (300 ns), assuming no PCI bus contention. In that time, assuming hand-crafted code (less effort than an FCCM), a Pentium Pro could issue as many as 180 instructions; the Alpha, 480 64-bit instructions. Future versions of such processors will soon be doing VIS- or MMX-like limited bytewise 8+-way SIMD parallelism, and eventually superscalar versions of the same. Such designs might issue between 500 (\*) and 4000 (\*\*) hand-coded packed byte operations while that single 300 ns FPGA write/read is still in progress.

((At peak speeds like 400e6 clocks/s * 4 issues/clock * 8 byte ops/issue, i.e. 12.8e9 byte ops/s, you have to agree that superscalar micros with bytewise SIMD are going to displace many FPGA applications on PC and workstation platforms.))
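
The issue-slot figures above (180 and 480 instructions, the footnoted 500 and 4000 byte operations) and the peak rate in the aside can be checked with a few lines of Python. This is a minimal sketch that takes the post's own inputs as assumptions: a 300 ns write/read round trip (10 PCI cycles of 30 ns), and the clock rates, issue widths, and SIMD widths given in the text and footnotes. The `Cpu` class and the function names are illustrative and do not come from the original.

```python
from dataclasses import dataclass

# Assumptions taken from the post: 33 MHz PCI rounded to 30 ns per cycle,
# and a write plus read-back costing 10 cycles with no bus contention.
PCI_CYCLE_NS = 30
ROUND_TRIP_CYCLES = 10


@dataclass
class Cpu:
    name: str
    clock_mhz: int      # core clock frequency in MHz
    issue_width: int    # instructions issued per clock at peak
    bytes_per_op: int   # byte operations per instruction (1 = scalar, 8 = 8-way SIMD)


def clocks_in(window_ns: int, clock_mhz: int) -> int:
    """Whole CPU clocks that elapse during window_ns."""
    return window_ns * clock_mhz // 1000


def ops_in_window(cpu: Cpu, window_ns: int) -> int:
    """Peak operations issued while one FPGA round trip is in flight."""
    return clocks_in(window_ns, cpu.clock_mhz) * cpu.issue_width * cpu.bytes_per_op


def peak_ops_per_s(cpu: Cpu) -> int:
    """Peak rate: clocks/s * issues/clock * byte ops/issue."""
    return cpu.clock_mhz * 1_000_000 * cpu.issue_width * cpu.bytes_per_op


window_ns = PCI_CYCLE_NS * ROUND_TRIP_CYCLES  # 300 ns
cpus = [
    Cpu("200 MHz Pentium Pro", 200, 3, 1),
    Cpu("400 MHz Alpha 21164A", 400, 4, 1),
    Cpu("200 MHz Pentium Pro with MMX (forthcoming)", 200, 1, 8),
    Cpu("400 MHz Alpha with bytewise SIMD (hypothetical)", 400, 4, 8),
]

for cpu in cpus:
    print(f"{cpu.name}: {ops_in_window(cpu, window_ns)} ops per {window_ns} ns round trip, "
          f"{peak_ops_per_s(cpu):.3g} ops/s peak")
```

The sketch yields 180 and 480 instructions for the two scalar machines and 480 and 3840 byte operations for the two SIMD cases (the footnotes round these to 500 and 4000), with a peak of 1.28e10 byte ops/s for the hypothetical SIMD Alpha.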

So as long as FPGAs are attached to I/O buses that are glacially slow relative to the CPU, including 32-bit 33 MHz PCI, it seems unlikely they will be of much use in general-purpose PC processor acceleration. Sure, for applications such as cryptography and image and signal processing, they might be a win (\*\*\*), given a semi-autonomous problem that either fits in the FPGA and its local storage, or can use DMA to stream data into or through the FPGA without much CPU intervention or management.
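
A simple cost model shows why the paragraph above calls for a semi-autonomous or DMA-driven workload. This is a hypothetical sketch and is not from the post. It compares a CPU that performs one 300 ns write/read round trip per 32-bit word against an idealized DMA transfer. The DMA transfer pays an assumed fixed setup cost (`DMA_SETUP_NS`, an invented figure) and then moves one word per 30 ns PCI cycle in a burst. Real PCI arbitration, burst lengths, and driver overheads would change the numbers.

```python
# Hypothetical cost model; all figures are assumptions for illustration.
PCI_CYCLE_NS = 30                      # 33 MHz PCI, rounded as in the post
ROUND_TRIP_NS = 10 * PCI_CYCLE_NS      # write + read-back per word (the post's 300 ns)
DMA_SETUP_NS = 3000                    # assumed one-time cost to program a DMA transfer


def programmed_io_ns(n_words: int) -> int:
    """CPU writes each 32-bit word and waits for the result: one round trip per word."""
    return n_words * ROUND_TRIP_NS


def dma_stream_ns(n_words: int, setup_ns: int = DMA_SETUP_NS) -> int:
    """Idealized DMA: fixed setup, then one 32-bit word per PCI cycle in a burst."""
    return setup_ns + n_words * PCI_CYCLE_NS


def break_even_words(setup_ns: int = DMA_SETUP_NS) -> int:
    """Smallest transfer size for which streaming beats per-word round trips."""
    n = 1
    while dma_stream_ns(n, setup_ns) >= programmed_io_ns(n):
        n += 1
    return n


for n in (1, 16, 1024):
    print(f"{n:5d} words: programmed I/O {programmed_io_ns(n):7d} ns, "
          f"DMA {dma_stream_ns(n):6d} ns")
print("break-even at", break_even_words(), "words")
```

Under these assumptions, streaming wins once a transfer exceeds about a dozen words. Its advantage approaches 10x for long transfers, because each word then costs one bus cycle instead of a ten-cycle round trip.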

Of course, the PCI ASIC crowd has the same latency problems, but they don't share FCCM aspirations of accelerating general-purpose computing; rather, they focus on the same special-purpose applications mentioned above.

Five times better latency and four times better bandwidth could be achieved if FPGA vendors were to invent a way to connect their parts directly to the Pentium Pro external bus, as a peer of the memory/bus controller. A custom, dedicated Pentium Pro bus interface would probably be required, since an interface built from FPGA configurable logic would be too slow and electrically incompatible.
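
The bandwidth half of this claim follows directly from bus widths and clocks. Here is a minimal sketch that assumes nominal figures not stated in the post: 32-bit PCI at 33 MHz versus the Pentium Pro's 64-bit external data bus at 66 MHz, with both buses moving one transfer per clock at peak. It does not model the latency figure, which the post gives as an estimate.

```python
def peak_mb_per_s(width_bits: int, clock_mhz: int) -> int:
    """Peak rate in MB/s for a bus moving width_bits once per clock (bytes x MHz)."""
    return width_bits // 8 * clock_mhz


pci = peak_mb_per_s(32, 33)        # 32-bit, 33 MHz PCI
ppro_bus = peak_mb_per_s(64, 66)   # 64-bit, 66 MHz Pentium Pro external bus

print(f"PCI: {pci} MB/s, PPro bus: {ppro_bus} MB/s, ratio {ppro_bus / pci:.1f}x")
```

The result is 132 MB/s versus 528 MB/s, the factor of four cited above. Sustained rates on either bus would be lower than these peaks.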

This could be a good volume business, and not quite the moving target it might appear: I expect the PPro external bus to be just as ubiquitous and as long-lived as the 486 and Pentium buses have been. Someone could make a plug-in card that sits in the PPro ZIF socket and hosts a PPro and its FPGA(s).

Alternatively, the FPGA coprocessor could be attached to the new advanced graphics memory port (or whatever it ends up being called) that will be available in future Intel memory/PCI controller chipsets.

One might argue that Xilinx made a big mistake in not offering a version of the XC6200 with a dedicated 66 MHz Pentium external bus interface; after all, it is by far the most popular and best-supported processor interface for the most lucrative general computing market.

If any vendor does pursue this idea, I would appreciate a couple of sample parts. :-)

--

(\*) 500 ops in 300 ns: forthcoming 200 MHz PPro with MMX: 60 clocks x 1 8-byte MMX insn/clock = 480 byte ops, roughly 500

(\*\*) 4000 ops in 300 ns: hypothetical 400 MHz Alpha with each integer unit enhanced for bytewise SIMD: 120 clocks x 4 8-byte insns/clock = 3840 byte ops, roughly 4000

(\*\*\*) "win": much cheaper/faster than simply adding a second processor

--

Acknowledgements: this posting is a spin-off of a discussion with Mark Shand, and the "way out speculative" question was suggested by Mike Butts.

Jan Gray

Copyright © 2000, Gray Research LLC. All rights reserved.

Last updated: Feb 03 2001

[1]: http://fpgacpu.org/index.html
[2]: http://fpgacpu.org/re.html
[3]: http://fpgacpu.org/emulating_fpgas.html
[4]: http://fpgacpu.org/bydate.html
[5]: http://fpgacpu.org/why.html
[6]: http://fpgacpu.org/homebrew.html
[7]: http://fpgacpu.org/a-x-announce.html
[8]: http://fpgacpu.org/soft.html
[9]: http://fpgacpu.org/lcc.html
[10]: http://fpgacpu.org/32bits.html
[11]: http://fpgacpu.org/superscalar.html
[12]: http://fpgacpu.org/javaproc.html
[13]: http://fpgacpu.org/forth.html
[14]: http://fpgacpu.org/alto.html
[15]: http://fpgacpu.org/transputer.html
[16]: http://fpgacpu.org/speed.html
[17]: http://fpgacpu.org/synth_cpu.html
[18]: http://fpgacpu.org/regfile.html
[19]: http://fpgacpu.org/regfile2.html
[20]: http://fpgacpu.org/fp.html
[21]: http://fpgacpu.org/bb.html
[22]: http://fpgacpu.org/altera_cpus.html
[23]: http://fpgacpu.org/altera_cpus_dual_port_EABs.html
[24]: http://fpgacpu.org/multiprocessors.html
[25]: http://fpgacpu.org/multis_and_fast_fpgacpus.html
[26]: http://fpgacpu.org/innerloop.html
[27]: http://fpgacpu.org/array.html
[28]: http://fpgacpu.org/buses.html
[29]: http://fpgacpu.org/memory.html
[30]: http://fpgacpu.org/vga.html
[31]: http://fpgacpu.org/12mm2.html
[32]: http://fpgacpu.org/cnets.html
[33]: http://fpgacpu.org/cnets_datapath.html
[34]: http://fpgacpu.org/generators.html
[35]: http://fpgacpu.org/cpus_vs_fpgas.html
[36]: http://fpgacpu.org/life.html
[37]: http://fpgacpu.org/max.html
[38]: http://fpgacpu.org/floorplanning.html
[39]: http://fpgacpu.org/rope_pushing.html
[40]: http://fpgacpu.org/virtex_spec.html
[41]: http://fpgacpu.org/rambus.html
[42]: http://fpgacpu.org/render.html
[43]: http://fpgacpu.org/lfsrs.html
[44]: http://www.google.com