2018-02-23 18:58:03 +00:00

---
created_at: '2015-11-16T08:46:47.000Z'
title: On FPGAs as PC Coprocessors (1996)
url: http://fpgacpu.org/usenet/fpgas_as_pc_coprocessors.html
author: luu
points: 73
story_text:
comment_text:
num_comments: 32
story_id:
story_title:
story_url:
parent_id:
created_at_i: 1447663607
_tags:
- story
- author_luu
- story_10573313
objectID: '10573313'
---

2018-02-23 18:19:40 +00:00

[Source](http://fpgacpu.org/usenet/fpgas_as_pc_coprocessors.html "Permalink to fpgacpu.org - On FPGAs as PC Coprocessors")

# fpgacpu.org - On FPGAs as PC Coprocessors

Newsgroups: comp.arch.fpga

Subject: On FPGAs as PC coprocessors

Date: 6 May 1996 22:12:11 GMT

One of the "way out speculation" questions asked at FCCM 96 (the IEEE Symposium on FPGAs for Custom Computing Machines) was: "When, if ever, will an FPGA coprocessor ship on every PC motherboard?"

Ignoring the daunting language and interface standards issues, and looking only at current hardware approaches to FPGA "coprocessors", the answer must be "not any time soon".

Consider that today's power-user CPU (such as a 200 MHz Pentium Pro or a 400 MHz Alpha 21164A), *running out of L1 cache*, can issue at peak three or four instructions per clock: three per 5 ns clock on the Pentium Pro, four per 2.5 ns clock on the Alpha. Also consider that the current "low latency, high bandwidth" approach to FPGA coprocessor integration is to hang the FPGA on the PCI bus.

In this scenario, the kinds of quasi-general-purpose computing problems that an FPGA coprocessor can usefully assist with are quite limited. Issuing a write and then a read-back operation to the FPGA could easily take 10 PCI bus cycles (300 ns), assuming no PCI bus contention. In that time, a Pentium Pro running hand-crafted code (still less effort than building an FCCM) could issue as many as 180 instructions, and the Alpha as many as 480 64-bit instructions. Future versions of such processors will soon be doing VIS- or MMX-like limited bytewise 8+-way SIMD parallelism, and eventually superscalar versions of the same. Such designs might issue between 500 (*) and 4000 (**) hand-coded packed-byte operations while that single 300 ns FPGA write/read is still in progress.
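
As a rough check of the figures above, here is a minimal Python sketch that redoes the arithmetic. It assumes only the post's own parameters: a 10-cycle PCI round trip at about 30 ns per cycle, 3-issue at 200 MHz and 4-issue at 400 MHz, and 8 byte lanes per SIMD instruction as in the footnotes. The helper name `ops_in_window` is hypothetical, introduced only for illustration.

```python
# Back-of-envelope check of the post's figures. All parameters are the post's
# own assumptions: a 10-cycle PCI write/read round trip at ~30 ns per cycle,
# and peak issue rates when running out of L1 cache.

PCI_CYCLE_NS = 30            # 33 MHz PCI, rounded to 30 ns as in the post
ROUND_TRIP_CYCLES = 10       # one write plus one read-back, no bus contention
WINDOW_NS = PCI_CYCLE_NS * ROUND_TRIP_CYCLES  # 300 ns


def ops_in_window(clock_mhz: int, issue_per_clock: int, bytes_per_op: int = 1,
                  window_ns: int = WINDOW_NS) -> int:
    """Peak operations issued during window_ns (hypothetical helper).

    bytes_per_op > 1 models packed-byte SIMD, counting each byte lane
    of an instruction as one operation.
    """
    clocks = window_ns * clock_mhz // 1000   # ns * (clocks per us) / 1000
    return clocks * issue_per_clock * bytes_per_op


if __name__ == "__main__":
    print(f"PCI round trip: {WINDOW_NS} ns")
    print("Pentium Pro (200 MHz, 3-issue):", ops_in_window(200, 3))
    print("Alpha 21164A (400 MHz, 4-issue):", ops_in_window(400, 4))
    # Footnote (*): one 8-byte MMX instruction per clock at 200 MHz
    print("PPro + MMX byte ops:", ops_in_window(200, 1, bytes_per_op=8))
    # Footnote (**): four 8-byte SIMD instructions per clock at 400 MHz
    print("Hypothetical SIMD Alpha byte ops:", ops_in_window(400, 4, bytes_per_op=8))
```

Run as a script, it reports 180 and 480 instructions, and 480 and 3840 packed-byte operations, which the post rounds to 500 and 4000.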

(At peak speeds such as 400e6 clocks/s * 4 issues/clock * 8 byte ops/issue, i.e. about 12.8e9 byte ops/s, you have to agree that superscalar micros with bytewise SIMD are going to displace many FPGA applications on PC and workstation platforms.)
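
The peak-rate product in the parenthetical above can be checked the same way. This sketch simply multiplies the three example factors; the function name `peak_byte_ops_per_s` is hypothetical.

```python
# Peak bytewise SIMD throughput, using the post's example parameters.

def peak_byte_ops_per_s(clock_hz: float, issue_per_clock: int, lanes: int) -> float:
    """Clock rate x instructions issued per clock x byte lanes per instruction."""
    return clock_hz * issue_per_clock * lanes


if __name__ == "__main__":
    # 400 MHz, 4-way superscalar, 64-bit registers treated as 8 byte lanes
    peak = peak_byte_ops_per_s(400e6, 4, 8)
    print(f"{peak:.3g} byte ops/s")   # 1.28e+10
```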

So as long as FPGAs are attached to glacially slow (by comparison) I/O buses, including 32-bit 33 MHz PCI, it seems unlikely they will be of much use in general-purpose PC processor acceleration. Sure, for applications such as cryptography, image processing and signal processing, they might be a win (***), given a semi-autonomous problem that either fits in the FPGA and its local storage, or that can use DMA to stream data into or through the FPGA without much CPU intervention or management.
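
To illustrate why streaming matters, the following sketch compares a synchronous write/read per word against a simple DMA burst model. The 10-cycle round trip and 30 ns cycle come from the post; the per-burst setup cost, the 64-word burst length and the one-word-per-cycle burst rate are illustrative assumptions, and only the inbound transfer is modeled on the DMA side.

```python
# Compare two ways of pushing N 32-bit words to a PCI-attached FPGA.
# From the post: ~30 ns PCI cycle, 10 cycles per synchronous write + read-back.
# Hypothetical (illustrative, not measured): DMA bursts of 64 words, one word
# per cycle, plus a fixed 10-cycle setup cost per burst; inbound only.

PCI_CYCLE_NS = 30
SYNC_CYCLES_PER_WORD = 10
DMA_SETUP_CYCLES = 10      # hypothetical per-burst overhead
DMA_BURST_WORDS = 64       # hypothetical burst length


def sync_time_ns(n_words: int) -> int:
    """CPU issues a write and waits for the read-back on every word."""
    return n_words * SYNC_CYCLES_PER_WORD * PCI_CYCLE_NS


def dma_time_ns(n_words: int) -> int:
    """DMA engine streams words in bursts; the CPU only sets up transfers."""
    bursts = -(-n_words // DMA_BURST_WORDS)   # ceiling division
    cycles = bursts * DMA_SETUP_CYCLES + n_words
    return cycles * PCI_CYCLE_NS


if __name__ == "__main__":
    for n in (1, 64, 4096):
        print(f"{n:5d} words: sync {sync_time_ns(n):8d} ns, DMA {dma_time_ns(n):8d} ns")
```

Under these assumptions a single word is no faster via DMA (330 ns vs. 300 ns), but 4096 words take about 142 µs instead of about 1.23 ms, because the fixed latency is paid once per burst rather than once per word.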

Of course, the PCI ASIC crowd has the same latency problems, but they don't share the FCCM aspiration of accelerating general-purpose computing; rather, they focus on the same special-purpose applications mentioned above.

Five times better latency and four times better bandwidth could be achieved if FPGA vendors invented a way to connect their parts directly to the Pentium Pro external bus, as a peer of the memory/bus controller. A custom, dedicated Pentium Pro interface would probably be required, since FPGA configurable logic would be too slow and electrically incompatible.
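
The four-times bandwidth figure is consistent with the bus widths and clocks involved. This sketch assumes a 32-bit, 33 MHz PCI bus and a 64-bit, 66 MHz Pentium Pro external bus (commonly cited figures that the post itself does not spell out), counts one transfer per clock, and takes the post's five-times latency estimate at face value. The function name `peak_mb_per_s` is hypothetical.

```python
# Peak data-bus bandwidth: 32-bit/33 MHz PCI vs. a 64-bit/66 MHz processor bus.
# Bus widths and clocks are assumptions; peak figures ignore arbitration,
# address phases and turnaround cycles.

def peak_mb_per_s(width_bits: int, clock_mhz: float) -> float:
    """Peak bandwidth in MB/s (1 MB = 1e6 bytes), one transfer per clock."""
    return width_bits / 8 * clock_mhz


if __name__ == "__main__":
    pci = peak_mb_per_s(32, 33.33)
    ppro_bus = peak_mb_per_s(64, 66.67)
    print(f"PCI: {pci:.0f} MB/s, PPro bus: {ppro_bus:.0f} MB/s, "
          f"ratio {ppro_bus / pci:.1f}x")

    # If round-trip latency also improves 5x (the post's estimate), the 300 ns
    # PCI window shrinks to 60 ns, i.e. 12 clocks of a 200 MHz Pentium Pro.
    window_ns = 300 / 5
    print(f"{window_ns:.0f} ns = {window_ns * 200 / 1000:.0f} PPro clocks")
```

This prints roughly 133 MB/s for PCI against 533 MB/s for the processor bus, a 4.0x ratio.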

This could be a good volume business, and not quite the moving target it might appear to be: I expect the PPro external bus to be just as ubiquitous and as long-lived as the 486 and Pentium buses have been. Someone could make a plug-in card that sits in the PPro ZIF socket and hosts both a PPro and its FPGA(s).

Alternatively, the FPGA coprocessor could be attached to the new advanced graphics memory port (or whatever it ends up being called) that will be available in future Intel memory/PCI controller chipsets.

One might argue that Xilinx made a big mistake in not offering a version of the XC6200 with a dedicated 66 MHz Pentium external bus interface. After all, that bus is by far the most popular and best-supported processor interface in the most lucrative general computing market.

If any vendor does pursue this idea, I would appreciate a couple of sample parts. :-)

--

(*) 500 ops in 300 ns: a forthcoming 200 MHz PPro with MMX, 60 clocks x 1 8-byte MMX insn/clock (480 byte ops).

(**) 4000 ops in 300 ns: a hypothetical 400 MHz Alpha with each integer unit enhanced for bytewise SIMD, 120 clocks x 4 8-byte insns/clock (3840 byte ops).

(***) "win": much cheaper/faster than simply adding a second processor.

--

Acknowledgements: this posting is a spin-off of a discussion with Mark Shand, and the "way out speculative" question was suggested by Mike Butts.

Jan Gray

Copyright © 2000, Gray Research LLC. All rights reserved.

Last updated: Feb 03 2001