Diffstat (limited to 'docs/architecture.md')
-rw-r--r-- | docs/architecture.md | 208 |
1 files changed, 208 insertions, 0 deletions
# General system architecture

<!-- TODO: top-level system architecture + context diagram -->

## PPU

Here's a list of features our PPU has:

- 320x240 @ 60Hz VGA output (upscaled to 640x480)
- single tilemap with room for 1024 tiles of 16x16 pixels
- 8 colors per palette, with 4096 possible colors (12-bit color depth)
- 640x480 background canvas with scrolling
- NO background scrolling splits
- 128 total sprites on screen (NO scanline sprite limit)
- sprites are always drawn on top of the background layer
- PPU control using DMA (dual-port asynchronous RAM)
- tiles can be flipped using FAM or BAM
- no frame buffer
- vertical and horizontal sync and blank output

Notable differences from the NES PPU:

- NES nametable equivalent is called BAM (background attribute memory)
- NES OAM equivalent is called FAM (foreground attribute memory)
- 320x240 @ 60Hz output

  Since we're using VGA, we can't use custom resolutions without an
  upscaler/downscaler. This resolution was chosen because it's exactly half (in
  both dimensions) of the lowest standard VGA resolution, 640x480.
- No scanline sprite limit

  A per-scanline sprite limit is likely to lead to frustrating debugging
  sessions, so it will not be replicated in our custom PPU, unless removing the
  limit makes the hardware implementation impossible or much more difficult.
- Sprites are 16x16

  Most NES games already combine multiple 8x8 tiles into "metatiles" to create
  the illusion of larger sprites. The small native tile size was likely chosen
  to save memory, as RAM was expensive in the '80s, but since we're running on
  an FPGA, this cost is irrelevant.
- Single 1024-tile tilemap shared between foreground and background sprites

  The NES OAM registers contain a bit to select which tilemap to use (of two),
  which effectively expands each tile's index address by one bit. Instead of
  creating the illusion of two separate memory areas for tiles, a single large
  tilemap seems like a more sensible approach to indexed tiles.
- 8 total palettes, with 8 colors each

  More colors are better. Increasing the total palette count is very
  memory-intensive. Increasing the color count per palette would likely slow
  down the per-pixel color lookup on real hardware.
- Sprites can be positioned partially off-screen on all screen edges using only
  the offset bits in the FAM register

  The NES has a separate PPUMASK register to control special color effects and
  to shift sprites off the left and top screen edges, as the sprite offsets
  count from 0. Our PPU's FAM sprite offset bits count from -15, so sprites can
  shift past the top and left screen edges, as well as the standard bottom and
  right edges.
- No status line register; only V-sync and H-sync outputs are supplied back to
  the CPU

  The NES status line register contains some handy lines, such as a (buggy)
  status line for reaching the maximum sprite count per scanline, and a status
  line for detecting collisions between background and foreground sprites. Our
  PPU doesn't have a scanline limit, and all hitbox detection is done in
  software. Software hacks involving swapping tiles during a screen draw cycle
  can still be achieved by counting the V-sync and H-sync pulses using
  interrupts.
- No background scrolling splits

  This feature allows only part of the background canvas to be scrolled, while
  another portion stays still. This was used to draw HUD elements on the
  background layer for displaying things like health bars or score counters.
  Since we are working with a higher foreground sprite limit, we'll use regular
  foreground sprites to display HUD elements.
- Sprites are always drawn on top of the background layer

  Our game doesn't need sprites drawn behind the background for any visual
  effects, and leaving this feature out will lead to a simpler hardware design.

### Hardware design schematics

#### Top (level 1)

![PPU top-level design](../assets/ppu-level-1.svg)

Important notes:

- The STM32 can reset the PPU. This line will also be connected to a physical
  button on the FPGA.
- The STM32 uses direct memory access to control the PPU.
- The PPU's native resolution is 320x240. It generates its VGA timing at this
  resolution as if it were a valid VGA mode, and the STM32 is also only aware of
  this resolution. This resolution is referred to as the "tiny" resolution.
  Because VGA-compatible LCDs likely don't support this resolution due to its
  low pixel clock, a built-in pixel-perfect 2X upscaler is chained after the
  PPU's "tiny" output. This means that the display sees a 640x480 signal, while
  the PPU and STM32 only work in 320x240.
- The STM32 receives the TVSYNC and THSYNC lines from the PPU. These are the
  VSYNC and HSYNC lines from the tiny VGA signal generator. These lines can be
  used to trigger interrupts for counting frames, and to make sure no
  read/write conflicts occur for protected memory regions in the PPU.
- NVSYNC, NHSYNC and the RGB signals refer to the output of the native VGA
  signal generator.

#### Level 2

![PPU level 2 design (data flows from top to bottom)](../assets/ppu-level-2.svg)

Important notes:

- The pixel fetch logic is pipelined in 5 stages:
  1. - (Foreground sprite info) calculate if a foreground sprite exists at the
       current pixel using the FAM register
     - (Background sprite info) get background sprite info from the BAM
       register
  2. - (Sprite render) calculate the pixel to read from TMM based on the sprite
       info
  3. - (Compositor) get the pixel with the 'highest' priority (pick the first
       foreground sprite with a non-transparent color at the current pixel in
       order, fall back to the background)
     - (Palette lookup) look up the palette color using the palette register
     - (VGA signal generator) output the resulting color to the VGA signal
       generator
- The pipeline stages with two clock cycles contain an address set step and a
  memory read step.
- The pipeline takes 5 clock ticks in total. About 18 clock ticks are available
  during each pixel. For optimal display compatibility, the output color signal
  should be stable before 50% of the pixel clock pulse width (9 clock ticks).
- Since the "sprite info" and "sprite render" steps are fundamentally different
  for the foreground and background layer, these components will be combined
  into a single component per layer. They are separated in the above diagram
  to illustrate the pipeline stages.
- The BAX, FAM, and PAL registers are implemented in the component that
  directly accesses them, but are exposed to the PPU RAM bus for writing.
- Each foreground sprite render component holds its own copy of the sprite data
  from RAM in its own cache memory. The cache is updated during the vertical
  blanking (VBLANK) interval between frames.

#### Level 3

This diagram has several flaws. A significant amount of time has already been
spent on it, so the flaws are highlighted here instead of being fixed.

![PPU level 3 design](../assets/ppu-level-3.svg)

Flaws:

- Pipeline stages 1-4 aren't properly connected in this diagram; see the level 2
  notes for the intended functionality.
- The global RESET input resets all PPU RAM, but isn't connected to all RAM
  ports.
- All DATA inputs on the same line as an ADDR output are connections to a
  memory component. Not all of these are connected in the diagram, though they
  should be.
- All DATA and ADDR drivers are also tri-state. EN inputs need to be added to
  support switching the output on/off.
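
A behavioral reference model of the level 2 pixel pipeline may be useful when testing the HDL. The Python sketch below is a hypothetical model, not the actual design, and it is not cycle-accurate. It evaluates the steps one after another for a single screen pixel. The hardware instead evaluates every foreground sprite component in parallel.

Because the FAM, BAM, TMM and PAL formats are still TODO, several details are assumptions:

- the `Attr` field layout;
- encoding the -15 FAM offset as "stored value minus 15";
- "in order" meaning the lowest FAM index wins;
- color index 0 being transparent.

Background scrolling is also left out.

```python
from dataclasses import dataclass

TILE = 16              # tiles are 16x16 pixels
BG_COLS = 640 // TILE  # 40 BAM entries per row of the 640x480 canvas
FAM_BIAS = 15          # assumption: FAM offsets count from -15 (stored 0 -> -15)


@dataclass
class Attr:
    """Hypothetical FAM/BAM entry; the real bit layout is still a TODO."""
    tile: int              # index into the 1024-tile TMM
    palette: int           # one of the 8 palettes
    flip_h: bool = False
    flip_v: bool = False
    x: int = 0             # FAM only: screen x = x - FAM_BIAS
    y: int = 0             # FAM only: screen y = y - FAM_BIAS


def tile_pixel(tmm, attr, px, py):
    """Sprite render: read the color index at (px, py) of a tile, honoring flips."""
    if attr.flip_h:
        px = TILE - 1 - px
    if attr.flip_v:
        py = TILE - 1 - py
    return tmm[attr.tile][py][px]


def sprite_hit(attr, sx, sy):
    """Sprite info: tile-local (px, py) if the sprite covers screen pixel (sx, sy)."""
    px = sx - (attr.x - FAM_BIAS)
    py = sy - (attr.y - FAM_BIAS)
    if 0 <= px < TILE and 0 <= py < TILE:
        return px, py
    return None


def render_pixel(tmm, bam, fam, pal, sx, sy):
    """Return the 12-bit color of screen pixel (sx, sy); scrolling is ignored."""
    # Compositor: the first FAM entry with a non-transparent pixel wins.
    for attr in fam:
        hit = sprite_hit(attr, sx, sy)
        if hit is not None:
            cidx = tile_pixel(tmm, attr, *hit)
            if cidx != 0:  # assumption: color index 0 is transparent
                return pal[attr.palette][cidx]  # palette lookup
    # Fall back to the background tile under the pixel.
    bg = bam[(sy // TILE) * BG_COLS + sx // TILE]
    cidx = tile_pixel(tmm, bg, sx % TILE, sy % TILE)
    return pal[bg.palette][cidx]  # palette lookup
```

The loop returns on the first opaque sprite, so the order of the FAM list directly encodes sprite priority. A sprite stored with `x=0` starts 15 pixels left of the screen, so only its rightmost column is visible at screen x 0.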

Important notes:

- The background sprite and foreground sprite components internally share some
  logic for coordinate transformations.
- The foreground sprite component is only shown once here, but is cloned for
  each foreground sprite the PPU allows.
- The CIDX lines between the sprite and compositor components are shared by all
  sprite components, and are therefore tri-state. A single sprite component
  outputs a CIDX signal based on the \*EN signal from the compositor.
- All DATA and ADDR lines are shared between all RAM ports. WEN inputs are
  controlled by the address decoder.

### Registers

|Address|Size (bytes)|Alias|Description|
|-|-|-|-|
|`0x00000`|`0x00000`|TMM |[tilemap memory][TMM]|
|`0x00000`|`0x00000`|BAM |[background attribute memory][BAM]|
|`0x00000`|`0x00000`|FAM |[foreground attribute memory][FAM]|
|`0x00000`|`0x00000`|PAL |[palettes][PAL]|
|`0x00000`|`0x00000`|BAX |[background auxiliary memory][BAX]|

[TMM]: #tilemap-memory
#### Tilemap memory

- TODO: list format

[BAM]: #background-attribute-memory
#### Background attribute memory

- TODO: list format

[FAM]: #foreground-attribute-memory
#### Foreground attribute memory

- TODO: list format

[PAL]: #palettes
#### Palettes

- TODO: list format

[BAX]: #background-auxiliary-memory
#### Background auxiliary memory

- background scrolling

[nesppuspecs]: https://www.copetti.org/writings/consoles/nes/
[nesppudocs]: https://www.nesdev.org/wiki/PPU_programmer_reference
[nesppupinout]: https://www.nesdev.org/wiki/PPU_pinout
[custompputimings]: https://docs.google.com/spreadsheets/d/1MU6K4c4PtMR_JXIpc3I0ZJdLZNnoFO7G2P3olCz6LSc
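
The BAX layout is not specified yet, but the address arithmetic behind background scrolling can already be sketched. This hypothetical Python helper makes several assumptions:

- BAX holds a horizontal and vertical scroll offset in canvas pixels;
- there is one BAM entry per 16x16 tile of the 640x480 canvas (40x30 = 1200 entries);
- the canvas wraps around at its edges.

Under these assumptions, each pixel of the 320x240 screen maps to a BAM entry and a pixel inside that tile:

```python
TILE = 16                      # tiles are 16x16 pixels
CANVAS_W, CANVAS_H = 640, 480  # background canvas size
SCREEN_W, SCREEN_H = 320, 240  # "tiny" screen resolution
BAM_COLS = CANVAS_W // TILE    # 40 BAM entries per canvas row (30 rows)


def bg_lookup(screen_x, screen_y, scroll_x, scroll_y):
    """Map a tiny-resolution screen pixel to (BAM index, tile-local x, tile-local y).

    scroll_x and scroll_y stand in for hypothetical BAX scroll registers, and
    wrapping around the canvas edges is an assumption.
    """
    if not (0 <= screen_x < SCREEN_W and 0 <= screen_y < SCREEN_H):
        raise ValueError("pixel is outside the 320x240 screen")
    canvas_x = (screen_x + scroll_x) % CANVAS_W
    canvas_y = (screen_y + scroll_y) % CANVAS_H
    bam_index = (canvas_y // TILE) * BAM_COLS + canvas_x // TILE
    return bam_index, canvas_x % TILE, canvas_y % TILE
```

For example, `bg_lookup(10, 0, 635, 0)` wraps past the right canvas edge to canvas column 5 and returns `(0, 5, 0)`. With a scroll offset of `(320, 240)`, the bottom-right screen pixel lands on the last BAM entry, index 1199.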