aboutsummaryrefslogtreecommitdiff
path: root/docs/architecture.md
diff options
context:
space:
mode:
Diffstat (limited to 'docs/architecture.md')
-rw-r--r--docs/architecture.md208
1 files changed, 208 insertions, 0 deletions
diff --git a/docs/architecture.md b/docs/architecture.md
new file mode 100644
index 0000000..eb20cce
--- /dev/null
+++ b/docs/architecture.md
@@ -0,0 +1,208 @@
+# General system architecture
+
+<!-- TODO: top-level system architecture + context diagram -->
+
+## PPU
+
+Here's a list of features our PPU has:
+
+- 320x240 @ 60Hz VGA output (upscaled to 640x480)
+- single tilemap with room for 1024 tiles of 16x16 pixels
+- 8 colors per palette, with 4096 possible colors (12-bit color depth)
+- 640x480 background canvas with scrolling
+- NO background scrolling splits
+- 128 total sprites on screen (NO scanline sprite limit)
+- sprites are always drawn on top of the background layer
+- PPU control using DMA (dual-port asynchronous RAM)
+- tiles can be flipped using FAM or BAM
+- no frame buffer
+- vertical and horizontal sync and blank output
+
+Notable differences:
+
+- NES nametable equivalent is called BAM (background attribute register)
+- NES OAM equivalent is called FAM (foreground attribute register)
+- 320x240 @ 60Hz output
+
+ Since we're using VGA, we can't use custom resolutions without an
+ upscaler/downscaler. This resolution was chosen because it's exactly half of
+ the lowest standard VGA resolution 640x480.
+- No scanline sprite limit
+
+ Unless not imposing any sprite limit makes the hardware implementation
+ impossible, or much more difficult, this is a restriction that will likely
+ lead to frustrating debugging sessions, so will not be replicated in our
+ custom PPU.
+- Sprites are 16x16
+
+ Most NES games already tile multiple 8x8 tiles together into "metatiles" to
+ create the illusion of larger sprites. This was likely done to save on memory
+ costs as RAM was expensive in the '80s, but since we're running on an FPGA
+ cost is irrelevant.
+- Single 1024 sprite tilemap shared between foreground and background sprites
+
+ The NES OAM registers contain a bit to select which tilemap to use (of two),
+ which effectively expands each tile's index address by one byte. Instead of
+ creating the illusion of two separate memory areas for tiles, having one
+ large tilemap seems like a more sensible solution to indexed tiles.
+- 8 total palettes, with 8 colors each
+
+ More colors is better. Increasing the total palette count is a very memory
+ intensive operation, while increaing the palette color count is likely slower
+ when looking up color values for each pixel on real hardware.
+- Sprites can be positioned paritally off-screen on all screen edges using only
+ the offset bits in the FAM register
+
+ The NES has a separate PPUMASK register to control special color effects, and
+ to shift sprites off the left and top screen edges, as the sprite offsets
+ count from 0. Our PPU's FAM sprite offset bits count from -15, so the sprite
+ can shift past the top and left screen edges, as well as the standard bottom
+ and right edges.
+- No status line register, only V-sync and H-sync outputs are supplied back to
+ CPU
+
+ The NES status line register contains some handy lines, such as a buggy
+ status line for reaching the max sprite count per scanline, and a status line
+ for detecting collisions between background and foreground sprites. Our PPU
+ doesn't have a scanline limit, and all hitbox detection is done in software.
+ Software hacks involving swapping tiles during a screen draw cycle can still
+ be achieved by counting the V-sync and H-sync pulses using interrupts.
+- No background scrolling splits
+
+ This feature allows only part of the background canvas to be scrolled, while
+ another portion stays still. This was used to draw HUD elements on the
+ background layer for displaying things like health bars or score counters.
+ Since we are working with a higher foreground sprite limit, we'll use regular
+ foreground sprites to display HUD elements.
+- Sprites are always drawn on top of the background layer
+
+ Our game doesn't need this capability for any visual effects. Leaving this
+ feature out will lead to a simpler hardware design
+
+### Hardware design schematics
+
+#### Top (level 1)
+
+![PPU top-level design](../assets/ppu-level-1.svg)
+
+Important notes:
+
+- The STM32 can reset the PPU. This line will also be connected to a physical
+ button on the FPGA.
+- The STM32 uses direct memory access to control the PPU.
+- The PPU's native resolution is 320x240. It works in this resolution as if it
+ is a valid VGA signal. The STM32 is also only aware of this resolution. This
+ resolution is referred to as "tiny" resolution. Because VGA-compatible LCD's
+ likely don't support this resolution due to low clock speed, a built-in
+ pixel-perfect 2X upscaler is chained after the PPU's "tiny" output. This
+ means that the display sees the resolution as 640x480, but the PPU and STM32
+ only work in 320x240.
+- The STM32 receives the TVSYNC and THSYNC lines from the PPU. These are the
+ VSYNC and HSYNC lines from the tiny VGA signal generator. These lines can be
+ used to trigger interrupts for counting frames, and to make sure no
+ read/write conflicts occur for protected memory regions in the PPU.
+- NVSYNC, NHSYNC and the RGB signals refer to the output of the native VGA
+ signal generator.
+
+#### Level 2
+
+![PPU level 2 design (data flows from top to bottom)](../assets/ppu-level-2.svg)
+
+Important notes:
+
+- The pixel fetch logic is pipelined in 5 stages:
+ 1. - (Foreground sprite info) calculate if foreground sprite exists at
+ current pixel using FAM register
+ - (Background sprite info) get background sprite info from BAM register
+ 2. - (Sprite render) calculate pixel to read from TMM based on sprite info
+ 3. - (Compositor) get pixel with 'highest' priority (pick first foreground
+ sprite with non-transparent color at current pixel in order, fallback to
+ background)
+ - (Palette lookup) lookup palette color using palette register
+ - (VGA signal generator) output real color to VGA signal generator
+- The pipeline stages with two clock cycles contain an address set and memory
+ read step.
+- The pipeline takes 5 clock ticks in total. About 18 are available during each
+ pixel. For optimal display compatibility, the output color signal should be
+ stable before 50% of the pixel clock pulse width (9 clock ticks).
+- Since the "sprite info" and "sprite render" steps are fundamentally different
+ for the foreground and background layer, these components will be combined
+ into one for each layer respectively. They are separated in the above diagram
+ for pipeline stage illustration.
+- The BAX, FAM, and PAL registers are implemented in the component that
+ directly accesses them, but are exposed to the PPU RAM bus for writing.
+- Each foreground sprite render component holds its own sprite data copy from
+ the RAM in it's own cache memory. The cache updates are fetched during the
+ VBLANK time between each frame.
+
+#### Level 3
+
+This diagram has several flaws, but a significant amount of time has already
+been spent on these, so they are highlighted here instead of being fixed.
+
+![PPU level 3 design](../assets/ppu-level-3.svg)
+
+Flaws:
+
+- Pipeline stages 1-4 aren't properly connected in this diagram, see level 2
+ notes for proper functionality
+- The global RESET input resets all PPU RAM, but isn't connected to all RAM
+ ports
+- All DATA inputs on the same line as an ADDR output are connections to a
+ memory component. Not all of these are connected in the diagram, though they
+ should be.
+- All ADDR and ADDR drivers are also tri-state. EN inputs need to be added to
+ support switching the output on/off.
+
+Important notes:
+
+- The background sprite and foreground sprite component internally share some
+ components for coordinate transformations
+- The foreground sprite component is only shown once here, but is cloned for
+ each foreground sprite the PPU allows.
+- The CIDX lines between the sprite and compositor components is shared by all
+ sprite components, and is such tri-state. A single sprite component outputs a
+ CIDX signal based on the \*EN signal from the compositor.
+- All DATA and ADDR lines are shared between all RAM ports. WEN inputs are
+ controlled by the address decoder.
+
+### Registers
+
+|Address|Size (bytes)|Alias|Description|
+|-|-|-|-|
+|`0x00000`|`0x00000`|TMM |[tilemap memory][TMM]|
+|`0x00000`|`0x00000`|BAM |[background attribute memory][BAM]|
+|`0x00000`|`0x00000`|FAM |[foreground attribute memory][FAM]|
+|`0x00000`|`0x00000`|PAL |[palettes][PAL]|
+|`0x00000`|`0x00000`|BAX |[background auxiliary memory][BAX]|
+
+[TMM]: #tilemap-memory
+#### Tilemap memory
+
+- TODO: list format
+
+[BAM]: #background-attribute-memory
+#### Background attribute memory
+
+- TODO: list format
+
+[FAM]: #foreground-attribute-memory
+#### Foreground attribute memory
+
+- TODO: list format
+
+[PAL]: #palettes
+#### Palettes
+
+- TODO: list format
+
+[BAX]: #background-auxiliary-memory
+#### Background auxiliary memory
+
+- background scrolling
+
+[nesppuspecs]: https://www.copetti.org/writings/consoles/nes/
+[nesppudocs]: https://www.nesdev.org/wiki/PPU_programmer_reference
+[nesppupinout]: https://www.nesdev.org/wiki/PPU_pinout
+[custompputimings]: https://docs.google.com/spreadsheets/d/1MU6K4c4PtMR_JXIpc3I0ZJdLZNnoFO7G2P3olCz6LSc
+