52 SA-1110 Developers Manual
Caches, Write Buffer, and Read Buffer
5.3.2.2 Writes to a Bufferable and Noncacheable Location (B=1,C=0)
If the write buffer is enabled and the processor performs a write to a bufferable but noncacheable
location and misses in the Dcaches, the data is placed in the write buffer and the CPU continues
execution. The write buffer performs the external write sometime later. Store multiples are not
merged in the write buffer when B = 1, C = 0.
5.3.2.3 Unbufferable and Noncacheable Writes (B=0, C=0)
If the write buffer is disabled or the CPU performs a write to an unbufferable area, the processor is
stalled until the write buffer empties and the write completes externally. This requires several
external clock cycles.
5.3.2.4 Writes to a Non-Bufferable and Cacheable Location (B=0, C=1)
When store multiples occur to a page that is cacheable but not bufferable (B=0, C=1), the write
data is merged in the write buffer and written to memory as burst writes.
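The three B/C combinations described in Sections 5.3.2.2 through 5.3.2.4 can be summarized in a small lookup, sketched here in C. This is an illustrative model only: the enum, its member names, and classify_write() are hypothetical and are not defined by the SA-1110. The sketch assumes the write buffer is enabled, and it reports the B=1, C=1 case as outside the scope of these sections rather than guessing at its behavior.

```c
#include <assert.h>

/* Hypothetical names for illustration; none of these are SA-1110 definitions. */
typedef enum {
    WRITE_BUFFERED_UNMERGED, /* B=1, C=0: data goes to the write buffer, CPU continues,
                                store multiples are not merged */
    WRITE_STALLS_CPU,        /* B=0, C=0: CPU stalls until the write completes externally */
    WRITE_MERGED_BURST,      /* B=0, C=1: store multiples are merged and written as bursts */
    WRITE_NOT_COVERED_HERE   /* B=1, C=1: not described in Sections 5.3.2.2 to 5.3.2.4 */
} write_behavior;

/* Maps the page's B and C bits (each 0 or 1) to the behavior described in the text.
 * Assumes the write buffer is enabled. */
write_behavior classify_write(unsigned b, unsigned c)
{
    assert(b <= 1u && c <= 1u);
    if (b == 1u && c == 0u) return WRITE_BUFFERED_UNMERGED;
    if (b == 0u && c == 0u) return WRITE_STALLS_CPU;
    if (b == 0u && c == 1u) return WRITE_MERGED_BURST;
    return WRITE_NOT_COVERED_HERE;
}
```

A lookup of this kind is only useful as a reading aid or in a simulator; on real hardware the behavior follows from the B and C bits in the page table entry.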
5.3.3 Enabling the Write Buffer
To enable the write buffer, ensure that the MMU is enabled by setting bit 0 in the control register,
then enable the write buffer by setting bit 3 in the control register. The MMU and write buffer can
be enabled simultaneously with a single write to the control register.
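The single-write enable sequence amounts to setting two bits in the control register value. The C sketch below computes that value only; it does not access the coprocessor, and the macro and function names are hypothetical. On the device, the value would be read from and written back to the control register using the coprocessor access instructions described in Chapter 6.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions as given in the text; the macro names are illustrative. */
#define CTRL_MMU_ENABLE          (1u << 0)  /* bit 0: MMU enable */
#define CTRL_WRITE_BUFFER_ENABLE (1u << 3)  /* bit 3: write buffer enable */

/* Returns the control register value with both the MMU and the write
 * buffer enabled, leaving all other bits unchanged. */
uint32_t ctrl_enable_write_buffer(uint32_t ctrl)
{
    uint32_t updated = ctrl | CTRL_MMU_ENABLE | CTRL_WRITE_BUFFER_ENABLE;
    /* The write buffer is never enabled without the MMU. */
    assert((updated & CTRL_MMU_ENABLE) != 0u);
    return updated;
}
```

Starting from a control value of 0, the function returns 0x9 (bits 0 and 3 set), which can then be written to the control register in one operation.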
5.3.3.1 Disabling the Write Buffer
To disable the write buffer, clear bit 3 in the control register. Any writes already in the write buffer
complete normally, but a drain write buffer operation must still be performed to force all writes out to memory.
Note: The write buffer is used to hold dirty copy-back cached lines from the data cache. It must be
enabled along with the data cache.
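The disable step and the constraint in the note can be sketched in C as follows. The code only computes the new control register value; the coprocessor write and the drain write buffer operation are not shown. The names are hypothetical, and the data cache enable position (bit 2) is an assumption based on the conventional ARM control register layout, since this section does not state it.

```c
#include <assert.h>
#include <stdint.h>

#define CTRL_WRITE_BUFFER_ENABLE (1u << 3)  /* bit 3, as given in the text */
#define CTRL_DCACHE_ENABLE       (1u << 2)  /* ASSUMPTION: bit 2, not stated in this section */

/* Returns the control register value with the write buffer disabled.
 * Per the note, the write buffer must remain enabled while the data cache
 * is enabled, so the caller is expected to have the data cache off. */
uint32_t ctrl_disable_write_buffer(uint32_t ctrl)
{
    assert((ctrl & CTRL_DCACHE_ENABLE) == 0u);
    return ctrl & ~CTRL_WRITE_BUFFER_ENABLE;
}
```

After the returned value is written to the control register, a drain write buffer operation is still required to force pending writes out to memory.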
5.4 Read Buffer (RB)
The SA-1110 contains a software-programmable read buffer that can increase the performance of
critical loop code by prefetching data. The RB enables the preallocation of read-only data into one
of four 32-byte buffers without stalling the pipe. For subsequent loads that hit in the RB, data is
sourced from the buffer instead of the Dcaches at a rate of 1 word per core clock (as long as the
load address hits in the TLB of the DMMU). Also, because the programmer specifies which entry
of the RB is used, critical data can be “locked” in to eliminate bus latency.
The RB is controlled using coprocessor 15, register 9, and provides the capability to allocate 1
word, a half-line (4 words), or a full line (8 words) into one of four entries of the RB. (See
Chapter 6, “Coprocessors” for a detailed RB coprocessor description.) Half-line loads are
automatically aligned onto half-block boundaries (the lower four address bits are ignored).
Full-line loads are automatically aligned onto line boundaries (the lower five address bits are
ignored). For partial cache line RB loads, only the words actually fetched are marked valid and can
be sourced from the buffer. A small queue is used to ensure that subsequent RB load instructions go
out in order.
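The alignment rules for RB loads can be illustrated with a small C model. It computes the aligned base address of a load and which of the eight words in a 32-byte entry would be marked valid. This is a sketch for reasoning about addresses, not a description of the hardware implementation: the type and function names are hypothetical, the one-bit-per-word valid mask is an assumed representation, and word alignment of single-word loads is an assumption not stated in the text.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for illustration only. */
typedef enum { RB_LOAD_WORD, RB_LOAD_HALF_LINE, RB_LOAD_FULL_LINE } rb_load_size;

typedef struct {
    uint32_t base;        /* aligned address of the first word fetched */
    uint8_t  valid_words; /* bit i set: word i of the 8-word entry is valid */
} rb_fill;

/* Models which words an RB load of the given size would fetch. */
rb_fill rb_model_fill(uint32_t addr, rb_load_size size)
{
    rb_fill f = { 0u, 0u };
    switch (size) {
    case RB_LOAD_FULL_LINE:
        f.base = addr & ~0x1Fu;            /* lower five address bits ignored */
        f.valid_words = 0xFFu;             /* all 8 words valid */
        break;
    case RB_LOAD_HALF_LINE:
        f.base = addr & ~0xFu;             /* lower four address bits ignored */
        /* Address bit 4 selects the lower or upper 4 words of the line. */
        f.valid_words = (uint8_t)(0x0Fu << (((addr >> 4) & 1u) * 4u));
        break;
    case RB_LOAD_WORD:
        f.base = addr & ~0x3u;             /* ASSUMPTION: word-aligned */
        f.valid_words = (uint8_t)(1u << ((addr >> 2) & 7u));
        break;
    default:
        assert(0 && "unknown RB load size");
        break;
    }
    return f;
}
```

For example, under this model a half-line load at address 0x1234 is aligned down to 0x1230 and marks only the upper four words of the entry valid, so a later load from 0x1220 would not be sourced from that entry.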