52 SA-1110 Developer’s Manual

Caches, Write Buffer, and Read Buffer

5.3.2.2 Writes to a Bufferable and Noncacheable Location (B=1,C=0)

If the write buffer is enabled and the processor performs a write to a bufferable but noncacheable location that misses in the Dcaches, the data is placed in the write buffer and the CPU continues execution. The write buffer performs the external write sometime later. Store multiples are not merged in the write buffer when B=1, C=0.

5.3.2.3 Unbufferable and Noncacheable Writes (B=0, C=0)

If the write buffer is disabled or the CPU performs a write to an unbufferable area, the processor is stalled until the write buffer empties and the write completes externally. This requires several external clock cycles.

5.3.2.4 Writes to a Non-Bufferable and Cacheable Location (B=0, C=1)

When store multiples occur to a page that is cacheable but not bufferable (B=0, C=1), the write data is merged in the write buffer and burst writes occur to memory.
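
The three cases above can be summarized as a small decision function. The following is only a host-side sketch for illustration, not processor code: the enum, the function name classify_write, and the WRITE_NOT_COVERED value are hypothetical, and the model ignores whether the access hits or misses in the Dcaches. The B=1, C=1 combination is not described in the sections shown here, so the sketch reports it as not covered rather than guessing.

```c
#include <stdbool.h>

/* Outcomes described in sections 5.3.2.2 through 5.3.2.4 (names are hypothetical). */
typedef enum {
    WRITE_BUFFERED_UNMERGED, /* B=1, C=0: placed in write buffer, store multiples not merged */
    WRITE_STALL_UNTIL_DONE,  /* B=0, C=0, or write buffer disabled: CPU stalls until write completes */
    WRITE_MERGED_BURST,      /* B=0, C=1: store multiples merged, burst writes to memory */
    WRITE_NOT_COVERED        /* B=1, C=1: not described in these sections */
} write_behavior;

/* Simplified model: maps the write buffer enable state and the page's
 * B and C bits to the behavior described in the text above. */
write_behavior classify_write(bool wb_enabled, bool b, bool c)
{
    if (!wb_enabled) {
        return WRITE_STALL_UNTIL_DONE;   /* 5.3.2.3: write buffer disabled */
    }
    if (b && !c) {
        return WRITE_BUFFERED_UNMERGED;  /* 5.3.2.2 */
    }
    if (!b && !c) {
        return WRITE_STALL_UNTIL_DONE;   /* 5.3.2.3 */
    }
    if (!b && c) {
        return WRITE_MERGED_BURST;       /* 5.3.2.4 */
    }
    return WRITE_NOT_COVERED;
}
```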

5.3.3 Enabling the Write Buffer

To enable the write buffer, ensure that the MMU is enabled by setting bit 0 in the control register, then enable the write buffer by setting bit 3 in the control register. The MMU and write buffer can be enabled simultaneously with a single write to the control register.
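
As a rough illustration of the single-write case, the sketch below computes a control register value with bits 0 and 3 both set. The macro and function names are hypothetical. The coprocessor access shown in the comment assumes the usual ARM convention that the control register is coprocessor 15, register 1; check Chapter 6 before relying on it.

```c
#include <stdint.h>

#define CTRL_MMU_ENABLE (1u << 0)  /* bit 0: MMU enable */
#define CTRL_WB_ENABLE  (1u << 3)  /* bit 3: write buffer enable */

/* Return the control register value with the MMU and write buffer both
 * enabled, leaving all other bits unchanged. */
uint32_t ctrl_enable_write_buffer(uint32_t ctrl)
{
    return ctrl | CTRL_MMU_ENABLE | CTRL_WB_ENABLE;
}

/* On the target (assumption, not taken from this section), the value
 * would be read and written back with coprocessor instructions such as:
 *
 *     mrc p15, 0, r0, c1, c0, 0    ; read control register
 *     orr r0, r0, #0x9             ; set bits 0 and 3
 *     mcr p15, 0, r0, c1, c0, 0    ; write control register
 */
```

Because both bits are set in one value, one register write is enough, which matches the last sentence above.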

5.3.3.1 Disabling the Write Buffer

To disable the write buffer, clear bit 3 in the control register. Any writes already in the write buffer will complete normally, but a drain write buffer operation must be performed to force all writes out to memory.

Note: The write buffer is used to hold dirty copy-back cached lines from the data cache. It must be enabled along with the data cache.
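
The disable sequence can be sketched as two steps: write the control register with bit 3 cleared, then drain the write buffer. The code below is a host-side model under stated assumptions. The hook types, the function name ctrl_disable_write_buffer, and the stub_* helpers are all hypothetical. On real hardware the two hooks would wrap the coprocessor 15 operations described in Chapter 6.

```c
#include <stdint.h>

#define CTRL_WB_ENABLE (1u << 3)  /* bit 3: write buffer enable */

/* Hypothetical hooks standing in for the hardware accesses. */
typedef void (*write_ctrl_fn)(uint32_t value);
typedef void (*drain_fn)(void);

/* Clear bit 3, write the new value, then drain the write buffer so that
 * all pending writes are forced out to memory. Returns the value written. */
uint32_t ctrl_disable_write_buffer(uint32_t ctrl,
                                   write_ctrl_fn write_ctrl,
                                   drain_fn drain)
{
    uint32_t next = ctrl & ~CTRL_WB_ENABLE;
    write_ctrl(next);
    drain();
    return next;
}

/* Stubs that record what happened, for trying the sequence on a host. */
static uint32_t stub_ctrl_value;
static int stub_drain_calls;

void stub_write_ctrl(uint32_t value) { stub_ctrl_value = value; }
void stub_drain(void) { stub_drain_calls++; }
```

Per the note above, this sequence should only be used when the data cache is not enabled.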

5.4 Read Buffer (RB)

The SA-1110 contains a software-programmable read buffer that can increase the performance of critical loop code by prefetching data. The RB enables the preallocation of read-only data into one of four 32-byte buffers without stalling the pipe. For subsequent loads that hit in the RB, data is sourced from the buffer instead of the Dcaches at a rate of 1 word per core clock (as long as the load address hits in the TLB of the DMMU). Also, because the programmer specifies which entry of the RB is used, critical data can be “locked” in to eliminate bus latency.

The RB is controlled using coprocessor 15, register 9, and provides the capability to allocate 1 word, a half-line (4 words), or a full line (8 words) into one of four entries of the RB. (See Chapter 6, “Coprocessors” for a detailed RB coprocessor description.) Half-line loads are automatically aligned onto half-block boundaries (the lower four address bits are ignored). Full-line loads are automatically aligned onto line boundaries (the lower five address bits are ignored). For partial cache line RB loads, only the words actually fetched are marked valid and can be sourced from the buffer. A small queue is used to ensure that subsequent RB load instructions go out in order.
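
The alignment rules can be checked with a little address arithmetic. The sketch below is a host-side model, not the RB programming interface: the type and function names are hypothetical. It also makes two assumptions that the text does not state: a 1-word load is treated as word-aligned, and the valid words are assumed to occupy the slots of the 8-word entry that correspond to their position within the 32-byte line.

```c
#include <stdint.h>

typedef enum {
    RB_LOAD_WORD,       /* 1 word */
    RB_LOAD_HALF_LINE,  /* 4 words, 16 bytes */
    RB_LOAD_FULL_LINE   /* 8 words, 32 bytes */
} rb_load_size;

/* Address actually used for the load after automatic alignment. */
uint32_t rb_aligned_address(uint32_t addr, rb_load_size size)
{
    switch (size) {
    case RB_LOAD_HALF_LINE: return addr & ~0xFu;   /* lower four bits ignored */
    case RB_LOAD_FULL_LINE: return addr & ~0x1Fu;  /* lower five bits ignored */
    default:                return addr & ~0x3u;   /* assumption: word-aligned */
    }
}

/* Bit mask of the words marked valid in the 8-word entry
 * (bit n set means word n of the line is valid). */
uint8_t rb_valid_words(uint32_t addr, rb_load_size size)
{
    unsigned word = (addr >> 2) & 7u;  /* word index within the 32-byte line */

    switch (size) {
    case RB_LOAD_FULL_LINE: return 0xFF;
    case RB_LOAD_HALF_LINE: return (word < 4) ? 0x0F : 0xF0;
    default:                return (uint8_t)(1u << word);
    }
}
```

For example, a half-line load at address 0x1234567F is aligned down to 0x12345670, and under the assumptions above only the upper four words of the entry (mask 0xF0) would be marked valid.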
