Documentation/Intel: Add NativeRaminit documentation
Add documentation for Intel native raminit on Intel SandyBridge. Documented so far: * Register * Read training * Frequency selection * SMBIOS type 17 memory reporting * Various Kconfig options and features Change-Id: I3b977460ecb29c9a54e3fab82349982fca9918e7 Signed-off-by: Patrick Rudolph <siro@das-labor.org> Reviewed-on: https://review.coreboot.org/22275 Tested-by: build bot (Jenkins) <no-reply@coreboot.org> Reviewed-by: Stefan Reinauer <stefan.reinauer@coreboot.org>
This commit is contained in:
parent
dc27d62921
commit
bf8db8d45b
File diff suppressed because it is too large
Load Diff
|
@ -0,0 +1,124 @@
|
|||
# Sandy Bridge Raminit
|
||||
|
||||
## Introduction
|
||||
|
||||
This documentation is intended to document the closed source memory controller
|
||||
hardware for Intel 2nd Gen (Sandy Bride) and 3rd Gen (Ivy Bridge) core-i CPUs.
|
||||
|
||||
The memory initialization code has to take care of lots of duties:
|
||||
1. Selection of operating frequency
|
||||
* Selection of common timings
|
||||
* Applying frequency specific compensation values
|
||||
* Read training of all populated channels
|
||||
* Write training of all populated channels
|
||||
* Adjusting delay networks of address and command signals
|
||||
* DQS training of all populated channels
|
||||
* Programming memory map
|
||||
* Report DRAM configuration
|
||||
* Error handling
|
||||
|
||||
## Definitions
|
||||
| Symbol | Description | Units | Valid region |
|
||||
|---------|-------------------------------------------------------------------|------------|--------------|
|
||||
| SCK | DRAM system clock cycle time | s | - |
|
||||
| tCK | DRAM system clock cycle time | 1/256th ns | - |
|
||||
| DCK | Data clock cycle time: The time between two SCK clock edges | s | - |
|
||||
| timA | IO phase: The phase delay of the IO signals | 1/64th DCK | [0-512) |
|
||||
| SPD | Manufacturer set memory timings located on an EEPROM on every DIMM| bytes | - |
|
||||
| REFCK | Reference clock, either 100 or 133 | Mhz | 100, 133 |
|
||||
| MULT | DRAM PLL multiplier | - | [3-12] |
|
||||
| XMP | Extreme Memory Profiles | - | - |
|
||||
|
||||
## (Inoffical) register documentation
|
||||
[Sandy Bride - Register documentation](SandyBridge_registers.md)
|
||||
|
||||
## Frequency selection
|
||||
[Sandy Bride - Frequency selection](SandyBridge_freq.md)
|
||||
|
||||
## Read training
|
||||
[Sandy Bride - Read training](SandyBridge_read.md)
|
||||
|
||||
### SMBIOS type 17
|
||||
The SMBIOS specification allows to report the memory configuration in use.
|
||||
On GNU/Linux you can run `# dmidecode -t 17` to view it.
|
||||
Example output of dmidecode:
|
||||
|
||||
```
|
||||
Handle 0x0045, DMI type 17, 34 bytes
|
||||
Memory Device
|
||||
Array Handle: 0x0042
|
||||
Error Information Handle: Not Provided
|
||||
Total Width: 64 bits
|
||||
Data Width: 64 bits
|
||||
Size: 8192 MB
|
||||
Form Factor: DIMM
|
||||
Set: None
|
||||
Locator: ChannelB-DIMM0
|
||||
Bank Locator: BANK 2
|
||||
Type: DDR3
|
||||
Type Detail: Synchronous
|
||||
Speed: 933 MHz
|
||||
Manufacturer: 0420
|
||||
Serial Number: 00000000
|
||||
Asset Tag: 9876543210
|
||||
Part Number: F3-1866C9-8GSR
|
||||
Rank: 2
|
||||
Configured Clock Speed: 933 MHz
|
||||
```
|
||||
The memory frequency printed by dmidecode is the active memory frequency. It's
|
||||
**not** the double datarate and it's **not** the one encoded maximum frequency
|
||||
in each DIMM's SPD.
|
||||
|
||||
> **Note:** This feature is available since coreboot 4.4
|
||||
|
||||
### MRC cache
|
||||
The name *MRC cache* might be missleading as in case of *Native ram init*
|
||||
there's no MRC, but for historical reasons it's still named *MRC cache*.
|
||||
The MRC cache is part of flash memory that is writeable by coreboot.
|
||||
At the end of the boot process coreboot will write the RAM training results to
|
||||
flash for future use, as RAM training is time intensive. Storing the results
|
||||
allows to boot faster on normal boot and allows to support S3 resume,
|
||||
as the RAM training results can't be stored in RAM (you need to configure
|
||||
the memory controller first to access RAM).
|
||||
|
||||
The MRC cache needs to be invalidated in case the memory configuration has
|
||||
been changed. To detect a changed memory configuration the CRC16 of each DIMM
|
||||
is stored to MRC cache.
|
||||
> **Note:** This feature is available since coreboot 4.4
|
||||
|
||||
### Error handling
|
||||
As of writing the only supported error handling is to disable the failing
|
||||
channel and restart the memory training sequence. It's very likely to succeed,
|
||||
as memory channels operate independent of each other.
|
||||
In case no DIMM could be initilized coreboot will halt. The screen will stay
|
||||
black until you power of your device. On some platforms there's additional
|
||||
feedback to indicate such an event.
|
||||
|
||||
If you find `dmidecode -t 17` to report only half of the memory installed,
|
||||
it's likely that a fatal memory init failure had happened.
|
||||
It is assumed, that a working board with less physical memory, is much better,
|
||||
than a board that doesn't boot at all.
|
||||
|
||||
> **Note:** This feature is available since coreboot 4.5
|
||||
|
||||
Try to swap memory modules and or try to use a different vendor. If nothing
|
||||
helps you could have a look at capter [Debuggin] or report a ticket
|
||||
at [ticket.coreboot.org]. Please provide a full RAM init log,
|
||||
that has been captured using EHCI debug.
|
||||
|
||||
To enable extensive RAM training logging enable the Kconfig option
|
||||
`DEBUG_RAM_SETUP`
|
||||
#### Lenovo Thinkpads
|
||||
Lenovo Thinkpads do have an additional feature to indicate that RAM init has
|
||||
failed and coreboot has died (it calls die() on fatal error, thus the name).
|
||||
The Kconfig options
|
||||
`H8_BEEP_ON_DEATH`
|
||||
`H8_FLASH_LEDS_ON_DEATH`
|
||||
enable blinking LEDs and enable a beep to indicate death.
|
||||
|
||||
> **Note:** This feature is available since coreboot 4.7
|
||||
|
||||
## Debugging
|
||||
It's recommended to use an external debugger, such as serial or EHCI debug
|
||||
dongle. In case of failing memory init the board might not boot at all,
|
||||
preventing you from using CBMEM.
|
|
@ -0,0 +1,132 @@
|
|||
# Frequency selection
|
||||
|
||||
## Introduction
|
||||
This chapter explains the frequency selection done on Sandybride and Ivybridge.
|
||||
|
||||
## Definitions
|
||||
| Symbol | Description | Units | Valid region |
|
||||
|---------|-------------------------------------------------------------------|------------|--------------|
|
||||
| SCK | DRAM system clock cycle time | s | - |
|
||||
| tCK | DRAM system clock cycle time | 1/256th ns | - |
|
||||
| DCK | Data clock cycle time: The time between two SCK clock edges | s | - |
|
||||
| SPD | Manufacturer set memory timings located on an EEPROM on every DIMM| bytes | - |
|
||||
| REFCK | Reference clock, either 100 or 133 | Mhz | 100, 133 |
|
||||
| MULT | DRAM PLL multiplier | - | [3-12] |
|
||||
| XMP | Extreme Memory Profiles | - | - |
|
||||
|
||||
## SPD
|
||||
The [SPD](https://de.wikipedia.org/wiki/Serial_Presence_Detect "Serial Presence Detect")
|
||||
located on every DIMM is factory program with various timings. One of them
|
||||
specifies the maximum clock frequency the DIMM should be used with. The
|
||||
operating frequency is stores as fixed point value (tCK), rounded to the next
|
||||
smallest supported operating frequency. Some
|
||||
[SPD](https://de.wikipedia.org/wiki/Serial_Presence_Detect "Serial Presence Detect")
|
||||
contains additional and optional
|
||||
[XMP](https://de.wikipedia.org/wiki/Extreme_Memory_Profile "Extreme Memory Profile")
|
||||
data, that stores so called "performance" modes, that advertises higher clock
|
||||
frequencies.
|
||||
|
||||
## XMP profiles
|
||||
At time of writing coreboot's raminit is able to parse XMP profile 1 and 2.
|
||||
Only **XMP profile 1** is being used in case it advertises:
|
||||
* 1.5V operating voltage
|
||||
* The channel's installed DIMM count doesn't exceed the XMP coded limit
|
||||
|
||||
In case the XMP profile doesn't fullfill those limits, the regular SPD will be
|
||||
used.
|
||||
> **Note:** XMP Profiles are supported since coreboot 4.4.
|
||||
|
||||
It is possible to ignore the max DIMM count limit set by XMP profiles.
|
||||
By activating Kconfig option `NATIVE_RAMINIT_IGNORE_XMP_MAX_DIMMS` it is
|
||||
possible to install two DIMMs per channel, even if XMP tells you not to do.
|
||||
|
||||
> **Note:** Ignoring XMP Profiles limit is supported since coreboot 4.7.
|
||||
|
||||
## Soft fuses
|
||||
Every board manufacturer does program "soft" fuses to indicate the maximum
|
||||
DRAM frequency supported. However, those fuses don't set a limit in hardware
|
||||
and thus are called "soft" fuses, as it is possible to ignore them.
|
||||
|
||||
> **Note:** Ignoring the fuses might cause system instability !
|
||||
|
||||
On Sandy Bride *CAPID0_A* is being read, and on Ivybridge *CAPID0_B* is being
|
||||
read. coreboot reads those registers and honors the limit in case the Kconfig
|
||||
option `CONFIG_NATIVE_RAMINIT_IGNORE_MAX_MEM_FUSES` wasn't set.
|
||||
Power users that want to let their RAM run at DRAM's "stock" frequency need to
|
||||
enable the Kconfig symbol.
|
||||
|
||||
It is possible to override the soft fuses limit by using a board-specific
|
||||
[devicetree](#devicetree) setting.
|
||||
|
||||
> **Note:** Ignoring max mem freq. fuses is supported since coreboot 4.7.
|
||||
|
||||
## <a name="hard_fuses"></a> Hard fuses
|
||||
"Hard" fuses are programmed by Intel and limit the maximum frequency that can
|
||||
be used on a given CPU/board/chipset. At time of writing there's no register
|
||||
to read this limit, before trying to set a given DRAM frequency. The memory PLL
|
||||
won't lock, indicating that the chosen memory multiplier isn't available. In
|
||||
this case coreboot tries the next smaller memory multiplier until the PLL will
|
||||
lock.
|
||||
|
||||
## <a name="devicetree"></a> Devicetree
|
||||
The devicetree register ```max_mem_clock_mhz``` overrides the "soft" fuses set
|
||||
by the board manufacturer.
|
||||
|
||||
By using this register it's possible to force a minimum operating frequency.
|
||||
|
||||
## Reference clock
|
||||
While Sandybride supports 133 MHz reference clock (REFCK), Ivy Bridge also
|
||||
supports 100 MHz reference clock. The reference clock is multiplied by the DRAM
|
||||
multiplier to select the DRAM frequency (SCK) by the following formula:
|
||||
|
||||
REFCK * MULT = 1 / DCK
|
||||
|
||||
> **Note:** Since coreboot 4.6 Ivy Bridge supports 100MHz REFCK.
|
||||
|
||||
## Sandy Bride's supported frequencies
|
||||
| SCK [Mhz] | DDR [Mhz] | Mutiplier (MULT) | Reference clock (REFCK) | Comment |
|
||||
|------------|-----------|------------------|-------------------------|---------------|
|
||||
| 400 | DDR3-800 | 3 | 133 MHz | |
|
||||
| 533 | DDR3-1066 | 4 | 133 MHz | |
|
||||
| 666 | DDR3-1333 | 5 | 133 MHz | |
|
||||
| 800 | DDR3-1600 | 6 | 133 MHz | |
|
||||
| 933 | DDR3-1866 | 7 | 133 MHz | |
|
||||
| 1066 | DDR3-2166 | 8 | 133 MHz | ||
|
||||
|
||||
## Ivybridge's supported frequencies
|
||||
| SCK [Mhz] | DDR [Mhz] | Mutiplier (MULT) | Reference clock (REFCK) | Comment |
|
||||
|------------|-----------|------------------|-------------------------|---------------|
|
||||
| 400 | DDR3-800 | 3 | 133 MHz | |
|
||||
| 533 | DDR3-1066 | 4 | 133 MHz | |
|
||||
| 666 | DDR3-1333 | 5 | 133 MHz | |
|
||||
| 800 | DDR3-1600 | 6 | 133 MHz | |
|
||||
| 933 | DDR3-1866 | 7 | 133 MHz | |
|
||||
| 1066 | DDR3-2166 | 8 | 133 MHz | |
|
||||
| 700 | DDR3-1400 | 7 | 100 MHz | '1 |
|
||||
| 800 | DDR3-1600 | 8 | 100 MHz | '1 |
|
||||
| 900 | DDR3-1800 | 9 | 100 MHz | '1 |
|
||||
| 1000 | DDR3-2000 | 10 | 100 MHz | '1 |
|
||||
| 1100 | DDR3-2200 | 11 | 100 MHz | '1 |
|
||||
| 1200 | DDR3-2400 | 12 | 100 MHz | '1 ||
|
||||
|
||||
> '1: since coreboot 4.6
|
||||
|
||||
## Multiplier selection
|
||||
coreboot select the maximum frequency to operate at by the following formula:
|
||||
```
|
||||
if devicetree's max_mem_clock_mhz > 0:
|
||||
freq_max := max_mem_clock_mhz
|
||||
else:
|
||||
freq_max := soft_fuse_max_mhz
|
||||
|
||||
for i in SPDs:
|
||||
freq_max := MIN(freq_max, ddr_spd_max_mhz[i])```
|
||||
|
||||
As you can see, by using DIMMs with different maximum DRAM frequencies, the
|
||||
slowest DIMMs' frequency will be selected, to prevent over-clocking it.
|
||||
|
||||
The selected frequency gives the PLL multiplier to operate at. In case the PLL
|
||||
locks (see Take me to [Hard fuses](#hard_fuses)) the frequency will be used for
|
||||
all DIMMs. At this point it's not possible to change the multiplier again,
|
||||
until the system has been powered off. In case the PLL doesn't lock, the next
|
||||
smaller multiplier will be used until a working multiplier will be found.
|
|
@ -0,0 +1,142 @@
|
|||
# Read training
|
||||
|
||||
## Introduction
|
||||
|
||||
This chapter explains the read training sequence done on Sandy Bride and
|
||||
Ivy Bridge memory initialization.
|
||||
|
||||
Read training is done to compensate the skew between DQS and SCK and to find
|
||||
the smallest supported roundtrip delay.
|
||||
|
||||
Every board does have a vendor depended routing topology, and can be equip
|
||||
with any combination of DDR3 memory modules, that introduces different
|
||||
skew between the memory lanes. With DDR3 a "Fly-By" routing topology
|
||||
has been introduced, that makes the biggest part of DQS-SCK skew.
|
||||
The memory code measures the actual skew and actives delay gates,
|
||||
that will "compensate" the skew.
|
||||
|
||||
When in read training the DRAM and the controller are placed in a special mode.
|
||||
On every read instruction the DRAM outputs a predefined pattern and the memory
|
||||
controller samples the DQS after a given delay. As the pattern is known, the
|
||||
actual delay of every lane can be measured.
|
||||
|
||||
The values programmed in read training effect DRAM-to-MC transfers only !
|
||||
|
||||
## Definitions
|
||||
| Symbol | Description | Units | Valid region |
|
||||
|---------|-------------------------------------------------------------------|------------|--------------|
|
||||
| SCK | DRAM system clock cycle time | s | - |
|
||||
| tCK | DRAM system clock cycle time | 1/256th ns | - |
|
||||
| DCK | Data clock cycle time: The time between two SCK clock edges | s | - |
|
||||
| timA | IO phase: The phase delay of the IO signals | 1/64th DCK | [0-512) |
|
||||
| SPD | Manufacturer set memory timings located on an EEPROM on every DIMM| bytes | - |
|
||||
| REFCK | Reference clock, either 100 or 133 | Mhz | 100, 133 |
|
||||
| MULT | DRAM PLL multiplier | - | [3-12] |
|
||||
| XMP | Extreme Memory Profiles | - | - |
|
||||
| DQS | Data Strobe signal used to sample all lane's DQ signals | - | - |
|
||||
|
||||
## Hardware
|
||||
The hardware does have delay logic blocks that can delay the DQ / DQS of a
|
||||
lane/rank by one or multiple clock cylces and it does have delay logic blocks
|
||||
that can delay the signal by a multiple of 1/64th DCK per lane.
|
||||
|
||||
All delay values can be controlled via software by writing registers in the
|
||||
MCHBAR.
|
||||
|
||||
## IO phase
|
||||
|
||||
The IO phase can be adjusted in [0-512) * 1/64th DCK. Incrementing it by 64 is
|
||||
the same as Incrementing IO delay by 1.
|
||||
|
||||
## IO delay
|
||||
Delays the DQ / DQS signal by one or multiple clock cycles.
|
||||
|
||||
### Roundtrip time
|
||||
The roundtrip time is the time the memory controller waits for data arraving
|
||||
after a read has been issued. Due to clock-domain crossings, multiple
|
||||
delay instances and phase interpolators, the signal runtime to DRAM and back
|
||||
to memory controller defaults to 55 DCKs. The real roundtrip time has to be
|
||||
measured.
|
||||
|
||||
After a read command has been issued, a counter counts down until zero has been
|
||||
reached and activates the input buffers.
|
||||
|
||||
The following pictures shows the relationship between those three values.
|
||||
The picture was generated from 16 IO delay values times 64 timA values.
|
||||
The highest IO delay was set on the right-hand side, while the last block
|
||||
on the left-hand side has zero IO delay.
|
||||
|
||||
** roundtrip 55 DCKs **
|
||||
![alt text][timA_lane0-3_rt55]
|
||||
|
||||
[timA_lane0-3_rt55]: timA_lane0-3_rt55.png "timA for lane0 - lane3, roundtrip 55"
|
||||
|
||||
** roundtrip 54 DCKs **
|
||||
![alt text][timA_lane0-3_rt54]
|
||||
|
||||
[timA_lane0-3_rt54]: timA_lane0-3_rt54.png "timA for lane0 - lane3, roundtrip 54"
|
||||
|
||||
|
||||
** roundtrip 53 DCKs **
|
||||
![alt text][timA_lane0-3_rt53]
|
||||
|
||||
[timA_lane0-3_rt53]: timA_lane0-3_rt53.png "timA for lane0 - lane3, roundtrip 53"
|
||||
|
||||
As you can see the signal has some jitter as every sample was taken in a
|
||||
different loop iteration. The result register only contains a single bit per
|
||||
lane.
|
||||
|
||||
## Algorithm
|
||||
### Steps
|
||||
The algorithm finds the roundtrip time, IO delay and IO phase. The IO phase
|
||||
will be adjusted to match the falling edge of the preamble of each lane.
|
||||
The roundtrip time is adjusted to an minimal value, that still includes the
|
||||
preamble.
|
||||
|
||||
### Synchronize to data phase
|
||||
|
||||
The first measurement done in read-leveling samples all DQS values for one
|
||||
phase [0-64) * 1/64th DCK. It then searches for the middle of the low data
|
||||
symbol and adjusts timA to the found phase and thus the following measurements
|
||||
will be aligned to the low data symbol.
|
||||
The code assumes that the initial roundtrip time causes the measurement to be
|
||||
in the alternating pattern data phase.
|
||||
|
||||
### Finding the preamble
|
||||
After adjusting the IO phase to the middle of one data symbol the preamble will
|
||||
be located. Unlike the data phase, which is an alternating pattern (010101...),
|
||||
the preamble consists of two high data cycles.
|
||||
|
||||
The code decrements the IO delay/RTT and samples the DQS signal with timA
|
||||
untouched. As it has been positioned in the middle of the data symbol, it'll
|
||||
read as either "low" or "high".
|
||||
|
||||
If it's "low" we are still in the data phase.
|
||||
If it's "high" we have found the preamble.
|
||||
|
||||
The roundtrip time and IO delay will be adjusted until all lanes are aligned.
|
||||
The resulting IO delay is visible in the picture below.
|
||||
|
||||
** roundtrip time: 49 DCKs, IO delay (at blue point): 6 DCKs **
|
||||
![alt text][timA_lane0-3_discover_420x]
|
||||
|
||||
[timA_lane0-3_discover_420x]: timA_lane0-3_discover_420x.png "timA for lane0 - lane3, finding minimum roundtrip time"
|
||||
|
||||
** Note: The sampled data has been shifted by timA. The preamble is now
|
||||
in phase. **
|
||||
|
||||
## Fine adjustment
|
||||
|
||||
As timA still points the middle of the data symbol an offset of 32 is added.
|
||||
It now points the falling edge of the preamble.
|
||||
The fine adjustment is to reduce errors introduced by jitter. The phase is
|
||||
adjusted from `timA - 25` to `timA + 25` and the DQS signal is sampled 100
|
||||
times. The fine adjustment finds the middle of each rising edge (it's actual
|
||||
the falling edge of the preamble) to get the final IO phase. You can see the
|
||||
result in the picture below.
|
||||
|
||||
![alt text][timA_lane0-3_adjust_fine]
|
||||
|
||||
[timA_lane0-3_adjust_fine]: timA_lane0-3_adjust_fine.png "timA for lane0 - lane3, fine adjustment"
|
||||
|
||||
Lanes 0 - 2 will be adjusted by a phase of -10, while lane 3 is already correct.
|
Binary file not shown.
After Width: | Height: | Size: 90 KiB |
Binary file not shown.
After Width: | Height: | Size: 98 KiB |
Binary file not shown.
After Width: | Height: | Size: 101 KiB |
Binary file not shown.
After Width: | Height: | Size: 102 KiB |
Binary file not shown.
After Width: | Height: | Size: 103 KiB |
Loading…
Reference in New Issue