Documentation/Intel/NativeRaminit: Remove trailing whitespace

Change-Id: I1d38aea07e2d9ffb89115410603a5beac5e4d44d
Signed-off-by: Elyes HAOUAS <ehaouas@noos.fr>
Reviewed-on: https://review.coreboot.org/25831
Tested-by: build bot (Jenkins) <no-reply@coreboot.org>
Reviewed-by: Patrick Georgi <pgeorgi@google.com>

Parent: 4713b5cd9e
Commit: 0c80d2f8e3
@ -2,7 +2,7 @@
The MCHBAR can be enabled by using register 0x48 of the PCI(0:0:0) device.
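
As a minimal sketch of what enabling the MCHBAR involves, the Python model below composes and decodes a value for register 0x48. It only does arithmetic and never touches hardware. The bit layout is an assumption, not something this document specifies: bit 0 is taken to be the enable bit and the base address to be 32 KiB aligned. The base address 0xfed10000 is only an example value.

```python
# Model of PCI(0:0:0) register 0x48 (MCHBAR). Pure arithmetic, no hardware access.
MCHBAR_REG = 0x48              # config-space offset named in the text
MCHBAR_ENABLE = 1 << 0         # assumption: bit 0 enables the MCHBAR window
MCHBAR_ALIGN = 32 * 1024       # assumption: the base is 32 KiB aligned


def mchbar_reg_value(base: int) -> int:
    """Value to write to register 0x48 to map the MCHBAR at `base` and enable it."""
    if base % MCHBAR_ALIGN:
        raise ValueError(f"MCHBAR base {base:#x} is not 32 KiB aligned")
    return base | MCHBAR_ENABLE


def mchbar_base(reg_value: int):
    """Decode register 0x48: the base address, or None while MCHBAR is disabled."""
    if not reg_value & MCHBAR_ENABLE:
        return None
    return reg_value & ~(MCHBAR_ALIGN - 1)


def mchbar_addr(reg_value: int, offset: int) -> int:
    """Absolute address of an MCHBAR register such as MCHBAR + 0x4."""
    base = mchbar_base(reg_value)
    if base is None:
        raise RuntimeError("MCHBAR is not enabled")
    return base + offset


# Example: map the MCHBAR at a hypothetical base and locate MCHBAR + 0x4.
reg = mchbar_reg_value(0xFED10000)
print(hex(reg), hex(mchbar_addr(reg, 0x4)))  # 0xfed10001 0xfed10004
```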

This documentation is incomplete and might be incorrect.
Please handle with care!

**MCHBAR + 0x4**
@ -15,15 +15,15 @@ This chapter explains the frequency selection done on Sandybride and Ivybridge.
| XMP | Extreme Memory Profiles | - | - |

## SPD
The [SPD](https://de.wikipedia.org/wiki/Serial_Presence_Detect "Serial Presence Detect")
located on every DIMM is factory-programmed with various timings. One of them
specifies the maximum clock frequency the DIMM should be used with. The
operating frequency is stored as a fixed-point cycle time (tCK), which is
rounded to the next smaller supported operating frequency. Some
[SPD](https://de.wikipedia.org/wiki/Serial_Presence_Detect "Serial Presence Detect")
contain additional, optional
[XMP](https://de.wikipedia.org/wiki/Extreme_Memory_Profile "Extreme Memory Profile")
data that stores so-called "performance" modes, which advertise higher clock
frequencies.
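
The rounding step can be sketched as follows. The fixed-point format and the frequency table are assumptions for illustration: tCK is expressed in units of 1/256 ns, and the list of supported DRAM clocks (400 to 1066 MHz) is hypothetical; the real raminit code may use different units and a different table. Since a longer cycle time means a lower frequency, rounding to the next smaller supported frequency means rounding tCK *up* to the next supported cycle time.

```python
# tCK values in 1/256 ns units (assumption) for a hypothetical set of
# supported DRAM clocks: 1066, 800, 666, 533 and 400 MHz.
SUPPORTED_TCK = (240, 320, 384, 480, 640)


def tck_to_mhz(tck: int) -> float:
    """Convert a cycle time in 1/256 ns units to a clock frequency in MHz."""
    return 256 * 1000 / tck


def round_tck(tck: int) -> int:
    """Round tCK up to the next supported cycle time, i.e. round the
    frequency down to the next smaller supported operating frequency."""
    for supported in sorted(SUPPORTED_TCK):
        if tck <= supported:
            return supported
    raise ValueError(f"tCK {tck} is slower than any supported frequency")


# A DIMM advertising tCK = 300/256 ns (about 853 MHz) runs at 800 MHz.
print(round(tck_to_mhz(round_tck(300))))  # 800
```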

## XMP profiles
@ -32,51 +32,51 @@ Only **XMP profile 1** is being used in case it advertises:
* 1.5V operating voltage
* The channel's installed DIMM count doesn't exceed the limit encoded in the XMP profile

In case the XMP profile doesn't fulfill those limits, the regular SPD will be
used.

> **Note:** XMP profiles are supported since coreboot 4.4.

It is possible to ignore the maximum DIMM count limit set by XMP profiles.
By enabling the Kconfig option `NATIVE_RAMINIT_IGNORE_XMP_MAX_DIMMS` it is
possible to install two DIMMs per channel, even if XMP says not to.

> **Note:** Ignoring the XMP profile limit is supported since coreboot 4.7.
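
The XMP selection rules above can be summarized in a small decision function. This is a sketch, not coreboot's code: the function and parameter names are hypothetical, the voltage is assumed to be given in millivolts, and `ignore_xmp_max_dimms` stands in for the `NATIVE_RAMINIT_IGNORE_XMP_MAX_DIMMS` Kconfig option. When it returns `False`, the regular SPD timings would be used.

```python
def use_xmp_profile1(xmp_voltage_mv: int, dimms_in_channel: int,
                     xmp_max_dimms: int, ignore_xmp_max_dimms: bool = False) -> bool:
    """Decide whether XMP profile 1 may be used instead of the regular SPD data."""
    # Rule 1: only profiles advertising a 1.5 V operating voltage are accepted.
    if xmp_voltage_mv != 1500:
        return False
    # Rule 2: the channel's DIMM count must not exceed the XMP coded limit,
    # unless the Kconfig option to ignore that limit is enabled.
    if dimms_in_channel > xmp_max_dimms and not ignore_xmp_max_dimms:
        return False
    return True


print(use_xmp_profile1(1500, 2, 1))                             # False: limit exceeded
print(use_xmp_profile1(1500, 2, 1, ignore_xmp_max_dimms=True))  # True: limit ignored
```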

## Soft fuses
Every board manufacturer programs "soft" fuses to indicate the maximum
supported DRAM frequency. However, those fuses don't set a limit in hardware
and are thus called "soft" fuses, as it is possible to ignore them.

> **Note:** Ignoring the fuses might cause system instability!

On Sandy Bridge *CAPID0_A* is read, and on Ivy Bridge *CAPID0_B* is read.
coreboot reads those registers and honors the limit unless the Kconfig
option `CONFIG_NATIVE_RAMINIT_IGNORE_MAX_MEM_FUSES` is set.
Power users who want to run their RAM at the DRAM's "stock" frequency need to
enable this Kconfig symbol.

It is possible to override the soft fuses limit by using a board-specific
[devicetree](#devicetree) setting.

> **Note:** Ignoring the maximum memory frequency fuses is supported since coreboot 4.7.

## <a name="hard_fuses"></a> Hard fuses
"Hard" fuses are programmed by Intel and limit the maximum frequency that can
be used on a given CPU/board/chipset. At the time of writing there is no
register to read this limit before trying to set a given DRAM frequency.
Instead, if the limit is exceeded, the memory PLL won't lock, indicating that
the chosen memory multiplier isn't available. In this case coreboot tries the
next smaller memory multiplier until the PLL locks.
## <a name="devicetree"></a> Devicetree
The devicetree register `max_mem_clock_mhz` overrides the "soft" fuses set
by the board manufacturer.

By using this register it's possible to limit the maximum operating frequency,
i.e. to force a minimum tCK.
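
The precedence between the soft fuses, the Kconfig option and the devicetree override can be sketched as below. The names are hypothetical, the fuse limit is assumed to be already decoded from *CAPID0_A* / *CAPID0_B* into MHz, `platform_max_mhz` is an assumed upper bound used when the fuses are ignored, and a devicetree value of 0 is assumed to mean "not set".

```python
def max_freq_limit_mhz(fuse_limit_mhz: int, platform_max_mhz: int,
                       ignore_fuses: bool = False,
                       max_mem_clock_mhz: int = 0) -> int:
    """Starting frequency limit before the SPD data is taken into account."""
    # A non-zero devicetree max_mem_clock_mhz overrides the soft fuses.
    if max_mem_clock_mhz:
        return max_mem_clock_mhz
    # CONFIG_NATIVE_RAMINIT_IGNORE_MAX_MEM_FUSES: skip the soft fuse limit.
    if ignore_fuses:
        return platform_max_mhz
    # Default: honor the limit programmed into the soft fuses.
    return fuse_limit_mhz


print(max_freq_limit_mhz(800, 1066))                         # 800
print(max_freq_limit_mhz(800, 1066, ignore_fuses=True))      # 1066
print(max_freq_limit_mhz(800, 1066, max_mem_clock_mhz=933))  # 933
```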

## Reference clock
While Sandy Bridge supports a 133 MHz reference clock (REFCK), Ivy Bridge also
supports a 100 MHz reference clock. The reference clock is multiplied by the DRAM
multiplier to select the DRAM frequency (SCK) by the following formula:

REFCK * MULT = 1 / DCK
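
A worked example of the formula, reading DCK as the DRAM clock period: with the 133 MHz reference clock, a multiplier of 6 gives an 800 MHz DRAM clock, i.e. a 1.25 ns period, while the 100 MHz reference clock needs a multiplier of 8 for the same frequency. The sketch assumes the "133 MHz" clock is exactly 400/3 MHz and uses exact fractions to avoid floating-point rounding; the function names are hypothetical.

```python
from fractions import Fraction

REFCK_133 = Fraction(400, 3)  # assumed exact value of the "133 MHz" reference clock
REFCK_100 = Fraction(100)     # Ivy Bridge only


def dram_clock_mhz(refck_mhz: Fraction, mult: int) -> Fraction:
    """REFCK * MULT: the resulting DRAM clock frequency in MHz."""
    return refck_mhz * mult


def dram_clock_period_ns(refck_mhz: Fraction, mult: int) -> Fraction:
    """1 / (REFCK * MULT): the DRAM clock period (DCK in the formula) in ns."""
    return 1000 / dram_clock_mhz(refck_mhz, mult)


for name, refck, mult in (("133 MHz", REFCK_133, 6), ("100 MHz", REFCK_100, 8)):
    print(name, mult, float(dram_clock_mhz(refck, mult)),
          float(dram_clock_period_ns(refck, mult)))
```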
@ -122,11 +122,11 @@ else:
for i in SPDs:
    freq_max := MIN(freq_max, ddr_spd_max_mhz[i])
```

As you can see, when DIMMs with different maximum DRAM frequencies are used,
the slowest DIMM's frequency is selected to prevent overclocking it.
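
A runnable Python version of the pseudocode above, as a sketch: `spd_max_mhz` is a hypothetical stand-in for `ddr_spd_max_mhz` holding one maximum frequency per installed DIMM, and `freq_max` is whatever limit was determined before the loop (for example from the fuses or the devicetree).

```python
from typing import Iterable


def select_dram_mhz(freq_max: int, spd_max_mhz: Iterable[int]) -> int:
    """Clamp the frequency limit to the slowest installed DIMM."""
    for dimm_max in spd_max_mhz:
        freq_max = min(freq_max, dimm_max)
    return freq_max


# An 800 MHz limit with one 933 MHz and one 666 MHz DIMM installed.
print(select_dram_mhz(800, [933, 666]))  # 666
```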

The selected frequency determines the PLL multiplier to operate at. If the PLL
locks (see [Hard fuses](#hard_fuses)), the frequency will be used for
all DIMMs. At this point it's not possible to change the multiplier again
until the system has been powered off. If the PLL doesn't lock, the next
smaller multiplier is tried until a working multiplier is found.
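
The fallback can be sketched as a loop over decreasing multipliers. `pll_locks` is a hypothetical callback standing in for programming the multiplier and polling the PLL lock status; the real code does this through chipset registers not covered here. Because the multiplier cannot be changed again until power-off, the first multiplier that locks is final.

```python
from typing import Callable


def lock_dram_pll(start_mult: int, pll_locks: Callable[[int], bool],
                  min_mult: int = 1) -> int:
    """Try multipliers from start_mult downwards and return the first that locks."""
    for mult in range(start_mult, min_mult - 1, -1):
        if pll_locks(mult):
            # Locked: this multiplier is used for all DIMMs until power-off.
            return mult
    raise RuntimeError("memory PLL did not lock at any multiplier")


# Simulate hard fuses that only allow multipliers up to 6.
print(lock_dram_pll(8, lambda mult: mult <= 6))  # 6
```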
@ -2,22 +2,22 @@
## Introduction

This chapter explains the read training sequence done during Sandy Bridge and
Ivy Bridge memory initialization.

Read training is done to compensate for the skew between DQS and SCK and to
find the smallest supported roundtrip delay.

Every board has a vendor-dependent routing topology and can be equipped
with any combination of DDR3 memory modules, which introduces different
skew between the memory lanes. DDR3 introduced a "Fly-By" routing topology,
which accounts for the biggest part of the DQS-SCK skew.
The memory code measures the actual skew and activates delay gates
that "compensate" for the skew.

When in read training, the DRAM and the controller are placed in a special mode.
On every read instruction the DRAM outputs a predefined pattern and the memory
controller samples the DQS after a given delay. As the pattern is known, the
actual delay of every lane can be measured.

The values programmed in read training affect DRAM-to-MC transfers only!
@ -36,34 +36,34 @@ The values programmed in read training effect DRAM-to-MC transfers only !
| DQS | Data Strobe signal used to sample all of a lane's DQ signals | - | - |

## Hardware
The hardware has delay logic blocks that can delay the DQ / DQS of a
lane/rank by one or multiple clock cycles, and delay logic blocks that can
delay the signal by a multiple of 1/64th DCK per lane.

All delay values can be controlled by software by writing registers in the
MCHBAR.

## IO phase

The IO phase can be adjusted in the range [0-512) * 1/64th DCK. Incrementing it
by 64 is the same as incrementing the IO delay by 1.
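
Because 64 phase steps equal one IO delay step, a lane's total delay can be expressed in 1/64th DCK units. The helpers below are a hypothetical model of that relationship only and do not reflect the register layout: they compute the combined delay and move whole clock cycles from the phase into the IO delay without changing the total.

```python
from typing import Tuple

PHASE_STEPS_PER_DCK = 64   # 1/64th DCK resolution
PHASE_LIMIT = 512          # valid IO phase values are [0, 512)


def total_delay(io_delay: int, io_phase: int) -> int:
    """Combined delay of a lane in 1/64th DCK units."""
    if not 0 <= io_phase < PHASE_LIMIT:
        raise ValueError("IO phase out of range")
    return io_delay * PHASE_STEPS_PER_DCK + io_phase


def normalize(io_delay: int, io_phase: int) -> Tuple[int, int]:
    """Move whole clock cycles from the phase into the IO delay (same total delay)."""
    cycles, phase = divmod(io_phase, PHASE_STEPS_PER_DCK)
    return io_delay + cycles, phase


print(total_delay(2, 10), normalize(0, 138))  # 138 (2, 10)
```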

## IO delay
Delays the DQ / DQS signal by one or multiple clock cycles.

### Roundtrip time
The roundtrip time is the time the memory controller waits for data to arrive
after a read has been issued. Due to clock-domain crossings, multiple
delay instances and phase interpolators, the signal runtime to the DRAM and back
to the memory controller defaults to 55 DCKs. The real roundtrip time has to be
measured.

After a read command has been issued, a counter counts down until it reaches
zero and then activates the input buffers.

The following pictures show the relationship between those three values.
Each picture was generated from 16 IO delay values times 64 timA values.
The highest IO delay was set on the right-hand side, while the last block
on the left-hand side has zero IO delay.

**roundtrip 55 DCKs**
@ -82,39 +82,39 @@ on the left-hand side has zero IO delay.
[timA_lane0-3_rt53]: timA_lane0-3_rt53.png "timA for lane0 - lane3, roundtrip 53"

As you can see, the signal has some jitter, as every sample was taken in a
different loop iteration. The result register only contains a single bit per
lane.

## Algorithm
### Steps
The algorithm finds the roundtrip time, IO delay and IO phase. The IO phase
will be adjusted to match the falling edge of the preamble of each lane.
The roundtrip time is adjusted to a minimal value that still includes the
preamble.

### Synchronize to data phase

The first measurement done in read leveling samples all DQS values over one
full clock, i.e. for the phases [0-64) * 1/64th DCK. It then searches for the
middle of the low data symbol and adjusts timA to the found phase, so that the
following measurements are aligned to the low data symbol.
The code assumes that the initial roundtrip time causes the measurement to be
in the alternating-pattern data phase.
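
The search for the middle of the low data symbol can be modeled on a list of 64 one-bit samples, one per phase step. This sketch assumes each sample is already reduced to 0 (low) or 1 (high) and simply picks the center of the longest run of low samples, treating the 64 phases as circular so that a run wrapping around from phase 63 to phase 0 is not split. It is not coreboot's implementation, and the function name is hypothetical.

```python
from typing import Sequence


def low_symbol_center(samples: Sequence[int]) -> int:
    """Return the phase in the middle of the longest run of low (0) samples.

    samples[p] is the DQS value sampled at phase p * 1/64th DCK.
    """
    n = len(samples)
    if 0 not in samples or 1 not in samples:
        raise ValueError("expected both low and high samples")
    # Start scanning right after a high sample so that a low run which wraps
    # around the end of the list is counted as one run.
    first_high = list(samples).index(1)
    best_start, best_len = 0, 0
    run_start, run_len = 0, 0
    for k in range(1, n + 1):
        phase = (first_high + k) % n
        if samples[phase] == 0:
            if run_len == 0:
                run_start = phase
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0
    return (best_start + best_len // 2) % n


# High for phases 0..31, low for phases 32..63: the low symbol is centered at 48.
print(low_symbol_center([1] * 32 + [0] * 32))  # 48
```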

### Finding the preamble
After adjusting the IO phase to the middle of one data symbol, the preamble is
located. Unlike the data phase, which is an alternating pattern (010101...),
the preamble consists of two high data cycles.

The code decrements the IO delay/RTT and samples the DQS signal with timA
untouched. As timA has been positioned in the middle of the data symbol, the
sample reads as either "low" or "high".

If it's "low", we are still in the data phase.
If it's "high", we have found the preamble.
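
The preamble search for a single lane can be sketched as below. `sample_at` is a hypothetical callback that programs a given delay (in whole DCKs, standing for the combined IO delay / roundtrip adjustment) with timA unchanged and returns the sampled DQS level. How the per-lane results are combined into one roundtrip time is not modeled here.

```python
from typing import Callable


def find_preamble(sample_at: Callable[[int], int], start_delay: int,
                  min_delay: int = 0) -> int:
    """Decrement the delay from start_delay until the DQS sample reads high.

    A low sample means we are still in the alternating data phase; the first
    high sample marks the preamble.
    """
    for delay in range(start_delay, min_delay - 1, -1):
        if sample_at(delay):
            return delay
    raise RuntimeError("preamble not found")


# Simulated lane: the preamble (two high cycles) lies at delays 5 and 6.
print(find_preamble(lambda d: int(d in (5, 6)), start_delay=12))  # 6
```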

The roundtrip time and IO delay will be adjusted until all lanes are aligned.
The resulting IO delay is visible in the picture below.

**roundtrip time: 49 DCKs, IO delay (at blue point): 6 DCKs**
@ -122,17 +122,17 @@ The resulting IO delay is visible in the picture below.
[timA_lane0-3_discover_420x]: timA_lane0-3_discover_420x.png "timA for lane0 - lane3, finding minimum roundtrip time"

**Note: The sampled data has been shifted by timA. The preamble is now
in phase.**

## Fine adjustment

As timA still points to the middle of the data symbol, an offset of 32 is added.
It now points to the falling edge of the preamble.
The fine adjustment reduces errors introduced by jitter. The phase is
swept from `timA - 25` to `timA + 25` and the DQS signal is sampled 100
times. The fine adjustment finds the middle of each rising edge (it's actually
the falling edge of the preamble) to get the final IO phase. You can see the
result in the picture below.

![timA for lane0 - lane3, fine adjustment][timA_lane0-3_adjust_fine]
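
The fine adjustment can be modeled as a sweep around `timA + 32`. In this sketch, `count_high` is a hypothetical callback that samples the DQS `samples` times at a given phase and returns how many samples read high. The middle of the edge is assumed to be the first phase in the window at which at least half of the samples read high; this is one simple way of locating a jittery edge, not necessarily the criterion the raminit code uses.

```python
from typing import Callable


def fine_adjust(tim_a: int, count_high: Callable[[int, int], int],
                span: int = 25, samples: int = 100) -> int:
    """Find the IO phase at the middle of the edge near tim_a + 32."""
    center = tim_a + 32  # move from the middle of the symbol to the edge
    for phase in range(center - span, center + span + 1):
        # Middle of the edge: at least half of the samples read high.
        if 2 * count_high(phase, samples) >= samples:
            return phase
    raise RuntimeError("no edge found within +/- span of timA + 32")


# Simulated jittery edge: the share of high samples ramps up from phase 30.
def ramp(phase: int, samples: int) -> int:
    return max(0, min(samples, (phase - 30) * 10))


print(fine_adjust(10, ramp))  # 35
```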