Houdini Pro version
Houdini Pro is intended for power users with high-end hardware.
The main differences from the Standard version are:
Large Memory Pages
Houdini Pro will use so-called large memory pages if they are provided by the operating system. Depending on the hash table size, the speed gain may be between 5% and 15%.
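To get a feel for why large pages matter, the following rough sketch counts how many memory pages a hash table occupies. The page sizes (4 KiB standard, 2 MiB large) are assumptions typical of x86-64 systems, not values reported by Houdini, and the function name pages_needed is purely illustrative.

```python
# Rough page-count comparison for a hash table (illustrative only).
# Assumed page sizes: 4 KiB standard, 2 MiB large (typical on x86-64).
STANDARD_PAGE = 4 * 1024
LARGE_PAGE = 2 * 1024 * 1024


def pages_needed(hash_mb: int, page_size: int) -> int:
    """Return the number of pages required to back hash_mb megabytes."""
    hash_bytes = hash_mb * 1024 * 1024
    return -(-hash_bytes // page_size)  # ceiling division


if __name__ == "__main__":
    hash_mb = 2048
    print(pages_needed(hash_mb, STANDARD_PAGE))  # standard pages
    print(pages_needed(hash_mb, LARGE_PAGE))     # large pages
```

Under these assumed sizes, a 2048 MB hash table spans 524288 standard pages but only 1024 large pages, so the CPU has far fewer address translations to keep track of during the search.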
To enable this feature in Windows, you need to modify the Group Policy for your account:
You'll also need to run your chess GUI with administrative rights ("Run as Administrator") or disable UAC in Windows.
Very often large pages will only be available shortly after booting Windows. After a while the Windows memory becomes too fragmented for large page allocation, and Houdini will fall back to standard memory page usage.
You can test the availability of Large Pages with the lp command. Run Houdini in a command window (double-clicking on the executable is enough) and type lp followed by Enter. Houdini will produce a summary with the number of allocated large pages as a function of the large page size. This command can take several minutes on a system with lots of RAM (16 GB or more), so be patient.
NUMA-Awareness
Most motherboards with multiple CPU sockets employ the so-called "NUMA" architecture.
Houdini Pro detects the NUMA configuration at start-up and will adapt its memory management and thread interaction based on the different NUMA nodes that are available.
The speed gain can be 5% to 15%, depending on the number of cores, the motherboard and the CPU brand.
Running Multiple Houdini Pro Instances
If you're simultaneously running multiple Houdini Pro instances, they will by default compete for the resources on the same NUMA nodes. To avoid this, you should set the Numa Offset parameter to different values in the different Houdini instances.
For example, if you want to run two Houdini instances with 6 threads each on 12-core hardware, you should use Numa Offset 1 for the second instance so that it will allocate its 6 threads on the second NUMA node. See also the Numa Offset configuration.
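The example above can be sketched as a small script that builds the UCI setup commands for each instance. This is only an illustration under stated assumptions: that the thread count is set through an option named Threads, that Numa Offset counts whole NUMA nodes, and that the 12-core machine consists of two nodes with 6 cores each. The helper names numa_offset and uci_setup are hypothetical.

```python
# Sketch: build UCI setup commands for several engine instances so that
# each one starts on its own NUMA node(s).
# Assumptions: "Threads" is the engine's thread-count option, Numa Offset
# counts NUMA nodes, and every node has the same number of cores.


def numa_offset(instance: int, threads: int, cores_per_node: int) -> int:
    """Node index at which the given instance (0-based) should start."""
    nodes_per_instance = -(-threads // cores_per_node)  # ceiling division
    return instance * nodes_per_instance


def uci_setup(instance: int, threads: int, cores_per_node: int) -> list[str]:
    """UCI commands to send to one instance before starting analysis."""
    offset = numa_offset(instance, threads, cores_per_node)
    return [
        f"setoption name Threads value {threads}",
        f"setoption name Numa Offset value {offset}",
    ]


if __name__ == "__main__":
    # Two instances, 6 threads each, assuming 2 NUMA nodes of 6 cores.
    for i in range(2):
        print(f"instance {i + 1}:")
        for cmd in uci_setup(i, threads=6, cores_per_node=6):
            print("  " + cmd)
```

For the second instance this yields Numa Offset 1, matching the example. In a chess GUI you would normally set the same values in the engine configuration dialog rather than sending the commands by hand.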
Some Real Performance Data
The test system was a 16-core dual AMD Opteron-6128 box running at the stock 2.0 GHz speed.
The autotune command (see the topic on Split Depth) was used as a benchmark to measure the impact of Large Pages and NUMA-awareness.
Hash memory was set to 2048 MB and 16 threads were used.
On this system Houdini Pro with NUMA and Large Pages was about 20% faster than the Standard version.