Houdini Pro is intended for power users with high-end hardware.

The main differences from the Standard version are:

Houdini Pro supports up to 32 threads.
Houdini Pro supports up to 256 GB of hash memory (262144 MB).
Houdini Pro supports Large Memory Pages.
Houdini Pro is NUMA-aware.

Large Memory Pages

Houdini Pro will use so-called large memory pages if they are provided by the operating system. Depending on the hash table size the speed gain may be between 5% and 15%.
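As a rough illustration of why the hash table size matters, the sketch below counts how many memory pages are needed to map a hash table with standard and with large pages. Fewer pages generally means less address-translation overhead. The 4 KiB and 2 MiB page sizes are assumptions (typical x86-64 values), and the function name is hypothetical; none of this is output from Houdini itself.

```python
# Illustrative arithmetic only: pages needed to map a hash table.
# Page sizes are assumptions (typical x86-64 values), not Houdini output.
STANDARD_PAGE_BYTES = 4 * 1024        # 4 KiB standard page (assumed)
LARGE_PAGE_BYTES = 2 * 1024 * 1024    # 2 MiB large page (assumed)


def pages_needed(hash_mb, page_bytes):
    """Return the number of pages required to cover hash_mb megabytes."""
    total_bytes = hash_mb * 1024 * 1024
    # Round up so a partially filled last page is still counted.
    return (total_bytes + page_bytes - 1) // page_bytes


if __name__ == "__main__":
    # 2048 MB is a moderate setting; 262144 MB is the Houdini Pro maximum.
    for hash_mb in (2048, 262144):
        std = pages_needed(hash_mb, STANDARD_PAGE_BYTES)
        large = pages_needed(hash_mb, LARGE_PAGE_BYTES)
        print(f"{hash_mb} MB hash: {std} standard pages vs {large} large pages")
```

Under these assumptions, each large page replaces 512 standard pages.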

To enable this feature in Windows, you need to modify the Group Policy for your account:

1. Run gpedit.msc (or search for "Group Policy").
2. Under "Computer Configuration" > "Windows Settings" > "Security Settings" > "Local Policies", click "User Rights Assignment".
3. In the right pane, double-click the option "Lock Pages in Memory".
4. Click "Add User or Group" and add your account or "Everyone".
5. You may have to log off or reboot for the change to take effect.

You'll also need to run your chess GUI with administrative rights ("Run as Administrator") or disable UAC in Windows.

Large pages are often available only shortly after Windows boots. After a while, Windows memory becomes too fragmented for large page allocation, and Houdini falls back to standard memory pages.

You can test the availability of Large Pages with the lp command. Run Houdini in a command window (simply by double-clicking on the executable) and type lp followed by Enter. Houdini will produce a summary with the number of allocated large pages as a function of the large page size. This command can take several minutes on a system with lots of RAM (16 GB or more), so be patient.

NUMA-awareness

Most motherboards with multiple CPU sockets employ the so-called "NUMA" (Non-Uniform Memory Access) architecture.

Houdini Pro detects the NUMA configuration at start-up and will adapt its memory management and thread interaction based on the different NUMA nodes that are available.

The speed gain can be 5% to 15%, depending on the number of cores, the motherboard and the CPU brand.

Running Multiple Houdini Pro instances

If you're running multiple Houdini Pro instances simultaneously, they will by default compete for the resources on the same NUMA nodes. To avoid this, you should set the Numa Offset parameter to a different value in each Houdini instance.

For example, if you want to run two Houdini instances with 6 threads each on 12-core hardware, you should use Numa Offset 1 for the second instance so that it will allocate its 6 threads on the second NUMA node. See also the Numa Offset configuration.
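The two-instance example above can be sketched as the UCI commands each instance would receive. This is a minimal sketch under stated assumptions: "Numa Offset" is the parameter named in the text, "Threads" is the conventional UCI option name and is assumed here, and the first instance is assumed to keep an offset of 0. The helper function is hypothetical; chess GUIs normally set these options through their engine configuration dialog.

```python
# Sketch: build UCI "setoption" commands for several engine instances.
# Assumptions: the thread option is named "Threads" (the usual UCI name),
# and the first instance keeps Numa Offset 0.
from typing import List


def instance_options(instance_index: int, threads: int) -> List[str]:
    """Return UCI commands giving an instance its own Numa Offset.

    instance_index is 0 for the first instance, 1 for the second, and so on.
    This mapping fits the example in the text, where each instance fills
    exactly one NUMA node.
    """
    return [
        f"setoption name Threads value {threads}",
        f"setoption name Numa Offset value {instance_index}",
    ]


if __name__ == "__main__":
    # Two instances with 6 threads each on 12-core hardware.
    for index in range(2):
        print(f"Instance {index + 1}:")
        for command in instance_options(index, threads=6):
            print("  " + command)
```

If an instance spans more than one NUMA node, the offset for the following instance would need to skip those nodes; that case is not covered by this sketch.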

Some Real Performance Data

The test system was a 16-core dual AMD Opteron-6128 box running at the stock 2.0 GHz speed.

The autotune command (see the topic on Split Depth) was used as a benchmark to measure the impact of Large Pages and NUMA-awareness.

Hash memory was set to 2048 MB, and 16 threads were used.

Configuration               Best Split Depth   Average Node Speed   Speed Gain
Standard                    14                 13600 kN/s
With Large Pages            14                 14900 kN/s           +10%
With NUMA and Large Pages   12                 16200 kN/s           +20%

On this system Houdini Pro with NUMA and Large Pages was about 20% faster than the Standard version.
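The speed gains quoted above follow from the measured node speeds relative to the Standard configuration. The short sketch below redoes that arithmetic; the function name is hypothetical, and the input figures are the ones reported in the table.

```python
# Recompute the speed gains from the node speeds reported in the table.
BASELINE_KNPS = 13600  # Standard configuration, in kN/s


def speed_gain_percent(speed_knps, baseline_knps=BASELINE_KNPS):
    """Return the relative speed gain over the baseline, in percent."""
    return (speed_knps / baseline_knps - 1.0) * 100.0


if __name__ == "__main__":
    results = (
        ("With Large Pages", 14900),
        ("With NUMA and Large Pages", 16200),
    )
    for label, speed in results:
        print(f"{label}: +{speed_gain_percent(speed):.1f}%")
```

The unrounded gains are about 9.6% and 19.1%, which the table rounds to +10% and +20%.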