Details
Each Knights Landing (KNL) node has 64 cores running at 1.3 GHz, each 4-way hyper-threaded (256 virtual cores in total). The on-package high bandwidth memory delivers more than 450 GB/s, and the main memory about 90 GB/s; both can be used concurrently.
Unlike its Knights Corner predecessor, the KNL node is self-hosted rather than an accelerator, and has more in common with a conventional Xeon processor than with a GPU. Simply put, it has twice as many cores as a contemporary dual-socket "Broadwell" Xeon node, clocked at half the speed, with vector units twice as wide. Since 2 × 0.5 × 2 = 2, this makes it only about twice as fast per node for compute-bound work. However, the on-package high bandwidth memory (MCDRAM, 8 stacks of 2 GB, 16 GB in total) has roughly 4 times the bandwidth of a conventional 2-socket system, so bandwidth-constrained applications can potentially see up to 4 times the performance at a lower price.
The Xeon Phi design supports 64 to 72 cores. Each core has two 512-bit vector units, each able to process 16 single precision (SP) or 8 double precision (DP) operands per clock cycle (twice that in floating-point operations for fused multiply-add). The higher core count and wider vectors are straightforward extensions of the x86 architecture, so the same programming paradigms (serial, threaded and vector) used on other Xeon processors apply. Unlike the GPGPU accelerator model, the same program code can run efficiently both on a Phi node and on a conventional Xeon host. The same Intel compilers, tools and libraries that you use on other Intel and AMD systems are also available for the Phi processors.
These CPUs contain a large number of stripped-down cores running at a lower frequency, delivering much higher peak performance per chip than more traditional multi-core designs. In the case of the Xeon Phi, each chip has a peak double precision performance of roughly 3 Tflops.
Intel has compiled a large list of developer information at the Intel MIC Developer web site. Here you can find documentation and training videos for developing code for Intel Xeon Phi co-processors. One may also find useful information at the Intel Many Integrated Core Architecture Forum.
For programming examples and local configuration details see the local User's Guide for LQCD / HPC systems.