HoloLens 2 + MRTK3 眼触控交互开发笔记

5 minute read

Published: February 20, 2026

HoloLens 2 的眼动追踪能力

HoloLens 2 内置了一对近红外摄像头用于眼动追踪，无需额外硬件即可获取用户的注视方向。其主要参数如下：

采样率：约 30 Hz，对于交互设计足够，但不适合精细的眼动行为分析（如微扫视研究）
精度：约 1.5 度视角误差，在正常使用距离下对应大约 2-3 厘米的目标偏差
输出：提供注视射线（Gaze Ray）的起点和方向，而非屏幕坐标——这是 3D 空间交互与传统 2D 眼动追踪的根本区别

首次使用时系统会引导用户完成眼动校准（追踪屏幕上移动的宝石图标）。校准质量直接影响后续所有交互的可靠性，建议每次更换用户或重新佩戴设备后重新校准。

MRTK3 中的眼动交互配置

MRTK3（Mixed Reality Toolkit 3）是 HoloLens 2 开发的标准中间件。相比 MRTK 2.x，MRTK3 采用了更模块化的架构，眼动追踪相关的核心组件包括：

GazeInteractor：这是 MRTK3 中负责眼动射线投射的组件。它会持续发射一条从用户眼睛出发的射线，与场景中的可交互对象进行碰撞检测。需要挂载在 MRTK XR Rig 的相应节点上。

StatefulInteractable：场景中需要响应注视的对象需要添加此组件。它支持多种交互状态——包括 IsGazeHovered，当用户的注视射线命中该对象时触发。

EyeTrackingProvider：底层的数据提供者，负责从 HoloLens 2 的眼动硬件获取原始注视数据并转换为 Unity 坐标系中的射线。

配置时需要注意在 Unity 的 Project Settings > OpenXR 中启用 Eye Gaze Interaction Profile，并在应用的 manifest 中声明 GazeInput 能力。遗漏这一步是最常见的”眼动不工作”问题来源。

空间定位：锚定物理对象

在 AR 环境中，一个核心挑战是将虚拟内容精确锚定到物理世界中的特定位置。HoloLens 2 提供了多种空间定位方案：

World Anchors：HoloLens 2 的 SLAM 系统会自动建立空间地图，可以在特定位置放置世界锚点。优点是无需额外标记物，缺点是对动态环境适应性有限。

Marker Tracking：使用 Vuforia 或 ArToolkit 等视觉标记系统，在物理对象上贴附特定图案（如 ArUco 标记），通过 HoloLens 的 RGB 摄像头识别标记位置。这种方式精度更高（毫米级），特别适合需要与具体物理对象交互的场景。

QR Code Tracking：HoloLens 2 原生支持 QR 码检测与空间定位，实现简单但精度略低于专用 fiducial marker。

实际项目中，我推荐混合使用：用 marker tracking 做初始对齐，然后切换到 world anchor 保持稳定性——这样即使标记被遮挡，虚拟内容也不会漂移。

眼触控多模态交互设计

“眼触控”（Eye-Touch）的核心思想是将注视用于快速选择目标，用手势用于确认操作——解决了纯眼控交互中的 Midas Touch 问题（注视即触发的误操作问题）。

在 MRTK3 中实现这种多模态交互的基本思路是：

注视选择阶段：GazeInteractor 持续追踪用户注视的对象，被注视的对象进入”预选中”状态（可以用视觉高亮反馈）
手势确认阶段：当用户对预选中的对象执行 Air Tap（食指捏合）或其他手势时，确认选择
状态管理：需要处理注视目标切换时的状态清理、手势识别的时间窗口、以及多对象同时被注视时的优先级

这种设计的优势在于：注视提供了快速的隐式指向（用户自然地看向感兴趣的对象），手势提供了明确的显式确认（避免误触发），两个通道互补。

实用建议与常见坑

校准相关：

每次更换用户必须重新校准。不同用户的瞳距和眼球特征差异显著
佩戴眼镜的用户可能需要调整设备位置以获得最佳追踪质量
校准后让用户看向已知位置验证精度，再开始正式实验

开发调试：

Unity Editor 中无法获取真实眼动数据，需使用 MRTK3 的模拟输入（鼠标模拟注视）进行基本逻辑调试
部署到设备上测试是不可替代的——模拟器中的交互感受与真实佩戴差异很大
使用 Device Portal 的实时预览功能可以看到用户的第一人称视角，方便旁观调试

性能相关：

眼动数据处理本身开销很小，但注视射线的碰撞检测频率（每帧一次）在场景复杂时可能成为瓶颈
建议为可注视对象使用简化的碰撞体（Collider），而非 Mesh Collider
注视数据天然带有噪声，建议对注视点做轻度的时间平滑处理

常见问题排查：

眼动完全不工作：检查 OpenXR 配置和 manifest 权限
注视偏移严重：重新校准，检查设备是否佩戴到位
手势识别率低：确保手部在 HoloLens 的可视范围内（设备前方约 50cm 的区域）

HoloLens 2 Eye Tracking Capabilities

HoloLens 2 includes a pair of near-infrared cameras for built-in eye tracking, requiring no additional hardware to obtain the user’s gaze direction. Key specifications:

Sampling rate: ~30 Hz – sufficient for interaction design, but not for fine-grained oculomotor research (e.g., microsaccade studies)
Accuracy: ~1.5 degrees of visual angle, corresponding to roughly 2-3 cm of target offset at typical usage distances
Output: Provides a gaze ray (origin + direction) rather than screen coordinates – a fundamental difference between 3D spatial interaction and traditional 2D eye tracking

On first use, the system guides users through eye calibration (tracking a gem icon moving across the display). Calibration quality directly affects all subsequent interaction reliability. Recalibrate whenever switching users or re-donning the device.

Eye Tracking Configuration in MRTK3

MRTK3 (Mixed Reality Toolkit 3) is the standard middleware for HoloLens 2 development. Compared to MRTK 2.x, MRTK3 adopts a more modular architecture. Core eye tracking components include:

GazeInteractor: The component responsible for eye gaze raycasting in MRTK3. It continuously casts a ray from the user’s eyes, performing hit detection against interactable objects in the scene. It must be attached to the appropriate node in the MRTK XR Rig.

StatefulInteractable: Scene objects that need to respond to gaze must include this component. It supports multiple interaction states – including IsGazeHovered, triggered when the user’s gaze ray hits the object.

EyeTrackingProvider: The underlying data provider that retrieves raw gaze data from HoloLens 2 eye tracking hardware and converts it into rays in Unity’s coordinate system.

During setup, remember to enable the Eye Gaze Interaction Profile in Unity’s Project Settings > OpenXR, and declare the GazeInput capability in the application manifest. Missing this step is the most common source of “eye tracking not working” issues.

Spatial Anchoring: Registering Physical Objects

In AR environments, a core challenge is precisely anchoring virtual content to specific locations in the physical world. HoloLens 2 offers several spatial anchoring approaches:

World Anchors: HoloLens 2’s SLAM system automatically builds a spatial map, allowing world anchors at specific positions. The advantage is no additional markers needed; the downside is limited adaptability to dynamic environments.

Marker Tracking: Visual marker systems like Vuforia or ArToolkit use specific patterns (e.g., ArUco markers) attached to physical objects, recognized via HoloLens’s RGB camera. This approach offers higher precision (millimeter-level), particularly suitable for scenarios requiring interaction with specific physical objects.

QR Code Tracking: HoloLens 2 natively supports QR code detection and spatial positioning. Simple to implement but slightly less precise than dedicated fiducial markers.

In practice, I recommend a hybrid approach: use marker tracking for initial alignment, then switch to world anchors for stability – so virtual content remains stable even when markers are occluded.

Eye-Touch Multimodal Interaction Design

The core idea of “Eye-Touch” is using gaze for rapid target selection and hand gestures for action confirmation – solving the Midas Touch problem inherent in pure gaze interaction (unintended activation simply by looking).

The basic approach to implementing this multimodal interaction in MRTK3:

Gaze Selection Phase: GazeInteractor continuously tracks the object the user is looking at. The gazed object enters a “pre-selected” state (with visual highlight feedback).
Gesture Confirmation Phase: When the user performs an Air Tap (index finger pinch) or other gesture on the pre-selected object, the selection is confirmed.
State Management: Handle state cleanup when gaze targets switch, time windows for gesture recognition, and priority when multiple objects are simultaneously gazed.

The design advantage: gaze provides fast implicit pointing (users naturally look at objects of interest), gestures provide explicit confirmation (preventing false activation). The two channels are complementary.

Practical Tips and Common Pitfalls

Calibration:

Recalibrate for every new user. Interpupillary distance and eye characteristics vary significantly
Users wearing glasses may need device position adjustments for optimal tracking quality
After calibration, have users look at known positions to verify accuracy before starting experiments

Development and Debugging:

Unity Editor cannot access real eye tracking data – use MRTK3’s simulated input (mouse simulating gaze) for basic logic debugging
On-device testing is irreplaceable – the interaction experience in the simulator differs substantially from actual wear
Use Device Portal’s live preview to see the user’s first-person view, facilitating observer-side debugging

Performance:

Eye tracking data processing overhead is minimal, but gaze ray collision detection frequency (once per frame) can become a bottleneck in complex scenes
Use simplified colliders for gazeable objects rather than Mesh Colliders
Gaze data is inherently noisy – apply light temporal smoothing to gaze points

Common Troubleshooting:

Eye tracking completely non-functional: Check OpenXR configuration and manifest permissions
Severe gaze offset: Recalibrate, verify proper device fit
Low gesture recognition rate: Ensure hands are within HoloLens’s visible range (approximately 50 cm in front of the device)

王以俨