About the Vision Sensor - Designer - Foundation 23.2 - Hyland RPA



The Vision sensor allows UI interaction with elements that cannot be indicated or addressed via other automation libraries such as UI Automation, Selenium, Java or SAP Scripting.

The Vision sensor (as the name suggests) works with the visual representation of the user interface, i.e. with exactly what the user can see. It is therefore important that elements to be indicated (or interacted with) are actually visible on the user interface. When indicating, an image of the highlighted element is captured, which is then found at run time via template matching. In addition, the other properties (left, top, index, count, excludetext, originalscale, multiline, context, contextlabel and the border properties) are determined, all of which can be used at run time to control the search for the element.
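To illustrate the idea of template matching, the following is a minimal conceptual sketch in pure Python. It is not the product's implementation: the function name, the naive sliding-window search and the score formula (1 minus the normalized mean squared pixel difference) are assumptions chosen for illustration; they merely show how an accuracy threshold between 0 and 1 can gate a match.

```python
def match_template(image, template, accuracy=0.9):
    """Return (row, col) positions where the grayscale template matches.

    Hypothetical helper for illustration only. The score is 1 minus the
    mean squared pixel difference, normalized so that a perfect match
    scores 1.0 (pixels are assumed to be in the range 0-255).
    """
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    matches = []
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            # Sum of squared differences between the window and the template.
            sq_diff = sum(
                (image[r + i][c + j] - template[i][j]) ** 2
                for i in range(th) for j in range(tw)
            )
            score = 1.0 - sq_diff / (th * tw * 255.0 ** 2)
            if score >= accuracy:
                matches.append((r, c))
    return matches

# A 6x6 "screenshot" with a bright 2x2 block at row 2, column 3.
screen = [[0] * 6 for _ in range(6)]
for r in (2, 3):
    for c in (3, 4):
        screen[r][c] = 200
template = [[200, 200], [200, 200]]
print(match_template(screen, template, accuracy=0.99))  # -> [(2, 3)]
```

With a high accuracy such as 0.99, only the exact location is reported; lowering the threshold would also admit partially overlapping (i.e. slightly wrong) positions, which is why accuracy directly controls how strict the search is.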

Using the Vision sensor, elements are found based on the following properties, with OCR used for content recognition:


  • Accuracy: Necessary accuracy for template matching (between 0 and 1).
  • Clickx: X coordinate of the click point. Can be either absolute (for example, 20) or relative (for example, 0.5 to click in the center of the element).
  • Clicky: The same as Clickx, but for the Y coordinate.
  • Content: Bitmap searched for with template matching (i.e. the template).
  • Excludetext: If "false", the template is searched for exactly as it is defined in "content". If "true", only the outer part of the template is searched for, i.e. ideally the margins, without the text that might be in it.
  • Borderbottom: The size of the bottom border (in pixels) when "excludetext" is set to "true". Borderleft, borderright and bordertop are analogous.
  • Grayscale: Indicates that the search is done in grayscale. Currently has no effect, because the search is always performed in grayscale.
  • Left: X coordinate of the element. Relevant if several identical-looking elements are found; in this case the element with the closest X coordinate is selected.
  • Top: The Y-coordinate, analogous to left. If both left and top are specified, the element with the lowest Euclidean distance to the coordinate is selected.
  • Index: If several elements are found, the element is selected that is at the corresponding position when counting the elements from top to bottom and then from left to right. If both index and left/top are set, an element is only selected if the element at that index is also the element closest to the specified position.
  • Count: The number of elements that look the same. Must match if no position is given but index is set.
  • Originalscale: The scaling of the element when indicating. Currently has no function, but is intended to eventually find scaled elements again, for example when the production client uses different DPI settings than the development client.
  • OCRengine: The ID of the OCR engine to be used. At the moment only Tesseract is available, so this is always 0.
  • OCRlangmodel: The language model for OCR (if omitted, the language model is automatically selected - unlike the OCR activities, this is not just the first one in the list, but depends on the system language).
  • Multiline: Specifies that the element contains multi-line text. This sets the corresponding page segmentation mode for Tesseract in GetValue (for example, in GetTextActivity) and then possibly returns better results.
  • Context: A bitmap to search for in addition to the template. Is also relevant if there are several elements that look the same - the element that is closest to the context will be used.
  • Contextlabel: If no context was set or the context was not found, OCR searches for the text specified here to find the element. The functionality is exactly the same as with context. Regular expressions may also be used in contextlabel.
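The disambiguation rules for count, index, left and top can be sketched as follows. This is a simplified, hypothetical helper (not the product's API): it assumes candidates are (x, y) positions, uses Euclidean distance for left/top as described above, and requires index and left/top to agree on the same element when both are set.

```python
import math

def select_element(candidates, left=None, top=None, index=None, count=None):
    """Pick one element from candidate (x, y) positions.

    Simplified sketch of the disambiguation rules described above:
    - count: if given, the number of candidates must match exactly.
    - index: position when sorting top-to-bottom, then left-to-right.
    - left/top: the candidate with the smallest Euclidean distance wins;
      if index is also set, both rules must point at the same element.
    """
    if count is not None and len(candidates) != count:
        return None
    # Sort top-to-bottom, then left-to-right, as described for "index".
    by_position = sorted(candidates, key=lambda p: (p[1], p[0]))
    by_index = by_position[index] if index is not None else None
    if left is not None or top is not None:
        tx = left if left is not None else 0
        ty = top if top is not None else 0
        nearest = min(candidates, key=lambda p: math.hypot(p[0] - tx, p[1] - ty))
        # When index is also set, both criteria must agree.
        if by_index is not None and nearest != by_index:
            return None
        return nearest
    return by_index

buttons = [(100, 50), (300, 50), (100, 200)]
print(select_element(buttons, index=1))            # -> (300, 50)
print(select_element(buttons, left=110, top=190))  # -> (100, 200)
```

Note that the real sensor also falls back to the closest single coordinate when only left or only top is given; the sketch simplifies this by treating a missing coordinate as 0.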


A general recommendation when using the Vision sensor is to use Indicate rather than Indicate with result, as this already reduces the properties to the meaningful ones and keeps the effort for manual adjustments as low as possible.

Having a look at the log file can be very helpful if the interaction with an element fails, because the possible reasons are diverse.

If an element is not found at run time, the search criteria may be too strict, since each additional property in the selector further restricts the search. Try adjusting the search by removing properties one by one. If the position of the element has not changed (for example, by scrolling), it makes sense to start with context and contextlabel. If this does not help, we recommend excluding count and index as well.
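This relaxation strategy can be sketched as a small loop that tries progressively weaker selectors. Everything here is a hypothetical stand-in for illustration: the selector is modeled as a plain dict of Vision properties, and `find` is any callable that returns a match or None.

```python
def relax_selector(selector, find):
    """Try progressively weaker selectors until `find` succeeds.

    Hypothetical sketch: properties are removed in the recommended order,
    first context/contextlabel, then count and index. Returns the match
    and the selector that finally worked (or None and the weakest trial).
    """
    relaxation_order = [[], ["context", "contextlabel"], ["count", "index"]]
    trial = dict(selector)
    for props in relaxation_order:
        for p in props:
            trial.pop(p, None)  # drop the property if present
        result = find(trial)
        if result is not None:
            return result, trial
    return None, trial

# Toy stand-in: pretend the element is only found once count/index are gone.
def fake_find(sel):
    return "element" if "count" not in sel and "index" not in sel else None

selector = {"content": "btn.png", "context": "label.png", "count": 3, "index": 1}
print(relax_selector(selector, fake_find))
# -> ('element', {'content': 'btn.png'})
```

In practice the same loosening is done by hand in the Designer, but the order is the point: drop the context anchors first, and only then the count/index constraints.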