A search feature is only as good as its index: the index must hold all of the content that users can search for. For Guru, the search index is built from the information in Cards. Storing that information in the index lets Guru quickly find relevant content when a user submits a search query.

If you are interested in learning about the other parts of Guru’s search technology, see Guru's Search: Definition.

Building the search index

Every time a Card is created or updated, Guru runs a process that pulls information from several parts of the Card. This process completes within a few seconds of a user publishing the Card.

These are the parts of a Card that Guru includes in its search index:

  • Card title

  • Card body

  • Tags

  • Any text that can be extracted from attachments or uploaded files (like PDFs)

  • Attachment file names
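
As a rough illustration of the list above, the sketch below gathers those five parts of a Card into a single record that a search index could store. It is not Guru's implementation: the `Card` and `Attachment` classes, the `build_index_record` function, and the field names are all hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Attachment:
    # Hypothetical model of an uploaded file.
    file_name: str
    extracted_text: str = ""  # stays empty when no text could be extracted


@dataclass
class Card:
    # Hypothetical model of a Card, limited to the parts listed above.
    title: str
    body: str
    tags: list[str] = field(default_factory=list)
    attachments: list[Attachment] = field(default_factory=list)


def build_index_record(card: Card) -> dict[str, str]:
    """Collect the searchable text of a Card, one entry per indexed part."""
    return {
        "title": card.title,
        "body": card.body,
        "tags": " ".join(card.tags),
        "attachment_text": " ".join(
            a.extracted_text for a in card.attachments if a.extracted_text
        ),
        "attachment_file_names": " ".join(a.file_name for a in card.attachments),
    }


card = Card(
    title="Expense policy",
    body="How to file expenses.",
    tags=["finance", "policy"],
    attachments=[Attachment("policy.pdf", "Submit receipts within 30 days.")],
)
record = build_index_record(card)
```

Rebuilding the record whenever a Card is created or updated would keep it in step with the published Card, which matches the behavior described above.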


✍️ Note
Guru does not index iframed content. Content must be stored (“hosted”) in Guru for it to be searchable.


How text is extracted from attachments

When an author uploads an attachment to a Card, Guru runs a process to detect whether the file contains text. This process is based on Optical Character Recognition (OCR), using a machine learning model trained to recognize text visually. It works with both handwritten and printed characters.

File types Guru indexes for search:

  • PDFs

  • Word (and open source equivalents)

  • PowerPoint (and open source equivalents)

  • Excel (and open source equivalents)

  • PNGs

  • Photoshop

  • Illustrator

  • Postscript

There are some limitations (per file, not per Card) to this OCR process to be aware of:

  • 500MB file size limit

  • 10MB file size limit for PNGs

  • Maximum number of pages is 3,000

  • Maximum height and width is 40 inches (2,880 points)

  • PDFs cannot be password protected

  • PDFs cannot contain JPEG 2000 formatted images

  • Text must be horizontal; vertical text won’t be picked up

  • Text must be at least 15 pixels tall; at 150 DPI this works out to about 8-point font
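
To make the limits above easier to check before uploading, here is a minimal sketch of a pre-upload check. It is only an illustration, not part of Guru: `FileInfo`, `check_limits`, and `pixels_to_points` are hypothetical names, 1MB is assumed to mean 1024 × 1024 bytes, and the minimum text size is read as a height of 15 pixels, which is the reading under which the 150 DPI conversion makes sense. The sketch only covers limits that can be checked from basic file metadata.

```python
from dataclasses import dataclass

POINTS_PER_INCH = 72  # standard typographic point

MAX_FILE_BYTES = 500 * 1024 * 1024  # 500MB, assuming 1MB = 1024 * 1024 bytes
MAX_PNG_BYTES = 10 * 1024 * 1024    # 10MB limit for PNGs
MAX_PAGES = 3000
MAX_DIMENSION_INCHES = 40           # 40 inches * 72 = 2880 points
MIN_TEXT_HEIGHT_PX = 15             # assumption: the minimum is in pixels


@dataclass
class FileInfo:
    # Hypothetical description of a file, filled in by the caller.
    name: str
    size_bytes: int
    pages: int = 1
    width_inches: float = 8.5
    height_inches: float = 11.0
    password_protected: bool = False


def pixels_to_points(pixels: float, dpi: float) -> float:
    """Convert a height in pixels to typographic points at a given DPI."""
    return pixels * POINTS_PER_INCH / dpi


def check_limits(info: FileInfo) -> list[str]:
    """Return the limits that a file exceeds (an empty list means none)."""
    problems = []
    is_png = info.name.lower().endswith(".png")
    size_limit = MAX_PNG_BYTES if is_png else MAX_FILE_BYTES
    if info.size_bytes > size_limit:
        problems.append("file too large")
    if info.pages > MAX_PAGES:
        problems.append("too many pages")
    if max(info.width_inches, info.height_inches) > MAX_DIMENSION_INCHES:
        problems.append("page too large")
    if info.password_protected:
        problems.append("password protected")
    return problems


big_png = FileInfo(name="diagram.PNG", size_bytes=12 * 1024 * 1024)
print(check_limits(big_png))
print(pixels_to_points(MIN_TEXT_HEIGHT_PX, dpi=150))
```

Under these assumptions, 15 pixels at 150 DPI comes to roughly 7.2 points, which the list above rounds to about 8-point font. Because the limits apply per file, a Card with several attachments would need each file checked on its own.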


Since matches in attachments don't receive the same emphasis as matches found in Card titles, tags, and body content, we recommend adding some descriptive text to the body of Cards that contain attachments. A description not only improves search performance but also helps anyone who comes across the Card understand whether the information in the attachment will be useful to them.
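
The toy example below shows why a short description in the Card body can help. It is a sketch under made-up assumptions, not Guru's ranking: the `FIELD_WEIGHTS` values, the `score` function, and the sample records are hypothetical. The only property taken from the paragraph above is that a match in attachment text counts for less than a match in the title, tags, or body.

```python
import re

# Hypothetical weights. Only the relative order matters here:
# attachment matches count for less than title, tag, and body matches.
FIELD_WEIGHTS = {
    "title": 1.0,
    "tags": 1.0,
    "body": 1.0,
    "attachment_text": 0.5,
}


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of words."""
    return set(re.findall(r"\w+", text.lower()))


def score(record: dict[str, str], query: str) -> float:
    """Add up the weight of every field that contains all query terms."""
    terms = tokenize(query)
    if not terms:
        return 0.0
    total = 0.0
    for field_name, weight in FIELD_WEIGHTS.items():
        if terms <= tokenize(record.get(field_name, "")):
            total += weight
    return total


attachment_only = {
    "title": "Q3 deck",
    "body": "",
    "attachment_text": "Pricing tiers overview",
}
with_description = {
    "title": "Q3 deck",
    "body": "Slides covering our pricing tiers.",
    "attachment_text": "Pricing tiers overview",
}

print(score(attachment_only, "pricing tiers"))
print(score(with_description, "pricing tiers"))
```

With these made-up weights, the Card whose body describes the attachment scores higher for the same query, because the match is found in the body as well as in the attachment text.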

The text extraction process works with content in these languages:

  • English

  • French

  • German

  • Italian

  • Portuguese

  • Spanish
