TextOre Blog

Findability in the age of Big Data

As the volumes of data on the internet and inside corporations continue to grow exponentially, identifying the most important pieces of information becomes more and more critical. Understanding findability will be increasingly important for information workers.


The creation of the term findability has been credited to Heather Lutze in the early 2000s [http://www.findability.com/about-heather-lutze-search-engine-speaker], and referred to in public contexts by Peter Morville in 2005 [http://semanticstudios.com/about/]. Although findability is also relevant outside the web, the term is mostly used to describe how easily information on a website can be found, both through search engines and by website visitors.

Most blog posts discussing findability are concerned with organizing content for retrieval through search or website SEO (Search Engine Optimization). These issues are probably the most important for many people, but there are many other aspects of findability worth considering.

Content production vs content consumption

Generally speaking, there are two key roles affecting findability: content producers and content consumers. (Content distribution is also of interest, but we’ll leave that aside here.)

As a content producer, you have to ask yourself who is going to digest your content, for what purpose, and in which situations. Writing news stories that flow into the rapid stream of online media is different from creating blog posts on niche topics that are part of the “long tail” of internet content. For a mainstream news story – say, in The Huffington Post – the “packaging” is important. A striking headline, a well-written teaser and appealing images can make the story stand out from the news flow. On the other hand, a topical blog post – like this one – may only surface through deep searches, and must be equipped with relevant search terms in the title, teaser, body text, and metadata. What’s relevant in each case depends on who you plan to reach, for what purpose and where, so you will have to prioritize according to your own goals and needs.

For a content consumer, the key to finding information is often to take on the mindset of the content producer. If you have a rough idea of the processes behind the type of information you’re looking for, you’ll have a better chance of using the right search terms, navigating to the correct location, or picking the appropriate tools to access the content. Let’s say you’re looking for a specific product in a web shop: it would be helpful to understand the logic behind the product categories to navigate them faster. Also, if you knew the capabilities of the site’s search engine (e.g. does it search in all product data, or just title and description?) you could tailor searches better and trust the results more. Another example is adjusting search terms on Google depending on the type of information needed. If you’re searching for a technical document, like a manual or troubleshooting guide for a smartphone, you’ll use technical terms that are likely to be used by tech writers. If you’re looking for news on upcoming smartphone models, you might use feature or product names in your searches instead.
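To make the web shop example concrete, here is a minimal sketch of how the scope of a site search changes what a consumer can find. The product records, the field names and the search function are all hypothetical and only serve as illustration; a real shop’s search engine will work differently.

```python
# Hypothetical product records; the field names are illustrative assumptions.
PRODUCTS = [
    {"title": "Trail Runner 2",
     "description": "Lightweight running shoe",
     "specs": "waterproof membrane, 280 g"},
    {"title": "City Boot",
     "description": "Waterproof leather boot",
     "specs": "rubber sole, 640 g"},
]


def search(products, term, fields):
    """Return titles of products where `term` occurs in any of the given fields."""
    term = term.lower()
    return [
        p["title"]
        for p in products
        if any(term in p.get(field, "").lower() for field in fields)
    ]


# A search limited to title and description misses the first product,
# because "waterproof" only appears in its specs.
narrow = search(PRODUCTS, "waterproof", ["title", "description"])

# A search across all product data finds both.
broad = search(PRODUCTS, "waterproof", ["title", "description", "specs"])

print(narrow)
print(broad)
```

A consumer who knows that the site only searches titles and descriptions would know to browse the categories as well, rather than trusting an empty or short result list.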

What affects Findability?

The most obvious factor in determining an object’s findability is the object itself, and its attributes. A web page has visible text and often images, organized in a page layout and structured with a headline, paragraphs, subtitles and so forth. There’s also “invisible” metadata in the source code that could give us more information about the page. A search engine will make use of both the visible text and the hidden metadata, while a human reader will normally just use the visible information.
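As a rough illustration of the difference between visible text and “invisible” metadata, the sketch below uses Python’s standard html.parser module to separate the two. The sample page, the PageScanner class and the choice of which tags count as non-visible are assumptions made for this example, not a description of how any particular search engine reads a page.

```python
from html.parser import HTMLParser

# Elements whose text a reader does not see in the page body (an assumption
# made for this sketch).
NON_VISIBLE = ("head", "script", "style")


class PageScanner(HTMLParser):
    """Collects <meta name=...> values and the text a reader would see."""

    def __init__(self):
        super().__init__()
        self.meta = {}      # metadata hidden in the source code
        self.visible = []   # text fragments shown in the page body
        self._skip = 0      # nesting depth inside non-visible elements

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and "name" in attrs:
            self.meta[attrs["name"]] = attrs.get("content", "")
        elif tag in NON_VISIBLE:
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in NON_VISIBLE and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip:
            self.visible.append(text)


# Hypothetical sample page.
SAMPLE = """<html><head>
<title>Findability basics</title>
<meta name="description" content="How metadata helps search engines">
<meta name="keywords" content="findability, metadata">
</head><body>
<h1>What affects findability?</h1>
<p>Visible text is what a human reader sees.</p>
</body></html>"""

scanner = PageScanner()
scanner.feed(SAMPLE)
print(scanner.meta)
print(scanner.visible)
```

A human reader only ever meets the contents of the second list, while a program reading the source has access to both.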

On a corporate intranet, in document management systems, and in databases, it’s common to attach sets of metadata to documents and data. This could be in the form of “tags”, labels or categories; automatically updated creation and modification dates; links to related objects; or plain keywords and additional descriptions. The challenge is often to convince the content producer – say, an employee posting a new document to the corporate SharePoint intranet – to spend a couple of extra minutes filling these metadata fields with useful data.

In our experience, there are some additional factors at work in creating findability. The location of a piece of information can help or hurt its chances of being found. Physical locations, such as file servers, have some immediate advantages, like easy browsing. Virtual locations, say a database, are not as accessible, but once queries have been built to retrieve data from it, a database is more powerful than flat file servers. When considering location, there’s also the issue of permissions: who is actually allowed to find and view an object? Is it hosted inside a corporate firewall, or behind a news media paywall like the Wall Street Journal’s?

A content producer should also consider which tools the target consumers might use to find and retrieve information. Browsing a website or intranet, searching via a search engine, sharing links in social media, and downloading from RSS feeds are all popular ways to access content. A web page can be equipped with share buttons and feed links to enable this process.

Furthermore, there are several visual cues that can affect findability. Especially when browsing, a user may pick up on signals given by images and icons, font types, sizes and colors, or the page layout itself. If your content piece is a scientific report you probably want a plain layout, easily readable fonts, and a table of contents up front. However, a web page promoting your company’s services should have appealing images, an attractive layout, and the main selling point at the top of the text.

A final factor, which is a bit more difficult to get a good handle on, is context. There’s an objective context, as in “you’re now visiting our corporate website”. You can, as content producer, then safely assume that the visitor is there to learn about your company and your services. However, there might also be cases of a different, subjective context. For instance, if a student googled “information hierarchy” to find web pages about this topic, and clicked a link to a page “deep” in your website, he/she may not be interested in the company itself. If that web page is going to have a useful life in the “long tail”, it may have to be constructed differently than if it’s a pure sales point for your information hierarchy-related services.

Enhancing Findability

As we’ve seen above, there are a lot of things a content producer can do to make information objects more findable. Assuming that you want your content to be found, read and used, you can design the page, article or document for it, and choose location, context, visuals, and tools accordingly. Your first consideration should always be who the target user – the content consumer – is. However, on the open internet, the actual user is not necessarily who you thought it would be, and the solutions you employ should be flexible enough to support unexpected uses. Taking advantage of (big) data, for instance from Google Analytics or the Twitter firehose, can be very helpful in this process.

Content consumers can improve their results by “reverse engineering” the content production process to identify the most relevant search terms, likely locations and sources, appropriate retrieval tools, and natural contexts. Professional content users – information analysts – are likely to do this.





by Angela 12 May 2014

Further adding to the complexity of the Internet is the ability of more than one computer to use the Internet through only one node, thus creating the possibility for a very deep and hierarchical sub-network that can theoretically be extended infinitely.

by Tommy 04 Feb 2014

Great blog! I really like it!
