Content Authoring for Human and Machine Consumption

During the last 25 years, technical communicators have had to adapt to several major shifts in the content landscape. A simplified timeline would show content moving from paper-only to the web, to mobile, and eventually to conversational outputs, including personalized text variants and chatbot dialogues. Out of necessity, many internal publishing groups have moved away from waterfall workflows that resulted in one-time publish events of PDF versions or paper publications. In response to market demands, teams have moved towards more agile and iterative authoring workflows. Authoring and management workflows continue to evolve as communications require more updates across more channels, along with faster action on incoming consumer feedback.

So, we have all been forced to make progress towards an omnichannel, real-time world that arrived before anyone was really ready for it. But we are just getting started. Now, we face a huge mind shift. The ways we create and prepare content for publishing will fundamentally change because we no longer write only for human audiences. Our consumers now include machines, and the machines need to understand and use our content just as effectively as humans do.

Preparing content for machines takes real planning, work, and coordination. We know this because preparing content for humans requires the same kind of effort. As part of human content strategy, it has become a best practice to create customer journey maps that plot out which messages map to which human audiences. Similar journey maps will need to be created for machine consumers, too.

Machine consumption ultimately leads to some type of content consumption by humans. But the needs of machines are different, and we must master techniques to optimize delivery to this alternate audience.

Machines that actively work to consume our content include bots indexing for Google and other search engines, content aggregators, intelligent assistants, social media syndicators, and many others listed later in this article. All of these machines ultimately aim to drive efficient human consumption.

Humans demand accessibility, comprehensibility, and creativity. Machines demand structure, relatability, and specialized metadata. Both humans and machines consume our content, and the more coherent and complete that content is, the more consumption we will facilitate for both.

Old Habits, New Models

Outside of techcomm (and even within many techcomm teams today), the majority of content is still authored with tools and processes that engender a “page presentation” worldview. Content authoring paradigms have followed some familiar patterns:

  • Authoring in unstructured Word or Google Docs: Although we may consciously think about mobile devices and other outputs while we write the first draft, or narrative flow, a white, rectangular screen does resemble a sheet of paper. The end-user experience of consuming the content will often be quite different from reading a page or a laptop screen in a stationary position. The content will move, be driven by thumbs, be spoken, and need to adapt.

  • Authoring against wireframes: Although we know content has to live across many templates and in many forms, we end up “inserting content here” into wireframes or other experience planning layout tools that introduce containers for content. Since much of our original source content is still authored in a space resembling a page, when we copy/paste from that source into such layouts, the content is not optimized for the delivery destination.

  • Authoring in design and layout tools: Although traditional authoring tools were designed for static, print content, many of us still manually paste and manipulate content itself with InDesign, or similar template-layout environments. With authoring tools in which layout, formatting, and graphics take precedence over text, fixed-aspect visual context becomes the context that matters most. Unfortunately, that leaves behind a myriad of other potential forms of consumer context, including segment, location, timeline, and device.

Technical content sets are far more structured than non-technical sets. Within the technical content community, various approaches in use include:

  • Rapid semi-structured authoring: Using Markdown, or fixed style guides in Word, offers a basic structure that’s useful for authoring independent of a development environment or formal editor. Although this type of content has enough structure to be portable across machine processes, by itself, it is far from optimized for machine consumption, semantic enrichment, or intelligent processing through a customer-demand workflow.

  • Structured authoring tools: Real structure starts taking shape in systems designed to capture content into a defined schema of some kind, or at least allow the creation of schemas on the fly. This kind of authoring happens in tools either separate from or integrated with a Component Content Management System (CCMS), and often incorporates DITA XML or other structural standards. Although topic-based structured authoring is a good fit for task-oriented documentation with reusable content, we are finding a need to define content and metadata, including Linked Data associations, at a more molecular level in order to optimize content for machine consumption.

  • Intelligent content authoring environments: Authors work with content at a granular level, interacting with semantic services applications to enrich content in ways that will drive automated behavior downstream. These environments conceal, or reveal, the markup details as necessary, and they provide authors with interactive preview and testing services so that the utility of the content components, and their semantic metadata, can be confirmed while the author is working.
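
To make the idea of granular content with semantic metadata more concrete, here is a minimal sketch in Python. It is an illustration only, not a description of any particular CCMS or authoring tool: the class name `ContentComponent`, the metadata keys, and the sample values are all hypothetical. It shows one way an authoring environment might check, while the author is working, that a component carries the metadata downstream systems expect.

```python
from dataclasses import dataclass, field


@dataclass
class ContentComponent:
    """One granular, reusable unit of content plus its semantic metadata."""
    component_id: str
    content_type: str   # for example "task-step", "definition", "warning"
    body: str           # presentation-free text
    metadata: dict = field(default_factory=dict)

    def missing_metadata(self, required):
        """Return the required metadata keys this component does not carry."""
        return [key for key in required if key not in self.metadata]


# Hypothetical metadata keys, chosen only for this illustration.
REQUIRED_KEYS = ["audience", "product", "subject"]

step = ContentComponent(
    component_id="reset-password-step-1",
    content_type="task-step",
    body="Open Settings and select Account.",
    metadata={"audience": "end-user", "product": "ExampleApp"},
)

# Report which required keys the author still needs to supply.
print(step.missing_metadata(REQUIRED_KEYS))
```

In this sketch the body holds no layout or formatting, so the same component could be rendered to a web page, a chatbot reply, or a voice response, while the metadata check flags gaps before the content is published.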

In an “Information 4.0” context, these next-generation intelligent content systems build on the emergent systems seen within the most recent wave of technology innovations. An example would be AI services that seek to understand consumer context, even sentiment, by analyzing a combination of user behavior, content engagement, queries, and content contribution. Next-generation search engines, websites, and conversational interactions will increasingly leverage data about what content a user interacts with, and resulting user actions.

Accelerating Consumer Expectations

Starting with the iPhone in 2007, mobile devices accelerated portable consumption, and most of us have had to remap our customer journeys to accommodate this dynamic. Whether outputting to web or mobile, we mastered the concept of “every page is page one” (Mark Baker, 2012). Knowing that consumers now determine where their journey starts, the next step is clear: on-demand, contextual personalization. The movement towards personalization is an inevitable, natural evolution from the book, to the website, to the independent page, to the decoupled fragment, or molecule, of content. Each fragment can be discovered, through machines, separately from the page. With structure and semantics underlying content systems, every content fragment can ultimately live its own independent life. Answers can be discovered separately from the pages the authors wrote them to live upon. Google already provides such “rich snippets” millions of times per hour.

So, can our content keep up? We need structured, adaptive content sets to meet the needs of ever-accelerating customer journeys that now transit multiple devices and states.

Consumer expectations grow daily. We have come to expect instant access to highly personalized content. Conversational interfaces facilitated by Google and Amazon’s Alexa train consumers on a daily basis to become increasingly dependent on real-time delivery of relevant, useful information in smaller and smaller chunks. Google parses structured data into rich snippets within search results, extracting information from our websites and answering customer questions as soon as possible, with as little friction as possible. All of this requires intelligent content capable of interacting with machine consumption and automated distribution.
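
As one illustration of the structured data mentioned above, the following Python sketch builds schema.org `FAQPage` markup in JSON-LD, a format that search engines can parse. The helper name `faq_json_ld` and the question and answer text are invented for this example. Whether any given search engine displays a rich result from such markup is decided by the search engine, so treat this as a sketch of the data shape rather than a guarantee of any particular outcome.

```python
import json


def faq_json_ld(pairs):
    """Build a schema.org FAQPage document from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


# Placeholder content, invented for this example.
doc = faq_json_ld([
    ("How do I reset my password?",
     "Open Settings, select Account, then choose Reset Password."),
])

# Serialize to the JSON text that would be embedded in a web page.
markup = json.dumps(doc, indent=2)
print(markup)
```

The resulting JSON would typically be embedded in a page inside a `script` element of type `application/ld+json`, so that the answer travels with explicit structure instead of being inferred from page layout.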

Addressing the “Non-Human” Environment

A “perfect storm” has resulted from the confluence of Artificial Intelligence (AI), Machine Learning (ML), Natural Language Processing (NLP), and the dramatically improved platforms for Big Data, including Customer Data Platforms (CDP). These massive innovation drivers have created a seemingly insatiable demand for “machine-consumable” content to pair with the awe-inspiring zettabytes of human data. So far, only a small minority of publishers have changed their internal systems to serve contextually rich, structured content ready for machine processing. For organizations attuned to this rapidly shifting landscape, a massive opportunity lies in the gap between need and implementation.

Fortunately, most of the same methods and workflows we use to access a growing matrix of human consumers and channels have a direct parallel to methods we require in order to optimize content for machine consumption.

Audience Segments and Personas for Machines

Our content strategies and content models must expand to accommodate machine consumption as well as various human consumers. We need to identify the types of machines that will consume our content. Various machine consumers include:

  • Search engine bots
  • Content aggregators
  • Screen readers and other tools for accessibility
  • Indexers and web crawlers
  • Chatbots and Intelligent Assistants
  • Social media syndicators
  • Personalization algorithms
  • Automated marketing systems
  • Sharing and amplification tools
  • Regulatory compliance checkers
  • Semantic Web applications
  • Artificial Intelligence (AI) services
  • Other content, knowledge management, and marketing technologies

Content team members need to segment and define what kind of content each machine consumer is looking for, asking questions such as:

  • Does it consume structured data? In what forms?
  • Does it need markup to generate rich search results?
  • Does it need to understand context or purpose through metadata?
  • Does the Content Management System (CMS) need to process it in a specific way?
  • Does it need to interpret images, or understand them through explicit text?
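
The answers to questions like these could be recorded as a simple machine persona, in the same spirit as a human persona. The sketch below is a hypothetical Python model covering a subset of the questions. The class name `MachinePersona`, its fields, and the two sample personas are assumptions made for illustration, and a real profile would need to be confirmed against each consumer's own documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MachinePersona:
    """Answers to the segmentation questions for one machine consumer."""
    name: str
    structured_formats: tuple        # forms of structured data it consumes
    needs_rich_result_markup: bool   # markup for rich search results
    needs_context_metadata: bool     # context or purpose through metadata
    needs_image_text: bool           # understands images through explicit text

    def requirements(self):
        """List the content requirements this persona places on authors."""
        needs = [f"structured data as {fmt}" for fmt in self.structured_formats]
        if self.needs_rich_result_markup:
            needs.append("markup for rich search results")
        if self.needs_context_metadata:
            needs.append("metadata describing context or purpose")
        if self.needs_image_text:
            needs.append("explicit text alternatives for images")
        return needs


# Illustrative values only; they are assumptions, not vendor specifications.
search_bot = MachinePersona(
    name="Search engine bot",
    structured_formats=("JSON-LD",),
    needs_rich_result_markup=True,
    needs_context_metadata=True,
    needs_image_text=True,
)
screen_reader = MachinePersona(
    name="Screen reader",
    structured_formats=("semantic HTML",),
    needs_rich_result_markup=False,
    needs_context_metadata=False,
    needs_image_text=True,
)

for persona in (search_bot, screen_reader):
    print(persona.name)
    for need in persona.requirements():
        print("  -", need)
```

Keeping such profiles side by side makes overlaps visible. In this sketch, for instance, both personas rely on explicit text for images, so one authoring practice serves two machine audiences.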

Benefits of Machine Delivery

Machine consumption also extends to some enabling tools or technologies that may not immediately come to mind. Screen readers, for example, require text that can be served as audio for accessibility. This issue will grow in importance along with an aging population.

Content optimized for machine consumption has far more value for human consumers:

  • It is more “findable”, therefore more useful to humans.

  • It can be targeted to specific human contexts and conditions, therefore reducing cognitive load on mentally overloaded humans.

  • It can be used with AI and ML to automate more tasks, making life easier for humans.

  • It can be used to reduce the burden on humans for repetitive top-of-funnel sales and support or help demands.

Our Roles Will Change and Expand

Publishing to machines involves collaboration within a much larger content operating model. This collaboration involves major innovations in the content supply chain:

  • Our content disciplines and practices must expand into machine, as well as human, consumption.

  • Content strategists must help define which machine segments need to be addressed.

  • Content engineers must figure out what the shape and structure of the content needs to be in order to conform to those machine consumption channels.

  • We need updated content technology and integration architectures.

  • We need shared orchestration models for structure and semantics.

Fundamentally, everything needs to change from the ground up. We can, however, make those changes in small parts. For now, the call-to-action for technical communicators is to continue increasing awareness around the issues described here, and to start learning relevant skills that will facilitate this new era emerging across authoring and publishing.


Cruce Saunders is the Founder and principal at [A]. He hosts the Towards a Smarter World podcast and The Invisible World of Content YouTube series. He has been a keynote speaker at conferences and an invited speaker at enterprises around the world. He regularly presents on AI, content intelligence, content operations, content engineering, personalization, governance, content structural and semantic standards, and enterprise transformation.

[A] Editor's Note: This article was modified from the original version published in the Intercom July/August 2019 issue by the Society for Technical Communication.