What is structured content?
When we think about content, we think first about the form it takes: a book, document, image, video or other representation. Each form of content has its own structure. Written forms, for example, come with structural signposts such as title, subtitle, sections and paragraphs, which help us to understand the anatomy of the material. But this isn’t what we mean when we refer to ‘structured content’. In the world of structured content, a document with all these signposts may be unstructured content. Not all content that has a structure is structured content.
Structured vs unstructured content
Structured content – also known as component content – is content that is organized in a predictable way. This means that there is a set of rules – called a schema, information architecture or content model (more on this below) – that the content must adhere to.
In unstructured content, such as a document written using standard word-processing software, no content model applies. Structured content, by contrast, is created with building blocks (or ‘components’) defined by the content model. This makes it unambiguously machine-readable: documents become data that automated processes can query, validate and transform accurately.
The first professionals to work with structured content were technical writers. They created it using a markup language: an annotation system for defining the structure of content components and the relationships between them. XML, which stands for eXtensible Markup Language, is a markup language originally designed for large-scale electronic publishing that later became widely used for exchanging data on the internet. It looks similar to the better-known HTML, the difference being that HTML has a fixed set of tags while XML allows writers to define their own. Hence the name: extensible.
XML is software- and hardware-independent: because it is stored as plain text, any system with an XML parser can read it. Designed for storing and exchanging data, it has become one of the formats most widely chosen by organizations worldwide to create digital content that is both human- and machine-readable.
What is a content model?
Before you can write structured content using XML, you need to define the content model (also referred to as information architecture or a schema) that specifies what types of content components are allowed, what kind of content each can store, and how they will relate to one another within a document constructed of components. The content model sets the rules for what counts as a valid structure, which means that during the writing process, consistency and completeness can be ensured by validating the content against the content model. If a writer tries to use a component in a way that the model doesn’t allow, or forgets to write content for a component that is required, the authoring system that they’re using will alert them.
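To make validation against a content model concrete, here is a minimal Python sketch. It assumes a hypothetical, hand-written model held in a dictionary; real authoring systems express the model in a schema language such as DTD, XML Schema (XSD) or RELAX NG and check documents with a validating parser. The tag names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model (an illustration, not a real schema language):
# for each component, the child components it may contain and those it must contain.
CONTENT_MODEL = {
    "person": {"allowed": {"name", "address"}, "required": {"name", "address"}},
    "name": {"allowed": set(), "required": set()},
    "address": {"allowed": set(), "required": set()},
}


def validate(element: ET.Element) -> list[str]:
    """Return every rule violation found in element and its descendants."""
    rules = CONTENT_MODEL.get(element.tag)
    if rules is None:
        return [f"<{element.tag}> is not defined in the content model"]
    errors = []
    child_tags = [child.tag for child in element]
    # A component may only contain the components the model allows.
    for tag in child_tags:
        if tag not in rules["allowed"]:
            errors.append(f"<{tag}> is not allowed inside <{element.tag}>")
    # Required components must be present.
    for tag in sorted(rules["required"] - set(child_tags)):
        errors.append(f"<{element.tag}> is missing required <{tag}>")
    # Leaf components (no children allowed) must contain some text.
    if not rules["allowed"] and not (element.text or "").strip():
        errors.append(f"<{element.tag}> is empty")
    # Recurse into nested components that the model knows about.
    for child in element:
        if child.tag in CONTENT_MODEL:
            errors.extend(validate(child))
    return errors


valid = ET.fromstring("<person><name>Ada</name><address>London</address></person>")
invalid = ET.fromstring("<person><name>Ada</name><phone>555</phone></person>")
print(validate(valid))    # []
print(validate(invalid))  # ['<phone> is not allowed inside <person>', '<person> is missing required <address>']
```

An authoring tool would run a check like this as the writer works and surface each message as an alert, rather than waiting until publication.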
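You can define your own content model or adopt an existing standard. Two commonly adopted XML standards for this purpose are DITA (Darwin Information Typing Architecture), widely used for technical documentation, and S1000D, an international specification for technical publications that originated in aerospace and defence.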
How do you create structured content?
Armed with an XML content model, how do you actually create valid structured content? The simple answer is: you use XML tags, as shown in the diagram. Tags are labels enclosed in ‘<’ and ‘>’ characters that mark the start and end of a component. The end tag repeats the name of the start tag, with a ‘/’ directly after the ‘<’: for example, <name> opens a name component and </name> closes it.
Tags are nested to reveal their relationships – this is how XML assigns unambiguous structure and meaning to content. For example, the diagram shows how a person component consists of a name component and an address component. The actual content or data sits between the relevant start and end tags (a tag pair).
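Here is a small Python sketch of the person example, using the standard library's `xml.etree.ElementTree` module. The person, name and address tags mirror the text; the data values are made up. Because the nesting is explicit, a program can ask for a component by its tag instead of guessing from layout or formatting.

```python
import xml.etree.ElementTree as ET

# A person component made of a name component and an address component.
# Each piece of data sits between a start tag and its matching end tag.
xml_source = """
<person>
  <name>Jane Doe</name>
  <address>1 Main Street, Springfield</address>
</person>
"""

person = ET.fromstring(xml_source.strip())

# Nesting tells the program which components belong to the person.
print(person.tag)                        # person
print([child.tag for child in person])   # ['name', 'address']

# Each component can be retrieved unambiguously by its tag.
name = person.find("name").text
address = person.find("address").text
print(name)                              # Jane Doe
print(address)                           # 1 Main Street, Springfield
```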
OK, but if you’re just finding out about structured content, you’re probably now wondering how authors go about tagging what they write. Do they need to learn XML and manually insert tags everywhere? Or is there a more user-friendly option?
The answer is that, while many technical writers do use editing tools that require them to know a lot about XML and work directly with tags, this is no longer the only option for writers. Today there are structured content authoring tools (such as Fonto) that offer an interface similar to that of Word or Google Docs, designed to hide the underlying complexity of XML.
Creating structured content deliverables
A key benefit of structured content is that it separates content from presentation (or formatting). Authors and reviewers can focus on the content itself, creating and validating components without worrying about how they will be formatted. The final, formatted deliverable is created through a separate publishing phase, in which components are assembled and a style sheet is automatically applied to organize and format the content into (for example) a document, book or web page.
For example, assume you have the following validated components or building blocks:
These would be assembled into the format you want:
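The publishing phase can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the three components, the assembly list and both rendering functions are hypothetical, and the functions stand in for real style sheets (in practice this step is typically handled by XSLT or a dedicated publishing engine). The point is that the same components, stored once, yield two different outputs purely by swapping the style.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical validated components, each written once as a small XML fragment.
components = {
    "title": "<title>Installing the pump</title>",
    "intro": "<para>Read all instructions before you begin.</para>",
    "warning": "<warning>Disconnect the power supply first.</warning>",
}

# Hypothetical assembly: which components make up this deliverable, in order.
assembly = ["title", "intro", "warning"]


def to_html(element: ET.Element) -> str:
    """Stand-in style sheet: format each component type as HTML."""
    text = html.escape(element.text or "")
    if element.tag == "title":
        return f"<h1>{text}</h1>"
    if element.tag == "warning":
        return f'<p class="warning">Warning: {text}</p>'
    return f"<p>{text}</p>"


def to_plain_text(element: ET.Element) -> str:
    """Second stand-in style sheet: format the same components as plain text."""
    text = element.text or ""
    if element.tag == "title":
        return text.upper()
    if element.tag == "warning":
        return f"WARNING: {text}"
    return text


def publish(component_ids: list[str], style) -> str:
    """Assemble the listed components and apply a style to each one."""
    elements = [ET.fromstring(components[cid]) for cid in component_ids]
    return "\n".join(style(element) for element in elements)


print(publish(assembly, to_html))
print(publish(assembly, to_plain_text))
```

No formatting is stored in the components themselves, so adding another output channel means writing another style function, not editing the content.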
Perhaps you’re now thinking that this seems quite a laborious method of creating a document. Why go through all this hassle just to separate formatting from authoring, if the end result is the same?
Well, to find the answer we need to zoom out and recognize that content has a life beyond any single document. Today’s businesses need content to serve multiple digital channels and formats, while also being able to publish and update paper-based documents where needed. And structured content is key to enabling content to serve these multiple requirements through efficient content reuse.
What do we mean by content reuse?
Without structured content, the only way to ‘reuse’ content is to duplicate it: to retype or copy-paste it into each different place where you want to use the content. With structured content, it’s possible to reuse any approved content component anywhere, as many times as needed, without duplication.
In the simplest example, you can show the same paragraph on multiple pages of a document without copying it onto every page. Instead, you write the paragraph just once as a content component and reuse it by referencing it in the structure of every page. When you want to change this text, you only need to edit that single component, and every page that references it is updated.
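Reuse by reference can be sketched in Python as follows. The `<include ref="disclaimer"/>` element and the component ID are hypothetical conventions for this example; standards such as DITA provide their own reuse mechanisms, for example content references (conrefs). References are resolved at publish time, so the shared text exists in exactly one place.

```python
import xml.etree.ElementTree as ET

# The shared paragraph is written once, as a component with a hypothetical ID.
components = {"disclaimer": "Prices are subject to change without notice."}

# Each page references the component instead of holding a copy of its text.
pages = [
    ET.fromstring('<page><para>Welcome to page 1.</para><include ref="disclaimer"/></page>'),
    ET.fromstring('<page><para>Welcome to page 2.</para><include ref="disclaimer"/></page>'),
]


def render(page: ET.Element) -> list[str]:
    """Resolve references and return the page's paragraphs as text."""
    paragraphs = []
    for child in page:
        if child.tag == "include":
            paragraphs.append(components[child.get("ref")])
        else:
            paragraphs.append(child.text or "")
    return paragraphs


print([render(page) for page in pages])

# Editing the single component updates every page that references it.
components["disclaimer"] = "Prices now include VAT."
print([render(page) for page in pages])
```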
This example can easily be extended. You can use the same paragraph in multiple documents, for different products, across different brands and on any channel or device. Content reuse is the answer to coping with the multichannel, multidevice ecosystem that we live in.
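One common way to serve several products from a single source is conditional processing (also called profiling): components carry attributes stating which variants they apply to, and content that doesn't apply is filtered out at publish time. The Python sketch below assumes a hypothetical `product` attribute holding a space-separated list of product names; standards such as DITA define their own attributes for this purpose.

```python
import copy
import xml.etree.ElementTree as ET

# One source topic serving two hypothetical products. Paragraphs without a
# product attribute apply to every product; the others only to those listed.
source = ET.fromstring("""<topic>
  <para>Charge the device fully before first use.</para>
  <para product="basic">Press the power button for two seconds.</para>
  <para product="pro">Say "wake up" or press the power button.</para>
  <para product="basic pro">Use only the supplied charger.</para>
</topic>""")


def filter_for(topic: ET.Element, product: str) -> ET.Element:
    """Return a copy of topic keeping only content that applies to product."""
    result = copy.deepcopy(topic)  # leave the shared source untouched
    for para in list(result):
        products = para.get("product")
        if products is not None and product not in products.split():
            result.remove(para)
    return result


for product in ("basic", "pro"):
    print(product, [para.text for para in filter_for(source, product)])
```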
Content reuse gave rise to the COPE principle: create once, publish everywhere. The concept was first presented around 2009 by US-based National Public Radio (NPR), which needed an alternative to the content management systems (CMSs) of the time, built to create websites and browser-based experiences. The world of digital content was changing fast, and COPE was NPR’s answer to the limited resources and staff of its content team. The digital content era called for a paradigm shift and for new tools that would let teams create modular, portable content separate from the presentation layer.
Soon enough, entire industries adopted the strategy for improving content operations, output and performance.
The benefits of structured content
Structured content helps organizations reduce content operation costs and the risk of publishing incorrect content. It represents a major step in the evolution of content automation, enabling content developers and information architects to:
• Reuse content, and render it into multiple output formats
• Enforce consistency through XML validation
• Easily update content in multiple places simultaneously
• Create and manage granular content variations
• Reduce translation costs by reusing components that have already been translated instead of sending them for translation again
• Considerably reduce publishing time for global multilingual omnichannel content, enabling faster go-to-market times
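The translation saving comes from recognizing components whose source text has already been translated. A minimal Python sketch, assuming a hypothetical translation memory keyed by a SHA-256 hash of each component’s source text; real translation management systems use richer matching (including fuzzy matches), and this only shows the exact-match case.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Identify a component's source text by a hash of its content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Hypothetical translation memory: source-text fingerprint -> approved translation.
translation_memory = {
    fingerprint("Disconnect the power supply first."): "Trennen Sie zuerst die Stromversorgung.",
}

# Hypothetical components in the next deliverable.
components = {
    "warning": "Disconnect the power supply first.",
    "intro": "Read all instructions before you begin.",
}


def plan_translation(components: dict[str, str], memory: dict[str, str]):
    """Split components into reused translations and IDs still to translate."""
    reused, to_translate = {}, []
    for component_id, text in components.items():
        key = fingerprint(text)
        if key in memory:
            reused[component_id] = memory[key]
        else:
            to_translate.append(component_id)
    return reused, to_translate


reused, to_translate = plan_translation(components, translation_memory)
print(reused)        # {'warning': 'Trennen Sie zuerst die Stromversorgung.'}
print(to_translate)  # ['intro']
```

Only the components in `to_translate` would be sent to translators; everything else is reused as-is.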
Who is structured content technology most useful for?
Many organizations will have a variety of use cases for structured content, but it makes the biggest difference for:
• Businesses that produce in-depth content for multiple brands, markets and product variations
• Regulation-heavy industries that need to comply with stringent laws and standards
• Organizations that are at the forefront of technology, enabling human-to-machine and machine-to-machine communication in the era of the internet of things (IoT) and artificial intelligence of things (AIoT)
From technical writing use cases – for example, producing consistent multilingual automotive user manuals or technical documentation for medical devices – to contracts, policies and procedures, structured content saves time and money for organizations that need to stay ahead of their competition.
Want to know more about how structured content can help your business move faster?
Dive deeper into the world of structured content with Tridion Docs.