What Bill Kasdorf doesn't know about technology and digital publishing could fit on the back of a postage stamp. As an expert in accessibility, XML/HTML/EPUB modelling, information infrastructure, editorial and production workflows, and standards alignment to future-proof content and systems, Bill has written and spoken widely on publishing technology and workflows, with The Columbia Guide to Digital Publishing and the BISG Guide to Accessible Publishing amongst his editorial credits. He holds the wonderful title of Publishing Evangelist for the World Wide Web Consortium (W3C), where he serves on the Steering Committee and Publishing Working Group, developing the next generation of Web Publications and EPUB 3 standards.
Bill and I met in December 2021 and our conversation covered many of the areas which are close to his heart.
Why is it, Bill, that some sectors of the publishing world are more open than others to XML and digital transformation within their workflows?
Some elements of the market are stubbornly print-centric, focused on physical things with pages and chapters, and often ask "Do we really need XML to do that?" A common mistake is that these publishers see XML as an end product of their workflow. They ask a pre-press firm to typeset their book, then at the end of the process ask for the XML. I know presses that have been accumulating XML for years, and when I ask them what they do with it, the answer is nothing. And that's why they stop seeing a need for it.
The answer is that you should be starting with XML – then you have structured content way upstream, which can do huge things for you. It enables you to automate processes, it makes your content more consistent, and, if you have a workflow engine, that engine relies on dependable, parsable structure in order to work.
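To make that concrete, here is a minimal sketch of what structured content upstream might look like – the element names are purely illustrative, not any particular publisher's schema:

    <chapter id="ch03">
      <title>The Supply Chain</title>
      <section id="ch03-s1">
        <title>Metadata Standards</title>
        <para>Publishers exchange metadata with retailers using shared standards.</para>
        <figure id="fig03-1">
          <image href="images/supply-chain.png"
                 alt="Diagram of content flowing from publisher to retailer"/>
          <caption>Content flow through the supply chain.</caption>
        </figure>
      </section>
    </chapter>

Because every title, section, and figure is explicitly tagged, a workflow engine can locate and transform each of them programmatically – generating a table of contents, an EPUB, or a print layout from the same source.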
But what's turning the tide, and a key reason people come to me, is that they recognise they have a print-centric mentality when they need a digital-centric mentality. A digital mentality is not about the tools they use, their processes or systems, or the files they produce – the biggest barrier is the people working in the organisation who can't see past the page. When they picture content, they picture it in the form of laid-out printed pages.
Where are you seeing the greatest innovation, Bill?
The biggest change in this realm is happening in higher education, firstly because many of their books are giants: very complex to produce, expensive to make, and expensive to sell (and therefore selling in ever-smaller numbers). Consequently, these publishers are all moving towards platforms. Access to the content is not in the form of chapters; it's much more about delivery of granular components. That enables the institution to monitor how and what the students are consuming or skipping, and to gauge how well a student understands the content by providing assessments. All of which provides data behind the consumption of the content – which you can't get from print.
And data is the key, from knowing your customers to knowing how they consume your content.
Absolutely – all you can learn from print is whether they bought the book, whereas online you can now have real insight. So, in the education space, they have the ability to say: this student has mastered this concept, so they don't need more content like this, let's move them on; whereas this student doesn't get it, let's feed them more of it.
What led XML to take such a strong hold in journals publishing in contrast to books?
Journals were way ahead of the game on XML, in part because of the National Library of Medicine (NLM). The NLM wanted to aggregate medical literature from scholarly publishers, but every press had a different model, so the NLM hired a team to create a DTD that could be used to aggregate all of this content. That created pressure for publishers to gravitate towards this standardised XML model (which has since evolved into JATS XML and BITS XML). The model became so pervasive – and the delivery platforms were based on it too – that it became the lingua franca of scholarly publishing. Another factor is that journal content is so consistently structured.
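For readers unfamiliar with JATS, here is a heavily simplified sketch of a JATS-tagged article – real JATS documents carry far richer metadata, and the sample content is invented:

    <article article-type="research-article">
      <front>
        <journal-meta>
          <journal-title-group>
            <journal-title>Journal of Example Studies</journal-title>
          </journal-title-group>
        </journal-meta>
        <article-meta>
          <title-group>
            <article-title>A Sample Article</article-title>
          </title-group>
        </article-meta>
      </front>
      <body>
        <sec id="s1">
          <title>Introduction</title>
          <p>Opening paragraph of the article.</p>
        </sec>
      </body>
    </article>

It is exactly this consistency – the same elements in the same places, whichever publisher the content comes from – that made aggregation at scale possible.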
What do you think might trigger more widespread adoption of XML workflow by book publishers?
What's also helping to make XML much more mandatory is accessibility. It's been a long time coming, but people are finally thinking about it. The eBook revolution was pretty slow to take off: around 15 years ago, people thought print would die out and everyone would be reading books on devices. That hasn't happened to this day, because the printed book is a really good way to consume content – but it's not accessible. So even though eBooks are only around 20% of the market, virtually all publishers produce them in the EPUB 3 format, and EPUB 3 is designed to be accessible.
And you’ve been directly involved in establishing a new standard for accessibility?
I haven’t been involved in creating accessibility standards – that’s primarily done in the W3C. But I’ve been part of the group that has been developing and advancing EPUB 3 and how to make it more accessible.
When I first started working on EPUB, many years ago, the DAISY Consortium (an international accessibility advocacy organisation, https://daisy.org/) had developed a suite of standards (referred to as the DAISY format), one of which was an XML model called DTBook, which assistive technology was programmed to understand. At that time few publishers used DTBook except for accessibility purposes, so it always had to be produced as an offshoot of what the publisher was doing. Frankly, hardly any publishers did this; their books had to be remediated, usually by specialised service businesses, not by the publishers.
When we developed EPUB 3, DAISY participated in that development, helping to ensure it could be accessible. So now DTBook is mostly history and EPUB 3 is the format for interchanging accessible publications. Instead of needing a specialised format, just do the EPUB 3 right and assistive technology will be able to use it – and that's where we are today. So, if it's properly tagged and structured and has the proper features, like alt text for images, EPUB 3 is the DAISY-recommended format for the interchange of publications.
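As an illustration (the file names and text here are invented), the kind of properly tagged EPUB 3 content document he describes might look like this:

    <?xml version="1.0" encoding="UTF-8"?>
    <html xmlns="http://www.w3.org/1999/xhtml"
          xmlns:epub="http://www.idpf.org/2007/ops"
          lang="en" xml:lang="en">
      <head>
        <title>Chapter 3</title>
      </head>
      <body>
        <section epub:type="chapter" role="doc-chapter" aria-labelledby="c3h">
          <h1 id="c3h">Chapter 3: Accessible Structure</h1>
          <p>Headings, sections, and landmarks let assistive technology
             navigate the book.</p>
          <img src="images/workflow.png"
               alt="Flow diagram: manuscript to XML, then to EPUB 3 and print"/>
        </section>
      </body>
    </html>

The epub:type and ARIA role attributes, the heading hierarchy, and the alt text are precisely the kind of semantics a screen reader relies on.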
And that’s become a global standard?
Well, the DAISY Consortium advocates its use, but the EPUB 3 standard was created at the IDPF (the International Digital Publishing Forum, which became part of the W3C in 2017), so EPUB 3 is now a W3C standard. It's the right home for EPUB, because the W3C is responsible for XML, HTML, CSS – most of the major web standards – and EPUB is based on those W3C standards.
And who manages that standard, Bill?
Currently, the EPUB 3 Working Group in the W3C. Importantly, the W3C has a very formal process by which its standards (called Recommendations) are developed.
When a new specification is written or updated, every feature in it has to have two independent implementations, or else that feature is taken out. For a specification to become a formal Recommendation, it also has to go through horizontal review, which covers accessibility, internationalisation, privacy, security, and alignment with web architecture. Together these checks demonstrate that each feature works and is useful. So when a specification reaches formal Recommendation status (and EPUB 3.3 will reach that status by mid-to-late 2022), we know it works. We have also specified that 3.3 has to be backwards compatible, which we know from past experience will help adoption.
Where will publishers first see the fruits of this work?
This is absolutely golden for education. If there's any part of this industry that has to be accessible, it's education. The law doesn't require publishers to make their books accessible, but any public university is required to provide an accessible version to a student who cannot consume the print version. There are a few examples of how we are driving this process.
I'm involved in a Mellon-funded project called FRAME (Federating Repositories of Accessible Materials for Higher Education), a group made up of Disability Service Offices (DSOs), which virtually all universities and colleges now have. These offices are charged with getting hold of some kind of file for a book and making it accessible for any student who requires it. That is currently a very cumbersome and wasteful process, so within the FRAME project we are trying to create a common infrastructure so that these remediated files can be saved and shared.
Upstream, Benetech (https://benetech.org/) is an organisation devoted to accessibility (among other things). Benetech's Bookshare is a huge repository of books provided in five accessible formats for print-disabled people – there are over a million books in it. Bookshare also has a programme called Global Certified Accessible, which examines a given publisher's books and workflow. If that publisher can demonstrate that they can consistently produce accessible books, then they get the certification. This has prompted pre-press service providers to follow the standard too and achieve a 'good practitioner' accolade.
These standards are a great example of where and how XML can have an impact, linking publishers more closely to their customers and supplying data back. I'm reminded of the ONIX standards, which allow metadata to be input at one end of a technological pipeline (by the publishers) and consumed at the other (in this case by retailers – in multiple languages), as the fragment below illustrates.
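Here is a simplified, hypothetical ONIX 3.0 fragment – real messages carry far more detail, but every receiving system reads the same element names and code lists:

    <Product>
      <RecordReference>com.example.9780000000001</RecordReference>
      <NotificationType>03</NotificationType>
      <ProductIdentifier>
        <ProductIDType>15</ProductIDType>  <!-- ISBN-13 -->
        <IDValue>9780000000001</IDValue>
      </ProductIdentifier>
      <DescriptiveDetail>
        <ProductComposition>00</ProductComposition>
        <ProductForm>ED</ProductForm>  <!-- digital download -->
        <TitleDetail>
          <TitleType>01</TitleType>
          <TitleElement>
            <TitleElementLevel>01</TitleElementLevel>
            <TitleText>An Example Monograph</TitleText>
          </TitleElement>
        </TitleDetail>
      </DescriptiveDetail>
    </Product>

That shared, coded vocabulary is what lets one record drive listings at any retailer, in any market.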
By contrast, to produce academic books, publishers have to navigate a series of stepping stones, each of which might pass through a different vendor, attract a different cost, use different software with different standards, and require a different skill set. And each of these steps is effectively a cul-de-sac – if you want to make an edit, you need to go back to the start again.
I keep imagining a future for publishing that is a little like the ONIX system, whereby a publisher manages a conduit that allows content to flow from author to consumer and back again. Much like the HE sector, this technology would allow publishers to interact with their customers and see what content has been read by whom, what's most popular, and what has the most citations.
What do you think it will take to break this current cycle and trigger a new age in digital publishing, Bill?
There are two modern requirements that might help break that format monopoly:
1) Accessibility – PDFs are not very accessible, sometimes not at all.
2) Mobile – try reading a PDF on a mobile! The generation that is coming of age now and joining the publishing profession has grown up reading everything on their phone, so that will create real pressure for this content to become truly digital.
What’s driving change from within the industry?
Increasingly, these standards organisations are collaborating with each other. For instance, the content documents in an EPUB are XML, but they are XHTML – not the age-old HTML4. Our standards specify that the markup must be in whatever the latest version of HTML is, but expressed as XHTML, as that gives it a degree of rigour that plain HTML doesn't have. Many publishers are moving in this direction: even though they work in JATS and BITS upstream, when they make their EPUBs it's not BITS or JATS inside the EPUB, it's XHTML. So there is a tendency to move in that direction; why not start with XML in the first place?
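The rigour in question is XML well-formedness: every element closed, attributes quoted, nesting unambiguous. A fragment of tag soup that an HTML browser would happily forgive, such as:

    <p>An image<br><img src=pic.png>

must be written in XHTML as:

    <p>An image<br/><img src="pic.png" alt="Description of the picture"/></p>

which means the same XML parsers, validators, and transformation tools used upstream can process it reliably.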
Metadata is also an interesting example. When we were developing the accessibility metadata that goes in an EPUB, we collaborated with schema.org (https://schema.org/), so the accessibility metadata in an EPUB is now identical to schema.org metadata. We've also worked with Graham at EDItEUR to create a crosswalk between EPUB accessibility metadata and ONIX accessibility metadata.
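In practice this metadata lives in the EPUB's package document. A minimal illustrative fragment (the values are examples only, not a complete package):

    <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>An Example Monograph</dc:title>
      <meta property="schema:accessMode">textual</meta>
      <meta property="schema:accessMode">visual</meta>
      <meta property="schema:accessModeSufficient">textual</meta>
      <meta property="schema:accessibilityFeature">alternativeText</meta>
      <meta property="schema:accessibilityHazard">none</meta>
      <meta property="schema:accessibilitySummary">All images have
        alternative text and the content is navigable by headings.</meta>
    </metadata>

The property names – accessMode, accessibilityFeature, and so on – are schema.org's own vocabulary, which is what makes a crosswalk to ONIX accessibility metadata feasible.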
We are also about to start work on an even more challenging crosswalk to MARC 21 (https://www.loc.gov/marc/), a cataloguing format for libraries. We're not going to be able to get libraries to stop using MARC 21 (it's been tried).
What do you believe will be the burning platform for organisations that have yet to make the leap to digital transformation? Many organisations believe they have made great strides simply by digitising their product, which of course does nothing towards digitising their organisation – the greater goal.
Yes, and there is cost and delay associated with doing it that way. If you build XML up front and work to the right standards in your workflow, the accessible formats come for free and the digitally structured content comes for free. The long-term benefits cannot be overestimated.
Publishers are motivated by three fundamental things: cost, speed, and legislation. So workflows that allow for the swift production of EPUBs as part of the standard process are very welcome. This also allows them to publish content ahead of print.
When it comes to accessibility, universities have the legal liability right now, but that puts pressure on publishers to produce accessible content in the first place because that gives them an advantage from an adoption point of view. Increasingly, there are universities whose procurement processes won’t consider a supplier who can’t provide accessible content. This will also put pressure on the supply chain side, even though there is no legal exposure there.
That’s US law you’re describing. Are there similar initiatives in other global regions?
Well, a huge step forward recently was the EU Accessibility Act. Not too far into the future – around two years from now, I think – you will not be able to sell an eBook within Europe that is not accessible.
While developing EPUB 3.3 and the accessibility standard that sits alongside it, members of our working group actually worked with the EU, so that EPUB 3.3 and its companion standard, EPUB Accessibility 1.1, have become the specifications that enable publishers to meet the new EU requirements.