Ever wonder if humans are entering the age of AI as seen in Stanley Kubrick's 2001: A Space Odyssey? Get to know Sean Martin's pioneering work in the semantic web and the exciting developments in the field.
Hometown: Port Elizabeth, South Africa
Current Residence: Boston, MA
Occupation: Founder & CTO, Cambridge Semantics Inc.
Areas of Focus: The Founding & CTO'ing of a company dedicated to next-generation enterprise software based on the W3C Semantic Open Data Standards.
For those who are not familiar with the semantic web, how would you explain it?
The Semantic Web is the second brilliant idea from the creator of the World Wide Web, Tim Berners-Lee. The original web, the one we use daily, is human-readable. It requires a human reading a screen to make sense of the information being presented. That human then has to make decisions on what they see and understand according to the context of that knowledge. The contribution of the machine presenting the information has thus far been limited to interpreting instructions in the retrieved data stream (the web page!) and rendering that information on the screen so as to make it readable.
The emerging Semantic Web is a parallel and sometimes overlapping treatment of that same information in machine-readable format. This form rests on a collection of protocols and data-representation standards agreed on by members of the World Wide Web Consortium (W3C). It enables software programs to make sense of the information and to automatically take actions based on their understanding of what it all means. We are thereby spared the effort of having to do something tedious through largely manual means. The computer intelligence at work here even assists us in making complex decisions, and perhaps in carrying out tasks that might lie beyond the reach of humans but are taken care of in short order by a computer.
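For readers who have never seen what "machine-readable" means in practice, here is a minimal sketch, using the Python rdflib library, of a single fact expressed as triples that a program can query directly; the namespace, names, and query are invented purely for illustration.

```python
from rdflib import Graph, Literal, Namespace, RDF

# Hypothetical namespace for the example; any stable URI would do.
EX = Namespace("http://example.org/company/")

g = Graph()

# The same information a human would read off a web page, expressed as
# subject-predicate-object triples that a program can act on.
g.add((EX.sean_martin, RDF.type, EX.Person))
g.add((EX.sean_martin, EX.worksFor, EX.CambridgeSemantics))
g.add((EX.sean_martin, EX.jobTitle, Literal("Founder & CTO")))

# A program can now ask a structured question instead of scraping HTML.
results = g.query(
    """
    SELECT ?who WHERE {
        ?who <http://example.org/company/worksFor>
             <http://example.org/company/CambridgeSemantics> .
    }
    """
)
for row in results:
    print(row.who)  # -> http://example.org/company/sean_martin
```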
This software system forms the bedrock of a highly disruptive family of technologies that can give innovative businesses a considerable competitive advantage. These technologies significantly alter two important variables relative to where businesses are today with their systems and data. First, they lower the skill level needed to do more with an organization's data, pushing the bar far closer to the point where end users can help themselves to whatever it is they need. At the same time, they enable those same users to put together data-driven systems at considerably greater speed.
How would you define a super computer?
Extraordinary amounts of raw computing power, and sometimes storage, assembled in such a way that it becomes possible to automate the solving of enormously complex problems. While very expensive in dollars, it may, nonetheless, be had very cheaply, depending on the value of the problem being solved. Generally you need to be the government, IBM, Microsoft or Google to buy or assemble one, or at least be on very friendly terms with a government! But then again my new Android-based Galaxy S is an exceptionally pretty super computer too, and one that solves all kinds of personally complex problems for me all the time. When compared to the multi-room 8K mainframes my dad was programming in the '70s, it is way beyond their imagining… a super computer in my pocket.
How would you define an ontology and its relationship with the formation of meaning?
I am told that the original ancient Greek word "ontology" directly translates to "of being" or "to be," and that it is related to the branch of philosophy that covers the study of reality and the meaning of things. That's all fine if you want to plumb the depths of Plato and Aristotle's thinking, but at Cambridge Semantics, ontologies mean OWL, the counterintuitively ordered TLA for the "Web Ontology Language" from the W3C. OWL is the open data standard we use to represent collections of concepts, their interrelationships, and the contributing properties that flesh them out. For us, they are the way that an expert in the domain describes the information they deal with all the time. This is not complicated; it's just Joe User formalizing the meaning of the data in their spreadsheets or databases in their own terms.
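To make that concrete, here is a minimal sketch of what such a formalization might look like, using the Python rdflib library to build a tiny OWL ontology; the namespace, class, and property names are invented for the example.

```python
from rdflib import Graph, Namespace, RDF, RDFS, OWL

# Invented namespace and terms, purely illustrative.
EX = Namespace("http://example.org/sales/")

onto = Graph()
onto.bind("ex", EX)

# "Customer" and "Order" are the concepts the domain expert cares about...
onto.add((EX.Customer, RDF.type, OWL.Class))
onto.add((EX.Order, RDF.type, OWL.Class))

# ...and "placedBy" is the relationship between them, stated in the user's own terms.
onto.add((EX.placedBy, RDF.type, OWL.ObjectProperty))
onto.add((EX.placedBy, RDFS.domain, EX.Order))
onto.add((EX.placedBy, RDFS.range, EX.Customer))

# Print the ontology in Turtle, a compact text serialization of RDF/OWL.
print(onto.serialize(format="turtle"))
```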
What is new in the field of semantics-based application systems and middleware (program libraries that make it easier to build complex software systems) is that we are now able to build software that preserves all the data as described by OWL ontologies. We no longer have to build multiple layers in our software systems that translate between Joe User's view of the world (and his data) and a model that the underlying computing machine requires in order to store and manipulate that data. This is quite an amazing difference. Traditional systems are extremely brittle because translating between how a user understands information and how a computing machine can most efficiently store and retrieve it requires teams of programmers, QA and probably a project manager or two. For example, a fully "de-normalized" relational database model usually bears very little resemblance to how an end user thinks about his or her information.
Every little change in the business users' model of what they want to see in their data requires revisions of the underlying "black box" that actually implements the system, not to mention the requisite buckets of cash and the many months needed to deploy it. Believe it or not, the largest portion, by far, of the billions that businesses dedicate to IT budgets is spent on application maintenance.
It goes, in other words, towards updating the "black box" in order to meet their business customers' newest requirements. Semantics-based systems don't have this barrier and often allow end users to "evolve" the software that serves as a home to their ever-changing data.
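To make the contrast concrete, here is a small sketch (again in Python with rdflib, and emphatically not Cambridge Semantics' actual stack) of how instance data can live directly in the user's own vocabulary and be queried there, so that a new business requirement is just another triple rather than a schema migration; the vocabulary and values are invented.

```python
from rdflib import Graph, Literal, Namespace

# The same invented vocabulary as in the ontology sketch above.
EX = Namespace("http://example.org/sales/")

data = Graph()
data.bind("ex", EX)

# Instance data expressed directly in the user's own terms.
data.add((EX.order42, EX.placedBy, EX.acme_corp))
data.add((EX.order42, EX.total, Literal(199.00)))

# A new requirement ("track the sales region") is just one more triple;
# no table alteration, middleware change, or redeployment is needed.
data.add((EX.order42, EX.region, Literal("EMEA")))

# Query the data in the same vocabulary the user described it in.
query = """
    PREFIX ex: <http://example.org/sales/>
    SELECT ?order ?region
    WHERE { ?order ex:placedBy ex:acme_corp ; ex:region ?region . }
"""
for order, region in data.query(query):
    print(order, region)
```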
What exciting developments have you seen take place in the field?
What gets me the most excited are the converging elements in a whole slew of technical fields. So many things are coming together very suddenly, in hardware, software and in combinations that are changing the game utterly.
On the hardware side, you have multi-core CPUs (we could see hundreds of cores in a single CPU chip package within just a couple of years), commodity hardware clustering on a massive scale, ever-cheaper RAM (both as traditional RAM memory and particularly in solid-state drives/SSDs), extraordinary improvements in well-connected mobile computing, and fast general networking.
On the software side, we have the evolution of cloud computing architectures, both at the operating system and application layers, e.g. MapReduce and virtualization. Then there's the parallel data access and manipulation technologies pioneered by companies like AT&T and Google, as well as the current NoSQL movement. Finally, there's the work on semantics that teams like ours have spent nearly a decade on. It is all finally coming to fruition and it all adds up to a shockingly explosive blend.
Things that were unimaginably impractical on the semantic software side just two or three years ago suddenly got real thanks to the hardware guys. A whole new application model is coming thanks to cheap RAM: the current standard application server architectures from companies like Oracle (BEA) and IBM (WebSphere) are a total anachronism now, even if most of their customers (or developers?) don't know that yet. The fact is that RAM is the new disk and disk is the new backup tape. The extraordinarily rapid deployment of these hardware improvements is what makes the semantic software we are building viable.
Can you provide case studies in the semantic web taking place today?
Yes, but far too many to detail here. Look at our company website for a few examples of what we are doing for our customers. The W3C also has a large number of use cases in all sorts of industries. The last year or so has seen huge interest in the semantic web, resulting in its basic concepts being proved and embraced in a host of industries. Although Life Sciences still leads the way for us, at least as far as adoption and the achievement of early value are concerned, it is great to see this happen, given how much that industry has invested in and supported semantic technologies from the cradle.
To highlight a case study, paper.li (www.paper.li) is a convergence of the semantic and social web. What are your thoughts on this?
My own interest is somewhat limited. Cambridge Semantics is of course fully compliant with the W3C's Semantic Web standards, and naturally our customers will often (coincidentally or not) benefit from the generally increasing adoption of the same standards and the growth of the Linked Open Data movement, that is, organizations that make their data public for general consumption using these standards. There are many examples of the Semantic Web increasingly catching on across the open internet. 2010 was better than ever when it comes to adoption: both the UK and US governments started publishing masses of data, as did the BBC, Best Buy, Yahoo, Google and others. It is all good that these players are starting to get it and adopt a shared means of exchanging data with all and sundry. The fact that the new Facebook Open Graph API is based on the Semantic Web standards is wonderful too. Clearly everyone gets the strategic value.
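As a rough sketch of what consuming such Linked Open Data can look like in practice, rdflib can dereference a public resource and treat the result as just another graph; the DBpedia URI below is only an example, and the sketch assumes the resource is reachable and still served in an RDF format.

```python
from rdflib import Graph
from rdflib.namespace import RDFS

g = Graph()

# DBpedia publishes machine-readable descriptions alongside its human-readable pages.
g.parse("http://dbpedia.org/resource/Semantic_Web")

# Print every label attached to the resource, one per language.
for _, _, label in g.triples((None, RDFS.label, None)):
    print(label)
```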
That said, our customers are very large enterprises and their interests do not necessarily coincide with those of the consumer web. While enterprises may use the open data standards to more easily interchange data with their business partners or sometimes interact with their customers, right now the early enterprise adopters of Semantic Technology are starting to see extremely significant benefits in using this technology to address many of their internal IT challenges, often behind their firewalls on their own intranets or even within specific applications and on factory floors.
What is the importance of real time content and its challenges regarding information overload?
Ah, a tough one! Real-time is clearly very important to us. We have invested an extraordinary amount of effort enabling our software to distribute semantically described data in real-time. We are implementing solutions for manufacturing customers where immediate data transfer and complex analysis are essential, for example in production quality control, where we try to detect drift in highly dependent operating parameters on a high-tech manufacturing line to help improve product yield by reducing defects. We also have solutions where we enable users in different parts of the world to collaborate immediately on information that they share.
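As a generic illustration of the kind of drift check described above, here is a simple rolling-mean comparison in Python; it is a stand-in sketch, not Cambridge Semantics' actual algorithm, and the baseline, window and tolerance values are invented.

```python
from collections import deque

def detect_drift(readings, window=50, baseline_mean=1.00, tolerance=0.05):
    """Flag the point at which a streaming parameter drifts from its baseline.

    readings      -- iterable of real-time sensor values
    window        -- number of recent samples in the rolling average
    baseline_mean -- expected value under normal operation (assumed)
    tolerance     -- allowed absolute deviation before raising an alert
    """
    recent = deque(maxlen=window)
    for i, value in enumerate(readings):
        recent.append(value)
        if len(recent) == window:
            rolling = sum(recent) / window
            if abs(rolling - baseline_mean) > tolerance:
                return i, rolling  # sample index and mean at the moment of drift
    return None

# Example: a parameter that slowly creeps upward after sample 200.
stream = [1.0 + 0.0005 * max(0, i - 200) for i in range(600)]
print(detect_drift(stream))
```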
But I don't think either of these examples of real-world solutions addresses the intent of your question, which is probably more about searching Twitter and Facebook for information being created in real-time en masse, and perhaps the explosion of information that is immediately available on any topic through services like Bing and Google. As I explained in my previous answer, consumer web problems are not our current focus, but I think there is one take-away I can provide. If real-time data ends up in a machine-readable format, i.e. using the Semantic Web standards, then we will most likely be able to create more sophisticated software programs and fast computing machines to take care of any overload issues that arise. The short answer: as a human, I am not at all worried about it.