Greg Boutin has posted an excellent response to Richard MacManus's post "Where Are All The RDF-Based Semantic Web Apps?", which you can find at "RDF/Linked Data Standards Not Good Enough For Intelligent Agents? Or Is It The Opposite?".
In it, Greg asks the question:
"does RDF serve intelligent agents properly? Because the point here is that, if it doesn't, then we need an alternative method to get the job of representing information done. Intelligent Agents alias Top-down technologies simply can't do without it. HTML, XML, Relational Databases don't seem to cut it. So what does? Do we need a completely fresh approach, or changes to the existing RDF stack / Linked Data model? Those are questions I ask genuinely. I don't know the answer."
Let me try to answer based on my personal (and still ongoing) experience of thinking about these issues from the practical perspective of working on an intelligent application that uses semantic data (notice the lower case). By the way, let me explain where I'm coming from, so that you understand the context of this post and the fact that I am not in the business of technology politics (upper- versus lower-case semantic, and/or top-down versus bottom-up). Rather, I've studied the area extensively from the perspective of someone who has problems that need to be solved without waiting for an entire technology to fully reach the mainstream:
1. I am a "science" guy, that is: my background is in Data Mining/Machine Learning, not RDBMS, etc. (although I do get the depth of metadata, having had to work with different kinds of it over time in different situations), and to me, "intelligence" = algorithmic; I need my "intelligence" to bring out of the data something that was not obvious before (read: Information Extraction, Context Extraction, recommendations, personalization, targeting, etc.);
2. I do like the idea of the semantic web and have been fascinated by it since I discovered it: I think it's a way of structuring data that is much closer to the way we humans think, it's more flexible than flat or relational ways (albeit not scalable enough with current technologies), and I do believe the marriage of ML and semantic structures is made in heaven... theoretically at least, until someone comes up with a good app that fulfills the following conditions: it is consumer-facing, it solves a real problem and is not built as a showcase for a specific technology, it has business potential (read: a business model), and it is evangelized in a consumer-friendly way (not by a bunch of technical blurbs that I may understand but my wife certainly wouldn't).
Given this background, I wanted to say in response to Greg's post that:
1. The key to your question, Greg, is having a shared and clear definition of "intelligent". I see "intelligence" as described above, while most apps I see out there (RDF-based or not) use "intelligence" to showcase a slightly better version (usually in SPARQL) of the typical "SELECT .... WHERE" from SQL. To me, that's not even intelligence. It's certainly useful, but it speaks more to data interoperability than to intelligence. I need more than that. By the way, the distinction I have made is no different from the one encountered when people use the words "analytics" or "Business Intelligence": most refer to "reporting", but very few mean the concept denoted by activities such as "predictive modeling, (un)supervised clustering, Bayesian nets", etc. So let's just say I am using a stricter definition of the term "intelligence".
2. From this perspective, I think that the type of "intelligence" I mentioned before has long been used in Machine Learning (see Bayesian Networks, and others) in a very graph-based way (albeit not with RDF), that is: processing and generating intelligence by leveraging relationships/links between terms/concepts/attributes, etc. This is really nothing new to me. The only difference is that, until recently, the aforementioned methodologies have not used data physically structured in a graph-based form, but have relied instead on XML-based "rules" (or external application logic) on top of relational or flat structures. So to answer your point, this stricter definition of "intelligence" was quite ok without RDF.
3. More recently (and this is an area I've been working in and researching), there have been very, VERY few attempts to apply (my stricter definition of) intelligence directly to physically graph-structured data. I could name some: SPARQL-ML, Proximity, the work on CRFs, specifically by Andrew McCallum (Mallet), and other open source and academic projects. I know of some efforts by various startups to do similar things in a very proprietary, but practical, way too. The technologies cited above are all part of a larger area, Statistical Relational Learning, which mixes modern Machine Learning technologies with good old Prolog-style Logic Programming ("relational" here does not refer to RDBMS, but stands for "relationships" between concepts/attributes, etc.).
To your question, I would say that "intelligent" applications are ok with whatever way the data is structured, but that if they use graph-based data/metadata (even with OODBs using Hibernate on top of MySQL), they gain particular advantages above and beyond what they get on top of flat or relational data. They are quite intelligent enough for semantic data, though: I don't see the problem as one of technological misalignment, but rather as both practical and political. Let me explain (using my experience, as I promised before, and without necessarily giving concrete details quite yet :-):
A. I have tried to use RDF for my "intelligent" applications, but it would have taken way too long and would have had major scalability complexities, and I approached my problem (trying to build an intelligent app on top of graph-based data) looking for a solution rather than vice versa. In the end, and at this time, I decided to go a specific custom route that does not take advantage of the full RDF (I guess that would make me "top-down", although I agree that Alex Iskold's distinction seems the opposite of what I would think of as being rather "bottom-up"). Every time I tried to implement RDF at the bottom of my "intelligence", I would get either "you need to change your intelligence" (which would have substantially reduced my intelligence to "SELECT .... WHERE") or "it's not quite scalable yet" (given that I need to process floating point operations at run-time on millions of rows while traversing the graph in a very depth-first way, which I didn't think was possible quite yet). I would get these statements from avid proponents of RDF who did not quite understand my ML meaning of "intelligence" and would try to reduce it to "but do you really need that? Why can't you take Dan G. is_friends_of from one source and link it to Dan G. likes_beer from another?". To me, that example is what all RDF-based apps are able to do today (and have been for some time, I guess), but it is not my specific application of "intelligence".
B. Political, because every time I would genuinely try to solve my problem and really do due diligence on the most suitable technologies that would allow me to do that, I would get "top-down" suggestions that I really should use RDF, etc. To me, that is technological totalitarianism. It looks at things from the standpoint of a solution in search of a problem, not the other way around. I have mentioned in earlier blog posts and articles that this is the major problem the Semantic Web faces today.
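To make the contrast in point A concrete, here is a minimal Python sketch. It is not my actual application: the triples, the graph, the weights and the function names are all made up for illustration. The first function does the "take a fact from one source and link it to a fact from another" kind of lookup; the second does a depth-first traversal that multiplies floating point weights along each path, which is closer in spirit to the run-time computation I was describing.

```python
# Hypothetical triples from two sources (all names and values are assumptions).
source_a = [("Dan G.", "is_friends_of", "Alice")]
source_b = [("Dan G.", "likes", "beer")]


def facts_about(subject, *sources):
    """Lookup-style linking: collect every (predicate, object) for a subject."""
    return [(p, o) for triples in sources for (s, p, o) in triples if s == subject]


# Hypothetical weighted graph: node -> list of (neighbor, edge weight).
graph = {
    "Dan G.": [("Alice", 0.5), ("beer", 0.5)],
    "Alice": [("wine", 0.5)],
    "beer": [("pub", 0.25)],
    "wine": [],
    "pub": [],
}


def path_scores(graph, start, max_depth=3):
    """Depth-first traversal that multiplies edge weights along each path
    and sums the products for every node reached."""
    scores = {}

    def visit(node, weight, depth, seen):
        if depth == max_depth:
            return
        for neighbor, w in graph.get(node, []):
            if neighbor in seen:  # do not walk around cycles
                continue
            product = weight * w
            scores[neighbor] = scores.get(neighbor, 0.0) + product
            visit(neighbor, product, depth + 1, seen | {neighbor})

    visit(start, 1.0, 0, {start})
    return scores


linked = facts_about("Dan G.", source_a, source_b)
scores = path_scores(graph, "Dan G.")
```

The lookup only returns what was already stated in the sources; the traversal produces numbers (here, a score per reachable node) that were stored nowhere. On a toy graph both are trivial, and the cost of the second only shows up at millions of rows.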
In the end, I have a problem that needs to be solved (I have solved it, and think I know how), and the key answer to your question (and another source of the problem) is that there are two "schools of thought" and behavior when people talk about "intelligence", etc.: one that is heavily warehousing-based (whether the warehouse is relational or of any other flavor), which I call the logic-based guys, and the other is the algorithmic guys (I call these the "science" guys). Sadly, there is very little common understanding between them (because the two are really interdependent, as you mention), and there is even less shared "hanging out" in joint projects that leverage each other. Let me give you an example of what I mean: one time I asked the "logic guys" (these are the typical RDF, but ex-RDBMS, guys) to give me a randomized sample of some Internet data (that was extracted from various sites). I got it and, thinking it was truly random, I started my algo-type work, only to figure out there was nothing random in it. When I talked to the logic guys, they told me "well, sure it's random, the database was partitioned at random daily". To which I asked "wait! how was the partitioning done?". The answer was: "well, every day, we'd take the first 1/2 of the daily data and put it in one partition, and the rest in another/others". Clearly, we were using the same word, but meant opposite things. This is a clear example of how logic and science guys don't talk "nice" to each other.
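The gap between the two meanings of "random" in that anecdote is easy to show in a few lines of Python. This is a sketch under made-up assumptions: the records are just the numbers 0 to 999 in arrival order, standing in for a day's data whose values drift over the course of the day, and the function names are mine.

```python
import random

# Hypothetical day of data, in arrival order; values drift upward over the day.
records = list(range(1000))


def first_half(rows):
    """The 'partitioning' from the anecdote: the first half of the day's data."""
    return rows[: len(rows) // 2]


def random_sample(rows, k, seed=0):
    """A uniform random sample without replacement (seeded for repeatability)."""
    rng = random.Random(seed)
    return rng.sample(rows, k)


def mean(xs):
    return sum(xs) / len(xs)


partition = first_half(records)
sample = random_sample(records, 500)

full_mean = mean(records)         # 499.5
partition_mean = mean(partition)  # 249.5: only the early part of the day
sample_mean = mean(sample)        # close to the full mean
```

Both subsets have 500 rows, but the first-half partition only ever sees the early part of the day, so anything that depends on arrival order is systematically skewed, while the seeded uniform sample stays close to the full data.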
I think there is an answer to this, but I think it has more to do with practical and political reasons than with technological problems. What say you?
Hi Dan,
a great article that I certainly hope a lot of Semantic Web guys will read! Coming from the same background as you, I have _exactly_ the same view on the topic! :)
bye
Andraz Tori, Zemanta