04 August 2012 @ 07:57 pm
Transforming Wikipedia into a Large Scale Multilingual Concept Network  
I seem to have a long list of things I want to blog about; hopefully I'll actually manage to get down to it properly this week!

Anyway, to start, another of my (obviously not remotely weekly) 100 papers in AI.

100 Current Papers in Artificial Intelligence, Automated Reasoning and Agent Programming. Number 6

Vivi Nastase and Michael Strube, Transforming Wikipedia into a Large Scale Multilingual Concept Network, Artificial Intelligence (2012) (In Press)

DOI: 10.1016/j.artint.2012.06.008
Open Access?: Not that I can find.

Knowledge acquisition isn't really my field but this paper caught my eye largely, I confess, because it had "Wikipedia" in the title.

It's widely recognised that a fundamental component of any intelligent system is going to be some general knowledge. Researchers have been looking into the problems of acquiring, representing and then using such a knowledge base pretty much since Artificial Intelligence was dreamed up in the 1950s.

This paper clearly isn't the first to suggest that Wikipedia could be used as part of this process, though I'm not knowledgeable enough to really know how original its proposals are.

The paper suggests, though, that Wikipedia's info boxes and categories can be used to structure the data that is extracted from it - for instance to deduce information such as "Brian May is a member of Queen (band)" and "Annie Hall was directed by Woody Allen".
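To give a flavour of what that extraction might look like (this is my own toy sketch, not the paper's algorithm - a real parser has to cope with nested templates, links and multi-valued fields), here is a minimal pass that pulls (subject, relation, object) triples out of an infobox's `| key = value` lines:

```python
import re

def infobox_triples(title, wikitext):
    """Extract (subject, relation, object) triples from a toy infobox.

    Only matches simple '| key = value' lines; real infobox wikitext
    is far messier than this.
    """
    triples = []
    for key, value in re.findall(r"\|\s*(\w+)\s*=\s*(.+)", wikitext):
        triples.append((title, key, value.strip()))
    return triples

infobox = """{{Infobox film
| name     = Annie Hall
| director = Woody Allen
}}"""

print(infobox_triples("Annie Hall", infobox))
# [('Annie Hall', 'name', 'Annie Hall'), ('Annie Hall', 'director', 'Woody Allen')]
```

The infobox keys themselves ("director" and so on) act as the relation labels, which is roughly why infoboxes are such an attractive source of structure compared with free text.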

It presents algorithms for mining Wikipedia's categories and info boxes in order to create such facts and organise them as a concept network (i.e. turning relationships, like *is a member of*, into edges in a graph, and objects, like Brian May, into the nodes where those edges meet). It is then possible to do further processing on these concept networks, and to run comparisons between networks from different language versions of Wikipedia to produce a multilingual concept network.
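Again as my own illustrative sketch rather than anything from the paper: once you have triples, the concept network is just a labelled graph, and interlanguage links (here a hypothetical one-entry table mapping a French article title to its English counterpart) are what let networks from different language editions be merged onto shared nodes:

```python
from collections import defaultdict

class ConceptNetwork:
    """Toy concept network: concepts are nodes, relations are labelled edges."""

    def __init__(self):
        # node -> list of (relation, neighbouring node)
        self.edges = defaultdict(list)

    def add(self, subj, rel, obj):
        self.edges[subj].append((rel, obj))

    def relations(self, concept):
        return self.edges[concept]

# Hypothetical interlanguage link table: foreign title -> English title.
interlang = {"Queen (groupe)": "Queen (band)"}

def normalise(title):
    return interlang.get(title, title)

net = ConceptNetwork()
# A fact mined from the English edition...
net.add("Annie Hall", "directed by", "Woody Allen")
# ...and one mined from the French edition, merged onto the same node.
net.add("Brian May", "member of", normalise("Queen (groupe)"))

print(net.relations("Brian May"))
# [('member of', 'Queen (band)')]
```

The design point is that merging happens at the node level, so facts mined independently in different languages end up attached to one shared concept.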

The resulting resource, WikiNet, is available for download, as is a visualisation and application-building tool. WikiNet was compared against a number of similar knowledge bases, the most famous of which is WordNet, a large lexical database of English. Obviously WikiNet is multilingual, which WordNet isn't, and it can be built and updated rapidly; however, it lacks the coverage of WordNet.

This entry was originally posted at http://purplecat.dreamwidth.org/72989.html.
wellinghall on August 4th, 2012 08:07 pm (UTC)
That is interesting - thank you.
daniel_saunders on August 5th, 2012 03:18 pm (UTC)
Would Wikipedia be the main source of this Artificial Intelligence's knowledge base? Because, regardless of the technical advantages, I can see serious practical flaws there.
louisedennis on August 5th, 2012 05:38 pm (UTC)
Well, on one level, they are only extracting the structure rather than the text - i.e. they don't analyse any of the free text, just the categories and info boxes - which at least should remove some of the wilder nonsense.

That said, it doesn't seem, per se, to be any less error-prone than most other ways of doing it - at least in the absence of reliable natural language processing (and for that you probably need a good knowledge base to start out with - a bit chicken and egg really).