FAQ
Here are answers to some frequently-asked questions, updated for ConceptNet 5.5.
ConceptNet is a knowledge graph of things people know and computers should know, expressed in various natural languages. See the main page for more details.
ConceptNet is a resource. You can use it as part of making an AI that understands the meanings of words people use.
ConceptNet is not itself a chatbot. Some chatbot systems have used ConceptNet as a resource, but this is not a primary use case that ConceptNet is designed for.
You can browse the knowledge graph at http://www.conceptnet.io/.
We recommend starting with the Web API. If you need a greater flow of information than the Web API provides, then consider downloading the data.
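For example, here is a minimal sketch of a Web API lookup in Python, using the `requests` library; it assumes the public API host api.conceptnet.io, and the response format is described on the API page:

```python
import requests

# Look up what ConceptNet knows about the English term "example".
data = requests.get("http://api.conceptnet.io/c/en/example").json()

# Each result is an edge: a labeled relation between two terms, with a weight.
for edge in data["edges"][:5]:
    print(edge["rel"]["label"], ":", edge["start"]["label"], "->", edge["end"]["label"])
```

If you find yourself making a great many of these requests, that's the point at which downloading the data makes more sense.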
WordNet is an interesting comparison to make, as the projects have similar goals, and by now they both make use of multilingual linked data.
ConceptNet contains more kinds of relationships than WordNet. ConceptNet's vocabulary is larger and interconnected in many more ways. In exchange, it's somewhat messier than WordNet.
ConceptNet does only the bare minimum to distinguish word senses so far -- in the built graph of ConceptNet 5.5, word senses are only distinguished by their part of speech (similar to sense2vec). WordNet has a large number of senses for every word, though some of them are difficult to distinguish in practice.
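As a rough illustration (not the project's own code), a ConceptNet term URI carries the language and, optionally, a part-of-speech tag, and nothing finer-grained than that:

```python
def concept_uri(language, term, pos=None):
    """Build a ConceptNet-style term URI, e.g. /c/en/run or /c/en/run/v.

    Illustrative sketch only: the real build normalizes text more carefully.
    """
    text = term.strip().lower().replace(" ", "_")
    uri = f"/c/{language}/{text}"
    if pos:  # 'n', 'v', 'a', or 'r', when the part of speech is known
        uri += "/" + pos
    return uri

print(concept_uri("en", "run"))        # /c/en/run   (no sense information)
print(concept_uri("en", "run", "v"))   # /c/en/run/v (distinguished only by POS)
```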
WordNet is too sparse for some applications. You can't build word vectors from WordNet alone. You can't compare nouns to verbs in WordNet, because they are mostly unconnected vocabularies.
ConceptNet does not assume that words fall into "synsets", sets of synonyms that are completely interchangeable. Synonymy in ConceptNet is a relation like any other. If you've worked with WordNet, you may have been frustrated by the implications of the synset assumption.
In ConceptNet, we incorporate as much of WordNet as we can while undoing the synset assumption, and we give it a high weight, because the information in WordNet is valuable and usually quite accurate.
ConceptNet is open. The Google Knowledge Graph isn't.
The Knowledge Graph seems to focus largely on things you can buy and things you can look up on Wikipedia. In ConceptNet we try to focus on words with general meanings, and much less on named entities. We want to understand all nouns, verbs, adjectives, and adverbs, not just proper nouns.
The Microsoft Concept Graph is a taxonomy of English nouns, connected with the "IsA" relation, with some automatic word sense disambiguation. Its data comes from machine reading of a Web search index.
It's introduced with a lot of the same language as ConceptNet, about how common-sense understanding is important, and the name certainly evokes a lot of similarity. It looks a lot more like an automatically-generated version of OpenCyc than of ConceptNet, though.
DBPedia is very much focused on named entities. It's considerably messier than ConceptNet. Its edges are denser but its nodes are sparser: only terms that are the title of a Wikipedia article are represented in DBPedia.
ConceptNet imports a small amount of DBPedia, and also contains links to DBPedia and Wikidata.
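If you want to follow those cross-references yourself, one way to do it is sketched below; this assumes the public API's /query endpoint and the ExternalURL relation used for such links, so check the API page if it returns nothing:

```python
import requests

# Ask for edges that link an English term to external resources such as
# DBPedia and Wikidata.
params = {"start": "/c/en/apple", "rel": "/r/ExternalURL", "limit": 20}
data = requests.get("http://api.conceptnet.io/query", params=params).json()

for edge in data["edges"]:
    print(edge["end"]["@id"])  # e.g. a dbpedia.org or wikidata.org URL
```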
DBnary is a counterpart to DBPedia that's actually quite compatible with ConceptNet. Like ConceptNet, it focuses on word definitions rather than named entities, and it gets them from parsing Wiktionary.
Right now we use our own Wiktionary parser, which covers fewer Wiktionary sites than DBnary does but extracts more detail from each entry. We would gladly use DBnary instead, if DBnary starts extracting information such as links from definitions.
Cyc is an ontology built on a predicate logic representation called CycL. CycL can enable very precise reasoning in a way that machine learning over ConceptNet doesn't. However, Cyc is intolerant of errors, and adding information to Cyc is a difficult task.
OpenCyc provides a hierarchy of types of things, with English names, some of which are automatically generated. It seems to be intended as a preview of the full Cyc system, which is not open.
ConceptNet 5.5 contains approximately 28 million edges (statements).
No, ConceptNet is not a formal logical representation. Its representation is words and phrases of natural language, and relations between them. Natural language can be vague, illogical, and incredibly useful.
The data that ConceptNet is built from spans a lot of different languages, with a long tail of marginally-represented languages. 10 languages have core support, and 303 languages are supported in total. See Languages for a complete list.
ConceptNet doesn't know everything, and that will always be true. We use machine-learning techniques, including word embeddings, to learn generalizable things from ConceptNet despite the incompleteness of the knowledge it contains.
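The embeddings we build this way are released as ConceptNet Numberbatch. Here is a minimal sketch of using them, assuming you have downloaded one of the released text files and have gensim installed; the English-only release labels rows with plain words, while the multilingual release uses /c/&lt;lang&gt;/&lt;term&gt; URIs:

```python
from gensim.models import KeyedVectors

# Load a ConceptNet Numberbatch release in word2vec text format.
# The filename below is an example; use whichever release you downloaded.
vectors = KeyedVectors.load_word2vec_format("numberbatch-en.txt.gz", binary=False)

# Similarities generalize beyond what ConceptNet states explicitly.
print(vectors.similarity("cup", "mug"))
print(vectors.most_similar("teach", topn=5))
```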
There will probably always be isolated mistakes or falsehoods in ConceptNet. Our data sources and our processes are not perfect. Machine learning can be relatively robust against errors, as long as the errors are not systematic.
If you've identified a systematic source of errors in ConceptNet, that is more important. It would probably improve ConceptNet to get rid of it. In that case, please go to the 'Issues' tab and describe it in an issue report.
For the meanings of ConceptNet's relations, see the table on the Relations page of this wiki.
The weights are made-up numbers programmed into the reader modules that import various sources of knowledge. They represent a rough heuristic of which statements you should trust more than other statements.
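In practice, the weight is just a number on each edge the API returns, so you can treat it as a rough confidence score. A small sketch using the public API (the 2.0 threshold is arbitrary):

```python
import requests

# Fetch edges about "book" and keep only the relatively high-weight ones.
data = requests.get("http://api.conceptnet.io/c/en/book",
                    params={"limit": 50}).json()

for edge in data["edges"]:
    if edge["weight"] >= 2.0:  # weights are heuristics, not probabilities
        print(round(edge["weight"], 2), edge.get("surfaceText"))
```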
During the golden age of crowdsourcing (the decade of the 2000s), ConceptNet accepted direct contributions of knowledge. This was a great start, but now the opportunities for improving ConceptNet have changed, and we are content to leave crowdsourcing to the organizations that are really good at it, like the Wikimedia Foundation.
If you contribute to Wiktionary and follow their guidelines, the information you contribute will eventually be represented in ConceptNet.
A question we often get is whether you can make your own version of ConceptNet that includes information you need in your domain.
Well, you can reproduce ConceptNet's build process using Docker and change the code to import a new source of data. This may or may not accomplish what you want.
What ConceptNet is designed for is representing general knowledge. Making a useful domain-specific semantic model is a rather different process, in our experience. The software we built on top of ConceptNet to make this possible eventually became our company, Luminoso. Luminoso provides software as a service that creates domain-specific semantic models, which make use of ConceptNet so they can start out knowing what words mean and just have to learn what's different in your domain.
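If you do decide to import your own source of data, the idea is roughly as follows. For illustration only (this is not the project's actual reader interface), a custom reader is essentially a script that turns your domain data into edges, each one a relation, a start term, an end term, and a weight, expressed with ConceptNet-style URIs:

```python
import csv

def read_domain_glossary(path):
    """Hypothetical reader: turn a CSV of (term, category) rows into IsA
    edges with ConceptNet-style URIs.

    The real readers in the conceptnet5 codebase emit a richer edge format
    (sources, dataset, license, and so on); this only sketches the idea.
    """
    with open(path, newline="") as f:
        for term, category in csv.reader(f):
            start = "/c/en/" + term.strip().lower().replace(" ", "_")
            end = "/c/en/" + category.strip().lower().replace(" ", "_")
            yield ("/r/IsA", start, end, 1.0)

for rel, start, end, weight in read_domain_glossary("glossary.csv"):
    print(rel, start, end, weight)
```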
We've tried a lot of databases. Currently, ConceptNet uses PostgreSQL.
If we don't use your favorite graph database, it's probably for one of the following reasons:
- It isn't as efficient as PostgreSQL
- It doesn't actually work as advertised
- It is no longer maintained
- It doesn't provide a documented workflow for importing a medium-sized graph such as ConceptNet
- It takes more than a day to import a medium-sized graph such as ConceptNet
- It inflates the size of the data it stores by a factor of more than 10
- It assumes every user has access to and wants to use a distributed computing cluster
- It doesn't run well inside a container
- It's not free software
- It has an unacceptable restriction on it that would prevent people from reusing ConceptNet, such as the GPL or "academic use only"
If you think you know of a database that doesn't fail any of these criteria, I'd still be interested to hear about it.
No, ConceptNet is not big data: it fits on a hard disk. It's enough data for many purposes. But text is small.
If you have textual knowledge that actually requires distributed computation, you work at a company that does Web search.
Heck no, there is no SPARQL endpoint. SPARQL is computationally infeasible; a public SPARQL endpoint is a denial-of-service attack waiting to happen. Similar projects that use SPARQL have unacceptable latency and go down whenever anyone starts using them in earnest.
The way to query ConceptNet is using a rather straightforward REST API, described on the API page. If you need to make a form of query that this API doesn't support, open an issue and we'll look into supporting it.
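For instance, here is a sketch in Python of two query forms the REST API supports; see the API page for the full parameter list:

```python
import requests

API = "http://api.conceptnet.io"

# Edges matching a structured pattern: things "dog" is said to be a kind of.
isa = requests.get(API + "/query",
                   params={"start": "/c/en/dog", "rel": "/r/IsA"}).json()
for edge in isa["edges"][:5]:
    print(edge["start"]["label"], "IsA", edge["end"]["label"])

# A relatedness score between two terms, based on ConceptNet's embeddings.
score = requests.get(API + "/relatedness",
                     params={"node1": "/c/en/dog", "node2": "/c/en/cat"}).json()
print(score["value"])
```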
If you've seen a headline claiming that ConceptNet is as smart as a 4-year-old, blame science reporting for doing what it usually does. There's a nugget of truth in there surrounded by a big wad of meaningless AI hype. It's true that ConceptNet 4 could compete with 4-year-olds on a particular question-answering task -- and ConceptNet 5 performs much better on a similar task. This is cool. It doesn't mean that anyone's about to make robot children.
Here's the background: A much older version of ConceptNet, ConceptNet 4, was evaluated on some intelligence tests involving question-answering and sentence comprehension. The researchers who performed these tests compared ConceptNet's performance to a 4-year-old child.
We found the comparison odd but flattering. 4-year-old children are incredible beings. They have desires, goals, and imagination, and they can communicate them in their spoken language with a level of competence that second-language learners have to put tremendous effort into achieving. No real AI system can come close to emulating the range of things a child can do.
When it comes to the narrower task of answering questions, though, it's believable that ConceptNet 4 compared to a 4-year-old. We're always interested in measurably improving the general intelligence contained in ConceptNet. Excitingly, we now have a question-answering task in which ConceptNet 5 compares to a 17-year-old: that of answering SAT-style analogy questions.
But there is much more to be done. The Story Cloze Test is a test of story understanding that any human can score close to 100% on in their native language. Natural language AI systems, including ConceptNet, have not yet surpassed 60% on this test.