How Google Understands Text | A Brief Introduction to NLP

A wise man once said, “Knowledge of Languages is the Doorway to Wisdom”. 

It’s actually a 13th-century quote, yet it is no less true today. In fact, language is perhaps one of the most important subjects of study in the 21st century, since the applications of this idea have grown tremendously. Today we know the field as Natural Language Processing (NLP), and we know for sure that Google takes it very seriously.

In this article, I will be explaining two basic concepts of NLP, namely POS tagging and dependency parsing.

Parts of Speech Tagging

Since our school days, when we learned English for the first time, we have come across different forms of words: nouns, adjectives, prepositions, verbs, determiners and other kinds of modifiers.

Such identifiers were created in order to understand the relation of each word in a sentence with another. The concept is so basic, that defining the purpose of each word in a sentence became the foundation for Natural Language Processing.

That is where Part of Speech Tagging comes from. POS tags are labels assigned to the words in a sentence to capture the role each “part of speech” plays in its context.

There are mainly two types of Tags: 

  • Universal POS Tags

These tags include NOUN (noun), ADJ (adjective) and ADV (adverb).

You can learn more about them in this document.

  • Detailed POS Tags

These are finer-grained tags: for example, English uses NNS for a plural common noun and NN for a singular common noun, where the universal tagset has only NOUN.

Now that we know about POS Tags, we can run a program that returns such relations between each word in a sentence.

Let us take an example: “The quick brown fox jumping over the lazy dog”

Run the Following Code in Python: 

Terminal Commands: 

(base) C:\Users\USER>pip install spacy

(base) C:\Users\USER>conda install -c conda-forge spacy-model-en_core_web_sm

(base) C:\Users\USER>python -m spacy download en_core_web_sm

(base) C:\Users\USER>pip install networkx

Python Code:

>>> import spacy
>>> from spacy import displacy
>>> nlp = spacy.load('en_core_web_sm')
>>> sentence = 'The quick brown fox jumping over the lazy dog'
>>> doc = nlp(sentence)
>>> displacy.render(doc, jupyter=True)
<IPython.core.display.HTML object>
>>> print(f"{'Node (from)-->':<15} {'Relation':^10} {'-->Node (to)':>15}\n")
>>> for token in doc:
...     print('{:<15} {:^10} {:>15}'.format(str(token.head.text), str(token.dep_), str(token.text)))
... 

Output

Node (from)-->   Relation     -->Node (to)

fox                det                 The

fox                amod              quick

fox                amod              brown

fox                ROOT                fox

fox                acl             jumping

jumping            prep               over

dog                det                 the

dog                amod               lazy

over               pobj                dog

Dependency Parsing

It involves making sense of the grammatical structure of a sentence based on the dependencies between its words.

For example, in the above sentence “The quick brown fox jumping over the lazy dog”, “brown” acts as an adjective that modifies the noun “fox”. Hence there is a dependency from the word “fox” to the word “brown”, defined by the “amod” tag, known as the adjectival modifier.

A thing to note is that a dependency always holds between exactly two words in a sentence. Let’s now write a program that returns such dependencies between the words of the following sentence.

“It took me more than two hours to translate a few pages of English.”

Terminal Commands: the same setup as before.

Python Code:

>>> import spacy
>>> nlp = spacy.load('en_core_web_sm')
>>> text = 'It took me more than two hours to translate a few pages of English.'
>>> for token in nlp(text):
...     print(token.text, '=>', token.dep_, '=>', token.head.text)
... 

The first column is the token text, the second is the dependency tag, and the third is the head term.

Output

It => nsubj => took

took => ROOT => took

me => dobj => took

more => amod => two

than => quantmod => two

two => nummod => hours

hours => dobj => took

to => aux => translate

translate => xcomp => took

a => quantmod => few

few => amod => pages

pages => dobj => translate

of => prep => pages

English => pobj => of

. => punct => took

Finding Shortest Dependency Path With Spacy

Semantic dependency parsing has often been used to extract the relationship between words (entities) that are related but far apart in a sentence.

The Shortest Dependency Path, or SDP, contains just enough information to define the relationship between two words in a sentence.

Now let’s find the SDP with the help of a Python Program. 

Terminal Commands: the same setup as before.

Python Code:

>>> import spacy
>>> import networkx as nx
>>> nlp = spacy.load('en_core_web_sm')
>>> doc = nlp('Convulsions that occur after DTaP are caused by a fever.')
>>> print('sentence: {}'.format(doc))
sentence: Convulsions that occur after DTaP are caused by a fever.
>>> edges = []
>>> for token in doc:
...     for child in token.children:
...         edges.append(('{0}'.format(token.lower_),
...                       '{0}'.format(child.lower_)))
... 
>>> graph = nx.Graph(edges)
>>> entity1 = 'Convulsions'.lower()
>>> entity2 = 'fever'
>>> print(nx.shortest_path_length(graph, source=entity1, target=entity2))
3
>>> print(nx.shortest_path(graph, source=entity1, target=entity2))
['convulsions', 'caused', 'by', 'fever']

As you can see, the shortest path length function returns 3 as the smallest number of hops needed to establish the relationship.

The shortest path function returns the exact words which are part of the SDP.

Conclusion

These are some of the elements that give an elementary idea of how an AI builds relationships between different words, and hence serve as an introduction to the vast subject of NLP. Obviously, Google’s AI is much more complex than two or three Python scripts; our purpose was simply to share a brief idea of how a machine can create a semantic relationship between words in a text.