Today Jeff Byer (@globaljeff) talks about the abundance of posts and information regarding Natural Language Processing and the BERT update.
About Natural Language Processing & the BERT Update
Bill Slawski @bill_slawski retweeted this article from Ibrahim Sharaf ElDen as an introduction to Natural Language Processing https://t.co/Xct8wP3BIX?amp=1
I just reread this article by @ajkohn from 1 year ago and it’s quite possibly the most relevant article about BERT and #SEO on the internet right now. https://t.co/PCbCAOdQ1D — Jon Henshaw (@henshaw) October 27, 2019
Justin Briggs (@justinrbriggs) article about how to write for natural language processing: On-page SEO for NLP
Kevin Indig quoting a Fortune article. I am embedding the tweet to show the comments and to point out that this is not “new,” as the article title might imply.
👀— Kevin_Indig (@Kevin_Indig) October 25, 2019
“Language understanding is key to everything we’re doing on search,” said Pandu Nayak, Google fellow and vice president of search. “This is the single, biggest, most positive change we’ve had in last five years.”https://t.co/3LVXGjFu0e pic.twitter.com/jRzeTb7OC3
Jeff Byer 00:08 Welcome to Digital Rage, the podcast about all things internet and the people that make it great. My name is Jeff Byer. Today’s episode, I do not have a guest, and I wasn’t planning on recording an episode. This was going to be the first episode that I skipped since the end of January 2019. But I’m going to call this the accidental episode, because I came across so many tweets and posts about natural language processing and the so-called BERT update that I had to throw in my two cents and provide you, the audience, with everything that I’ve been following, watching, and learning, and how all of this affects search moving forward. So let’s get started. When it comes to natural language processing and language itself, I look to Bill Slawski and his study of how Google has published their patents on language processing, and Bill’s breakdown of those patents and how they relate, from patent to real-life search results.
Jeff Byer 01:26 So Bill retweeted an article from Ibrahim Sharaf ElDen as an introduction to natural language processing. I started digging in and realized that different variations of natural language processing take a sentence and break it down into its primary parts: the nouns, adjectives, and verbs. Then they take a word, and I don’t want to use the word “keyword” here because it’s a completely different context; it’s basically the root word of the sentence. The question is: in how many hops from the root word can you put a word into context and choose its actual meaning? In previous iterations of natural language processing, those hops from the main word in the sentence were taken mostly from earlier in the sentence, looking at the words before the main word rather than at the words after it.
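The idea of counting “hops” through a sentence’s structure can be made concrete with a small sketch. This is a hand-built toy, not a real parser: the head links below are assumptions standing in for what an actual dependency parser (such as spaCy’s) would produce.

```python
# Toy dependency tree for "the bank raised interest rates".
# Each token index maps to the index of its head (parent); the
# root ("raised") points to itself. In a real pipeline a parser
# would produce these head links -- here they are hand-assigned.
SENTENCE = ["the", "bank", "raised", "interest", "rates"]
HEADS = {0: 1, 1: 2, 2: 2, 3: 4, 4: 2}

def hops_to_root(idx, heads):
    """Count the edges from a token up to the root of the tree."""
    count = 0
    while heads[idx] != idx:
        idx = heads[idx]
        count += 1
    return count

for i, word in enumerate(SENTENCE):
    print(f"{word}: {hops_to_root(i, HEADS)} hop(s) from the root")
```

Fewer hops between a context word and the root means the two are more tightly coupled in the parse, which is the intuition behind keeping defining words close together.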
Jeff Byer 02:45 Now, what I understand BERT is supposed to do: “bidirectional” means taking into account not only the context words before the main word, but also the words after the main word, and going further after it, so that all the possible data can be taken into account. The understanding of the context of the root word, and of all the contextual words found within short hops from that root word, provides the actual context, and based on that context, a search engine would then provide the appropriate response. So what that means for SEOs and content writers is a style of writing where, if you’re writing for SEO, you want those words to be relatively close to each other and to provide as much context as possible for what the sentence you are building is trying to convey.
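As an illustration of what “bidirectional” buys you, here is a minimal sketch contrasting the context a left-to-right model sees with the context a bidirectional model sees around an ambiguous word. It is deliberately simplified: real BERT uses learned attention over subword tokens, not a fixed word window, and the example sentence is my own.

```python
# A left-to-right model interpreting "bank" only sees the words
# before it; a bidirectional model sees both sides, including the
# disambiguating word "river" that comes later in the sentence.
tokens = "she sat by the bank and watched the river flow".split()
target = tokens.index("bank")

left_to_right_context = tokens[:target]
bidirectional_context = tokens[:target] + tokens[target + 1:]

print("left-to-right:", left_to_right_context)
print("bidirectional:", bidirectional_context)
```

Only the bidirectional view contains “river,” which is what lets the model pick the riverbank sense over the financial one.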
Jeff Byer 04:09 And if there is any, , ambiguous is there, if it’s an ambiguous term, , the term can mean many different things. There’s so many different ways that it can be interpreted. So your answer needs to be as clear and sustained as possible and put in all of the relevant information possible to give Google or any, , natural language processing algorithm with algorithm. The best opportunity to understand in full the context of what you’re writing and what you’re saying. And you know, in turn what you’re basically doing is providing the, the searcher with the most informative and complete answer to a query that you could possibly offer. So, , what I’ve been finding in a lot of these writing guides for, , natural language processing and what the style of, of writing would be is that, , you’d need to, you know, it’s, it’s not as creative writing and it’s not super technical.
Jeff Byer 05:27 Writing is writing for understanding. , and so natural language processing is trying to take the natural progression of speech into account. But for people like me who find it difficult to explain technical concepts, simply, I just find it difficult to speak in general, which is why I do a podcast to become better at it. But when I’ve start explaining something technical and I get into details, my, my sentences tend to get longer and longer and longer, and it becomes harder for the listener to connect those dots. As an example. That sentence that I just said, the, the explanation took way too many words. And so if I was to rewrite that sentence over, I would get more to the point and say, this is, this is the problem, this is the solution. And that’s it. So, , as I’ve been researching and digging more into it, , a tweet by Jon Henshaw, he re posted an article from AGA cone from a year ago.
Jeff Byer 06:40 And what he tweeted was, I just re-read this article by J cone from one year ago, and it’s quite possibly the most relevant article about Bert and SCO on the internet right now. And so what I did is, , I went to that original tweet and, , I’ll have it in the show notes so that you can reference it and see the context of how my learning on this subject progressed. So the Bert update, , is basically, it’s been used for a long time and there, and I’ve, we get into this later in the, in in my learnings, but it just seems like this wasn’t much of an update because it’s been happening for a while. And so this article came out a year ago right after the medic update and we’ve talked about the medic update, , in several of our past episodes. So, , if you need to catch up on that, I can put links to where we talked about it and specifically how it affected, , one of my client’s sites and, and how it corrected and all linked also to a recent tweet from Kevin indig, which I’ll get into in a second.
Jeff Byer 07:56 But basically the, , here is the original tweet. , here’s the quote reading article reading the article by aJ cone one year ago about natural language processing. Burt its relation to the medic update, what went wrong with the update and how it was corrected. So this, after reading through the article in detail and it said that the medic update was actually not medic at all, but it was a, a attempt, an initial attempt at a improving the natural language processing, , for, for search and , mainly for content so that, , for search results. So this started the whole discussion, you know, as in the quality greater guidelines about, , experience authoritativeness and which is eat and SEOs now have to consider all of this because Google is now considering all of this information. And in this article he explains that the, the , medic update initially got it wrong and was, was not understanding the context of a lot of, of medical, , information and how it would relate to searchers. So a lot of medical information sites, which is how this update got the, the term medic.
Jeff Byer 09:33 , it was basically nay natural language processing error in the update itself. And it only affected, you know, the medical, your money, your life categories. This is based on this article. So I know that the update had a lot more going on, but specifically to natural language processing, this is how the medic update was, was seen, applied and reversed. So this, the, , example in the article uses it uses a SERP analysis. I’m just looking at back up again so I can, I can tell you exactly. , so the OG in the article, it States the audit August 1st error, , that, that the new content ranking didn’t match the intent of the queries. And so by October of 2018, rankings came back because there was a reversal in how the syntax was calculated in, , on pay for a natural language posts, how it was being processed and how it was being rated as you know, better or worse for a query.
Jeff Byer 11:00 So taking out any ambiguity in your answers to Google is has always been something that people have been talking about for a long time, especially since we’re starting to get into voice searches and searches that that we can’t and we can’t rely on the searchers to put intent or put context into their searches because most searches are going to be as fast as possible, as little skis, keystrokes as possible or shortest questions as possible. And that’s just the natural part of it. So the query itself will not have context. Your answer has to, and Google is going to use everything they can to ant to, to anticipate what the searcher’s request means. Put that in context based on any historical data that they have on the, on the searcher, on their previous searches or you know, anything related to what they have searched recently and are now searching, trying to get in depth.
Jeff Byer 12:06 So all you can do as a content provider, as an SEO is make sure that your answer, using those root words has as much context as possible and putting those contextual definitions as close to the root word as possible. And using proper sentence structure. So big long run-ons is going to, , not process very well because you’re gonna end up getting so far away from the root word that the context and the semantics are not going to match up or they’re going to get diluted. And the way that natural language processing works is that if there’s too many jps, then it’s, it’s diluting the context and the meaning. So that’s it, at least my understanding of all of this. Now. , so let’s get back to my rundown here.
Jeff Byer 13:10 And so, , in the article in this, , Bert update article from a year ago, , in the article, , just in Briggs article about how to write for natural language processing was linked. And so in reading this article, I got a lot more information about what, what a proper sentence structure is according to a natural language processing algorithm and how to write for it. And what, , , lemmatization refers to, , w word dependencies. So the example that they use in the word to set dependencies is a, is a search term that says safe temperature for chicken. So that can mean a lot of things. Well, it can mean a very few things, but if, you know, there’s probably one thing that comes to your mind, but the, there’s the whole point of this th this specific, , term is to match the searcher’s intent. And so if your answer is process is structured properly, it will have, you know, is looking for a temperature so that temperature could be in Fahrenheit or Celsius.
Jeff Byer 14:31 So you’ve got in your, in your answer, you should define which temperature you are referring to. , you’ve gotta, you’ve gotta asse the, the search processing, the language processing. It’s going to asse that this has something to do with cooking because a walking around chicken, nobody really cares what their internal temperature is when they’re alive. It’s not that, that’s not something that we should be, you know, concerned about on a global level. But as far as a search intent, they’re probably talking about cooking. So the, the, the proper, , answer structure that they have used is the safe internal temperature for cooked chicken is 165 degrees Fahrenheit. That is as complete of an answer as you can give to that, to that, , query con, it’s a concise answer and all the word dependencies are very few hops away from each other. So, , internal temperature, , is exactly what you’re looking for when you’re cooking.
Jeff Byer 15:40 External temperature doesn’t matter as much as internal temperature, , cooked chicken, you know, chicken and then one hot backward is cooked. And then for 165, there’s a degree symbol that they use, which also has context and natural language processing. So you can use the use of the word or the symbol. And then Fahrenheit is the definition of the scale that they’re using to get the temperature. So it is very, , it’s a very simple and to the point answer. And so the only way that this could be more thorough is if you offered the, , Celsius alternative to Fahrenheit, but that would be more links away from, you know, or it would re answer the question a little further down. And natural language processing would put that as secondary. It would take your first temperature reading first in that context. So, , it’s very interesting.
Jeff Byer 16:42 And dissecting your current content and looking for those root words and how many hops away the context defining words are from it is going to tell you a lot about how, how , natural language processors are reading your content and understanding its context. I just had this, , a client created a page and wrote all the content for this one page that were struggling to get ranked because when you look at the, the results that it’s given there, they’re trying to rank on a, , a fairly competitive term, but they serve a specific niche in that term. So we’re not going to ever rank highly for organic for the general term because it, we, for the majority of searches, we’re not the right solution for that search. We’d have to go deeper into understanding more context of the search. So it’s, it’s type of a type of measurement that we’re doing, but it’s specific to a industry or a specific to a, another type of measurement.
Jeff Byer 17:59 So, , the, the main term is not our target. Adding words to that main term is getting us closer to that target. But then they also are using a, a product name that is a, a standard name that is also being misunderstood. So we’re getting results all over the place about the T. so it’s becoming clear that Google doesn’t know if this product is a car or it’s a measuring device or it’s a part or it’s , has, , an animal. , it’s having really a really difficult time understanding the context. So once I brought that to the customer’s attention, they had said they, you know, light bulb went off and said, Oh, okay, now we need to put this in context. So now we need a separate page describing taking the, the, the root term as a starting point, but narrowing it down and covering all different facets of how the, you know, two, we’re not going to rank for the actual term itself, but any of these added words on top of that, we’re going to give it context.
Jeff Byer 19:23 So we’re answering a lot of questions on our new content page. We’re talking about a lot of industries where this product is used and we’re relating all of it back to that product. So in our context under our domain and the links that are coming into that product from other domains such as distributors and things like that, that we’re giving as much context to this product as possible so that the product name alone does not end up hurting us. So, , it’s, it’s really interesting all of the natural language processing and, and I’m starting to reevaluate all of our content and how we, how we’re using, , you know, names and root words and the different, how, how many hops it takes from a root word to understand its context and the meaning that we’re trying to give it. , it’s very fun. I’m going to link to all of this information in the show notes.
Jeff Byer 20:25 , but it’s, it’s a great read and great to understand natural language processing and how Google is, is using all this information and trying to come up with the best possible solution based on reading your content and understanding the intent of the searcher. So there’s a ton to, to dissect there. And, , if you have any questions, you can always email me or tweet me. I’m on email@example.com. If you need any other information or you want me to link to any other different sources. So that’s it. That’s the accidental episode. I’ve already reached out to a couple of people for interviews this week, so next Monday will be a fault episode, so I apologize, but at least I got this one out there. It’s a little late, but it is out. So, , thank you very much for listening. As always, if you, , enjoy the episode, please rate and review on your pod catcher of choice. And if you have any ideas for future episodes, just let us know. Thank you very much for listening. Talk to you next week for show notes and information. Go to digital rage.fm. Follow us on Twitter and Instagram at digital rage at bam. And please give us a rating review is sincerely appreciate it.