An Insight into How Googlebot Crawls & Indexes the First 15 MB of HTML Content

Author: Myk Baxter

SEO is a task that takes time, research & great execution, but a recent look at Google’s documentation has revealed that its web crawler only uses the first 15 MB of a page’s HTML to determine where you rank.

But what does this mean, I hear you ask?

In short, it simply means that anything after this cutoff will be disregarded in ranking calculations. 

See the specifics within the help document for further insight:

“Any resources referenced in the HTML such as images, videos, CSS and JavaScript are fetched separately.

After the first 15 MB of the file, Googlebot stops crawling and only considers the first 15 MB of the file for indexing.

The file size limit is applied on the uncompressed data.”

With the above in mind, some of the SEO community were left wondering whether this meant the search engine would completely disregard text that sits below imagery in an HTML file once the cutoff is reached.

Well, do not fret, as John Mueller, a Google Search Advocate, has confirmed that this is “specific to the HTML file itself, & that embedded resources/content pulled in with IMG tags is not a part of the HTML file”.

How will this affect my SEO?

To avoid important content being overlooked by Googlebot, you now want to ensure it is included near the top of your web pages when carrying out your SEO.

Your code should also be structured so that the SEO-relevant information sits within the first 15 MB of an HTML or other supported text-based file. Where possible, images & videos should also be compressed & not encoded directly in the HTML.
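To get a rough idea of how much of a page’s HTML is taken up by media encoded directly into it, a short Python sketch like the one below can help. It is only an illustration: the function name and the pattern are my own assumptions, and a simple regular expression will not catch every possible way of inlining a file.

```python
import re

# Rough pattern (an assumption for illustration, not a full HTML parser):
# matches base64 data URIs that appear inside src="..." attributes.
DATA_URI_PATTERN = re.compile(
    r"""src\s*=\s*["'](data:[^;"']+;base64,[^"']*)["']""",
    re.IGNORECASE,
)


def inline_data_uri_bytes(html: str) -> list[int]:
    """Return the size in bytes of each base64 data URI found in a src attribute."""
    return [
        len(match.group(1).encode("utf-8"))
        for match in DATA_URI_PATTERN.finditer(html)
    ]


if __name__ == "__main__":
    page = '<img src="data:image/png;base64,AAAA"><img src="/images/logo.png">'
    sizes = inline_data_uri_bytes(page)
    # Only the first image is inlined; the second is fetched separately.
    print(len(sizes), "inline resource(s),", sum(sizes), "bytes in total")
```

Every byte reported here counts towards the HTML file itself, whereas an image referenced by a normal URL does not.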

Whilst the above may come as a surprise to some, SEO best practices do recommend keeping HTML pages to 100 KB or less, so most websites will be unaffected by these developments. Your page size can be checked through a variety of tools, but I myself would recommend Google PageSpeed Insights.
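If you would rather check a saved copy of a page yourself, a minimal Python sketch along these lines compares the uncompressed HTML against the cutoff. Note the assumptions: the function and constant names are made up for this example, and I have read “15 MB” as 15 x 1024 x 1024 bytes, which the quoted documentation does not spell out.

```python
import gzip

# Assumption: "15 MB" is treated as 15 * 1024 * 1024 bytes. The quoted
# documentation does not say which definition of a megabyte it uses.
GOOGLEBOT_HTML_LIMIT_BYTES = 15 * 1024 * 1024


def html_size_report(html_bytes: bytes, limit: int = GOOGLEBOT_HTML_LIMIT_BYTES) -> dict:
    """Compare an uncompressed HTML payload against the size limit."""
    size = len(html_bytes)
    return {
        # The limit applies to the uncompressed data, so this is the figure that matters.
        "uncompressed_bytes": size,
        # Shown only for comparison; the compressed size does not count towards the limit.
        "compressed_bytes": len(gzip.compress(html_bytes)),
        "within_limit": size <= limit,
        # How many bytes at the end of the file would fall after the cutoff.
        "bytes_past_cutoff": max(0, size - limit),
    }


if __name__ == "__main__":
    sample = b"<html><body>" + b"<p>Hello</p>" * 1000 + b"</body></html>"
    report = html_size_report(sample)
    print(report["uncompressed_bytes"], "bytes uncompressed, within limit:", report["within_limit"])
```

To test a real page, read the saved HTML file in binary mode and pass its contents to html_size_report.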

You may also be worried that important content on a page might not make it into the index. However, 15 MB is a significant amount of HTML.

As stated within the document, assets such as imagery & videos are fetched separately, so my understanding is that the 15 MB cutoff only applies to the HTML itself.

With this in mind, it would be hard to go over the limit unless single pages are filled with books & books’ worth of content, which is something that I as a digital marketing consultant advise against. 

If you’re in the scenario where your pages exceed 15 MB of HTML, it’s highly likely that you already have underlying issues within your website that need to be addressed & rectified.

With over twenty years of experience working in the SEO & marketing industry, I’m your guy when it comes to ensuring your website & its content are suited to the best practices. 

Reach out to me today to book your free consultation.
