How Google indexes multilingual websites (and what can go wrong)
Googlebot and language detection — how it works
When Googlebot crawls a page, it determines the language primarily from the visible text of the page body. The HTML lang attribute and the Content-Language HTTP header are weak hints at best: Google's documentation states that it does not rely on code-level language information such as lang attributes. URL structure (subdomain, subdirectory, or ccTLD) plays a different role. It helps Google infer which country or market a site targets, not which language a page is written in.
None of these signals is sufficient on its own for international SEO. A page in German with a German URL but no hreflang tags may still be shown to Italian users if Google cannot find a clear signal about which market it targets. Hreflang provides that unambiguous signal.
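As a minimal sketch of what that signal looks like in markup, the Python below builds the link tags for one page cluster. The function name build_hreflang_tags and the site.it URLs are illustrative assumptions, not part of any particular tool.

```python
from html import escape


def build_hreflang_tags(alternates, x_default=None):
    """Return <link> tags for one page cluster.

    alternates: dict mapping an hreflang value (e.g. "de") to an absolute URL.
    x_default:  optional fallback URL for users matching no listed language.
    """
    tags = [
        f'<link rel="alternate" hreflang="{escape(lang)}" href="{escape(url)}" />'
        for lang, url in sorted(alternates.items())
    ]
    if x_default:
        tags.append(
            f'<link rel="alternate" hreflang="x-default" href="{escape(x_default)}" />'
        )
    return tags


# Hypothetical cluster: the same page in Italian, German, and English.
cluster = {
    "it": "https://site.it/page",
    "de": "https://de.site.it/page",
    "en": "https://en.site.it/page",
}

for tag in build_hreflang_tags(cluster, x_default="https://site.it/page"):
    print(tag)
```

The same full set of tags, including the entry that points at the page itself, would go into the head of every page in the cluster.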
The hreflang signal: what Google does with it
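Google treats hreflang as a strong hint, not a directive. When processing hreflang, Google checks several things: do all the declared alternates resolve to real pages that return a 200 status? Does each alternate link back to the pages that reference it? Is each value valid, meaning an ISO 639-1 language code optionally followed by an ISO 3166-1 Alpha-2 region code? An x-default entry is optional, but it is recommended as a fallback for users who match none of the listed languages.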
If all checks pass, Google stores the relationship and uses it to decide which language version to show in which search results. A German-language search, for example on Google.de, for a term that matches a page with a hreflang="de" alternate should surface the German version, not the Italian original.
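If any check fails (an alternate returns a 404, a page does not reciprocate, a language code is invalid), Google may silently ignore the affected hreflang annotations, and in the worst case the whole set for that page cluster. Google Search Console's International Targeting report used to surface these errors, but Google retired it in 2022, so validation now has to happen in your own tooling or a third-party hreflang checker.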
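Two of these checks, reciprocity and the shape of the language code, can be sketched offline in a few lines of Python. The function find_hreflang_problems and the input format are assumptions made for illustration. The sketch does not fetch pages, so it cannot confirm the 200 status, and its pattern is a loose shape test that does not handle script subtags such as zh-Hant.

```python
import re

# Loose shape check only: a 2-letter language code, optionally followed by a
# 2-letter region code. It does not confirm that the code is actually assigned.
HREFLANG_SHAPE = re.compile(r"^[a-z]{2}(-[A-Za-z]{2})?$")


def find_hreflang_problems(declared):
    """declared: {page_url: {hreflang_value: alternate_url}} as found on each page."""
    problems = []
    for page, alternates in declared.items():
        for lang, target in alternates.items():
            if lang != "x-default" and not HREFLANG_SHAPE.match(lang):
                problems.append((page, f"malformed hreflang value: {lang}"))
            if target == page or lang == "x-default":
                continue  # self-references and x-default are not checked for return links
            if target not in declared:
                problems.append((page, f"alternate not in crawl data: {target}"))
            elif page not in declared[target].values():
                problems.append((page, f"no return link from: {target}"))
    return problems


# Hypothetical crawl result for two pages.
declared = {
    "https://site.it/page": {
        "it": "https://site.it/page",
        "de": "https://de.site.it/page",
    },
    "https://de.site.it/page": {
        "de": "https://de.site.it/page",
        # No "it" entry here, so the Italian page's reference is not reciprocated.
    },
}

for page, problem in find_hreflang_problems(declared):
    print(page, "->", problem)
```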
Why duplicate content warnings happen with multilingual sites
Google considers pages with highly similar content to be duplicates — and may choose to index only one of them. For multilingual sites, this creates a risk: if your German, French, and English pages have the same structure and enough similar content (think: product specs, addresses, shared boilerplate), Google may determine they are duplicates and exclude some from indexing.
The solution is a combination of proper canonical tags (pointing each language version to itself, not to the original) and meaningfully localized content — not just translated. Pages that differ sufficiently in language, keywords, and content structure are treated as distinct pages by Google.
The canonical + hreflang relationship
These two signals must be consistent. The canonical tag on a page should point to that page itself (a self-referential canonical). If the German version at de.site.it/page has a canonical pointing to site.it/page (the Italian original), Google interprets this as “the Italian page is the canonical, the German one is a duplicate” — and may not index the German version.
Correct setup: every language version has a self-referential canonical plus a complete hreflang matrix. The canonical says “this is the primary version of this page”; the hreflang says “and here are the equivalent pages in other languages.”
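The consistency rule can be checked per page with Python's standard html.parser module. This is a sketch under stated assumptions: the names HeadLinkParser and check_page are hypothetical, URLs are compared as exact strings with no normalization of trailing slashes or case, and the sample markup reproduces the faulty setup described above.

```python
from html.parser import HTMLParser


class HeadLinkParser(HTMLParser):
    """Collect the canonical URL and hreflang alternates from <link> tags."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.alternates = {}

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        if "canonical" in rel:
            self.canonical = attr.get("href")
        elif "alternate" in rel and attr.get("hreflang"):
            self.alternates[attr["hreflang"]] = attr.get("href")


def check_page(url, html):
    """Return a list of issues for the page served at `url`."""
    parser = HeadLinkParser()
    parser.feed(html)
    issues = []
    if parser.canonical != url:
        issues.append(f"canonical is {parser.canonical!r}, expected {url!r}")
    if url not in parser.alternates.values():
        issues.append("page does not list itself among its hreflang alternates")
    return issues


# The German page wrongly declares the Italian original as its canonical.
german_html = """
<head>
  <link rel="canonical" href="https://site.it/page">
  <link rel="alternate" hreflang="it" href="https://site.it/page">
  <link rel="alternate" hreflang="de" href="https://de.site.it/page">
</head>
"""

for issue in check_page("https://de.site.it/page", german_html):
    print(issue)
```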
How long it takes Google to index new language versions
For most websites, Google typically discovers and indexes a new subdomain within 2 to 4 weeks of it going live, and faster if the site already has authority and the new subdomain is linked from the main site or sitemap. These are rough estimates, not guarantees. Ranking in search results can take 3 to 6 months as Google evaluates the quality and relevance of the new content.
The hreflang tags help Google discover the new language versions faster, because Googlebot follows the alternate links when crawling the main site. This is why Locra injects hreflang from day one — even before translation is complete — so Google discovers all language subdomains as early as possible.
Search Console setup for international sites
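With URL-prefix properties, each subdomain is a separate property in Google Search Console. If you have site.it, de.site.it, and en.site.it, you need three verified properties. Alternatively, a single Domain property for site.it, verified via DNS, covers all subdomains in one place. The International Targeting report (formerly under Legacy tools and reports) used to show hreflang errors Google had detected, but it was retired in 2022, so Search Console no longer reports them directly.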
Submit a language-specific sitemap for each property. The sitemap for de.site.it should list only the German pages — not all languages. This helps Google understand the scope and structure of each language version.
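A per-language sitemap can be produced by filtering the full URL list by host. The sketch below uses only the Python standard library. The function name build_language_sitemap and the example URLs are assumptions for illustration.

```python
from urllib.parse import urlsplit
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_language_sitemap(all_urls, host):
    """Return sitemap XML that lists only the URLs served from `host`."""
    # Serialize the sitemap namespace as the default (unprefixed) namespace.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url in all_urls:
        if urlsplit(url).hostname == host:
            url_el = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
            ET.SubElement(url_el, f"{{{SITEMAP_NS}}}loc").text = url
    xml_bytes = ET.tostring(urlset, encoding="utf-8", xml_declaration=True)
    return xml_bytes.decode("utf-8")


# Hypothetical URL inventory covering two language versions.
all_urls = [
    "https://site.it/page",
    "https://de.site.it/page",
    "https://de.site.it/kontakt",
]

print(build_language_sitemap(all_urls, "de.site.it"))
```

Calling the function once per host gives one sitemap per language version, each ready to submit to the matching property.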