How to Write the Perfect Robots.txt for Next.js & Modern Frameworks | DevUtils Lab Blog

Before crawling your pages, a well-behaved search engine crawler (like Googlebot or Bingbot) requests one specific file: robots.txt. This plain-text file acts as a traffic controller, telling crawlers which paths they may crawl and which ones they should stay away from.

In modern web frameworks like Next.js, managing robots.txt works slightly differently than on a standard static HTML site. Let's look at best practices for setting up your crawler directives, handling sitemaps, and avoiding major indexing mistakes.

1. The Basics: What Is robots.txt?

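The robots.txt file must be served from the root of your host (for example, https://www.example.com/robots.txt); crawlers do not look for it in subdirectories. It uses a simple, line-based syntax: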

User-agent: The crawler a group of rules applies to (e.g. * for all crawlers, or Googlebot).
Disallow: The path you want to block (e.g. /admin/).
Allow: The path you want to explicitly permit, typically used to re-open a subpath inside a disallowed directory (when rules conflict, the most specific, i.e. longest, matching path wins).
Sitemap: The absolute URL to your XML sitemap.
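To see how these directives fit together, here is a minimal TypeScript sketch that serializes rule groups into robots.txt text. The `RobotsGroup` type and `serializeRobots` function are hypothetical names chosen for illustration, and example.com stands in for your own domain; what matters is the shape of the output: a User-agent line followed by its Allow and Disallow rules, with the Sitemap line standing on its own.

```typescript
// Hypothetical type describing one robots.txt group (illustrative names only).
interface RobotsGroup {
  userAgent: string;
  allow?: string[];
  disallow?: string[];
}

// Serializes groups plus an optional sitemap URL into robots.txt text.
// Each group starts with a User-agent line followed by its Allow/Disallow rules;
// groups are separated by a blank line, and the Sitemap line sits outside any group.
function serializeRobots(groups: RobotsGroup[], sitemap?: string): string {
  const blocks = groups.map((group) => {
    const lines = [`User-agent: ${group.userAgent}`];
    for (const path of group.allow ?? []) lines.push(`Allow: ${path}`);
    for (const path of group.disallow ?? []) lines.push(`Disallow: ${path}`);
    return lines.join("\n");
  });
  if (sitemap) blocks.push(`Sitemap: ${sitemap}`);
  return blocks.join("\n\n") + "\n";
}

const robotsTxt = serializeRobots(
  [
    // Block /admin/ for every crawler, but re-open its public help pages.
    { userAgent: "*", allow: ["/admin/help/"], disallow: ["/admin/"] },
  ],
  "https://www.example.com/sitemap.xml",
);

console.log(robotsTxt);
```

The generated file tells crawlers to skip /admin/ while still allowing /admin/help/, because the longer Allow path is the more specific match.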

2. Generating robots.txt in Next.js

In the Next.js App Router, instead of placing a static robots.txt in your public folder, you can generate it with a special file: app/robots.ts. Because this file is ordinary code, you can build the sitemap URL or the allowed and blocked paths programmatically, for example based on environment variables.

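```typescript
// app/robots.ts: Next.js serves the returned object as /robots.txt
import type { MetadataRoute } from "next";

export default function robots(): MetadataRoute.Robots {
  return {
    rules: {
      userAgent: "*", // applies to every crawler
      allow: "/",
      disallow: ["/api/", "/private/"], // keep API routes and private pages out of the crawl
    },
    sitemap: "https://www.devutilslab.dev/sitemap.xml",
  };
}
```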
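Since app/robots.ts is ordinary code, one common pattern is to serve normal rules only in production and block all crawling on preview or staging deployments. The sketch below is an illustration under assumptions, not the Next.js API itself: `buildRobots` is a hypothetical helper, `RobotsConfig` is a simplified local stand-in for `MetadataRoute.Robots` declared so the snippet runs without the `next` package, and the environment variables in the wiring comment (`VERCEL_ENV`, set by Vercel on its deployments, and a hypothetical `SITE_URL`) depend on your hosting setup.

```typescript
// Simplified local stand-in for the shape of Next.js's MetadataRoute.Robots,
// declared here so the sketch is self-contained.
type RobotsRule = {
  userAgent?: string | string[];
  allow?: string | string[];
  disallow?: string | string[];
};
type RobotsConfig = {
  rules: RobotsRule | RobotsRule[];
  sitemap?: string | string[];
};

// Hypothetical helper: production gets the normal rules and a sitemap,
// every other deployment (preview, staging, local) blocks all crawling.
function buildRobots(isProduction: boolean, siteUrl: string): RobotsConfig {
  if (!isProduction) {
    return { rules: { userAgent: "*", disallow: "/" } };
  }
  return {
    rules: { userAgent: "*", allow: "/", disallow: ["/api/", "/private/"] },
    sitemap: `${siteUrl}/sitemap.xml`,
  };
}

// Possible wiring in app/robots.ts (VERCEL_ENV and SITE_URL are assumptions):
// export default function robots(): MetadataRoute.Robots {
//   return buildRobots(process.env.VERCEL_ENV === "production", process.env.SITE_URL ?? "");
// }

const preview = buildRobots(false, "https://www.example.com");
const production = buildRobots(true, "https://www.example.com");
console.log(JSON.stringify(preview));
console.log(production.sitemap);
```

Keeping the decision in a plain function makes it easy to unit-test without starting a Next.js server.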

3. Common Robots.txt Mistakes

Many developers misunderstand how robots.txt works. Here are the most common pitfalls:

It is NOT a Security Measure

Disallowing a path in robots.txt does not secure it. Anyone can read your robots.txt file and see your blocked paths (like /admin/dashboard/), so a disallow rule can even advertise the locations you meant to hide. Secure sensitive pages with authentication, not robots rules.
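To make the point concrete: once someone has fetched your robots.txt (it is served publicly at /robots.txt), a few lines of code are enough to list every path you disallowed. This TypeScript sketch uses a hypothetical `listDisallowedPaths` helper and works on the file's text, so it runs without any network access.

```typescript
// Collects every non-empty Disallow value from a robots.txt body.
function listDisallowedPaths(robotsTxt: string): string[] {
  const paths: string[] = [];
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    // Strip comments (everything after "#") and surrounding whitespace.
    const line = rawLine.replace(/#.*$/, "").trim();
    const match = /^disallow\s*:(.*)$/i.exec(line);
    const value = match?.[1]?.trim();
    // An empty Disallow value means "nothing is disallowed", so it is skipped.
    if (value) paths.push(value);
  }
  return paths;
}

const sample = [
  "User-agent: *",
  "Disallow: /admin/dashboard/  # internal tools",
  "Disallow: /staging/",
  "Allow: /",
].join("\n");

console.log(listDisallowedPaths(sample));
```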

Blocking Doesn't Stop Indexing

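If another website links to one of your disallowed URLs, Google can still index that URL without crawling it, usually showing it with little or no description. To keep a page out of search results, use a noindex robots meta tag (or an X-Robots-Tag HTTP header) and make sure the page is not blocked in robots.txt: if crawlers cannot fetch the page, they never see the noindex directive.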
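This conflict is easy to check mechanically. The sketch below assumes the rules have already been selected for one user agent and uses plain prefix matching (it ignores the `*` and `$` wildcards that real parsers support). It applies the precedence described in RFC 9309 and Google's documentation: the longest matching path wins, and Allow wins a tie. `isCrawlable` and `findNoindexConflicts` are hypothetical helpers; any path they flag carries a noindex tag that crawlers will never see.

```typescript
// Simplified rule model: plain path prefixes for a single user-agent group.
interface Rule {
  type: "allow" | "disallow";
  path: string;
}

// Longest matching prefix wins; on equal length, Allow wins.
function isCrawlable(rules: Rule[], urlPath: string): boolean {
  let best: Rule | undefined;
  for (const rule of rules) {
    // Empty paths match nothing; non-matching prefixes are irrelevant.
    if (rule.path === "" || !urlPath.startsWith(rule.path)) continue;
    if (!best) {
      best = rule;
      continue;
    }
    const longer = rule.path.length > best.path.length;
    const tieGoesToAllow = rule.path.length === best.path.length && rule.type === "allow";
    if (longer || tieGoesToAllow) best = rule;
  }
  // No matching rule means the path may be crawled.
  return !best || best.type === "allow";
}

// A page relying on noindex must stay crawlable, or the tag is never seen.
function findNoindexConflicts(rules: Rule[], noindexPaths: string[]): string[] {
  return noindexPaths.filter((path) => !isCrawlable(rules, path));
}

const rules: Rule[] = [
  { type: "disallow", path: "/private/" },
  { type: "allow", path: "/private/press-kit/" },
];

console.log(isCrawlable(rules, "/private/press-kit/logo.png")); // true
console.log(findNoindexConflicts(rules, ["/private/drafts/", "/thank-you/"]));
```

Here /private/drafts/ would be flagged: it is disallowed, so a noindex tag on that page would never be read.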

Generate and Test Robots.txt Easily

Need to generate a compliant robots.txt or check if a specific URL is being blocked? Use our interactive visual builder and tester.
