Modifying robots.txt for individual sites of a multisite install

WordPress generates robots.txt dynamically. To override it on a normal, non-multisite installation, you can simply upload a static robots.txt to the server. On a multisite install, that static file would replace the robots.txt of every site, which is not always what you want. This post explains how to modify robots.txt for individual sites of a multisite.

WordPress provides the robots_txt filter, which lets you modify the output of the dynamically generated robots.txt. The function get_current_blog_id() returns the ID of the current multisite site, so we can check for a particular site and add rules to its robots.txt only. This is how it currently looks for my site:

/**
 * Modify robots.txt for main site and English site
 *
 * @param $output
 * @param $public
 *
 * @return string
 */
function fbn_custom_robots( $output, $public ) {
	$site_id = get_current_blog_id();
	if ( $site_id == 1 ) {
		$output .= "Disallow: /agb-und-widerruf/\n";
		$output .= "Disallow: /mein-konto/\n";
		$output .= "Disallow: /warenkorb/\n";
		$output .= "Disallow: /impressum-und-datenschutz/\n";
	} elseif ( $site_id == 11 ) {
		$output .= "Disallow: /account/\n";
		$output .= "Disallow: /cart/\n";
		$output .= "Disallow: /imprint/\n";
		$output .= "Disallow: /terms/\n";
	}

	return $output;
}

add_filter( 'robots_txt', 'fbn_custom_robots', 20, 2 );

Four Disallow rules are added for the site with the ID 1 (this is florianbrinkmann.com), and likewise for the site with the ID 11 (my English site, florianbrinkmann.com/en).
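If the number of sites grows, the if/elseif chain can get long. As a minimal sketch of one alternative (not the code I run on my site), the rules could live in an array keyed by site ID. The helper name fbn_rules_for_site() and the shape of the $rules_by_site array are made up for this illustration; the paths are the ones from the function above.

```php
<?php
/**
 * Hypothetical helper: append Disallow rules for one site from a lookup table.
 *
 * @param string $output        Current robots.txt output.
 * @param int    $site_id       ID of the site being served.
 * @param array  $rules_by_site Map of site ID => list of paths to disallow.
 *
 * @return string
 */
function fbn_rules_for_site( $output, $site_id, array $rules_by_site ) {
	// Sites without an entry keep the default output untouched.
	if ( ! isset( $rules_by_site[ $site_id ] ) ) {
		return $output;
	}

	foreach ( $rules_by_site[ $site_id ] as $path ) {
		$output .= "Disallow: {$path}\n";
	}

	return $output;
}

// Register the filter only when WordPress is loaded, so the helper
// can also be tried out in a plain PHP script.
if ( function_exists( 'add_filter' ) ) {
	add_filter( 'robots_txt', function ( $output, $public ) {
		$rules_by_site = array(
			1  => array( '/agb-und-widerruf/', '/mein-konto/', '/warenkorb/', '/impressum-und-datenschutz/' ),
			11 => array( '/account/', '/cart/', '/imprint/', '/terms/' ),
		);

		return fbn_rules_for_site( $output, get_current_blog_id(), $rules_by_site );
	}, 20, 2 );
}
```

Adding another site then only means adding one more entry to the array.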

To get the site’s ID, I just added the following line after $site_id = get_current_blog_id();:

$output .= $site_id;

This way, the ID of the current site is displayed when you visit its robots.txt.
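Appending the bare number makes the robots.txt invalid for as long as the line is in place. A slightly safer variant, sketched here under the assumption that you only need the ID for a quick look, writes it as a comment line instead (lines starting with # are comments in robots.txt). The function name fbn_debug_site_id_line() is hypothetical.

```php
<?php
/**
 * Hypothetical helper: build a robots.txt comment line that shows the site ID.
 *
 * @param int $site_id ID of the current site.
 *
 * @return string
 */
function fbn_debug_site_id_line( $site_id ) {
	// A leading "#" marks the line as a comment, so crawlers ignore it.
	return '# Site ID: ' . (int) $site_id . "\n";
}

// Register the filter only when WordPress is loaded.
if ( function_exists( 'add_filter' ) ) {
	add_filter( 'robots_txt', function ( $output, $public ) {
		return $output . fbn_debug_site_id_line( get_current_blog_id() );
	}, 20, 2 );
}
```

Either way, remove the debug line again once you have noted the ID.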

11 reactions on »Modifying robots.txt for individual sites of a multisite install«

  1. Hi Florian,

    Thank you for this solution. Can you please tell me what this line does? I'm not an expert in PHP.

    add_filter( 'robots_txt', 'fbn_custom_robots', 20, 2 );

    Thank you!
    Matt

      1. Thanks Florian,

        Where by chance would you add this code in a multisite setup? Is there a centralized file that would handle this for each site?

        1. Hi Matt,

          you’re welcome.

          I use it in a custom plugin that is enabled network-wide. I pasted the code into a (very basic) plugin file that you can use as a starting point: https://gist.github.com/florianbrinkmann/9236134e29c07bb10ae1932d93100984

          You could use it as a Must Use plugin (https://codex.wordpress.org/Must_Use_Plugins). For that, just upload it to the wp-content/mu-plugins folder via FTP (you may need to create the mu-plugins directory).

          Hope that helps,
          Florian

  2. Hi!

    $output .= "Disallow: /account/n";

    '/n' does not work for me when I want a new line. '\n' works:

    $output .= "Disallow: /account\n";

  3. This helped so just wanted to give you a solution to generate this dynamically for all your websites inside a multisite install 🙂

    function my_robots_txt( $output, $public ) {

        # Check if WooCommerce is active on the website for this request
        if ( ! class_exists( 'WooCommerce' ) ) {
            return $output;
        }

        # These are the endpoint names used by WooCommerce, not the real slugs of the pages
        $endpoints = [ 'cart', 'checkout', 'myaccount' ];

        foreach ( $endpoints as $endpoint ) {
            $woo_page_id = wc_get_page_id( $endpoint );

            # You need to get the slug => the slug field is "post_name"
            $slug = get_post_field( 'post_name', $woo_page_id );

            # Skip endpoints without an assigned page, otherwise the rule would be "Disallow: //"
            if ( empty( $slug ) ) {
                continue;
            }

            # Add the rule for this page
            $output .= "Disallow: /{$slug}/\n";
        }

        return $output;
    }
    add_filter( 'robots_txt', 'my_robots_txt', 10, 2 );
