WordPress creates a robots.txt dynamically. To override it in a normal non-multisite installation, you can just upload a static robots.txt to the server. On a multisite install, this would override the robots.txt for all sites, which is not always the desired behavior. This post explains how you can modify robots.txt for individual sites of a multisite.
WordPress comes with the robots_txt filter, which allows modifying the output of the dynamically created robots.txt. The function get_current_blog_id() returns the ID of the current multisite site, which we can use to add rules to the robots.txt of a particular site only. This is how it currently looks for my site:
/**
 * Modify robots.txt for main site and english site
 *
 * @param $output
 * @param $public
 *
 * @return string
 */
function fbn_custom_robots( $output, $public ) {
	$site_id = get_current_blog_id();
	if ( $site_id == 1 ) {
		$output .= "Disallow: /agb-und-widerruf/\n";
		$output .= "Disallow: /mein-konto/\n";
		$output .= "Disallow: /warenkorb/\n";
		$output .= "Disallow: /impressum-und-datenschutz/\n";
	} elseif ( $site_id == 11 ) {
		$output .= "Disallow: /account/\n";
		$output .= "Disallow: /cart/\n";
		$output .= "Disallow: /imprint/\n";
		$output .= "Disallow: /terms/\n";
	}
	return $output;
}
add_filter( 'robots_txt', 'fbn_custom_robots', 20, 2 );
For the site with the ID 1 (this is florianbrinkmann.com), four Disallow rules are added, and likewise for the site with the ID 11 (my English site florianbrinkmann.com/en).
To get the site’s ID, I just added the following line after $site_id = get_current_blog_id();:
$output .= $site_id;
This way, the ID of the current site is displayed when you visit its robots.txt.
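If more sites need their own rules, the if/elseif chain can be replaced by a lookup table. The following is only a minimal sketch of that idea and not the code running on the site described above: the helper fbn_robots_rules_for_site(), the callback fbn_custom_robots_from_map() and the $rules_by_site array are names made up for illustration, and the paths are simply the ones from the example above. The helper is plain PHP, so it can be tried without WordPress; the filter is only registered when add_filter() exists.

```php
<?php
/**
 * Build the extra robots.txt lines for one site.
 *
 * Hypothetical helper for illustration; not part of WordPress.
 *
 * @param int   $site_id       ID of the current site.
 * @param array $rules_by_site Map of site ID => list of paths to disallow.
 *
 * @return string Lines to append to the robots.txt output.
 */
function fbn_robots_rules_for_site( $site_id, array $rules_by_site ) {
	$lines = '';
	if ( isset( $rules_by_site[ $site_id ] ) ) {
		foreach ( $rules_by_site[ $site_id ] as $path ) {
			$lines .= "Disallow: {$path}\n";
		}
	}
	return $lines;
}

/**
 * Filter callback: append the rules for the current site, if any.
 */
function fbn_custom_robots_from_map( $output, $public ) {
	// Assumed site IDs and paths, taken from the example above.
	$rules_by_site = array(
		1  => array( '/agb-und-widerruf/', '/mein-konto/', '/warenkorb/', '/impressum-und-datenschutz/' ),
		11 => array( '/account/', '/cart/', '/imprint/', '/terms/' ),
	);
	return $output . fbn_robots_rules_for_site( get_current_blog_id(), $rules_by_site );
}

// Only register the filter inside WordPress, so the helper can be tested standalone.
if ( function_exists( 'add_filter' ) ) {
	add_filter( 'robots_txt', 'fbn_custom_robots_from_map', 20, 2 );
}
```

With this layout, adding rules for another site means adding one more entry to the array instead of another elseif branch. Sites without an entry get the unmodified output.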
Is it possible to use this function to modify the default WP robots.txt file?
Hi Adam,
sorry for the late reply, your comment was marked as spam…
To your question: yes, you can modify the default robots.txt output with that filter.
Greetings,
Florian
Hi Florian,
Thank you for this solution. Can you please tell me what this line does? I'm not an expert in PHP.
add_filter( 'robots_txt', 'fbn_custom_robots', 20, 2 );
Thank you!
Matt
Hi Matt,
that line adds the function as a filter to the robots_txt hook, which enables us to modify the robots.txt output. Without it, nothing would happen.
Best,
Florian
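For readers who are new to WordPress hooks, the four arguments of that add_filter() call can be annotated as in the following sketch. The callback fbn_example_robots() and the /example/ path are hypothetical and only serve as an illustration; the function_exists() guard is there so the snippet also runs outside WordPress.

```php
<?php
/**
 * Hypothetical callback, used only to illustrate the add_filter() arguments.
 *
 * @param string $output The robots.txt content generated so far.
 * @param mixed  $public Whether the site is set to be visible to search engines.
 *
 * @return string The extended robots.txt content.
 */
function fbn_example_robots( $output, $public ) {
	// A filter callback has to return the value, otherwise the output is lost.
	return $output . "Disallow: /example/\n";
}

// The guard only exists so the snippet can be run without WordPress.
if ( function_exists( 'add_filter' ) ) {
	add_filter(
		'robots_txt',         // Name of the hook to attach to.
		'fbn_example_robots', // Function that WordPress should call.
		20,                   // Priority: callbacks with lower numbers run first (default 10).
		2                     // Number of arguments passed to the callback (default 1).
	);
}
```

The last argument is 2 because the callback accepts both $output and $public; with the default of 1, only $output would be passed.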
Thanks Florian,
Where by chance would you add this code in a multisite setup? Is there a centralized file that would handle this for each site?
Hi Matt,
you’re welcome.
I use it in a custom plugin that is enabled network-wide. I pasted the code into a (very basic) plugin file that you can use as a starting point: https://gist.github.com/florianbrinkmann/9236134e29c07bb10ae1932d93100984
You could use it as a Must Use plugin (https://codex.wordpress.org/Must_Use_Plugins). For that, just upload it to the wp-content/mu-plugins folder via FTP (you may need to create the mu-plugins directory).
Hope that helps,
Florian
Hi!
$output .= "Disallow: /account/n";
The '/n' here does not work for me when I want a new line. It works with '\n':
$output .= "Disallow: /account\n";
Hi Dmitry,
yes, you are right, thanks!
Best,
Florian
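A short, WordPress-independent sketch of why this matters: PHP only interprets \n as a newline inside double-quoted strings. The variable names below are made up for the illustration.

```php
<?php
// Double quotes: "\n" is interpreted as a single newline character.
$with_newline = "Disallow: /account/\n";

// Single quotes: '\n' stays a literal backslash followed by "n".
$literal_backslash_n = 'Disallow: /account/\n';

// "/n" is not an escape sequence at all, just a slash followed by "n".
$slash_n = "Disallow: /account/n";

var_dump( substr( $with_newline, -1 ) === "\n" );        // bool(true)
var_dump( substr( $literal_backslash_n, -2 ) === '\n' ); // bool(true)
var_dump( substr( $slash_n, -2 ) === '/n' );             // bool(true)
```

Without a real newline at the end of each rule, the following rule would end up on the same line in the generated robots.txt.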
Thanks for this bit of code. Works nicely. I modified it a little, of course. Your info here got me at the right starting point.
Thanks.
Hi Dave,
great to hear that, you’re welcome!
Best,
Florian
This helped so just wanted to give you a solution to generate this dynamically for all your websites inside a multisite install 🙂
function my_robots_txt( $output, $public ) {
	# Check if WooCommerce is active on the website for this request
	if ( ! class_exists( 'WooCommerce' ) ) {
		return $output;
	}
	# These are the endpoint names used by WooCommerce, not the real slug of the page
	$endpoints = [ 'cart', 'checkout', 'myaccount' ];
	foreach ( $endpoints as $endpoint ) {
		$woo_page_id = wc_get_page_id( $endpoint );
		# wc_get_page_id() returns -1 if no page is assigned, so skip those
		if ( $woo_page_id < 1 ) {
			continue;
		}
		# You need to get the slug => the slug field is "post_name"
		$slug = get_post_field( 'post_name', $woo_page_id );
		# Add the rule for this page
		$output .= "Disallow: /{$slug}/\n";
	}
	return $output;
}
add_filter( 'robots_txt', 'my_robots_txt', 10, 2 );