Copying directory with many files in multiple steps via PHP

I am currently working on a side project that needs to create a copy of a WordPress installation. Because I do not know the server environments it will run on, I can rely neither on being able to run Linux commands nor on a generous PHP max_execution_time.

So copying needs to be done via PHP and also has to work when, for example, the maximum execution time for scripts is 30 seconds. Thirty seconds is not much for a script that should copy many files. Merely building the file list and looping over it, without copying anything, can already take longer.

Because of that, we need to do the copying in a way where we can save the current state when we are near the time limit, plan the next run, and exit. In the WordPress context, that is possible with WordPress cron events.
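To make the idea concrete, here is a minimal sketch of a loop that stops once a time budget is used up and reports where to resume. It is plain PHP without WordPress; the function name `process_with_budget()` and the numbers are assumptions for illustration, and the `while` loop only stands in for successive cron runs.

```php
<?php
/**
 * Process items until a time budget is used up.
 *
 * Returns the index to resume from, or null when all items are done.
 * Hypothetical helper, not part of the plugin code.
 */
function process_with_budget( array $items, int $start, float $budget_seconds, callable $work ): ?int {
	$started = microtime( true );
	$count   = count( $items );

	for ( $i = $start; $i < $count; $i++ ) {
		// Stop before starting new work once the budget is spent.
		if ( microtime( true ) - $started >= $budget_seconds ) {
			return $i;
		}

		$work( $items[ $i ] );
	}

	return null;
}

// Usage: each item takes about 20 ms, the budget is 50 ms per run.
$items  = range( 1, 10 );
$done   = [];
$resume = 0;
$runs   = 0;

// Each loop pass stands in for one scheduled run.
while ( null !== $resume ) {
	$resume = process_with_budget( $items, $resume, 0.05, function ( $item ) use ( &$done ) {
		usleep( 20000 );
		$done[] = $item;
	} );
	$runs++;
}
```

In a real setup, the returned index would be written to a status file before the script exits, and the next run would read it back instead of keeping it in a variable.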

The hard part was finding a way to exclude already copied files from the next run so that they are not iterated over again.

Symfony Finder component to the rescue

Fairly quickly, I came across the Finder component of Symfony. The component finds files and directories by various rules and can exclude directories from the search.

So in a few steps, my solution looks like the following:

  1. All files in the WordPress folder are collected, except those in the uploads directory.
  2. Copying starts.
  3. Once a folder has been copied completely, it is added to an array so that it can be excluded in the next run.
  4. Shortly before the timeout is reached, the current state is saved to a file, a WP cron event is scheduled, and the script exits.
  5. When the cron event fires, the files in the not-yet-processed directories are collected, and the process continues with step 2.
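The effect of excluding finished directories can be sketched without Symfony. The following example uses PHP's SPL iterators as a stand-in for the Finder's exclude(); it is not the Finder API, and `list_files()` as well as the demo paths are hypothetical. The point it shows: an excluded directory is never descended into, so its files cost no time in the next run.

```php
<?php
/**
 * List files below $root, skipping directories whose relative path is in $exclude.
 * Plain SPL stand-in for the Finder's exclude(); hypothetical helper.
 */
function list_files( string $root, array $exclude ): array {
	$root     = rtrim( str_replace( '\\', '/', $root ), '/' ) . '/';
	$dir_iter = new RecursiveDirectoryIterator( $root, FilesystemIterator::SKIP_DOTS );

	$filter = new RecursiveCallbackFilterIterator(
		$dir_iter,
		function ( SplFileInfo $current ) use ( $root, $exclude ) {
			$relative = substr( str_replace( '\\', '/', $current->getPathname() ), strlen( $root ) );

			// Rejecting a directory keeps the iterator from descending into it.
			return ! ( $current->isDir() && in_array( $relative, $exclude, true ) );
		}
	);

	$files = [];
	foreach ( new RecursiveIteratorIterator( $filter ) as $file ) {
		$files[] = substr( str_replace( '\\', '/', $file->getPathname() ), strlen( $root ) );
	}
	sort( $files );

	return $files;
}

// Usage: build a tiny tree in the temp directory and list it with two excludes.
$root  = sys_get_temp_dir() . '/copier-demo-' . uniqid();
$paths = [ 'index.php', 'wp-admin/about.php', 'wp-content/uploads/a.png', 'wp-content/plugins/p/p.php' ];
foreach ( $paths as $path ) {
	$full = "$root/$path";
	if ( ! is_dir( dirname( $full ) ) ) {
		mkdir( dirname( $full ), 0755, true );
	}
	file_put_contents( $full, 'x' );
}

$files = list_files( $root, [ 'wp-content/uploads', 'wp-admin' ] );
```

Here `wp-admin` plays the role of a directory that was fully copied in an earlier run, and `wp-content/uploads` the role of a default exclude.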

Here is my current code for that. It is not finished yet, but it works and should be enough for inspiration. Besides the Finder component, I also use the Symfony Filesystem component. The code also exists as a Gist.

<?php
/**
 * Main plugin code.
 *
 * @package FlorianBrinkmann\Copier
 */

namespace FlorianBrinkmann\Copier;

use Symfony\Component\Filesystem\Exception\IOExceptionInterface;
use Symfony\Component\Filesystem\Filesystem;
use Symfony\Component\Finder\Finder;

/**
 * Class Plugin
 *
 * @package FlorianBrinkmann\Copier
 */
class Plugin {
	/**
	 * Absolute path to the WordPress install.
	 *
	 * @var string
	 */
	private $abspath = '';

	/**
	 * Absolute path where we want to copy the files to.
	 *
	 * @var string
	 */
	private $dest = '';

	/**
	 * Name of destination directory.
	 *
	 * @var string
	 */
	private $dest_dir_name = '';

	/**
	 * Array of default directories to exclude.
	 *
	 * @var array
	 */
	private $default_exclude = [];

	/**
	 * Array of additional to exclude.
	 *
	 * @var array
	 */
	private $additional_exclude = [];

	/**
	 * Path to status file.
	 *
	 * @var string
	 */
	private $status_file;

	/**
	 * Current file index.
	 *
	 * @var string
	 */
	private $current_file_index = '';

	/**
	 * Current path.
	 *
	 * @var string
	 */
	private $current_path = '';

	/**
	 * Tables of WordPress install.
	 *
	 * @var array
	 */
	private $tables = [];

	/**
	 * The current step of the process.
	 *
	 * @var string
	 */
	private $step = '';

	/**
	 * Time limit in seconds. Default 30.
	 *
	 * @var int
	 */
	private $time_limit = 30;

	/**
	 * Unix timestamp of init event.
	 *
	 * @var int
	 */
	private $timer;

	/**
	 * Symfony filesystem object.
	 *
	 * @var Filesystem
	 */
	private $filesystem;

	/**
	 * List of directories and files.
	 *
	 * @var array
	 */
	private $files_list;

	public function init() {
		// Set timer.
		$this->timer = time();

		// Set filesystem property.
		$this->filesystem = new Filesystem();

		$this->step = 'file-list-creation';

		// Set dest folder for copying.
		$dest_dir_name = uniqid( 'uaas-copy-' );

		// Check if folder exists. If so, change name and check again.
		// @todo: Only check for a limited time.
		while ( $this->filesystem->exists( "$this->abspath/$dest_dir_name" ) ) {
			$dest_dir_name = uniqid( 'uaas-copy-' );
		}

		$this->dest_dir_name = $dest_dir_name;

		$this->dest = trailingslashit( "$this->abspath{$dest_dir_name}" );

		// Create file list for cloning.
		$this->default_exclude = [ 'wp-content/uploads', $this->dest_dir_name ];
		$this->create_file_list();

		// Create dest folder.
		$this->filesystem->mkdir( $this->dest, 0755 );

		// Create status file.
		$this->status_file = "{$this->dest}uaas-status.txt";
		$this->filesystem->dumpFile( $this->status_file, serialize( $this ) );

		// We have the files in the files_list property now, so we can clone!
		$this->step = 'copying-files';
		$this->copy_files();
	}

	/**
	 * Continue copying.
	 */
	public function continue_copying() {
		// Check for step.
		if ( $this->step !== 'copying-files' ) {
			return false;
		}

		// Set timer.
		$this->timer = time();

		// Update file list.
		$this->create_file_list();

		$this->copy_files();
	}

	/**
	 * Create a list of the files that we want to copy.
	 */
	private function create_file_list() {
		$finder = new Finder();

		// @todo: check for modified uploads destination.
		// @todo: ignore other directories like cache, backups, …
		$exclude = array_merge( $this->default_exclude, $this->additional_exclude );
		$finder->files()->in( $this->abspath )->exclude( $exclude )->notPath( '/.*\/node_modules\/.*/' );

		if ( ! $finder->hasResults() ) {
			// @todo: Add error, nothing found.

			return;
		}

		$this->files_list = $finder;
	}

	/**
	 * Copy the files.
	 */
	private function copy_files() {
		// Loop the files list.
		$already_processed = true;
		foreach ( $this->files_list as $index => $file ) {
			// Check if we already had that file.
			if ( $this->current_file_index === '' ) {
				$already_processed = false;
			} else if ( $this->current_file_index === $index ) {
				$already_processed = false;
			}

			if ( $already_processed ) {
				continue;
			}

			$tmp = str_replace( '\\', '/', $file->getRelativePath() );

			// Check if current path is not empty.
			if ( $this->current_path !== '' ) {
				// Now check if the path of prev file
				// does not exist in path of current file.
				if ( strpos( $tmp, $this->current_path ) !== 0 ) {
					$pushed_to_array = false;

					// Add prev path to exclude in conditional cases.
					// Add it if is wp-admin folder or a direct subfolder.
					if ( strpos( $this->current_path, 'wp-admin' ) === 0 && substr_count( $this->current_path, '/' ) <= 1 ) {
						array_push( $this->additional_exclude, untrailingslashit( $this->current_path ) );
						$pushed_to_array = true;
					}

					// Add it if is wp-content folder or up to two levels deeper.
					if ( strpos( $this->current_path, 'wp-content' ) === 0 ) {
						// Get first three directories of paths.
						// https://stackoverflow.com/a/1935929/7774451
						$path_parts = explode( '/', $this->current_path, 4 );
						if ( isset ( $path_parts[3] ) ) {
							unset( $path_parts[3] );
						}

						$tmp_path_parts = explode( '/', $tmp, 4 );
						if ( isset ( $tmp_path_parts[3] ) ) {
							unset( $tmp_path_parts[3] );
						}

						// If both arrays would be the same, that means we are deeper than three subdirs.
						// CHECK WHY THAT DOES NOT WORK FOR ANTISPAM-BEE AND ANTISPAM-BEE-3-0
						if ( $path_parts !== $tmp_path_parts ) {
							// Push the path from $path_parts.
							array_push( $this->additional_exclude, implode( '/', $path_parts ) );
							$pushed_to_array = true;
						}
					}

					// Remove entries from exclude that are covered by more general rules.
					if ( $pushed_to_array ) {
						$filtered = array_filter( $this->additional_exclude, function( $var ) {
							// Check if $var is equal with current_path.
							if ( $var === untrailingslashit( $this->current_path ) ) {
								return true;
							}

							// If $var contains $this->current_path, remove it.
							if ( strpos( $var, untrailingslashit( $this->current_path ) ) === 0 ) {
								return false;
							}

							return true;
						} );

						$this->additional_exclude = $filtered;
					}
				}
			}

			$this->current_path = str_replace( '\\', '/', $file->getRelativePath() );

			$absolute_file_path = str_replace( '\\', '/', $file->getRealPath() );

			// Create dest path for file.
			$dest_file_path = str_replace( $this->abspath, $this->dest, $absolute_file_path );

			// Copy the file.
			try {
				$this->filesystem->copy( $absolute_file_path, $dest_file_path );
			} catch ( IOExceptionInterface $exception ) {
				// @todo: make something with the error.
			}

			// Check if we are near the self-defined script timeout limit.
			if ( time() - $this->timer >= $this->time_limit - 1 ) {
				$this->current_file_index = $index;

				// Store the current object in a file.
				$this->filesystem->dumpFile( $this->status_file, serialize( $this ) );

				// Add a new cron event in the near future to continue the copying.
				wp_schedule_single_event( time(), 'flobn_uaas_continue_copying', [ $this->status_file ] );

				// Exit.
				exit();
			}
		}

		error_log( 'Finished file copying' );
	}

	public function set_abspath( string $abspath ) {
		// Replace backslash with slash.
		$abspath = str_replace( '\\', '/', $abspath );

		$this->abspath = trailingslashit( $abspath );
	}
}

To start the process, we need to run the following code (the class file needs to be included, too, for example, via the Composer autoloader):

namespace FlorianBrinkmann\Copier;

// Load Composer autoloader. From https://github.com/brightnucleus/jasper-client/blob/master/tests/bootstrap.php#L55-L59
$autoloader = dirname( __FILE__ ) . '/vendor/autoload.php';
if ( is_readable( $autoloader ) ) {
	require_once $autoloader;
}

// Create Plugin object.
$uaas = new Plugin();

// Set abspath property.
$uaas->set_abspath( ABSPATH );

$uaas->init();

Now the explanation of the most important parts:

  • Line 120: In init(), the timer property is set to the current timestamp so that we can check the script runtime later.
  • Line 128: A directory name for the target folder is generated, and we check whether it already exists.
  • Lines 141 and 142: The directories excluded by default are the uploads folder and the target folder. After that, the file list is generated.
  • Lines 177-192: create_file_list() runs the Symfony Finder in line 183. In addition to our ignore rules and the ones the Finder component applies by default (version control directories, for example), we also exclude node_modules folders. In line 191, the Finder object is stored in the files_list property.
  • Lines 148 and 149: We create a status file in the target directory and save the current object ($this) to it.
  • Line 153: We start copying.
  • Line 197 and following: The copy_files() method copies the files. We loop over the file list and skip every file until we reach the index of the last processed file (the files are always in the same order, and the index of the current file is set in line 288). All files before that index can be ignored because they were already processed.
  • Starting in line 212, we check for fully processed directories. We cannot exclude every finished folder, because that would crash the Finder. We limit the excludes to the wp-admin folder and its direct subdirectories, and to the wp-content folder and up to two levels of subdirectories, for example wp-content/plugins/antispam-bee.
  • In lines 252 to 268, we check whether $this->additional_exclude contains folders that are already covered by other entries and remove them. That would be the case for plugin folders after the whole wp-content/plugins directory is processed.
  • In line 281, the file is copied.
  • In lines 287 to 289, we check whether we are within one second of the time limit. If so, the index of the currently processed file is saved in current_file_index, and the current object is written to our status file. After that, we schedule the cron event that runs the flobn_uaas_continue_copying hook and receives the path to the status file as a parameter.
  • Lines 159-172: The function hooked to flobn_uaas_continue_copying calls the continue_copying() method. In it, we check whether step is copying-files. If so, we set the timer, create our file list (create_file_list() also skips the already processed directories) and continue copying.
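The directory checks compare paths with strpos(), which matches plain string prefixes: wp-content/plugins/antispam-bee-3-0 begins with wp-content/plugins/antispam-bee, although it is a different folder. This may be what the CHECK WHY comment in the code is about, but that is an assumption. A minimal sketch of a segment-aware comparison and of the pruning step follows; `is_same_or_inside()` and `prune_excludes()` are hypothetical helpers, not part of the plugin.

```php
<?php
/**
 * Check whether $path equals $dir or lies below it, comparing whole path segments.
 * Hypothetical helper, not part of the plugin code.
 */
function is_same_or_inside( string $path, string $dir ): bool {
	// A trailing slash on both sides prevents "antispam-bee" from matching "antispam-bee-3-0".
	$path = rtrim( $path, '/' ) . '/';
	$dir  = rtrim( $dir, '/' ) . '/';

	return strpos( $path, $dir ) === 0;
}

/**
 * Drop exclude entries that are already covered by a more general entry.
 */
function prune_excludes( array $excludes ): array {
	$excludes = array_values( array_unique( $excludes ) );

	return array_values( array_filter( $excludes, function ( $candidate ) use ( $excludes ) {
		foreach ( $excludes as $other ) {
			if ( $other !== $candidate && is_same_or_inside( $candidate, $other ) ) {
				return false;
			}
		}

		return true;
	} ) );
}

// Usage: the plugin folder is dropped once the whole plugins directory is excluded.
$pruned = prune_excludes( [ 'wp-content/plugins/antispam-bee', 'wp-content/plugins', 'wp-admin/css' ] );
```

Keeping the exclude list short matters here because the list is passed to the Finder again on every run.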

The action that is called by the cron event looks like this:

add_action( 'flobn_uaas_continue_copying', function( $status_file ) {
	// Get status file.
	$status_file_contents = file_get_contents( $status_file );

	// unserialize object.
	$uaas = unserialize( $status_file_contents );

	// Continue with copying.
	$uaas->continue_copying();
} );

We read the content of the status file, unserialize the object, and call its continue_copying() method.
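The save-and-restore round trip can also be tried in isolation. The sketch below stores only the values needed to resume as JSON. Note that this is a swapped-in variant for illustration: the code in this post serializes the whole object instead, and `save_state()`, `load_state()` and the array keys are hypothetical.

```php
<?php
/**
 * Write the resumable state to a file as JSON.
 * Hypothetical helper, not part of the plugin code.
 */
function save_state( string $file, array $state ): void {
	file_put_contents( $file, json_encode( $state ), LOCK_EX );
}

/**
 * Read the state back, or return null if the file is missing or not valid JSON.
 */
function load_state( string $file ): ?array {
	if ( ! is_readable( $file ) ) {
		return null;
	}

	$state = json_decode( file_get_contents( $file ), true );

	return is_array( $state ) ? $state : null;
}

// Usage: one run saves its position, the next run picks it up.
$status_file = tempnam( sys_get_temp_dir(), 'uaas-status-' );

save_state( $status_file, [
	'step'               => 'copying-files',
	'current_file_index' => '/var/www/wp-includes/version.php',
	'additional_exclude' => [ 'wp-admin' ],
] );

$state = load_state( $status_file );
```

With this variant, the next run would have to rebuild the Finder and Filesystem objects itself from the restored values.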

And that is it, we have a process that even with a short execution time can copy large folders 🎉

If copying a single file takes a long time, that could of course still lead to a timeout error, because the check is done after each file is copied and not during the copy. But it should be a good starting point.
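One possible way around that limitation is to copy large files in chunks and check the clock between chunks. The following is only a sketch with plain PHP streams, not something the code above does; `copy_chunked()`, its parameters and the chunk size are assumptions, and partial writes are not handled.

```php
<?php
/**
 * Copy $source to $dest in chunks, starting at byte $offset.
 *
 * Returns the offset to resume from, or null when the file is complete.
 * Hypothetical helper, not part of the plugin code.
 */
function copy_chunked( string $source, string $dest, int $offset, float $deadline, int $chunk_size = 1048576 ): ?int {
	$in = fopen( $source, 'rb' );
	// Mode "c" opens for writing without truncating, so a partial copy can be continued.
	$out = fopen( $dest, 'cb' );
	if ( false === $in || false === $out ) {
		throw new RuntimeException( "Could not open $source or $dest" );
	}

	fseek( $in, $offset );
	fseek( $out, $offset );

	while ( ! feof( $in ) ) {
		// Check the clock before each chunk instead of once per file.
		if ( microtime( true ) >= $deadline ) {
			fclose( $in );
			fclose( $out );

			return $offset;
		}

		$data = fread( $in, $chunk_size );
		if ( false === $data ) {
			throw new RuntimeException( "Could not read from $source" );
		}

		fwrite( $out, $data );
		$offset += strlen( $data );
	}

	fclose( $in );
	fclose( $out );

	return null;
}

// Usage: continue a copy whose first four bytes were written in an earlier run.
$source = tempnam( sys_get_temp_dir(), 'src-' );
$dest   = tempnam( sys_get_temp_dir(), 'dst-' );
file_put_contents( $source, '0123456789' );
file_put_contents( $dest, '0123' );

$next = copy_chunked( $source, $dest, 4, microtime( true ) + 5, 4 );
```

The returned offset would have to be stored in the status file together with the current file index, so that the next run can continue the same file.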
