background image
Eric's picture

In this code snippet, I'll show how you can parse a (large) CSV file using Drupal's Batch API. The purpose of batching an operation is to avoid PHP memory limits and time outs. Before you begin, I recommend reviewing the following two articles. Be sure to review the additional batch parameters outlined in the documentation, you might need to use them.

Batch API
Batch Operations

We can start by defining the arguments that will be passed into the batch_set() function. For this example, I added this code in an arbitrary page callback function.

<?php
function MYMODULE_callback_csv_import() {

 
// define path to CSV file
 
$csv_file_path = file_directory_path() . '/import_path/myfile.csv';

 
// define a redirect path upon batch completion
 
$redirect_path = 'admin/import-csv';

 
// define batch array structure
  // NOTE: minimal parameters defined to simplify code
 
$batch = array(
   
'title' => t('Reading File'),
   
'operations' => array(
      array(
       
'_MYMODULE_batch_read', array($csv_file_path),
      ),
    ),
  );

 
// set batch
 
batch_set($batch);

 
// process batch
 
batch_process($redirect_path);

}
?>

Next, we'll define the batch callback function. This function will be called repeatedly until the $context['finished'] variable is set to "1".

<?php
function _MYMODULE_batch_read($csv_file_path, &$context) {
 
 
// define batch limit
 
$batch_limit = 100;

 
// assume the batch process has not completed
 
$context['finished'] = 0;

 
// open the file for reading
 
$file_handle = fopen($csv_file_path, 'r');

 
// check if file pointer position exists in the sandbox, and jump to location in file
 
if ($context['sandbox']['file_pointer_position']) {
   
fseek($file_handle, $context['sandbox']['file_pointer_position']);
  }
 
 
// loop through the file and stop at batch limit
 
for ($i = 0; $i < $batch_limit; $i++) {

   
// get file line as csv
   
$csv_line = fgetcsv($file_handle);

   
// NOTE: at this point, do what ever you'd like with the CSV array data!
   
if (is_array($csv_line)) {
     
// db_query(), etc   
   
}

   
// retain current file pointer position
   
$context['sandbox']['file_pointer_position'] = ftell($file_handle);

   
// check for EOF
   
if (feof($file_handle)) {
     
// complete the batch process
     
$context['finished'] = 1;

     
// end loop
     
break;
    }

  }

}
?>

The batch operation will be called until the end of the CSV file is reached. The $context variable is passed by reference into the batch callback so you can maintain data through each iteration; in this case, the position of the file pointer. When the batch operation is complete, the user will be redirected to the batch_process() path argument.

It's important to read the full Batch API documentation so you can take advantage of its additional features: finished callback, init_message, progress_message, error_message, etc.

Thanks for the code, it was

Thanks for the code, it was extremely useful and allowed me to meet an unexpected deadline. I made an addition to this script that would allow a csv file to be uploaded from a form instead of a hard coded file path.

Create the drupal form.

<?php
function MYMODULE_form() {
  
$form = array();
  
  
$form['#attributes'] = array('enctype' => "multipart/form-data");
  
  
$form['csv'] = array(
    
'#type' => 'file',
    
'#title' => t('Select a CSV File'),
    
'#size' => 40,
   );
  
  
$form['import']['submit'] = array(
    
'#type' => 'submit',
    
'#value' => t('Submit'),
   );
  
   return
$form;
}
?>

Validate the form and make sure the file field is not blank.

<?php
function MYMODULE_form_validate($form, &$form_state) {
  
$file = file_save_upload('csv');
    if (!
$file)
     
form_set_error('upload', 'You must select a valid file to upload.');
    else {
     
// Manually add the uploaded file to the $form_state
     
$form_state['values']['csv']['title'] = $file->filename;
     
$form_state['values']['csv']['file'] = $file;
    }
}
?>

The submit handler will initialize and start the batch script

<?php
function MYMODULE_form_submit($form, &$form_state) {
  
$csv_file_path = $form_state['values']['csv']['file']->filepath;
  
  
// define batch array structure
   // NOTE: minimal parameters defined to simplify code
  
$batch = array(
   
'title' => t('Reading File'),
   
'operations' => array(
      array(
       
'_MYMODULE_batch_read', array($csv_file_path),
      ),
    ),
   );
 
 
// set batch
 
batch_set($batch);
}
?>

next use the _MYMODULE_batch_read from the main post.