This topic is locked

Guide 30 - Load very fast and online data

12/29/2021 7:09:43 PM
PHPRunner Tips and Tricks
fhumanes author

img alt
There are many phprunner projects that are functionalities added to large CRM systems, etc. And sometimes it requires that we copy an important part of information to the system developed in Phprunner.

Also, as is the case of the example that you bring you, OPEN DATA data from public administrations are used to complement information from our systems.

The example that I have made is the dynamic capture of COVID information of all the countries of the world and the regions of the countries of the European Union.

The example makes:

  • The first user who connects the day, starts the refreshment of the information published by the EU with the system that explains below.
  • The other users of the day, use the information that has previously been loaded.

The refreshment process does (on my pc takes less than 20 seconds):

  • Block the tables involved so that no one will access until completing the data refreshment process.
  • Clean the data storage tables.
  • Download the EU website of the 2 files (about 20 MB of information and some 75,000 records). They are JSON files
  • Convert those JSON files into CSV files.
  • Load the new data with the command "Load Data Infilt".
  • Unlock the taxes involved.
  • Refresh the page to see the new data.

Objetive

Explain an online data refresh method where a considerable volume of information is involved in a very small time.

DEMO: https://fhumanes.com/covid/

You can access all the information and the example code, accessing my portal.

fhumanes author 12/29/2021

This solution is oriented to MySQL, since many points of the solution depend on this database manager.

Things we have to take into account are:

  • We have to use tables with MYISAM storage. This is because they are the fastest in creating, erased and do not affect the rest of the information of the database.
  • The information load command must be LOAD DATA INFILE. You have to verify that your manager has enabled this function or that you can enable it. In the hosting where I publish is not enabled and I would have to pay a lot to enable it.
  • It is tremendously quick to load with this utility. To load, it requires that the data file from the CSV type.
  • I use the "Convert JSON TO CSV" library to pass the data from the original format to the format to load.
  • I use the UNIREST library to recover the JSON files from the EU website.

The screen that appears when refreshing the data is:

img alt
The Snippet Code of the load "Banner" is:

// Control of whether it has already been reviewed today
$rs = DB::Query("SELECT * FROM covid_control");
$last_update = date("Y-m-d", strtotime($rs->value("update_date")));
$id_covid_control = $rs->value("id");

$now = date('Y-m-d');

if ( $last_update <> $now ) { // We must produce the update of the data tables
$html = <<<EOT
<script src="MyCode/bootstrap-waitingfor/bootstrap-waitingfor.js"></script>
<script>

$.ajax({
type: "POST", //we are using POST method to submit the data to the server side
url: 'MyCode/ajax_refresh.php', // get the route value
beforeSend: function () {//We add this before send to disable the button once we submit it so that we prevent the multiple click
waitingDialog.show('Analyzing if it is necessary to refresh the data and charge them...',{
// if the option is set to boolean false, it will hide the header and "message" will be set in a paragraph above the progress bar.
// When headerText is a not-empty string, "message" becomes a content above the progress bar and headerText string will be set as a text inside the H3;
headerText: 'Refreshing European Union Covid data',
// this will generate a heading corresponding to the size number
headerSize: 3,
// extra class(es) for the header tag
headerClass: '',
// bootstrap postfix for dialog size, e.g. "sm", "m"
dialogSize: 'm',
// bootstrap postfix for progress bar type, e.g. "success", "warning";
progressType: '',
// determines the tag of the content element
contentElement: 'p',
// extra class(es) for the content tag
contentClass: 'content'
});
},
success: function (response) {//once the request successfully process to the server side it will return result here
// waitingDialog.hide();
},
complete: function() {
waitingDialog.hide();
location.reload(); // Reload page LIST
},
error: function (XMLHttpRequest, textStatus, errorThrown) {
// You can put something here if there is an error from submitted request
}
});
</script>

EOT;
echo $html;
}

The information recharge code "ajax_refresh.php" is:

<?php

$debugCode = false;

function error($number,$text){ // Function to produce the error file
global $debugCode;
if ($debugCode == true ) {
$ddf = fopen(__DIR__ .'/../error.log','a');
fwrite($ddf,"[".date("r")."] Error $number: $text\r\n");
fclose($ddf);
}
}

require_once("../include/dbcommon.php"); // DataBase PHPRunner

require_once __DIR__ . '/json-csv_0.4.0/autoload.php';
require_once __DIR__ . '/unirest_3.0.4/autoload.php';
use OzdemirBurak\JsonCsv\File\Json;

// Control of whether it has already been reviewed today
$rs = DB::Query("SELECT * FROM covid_control");
$last_update = date("Y-m-d", strtotime($rs->value("update_date")));
$id_covid_control = $rs->value("id");

$now = date('Y-m-d');

if ( $last_update <> $now ) { // We must produce the update of the data tables

error(1,'Hay que recoger los datos'); // To debug the code on the server
// Exclusive blockade of updated resources
DB::Exec("LOCK TABLE covid_country WRITE, covid_subcountry_cases WRITE, covid_control WRITE");
// Update the Control Registry
$data = array();
$keyvalues = array();
$data["update_date"] = $now;
$keyvalues["id"] = $id_covid_control;
DB::Update("covid_control", $data, $keyvalues );
// Clean all data from tables
DB::Exec("TRUNCATE table covid_country");
DB::Exec("TRUNCATE table covid_subcountry_cases");

error(2,'Se ha borrado todos los datos anteriores'); // To debug the code on the server

// Collect new data from the website of the European Union
$file_1_json = tempnam(sys_get_temp_dir(), 'json'); // Create temporal file
$response = Unirest\Request::get("https://opendata.ecdc.europa.eu/covid19/nationalcasedeath/json/",
array(
"X-RapidAPI-Host" => "opendata.ecdc.europa.eu",
)
);
$a = $response->code; // HTTP Status code
$b = $response->headers; // Headers
// $c = $response->body; // Parsed body
// $d = $response->raw_body; // Unparsed body
file_put_contents($file_1_json, $response->raw_body);

error(3,'Se ha descargado el primer fichero "country" '.$file_1_json); // To debug the code on the server

$file_2_json = tempnam(sys_get_temp_dir(), 'json'); // Create temporal file
$response = Unirest\Request::get("https://opendata.ecdc.europa.eu/covid19/subnationalcaseweekly/json/",
array(
"X-RapidAPI-Host" => "opendata.ecdc.europa.eu",
)
);
$a = $response->code; // HTTP Status code
$b = $response->headers; // Headers
// $c = $response->body; // Parsed body
// $d = $response->raw_body; // Unparsed body
file_put_contents($file_2_json, $response->raw_body);

error(4,'Se ha descargado el segundo fichero "subcountry "'.$file_2_json); // To debug the code on the server

// JSON to CSV
$file_1_csv = tempnam(sys_get_temp_dir(), 'csv');
$file_2_csv = tempnam(sys_get_temp_dir(), 'csv');

$json = new Json($file_1_json);
// To convert JSON to CSV string
$csvString = $json->convert();
// To set a conversion option then convert JSON to CSV and save
$json->setConversionKey('utf8_encoding', true);
$json->convertAndSave($file_1_csv);

error(5,'Se ha obtenido el 1º fichero CSV'.$file_1_csv); // To debug the code on the server

$json = new Json($file_2_json);
// To convert JSON to CSV string
$csvString = $json->convert();
// To set a conversion option then convert JSON to CSV and save
$json->setConversionKey('utf8_encoding', true);
$json->convertAndSave($file_2_csv);

error(6,'Se ha obtenido el 2º fichero CSV'.$file_2_csv); // To debug the code on the server

//Read the first line that contains the names of the fields
$handle = fopen($file_1_csv, "r");
$line_1 = fgets($handle);
fclose($handle);
$handle = fopen($file_2_csv, "r");
$line_2 = fgets($handle);
fclose($handle);

//Replace fields that do not want to be loaded
$line_1 = str_replace("source", "@dummy", $line_1);
$line_1 = str_replace(";", ",", $line_1);
$line_1 = trim($line_1);
$line_1 = substr($line_1, 3); //The first character is eliminated that I do not know how to identify

$line_2 = str_replace("source", "@dummy", $line_2);
$line_2 = str_replace(";", ",", $line_2);
$line_2 = trim($line_2);
$line_2 = substr($line_2, 3); //The first character is eliminated that I do not know how to identify

error(7,'Se ha obtenido las columnas: '.$line_1.' - '.$line_2); // To debug the code on the server

$file_1_csv = str_replace("\\", "/", $file_1_csv);
$load_data_1 = <<<EOT
LOAD DATA LOCAL INFILE '$file_1_csv'
INTO TABLE covid_country
FIELDS TERMINATED BY ';' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\\n'
IGNORE 1 ROWS
($line_1)
EOT;
mysqli_options($conn, MYSQLI_OPT_LOCAL_INFILE, true); // Necessary to produce the load from "Inline" file
DB::Exec($load_data_1);

error(8,'Se ha hecho el 1º LOAD DATA'.$load_data_1); // To debug the code on the server

$file_2_csv = str_replace("\\", "/", $file_2_csv);
$load_data_2 = <<<EOT
LOAD DATA LOCAL INFILE '$file_2_csv'
INTO TABLE covid_subcountry_cases
FIELDS TERMINATED BY ';' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\\n'
IGNORE 1 ROWS
($line_2)
EOT;
DB::Exec($load_data_2);

error(9,'Se ha hecho el 2º LOAD DATA'.$load_data_2); // To debug the code on the server

// Delete file auxiliary
unlink($file_1_json);
unlink($file_2_json);
unlink($file_1_csv);
unlink($file_2_csv);
// Exclusive blockade of updated resources
DB::Exec("UNLOCK TABLES");
}

If you set, at the beginning I define an "error" function that I use to generate debug traces when I do not have full access to the server (when I am in hosting). You can use it to debug your codes.

For any questions or information consult me on my email fernandohumanes@gmail.com

I leave the project and the database so you can install it in your Windows and review or change what you want.

fhumanes author 1/19/2022

img alt
Yesterday I saw that this system was failing and I have reviewed it to identify what the problem was.

The problem is that in the JSON file of the world's Covid data they had added an additional field "note" and said field did not exist in the database. Since I didn't want to load it, I renamed it to “DUMMY” and the problem has been solved.

Taking advantage of the change, I have added a new functionality and it is the calculation of the increase or decrease in the contagion rate of the last 14 days.

img alt
This data is calculated "flight" recovering the coefficient of the previous week. The system used is important (because it is efficient) and I recommend it if you have a similar situation in your application.

The article describing the solution has been updated. Click this link to access it .