Technical decisions
The choice to use PHP on traditional LAMP-stack shared hosting was primarily a cost-driven one; nevertheless, it proved to be more than adequate seeing as Formulary is a straightforward, read-heavy app with a read-write ratio in excess of 2000:1.
Initially, the delta-sync endpoint was implemented in code, approximately as follows;
$DELETED = -1;
$HASH_CHANGED = 1;
$NEW = 2;
/*[{
"drug_id": 1,
"hash": "4Aid+/cssFK2Dv0lPI3Qv8jH/uKYnM4zNbglWUvqh2o="
},{
"drug_id": 2,
"hash": "ew6IP+Zxy25GUCFR3vuGU168w2RI7gpAFGJxwLL9LeI="
}]*/
$output = [];
$hashes = json_decode($request->getContent(), true);
foreach ($hashes as $hash_entry) {
// Check in DB whether the particular id - hash combo exists; get by ID first
$drug = DrugCurrentVersion::whereDrugId($hash_entry['drug_id'])->take(1)->get();
if ($drug->isEmpty()) {
array_push($output, array('drug_id' => $hash_entry['drug_id'], 'status' => $DELETED));
} else {
$drug = $drug[0];
$drug_output = [];
if (base64_encode(hex2bin($drug->hash)) !== $hash_entry['hash']) {
foreach (self::DRUG_FIELDS as $drug_field) {
$drug_output[$drug_field] = $drug[$drug_field];
}
$drug_output['hash'] = base64_encode(hex2bin($drug['hash']));
array_push($output, array('drug_id' => $hash_entry['drug_id'], 'status' => $HASH_CHANGED, 'drug' => $drug_output));
}
}
}
// get all the new drugs
$new_drugs = DrugCurrentVersion::query()->whereNotIn('drug_id', array_column($hashes, 'drug_id'))->get();
foreach ($new_drugs as $new_drug) {
$drug_output = [];
foreach (self::DRUG_FIELDS as $drug_field) {
$drug_output[$drug_field] = $new_drug[$drug_field];
}
$drug_output['hash'] = base64_encode(hex2bin($new_drug['hash']));
array_push($output, array('drug_id' => $new_drug['drug_id'], 'status' => $NEW, 'drug' => $drug_output));
}
return response()->json($output);
The main bottleneck was doing
foreach ($hashes as $hash_entry) {
$drug = DrugCurrentVersion::whereDrugId($hash_entry['drug_id'])->take(1)->get();
...
followed by the cloning of the returned entities.
One straightforward optimization could have been to create a HASH
index on the primary id
key, or perhaps load the entire drugs table into memory at once. The ultimate solution was to offload this "diffing" process to MySQL for the following reasons:
- The MySQL server was higher-specced and would likely handle concurrency better than
php-fpm
. - Our problem is easily expressed in terms of
JOIN
s; the code is perfectly concise despite manually written SQL queries. - The entire drugs table wouldn't need to be loaded into PHP memory all at once.
public function drugSyncChangeState(Request $request)
{
$DELETED = -1;
$HASH_CHANGED = 1;
$NEW = 2;
$hashes = $request->input("hashes");
$lastSync = $request->input("lastSync");
$forceSyncAt = SysKV::query()->whereKey("force_sync_at")->first();
$forceSync = false;
if (!empty($lastSync) && !empty($forceSyncAt)) {
$lastSync = Carbon::parse($lastSync);
$forceSyncAt = Carbon::parse($forceSyncAt->value);
$forceSync = $lastSync <= $forceSyncAt;
}
$changedState = DB::transaction(function () use ($hashes, $forceSync, $DELETED, $HASH_CHANGED, $NEW) {
/**
* Prepare temporary state table.
*/
$tempTableName = str_replace('.', '', uniqid("drugsync_temp_", true));
DB::statement("
CREATE TEMPORARY TABLE $tempTableName (
drug_id INT UNSIGNED,
hash BINARY(32),
PRIMARY KEY (drug_id)
);
");
$hashInsert = [];
foreach ($hashes as $id => $hash) {
$hashInsert[] = ['drug_id' => $id, 'hash' => $this->hex2bin($hash)]; //If hash is not valid hex, will sync it.
}
DB::table($tempTableName)->insert($hashInsert);
/**
* SQL for finding deleted drugs.
* @noinspection SqlResolve
*/
$deletedSQL = "
SELECT $DELETED AS change_status, COALESCE(t.drug_id, d.drug_id) as drug_id, d.hash
FROM $tempTableName t
LEFT JOIN (select * from drugs_current d where deleted_at IS NULL) d ON t.drug_id = d.drug_id
WHERE d.drug_id IS NULL
";
/**
* SQL for finding new drugs.
* @noinspection SqlResolve
*/
$newSQL = "
SELECT $NEW as change_status, COALESCE(t.drug_id, d.drug_id) as drug_id, d.hash
FROM $tempTableName t
RIGHT JOIN (select * from drugs_current d where deleted_at IS NULL) d
ON t.drug_id = d.drug_id
WHERE t.drug_id IS NULL
";
/**
* SQL for finding changed drugs.
*/
$changedHashSQL = "
SELECT $HASH_CHANGED AS change_status, d.drug_id, d.hash
FROM (select * from drugs_current d where deleted_at IS NULL) d
WHERE drug_id IN (SELECT DISTINCT drug_id FROM $tempTableName)
AND NOT EXISTS(SELECT 1
FROM $tempTableName
WHERE $tempTableName.drug_id = d.drug_id
AND $tempTableName.hash = d.hash);
";
/**
* SQL that just forces sync for existing drugs.
*/
$skipHashCheckSQL = "
SELECT $HASH_CHANGED AS change_status, d.drug_id, d.hash
FROM (select * from drugs_current d where deleted_at IS NULL) d
WHERE drug_id IN (SELECT DISTINCT drug_id FROM $tempTableName);
";
$step3 = $forceSync ? $skipHashCheckSQL : $changedHashSQL;
$results = DB::select("
$deletedSQL
UNION
$newSQL
UNION
$step3
");
DB::statement("DROP TABLE $tempTableName;"); // Drop the temporary state table.
return $results; //Return and end transaction.
});
$results = [];
foreach ($changedState as $row) {
$results[] = [
'drug_id' => (int)$row->drug_id,
'status' => (int)$row->change_status
];
}
return response()->json($results);
}
Metrics
I conducted a load test using Gatling with the following parameters; 20 concurrent users per second, over 30 seconds, with a 1 second pause between requests: ScenarioBuilder scn = scenario("RecordedSimulation")
.exec(
http("request_0")
.post("/api/app/v1/drugs/syncState")
.headers(headers_0)
.body(RawFileBody("recordedsimulation/0000_request.json"))
).pause(1);
setUp(scn.injectOpen(constantUsersPerSec(20).during(30))).protocols(httpProtocol);
A mean response time of 562 ms was achieved, with each request body containing 1372 drug entity ID-hash pairs.
================================================================================
---- Global Information --------------------------------------------------------
> request count 600 (OK=600 KO=0 )
> min response time 374 (OK=374 KO=- )
> max response time 1465 (OK=1465 KO=- )
> mean response time 562 (OK=562 KO=- )
> std deviation 255 (OK=255 KO=- )
> response time 50th percentile 432 (OK=432 KO=- )
> response time 75th percentile 522 (OK=522 KO=- )
> response time 95th percentile 1107 (OK=1107 KO=- )
> response time 99th percentile 1227 (OK=1227 KO=- )
> mean requests/sec 18.75 (OK=18.75 KO=- )
---- Response Time Distribution ------------------------------------------------
> t < 800 ms 473 ( 79%)
> 800 ms <= t < 1200 ms 114 ( 19%)
> t >= 1200 ms 13 ( 2%)
> failed 0 ( 0%)
================================================================================
The figure of 5.5K MAU is arrived at by:
- ~2.5K users are from the Statistics section of the Google Play Console, Monthly Active Users.
- ~1K of users, comprising 33% of the entire iOS userbase, opted in to share their usage information, which was taken from the Metrics section in Apple's App Store Connect, ACTIVE LAST 30 DAYS.
It was known beforehand that iOS, in particular iPad users, outnumber Android users among the target demographic, hence extrapolating to 3K is a reasonable assumption.