PHP / GC Story, Part 4) Memory leaks and variable data that cannot be released

Sep 8, 2020 PHP memory management GC Garbage Collection

## Preface

- Every article in this series summarizes what I studied for my own purposes and includes my subjective views. Please treat it as reference material only. If you find any incorrect information, I would be very grateful if you let me know.
- Some parts may be hard to follow because details are omitted or ambiguous. If you contact me, I will add supplementary explanations, so please do not hesitate.
- "GC" in the text is an abbreviation for "Garbage Collection" or "Garbage Collector".
- This article is written as part of a series.

**Series list**

- PHP / GC Story, Part 1) Why Garbage Collection? Be aware of memory and GC
- PHP / GC Story, Part 2) Variable management information, the zval container, and reference counts
- PHP / GC Story, Part 3) How variable data disappears from memory
- PHP / GC Story, Part 4) Memory leaks and variable data that cannot be released (← current article)
- PHP / GC Story, Part 5) Enter the GC: GC trigger conditions and the root buffer ⇨ In preparation
- PHP / GC Story, Part 6) Traversing and deleting managed objects: the Garbage Collection cycle ⇨ In preparation
- PHP / GC Story, Part 7) Introduction to GC-related features (GC statistics, the WeakReference type) (END) ⇨ In preparation

**Sample code used in the series**

Sample Code Link on Github

- ExampleGc.php: sample code used in Parts 2 to 6.
- ExampleWeakReference: sample code used for the WeakReference content in Part 7.

This series is based on this sample code. You do not have to read or run it: each part breaks down the relevant code and explains how it works and what it outputs, so the articles should be sufficient on their own. The full code is there for readers who want to see all of it, run it locally, and modify it.

## In this part

This part covers the following.

- 1. What is Memory Leak?

Starting with this part, I will quote the sample code in earnest. It may help to read along with the linked sample code.

## 1. What is Memory Leak?

Put simply, it is

> the phenomenon in which variable data that can no longer be used keeps occupying memory.

Such data is also called "garbage".

To explain in a little more detail, here is the definition of a memory leak quoted from Wikipedia.

https://en.wikipedia.org/wiki/Memory_leak#cite_note-1

> ① In computer science, a memory leak is a type of resource leak that occurs when a computer program incorrectly manages memory allocations in such a way that memory which is no longer needed is not released. ② A memory leak may also happen when an object is stored in memory but cannot be accessed by the running code.

Paraphrased:

> ① In computer science, a memory leak is a waste of a finite resource caused by incorrect management of memory allocations, where memory that the program no longer uses is not released. ② As one example, a memory leak can occur when an object is stored in memory but can no longer be accessed by the running code.

Beyond that, I think the most common case is forgetting to release resource types such as network connections and graphics. ※1
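As a minimal sketch of the resource case, the following opens a stream and releases it explicitly in a `finally` block. The in-memory `php://memory` stream is an assumption chosen only so the example runs anywhere; a network connection or file handle would follow the same pattern.

```php
<?php
// Open a resource. php://memory stands in for a file or network connection here.
$handle = fopen('php://memory', 'w+');

try {
    fwrite($handle, "hello\n");
    rewind($handle);
    $line = fgets($handle);
} finally {
    // Release the resource explicitly, even if an exception was thrown above.
    fclose($handle);
}

var_dump(is_resource($handle)); // bool(false): the handle is closed
```

Closing in `finally` keeps the release next to the acquisition, so it is harder to forget on an error path.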

The "circular reference" we are about to look at is one example of a memory leak of type ②. The rest of this article takes a closer look at it.

## 2. Example of variable data that cannot be released

"Cannot be released" here means that

> the actual data of a variable that has already been unset, and that the programmer intends never to use again, remains in memory.

A simple example is a "circular reference" between objects. It can be reproduced with the following short code.

```php
$a = new \stdClass;
$b = new \stdClass;

// Circular reference
$a->node = $b;
$b->node = $a;

// The data does not disappear from memory here, even though it can no longer be used
unset($a);
unset($b);
```
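One way to observe the effect without any extension is to compare `memory_get_usage()` before and after. The sketch below is not part of the original sample code: the helper name `makePair` and the 1 MiB payloads are assumptions for illustration, and `gc_disable()` is called so that only reference counting is at work.

```php
<?php
// Keep the cycle collector out of the picture: only reference counting frees memory.
gc_disable();

/**
 * Hypothetical helper: creates two objects with a 1 MiB payload each,
 * optionally links them in a cycle, then unsets both variables.
 */
function makePair(bool $circular): void
{
    $a = new stdClass;
    $b = new stdClass;
    $a->payload = str_repeat('a', 1024 * 1024);
    $b->payload = str_repeat('b', 1024 * 1024);

    if ($circular) {
        $a->node = $b;
        $b->node = $a;
    }

    unset($a, $b);
}

$before = memory_get_usage();
makePair(false);
$heldPlain = memory_get_usage() - $before;

$before = memory_get_usage();
makePair(true);
$heldCircular = memory_get_usage() - $before;

printf("without cycle: %s bytes still held\n", number_format($heldPlain));
printf("with cycle:    %s bytes still held\n", number_format($heldCircular));
```

Without the cycle, the payloads are freed as soon as the variables are unset. With the cycle, roughly the two payloads (about 2 MiB) should remain allocated.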

The sample code and its execution results below show why this is a problem.

### 1) Sample code example

● Class definition (quoted from: Sample Code Link on Github)

```php
abstract class Base
{
    private $dummyData;
    private $tag = null;
    private $nodes = array();

    public function __construct($tag)
    {
        $this->tag = $tag;
        $this->dummyData = str_repeat('a', 20 * 1024 * 1024); // approximately 20 MB in size
    }

    /**
     * add Reference as ChildNode
     */
    public function addNode(object $obj)
    {
        $this->nodes[] = $obj;
        return $this;
    }
}

class AliveInScope extends Base {}

class CircularReference extends Base {}
```

● Code (quoted from: Sample Code Link on Github)

```php
    private function doExampleGcBasic()
    {
//... Omitted
        Log::debug(null, ['event' => 'new', 'msg' => 'V']);
        $alive = new AliveInScope('V');
//... Omitted
        Log::debug(null, ['event' => 'new', 'msg' => 'A, B']);
        $circleA = new CircularReference('A');
        $circleB = new CircularReference('B');

        Log::debug(null, ['event' => 'set', 'msg' => '$alive`s reference to A']);
        $circleA->addNode($alive);

        Log::debug(null, ['event' => 'set', 'msg' => 'circluar reference on A B']);
        $circleA->addNode($circleB);
        $circleB->addNode($circleA);

        xdebug_debug_zval('alive');
        xdebug_debug_zval('circleA');
        xdebug_debug_zval('circleB');
        $this->logMemUsage();

        Log::debug(null, ['event' => 'unset', 'msg' => 'A, B']);
        unset($circleA);
        unset($circleB);

        xdebug_debug_zval('alive');
        xdebug_debug_zval('circleA');
        xdebug_debug_zval('circleB');
        $this->logMemUsage();
//... Omitted
```
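The body of `logMemUsage()` is not shown in the excerpt. As a rough stand-in for readers who want to reproduce the log lines, the sketch below assumes it simply formats `memory_get_usage()` with thousands separators. The function name `formatMemUsage` is hypothetical and is not taken from the sample code.

```php
<?php
// Hypothetical stand-in for logMemUsage(): the real method is not shown in the excerpt.
function formatMemUsage(int $bytes): string
{
    // number_format() adds the thousands separators seen in the log output.
    return json_encode(['Memory Usage(Bytes)' => number_format($bytes)]);
}

echo formatMemUsage(memory_get_usage()), PHP_EOL;
```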

● Execution result

```
root@bc290870f5e9:/var/www/html/subdomain/laravel# ./artisan example:gc | cut -d "$" -f 1

[2020-09-07 20:30:37] local.DEBUG:  {"event":"new","msg":"V"}
alive: (refcount=1, is_ref=0)=class App\Console\Commands\AliveInScope { private 
[2020-09-07 20:30:37] local.DEBUG:  {"Memory Usage(Bytes)":"37,060,552"} 
[2020-09-07 20:30:37] local.DEBUG:  {"event":"new","msg":"A, B"}
[2020-09-07 20:30:37] local.DEBUG:  {"event":"set","msg":"
[2020-09-07 20:30:37] local.DEBUG:  {"event":"set","msg":"circluar reference on A B"}
alive: (refcount=2, is_ref=0)=class App\Console\Commands\AliveInScope { private 
circleA: (refcount=2, is_ref=0)=class App\Console\Commands\CircularReference { private 
circleB: (refcount=2, is_ref=0)=class App\Console\Commands\CircularReference { private 
[2020-09-07 20:30:37] local.DEBUG:  {"Memory Usage(Bytes)":"79,015,032"} 
[2020-09-07 20:30:37] local.DEBUG:  {"event":"unset","msg":"A, B"}
alive: (refcount=2, is_ref=0)=class App\Console\Commands\AliveInScope { private 
circleA: no such symbol
circleB: no such symbol
[2020-09-07 20:30:37] local.DEBUG:  {"Memory Usage(Bytes)":"79,015,672"} 
```

Each line is cut at the first `$` by `cut -d "$" -f 1`, which is why the xdebug output stops at `private` and one `msg` value appears empty.

### 2) Explanation of the code and execution results
```php
        Log::debug(null, ['event' => 'set', 'msg' => 'circluar reference on A B']);
        $circleA->addNode($circleB);
        $circleB->addNode($circleA);
```
```
[2020-09-07 20:30:37] local.DEBUG:  {"event":"set","msg":"circluar reference on A B"}
... (omitted)
[2020-09-07 20:30:37] local.DEBUG:  {"Memory Usage(Bytes)":"79,015,032"} 
```

At this point in the source code, A and B each hold a reference to the other internally.
Memory usage afterwards is 79,015,032 bytes.

```php
        Log::debug(null, ['event' => 'unset', 'msg' => 'A, B']);
        unset($circleA);
        unset($circleB);
```
```
[2020-09-07 20:30:37] local.DEBUG:  {"event":"unset","msg":"A, B"}
alive: (refcount=2, is_ref=0)=class App\Console\Commands\AliveInScope { private 
circleA: no such symbol
circleB: no such symbol
[2020-09-07 20:30:37] local.DEBUG:  {"Memory Usage(Bytes)":"79,015,672"} 
```

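Even though A and B have been unset and can no longer be used, memory usage has not gone down.
In other words, data that will never be used again keeps occupying finite memory space.

Why does the actual data stay in memory instead of the usage going down?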
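The numbers in the log can be checked with simple arithmetic. The sketch below uses only the three values printed in the execution result and the 20 MiB payload size from the `Base` constructor. It assumes that nothing else of significant size was allocated between the measurements, which cannot be confirmed because part of the code is omitted.

```php
<?php
// Values copied from the execution result.
$afterV     = 37060552; // logged after V was created
$afterLink  = 79015032; // logged after A and B were created and linked
$afterUnset = 79015672; // logged after unset($circleA) and unset($circleB)

$payload = 20 * 1024 * 1024; // size of one $dummyData string

$grown    = $afterLink - $afterV;    // memory added between the first two logs
$overhead = $grown - 2 * $payload;   // what remains after two 20 MiB payloads
$changed  = $afterUnset - $afterLink; // change across the two unset() calls

printf("added by A and B:   %s bytes\n", number_format($grown));    // 41,954,480
printf("beyond 2 payloads:  %s bytes\n", number_format($overhead)); // 11,440
printf("change after unset: %s bytes\n", number_format($changed));  // 640
```

The growth matches two 20 MiB payloads almost exactly, and the two `unset()` calls return none of it: usage even rises by 640 bytes.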

### 3) How variables and reference counts change while the code runs (GIF)
#### ① Changes while the circular-reference variables are created

![](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/436887/80d44515-7336-a4c2-9307-017b8060202d.gif)


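As in the fourth frame of the image above, besides being referenced by `$circleA` and `$circleB`, A and B come to reference each other internally.
So what happens if we unset `$circleA` and `$circleB` and invalidate the variables?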
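The same change can be watched without Xdebug by using the built-in `debug_zval_dump()`. This is a sketch under assumptions: the helper `dumpedRefcount` is hypothetical, and because passing a value as an argument adds temporary references of its own, only the difference between two readings is meaningful, not the absolute number.

```php
<?php
// Hypothetical helper: returns the first refcount printed by debug_zval_dump().
// Argument passing adds temporary references, so compare readings, not raw values.
function dumpedRefcount(object $obj): int
{
    ob_start();
    debug_zval_dump($obj);
    $dump = ob_get_clean();

    preg_match('/refcount\((\d+)\)/', $dump, $m);
    return (int) $m[1];
}

$a = new stdClass;
$b = new stdClass;
$before = dumpedRefcount($a);

$a->node = $b;
$b->node = $a; // A is now referenced by $a and by B's property

$after = dumpedRefcount($a);
printf("refcount of A grew by %d\n", $after - $before);
```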

#### ② Changes after `$circleA` and `$circleB` are unset (the illustration is incorrect)

![](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/436887/c351eb7f-4a5a-afe4-15b7-439cbba36e57.png)

![](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/436887/a6aaa1e8-6795-650b-4cb0-ce94d3b3aa21.png)

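The "condition under which variable data does not disappear" from the previous part was:
> When one reference to the actual data is invalidated, the data remains in memory as long as the reference count (refcount) is still 1 or more.

As the second picture shows, unsetting the variables `$circleA` and `$circleB` makes both pieces of actual data inaccessible.
However, because the two pieces of data reference each other, each reference count only drops from 2 to 1. Since it is still 1 or more, the system concludes that the data is "in use somewhere" and cannot free it from memory.

So data that the programmer wants gone stays in memory.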
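Whether the data is really still alive can be checked with `WeakReference` (PHP 7.4 or later), which observes an object without adding to its reference count. This sketch is not from the sample code. It uses plain `stdClass` objects, and calls `gc_disable()` on the assumption that the cycle collector should stay out of the demonstration.

```php
<?php
// WeakReference (PHP 7.4+) observes an object without adding to its refcount.
gc_disable(); // keep the cycle collector out of this demonstration

// No cycle: the only reference is the variable itself.
$plain = new stdClass;
$plainRef = WeakReference::create($plain);
unset($plain); // refcount 1 -> 0: freed immediately

// Cycle: each object is also referenced from the other one's property.
$a = new stdClass;
$b = new stdClass;
$a->node = $b;
$b->node = $a;
$aRef = WeakReference::create($a);
unset($a, $b); // refcount 2 -> 1 for each: both objects stay in memory

var_dump($plainRef->get() === null); // bool(true): the data is gone
var_dump($aRef->get() === null);     // bool(false): the data is still there
```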


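This is exactly what was described as a memory leak: `the phenomenon in which variable data that can no longer be used keeps occupying memory`, in other words `garbage`.

So what does PHP provide to solve this garbage problem?

One answer is GC (Garbage Collection), which appears in the next part.
It will be explained in detail there.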
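As a brief preview only, the sketch below triggers a collection by hand with the built-in `gc_collect_cycles()` and uses a `WeakReference` to check that the cycle is gone afterwards. It is an illustration with plain `stdClass` objects, not the sample code of this series.

```php
<?php
gc_enable(); // make sure the cycle collector is on

$a = new stdClass;
$b = new stdClass;
$a->node = $b;
$b->node = $a;
$aRef = WeakReference::create($a);
unset($a, $b); // the cycle is now unreachable, but its refcounts are still 1

$aliveBefore = $aRef->get() !== null;

// Ask the collector to look for unreachable cycles right now.
$collected = gc_collect_cycles();

$aliveAfter = $aRef->get() !== null;

var_dump($aliveBefore); // bool(true): reference counting alone left the data in memory
var_dump($aliveAfter);  // bool(false): the collector reclaimed the cycle
```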

## 3. Summary

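The minimum to remember from this part is the following.
-  A memory leak is "the phenomenon in which variable data that can no longer be used keeps occupying memory", leaving "garbage" behind
-  A circular reference is an easy-to-understand example of a memory leak, and an easy-to-understand example of what the GC collects
-  One of the features PHP provides to solve this garbage problem is GC (Garbage Collection)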

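## Afterword
Parts 1 through 4 covered the background to GC: how variables are specified, how they behave, and how they disappear.

From the next part on, I will discuss the GC mechanism itself in earnest.

I will explain the conditions that trigger GC, how to trigger it explicitly, and what happens when it runs. That requires understanding zval and reference counts, the criteria for variable data disappearing, and the data that remains without disappearing, so I would be glad if you read on with those in mind.

Some explanations may be insufficient or hard to follow. If you send feedback, I will add supplements or corrections.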

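## ※ Notes
> ※1
> ▶ Forgetting to release resource types
> 
> In practice, these days resource types are often released automatically once they are no longer used. However, there are exceptions, so it is good to stay mindful of releasing resource types (or similar types in other languages).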