Locating and Analyzing ANR Problems

2020/03/13 11:09

Preface


Almost everyone who tests applications day to day will run into ANR problems sooner or later. This article summarizes the types, causes, and typical scenarios of ANR, along with approaches for locating and analyzing it.

[I] ANR definition and classification


ANR definition

ANR stands for Application Not Responding. An ANR occurs when an application's UI thread is blocked for too long. When it happens, the system pops up a dialog telling the user that the application is not responding.

ANR type

(1) Broadcast ANR

(2) Service ANR

(3) ContentProvider ANR

(4) Input ANR

(5) System WatchDog

ANR timeout threshold

The timeout threshold differs by component. The thresholds for Broadcast, Service, ContentProvider, and Input are as follows:

(1) BroadcastTimeout

Foreground broadcast: an ANR occurs when onReceive does not finish within 10 s.

Background broadcast: an ANR occurs when onReceive does not finish within 60 s.

(2) ServiceTimeout

Foreground service: an ANR occurs when lifecycle callbacks such as onCreate, onStart, and onBind do not finish within 20 s.

Background service: an ANR occurs when lifecycle callbacks such as onCreate, onStart, and onBind do not finish within 200 s.

(3) ContentProviderTimeout

An ANR occurs when the ContentProvider does not finish processing within 10 s.

(4) KeyDispatchTimeout

An ANR occurs when an input event (such as a key press or a touch) is not handled within 5 s.
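
The thresholds listed above can be collected in a small lookup table. The following Python sketch is illustrative only: the key names and the `exceeds_threshold` helper are hypothetical, and the values are simply the ones given in this article (a particular device or Android version may use different ones).

```python
# Timeout thresholds from the list above, in seconds.
# The key names are illustrative, not Android identifiers.
ANR_TIMEOUTS_S = {
    ("broadcast", "foreground"): 10,
    ("broadcast", "background"): 60,
    ("service", "foreground"): 20,
    ("service", "background"): 200,
    ("content_provider", None): 10,
    ("input", None): 5,
}


def exceeds_threshold(component, elapsed_s, priority=None):
    """Return True if elapsed_s is over the listed threshold for the component."""
    return elapsed_s > ANR_TIMEOUTS_S[(component, priority)]
```

For example, a foreground broadcast that takes 12 s is over its threshold, while the same 12 s in a background broadcast is not.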

[II] Causes and scenarios of ANR


1. Causes of ANR

a. Time-consuming operations

b. Blocking inside the application's own services

c. Blocking in the system

d. Memory pressure

e. Contention for CPU resources

2. Typical scenarios

a. The main thread frequently performs time-consuming I/O, such as database reads and writes;

b. The main thread is blocked by a deadlock between threads;

c. The main thread is blocked by the remote side of a Binder call;

d. The WatchDog in System Server reports an ANR;

e. The number of service binder connections has reached its upper limit, so the process cannot communicate with System Server;

f. System resources are exhausted (pipes, CPU, I/O).

[III] ANR positioning and analysis


1. ANR analysis approach: traces

When an ANR occurs, the usual first step is to look at the corresponding trace log (the call stacks of every thread in the important processes) and check whether the main thread is busy, for example handling a broadcast, or blocked.

Trace path: /data/anr/traces.txt

Trace export: adb pull /data/anr/traces.txt

The most recent ANR information is at the beginning of the file. The stack trace gives the exact line number of the code involved.

Use Ctrl+F to search for the package name in the file and quickly locate the relevant code. Note: when a new ANR occurs, the existing traces.txt file is overwritten.

2. Other ANR analysis approaches and related logs

If the main thread's stack turns out to be completely idle, the ANR has to be analyzed together with other logs: logcat, kernel logs, cpuinfo, and meminfo. Consult them in that order.

Analyzing logcat

First, search the log for keywords such as "ANR in", "low_memory", and "slow_operation". The entries around these keywords mainly show the system's CPU load. If the application process's CPU usage is clearly too high, the process is probably taking up too much CPU time; work is then not scheduled in time, and the system concludes that the application has timed out.
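
The keyword search can be sketched in a few lines of Python. The keyword list comes from the text; the spelling "ANR in" is an assumption about how the first keyword appears in logcat, and `find_keyword_lines` is a hypothetical helper name.

```python
# Keywords to look for; matching is case-insensitive.
KEYWORDS = ("ANR in", "low_memory", "slow_operation")


def find_keyword_lines(log_lines, keywords=KEYWORDS):
    """Return (line_number, line) pairs for lines containing any keyword."""
    lowered = [k.lower() for k in keywords]
    hits = []
    for number, line in enumerate(log_lines, start=1):
        text = line.lower()
        if any(k in text for k in lowered):
            hits.append((number, line.rstrip("\n")))
    return hits
```

Passing an open file works as well, for example `find_keyword_lines(open("logcat.txt"))`, where the file name is a placeholder.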

Analyzing kernel logs

Search these logs directly for lowmemorykiller. If it appears, check whether its timestamp roughly matches the time of the ANR. If the two are close, the log shows the memory situation at the operating-system level at that moment. Free Memory refers to free physical memory. File Free refers to the file cache: after the application or system reads files from storage, the kernel keeps that memory as cache instead of releasing it, in order to speed up the next read or write. If the overall values of Free and Other are low, the kernel has to swap and reclaim memory to some extent, which makes the whole system sluggish. The same condition also shows up in the log as "slow_operation", meaning that the scheduling of system processes is affected as well.
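
Checking whether a lowmemorykiller entry is close in time to the ANR is a simple comparison once both timestamps are on the same clock. The sketch below assumes the timestamps have already been converted to `datetime` values; the 30-second window is an arbitrary illustrative choice, and `events_near` is a hypothetical helper name.

```python
from datetime import datetime, timedelta


def events_near(anr_time, event_times, window_s=30):
    """Return the event times that fall within window_s seconds of anr_time."""
    window = timedelta(seconds=window_s)
    return [t for t in event_times if abs(t - anr_time) <= window]
```

An empty result suggests that the low-memory kills and the ANR are probably unrelated.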

Analyzing cpuinfo

This log shows clearly which processes have high CPU usage. If one process stands out, the ANR is likely related to that process competing for the CPU. If kswapd or emmc processes appear near the top of the list, the system is under memory pressure or a heavy file I/O load.
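
Ranking processes by CPU usage can be automated. The sketch below assumes per-process lines of the form `85% 1234/com.example.app: 70% user + 15% kernel`, as commonly printed with an ANR report. That layout is an assumption and should be checked against the actual log. `top_cpu_processes` is a hypothetical helper name.

```python
import re

# Matches lines such as "  85% 1234/com.example.app: 70% user + 15% kernel".
CPU_LINE = re.compile(
    r"^\s*(?P<total>[\d.]+)%\s+(?P<pid>\d+)/(?P<name>[^:]+):\s+"
    r"(?P<user>[\d.]+)% user \+ (?P<kernel>[\d.]+)% kernel"
)


def top_cpu_processes(lines, limit=3):
    """Return up to `limit` (total_percent, pid, name) tuples, highest first."""
    rows = []
    for line in lines:
        match = CPU_LINE.match(line)
        if match:
            rows.append((float(match["total"]), int(match["pid"]), match["name"]))
    rows.sort(reverse=True)
    return rows[:limit]
```

Lines that do not name a process with a pid, such as a summary line, are skipped.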

Analyzing meminfo

This log mainly shows which applications or system components use a lot of memory. If the application's memory usage is fairly normal and the system itself is not using excessive memory, then a large number of cached processes are probably not being released in time, leaving the system low on memory overall.

3. Important fields of traces.txt

main: the thread name; "main" identifies the main thread. A thread without an explicit name is called "Thread-X", where X is a number that increases with each new thread.

prio: the thread priority; the default is 5.

tid: not the system thread ID, but the thread's unique ID inside the process.

group: the name of the thread group.

sCount: the number of times the thread has been suspended.

dsCount: the number of times the thread has been suspended by the debugger.

obj: the address of the thread's Java object.

self: the address of the thread's native object.

sysTid: the Linux thread ID (for the main thread it is the same as the process ID).

nice: the thread's scheduling priority (nice value).

sched: the thread's scheduling policy and priority.

cgrp: the scheduling group the thread belongs to.

handle: the thread's native handle (the pthread handle).

state: the scheduling state.

schedstat: read from /proc/[pid]/task/[tid]/schedstat. The three values are the time the thread has spent running on the CPU, the time it has spent waiting to run, and the number of time slices it has run. If the system does not provide this information, all three values are 0.

utm: the time the thread has spent in user mode (unit: jiffies).

stm: the time the thread has spent in kernel mode (unit: jiffies).

core: the number of the CPU core that last ran the thread.

Finally, find the Java stack information in the trace, locate the corresponding code, and identify the problem.
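
To make these fields easier to inspect, a thread header can be parsed into a dictionary. The sketch below assumes the header looks like a quoted thread name followed by `key=value` pairs, with continuation lines starting with `|`. That layout is an assumption, and `parse_thread_header` and the `thread_state` key are hypothetical names.

```python
import re

# key=value pairs; the value may be quoted, parenthesized, or a bare token.
PAIR = re.compile(r'(\w+)=("[^"]*"|\([^)]*\)|\S+)')


def parse_thread_header(lines):
    """Parse the header lines of one thread block into a dict of strings."""
    fields = {}
    name = re.match(r'"([^"]+)"', lines[0])
    if name:
        fields["name"] = name.group(1)
    # Assumed: the last token of the first line is the thread state.
    fields["thread_state"] = lines[0].split()[-1]
    for line in lines:
        for key, value in PAIR.findall(line):
            fields[key] = value.strip('"')
    if "schedstat" in fields:
        # Three numbers: run time, wait time, number of time slices.
        fields["schedstat"] = tuple(
            int(v) for v in fields["schedstat"].strip("() ").split()
        )
    return fields
```

All values are kept as strings except schedstat, which becomes a tuple of integers so the run and wait times can be compared directly.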

[IV] ANR analysis case


Case 1: Input ANR

Case 2: The lock on the system method is not released

WindowManagerGlobal.dumpGfxInfo


A thread in the Blocked state is always waiting for a lock that another thread holds. This sometimes happens across Binder calls, in which case the Binder-related logs need to be analyzed as well.

Case 3: Memory problem

Case 4: GC problem

Look at the main thread's stack in the trace. The main thread is blocked while allocating memory, waiting for the GC to finish.

Checking the states of the other threads shows that one of them is executing GC.

The thread with tid=8 is executing GC, which blocks the main thread's memory allocation. Improper memory use in the application process therefore caused the ANR while GC was running.


[Reference]

https://juejin.im/post/5be698d4e51d452acb74ea4c

https://www.jianshu.com/p/862ce91c1abf

https://droidyue.com/blog/2015/07/18/anr-in-android/

https://www.jianshu.com/p/388166988cef




This article is shared from the WeChat official account Sogou QA.