Analyzing Road Accidents in India

Yogesh Chandrasekharuni
5 min read · May 16, 2020


India ranks 1st in the number of road-accident-related deaths among the 199 countries reported in World Road Statistics.

It’s no secret that traffic in India is bad. But why? In this article, we will analyze some data to find out.

Contents:

  1. General Analysis
  2. Checking for predictors
  3. Checking for correlation
  4. Conclusion
  5. Scope for further research

Note: All the code and datasets are available on my GitHub repository

General Analysis

Let us look at our first dataset, which documents road accidents in Indian states from 1970 to 2017.

Years

Let us see how the number of accidents changes over time.

  • We immediately notice a gradual increase in the number of accidents through the years. However, it is encouraging to see a rather sharp drop from 2015 to 2017.
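To put a number on a rise or drop like this, one option is to compute the year-over-year percentage change. The sketch below is only an illustration: the counts are made-up placeholders rather than values from the dataset, and `yoy_change` is a hypothetical helper name.

```python
def yoy_change(counts_by_year):
    """Return {year: percent change versus the previous listed year}."""
    years = sorted(counts_by_year)
    changes = {}
    for prev, curr in zip(years, years[1:]):
        previous_count = counts_by_year[prev]
        if previous_count == 0:
            raise ValueError(f"no accidents recorded in {prev}; change is undefined")
        changes[curr] = (counts_by_year[curr] - previous_count) * 100 / previous_count
    return changes


# Placeholder counts for illustration only (NOT taken from the dataset)
accidents = {2014: 400, 2015: 500, 2016: 450, 2017: 405}

changes = yoy_change(accidents)
for year, pct in changes.items():
    print(f"{year}: {pct:+.1f}%")
```

A negative value marks a year in which accidents fell relative to the year before.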

Number of vehicles

Now, let’s see if the increase in the number of motor vehicles affects the number of accidents.

  • We notice a very sharp increase in the number of accidents up to 40,000 motor vehicles. From 40,000 to 1.25 lakh vehicles, the rise is flatter but still significant. From 1.25 lakh to 2.25 lakh vehicles, the number of accidents holds steady.

Fatality

It is also important to note the fatality rate of accidents.

  • We notice that 19.2% of people who met with an accident lost their lives, while 80.8% were injured. Let us see if the use of safety equipment, like a helmet or a car’s seat belt, prevents the loss of life.
  • As expected, more people who were not using safety equipment lost their lives than people who were. This is an important feature in predicting whether a given person would survive an accident.
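Raw death counts depend on how many people fall into each group, so a fairer comparison is the share of casualties that were fatal within each group. Here is a minimal sketch of that arithmetic. The counts are hypothetical and are not the figures behind the charts, and `fatality_share` is a name introduced only for this example.

```python
def fatality_share(killed, injured):
    """Fraction of casualties (killed + injured) that were fatal."""
    total = killed + injured
    if total == 0:
        raise ValueError("no casualties recorded for this group")
    return killed / total


# Hypothetical counts for illustration only (NOT the article's data)
casualties = {
    "with_safety_equipment": {"killed": 120, "injured": 880},
    "without_safety_equipment": {"killed": 300, "injured": 700},
}

shares = {
    group: fatality_share(c["killed"], c["injured"])
    for group, c in casualties.items()
}
for group, share in shares.items():
    print(f"{group}: {share:.1%} of casualties were fatal")
```

The same function reproduces an overall figure such as 19.2% when given the total killed and injured counts.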

Time

Let us now check if the time of day contributes to the total number of accidents.

  • We see that most accidents take place between 9 AM and 9 PM, meaning that most occur during daytime and evening hours.
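A breakdown like this can be produced by bucketing each accident's hour into fixed slots and counting. The sketch below assumes 3-hour slots and a list of hours (0 to 23); both the slot width and the sample hours are assumptions made for illustration, not details taken from the dataset.

```python
from collections import Counter


def slot_label(hour, width=3):
    """Map an hour (0-23) to a slot label such as '09:00-12:00'."""
    if not 0 <= hour <= 23:
        raise ValueError(f"hour must be in 0..23, got {hour}")
    if 24 % width != 0:
        raise ValueError("width must divide 24 evenly")
    start = (hour // width) * width
    end = (start + width) % 24
    return f"{start:02d}:00-{end:02d}:00"


# Made-up accident hours for illustration only
accident_hours = [8, 9, 10, 13, 15, 18, 20, 22, 2, 17]

counts = Counter(slot_label(h) for h in accident_hours)

# Share of accidents falling between 9 AM (inclusive) and 9 PM (exclusive)
daytime_share = sum(9 <= h < 21 for h in accident_hours) / len(accident_hours)

for slot, n in sorted(counts.items()):
    print(slot, n)
print(f"9 AM to 9 PM share: {daytime_share:.0%}")
```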

Different states have different conditions, laws, and other factors. Let us see how the states differ in the number of accidents.

  • Tamil Nadu has the highest number of road accidents, with a subtle rise over the years. However, in 2005 we notice a very sharp spike in the number of cases.

Checking for predictors

Road condition

The condition of the road is perhaps the first thing that comes to mind when we think about predicting the risk of an accident.

  • Sure enough, we see that most accidents take place on open roads, as compared to residential, institutional, or other areas. The condition of the road seems to be a promising predictor.

Weather condition

Let’s see how the condition of the weather affects accidents.

  • Most accidents take place when the weather is sunny and clear. However, it is important to keep in mind that there is less traffic during bad weather. More analysis is needed to relate the number of accidents in each weather condition to the amount of traffic in that condition.
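If traffic volumes per weather condition were available, the comparison could be made per unit of traffic instead of by raw count. The following sketch shows the idea under that assumption. The exposure figures (say, millions of vehicle-kilometres) and the accident counts are invented, since the dataset used here does not include traffic volumes.

```python
def accidents_per_exposure(accidents, exposure):
    """Divide accident counts by a measure of traffic for each condition."""
    rates = {}
    for condition, count in accidents.items():
        traffic = exposure.get(condition)
        if not traffic:
            raise ValueError(f"missing or zero exposure for {condition!r}")
        rates[condition] = count / traffic
    return rates


# Invented numbers for illustration only (NOT the article's data)
accidents = {"sunny": 7000, "rainy": 1500, "foggy": 500}
exposure = {"sunny": 1000, "rainy": 150, "foggy": 25}  # hypothetical traffic units

rates = accidents_per_exposure(accidents, exposure)
riskiest = max(rates, key=rates.get)

for condition, rate in rates.items():
    print(f"{condition}: {rate:.1f} accidents per unit of traffic")
print("highest rate:", riskiest)
```

With these invented numbers, sunny weather has the most accidents but the lowest rate, which is exactly the kind of reversal that raw counts can hide. The same normalization would apply to the license categories.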

Driver’s License

A driver’s license implies more experience and, therefore, more safety on the road. Let us see if this is the case.

  • Although most accidents involve holders of a valid permanent license, such drivers make up the vast majority of drivers. More analysis is needed to check if drivers with a learner’s license or without a license are in fact more likely to meet with an accident.

Checking for correlation

It is essential to look for any correlation between deaths due to accidents and other factors, such as the literacy of a nation or its GDP.

For this analysis, we will use Pearson’s correlation coefficient, computed with pandas’ DataFrame.corr().
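As a reminder of what this coefficient measures, here is a plain-Python sketch: the covariance of two variables divided by the product of their spreads, giving a value between -1 and 1. pandas' DataFrame.corr() uses Pearson's method by default, so it should give the same value for each pair of columns. The `pearson` helper and the sample numbers are made up for illustration and are not the data behind the matrix.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson's r for two equal-length sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    covariance_sum = sum(a * b for a, b in zip(dx, dy))
    spread_x = sum(a * a for a in dx)
    spread_y = sum(b * b for b in dy)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance_sum / sqrt(spread_x * spread_y)


# Made-up, perfectly linear values for illustration only
gdp_per_capita = [1.0, 2.0, 3.0, 4.0, 5.0]
death_rate = [10.0, 8.0, 6.0, 4.0, 2.0]

r = pearson(gdp_per_capita, death_rate)
print(f"r = {r:.2f}")
```

A value near -1 means one variable falls as the other rises, a value near 1 means they rise together, and a value near 0 means there is little linear relationship.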

Coefficient matrix

GDP

We quickly see that there is a fairly strong negative correlation of -0.63 between the death rate and the GDP of a country. Let’s plot it to investigate a little more.

The plot shows an inverse relationship between GDP and death rate, just as the coefficient matrix indicated. This suggests that richer countries see fewer deaths per 100,000 population.

Literacy

From the coefficient matrix above, we see a correlation coefficient of -0.59 between death rate and literacy. Let’s see what the graph looks like.

The graph suggests that there are far fewer deaths in countries with a higher literacy rate than in countries with a lower one.

Area and population

The area and the population of a country do not contribute much to the number of accidents. Hence, we can ignore them.

Conclusion

We have seen a few of the most important predictors for evaluating the risk of an accident. We conclude that the number of accidents has been increasing over the years, and that it also rises with the number of vehicles. Accidents also tend to take place during a certain time interval. We have also looked at some other potential predictors, like the conditions of the road and the weather.

We also conclude that the literacy and GDP of a country are strongly negatively correlated with the road accident death rate in that country and are very important predictors.

Scope for further research

Gathering additional data, such as the ratio of traffic density to the number of accidents, can be important for evaluating the chance of an accident in certain conditions. Furthermore, data regarding the driver’s mindset and sobriety could also prove useful.

Note: The data has been obtained from many sources and is available on my GitHub. However, it has not been verified to be completely accurate. Repository link: https://github.com/yogeshchandrasekharuni/road_accidents

For any corrections, queries, or suggestions, please email me at yogeshchandrasekharuni@gmail.com
