
The drug review dataset contains patient reviews on specific drugs, along with related conditions and ratings reflecting overall patient satisfaction. In this project, we will explore the dataset to answer various research questions and gain insights into the drugs and conditions represented in the data.

Research Questions

  1. What is the most popular drug?
  2. What are the groups/classifications of drugs used?
  3. Which drug has the best review?
  4. How many drugs do we have?
  5. How many drugs are there per condition?
  6. How many patients searched for a particular drug?
  7. How genuine are the reviews?
  8. How many reviews are positive, negative, or neutral?
  9. What is the correlation between the rating, the review, and the number of users who found the review useful?
  10. What is the distribution of ratings?
  11. How many reviews were made per year and per month?


This dataset contains patient reviews on specific drugs along with related conditions and a 10-star rating reflecting overall patient satisfaction. The dataset includes features such as the drug name, condition name, patient review, rating from a range of 1 to 10 stars, date of review entry, and number of users who found the review useful. The dataset was downloaded from the following link: https://www.kaggle.com/jessicali9530/kuc-hackathon-winter-2018


  1. What is the most popular drug?
    # Loading the dataset
    import pandas as pd
    drug = pd.read_csv(r"C:\Users\HENRY\Downloads\drugsCom_raw\drugsComTrain_raw.tsv", sep='\t')
    # Preview the data
    # Top 15 most popular drugs

    Output: Levonorgestrel = 3657
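One way to arrive at a count like the one above is to tally how often each drug name appears among the reviews. The sketch below is only an illustration on a small made-up DataFrame (the rows are hypothetical, not taken from the dataset). It assumes the drug name is stored in the `drugName` column and that "popular" means "most reviewed".

```python
import pandas as pd

# Hypothetical toy rows standing in for the real dataset
toy = pd.DataFrame({
    "drugName": ["Levonorgestrel", "Etonogestrel", "Levonorgestrel",
                 "Guanfacine", "Levonorgestrel", "Etonogestrel"],
})

# Count the reviews per drug; value_counts sorts from most to least frequent
popularity = toy["drugName"].value_counts()

top_15 = popularity.head(15)        # at most 15 entries
most_popular = popularity.idxmax()  # drug with the most reviews
print(f"{most_popular} = {popularity.max()}")
```

On the real data the same two calls would be applied to `drug["drugName"]` instead of the toy column.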

  3. What are the groups/classifications of drugs used?
    drug_suffix = {
        "alol": "Beta blocker or alpha blocker",
        "bactam": "Beta-lactamase inhibitors",
        "caine": "Local anesthetics",
        "flurane": "Inhalation Anesthetic",
        "lukast": "Leukotriene receptor antagonists",
        "olol": "Beta blockers",
        "pril": "ACE Inhibitors",
        "sartan": "Angiotensin II receptor blockers",
        "thiazide": "Thiazide Diuretics",
        "vastatin": "HMG-CoA reductase inhibitors",
    }

    # This function classifies a drug based on the suffix of the name you pass in.
    def classify_D(drugname):
        for i in drug_suffix.keys():
            if drugname.endswith(i):
                print("The drug class is: ", drug_suffix[i])

    # Classifying the drug class example

    The drug class is: Beta blockers
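The classifier above prints its result, which makes it awkward to reuse on a whole column. A minimal sketch of a variant that returns the class instead is shown below. The function name `classify_drug`, the lower-casing step, and the example drug names are assumptions added for illustration, and only a subset of the suffix mapping is repeated to keep the example short.

```python
# Suffix-to-class lookup (a subset of the mapping above, for brevity)
DRUG_SUFFIX = {
    "alol": "Beta blocker or alpha blocker",
    "olol": "Beta blockers",
    "pril": "ACE Inhibitors",
    "sartan": "Angiotensin II receptor blockers",
}

def classify_drug(drugname):
    """Return the class whose suffix matches the drug name, or None."""
    name = drugname.strip().lower()  # assumption: matching should ignore case
    for suffix, drug_class in DRUG_SUFFIX.items():
        if name.endswith(suffix):
            return drug_class
    return None  # no known suffix matched

# Illustrative drug names, not taken from the dataset output
for name in ["Propranolol", "Lisinopril", "Guanfacine"]:
    print(name, "->", classify_drug(name))
```

Because the function returns a value, it could presumably be used to build a class column with something like `drug["drugName"].apply(classify_drug)`; names without a known suffix would then be left as `None`.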

  5. Which drug has the best review?
       Unnamed: 0  drugName                            condition          review                                             rating  date               usefulCount  drug_class  sentiment value  sentiment label  sentiment_label
    1  95260       Guanfacine                          ADHD               "My son is halfway through his fourth week of ...  8.0     April 27, 2010     192          None        0.168333         positive         positive
    3  138000      Ortho Evra                          Birth Control      "This is my first time using any form of birth...  8.0     November 3, 2015   10           None        0.179545         positive         positive
    4  35696       Buprenorphine / naloxone            Opiate Dependence  "Suboxone has completely turned my life around...  9.0     November 27, 2016  37           None        0.194444         positive         positive
    7  102654      Aripiprazole                        Bipolar Disorde    "Abilify changed my life. There is hope. I was...  10.0    March 14, 2015     32           None        0.074107         positive         positive
    9  48928       Ethinyl estradiol / levonorgestrel  Birth Control      "I had been on the pill for many years. When m...  8.0     December 8, 2016   1            None        0.079167         positive         positive

    The drug with the best review:   Aripiprazole
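The code that produced this result is not shown, so the following is only one possible reading of "best review". The sketch assumes that "best" means the highest rating, with ties broken by `usefulCount`, and it uses just the five rows from the table above (review text omitted).

```python
import pandas as pd

# The five rows shown in the table above (review text omitted)
rows = pd.DataFrame({
    "drugName": ["Guanfacine", "Ortho Evra", "Buprenorphine / naloxone",
                 "Aripiprazole", "Ethinyl estradiol / levonorgestrel"],
    "rating": [8.0, 8.0, 9.0, 10.0, 8.0],
    "usefulCount": [192, 10, 37, 32, 1],
})

# Assumption: "best" = highest rating, ties broken by usefulCount
ranked = rows.sort_values(["rating", "usefulCount"], ascending=False)
best_drug = ranked.iloc[0]["drugName"]
print("The drug with the best review:", best_drug)
```

A different definition, for example the highest mean rating per drug or the highest sentiment value, could give a different answer on the full dataset.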

  7. How many drugs do we have per class? Top 15
    plt.title("Distribution of Drugs by class")
    1. Antifungals: This drug class is used to treat fungal infections, and it is not surprising that it is among the largest drug classes. Fungal infections can be difficult to treat and can affect various parts of the body, including the skin, nails, and lungs.
    2. Antianxiety: Anxiety disorders are among the most common mental health disorders, affecting millions of people worldwide. Antianxiety drugs are used to manage the symptoms of anxiety disorders, and they are also among the largest drug classes.
    3. Antivirals: Viral infections can be difficult to treat, and antiviral drugs are essential in managing these infections. With the emergence of new viruses and viral strains, the demand for antiviral drugs is increasing.
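A count of drugs per class can be obtained by grouping on the class label and counting distinct drug names. The sketch below is an illustration on hypothetical rows. The `drug_class` column name matches the table shown earlier, but the drug-to-class assignments here are example values and not output from the dataset.

```python
import pandas as pd

# Hypothetical rows: one per review, with an already-assigned drug_class
toy = pd.DataFrame({
    "drugName": ["Fluconazole", "Fluconazole", "Terbinafine",
                 "Alprazolam", "Acyclovir", "Nystatin"],
    "drug_class": ["Antifungals", "Antifungals", "Antifungals",
                   "Antianxiety", "Antivirals", "Antifungals"],
})

# Count distinct drugs (not reviews) in each class, largest first, top 15
drugs_per_class = (
    toy.groupby("drug_class")["drugName"]
       .nunique()
       .sort_values(ascending=False)
       .head(15)
)
print(drugs_per_class)
```

Using `nunique` rather than a plain row count matters here: a drug with many reviews would otherwise inflate the size of its class.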
  9. The number of drugs per condition?

    Output:   There are 885 drugs per condition

  11. What is the distribution of drugs per condition?
    #Drugs per condition Top 15
    plt.title("Distribution of Drugs per Condition")

    Drugs per Condition

    1. Pain: Pain is the condition with the highest number of drugs in the dataset, with a total of 200 drugs. This is not surprising, as pain is a common condition that affects people of all ages and backgrounds. Pain medications come in different forms such as pills, patches, gels, and injections, and are used to manage acute or chronic pain.
    2. Birth Control: Birth control has the second-highest number of drugs, with a total of 173 drugs. Birth control medications come in different forms such as pills, patches, rings, and injections, and are used to prevent pregnancy. They work by inhibiting ovulation, thickening cervical mucus, or preventing fertilization.
    3. High Blood Pressure: High blood pressure has the third-highest number of drugs, with a total of 147 drugs. High blood pressure, also known as hypertension, is a common condition that affects millions of people worldwide. It is a major risk factor for heart disease, stroke, and kidney failure. Medications for high blood pressure come in different forms such as pills, patches, and injections, and work by relaxing blood vessels or reducing the amount of fluid in the body.
    4. Acne: Acne has the fourth-highest number of drugs, with a total of 120 drugs. Acne is a common skin condition that affects people of all ages but is most common in teenagers. Acne medications come in different forms such as creams, gels, and pills, and work by reducing inflammation, killing bacteria, or regulating oil production.
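The per-condition totals listed above are counts of distinct drugs, which is a different quantity from the number of distinct conditions in the dataset. The sketch below shows how both could be computed. It uses hypothetical rows, and the choice to drop reviews without a condition is an assumption.

```python
import pandas as pd

# Hypothetical rows; the last review has no condition recorded
toy = pd.DataFrame({
    "drugName": ["Ibuprofen", "Ibuprofen", "Naproxen", "Levonorgestrel", "Ibuprofen"],
    "condition": ["Pain", "Pain", "Pain", "Birth Control", None],
})

# Keep one row per (condition, drug) pair; rows without a condition are dropped
pairs = toy.dropna(subset=["condition"]).drop_duplicates(["condition", "drugName"])

n_conditions = pairs["condition"].nunique()              # number of distinct conditions
drugs_per_condition = pairs["condition"].value_counts()  # distinct drugs per condition

print("Number of conditions:", n_conditions)
print(drugs_per_condition.head(15))
```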
  13. How genuine is the review?

    TextBlob makes it possible to estimate the sentiment behind a piece of text.

    # Install TextBlob first (in a shell or notebook cell): pip install -U textblob
    from textblob import TextBlob

    # This function gets the numerical value (polarity) of the blob
    def get_sentiment(text):
        blob = TextBlob(text)
        return blob.polarity

    # This function automatically labels the text as positive, negative or neutral
    def get_sentiment_label(text):
        blob = TextBlob(text)
        if blob.polarity > 0:
            result = 'positive'
        elif blob.polarity < 0:
            result = 'negative'
        else:
            result = 'neutral'
        return result

    # Creating columns containing the sentiment
    drug['sentiment value'] = drug['review'].apply(get_sentiment)
    drug['sentiment_label'] = drug['review'].apply(get_sentiment_label)
    drug[['review', 'sentiment value', 'sentiment_label']]

         review                                             sentiment value  sentiment_label
    0    "It has no side effect, I take it in combinati...  0.000000         neutral
    1    "My son is halfway through his fourth week of ...  0.168333         positive
    2    "I used to take another oral contraceptive, wh...  0.067210         positive
    3    "This is my first time using any form of birth...  0.179545         positive
    4    "Suboxone has completely turned my life around...  0.194444         positive
    ...  ...                                                ...              ...
  15. How many reviews are positive, negative or neutral?
    # The number of negative, positive and neutral reviews
    drug['sentiment_label'].value_counts().plot(kind='pie')
  17. Correlation between rating and review and users who found the review useful?
    # Correlation between our sentiment and rating
    import seaborn as sns
    sns.lineplot(data=drug, x='rating', y='sentiment value')


    As shown in the image above, there appears to be a positive correlation between the sentiment value of reviews and the associated rating: reviews with a more positive sentiment tend to come with higher ratings.
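The visual impression of a positive correlation can be backed by a number. The sketch below computes a Pearson correlation coefficient with pandas on hypothetical values chosen only to illustrate the calculation; they are not results from the dataset. The column names follow the ones used in this document.

```python
import pandas as pd

# Hypothetical values, chosen only to illustrate the calculation
toy = pd.DataFrame({
    "rating": [1.0, 3.0, 5.0, 8.0, 10.0],
    "sentiment value": [-0.40, -0.10, 0.05, 0.20, 0.35],
})

# Pearson correlation: close to +1 for an increasing linear relation,
# close to 0 for none, close to -1 for a decreasing one
r = toy["rating"].corr(toy["sentiment value"])

# Mean sentiment at each rating, which is what a line plot with the
# default mean estimator traces when a rating occurs many times
mean_by_rating = toy.groupby("rating")["sentiment value"].mean()

print(round(r, 3))
```

On the real data, `drug["rating"].corr(drug["sentiment value"])` would give the corresponding coefficient. A value well below 1 would suggest that polarity scores capture only part of what the star rating expresses.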

  19. What is the distribution of rating?
    # Correlation between rating and sentiment
    sns.lineplot(data=drug, x='rating', y='sentiment value', hue='sentiment_label')


    As shown in the image above, there is a positive correlation between usefulCount and rating: as usefulCount increases, the rating also tends to increase, so drugs whose reviews have a higher usefulCount tend to have higher ratings. However, correlation does not necessarily imply causation, and other factors may influence the rating of drugs.

  21. Amount of reviews made per year and per month?
    #Average Rating Per Day of Every Year
    plt.title("Average Rating Per Day of Every Year")
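Counting reviews per year and per month requires turning the date strings into timestamps first. The sketch below is a minimal illustration that uses only the dates and ratings of the five rows shown in the table earlier in this document. The format string `"%B %d, %Y"` is an assumption based on how those dates are written (for example "April 27, 2010").

```python
import pandas as pd

# Dates and ratings of the five rows shown earlier in this document
toy = pd.DataFrame({
    "date": ["April 27, 2010", "November 3, 2015", "November 27, 2016",
             "March 14, 2015", "December 8, 2016"],
    "rating": [8.0, 8.0, 9.0, 10.0, 8.0],
})

# Parse strings such as "April 27, 2010" into timestamps
toy["date"] = pd.to_datetime(toy["date"], format="%B %d, %Y")

# Number of reviews per calendar year and per month of the year
reviews_per_year = toy.groupby(toy["date"].dt.year).size()
reviews_per_month = toy.groupby(toy["date"].dt.month).size()

# Average rating per year
mean_rating_per_year = toy.groupby(toy["date"].dt.year)["rating"].mean()

print(reviews_per_year)
print(reviews_per_month)
print(mean_rating_per_year)
```

Grouping on `dt.month` pools the same month across all years. Grouping on both `dt.year` and `dt.month` would keep each month of each year separate.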