Python coding problem: HackerRank.com

URL: https://www.hackerrank.com/challenges/insert-a-node-at-the-head-of-a-linked-list/problem?isFullScreen=true&utm_campaign=challenge-recommendation&utm_medium=email&utm_source=7-day-campaign

SinglyLinkedListNode - LinkedList

Screen recording: https://www.youtube.com/watch?v=OxY7pf86J9Y

Before solving the problem, let's briefly review the theory.

Comparing list vs. array vs. LinkedList:

Operation | list | array | LinkedList (singly linked)
Insert at head | O(n): all elements shift | O(n): all elements shift | O(1)
Insert at tail | Amortized O(1) with append() | O(1) if the array has spare capacity | O(n) if no tail reference is kept; O(1) if the tail is tracked
Access by index | O(1) | O(1) | O(n): traverse from the head to the position
Search by value | O(n) linear search; O(log n) with binary search if the list is sorted | O(n) linear search; O(log n) with binary search if the array is sorted | O(n) even when sorted, because binary search needs random access (a balanced binary search tree or skip list is needed for O(log n))
Delete at arbitrary position | O(n): later elements shift | O(n): later elements shift | O(n) to reach the position; the unlink itself is O(1)
Delete head | O(n): all elements shift | O(n): all elements shift | O(1): only the head pointer changes
Delete tail | O(1) with pop() | O(1) | O(n) in a singly linked list, because the node before the tail must be found; O(1) only in a doubly linked list with a tail reference
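To make the two tail rows of the LinkedList column concrete, here is a minimal sketch of a singly linked list that keeps a tail reference. The class name TailTrackedList and its methods are hypothetical illustrations, not part of the HackerRank problem. Appending is O(1) because the tail is already known, but removing the tail still requires a walk from the head to find its predecessor.

```python
class Node:
    """A singly linked list node: a value plus a reference to the next node."""
    def __init__(self, data):
        self.data = data
        self.next = None


class TailTrackedList:
    """Hypothetical singly linked list that also remembers its last node."""
    def __init__(self):
        self.head = None
        self.tail = None

    def append(self, data):
        # O(1): no traversal, because the tail reference is kept up to date
        node = Node(data)
        if self.tail is None:
            self.head = self.tail = node
        else:
            self.tail.next = node
            self.tail = node

    def pop_tail(self):
        # O(n): nodes have no 'prev' pointer, so walk from the head
        # to find the node just before the tail
        if self.head is None:
            raise IndexError("pop from empty list")
        if self.head is self.tail:
            data = self.head.data
            self.head = self.tail = None
            return data
        current = self.head
        while current.next is not self.tail:
            current = current.next
        data = self.tail.data
        current.next = None
        self.tail = current
        return data

    def to_list(self):
        # Collect values from head to tail into a Python list
        out, current = [], self.head
        while current:
            out.append(current.data)
            current = current.next
        return out


lst = TailTrackedList()
for value in (1, 2, 3):
    lst.append(value)
print(lst.to_list())   # [1, 2, 3]
print(lst.pop_tail())  # 3
print(lst.to_list())   # [1, 2]
```

A doubly linked list avoids the O(n) walk in pop_tail by storing a prev reference in every node, at the cost of extra memory per node.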

Note:

If we need frequent insertions at the head of a collection, consider using a deque (double-ended queue) from the collections module. A deque supports O(1) appends and pops at both ends (appendleft(), popleft(), append(), pop()), so it is more efficient than list.insert(0, value) for this purpose.
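A minimal sketch of using collections.deque for head and tail operations (the values are illustrative only):

```python
from collections import deque

items = deque([10, 20, 30])
items.appendleft(5)      # insert at head: O(1)
items.append(40)         # insert at tail: O(1)
print(items)             # deque([5, 10, 20, 30, 40])

first = items.popleft()  # delete head: O(1)
last = items.pop()       # delete tail: O(1)
print(first, last, list(items))  # 5 40 [10, 20, 30]
```

Indexed access into the middle of a deque is O(n), so it is not a drop-in replacement for a list when random access matters.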

List:

Insert at head:

my_list.insert(0, value)

Insert at tail:

my_list.append(value)

Search with in:

my_list = [10, 20, 3, 4, 5]
element_to_search = 3
if element_to_search in my_list:
    print(f"{element_to_search} is available")
else:
    print(f"{element_to_search} is not available")

Search with index():

try:
    # index() returns the position of the first match, or raises ValueError
    found_index = my_list.index(element_to_search)
    print(f"{element_to_search} is available at index {found_index}")
except ValueError:
    print(f"{element_to_search} is not available")

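Delete:

If we know the value, use remove() (it deletes the first match and raises ValueError if the value is absent):

my_list.remove(20)

If we know the index of a value, use pop(index_of_value), which also returns the removed element:

my_list.pop(index_of_value)

Delete the last element: my_list.pop()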

Array:

from array import array

# Create an array of integers ('i' is the signed int type code)
my_array = array('i', [100, 200, 3, 4, 5])

Insert:

Typical array (array module):

new_element = 10
index_to_insert = 0
my_array.insert(index_to_insert, new_element)  # later elements shift: O(n)

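Using a NumPy array (np.insert returns a new array; the original is not modified):

import numpy as np

index_to_new_element = 5
new_element_to_insert = 10
initial_array = np.array([34, 6, 7, 8, 9, 99])

# Insert new_element_to_insert before position index_to_new_element
new_array = np.insert(initial_array, index_to_new_element, new_element_to_insert)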
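The two insert styles behave differently: array.array.insert changes the array in place, while np.insert leaves its input untouched and returns a new array. A small sketch reusing the sample values above:

```python
from array import array

import numpy as np

# array.array.insert modifies the array in place
my_array = array('i', [100, 200, 3, 4, 5])
my_array.insert(0, 10)
print(my_array.tolist())       # [10, 100, 200, 3, 4, 5]

# np.insert builds and returns a new array; the input is unchanged
initial_array = np.array([34, 6, 7, 8, 9, 99])
new_array = np.insert(initial_array, 5, 10)
print(initial_array.tolist())  # [34, 6, 7, 8, 9, 99]
print(new_array.tolist())      # [34, 6, 7, 8, 9, 10, 99]
```

Because np.insert copies the whole array each time, inserting in a loop is O(n) per call; collecting values in a list and converting once is usually cheaper.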

LinkedList:


class Node:
    # Each node stores a value and a reference to the next node
    def __init__(self, data):
        self.data = data
        self.next = None


class LinkedList:

    def __init__(self):
        # An empty list has no head node yet
        self.head = None

    def display_elements(self):
        if self.head is None:
            print("List is empty")
            return

        current = self.head
        list_elements = ""

        # Walk from head to tail, joining the values with "->"
        while current:
            list_elements += str(current.data)

            if current.next:
                list_elements += "->"
            current = current.next

        print(list_elements)

    def insert_at_head(self, data):
        # O(1): the new node points to the old head and becomes the new head
        new_node = Node(data)
        new_node.next = self.head
        self.head = new_node

    def insert_at_tail(self, data):
        # O(n): no tail reference is kept, so walk to the last node first
        new_node = Node(data)

        if self.head is None:
            self.head = new_node
            return

        current = self.head
        while current.next:
            current = current.next

        current.next = new_node


link_list = LinkedList()

link_list.insert_at_head(10)
link_list.insert_at_head(20)
link_list.display_elements()  # prints 20->10
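For the HackerRank problem itself, a minimal sketch might look like the following. It assumes the stub provides a SinglyLinkedListNode class with data and next attributes and asks you to complete insertNodeAtHead(llist, data) so that it returns the new head; check the exact signature in the HackerRank editor before submitting. The node class is redefined here only so the snippet runs on its own.

```python
class SinglyLinkedListNode:
    # Assumed to mirror the node class provided by the HackerRank stub
    def __init__(self, node_data):
        self.data = node_data
        self.next = None


def insertNodeAtHead(llist, data):
    """Insert data in front of llist and return the new head: O(1)."""
    new_node = SinglyLinkedListNode(data)
    new_node.next = llist  # also correct when llist is None (empty list)
    return new_node


# Usage: inserting 1, 2, 3 at the head produces 3 -> 2 -> 1
head = None
for value in (1, 2, 3):
    head = insertNodeAtHead(head, value)

values = []
node = head
while node:
    values.append(node.data)
    node = node.next
print(values)  # [3, 2, 1]
```

Returning the new head (instead of mutating a wrapper object, as LinkedList.insert_at_head does) lets the caller keep track of the list through a single node reference.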

Handling missing data with the "Titanic dataset" for beginners

Handling missing values is not an easy task. We usually think of missing values as NaN in a DataFrame or NULL in a database, but what about values like the following in a dataset?

A column where missing entries are encoded as 'Nill', '-', 'Empty', or ' null' (with a leading space).
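One way to treat such placeholders as missing is to normalize the text and convert the known markers to NaN. A sketch with pandas, assuming a hypothetical Cabin column and a placeholder list that you would adapt to your own data:

```python
import pandas as pd

# Hypothetical sample column mixing real values with placeholder markers
df = pd.DataFrame({"Cabin": ["C85", "Nill", "-", "Empty", " null", None, "E46"]})

# Strip surrounding spaces, then turn known placeholders (case-insensitive) into NaN
placeholders = ["nill", "-", "empty", "null", ""]
cleaned = df["Cabin"].str.strip()
df["Cabin"] = cleaned.mask(cleaned.str.lower().isin(placeholders))

print(df["Cabin"].isna().sum())  # 5
```

Once the placeholders are NaN, the standard tools (isna(), info(), heatmaps) count them as missing.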

Note: In data science, missing data is not limited to placeholder values like these; entire rows (records) can be missing as well. How much this matters depends on the dataset and the use case, but daily, weekly, or monthly visualizations can help identify such gaps early.

So how do we identify the pattern of missing values?

Compute the length of every value in each text column, then look at the pattern and the most frequently repeated lengths.
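A sketch of that length check with pandas, on a small hypothetical DataFrame (the column names mirror Titanic columns, but the values are made up). Short lengths that repeat often, such as 0, 1, or 4 here, are candidates for placeholders like '', '-', or 'Nill':

```python
import pandas as pd

# Hypothetical sample data; in practice use the Titanic DataFrame
df = pd.DataFrame({
    "Ticket": ["A/5 21171", "PC 17599", "-", "113803", "-", "Nill"],
    "Embarked": ["S", "C", "S", "", "S", "Q"],
})

profiles = {}
# Profile every non-numeric column: how often does each value length occur?
for col in df.select_dtypes(exclude="number").columns:
    lengths = df[col].astype(str).str.len()  # convert to text so .str works
    profiles[col] = lengths.value_counts().to_dict()
    print(col, profiles[col])
```

Inspecting the values behind a suspicious length (for example df[df["Ticket"].str.len() == 1]) then confirms whether they are real data or placeholders.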

This post focuses on the following topics:

Impact of the missing values.

How to analyze/visualize the missing values.

How to fix the missing values.

Code examples using pandas and PySpark.

Please download the data for this example from https://www.kaggle.com/competitions/titanic/data?select=train.csv

How missing values affect the analysis:

Missing data has to be analyzed and handled correctly; otherwise the insights we visualize can be misleading, and the decisions based on them suffer.

How to analyze NaN (missing) values:

First, look at a sample of the missing data in the Age and Cabin columns.

df.info() shows the non-null count of every column; subtracting it from the total number of rows gives the number of missing values.
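A small sketch on a hypothetical stand-in DataFrame, showing df.info() next to df.isna().sum(), which counts the missing values per column directly:

```python
import numpy as np
import pandas as pd

# Small hypothetical stand-in for the Titanic training data
df = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, np.nan],
    "Cabin": [np.nan, "C85", np.nan, np.nan],
    "Fare": [7.25, 71.28, 7.92, 53.10],
})

df.info()                                  # prints "Non-Null Count" per column
missing = df.isna().sum()                  # missing values per column
print(missing)
print((missing / len(df) * 100).round(1))  # percentage missing per column
```

On the real training file, the same two lines give the per-column missing counts and percentages that the plots below visualize.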

Visualizing the missing values

Step one: Please download the Titanic data for this example from https://www.kaggle.com/competitions/titanic/data?select=train.csv

Step two: Open a Jupyter Notebook, import the necessary libraries, and read the CSV using pandas.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sn
%matplotlib inline

# Assumes the Kaggle train.csv was saved as titanic_train.csv
explore_titanic_train = pd.read_csv('titanic_train.csv')

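Step three: Visualize using displot and a heatmap.

# displot is a figure-level function, so size it with height/aspect;
# a preceding plt.figure() call would only create an extra empty figure
sn.displot(
    data=explore_titanic_train.isna().melt(value_name="missing"),
    y="variable",
    hue="missing",
    multiple="fill",
    aspect=1.25
)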

Using a heatmap:

sn.heatmap(explore_titanic_train.isnull(), yticklabels=True, cbar=False, cmap='viridis')

Both plots make it clear that Age and Cabin have a notable number of missing values.

How to fix the missing values, with code examples using pandas and PySpark, will be covered in another post soon.

PySpark setup on Windows: Docker configuration

Since I usually work on Ubuntu, I found these Windows steps worth sharing.

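1. First, install Docker Desktop: https://docs.docker.com/desktop/install/windows-install/

2. Make sure WSL is installed (I used the latest Ubuntu LTS).

3. Open Docker Desktop and pull the jupyter/pyspark-notebook image (or run docker pull jupyter/pyspark-notebook).

4. Run PowerShell with admin privileges and start the container: docker run -p 8888:8888 jupyter/pyspark-notebook

Jupyter is then reachable at http://localhost:8888; the login URL with its token is printed in the PowerShell output.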

Working with Docker and WSL from the notebook setup:

Open your WSL terminal and try the following.

1. How to access Windows files from WSL (the C: drive is mounted at /mnt/c), for example copying a file into your WSL home directory:

azeem@DESKTOP-VGSDP7F:~$ cp /mnt/c/Users/User/Downloads/Py_DS_ML_Bootcamp-master/Refactored_Py_DS_ML_Bootcamp-master/04-Pandas-Exercises/Salaries.csv .

2. How to log in to the container shell:

docker exec -it intelligent_benz bash   # replace intelligent_benz with your container name (docker ps lists it)

3. How to copy a local Windows file into a Docker container:

docker cp /mnt/c/Users/User/Downloads/Py_DS_ML_Bootcamp-master/Refactored_Py_DS_ML_Bootcamp-master/04-Pandas-Exercises/Salaries.csv intelligent_benz:tmp/
Successfully copied 16.1MB to intelligent_benz:tmp/

4. How to copy a file from a Docker container to the local host:

sudo docker cp container-id:/path/filename.txt ~/Desktop/xyz.txt