Sign to Speech Conversion for Dumb People
Project Proposal
Submitted by:
Abdul Ahad Razzaq 2015-EE-014
Ahmad Malik 2015-EE-031
Zirsha Riaz 2015-EE-152
Amna Fayyaz 2015-EE-153
Supervised by: Dr. Kashif Javed
Department of Electrical Engineering
University of Engineering and Technology Lahore
We declare that the work contained in this thesis/report is our own, except where
explicitly stated otherwise. In addition, this work has not been submitted to obtain
another degree or professional qualification.
Project Supervisor: Signed:
Final Year Project Coordinator: Signed:
List of Figures
1 Problem Statement
2 Literature Survey
3 Proposed Methodology
    3.1 Micro-controller:
    3.2 Flex Sensor:
    3.3 Project Idea:
    3.4 MPU-6050:
    3.5 Noise Removal:
    3.6 Moving average filter:
    3.7 Features:
    3.8 Preprocessing:
    3.9 Feature Extraction:
    3.10 Sign Classifier:
    3.11 Feature Extraction using Python:
    3.12 Multilayer Perceptron:
    3.13 Support Vector Machines:
    3.14 K-Nearest Neighbor:
    3.15 Dynamic Time Warping:
4 Block Diagram
5 Flow Chart
6 Time Table
List of Figures
3.1 Raspberry Pi
3.2 Flex Sensor
4.1 Block Diagram
5.1 Flow Chart
6.1 Time Table
PSL Pakistan Sign Language
SVM Support Vector Machine
ANN Artificial Neural Network
Progress in the social welfare of a society can be gauged from how well physically and
mentally disabled individuals are integrated into productive roles. According to the
1998 census, the most recent data available as of 2014, there were an estimated 3.28
million people with disabilities in Pakistan, including 1.37 million women. This would
make them 2.49% of the population in that year [12]. Disability also includes
speech-impaired people who find it difficult to communicate with others. All over the
world there are many deaf and speech-impaired people, and they all face the problem of
communication. Sign language is a good solution for communication, but its use is
largely restricted to people with disabilities, i.e. hearing-impaired and
speech-impaired people. A person without such a disability usually does not know sign
language properly. Our project tackles this particular problem. It provides a solution
in the form of a wearable device that recognizes Pakistan Sign Language (PSL) gestures
made by speech-impaired users and produces the corresponding speech signal. In this way
they can convey their problems and advice to others without difficulty.
Sign language is a vision-based language of hearing-impaired people, which involves
the use of the hands, the face and the body. A sign language recognition system works
on five essential parameters: hand shape, hand movement, hand and head orientation,
hand and head location, and facial expression. Of these five parameters, the most
fundamental is hand shape. The next most important parameters are hand movement, hand
and head orientation, and their location. The wearable device, which is a glove with an
attached speaker, will sense the sign language gestures made by a speech-impaired user.
Gestures will be recognized by ML-based algorithms implemented on a Raspberry Pi, and
through text-to-speech conversion the corresponding speech signal will be played on the
speaker.
A number of articles and research papers were studied to become familiar with existing
work on sign-to-speech conversion using data gloves. Every individual uses language
to communicate with others. Sign language, developed by deaf communities, is used
primarily by hearing-impaired people to communicate with each other. Communication
through signing is a highly organized nonverbal language using both non-manual and
In 2015, at the IEEE International Symposium on Robotics and Intelligent Sensors [7],
"A New Data Glove Approach for Malaysian Sign Language Detection" was presented by a
group of Malaysian students. The objectives of this research were to develop a sign
language translation system that helps hearing- or speech-impaired people communicate
with hearing people, and to test the accuracy of the system in interpreting sign
language. As a first step, the best method of gesture recognition was chosen after
reviewing previous research. The data glove comprises 10 tilt sensors to capture finger
flexion, an accelerometer to recognize the motion of the hand, a microcontroller, and a
Bluetooth module to send the interpreted information to a mobile phone. First, the
performance of the tilt sensors was tested. Then, after all connections were assembled,
the accuracy of the data glove in translating selected alphabets, numbers and words
from Malaysian Sign Language was measured. Across the 4 individuals who tested the
device, the average accuracy was 95% for alphabets, 93.33% for numbers and 78.33% for
gestures. The average accuracy of the data glove over all types of gestures was 89%.
In another approach, in June 2014, students of Bhivrabai Sawant Institute of Technology
and Research (W), University of Pune, Pune, Maharashtra, India published a research
paper titled "Innovative Approach for Gesture to Voice Conversion" [15]. The paper
presents a hand-gesture-based interface for facilitating communication among speech-
and hearing-impaired people. The input device is a data glove: an ordinary cloth
driving glove fitted with five flex sensors, one along the length of each finger and
the thumb, which captures the hand gestures of the user. The flex sensors output a
stream of data that varies with the degree of bend.
Another approach proposed a smart sign language interpretation system using a wearable
hand device [11]. This wearable system uses five flex sensors, two pressure sensors,
and a three-axis inertial motion sensor to distinguish the characters of the American
Sign Language alphabet. The gestures are recognized using a support vector machine
(SVM) model implemented in the wearable device.
Another system relies on a home-made sensory glove to measure the hand gestures, and on
Wavelet Analysis (WA) and a Support Vector Machine (SVM) to classify the hand
movements [8]. The proposed system is light and neither intrusive nor obtrusive, so it
can easily be used by deaf people in everyday life, and it has demonstrated valid
results in terms of sign-to-word conversion.
Another approach used artificial neural networks to recognize the sensor values coming
from a sensor glove [14]. These values are categorized into 24 letters of the English
alphabet and two punctuation symbols introduced by the authors, so mute people can
write complete sentences using this application. The authors used the 7-sensor glove
from the 5DT company. Of its 7 sensors, 5 are for the individual fingers and
Another approach used a MATLAB-based system in which the data is received from an
ADC [9]. The data is arranged into a packet with a marker character and sent to the
Bluetooth module for transmission. A text-to-speech converter is then used to convert
it to voice.
Another approach used a glove embedded with flex sensors and an Inertial Measurement
Unit (IMU) to recognize gestures [6]. A novel state estimation method was developed to
track the motion of the hand in three-dimensional space. The prototype was tested for
its feasibility in converting Indian Sign Language to voice output. A Raspberry Pi is
used as the processing unit.
Another approach developed a sign-to-Arabic-language translator based on a smart glove
interfaced wirelessly with a microcontroller and text/voice presenting devices [10].
The Arduino Software (IDE) was used to program the system.
Another approach was developed by students of Electrical and Electronic Engineering at
East West University, Dhaka, Bangladesh [5]. The project used several types of hardware
and software. The main hardware comprised an Arduino Mega 2560, flex sensors, a GY-521
sensor and an HC-06 module. The main software comprised the Arduino IDE, MATLAB,
Simulink, PLX-DAQ and Android Studio.
Another approach aimed to make gesture-based input for smartphones and smartwatches
accurate and feasible to use [16]. With a custom Android application that recorded
accelerometer data for 5 gestures, the authors developed a highly accurate SVM
classifier using only 1 training example per class.
3.1 Micro-controller:
The processing unit used in our project is the Raspberry Pi 3 Model B. The Raspberry Pi
is a series of small single-board computers. This model has a 64-bit quad-core
processor and on-board Wi-Fi, Bluetooth and USB boot capabilities. One shortcoming of
the Raspberry Pi is that it does not contain a built-in ADC.
Figure 3.1: Raspberry Pi
3.2 Flex Sensor:
A flex sensor is used to detect the flexion of a finger. A flex sensor, also known as a
bend sensor, changes its resistance according to the amount of bend applied to it. It
is a passive resistive device fabricated by laying strips of carbon resistive elements
within a thin flexible substrate. The flex sensor can be read as a potentiometer with a
resistance output, or across a voltage divider to obtain a voltage output that varies
with the bend applied. Its resistance varies between 25K ohm and 120K ohm.
Figure 3.2: Flex Sensor
The output voltage of the potential divider is V_out = V_in * R2 / (R1 + R2), where R1
is a fixed resistance of about 10K ohm and R2 is the variable resistance of the flex
sensor. The outputs from the flex sensors are fed into non-inverting op-amps to amplify
their voltage. The greater the degree of bending, the lower the output voltage. To
process the incoming signals, each analog signal from the flex sensors must be
converted to a digital signal. This means that five ADC channels are required to
process all five flex sensors. External ADCs can be used to add analog inputs to the
overall processing system. One option, the MCP3008, provides up to 8 channels of
conversion with 10 bits of resolution and a sampling rate of 200 ksps.
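The divider formula and the 10-bit conversion can be checked numerically with a minimal sketch. The 3.3 V supply and reference voltage and all function names here are assumptions for illustration, and the op-amp stage is ignored. The sketch follows the formula as written, with R2 as the flex sensor; which leg of the divider holds the sensor determines the direction in which the voltage changes with bend.

```python
# Sketch only: the supply voltage, reference voltage and names are assumptions.

def divider_vout(v_in, r1, r2):
    """Voltage divider output: V_out = V_in * R2 / (R1 + R2)."""
    return v_in * r2 / (r1 + r2)


def adc_count_to_voltage(count, v_ref=3.3, bits=10):
    """Convert a raw ADC count to volts (10 bits gives counts 0..1023)."""
    if not 0 <= count < 2 ** bits:
        raise ValueError("count outside ADC range")
    return count * v_ref / 2 ** bits


def flex_resistance(v_out, v_in=3.3, r1=10_000.0):
    """Invert the divider to recover R2 (the flex sensor) from V_out."""
    if not 0 <= v_out < v_in:
        raise ValueError("v_out must lie in [0, v_in)")
    return r1 * v_out / (v_in - v_out)


if __name__ == "__main__":
    # Resistance range quoted in the text: 25K ohm to 120K ohm.
    for r2 in (25_000.0, 120_000.0):
        v = divider_vout(3.3, 10_000.0, r2)
        print(f"R2 = {r2:.0f} ohm -> V_out = {v:.3f} V")
```

Recovering R2 from the measured voltage, rather than working with raw counts, keeps later processing independent of the chosen supply voltage.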
Finger gestures are captured through the flex sensors placed on the top of the fingers.
The flex sensors used in this study are either 4.5 inches or 2.2 inches in size. The
shorter flex sensor is suitable for the pinky finger, whereas the longer sensor is used
for the other four fingers.
3.3 Project Idea:
The proposed idea is a smart glove that converts sign language to speech output.
The glove is embedded with five flex sensors and an Inertial Measurement Unit (IMU),
which includes a 3-axis accelerometer and a gyroscope, to track the motion of the hand
in 3-D space.
3.4 MPU-6050:
Gestures are normally classified into linear and angular motions. A device that can
measure both is the MPU-6050, an integrated 6-axis motion tracking device. It combines
a 3-axis gyroscope and a 3-axis accelerometer with a Digital Motion Processor (DMP). A
3-axis compass is not built in, but one can be attached externally.
3.5 Noise Removal:
To track the motion of the hand in 3-D space, an Inertial Measurement Unit is used.
Attaching an IMU to a human limb allows us to track its movement in any direction. The
position can be found by double-integrating the acceleration values obtained from the
accelerometer. However, integration accumulates sensor noise and induces drift errors.
Because of the drift, the positional values diverge within short intervals of time.
Hence, the drift must be compensated to improve the accuracy of the positional values.
To minimize the error induced by integration, a suitable sensor fusion algorithm can be
used. The fusion of accelerometer and gyroscope data can be achieved using filters,
such as the Kalman filter and the complementary filter. However, the implementation of
the Kalman filter is complex, so a complementary filter can be used instead, as it
gives comparable values and is simple to implement. To eliminate the effect of external
noise in the accelerometer readings, the values are passed through a low-pass filter.
To eliminate the drift error in the integration of gyroscope readings, the integrated
values are passed through a high-pass filter. A stable output is obtained by combining
the outputs of the low-pass and high-pass filters.
3.6 Moving average filter:
This filter computes the average of a window of samples. It is used in addition to the
complementary filter [13]. It reinforces the low-pass filtered data by smoothing it and
removing abrupt variations.
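A minimal sketch of such a filter is given here for illustration. The window length of 5 samples and the function name are assumptions, since the proposal does not fix them. Until the window fills, the sketch averages the samples seen so far.

```python
from collections import deque


def moving_average(samples, window=5):
    """Return the running mean over the last `window` samples."""
    if window < 1:
        raise ValueError("window must be at least 1")
    buf = deque(maxlen=window)  # holds the most recent samples
    total = 0.0
    smoothed = []
    for x in samples:
        if len(buf) == window:
            total -= buf[0]  # this sample is about to fall out of the window
        buf.append(x)
        total += x
        smoothed.append(total / len(buf))
    return smoothed


if __name__ == "__main__":
    noisy = [0.50, 0.52, 0.90, 0.51, 0.49, 0.50]  # one abrupt spike
    print(moving_average(noisy, window=3))
```

A longer window smooths more strongly but delays the response to a genuine change of gesture, so the window length is a tuning parameter.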
3.7 Features:
We will use scikit-learn, a machine learning library for the Python language. This
library has a module for feature selection using mutual information. Basic features
are:
3. Standard deviation
3.8 Preprocessing:
The flexion values varied among the different subjects owing to their different hand
sizes. A smaller hand shows a lower discrepancy in the flex sensor values across signs.
In other words, the resistance of the flex sensors varies less for a smaller hand than
for a larger hand. Thus, the differences among the signs are not always observable from
the raw flex sensor values. To solve this issue, the sensor values were normalized,
based on the computed mean and standard deviation (SD) of each flex sensor for each
subject, into the range [0, 1] for easier analysis. In the normalization, fs_i is the
i-th sensor reading, and the mean and SD are those of the flex sensor values. The
normalized values are computed individually for each sensor and for each test subject.
In other words, because of the different hand sizes, clenching the fingers into a fist
generates different maximum flex sensor values. Thus, the mean and SD of the flex
sensors are computed separately for each subject, based on that subject's sensor
readings. This method eliminates the need for a new user to perform all the sign
gestures before starting to use the proposed system.
Meanwhile, each hand movement was derived from the IMU in three orientations: pitch,
roll, and yaw. The calculation of each orientation is similar, depending on the axis
used. The angle agl_i of each axis is computed using a complementary filter at the i-th
time:
\[ agl_i = 0.98 \left( agl_{i-1} + \frac{gyr_i}{gyr_{HZ}} \right) + 0.02 \, aglc_i, \]
\[ aglc_i = \arctan(A_u, A_v) \cdot \frac{180}{\pi}, \]
where gyr_i, gyr_HZ, and aglc_i are the raw gyroscope sensor reading, the gyroscope
sensor sampling rate, and the accelerometer-derived angle at the i-th time,
respectively. In addition, A_u and A_v are the raw linear accelerometer readings on the
two axes other than the one about which the angle is computed; e.g., to compute the
pitch angle, A_u and A_v are the y-axis and z-axis raw linear accelerometer readings.
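The complementary filter update can be sketched as follows. Two things are assumptions here: the two-argument arctangent is read as `atan2(A_u, A_v)`, and the gyroscope reading is taken to be in degrees per second so that dividing by the sampling rate gives the angle change per sample. The function names are illustrative.

```python
import math

ALPHA = 0.98  # weight of the integrated gyroscope angle, as in the text


def accel_angle_deg(a_u, a_v):
    """Angle implied by two accelerometer axes, in degrees (atan2 assumed)."""
    return math.degrees(math.atan2(a_u, a_v))


def complementary_step(prev_angle, gyro_rate, sample_rate_hz, a_u, a_v):
    """One update: 0.98 * (previous + gyro increment) + 0.02 * accel angle."""
    gyro_angle = prev_angle + gyro_rate / sample_rate_hz
    return ALPHA * gyro_angle + (1.0 - ALPHA) * accel_angle_deg(a_u, a_v)


def track_angle(gyro_rates, accel_pairs, sample_rate_hz, initial=0.0):
    """Run the filter over paired gyroscope and accelerometer samples."""
    angle = initial
    angles = []
    for rate, (a_u, a_v) in zip(gyro_rates, accel_pairs):
        angle = complementary_step(angle, rate, sample_rate_hz, a_u, a_v)
        angles.append(angle)
    return angles


if __name__ == "__main__":
    # Stationary hand tilted so that both axes read the same acceleration:
    # the estimate drifts from 0 toward the accelerometer angle.
    print(track_angle([0.0] * 5, [(1.0, 1.0)] * 5, sample_rate_hz=100.0))
```

The gyroscope term dominates over short time spans, while the small accelerometer weight slowly pulls the estimate back and so cancels gyroscope drift.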
To simplify and optimize the implementation, a vector of the flexion degree in tabular
format was considered. The flexion degree of each sensor is split into three regions.
The first region denotes no bend or a slight bend, with a normalized flexion value in
the range [0.0, 0.3). The second region is a partial bend, with a normalized flexion
value in the range [0.3, 0.7), and the last region is a complete bend, with a
normalized flexion value in the range [0.7, 1.0].
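The three regions map directly onto a small lookup, sketched here with illustrative names. The region labels are assumptions; the boundaries are the ones given in the text.

```python
def bend_region(value):
    """Map a normalized flexion value in [0, 1] to one of three regions."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("normalized flexion must lie in [0, 1]")
    if value < 0.3:
        return "no_bend"   # [0.0, 0.3): no bend or slight bend
    if value < 0.7:
        return "partial"   # [0.3, 0.7): partial bend
    return "complete"      # [0.7, 1.0]: complete bend


def bend_vector(normalized_fingers):
    """Region label for each finger, in the order the values are given."""
    return [bend_region(v) for v in normalized_fingers]


if __name__ == "__main__":
    # Hypothetical readings for thumb, index, middle, ring and pinky.
    print(bend_vector([0.1, 0.85, 0.9, 0.4, 0.05]))
```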
To determine the hand motion, the standard deviation (SD) of the angular readings from
the IMU sensor is computed for each axis. The SD quantifies the amount of variation or
dispersion of the motion readings.
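As an illustration, the per-axis SD can be computed with the standard library. Whether the population or the sample SD is intended is not stated, so the use of the population SD is an assumption, as is the function name.

```python
import statistics


def motion_sd(pitch, roll, yaw):
    """Population SD of the angular readings on each axis, in degrees."""
    return tuple(statistics.pstdev(axis) for axis in (pitch, roll, yaw))


if __name__ == "__main__":
    # Hypothetical window: the hand is still in pitch and yaw, moving in roll.
    pitch = [10.0, 10.1, 9.9, 10.0]
    roll = [0.0, 15.0, 30.0, 45.0]
    yaw = [5.0, 5.0, 5.0, 5.0]
    print(motion_sd(pitch, roll, yaw))
```

A near-zero SD on every axis suggests a static sign, while a large SD on one axis indicates motion about that axis.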
3.10 Sign Classifier:
The signs are classified into 28 classes using a support vector machine (SVM). An SVM
is a binary supervised learning classifier, that is, the class labels can only take the
values +1 and -1. The training procedure uses a quadratic optimization algorithm to
derive a hyperplane that separates the n training samples into the two classes. Denote
the i-th training sample by
\[ (x_i, y_i), \quad y_i \in \{-1, +1\}, \quad i = 1, 2, 3, \ldots, n, \]
where x_i is the feature vector and y_i is the training label of the i-th training
sample. The decision boundary is defined as:
\[ f(x) = w \cdot x - b, \]
where a feature vector x is classified as positive (+1) if f(x) > 0, and negative (-1)
if f(x) < 0. The separating hyperplane lies at f(x) = 0. The points positioned closest
to the separating hyperplane are known as support vectors (SVs), and their distance to
the hyperplane is known as the margin. The SVM is optimized by maximizing this margin,
i.e. the smallest distance from the hyperplane to any training point. For a given
alphabet, +1 and -1 denote a sample classified as that alphabet and a sample not
classified as that alphabet, respectively.
However, more than two classes are being classified. To obtain an M-class SVM
classifier, a set of binary classifiers needs to be constructed, where each binary
classifier is trained to distinguish one class from the rest. The results are then
integrated into a multiclass classification according to the maximal output x_j of the
binary classifiers, also known as the confidence value, where j refers to the binary
classifier of each alphabet. Thus, x belongs to the class with the largest confidence
value. In a real-world application there will be gestures that belong to none of the
sign classes or the neutral class, and thus any gesture with a confidence value of less
than 0.5 (50%) is considered an invalid class. In this study, a feature vector x is
built using a total of 8 features: the normalized flex sensor data for the five
fingers, and the three computed SDs of angular readings from the IMU sensor. Sliding
windows of 3 s are adopted to construct the feature vector for every 10 s of sensor
data, to accommodate the changes in hand movement within a 3 s period. The values of
the eight features within each 3 s window are averaged for the feature vector
construction.
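The one-versus-rest decision with rejection described above can be sketched as follows. The sketch assumes that each binary classifier already yields a confidence value in [0, 1]; how the raw SVM output is calibrated to that range is not specified here, and the function name is illustrative.

```python
def classify_one_vs_rest(confidences, threshold=0.5):
    """Pick the class with the largest confidence, or None if it is too low.

    `confidences` maps each class label to the confidence value of its
    binary (one-versus-rest) classifier.
    """
    if not confidences:
        return None
    best = max(confidences, key=confidences.get)
    if confidences[best] < threshold:
        return None  # treated as an invalid gesture
    return best


if __name__ == "__main__":
    print(classify_one_vs_rest({"A": 0.12, "B": 0.91, "C": 0.40}))
    print(classify_one_vs_rest({"A": 0.12, "B": 0.31, "C": 0.40}))
```

Returning `None` for low-confidence gestures lets the caller stay silent instead of speaking a wrong letter.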
The SVM is trained and tested using the leave-one-subject-out (LOO) method. Here,
the SVM is trained using n - 1 subject datasets, where n is the total number of
subjects. Subsequently, the trained model is tested using the dataset of the left-out
subject, which did not participate in the training process. This process is repeated n
times, so that each subject is treated as the left-out subject exactly once.
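The splitting scheme can be sketched independently of the classifier. The function name and the subject identifiers are hypothetical; scikit-learn offers `LeaveOneGroupOut` for the same purpose, and the plain-Python version is shown only to make the procedure explicit.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out, train_indices, test_indices) once per subject.

    `subject_ids[i]` names the subject who produced sample i.
    """
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


if __name__ == "__main__":
    ids = ["s1", "s1", "s2", "s3", "s2"]  # hypothetical recording sessions
    for held_out, train, test in leave_one_subject_out(ids):
        print(held_out, "train:", train, "test:", test)
```

Because no sample of the held-out subject is seen during training, the resulting accuracy estimates how the glove would perform for a new user.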
3.11 Feature Extraction using Python:
The following package of the Python programming language will be used.
NumPy: NumPy is a Python library for scientific computing.
NumPy offers the following useful tools:
2. Array indexing
3. Array math
We will make use of its N-dimensional array objects to represent our data. Along with
array objects, it also provides some very useful mathematical tools. These tools will
be used to extract features from our data.
Gesture Training and Real-time Recognition:
For classification, models will be trained. We will make use of the following machine
learning algorithms:
3.12 Multilayer Perceptron:
A multilayer perceptron (MLP) is a class of feedforward artificial neural network. An
MLP consists of at least three layers of nodes. Except for the input nodes, each node
is a neuron that uses a nonlinear activation function [3]. An MLP uses a supervised
learning technique called backpropagation for training. Its multiple layers and
nonlinear activation distinguish an MLP from a linear perceptron, and it can
distinguish data that is not linearly separable. If a multilayer perceptron has a
linear activation function in all neurons, that is, a linear function that maps the
weighted inputs to the output of each neuron, then linear algebra shows that any number
of layers can be reduced to a two-layer input-output model.
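The collapse of purely linear layers can be verified with a small numerical sketch. The weight matrices are arbitrary illustrative values and biases are omitted for brevity; with a linear activation, applying two layers in turn gives the same result as applying their matrix product once.

```python
def matvec(w, x):
    """Multiply matrix w (list of rows) by vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def matmul(a, b):
    """Multiply matrix a by matrix b (both lists of rows)."""
    cols = list(zip(*b))
    return [[sum(ai * bi for ai, bi in zip(row, col)) for col in cols] for row in a]


W1 = [[1, 2], [3, 4], [5, 6]]   # hidden layer: 2 inputs -> 3 units
W2 = [[1, 0, -1], [2, 1, 0]]    # output layer: 3 units -> 2 outputs

if __name__ == "__main__":
    x = [1, -1]
    two_layers = matvec(W2, matvec(W1, x))   # layer by layer
    one_layer = matvec(matmul(W2, W1), x)    # single equivalent layer
    print(two_layers, one_layer)
```

This is why the nonlinear activation is essential: without it, adding layers adds no representational power.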
3.13 Support Vector Machines:
In machine learning, support vector machines (SVMs, also support vector networks)
are supervised learning models with associated learning algorithms that analyze data
used for classification and regression analysis [4]. Given a set of training examples,
each marked as belonging to one or the other of two categories, an SVM training
algorithm builds a model that assigns new examples to one category or the other, making
it a non-probabilistic binary linear classifier (although methods such as Platt scaling
exist to use SVM in a probabilistic classification setting). An SVM model is a
representation of the examples as points in space, mapped so that the examples of the
separate categories are divided by a clear gap that is as wide as possible. New
examples are then mapped into that same space and predicted to belong to a category
based on which side of the gap they fall.
3.14 K-Nearest Neighbor:
In pattern recognition, the k-nearest neighbors algorithm (k-NN) is a non-parametric
method used for classification and regression [2]. In both cases, the input consists of
the k closest training examples in the feature space. The output depends on whether
k-NN is used for classification or regression:
In k-NN classification, the output is a class membership. An object is classified by a
majority vote of its neighbors, with the object being assigned to the class most common
among its k nearest neighbors (k is a positive integer, typically small). If k = 1,
then the object is simply assigned to the class of that single nearest neighbor.
In k-NN regression, the output is the property value for the object. This value is the
average of the values of its k nearest neighbors.
k-NN is a type of instance-based learning, or lazy learning, where the function is only
approximated locally and all computation is deferred until classification. The k-NN
algorithm is among the simplest of all machine learning algorithms.
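The classification rule can be sketched in a few lines. Euclidean distance, the value k = 3 and the toy data are assumptions for illustration; the sketch requires Python 3.8 or later for `math.dist`.

```python
import math
from collections import Counter


def knn_classify(train_x, train_y, query, k=3):
    """Assign `query` the majority label among its k nearest training points."""
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of samples")
    # Sort training samples by Euclidean distance to the query.
    by_distance = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    xs = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
    ys = ["A", "A", "A", "B", "B", "B"]
    print(knn_classify(xs, ys, (0.2, 0.1)))
    print(knn_classify(xs, ys, (5.5, 5.5)))
```

All work happens at query time, which matches the lazy-learning description above: there is no training step beyond storing the samples.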
3.15 Dynamic Time Warping:
In time series analysis, dynamic time warping (DTW) is one of the algorithms for
measuring similarity between two temporal sequences, which may vary in speed [1]. For
instance, similarities in walking could be detected using DTW, even if one person was
walking faster than the other, or if there were accelerations and decelerations during
the course of an observation. DTW has been applied to temporal sequences of video,
audio, and graphics data; indeed, any data that can be turned into a linear sequence
can be analyzed with DTW. A well-known application has been automatic speech
recognition, to cope with different speaking speeds. Other applications include speaker
recognition and online signature recognition. It can also be used in partial shape
matching.
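A minimal sketch of the DTW distance between two one-dimensional sequences follows. The absolute difference as the local cost and the function name are assumptions for illustration; no window constraint is applied.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] is the best alignment cost of a[:i] against b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


if __name__ == "__main__":
    slow = [1, 2, 2, 3]   # same gesture performed more slowly
    fast = [1, 2, 3]
    print(dtw_distance(fast, slow))
```

Because repeated samples can be matched to a single sample at no extra cost, the same gesture performed at a different speed yields a small distance.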
Block Diagram
Figure 4.1: Block Diagram
Flow Chart
Figure 5.1: Flow Chart (recoverable labels: Raspberry Pi code; sent to database;
mapped text sent to LCD and speaker)
In this phase of the proposal we present a brief management plan for how we will tackle
the challenges and complete the proposed project while keeping the deadline in mind. We
have set a work plan against time so that each task is completed efficiently. The plan
is prepared on a monthly basis, but we also intend to make short-term plans within each
month to evaluate our performance and adjust accordingly. We will start in July.
Figure 6.1: Time Table
[1] Dynamic time warping. http://en.wikipedia.org/wiki/Dynamic_time_warping, last
accessed on June 28, 2018.
[2] K nearest neighbors. http://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm,
last accessed on June 28, 2018.
[3] Multilayer perceptron. http://en.wikipedia.org/wiki/Multilayer_perceptron, last
accessed on June 28, 2018.
[4] Support vector machine. http://en.wikipedia.org/wiki/Support_vector_machine, last
accessed on June 28, 2018.
[5] MD SARWAR JAHAN KHAN POLASH ABDULLAH AL MAMUN and FAKIR MASHUQUE. Flex sensor based
hand glove for deaf and mute people. International Journal of Computer Networks and
Communications Security, February 2017.
[6] Deepak Ram K Krishnan Ananthanarayanan H R Nandi Vardhan Abhijith Bhaskaran K,
Anoop G Nair. Smart gloves for hand gesture recognition. 2016 International Conference
on Robotics and Automation for Humanitarian Applications (RAHA), 2016.
[7] Muhammad Herman Jamaluddin Fariz bin Ahmad Zaki Shukor, Muhammad Fahmi Miskon. A
new data glove approach for Malaysian sign language detection. 2015 IEEE International
Symposium on Robotics and Intelligent Sensors (IRIS 2015), 2015.
[8] Pietro Cavallo and Giovanni Saggio. Conversion of sign language to spoken sentences
by means of a sensory glove. Journal of Software, Vol. 9, No. 8, August 2014.
[9] Sujan Kumar Anish Tamse Dr. Nagendra Krishnapura Celestine Preetham, Girish
Ramakrishnan. Hand talk: implementation of a gesture recognizing glove. 2013 Texas
Instruments India Educators' Conference, 2013.
[10] Shahrazad Abdulla Dalal Abdulla and Rameesa Manaf. Design and implementation of a
sign-to-speech/text system for deaf and dumb people. Electrical and Computer
Engineering Department, University of Sharjah, Sharjah 27272, UAE.
[11] Boon Giin and Su Mun Lee. Smart wearable hand device for sign language
interpretation system with sensors fusion. IEEE SENSORS JOURNAL 2018, Feb,
[12] T Hammad and N Singal. Disability, gender and education: Exploring the impact of
education on the lives of women with disabilities in Pakistan. South Asia and
disability studies: Redefining boundaries and extending horizons. Oxford: Peter
Lang, 2014.
[13] Maria Khawar, Ayesha Baba Sehar Fatima, and Asma Maqsood. Accelerometer
Based Hand Gesture Recognition System. University of Engineering and Technology,
[14] Syed Atif Mehdi and Yasir Niaz Khan, FAST-National University of Computer and
Emerging Sciences, B Block, Faisal Town, Lahore, Punjab 54570, Pakistan. Sign
language recognition using sensor gloves. Proceedings of the 9th International
Conference on Neural Information Processing (ICONIP02), Vol. 5.
[15] Dr. D. M. Yadav Priyanka R. Potdar. Innovative approach for gesture to voice
conversion. International Journal of Innovative Research and Development, 2014.
[16] Michael Xie and David Pan. Accelerometer gesture recognition. 2014.