A Data-Driven Approach for Addressing Sexual and Reproductive Health Needs Among Youth Migrants

  • Pragati Jaiswal
  • Amber Nigam
  • Teertha Arora
  • Uma Girkar
  • Leo Anthony Celi
  • Kenneth E. Paik
Open Access


Background: Every year, millions of people migrate across international borders, capturing the attention of governments. While this movement has led to many new international policies and programs to assist migrants, much remains to be done to meet the sexual and reproductive health needs of adolescent migrants, who mostly depend on their families financially and socially and often fall through the cracks of the system. Objective: To create new policies and programs, legislators and other government officials must collect extensive data about these migrants, including their reasons for migrating and their immediate health needs during migration, to support key decisions. This study explores ways of obtaining relevant data from migrants and applying machine learning to derive insights from the data for stakeholders. Methods: We created a web application to facilitate crucial data collection. We also mocked up a data-driven recommendation system that predicts the most vulnerable migrants, which could support clinical decision-making by the stakeholders involved in the sexual and reproductive health of youth migrants. Results: The study involved building a web app and curating a questionnaire for migrants, creating a data pipeline that could later be used to derive insights about migration patterns and the associated sexual and reproductive health risks. It also explored different ways of disseminating information about sexual and reproductive health needs to youth migrants. Finally, machine learning was used to predict the vulnerability of migrants based on their backgrounds. Conclusion: Data is essential for mitigating the risks associated with sexual and reproductive health issues among migrants.
First, it can be used to make youth migrants aware of their sexual and reproductive health needs and rights. Second, machine learning can use it to generate useful recommendations for reducing the risks of migration.


Data-driven, Sexual and reproductive health needs, Youth migrants, Migration, Machine-Learning, Random forest, Support vector machine (SVM), XGBoost, Multilayer perceptron (MLP), Deep learning

Learning Objectives

  1. Understand the grave issues prevalent among youth migrants with respect to sexual and reproductive health, and the current gaps in the provision of care

  2. Explore ways of obtaining relevant data from migrants and applying machine learning to derive insights from the data for all stakeholders

  3. Develop a prototype and propose next steps to illustrate possibilities


25.1 Introduction

In the past few decades, people have increasingly moved across borders because of compelling political, social and economic circumstances, or in search of a better future for themselves and their families. During transit, many experience exploitation or trauma that may affect their physical and mental well-being. Furthermore, migrants’ access to healthcare in a new environment may be more restricted than that of local residents, owing either to inadequate coverage of healthcare services for the migrant population or to a lack of awareness of clinic locations. In the context of migration, adolescents are particularly vulnerable. Lacking full autonomy because of their financial and social dependence, they are not empowered to make decisions about their Sexual and Reproductive Health (SRH), and as a result their SRH is compromised. Barriers to health services are intensified by inadequate sources of information, a lack of financial resources, and a paucity of youth-friendly health services.

A study by Bocquier et al. (2011), which examined the impact of mother and child migration on the survival of more than 10,000 children in two of Nairobi’s informal settlements between 2003 and 2007, found that children born to women who were pregnant at the time of migration had the highest risk of dying. Another study, by Greif and Dodoo (2011), explored the vulnerability of migrant populations to risky sexual encounters and found that migrants are more prone to engage in risky sexual behavior.

These concerns prompted the UNFPA-MIT team to explore the needs of migrant youth and to identify the barriers to existing health services in their destination country. To understand our key user base, adolescent migrants, we mapped their geographical journey (Fig. 25.1) to identify points where SRH services are likely to be needed. For instance, if an adolescent migrant is exposed to harassment, access to SRH services might help them recover swiftly, both mentally and physically. Conversely, in the case of an unintended pregnancy, the chances of an unsafe birth or abortion increase if the requisite reproductive health services are inaccessible.
Fig. 25.1

Journey of an adolescent migrant mapping potential areas for SRH needs

Additionally, given the pervasiveness and growing relevance of data science in data-intensive domains, it could help us stay ahead of the curve in tackling this problem. We experimented with artificially created data to generate recommendations on the vulnerability of migrants, so that interventions can address the problems migrants face as broadly as possible. This could not only help prioritize preventive actions but also support the formulation of data-driven policy and pave the way for dealing with such issues at scale.

25.2 Methods

UNFPA collected data on more than 200 questions covering profile screening, migration background, sexual and reproductive health knowledge, and medical history to ascertain the needs of the migrant youth population.

These were shortlisted to 27 questions to design a user-friendly questionnaire that can be administered via a web app. The web app would prompt adolescent migrants to fill out the easy-to-answer, multiple-choice questionnaire so that their needs can be addressed efficiently and effectively. In addition, we conducted a machine learning analysis, explained in the Data Science Experiment section, to predict vulnerable migrants. The data collected from migrants over time through the web app would be used to predict their vulnerability with machine learning and deep learning algorithms. The information generated by this analysis would help channel limited resources and efforts to the population groups that need them most.
(A) Question Curation


We divided the questions into two broad categories: (1) Need assessment and (2) Access issues. Through the first set of questions, we aim to understand basic profile metrics such as age, relationship status, number of children (if any), association with a support group in the city, and education level. Since migration often adds to trauma through sexual harassment in transit, we also included questions on harassment experienced during the journey. Further, for effective communication, we asked questions to gauge migrants’ access to smartphones and social media platforms, so that at later stages our team can design information dissemination using their preferred and most engaging platforms.

In addition, our team included questions on sexual and reproductive health to collect information on sexual relations, number of partners, use of different contraceptive methods, condom use, and whether the respondent is currently sexually active. These questions help assess the level of need for sexual and reproductive health services. Lastly, we included questions to understand the barriers that prevent these adolescent migrants from going to clinics when in need. These cover their knowledge of nearby health centers, their comfort with health professionals, any experience of mistreatment, and the kind of services they were seeking when they were mistreated, among other topics (Table 25.1).
Table 25.1

List of the 27 questions the team shortlisted

Need assessment

  1. How old are you?
  2. What is your sex?
  3. Are you in a relationship or married, divorced, widowed?
  4. How many children do you have?
  5. Do you have someone in this city you can rely on if you have a problem?
  6. What is your highest level of educational attainment?
  7. Which of the following languages do you speak well? (select all applicable)
  8. Did you experience any physical abuse or harassment of a person (of a non-sexual nature) during your journey?
  9. Did you have a phone with you during your migration journey so far?
  10. Which social media do you use during your journey? (select all that apply)
  11. Have you ever had sexual relations?
  12. How old were you the first time you had sexual relations?
  13. How many sexual partners have you had?
  14. What contraception method are you using at present?
  15. Are you sexually active?
  16. How often do you use condoms?

Access issues

  17. Do you know where to access sexual or reproductive health services in this town?
  18. Have you ever visited a health facility or health care professional of any kind in this city?
  19. Do you think that the waiting room was comfortable?
  20. At this facility, did you notice any signboard in a language you understand that mentions the operating hours?
  21. Did you feel comfortable enough to ask questions during the consultations?
  22. Why did you not feel comfortable enough to ask questions?
  23. Has any staff working in a health facility in this country ever treated you or your friends in a manner that made you feel upset?
  24. If you felt that the staff was not friendly and did not treat you with respect, please tell us why you think the staff acted that way.
  25. Have you ever been denied access at any health facility in this city?
  26. Do you know why you were denied access?
  27. What service were you seeking when you were denied access?

Note: We have not included the multiple-choice options for these questions in this report.
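To administer the questionnaire through a web app, each item needs a stable representation. The Python sketch below is one minimal way to store questions and check that a response is valid, assuming hypothetical question identifiers (`Q09`, `Q10`) and placeholder answer options, since the chapter does not publish the multiple-choice options. Items marked "select all that apply" accept several answers; all others accept exactly one.

```python
from dataclasses import dataclass
from typing import List, Tuple

NEED_ASSESSMENT = "Need assessment"
ACCESS_ISSUES = "Access issues"


@dataclass(frozen=True)
class Question:
    qid: str                    # hypothetical identifier, not from the chapter
    category: str               # NEED_ASSESSMENT or ACCESS_ISSUES
    text: str
    options: Tuple[str, ...]    # placeholder options; the real ones are unpublished
    multi_select: bool = False  # True for "select all that apply" items


def validate_response(question: Question, answer: List[str]) -> bool:
    """Return True if `answer` is an acceptable response to `question`."""
    if not answer:
        return False  # every item requires at least one choice
    if not question.multi_select and len(answer) != 1:
        return False  # single-choice items take exactly one option
    if len(set(answer)) != len(answer):
        return False  # reject duplicate selections
    return all(choice in question.options for choice in answer)


# Two items from Table 25.1, with placeholder options.
phone_q = Question(
    "Q09", NEED_ASSESSMENT,
    "Did you have a phone with you during your migration journey so far?",
    ("Yes", "No"),
)
social_q = Question(
    "Q10", NEED_ASSESSMENT,
    "Which social media do you use during your journey? (select all that apply)",
    ("Platform A", "Platform B", "Platform C", "None"),
    multi_select=True,
)
```

Keeping the category on each item makes it straightforward to report need-assessment and access-issue answers separately.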

(B) Developing Web Application

The web app has a two-fold purpose: providing youth migrants with information about their sexual and reproductive health needs through the “Safety Tips” section, and collating user data that could later be used by machine learning to derive patterns in migration and the corresponding sexual and reproductive health risks. The web app also has an “Interactive Tools, Facts and Figures” section that provides information to assist decision-making.
(C) Data Science Experiment


Our experiment about predicting migrants’ vulnerability using data science has been divided into the following steps:

  1. Migrant vulnerability data curation using a rules-based system

  2. Predict vulnerability by identifying patterns using Machine Learning and Deep Learning

1. Migrant vulnerability data curation using rules

Given the lack of a well-curated dataset for our problem, we artificially created 1000 migrant profiles and assigned each a vulnerability label using rules (see Appendix).
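The appendix rules are not reproduced here, so the Python sketch below only illustrates the general cold-start technique: sample synthetic profiles with the six features used in the experiment, then attach a label computed by a hand-written rule. The category values and the rule itself (a migrant under 18 who arrived less than 6 months ago is flagged as vulnerable) are hypothetical placeholders, not the authors' rules.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical category values; the chapter's real values are not given here.
CITIES = ["City A", "City B", "City C", "City D"]
SEXES = ["female", "male"]
MARITAL = ["single", "married", "divorced", "widowed"]


@dataclass
class Profile:
    age: int
    sex: str
    city_of_birth: str
    current_city: str
    months_in_city: int
    marital_status: str


def make_profile(rng: random.Random) -> Profile:
    """Sample one synthetic migrant profile."""
    birth = rng.choice(CITIES)
    return Profile(
        age=rng.randint(10, 24),  # inclusive range
        sex=rng.choice(SEXES),
        city_of_birth=birth,
        # A migrant lives somewhere other than the city of birth.
        current_city=rng.choice([c for c in CITIES if c != birth]),
        months_in_city=rng.randint(0, 60),
        marital_status=rng.choice(MARITAL),
    )


def vulnerable_by_rule(p: Profile) -> int:
    """HYPOTHETICAL labeling rule: young and recently arrived -> 1, else 0."""
    return int(p.age < 18 and p.months_in_city < 6)


def curate_dataset(n: int = 1000, seed: int = 0) -> Tuple[List[Profile], List[int]]:
    """Create n profiles and their rule-based labels; seeded for reproducibility."""
    rng = random.Random(seed)
    profiles = [make_profile(rng) for _ in range(n)]
    labels = [vulnerable_by_rule(p) for p in profiles]
    return profiles, labels


profiles, labels = curate_dataset()
```

Because the labels come from a known rule, a model trained on this data can be checked for whether it recovers that rule, which is the point of the experiment.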
2. Predict vulnerability by identifying patterns using Machine Learning and Deep Learning


We then used machine learning algorithms, namely Random Forest (Breiman 1996; Breiman 2001; Liaw and Wiener 2002), Support Vector Machine (SVM) (Cortes and Vapnik 1995) and XGBoost (Chen and Guestrin 2016), and deep learning algorithms, namely a Multilayer Perceptron (MLP) and a sequential neural network (Hagan et al. 1996), to predict whether a migrant had suffered any physical abuse based on the features listed below. We used the Keras and scikit-learn libraries for our implementation, with an 80:20 split between training and test data. The intent of this experiment is to show how machine learning can extract underlying rules or patterns and help determine vulnerabilities, which could help prioritize actions to protect as many people as possible.
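The setup described above (an 80:20 train/test split, several classifiers, and F1 score, accuracy and confusion matrices as outputs) can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the authors' code: the data are synthetic numeric features labeled by a hypothetical rule, and only the random forest and SVM are shown to keep the block self-contained. XGBoost's `XGBClassifier` follows the same scikit-learn-style `fit`/`predict` interface and could be added to the `models` dictionary.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 1000
age = rng.integers(10, 25, size=n)    # 10..24
months = rng.integers(0, 61, size=n)  # months in current city, 0..60
female = rng.integers(0, 2, size=n)   # 1 = female
X = np.column_stack([age, months, female]).astype(float)
# HYPOTHETICAL labeling rule standing in for the chapter's appendix rules.
y = ((age < 18) & (months < 6)).astype(int)

# 80:20 split; stratify keeps the rare positive class in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

models = {
    "Random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    # SVMs are scale-sensitive, so standardize features first.
    "SVM": make_pipeline(StandardScaler(), SVC(class_weight="balanced")),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "f1": f1_score(y_test, pred, zero_division=0),
        "accuracy": accuracy_score(y_test, pred),
        # Rows are true classes [0, 1]; columns are predicted classes [0, 1].
        "confusion": confusion_matrix(y_test, pred, labels=[0, 1]),
    }

for name, r in results.items():
    print(name, round(r["f1"], 3), round(r["accuracy"], 3), r["confusion"].ravel())
```

With 1000 profiles, the 20% test set holds 200 migrants, which matches the test-set size used in the results.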

The following features were used in this experiment:

  i. Age: integer value

  ii. Sex: categorical value

  iii. City of birth: categorical value

  iv. Current city: categorical value

  v. Duration of stay in current city (in months): integer value

  vi. Marital status (married, divorced, widowed): categorical value
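Most of the algorithms above expect numeric input, so the categorical features need encoding. The chapter does not say which encoding was used; the Python sketch below assumes one-hot encoding for the four categorical features and passes the two integer features through unchanged. The category values are hypothetical placeholders.

```python
from typing import Dict, List, Mapping, Sequence

NUMERIC = ["age", "months_in_city"]
CATEGORICAL = ["sex", "city_of_birth", "current_city", "marital_status"]


def fit_vocab(rows: Sequence[Mapping[str, object]]) -> Dict[str, List[str]]:
    """Collect the sorted set of values seen for each categorical feature."""
    return {c: sorted({str(r[c]) for r in rows}) for c in CATEGORICAL}


def encode(row: Mapping[str, object], vocab: Dict[str, List[str]]) -> List[float]:
    """Integers pass through; each categorical becomes a one-hot block.
    A value not seen during fitting encodes as an all-zero block."""
    vec = [float(row[c]) for c in NUMERIC]
    for c in CATEGORICAL:
        vec.extend(1.0 if str(row[c]) == v else 0.0 for v in vocab[c])
    return vec


train_rows = [
    {"age": 16, "months_in_city": 3, "sex": "female", "city_of_birth": "City A",
     "current_city": "City B", "marital_status": "single"},
    {"age": 20, "months_in_city": 24, "sex": "male", "city_of_birth": "City B",
     "current_city": "City A", "marital_status": "married"},
]
vocab = fit_vocab(train_rows)
print(encode(train_rows[0], vocab))
# -> [16.0, 3.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0]
```

Fitting the vocabulary on training data only, and mapping unseen categories to zeros, avoids leaking information from the test set.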


25.3 Results

In this paper, we have discussed different ways of providing data-driven recommendations to the stakeholders who play a role in the sexual and reproductive health of youth migrants. For instance, we propose providing summarized reports on crimes, diseases, and other health indicators, such as age-weight statistics, fertility rate, and life expectancy, to the people who can make decisions based on this information. The web app we have developed contains information and statistics that could support near-real-time tracking of various health issues and serve as a linchpin for policymakers’ decision-making. For migrants, sections such as “Safety Tips” are dedicated to raising awareness among youth migrants. The content of these sections would be updated based on analysis of migrants’ responses to the shortlisted questionnaire. We have shortlisted a set of questions that could capture the health statistics of migrants clearly and precisely without burdening respondents with too many questions.

We also conducted a machine learning and deep learning analysis to predict migrants’ vulnerability using age, sex, city of birth, current city, duration of stay in the current city (in months), and marital status (married, divorced, widowed). For each algorithm, we evaluated the F1 score and accuracy (see Table 25.2) and the confusion matrix (see Table 25.3).
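For reference, the metrics named above have standard definitions in terms of the confusion-matrix counts (true positives $TP$, false positives $FP$, false negatives $FN$, true negatives $TN$), with "vulnerable" as the positive class. The chapter does not spell these out; the formulas below are the conventional ones.

```latex
\begin{aligned}
\text{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN}, \\
\text{Precision} &= \frac{TP}{TP + FP}, \qquad
\text{Recall} = \frac{TP}{TP + FN}, \\
F_1 &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
     = \frac{2\,TP}{2\,TP + FP + FN}.
\end{aligned}
```

For a fixed number of vulnerable migrants, minimizing false negatives is equivalent to maximizing recall.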
Table 25.2

F1 score and accuracy for algorithms

Columns: F1 score, accuracy. Rows include: random forest, neural network.

Table 25.3

Confusion matrices for algorithms

One confusion matrix per algorithm; rows include: support vector machine (SVM), random forest, multilayer perceptron (MLP), neural network.

We consider the most important aspect of the solution to be not missing vulnerable migrants, even at the cost of a few false positives. We therefore optimized our algorithms to minimize false negatives (for a different problem, we might have settled for a higher F1 score instead). As Table 25.3 shows, XGBoost gives the fewest false negatives.
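The chapter does not describe how the algorithms were tuned to minimize false negatives. One common approach, shown here as an assumption rather than the authors' method, is to lower the decision threshold applied to a model's predicted probabilities: pick the highest threshold whose recall still meets a target, so that no more false positives are accepted than necessary. The Python sketch below works on any list of scores, such as the output of a scikit-learn classifier's `predict_proba(X)[:, 1]`; the toy scores are made up.

```python
from typing import Sequence, Tuple


def confusion_at(scores: Sequence[float], labels: Sequence[int],
                 threshold: float) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN) when predicting positive for score >= threshold."""
    tp = fp = fn = tn = 0
    for score, label in zip(scores, labels):
        predicted = score >= threshold
        if predicted and label:
            tp += 1
        elif predicted:
            fp += 1
        elif label:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def threshold_for_recall(scores: Sequence[float], labels: Sequence[int],
                         min_recall: float) -> float:
    """Highest threshold whose recall is at least min_recall.

    False positives can only grow as the threshold falls, so scanning from
    high to low and stopping at the first threshold that meets the recall
    target admits the fewest false positives.
    """
    positives = sum(labels)
    if positives == 0:
        raise ValueError("need at least one positive example")
    for t in sorted(set(scores), reverse=True):
        tp, _, _, _ = confusion_at(scores, labels, t)
        if tp / positives >= min_recall:
            return t
    return min(scores)  # reached only if min_recall > 1


scores = [0.9, 0.8, 0.7, 0.4, 0.35, 0.2, 0.1]  # toy model outputs
labels = [1, 0, 1, 1, 0, 0, 0]
t = threshold_for_recall(scores, labels, min_recall=1.0)
print(t, confusion_at(scores, labels, t))  # -> 0.4 (3, 1, 0, 3)
```

Alternatives with the same goal include class weighting during training, as in the `class_weight="balanced"` option of scikit-learn's `SVC`.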

Our results show that we correctly identified most of the vulnerable migrants and reduced the number of migrants to be checked on priority, at the cost of a few false positives. For instance, using the SVM algorithm, we correctly identify 14 of 17 vulnerable migrants and reduce the number of people to monitor on priority from 200 (all migrants in the test set, i.e., 20% of the 1000 profiles) to 28 (positively predicted cases, i.e., true positives plus false positives), at the cost of 14 false positives (see Table 25.3). All statistics reported here are averaged over 100 evaluations to account for variability due to the algorithms’ random weight initialization.
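The counts in the SVM example fully determine the remaining confusion-matrix cells and the standard metrics. The short Python calculation below derives them; since the reported figures are averages over 100 evaluations, the derived values describe that averaged matrix rather than any single run.

```python
total = 200       # migrants in the test set
vulnerable = 17   # actual positives
tp = 14           # vulnerable migrants correctly identified
flagged = 28      # positive predictions (TP + FP)

fn = vulnerable - tp        # missed vulnerable migrants
fp = flagged - tp           # false alarms
tn = total - tp - fp - fn   # correctly cleared migrants

accuracy = (tp + tn) / total
precision = tp / flagged
recall = tp / vulnerable
f1 = 2 * tp / (2 * tp + fp + fn)
workload = flagged / total  # share of migrants checked on priority

print(fn, fp, tn)  # -> 3 14 169
print(round(accuracy, 3), precision, round(recall, 3), round(f1, 3), workload)
# -> 0.915 0.5 0.824 0.622 0.14
```

The trade-off is visible in the numbers: recall is about 82% while only 14% of migrants need priority checks, even though half of the flagged cases are false positives.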

25.4 Discussion

We designed the web application to be comprehensive in order to reach a large user base. Because it targets migrants, researchers, and healthcare workers, future iterations could integrate the needs of these user groups more closely, increasing the amount and types of data collected as well as the overall utility of the application. We kept the website as simple as possible to avoid overwhelming users, especially migrants, with information. Each feature is placed in a standalone tab, and a navigation menu at the top of the website lets migrants move between them easily. We also used data science to make the app more dynamic by taking real-time data into account, including a recommendation system that predicts the most vulnerable migrants who need immediate attention.

25.4.1 User Stories

Consider three users, one from each user group, who would benefit from what we have developed:

Masalia is a 15-year-old girl who grew up in rural Kenya. One day, several men visited her village and encouraged her to come with them to Nairobi with the promise of work opportunities and a better lifestyle. After arriving in Nairobi, she was physically and emotionally abused. Although she managed to escape her perpetrators, she found herself living alone in a large city without a home or the help she needed. In this case, Masalia could use our website to immediately locate the nearest healthcare center. She could also read the safety tips to protect herself, and may be more inclined to fill out the survey after seeing how much the web app helped her.

Abulakan is a 30-year-old health care worker in Nairobi. Working at one of the largest hospitals in the city, Abulakan has been tasked with addressing the SRH needs of migrants. He could use the existing survey data from the web application to plan specific initiatives to help migrants. He could also try to ensure that all migrants who visit the hospital complete the survey, which would help him and other hospitals in the area address the SRH needs of this vulnerable population.

Brandon is a 28-year-old graduate student at the Harvard School of Public Health interested in the sexual and reproductive health needs of youth migrants. A quick web search turns up some articles on the topic but no access to data. Using our website, Brandon could not only obtain existing survey data on migrants but also create his own survey and collect input from migrants, should he decide to conduct his own research project on their SRH needs. Brandon can also use the interactive tools page and review the facts and figures to help him choose a direction for his research (Fig. 25.2).
Fig. 25.2

Prototype snapshot with navigation menu

25.4.2 Prototype

Figure 25.2 displays a prototype of our web application. The snapshot illustrates the pages users can navigate to: Home, About, Background, Relevant Research, Research Questionnaire, Health Centers, Facts and Figures, Safety Tips, and Contact. The Home tab briefly describes the goal of the web app and gives some statistics on the importance of the work. The About tab gives background on the creators of the website, a group of MIT and Harvard School of Public Health students, and on our partnership with UNFPA. The Background and Relevant Research tabs explain the importance of research in this area and what has been and is being done. The Contact tab allows anyone using the app to contact us with questions, suggestions, or concerns. The remaining tabs are discussed in detail below.

25.4.3 Visualizing Youth Migration Patterns & Vulnerabilities Using Geospatial Analysis

Given that our website is meant to be informative as well as instructional, using the user’s current geographical location to direct them to the nearest health center is of high interest. We therefore built a map of health centers in Nairobi using Google Maps. Currently, our website provides a single-layer map with all Nairobi health care facilities (including hospitals, healthcare centers, and clinics). In the future, it would be useful to include details of the services offered at each facility, for example what a facility is best known for according to reviews by other migrants using our application, and the specialist physician services offered, such as gynecology, primary care, and infectious disease.
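Choosing the nearest health center from the user's location reduces to a great-circle distance calculation. The Python sketch below uses the haversine formula with a mean Earth radius of 6371 km; the facility names and coordinates are hypothetical placeholders, not real Nairobi facilities. It measures straight-line distance, so route directions would still come from a mapping service such as Google Maps.

```python
import math
from typing import NamedTuple, Sequence

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


class Facility(NamedTuple):
    name: str
    lat: float  # degrees
    lon: float  # degrees


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometers between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_facility(lat: float, lon: float,
                     facilities: Sequence[Facility]) -> Facility:
    """Return the facility with the smallest great-circle distance to (lat, lon)."""
    return min(facilities, key=lambda f: haversine_km(lat, lon, f.lat, f.lon))


# Hypothetical facilities with illustrative coordinates.
facilities = [
    Facility("Facility A", -1.300, 36.800),
    Facility("Facility B", -1.200, 36.900),
]
print(nearest_facility(-1.290, 36.810, facilities).name)  # -> Facility A
```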

Each of these additions would represent another visual layer on the map. While the map is currently rendered with free Google services, it could easily be made richer with more advanced open-source GIS software (for example, QGIS or gvSIG). This would allow users to filter their search results to include only health facilities that are accessible and relevant to them, customizing the map to the individual migrant user. If desired, UNFPA could further tailor the map to show accessible UNFPA intervention programs and health services.
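Each extra map layer amounts to an attribute on the facility records, and filtering becomes a predicate over those attributes. The Python sketch below assumes hypothetical attributes (service tags, consultation languages, and a migrant-friendly flag) that a future version of the map might store; none of them exist in the current single-layer map, and the clinic records are made up.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class FacilityRecord:
    name: str
    services: FrozenSet[str]   # hypothetical tags, e.g. "gynecology"
    languages: FrozenSet[str]  # languages staff can consult in
    migrant_friendly: bool     # hypothetical flag, e.g. from migrant reviews


def filter_facilities(
    facilities: Iterable[FacilityRecord],
    required_services: FrozenSet[str],
    language: Optional[str] = None,
    migrant_friendly_only: bool = False,
) -> List[FacilityRecord]:
    """Keep facilities offering every required service and matching optional filters."""
    result = []
    for f in facilities:
        if not required_services <= f.services:  # subset test
            continue
        if language is not None and language not in f.languages:
            continue
        if migrant_friendly_only and not f.migrant_friendly:
            continue
        result.append(f)
    return result


records = [
    FacilityRecord("Clinic 1", frozenset({"gynecology", "primary care"}),
                   frozenset({"English", "Swahili"}), True),
    FacilityRecord("Clinic 2", frozenset({"primary care"}),
                   frozenset({"English"}), True),
    FacilityRecord("Clinic 3", frozenset({"gynecology", "infectious disease"}),
                   frozenset({"English"}), False),
]
```

Each optional argument corresponds to one layer a user could toggle on or off.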

Because our website will continually collect data, additional insights from the questionnaire may be applied to this service. Once enough data have been collected, UNFPA may be able to categorize health services by geographic location by identifying migrant subpopulations. The site therefore has the potential to serve as a longitudinal research platform. These capabilities will enable more advanced geospatial analysis, yielding a more accurate picture of the migrant population in Nairobi and helping identify potential hot spots of high sexual and reproductive health vulnerability.
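Identifying hot spots from geolocated questionnaire responses can start as simply as counting vulnerable responses per grid cell. The Python sketch below assumes each response carries a (latitude, longitude) pair and a vulnerability flag, and uses a hypothetical cell size of 0.01 degrees (roughly 1.1 km near the equator, where Nairobi lies) and a minimum count of 3; both are illustrative choices, not values from the chapter, and the responses are made up.

```python
import math
from collections import Counter
from typing import Iterable, List, Tuple

Cell = Tuple[int, int]


def grid_cell(lat: float, lon: float, cell_deg: float = 0.01) -> Cell:
    """Index of the square grid cell containing (lat, lon)."""
    return math.floor(lat / cell_deg), math.floor(lon / cell_deg)


def hot_spots(
    responses: Iterable[Tuple[float, float, bool]],
    cell_deg: float = 0.01,
    min_count: int = 3,
) -> List[Tuple[Cell, int]]:
    """Cells with at least min_count vulnerable responses, busiest first."""
    counts = Counter(
        grid_cell(lat, lon, cell_deg)
        for lat, lon, vulnerable in responses
        if vulnerable
    )
    return [(cell, n) for cell, n in counts.most_common() if n >= min_count]


responses = [
    (-1.2851, 36.8212, True),
    (-1.2855, 36.8219, True),
    (-1.2858, 36.8215, True),
    (-1.2852, 36.8213, False),  # not vulnerable: ignored
    (-1.3001, 36.9001, True),   # isolated response elsewhere
]
print(hot_spots(responses))  # -> [((-129, 3682), 3)]
```

GIS tools such as QGIS offer smoother alternatives (for example, kernel density heatmaps), but a fixed grid is easy to audit and to aggregate without storing exact locations.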

To avoid deterring use, we must protect the confidentiality of user information. The political climate surrounding sexual violence, abuse of women, and violations of human rights, especially around sexual and reproductive health, has engendered a degree of societal fear. Many people, mostly women, do not seek health services for fear of being stigmatized and shamed by their peers. Thus, for continuous data collection from youth migrants to be feasible, our site must be entirely secure and confidential. Encrypted geographic location data will be used to identify the aforementioned hot spots of migrants and to make better-informed decisions about allocating sexual and reproductive health resources in Nairobi. User identities will remain anonymous. In doing so, we aim to earn users’ trust and ensure useful data collection.
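One way to keep identities anonymous while still linking a user's repeat responses is keyed pseudonymization, combined with coarsening locations before storage. The Python sketch below uses HMAC-SHA256 from the standard library and rounds coordinates to two decimal places (roughly 1 km); both choices are assumptions. Note that this is hashing plus coarsening, not the encryption the text describes: encrypting stored location data would additionally require an encryption library, and a production system would also need transport security and access controls.

```python
import hashlib
import hmac
import secrets
from typing import Dict, Tuple

# Illustrative only: a deployed system would load this key from a secrets
# manager and keep it stable, so the same user always maps to the same pseudonym.
SECRET_KEY = secrets.token_bytes(32)


def pseudonymize(user_id: str, key: bytes = SECRET_KEY) -> str:
    """Stable pseudonym that cannot be linked back to user_id without the key."""
    return hmac.new(key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def coarsen(lat: float, lon: float, decimals: int = 2) -> Tuple[float, float]:
    """Round coordinates so stored points locate an area, not a person."""
    return round(lat, decimals), round(lon, decimals)


def to_storage_record(user_id: str, lat: float, lon: float,
                      answers: Dict[str, str],
                      key: bytes = SECRET_KEY) -> Dict[str, object]:
    """Build the record that would be stored; the raw user_id is never kept."""
    clat, clon = coarsen(lat, lon)
    return {"pid": pseudonymize(user_id, key), "lat": clat, "lon": clon,
            "answers": answers}
```

Coarsened coordinates still support grid-based hot-spot counting while making it harder to re-identify an individual from a stored point.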

25.4.4 Interactive Tools, Facts and Figures

Data captured from various sources are presented through the web application’s user-friendly interface. The charts and graphs on the web app were built with HTML, CSS and JavaScript. The comparative analyses available on the web app include crime rates, migrant age-weight statistics, and correlations between life expectancy, fertility rate, and population for countries around the world. Sample figures are shown in Figs. 25.3, 25.4 and 25.5; the code used to generate them has been placed in the appendix.
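The correlation comparisons mentioned above (for example, life expectancy against fertility rate across countries) rest on Pearson's correlation coefficient. The web app's charts are written in JavaScript and their code is in the chapter's appendix; for consistency with the other examples, only the numeric core is sketched here in Python, using made-up toy values rather than real country data.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Toy, made-up values for four hypothetical countries.
fertility_rate = [1.5, 2.5, 3.5, 4.5]
life_expectancy = [80.0, 75.0, 66.0, 63.0]
print(round(pearson(fertility_rate, life_expectancy), 3))  # -> -0.984
```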
Fig. 25.3

Migration percent

Fig. 25.4

Migrant age-weight statistic

Fig. 25.5

Annual crime count

We have also dedicated a page of the web application to a comprehensive overview of safety tips. These tips could play a key role in closing the information gap by making migrants aware of previous incidents and accidents so that they can be averted in the future. The page also suggests simple strategies migrants can incorporate into their daily lives to avoid dangerous situations. A screenshot of the safety tips page is shown in Fig. 25.6.
Fig. 25.6

Safety tip snapshot

25.5 Conclusion

For problems such as this one, where machine learning can only be used once data exist, we have shown that the cold-start problem can be addressed by artificially curating data using rules. Later, once real data have been collated, they can be used to train algorithms that provide a more comprehensive and nuanced solution.

25.6 Next Steps

In the future, we hope to better understand the specific sexual and reproductive health services migrants require. Because sexual and reproductive health needs vary geographically and demographically, we hypothesize that analyses exploring the needs and gaps of niche populations would generate more accurate results.

We will conduct an exhaustive geospatial analysis to locate the migrant population. We primarily looked at urban areas for the scope of this project, as we expected a greater concentration of migrants there; however, subsequent research could focus on data collection for at-risk migrants living in rural communities or dispersed in small numbers across various regions. The start and end locations of migrants can also be recorded to look for migration patterns among specific groups of people and to investigate the extent to which starting locations influence final destinations.

Further development will include tracking the location of adolescent migrants through the web application to direct them automatically to the nearest health facility. Currently, the user has to click a tab to find the nearest health facilities and is then redirected to a Google Maps page showing directions from their current location to the nearest healthcare center. If these steps were combined so that at-risk users, possibly identified from their survey responses, are automatically shown directions to the nearest healthcare center, the app would help migrants get fast medical care in emergencies.

An integral next step will be to publicize the utility of the app among healthcare providers. If both migrants and healthcare providers actively use the app, a telemedicine-based approach could be used to diagnose and treat patients remotely, without their having to come to healthcare centers except for urgent issues. This would save both migrants and healthcare providers much time and money, and could be especially beneficial for migrants living in remote locations or far from healthcare centers. In the long term, we seek to build a strong patient-provider network for migrants based on this app. This integration would sensitize healthcare providers to migrants’ needs and possibly lead to treatment approaches more specific to them.

Currently, the web application provides a single interface for migrants, researchers interested in youth mixed migration, and health care workers. It could be modified so that each user group has its own interface, with the features relevant to that group immediately apparent and no need to sort through unnecessary ones. For instance, health care workers and researchers could focus on the data collected through the questionnaire instead of viewing the questionnaire itself, while migrants could be directed straight to the questionnaire without having to look at graphs and statistical analyses.

Finally, we have shown that machine learning and deep learning algorithms can identify most of the vulnerable migrants at the cost of a few false positives. We acknowledge that the rules used to build the dataset for this experiment were curated manually, and that real-world patterns would be far more complex. The intent of this exercise, however, is to demonstrate with a simple rule-based dataset how machine learning could identify patterns that may exist in manually curated or real-world datasets. The next logical step would be to run these algorithms on actual data. It would also be interesting to predict the severity and probability of abuse.

Another area of future development is a mobile application analogous to the web application. With the current design, migrants need to open the website on their phone to view the information and use the application. A mobile app could present information in a better format, and mobile apps are commonly reported to be about 1.5 times faster than mobile websites, so actions would complete more quickly in the app.


Author Contributions

Pragati Jaiswal led the team of authors, participated in conceptualization, data curation, formal analysis, supervision, validation, writing the original draft, and reviewing and editing.

Amber Nigam spearheaded the design and execution of machine learning experiments, and participated in conceptualization, data curation, formal analysis, supervision, validation, writing the original draft, and reviewing and editing.

Teertha Arora led the initiative of finalizing the questionnaire, participated in conceptualization, data curation, formal analysis, supervision, validation, writing the original draft, and reviewing and editing.

Uma Girkar participated in conceptualization, data curation, formal analysis, supervision, validation, writing the original draft, and reviewing and editing.

Leo Anthony Celi participated in project administration, resources, supervision, and reviewing and editing.

Kenneth E. Paik participated in project administration, resources, supervision, and reviewing and editing.


  1. Bocquier, P., Beguy, D., Zulu, E. M., Muindi, K., Konseiga, A., & Yé, Y. (2011). Do migrant children face greater health hazards in slum settlements? Evidence from Nairobi, Kenya. Journal of Urban Health, 88(2), 266–281.
  2. Breiman, L. (1996). Bagging predictors. Machine Learning, 24(2), 123–140.
  3. Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32.
  4. Chen, T., & Guestrin, C. (2016, August). XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 785–794). ACM.
  5. Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273–297.
  6. Greif, M. J., & Dodoo, F. N. A. (2011). Internal migration to Nairobi’s slums: Linking migrant streams to sexual risk behavior. Health & Place, 17(1), 86–93.
  7. Hagan, M. T., Demuth, H. B., Beale, M. H., & De Jesús, O. (1996). Neural network design (Vol. 20). Boston: PWS Pub.
  8. Liaw, A., & Wiener, M. (2002). Classification and regression by randomForest. R News, 2(3), 18–22.

Copyright information

© The Author(s) 2020

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Authors and Affiliations

  • Pragati Jaiswal (1)
  • Amber Nigam (2)
  • Teertha Arora (1)
  • Uma Girkar (3)
  • Leo Anthony Celi (4)
  • Kenneth E. Paik (4)

  1. Department of Global Health, Harvard T.H. Chan School of Public Health, Boston, USA
  2. Kydots.Ai, New Delhi, India (incoming health data science student at HSPH)
  3. Department of Electrical Engineering and Computer Science, MIT, Boston, USA
  4. Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, USA
