What Should A Data Scientist Know?

That’s the question that a group of data scientists from industry and academia discussed over dinner on 14 December 2018 at the NUSS Suntec City Guild House.  We were fortunate to have Marianne Winslett, a professor emerita of computer science at the University of Illinois at Urbana–Champaign, to open the event with a keynote address. We see the role of SIGKDD as contributing to identifying and building the skills required for data science professionals.  Though simple and direct answers to such a complex question are elusive, the discussions were fruitful and pointed us in the right direction. It was also a chance for us to reflect and regroup as a community.


Marianne Winslett (UIUC) gave her take on the question, including the need for standards to ensure that professionals could build something that’d work, the need for a realistic mindset aware of the limitations of statistics, and the need for attention to ethical issues


In between dinner courses, we had breakout discussions on the topic, moderated by Giuseppe Manai (Chapter Secretary)


Though it was not by deliberate design, one table had primarily industry professionals, which brought forth issues on how to hire and select the right data scientists


The other table just happened to have many academicians, with discussions touching on the skills and competencies of data scientists


Hady Lauw (Chapter Chair) closed the event, summarizing the noted points for future follow-ups.


A fruitful discussion over a satisfying dinner with a growing community.  From left: Cheng Long (NTU), Aixin Sun (NTU), Yuchen Li (SMU), Serene Ow (Grab), Graham Williams (Microsoft), Joao Gama (DataRobot), Giuseppe Manai (Chapter Secretary, ING FutureLabs Ventures), Xiaoli Li (I2R), Aloysius Lim (Chapter Membership Chair, Eureka AI), Huayu Wu (DBS), Bing Tian Dai (SMU), Jing Jiang (SMU), Marianne Winslett (UIUC), Hady Lauw (Chapter Chair, SMU)



Networking over Networks

On 18 September, we had a chance to learn about the power of networks from Dr. Gábor Benedek (Lynx Analytics), while networking with data science enthusiasts over pizza and beer. We are also grateful to WeWork for hosting our event. Here are a few pictures to remember it by.


Giuseppe Manai (Chapter Secretary) introduced the speaker Gábor Benedek, PhD, who would be speaking about Big Graph Data Intelligence


In addition to graph theory and techniques, Gábor covered several case studies relating to social and health sciences


Networks gave us something to “chew on”, while pizza gave us something to chew while networking


WeWork provided a conducive space for learning together


KDD.SG Tutorial organized by SIGKDD Singapore & WeWork

Big Graph Data Intelligence – Analyzing Large Connected Data in Social and Health Science

September 18, 2018 | 6pm to 8.30pm | WeWork 71 Robinson


The miracle behind Social Data is that we have detailed information on how people are connected to each other: who the family members are, who the friends are, who the influencers are, and who the followers are. We call this structure the graph topology. Besides the Social Data topology (drawn from diverse sources), we can also observe the characteristics of individuals and the behavioral mechanisms among them. Many studies have shown that within micro-topologies (cliques or communities) people tend to think, decide, purchase or act similarly, and in many cases have similar profiles. Thus, if we want to understand or change customers’ decisions, we must use the micro-topology information, not just individual connections.

Today, multinational companies (banks, airlines, telecoms, insurance companies and those in many other domains) are closer than ever to analyzing, understanding and utilizing Social Data. They usually have at least three different sources of Social Data, which are sufficient to build their own transactional Social Network. The first is A-to-B transactions (calls, instant messages, money transfers, bookings). The second is (co-)locations and (co-)movements (same address, shared bills, traveling together). The third is the digital behavior of customers (browsing history, app usage), potentially complemented with external information. Facebook, for example, helps not only to derive the network but also to add diverse information on attributes and interests. This in turn enables a deeper dive into the homogeneity of these communities, verifying the cases where the network is similar in demography, where it gives insight into commercial decisions, and where it enables the spread of word of mouth.

However, creating the transactional Network from these attributes is not easy, especially when there are tens of millions of customers and billions of possible and measurable interactions among them. The different sources of information can contradict one another when the overlaid topologies do not match (e.g., online friends versus offline friends). Lynx Analytics has experience and solutions for building and using these networks efficiently.
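As a rough illustration of the first data source described above, the sketch below builds a small undirected network from A-to-B transaction records and groups customers into connected components. Connected components are used here only as a simple stand-in for community detection; this is not Lynx Analytics' method. The transaction records, the customer names, and the helper names `build_network` and `connected_components` are all hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical A-to-B transactions (e.g. caller, callee); names are made up.
TRANSACTIONS = [
    ("ann", "bob"),
    ("bob", "cat"),
    ("cat", "ann"),
    ("dan", "eve"),
    ("ann", "bob"),  # repeated interaction between the same pair
]


def build_network(transactions):
    """Return an undirected adjacency map: customer -> set of neighbours."""
    network = defaultdict(set)
    for source, target in transactions:
        if source == target:
            continue  # ignore self-transactions
        network[source].add(target)
        network[target].add(source)
    return dict(network)


def connected_components(network):
    """Group customers who can reach one another, using breadth-first search."""
    seen = set()
    components = []
    for start in sorted(network):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        members = []
        while queue:
            node = queue.popleft()
            members.append(node)
            for neighbour in sorted(network[node]):
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(sorted(members))
    return components


NETWORK = build_network(TRANSACTIONS)
COMMUNITIES = connected_components(NETWORK)
print(COMMUNITIES)
```

At the scale mentioned in the abstract (tens of millions of customers), an in-memory dictionary like this would not be practical; the sketch only shows the shape of the computation.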



Gábor Benedek is an innovation partner at Lynx Analytics, providing predictive analytics for the communications, financial services and healthcare sectors in Southeast Asia. He has applied social network analysis (SNA) methodologies for Celcom, Indosat, Singtel, Telkomsel, Globe and DBS Bank in the region. Gábor received his PhD in 2003. In 2012, his T-Mobile SNA churn study received the Best Application Paper award from the European Decision Science Institute. He is the author of one book and author or coauthor of over 20 articles. He was an Associate Professor at Corvinus University of Budapest, researching and lecturing in the fields of economic and business simulations, social network analysis, econometrics, data mining and predictive analytics. Gábor was among the founders of Data Explorer, the first consulting company in predictive analytics in Hungary. In 1999, Data Explorer built the first social network analysis software for churn and community detection applicable to mobile customers in Europe, based on Gábor’s theoretical foundations and proposals. In 2010, Gábor contributed to the largest public physicians’ social network in the world, based on real patient-flow data between general practitioners and specialists.

Date/Time Tuesday 18 September 2018, beginning with a light reception at 6pm

Venue WeWork 71 Robinson Road Singapore 068895

RSVP on our Meetup.

See you there!

1st Singapore ACM SIGKDD Symposium

On the afternoon of Friday 27 July 2018, a number of esteemed data science thought leaders gathered at Hotel Jen Orchardgateway for the first-ever, invitation-only Singapore ACM SIGKDD Symposium.

It was an insightful session: the attendees got to know one another, shared their current focus, exchanged ideas, and began a discussion towards a common vision. We were very fortunate to have diverse representation from academia, government, and industry. There was also a consensus to pursue further activities of this kind to forge stronger connections within the data science community in Singapore.

Look out for upcoming events in this space!


Aloysius Lim (in green, SIGKDD Chapter Membership Chairperson & Director of AI Products, Eureka AI) welcomed arriving attendees


Hady Lauw (SIGKDD Chapter Chair & Associate Professor, Singapore Management University) opened the session by introducing the Singapore ACM SIGKDD Chapter


Ee-Peng Lim (Professor, Singapore Management University) on Smart Systems for Citizens


Koo Sengmeng (Deputy Director of Strategic Alliances, AI Singapore) introduced AI Singapore


Robby Tan (Assistant Professor, National University of Singapore) on Computer Vision: Bad Weather, Motion and Human Image Analysis


Gyorgy Lajtai (Chief Executive Officer, Lynx Analytics) on How Neural Networks Help the Science of Loyalty


David Hardoon (Chief Data Officer, Monetary Authority of Singapore) on AI for Finance


Koo Ping Shung (Co-Founder, DataScience SG) introduced DataScience SG


João Gomes (Data Scientist and Director of Customer Success, DataRobot) on Automated Machine Learning: Enabling the AI-Driven Enterprise


Ying Li (SIGKDD Chapter Treasurer & Chief Scientist, Eureka AI) led an open discussion on data science issues


Xiaoli Li (Head of Data Analytics Department, Institute for Infocomm Research) on PU Learning and Imbalanced Learning


Jing Jiang (Associate Professor, Singapore Management University) on Recent Trends in Natural Language Processing


Arijit Khan (Assistant Professor, Nanyang Technological University) on Expressibility of Vertex-Centric, Distributed Graph Processing Paradigm


Bing Tian Dai (in blue, Assistant Professor, Singapore Management University) discussed his work on AI for Pedagogy


Ng See Kiong (Director of Translational Research, Institute for Data Science) on KDD Challenges & Opportunities


Zhu Feida (Associate Professor, Singapore Management University) on Data of the People, By the People, For the People




KDD.SG Tutorial organized by SIGKDD Singapore, DataScience SG & SMU SIS

Update on 9 June 2018: See the coverage of the event here.

Download the flyer for this tutorial and share it with your friends!

Image Classification Using Convolutional Neural Networks
with Applications to Facial Expression Recognition and Visual Sentiment Analysis

June 9, 2018 | 9.30am to 12pm | SOE/SOSS Seminar Rm B1-1 (Rm B127), SMU


With the prevalence of smartphones, everyone is now a casual photographer, leading to an abundance of images on social media. One useful machine learning task is to categorize an image into one of several classes. In this tutorial, we will cover how to build such a classifier using a deep learning technique called Convolutional Neural Networks (CNNs). While the task is generally applicable, this tutorial will focus on two applications relevant to user sentiments and preferences. One application is facial emotion recognition, classifying whether a photo expresses an emotion such as happiness or sadness. Another application is visual sentiment analysis, classifying whether an image found in a review signifies a high or low rating.
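For readers new to the technique, the sketch below shows the three basic operations a convolutional layer applies to an image: a sliding-window filter, a ReLU non-linearity, and max pooling. It is not the tutorial's material. The toy image, the hand-picked kernel, and the function names `conv2d`, `relu` and `max_pool` are assumptions for illustration; a real CNN learns its kernels from labelled data and is normally built with a deep learning library.

```python
def conv2d(image, kernel):
    """Valid-mode 2D cross-correlation, the core operation of a convolutional layer."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Replace negative responses with zero."""
    return [[max(0, value) for value in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling; rows and columns that do not fill a window are dropped."""
    rows = len(feature_map) // size
    cols = len(feature_map[0]) // size
    return [
        [
            max(feature_map[r * size + i][c * size + j] for i in range(size) for j in range(size))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# Hypothetical 4x4 grayscale image: dark on the left, bright on the right.
IMAGE = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-picked kernel that responds to a dark-to-bright vertical edge.
KERNEL = [
    [-1, 1],
    [-1, 1],
]

FEATURE_MAP = relu(conv2d(IMAGE, KERNEL))
POOLED = max_pool(FEATURE_MAP)
print(FEATURE_MAP)
print(POOLED)
```

The feature map is non-zero only in the column where the edge sits. A classifier stacks many such learned filters and feeds the pooled responses into a final layer that outputs class scores.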


Hady W. Lauw is an Assistant Professor of Information Systems at Singapore Management University (SMU), as well as an NRF Fellow of the Singapore National Research Foundation. Formerly, he served as a postdoctoral researcher at Microsoft Research in Silicon Valley, as well as a scientist at A*STAR’s Institute for Infocomm Research. He received his PhD from Nanyang Technological University on an A*STAR Graduate Scholarship. At SMU, he leads the Preferred.AI research project, whose research activities span data mining and machine learning, focusing on preference analytics and recommender systems. More information may be found at http://www.hadylauw.com.

Quoc-Tuan Truong is a PhD candidate in the School of Information Systems, Singapore Management University (SMU). His research interests include machine learning, text mining and social network data analytics, with a focus on mining user preferences from review text and images. Quoc-Tuan received his bachelor’s degree from the University of Engineering and Technology, Vietnam National University, Hanoi. He was one of the ten Vietnamese recipients of the prestigious Honda Y-E-S (Young Engineer and Scientist’s) Award in 2016. More information may be found at http://www.qttruong.info.

Date/Time Saturday 9 June 2018, beginning with coffee and tea at 9.30am

Venue SOE/SOSS Seminar Rm B1-1 (Rm B127), Basement 1, School of Economics/School of Social Sciences, Singapore Management University (Campus Map)

Prerequisite  You should be proficient in programming; familiarity with Python is a plus.  Prior knowledge of neural networks would be immensely helpful.

Preparation  Bring your own laptop.  There is guest wifi connectivity with limited speeds on site.  You are also welcome to bring your own internet connectivity. We will inform confirmed registrants of the data and packages to be downloaded a week beforehand.

RSVP on our Meetup.

See you there!