Kim Doyle is a Research Data Specialist with MDAP and is completing a PhD in Digital Media (@Doyle1Kim, ORCID: 0000-0002-9429-7188). Paul Gruba is an Academic Convenor on the Petascale Campus Initiative at the University (@pgruba, ORCID: 0000-0002-6616-9568). Aleks Michalewicz is a Research Data Steward with MDAP and holds a PhD in Archaeology (ORCID: 0000-0002-7328-2470). Simon Mutch is a Senior Research Data Specialist with MDAP and a Postdoctoral Research Fellow in Astrophysics working as part of the ARC Centre of Excellence for All-Sky Astrophysics in 3D (ASTRO 3D) (@T_Tauri, ORCID: 0000-0002-3166-4614). Andrew Siebel is the Platform Manager at MDAP and has a PhD in Reproductive Endocrinology (@Blewey30, ORCID: 0000-0002-9230-9042). You can find the MDAP team on Twitter at @MDAP_Unimelb.
What does a law professor need from a data visualisation specialist? What would an expert in the prevention of domestic violence see in billions of Tweets? How can a researcher use Big Data to enhance climate projections in Australia and overseas? What can universities do to promote Indigenous data sovereignty and governance? Can scientists transform analytical pipelines for more effective genomics research? These questions and more are what our specialist team at the Melbourne Data Analytics Platform (MDAP) are working to answer. We are an academic specialist team that collaborates with colleagues across the University in data-intensive research. Our mission is to explore the possibilities of technology, infrastructure and digital research.
We love data and recognise that its true value lies in the application of data analytics to help answer research questions. Our aim is to work with scholars across all research fields to maximise the value of their data and its interpretation, and to contribute to its impact within and beyond the academy.
As academic specialists we spend the majority of our time contributing to research, with outputs including co-authorship on publications, supervision of graduate students through an intern program, and pursuit of research funding. We work hard to enhance Open Scholarship, including Open Data, Open Code and Open Access.
We have been looking here and overseas to see whether there are similar data-intensive research units driven by academic staff at other universities, but so far have not found any. To the best of our knowledge, we are unique in Australia and the region because of our academic status, our focus on data-intensive and computational methods and our mix of fields, which cover most disciplines in the University. This makes benchmarking a real challenge, but it is also exciting to be leaders in this space. If you are involved with a similar group, we would love to talk to you.
We engage with researchers through projects (under 3 months), collaborations (3 to 6 months), and through consultancy, expert advice, and advocacy on an ad hoc basis. Our motto is “If we can’t help you, we will find someone who can!” We use a formal application process to manage demand for longer collaborations. In 2020, we received 60 expressions of interest, of which 40 went to full application, with 17 supported across 9 of the 10 faculties at the University.
We are well placed to support new research directions in all disciplines. In our first year of operation, MDAP has enabled data science methods, computation, and stewardship of data-intensive research with all faculties at the University of Melbourne. A crucial part of our work is to break down disciplinary ‘silos’ and actively foster interdisciplinary collaboration. This is only possible by partnering with our professional colleagues, other academic specialists working in data-rich research, and the infrastructure platforms of the University, such as Research Computing Services, Scholarly Services, and the Melbourne Centre for Data Science.
At MDAP, we are charged with connecting a wider community of dedicated research and professional staff at the University, fostering productivity and leading to greater recognition. To this end, we work to see where data can be gathered, used, reused, and preserved in ways that spark the imagination, identify and break down the edges between disciplines, and illuminate new and emerging fields of research.
We are exploring the idea of the ‘Third Space’, located at the intersection of traditional academic scholarship, professional support, and command of research and data infrastructure. We are engaged with Australian and international universities that together view the Third Space as a driver of data-intensive research, interdisciplinarity, and capability building. Each member of our team is charged with building this emerging, dynamic community, both through our research collaborations and through national and international discussions, workshops, consultations and advice.
Big Data is often defined by Variety, Veracity, Volume and Velocity. The first two principles apply to all academic research. The last two feature strongly in the work that we do, with the important caveat that the collection and analysis of traditional ‘smaller’ data can likewise have a big impact. Moreover, what defines ‘data-intensive’ varies considerably between disciplines.
One way in which we address Variety is in our team’s composition and diverse backgrounds. Drawn from across the University, the MDAP team reflects – and is strengthened by – a diversity of research specialisations. We were intentionally hired for our range of disciplinary backgrounds, and these include: Actuarial Science, Agricultural Science, Archaeology, Astrophysics, Biochemistry, Bioinformatics, Biology, Classics, Computer Science, Cultural Conservation, Digital Media and Communications, Earth Sciences, Ecology, Economics, Genomics, History, Hydraulics and Hydrology, Languages, Law, Mathematics, Music, Neuroscience, Physical Geography, Physics, Psychology, and Public Health.
This variety of disciplines helps us to work with many kinds of data, which in turn requires a collaborative team of diverse expertise. Further, many of our research partnerships tackle cross-domain data, and our collaborators across the University are themselves often part of interdisciplinary teams.
The concept of Veracity is essential to all research. Our contributions here span research design, data collection methodology, and data management and preservation. Without assured data quality, there can be no meaningful analysis. We recognise that data is rarely ‘objective’ but rather the result of human data collection, with its inherent biases, outliers and noise. We therefore focus on data quality and quality assurance.
One example of this is the collaboration between MDAP and Melbourne Pollen, which applies time series forecasting via machine learning to rich datasets that span 30 years. One of the challenges here has been identifying noise and understanding reasons for deviating data points. This model is an Australian first, and with Australia having one of the highest rates of allergies in the world, the outcomes of this research will help millions of people.
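To make the idea of a "deviating data point" concrete, here is a minimal sketch of one common way to flag candidates for review in a daily series. It is not the method used in the Melbourne Pollen collaboration, which relies on machine learning. The function name `flag_outliers`, the window size, the threshold and the sample counts are all assumptions made up for illustration. Each value is compared with the median of its neighbours, and the gap is scaled by the median absolute deviation (MAD) of that window.

```python
from statistics import median

def flag_outliers(values, window=7, threshold=3.5):
    """Return the indices of points that sit far from their local median.

    Hypothetical helper for illustration only. The gap between a point and
    the median of its centred window is scaled by the window's median
    absolute deviation (MAD) to give a "modified z-score".
    """
    half = window // 2
    flagged = []
    for i, value in enumerate(values):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        neighbours = values[lo:hi]
        centre = median(neighbours)
        mad = median([abs(v - centre) for v in neighbours])
        if mad == 0:
            # A flat window gives no scale to compare against, so skip it.
            continue
        score = 0.6745 * abs(value - centre) / mad
        if score > threshold:
            flagged.append(i)
    return flagged

# Invented daily counts with one suspicious spike at position 5.
daily_counts = [12, 15, 11, 14, 13, 240, 12, 16, 14, 13]
suspects = flag_outliers(daily_counts)
```

A flagged point is only a prompt for a closer look. Whether it is instrument noise or a genuine extreme day is a question for the researchers who know how the data was collected.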
We apply the FAIR data principles (Findable, Accessible, Interoperable, Reusable), and to support Indigenous data governance, the CARE principles (Collective Benefit, Authority to Control, Responsibility, Ethics). However, these are not always mutually applicable. For example, not all data should be accessible, or preserved in perpetuity, and universities are not always the most appropriate custodians of data collections. By focusing on best-practice international standards, on a case-by-case basis, we ensure that datasets are assessed, accessed, processed and safeguarded in the most appropriate manner.
Recently MDAP has turned its attention to Twitter. With around 140 million daily users and 500 million tweets per day, Twitter forms a significant cultural archive, but working with terabytes of data is beyond the skill set of most researchers. We created a database and an index, much like an index in a book, so that researchers can quickly query billions of tweets on any topic.
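The book-index analogy can be sketched in a few lines. The example below is a toy inverted index, assuming a small in-memory dictionary of tweets. It is not MDAP's actual database, and the names `tokenize`, `build_index` and `search`, along with the sample tweets, are hypothetical. The point is that a lookup goes straight from a word to the tweets containing it, rather than scanning every tweet for every query.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into words, hashtags and mentions."""
    return re.findall(r"[#@]?\w+", text.lower())

def build_index(tweets):
    """Map each token to the set of tweet IDs that contain it."""
    index = defaultdict(set)
    for tweet_id, text in tweets.items():
        for token in tokenize(text):
            index[token].add(tweet_id)
    return index

def search(index, query):
    """Return the sorted IDs of tweets that contain every token in the query."""
    tokens = tokenize(query)
    if not tokens:
        return []
    matches = set.intersection(*(index.get(t, set()) for t in tokens))
    return sorted(matches)

# Invented sample tweets, keyed by a made-up ID.
tweets = {
    1: "Pollen count is high in Melbourne today #hayfever",
    2: "New climate projections released for Australia",
    3: "Melbourne climate talk tonight",
}
index = build_index(tweets)
melbourne_climate = search(index, "melbourne climate")
```

Here `melbourne_climate` holds only the ID of the third tweet, the one that mentions both words. At the scale of billions of tweets the same idea is normally handled by a database or search engine rather than a Python dictionary, but the principle of looking up a term instead of rereading everything is the same.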
During a collaboration with the Australian-German Climate Energy College, we helped researchers to crunch petabytes (over 1 million gigabytes) of complex climate data by getting them set up and running on the University’s Research Cloud. This allowed them to cut their processing time down from months to weeks.
This global pandemic has shown how data can be harnessed in large-scale cooperative efforts to understand and combat the virus and its social repercussions, but we have likewise seen how individuals can misunderstand and disengage from data and its evidence-based analyses. Understanding data, and harnessing its insights, is the challenge of our time and the light for our collective future. At MDAP we are excited about collaborating on data-intensive research, as well as educating the broader research community.