2017 was a huge year for Kaggle. Aside from joining Google, it also marks the year that our community expanded from being primarily focused on machine learning competitions to a broader data science and machine learning platform. This year our public Datasets platform and Kaggle Kernels both grew ~3x, meaning we now also have a thriving data repository and code-sharing environment. Both of those products are on track to pass competitions on most activity metrics in early 2018.
To give the community more visibility into how Kaggle has changed, we have decided to share our major activity metrics and the commentary around those metrics. We're also giving some visibility into our 2018 plans.
Active users (unique annual, logged-in users) grew to 895K this year, up from 471K in 2016 (chart 1). This represents 90% growth in 2017, up from 71% growth in 2016.
While we're still best known for machine learning competitions, both our public Datasets platform and Kaggle Kernels are on track to be bigger drivers of activity on Kaggle in early 2018.
Chart 1: Active users
We launched 41 machine learning competitions this year, up from 33 last year. This included 3 competitions with more than $1MM in prize money:
We have also invested in becoming closer to the research community, launching some important research competitions for NIPS and CVPR workshops. Highlights include a series of adversarial learning challenges and the YouTube-8M challenge. Kaggle is also now hosting ImageNet.
Kaggle InClass, which allows professors to host competitions for their students free of charge, became a fully self-service platform and saw strong growth. 1217 machine learning and statistics classes hosted Kaggle InClass competitions in 2017, up from 661 in 2016 (84% growth).
On the community side, 375K users downloaded competition datasets, up 62% YoY, and 122K users submitted entries to our machine learning competitions, up 54% YoY.
Public Datasets Platform
Our public Datasets platform allows our community to share and collaborate on public datasets. 7044 datasets were uploaded to the platform in 2017, up from 495 datasets in 2016. The most popular datasets uploaded in 2017 were:
Downloaders of datasets on our public Datasets platform increased more than 3x this year, reaching 339K in 2017, up from 107K in 2016. This growth means the public Datasets platform is driving almost as many data downloads as our machine learning competitions (see chart 2). For context, we launched our public Datasets platform in 2016 and our competitions platform in 2010.
Chart 2: Downloaders of public Datasets vs. competitions
Kaggle Kernels is currently used to share code and models on our competitions and public Datasets platform. In 2017, we had 113K users of Kaggle Kernels, up almost 3x from 39K in 2016. Kernel authoring is quickly becoming just as popular as making a competition submission (see chart 3).
Chart 3: Kernel authors vs. competition submitters
The most popular publicly shared kernels from this year were:
We launched the largest-ever survey of data scientists and machine learners. It had 16,716 respondents and led to 235 public kernels exploring the dataset. The best coverage of the survey was in the FT and The Verge.
Overall, we were in the press a lot this year, with topics including coverage of the acquisition (TechCrunch), profiles of several elite community members (in Wired and Mashable), the NIPS adversarial learning challenge (MIT Tech Review), the TSA competition (NYTimes) and the Zillow competition (NYTimes).
It's also worth highlighting the activities by our community that help support Kaggle. We are aware of over 50 Kaggle meetup groups organized by Kaggle community members in cities ranging from Princeton to Paris. These meetups discuss our competitions and datasets. This year, some elite Kaggle members launched a Coursera course on how to win Kaggle competitions. And a group of community members set up a Kaggle Slack channel to discuss Kaggle competitions and datasets; it has over 3300 members.
We started with machine learning competitions. We've now expanded to add a public Datasets platform and Kaggle Kernels. Ultimately, we want to make Kaggle the place where Kagglers can do all of their data science and machine learning. In 2018, we're focused on improving all of our major products (competitions, the public Datasets platform and Kaggle Kernels) and on adding new educational resources to our platform.
Competitions are currently in a strong position. However, it is important that we aren't complacent and that we continue to innovate. In 2018, we plan to start supporting new competition types to make sure we can support problems that are on the cutting edge of machine learning and AI. To do that, we aim to better support code-only competitions (where Kagglers upload code rather than solution files). This will let us host new competition types, including reinforcement learning competitions and competitions with compute restrictions.
Public Datasets platform
In 2018, we hope to become as well known for our public Datasets platform as we are for our machine learning competitions. To do that, we need to keep growing the number of high-quality datasets on Kaggle, and we aim to do so with a number of powerful new features. We are planning to integrate with and add services that let our community work with larger datasets, through integrations with data warehouses like BigQuery, and to build functionality that lets Kagglers stream in live datasets rather than just uploading static ones.
Kaggle Kernels is currently most useful for sharing models and analysis on our competitions and public Datasets platform datasets. In 2018, we want to make Kaggle Kernels a strong standalone product. This includes enabling Kagglers to use Kaggle Kernels with their own private datasets, access GPUs and support more complex pipelines.
Many users come to Kaggle to start their data science career and boost their learning. To better support this segment of our community, we've launched a platform of hands-on machine learning lessons at https://www.kaggle.com/learn. We hope for it to be the fastest path for users to start creating highly accurate machine learning models and to gain the skills they need to land their first data science job.
Want to get involved?
We are hiring data scientists as we grow our competitions team. You can learn more and apply at: https://www.kaggle.com/careers/datascientist.