In this episode of the Data Show, I spoke with Soumith Chintala, AI research engineer at Facebook. Among his many research projects, Chintala was part of the team behind DCGAN (Deep Convolutional Generative Adversarial Networks), a widely cited paper that introduced a set of neural network architectures for unsupervised learning. Our conversation centered around PyTorch, the successor to the popular Torch scientific computing framework. PyTorch is a relatively new deep learning framework that is fast becoming popular among researchers. Like Chainer, PyTorch supports dynamic computation graphs, a feature that makes it attractive to researchers and engineers who work with text and time-series.


00:00:04 Welcome to the O'Reilly Data Show. I'm your host, Ben Lorica. Before we jump into today's episode, I want to remind our listeners that we do have two event series that they can go and attend to learn more about the topics covered in this podcast. The first one
00:00:19 is called the Strata Data Conference, which you can find at strataconf.com. The second one is the Artificial Intelligence Conference, which you can find at theaiconf.com. In this episode of the Data Show, I speak with Soumith Chintala, an AI
00:00:37 research engineer at Facebook. Among his many research projects, Soumith was part of the team behind DCGAN (Deep Convolutional Generative Adversarial Networks), a widely cited paper that introduced a set of neural network architectures for unsupervised learning. Our conversation, however, centers around PyTorch, the successor
00:01:01 to the popular Torch scientific computing framework. While it's relatively new, PyTorch has been embraced by the deep learning research community. I hope you enjoy the episode. Soumith Chintala, welcome to the Data Show. Well, my pleasure to be here. So before we jump into
00:01:23 our discussion, I just wanted to ask you about your job title, artificial intelligence research engineer. I've seen the title AI engineer, and I've seen the title research engineer, so you are somewhat of a hybrid between them. Also, there's the title AI researcher.
00:01:42 Right. So the title doesn't actually mean much; it's just to make sure people get context on what I work on just by looking at the title. My actual title is research engineer, and I work in the AI division there. I see, okay, great. So
00:02:02 we'll come back to Facebook AI Research later, but let's start by talking about the new deep learning framework (and by new, I mean 2017), PyTorch, which you're heavily involved with. So first of all, describe how PyTorch came about and what
00:02:23 convinced you folks to build it. Sure. So to understand how PyTorch came about, or where it came from, I'll give you a little bit of historical context. Let's say around 2014, there were three main frameworks: one was Torch, the
00:02:44 other was Theano, and the other one was Caffe. There were a lot of other fringe frameworks, but mainly these were the frameworks people used for doing research in computer vision, deep learning, and so on. And all three frameworks had the distinction that they came out
00:03:06 of university labs. That is, they were built by grad students, who maybe had one or two support engineers in the case of Theano, but largely they were frameworks that were built with no formal software engineering or software architecture experience. It was just like, okay, we have a need
00:03:29 for this, we're going to build this. They all came out around 2010 to 2011, and they all had their niche, right? Theano was really good as a symbolic compiler. Torch was a framework
00:03:47 that would try to be out of your way if you're a C programmer, so you could write your C programs and then just interface them to Lua's interpreted language. Caffe was very, very suited to computer vision models, so if you wanted a convnet and
00:04:07 you wanted to train it on a large vision dataset, Caffe was your framework. They all had their niche. However, one of the biggest problems facing these frameworks was that they were not professionally developed or packaged. You had quality control issues, you
00:04:24 had user experience issues, etcetera. So in late 2015, TensorFlow came out, and TensorFlow was one of the first professionally built frameworks, built from the ground up to be open source. So the bar for what a deep learning framework is supposed to be in terms of
00:04:46 quality control or packaging went much, much higher. That also opened it up to the users: okay, TensorFlow shows us how easy it is to package things, and so on. But on the other side of things, all three of these frameworks had aging designs. These
00:05:06 frameworks were about six or seven years old, and it was evident that the field was moving its research in a certain direction that these frameworks and their abstractions weren't keeping up with. You know, what's interesting about what you're describing is that it really shows you that there was
00:05:23 a need, right? Because as you described, they were harder to use, and yet people stuck with them for so long. It's also funny because there's this joke in computer science departments, right? When the grad student finishes, the
00:05:40 software dies, right? But in these cases, you can just see the need there, right? Yes. So what happened until then was our users would complain here and there, but they would largely be happy with Lua, and they would be okay
00:05:59 with the way we packaged Torch and shipped it, and also with our design itself, you know, the way you build your neural networks. And in Python, Keras became an alternative that looked very, very similar to Torch, so people didn't have to deal with the Lua programming
00:06:14 language. So we had a lot of feedback coming in, and at the same time we had been thinking a lot about what to build as a next-generation framework. So wait, though, there's a missing part of your story, because you mentioned that TensorFlow came out, it
00:06:29 was professionally built, and yet there was something that you needed. That is correct. So TensorFlow addressed one part of the problem, which is quality control and packaging, and it offered a Theano-style programming model, so it was a very low-level deep learning framework. On
00:06:52 top of that, you would see that over the year or year and a half that TensorFlow has been around, there are a multitude of front ends that are trying to cope with the fact that TensorFlow is a very low-level framework. So there's
00:07:07 TF-Slim, there's Keras; I think there's like ten or fifteen, and just from Google there's probably four or five of those. So on the Torch side, the philosophy has always been slightly different. I see TensorFlow as a much
00:07:23 better Theano-style framework, and on the Torch side we had the philosophy that we want to be imperative, which means that you run your computation immediately. Debugging should be butter smooth; the user should never have trouble debugging their programs, whether they use, say, a
00:07:46 Python debugger or GDB or so on. So we focused on being out of the way for the user, treating the user as a power user, and we give them tools that are very easy to debug and that abstract less. And with that, it had a
00:08:08 lot of fans. So for our listeners who aren't familiar with Torch, a lot of researchers were using Torch; I mean, even DeepMind was using Torch. Yes, DeepMind was using Torch, Facebook, Twitter, several university labs. 2015 was Torch's year, it
00:08:28 was Caffe's year in 2014, and 2016 was TensorFlow's, in terms of getting the largest set of audiences. So over time we were thinking about what to build next and how to build it, because
00:08:44 we wanted to build a modern design that retains the philosophy of Torch. So we started building it in July 2016. We didn't initially call it PyTorch; we called it Python Torch, we weren't sure what to call it yet. It started off as
00:09:02 an intern project. Adam Paszke was my intern last summer, and he and I decided that we would build this thing. What we didn't do was design by committee, that is, okay, we're building this next big thing, how do we
00:09:20 design it? We instead collected a core group of four people, and we just designed the whole framework and developed the initial framework ourselves. Then we started slowly giving access to this framework, in its alpha form, to a multitude of people in the community, people
00:09:41 who were huge users of Torch, or former users of Torch who left Torch to go to Theano because Python was more convenient. So we collected a bunch of these people, and by December we had collected about seventy or eighty alpha testers,
00:10:00 completely closed, no public presence. What percentage were from Facebook? There were about ten, thirty-eight percent or less from Facebook. We really tried to cover a diverse crowd from universities, from other companies. Our alpha testers had people from almost every major
00:10:22 company and every major AI research lab, universities, etcetera. So the word Torch is in the name. What's the formal relationship between PyTorch and Torch? PyTorch is a successor to Torch. It's built by the same set of people who used to maintain Torch. And what
00:10:45 is the transition like for a Torch user to PyTorch? So a Torch user would see a lot of familiar things in PyTorch, and they would also feel that the philosophy of Torch has been captured well in PyTorch, but also upgraded to modern designs. So
00:11:04 there's another framework that you may or may not have been inspired by, and it's a framework called Chainer. Yes, Chainer was a huge inspiration. So PyTorch is inspired by three primary frameworks. Chainer was one, and in Torch we had certain researchers from Twitter
00:11:28 who built an auxiliary package, torch-autograd, and this was actually based on a package called autograd in the Python community. Chainer, autograd, and torch-autograd all use a certain technique called tape-based automatic differentiation. That is, you have a tape recorder that
00:11:52 records what operations you have done, and then it replays that backward to compute your gradients. This is a technique that is not used by any of the other major frameworks except PyTorch and Chainer. All the other frameworks use what we call a
00:12:12 static graph. That is, the user builds a graph, and then they give that graph to an execution engine that is provided by the framework, and the framework executes it; it can analyze it ahead of time, and so on. So these are two different techniques.
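
The tape recorder idea described above can be sketched in a few lines of plain Python. This is an illustration only, under the assumption of scalar values and two operations; `Tape`, `Var`, `record`, and `backward` are hypothetical names chosen for this sketch, not the API of PyTorch, Chainer, or autograd.

```python
# Minimal sketch of tape-based (reverse-mode) automatic differentiation.
# Illustrative only: Tape, Var, and their methods are hypothetical names,
# not the API of PyTorch, Chainer, or autograd.

class Tape:
    """Records each operation as it runs, like a tape recorder."""

    def __init__(self):
        # Each entry is (output, [(input, local_gradient), ...]).
        self.entries = []

    def record(self, out, parents):
        self.entries.append((out, parents))

    def backward(self, loss):
        """Replay the tape in reverse, applying the chain rule."""
        grads = {id(loss): 1.0}
        for out, parents in reversed(self.entries):
            g = grads.get(id(out), 0.0)
            for inp, local in parents:
                grads[id(inp)] = grads.get(id(inp), 0.0) + g * local
        return grads


class Var:
    """A scalar value whose arithmetic is recorded on a tape."""

    def __init__(self, value, tape):
        self.value = value
        self.tape = tape

    def __add__(self, other):
        out = Var(self.value + other.value, self.tape)
        self.tape.record(out, [(self, 1.0), (other, 1.0)])
        return out

    def __mul__(self, other):
        out = Var(self.value * other.value, self.tape)
        self.tape.record(out, [(self, other.value), (other, self.value)])
        return out


def f(x, steps):
    # Ordinary Python control flow decides which operations get recorded,
    # so the recorded graph can differ on every call (a dynamic network).
    y = x
    for _ in range(steps):
        y = y * x
    return y + x


tape = Tape()
x = Var(3.0, tape)
y = f(x, steps=2)          # y = x**3 + x = 30.0
grads = tape.backward(y)
dy_dx = grads[id(x)]       # 3 * x**2 + 1 = 28.0 at x = 3
```

Because the tape is filled while the Python code runs, changing `steps` changes what is recorded, with no separate graph-building step. A static-graph framework would instead build the whole graph first and hand it to an execution engine.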
00:12:28 Tape-based differentiation gives you easier debuggability and gives you certain things that are more powerful; it gives you dynamic neural networks, and so on. And the static graph-based approach gives you easier deployment to mobile, easier
00:12:47 deployment to more exotic architectures, the ability to do compiler techniques ahead of time, and so on. So you're roughly just about a year in. How would you assess your rollout so far? Have you been happy with the reception? Absolutely. First up, the only public release
00:13:11 was on the 18th of Jan, so we're roughly a few months in. I would say our alpha testers helped a lot in iterating on the design, but PyTorch was so rough around the edges during alpha testing that I wouldn't really consider that, you know,
00:13:26 an official release. So since the release on the 18th, we've seen amazing, amazing adoption. So what would be the pockets of adopters? Because to me, one of the things I've noticed is there seem to be two buckets that jump out.
00:13:45 One is researchers. Yes. And then the other one is people interested in NLP and text. Yes, I think that's also tied to researchers. In my opinion, there are three buckets: one is the researchers, the second is data scientists, including people who do Kaggle
00:14:06 competitions and stuff, and the third is product people, who actually just take some technique and build it out into their product. So when you say data scientists, these are people who aren't deep learning gurus, so that means that
00:14:24 you have enough example architectures shipped that they can play with them? Exactly. What they would generally do is either take a technique and implement it into their data pipelines, or they would fine-tune their existing models on top, using a neural network
00:14:44 of a certain kind that was just published, or so on. So these people are not really involved in publishing papers to ICML or NIPS, but they're more geared toward catering to company needs in terms of data science, or participating
00:15:02 in Kaggle competitions, and so on. So what is the relationship between PyTorch and Keras, and then PyTorch and the larger Python tool set that data scientists like? Sure. So to finish up the previous conversation, PyTorch has gotten
00:15:25 its biggest adoption from researchers, and it's gotten a moderate response from data scientists. As we expected, we did not get any adoption from product builders, because PyTorch models are not easy to ship into mobile, for example. So that's the
00:15:46 current adoption. And we have people we did not expect to come on board, like folks at OpenAI and several universities. Berkeley was a surprise to us, because they were a very strong Caffe stronghold and we expected them to either remain on Caffe or go to TensorFlow,
00:16:06 but they love PyTorch and they've been using it a lot, as have several universities in Europe. Have you heard of people offering PyTorch in a class yet? Yes, we actually have three courses already being taught with PyTorch.
00:16:26 One is the fast.ai course. Yeah, yeah. Yep. And the second is the course at Oxford for machine learning, and then we have the NYU data science course, which has all of its practicals in PyTorch as well. Apart from
00:16:46 this, Stanford recommends that its students do their work in either PyTorch or TensorFlow, and there are several universities in Europe that are using PyTorch as part of their homework curriculum as well. So yes, there are a lot of students using PyTorch as part of their
00:17:05 homework, and courses being taught using PyTorch. Oh yeah, yeah. And then I forgot to mention to you earlier that we had a deep learning book that was using TensorFlow, and the author switched
00:17:23 to PyTorch. That's fantastic. Yes, so that was their choice, right? And then of course I supported it right away. That's awesome. So anyway, going back: what's the relationship between PyTorch and Keras, and PyTorch and the rest of the Python data
00:17:43 science tools that the data science community loves? Sure. So Keras is a fantastic front end for TensorFlow, Theano, and CNTK. What I mean by front end is you can build your neural networks quickly and run them on a dataset
00:18:04 that you give in a particular packaged format, and Keras abstracts away which underlying framework is being used to train or run these neural networks. It makes the user experience very, very simple. You don't need to worry about, okay, which C code is
00:18:23 actually running when I execute this command? So it's a very powerful tool for data scientists who want to remain in Python and never want to go into C or C++ to do advanced optimization. Yeah, so it's kind of becoming a standard tool.
00:18:43 Absolutely. I think they recently announced that Keras will become a spec; that is, instead of being a software package, they're going to have a spec that defines what Keras is, and then you can have implementations of Keras by different people. So it's a very user-friendly,
00:19:03 fantastic front end. Now, where PyTorch comes in is that it's both a front end and a back end. So you can think of PyTorch as something that gives you the ease of use of Keras, or probably more in terms of debugging, and power
00:19:22 users can go all the way down to the C level and do hand-coded optimization and stuff. So it takes the whole stack, where I have a front end, that front end then calls a back end to create a neural network, and the back end
00:19:39 in turn calls some underlying GPU code or CPU code, and we make that whole stack very flat, without many abstractions, so that you have a superior user experience. So what about the tools that Python users use for data science? Usually
00:20:00 they just download Anaconda, right, and then they'll run everything off of Anaconda. Are you guys starting to integrate with this ecosystem? Yes. So there are a few tools that people in the Python data ecosystem use very often: they use Anaconda, they use NumPy,
00:20:16 they use scikit-learn and SciPy, right, and all that. So unlike, say, Caffe or MXNet or TensorFlow, PyTorch actually puts Python first. PyTorch is basically just integrated into Python; it doesn't have a separate execution engine. I think it
00:20:35 really lives up to its name. So one of the biggest benefits of that is that we don't actually have to separately integrate with, say, SciPy or NumPy or Anaconda or anything; all these integrations are just there. We can take NumPy arrays and
00:20:55 convert them to PyTorch tensors with absolutely no performance penalty, and vice versa, and we can call SciPy functions inside of PyTorch and vice versa. So it's a very seamless, smooth experience integrating with the other Python tools, and we don't actually
00:21:12 have to do any additional integration work, because we put Python first. So what about scale? Is there a distributed version in the works? Yes, it's not only in the works, it's actually available on the master branch. So we're about to release our
00:21:32 second major version of PyTorch, 0.2, which introduces distributed PyTorch, and it introduces higher-order gradients. Generally, when you're training neural networks, you would take some neural network function and then you would want to take the first-order derivative of
00:21:55 that neural network. But we've been seeing more and more research go toward needing to compute second-order or third-order derivatives. So in our 0.2 release, one of the major features we're introducing is higher-order gradients, and the second feature we
00:22:11 are introducing is distributed PyTorch. Distributed PyTorch is powered by the same library that powers Caffe2, the production framework used at Facebook, so it's extremely performant and extremely fast. And when we release this version, when we release 0.2,
00:22:34 we will have a post detailing how distributed PyTorch works, how easy it is to use, and how it does in terms of performance. So we talked about the courses, we talked about a book. You mentioned product builders, and you just mentioned Caffe2
00:22:56 used in Facebook. So are you starting to see PyTorch being used inside Facebook, not by the researchers but by the product people? Yes. So internally at Facebook we have a unified strategy: we say PyTorch is used for all of research and Caffe2 is
00:23:13 used for all of production. That just makes it easier for us to separate out which team does what and which tools do what. What we're seeing is that you first create your PyTorch model, do research, make sure it's a promising
00:23:31 and good model, and then when you want to ship it to production, you just convert it into a Caffe2 model and then ship it into a product. And the Caffe2 team is at Facebook too? Yes. And I was
00:23:46 recently shown some really amazing demos of Caffe2 running on a phone. Exactly. Caffe2's biggest strength is that it's probably the fastest mobile framework, and it does really well on production workloads, where you're running things millions of times a day.
00:24:08 So who would you say is the core of PyTorch at this point? Is it still mostly Facebook? Because I'm starting to hear of people at other companies also contributing, maybe. Absolutely. Companies like Salesforce, maybe, right? So yes,
00:24:26 PyTorch itself, from the beginning, we built with a distributed development model. Torch was similar: Torch was actually developed by three companies at the same time. You would see patches being merged into Torch from Facebook, DeepMind, and Twitter
00:24:50 all at the same time, probably on the same day even. So this was a distributed, community-based development model. Apart from these big companies, there were several smaller companies and universities contributing to Torch. PyTorch took the exact same model. PyTorch has the same licensing
00:25:10 that Torch has. Torch itself was controlled by a company called SPI Inc. It's a corporation that runs OpenOffice, sorry, LibreOffice, Haskell, and many other very large open source projects like FFmpeg. So Torch, technically,
00:25:36 is run by SPI Inc. in terms of, you know, where it's incorporated and stuff, and PyTorch is exactly the same. We have a decentralized company that can take contributions to PyTorch via SPI Inc., and the community model itself is that people from
00:25:58 Salesforce, like James Bradbury from Salesforce, and people from Facebook and Twitter and folks from several universities all contribute, engage, participate, and commit patches to PyTorch. Are you folks getting contributions from overseas, like China? We don't actually track by country; we track it by,
00:26:25 say, universities or companies. We have seen quite a few requests from people from both China and Korea, in terms of which university they go to, but the major contributors are largely from Europe and America. So do you want to rattle off a few things on your
00:26:54 roadmap for the rest of 2017 and early 2018? Sure. For the rest of 2017, what we're doing is building a compiler for PyTorch. I'm sure you've heard of the new NVIDIA GPUs called Volta? Oh yeah, yeah.
00:27:12 So we have several new pieces of AI hardware coming: NVIDIA GPUs are coming out, AMD is coming up with their own line of GPUs called Vega. And then Intel has Lake Crest, Nervana. Exactly, Intel has Lake Crest from Nervana coming up, and there are several startups coming
00:27:31 up with their own hardware. This hardware is running much faster than we can feed it computation; it's going very, very fast. What that means is that we, as framework writers, have to build certain compiler components into our frameworks to even keep up
00:27:56 with this hardware and run it at full potential. Along this line, TensorFlow has their XLA project, right? It's a compiler that can take a TensorFlow graph, compile it, and target different back ends. Similarly, with PyTorch and Caffe2,
00:28:17 we are building a common compiler, and we're also collaborating with the folks at Amazon, the MXNet folks, on building this common compiler that can take either PyTorch dynamic graphs or Caffe2 models and run these things much faster on hardware like
00:28:42 Volta or Vega AMD GPUs, for example. That's a certainly ambitious project that's going at full speed, and we are hoping that by some part of 2017, before the end of 2017, you'll see that integrated in some form into PyTorch,
00:29:02 and users will see their computation being accelerated, but again without losing any flexibility or debuggability there. That's definitely one of the trends I'm seeing, which is that we're entering a world of heterogeneous processors, maybe even some very specialized and domain-specific processors. Absolutely.
00:29:26 So you built PyTorch with a bunch of people, but obviously you also have a job as a research engineer. So how has PyTorch supercharged your work as a research engineer? So just with the modern design of PyTorch, what I've seen
00:29:52 personally in my own research, and I've also been hearing this a lot from my colleagues, is that we've been exploring new things that are not constrained by what we were able to do before. So with PyTorch we have these things called dynamic neural networks
00:30:08 that are enabled by PyTorch. So right, dynamic computation graphs. Exactly. So this is something that has constrained us in the past; that is, we would engage in only certain kinds of research, because doing, say, dynamic models would take us much, much longer to
00:30:31 build in, say, Torch. So we've been seeing research being unshackled from those bonds, and we've seen a lot of dynamic neural network research because of PyTorch. As well as that, we've been seeing a lot of
00:30:50 work on linear programming and quadratic programming based optimizers in deep learning. We've been seeing a lot of directions being supercharged now that PyTorch is here. And you yourself have been part of, and have co-written, a bunch of really
00:31:11 seminal papers on GANs, right? Generative adversarial networks. Yes, generative adversarial networks. So does PyTorch help there? So with generative adversarial networks, I would say PyTorch helps a lot in terms of how fast you can prototype, but for those particular kinds of models you
00:31:37 can use any framework: you can use PyTorch, you can use TensorFlow, you can use Theano, and you should be good with running them. PyTorch specifically does not boost what you can do with GANs, for example. Right, right. So that's interesting, because I think I
00:31:57 saw you tweet, I don't know when, something along the lines of, hey, I haven't looked at arXiv for the last thirty days and I just checked and I'm overwhelmed. That is the absolute truth. So now, you know, you just built a tool which will
00:32:13 generate even more. Yes, I'm both happy and sad about this. So one of the things that I've been seeing in research, especially in generative adversarial networks research, is that there are papers coming left and right, day in, day out. You have a paper with
00:32:34 a flashy title coming in every single day, and some days you have five papers appear on the same day in the generative adversarial networks domain. So I personally have completely stopped reading papers on GANs, because there's just way too many and I can't keep up.
00:32:56 And the way I keep up these days is, if some of my colleagues mention a particular paper, I just hold them right there and I say, explain this paper to me in five minutes. That's basically how I'm keeping up. So this is not a good
00:33:13 state of affairs, if you as a researcher have to resort to this hack. So yeah, as a researcher you have to do several things: you have to do literature review and make sure you know the whole field pretty well; you also have to think about
00:33:29 which problems to tackle and how to tackle them, and do some creative research. And especially if you're tackling longer-term research, you don't see the field moving that fast. If you're targeting research that is, say, two or five years ahead, then you keep up with
00:33:50 current research occasionally, but you don't need to know what paper came out yesterday. So I've been focusing a little bit on slightly longer-term research, doing multimodal stuff. So are you allowed to give us a teaser of the
00:34:10 kinds of problems you're working on, at a high level? I can, at a very high level. I've started thinking a lot about not focusing on one single domain, like, say, computer vision or NLP or speech, but I've started
00:34:30 thinking a lot about how to build something that's at the intersection of all of these, and probably has abstract logic as well. I've been exploring a lot; there's a lot of existing literature that I have to catch up on, but there's a
00:34:50 very, very new field of, say, grounding one domain in another domain. I don't have anything concrete to share, but there's a lot of exciting longer-term things I'm exploring. So what would be an example application, let's say you're able to make
00:35:10 some mild breakthroughs in this area? I would say you would get to unifying the research in different domains. So right now you have the state of the art in image classification or object detection; you have superhuman performance there, right? Right. And you have superhuman
00:35:29 performance in some speech tasks, and you have amazing performance in some text tasks, but each of these systems is a separate expert system. You can't take a speech model and say, okay, I have this amazing expert speech model, how do I integrate this with
00:35:51 my expert image model and then try to have a model that takes in questions and answers things? That end-to-end general system is just not there. I guess, going back to what you said earlier, the whole
00:36:12 notion of multimodal, right? So if your inputs are multimodal, then the system will be smart enough to use bits and pieces of the different inputs. That is correct. Cool. Well, this has been great. Thank you so much for your time, and we look
00:36:28 forward to the progress of not only PyTorch but also of AI and the types of things you're working on. Thanks for hosting me, Ben. You can follow Soumith Chintala on Twitter at @soumithchintala. Thank you for joining us. If you like the show, you
00:36:49 can rate us and subscribe through iTunes, Stitcher, TuneIn.com, or SoundCloud, and never miss an episode.

Transcribed by algorithms.
Disclaimer: The podcast and artwork embedded on this page are from O'Reilly Media, which is the property of its owner and not affiliated with or endorsed by Listen Notes, Inc.

