Data Crunch | Big Data | Data Analytics | Data Science

By Vault Analytics

About this podcast · English · United States

Whether you like it or not, your world is shaped by data. We explore how it impacts people, society, and llamas perched high on Peruvian mountain peaks—through interviews, inquest, and inference. Buckle up.
March 27, 2018
Transcript Before the airplane was invented, some people were concerned that everything that could be invented had been invented. Obviously, that was not the case then, and it’s certainly not the case now. So as you create novel inventions, how do you protect them? What’s the process? And what tools can help you and your team navigate the world of patents? Janal Kalis: It was like a black hole. Almost nothing got out of there alive. So it became slightly more possible to try and steer your application away by using magic words . . . it didn’t always work but sometimes it did. Ginette: I’m Ginette. Curtis: And I’m Curtis. Ginette: And you are listening to Data Crunch. Curtis: A podcast about how data and prediction shape our world. Ginette: A Vault Analytics production. Here at Data Crunch, we research how data, artificial intelligence, and machine learning are changing things. We see new applications every single day as we research, and we realize we can’t possibly keep you well enough informed with just our podcast. So to help keep you, our listeners, informed, we’ve started collecting and categorizing all of the artificial intelligence applications we see in our daily research. It’s on a website we just launched. Go explore the future at datacrunchpodcast.com/ai, and if you want to keep up with the artificial intelligence beat, we send a weekly newsletter highlighting the top three to four applications we find each week that you can sign up for on the website. It’s an easy read, we really enjoy writing it, and we hope you’ll enjoy reading. And now let’s get back to today’s episode. Curtis: Today we dive into a world filled with strategy, intrigue, and artful negotiation, a world located in the wild west of innovation. Ginette: In this world, you fight for your right to own something you can’t touch: your ideas. You and your team ride out into this wild west to mark your territory, drawing a border with words. Sometimes during this land grab, people get a lot of what they want, but generally they don’t, so you have to negotiate with the people in charge, called examiners, to decide what you can own, but what if you’re assigned someone who isn’t fair? Or what if you want to avoid someone who isn’t fair? Is there anything you can do? Maybe, but first you need to understand how the system works. Let’s dive into the world of patents and hear from Trent Ostler, a patent practitioner at Illumina. Trent: The kind of back and forth that goes on oftentimes is trying to get broad coverage for a particular invention, and chances are, the examiner, at least initially, will reject those claims. Curtis: Claims define the boundaries of the invention you’re seeking to protect. It’s like buying a plot of land. There are boundaries that come with the property. These claims define how far your ownership of the invention extends. Claims can be used to tell the examiner why he or she should allow, or approve, your exclusive rights to your idea, giving you ownership over that idea, or in other words, grant you a patent. Trent: The examiner will say that they are broad. The claims don’t deserve patent protection. And he could say that they would have been obvious. He could say that it’s been done before—it’s not novel, and so what this means for anyone trying to get a patent is that it’s very complex.
There are thousands of pages of rules and cases that come out that further refine what it is that’s too broad or what it is that makes something obvious, and oftentimes there is a balancing act of coming close to the line to get the protection that you deserve but not going overboard. Ginette: So there’s a back-and-forth volley between the inventor’s lawyers and the examiner. The examiner says, “hey, you don’t deserve these claims,” and he or she gives you a sound reason or argument for it, and then you and your team try to persuade him or her otherwise, and hopefully overcome those rejections by arguing for why your claims are reasonable, and why the examiner should allow, or approve, your claims. Trent: There is always going to be a back-and-forth with the examiner, and when an examiner does have credibility and is applying reasonable rejections and backing that up with sound arguments, then it’s obviously not going to be as easy to overcome those rejections through argument alone, and so you have to resort to other strategies, such as narrowing the claims. What that means is not getting as broad of coverage for your invention but still getting some coverage. Curtis: When an examiner rejects your application for a patent, you have options, and if everything goes well as you carefully navigate, you just might be allowed a patent. Trent: When the examiner rejects your application, you are entitled to appeal that decision to judges or to some sort of second eyes, and they assess whether that examiner was correct or not. Ginette: When you appeal your patent case, there’s a chance that the judge could overturn the examiner’s decision and allow, or approve, you to lay claim to your intellectual property. Curtis: In this world guided by laws and the outcomes of previous patent cases, there’s often ambiguity and lack of visibility, and Trent has developed a product, called Anticipat, that works to look deeper into the inner workings of the patent appeals process using data. Trent: Anticipat seeks to make sense of all of the appeals decisions to shed some light into how to get a patent, to increase the efficiency of the current process, and so for each decision, we extract the particular ground of rejection. There can be 10 or so different grounds that the examiner can reject the application on, and I already touched on a couple of those: if it’s too broad of a claim, if it would have been obvious—novelty, and we catalog all these rejections so that you can look at different trends and different patterns. Maybe there are certain examiners who are getting overturned by the judges, and the higher that rate, the more that examiner is being unreasonable with their rejections. What the Patent and Trademark Office, the PTO, does is they post all of these decisions online for free. They post thousands, literally thousands, of these decisions every year of appeals where the judges either overturn the examiner and say that the examiner was wrong in that rejection, or they affirm the examiner and side with the examiner by saying that his rejection was good. We’re extracting all the information that the PTO has, trying to make sense of that, and tying in information that can be relevant to someone that’s trying to get a patent.
Ginette: In addition to tracking how often examiners’ rejections are overturned, which is an important data point for understanding what may or may not be a reasonable rejection, Trent’s tool takes a look at overturned rejections by the group an examiner belongs to, called an art unit or group—which is basically called that because they review similar types of what they term art. Above an art group is what’s called a tech center, which houses several art groups, and there are nine of these tech centers under the USPTO umbrella, and Trent’s tool looks at the analytics at this level as well. Trent: We like to look at reversal rates because from the patent-office perspective, there are two ways for bad rejections to get weeded out. Before it goes on appeal, there’s a pre-appeal conference, and sometimes if the rejection is not good, then this panel of examiners will kick it back to the examiner, and they can either have him do another rejection or allow the case. And then that same conference happens again. And from the patent practitioner’s perspective, it costs a lot of money and time to go through an appeal, and only 1 to 2 percent of all applications go to appeal, so they’re going to be invested in their position to go through with the whole process. So we think that the appeal is actually a good data point for measuring whether a rejection that an examiner applies is a reasonable rejection or not because otherwise it would have been weeded out by the two other steps, and so we can keep track of the reversal rates for this particular examiner, for the group, and for the tech center, and it’s kind of like a different level of hierarchy, and then you can see whether there’s an anomaly in any of those, so if this examiner is reversed higher than his group that he works in of 20 or so examiners, then maybe that suggests that it may not be worth the time to work with this examiner and go back and forth, and maybe just be easier to appeal and have a judge decide in your favor, so there is that advantage. The other advantage is that using the analytics page can provide you with the specific rationales that were relied on to overturn the examiner, so for some of the grounds of rejection, such as obviousness, it’s a very nuanced rejection, and we actually have a newly introduced tagging system that tags every different type of rationale that can be used for obviousness, and so we have 27 ways that you can use either for or against obviousness, and this is very helpful because otherwise you have to really rely on your own experience and your own memory and sometimes go and do a lot of legal research, but with this, it provides a very organized structure for all the possible arguments that you could use. We also make note of the legal cases that the board relies on. Ginette: The board is the Patent Trial and Appeal Board, or the PTAB. Trent: Just like a petitioner in responding to an examiner, the board can rely on cases that come out, on the PTO rules, which are called the MPEP, and just various guidelines that the PTO provides on a regular basis. It’s tough to keep track of all these things, these legal support documents, but thankfully the board does that, and so it’s an easy way to look at the specific rationales that the board applied, and you can have the legal support that they cited. There are a number of different panels of judges that come out with these decisions, and they all use different language.
Some of the language is consistent, such as the statutory grounds of rejection, and that’s helpful, but when it comes to the rationales, that has been a challenge for us; we’ve had to rely on manual intervention to make sure that our algorithm is correctly picking out the right rationales. That’s a work in progress; we’re working on just continuing to feed the algorithm training sets that are what we call correct rationales. We’re still working on that. Curtis: The ultimate goal of this process is to offer protections for inventions that deserve it and not to over-extend or under-extend protections. The aim is to allow quality patents. Trent: I think that the Patent Office is concerned about, or at least they would say that they’re concerned about, patent quality, and a big part of patent quality is good examination and having the examiners do a quality examination of each application, and so I think the hope of many people in this field is that for examiners who do frequently get overturned, there should be some sort of concern about that. Ginette: It’s known in the patent community that there are some examiners who have never allowed a case and brag about the fact that they’ve never allowed a case. These examiner analytics show that some examiners are statistical outliers. While some allow no cases and brag about it, others allow more cases than the average examiner. One would think that the USPTO would get rid of the outliers, the examiners that don’t allow many, if any, cases and the examiners who allow what seems like way too many compared to their peers. But as of yet, the USPTO has not recognized the value of these analytics, although they do have analytics of their own, which are somewhat skewed. Trent: There is not a formal accountability mechanism. There is some sort of metric. The patent office does keep track of appeals outcomes, but they keep track of the outcomes in a different way that does skew the data towards affirmances. Oftentimes you’re not only appealing one ground of rejection. You can be appealing four or five grounds of rejection, and if one of those rejections sticks, then the patent office treats the entire appeal as affirmed even if the other grounds of rejection are reversed, so because of that, it does skew the data towards showing that the examiners are being affirmed more than maybe one might suspect. Curtis: Trent’s tool helps divide up each ground of rejection so users know how the board handled each individual rejection. Trent: What this tool [Anticipat] does that has not been done before is it granularizes what it is that’s being decided in each rejection, and it keeps track of each individual ground separately rather than lumping them up into one outcome and just saying this decision is affirmed or this decision is reversed, or affirmed in part. It goes into each of the rejections and it pulls it out. And with that granularity, you can find some very interesting things: some of the grounds of rejection are overturned a lot more frequently than others, such as novelty under Section 102, and Section 112, which relates to the breadth of the claim and having proper support for the claim in the disclosure. These two grounds are over 50 percent reversed, which is high. For me, it was unexpectedly high. Other grounds of rejection, such as obviousness and abstract idea, are much lower. These are areas of the law that . . . one would argue that obviousness has more discretion for the board to side with the examiner because . . .
it’s a lot more of a nuanced argument. With novelty, it’s either the examiner found a reference that teaches your invention or not, and obviousness is a little bit more of a technical exercise. I think that one of the big interests that we are seeing is in section 101, which is—if there is such a thing in patent law—a very hot item. A couple of years ago the Supreme Court came out with a case that ruled in favor of this patent being an (ineligible) abstract idea, and many thought that that would spell the end for software patents because the opinion used language that seemed to indicate that it would be very challenging to get a patent that is related to software, Curtis: This is a famous case known as the Alice decision, which ruled that a specific patent was directed to an ineligible abstract idea. Trent: and we’ve found that that’s not the case—that software is still patentable, but it has made it more challenging, and both examiners and practitioners are trying to find out exactly how a software application gets to be eligible for a patent. It’s not black and white at all. I think that using the board decisions—because there are so many board decisions that are deciding these types of questions—it’s a way for patterns and for trends to be discovered, and especially it’s a way to look for arguments that the board found persuasive so that the practitioner can use that in his or her own practice. Ginette: In some cases, no matter how many statistics you and your lawyers have, they may not help you avoid a tough examiner. But they may help you with a strategy for how to handle that examiner. Other times, the statistics may help you steer your case away from a tough art group. We spoke with another patent practitioner, Janal Kalis of Schwegman, Lundberg & Woessner, about her experience with the analytics and how they’ve helped. Janal: I’m working on a case right now where the examiner’s allowance rate is way below that of his art group. In fact it’s discouragingly low, so we are leaning towards filing an appeal a lot earlier than we normally would in order to move this case along, and without the analytics, we wouldn’t be able to do that. It’s very expensive too, so if we can see that an examiner has a very low allowance rate and long prosecutions, then we know that filing RCEs is not likely to be very productive, and it’s going to be very expensive. Ginette: An RCE is a request for continued examination. Janal: Appeals are not cheap either, but in the medium we’re on, they may be better value than just going the RCE route. It’s very difficult to get any kind of control over the examiner that will be examining a particular case. In the case that I’ve mentioned, that general art group has an allowance rate of 73 percent, which is pretty average for the patent office, but this particular examiner has an allowance rate of 42 percent. There’s no way we can steer a case away from this examiner. The best that we can do is to have frequent interviews and call the supervisor in for every interview in the hopes that the supervisor will knock some sense into the examiner, which may or may not happen. Now a situation where we may have had some modicum of control was related to, and still is related to, 101 rejections in Art Group 3600. After the Alice decision, Art Group 3600 was a killing field for software patents. It was like a black hole. Almost nothing got out of there alive. In fact usually nothing.
So there it became slightly more possible to try and steer your application away from 3600 by using magic words in the field of the invention. Again, it didn’t always work, but sometimes it did. Ginette: While these stats can’t always steer an application away from an unwanted examiner, they can help, at least for patent lawyers, in other surprising ways. Janal: We recently found from an Anticipat report that our firm ranks at the very top as far as getting 101 appeals overturned. That’s not information that used to be easy to get. Without Anticipat, it actually still is very difficult to get, and expensive, but they’ve definitely streamlined the process, and with that information, we were able to pull all of the briefs where we had been successful and identify the attorneys who had been successful, and we are using that information on future appeal briefs so that we can perhaps even improve our record. Surprisingly, we have found this examiner information to be very useful in marketing, especially for existing clients, because it becomes information that we can provide to the client and give them predictions as to when we might get an allowance. If the examiner’s performance is typical and we see that the turnaround is within three years, then we can say it’s probably going to be a fairly typical prosecution. Some might be faster, some might be slower, but we have the data now to present that to clients, and they really like it. Trent: What I see this product doing is helping to further develop the technology coming in and helping the efficiencies of the process. I kind of envision a Turbo Patent type of product, and I hope that this facilitates a better quality prosecution experience and also a more efficient one. Conclusion: A huge thanks to both Trent Ostler and Janal Kalis for speaking with us, and as always, check out datacrunchpodcast.com for show notes and attribution. And if you’d like to easily learn about the latest artificial intelligence and machine learning applications every week, go sign up for our weekly newsletter at datacrunchpodcast.com/ai. Thanks for listening. Music Source https://freesound.org/people/levelclearer/sounds/332625/
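Trent’s contrast earlier in this episode between the PTO’s lumped appeal metric and Anticipat’s per-ground tracking is easier to see with a toy calculation. The sketch below is not Anticipat’s code; the three appeals, the ground names, and the outcomes are made-up examples, and it simply compares the “whole appeal counts as affirmed if any ground sticks” view with per-ground reversal rates.

```python
# Hypothetical sketch contrasting two ways to score appeal outcomes.
# The three appeals below are made-up examples; Anticipat's real data and schema differ.
from collections import defaultdict

decisions = [
    # each appeal lists its grounds of rejection and whether the board affirmed or reversed each one
    {"appeal": "A1", "grounds": {"102 novelty": "reversed", "103 obviousness": "affirmed"}},
    {"appeal": "A2", "grounds": {"101 abstract idea": "affirmed"}},
    {"appeal": "A3", "grounds": {"112 written description": "reversed", "103 obviousness": "reversed"}},
]

# Lumped metric: the whole appeal counts as affirmed if any single ground is affirmed.
affirmed_appeals = sum(
    1 for d in decisions if any(v == "affirmed" for v in d["grounds"].values())
)
print(f"Whole-appeal affirmances: {affirmed_appeals} of {len(decisions)}")

# Granular metric: track each ground of rejection separately.
per_ground = defaultdict(lambda: {"affirmed": 0, "reversed": 0})
for d in decisions:
    for ground, outcome in d["grounds"].items():
        per_ground[ground][outcome] += 1

for ground, counts in per_ground.items():
    total = counts["affirmed"] + counts["reversed"]
    print(f"{ground}: reversed {counts['reversed']} of {total}")
```

On this made-up data, two of the three appeals count as affirmed under the lumped metric even though three of the five individual grounds were reversed, which is the skew toward affirmances that Trent describes.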
Feb. 21, 2018
Transparency International started when a rebellious World Bank employee quit to dedicate himself to exposing corruption. Now the organization claims the media's attention for about one week a year when it publishes its annual Corruption Perceptions Index, an index that ranks countries in order of perceived corruption. Find out how the organization sources the data, what an important bias is in that data, and how that data ultimately impacts the world. Alejandro Salas: I studied political science and I got very interested in all the topics related to good governance, to ethics in the public sector, etc., and I started working in the Mexican public sector, and—oh, the things I could see there. I was a very junior person working in the civil service, and I got all sorts of offers of presents and things in order to gain access to certain information, access to my boss—so very early on in my professional career, I started to see corruption from very close to me, and I think that's something that marked my interest in this topic. Ginette: I’m Ginette. Curtis: And I’m Curtis. Ginette: And you are listening to Data Crunch. Curtis: A podcast about how data and prediction shape our world. Ginette: A Vault Analytics production. Here at Data Crunch, we research how data, artificial intelligence, and machine learning are changing things, and we’re noticing an explosion of real-world applications of artificial intelligence and machine learning that are changing how people work and live today. We see new applications every single day as we research, and we realize we can’t possibly keep you well enough informed with just our podcast. At the same time, we think it’s really important that people understand the impact machine learning is having on our world, because it’s changing and is going to change nearly every industry. So to help keep our listeners informed, we’ve started collecting and categorizing all of the artificial intelligence applications we see in our daily research and adding them on generally a daily basis to a collection available on a website we just launched. Go explore the future at datacrunchpodcast.com/ai, and if you want to keep up with the artificial intelligence beat, we send out a weekly newsletter highlighting the top 3–4 applications we find each week that you can sign up for on the website. It’s an easy read, we really enjoy writing it, and we hope you’ll enjoy reading. And now let’s get back to today’s podcast. Curtis: We’ve spent a lot of time on our episodes talking to interesting people about what creative things they’ve done with data, like detecting eye cancer in children, identifying how to save the honey bees, and catching pirates on the high seas, but today we’re going to talk about a simple measurement. A creative and clever way to measure something that is incredibly hard to measure. And powerful results come from a measurement that puts some numbers behind a murky issue so people can start to have important conversations about it. And we’re going to look at an example that’s all over the news right now. Ginette: This dataset that’s all over the news right now has an interesting history. While it draws criticism from some sources, it draws high praise from others. But before we get too far ahead of ourselves, let’s officially meet Alejandro, the man at the beginning of this episode. Alejandro: My name is Alejandro Salas. I am the regional director for the Americas at Transparency International. I come from Mexico.
I started 14 years ago, and I was hired to work mainly in the Central America region, which is also a region where there's a lot of corruption that affects mainly public security, access to health services, access to education. In general the basic public services are broadly affected by corruption....
Jan. 19, 2018
Episode Summary Few things are more controversial in these perilous times than Donald Trump's Twitter account, often laced with derogatory language, hateful invective, and fifth-grade name-calling. But not all of Trump's tweets sound like they came straight out of a dystopian dictator's mouth. Some of them are actually nice. Probably because he didn't write them. Join us on a discerning journey as two data scientists tackle Donald Trump's Twitter account and, through quantitative methods, reveal to us which hands are behind the tweets. For the full episode, listen by selecting the Play button above or by selecting this link, or you can also listen to the podcast through Apple Podcasts, Google Play, Stitcher, and Overcast. Dave Robinson: So the original Trump analysis is certainly the most popular blog post I’ve ever written. It got more than half a million hits in the first week and it still gets visits . . . and the post still gets a number of visits each week. I was able to write it up for the Washington Post and was interviewed by NPR. Ginette: “I’m Ginette.” Curtis: “And I’m Curtis.” Ginette: “And you are listening to Data Crunch.” Curtis: “A podcast about how data and prediction shape our world.” Ginette: “A Vault Analytics production.” Curtis: Here at Data Crunch, as we research how data and machine learning are changing things, we’re noticing an explosion of real-world applications of artificial intelligence that are changing how people work and live today. We see new applications every single day as we research, and we realize we can’t possibly keep you well enough informed with just our podcast. At the same time, we think it’s really important that people understand the impact machine learning is having on our world, because it’s changing and is going to change nearly every industry. So to help keep our listeners informed, we’ve started collecting and categorizing all of the artificial intelligence applications we see in our daily research. These are all available on a website we just launched, which Data Elixir recently recognized as a recommended website for their readers to check out. The website includes, for example, a drone taxi that will one day autonomously fly you to work, a prosthetic arm that uses AI to help a disabled pianist play again, and a pocket-sized ultrasound that uses AI to detect cancer. Go explore the future at datacrunchpodcast.com/ai, and if you want to keep up with the artificial intelligence beat, we send out a weekly newsletter highlighting the top 3-4 applications we find each week that you can sign up for on the website. It’s an easy read, we really enjoy writing it, and we hope you’ll enjoy reading. And now let’s get back to today’s podcast. Ginette: Today, we’re chatting with someone who made waves over a year ago with a study he conducted, and he recently did a follow-up study that we’ll hear about. Here’s Dave Robinson. Dave: I'm a data scientist at Stack Overflow, we’re a programming question-and-answer website, and I help analyze data and build machine learning features to help get developers answers to their questions and help them move their career forward, and I came from originally an academic background where I was doing research in computational biology, and after my PhD I was really interested in what other kinds of data I could apply a combination of statistics and data anal...
Dec. 19, 2017
The ubiquity of and demand for data has increased the need for better data tools, and as the tools get better and better, they ease the entry into data work. In turn, as more people enjoy the ease of use, data literacy becomes the norm.   Ginette: “I’m Ginette.” Curtis: “And I’m Curtis.” Ginette: “And you are listening to Data Crunch.” Curtis: “A podcast about how data and prediction shape our world.” Ginette: “A Vault Analytics production.” “We have a gift for you this holiday season. We’re giving you, our listeners, a website . . . it’s a website of all the AI applications we come across or hear about in our daily research. We post bite-size snippets about the interesting applications we are finding that we can’t feature on the podcast so that you can stay informed and see how AI is changing the world right now. There are so many interesting ways that AI is being used to change the way people are doing things. For example, did you know that there is an AI application for translating chicken chatter? Or using drones to detect and prevent shark attacks on coastal waters? To experience your holiday gift, go to datacrunchpodcast.com/ai.” Curtis: “If you’ve listened to our History of Data Science series, you know about the amazing advances in technology behind the leaps we’ve seen in data science over the past several years, and how AI and machine learning are changing the way people work and live. “But there is another trend that’s also been happening that isn’t talked about as much, and it’s playing an increasingly important role in the story of how data science is changing the world. “To introduce the topic, we talked with someone who is part of this trend, Nick Goodhartz.” Nick Goodhartz: “So I went to school at Baylor University, and I studied finance and entrepreneurship and a minor in music. I ended up taking a job with a start-up as a data analyst essentially. So it was an ad technology company that was a broker between websites and advertisers, and so I analyzed all the transactions between those and tried to find out what we are missing. “We were building out these reports in Excel, but there was a breaking point when we had this report that we all worked off of, but it got too big to even email to each other. It was this massive monolith of an Excel report, and we figured there’s got to be a better way, and someone else on our team had heard of Tableau, and so we got a trial of it. In 14 days we—actually less than 14 days—we were able to get our data into Tableau, take a look at some things we were curious about, and pinpointed a possible customer who had popped their head out and then disappeared. We approached them and signed a half million dollar deal, and that paid for Tableau a hundred times over, so it was one of those moments where you really realize, ‘man, there’s something to this.’ “That’s what got me into Tableau and what ...
Nov. 17, 2017
Episode Summary The growth of the Internet of Things, or IoT, is often compared with the industrial revolution. A completely new phase of existence. But what does it take to be part of this revolution by building an IoT product? It’s complex, and Daniel Elizalde gives us a peek into what the successful process looks like. For the full episode, listen by selecting the Play button above or by selecting this link, or you can also listen to the podcast through Apple Podcasts, Google Play, Stitcher, and Overcast. Donate 15 Seconds If you liked this episode, please consider giving us a review on iTunes! It helps other people find the show and lets us know how we’re doing. Partial Transcript (for the full episode, select play above or go here) Ginette: “So, today, we’re defining an IoT product, or an Internet of Things product, as ‘a product that has a combination of hardware and software. It acquires signals from the real world, sends that information to the cloud through the Internet, and it provides some value to your customers.’ “Okay, so before we introduce you to our guest, consider this: The IoT market is infernally hot. In 2016, we had 6.4 billion connected ‘things’ in use worldwide, and the research firm Gartner projects that number will nearly double to 11.2 billion in 2018, and then nearly double again to 20.4 billion IoT products in 2020. For context, this last number is about two and a half times the number of people on earth. “Let’s look at an example of IoT at work. Let’s say you’re an oyster farmer, and you need to keep your oysters under a certain temperature because harmful bacteria might grow if you don’t—which would result in people getting very sick after eating your product. If that happened, the FDA could shut your operation down. “This is where IoT products can help you. You can track water temperature with sensors. Those sensors can send that data to the cloud, where you can access it. The system will even send you an alert if the temperature moves outside your chosen range. You can use cameras that show when the oysters are harvested and how long the oysters are out of cold water before they’re put on ice. By using these sensors and cameras to record harvest date, time, location, and temperature at all stages of harvest, you have recorded evidence that you’ve properly handled the harvest. “So, for the purposes of today’s episode, let’s now switch to the other perspective—to the perspective of someone who wants to make and sell an IoT product. Imagine you and two of your friends recently launched an IoT startup—you’re able to secure funding to build your IoT product, and you’ve hired some team members to help you get your beta version off the ground. But you’re new to building products like this, and the rest of your team is pretty new to it as well. So you decide to talk with someone who is an expert in the IoT space who can give you and your team pointers—and you’re lucky enough to find this man.” Daniel: “My name is Daniel Elizalde. I am the founder of Tech Product Management. My company focuses on providing training for companies building IoT products; specifically, I focus on training product managers. I’ve been doing IoT really for over 18 years, before it was called IoT, and I worked in small companies and large companies,
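Ginette’s oyster-farm walkthrough boils down to a simple loop: read a sensor, ship the reading to the cloud, and raise an alert when the value leaves a safe range. Here is a minimal sketch of that pattern; the endpoint URL, the 10 degree threshold, and the read_water_temp_c stub are hypothetical placeholders rather than details of any product discussed in the episode.

```python
# Minimal sketch of the sensor -> cloud -> alert pattern from the oyster-farm example.
# The endpoint, threshold, and sensor stub below are hypothetical placeholders.
import json
import time
import urllib.request

CLOUD_ENDPOINT = "https://example.com/api/readings"  # hypothetical ingestion endpoint
MAX_SAFE_TEMP_C = 10.0                                # hypothetical safety threshold

def read_water_temp_c() -> float:
    """Stand-in for a real temperature sensor driver."""
    return 11.2  # fixed value for illustration; above the threshold so the alert fires

def send_to_cloud(reading: dict) -> None:
    """POST one reading as JSON to the (hypothetical) cloud endpoint."""
    req = urllib.request.Request(
        CLOUD_ENDPOINT,
        data=json.dumps(reading).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    try:
        urllib.request.urlopen(req, timeout=5)
    except OSError as exc:
        # The placeholder endpoint will not accept the POST; a real deployment would retry or queue.
        print(f"upload failed (placeholder endpoint): {exc}")

def monitor_once() -> None:
    temp = read_water_temp_c()
    send_to_cloud({"sensor": "tank-1", "temp_c": temp, "ts": time.time()})
    if temp > MAX_SAFE_TEMP_C:
        print(f"ALERT: water temperature {temp:.1f} C exceeds {MAX_SAFE_TEMP_C:.1f} C")

if __name__ == "__main__":
    monitor_once()
```

In a real deployment the reading would come from an actual sensor driver and the upload would go to whatever ingestion service the product uses; the shape of the loop stays the same.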
Oct. 18, 2017
Episode Summary We’ve seen photos of disasters depicting fearful and fleeing victims, ravaged properties, and despondent survivors. In this episode, we explore two ways data can help survivors heal and how data also tells their stories. For the full episode, listen by selecting the Play button above or by selecting this link, or you can also listen to the podcast through Apple Podcasts, Google Play, Stitcher, and Overcast. Donate 15 Seconds If you liked this episode, please consider giving us a review on iTunes! It helps other people find the show and lets us know how we’re doing! Partial Transcript (for the full episode, select play above or go here) Aaron Titus: “I almost disbelieved my own numbers, even though I chose the most conservative ones. It’s just outrageous. I’m like, ‘Really? A 233x ROI?’ That’s insane.” Ginette: “I’m Ginette.” Curtis: “And I’m Curtis.” Ginette: “And you are listening to Data Crunch.” Curtis: “A podcast about how data and prediction shape our world.” Ginette: “A Vault Analytics production.” “Today’s episode is brought to you by Lightpost Analytics. Data skills are in intense demand and are key for organizations to remain competitive; in fact, Forbes listed the industry’s leading data visualization software, Tableau, as the number three skill with the most explosive growth in demand, so investing in yourself to stay relevant in today’s hyper-competitive, data-rich, but insights-hungry world is extremely important. Lightpost Analytics is a trusted training partner to help you develop the Tableau skills you need to stay relevant. Check them out at lightpostanalytics.com and let them know that Data Crunch sent you.” “Today, we look at what it takes to understand a larger story—when many disparate voices come together to tell you something much more powerful, and specifically how it can help people deal with the large-scale devastation of natural disasters. Let’s jump into how one man did something about his pet peeve, and it produced $300,000,000 in savings. And then we’ll pop over to New Zealand to explore how a disaster situation affected Christchurch and what people did about it.” Aaron: “I was a disaster relief volunteer in New Jersey during hurricanes Irma (Ginette: Here Aaron actually means Irene) and Sandy, and my area got very hard hit by Irma, and I started off as a relief volunteer and ended up directing a lot of those relief efforts for my church, and while I was there, I remember standing in very long lines, and a thousand of us would gather together at a field command center and spend an hour and a half waiting to get checked in, which is lightning speed for 1,000 people, but it’s still an hour and a half. “And while everybody was waiting, they’d pull out their phones and would start playing Angry Birds, and the technologist in me would just scream inside, ‘I could have you all checked in with your work orders in 30 seconds, not an hour and a half!’ “And I abhor inefficiency—to a fault—like it’s almost a little bit of a sickness. I really ought to be better, but I really abhor inefficiency, and I hate it when people waste my time, and I hate wasting people’s time, especially volunteers.
Sept. 19, 2017
Hilary Mason is a huge name in the data science space, and she has an extensive understanding of what’s happening in this space. Today, she answers these questions for us:
* What are the backgrounds of your typical data scientists?
* What are key differences between software engineering and data science that most companies get wrong?
* How should you measure the effectiveness of your work or your team’s work as a data scientist for the best results?
* What is a good approach for creating a successful data product?
* How can we peek behind the curtain of black-box deep learning algorithms?
Below is a partial transcript. For the full interview, listen to the podcast episode by selecting the Play button above or by selecting this link, or you can also listen to the podcast through Apple Podcasts, Google Play, Stitcher, and Overcast. Curtis: Today we hear from one of the biggest thinkers in the data science space, someone who DJ Patil endorses on LinkedIn for data science skills. She worked at bit.ly, the URL shortener, and is a data scientist in residence at venture capital firm Accel Partners, a firm that helped fund some companies you may know, like Facebook, Slack, Etsy, Venmo, Vox Media, Lynda.com, Cloudera, Trifacta—and you get the picture. Ginette: A partner at this VC firm said that Accel wouldn’t have brought on just any data scientist. This position was specifically created because this particular data scientist might be able to join their team. Curtis: But beyond her position as data scientist in residence with Accel, she founded a company that’s doing very interesting research, and today, she shares with us some of her experiences and perspective on where AI is headed. Ginette: I’m Ginette. Curtis: And I’m Curtis. Ginette: And you are listening to Data Crunch. Curtis: A podcast about how data and prediction shape our world. Ginette: A Vault Analytics production. Hilary: I’m Hilary Mason, and I’m the founder and CEO of Fast Forward Labs (Please note that Hilary is now the VP of Research at Cloudera). In addition to that, I’m a data scientist in residence for Accel Partners. And I’ve been working in what we now call data science, or even now call AI, for about twenty years at this point. Started my career in academic machine learning and decided startups were more fun and have been doing that for about 10, 12 years depending on how you count now, and it’s a lot of fun! Ginette: Something I’d like to note here is there’s been a very recent change: Hilary’s company, Fast Forward Labs, and Cloudera recently joined forces, and Hilary’s new position is Vice President of Research at Cloudera. Now, one thing that Hilary talks to is where the data scientists she works with come from, which is a great example of the different paths people take to get into this field. Hilary: I am a computer scientist, and I have studied computer science. It’s funny because now at Fast Forward, our team has only two computer scientists on it, and one of them is our general counsel, and one is me, and I’m running the business, so most of the people doing data science here come from very different backgrounds. We have a bunch of physicists, mathematicians, a neuroscientist, a person who does brilliant machine learning design who was an English major, and so data science is one of those fields where one of the things I really love about it is that people come to it from so many different b...
Aug. 9, 2017
Tesla isn’t the only car brand in the world producing or aiming to produce self-driving cars. Every single car brand is working on developing self-driving cars. But what does this mean for our future? We talk about this and other interesting deep learning projects and history with Ran Levi, science and technology observer and podcaster, who explains in thought-provoking ways what we have to look forward to. Below is a partial transcript. For the full interview, listen to the podcast episode by selecting the Play button above or by selecting this link, or you can also listen to the podcast through Apple Podcasts, Google Play, Stitcher, and Overcast. Ran Levi: “I actually had the pleasure of being invited to Google’s Mountain View headquarters, and they took me for a drive in one of their autonomous vehicles, and it was, to tell you about that drive because it was boring—boring in a good way. Nothing happened! We were just driving around. The car was driving itself all around Mountain View. And it worked. “The first time I entered such a car, I didn’t know what to expect. I mean, I didn’t know how reliable are those kinds of cars. So I had the idea that maybe I should sit somewhere where I can maybe jump and grab the wheel if necessary. You know, I was a bit dumb. They don’t need me, really. And probably if I touch the steering wheel, I would probably make some mistake and ruin the car. It drives better without me.” Ginette: “I’m Ginette.” Curtis: “And I’m Curtis.” Ginette: “And you are listening to Data Crunch.” Curtis: “A podcast about how data and prediction shape our world.” Ginette: “A Vault Analytics production.” Ginette: “We have a great live show planned that we hope to give at SXSW 2018. It’s a really awesome show about the power of niche artificial intelligence, and we’re going to share details from our research into what amazing things AI is doing right now on the fringe and in mainstream AI projects. We’re really excited to share it, so if you’re going to SXSW, or you just want to be good hearted and help us out, please vote on our dual panel by going to panelpicker.sxsw.com, signing in, and liking our topic, which you can find by searching for ‘The Power of Niche AI: From Cucumbers to Cancer.’ “Today we get to talk to Ran Levi, who’s been researching and reporting on science and technology for the past 10 years. He’s a hugely successful science and tech podcaster in Israel, producing a Hebrew-language show called Making History, and he’s also producing two English podcasts right now for an international audience, so since he’s steeped in the subject, he has a lot of very interesting insights for us.” Ran: “I’m actually an electronics engineer by trade. I was an engineer for 15 years. I was both a hardware and software developer for several companies in Israel. And during my day job as an engineer, I wrote some books about the history of science and technology, which was always a big hobby of mine. And actually, I started a podcast about this very subject about 10 years ago, and it became quite a hit in Israel I’m happy to say. So about four years ago, I quit my day job, and I actually started my own podcasting company, and now we are podcasting both in Israel and in the U.S. for international audience and actually launched my brand new podcast last week. It’s called Malicious Life about the history of malware and cybersecurity, which is a fun topic. Actually, the day I launched the podcast,
July 16, 2017
When Julia Silge’s personal interests meet her professional proficiencies, she discovers new meaning in Jane Austen’s literature, and she gauges the cultural influence of locations in pop songs. Even more impressive than these finds, though, is that she and her collaborator, Dave Robinson, have developed some new, efficient ways to mine text data. Check out the book they’ve written, Text Mining with R: A Tidy Approach. Below is a partial transcript. For the full interview, listen to the podcast episode by selecting the Play button above or by selecting this link, or you can also listen to the podcast through Apple Podcasts, Google Play, Stitcher, and Overcast. Transcript Julia Silge: “One that I worked on that was really fun was about song lyrics. The last 50 years or so of pop songs, we have all these lyrics, so all this text data, and I wanted to ask the question, what places are mentioned more or less often in these pop songs?” Ginette: “I’m Ginette.” Curtis: “And I’m Curtis.” Ginette: “And you are listening to Data Crunch.” Curtis: “A podcast about how data and prediction shape our world.” Ginette: “A Vault Analytics production.” Curtis: “Brought to you by data.world, the social network for data people. Discover and share cool data, connect with interesting people, and work together to solve problems faster at data.world. Whether you’re already a frequent dataset contributor or totally new to data.world, there are several resources you can use to stay in the loop on the latest features, learn new skills, and get support. Check out docs.data.world for up-to-date API documentation, tutorials on SQL, and other query techniques, and much more!” Ginette: “We hope you’re enjoying some vacation time this summer. We just did, and now Data Crunch is back! To hear the latest from us, add us on Twitter, @datacrunchpod. Today we hear from an exciting guest—someone who is on the cutting edge of data science tool creation, someone exploring and developing new ways to slice and dice difficult data.” Julia: “My name is Julia Silge, and I’m a data scientist at Stack Overflow. My academic background is in physics and astronomy, but I’ve worked in academia, teaching and doing research, I worked at an ed tech startup, and I’ve made a transition now into data science.” Ginette: “Stack Overflow, where Julia works, is the largest online community for programmers to learn, share knowledge, and build their careers. It’s a great resource when you need to solve a coding problem or develop new skills.” Curtis: “Now there are basically two main camps in data science: people who program with R, a statistical programming language, and people who program with Python, a high-level, general purpose language. Both languages have devoted followers, and both do excellent work. Today, we’re looking at R, and Julia is a big name in this space, as is her collaborator Dave Robinson.” Julia: “Text is increasingly a really important part of our work as people who are involved in data. Text is being generated all the time, at ever faster rates. This unstructured data is becoming a really important part of things that we do. I also am somebody that—my academic background is not in text or literature or natural language processing or anything like that, but I am somebody who’s always been a reader and always been interested in language, and these sort of collection of circumstances kind of all came together to converge that me and Dave decided to ...
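Julia and Dave do this kind of analysis in R with the tidy text mining tools they describe in their book. Purely to illustrate the lyrics question Julia mentions above, here is a small Python sketch that counts how often a list of place names appears in a made-up two-song corpus; the lyrics, the place list, and the resulting counts are invented for the example and are not from her analysis.

```python
# Illustrative sketch of counting place mentions in song lyrics.
# Julia Silge's actual analysis uses R and the tidytext package; this Python version
# with made-up lyrics and places only shows the counting idea.
import re
from collections import Counter

lyrics = {
    "Song A": "Goodbye New York, hello California, California dreaming again",
    "Song B": "Down in Memphis the band plays all night, Memphis keeps me warm",
}

places = ["new york", "california", "memphis", "tokyo"]

counts = Counter()
for title, text in lyrics.items():
    lowered = text.lower()
    for place in places:
        # count non-overlapping whole-phrase matches of the place name
        counts[place] += len(re.findall(r"\b" + re.escape(place) + r"\b", lowered))

for place, n in counts.most_common():
    print(f"{place}: {n}")
```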
June 11, 2017
According to the CDC, people have been writing descriptions of malaria—or a disease strikingly similar to it—for over 4,000 years. How is data helping Zambian officials eradicate these parasites? Tableau Foundation’s Neal Myrick opens the story to us. Below is a partial transcript. For the full interview, listen to the podcast episode by selecting the Play button above or by selecting this link, or you can also listen to the podcast through Apple Podcasts, Google Play, Stitcher, and Overcast. Neal: “When somebody walks from their village to their clinic because they’re sick, health officials can see that person now as the canary in a coal mine.” Ginette: “I’m Ginette.” Curtis: “And I’m Curtis.” Ginette: “And you are listening to Data Crunch.” Curtis: “A podcast about how data and prediction shape our world.” Ginette: “A Vault Analytics production.” Curtis: “This episode is brought to you by data.world, the social network for data people. Discover and share cool data, connect with interesting people, and work together to solve problems faster at data.world. Looking for a lightweight way to deliver a collection of tables in a machine-readable format? Now you can easily convert any tabular dataset into a Tabular Data Package on data.world. Just upload the file to your dataset, select ‘Tabular Data Package’ from the ‘Download’ drop-down, and now your data can be effortlessly loaded into analytics environments. Get full details at meta.data.world.” Ginette: “Today we’re talking about something that can hijack different cells in your body for what we’ve deemed nefarious purposes. It enters your bloodstream when a mosquito transfers it from someone else who has it, to you. Once it’s in your body, it makes a beeline for your liver, and when safely inside your liver, it starts creating more of itself. “Sometimes, this parasite stays dormant for a long time, but usually it only takes a few days for it to get to work. It starts replicating, and there are suddenly thousands of new babies that burst into your bloodstream from your liver. When this happens, you might get a fever because of this parasite surge. As these new baby parasites invade your bloodstream, they hunt down and hijack red blood cells. They use these blood cells to make more of themselves, and once they’ve used the red blood cells, they leave them for dead and spread out to find more. Every time a wave of new parasites leaves the cells, it spikes the number of parasites in your blood, which may cause you to have waves of fever since it happens every few days. “This parasite can cause very dangerous side effects, even death. It can cause liver, spleen, or kidney failure, and it can also cause brain damage and a coma. To avoid detection, the parasites cause a sticky surface to develop on the red blood cell so the cell gets stuck in one spot so that it doesn’t head to the spleen where it’d probably get cleaned out. When the cells stick like this, they can clog small blood vessels, which are important passageways in your body. You may have guessed it: we’re describing malaria. “It plagues little children, pregnant women, and other vulnerable people. Children in particular are incredibly vulnerable, something that’s reflected in the statistics: one child dies every two minutes from malaria. “But often outbreaks are treatable, trackable, and preventable when the data is properly captured and analyzed. The United States eradicated malaria in the 1950s.
