Harvard Kennedy School: Zero Botnets (Internet Robots): An Observe-Pursue-Counter Approach (English)

July 1, 2021

THE CYBER PROJECT

Zero Botnets
An Observe-Pursue-Counter Approach

Jeremy Kepner, Jonathan Bernays, Stephen Buckley, Kenjiro Cho, Cary Conrad, Leslie Daigle, Keeley Erhardt, Vijay Gadepally, Barry Greene, Michael Jones, Robert Knake, Bruce Maggs, Peter Michaleas, Chad Meiners, Andrew Morris, Alex Pentland, Sandeep Pisharody, Sarah Powazek, Andrew Prout, Philip Reiner, Koichi Suzuki, Kenji Takahashi, Tony Tauber, Leah Walker, Douglas Stetson

REPORT
MAY 2021

The Cyber Project
Belfer Center for Science and International Affairs
Harvard Kennedy School
79 JFK Street
Cambridge, MA 02138
www.belfercenter.org/Cyber

Statements and views expressed in this report are solely those of the authors and do not imply endorsement by Harvard University, Harvard Kennedy School, the Belfer Center for Science and International Affairs, or any of the sponsors or sponsoring institutions.

Design and layout by Andrew Facini

Copyright 2021, President and Fellows of Harvard College
Printed in the United States of America

Author Affiliations

Jeremy Kepner (1), Jonathan Bernays (1), Stephen Buckley (1), Kenjiro Cho (2), Cary Conrad (3), Leslie Daigle (4), Keeley Erhardt (1), Vijay Gadepally (1), Barry Greene (5), Michael Jones (1), Robert Knake (6), Bruce Maggs (7), Peter Michaleas (1), Chad Meiners (1), Andrew Morris (8), Alex Pentland (1), Sandeep Pisharody (1), Sarah Powazek (9), Andrew Prout (1), Philip Reiner (9), Koichi Suzuki (10), Kenji Takahashi (11), Tony Tauber (12), Leah Walker (9), Douglas Stetson (1)

(1) Massachusetts Institute of Technology
(2) Internet Initiative Japan
(3) SilverSky
(4) Global Cyber Alliance
(5) Akamai
(6) Harvard University
(7) Duke University
(8) GreyNoise
(9) The Institute for Security and Technology
(10) EDB
(11) NTT
(12) Comcast

Acknowledgements

The authors wish to acknowledge the following individuals for their contributions and support: Bob Bond, Alan Edelman, Jeff Gottschalk, Chris Hill, Charles Leiserson, Mimi McClure, Damian Menscher, Steve Rejto, Daniela Rus, Allan Vanterpool, Marc Zissman, and the MIT SuperCloud team: Bill Arcand, Bill Bergeron, David Bestor, Chansup Byun, Michael Houle, Matthew Hubbell, Anna Klein, Lauren Milechin, Julie Mullen, Antonio Rosa, Albert Reuther, Charles Yee.

Disclaimer

Any opinions, findings, conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of our sponsors or institutions.

Table of Contents

Summary
Introduction
Policy Considerations
Botnet Takedown
Technological Considerations
Architectural Vision
Observatories and Outposts
Conclusions
Next Steps
References
Appendix A: Better Data for a Better Internet: A Call to Facilitate and Encourage Data Sharing
Appendix B: Communicating Data Release Requests

Summary

Adversarial Internet robots (botnets) represent a growing threat to the safe use and stability of the Internet. Botnets can play a role in launching adversary reconnaissance (scanning and phishing), influence operations (upvoting), and financing operations (ransomware, market manipulation, denial of service, spamming, and ad click fraud) while obfuscating tailored tactical operations. Reducing the presence of botnets on the Internet, with the aspirational target of zero, is a powerful vision for galvanizing policy action. Setting a global goal, encouraging international cooperation, creating incentives for improving networks, and supporting entities for botnet takedowns are among several policies that could advance this goal. These policies raise significant questions regarding proper authorities/access that cannot be answered in the abstract. Systems analysis has been widely used in other domains to achieve sufficient detail to enable these questions to be dealt with in concrete terms. Defeating botnets using an observe-pursue-counter architecture is analyzed, the technical feasibility is affirmed, and the authorities/access questions are significantly narrowed. Recommended next steps include: supporting the international botnet takedown community, expanding network observatories, enhancing the underlying network science at scale, conducting detailed systems analysis, and developing appropriate policy frameworks.

[Figure: Strategic Approach: Layered Cyber Deterrence, showing the current state, approach, pillars (Shape Behavior, Deny Benefits, Impose Costs), and desired end states [70].]

Introduction

Botnets are a growing scourge on the Internet. Botnets can play a role in launching adversary reconnaissance (scanning and phishing), influence operations (upvoting), and financing operations (ransomware, market manipulation, denial of service, spamming, and ad click fraud) while obfuscating tailored tactical operations [2][35]. The Internet, though revolutionary, remains insecure, and increasing the number of devices on the Internet increases the potential attack surface. This discrepancy has allowed adversarial activity to flourish.
Passive devices represent a third of all infected mobile devices in 2020, a 100% increase from 2019 [59], contributing to a situation where adversarial botnets account for a quarter of Internet traffic at some websites (Figure 1)[38].

[Figure 1 labels: Web Traffic Sample, Good Botnets, Bad Botnets]

Figure 1: Bad Botnets. Advanced Persistent Botnets (APBs) continue to plague some websites. APBs cycle through random IP addresses, enter through anonymous proxies, change their identities, and mimic human behavior (adapted from [38]).

In an industry where the existence of botnets is often thought of as irreversible, the Council on Foreign Relations (CFR) published “Zero Botnets: Building a Global Effort to Clean Up the Internet” advocating an aspirational zero-tolerance policy for botnets [35]. Describing cyberspace as “the old American Wild West, with no real sheriff and with botnets as the outlaws with guns” the report focused on changing the distribution of cybersecurity responsibility. Most critical to this redistribution of responsibility was the stated need to establish “the principle that states are responsible for the harm that botnets based within their borders cause to others”. On the service provider side, the authors urged that “Internet service providers should hold each other accountable for the bad traffic leaving their networks”.

Admirable work is ongoing in the non-profit and private sector, with various actors keeping the lights on in the Internet through collecting data containing threat information, taking down botnets, and operating botnet sinkholes [13][14][31][33][51][68]. However, non-governmental actors cannot clear the Internet of botnets on their own. For that, international government coordination is needed. Additionally, there are challenges when relying on private companies to police freedom of speech and freedom of access on the Internet.
Simply calling for governments to play a broader role in botnet takedowns is not sufficient. At the moment, governments do not prioritize taking down or disrupting botnets that are not engaged in significant fraud or wiretapping.

Governments have treated cyberspace as a warfighting domain for years, but are just starting to address it in the whole-of-domain manner found in land, sea, undersea, air, and space. Governments do not leave sea defense to private fisheries, nor should they leave cyberspace defense to private technology companies. There is an abundance of resources that governments can bring to the fight should they have the right tools and the proper permissions to do so. This paper seeks to present one possible approach to improve the public-private discussion for better network defense. Too often policy discussions are framed by abstract ideas of technical abilities. By providing a specific, feasible, notional architecture, this paper provides the policy debate with the tools to significantly narrow policy questions. Botnets, because of certain distinctive features, provide an opportunity to consider how threats are observed and tracked in the Internet in the context of a specific threat.

The standard approach to protecting existing domains relies on three supporting elements: security, defense, and deterrence (see Figure 2). This paper proposes technical means to address the lack of cyber defense by both private and public actors in regard to botnets. The solution derived, after applying the well-tested method of systems analysis (a method used in deriving defense systems for the other domains as well), is a network-based defense, with the precise objective of taking down botnets through an observe-pursue-counter approach.
Such a proposal is not novel, having been used in other domains, but applying this approach to botnets requires addressing the particular problem of how to observe cyberspace. Observation in cyberspace is technologically challenging and resource intensive but is critical to understanding where to block and where to shut adversaries down. This approach is one that can be expanded to government and non-government groups, bringing a legitimate technical framework to the policy and technical challenges of ridding the Internet of botnets.

Policy Considerations

The observe-pursue-counter approach calls for the collection and aggregation of sufficient network traffic metadata to identify malicious activity. Such an approach could identify botnets in their early stages of formation when disruption could potentially be easier. It can also be used to inform campaigns to take down larger, more dangerous botnets. The observe-pursue-counter approach, however, could raise concerns over privacy, warrantless surveillance by government, and other civil liberties. Our analysis is technical and does not address these concerns directly. Our analysis informs the feasibility and cost-benefit trade-offs, and narrows the policy scope, of building such an architecture for botnet detection at scale. Our analysis does not assume that governments play a central role in its operation. Business models could be developed that would incentivize an entirely private sector approach. Alternatively, a non-profit organization could be charged with managing the effort. The approach also does not assume fully monitoring all botnet traffic. A sufficient view could likely be obtained by a series of observatories acting cooperatively and with the consent of their users.

Botnet Takedown

Botnets come in many forms.
A botnet (short for “robot network”) is a network of computers infected by malware that are under the control of an entity, known as the “bot-herder.” A distinguishing feature of botnets is their scale: a botnet may comprise millions of nodes. Botnets, because of their scale, can amplify other malicious attacks. The centralized control of millions of infected nodes allows botnets to be used in distinctive ways, including what is called a Distributed Denial of Service attack, where all the machines in a botnet flood a host or a region of the network with the goal of disrupting service.

The machines that are part of a botnet are sometimes called “bots”. The term “bot” can be ambiguous and used to describe any program that operates without direct supervision of a human being. “Chatbots” try to have conversations with humans, using AI technology. Some bots of this sort may do manipulative things, such as the “social bots” that join applications such as Facebook and Twitter, with the goal of influencing, upvoting, and the like. Our focus is botnets, not bots.

At their essence botnets are distributed computing infrastructure. Adversarial botnets are identified by their behavior, which may include sending undesired communications and exploiting unconsenting computing systems. The architecture of adversarial botnets often includes the following components:

Botherder (botnet shepherder): the entity controlling the botnet.

BotCC (botnet command & control): systems that receive direction from the botherder and coordinate the larger botnet.

Botnet Clients: usually unconsenting computing systems that have been compromised with botnet malware so as to receive instructions from the botCC to achieve the objective of the botherder and/or spread to other systems.

Victims: computing systems receiving undesired communications from the botnet clients.

Botnets create distinctive traffic patterns that can be detected in the Internet.
Botnets require complex command and control mechanisms. As the machines that make up the botnet are subverted and turned into clients, those machines must report in and then must stand by for instructions. Many attacks, independent of their form, have distinctive traffic patterns.

Large botnets are capable of significant damage. Botnet client-level mitigations are manifold and include continuous software updates, use of two-factor authentication, and changing factory-set device passwords. Taking down the botnet is often the ultimate objective of cyber defenders and would play a significant role on the way to achieving the aspirational goal of zero botnets. The “botnet takedown” community consists of many entities working together to disable botnets. There are many botnet takedown scenarios, but this months-to-years long process often includes the following steps [16][50]:

Victim Identification: analysis of a variety of network artifacts reveals the systems the botnet is victimizing.

Client Identification: analysis of a variety of network artifacts reveals botnet clients and leads to the botnet malware running on the clients.

Malware Analysis: exploration of the operation of the malware in combination with network traffic artifacts reveals how the botnet spreads and communicates with the botCC.

Client Patching: fixing client software and adding appropriate signatures to network security systems remediates identified botnet clients.

BotCC Sinkholing: seizure of botCC domain names prevents the botherder from controlling botnet clients.

Botnet takedowns play a key role in defending the Internet and involve many parallel components to the takedown stage, including the use of the courts. These may be slow, but they seem to have been effective in many cases. Increasing the pace and reducing the time of botnet takedowns is an important step toward achieving the aspirational zero botnet goal.
A high-level assessment of botnets and botnet takedowns suggests a number of acceleration opportunities. Most notably, existing network observatories and outposts demonstrate that the communications among botherders, botCCs, botnet clients, and victims are readily observable from the appropriate vantage points. Early detection of botnet communications allows mitigations to be pursued when botnets are smaller and before they have inflicted significant damage. Accordingly, the zero botnets aspirational goal naturally lends itself to an observe-pursue-counter approach that is a hallmark of effective defense used in a wide range of mature domains (land, sea, undersea, air, and space)[22].

[Figure 2 graphic: source-destination traffic matrix spanning protected cyberspace and adversary cyberspace, with regions labeled Security, Defense, and Deterrence. Other labels: MITRE ATT&CK Matrix (Initial Access … C&C, Exfil, Impact), Cybersecurity Response, Cyber Incident activating a Cyber Unit who follow Cyber Procedures, Damage Assessment, Damage Control, Repair.]

Figure 2: Security, Defense, Deterrence. Source-destination traffic matrix view of cyberspace security, defense, and deterrence using standard domain terminology [27]. Cybersecurity is well characterized by the ATT&CK (Adversarial Tactics, Techniques, & Common Knowledge) paradigm [53].

A key step in the system analysis process is the appropriate representation of the domain in such a way that is mutually useful to both decision makers and practitioners. The traffic matrix view, where rows represent sources of network traffic and columns represent network traffic destinations is one such generally accepted approach [39]. Furthermore, matrix
Furthermore, matrixBelfer Center for Science and International Affairs | Harvard Kennedy School7mathematics (linear algebra) is unaffected by row and column reorderings that come about from anonymization, which means that there are algorithms on traffic matrices that can work on anonymized data [42].Leveraging the lessons learned from other domains requires contextualizing cyber in broad terms of defense systems analysis. Specifically, the generally accepted definitions of security, defense, and deterrence (see Figure 2). Security covers actions taken within protected cyberspace to prevent unauthorized access, exploitation, or damage. Defense refers to actions taken to defeat threats that are threatening to breach cyberspace security. Deterrence is the existence of a credible threat of unacceptable counteraction. Within the cyber domain, security is the most mature, and is effectively described by the MITRE ATT&CK (adversarial tactics, techniques, & common knowledge) matrix and corresponding actions therein [53]. Deterrence has also received significant resources and is rapidly evolving. Cyber defense of the type that has been most effective in other domains has received disproportionately underinvestment so that modest investments in cyber defense are likely to yield disproportionately positive returns.8Zero Botnets: An Observe-Pursue-Counter ApproachTechnological ConsiderationsNetwork based defense has long been recognized as offering many benefits [1][37]. Network-based systems observe the traffic generated between multiple hosts. They are placed strategically at ingress/egress points to capture the most relevant or risky traffic. Suspicious behaviors that may be flagged include failed connection attempts, failed domain name server (DNS) requests, web connections to blacklisted sites, and the use of randomized domain names. 
Network-wide monitoring provides an overview of all activity in the observable space to aid in detecting behavioral patterns, fluctuations, and group actions. A primary disadvantage of network observations is the sheer volume of data generated daily, requiring greater resources to collect and process. Historically, the volumes of data required have been perceived as so insurmountable that network-based defense has been discounted, particularly in the context of providing the view of the Internet necessary for an observe-pursue-counter approach to botnet takedowns. Fortunately, the advent of more performant AI (artificial intelligence) algorithms, software, hardware, and cloud computing capable of operating on anonymized data has made network-based defense systems with sufficient capability routine in the private sector [63]. While the methods used in the private sector are generally proprietary [69], more recently, open approaches demonstrating many of the required capabilities on anonymized data have been published by the academic research community. Some of these innovations are described as follows.

[Figure 3 graphic: one panel plots packets per megabyte by format (Raw Packets, Packet Headers, Source & Destination IPs, GraphBLAS Traffic Matrix); the other plots updates per second against number of servers for Hierarchical GraphBLAS (1), Hierarchical D4M (2), Accumulo D4M (3), Accumulo (4), SciDB D4M (5), Oracle (TPC-C), and CrateDB. Annotations: “10,000x faster”, “1 bit/packet”, “10x American Internet traffic*”.]

Figure 3: Feeds and Speeds for Anonymized Traffic Matrices. The GraphBLAS.org open standard provides a sparse traffic matrix library demonstrating compression (left) and performance (right) on anonymized traffic matrices consistent with existing proprietary capabilities needed to observe larger-scale networks. *Non-video traffic [18][19]. Footnoted systems: (1) [46], (2) [44], (3) [43], (4) [67], (5) [65].

The observe-pursue-counter approach is the foundation of many domain defense systems.
The observe component is often the most technologically challenging and resource intensive. Thus, systems analysis usually begins with an assessment of the technological fundamentals necessary for effective observation of the domain. These “feeds and speeds” are typically the volumes of data to be stored and the rates at which they can be processed. The published open state-of-the-art for anonymized network traffic matrices using the GraphBLAS.org standard is shown in Figure 3 and affirms the basic feasibility and claims made by the private sector. A thousand-server system with a commodity interconnect can process 10x the non-video traffic of the North American Internet. While such a system may seem large, it is 1% of a typical hyperscale datacenter, of which there are hundreds worldwide.

A common aspect of many recent AI innovations is their reliance on signatures of the phenomena they are tasked with identifying (referred to as supervised machine learning). AI algorithms often require copious amounts of clean training data with clearly marked examples. Cyber security has generally adopted a signature-based approach to detection that has become overwhelmed by the exponential diversity that is readily achievable in modern malware. Even once the signature of new malware is identified, an update must be deployed to pre-existing AI systems to include the new signature, a process which can be slow on many systems and fail to keep pace with the malware’s evolution. Other domains face similar challenges. An air defense system that relied on detailed signatures of every aircraft in the world would be impractical. As a result, defense systems in other domains instead use AI to model the background, for which there are copious amounts of training data. Using highly accurate background models, anomalies can be readily detected and enriched with additional sensor modalities to allow precise classification.
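To make the signature-less idea concrete, the following is a minimal sketch, assuming the background can be summarized by a center and a spread estimated from traffic presumed to be normal. It is not the modeling approach of the cited work: the median/MAD estimator, the threshold of 5, the function names, and the packet counts are all illustrative assumptions.

```python
import statistics

# 1.4826 rescales the median absolute deviation (MAD) so that it estimates the
# standard deviation when the background is Gaussian.
MAD_TO_SIGMA = 1.4826


def fit_background(counts):
    """Estimate the background's center and spread from presumed-normal traffic."""
    center = statistics.median(counts)
    mad = statistics.median([abs(c - center) for c in counts])
    # Assumption: fall back to a spread of 1.0 if the background is constant.
    spread = MAD_TO_SIGMA * mad if mad > 0 else 1.0
    return center, spread


def find_anomalies(background, observations, threshold=5.0):
    """Return (index, value, score) for observations far from the background."""
    center, spread = fit_background(background)
    flagged = []
    for index, value in enumerate(observations):
        score = abs(value - center) / spread
        if score > threshold:
            flagged.append((index, value, score))
    return flagged


# Hypothetical packets-per-window counts for one anonymized source.
background = [100, 102, 98, 101, 99, 103, 97, 100, 101, 99]
observations = [101, 96, 180, 100]
for index, value, score in find_anomalies(background, observations):
    print(f"window {index}: {value} packets, score {score:.1f}")
```

No malware signature appears anywhere in the sketch: only the background is modeled, and anything that departs far enough from it is surfaced for enrichment and classification by other means.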
Similar approaches are used for cyber defense in the private sector. For reviews of the significant broader literature in this space see [4][6][12][24][56]. Selected examples from the open literature drawn from the authors’ work on anonymized traffic data are presented in Figures 3, 4, 5, 6, and 7.

Figure 4: AI Background Modelling on Anonymized Traffic. Treating anonymized network traffic matrices as a stream of sparse images allows standard AI methods to extract accurate features for inferring precise background model parameters (α,δ) [45]. These models can then be used for anomaly detection.

For many decades signal processing has been the basis of the detection theory that underpins the observe component of most effective domain defense systems [32][41][61]. These signature-less approaches compute accurate models of the background signals in the data. Comparing observations with these background models is an effective way to detect subtle anomalies. Figure 4 is an example of the modern AI equivalent of this approach applied to anonymized network traffic matrices [45]. Treating the network traffic matrices as a sequence of sparse images enables standard convolutional neural network methods to be used to train accurate models of background network traffic [76][77].

[Figure 5 graphic: log-log plot of differential cumulative probability against bin degree (d_i = 2^i), comparing measured values with the model; fitted parameters α = 2.01, δ = -0.833.]

Figure 5: Example Observed Background from Anonymized Traffic. Many network quantities (packets, sources, destinations, links, …) exhibit power-law behavior. With sufficient data and AI processing, accurate network traffic measurements can be made that allow precise background model fitting, in this case p(d) ∝ 1/(d + δ)^α, that can be used for signature-less anomaly detection on anonymized traffic data [45].
Similar background models are widely used in proprietary network defense solutions.

The Gaussian or normal distribution specified by a mean and variance is a standard background model used in many domains. One of the most significant early discoveries in the field of Network Science is that the probability distributions of many network quantities (packets, sources, destinations, links, …) exhibit a power law [10][21]. With sufficient traffic, these distributions can be measured accurately enough to train high precision models of the background (see Figure 5). These power-law distributions and their parameters can be computed entirely from anonymized traffic matrices [45]. Similar background models are widely used in proprietary network defense solutions.
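The claim that power-law parameters can be computed entirely from anonymized traffic can be sketched in a few lines. The example below is an illustration under stated assumptions, not the fitting procedure of [45]: the traffic matrix is stored as a Python dictionary, anonymization is modeled as relabeling addresses with a keyed hash, and the model p(d) ∝ 1/(d + δ)^α is fitted by a grid-search maximum likelihood over unbinned degrees, truncated at the largest observed degree. All function names, the key, and the data are hypothetical.

```python
import hashlib
import math
from collections import Counter


def anonymize(traffic, key):
    """Relabel every address with a keyed hash; packet counts are untouched."""
    def relabel(address):
        return hashlib.blake2b(address.encode(), key=key, digest_size=8).hexdigest()
    return {(relabel(src), relabel(dst)): n for (src, dst), n in traffic.items()}


def source_packet_histogram(traffic):
    """Map d -> number of sources that sent exactly d packets."""
    packets = Counter()
    for (src, _dst), n in traffic.items():
        packets[src] += n
    return Counter(packets.values())


def fit_power_law(hist, alphas, deltas):
    """Grid-search maximum likelihood for p(d) ∝ 1/(d + delta)**alpha.

    Assumes degrees 1..max(hist) and every delta > -1 so that d + delta > 0.
    """
    d_max = max(hist)
    total = sum(hist.values())
    best = None
    for delta in deltas:
        logs = [math.log(d + delta) for d in range(1, d_max + 1)]
        weighted = sum(n * logs[d - 1] for d, n in hist.items())
        for alpha in alphas:
            log_z = math.log(sum(math.exp(-alpha * x) for x in logs))
            log_likelihood = -alpha * weighted - total * log_z
            if best is None or log_likelihood > best[0]:
                best = (log_likelihood, alpha, delta)
    return best[1], best[2]


# Tiny hypothetical traffic matrix: (source, destination) -> packets.
traffic = {
    ("10.0.0.1", "192.0.2.7"): 3,
    ("10.0.0.1", "192.0.2.9"): 1,
    ("10.0.0.2", "192.0.2.7"): 1,
    ("10.0.0.3", "192.0.2.8"): 1,
}
anonymous = anonymize(traffic, key=b"illustrative-secret")
# Relabeling permutes rows and columns, so the histogram is unchanged.
print(source_packet_histogram(traffic) == source_packet_histogram(anonymous))

# Synthetic histogram that follows the model exactly, to exercise the fit.
true_alpha, true_delta, d_max = 2.0, 0.5, 200
weights = {d: (d + true_delta) ** -true_alpha for d in range(1, d_max + 1)}
norm = sum(weights.values())
synthetic = {d: 1e6 * w / norm for d, w in weights.items()}
alphas = [1.0 + 0.1 * i for i in range(21)]      # 1.0 to 3.0
deltas = [-0.5 + 0.25 * i for i in range(9)]     # -0.5 to 1.5
print(fit_power_law(synthetic, alphas, deltas))
```

Because the histogram depends only on how many packets each source sent, and not on what the source is called, the fit never needs the original addresses. A fitted (α, δ) pair could then serve as the background against which new traffic windows are compared.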