BLOG.JETBRAINS.COM
Hidden Truths About Developer Experience: Three Key Insights From Our Research

Developer experience (DevEx) and developer productivity (DP) are hot topics. Many companies are already working actively to measure and improve them, while others are at least aware of them. However, what we're interested in is what's really happening inside companies when it comes to DevEx and DP.

To get a clearer picture, we incorporated a range of questions on DevEx and DP into our annual Developer Ecosystem Survey (about 20,000 developers all over the world have submitted responses to this survey). The results are rather revealing. While there's growing awareness of these topics, especially within larger companies, our research uncovered some gaps and blind spots. We believe that understanding what these are and how to overcome them will prove useful for our users and customers.

Here are three key insights from our research:
- How developers really feel about being measured. Most developers are generally okay with their productivity assessments as long as they are done transparently and fairly.
- Developer satisfaction with tools: desired but not measured. 55% of developers either don't have their satisfaction with tools measured, or don't know if it's being measured.
- Team leads are carrying the burden of DevEx and DP, but should they? 51% of developers and 67% of tech leads agree that team leads are the primary drivers of DevEx and DP measurement.

How developers really feel about being measured
It's easy to assume that developers might feel uncomfortable about the fact that their productivity and developer experience are being measured. After all, who enjoys being evaluated? But here's the good news: most developers are generally okay with these assessments as long as they are done transparently and fairly.

Our research shows that 42% of developers feel comfortable with productivity assessments, while 40% report neutral feelings on the subject. That means only a small percentage actively dislike being assessed. And even then, the real problem isn't measurement itself, it's how it's done.

The most common frustration? A lack of transparency. Developers want to know how their work is being evaluated, why, and what decisions will be made based on it. Yet our data suggests that nearly half of developers (46%) don't fully understand how their productivity data is used in decision-making. Without this clarity, these efforts risk being perceived as arbitrary or unfair.

Other issues include a lack of constructive feedback following assessments and the use of methods and metrics that are sub-optimal from the developers' perspective. Developers want actionable insights based on the results of evaluations so they can grow and improve. Yet many companies still rely solely on activity-based measurements, like counting commits or code changes, that provide a limited view of productivity and offer little value for meaningful feedback.

Developer satisfaction with tools: desired but not measured
What's one of the biggest contributors to a great developer experience? Good tools. When tools are smooth, functional, and reliable, developer friction is minimized. When tools are slow, hard to use, and unstable, friction (long feedback loops, high cognitive load, inability to get into the flow state) flourishes.

Yet our research shows that 55% of developers either don't have their satisfaction with tools measured, or don't know if it's being measured. This is upsetting.
If companies aren't paying attention to satisfaction with tools, how can they improve developer experience?

Why is this a problem? Developers spend hours every day working with their tools, so small inefficiencies add up quickly, and frustration with tools potentially leads to lower productivity and even attrition. If a developer struggles with inefficient workflows for too long, feeling unproductive and not cared for, they may eventually leave for a company that prioritizes developer experience and cares about its developers.

Team leads are carrying the burden of DevEx and DP, but should they?
When it comes to developer productivity and developer experience, who is actually responsible? In most companies, regardless of size, team leads take (or are expected to take) ownership of these efforts. Our data shows that 51% of developers and 67% of tech leads agree that team leads are the primary drivers of DevEx and DP measurement.

This looks like a logical choice: team leads work closely with developers and understand their challenges. But there's a question: are team leads actually ready to take on this responsibility? How well-equipped and trained are they for this task, and do they have real authority to influence company-wide decisions regarding tools, DP, and DevEx? Or has this responsibility been pushed onto them without sufficient support?

In large companies, dedicated specialists (30%) and platform engineering teams (28%) are becoming important players in measuring and improving DevEx and DP, according to tech leads. In smaller companies, these roles are less common, with only 16% and 17% of tech leads, respectively, naming them as responsible.

Final thoughts
Our research shows that while developer experience and productivity are a focus for many companies, some still face challenges in measuring, understanding, and improving them.
- Transparency and clarity in DevEx and DP assessment matter. Developers aren't against being measured, but they need to understand the how and why behind the process and get constructive, useful feedback based on it.
- The right tools make all the difference, but how can you be sure a tool is right for your developers if you don't ask them? Poor and unsuitable tooling creates friction, yet many companies fail to track developer satisfaction with their tools.
- Developer experience isn't just the responsibility of team leads. While team leads currently play an essential role in measuring DevEx and DP, over-relying on them without support from dedicated teams and a broader, structured approach can lead to inconsistent efforts across teams and to team lead burnout.

Our series exploring how market and user research is done at JetBrains continues. Want to learn more about research insights and take part in future JetBrains studies? Join our JetBrains Tech Insights Lab!
-
BLOG.JETBRAINS.COM
dotInsights | June 2025

Did you know? The original name of .NET was Next Generation Windows Services (NGWS). Before Microsoft officially named it .NET, the platform was internally referred to as NGWS: Next Generation Windows Services. The name .NET was adopted in the late 1990s to emphasize the platform's focus on web-based development and interoperability, as opposed to being tightly coupled to Windows-specific services.

Welcome to dotInsights by JetBrains! This newsletter is the home for recent .NET and software development related information.

Links
Here's the latest from the developer community.
- Double Dispatch in DDD: When Injecting Dependencies Makes Sense – Derek Comartin
- How to Become a Technical Coach And Carry On Coding in your Developer Career – Emily Bache
- Check Out the DrawingView in .NET MAUI – Leomaris Reyes
- Avoiding reflection in C# in way unsafer ways! – Steven Giesel
- 4 Ways to Culture-Proof Your C# xUnit Tests | Never Break Your Pipeline Again! – Gui Ferreira
- Duende IdentityServer and OTel Metrics, Traces, and Logs in the .NET Aspire Dashboard – Khalid Abuhakmeh
- Using the new AI template to create a chatbot about a website – Andrew Lock
- Evolve your C# Code with AI: A 5 Week Genetic Algorithms Bootcamp for Developers – Chris Woodruff
- Master NoSQL: Scalable Databases for Modern Applications – Frank LaVigne
- Use C# 14 extensions to simplify enum parsing – Gérald Barré
- Create a Beautiful Photo Gallery Using .NET MAUI Tab View and ListView – Naveenkumar Sanjeevirayan
- The Model Context Protocol: Getting beneath the hype – Karrtik Iyer
- Adding a Blazor Pager to Your Data Display – Héctor Pérez
- ASP.NET Core Pitfalls: Action Constraint Order – Ricardo Peres
- Asynchronous and Parallel Programming in C# – David Ramel
- ZLinq, a Zero-Allocation LINQ Library for .NET – Yoshifumi Kawai
- How to Import and Read Form Fields from DOCX Documents in .NET on Linux – Bjoern Meyer
- How to Migrate Users to Auth0: A Technical Guide – David Bolton
- Taming Manifest Sprawl with Aspire – David Fowler
- Song recommendations as an F# Impureim Sandwich – Mark Seemann
- Nullable bool and if statement – Jiří Činčura
- Vibe coding: Your roadmap to becoming an AI developer – Gwen Davis

From our .NET Guide
Each month we feature tutorials or tips from our .NET Guide.
- Refactor expressions to use pattern matching: Use pattern matching on properties in Boolean logic for more readable and efficient code. See more.
- C# Experimental Attribute: Mark a block of code as experimental so other developers are aware of its status. See more.

Coffee Break
Take a break to catch some fun social posts.

JetBrains News
What's going on at JetBrains? Check it out here:
- ReSharper Comes to Microsoft Visual Studio Code: Public Preview Now Open
- JetBrains AI Assistant Now in Visual Studio Code
- ReSharper 2025.2 EAP 2: First Public Build with Out-of-Process Mode Support
- Rider 2025.2 Early Access Program Is Live!
- ReSharper and Rider 2025.1.2 Bug Fixes Have Landed!

Comments? Questions? Send us an email.
Subscribe to dotInsights
-
BLOG.JETBRAINS.COM
Case Study: How Junie Uses TeamCity to Evaluate Coding Agents

Introduction
Junie is an intelligent coding agent developed by JetBrains. It automates the full development loop: reading project files, editing code, running tests, and applying fixes, going far beyond simple code generation. Where developers might use tools like ChatGPT to solve individual coding problems, Junie takes it a step further by automating the entire process.

As the agent's architecture evolved, the team needed a secure, robust way to measure progress. They wanted to build a scalable, reproducible evaluation pipeline that would be able to track changes across hundreds of tasks. That's where TeamCity came in. Junie's development team uses TeamCity to orchestrate large-scale evaluations, coordinate Dockerized environments, and track important metrics that guide Junie's improvements.

The challenge
Validating agent improvements at scale
As Junie's agents became more capable, with new commands and smarter decision-making, every change needed to be tested for real impact. Evaluation had to be systematic, repeatable, and grounded in data.

"Did it get better or not?" is a very poor way to evaluate. If I just try three examples from memory and see if it got better, that leads nowhere. That's not how you achieve stable, consistent improvements. You need a benchmark with a large and diverse enough set of tasks to actually measure anything.
– Danila Savenkov, Team Lead, JetBrains Junie

The team identified five core requirements for this process:
- Scale: Evaluations had to cover at least 100 tasks per run to minimize statistical noise. Running fewer tasks made it hard to draw meaningful conclusions.
- Parallel execution: Tasks needed to be evaluated in parallel, as running them sequentially would take over 24 hours and delay feedback loops.
- Reproducibility: It had to be possible to trace every evaluation back to the exact version of the agent, datasets, and environment used. Local experiments or inconsistent setups were not acceptable.
- Cost control: Each evaluation involved significant LLM API usage, typically costing USD 100+ per run. Tracking and managing these costs was essential.
- Data preservation: Results, logs, and artifacts needed to be stored reliably for analysis, debugging, and long-term tracking.

Benchmarking with SWE-bench
For a reliable signal, Junie adopted SWE-bench, a benchmark built from real GitHub issues and PRs. They also used SWE-bench Verified, a curated 500-task subset validated by OpenAI for clarity and feasibility. In parallel, Junie created in-house benchmarks for their internal monorepo (Java/Kotlin), web stack, and Go codebases, continuously extending benchmark coverage to more languages and technologies.

The operational challenge
Running these large-scale evaluations posed operational challenges:
- Spinning up consistent, isolated environments for each task.
- Managing dependencies and project setups.
- Applying patches generated by agents and running validations automatically.
- Collecting structured logs and metrics for deep analysis.

Manual workflows wouldn't scale. Junie needed automation that was fast, repeatable, and deeply integrated into their engineering stack. TeamCity enabled that orchestration.
With it, the Junie team built an evaluation pipeline that is scalable, traceable, and deeply integrated into their development loop.

The solution
To support reliable, large-scale evaluation of its coding agents, Junie implemented an evaluation pipeline powered by TeamCity, a CI/CD solution developed by JetBrains. TeamCity orchestrates the execution of hundreds of tasks in parallel, manages isolated environments for each benchmark case, and coordinates patch validation and result collection.

"If we tried running this locally, it just wouldn't be realistic. A single evaluation would take a full day. That's why we use TeamCity: to do everything in parallel, in isolated environments, and to ensure the results are reproducible."
– Danila Savenkov, Team Lead, JetBrains Junie

The setup enables the team to trace outcomes to specific agent versions, gather detailed logs for analysis, and run evaluations efficiently, while keeping infrastructure complexity and LLM usage costs under control.

Execution pipeline design
At the heart of the system is a composite build configuration defined using the Kotlin DSL, which gives Junie full control over task orchestration. Each top-level evaluation run includes multiple build steps; an illustrative sketch of such a chain appears at the end of this section.
(Image: Example of a build chain in TeamCity)

Environment setup
Each coding task is paired with a dedicated environment, typically a pre-built Docker container with the necessary dependencies already installed. This guarantees consistency across runs and eliminates local setup variability.

Agent execution
Junie's agent is launched against the task. It receives a full prompt, including the issue description, code structure, system commands, and guidelines. It then autonomously works through the problem, issuing actions such as file edits, replacements, and test runs. The final output is a code patch meant to resolve the issue.

Patch evaluation
The generated patch is passed to the next build step, where TeamCity applies it to the project and runs the validation suite. This mimics the GitHub pull request flow: if the original tests were failing and now pass, the task is marked as successfully completed.

Metric logging
Execution metadata, including logs, command traces, and success/failure flags, is exported to an open-source distributed storage and processing system. Junie uses it to store evaluation artifacts and perform large-scale analysis. With the solution's support for SQL-like querying and scalable data processing, the team can efficiently aggregate insights across hundreds of tasks and track agent performance over time.

Developers rely on this data to:
- Track the percentage of solved tasks (their North Star metric).
- Analyze the average cost per task for LLM API usage.
- Break down agent behavior (like the most frequent commands or typical failure points).
- Compare performance between agent versions.

Scalability through automation
By using the Kotlin DSL and TeamCity's composable build model, Junie scales evaluations to hundreds of tasks per session, far beyond what could be managed manually. For larger datasets (typically 300–2,000 tasks), each execution is spun up in parallel, minimizing runtime and allowing the team to test changes frequently.

"We use Kotlin DSL to configure everything. When you have 13 builds, you can still manage them manually, but when it's 399, or 500, or 280, it starts getting tricky."
– Danila Savenkov, Team Lead, JetBrains Junie
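The case study describes the shape of the chain but not its actual configuration. Purely as an illustration of the approach, here is a minimal TeamCity Kotlin DSL sketch of a three-step chain (environment setup, agent run, patch evaluation). The build type names, scripts, registry URL, and the task.id parameter are hypothetical rather than Junie's real setup, and the DSL package name depends on your TeamCity version.

```kotlin
// settings.kts -- minimal, illustrative sketch of an evaluation build chain.
// All names, scripts, and parameters are hypothetical.
import jetbrains.buildServer.configs.kotlin.*
import jetbrains.buildServer.configs.kotlin.buildSteps.script

version = "2025.03"

project {
    params {
        // Identifier of the benchmark task; supplied when the chain is triggered.
        param("task.id", "")
    }
    buildType(PrepareEnvironment)
    buildType(RunAgent)
    buildType(EvaluatePatch)
}

// Step 1: pull the pre-built Docker image with the task's dependencies.
object PrepareEnvironment : BuildType({
    name = "Prepare task environment"
    steps {
        script {
            name = "Pull task image"
            scriptContent = "docker pull registry.example.com/benchmarks/%task.id%"
        }
    }
})

// Step 2: run the coding agent against the task and publish the resulting patch.
object RunAgent : BuildType({
    name = "Run coding agent"
    artifactRules = "patch.diff"
    steps {
        script {
            name = "Generate patch"
            scriptContent = "./run_agent.sh --task %task.id% --output patch.diff"
        }
    }
    dependencies {
        snapshot(PrepareEnvironment) {}
    }
})

// Step 3: apply the patch and run the validation suite, mimicking a pull request check.
object EvaluatePatch : BuildType({
    name = "Apply patch and validate"
    steps {
        script {
            name = "Apply and test"
            scriptContent = "git apply patch.diff && ./run_validation.sh --task %task.id%"
        }
    }
    dependencies {
        snapshot(RunAgent) {}
        artifacts(RunAgent) {
            artifactRules = "patch.diff"
        }
    }
})
```

In practice, a composite configuration would generate one such chain per benchmark task so that hundreds of them can run in parallel, which is the pattern the case study describes.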
Results: reproducible, scalable, insight-driven agent development
TeamCity has enabled Junie to measure agent performance efficiently and at scale, making their development process faster, more reliable, and data-driven.

Key outcomes (challenge → result with TeamCity):
- Validate agent changes at scale → 100+ tasks per run, reducing statistical noise.
- Long evaluation cycles (24+ hours) → tasks run in parallel and now complete in a manageable window.
- Inconsistent local testing → every run is reproducible and traceable to the exact agent and dataset.
- Expensive LLM usage → per-task usage is tracked, helping optimize development and costs.
- Fragile logging and data loss → logs and outcomes are automatically stored for later debugging and review.

Need to scale your AI workflows?
TeamCity gives you the infrastructure to evaluate and iterate with confidence. Start your free trial or request a demo.
-
BLOG.JETBRAINS.COM
What's Next for RubyMine

Hello everyone!
The RubyMine 2025.2 Early Access Program is already available! In this blog post, we'll share the upcoming features and updates planned for this release cycle.

What's coming in RubyMine 2025.2?

Debugger improvements
We're introducing a number of changes aimed at enhancing the debugger installation experience. The entire process will now take less time, and the associated notifications will be less distracting and more informative. Finally, the RubyMine debugger will be updated to support newly released Ruby versions sooner than it previously did.

Better multi-module support
A priority of the upcoming RubyMine release is support for multi-module projects. This will include Bundler improvements, faster startup for multi-module projects, smoother switching between interpreters, and more.

Automatic management of RBS Collection
We made this feature a default setting, which requires RBS 3.2. Ruby 3.4 comes with a compatible RBS version bundled. This is beneficial for all features related to code insight.

Better remote development experience
We are continuing to enhance RubyMine remote development as an alternative to using just remote Ruby interpreters. In 2025.2, you will enjoy even better overall performance and several improvements to split mode.

AI Assistant improvements
In the new release, you can expect AI Assistant to generate more code suggestions across your projects. The quality of multi-line suggestions will also improve now that the formatting of outputs has been fixed. What's more, in line with our efforts to expand AI Assistant's functionality, we have improved code completion for ERB in RubyMine 2025.2.

Join the Early Access Program
You can download the latest EAP build from our website or via the Toolbox App. The full list of tickets addressed by this EAP build is available in the release notes.
Stay connected through our official RubyMine X channel. We encourage you to share your thoughts in the comments below and to create and vote for new feature requests in our issue tracker.

Happy developing!
The RubyMine team
-
BLOG.JETBRAINS.COM
Junie and RubyMine: Your Winning Combo

Junie, a powerful AI coding agent from JetBrains, is available in RubyMine! Install the plugin and try it out now!

Why Junie is a game-changer
Unlike other AI coding agents, Junie leverages the robust power of JetBrains IDEs and reliable large language models (LLMs) to deliver exceptional results with high precision. According to SWE-bench Verified, a curated benchmark of 500 real-world developer tasks, Junie successfully solves 60.8% of tasks on a single run. This impressive success rate demonstrates Junie's ability to tackle coding challenges that would normally require hours to complete. This is more than AI: it's the latest evolution in developer productivity.

Your most trusted AI partner
Junie isn't just an assistant: it's your creative and strategic partner. Here's what Junie can do for you in RubyMine:

Build entire Ruby apps, not just snippets
Need more than individual code fragments? Junie can write entire applications, handling complex structures with ease and precision.

Automate inspections and testing
Pairing Junie with RubyMine's powerful code insight tools means inspections and automated tests (RSpec, minitest) are no longer a chore. Let Junie ensure your code works, and works well.

Suggest features and code improvements
Stuck? Junie brings fresh ideas to the table, pointing out areas for improvement, suggesting optimizations, or brainstorming entirely new features for your project.

Clean and align code with your style
Junie doesn't just write code: it ensures everything aligns with your coding style and guidelines, leaving your code polished, structured, and ready to deploy.

With most of the heavy lifting off your plate, Junie saves you time and mental energy. Instead of getting bogged down in the mundane, you're free to focus on strategy, innovation, and big-picture ideas.

You define the process, Junie elevates it
While Junie is indeed powerful and capable, it's designed to enhance your coding experience, not take control of it. You remain the decision-maker at every step, from delegating tasks to reviewing Junie's code suggestions. You control how and when AI contributes to your workflow. No matter what you entrust to Junie, it will adapt to your style and always give you the final say, ensuring that your code remains truly yours.

Try Junie in RubyMine today
Now is the perfect time to try Junie in RubyMine and experience firsthand how AI can boost your productivity, simplify your workflow, and enhance your coding experience. To install Junie in RubyMine, visit this page.

Follow us for updates and tips
Find more about Junie and the project's further development in this article. Stay connected through our official RubyMine X channel. Don't forget to share your thoughts in the comments below and to suggest and vote for new features in our issue tracker.

Happy developing!
The RubyMine team
-
Context Collection Competition by JetBrains and Mistral AI

Build smarter code completions and compete for a share of USD 12,000!
In AI-enabled IDEs, code completion quality heavily depends on how well the IDE understands the surrounding code: the context. That context is everything, and we want your help to find the best way to collect it. Join JetBrains and Mistral AI at the Context Collection Competition. Show us your best strategy for gathering code context, and compete for your share of USD 12,000 in prizes and a chance to present it at the workshop at ASE 2025.

Why context matters
Code completion predicts what a developer will write next based on the current code. Our experiments at JetBrains Research show that context plays an important role in the quality of code completion. This is a hot topic in software engineering research, and we believe it's a great time to push the boundaries even further.

Goal and tracks
The goal of our competition is to create a context collection strategy that supplements the given completion points with useful information from across the whole repository. The strategy should maximize the chrF score averaged across three strong code models: Mellum by JetBrains, Codestral by Mistral AI, and Qwen2.5-Coder by Alibaba Cloud. (An illustrative sketch of such a strategy follows at the end of this post.)

The competition includes two tracks with the same problem, but in different programming languages:
- Python: A popular target for many novel AI-based programming assistance techniques due to its very wide user base.
- Kotlin: A modern statically typed language with historically good support in JetBrains products, but with less interest from the research community.
We're especially excited about universal solutions that work across both dynamic (Python) and static (Kotlin) typing systems.

Prizes
Each track awards prizes to the top three teams:
- 1st place: USD 3,000
- 2nd place: USD 2,000
- 3rd place: USD 1,000
That's a USD 12,000 prize pool, plus free ASE 2025 workshop registration for a representative from each top team.
Top teams will also receive:
- A one-year JetBrains All Products Pack license for every team member (12 IDEs, 3 extensions, 2 profilers; worth USD 289 for individual use).
- USD 2,000 granted on La Plateforme, for you to use however you like.

Join the competition
The competition is hosted on Eval.AI. Get started here: https://jb.gg/co4. We have also released a starter kit to help you hit the ground running: https://github.com/JetBrains-Research/ase2025-starter-kit.
Key dates:
- June 2, 2025: competition opens
- June 9, 2025: public phase begins
- July 25, 2025: public phase ends
- July 25, 2025: private phase begins
- July 25, 2025: solution paper submission opens
- August 18, 2025: private phase ends
- August 18, 2025: final results announced
- August 26, 2025: solution paper submission closes
- November 2025: solutions presented at the workshop

By participating in the competition, you indicate your agreement to its terms and conditions.
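To make the task more concrete, here is a minimal Kotlin sketch of the kind of strategy involved: it ranks repository files by how many identifiers they share with the file being completed and keeps the best matches within a character budget. This is not the competition's actual API; the function names, file filters, and budget are assumptions, and real submissions must follow the input and output formats defined in the starter kit.

```kotlin
import java.io.File

// Toy context collection heuristic: score every repository file by the number of
// identifiers it shares with the file being completed, then keep the best-scoring
// files until a character budget is exhausted.
private val identifierRegex = Regex("[A-Za-z_][A-Za-z0-9_]*")

private fun identifiers(text: String): Set<String> =
    identifierRegex.findAll(text).map { it.value }.toSet()

fun collectContext(
    repoRoot: File,
    completionFile: File,
    charBudget: Int = 16_000
): String {
    val targetIds = identifiers(completionFile.readText())

    // Score candidate files by identifier overlap with the completion file.
    val candidates = repoRoot.walkTopDown()
        .filter { it.isFile && it.extension in setOf("py", "kt") && it != completionFile }
        .map { file ->
            val text = file.readText()
            Triple(file, text, identifiers(text).count { it in targetIds })
        }
        .sortedByDescending { it.third }

    // Greedily concatenate the highest-scoring files until the budget runs out.
    val context = StringBuilder()
    for ((file, text, score) in candidates) {
        if (score == 0) break
        val snippet = "// file: ${file.relativeTo(repoRoot)}\n$text\n"
        if (context.length + snippet.length <= charBudget) {
            context.append(snippet)
        }
    }
    return context.toString()
}
```

A real submission would likely draw on richer signals (imports, call relations, recent edits), but the shape of the task is the same: select the most relevant repository content for a completion point under a budget.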
-
BLOG.JETBRAINS.COM
Help Predict the Future of AI in Software Development!

Ever wanted to share your ideas about AI and have a chance at winning prizes at the same time? As a company dedicated to creating the best possible solutions for software development, we at JetBrains want to know what you think about AI in software development. Participate in our tournament! In this post, we tell you more about the tournament and offer tips for making accurate predictions. And in case you're new to forecasting platforms, we've included an overview below. Let's get started so that you can add your voice to community-sourced forecasting!

JetBrains Research's AI in Software Development 2025 tournament
To participate in the tournament, all you have to do is register on Metaculus and complete this short survey. Make sure to input your predictions before the resolution on December 1, 2025!

Tournament specs
With this forecasting challenge, we are primarily interested in seeing how accurately participants can predict emerging AI features in software development. We also want to understand:
- Developers' attitudes about AI and how they are evolving
- Individual features of the best forecasters
- How people estimate the future of various benchmarks

Currently, the tournament includes 13 questions. To keep everything fair, we have invited independent experts to review the questions and to evaluate the end resolutions. These experts are:
- Olga Megorskaya, Chief Executive Officer at Toloka
- Grigory Sapunov, Co-Founder and CTO at Intento
- Iftekhar Ahmed, Associate Professor at the University of California, Irvine
- Hussein Mozannar, Senior Researcher at Microsoft Research AI Frontiers
- Dmitiry Novakovskiy, Head of Customer Engineering at Google Cloud

Rankings and the prize pool
In this tournament, your ranking will be calculated based on your peer score. Generally speaking, a positive score indicates higher accuracy, and a negative score lower accuracy (see how exactly Metaculus calculates the peer score). A bit more specifically, the ranking is calculated from the sum of your peer scores over all the questions, which are individually weighted. That is, if you do not forecast a specific question, you score zero on that question.

For the AI in Software Development 2025 tournament, we have a USD 3,000 prize pool, which will be distributed across the first three leaderboard medals as follows (all prizes in USD):
- First place: $1,500
- Second place: $1,000
- Third place: $500
Note that in order to be eligible for the prize pool, you must fill out the quick research survey!

Tips for making accurate predictions on forecasting platforms
Here are some tips to get you on the path to positive peer scores and higher rankings:
- Consider alternative scenarios before placing your forecast. This is generally a good idea, and it is especially useful if the event concerns something novel or very uncertain.
- Ongoing news can inform the probabilities of different outcomes, so stay informed!
- Be careful not to be overconfident. Besides considering alternatives, it is useful to list offline the reasons why your forecast could be wrong.
- As with many skills, practice helps. On a platform like Metaculus in particular, you can improve by posting your reasoning in the discussion section and reading other participants' reasoning.
- If you have forecast a few questions as practice, compare your track record with the community track record. (But don't only predict based on the community median.
Your insights and evidence are valuable, too!)

For more resources, check out Metaculus' collection of analysis tools, tutorials, research literature, and tips, as well as their forecasting guide for each type of question.

Online forecasting tools: a primer
What are online forecasting tools? Via a combination of user inputs and sophisticated statistical modelling, these tools enable the prediction of future events.

If you've never heard of forecasting platforms before, you might guess that they are like gambling sites. While there are some similarities with betting, online forecasting tools are not strictly synonymous with gambling, whether online or at the tracks. A crucial difference is that forecasting tools are used by people interested in gathering information about future events, not necessarily (or solely) to gain a profit based on the outcome of a future event. In particular, our forecasting tournament focuses on evaluating the prediction skills of participants; the prizes are merely perks for the top-ranked forecasters and an exception to most queries on the hosting platform, Metaculus.

Another type of information-gathering tool is a poll or a survey. While similar in empirical intent, the questions in polls often ask about participants' (a) experiences, (b) ideas, or (c) preferences, and not about tangible, objective facts that can be unambiguously resolved. Here are some real-world examples from YouGov (UK): (a) whether the participants have watched political content on TikTok, (b) participants' views on banning phones in schools, and (c) which Doctor Who version the participant prefers.

While there might be a clear winner among the respondents, the results will reflect people's preferences and thoughts, sometimes about facts, but the results are not facts themselves. Likewise, any survey results are subject to differences among varying demographics. For survey question (b), there is a clear winner in the results below, but this is only the opinion of the people in the UK who were asked. And while a respondent may be interested in the results (e.g. they really want schools to ban phones), there is no direct gain for having given a more popular or more accurate response.
(Source: YouGov plc, 2025, all rights reserved. Last access: May 22, 2025.)

In contrast, a forecasting query's responses are evaluated for accuracy against facts at the time of resolution. Those participating are actively interested in the resolution, as it affects leaderboard prestige and/or financial reward, depending on the type of forecasting platform. This also means that participants are more motivated to give what they think are accurate predictions, even if these do not 100% align with their personal preferences at the time.

Often forecasting platforms involve binary questions, like "Will DeepSeek be banned in the US this year?". The queries can also be about uncertain events with multiple possible outcomes, e.g. the winner of Eurovision 2025, where until the finals many countries have a chance. Similarly, queries with numerical ranges, such as the prediction of the Rotten Tomatoes score of Mission: Impossible – The Final Reckoning, can consider the weight of different ranges. Even if different platforms' architectures handle the calculations slightly differently, the main takeaway is that there are resolution deadlines and that the event in question can be unambiguously resolved on forecasting platforms.
See the figure below for a snapshot of the rules summary for the Mission: Impossible question on Kalshi.
(Source: Kalshi. Last access: May 22, 2025.)

The following subsections present the history of forecasting tools, including the most common kinds and which one is relevant for this forecasting challenge.

A history of prediction
Forecasting mechanisms have existed informally for centuries, with people predicting outcomes like papal or presidential election results. More formal forecasting tools were established at the end of the 20th century, starting with a similar focus, and have since gained currency while expanding their application.

Well-known examples of formal forecasting mechanisms include the Iowa Electronic Market, created as an experimental tool in 1988 for the popular vote in the US presidential elections and still in use today; Robin Hanson's paper-based market, created in 1990 for Project Xanadu employees to make predictions on both the company's product and scientific controversies; and the online Hollywood Stock Exchange, established in 1996 as a way for participants to bet on outcomes in the entertainment industry.

These forecasting tools demonstrated how much more accurate aggregated predictions can be than individual ones (see, for example, The Wisdom of Crowds or Anatomy of an Experimental Political Stock Market), motivating economists to take their insights seriously. Around the same time, big companies such as Google, Microsoft, and Eli Lilly began establishing company-internal prediction markets. These days, many companies have their own internal prediction tools; for example, we at JetBrains recently launched our own platform, called JetPredict.

For example, Google's internal product, Prophit, was launched in 2005 and offered financial incentives, plus leaderboard prestige, to the employees best at predicting. Although an internal product, Prophit was known outside of Google as a prediction platform demonstrating relatively high accuracy. It eventually had to shut down in the late 2000s due to federal regulations (and the 2008 financial crisis did not help either). Many publications covered the topic at the time, for example the 2005 NYTimes article "At Google, the Workers Are Placing Their Bets", the 2007 Harvard Business case study "Prediction Markets at Google", and the 2008 article "Using Prediction Markets to Track Information Flows: Evidence from Google". More recently, there was an article about Prophit and a second internal market, Gleangen: "The Death and Life of Prediction Markets at Google".

Beyond big corporations, researchers have started using formal prediction tools to predict things like study replicability, a crucial scientific tenet. In a comparison of forecasting tools and survey beliefs predicting replicability, the former were much more accurate than the latter. If you are interested, The Science Prediction Market Project provides a collection of papers on the topic. Applying forecasting tools to research is still less widespread than forecasting in the business world, but it's an exciting space to watch!

Different forecasting tools today
Not all forecasting platforms are prediction markets, even if the terms are sometimes used interchangeably.
Here we only look at overall differences without going into the details of, say, the kinds of prediction markets or the math behind the models. If you are interested, WIFPR, Investopedia, and the Corporate Finance Institute provide further resources on these differences.

The hallmark of a prediction market is that participants are offered financial incentives by way of event contracts, sometimes also called shares. Key concepts include:
- Event contracts can be sold or bought depending on the participant's belief in the outcome.
- The current price reflects what the broader community expects of the outcome.
- As the nominal contract value is typically USD 1, the share prices sum to USD 1 as well. So, for a market's implied probability of about 60%, the average share price to buy will be around 60 cents.
- Prices change in real time as new information emerges.
- If the participant bought contract shares for the correct prediction, they earn money (typically USD 1) for each share purchased. Incorrect predictions mean no money is earned.

Translating those concepts into an example: a question on the prediction market Kalshi asks whether Anthropic will release Claude 4 before June 1, 2025. At the time of writing this post, the likelihood of Claude 4's release was at 34% according to the community, as shown in the figure below.
(Source: Kalshi. Last access: May 16, 2025, 17:25 CEST.)

If you wanted to participate in the above market on May 16, the following scenarios could have occurred. If you believed the release would happen before June 1, you could have bought shares for about 35 cents each. Say you bought 100 shares for USD 35 and, come June 1, Anthropic had indeed released Claude 4. You would then have won USD 100 (USD 1 multiplied by 100 shares), and your profit would be USD 65 (the USD 100 win minus your USD 35 investment). If Anthropic did not release Claude 4 by June 1, you would have lost your initial USD 35 investment.
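As a quick check of that arithmetic, here is a tiny Kotlin sketch of the payoff for a binary event contract; the 0.35 price and 100 shares simply mirror the example above and are not live market data.

```kotlin
// Profit of a binary "yes" position: each correct share settles at USD 1,
// an incorrect position settles at USD 0.
fun profit(sharePrice: Double, shares: Int, resolvedYes: Boolean): Double {
    val cost = sharePrice * shares
    val payout = if (resolvedYes) 1.0 * shares else 0.0
    return payout - cost
}

fun main() {
    println(profit(sharePrice = 0.35, shares = 100, resolvedYes = true))   // 65.0  -> USD 65 profit
    println(profit(sharePrice = 0.35, shares = 100, resolvedYes = false))  // -35.0 -> the stake is lost
}
```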
The figure above additionally shows that earlier in the year, the community thought that Claude 4 was more likely to be released by the resolution date. As more evidence rolls in, an outcome's likelihood can change.

Aggregating community forecasts is also possible without share-buying and profit-seeking. Other forecasting platforms, such as Good Judgement or Metaculus, use a broader toolset for their prediction architecture, focusing primarily on leveraging collective intelligence and transparent scoring. By eliminating profit as the primary incentive and instead rewarding forecasters for their prediction accuracy over time, extreme predictions are discouraged.

In particular, Metaculus is building a forecasting ecosystem with a strong empirical infrastructure, using techniques such as Bayesian statistics and machine learning. This creates a platform that is overall more cooperative and has a shared scientific intent. The platform encourages participants to publish the reasoning behind their picks, which fosters community discussions.

Accuracy and the broader impact of community-sourced forecasting
As forecasting tools become more sophisticated, they are also getting more accurate in their predictions. In its current state, Metaculus already outperforms notoriously robust statistical models, as was recorded in "Forecasting skill of a crowd-prediction platform: A comparison of exchange rate forecasts". The platform additionally keeps an ongoing record of all resolved questions with performance statistics.

Metaculus is a platform that not only benefits from community inputs, but also provides vital information to the community. Take the COVID-19 pandemic, for example: predictors on Metaculus accurately anticipated the impact of the virus before it was globally recognized as a pandemic. In turn, the insights on specific events within such a pandemic can be valuable to policymakers, as in this case study on an Omicron wave in the US.

Researchers are continuously investigating various public health threats. An open question at the time of writing, on the possibility of the avian influenza virus becoming a public health emergency, is shown in the figure below. What would be your prediction?
(Source: Metaculus. Last access: May 16, 2025.)

At JetBrains, our commitment goes beyond delivering top-tier software development solutions and innovative AI tools: we are passionate about nurturing a vibrant, engaged community and creating meaningful opportunities for learning and collaboration. We believe that open dialogue about the future of AI in software development is essential to advancing the field. With these shared values, we are proud to partner with Metaculus as the host for our forecasting challenge. Together, we look forward to inspiring thoughtful discussion, driving progress, and shaping the future of AI in software development.
-
YUBNUB.NEWS
Marines Will Be Deployed to LA as Riots Continue

[View Article at Source]
State of the Union: Trump is the first president to federalize the National Guard since Lyndon Johnson. The post Marines Will Be Deployed to LA as Riots Continue appeared first
-
YUBNUB.NEWS
Is Donald Trump the Only Adult in the Room? Here's What He's Really Doing in L.A.

One thing I've said repeatedly since January 20 is that it's nice to have adults in charge again. Terrorists, criminals, and radical leftists with wacky agendas aren't able to run amok and do whatever